## Example Notebook: Using IngestionPipeline

### 🛠️ Setup Instructions

Before running this notebook:

- Ensure you have created a `.env` file in the **same directory** as this notebook. It must contain all required environment variables (e.g., Azure credentials, storage connection strings, and transcription keys).
- Make sure all required libraries are installed by running:


---

### About

The **IngestionPipeline** processes a video file end to end: it extracts the transcript, frames, and chapters, and creates an Azure AI Search index for downstream applications such as `VideoAgent`. It includes the following steps:

1. **Audio Extraction** – Extracts the audio from the input video.
2. **Transcription** – Converts spoken content to text using the selected transcription service and language setting.
3. **Frame Extraction** – Captures representative frames at 1 FPS to support visual summarization and the downstream `VideoAgent`.
4. **Chapter Generation** – Aligns transcript segments with visual frames to form meaningful video chapters.
5. **Azure Search Indexing** – Saves chapters and metadata to an Azure AI Search index to support retrieval.
6. **Summary File Generation** – Outputs `summary_n_transcript.json` containing the full transcript and a visual summary.
7. **(Optional) Azure CV Indexing** – Indexes the video frames using Azure Computer Vision for advanced content-based search.
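
To make steps 3 and 4 more concrete, here is a minimal sketch of 1 FPS sampling and of grouping the sampled frames under transcript segments. It is not the library's implementation: `Segment`, `frame_timestamps`, and `group_frames_by_segment` are hypothetical names introduced only for illustration, and the real pipeline may align chapters differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start/end in seconds plus its text."""
    start: float
    end: float
    text: str


def frame_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Timestamps (in seconds) sampled at `fps`, starting at t=0 and staying below the duration."""
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    count = math.ceil(duration_s * fps)
    return [i / fps for i in range(count)]


def group_frames_by_segment(
    segments: list[Segment], timestamps: list[float]
) -> list[tuple[Segment, list[float]]]:
    """Pair each segment with the frame timestamps t where start <= t < end."""
    return [
        (seg, [t for t in timestamps if seg.start <= t < seg.end])
        for seg in segments
    ]


# Illustrative data only: a 5-second clip with two transcript segments.
segments = [Segment(0.0, 2.5, "intro"), Segment(2.5, 5.0, "main point")]
chapters = group_frames_by_segment(segments, frame_timestamps(5.0))
for seg, frames in chapters:
    print(seg.text, frames)
```

The half-open interval (`start <= t < end`) keeps a frame that falls exactly on a segment boundary from being counted twice.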

---

### Transcription Configuration

You can configure the transcription backend using the `TranscriptionServices` enum:

- `TranscriptionServices.WHISPER` – Uses OpenAI Whisper.
- `TranscriptionServices.AZURE_STT` – Uses Azure Speech-to-Text.

Specify the language of the video's audio using the `Languages` enum. For example:

- `Languages.ENGLISH_INDIA` – English (India)
- `Languages.HINDI` – Hindi

The `Languages` enum includes support for additional languages. Refer to the `Languages` enum definition to explore all available options.
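
If the service and language arrive as plain strings (for example from a config file), they could be mapped onto enum members by name. The sketch below shows one way to do that. The two enums are local stand-ins, named as in the import cell of this notebook, so that the block runs without `mmct` installed; their member values are placeholders rather than the library's actual values, and `parse_member` is a hypothetical helper.

```python
from enum import Enum


# Stand-in enums for illustration only. In the notebook these are imported
# from mmct.video_pipeline, and the real member values may differ.
class TranscriptionServices(Enum):
    WHISPER = "whisper"
    AZURE_STT = "azure-stt"


class Languages(Enum):
    ENGLISH_INDIA = "en-IN"
    HINDI = "hi-IN"


def parse_member(enum_cls, name: str):
    """Look up an enum member by name, ignoring case (hypothetical helper)."""
    key = name.strip().upper().replace("-", "_")
    try:
        return enum_cls[key]
    except KeyError:
        valid = ", ".join(member.name for member in enum_cls)
        raise ValueError(
            f"Unknown {enum_cls.__name__} {name!r}; expected one of: {valid}"
        ) from None


service = parse_member(TranscriptionServices, "azure_stt")
language = parse_member(Languages, "hindi")
print(service, language)
```

Looking members up by name rather than by value avoids depending on whatever values the real enums carry.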


### Importing Libraries

In [1]:
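import asyncio
import warnings

import nest_asyncio
from mmct.video_pipeline import IngestionPipeline, Languages, TranscriptionServices

# Allow asyncio.run() to be called inside the notebook's already-running event loop.
nest_asyncio.apply()

# Silence library warnings to keep the notebook output readable.
warnings.filterwarnings("ignore")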



### Executing Video Pipeline
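
Before running the cell below, it can help to validate the two placeholders it leaves empty (`video_path` and `index`). The following is a minimal sketch using only the standard library. `check_inputs` is a hypothetical helper, not part of `mmct`, and the list of accepted extensions is an assumption.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to whatever your setup accepts.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi"}


def check_inputs(video_path: str, index_name: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs look usable."""
    problems = []
    if not index_name.strip():
        problems.append("index name is empty")
    if not video_path.strip():
        problems.append("video path is empty")
        return problems
    path = Path(video_path)
    if not path.is_file():
        problems.append(f"video file not found: {path}")
    elif path.suffix.lower() not in VIDEO_EXTENSIONS:
        problems.append(f"unexpected file extension: {path.suffix}")
    return problems


# With both placeholders left empty, two problems are reported.
for problem in check_inputs("", ""):
    print("-", problem)
```

Returning a list of problems instead of raising on the first one lets all issues be reported in a single pass.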

In [None]:
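video_path = r""  # path to the input video file
index = ""  # name of the Azure AI Search index to write to
source_language = Languages.ENGLISH_INDIA  # language spoken in the video
ingestion = IngestionPipeline(
    video_path=video_path,
    index_name=index,
    transcription_service=TranscriptionServices.WHISPER,  # or TranscriptionServices.AZURE_STT
    language=source_language,
    use_azure_computer_vision=False,  # flag for the optional Azure CV indexing step
)

# The pipeline object is called to obtain a coroutine, which asyncio then runs.
asyncio.run(ingestion())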

2025-06-19 19:18:21.472 | INFO     | mmct.blob_store_manager:__init__:34 - Successfully initialized the blob service client


Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000029594F01610>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000029594FFB250>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000029585E66950>


2025-06-19 19:18:21.776 | INFO     | mmct.video_pipeline.utils.helper:get_file_hash:205 - Hash Id Generated
2025-06-19 19:18:21.776 | INFO     | mmct.video_pipeline.core.ingestion.ingestion_pipeline:get_transcription:127 - Successfully generated the file hash for the video path: C:\Users\v-soumyade\Downloads\mastering_happiness_lesson.mp4
Hash Id: d678544d517a57050f6a6881b0eb26496536053c45711ac624104cd2fccc00dc
2025-06-19 19:18:22.412 | INFO     | mmct.video_pipeline.core.ingestion.transcription.base_transcription:__init__:19 - Initialized the LLM Client
2025-06-19 19:18:23.104 | INFO     | mmct.video_pipeline.core.ingestion.transcription.base_transcription:__init__:21 - Initialized the OpenAI STT Client
2025-06-19 19:18:23.116 | INFO     | mmct.video_pipeline