# Audio Processing Pipeline - Google Colab Quickstart

This notebook will help you:
1. Clone the repository from GitHub
2. Install dependencies
3. Set up your Deepgram API key
4. Upload audio files
5. Run the pipeline
6. View the generated output folders
7. Display a sample dashboard
8. Download the results

## Step 1: Clone Repository from GitHub

```python
# Replace <your-github-username> and <your-repo-name> with your GitHub username and repository name
!git clone https://github.com/<your-github-username>/<your-repo-name>.git
%cd <your-repo-name>
```

## Step 2: Install Dependencies

```python
!pip install -q -r requirements.txt
```

## Step 3: Set Your Deepgram API Key

Get your API key from: https://console.deepgram.com/

```python
import os

# Replace with your actual Deepgram API key.
# A key typed here is saved in the notebook file, so remove it before sharing the notebook.
os.environ["DEEPGRAM_API_KEY"] = "YOUR_API_KEY_HERE"
```
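If you would rather not type the key into a saved cell, one option is to prompt for it with Python's standard `getpass` module, which hides the input and keeps the key out of the notebook file. This is a minimal sketch: `set_deepgram_key` is a hypothetical helper, and its `prompt` parameter exists only so the function can be exercised without interactive input.

```python
import getpass
import os


def set_deepgram_key(prompt=getpass.getpass):
    """Ask for the Deepgram key and store it in DEEPGRAM_API_KEY (hypothetical helper)."""
    key = prompt("Deepgram API key: ").strip()
    if not key:
        raise ValueError("No API key entered")
    # The pipeline reads the key from this environment variable (see the cell above)
    os.environ["DEEPGRAM_API_KEY"] = key
    return key


# In Colab, uncomment to be prompted for the key:
# set_deepgram_key()
```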

## Step 4: Upload Your Audio Files

```python
from google.colab import files

print("Select your audio files to upload...")
uploaded = files.upload()  # saves the files to the current working directory (the repo folder)

print(f"\n✅ Uploaded {len(uploaded)} file(s)")
for filename in uploaded.keys():
    print(f"  - {filename}")
```
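Before running the pipeline, it can help to confirm that every upload looks like an audio file. A minimal sketch, assuming the extension set below; which formats `run_pipeline.py` actually accepts is not stated in this notebook, so adjust `AUDIO_EXTENSIONS` (a hypothetical name, as is `split_by_type`) to match the repository.

```python
import os

# Assumed set of audio extensions; adjust to what the pipeline supports
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def split_by_type(filenames, extensions=AUDIO_EXTENSIONS):
    """Return (audio_files, other_files), matching extensions case-insensitively."""
    audio, other = [], []
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        (audio if ext in extensions else other).append(name)
    return audio, other


# In the notebook, check the files from the upload cell:
# audio, other = split_by_type(uploaded.keys())
# if other:
#     print("⚠️ These uploads may not be audio:", other)
```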

## Step 5: Run the Pipeline

`run_pipeline.py .` processes every audio file in the current directory, which is the repository folder where Step 4 saved your uploads.

```python
# Run the pipeline on the current directory
!python run_pipeline.py .
```
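A failed run often comes down to a missing API key or no input files. A small pre-flight check, run before the pipeline command, can catch both. This is a sketch with a hypothetical `preflight` helper; the placeholder string it checks for is the one from Step 3, and the extension set is an assumption about which formats the pipeline reads.

```python
import os

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}  # assumed formats


def preflight(directory=".", env=None):
    """Return a list of problems that would likely make the pipeline run fail."""
    env = os.environ if env is None else env
    problems = []
    key = env.get("DEEPGRAM_API_KEY", "")
    # Treat the Step 3 placeholder the same as a missing key
    if not key or key == "YOUR_API_KEY_HERE":
        problems.append("DEEPGRAM_API_KEY is not set (see Step 3)")
    audio = [
        f for f in os.listdir(directory)
        if os.path.splitext(f)[1].lower() in AUDIO_EXTENSIONS
    ]
    if not audio:
        problems.append(f"No audio files found in {os.path.abspath(directory)} (see Step 4)")
    return problems


# for problem in preflight():
#     print("⚠️", problem)
```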

## Step 6: View Results

List each generated output folder (a directory whose name ends in `_output`) and the files inside it:

```python
import os

# Output folders are the directories whose names end in "_output"
output_folders = sorted(
    f for f in os.listdir() if f.endswith('_output') and os.path.isdir(f)
)

print(f"Found {len(output_folders)} output folder(s):\n")
for folder in output_folders:
    print(f"📁 {folder}")
    for f in sorted(os.listdir(folder)):
        print(f"   - {f}")
    print()
```
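`os.listdir` only shows the top level of each folder. If the pipeline writes subfolders (this notebook does not say whether it does), a sketch using `os.walk` collects every file path relative to its output folder; `collect_outputs` is a hypothetical helper.

```python
import os


def collect_outputs(root="."):
    """Map each '*_output' directory under root to its files, including nested ones."""
    result = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        # Skip plain files such as previously created ZIPs
        if not (name.endswith("_output") and os.path.isdir(folder)):
            continue
        paths = []
        for dirpath, _dirnames, filenames in os.walk(folder):
            for f in filenames:
                paths.append(os.path.relpath(os.path.join(dirpath, f), folder))
        result[name] = sorted(paths)
    return result


# for folder, paths in collect_outputs().items():
#     print(f"📁 {folder} ({len(paths)} files)")
#     for p in paths:
#         print(f"   - {p}")
```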

## Step 7: Display Sample Graphs

Display the dashboard image from the first output folder:

```python
from IPython.display import Image, display
import os

# Find output folders (directories whose names end in "_output")
output_folders = sorted(
    f for f in os.listdir() if f.endswith('_output') and os.path.isdir(f)
)

if output_folders:
    output_folder = output_folders[0]
    # IPython's Image renders image files only, so skip other "dashboard" files
    dashboard_files = sorted(
        f for f in os.listdir(output_folder)
        if 'dashboard' in f and f.lower().endswith(('.png', '.jpg', '.jpeg'))
    )

    if dashboard_files:
        dashboard_path = os.path.join(output_folder, dashboard_files[0])
        print(f"📊 Dashboard for: {output_folder}\n")
        display(Image(filename=dashboard_path))
    else:
        print("No dashboard image found in output folder.")
else:
    print("No output folders found. Run the pipeline first!")
```
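The cell above shows only the first folder's dashboard. To page through all of them, a sketch like the following gathers every image file with `dashboard` in its name across the output folders. `find_dashboards` is a hypothetical helper, and the image extensions are an assumption about the pipeline's output format.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # assumed dashboard formats


def find_dashboards(root="."):
    """Return sorted paths of dashboard images inside each '*_output' directory."""
    found = []
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not (name.endswith("_output") and os.path.isdir(folder)):
            continue
        for f in sorted(os.listdir(folder)):
            if "dashboard" in f and f.lower().endswith(IMAGE_EXTENSIONS):
                found.append(os.path.join(folder, f))
    return found


# In the notebook:
# from IPython.display import Image, display
# for path in find_dashboards():
#     print(f"📊 {path}")
#     display(Image(filename=path))
```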

## Step 8: Download Results

Zip each output folder and download it. Your browser may ask permission before allowing several downloads in a row.

```python
from google.colab import files
import os
import shutil

# Only directories ending in "_output" (skips ZIPs from earlier runs)
output_folders = sorted(
    f for f in os.listdir() if f.endswith('_output') and os.path.isdir(f)
)

print(f"Zipping {len(output_folders)} output folder(s)...\n")

for folder in output_folders:
    zip_name = folder
    # Create <folder>.zip containing the folder's contents
    shutil.make_archive(zip_name, 'zip', folder)
    print(f"📦 Created: {zip_name}.zip")
    # Ask the browser to download the archive
    files.download(f"{zip_name}.zip")

print("\n✅ Download requests sent. Check your browser's downloads.")
```
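Browsers may block or prompt for repeated downloads. An alternative sketch bundles every output folder into one archive with the standard `zipfile` module, keeping each folder name as a top-level directory inside the ZIP. The helper `zip_outputs` and the archive name `all_outputs.zip` are hypothetical.

```python
import os
import zipfile


def zip_outputs(folders, archive_path="all_outputs.zip"):
    """Write all files from the given folders into one ZIP, preserving folder names."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            folder = os.path.normpath(folder)  # drop any trailing slash
            parent = os.path.dirname(folder) or "."
            for dirpath, _dirnames, filenames in os.walk(folder):
                for f in filenames:
                    full = os.path.join(dirpath, f)
                    # Paths relative to the parent keep the "<name>_output/..." prefix
                    zf.write(full, arcname=os.path.relpath(full, parent))
    return archive_path


# from google.colab import files
# folders = sorted(f for f in os.listdir() if f.endswith("_output") and os.path.isdir(f))
# files.download(zip_outputs(folders))
```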

## Advanced: Run Individual Modules

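To process specific files or run the transcription and analysis steps separately, call the modules directly. Replace `audio1.wav` with the name of one of your uploaded files: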

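```python
# The package lives under src/, so add it to PYTHONPATH for these shell commands

# Transcription only
!PYTHONPATH=src python -m cutout_pipeline.transcribe audio1.wav

# Analysis only (requires the transcription JSON produced by the step above)
!PYTHONPATH=src python -m cutout_pipeline.analyze audio1_output/audio1_english.json
```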
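To run both steps for every uploaded file, a sketch can build one command pair per file and run them with `subprocess`. It assumes the transcript lands at `<name>_output/<name>_english.json`, a pattern inferred from the `audio1` example above rather than confirmed by the repository; `build_commands` and `run_all` are hypothetical helpers.

```python
import os
import subprocess
import sys


def build_commands(audio_files):
    """Return (transcribe_cmd, analyze_cmd) pairs for each audio file."""
    pairs = []
    for path in audio_files:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Assumed output location, inferred from audio1.wav -> audio1_output/audio1_english.json
        transcript = os.path.join(f"{stem}_output", f"{stem}_english.json")
        pairs.append((
            [sys.executable, "-m", "cutout_pipeline.transcribe", path],
            [sys.executable, "-m", "cutout_pipeline.analyze", transcript],
        ))
    return pairs


def run_all(audio_files):
    """Transcribe then analyze each file, skipping analysis if transcription fails."""
    env = dict(os.environ)
    # Make the src/ layout importable in the child processes
    env["PYTHONPATH"] = os.pathsep.join(filter(None, ["src", env.get("PYTHONPATH", "")]))
    for transcribe_cmd, analyze_cmd in build_commands(audio_files):
        if subprocess.run(transcribe_cmd, env=env).returncode == 0:
            subprocess.run(analyze_cmd, env=env)
        else:
            print("Transcription failed:", transcribe_cmd[-1])


# run_all(["audio1.wav", "audio2.wav"])
```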

## Configuration

Speaker configuration and analysis parameters live in `src/cutout_pipeline/config.py`. To view the current values:

```python
# View current configuration
!cat src/cutout_pipeline/config.py
```
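To see the values Python actually ends up with, rather than reading the source, a sketch can load `config.py` by path with `importlib` and collect its uppercase names. This executes the file, and it assumes settings follow the uppercase-constant convention seen in `AGENT_SPEAKERS` and `Z_SCORE_SILENCE`; if `config.py` uses relative imports, loading it this way will fail. `load_settings` is hypothetical.

```python
import importlib.util


def load_settings(path="src/cutout_pipeline/config.py"):
    """Execute a config file in isolation and return its UPPERCASE names and values."""
    spec = importlib.util.spec_from_file_location("config_preview", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Keep only public, constant-style names
    return {k: v for k, v in vars(module).items() if k.isupper() and not k.startswith("_")}


# for name, value in sorted(load_settings().items()):
#     print(f"{name} = {value!r}")
```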

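Overrides made in Python apply only to code imported in this notebook's kernel. Shell commands such as `!python run_pipeline.py .` start a separate Python process that reads `config.py` from disk, so they will not see these changes; to affect those runs, edit `src/cutout_pipeline/config.py` itself.

To override settings temporarily for this session: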

```python
# Example: change agent speakers (affects only code imported in this kernel)
import sys
sys.path.insert(0, 'src')

from cutout_pipeline import config
config.AGENT_SPEAKERS = {0, 1}  # Change to your speaker IDs
config.Z_SCORE_SILENCE = 2.5    # Adjust threshold

print("✅ Configuration updated for this kernel session")
```
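Because `!python ...` commands read `config.py` from disk, changes meant for those runs have to be written to the file. A minimal sketch rewrites a top-level `NAME = value` assignment with a regular expression. It assumes each setting is a single-line, unannotated assignment at the start of a line, and it drops any inline comment on that line; `set_config_value` is hypothetical. Running `git checkout -- src/cutout_pipeline/config.py` restores the original file.

```python
import re


def set_config_value(path, name, value_source):
    """Replace the single-line top-level assignment 'name = ...' with 'name = value_source'."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    # ^ and $ match per line because of MULTILINE; '.*' stops at the newline
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", re.MULTILINE)
    # A function replacement avoids re treating backslashes in value_source as escapes
    new_text, count = pattern.subn(lambda _m: f"{name} = {value_source}", text, count=1)
    if count == 0:
        raise KeyError(f"{name} not found as a top-level assignment in {path}")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(new_text)


# set_config_value("src/cutout_pipeline/config.py", "AGENT_SPEAKERS", "{0, 1}")
# set_config_value("src/cutout_pipeline/config.py", "Z_SCORE_SILENCE", "2.5")
```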

---

## Need Help?

Check the README.md for more details:
```python
!cat README.md
```