##
Kaggle notebooks run on a Linux-based virtual machine in the cloud, so system-level dependencies are handled differently than on a local machine.
How to Use Audio Libraries in Kaggle
Kaggle environments come pre-configured with most standard data science libraries (like NumPy and SciPy). You only need to ensure the Python audio interface libraries are installed in the notebook session.
Here are the exact steps to get microphone access working in a Kaggle notebook:
1. Install Required Libraries via Notebook Command
Kaggle often has these pre-installed, but running the command guarantees they are available in your session:

# ⚠️ Important Constraints
The BART summarization model has these limits:

Minimum: ~30 words (won't work well below this)
Maximum: ~500 words per summary
Recommended range: 50-300 words
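
These limits can be applied to a requested length before calling the summarizer. A minimal sketch, assuming the word counts listed above are treated as hard bounds; `clamp_summary_words` is a hypothetical helper name and is not part of the notebook code:

```python
def clamp_summary_words(requested, minimum=30, maximum=500):
    """Clamp a requested summary length (in words) to the stated limits."""
    return max(minimum, min(requested, maximum))


print(clamp_summary_words(10))    # below the minimum
print(clamp_summary_words(250))   # inside the range
print(clamp_summary_words(1200))  # above the maximum
```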

In [None]:
!pip install -q openai-whisper
!pip install -q transformers accelerate

In [None]:
"""
Multilingual Audio Transcription & Summarization for KAGGLE NOTEBOOK
Supports: Hindi, English, Marathi (and 90+ languages)
Optimized for Kaggle environment with GPU support
"""

# ============================================================================
# STEP 1: INSTALL REQUIRED PACKAGES (Run this cell first in Kaggle)
# ============================================================================

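# Already installed by the cell above. To reinstall, uncomment and run these in a
# Kaggle notebook cell. Use only ONE of the two Whisper lines (PyPI release or GitHub source):
# !pip install -q openai-whisper
# !pip install -q git+https://github.com/openai/whisper.git
# !pip install -q transformers
# !pip install -q accelerate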

# ============================================================================
# STEP 2: IMPORT LIBRARIES
# ============================================================================

import whisper
from transformers import pipeline
import torch
import os
import warnings
warnings.filterwarnings('ignore')

# ============================================================================
# STEP 3: KAGGLE-OPTIMIZED AUDIO SUMMARIZER CLASS
# ============================================================================

class KaggleAudioSummarizer:
    def __init__(self, whisper_model="medium"):
        """
        Initialize for Kaggle environment
        
        Args:
            whisper_model: 'tiny', 'base', 'small', 'medium', 'large'
                          For Kaggle: 'base' or 'medium' recommended
        """
        # Check GPU availability
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        #print(f"🔧 Device: {self.device}")
        
        #if self.device == "cuda":
            #print(f"🚀 GPU Detected: {torch.cuda.get_device_name(0)}")
            #print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
        
        # Load Whisper model
        print(f"\n📥 Loading Whisper '{whisper_model}' model...")
        self.whisper_model = whisper.load_model(whisper_model, device=self.device)
        print("✅ Whisper loaded!")
        
        # Load summarization model
        print("\n📥 Loading BART summarization model...")
        self.summarizer = pipeline(
            "summarization",
            model="facebook/bart-large-cnn",
            device=0 if self.device == "cuda" else -1,
            torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
            framework="pt",
            truncation=True
        )
        print("✅ BART loaded!")
        print("\n" + "="*60)
        print("✨ Ready to process audio files!")
        print("="*60 + "\n")
    
    def transcribe_audio(self, audio_path):
        """
        Transcribe audio and translate to English
        
        Args:
            audio_path: Path to audio file in Kaggle
                       e.g., '/kaggle/input/your-dataset/audio.mp3'
        """
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"❌ File not found: {audio_path}")
        
        file_size = os.path.getsize(audio_path) / (1024 * 1024)  # MB
        print(f"🎵 Audio file: {os.path.basename(audio_path)}")
        print(f"📦 Size: {file_size:.2f} MB")
        print("⏳ Transcribing... (this may take several minutes)\n")
        
        # Transcribe with auto-translate to English
        result = self.whisper_model.transcribe(
            audio_path,
            task='translate',  # Auto-translate to English
            language=None,     # Auto-detect language
            fp16=self.device == "cuda",  # Use FP16 on GPU
            verbose=False
        )
        
        detected_lang = result.get('language', 'unknown')
        lang_map = {
            'hi': 'Hindi (हिन्दी)',
            'en': 'English',
            'mr': 'Marathi (मराठी)',
            'unknown': 'Unknown'
        }
        
        print("✅ Transcription complete!")
        print(f"🌍 Detected: {lang_map.get(detected_lang, detected_lang)}")
        print(f"📝 Length: {len(result['text'])} characters\n")
        #print(f"📝 Details: {(result)} \n")
        
        return {
            'text': result['text'].strip(),
            'language': detected_lang,
            'language_name': lang_map.get(detected_lang, detected_lang)
        }
    
    def summarize_text(self, text, max_length=250, min_length=80):
        """
        Generate summary from transcribed text with smart length handling
        """
        print("📊 Generating summary...")

        words = text.split()
        word_count = len(words)

        # If text is too short, return as-is
        # if word_count < 50:
        #     print(f"⚠️ Text too short ({word_count} words). Returning original text.")
        #     return text

        # Note: the pipeline counts max_length and min_length in tokens;
        # word counts are used here as a rough proxy.
        # Cap the summary at 60% of the input length and relax the
        # minimum to at most 20% of it.
        adjusted_max_length = min(max_length, int(word_count * 0.6))
        adjusted_min_length = min(min_length, int(word_count * 0.2))

        # Ensure min < max
        if adjusted_min_length >= adjusted_max_length:
            adjusted_min_length = max(10, adjusted_max_length - 20)

        print(f"📏 Input: {word_count} words → Target summary: {adjusted_max_length} words")

        # Split into chunks (BART limit: ~1024 tokens ≈ 800 words)
        chunk_size = 800  # words
        chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

        print(f"📄 Processing {len(chunks)} chunk(s)...")
    
        summaries = []
        for idx, chunk in enumerate(chunks):
            chunk_word_count = len(chunk.split())
        
            if chunk_word_count < 50:
                summaries.append(chunk)  # Too short to summarize
                continue
        
            # Adjust length for this specific chunk
            chunk_max = min(adjusted_max_length, int(chunk_word_count * 0.6))
            chunk_min = min(adjusted_min_length, int(chunk_word_count * 0.3))

            # Ensure valid range
            if chunk_min >= chunk_max:
                chunk_min = max(10, chunk_max - 20)

            print(f"  ➤ Chunk {idx+1}/{len(chunks)} ({chunk_word_count} words → {chunk_max} words)...", end=" ")
            # NOTE: the print above differs from the structured version of this code; check it if changes are needed
            try:
                summary = self.summarizer(
                    chunk,
                    max_length=chunk_max,
                    min_length=chunk_min,
                    do_sample=False,
                    truncation=True
                )
                summaries.append(summary[0]['summary_text'])
                print("✓")
            except Exception as e:
                print(f"✗ (Error: {str(e)[:30]})")
                summaries.append(chunk[:500])  # Fallback: use first 500 chars
                continue  # NOTE: this fallback block also differs from the structured version
    
        if not summaries:
            return text
    
        combined = ' '.join(summaries)
        combined_word_count = len(combined.split())
    
        # Final summary if still too long
        if len(chunks) > 1 and combined_word_count > adjusted_max_length:
            print(f"  ➤ Final summary ({combined_word_count} words → {adjusted_max_length} words)...", end=" ")

            final_max = min(adjusted_max_length, int(combined_word_count * 0.7))
            final_min = min(adjusted_min_length, int(combined_word_count * 0.3))

            if final_min >= final_max:
                final_min = max(10, final_max - 20)

            try:
                final = self.summarizer(
                    combined,
                    max_length=final_max,
                    min_length=final_min,
                    do_sample=False,
                    truncation=True
                )
                combined = final[0]['summary_text']
                print("✓")
            except Exception:
                # Keep the joined chunk summaries if the final pass fails
                print("✗")

        print("✅ Summary ready!\n")
        return combined
    
    def process_audio_adaptive(self, audio_path, summary_ratio=0.5, save_output=True):
        """
        Process audio with adaptive summary length
    
        Args:
            audio_path: Path to audio file
            summary_ratio: Summary length as ratio of original (0.25 = 25%)
            save_output: Save results
        """
        print("\n" + "="*60)
        print("🎯 KAGGLE AUDIO SUMMARIZER (ADAPTIVE)")
        print("="*60 + "\n")

        # Transcribe
        transcription = self.transcribe_audio(audio_path)
        word_count = len(transcription['text'].split())

        # Calculate adaptive summary length
        summary_length = max(50, int(word_count * summary_ratio))  # Min 50 words
        summary_length = min(summary_length, 500)  # Max 500 words

        print(f"📊 Adaptive summary: {summary_length} words (from {word_count} words)\n")

        # Summarize
        summary = self.summarize_text(
            transcription['text'],
            max_length=summary_length,
            min_length=summary_length // 3
        )
    
        results = {
            'audio_file': os.path.basename(audio_path),
            'language': transcription['language_name'],
            'transcription': transcription['text'],
            'summary': summary,
            'transcription_word_count': word_count,
            'summary_word_count': len(summary.split())
        }
    
        self._display_results(results)
    
        if save_output:
            self._save_results(results)
    
        return results

    # ------------------------------------------------------------------
    # Alternative (commented out): a smart way to pick a summary length
    # that stays below the transcription word count.
    #
    # def smart_summary_length(word_count):
    #     """Calculate optimal summary length based on original"""
    #     if word_count < 100:
    #         return word_count  # Too short to summarize
    #     elif word_count < 500:
    #         return int(word_count * 0.5)  # 50%
    #     elif word_count < 2000:
    #         return int(word_count * 0.3)  # 30%
    #     elif word_count < 5000:
    #         return int(word_count * 0.2)  # 20%
    #     else:
    #         return min(int(word_count * 0.15), 500)  # 15%, max 500
    #
    # Use it:
    # transcription = summarizer.transcribe_audio(audio_file)
    # word_count = len(transcription['text'].split())
    # summary_length = smart_summary_length(word_count)
    #
    # summary = summarizer.summarize_text(
    #     transcription['text'],
    #     max_length=summary_length
    # )
    # ------------------------------------------------------------------
   
    
    def _display_results(self, results):
        """Display results in notebook"""
        print("="*60)
        print("📋 RESULTS")
        print("="*60)
        print(f"📁 File: {results['audio_file']}")
        print(f"🌍 Language: {results['language']}")
        print(f"📏 Original: {results['transcription_word_count']} words")
        print(f"📏 Summary: {results['summary_word_count']} words")
        print(f"📉 Compression: {100 * (1 - results['summary_word_count']/results['transcription_word_count']):.1f}%")
        
        print("\n" + "="*60)
        print("✨ SUMMARY (English):")
        print("="*60)
        print(results['summary'])
        
        print("\n" + "="*60)
        print("📄 FULL TRANSCRIPTION (First 1000 chars):")
        print("="*60)
        print(results['transcription'][:1000] + "...")
        print("="*60 + "\n")
    
    def _save_results(self, results):
        """Save to Kaggle working directory (/kaggle/working/)"""
        output_file = "/kaggle/working/audio_summary.txt"
        
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write("="*60 + "\n")
            f.write("KAGGLE AUDIO SUMMARY REPORT\n")
            f.write("="*60 + "\n\n")
            f.write(f"Audio File: {results['audio_file']}\n")
            f.write(f"Detected Language: {results['language']}\n")
            f.write(f"Original Length: {results['transcription_word_count']} words\n")
            f.write(f"Summary Length: {results['summary_word_count']} words\n")
            f.write("\n" + "="*60 + "\n")
            f.write("SUMMARY (English):\n")
            f.write("="*60 + "\n\n")
            f.write(results['summary'] + "\n\n")
            f.write("="*60 + "\n")
            f.write("FULL TRANSCRIPTION (English):\n")
            f.write("="*60 + "\n\n")
            f.write(results['transcription'] + "\n")
        
        print(f"💾 Saved to: {output_file}")
        print("📥 Download from Kaggle Output section\n")


# ============================================================================
# USAGE IN KAGGLE NOTEBOOK
# ============================================================================

# Initialize the summarizer (run once)
summarizer = KaggleAudioSummarizer(
    whisper_model="large"  # Options: 'tiny', 'base', 'small', 'medium', 'large'
)

# Process your audio file
# Replace with your actual path from Kaggle Input
audio_file = "/kaggle/input/eng-hinbi-marathi-mix-audio/Japanese parenting style.m4a"

# Run the summarization
results = summarizer.process_audio_adaptive(
    audio_path=audio_file,
    save_output=True,       # Save to /kaggle/working/
    # summary_ratio=0.5     # Optional: summary length as a fraction of the transcript
)

# Access individual results
print(f"\n🎯 Summary:\n{results['summary']}\n\n")


# ============================================================================
# EXAMPLE: PROCESS MULTIPLE FILES
# ============================================================================

def process_multiple_files(audio_folder):
    """Process all audio files in a Kaggle input folder"""
    
    import glob
    
    # Find all audio files
    audio_files = glob.glob(f"{audio_folder}/*.mp3") + \
                  glob.glob(f"{audio_folder}/*.wav") + \
                  glob.glob(f"{audio_folder}/*.m4a")
    
    print(f"Found {len(audio_files)} audio file(s)\n")
    
    all_results = []
    
    for audio in audio_files:
        print(f"\n{'='*60}")
        print(f"Processing: {os.path.basename(audio)}")
        print(f"{'='*60}\n")
        
        try:
            # The class defines process_audio_adaptive; there is no process_audio method
            results = summarizer.process_audio_adaptive(audio, save_output=False)
            all_results.append(results)
        except Exception as e:
            print(f"❌ Error: {e}\n")
            continue
    
    # Save combined results
    with open("/kaggle/working/all_summaries.txt", 'w', encoding='utf-8') as f:
        for r in all_results:
            f.write(f"\n{'='*60}\n")
            f.write(f"File: {r['audio_file']}\n")
            f.write(f"Language: {r['language']}\n")
            f.write(f"{'='*60}\n")
            f.write(f"SUMMARY:\n{r['summary']}\n\n\n")
    
    print(f"\n✅ Processed {len(all_results)} files")
    print("💾 Combined summaries saved to /kaggle/working/all_summaries.txt \n")
    
    return all_results

# Example usage:
# results = process_multiple_files("/kaggle/input/your-dataset-name")
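
The glob calls in `process_multiple_files` only match files directly inside `audio_folder`, and they are case-sensitive on Linux. A minimal sketch of a recursive, case-insensitive alternative for collecting the same three extensions; `list_audio_files` is a hypothetical helper and is not part of the class above:

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def list_audio_files(root):
    """Return sorted audio file paths found anywhere under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# On Kaggle, attached datasets are mounted under /kaggle/input
for path in list_audio_files("/kaggle/input"):
    print(path)
```

Each returned path could then be passed to `process_audio_adaptive` in a loop.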




2025-12-03 12:13:58.373261: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1764764038.394871     326 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1764764038.401302     326 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'



📥 Loading Whisper 'medium' model...
✅ Whisper loaded!

📥 Loading BART summarization model...

Device set to use cuda:0

✅ BART loaded!

✨ Ready to process audio files!

🎯 KAGGLE AUDIO SUMMARIZER (ADAPTIVE)

🎵 Audio file: Japanese parenting style.m4a
📦 Size: 1.92 MB
⏳ Transcribing... (this may take several minutes)

Detected language: English

100%|██████████| 23675/23675 [00:23<00:00, 1000.63frames/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

✅ Transcription complete!
🌍 Detected: English
📝 Length: 3199 characters

📊 Adaptive summary: 250 words (from 501 words)

📊 Generating summary...
📏 Input: 501 words → Target summary: 250 words
📄 Processing 1 chunk(s)...
  ➤ Chunk 1/1 (501 words → 250 words)... ✓
✅ Summary ready!

📋 RESULTS
📁 File: Japanese parenting style.m4a
🌍 Language: English
📏 Original: 501 words
📏 Summary: 74 words
📉 Compression: 85.2%

✨ SUMMARY (English):
Japanese obedience starts with emotional safety, not discipline. Before asking children to listen, the parents build an unshakable bond. Children who feel deeply secure do not fight for attention. They already have it. And children who feel safe listen faster. Japanese parents avoid power struggles altogether. They repeat expectations calmly, without changing the boundary. Natural consequences replace punishment. Children mirror the emotional climate they grew up in. The power of WE makes listening natural.

📄

# ============================================================================
# QUICK START GUIDE FOR KAGGLE
# ============================================================================

"""
🚀 QUICK START IN KAGGLE:

1. CREATE NEW NOTEBOOK
   - Go to kaggle.com
   - Create new notebook
   - Enable GPU (Settings → Accelerator → GPU T4 x2)

2. UPLOAD YOUR AUDIO
   - Add Dataset → Upload
   - Upload your audio file(s)
   - Note the path: /kaggle/input/your-dataset-name/

3. RUN INSTALLATION (First cell):
   !pip install -q openai-whisper transformers accelerate

4. COPY THIS CODE to next cell and run

5. UPDATE AUDIO PATH:
   audio_file = "/kaggle/input/your-dataset-name/your_audio.mp3"

6. RUN THE CODE!

7. DOWNLOAD RESULTS:
   - Output section → audio_summary.txt → Download

⚡ TIPS:
- Use GPU for 5-10x faster processing
- 'medium' model: best balance for Kaggle
- 'base' model: if you hit memory limits
- Files save to /kaggle/working/ automatically
"""

##
The ! prefix tells the notebook to run the command as a shell command within the cloud environment.
2. Install the System Dependency (PortAudio)
Even though Kaggle provides the Python packages, you still need the underlying system library (libportaudio) on the cloud VM. Kaggle environments support installing such Linux packages with apt-get.
Add this to a code cell and run it, then restart the kernel:
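
After the install and restart, whether the PortAudio shared library is visible to Python can be checked from a cell. A minimal sketch using the standard library's `ctypes.util.find_library`; the helper name `portaudio_available` is illustrative only:

```python
import ctypes.util


def portaudio_available():
    """Return True if a PortAudio shared library can be located."""
    return ctypes.util.find_library("portaudio") is not None


if portaudio_available():
    print("PortAudio found")
else:
    print("PortAudio not found; install the system package and restart the kernel")
```

Finding the library only confirms the dependency is present; it says nothing about whether a microphone exists on the machine.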

In [None]:
# Code after discussion with the team

# Cloud code and the local system microphone cannot work together, so work on a recorded audio MP3 file instead.

In [None]:
## Step 1: Get the audio file path
# Start this code on user input, i.e. when the record-audio icon is clicked in the frontend.
# Once it is clicked, access the system microphone and capture the audio.
# Then convert the audio (speech) to text.

The "PortAudioError: Error querying device -1" message confirms that the libraries are installed correctly within your Kaggle environment, but that no valid, accessible audio input device can be found.
This error is expected because the Kaggle notebook runs in a cloud data center virtual machine that does not have a physical microphone attached to it.
The error happens when the sounddevice library tries to find the default microphone on the server and fails, returning a non-existent device ID (-1).
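
A guard like the following avoids the error by checking for an input device before trying to record. It is a sketch that assumes the device list has the shape returned by `sounddevice.query_devices()`, a sequence of dicts with a `max_input_channels` key; `has_input_device` is a hypothetical helper:

```python
def has_input_device(devices):
    """Return True if any device reports at least one input channel."""
    return any(d.get("max_input_channels", 0) > 0 for d in devices)


# On a machine with sounddevice installed:
#   import sounddevice as sd
#   if not has_input_device(sd.query_devices()):
#       print("No microphone available; use a recorded audio file instead")

# Illustration with hand-written device entries (not real hardware):
cloud_vm = [{"name": "null output", "max_input_channels": 0}]
laptop = [{"name": "Built-in Microphone", "max_input_channels": 1}]
print(has_input_device(cloud_vm))
print(has_input_device(laptop))
```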

Summary of Your Situation
You successfully installed the necessary system libraries in Kaggle.
You successfully installed the Python libraries (sounddevice, etc.).
The limitation is physical hardware: You cannot access your MacBook's mic from the cloud server. 
What You Should Do Now
You must record your audio locally on your MacBook Air and upload the file to Kaggle to analyze it there.
Step 1: Record Locally
Use the local Python script we established earlier to record audio on your MacBook:
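
The earlier script is not included in these notes. As a stand-in, here is a minimal sketch of a local recorder, assuming `sounddevice` is installed on the MacBook and that a 16 kHz mono recording is acceptable. The names `save_wav` and `record_to_wav` and the output file name are illustrative choices, not the original script:

```python
import wave


def save_wav(path, pcm_bytes, sample_rate=16000, channels=1):
    """Write raw 16-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)


def record_to_wav(path, seconds=10, sample_rate=16000):
    """Record from the default microphone (run locally, not on Kaggle)."""
    import sounddevice as sd  # imported here so save_wav works without it

    frames = sd.rec(int(seconds * sample_rate), samplerate=sample_rate,
                    channels=1, dtype="int16")
    sd.wait()  # block until the recording finishes
    save_wav(path, frames.tobytes(), sample_rate)


# Local usage (needs a microphone):
#   record_to_wav("recording.wav", seconds=30)
```

The resulting file can then be uploaded to Kaggle as a dataset and passed to the summarizer by path.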

In [None]:
## Step 2: Speech to text: convert multilingual input speech to English text.


In [None]:
## Step 3: Display the extracted text in the "Your Notes" display box.
# Provide the path to that box and print the text in the Your Notes section.
# Also save this text as rough_text.txt on the system or in the project directory for reference.


In [None]:
## Step 4: When the "AI refined summary" button is clicked (again a user input), run this module:
# run the summarization code on the rough_text.txt file
# and save the result as AI-refined-summary.txt for reference.
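
A minimal sketch of this step, assuming the summarizer is passed in as a callable that maps text to a summary (for example `summarizer.summarize_text` from the class above). `refine_summary` is a hypothetical name, and the default file names follow the ones mentioned in this step:

```python
from pathlib import Path


def refine_summary(summarize, input_path="rough_text.txt",
                   output_path="AI-refined-summary.txt"):
    """Read the rough transcript, summarize it, and save the result."""
    text = Path(input_path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{input_path} is empty")
    summary = summarize(text)
    Path(output_path).write_text(summary + "\n", encoding="utf-8")
    return summary


# In the notebook, the callable could be the existing method:
#   refine_summary(summarizer.summarize_text)
```

Passing the summarizer in as an argument keeps the file handling testable without loading the BART model.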


In [None]:
## Step 5: On user input, fetch the code for BRD or PO respectively.
# Code for BRD: if the user asks for the BRD (Business Requirements Document) format of the summary,
# introduce this format to the model and ask it to place the summary data in the required fields.
# Similarly for the PO (Purchase Order) format.
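
One way to sketch this step is a prompt builder that selects a template by document type. The field lists below are placeholder assumptions for illustration, not an agreed BRD or PO format, and `build_format_prompt` is a hypothetical name:

```python
# Placeholder field lists (assumptions); replace with the team's actual templates.
TEMPLATES = {
    "BRD": ["Project Overview", "Business Objectives", "Scope",
            "Requirements", "Stakeholders"],
    "PO": ["Vendor", "Items", "Quantities", "Prices", "Delivery Date"],
}


def build_format_prompt(summary, doc_type):
    """Build a prompt asking the model to fill the chosen template."""
    key = doc_type.strip().upper()
    if key not in TEMPLATES:
        raise ValueError(f"Unsupported document type: {doc_type!r}")
    fields = "\n".join(f"- {name}:" for name in TEMPLATES[key])
    return (
        f"Fill in the following {key} fields using only the summary below. "
        f"Write 'Not mentioned' for any field the summary does not cover.\n\n"
        f"{fields}\n\nSummary:\n{summary}"
    )


print(build_format_prompt("Team agreed to order 20 laptops.", "po"))
```

The returned string would be sent to the model as the user message.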

Key:
YOUR_ANTHROPIC_API_KEY_HERE


curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: YOUR_ANTHROPIC_API_KEY_HERE" \
    --header "anthropic-version: 2023-06-01" \
    --header "content-type: application/json" \
    --data \
    '{
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Hello, world"}
        ]
    }'
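
The same request can be prepared from Python. A minimal sketch using only the standard library, mirroring the headers and body of the curl command above; `build_messages_request` is a hypothetical helper, and the request is constructed but not sent:

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(api_key, prompt,
                           model="claude-sonnet-4-20250514", max_tokens=1024):
    """Build (but do not send) a POST request for the Messages API."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_messages_request("YOUR_ANTHROPIC_API_KEY_HERE", "Hello, world")
print(req.get_method(), req.full_url)

# To send it (requires a valid key and network access):
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
#       print(reply["content"][0]["text"])
```

In a Kaggle notebook the key is better read from a secret or environment variable than pasted into a cell.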