<a href="https://www.kaggle.com/code/galenchen/audio-recording-to-markdown?scriptVersionId=167019294" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Python Script for Automated Audio Transcription and Text Organization

This Python script automates a four-step workflow: it downloads audio files, transcribes them, organizes the transcriptions into Markdown, and emails the resulting files. This guide explains the script's requirements, how to run it, and how to customize it for your needs.

## Prerequisites

- Python 3 environment (The code has been tested in a Kaggle Python environment)
- Installed libraries: `openai-whisper`, `google-generativeai`, `gdown`, `gspread`, `oauth2client`, and `gspread_dataframe`, plus the third-party packages `numpy` and `pandas` and standard-library modules such as `os`, `json`, and `smtplib`.
- A Kaggle account (for Kaggle Secrets) or access to API keys and service accounts for Google Sheets, Google Drive, SMTP server for email, and Google's Generative AI service.
- Google Sheets for storing links to audio files and configuration parameters.
  - For example, [this Google Sheet](https://docs.google.com/spreadsheets/d/1SGzPdPSJdxxKHX2cx2bMjnKeveaOXtwT0q1w1x5YO80/edit?usp=sharing) controls this Kaggle notebook, which executes every day. If the code finds new files, it converts the audio to Markdown notes.

## Installation

1. Ensure that you have Python 3.x installed on your machine.
2. Install the required Python packages by running the following commands in your terminal:


```bash
pip install openai-whisper google-generativeai gdown gspread oauth2client gspread_dataframe
```

3. Download the script from wherever you found this README, or copy the code cells below into your own environment.
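
A quick way to confirm the installation is to check that each package can be located before running the notebook. The following is a minimal sketch, not part of the original script; the helper name `missing_modules` is hypothetical, and the list holds the import names (which differ from the pip names for `openai-whisper` and `google-generativeai`).

```python
import importlib.util

# Import names (not pip names) of the third-party packages the notebook imports.
REQUIRED_MODULES = [
    "whisper",
    "google.generativeai",
    "gdown",
    "gspread",
    "oauth2client",
    "gspread_dataframe",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example `google`) is absent or malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
```

Any name the function reports can then be installed with the `pip` command from step 2.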

## Setting Up Your Environment

1. **Google Sheets Authentication**: You need to create a service account on Google Cloud, download the JSON key, and place it securely in your project. This key will be used to access Google Sheets where your audio file links and parameters are stored.

2. **API Keys**: Obtain an API key for Google's Generative AI service (used to organize the text) and credentials for any other service you use, such as your SMTP server. Whisper runs locally through the `openai-whisper` package, so transcription itself needs no API key.

3. **Secrets**: If you're running this script in an environment that supports secrets (like Kaggle), add your API keys and service account information as secrets. Otherwise, ensure they are securely stored and accessed within your script.
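
Outside Kaggle, one option is to keep the same secrets in environment variables. The sketch below assumes that setup: the secret names (`gcp_gsheets`, `gemini_api`) match the ones the notebook reads from Kaggle Secrets, while the helpers `load_secret` and `load_service_account_info` and the use of environment variables are assumptions made for illustration.

```python
import json
import os


def load_secret(name, env=os.environ):
    """Return the secret stored under `name`, or raise a clear error if it is missing."""
    value = env.get(name)
    if not value:
        raise KeyError(f"Secret '{name}' is not set")
    return value


def load_service_account_info(env=os.environ):
    # The service account key is stored as a JSON string, as in the notebook,
    # so it is parsed back into a dictionary before use.
    return json.loads(load_secret("gcp_gsheets", env))


# Demonstration with a fake environment; real keys would come from os.environ.
demo_env = {"gcp_gsheets": '{"type": "service_account", "project_id": "demo"}'}
info = load_service_account_info(demo_env)
print(info["project_id"])
```

The resulting dictionary can be passed to `ServiceAccountCredentials.from_json_keyfile_dict`, just as the notebook does with the Kaggle secret.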

## Configuration

Before running the script, perform the following configurations:
1. **Google Sheets and Drive Authentication**:
   - Create a Google Cloud service account.
   - Download the JSON key for the service account.
   - Load the JSON key into the script to authenticate with Google Sheets and Drive.


2. **API Keys and Secrets**:
   - Use Kaggle Secrets or a secure method to store and access your API keys for Google's Generative AI and SMTP credentials for sending emails.


3. **Google Sheets Setup**:
   - Prepare two Google Sheets:
     1. The first sheet stores the links to the audio files to be downloaded and transcribed.
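     2. The second sheet contains configuration parameters for transcription and text organization, such as model names and language settings, along with the names of the secrets that hold the API key and SMTP credentials. The keys themselves stay out of the sheet.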
   - You must configure these sheets according to your needs. Refer to [this Google sheet](https://docs.google.com/spreadsheets/d/1SGzPdPSJdxxKHX2cx2bMjnKeveaOXtwT0q1w1x5YO80/edit?usp=sharing) for details on how to structure your sheets.


4. **Email SMTP Server**:
   - Configure your SMTP server details for sending emails with the organized markdown files as attachments.
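
To illustrate how the parameter sheet is interpreted, here is a minimal sketch that converts `(parameter, value)` rows into typed settings. It mirrors the integer and float parameter names used by `assign_values_from_df` in the notebook, but it works on plain tuples instead of a DataFrame; the function name `parse_parameters` and the sample rows are hypothetical. Unlike the notebook, it converts integers through `float` first, so a cell read back as `"465.0"` is still accepted.

```python
# Parameter names that the notebook converts to int and float respectively.
INT_PARAMS = {"without_timestamp", "top_k", "port", "max_tokens"}
FLOAT_PARAMS = {"temp", "top_p"}


def parse_parameters(rows):
    """Convert (name, raw_value) pairs into a dictionary of typed settings."""
    values = {}
    for name, raw in rows:
        if name is None or name == "":
            continue  # skip rows without a parameter name
        if name in INT_PARAMS:
            values[name] = int(float(raw))
        elif name in FLOAT_PARAMS:
            values[name] = float(raw)
        else:
            values[name] = raw
    return values


# Illustrative rows only; a real sheet supplies its own values.
sample_rows = [
    ("model_name", "base"),
    ("language", "en"),
    ("without_timestamp", "1"),
    ("temp", "0.5"),
    ("port", 465.0),
    ("", "ignored"),
]
print(parse_parameters(sample_rows))
```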


## Running the Script
To execute the script, simply run it in your Python environment. The script will automatically:

1. Download audio files from the links provided in the Google Sheet.
2. Transcribe audio files using OpenAI's Whisper model.
3. Organize the transcribed text into markdown format using Google's Generative AI.
4. Email the organized markdown files to the specified recipient.

Make sure your environment variables, API keys, and Google Sheets are correctly set up and accessible by the script.
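
The download step depends on turning a Google Drive share link into a direct-download URL. The sketch below isolates that logic using the same `/d/` split and `uc?id=` URL form as `gdown_download` in the notebook; the function name `drive_download_url` and the placeholder link are hypothetical.

```python
def drive_download_url(share_link):
    """Return a direct-download URL for a Drive share link, or None if no file ID is found."""
    if not isinstance(share_link, str) or "/d/" not in share_link:
        return None  # empty cell or a link without a '/d/<file id>' segment
    file_id = share_link.split("/d/")[1].split("/")[0]
    if not file_id:
        return None
    return f"https://drive.google.com/uc?id={file_id}"


# FILE_ID is a placeholder, not a real file.
print(drive_download_url("https://drive.google.com/file/d/FILE_ID/view?usp=sharing"))
# prints https://drive.google.com/uc?id=FILE_ID
```

Returning `None` for unusable links lets the caller record an error and move on to the next row, which is how the notebook treats links it cannot parse.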

## Customization
The script functions are modular, allowing you to modify specific processes, such as how files are downloaded, the transcription process, or the text organization algorithm. You can adjust parameters within the Google Sheet or modify the script directly to meet your specific requirements.
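
As one example of a small customization, the file filter and the output naming could be pulled into helpers of their own. This is a sketch under stated assumptions: the extension set and the `<name>_<model>_<language>.txt` pattern come from `transcribe_audio_files`, while the helper names and the case-insensitive matching are additions that the notebook does not have.

```python
from pathlib import Path

# Extensions the notebook's transcription step looks for.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}


def is_audio_file(filename, extensions=AUDIO_EXTENSIONS):
    """Return True if the file has a known audio extension, ignoring case."""
    return Path(filename).suffix.lower() in extensions


def transcript_name(filename, model_name, language):
    """Build the transcription file name from the audio file's base name."""
    return f"{Path(filename).stem}_{model_name}_{language}.txt"


print(is_audio_file("Interview.M4A"))
print(transcript_name("Test.m4a", "large-v3", "zh"))
```

Adding another extension to `AUDIO_EXTENSIONS` would then be a one-line change.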

## Troubleshooting
If you encounter any issues, check the following:
- Ensure all dependencies are installed.
- Verify your API keys and service accounts are correctly set up and have the necessary permissions.
- Check your Google Sheets for correct formatting and data.
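
A missing row in the parameter sheet otherwise surfaces only as a `KeyError` partway through the run. The following sketch checks the parsed parameters up front; the key list matches the parameters that `audio_to_md` reads, and the helper name `missing_parameters` is hypothetical.

```python
# Parameters that audio_to_md reads from the second sheet.
REQUIRED_PARAMETERS = [
    "dataset_directory", "model_name", "language", "without_timestamp",
    "api_key", "temp", "top_p", "top_k", "max_tokens", "prompt",
    "send_from", "send_to", "subject", "message", "server", "port", "password",
]


def missing_parameters(values_dict, required=REQUIRED_PARAMETERS):
    """Return the required parameter names that are absent or empty."""
    return [
        name for name in required
        if name not in values_dict or values_dict[name] in (None, "")
    ]


# Illustrative, incomplete configuration.
partial = {"model_name": "base", "language": "en", "port": 465, "prompt": ""}
print(missing_parameters(partial))
```

Running this on the dictionary returned by `assign_values_from_df` before starting the download would report every missing parameter at once.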

## Conclusion
This script automates a comprehensive workflow for processing audio files, making it a valuable tool for anyone needing to transcribe and organize audio content efficiently. Customize it to fit your project, and enjoy streamlined audio processing!

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

!pip install openai-whisper
!pip install google-generativeai
!pip install gdown
!pip install gspread oauth2client
!pip install gspread_dataframe

import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import shutil
import whisper
import google.generativeai as genai
from pathlib import Path
import gdown
import requests
from datetime import datetime

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders

from kaggle_secrets import UserSecretsClient # These values should be kept secret.
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("gcp_gsheets")
secret_value_1 = user_secrets.get_secret("gemini_api")
secret_value_2 = user_secrets.get_secret("gmail_smtp")
secret_value_3 = user_secrets.get_secret("gmail_username")

import gspread
from oauth2client.service_account import ServiceAccountCredentials
from gspread_dataframe import get_as_dataframe

import json

# Since the secret is in string format, convert it back to a dictionary
service_account_info = json.loads(secret_value_0)

# Use this dictionary to authenticate
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_dict(service_account_info, scope)
client = gspread.authorize(creds)

print()
print("Installation and import completed.")

Collecting openai-whisper
  Downloading openai-whisper-20231117.tar.gz (798 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting triton<3,>=2.0.0 (from openai-whisper)
  Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting tiktoken (from openai-whisper)
  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)
Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.ma

In [2]:
def gdown_download(dataset_directory, df):
    errors = []  # List to store error messages
    for index, row in df.iterrows():
        google_drive_link = row['Google Drive Link of Files']
        file_name_provided = row.get('File name')  # Get the file name if provided
        
        if pd.isna(file_name_provided):
            # Skip this file and record an error if file name is not provided
            errors.append(f"File name not provided for link at row {index+1}. Skipping download.")
            continue
        
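        try:
            # Try to extract the file ID from the link
            file_id = google_drive_link.split('/d/')[1].split('/')[0]
        except (IndexError, AttributeError):
            # IndexError: the link has no '/d/' segment.
            # AttributeError: the cell is empty (NaN is a float, not a str).
            errors.append(f"Could not extract file ID from link at row {index+1}. Skipping download.")
            continue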
        
        # Construct the Google Drive download URL
        download_url = f"https://drive.google.com/uc?id={file_id}"
        
        filename = file_name_provided  # Use the provided file name
        
        # Define the output path
        output_path = f"{dataset_directory}/{filename}"
        
        # Download the file using gdown
        try:
            gdown.download(download_url, output_path, quiet=False)
            print(f"File {index+1} has been downloaded and saved to: {output_path}")
        except Exception as e:
            errors.append(f"Error downloading file at row {index+1}: {str(e)}")

    # Check for errors and print them
    if errors:
        for error in errors:
            print(error)
    else:
        print("All files with provided names have been successfully downloaded.")
    

def transcribe_audio_files(dataset_directory, model_name="base", language="en", without_timestamps=True):
    base_output_directory = Path(f"{dataset_directory}/transcribed_files")
    os.makedirs(base_output_directory, exist_ok=True)

    print(f"Loading model '{model_name}'...")
    whisper_model = whisper.load_model(model_name)

    for root, dirs, files in os.walk(dataset_directory):
        for file in files:
            if file.endswith((".mp3", ".wav", ".flac", ".m4a")):
                audio_path = os.path.join(root, file)
                print(f"Transcribing {audio_path} with model '{model_name}' in {language}...")
                
                # Perform transcription without timestamps
                print("Transcribing...")
                result = whisper_model.transcribe(audio_path, language=language, without_timestamps=without_timestamps)
                transcription = result["text"]

                # Construct output path
                output_directory = base_output_directory / Path(root).relative_to(dataset_directory)
                os.makedirs(output_directory, exist_ok=True)

                # Use the original file's base name for the transcription file
                original_base_name = Path(file).stem
                t_filename = f"{original_base_name}_{model_name}_{language}.txt"
                t_path = output_directory / t_filename

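                # Write as UTF-8 so non-English transcriptions are saved correctly on any platform
                with open(t_path, 'w', encoding='utf-8') as text_file: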
                    text_file.write(transcription)
                
                print(f"Transcription completed and saved to {t_path}\n")
                
                
def texts_to_organized_mds(api_key, directory_path, temp=0.5, top_p=0.5, top_k=2, max_tokens=4096, prompt=""):
    try:
        # Configure API with the provided key
        genai.configure(api_key=api_key)

        # Iterate over all .txt files in the specified directory
        for filename in os.listdir(directory_path):
            if filename.endswith('.txt'):
                s_path = os.path.join(directory_path, filename)
                base_name = os.path.splitext(filename)[0]
                organized_output = os.path.join(directory_path, f"{base_name}_organized.md")

                # Read from the text file
                print(f"Reading from text file {s_path}...")
                with open(s_path, 'r', encoding='utf-8') as file:
                    transcription = file.read()

                # Organize text using the provided API
                full_prompt = prompt + "\n\n" + transcription
                model = genai.GenerativeModel(model_name="gemini-pro", 
                                              generation_config={"temperature": temp, "top_p": top_p, "top_k": top_k, "max_output_tokens": max_tokens}, 
                                              safety_settings=[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"}, 
                                                               {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"}, 
                                                               {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"}, 
                                                               {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"}])
                print("Organizing text...")
                response = model.generate_content([full_prompt])

                # Save the organized text in markdown format
                with open(organized_output, 'w', encoding='utf-8') as file:
                    file.write(response.text)
                print(f"Organized text saved to {organized_output}")

    except Exception as e:
        print(f"An error occurred: {e}")


def send_files_via_email(send_from, send_to, subject, message, files, server, port, username, password):
    msg = MIMEMultipart()
    msg['From'] = send_from
    msg['To'] = send_to
    msg['Subject'] = subject

    msg.attach(MIMEText(message))

    for file in files:
        part = MIMEBase('application', "octet-stream")
        with open(file, 'rb') as file_attachment:
            part.set_payload(file_attachment.read())
        encoders.encode_base64(part)
        part.add_header('Content-Disposition', 'attachment; filename="{}"'.format(os.path.basename(file)))
        msg.attach(part)

    try:
        with smtplib.SMTP_SSL(server, port) as smtp:
            smtp.login(username, password)
            smtp.sendmail(send_from, send_to, msg.as_string())
            # Leaving the `with` block sends QUIT and closes the connection.

        print("Email sent successfully!")
        
    except Exception as e:
        print(f"Failed to send email: {e}")
        

def fetch_sheet_data_as_dataframe(spreadsheet_title, sheet_number):
    # Open the spreadsheet by its title
    spreadsheet = client.open(spreadsheet_title)
    
    # Select the sheet by number (1-indexed to match user expectation, so subtract 1 for 0-indexed)
    sheet = spreadsheet.get_worksheet(sheet_number - 1)
    
    # Convert the entire sheet to a DataFrame; the first row supplies the column names
    df = get_as_dataframe(sheet, evaluate_formulas=True, skiprows=0)
    
    # Drop rows where all elements are nan
    df.dropna(how='all', inplace=True)
    
    # Drop columns where all elements are nan
    df.dropna(axis=1, how='all', inplace=True)
    
    # Reset the index after dropping nan rows
    df.reset_index(drop=True, inplace=True)
    
    return df

def assign_values_from_df(df):
    values_dict = {}
    for _, row in df.iterrows():
        if pd.notna(row['Parameters']):
            if row['Parameters'] in ['without_timestamp', 'top_k', 'port', 'max_tokens']:
                value = int(row['Values'])
            elif row['Parameters'] in ['temp', 'top_p']:
                value = float(row['Values'])
            else:
                value = row['Values']
            values_dict[row['Parameters']] = value
    return values_dict

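def create_variable():
    # Demonstration helper; nothing else in this notebook calls it.
    # String you want to turn into a variable name
    string_name = "variable"
    
    # Value you want to assign to the variable
    value = "This is the value of the variable."
    
    # Note: writing to locals() inside a function does not reliably create a local variable.
    # For name-based lookups, use a dictionary such as variable_mapping below.
    locals()[string_name] = value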
    
    
# Map variable names to their values
variable_mapping = {
    "secret_value_0": secret_value_0,
    "secret_value_1": secret_value_1,
    "secret_value_2": secret_value_2,
    "secret_value_3": secret_value_3,
}

def get_secret_value(variable_name):
    """
    Returns the value of a secret variable based on its name.

    Args:
    - variable_name: The name of the variable as a string.

    Returns:
    - The value of the variable if it exists, otherwise None.
    """
    return variable_mapping.get(variable_name)


def check_new_files(df):
    # Check if the DataFrame is empty by verifying if it has any rows beyond the title row.
    new_files = not df.empty
    return new_files


def audio_to_md(new_files):
    if not new_files:
        print("There are no new files for transcribing, therefore code terminates.")
        return

    # Assuming the existence of all functions called below.
    print("New files found for transcribing.")
    df2 = fetch_sheet_data_as_dataframe("Audio recording to Markdown", 2)

    values_dict = assign_values_from_df(df2)

    dataset_directory = values_dict['dataset_directory']

    model_name = values_dict['model_name']
    language = values_dict['language']
    without_timestamp = values_dict['without_timestamp']

    api_key = get_secret_value(values_dict['api_key'])
    temp = values_dict['temp']
    top_p = values_dict['top_p']
    top_k = values_dict['top_k']
    max_tokens = values_dict['max_tokens']
    prompt = values_dict['prompt']

    send_from = get_secret_value(values_dict['send_from'])
    send_to = values_dict['send_to']
    subject = values_dict['subject']
    message = values_dict['message']
    server = values_dict['server']
    port = values_dict['port']
    username = send_from  # Since the smtp service is hosted by the same user account.
    password = get_secret_value(values_dict['password'])

    print("\nParameters fetched.")

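    # df1 is the module-level DataFrame of audio links, loaded in the next cell before this function is called.
    gdown_download(dataset_directory, df1)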
    
    transcribe_audio_files(dataset_directory, model_name, language, without_timestamp)
    
    texts_to_organized_mds(api_key=api_key, directory_path=f"{dataset_directory}/transcribed_files",
                           temp=temp, top_p=top_p, top_k=top_k, max_tokens=max_tokens, prompt=prompt)
    
    directory_path = f"{dataset_directory}/transcribed_files"
    files = [os.path.join(directory_path, f) for f in os.listdir(directory_path) if os.path.isfile(os.path.join(directory_path, f))]
    
    send_files_via_email(send_from, send_to, subject, message, files, server, port, username, password)

    

print()
print("All functions loaded.")


All functions loaded.


In [3]:
df1 = fetch_sheet_data_as_dataframe("Audio recording to Markdown", 1)

new_files = check_new_files(df1)

audio_to_md(new_files)

New files found for transcribing.

Parameters fetched.


Downloading...
From: https://drive.google.com/uc?id=1k2WDfxZZOl4cLF5ACdyMsCT2uRMEnRmC
To: /kaggle/working/Test.m4a
100%|██████████| 37.4k/37.4k [00:00<00:00, 37.5MB/s]


File 1 has been downloaded and saved to: /kaggle/working/Test.m4a
All files with provided names have been successfully downloaded.
Loading model 'large-v3'...


100%|█████████████████████████████████████| 2.88G/2.88G [00:44<00:00, 68.7MiB/s]


Transcribing /kaggle/working/Test.m4a with model 'large-v3' in zh...
Transcribing...
Transcription completed and saved to /kaggle/working/transcribed_files/Test_large-v3_zh.txt

Reading from text file /kaggle/working/transcribed_files/Test_large-v3_zh.txt...
Organizing text...
Organized text saved to /kaggle/working/transcribed_files/Test_large-v3_zh_organized.md
Email sent successfully!
