# OrthodoxAI - Speech to Text

> Konstantinos Mpouros <br>
> GitHub: https://github.com/konstantinosmpouros?tab=repositories<br>
> Year: 2025

## About the Project

**OrthodoxAI Speech to Text** transcribes Orthodox sermons and speeches from MP3 audio files into text. Transcription is performed with the **ElevenLabs** speech-to-text API, with OpenAI's `whisper-1` model available as an alternative backend. Each transcription is saved as a JSON file in the knowledge base, preserving Orthodox teachings for research, study, and digital archiving.


## Libraries

In [1]:
# Data handling and manipulation
import pandas as pd
from utils import (
    search_for_audio_files,
    extract_theme,
    get_audio_file,
    sum_audio_duration
)

# Speech to Text API
from elevenlabs import ElevenLabs
from openai import OpenAI

# Transcription and logging
from tqdm import tqdm
import json
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

# Load the API keys
import os
from dotenv import load_dotenv, find_dotenv
dotenv_path = find_dotenv()
_ = load_dotenv(dotenv_path)

## Data Extraction

### Athanasios Mitilinaios

In [2]:
data_path = "../src/multi_agent/rag/data/Omilies/speeches_athanasios_mitilinaios"
knowledge_base_path = "../src/multi_agent/rag/knowledge_base/Omilies/speeches_athanasios_mitilinaios"

In [3]:
athanasios_mitilinaios = search_for_audio_files(data_path)
athanasios_mitilinaios["theme"] = athanasios_mitilinaios["file_name"].apply(extract_theme)
athanasios_mitilinaios.sample(10)

index,file_name,file_path,theme
3025,3019_08-05-97_ΔΙΑΚΑΙΝΗΣΙΜΟΣ_ΕΒΔΟΜΑΣ_π_ΑΘ_ΜΥΤΙΛ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΔΙΑΚΑΙΝΗΣΙΜΟΣ_ΕΒΔΟΜΑΣ
1400,1394_31-10-83_ΚΑΤΗΧΗΣΕΙΣ_ΑΓΙΟΥ_ΚΥΡΙΛΛΟΥ_π_ΑΘ_Μ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΑΤΗΧΗΣΕΙΣ_ΑΓΙΟΥ_ΚΥΡΙΛΛΟΥ
4117,4111_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ
1781,1775_29-03-81_ΚΥΡΙΑΚΗ_Γ_ΝΗΣΤΕΙΩΝ_π_ΑΘ_ΜΥΤΙΛΗΝΑ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΥΡΙΑΚΗ_Γ_ΝΗΣΤΕΙΩΝ
4067,4061_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ
1423,1417_15-10-84_ΚΑΤΗΧΗΣΕΙΣ_ΑΓΙΟΥ_ΚΥΡΙΛΛΟΥ_π_ΑΘ_Μ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΑΤΗΧΗΣΕΙΣ_ΑΓΙΟΥ_ΚΥΡΙΛΛΟΥ
1000,0994_22-03-92_ΠΡΑΞΕΙΣ_ΤΩΝ_ΑΠΟΣΤΟΛΩΝ_π_ΑΘ_ΜΥΤΙΛ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΠΡΑΞΕΙΣ_ΤΩΝ_ΑΠΟΣΤΟΛΩΝ
1932,1926_09-09-84_ΚΥΡΙΑΚΗ_ΠΡΟ_ΤΗΣ_ΥΨΩΣΕΩΣ_π_ΑΘ_ΜΥΤ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΥΡΙΑΚΗ_ΠΡΟ_ΤΗΣ_ΥΨΩΣΕΩΣ
1310,1304_14-01-74_ΠΕΡΙ_ΑΓΑΠΗΣ_Γ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3,../src/multi_agent/rag/data/Omilies/speeches_a...,ΠΕΡΙ_ΑΓΑΠΗΣ_Γ
2007,2001_25-05-86_ΚΥΡΙΑΚΗ_ΤΟΥ_ΠΑΡΑΛΥΤΟΥ_π_ΑΘ_ΜΥΤΙΛ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΥΡΙΑΚΗ_ΤΟΥ_ΠΑΡΑΛΥΤΟΥ
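
The `extract_theme` helper is imported from `utils` and is not shown in this notebook. Judging only from the sample rows above, it appears to drop the leading index, an optional `dd-mm-yy` date, and the trailing speaker suffix. The following is a minimal sketch under that assumption; the regular expression and the name `extract_theme_sketch` are hypothetical and are not the project's actual code.

```python
import re
from typing import Optional

# Hypothetical pattern inferred from the sample file names above:
#   <index>_[<dd-mm-yy>_]<THEME>_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3
_THEME_RE = re.compile(
    r"^\d+_"                    # leading numeric index
    r"(?:\d{2}-\d{2}-\d{2}_)?"  # optional recording date
    r"(?P<theme>.+?)"           # theme, matched non-greedily
    r"_π_ΑΘ_"                   # start of the speaker suffix
)

def extract_theme_sketch(file_name: str) -> Optional[str]:
    """Return the theme part of a file name, or None if the name does not match."""
    match = _THEME_RE.match(file_name)
    return match.group("theme") if match else None

print(extract_theme_sketch("1304_14-01-74_ΠΕΡΙ_ΑΓΑΠΗΣ_Γ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3"))
# ΠΕΡΙ_ΑΓΑΠΗΣ_Γ
```

Returning `None` for names that do not match keeps unexpected files visible in the `theme` column instead of silently mislabeling them.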


In [4]:
athanasios_mitilinaios['theme'].nunique()

278

In [5]:
theme_counts = athanasios_mitilinaios["theme"].value_counts()
frequent_themes = theme_counts[theme_counts > 10]
pd.DataFrame(frequent_themes)

theme,count
ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ,1017
ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ,321
ΣΟΦΙΑ_ΣΕΙΡΑΧ,296
ΠΡΑΞΕΙΣ_ΤΩΝ_ΑΠΟΣΤΟΛΩΝ,263
ΚΑΤΗΧΗΣΕΙΣ_ΑΓΙΟΥ_ΚΥΡΙΛΛΟΥ,211
ΕΙΣ_ΤΗΝ_ΥΠΕΡΑΓΙΑΝ_ΘΕΟΤΟΚΟΝ,110
ΙΕΡΑ_ΑΠΟΚΑΛΥΨΙΣ,103
ΕΙΣ_ΠΡΟΣΚΥΝΗΤΑΣ,102
ΠΡΟΦΗΤΗΣ_ΗΣΑΙΑΣ,92
ΣΥΓΧΡΟΝΑ_ΚΑΥΤΑ_ΘΕΜΑΤΑ,70


In [6]:
example = athanasios_mitilinaios.sample(5)
example

index,file_name,file_path,theme
1897,1891_20-11-83_ΚΥΡΙΑΚΗ_Θ_ΛΟΥΚΑ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΥΡΙΑΚΗ_Θ_ΛΟΥΚΑ
2045,2039_07-06-87_ΚΥΡΙΑΚΗ_ΑΓΙΑ_ΠΕΝΤΗΚΟΣΤΗ_-_ΕΙΣ_ΤΟ...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΚΥΡΙΑΚΗ_ΑΓΙΑ_ΠΕΝΤΗΚΟΣΤΗ_-_ΕΙΣ_ΤΟΝ_ΕΣΠΕΡΙΝΟΝ
4018,4012_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ
3695,3689_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ
3725,3719_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_...,../src/multi_agent/rag/data/Omilies/speeches_a...,ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ


In [7]:
example['file_path']

1897    ../src/multi_agent/rag/data/Omilies/speeches_a...
2045    ../src/multi_agent/rag/data/Omilies/speeches_a...
4018    ../src/multi_agent/rag/data/Omilies/speeches_a...
3695    ../src/multi_agent/rag/data/Omilies/speeches_a...
3725    ../src/multi_agent/rag/data/Omilies/speeches_a...
Name: file_path, dtype: object

In [8]:
print(type(example['file_path'].values))
print(example['file_path'].values[0])

<class 'numpy.ndarray'>
../src/multi_agent/rag/data/Omilies/speeches_athanasios_mitilinaios/1891_20-11-83_ΚΥΡΙΑΚΗ_Θ_ΛΟΥΚΑ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3


In [9]:
audio_file = get_audio_file(example['file_path'].values[0])
audio_file

In [10]:
duration_seconds = sum_audio_duration(data_path)
print(f"Total duration: {duration_seconds:.2f} seconds")
print(f"Total duration: {duration_seconds / 60:.2f} minutes")

Total duration: 9114800.59 seconds
Total duration: 151913.34 minutes
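
For a sense of scale, the same total can be expressed in hours and days. The snippet below is a small arithmetic sketch that reuses the printed total as a literal instead of calling `sum_audio_duration` again.

```python
# Total reported by sum_audio_duration above, copied here as a literal.
duration_seconds = 9114800.59

hours = duration_seconds / 3600   # 60 * 60 seconds per hour
days = duration_seconds / 86400   # 24 * 3600 seconds per day

print(f"Total duration: {hours:.2f} hours")  # Total duration: 2531.89 hours
print(f"Total duration: {days:.2f} days")    # Total duration: 105.50 days
```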


## Speech to Text

In [11]:
def transcription(file_name, file_path, transcribe_func, knowledge_base_path):
    json_file_name = os.path.splitext(file_name)[0] + ".json"
    json_file_path = os.path.join(knowledge_base_path, json_file_name)

    # Skip if transcription already exists
    if os.path.exists(json_file_path):
        logging.info(f"Skipping {file_name}: Transcription already exists.")
        return None

    # Load the audio file
    try:
        audio_file = open(file_path, "rb")
    except Exception as e:
        logging.error(f"Error loading audio file {file_name}: {e}")
        return None

    # Transcribe the audio using the passed function; the `with` block
    # closes the file handle even if the transcription fails
    try:
        with audio_file:
            transcription_text = transcribe_func(audio_file, file_name)
    except Exception as e:
        logging.error(f"Error transcribing {file_name}: {e}")
        return None

    # Save transcription as JSON
    transcription_dict = {"content": transcription_text}
    try:
        with open(json_file_path, "w", encoding="utf-8") as f:
            json.dump(transcription_dict, f, ensure_ascii=False, indent=4)
        logging.info(f"Successfully transcribed and saved {file_name}.")
        return file_name  # Return file_name to track processed files
    except Exception as e:
        logging.error(f"Error saving transcription for {file_name}: {e}")
        return None

In [12]:
def parallel_transcription(data_path, knowledge_base_path, transcribe_fn):
    try:
        # Ensure knowledge base directory exists
        os.makedirs(knowledge_base_path, exist_ok=True)

        # Search for all audio files
        df_files = search_for_audio_files(data_path)
    
        # Use ThreadPoolExecutor to process files in parallel
        with ThreadPoolExecutor(max_workers=10) as executor:
            futures = {
                executor.submit(transcription,
                                row['file_name'],
                                row['file_path'],
                                transcribe_fn,
                                knowledge_base_path): row['file_name']
                for _, row in df_files.iterrows()
            }
            
            for future in tqdm(as_completed(futures), total=len(futures), desc="Processing audio files", unit='file'):
                future.result()  # This will raise any exceptions encountered during processing
        
        logging.info("All files processed.")
    except Exception as ex:
        logging.error(f"Error in the transcription process: {ex}")
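
`parallel_transcription` discards the value returned by each future, although `transcription` returns the file name on success and `None` when a file is skipped or fails. The sketch below shows one way those return values could be tallied. It is self-contained: `fake_transcription` and `run_and_summarize` are hypothetical stand-ins, not functions from this project.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_transcription(file_name):
    # Stand-in for `transcription`: returns the name on success, None otherwise.
    return None if file_name.endswith("_skip.mp3") else file_name

def run_and_summarize(file_names, max_workers=4):
    """Run the worker over all names and count successes versus None results."""
    processed, skipped_or_failed = [], 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_transcription, name) for name in file_names]
        for future in as_completed(futures):
            result = future.result()
            if result is None:
                skipped_or_failed += 1
            else:
                processed.append(result)
    # as_completed yields in completion order, so sort for a stable report.
    return sorted(processed), skipped_or_failed

processed, other = run_and_summarize(["a.mp3", "b_skip.mp3", "c.mp3"])
print(processed, other)  # ['a.mp3', 'c.mp3'] 1
```

A summary like this makes it easier to see at the end of a long run how many files were actually transcribed, without reading through the log file.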

In [13]:
def elevenlabs_transcribe(audio_file, file_name):
    # file_name is unused here; it is kept so every backend shares one signature
    client = ElevenLabs(api_key=os.getenv('ELEVENLABS_API_KEY'))

    result = client.speech_to_text.convert(
        model_id='scribe_v1',
        file=audio_file,
        tag_audio_events=False
    )
    return result.text

In [14]:
def openai_transcribe(audio_file, file_name):
    client = OpenAI()
    
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
    return transcription.text
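
Neither transcribe function retries a failed request, so a single transient error leaves that file untranscribed until the next run. Because both functions share the `(audio_file, file_name)` signature, a retry wrapper can be layered on top of either one. The following is a minimal sketch, assuming simple exponential backoff is acceptable; `with_retries` and `flaky_transcribe` are hypothetical names introduced for illustration only.

```python
import io
import time

def with_retries(transcribe_func, attempts=3, base_delay=1.0):
    """Wrap a transcribe function so failed calls are retried with backoff."""
    def wrapper(audio_file, file_name):
        for attempt in range(1, attempts + 1):
            try:
                # Rewind in case a previous attempt consumed part of the stream.
                audio_file.seek(0)
                return transcribe_func(audio_file, file_name)
            except Exception:
                if attempt == attempts:
                    raise  # out of attempts: let the caller log the error
                time.sleep(base_delay * 2 ** (attempt - 1))
    return wrapper

# Demonstration with a stand-in function that fails once, then succeeds.
calls = {"count": 0}

def flaky_transcribe(audio_file, file_name):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated transient error")
    return audio_file.read().decode("utf-8")

robust = with_retries(flaky_transcribe, attempts=3, base_delay=0.01)
print(robust(io.BytesIO(b"hello"), "demo.mp3"))  # hello
```

Under these assumptions, the wrapped function could be passed to `parallel_transcription` in place of the bare one, for example `with_retries(elevenlabs_transcribe)`.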

### Athanasios Mitilinaios

In [15]:
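# Configure logging (create the log directory first; basicConfig does not create it)
os.makedirs("logs", exist_ok=True)
logging.basicConfig(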
    filename="logs/athanasios_mitilinaios_transcription.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

# Paths to audio files and knowledge base where the transcription text will be stored
athanasios_mitilinaios_data = "../src/multi_agent/rag/data/Omilies/speeches_athanasios_mitilinaios"
athanasios_mitilinaios_knowledge_base = "../src/multi_agent/rag/knowledge_base/Omilies/speeches_athanasios_mitilinaios"

* ElevenLabs

In [None]:
parallel_transcription(athanasios_mitilinaios_data,
                       athanasios_mitilinaios_knowledge_base,
                       elevenlabs_transcribe)

* OpenAI

In [None]:
parallel_transcription(athanasios_mitilinaios_data,
                       athanasios_mitilinaios_knowledge_base,
                       openai_transcribe)