# Orthodox Speech-to-Text Converter

> Konstantinos Mpouros <br>
> GitHub: https://github.com/konstantinosmpouros?tab=repositories<br>
> Year: 2025

## About the Project

> The **Orthodox Speech-to-Text Converter** aims to transcribe Orthodox sermons and speeches from MP3 audio files into text. Transcription will be handled by the **Eleven Labs API** and the **Azure Speech-to-Text service**. The goal is to produce **accurate, structured transcriptions** that preserve Orthodox teachings for research, study, and digital archiving.


## Libraries

In [1]:
# Data handling and manipulation
import pandas as pd
from modules import search_for_audio_files, extract_theme

# Speech to Text APIs
import elevenlabs

# Load the API keys
import os
from dotenv import load_dotenv
_ = load_dotenv()

## Speech to Text

### Speeches of Athanasios Mitilinaios

In [None]:
# Index the audio files and derive a theme for each one from its file name
athanasios_mitilinaios = search_for_audio_files("../src/rag/content_store/data/Omilies/speeches_athanasios_mitilinaios")
athanasios_mitilinaios["theme"] = athanasios_mitilinaios["file_name"].apply(extract_theme)
athanasios_mitilinaios.sample(10)

| | file_name | file_path | theme |
|---|---|---|---|
| 4408 | 4402_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3 | Data/speeches_athanasios_mitilinaios/4402_ΑΠΑΝ... | ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ |
| 1247 | 1241_12-12-82_ΙΕΡΑ_ΑΠΟΚΑΛΥΨΙΣ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ... | Data/speeches_athanasios_mitilinaios/1241_12-1... | ΙΕΡΑ_ΑΠΟΚΑΛΥΨΙΣ |
| 2250 | 2244_02-05-93_ΚΥΡΙΑΚΗ_ΤΩΝ_ΜΥΡΟΦΟΡΩΝ_π_ΑΘ_ΜΥΤΙΛ... | Data/speeches_athanasios_mitilinaios/2244_02-0... | ΚΥΡΙΑΚΗ_ΤΩΝ_ΜΥΡΟΦΟΡΩΝ |
| 2109 | 2103_29-01-89_ΚΥΡΙΑΚΗ_ΙΕ_ΛΟΥΚΑ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟ... | Data/speeches_athanasios_mitilinaios/2103_29-0... | ΚΥΡΙΑΚΗ_ΙΕ_ΛΟΥΚΑ |
| 523 | 0517_12-12-77_ΑΝΑΛΥΣΙΣ_ΨΑΛΜΩΝ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ... | Data/speeches_athanasios_mitilinaios/0517_12-1... | ΑΝΑΛΥΣΙΣ_ΨΑΛΜΩΝ |
| 1814 | 1808_11-04-82_ΚΥΡΙΑΚΗ_ΤΩΝ_ΒΑΙΩΝ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙ... | Data/speeches_athanasios_mitilinaios/1808_11-0... | ΚΥΡΙΑΚΗ_ΤΩΝ_ΒΑΙΩΝ |
| 2021 | 2015_28-09-86_ΚΥΡΙΑΚΗ_Α_ΛΟΥΚΑ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ... | Data/speeches_athanasios_mitilinaios/2015_28-0... | ΚΥΡΙΑΚΗ_Α_ΛΟΥΚΑ |
| 1671 | 1665_14-08-84_ΕΙΣ_ΤΗΝ_ΥΠΕΡΑΓΙΑΝ_ΘΕΟΤΟΚΟΝ_π_ΑΘ_... | Data/speeches_athanasios_mitilinaios/1665_14-0... | ΕΙΣ_ΤΗΝ_ΥΠΕΡΑΓΙΑΝ_ΘΕΟΤΟΚΟΝ |
| 21 | 0022_10-04-83_ΑΝΘΡΩΠΟΛΟΓΙΑ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3 | Data/speeches_athanasios_mitilinaios/0022_10-0... | ΑΝΘΡΩΠΟΛΟΓΙΑ |
| 4003 | 3997_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_... | Data/speeches_athanasios_mitilinaios/3997_ΑΠΑΝ... | ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ |
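
The helpers `search_for_audio_files` and `extract_theme` are imported from the project's `modules` package and are not shown in this notebook. The following is only a minimal sketch of what such helpers might look like, inferred from the file names in the output above. The names `find_audio_files` and `extract_theme_sketch`, the regular expression, and the assumption that every file name follows the pattern `<id>_[<dd-mm-yy>_]<theme>_π_ΑΘ_...` are all hypothetical and may differ from the actual implementation.

```python
import re
from pathlib import Path

# Assumed file-name layout (inferred from the sample rows, not confirmed):
#   <numeric id>_[<dd-mm-yy>_]<THEME>_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3
_THEME_RE = re.compile(r"^\d+_(?:\d{2}-\d{2}-\d{2}_)?(?P<theme>.+?)_π_ΑΘ")


def find_audio_files(root, extension=".mp3"):
    """Recursively collect audio files under `root` as a list of row dicts.

    The rows use the same column names as the notebook's DataFrame, so the
    result can be passed straight to `pandas.DataFrame(...)`.
    """
    root = Path(root)
    return [
        {"file_name": path.name, "file_path": str(path)}
        for path in sorted(root.rglob(f"*{extension}"))
        if path.is_file()
    ]


def extract_theme_sketch(file_name):
    """Return the theme part of a file name, or None if the pattern does not match."""
    match = _THEME_RE.match(file_name)
    return match.group("theme") if match else None
```

Under these assumptions, `extract_theme_sketch("0022_10-04-83_ΑΝΘΡΩΠΟΛΟΓΙΑ_π_ΑΘ_ΜΥΤΙΛΗΝΑΙΟΥ.mp3")` yields `"ΑΝΘΡΩΠΟΛΟΓΙΑ"`, matching the `theme` column shown for that row. Returning `None` for unmatched names keeps malformed files visible in the DataFrame instead of silently assigning them a wrong theme.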


In [8]:
athanasios_mitilinaios['theme'].nunique()

278

In [3]:
theme_counts = athanasios_mitilinaios["theme"].value_counts()
frequent_themes = theme_counts[theme_counts > 10]
pd.DataFrame(frequent_themes)

| theme | count |
|---|---|
| ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ | 1017 |
| ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ | 321 |
| ΣΟΦΙΑ_ΣΕΙΡΑΧ | 296 |
| ΠΡΑΞΕΙΣ_ΤΩΝ_ΑΠΟΣΤΟΛΩΝ | 263 |
| ΚΑΤΗΧΗΣΕΙΣ_ΑΓΙΟΥ_ΚΥΡΙΛΛΟΥ | 211 |
| ΕΙΣ_ΤΗΝ_ΥΠΕΡΑΓΙΑΝ_ΘΕΟΤΟΚΟΝ | 110 |
| ΙΕΡΑ_ΑΠΟΚΑΛΥΨΙΣ | 103 |
| ΕΙΣ_ΠΡΟΣΚΥΝΗΤΑΣ | 102 |
| ΠΡΟΦΗΤΗΣ_ΗΣΑΙΑΣ | 92 |
| ΣΥΓΧΡΟΝΑ_ΚΑΥΤΑ_ΘΕΜΑΤΑ | 70 |


In [6]:
example = athanasios_mitilinaios.sample(1)
example

| | file_name | file_path | theme |
|---|---|---|---|
| 3679 | 3673_ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ_π_ΑΘ_... | Data/speeches_athanasios_mitilinaios/3673_ΑΠΑΝ... | ΑΠΑΝΤΗΣΕΙΣ_ΑΠΟΡΙΩΝ_ΑΝΩΤ_ΚΑΤΗΧΗΤΙΚΟΥ |
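
A sampled row like this one could then be sent to a speech-to-text service. The sketch below shows one possible way to do that with the `elevenlabs` package imported earlier. It is an assumption, not the notebook's own code: the helper names `transcribe_file` and `make_client`, the environment variable `ELEVENLABS_API_KEY`, the `scribe_v1` model id, and the `el` (Greek) language code are illustrative choices and should be checked against the current Eleven Labs documentation.

```python
import os


def transcribe_file(client, file_path, model_id="scribe_v1", language_code="el"):
    """Send one audio file to the speech-to-text endpoint and return the transcript text.

    `client` is expected to expose `speech_to_text.convert(...)`, as the
    ElevenLabs Python client does; model id and language code are assumptions.
    """
    with open(file_path, "rb") as audio:
        result = client.speech_to_text.convert(
            file=audio,
            model_id=model_id,
            language_code=language_code,
        )
    return result.text


def make_client():
    """Build an ElevenLabs client from an (assumed) environment variable."""
    # Imported lazily so that transcribe_file can be exercised without the SDK installed.
    from elevenlabs.client import ElevenLabs

    return ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
```

With a valid key loaded by `load_dotenv()`, a call such as `transcribe_file(make_client(), example["file_path"].iloc[0])` would return the transcript as a string. Passing the client in as an argument keeps the function easy to test with a stub and makes it simple to swap in a different backend later.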
