# X-SAMPA TTS Synthesizer

**Objective**: To build a simple Text-to-Speech Synthesizer using X-SAMPA transcriptions.

**Technologies**:
- Python
    - pandas (for reading the CSV data)
    - keras
    - sklearn
- X-SAMPA
- librosa (for audio processing)

#### 1.1 Split the data into training, validation, and test sets.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the CSV file
data = pd.read_csv('transcription.csv')

# Shuffle the data; a fixed seed keeps the shuffle (and so the split) reproducible
data = data.sample(frac=1, random_state=42).reset_index(drop=True)

# Split 80/10/10: hold out 20% of the rows, then halve that into validation and test
train, temp = train_test_split(data, test_size=0.20, random_state=42)
valid, test = train_test_split(temp, test_size=0.50, random_state=42)

# Save the split data into new CSV files
train.to_csv('train.csv', index=False)
valid.to_csv('valid.csv', index=False)
test.to_csv('test.csv', index=False)

print(f"Train set has {train.shape[0]} samples.")
print(f"Validation set has {valid.shape[0]} samples.")
print(f"Test set has {test.shape[0]} samples.")

Train set has 184 samples.
Validation set has 23 samples.
Test set has 23 samples.
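
The counts above follow from the two-stage split: 20% of the 230 rows is 46, which is then halved into 23 and 23. The sketch below reproduces that arithmetic on a synthetic frame and checks that the three sets do not overlap. The helper name `split_80_10_10` and the dummy rows are illustrative stand-ins, not part of the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_80_10_10(frame, seed=42):
    """Two-stage split: hold out 20%, then halve it into validation and test."""
    train, temp = train_test_split(frame, test_size=0.20, random_state=seed)
    valid, test = train_test_split(temp, test_size=0.50, random_state=seed)
    return train, valid, test

# Synthetic stand-in with as many rows as the counts printed above add up to
dummy = pd.DataFrame({"Transcription": [f"item{i}" for i in range(230)]})
train, valid, test = split_80_10_10(dummy)

split_sizes = (len(train), len(valid), len(test))
# The index sets should be pairwise disjoint and together cover every row
all_indices = set(train.index) | set(valid.index) | set(test.index)
no_overlap = len(all_indices) == len(dummy)
print(split_sizes, no_overlap)
```

With only 23 rows each, the validation and test sets give fairly noisy estimates, so metrics computed on them are best read as rough indicators.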


#### 2.1 Convert X-SAMPA transcriptions to padded integer sequences.

In [2]:
# Requires the keras package to be installed in the notebook environment
from keras.preprocessing.sequence import pad_sequences

# Load the training data
data = pd.read_csv('train.csv')
transcriptions = data['Transcription'].tolist()

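# Create a mapping of X-SAMPA symbols to integers.
# Note: each character is treated as one symbol, so multi-character
# X-SAMPA symbols such as r\ are split into separate tokens.
# Sorting the set makes the mapping deterministic across runs.
all_symbols = sorted(set(''.join(transcriptions)))
symbol_to_int = {symbol: idx for idx, symbol in enumerate(all_symbols, 1)}
symbol_to_int['<PAD>'] = 0  # padding symbol
symbol_to_int['<OOV>'] = len(symbol_to_int)  # out-of-vocabulary symbol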

# Convert transcriptions to integer sequences
int_sequences = []
for transcription in transcriptions:
    int_seq = [symbol_to_int.get(symbol, symbol_to_int['<OOV>']) for symbol in transcription]
    int_sequences.append(int_seq)

# Pad sequences to the same length
max_length = max(len(seq) for seq in int_sequences)
padded_sequences = pad_sequences(int_sequences, maxlen=max_length, padding='post', value=symbol_to_int['<PAD>'])

print(padded_sequences)


ModuleNotFoundError: No module named 'keras'
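
The cell above could not run because Keras was missing from the environment. As a possible fallback, the sketch below does the same encoding and post-padding with NumPy alone. It also groups multi-character X-SAMPA symbols into single tokens using a deliberately simplified pattern: one base character, an optional backslash, then any single-character underscore diacritics. The pattern, the helper names (`tokenize`, `build_vocab`, `encode_and_pad`), and the example strings are all assumptions made for illustration; they are not a complete X-SAMPA grammar and do not come from `transcription.csv`.

```python
import re
import numpy as np

# Simplified X-SAMPA token pattern (an assumption, not a full grammar):
# one base character, an optional backslash modifier, then any number of
# single-character underscore diacritics, e.g. "r\" or "t_h".
TOKEN_RE = re.compile(r"[^_\\]\\?(?:_[^_\\])*")

def tokenize(transcription):
    """Split a transcription into symbol-level tokens."""
    return TOKEN_RE.findall(transcription)

def build_vocab(token_lists):
    """Map tokens to integers; 0 is padding and the last index is OOV."""
    symbols = sorted({tok for toks in token_lists for tok in toks})
    vocab = {"<PAD>": 0}
    vocab.update({sym: idx for idx, sym in enumerate(symbols, 1)})
    vocab["<OOV>"] = len(vocab)
    return vocab

def encode_and_pad(token_lists, vocab):
    """Integer-encode each token list and post-pad with the <PAD> index."""
    max_len = max(len(toks) for toks in token_lists)
    out = np.full((len(token_lists), max_len), vocab["<PAD>"], dtype=np.int32)
    for row, toks in enumerate(token_lists):
        out[row, :len(toks)] = [vocab.get(t, vocab["<OOV>"]) for t in toks]
    return out

# Made-up example transcriptions, not taken from transcription.csv
examples = ["r\\Ed", "t_hi"]
token_lists = [tokenize(t) for t in examples]
vocab = build_vocab(token_lists)
padded = encode_and_pad(token_lists, vocab)
print(token_lists)
print(padded)
```

Whether diacritics should be attached to their base symbol or kept as separate tokens is a modelling choice. Attaching them gives shorter sequences at the cost of a larger vocabulary.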

#### 2.2 Compute spectrograms of the audio files as the prediction targets.

In [None]:
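import librosa
import numpy as np

# NOTE: audio_files is not defined anywhere in this notebook. It is assumed
# to be a list of audio file paths in the same order as the transcriptions.
spectrograms = []

for file in audio_files:
    y, sr = librosa.load(file, sr=None)  # sr=None keeps the file's native sampling rate
    D = librosa.stft(y)  # Short-Time Fourier Transform, shape (freq_bins, frames)
    spec = np.abs(D)  # magnitude spectrogram
    spectrograms.append(spec.T)  # transpose to (frames, freq_bins)

# Zero-pad the spectrograms along the time axis to a common length if required
max_length = max(len(spec) for spec in spectrograms)
spectrograms_padded = [np.pad(spec, ((0, max_length - len(spec)), (0, 0)), mode='constant') for spec in spectrograms]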
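
Zero-padding makes the spectrograms stackable, but it hides where each utterance actually ends. The sketch below stacks the padded frames into one array and keeps a boolean mask of the real frames. It uses random arrays in place of real spectrograms, with 1025 frequency bins to match `librosa.stft` at its default `n_fft=2048`. The helper name `pad_and_mask` and the frame counts are hypothetical.

```python
import numpy as np

def pad_and_mask(specs):
    """Stack (frames, bins) arrays into (batch, max_frames, bins) plus a frame mask."""
    max_frames = max(len(s) for s in specs)
    n_bins = specs[0].shape[1]
    batch = np.zeros((len(specs), max_frames, n_bins), dtype=specs[0].dtype)
    mask = np.zeros((len(specs), max_frames), dtype=bool)
    for i, s in enumerate(specs):
        batch[i, :len(s)] = s      # copy the real frames
        mask[i, :len(s)] = True    # mark them as valid; padding stays False
    return batch, mask

# Random stand-ins for three spectrograms of different lengths
rng = np.random.default_rng(0)
fake_specs = [rng.random((n, 1025)) for n in (40, 55, 32)]
batch, mask = pad_and_mask(fake_specs)
print(batch.shape, mask.sum(axis=1))
```

One way to use such a mask is to leave padded frames out of the training loss, so that the model is not rewarded for predicting silence after the end of an utterance.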
