<a href="https://colab.research.google.com/github/radhakrishnan-omotec/speechkraft-repo/blob/main/MrityunjayGupta_Project_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Development of an Embedded Speech-to-Text and Emotion Recognition Device with Haptic Feedback for Individuals with Hearing Impairments

**Author**: MRITYUNJAY GUPTA

**Abstract**: This notebook provides a detailed stepwise methodology, incorporating literature review, hardware selection, and optimization strategies for embedded system development.


### Notebook: Speech-to-Text and Emotion Recognition with Haptic Feedback


# Step 1: Import Necessary Libraries
```python
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report
import time
```

# Step 2: Literature Review
- Conduct a comprehensive survey of existing technologies in:
  - Speech-to-text conversion: Explore models like DeepSpeech, Vosk, and lightweight RNNs.
  - Emotion recognition: Study feature extraction techniques like MFCCs, pitch, and prosody.
  - Haptic feedback mechanisms: Research vibrotactile motors and sensory substitution techniques.

```python
# Placeholder: Example of loading reviewed literature references
literature = ["Paper A on Speech Recognition", "Paper B on Emotion Recognition"]
for ref in literature:
    print("Reviewed:", ref)
```

# Step 3: Data Collection and Preprocessing
- **Speech Data**: Collect audio samples containing speech with labeled emotional states.
- **Haptic Feedback Design**: Map predefined vibration patterns to specific emotions and speakers.

```python
# Load audio file
file_path = 'path/to/audio/file.wav'

# Load audio and extract features
audio, sr = librosa.load(file_path, sr=None)
mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)

# Visualize waveform and MFCCs
plt.figure(figsize=(10, 4))
librosa.display.waveshow(audio, sr=sr)
plt.title('Waveform')
plt.show()

plt.figure(figsize=(10, 4))
librosa.display.specshow(mfccs, x_axis='time', sr=sr)
plt.colorbar()
plt.title('MFCCs')
plt.show()
```
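Collecting labeled samples usually means tracking which file carries which label. Below is a minimal sketch of such a manifest with a class-balance check. It assumes a hypothetical naming scheme `<speaker>_<emotion>_<take>.wav`, which this notebook does not specify, and the file names are illustrative.

```python
from collections import Counter
from pathlib import Path

def parse_sample(path):
    """Split a hypothetical '<speaker>_<emotion>_<take>.wav' name into labels."""
    speaker, emotion, _take = Path(path).stem.split("_")
    return {"path": str(path), "speaker": speaker, "emotion": emotion}

def class_balance(manifest):
    """Count samples per emotion so under-represented classes are visible."""
    return Counter(row["emotion"] for row in manifest)

# Illustrative file names only; replace with the real recordings
files = ["s01_happy_01.wav", "s01_sad_01.wav", "s02_happy_01.wav", "s02_angry_01.wav"]
manifest = [parse_sample(f) for f in files]
print(class_balance(manifest))
```

Checking the balance early helps decide whether more recordings are needed for a given emotion before training.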

# Step 4: Feature Extraction
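- Extract **MFCCs**, **pitch**, and **prosody** (approximated here by RMS energy) for emotion classification.

```python
# Estimate pitch (F0) per frame; pyin returns NaN for unvoiced frames
pitch, voiced_flag, _ = librosa.pyin(audio, fmin=50, fmax=300, sr=sr)
# Frame-wise RMS energy, used here as a simple prosody (loudness) feature
energy = librosa.feature.rms(y=audio)

# Average pitch over voiced frames only; zero-filling NaNs would drag the mean down
mean_pitch = np.nanmean(pitch) if np.any(voiced_flag) else 0.0

# Combine features into one vector (13 MFCC means + mean pitch + mean energy)
features = np.concatenate((mfccs.mean(axis=1), [mean_pitch], [np.mean(energy)]))
```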
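`librosa.pyin` marks unvoiced frames as NaN, so the way they are handled changes the pitch statistic. The pure-Python sketch below compares a voiced-only mean against zero-filling. The helper name `voiced_pitch_stats` and the frame values are illustrative.

```python
import math
from statistics import mean

def voiced_pitch_stats(f0_frames):
    """Return (voiced-only mean, zero-filled mean, voiced ratio) for per-frame F0 in Hz."""
    voiced = [f for f in f0_frames if not math.isnan(f)]
    if not voiced:
        return 0.0, 0.0, 0.0
    zero_filled = mean(0.0 if math.isnan(f) else f for f in f0_frames)
    return mean(voiced), zero_filled, len(voiced) / len(f0_frames)

# Made-up frame values: two voiced frames around 200 Hz, two unvoiced
f0 = [200.0, float("nan"), 210.0, float("nan")]
voiced_mean, zero_mean, ratio = voiced_pitch_stats(f0)
print(voiced_mean, zero_mean, ratio)
```

The voiced ratio can also serve as an extra feature, since it reflects how much of the clip contains voiced speech.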

# Step 5: Emotion Classification Model
- Train a lightweight model (e.g., **SVM** or **CNN**) for classifying emotions.

```python
# Load dataset (example placeholder)
data = np.load('path/to/emotion_dataset.npz')
X = data['features']
y = data['labels']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM classifier
clf = SVC(kernel='linear', probability=True)
clf.fit(X_train, y_train)

# Evaluate model
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))
```
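SVMs are sensitive to feature scale, and the features here mix MFCC means, pitch in Hz, and RMS energy. The sketch below shows z-score standardization fitted on the training split only. It is written in plain Python for clarity (scikit-learn's `StandardScaler` does the same job), and the toy numbers are made up.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Compute per-column mean and population std from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds

def transform(rows, means, stds):
    """Apply (x - mean) / std column-wise using the training statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy rows: [mean pitch in Hz, mean RMS energy]
train = [[100.0, 0.01], [200.0, 0.03], [300.0, 0.02]]
test = [[200.0, 0.02]]
means, stds = fit_scaler(train)
print(transform(test, means, stds))
```

Fitting the statistics on the training rows and reusing them for the test rows keeps information from the test set out of training.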

# Step 6: Speech-to-Text Conversion
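- Use a pre-trained lightweight model such as **Vosk** or **DeepSpeech**, or a compact RNN, optimized for embedded systems. The example uses Vosk.

```python
import json
from vosk import Model, KaldiRecognizer

# Initialize Vosk model (check which sample rate your model expects)
model = Model("path/to/vosk-model")
recognizer = KaldiRecognizer(model, sr)

# Vosk expects 16-bit PCM bytes, while librosa returns floats in [-1, 1]
pcm_bytes = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16).tobytes()

# Feed the audio, then flush the recognizer to get the full transcript
recognizer.AcceptWaveform(pcm_bytes)
result = json.loads(recognizer.FinalResult())  # results are JSON strings
print("Speech-to-Text Result:", result.get("text", ""))
```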
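Vosk consumes 16-bit mono PCM bytes and reports results as JSON strings. The sketch below shows both conversions using only the standard library. The helper names are hypothetical, and the sample JSON string is illustrative rather than real recognizer output.

```python
import json
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [round(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)

def extract_text(result_json):
    """Pull the transcript out of a Vosk-style result string."""
    return json.loads(result_json).get("text", "")

pcm = float_to_pcm16([0.0, 0.5, -1.0, 2.0])  # 2.0 is clipped to 1.0
print(len(pcm))                               # 2 bytes per sample
print(extract_text('{"text": "hello world"}'))
```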

# Step 7: Haptic Feedback Design
- Develop distinct vibration patterns for emotions and speaker identity using **actuators**.

```python
# Define haptic patterns
def generate_haptic_feedback(emotion):
    patterns = {
        'happy': [0.5, 0.2, 0.5],  # vibration ON-OFF-ON pattern
        'sad': [1.0, 0.5],         # longer vibration
        'angry': [0.2, 0.2, 0.2]   # short bursts
    }
    return patterns.get(emotion, [0.5])  # Default pattern

# Trigger actuator (example function): durations alternate ON, OFF, ON, ...
def trigger_haptic(pattern):
    for i, duration in enumerate(pattern):
        state = "ON" if i % 2 == 0 else "OFF"
        print(f"Vibration {state} for {duration} seconds")
        time.sleep(duration)

# Example usage
emotion = 'happy'
pattern = generate_haptic_feedback(emotion)
trigger_haptic(pattern)
```
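One way to connect such patterns to hardware is to treat each pattern as alternating ON/OFF durations and to pass the motor control in as a callback. In this sketch `set_motor` is a hypothetical stand-in for the real driver (for example a GPIO write), and encoding speaker identity as a count of short leading pulses is an assumed design, not something this notebook specifies.

```python
import time

def speaker_prefix(speaker_index, pulse=0.1, gap=0.1, pause=0.4):
    """Encode a speaker as N short pulses followed by a longer OFF pause."""
    prefix = []
    for _ in range(speaker_index):
        prefix += [pulse, gap]
    if prefix:
        prefix[-1] = pause  # final gap becomes the separator pause
    return prefix

def play_pattern(pattern, set_motor, sleep=time.sleep):
    """Play alternating ON/OFF durations; even positions are ON."""
    for i, duration in enumerate(pattern):
        set_motor(i % 2 == 0)
        sleep(duration)
    set_motor(False)  # always leave the motor off

# Dry run: record motor states instead of driving hardware
log = []
pattern = speaker_prefix(2) + [0.5, 0.2, 0.5]
play_pattern(pattern, set_motor=log.append, sleep=lambda s: None)
print(pattern, log)
```

Injecting `set_motor` and `sleep` lets the same function run in a desktop dry run and on the device without changes.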

# Step 8: Embedded Hardware Platform Selection
- Select a Raspberry Pi (RPi) or similar embedded system with adequate computational power.
- Develop lightweight models optimized for embedded systems using TensorFlow Lite or ONNX.

```python
# Deploy TensorFlow Lite model
import tflite_runtime.interpreter as tflite

# Load TFLite model
interpreter = tflite.Interpreter(model_path="path/to/model.tflite")
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

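# Predict emotion (the input must match the dtype declared by the model, typically float32)
input_data = features.reshape(1, -1).astype(input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
emotion_prediction = interpreter.get_tensor(output_details[0]['index'])
print("Predicted Emotion (raw model output):", emotion_prediction)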
```
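The interpreter returns raw scores rather than an emotion name. The sketch below turns such scores into a label. It assumes the model outputs one logit per class in the order of a hypothetical `LABELS` list, and that a confidence threshold of 0.5 is acceptable. If the model already outputs probabilities, the `softmax` step should be skipped.

```python
import math

LABELS = ["happy", "sad", "angry"]  # hypothetical class order; must match training

def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode(scores, labels=LABELS, threshold=0.5):
    """Return (label, probability); label is 'unknown' below the threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best] if probs[best] >= threshold else "unknown"
    return label, probs[best]

print(decode([2.0, 0.1, 0.3]))
print(decode([0.1, 0.1, 0.1]))
```

Returning "unknown" for low-confidence predictions lets the device fall back to a default haptic pattern instead of signalling a likely wrong emotion.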

# Step 9 [OPTIONAL]: Power Management Strategies
- Implement sleep modes and power-saving techniques to optimize battery usage.

```python
# Example power-saving function
def manage_power(mode):
    if mode == 'low-power':
        print("Activating low-power mode...")
    elif mode == 'performance':
        print("Activating performance mode...")

# Usage
manage_power('low-power')
```
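A common power-saving approach is to drop into a low-power state after a period without speech. The sketch below implements such an idle-timeout state machine. The class name, the 5-second timeout, and the injected clock are illustrative, and the actual hardware calls (CPU governor, peripheral shutdown) are left out because they are platform-specific.

```python
import time

class PowerManager:
    """Switch between 'performance' and 'low-power' based on recent activity."""

    def __init__(self, idle_timeout=5.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_activity = clock()
        self.mode = "performance"

    def notify_activity(self):
        """Call when speech is detected; wakes the device."""
        self.last_activity = self.clock()
        self.mode = "performance"

    def update(self):
        """Call periodically; enters low-power mode once idle long enough."""
        if self.clock() - self.last_activity >= self.idle_timeout:
            self.mode = "low-power"
        return self.mode

# Simulated clock so the example runs instantly
now = [0.0]
pm = PowerManager(idle_timeout=5.0, clock=lambda: now[0])
now[0] = 6.0
print(pm.update())
pm.notify_activity()
print(pm.update())
```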

# Step 10: Testing and Validation
- Evaluate the integrated system with test cases covering:
  - Speech-to-text accuracy
  - Emotion recognition performance
  - Usability of haptic feedback

```python
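import json

# Test speech-to-text: re-run recognition from a clean state, since a
# recognizer's result is cleared once it has been read
pcm_bytes = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16).tobytes()
recognizer.Reset()
recognizer.AcceptWaveform(pcm_bytes)
text_result = json.loads(recognizer.FinalResult()).get("text", "")
print("Transcribed Text:", text_result)

# Test emotion classification
emotion_result = clf.predict(features.reshape(1, -1))
print("Detected Emotion:", emotion_result)

# Validate haptic feedback (assumes the labels are strings such as 'happy')
trigger_haptic(generate_haptic_feedback(emotion_result[0]))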
```
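Speech-to-text accuracy is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. A self-contained sketch follows; the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("turn the light on", "turn light on"))
```

Lower is better, and values above 1.0 are possible when the hypothesis contains many inserted words.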

This enhanced Python notebook provides a detailed stepwise methodology, incorporating literature review, hardware selection, and optimization strategies for embedded system development.


