# Speech Recognition with Neural Networks

---

## Introduction  

As part of the capstone project for Udacity's Natural Language Processing Nanodegree program, I built a deep neural network and integrated it into an end-to-end automatic speech recognition (ASR) pipeline. The pipeline takes raw audio as input and returns a predicted transcription of the spoken words. The full pipeline is summarized in the figure below.

<img src="images/pipeline.png">

- **STEP 1** is a pre-processing step that converts raw audio to one of two feature representations commonly used for ASR: spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs).
- **STEP 2** is an acoustic model, based on a deep neural network architecture I developed, that accepts audio features as input and returns a probability distribution over all potential transcriptions.
- **STEP 3** in the pipeline takes the output from the acoustic model and returns a predicted transcription.  
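
To make STEP 1 more concrete, here is a minimal sketch of how a log-power spectrogram could be computed from raw audio with NumPy alone. It is not the implementation used by this project's `data_generator` module. The function name `log_spectrogram`, the 20 ms window, the 10 ms step, and the synthetic 440 Hz tone are all assumptions chosen for illustration.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, window_ms=20, step_ms=10, eps=1e-10):
    """Return a log-power spectrogram of shape (num_frames, num_freq_bins).

    Hypothetical helper for illustration; window and step sizes are assumed.
    """
    window = int(sample_rate * window_ms / 1000)  # samples per frame
    step = int(sample_rate * step_ms / 1000)      # hop between frames
    if len(signal) < window:
        raise ValueError("signal is shorter than one analysis window")

    num_frames = 1 + (len(signal) - window) // step
    # Build an index matrix so that each row selects one overlapping frame.
    idx = np.arange(window)[None, :] + step * np.arange(num_frames)[:, None]
    # Taper each frame with a Hann window to reduce spectral leakage.
    frames = signal[idx] * np.hanning(window)
    # Power spectrum of each frame (real-input FFT keeps non-negative frequencies).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # eps avoids log(0) on silent frames.
    return np.log(power + eps)


# One second of a synthetic 440 Hz tone stands in for a real recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

features = log_spectrogram(tone, sample_rate)
print(features.shape)  # (num_frames, num_freq_bins)
```

Each row of `features` is one time step that an acoustic model like the one in STEP 2 could consume. MFCCs go further by applying a mel filterbank and a discrete cosine transform to a spectrum like this, which yields a more compact representation.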

The notebook is organized as follows:
- [The Data](#thedata)
- [**STEP 1**](#step1): Acoustic Features for Speech Recognition
- [**STEP 2**](#step2): Deep Neural Network for Acoustic Modeling
    - [Model](#Model): CNN + Bidirectional RNN + TimeDistributed Dense
    - [Compare the Models](#compare)
- [**STEP 3**](#step3): Obtain Predictions

```python
%matplotlib inline
%load_ext autoreload
%autoreload 2

from IPython.display import Markdown, display
from IPython.display import Audio

import numpy as np
from keras import backend as K
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
from keras.optimizers import Adam

# allocate 50% of GPU memory
# config = tf.ConfigProto()
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
# set_session(tf.Session(config=config))

# import functions for visualizing the extracted features from raw audio
from data_generator import vis_train_features, plot_raw_audio, plot_spectrogram_feature, plot_mfcc_feature

# import deep neural network architecture for ASR model
from model_utils import asr_model

# import function for training the ASR model
from train_utils import train_model

# import function for getting transcription prediction from the ASR model
from prediction_utils import get_predictions
```


### Other parts of this notebook will be added soon!