# Introduction

Do not spend too much time chasing tiny metric improvements. Once you have a model with reasonable predictive power, your time is better spent explaining your data cleaning & preparation pipeline and providing explanations & visualizations of the results.

The goal is to assess your fit with our company culture & engineering needs. Spending 50h on an over-complicated approach will not earn you bonus points compared to a simple yet effective, to-the-point solution.

## About the data

The dataset you will be working with is called Emo-DB and can be found [here](http://emodb.bilderbar.info/index-1280.html).

It is a database containing samples of emotional speech in German. It contains samples labeled with one of 7 different emotions: Anger, Boredom, Disgust, Fear, Happiness, Sadness and Neutral. 

Please download the full database and refer to its documentation to understand how the samples are labeled (see "Additional information").
   
The goal of this project is to develop a model which is able to **classify samples of emotional speech**. Feel free to use any available library you need, but beware of reusing someone else's code without mentioning it!

## Deliverable

The end goal is to deliver:
* This report, filled in with your approach, in the form of an **IPython Notebook**.
* A **5-10 slide PDF file**, containing a very brief presentation covering the following points:
    * Introduction to the problem (what are we trying to achieve and why) - max 1 slide
    * Libraries used - max 1 slide
    * Data Processing Pipeline - max 2 slides
    * Feature Engineering (if relevant) - max 1 slide
    * Modeling - max 1 slide
    * Results & Visualization - max 2 slides
* The presentation should be **understandable by a business person**. The only exception is the modeling techniques, whose inner workings do not have to be explained.

# Libraries Loading

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [11]:
import platform
import glob
import ntpath
import os
from scipy.io import wavfile

# Data Preparation & Cleaning

In [4]:
data_main_folder = 'data'
wav_folder = 'wav'

In [7]:
platform_system = platform.system()
# Windows uses backslashes in paths; every other platform uses forward slashes
file_separator = '\\' if platform_system == 'Windows' else '/'

In [9]:
wav_path = data_main_folder + file_separator + wav_folder

In [12]:
wav_files = glob.glob(wav_path+file_separator+'*.wav')
wav_files.sort()

In [15]:
wav_files[:1]

['data\\wav\\03a01Fa.wav']

## About the data

<h2>Additional Information</h2>


Every utterance is named according to the same scheme:
<ul><li>Positions 1-2: number of speaker
    <li>Positions 3-5: code for text
    <li>Position 6: emotion (sorry, the letter stands for the German emotion word)
    <li>Position 7: if there are more than two versions these are numbered a, b, c ....
</ul>

Example: 03a01Fa.wav is the audio file from Speaker 03 speaking text a01 with the emotion "Freude" (Happiness).

<h3>Information about the speakers</h3>

<ul><li>03 - male, 31 years old
    <li>08 - female, 34 years
    <li>09 - female, 21 years
    <li>10 - male, 32 years
    <li>11 - male, 26 years
    <li>12 - male, 30 years
    <li>13 - female, 32 years
    <li>14 - female, 35 years
    <li>15 - male, 25 years
    <li>16 - female, 31 years
</ul>

<table class="aussen" border="0" align="left" width="100%">
<tr><td>

<h3>Code of texts</h3>

<table align="left" width="800" bgcolor="#313131" border="1" frame="solid" rules="rows" cellpadding="3" cellspacing="1">
<tr><th>code</th><th>text (German)</th><th>attempted English translation</th></tr>
<tr><td class="mittig">a01</td><td>Der Lappen liegt auf dem Eisschrank.</td><td>The tablecloth is lying on the fridge.</td></tr>
<tr><td class="mittig">a02</td><td>Das will sie am Mittwoch abgeben.</td><td>She will hand it in on Wednesday.</td></tr>
<tr><td class="mittig">a04</td><td>Heute abend könnte ich es ihm sagen.</td><td>Tonight I could tell him.</td></tr>
<tr><td class="mittig">a05</td><td>Das schwarze Stück Papier befindet sich da oben neben dem Holzstück.</td><td>The black sheet of paper is located up there beside the piece of timber.</td></tr>
<tr><td class="mittig">a07</td><td>In sieben Stunden wird es soweit sein.</td><td>In seven hours it will be.</td></tr>
<tr><td class="mittig">b01</td><td>Was sind denn das für Tüten, die da unter dem Tisch stehen?</td><td>What about the bags standing there under the table?</td></tr>
<tr><td class="mittig">b02</td><td>Sie haben es gerade hochgetragen und jetzt gehen sie wieder runter.</td><td>They just carried it upstairs and now they are going down again.</td></tr>
<tr><td class="mittig">b03</td><td>An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht.</td><td>Currently at the weekends I always went home and saw Agnes.</td></tr>
<tr><td class="mittig">b09</td><td>Ich will das eben wegbringen und dann mit Karl was trinken gehen.</td><td>I will just discard this and then go for a drink with Karl.</td></tr>
<tr><td class="mittig">b10</td><td>Die wird auf dem Platz sein, wo wir sie immer hinlegen.</td><td>It will be in the place where we always store it.</td></tr>
</table>

</td></tr>

<tr><td>

<h3><br>Code of emotions:</h3>

<table align="left" width="600" bgcolor="#313131" border="1" frame="solid" rules="rows" cellpadding="3" cellspacing="1">
<tr><th>letter (English)</th><th>emotion (English)</th><th>letter (German, used in file names)</th><th>emotion (German)</th></tr>
<tr><td class="mittig">A</td><td>anger</td><td class="mittig">W</td><td>Ärger (Wut)</td></tr>
<tr><td class="mittig">B</td><td>boredom</td><td class="mittig">L</td><td>Langeweile</td></tr>
<tr><td class="mittig">D</td><td>disgust</td><td class="mittig">E</td><td>Ekel</td></tr>
<tr><td class="mittig">F</td><td>anxiety/fear</td><td class="mittig">A</td><td>Angst</td></tr>
<tr><td class="mittig">H</td><td>happiness</td><td class="mittig">F</td><td>Freude</td></tr>
<tr><td class="mittig">S</td><td>sadness</td><td class="mittig">T</td><td>Trauer</td></tr>
<tr><td colspan="4">N = neutral version</td></tr>

</table>
</td></tr>

</table>
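
The tables above can be captured as lookup dictionaries, so that the codes found in the file names translate into readable labels. The following is a minimal sketch transcribed from the documentation above. The names `EMOTION_BY_LETTER`, `SPEAKERS` and `describe` are hypothetical and not part of Emo-DB.

```python
# Lookup tables transcribed from the documentation above.
# All names here are illustrative; they are not defined by Emo-DB.

# German letter (as used in file names) -> English emotion label
EMOTION_BY_LETTER = {
    "W": "anger",      # Ärger (Wut)
    "L": "boredom",    # Langeweile
    "E": "disgust",    # Ekel
    "A": "fear",       # Angst
    "F": "happiness",  # Freude
    "T": "sadness",    # Trauer
    "N": "neutral",
}

# Speaker number -> (sex, age in years)
SPEAKERS = {
    3: ("male", 31), 8: ("female", 34), 9: ("female", 21), 10: ("male", 32),
    11: ("male", 26), 12: ("male", 30), 13: ("female", 32), 14: ("female", 35),
    15: ("male", 25), 16: ("female", 31),
}


def describe(speaker, letter):
    """Return a readable summary for a speaker number and an emotion letter."""
    sex, age = SPEAKERS[speaker]
    return f"speaker {speaker:02d} ({sex}, {age}): {EMOTION_BY_LETTER[letter]}"


print(describe(3, "F"))  # speaker and emotion of 03a01Fa.wav
```

Keeping the German letters as dictionary keys avoids confusing them with the English initials: `A` means fear (Angst) in a file name, not anger.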


# Constructing the dataset

In [16]:
file_wav = wav_files[0]
path, filename = os.path.split(file_wav)
filename, file_extension = os.path.splitext(filename)

In [28]:
individual_number = int(filename[0:2])  # positions 1-2: speaker number
code_text = filename[2:5]               # positions 3-5: text code
emotion_label = filename[5]             # position 6: emotion letter (German)

In [29]:
emotion_label

'F'
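
The single-file parsing above could be generalized to every path in `wav_files`. The sketch below is one possible way to do it, not a prescribed solution: `parse_emodb_filename` is a hypothetical helper, and the check that a file name has exactly 7 characters is an assumption derived from the naming scheme.

```python
import ntpath
import os


def parse_emodb_filename(path):
    """Split an Emo-DB file name into the fields of the naming scheme."""
    # ntpath.basename handles both '\\' and '/' separators on any platform
    stem, _ = os.path.splitext(ntpath.basename(path))
    if len(stem) != 7:
        # Assumption: every utterance follows the 7-position scheme
        raise ValueError(f"unexpected Emo-DB file name: {path!r}")
    return {
        "speaker": int(stem[0:2]),  # positions 1-2
        "text_code": stem[2:5],     # positions 3-5
        "emotion": stem[5],         # position 6 (German letter)
        "version": stem[6],         # position 7
        "path": path,
    }


# Example with the file listed earlier; with the real data, iterate over wav_files
records = [parse_emodb_filename(p) for p in ["data\\wav\\03a01Fa.wav"]]
print(records[0])
```

The resulting list of dictionaries can be passed to `pd.DataFrame(records)` to obtain one row per utterance.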

# Feature Engineering & Modeling

# Results & Visualizations