# Disease Classification by CNN

Based on the Kaggle notebook [Disease Classification by CNN using MFCC](https://www.kaggle.com/gizemtanriver/disease-classification-by-cnn-using-mfcc), but implemented at the edge and pushed to [GitHub](https://www.github.com).

## TensorFlow version

This notebook works with the latest stable TensorFlow 2 release (**CPU-only**). To use the NVIDIA-compatible build of TensorFlow instead, replace `tensorflow` with `tensorflow-gpu` in `requirements.txt`. A Docker-based version for AMD GPUs is planned for the near future.
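To confirm which TensorFlow distribution actually ended up in the environment, a small check along these lines may help. It is only a sketch: it assumes Python 3.8 or newer (where `importlib.metadata` is part of the standard library; on Python 3.7 the `importlib_metadata` backport offers the same API), and the helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report both distribution names mentioned above.
for name in ("tensorflow", "tensorflow-gpu"):
    print(name, "->", installed_version(name) or "not installed")
```

Running this inside the virtual environment, rather than system-wide, shows what the notebook kernel will actually import.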

In [2]:
# Install required libraries
# Consider using a virtual environment to protect your device's state
!pip install -r ../requirements.txt

Collecting tensorflow
  Downloading tensorflow-2.1.0-cp37-cp37m-manylinux2010_x86_64.whl (421.8 MB)
Collecting librosa
  Downloading librosa-0.7.2.tar.gz (1.6 MB)
Collecting Keras
  Using cached Keras-2.3.1-py2.py3-none-any.whl (377 kB)
Processing /home/carlos/.cache/pip/wheels/46/ef/c3/157e41f5ee1372d1be90b09f74f82b10e391eaacca8f22d33e/sklearn-0.0-py2.py3-none-any.whl
Collecting opt-einsum>=2.3.2
  Downloading opt_einsum-3.2.1-py3-none-any.whl (63 kB)

Collecting pyasn1-modules>=0.2.1
  Downloading pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting cachetools<5.0,>=2.0.0
  Downloading cachetools-4.1.0-py3-none-any.whl (10 kB)
Collecting oauthlib>=3.0.0
  Downloading oauthlib-3.1.0-py2.py3-none-any.whl (147 kB)
Building wheels for collected packages: librosa, audioread, resampy
  Building wheel for librosa (setup.py) ... done
  Created wheel for librosa: filename=librosa-0.7.2-py3-none-any.whl size=1612883 sha256=d89e8deaab876f00e113771378feb32ed3e30c171fd7c38c9b028c4f4cebca77
  Stored in directory: /home/carlos/.cache/pip/wheels/18/9e/42/3224f85730f92fa2925f0b4fb6ef7f9c5431a64dfc77b95b39
  Building wheel for audioread (setup.py) ... done
  Created wheel for audioread: filename=audioread-2.1.8-py3-none-any.whl size=23091 sha256=f4a3d24cacbe8ae567b070145f7

In [3]:
# Load various imports 
from datetime import datetime
from os import listdir
from os.path import isfile, join

import librosa
import librosa.display

import numpy as np
import pandas as pd

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint

from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import matplotlib.pyplot as plt
import seaborn as sns


In [7]:
mypath = "../data/respiratory_sound_database/audio_and_txt_files"
filenames = [f for f in listdir(mypath) if (isfile(join(mypath, f)) and f.endswith('.wav'))] 
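The same filter can also be written with `pathlib`. The sketch below is an alternative, not the notebook's own code: the function name `list_wav_files` and the file names in the demonstration are made up, and the result is sorted so that the file order is reproducible across runs (`listdir` returns entries in arbitrary order).

```python
import tempfile
from pathlib import Path


def list_wav_files(directory):
    """Return the sorted names of regular files ending in '.wav' in `directory`."""
    return sorted(
        p.name for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(".wav")
    )


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("101_example.wav", "101_example.txt", "102_example.wav"):
        (Path(tmp) / name).touch()
    (Path(tmp) / "subdir.wav").mkdir()  # a directory, so it is skipped
    demo_files = list_wav_files(tmp)

print(demo_files)  # ['101_example.wav', '102_example.wav']
```

A stable order matters here because later cells pair each file name with a patient ID by position.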

In [8]:
p_id_in_file = [] # patient IDs corresponding to each file
for name in filenames:
    p_id_in_file.append(int(name[:3]))

p_id_in_file = np.array(p_id_in_file) 
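The cell above relies on every file name starting with a three-digit patient ID. A minimal sketch of that parsing step, using made-up file names and a hypothetical helper `patient_id`, also shows how to count recordings per patient:

```python
from collections import Counter


def patient_id(file_name):
    """Parse the three leading characters of `file_name` as an integer patient ID."""
    return int(file_name[:3])


# Made-up file names that follow the assumed '<3-digit id>...' pattern.
example_names = ["101_a.wav", "101_b.wav", "205_a.wav"]
ids = [patient_id(n) for n in example_names]
recordings_per_patient = Counter(ids)

print(ids)                          # [101, 101, 205]
print(recordings_per_patient[101])  # 2
```

A file name that does not begin with three digits would raise a `ValueError` in `int()`, so stray files in the data directory are worth filtering out beforehand.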

# TODO: Explain what MFCC is

Candidate sources: the [LibROSA Demo](https://nbviewer.jupyter.org/github/librosa/librosa/blob/master/examples/LibROSA%20demo.ipynb)
or
[Read and Visualize Audio Files in Python (librosa module)](https://www.youtube.com/watch?v=vJ_WL9aYfNI).
Use them to develop this part, then list them below as resources.
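As a starting point for that explanation: MFCC stands for Mel-frequency cepstral coefficients, a compact description of a sound's short-term spectrum measured on the mel scale, a frequency scale that approximates how humans perceive pitch. The sketch below covers only the Hz-to-mel mapping, and it assumes the HTK-style formula; librosa's default (`htk=False`) uses the Slaney variant, so its numbers differ. The function names are illustrative.

```python
import math


def hz_to_mel_htk(freq_hz):
    """Convert a frequency in Hz to mels with the HTK-style formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The mapping is roughly linear at low frequencies and logarithmic at high ones.
for f in (500, 1000, 2000, 4000, 8000):
    print(f, "Hz ->", round(hz_to_mel_htk(f), 1), "mel")
```

The remaining MFCC steps (framing, power spectrum, mel filter bank, logarithm, discrete cosine transform) are what `librosa.feature.mfcc` performs in the next cell.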

In [9]:
max_pad_len = 862  # number of time frames every MFCC matrix is padded to

def extract_features(file_name):
    """
    Load the audio file at the path `file_name` (a string) and return its MFCCs,
    zero-padded along the time axis to `max_pad_len` frames.
    Returns None if the file cannot be processed.
    """
    try:
        # Load at most the first 20 seconds; 'kaiser_fast' trades resampling quality for speed
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast', duration=20)
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # Pad the time axis with zeros so that all files yield the same shape
        pad_width = max_pad_len - mfccs.shape[1]
        mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
    except Exception as e:
        print("Error encountered while parsing file: ", file_name, e)
        return None

    return mfccs
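The value 862 follows from the clip length. The arithmetic below is a sketch under assumptions that are not stated in the notebook: that librosa's defaults apply (resampling to 22050 Hz in `librosa.load`, and a hop length of 512 with centered frames in `librosa.feature.mfcc`). The helper `mfcc_frame_count` and the 12.5-second clip are hypothetical.

```python
def mfcc_frame_count(duration_s, sample_rate=22050, hop_length=512):
    """Number of frames for a centered STFT: 1 + n_samples // hop_length."""
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


max_pad_len = 862
full_clip_frames = mfcc_frame_count(20)     # a clip cut at duration=20
short_clip_frames = mfcc_frame_count(12.5)  # hypothetical shorter recording

print(full_clip_frames)                 # 862
print(max_pad_len - short_clip_frames)  # 323 columns of zero padding
```

Because `duration=20` caps every clip, no MFCC matrix should exceed 862 frames under these defaults, so the pad width stays non-negative. Changing the duration, sample rate, or hop length would require recomputing `max_pad_len`.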

**WORK IN PROGRESS** 