# Childhood pneumonia detection 

**Basics of Building a Neural Network with Keras:**
1. **Import required modules**
    - **For general neural network**
        - `from keras import models, layers, optimizers`
    - **For text:**
        - `from keras.preprocessing.text import Tokenizer`
        - `from keras.utils import to_categorical`
    - **For images:**
        - `from keras.preprocessing.image import ImageDataGenerator, img_to_array, array_to_img`
    - **For relocating image files:**
        - `import os, shutil`
        
2. **Decide on a network architecture (only `Sequential` has been discussed thus far)**
    - `model = models.Sequential()`

3. **Add layers, specifying the layer type, number of neurons, activation function, and (for the first layer) the input shape.**
    - `model.add(layers.Dense(units, activation='relu', input_shape=input_shape))`
    - `model.add(layers.Dense(units, activation='relu'))`
    - **3B. Final layer choice:**
        - For classification, the final layer usually has as many neurons as classes you are trying to predict; binary classification with a sigmoid output needs only a single neuron.
        - Final activation function:
            - For binary classification, use `activation='sigmoid'`
            - For multi-class classification, use `activation='softmax'`
        - For regression tasks, have a single final neuron.
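
The layer choices above can be sketched as follows. This is a minimal illustration, not the model used later in this notebook: the sizes `n_features` and `n_classes`, the helper `build_model`, and the `rmsprop` optimizer with cross-entropy losses are all assumptions chosen for the example. Keras also requires a `model.compile(...)` call before training, which is included here even though the list above does not spell it out.

```python
from keras import models, layers

# Hypothetical sizes, chosen only for illustration
n_features = 64   # length of each flattened input vector
n_classes = 3     # number of classes in the multi-class variant

def build_model(n_outputs, final_activation):
    """Small dense network whose last layer depends on the task."""
    model = models.Sequential()
    # First layer declares the input shape; later layers infer theirs
    model.add(layers.Dense(32, activation='relu', input_shape=(n_features,)))
    model.add(layers.Dense(16, activation='relu'))
    # Final layer: neuron count and activation follow the prediction task
    model.add(layers.Dense(n_outputs, activation=final_activation))
    return model

# Binary classification: one sigmoid neuron
binary_model = build_model(1, 'sigmoid')
binary_model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                     metrics=['accuracy'])

# Multi-class classification: one softmax neuron per class
multi_model = build_model(n_classes, 'softmax')
multi_model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
                    metrics=['accuracy'])
```

Recent Keras releases may warn that they prefer an explicit `Input` layer over `input_shape`; the argument is kept here to match the notes above.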

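5. **Training the model**
    - `model.fit(X_train, y_train, epochs=20, batch_size=512, validation_data=(X_val, y_val))`
        - Note: if using images with `ImageDataGenerator`, use `model.fit_generator()` (in recent TensorFlow/Keras releases, `model.fit()` accepts generators directly and `fit_generator()` is deprecated).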
    
    - **batches:**
        - A batch is a set of N samples that are processed independently, in parallel.
        - The batch size determines how many samples are fed through the network before back-propagation.
        - The model only updates its weights after a batch is complete.
        - Ideally, use as large a batch as your hardware can handle without running out of memory.
            - Larger batches usually run faster than smaller ones for evaluation/prediction.
    - **epoch:**
        - An arbitrary cutoff, generally defined as "one pass over the entire dataset"; useful for logging and periodic evaluation.
        - When using Keras's `model.fit` parameters `validation_data` or `validation_split`, these evaluations run at the end of every epoch.
        - Keras callbacks can be set to run at the end of an epoch. Examples include learning rate changes and model checkpointing (saving).
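
To make the relationship between batches, epochs, and weight updates concrete, here is a small arithmetic sketch. The sample count is a hypothetical number, not the size of the pneumonia dataset, and the calculation assumes the final, partially filled batch is kept rather than dropped.

```python
import math

# Hypothetical numbers for illustration only
n_samples = 10_000   # training samples
batch_size = 512     # samples fed through before each weight update
epochs = 20          # full passes over the training data

# One weight update per batch, so updates per epoch = number of batches
updates_per_epoch = math.ceil(n_samples / batch_size)
# The last batch holds whatever is left after the full batches
last_batch_size = n_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 20
print(last_batch_size)    # 272
print(total_updates)      # 400
```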
        

6. **Evaluation / Predictions**
    - To get predicted results:
        - `y_hat_test = model.predict(test)`
    - To get evaluation metrics:
        - `results_test = model.evaluate(test, label_test)`
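
With a sigmoid or softmax final layer, `model.predict` returns probabilities rather than class labels. The sketch below shows one common way to convert them. The arrays are made-up stand-ins for real predictions, and the 0.5 threshold is an assumption that may need tuning for an imbalanced problem.

```python
import numpy as np

# Made-up outputs standing in for model.predict(test)
sigmoid_probs = np.array([[0.91], [0.08], [0.55]])   # shape (n_samples, 1)
softmax_probs = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.3, 0.6]])          # shape (n_samples, n_classes)

# Binary case: threshold the single probability (0.5 is an assumed cutoff)
binary_labels = (sigmoid_probs > 0.5).astype(int).ravel()

# Multi-class case: pick the class with the highest probability
multi_labels = softmax_probs.argmax(axis=1)

print(binary_labels)  # [1 0 1]
print(multi_labels)   # [0 2]
```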
        
        
7. **Visualization**
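    - **`history = model.fit()` returns a `History` object with a `.history` attribute.**
        - `history.history` is a dictionary of metrics from each epoch.
            - e.g. `history.history['loss']` and `history.history['acc']` (depending on the Keras version and the metric name passed to `compile`, the accuracy key may instead be `'accuracy'`)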
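
One possible way to plot that dictionary is sketched below. The helper `plot_history` is a hypothetical name, and `example_history` contains made-up values standing in for `history.history`. The function plots whichever keys are present, so it works whether the accuracy key is `'acc'` or `'accuracy'`.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict):
    """Plot each training metric, overlaying its 'val_' counterpart if present."""
    # Training metrics are the keys without the 'val_' prefix
    metrics = [k for k in history_dict if not k.startswith('val_')]
    fig, axes = plt.subplots(1, len(metrics),
                             figsize=(5 * len(metrics), 4), squeeze=False)
    for ax, metric in zip(axes[0], metrics):
        ax.plot(history_dict[metric], label='train')
        if 'val_' + metric in history_dict:
            ax.plot(history_dict['val_' + metric], label='validation')
        ax.set_xlabel('epoch')
        ax.set_ylabel(metric)
        ax.legend()
    return fig

# Made-up values standing in for history.history
example_history = {'loss': [0.9, 0.6, 0.4], 'val_loss': [1.0, 0.8, 0.7],
                   'acc': [0.55, 0.70, 0.82], 'val_acc': [0.50, 0.62, 0.66]}
fig = plot_history(example_history)
```

A validation curve that flattens or worsens while the training curve keeps improving is the usual visual sign of overfitting.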
            
### 💡 Data Augmentation (not covered in class)
- The simplest way to reduce overfitting is to increase the size of the training data.
- This is difficult to do with large datasets, but it can be implemented with images as shown below:
- **For augmenting image data:**
    - You can alter the images already present in the training data by shifting, shearing, scaling, or rotating them.<br><br> <img src ="https://www.dropbox.com/s/9i1hl3quwo294jr/data_augmentation_example.png?raw=1" width=300>
    - This usually provides a big leap in accuracy and is widely treated as a standard technique for improving predictions.
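
To see what these transformations do to the underlying pixel array, here is a rough NumPy-only sketch on a tiny made-up image. It is not how Keras implements augmentation; in particular, `np.roll` wraps pixels around the edge, whereas image augmenters typically fill the vacated pixels instead.

```python
import numpy as np

# A tiny made-up 3x4 grayscale "image"
image = np.arange(12).reshape(3, 4)

flipped = np.fliplr(image)                  # horizontal (left-right) flip
shifted = np.roll(image, shift=1, axis=1)   # shift one pixel right, wrapping around
rotated = np.rot90(image)                   # 90-degree rotation, shape becomes (4, 3)

# Each variant keeps the original label, so one labelled image yields several samples
augmented = [image, flipped, shifted]

print(flipped[0])     # [3 2 1 0]
print(shifted[0])     # [3 0 1 2]
print(rotated.shape)  # (4, 3)
```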

- **In Keras:**
    - `ImageDataGenerator` provides several built-in augmentations.
    - Example below:
    
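```python
from keras.preprocessing.image import ImageDataGenerator

# Randomly flip images left-to-right during training
datagen = ImageDataGenerator(horizontal_flip=True)
# fit() is only required for feature-wise statistics
# (featurewise_center, featurewise_std_normalization, zca_whitening)
datagen.fit(train)
```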

In [1]:
!pip install -U imageio
!pip install -U scikit-image

!pip install pillow
!pip install opencv-contrib-python
!pip install -U fsds
!pip install -U tensorflow

!pip install -U keras
!pip install -U pandas 
!pip install -U pandas-profiling

%conda update matplotlib
%conda update scikit-learn
!pip install -U matplotlib 


Collecting imageio
  Using cached imageio-2.8.0-py3-none-any.whl (3.3 MB)
Installing collected packages: imageio
  Attempting uninstall: imageio
    Found existing installation: imageio 2.4.1


ERROR: Cannot uninstall 'imageio'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.


Requirement already up-to-date: scikit-image in c:\users\zachmih\anaconda3\lib\site-packages (0.17.2)
Requirement already up-to-date: fsds in c:\users\zachmih\anaconda3\lib\site-packages (0.2.10)


Collecting tensorflow
  Using cached tensorflow-2.2.0-cp37-cp37m-win_amd64.whl (459.2 MB)
Collecting astunparse==1.6.3
  Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting tensorflow-estimator<2.3.0,>=2.2.0
  Using cached tensorflow_estimator-2.2.0-py2.py3-none-any.whl (454 kB)
Processing c:\users\zachmih\appdata\local\pip\cache\wheels\b1\c2\ed\d62208260edbd3fa7156545c00ef966f45f2063d0a84f8208a\wrapt-1.12.1-cp37-none-any.whl
Collecting h5py<2.11.0,>=2.10.0
  Using cached h5py-2.10.0-cp37-cp37m-win_amd64.whl (2.5 MB)
Collecting gast==0.3.3
  Using cached gast-0.3.3-py2.py3-none-any.whl (9.7 kB)
Collecting tensorboard<2.3.0,>=2.2.0
  Using cached tensorboard-2.2.2-py3-none-any.whl (3.0 MB)
Collecting keras-preprocessing>=1.1.0
  Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting scipy==1.4.1; python_version >= "3"
  Using cached scipy-1.4.1-cp37-cp37m-win_amd64.whl (30.9 MB)
Collecting tensorboard-plugin-wit>=1.6.0
  Using cached tensorboard

ERROR: tensorboard 2.2.2 has requirement grpcio>=1.24.3, but you'll have grpcio 1.20.1 which is incompatible.
ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.


Collecting keras
  Downloading Keras-2.3.1-py2.py3-none-any.whl (377 kB)
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: Keras 2.2.4
    Uninstalling Keras-2.2.4:
      Successfully uninstalled Keras-2.2.4
Successfully installed keras-2.3.1
Collecting pandas
  Downloading pandas-1.0.4-cp37-cp37m-win_amd64.whl (8.7 MB)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.0.3
    Uninstalling pandas-1.0.3:
      Successfully uninstalled pandas-1.0.3
Successfully installed pandas-1.0.4
Collecting pandas-profiling
  Downloading pandas_profiling-2.8.0-py2.py3-none-any.whl (259 kB)
Collecting astropy>=4.0
  Downloading astropy-4.0.1.post1-cp37-cp37m-win_amd64.whl (6.1 MB)
Collecting phik>=0.9.10
  Downloading phik-0.10.0-py3-none-any.whl (599 kB)
Collecting tqdm>=4.43.0
  Downloading tqdm-4.46.1-py2.py3-none-any.whl (63 kB)
Collecting ipywidgets>=7.5.1
  Downloading ipywidgets-7.5.1-

ERROR: twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
UsageError: Line magic function `%conda` not found.


In [2]:
!pip install -U fsds_100719
from fsds_100719.imports import *

fsds_1007219  v0.7.22 loaded.  Read the docs: https://fsds.readthedocs.io/en/latest/ 


Handle,Package,Description
dp,IPython.display,Display modules with helpful display and clearing commands.
fs,fsds_100719,Custom data science bootcamp student package
mpl,matplotlib,Matplotlib's base OOP module with formatting artists
plt,matplotlib.pyplot,Matplotlib's matlab-like plotting module
np,numpy,scientific computing with Python
pd,pandas,High performance data structures and tools
sns,seaborn,High-level data visualization library based on matplotlib


[i] Pandas .iplot() method activated.


In [3]:
from PIL import Image
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.utils import to_categorical
from imageio import imread

from skimage.transform import resize
import cv2
from tqdm import tqdm

Using TensorFlow backend.

Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.





In [4]:
#import os,glob
#print(os.path.abspath(os.curdir))

C:\Users\zachmih\Documents\2020 FlatIron School\Proj4-image-classification\pneumonia-detection


In [6]:
# change dataset_folder to match where you stored the files
dataset_folder = "chest_xray"


In [7]:
base_folder = dataset_folder
base_folder

'chest_xray'

In [None]:
import shutil, glob
# ## DOG VS CAT
# base_folder has no trailing separator, so add one before each subfolder name
train_base_dir = base_folder + '/training_set/'
test_base_dir = base_folder + '/test_set/'

train_dogs = train_base_dir + 'dogs/'
train_cats = train_base_dir + 'cats/'

test_dogs = test_base_dir + 'dogs/'
test_cats = test_base_dir + 'cats/'

# Collect the .jpg paths for each class and split
dog_train_files = glob.glob(train_dogs + '*.jpg')
cat_train_files = glob.glob(train_cats + '*.jpg')
all_train_files = [*dog_train_files, *cat_train_files]

dog_test_files = glob.glob(test_dogs + '*.jpg')
cat_test_files = glob.glob(test_cats + '*.jpg')
all_test_files = [*dog_test_files, *cat_test_files]

# print(len(img_filenames))
# img_filenames[:10]

all_filename_vars = [dog_train_files, cat_train_files,
                     dog_test_files, cat_test_files]