<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Major Neural Network Architectures Challenge
## *Data Science Unit 4 Sprint 3 Challenge*

In this sprint challenge, you'll explore some of the cutting edge of Data Science. This week we studied several famous neural network architectures: 
recurrent neural networks (RNNs), long short-term memory (LSTMs), convolutional neural networks (CNNs), and Autoencoders. In this sprint challenge, you will revisit these models. Remember, we are testing your knowledge of these architectures not your ability to fit a model with high accuracy. 

__*Caution:*__  these approaches can be pretty heavy computationally. All problems were designed so that you should be able to achieve results within at most 5-10 minutes of runtime on SageMaker, Colab or a comparable environment. If something is running longer, doublecheck your approach!

## Challenge Objectives
*You should be able to:*
* <a href="#p1">Part 1</a>: Train a LSTM classification model
* <a href="#p2">Part 2</a>: Utilize a pre-trained CNN for objective detection
* <a href="#p3">Part 3</a>: Describe the components of an autoencoder
* <a href="#p4">Part 4</a>: Describe yourself as a Data Science and elucidate your vision of AI

<a id="p1"></a>
## Part 1 - RNNs

Use an RNN/LSTM to fit a multi-class classification model on reuters news articles to distinguish topics of articles. The data is already encoded properly for use in an RNN model. 

Your Tasks: 
- Use Keras to fit a predictive model, classifying news articles into topics. 
- Report your overall score and accuracy

For reference, the [Keras IMDB sentiment classification example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py) will be useful, as well the RNN code we used in class.

__*Note:*__  Focus on getting a running model, not on maxing accuracy with extreme data size or epoch numbers. Only revisit and push accuracy if you get everything else done!

In [1]:
from tensorflow.keras.datasets import reuters

(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=723812,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz


In [2]:
# Demo of encoding

word_index = reuters.get_word_index(path="reuters_word_index.json")

print(f"Iran is encoded as {word_index['iran']} in the data")
print(f"London is encoded as {word_index['london']} in the data")
print("Words are encoded as numbers in our dataset.")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters_word_index.json
Iran is encoded as 779 in the data
London is encoded as 544 in the data
Words are encoded as numbers in our dataset.


In [8]:
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM

batch_size = 46
max_features = len(word_index.values())
maxlen = 200

print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)


print('Build model...')
model = Sequential()
model.add(Embedding(max_features+1, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(max_features, activation='sigmoid'))


8982 train sequences
2246 test sequences
Pad sequences (samples x time)
X_train shape: (8982, 200)
X_test shape: (2246, 200)
Build model...


In [9]:
# You should only run this cell once your model has been properly configured

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=3,
          validation_data=(X_test, y_test))

score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Train...
Train on 8982 samples, validate on 2246 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Test score: 2.391159341681354
Test accuracy: 0.3664292


## Sequence Data Question
#### *Describe the `pad_sequences` method used on the training dataset. What does it do? Why do you need it?*

The pad_sequences method is a keras preprocessing method that makes sure all of the sequences in a dataset are the same length. It lengthens shorter sequences, and can set a max length on longer sequences. This is done to ensure consistency.

## RNNs versus LSTMs
#### *What are the primary motivations behind using Long-ShortTerm Memory Cell unit over traditional Recurrent Neural Networks?*

RNNs aren't great at remembering long-term dependencies, so LSTMs add memory gates so they can use earlier weights.


## RNN / LSTM Use Cases
#### *Name and Describe 3 Use Cases of LSTMs or RNNs and why they are suited to that use case*

LSTMs can be used for Time Series data, as well as text that is treated like a time series. They are well-suited to this because they remember weights from earlier times (so they can find patterns in year-long data). This allows them to create/remember context! LTSMs can then be used to generate text!

<a id="p2"></a>
## Part 2- CNNs

### Find the Frog

Time to play "find the frog!" Use Keras and ResNet50 (pre-trained) to detect which of the following images contain frogs:

<img align="left" src="https://d3i6fh83elv35t.cloudfront.net/newshour/app/uploads/2017/03/GettyImages-654745934-1024x687.jpg" width=400>


In [10]:
!pip install google_images_download

Collecting google_images_download
  Downloading google_images_download-2.8.0.tar.gz (14 kB)
Building wheels for collected packages: google-images-download
  Building wheel for google-images-download (setup.py): started
  Building wheel for google-images-download (setup.py): finished with status 'done'
  Created wheel for google-images-download: filename=google_images_download-2.8.0-py2.py3-none-any.whl size=14555 sha256=6f29882ad50bd3f85838f478fc3c4740b1babe2611b8125a19ec080ce4d27306
  Stored in directory: c:\users\jwill\appdata\local\pip\cache\wheels\e3\98\42\0d3a76d46cd5a6659afb2f5612d4908ca42d34060973d46727
Successfully built google-images-download
Installing collected packages: google-images-download
Successfully installed google-images-download-2.8.0


In [19]:
from google_images_download import google_images_download


response = google_images_download.googleimagesdownload()
arguments = {"keywords": "frog", "limit": 5, "print_urls": True}
absolute_image_paths = response.download(arguments)



Item no.: 1 --> Item name = frog
Evaluating...
Starting Download...


Unfortunately all 5 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0



At time of writing at least a few do, but since the Internet changes - it is possible your 5 won't. You can easily verify yourself, and (once you have working code) increase the number of images you pull to be more sure of getting a frog. Your goal is to validly run ResNet50 on the input images - don't worry about tuning or improving the model.

*Hint* - ResNet 50 doesn't just return "frog". The three labels it has for frogs are: `bullfrog, tree frog, tailed frog`

*Stretch goals* 
- Check for fish or other labels
- Create a matplotlib visualizations of the images and your prediction as the visualization label

In [1]:
# You've got something to do in this cell. ;)
import glob
import numpy as np

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

def process_img_path(img_path):
    return image.load_img(img_path, target_size=(224, 224))

def img_contains_frog(img):
    """ Scans image for Frogs
    
    Should return a boolean (True/False) if a frog is in the image.
    
    Inputs:
    ---------
    img:  Precrossed image ready for prediction. The `process_img_path`             function should already be applied to the image. 
    
    Returns: 
    ---------
    frogs (boolean):  TRUE or FALSE - There are frogs in the image.
    
    """
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    model = ResNet50(weights='imagenet')
    features = model.predict(x)
    results = decode_predictions(features, top=3)[0]
    print(results)
    if 'frog' in results[0][1] and results[0][2] > 0.25:
        return True
    else:
        return False
            


In [3]:
# Added an image of an otter just to make sure it's working

imagelist = glob.glob('downloads/frog/*.jpg') + glob.glob('downloads/frog/*.png')
for x in imagelist:
    print(img_contains_frog(process_img_path(x)))

[('n01644373', 'tree_frog', 0.93838465), ('n02259212', 'leafhopper', 0.01597091), ('n01693334', 'green_lizard', 0.014538255)]
True
[('n01644900', 'tailed_frog', 0.32799807), ('n01641577', 'bullfrog', 0.22553708), ('n01644373', 'tree_frog', 0.16573411)]
True
[('n02077923', 'sea_lion', 0.9197687), ('n02442845', 'mink', 0.021568758), ('n02444819', 'otter', 0.016232025)]
False
[('n01644373', 'tree_frog', 0.4067175), ('n01693334', 'green_lizard', 0.16490583), ('n01682714', 'American_chameleon', 0.11815245)]
True
[('n01644373', 'tree_frog', 0.9423204), ('n01644900', 'tailed_frog', 0.056342434), ('n01641577', 'bullfrog', 0.00067995396)]
True
[('n01644373', 'tree_frog', 0.992164), ('n02169497', 'leaf_beetle', 0.0034639419), ('n01644900', 'tailed_frog', 0.0028726757)]
True


#### Stretch Goal: Displaying Predictions

In [1]:
import matplotlib.pyplot as plt



<a id="p3"></a>
## Part 3 - Autoencoders

Describe a use case for an autoencoder given that an autoencoder tries to predict its own input. 

__*Your Answer:*__  
Autoencoders can be used for denoising images, to make the images smoother, or to enhance a certain part of the image beyond what has previously been capable.

<a id="p4"></a>
## Part 4 - More...

Answer the following questions, with a target audience of a fellow Data Scientist:

**- What do you consider your strongest area, as a Data Scientist?**
Natural Language Processing, given that I've actually done a few projects with it. I still have a BUNCH more to learn though!


**- What area of Data Science would you most like to learn more about, and why?**
NLP again. I believe language isn't going to be going away anytime soon, and in an era of misinformation, being able to both communicate effectively and detect misinformation is essential.


**- Where do you think Data Science will be in 5 years?**
All I can predict is that the techniques we have will be better, we'll be able to train NNs faster/easier. Better computer vision recognition of things will probably happen.


**- What are the threats posed by AI to our society?**
Unemployment, run-away biases, privacy


**- How do you think we can counteract those threats?**
For unemployment: UBI, taxes, regulation!
For bias: Creating proper models and making sure datasets are as unbiased as possible
For privacy: I... don't know that we can counteract it, really.


**- Do you think achieving General Artifical Intelligence is ever possible?**
Yes, but... not for a long time, due to processing power and the complexity of a solution



A few sentences per answer is fine - only elaborate if time allows.

## Congratulations! 

Thank you for your hard work, and congratulations! You've learned a lot, and you should proudly call yourself a Data Scientist.


In [1]:
from IPython.display import HTML

HTML("""<iframe src="https://giphy.com/embed/26xivLqkv86uJzqWk" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/mumm-champagne-saber-26xivLqkv86uJzqWk">via GIPHY</a></p>""")