### Build a DNN using Keras with `ReLU` and `Adam`

#### Load tensorflow

In [0]:
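import tensorflow as tf
# TensorFlow 1.x calls; in TensorFlow 2.x the equivalents are
# tf.compat.v1.reset_default_graph() and tf.random.set_seed(42)
tf.reset_default_graph()   # start from an empty default graph
tf.set_random_seed(42)     # fix the graph-level seed for reproducibility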

#### Collect Fashion-MNIST data from tf.keras.datasets

In [2]:
(trainX, trainY),(testX, testY) = tf.keras.datasets.fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


#### Change train and test labels into one-hot vectors

In [0]:
trainY = tf.keras.utils.to_categorical(trainY, num_classes=10)
testY = tf.keras.utils.to_categorical(testY, num_classes=10)
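To make the one-hot step concrete, here is a minimal plain-Python sketch of the idea behind `to_categorical`. The helper `to_one_hot` and the three sample labels are hypothetical and only illustrate the encoding; the real Keras function returns a NumPy array.

```python
def to_one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

# Hypothetical mini-batch of three class ids in the range 0..9
sample_labels = [9, 0, 3]
encoded = to_one_hot(sample_labels, num_classes=10)
for label, row in zip(sample_labels, encoded):
    print(label, row)
```

Each row sums to 1, which is the target format that `categorical_crossentropy` expects.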

#### Build the Graph

#### Initialize model, reshape & normalize data

In [4]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Reshape((784,),input_shape=(28,28,)))
model.add(tf.keras.layers.BatchNormalization())
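The two layers above can be pictured with a small plain-Python sketch: flatten each image into a vector, then standardize every feature over the batch. This is a simplification under stated assumptions: it shows training-time behaviour only, leaves out the learned scale and offset that `BatchNormalization` also applies, and uses hypothetical 2x2 "images" in place of the 28x28 inputs.

```python
import math

def flatten(image):
    """Turn a 2-D image (list of rows) into a flat list of pixels."""
    return [pixel for row in image for pixel in row]

def batch_normalize(batch, epsilon=1e-3):
    """Standardize each feature (column) using the batch mean and variance."""
    n = len(batch)
    num_features = len(batch[0])
    normalized = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            normalized[i][j] = (batch[i][j] - mean) / math.sqrt(var + epsilon)
    return normalized

# Two hypothetical 2x2 images standing in for 28x28 Fashion-MNIST inputs
images = [[[0, 255], [128, 64]],
          [[255, 0], [64, 128]]]
flat = [flatten(img) for img in images]   # analogous to Reshape((784,))
normed = batch_normalize(flat)
print(flat)
print(normed)
```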


#### Add two fully connected layers with 200 and 100 neurons respectively with `relu` activations. Add a dropout layer with `p=0.25`

In [5]:
#Hidden layers
model.add(tf.keras.layers.Dense(200, activation='relu'))
model.add(tf.keras.layers.Dense(100, activation='relu'))

#Dropout layer
model.add(tf.keras.layers.Dropout(0.25))
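As a rough illustration of what these layers do to their inputs, the sketch below applies ReLU and then "inverted" dropout with rate 0.25 to a hypothetical vector of pre-activations. The function names and sample values are assumptions for illustration. Dropout is only active during training; at inference the layer passes values through unchanged.

```python
import random

def relu(values):
    """ReLU keeps positive values and replaces negatives with zero."""
    return [max(0.0, v) for v in values]

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`,
    and scale the survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(42)
pre_activations = [-1.5, 0.3, 2.0, -0.2, 0.9, 1.1, -3.0, 0.5]  # hypothetical
hidden = relu(pre_activations)
dropped = dropout(hidden, rate=0.25, rng=rng)
print(hidden)
print(dropped)
```

Scaling the surviving units keeps the expected activation the same with and without dropout, so no rescaling is needed at inference time.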


#### Add the output layer: a fully connected layer with 10 neurons and `softmax` activation. Use `categorical_crossentropy` loss and the `adam` optimizer, train the network, and report the final validation accuracy.

In [6]:
#Output layer
model.add(tf.keras.layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

#Train the model
model.fit(trainX, trainY,
          validation_data=(testX, testY),
          epochs=5, batch_size=32)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f6f0f5cb400>
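For reference, the output activation and loss used above can be written out in a few lines of plain Python. This is a minimal sketch for a single example with three hypothetical class scores, not the Keras implementation; the `epsilon` clipping is an assumption added to avoid `log(0)`.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities, epsilon=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, epsilon))
                for t, p in zip(one_hot_target, probabilities))

logits = [2.0, 1.0, 0.1]      # hypothetical scores for three classes
target = [1.0, 0.0, 0.0]      # one-hot label: class 0 is correct
probs = softmax(logits)
loss = categorical_crossentropy(target, probs)
print(probs)
print(loss)
```

The loss shrinks toward zero as the probability of the correct class approaches 1.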

## Word Embeddings in Python with Gensim

In this exercise, you will practice training and loading word embedding models for natural language processing applications in Python using Gensim. It covers:


1. How to train your own word2vec word embedding model on text data.
2. How to visualize a trained word embedding model using Principal Component Analysis.
3. How to load pre-trained word2vec word embedding models.

### Run the two commands below to install gensim and the wikipedia package

In [7]:
!pip install --upgrade gensim --user

Collecting gensim
  Downloading https://files.pythonhosted.org/packages/d3/4b/19eecdf07d614665fa889857dc56ac965631c7bd816c3476d2f0cac6ea3b/gensim-3.7.3-cp36-cp36m-manylinux1_x86_64.whl (24.2MB)
Installing collected packages: gensim
Successfully installed gensim-3.7.3


In [8]:
!pip install wikipedia 

Collecting wikipedia
  Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


### Import gensim

In [0]:
import gensim

### Obtain Text

Import the `search` and `page` functions from the `wikipedia` module.

`search(keyword)`: takes a keyword as its argument and returns the titles of the top 10 articles matching that keyword.

`page(title)`: takes an article title as its argument and returns a page object whose `content` attribute holds the article text.

In [0]:
## Usage:

## from wikipedia import search, page
## titles = search("<Key word goes here>")
## wikipage = page(titles[0])
## print(wikipage.content)

### Print the top 10 titles for the keyword `Machine Learning`

In [0]:
from wikipedia import search, page
titles = search('machine learning')
print(titles)

### Get the content of the first of the 10 titles obtained above

In [0]:
wikipage = page(titles[0])

In [13]:
wikipage.content

'Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and fo

### Create a list named `documents` and append the list of words from each of the 10 pages, using the 10 titles above

In [14]:
#List to hold the words of each page
documents = []

#Iterate over each title and split the page content on spaces
for title in titles:
    wikipage = page(title)
    documents.append(wikipage.content.split(' '))

print(len(documents))
# print(documents[0])

10
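Splitting on single spaces keeps punctuation, newlines, and case inside the tokens, so strings such as `(ML)` end up in the vocabulary as words of their own. One possible alternative, shown here only as a sketch, is a small regex tokenizer. The `tokenize` helper and the sample string are hypothetical, and the pattern is one choice among many.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Hypothetical snippet shaped like the page content used in this notebook
sample = "Machine learning (ML) is fun.\n=== History ===\nMachine learning grew."
print(sample.split(' '))   # what the cell above produces
print(tokenize(sample))    # cleaner tokens
```

Note that lowercasing changes later lookups: a word stored as `SVM` by the original split would become `svm` with this tokenizer.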


### Build the gensim word2vec model, keeping all words with frequency >= 1 and using an embedding size of 50

In [0]:
#Build the model (gensim 3.x argument names; gensim 4.x renames
#`size` to `vector_size` and `iter` to `epochs`)
model = gensim.models.Word2Vec(documents,    #One token list per page
                               min_count=1,  #Ignore all words with total frequency lower than this
                               workers=4,    #Number of worker threads
                               size=50,      #Embedding size
                               window=5,     #Maximum distance between current and predicted word
                               iter=10       #Number of iterations (epochs) over the text corpus
                              )  
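To see what the `window` parameter means, the sketch below lists, for each word in a short hypothetical sentence, the neighbours that fall within the window. It illustrates the concept only; gensim's actual training loop differs in its details, and the helper name `context_windows` is an assumption.

```python
def context_windows(tokens, window):
    """Yield (center, context_words) for every position in the token list."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right

sentence = "machine learning algorithms build a mathematical model".split()
pairs = list(context_windows(sentence, window=2))
for center, context in pairs:
    print(center, context)
```

A larger window gives each word more, but less closely related, context words to learn from.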

### Exploring the model

In [16]:
#Shape of the embedding matrix: (vocabulary size, embedding size)
model.wv.vectors.shape


(8430, 50)

#### Inspect the words in the model's vocabulary

In [17]:
# Vocabulary of the model
model.wv.vocab

{'Machine': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25b70>,
 'learning': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25ba8>,
 '(ML)': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25be0>,
 'is': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25c18>,
 'the': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25c50>,
 'scientific': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25c88>,
 'study': <gensim.models.keyedvectors.Vocab at 0x7f6f03c254e0>,
 'of': <gensim.models.keyedvectors.Vocab at 0x7f6f03c255f8>,
 'algorithms': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25cc0>,
 'and': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25cf8>,
 'statistical': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25d30>,
 'models': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25d68>,
 'that': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25da0>,
 'computer': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25dd8>,
 'systems': <gensim.models.keyedvectors.Vocab at 0x7f6f03c25e10>,
 'use': <gensim.models.ke

### Get the embedding for the word `SVM`

In [18]:
model.wv['SVM']

array([-0.11147016,  0.7009398 ,  0.72818124,  1.7822976 ,  0.6411494 ,
        0.3452001 , -0.05995334,  0.8140099 , -0.3797853 ,  0.3166234 ,
       -0.71597356,  0.12336544, -0.06295379,  0.0738177 ,  0.55682886,
       -0.06830621, -0.4623229 ,  0.61136514, -0.5585408 , -0.15185885,
       -0.4807359 , -0.81657344,  0.50333554,  0.7786398 , -0.33683318,
        0.16909143,  0.5213303 , -0.79882103,  0.6659902 ,  1.1696696 ,
        0.08202843, -0.3425487 ,  1.7838373 ,  0.2760251 ,  0.10822981,
        0.9365173 ,  0.2779633 , -0.23974812, -0.7772426 ,  0.20304373,
       -0.03263509, -0.5770012 , -0.90844387,  0.03966242, -0.4126161 ,
       -0.95961905,  0.0488852 , -0.49298295, -1.2846265 ,  0.7754028 ],
      dtype=float32)

### Find the words most similar to `learning`

In [19]:
model.wv.most_similar('learning')


[('machine', 0.9996310472488403),
 ('algorithms', 0.9993427395820618),
 ('field', 0.9992608428001404),
 ('{1}{2}}\\sum', 0.9992415904998779),
 ('usually', 0.9990450143814087),
 ('_{i=1}^{n}\\sum', 0.9989975690841675),
 ('===\nMachine', 0.9989721775054932),
 ('increasing', 0.9988956451416016),
 ('unsupervised', 0.9988889694213867),
 (';', 0.9988709688186646)]
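The scores above are cosine similarities between word vectors. A minimal plain-Python sketch of that ranking follows, using hypothetical 3-dimensional vectors rather than the trained 50-dimensional ones; the helper names are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    scores = [(other, cosine_similarity(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy embeddings
vectors = {
    "learning": [1.0, 2.0, 0.5],
    "machine": [0.9, 2.1, 0.4],
    "ball": [-1.0, 0.2, 3.0],
}
ranked = most_similar("learning", vectors)
print(ranked)
```

When nearly every neighbour scores above 0.99, as in the output above, the vectors are almost collinear. With a corpus of only 10 pages this likely reflects too little training data rather than genuine semantic closeness.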

### Find the word that does not belong among `machine, svm, ball, learning`

In [20]:
model.wv.doesnt_match("machine svm ball learning".split())



'ball'
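The idea behind this kind of odd-one-out query can be sketched as follows: normalize each word vector, average them, and return the word whose vector is least aligned with that average. This is a simplified illustration with hypothetical 3-dimensional vectors, not gensim's own code.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def doesnt_match(words, vectors):
    """Return the word whose unit vector is least similar to the group mean."""
    normed = {w: unit(vectors[w]) for w in words}
    dim = len(normed[words[0]])
    mean = unit([sum(normed[w][k] for w in words) / len(words)
                 for k in range(dim)])
    return min(words, key=lambda w: sum(a * b for a, b in zip(normed[w], mean)))

# Hypothetical toy embeddings
toy = {
    "machine": [1.0, 0.9, 0.1],
    "svm": [0.9, 1.0, 0.0],
    "learning": [1.0, 1.0, 0.2],
    "ball": [-0.2, 0.1, 1.0],
}
odd = doesnt_match(["machine", "svm", "ball", "learning"], toy)
print(odd)
```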

### Save the model with name `word2vec-wiki-10`

In [0]:
model.save('word2vec-wiki-10')

### Load the model `word2vec-wiki-10`

In [0]:
#Load the saved model from disk
model = gensim.models.Word2Vec.load('word2vec-wiki-10')