# Building a neural network in Keras with TensorFlow #

### A walkthrough for the Machine Learning Club

The goal is to demonstrate how to build a simple neural network using Keras, a popular open-source neural network library.

The demonstration uses the SpamAssassin corpus from the Coursera Machine Learning course, which many members of the group have already taken (see week 7). In the Coursera course, we trained a support vector machine (SVM) to classify spam. Here we will use a neural network instead.

In [1]:
import os
import numpy as np
import re
from matplotlib import pyplot as plt
%matplotlib inline

In [2]:
import tensorflow as tf

In [3]:
import keras

Using TensorFlow backend.


Setup steps:
- Create a new conda environment and activate it
- Install Python and the packages listed in requirements.txt
- Run <code>conda install jupyter</code>
- Run <code>conda install nb_conda</code> so that Jupyter can use the environment


#### Warning: Keras and TensorFlow have a lot of dependencies, so installing them all will take a while.

requirements.txt

- python
- tensorflow
- keras
- matplotlib
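
Once the environment is set up, a quick check like the one below can confirm that the packages from requirements.txt are visible to Python before we start. This is only a sketch: `missing_packages` is a helper name made up for this walkthrough, and it only tests whether each package can be found, not whether the installed versions work together.

```python
from importlib.util import find_spec

# Package names taken from requirements.txt above.
required = ["tensorflow", "keras", "matplotlib"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for this walkthrough; find_spec locates a
    top-level package without importing it.
    """
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, check that the right environment is active in Jupyter before reinstalling.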

In [4]:
# Import the library written for Coursera exercise 6 (slightly modified for this demo),
# which provides functions for preprocessing emails, and tell it where the vocab list is saved
import utils
spamAssassinPath = 'C:\\Users\\Jo\\Documents\\coursera\\ml-coursera-python-assignments\\Exercise6\\Data\\'
utils.setVocabListPath(os.path.join(spamAssassinPath, 'vocab.txt'))

In [20]:
# Peek at the first dozen words in the vocab list (every second token, starting from the second)
with open(os.path.join(spamAssassinPath, 'vocab.txt')) as fid:
    vocab_list_contents = fid.read()
vocab_list_contents.split()[1:25:2]

['aa',
 'ab',
 'abil',
 'abl',
 'about',
 'abov',
 'absolut',
 'abus',
 'ac',
 'accept',
 'access',
 'accord']
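
The slice above skips every other token, which suggests that vocab.txt stores an index followed by a word on each line. Assuming that layout, the sketch below shows one way to turn the file contents into a word-to-index lookup. The sample text and the `parse_vocab` helper are made up for illustration and are not part of `utils`.

```python
# Hypothetical sample in the assumed vocab.txt layout: "<index>\t<word>" per line.
sample_vocab_text = "1\taa\n2\tab\n3\tabil\n4\tabl\n5\tabout\n"

def parse_vocab(text):
    """Map each word to the index stored next to it in the file.

    Assumes tokens alternate index, word, index, word, ...
    """
    tokens = text.split()
    # Even positions hold the indices, odd positions hold the words.
    return {word: int(index) for index, word in zip(tokens[0::2], tokens[1::2])}

vocab = parse_vocab(sample_vocab_text)
print(vocab["abil"])                 # index stored for the stem 'abil'
print(sorted(vocab, key=vocab.get))  # words in file order
```

A dictionary makes the word lookup a constant-time operation, which matters once every token of every email has to be checked against the vocabulary.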

In [7]:
# Extract features from a sample email
with open(os.path.join(spamAssassinPath, 'emailSample1.txt')) as fid:
    file_contents = fid.read()
print('----------------')
print('Unprocessed email:')
print('----------------')
print(file_contents)

word_indices  = utils.processEmail(file_contents)
features      = utils.emailFeatures(word_indices)

# Print Stats
print('\nLength of feature vector: %d' % len(features))
print('Number of non-zero entries: %d' % sum(features > 0))

----------------
Unprocessed email:
----------------
> Anyone knows how much it costs to host a web portal ?
>
Well, it depends on how many visitors you're expecting.
This can be anywhere from less than 10 bucks a month to a couple of $100. 
You should checkout http://www.rackspace.com/ or perhaps Amazon EC2 
if youre running something big..

To unsubscribe yourself from this mailing list, send an email to:
groupname-unsubscribe@egroups.com


----------------
Processed email:
----------------
anyon know how much it cost to host a web portal well it depend on how mani visitor your expect thi can be anywher from less than number buck a month to a coupl of dollar number you should checkout httpaddr or perhap amazon ec number if your run someth big to unsubscrib yourself from thi mail list send an email to emailaddr

Length of feature vector: 1899
Number of non-zero entries: 45
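
To make the two steps above less of a black box, here is a rough sketch of what the preprocessing and feature extraction might look like. It is not the code in `utils`: the function names, the regular expressions, the toy vocabulary and the 0-based indexing are all assumptions for illustration, and word stemming (which turns "anyone" into "anyon" in the output above) is left out. Plain Python lists are used to keep the sketch free of dependencies.

```python
import re

def normalize_email(text):
    """Lower-case the text and replace URLs, addresses, numbers and '$' with placeholder words."""
    text = text.lower()
    text = re.sub(r"<[^<>]+>", " ", text)                       # strip HTML tags
    text = re.sub(r"[0-9]+", " number ", text)                  # any number -> 'number'
    text = re.sub(r"(http|https)://[^\s]*", " httpaddr ", text) # URLs -> 'httpaddr'
    text = re.sub(r"[^\s]+@[^\s]+", " emailaddr ", text)        # addresses -> 'emailaddr'
    text = re.sub(r"[$]+", " dollar ", text)                    # '$' -> 'dollar'
    # Split on anything that is not a letter or digit and drop empty tokens.
    return [tok for tok in re.split(r"[^a-z0-9]+", text) if tok]

def email_features(word_indices, n_features):
    """Binary vector: 1 where a vocabulary word occurs in the email, else 0."""
    features = [0] * n_features
    for idx in word_indices:
        features[idx] = 1
    return features

# Toy vocabulary (hypothetical, 0-based indices); the real list has 1899 entries.
toy_vocab = {"send": 0, "dollar": 1, "number": 2, "httpaddr": 3, "emailaddr": 4, "free": 5}

tokens = normalize_email("Send $100 to bob@example.com or visit http://www.rackspace.com/ now")
word_indices = [toy_vocab[tok] for tok in tokens if tok in toy_vocab]
features = email_features(word_indices, len(toy_vocab))

print(tokens)
print(features)
print("Number of non-zero entries:", sum(features))
```

Each email ends up as a fixed-length vector of zeros and ones, whatever its original length, which is why the sample email above maps to 1899 features with only 45 of them set.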
