## Simple Neural Network
> A simple, classic neural network that follows the standard workflow to predict whether a comment is toxic or not.

In [1]:
import pandas as pd
import re
from sklearn.model_selection import train_test_split
import nltk
# The stop-word list must be downloaded once: nltk.download('stopwords')
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer

**Reading and pre-processing the input**

In [2]:
df = pd.read_csv('./Datasets/train_comment_small_50.csv', sep=',')

In [3]:
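def clean_comment(text):
    # Replace HTML tags with a space so neighbouring words stay separated
    text = re.sub(r'<[^<]+?>', ' ', text)
    # Drop escaped double quotes (a backslash followed by a quote)
    text = text.replace('\\"', '') #.replace('\n', '')
    # Drop any remaining plain double quotes
    text = text.replace('"', '')
    return text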

In [4]:
df['cleaned_comment'] = df['comment_text'].apply(clean_comment)
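
To make the cleaning step concrete, here is a minimal sketch that re-declares `clean_comment` so it runs on its own and applies it to a made-up string. The sample text is an assumption for illustration and is not taken from the dataset.

```python
import re

def clean_comment(text):
    # Same three steps as the notebook cell above
    text = re.sub(r'<[^<]+?>', ' ', text)   # HTML tags -> single space
    text = text.replace('\\"', '')          # escaped quotes removed
    text = text.replace('"', '')            # plain quotes removed
    return text

sample = '<b>Hi</b> "there"'  # hypothetical input
cleaned = clean_comment(sample)
print(repr(cleaned))
```

Each tag becomes a space, so the result can contain runs of whitespace. That is harmless here because the default `CountVectorizer` tokenizer only picks out word tokens.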

**Splitting the data and converting the text into a binary sparse matrix using CountVectorizer**

In [5]:
X_train, X_test, y_train, y_test = train_test_split(df['cleaned_comment'], df['toxic'], test_size=0.2)

In [6]:
vectorizer = CountVectorizer(binary=True, stop_words=stopwords.words('english'),
                             lowercase=True, min_df=3, max_df=0.9, max_features=5000)

X_train_onehot = vectorizer.fit_transform(X_train)
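
As a rough illustration of what `binary=True` produces, the sketch below fits a `CountVectorizer` on two made-up sentences. The toy corpus and the `toy_` names are assumptions; the stop-word list and the `min_df`/`max_df`/`max_features` limits from the cell above are left out so the tiny example keeps all its words.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_docs = ["good good movie", "bad movie"]  # hypothetical corpus

# binary=True records presence (1) or absence (0), not the word count
toy_vectorizer = CountVectorizer(binary=True, lowercase=True)
toy_matrix = toy_vectorizer.fit_transform(toy_docs)  # scipy sparse matrix

# Recover the column order from the learned vocabulary mapping
toy_vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(toy_vocab)
print(toy_matrix.toarray())
```

Note that "good" appears twice in the first sentence but is still encoded as 1. The vocabulary is learned from the training split only, and the same fitted vectorizer is later reused with `transform` on the test split.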

**Creating a 2-Layer neural network.**

In [7]:
from keras.models import Sequential, load_model
from keras.layers import Dense

Using TensorFlow backend.


In [8]:
nn = Sequential()
# vocabulary_ holds one entry per feature; get_feature_names() was removed in scikit-learn 1.2
nn.add(Dense(units=500, activation='relu', input_dim=len(vectorizer.vocabulary_)))
nn.add(Dense(units=1, activation='sigmoid'))
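
The two `Dense` layers amount to two matrix multiplications with a ReLU in between and a sigmoid at the end. The NumPy sketch below shows that forward pass under stated assumptions: the weights are random rather than trained, and the feature count of 6 is a hypothetical toy size (only the 500 hidden units come from the cell above).

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    hidden = np.maximum(0.0, x @ w1 + b1)   # Dense(500, activation='relu')
    logits = hidden @ w2 + b2               # Dense(1) before activation
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> value in (0, 1)

rng = np.random.default_rng(0)
n_features, n_hidden = 6, 500               # n_features is a toy assumption

# Four fake documents encoded as binary word-presence vectors
x = rng.integers(0, 2, size=(4, n_features)).astype(float)
w1 = rng.normal(scale=0.01, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.01, size=(n_hidden, 1))
b2 = np.zeros(1)

probs = forward(x, w1, b1, w2, b2)
print(probs.shape)
```

Each row of `probs` is a single number that can be read as the predicted probability of the positive class.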

In [9]:
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
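
For intuition about the `binary_crossentropy` loss, here is a small pure-Python sketch of the formula, averaged over samples. The labels and probabilities are made up, and the clipping constant `eps` is an assumption added to avoid `log(0)`; it is not a statement about Keras internals.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Same labels, confident-and-right versus confident-and-wrong predictions
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss is small when the predicted probability agrees with the label and grows quickly when the model is confidently wrong.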

In [10]:
nn.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 500)               29000     
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 501       
=================================================================
Total params: 29,501
Trainable params: 29,501
Non-trainable params: 0
_________________________________________________________________
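
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input per unit, plus one bias per unit. The sketch below applies that formula; the helper name `dense_params` is hypothetical, and the feature count is inferred from the summary rather than stated anywhere in the notebook.

```python
def dense_params(n_inputs, units):
    # weights (n_inputs * units) + biases (units)
    return (n_inputs + 1) * units

hidden_units = 500
# Work backwards from the 29000 reported for dense_1
n_features = 29000 // hidden_units - 1

first_layer = dense_params(n_features, hidden_units)
second_layer = dense_params(hidden_units, 1)
print(n_features, first_layer, second_layer, first_layer + second_layer)
```

Under this reading, the vectorizer kept 57 features after the `min_df`, `max_df`, and stop-word filters.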


**Fitting the model on the training data and validating on a held-out slice**

In [11]:
nn.fit(X_train_onehot[:-20], y_train[:-20], epochs=10, batch_size=128, verbose=1,
      validation_data=(X_train_onehot[-20:], y_train[-20:]))

Train on 20 samples, validate on 20 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7fb58444e080>
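
The "Train on 20 samples, validate on 20 samples" line follows from the slicing in the `fit` call. The sketch below reproduces the arithmetic with plain lists, assuming the CSV holds 50 rows (suggested by the file name, but not confirmed in the notebook).

```python
n_rows = 50        # assumption based on the file name
test_size = 0.2    # as passed to train_test_split

n_test = round(n_rows * test_size)
n_train_total = n_rows - n_test

rows = list(range(n_train_total))  # stand-in for the training split
fit_rows = rows[:-20]              # everything except the last 20 rows
val_rows = rows[-20:]              # the last 20 rows, used for validation

print(n_test, len(fit_rows), len(val_rows))
```

The validation rows are carved out of the training split, so the test split stays untouched until the evaluation step.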

**Evaluation on test data**

In [12]:
scores = nn.evaluate(vectorizer.transform(X_test), y_test, verbose=1)
print('Accuracy:', scores[1])

Accuracy: 1.0
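
The accuracy metric compares thresholded sigmoid outputs with the true labels. A minimal sketch of that computation follows, assuming the usual 0.5 cutoff; the labels and probabilities are made up for illustration and are not the notebook's predictions.

```python
def accuracy_from_probs(y_true, y_prob, threshold=0.5):
    # Turn probabilities into hard 0/1 predictions, then count matches
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

labels = [0, 0, 1, 0, 1]                 # hypothetical ground truth
probs = [0.1, 0.4, 0.8, 0.6, 0.9]        # hypothetical model outputs
acc = accuracy_from_probs(labels, probs)
print(acc)
```

On a very small test set each sample shifts the score by a large step, so a value of 1.0 is best read with some caution.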


In [13]:
# Save in HDF5 format; recent Keras versions expect the '.h5' (or '.keras') extension
nn.save('nn.h5')