# How to build a text classifier web app
Today we are going to train a model, build a web interface that allows people to submit data, send that data to our model, make a prediction on it, then return the results.

##### This is how your app directory should look.  
```
MyProject/
|-- my_app/
|   |-- the_app.py
|   |-- build_model.py
|   |-- data
|   |   |-- my_data.csv
|   |   |-- my_model.pkl
|   |   |-- my_vectorizer.pkl
```

<br>
## Step 1: Build your model
Step 1 should take about *30min–60min*  

1. Build ANY text classifier model and place it into the **build_model.py** Python file.
2. Pickle and export your trained model and vectorizer into your data folder.
3. *See below if you want a step-by-step guide for this.*

In [77]:
%%writefile mornApp/build_model.py
import pickle

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Train on four of the 20 newsgroups categories.
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)

# Turn the raw text into token counts, then reweight the counts with TF-IDF.
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

# Fit a multinomial naive Bayes classifier on the TF-IDF features.
clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

# Export the fitted vectorizer, transformer, and model; the web app loads all three.
with open("mornApp/data/my_vectorizer.pkl", "wb") as f:
    pickle.dump(count_vect, f)
with open("mornApp/data/my_transformer.pkl", "wb") as f:
    pickle.dump(tfidf_transformer, f)
with open("mornApp/data/my_model.pkl", "wb") as f:
    pickle.dump(clf, f)

# Sanity check: reload the pickles and make sure they can classify new text.
with open("mornApp/data/my_vectorizer.pkl", "rb") as f:
    count_vect2 = pickle.load(f)
with open("mornApp/data/my_transformer.pkl", "rb") as f:
    tfidf_transformer2 = pickle.load(f)
with open("mornApp/data/my_model.pkl", "rb") as f:
    clf2 = pickle.load(f)

docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new_counts = count_vect2.transform(docs_new)
X_new_tfidf = tfidf_transformer2.transform(X_new_counts)

predicted = clf2.predict(X_new_tfidf)

Overwriting mornApp/build_model.py


In [78]:
!python mornApp/build_model.py

<br>
## Step 2:  Build your site
Step 2 should take anywhere from 30min–120min  
1. Create a the_app.py file in your my_app folder.

```
MyProject/
|-- my_app/
|   |-- the_app.py
|   |-- build_model.py
|   |-- data
|   |   |-- my_data.csv
|   |   |-- my_model.pkl
|   |   |-- my_vectorizer.pkl
```

2. Build a simple web homepage using Flask.
3. Once you have set up a working homepage, build a submission_page that has an HTML form for the user to submit new text data.
4. Build a predict_page that processes the user-submitted form data and returns the result of your prediction.
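
The pages described above might be wired together roughly as follows. This is only a sketch: the templates are inlined with `render_template_string` so the example fits in one file, the `/predict` route name is an assumption, and `predict_label` is a hypothetical stand-in for the step that would call your pickled model.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline templates keep the sketch in one file; a real app would normally
# keep them in a templates/ folder and call render_template instead.
FORM_HTML = """
<form action="/predict" method="POST">
    <input type="text" name="user_input" />
    <input type="submit" />
</form>
"""

RESULT_HTML = "<p>{{ doc }} => {{ category }}</p>"


def predict_label(text):
    """Hypothetical stand-in for the pickled model's predict step."""
    return "comp.graphics" if "gpu" in text.lower() else "sci.med"


@app.route("/")
def submission_page():
    # Homepage with the form the user fills in.
    return render_template_string(FORM_HTML)


@app.route("/predict", methods=["POST"])
def predict_page():
    # The key matches the name attribute of the text input in the form.
    text = request.form.get("user_input", "")
    return render_template_string(RESULT_HTML, doc=text, category=predict_label(text))
```

To try it in a browser, add `app.run(debug=True)` under an `if __name__ == '__main__':` guard and run the file with Python. Because the route only accepts POST, opening `/predict` directly returns a 405 instead of failing on missing form data.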


---

### Step 1:  Step by step

1. Use the `articles.csv` file in the data folder to create a text classifier.

In [1]:
%%writefile mornApp/mornApp.py
import pickle

from flask import Flask, redirect, render_template, request, url_for

app = Flask(__name__)

# Load the fitted objects once at startup instead of on every request.
# The paths are relative to the directory the app is launched from.
with open("mornApp/data/my_model.pkl", "rb") as f:
    clf = pickle.load(f)
with open("mornApp/data/my_vectorizer.pkl", "rb") as f:
    count_vect = pickle.load(f)
with open("mornApp/data/my_transformer.pkl", "rb") as f:
    tfidf_transformer = pickle.load(f)

# Label names in the same order as twenty_train.target_names, so the
# integer the model predicts indexes straight into this list.
categories = ['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian']

# Form page to submit text
#============================================
# The form in template.html should look something like this:
# <form action="/word_counter" method='POST' >
#     <input type="text" name="user_input" />
#     <input type="submit" />
# </form>
@app.route('/')
def submission_page():
    return render_template('template.html')

@app.route('/about')
def about_page():
    return render_template('about.html')

# Prediction page (the route is still named word_counter)
#==============================================
# create the page the form goes to
@app.route('/word_counter', methods=['POST', 'GET'])
def word_counter():
    # A plain GET carries no form data, so send the visitor back to the form.
    if request.method == 'GET':
        return redirect(url_for('submission_page'))

    # get data from request form, the key is the name you set in your form
    doc = request.form['user_input']

    # The vectorizer expects a list of documents, so wrap the single string.
    X_new_counts = count_vect.transform([doc])
    X_new_tfidf = tfidf_transformer.transform(X_new_counts)
    predicted = clf.predict(X_new_tfidf)

    # output the category that the text is in
    return render_template('template2.html', doc=doc, category=categories[predicted[0]])

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8087, debug=True)

Overwriting mornApp/mornApp.py


In [None]:
!python mornApp/mornApp.py

 * Running on http://0.0.0.0:8087/ (Press CTRL+C to quit)
 * Restarting with stat
127.0.0.1 - - [27/Jul/2015 22:05:28] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/Jul/2015 22:05:28] "GET /dist/js/bootstrap.min.js HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:28] "GET /assets/js/ie10-viewport-bug-workaround.js HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:28] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:45] "POST /word_counter HTTP/1.1" 200 -
127.0.0.1 - - [27/Jul/2015 22:05:45] "GET /assets/js/ie10-viewport-bug-workaround.js HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:45] "GET /dist/js/bootstrap.min.js HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:53] "POST /word_counter HTTP/1.1" 200 -
127.0.0.1 - - [27/Jul/2015 22:05:53] "GET /dist/js/bootstrap.min.js HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:53] "GET /assets/js/ie10-viewport-bug-workaround.js HTTP/1.1" 404 -
127.0.0.1 - - [27/Jul/2015 22:05:57] "GET /about HTTP/1.1" 200 -
127.0.0.1 - - [27/Jul

## Below is from sklearn

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
pd.set_option('display.max_columns', 999)

In [27]:
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
twenty_train = fetch_20newsgroups(subset='train',categories=categories, shuffle=True, random_state=42)



In [28]:
twenty_train.target_names 

['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian']

In [29]:
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)
X_train_counts.shape

(2257, 35788)

In [30]:
from sklearn.feature_extraction.text import TfidfTransformer
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
X_train_tf.shape

(2257, 35788)

In [31]:
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_train_tfidf.shape

(2257, 35788)

In [32]:
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

In [33]:
docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)

predicted = clf.predict(X_new_tfidf)

for doc, category in zip(docs_new, predicted):
    print('%r => %s' % (doc, twenty_train.target_names[category]))

'God is love' => soc.religion.christian
'OpenGL on the GPU is fast' => comp.graphics


In [35]:
from sklearn.pipeline import Pipeline
text_clf = Pipeline([('vect', CountVectorizer()),
                      ('tfidf', TfidfTransformer()),
                      ('clf', MultinomialNB()),
])
text_clf = text_clf.fit(twenty_train.data, twenty_train.target)
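
Because a pipeline like the one above bundles the vectorizer, the TF-IDF transformer, and the classifier, it can be pickled as a single object, and the web app would then only need to load one file. The sketch below illustrates the idea on a tiny made-up corpus rather than the newsgroup data, so it runs without a download; the file name `my_pipeline.pkl` and the temporary directory are assumptions for illustration.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny made-up corpus standing in for the newsgroup data (illustration only).
toy_docs = [
    "the gpu renders opengl graphics quickly",
    "shaders and textures for 3d graphics",
    "the church service and prayer on sunday",
    "faith, prayer and the bible",
]
toy_labels = [
    "comp.graphics",
    "comp.graphics",
    "soc.religion.christian",
    "soc.religion.christian",
]

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(toy_docs, toy_labels)

# One pickle file holds all three fitted steps.
model_path = Path(tempfile.mkdtemp()) / "my_pipeline.pkl"  # hypothetical file name
with open(model_path, "wb") as f:
    pickle.dump(text_clf, f)

with open(model_path, "rb") as f:
    loaded_clf = pickle.load(f)

# The loaded pipeline takes raw strings; no separate transform calls are needed.
new_docs = ["opengl graphics on the gpu", "prayer and faith"]
predicted = loaded_clf.predict(new_docs)
for doc, label in zip(new_docs, predicted):
    print(f"{doc!r} => {label}")
```

The trade-off is that the pieces can no longer be swapped independently: retraining any step means re-exporting the whole pipeline.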

In [38]:
import pickle
vectorizer_pickle = pickle.dumps(count_vect)
model_pickle = pickle.dumps(clf)
clf2 = pickle.loads(model_pickle)
count_vect2 = pickle.loads(vectorizer_pickle)

docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new_counts = count_vect2.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)

predicted = clf2.predict(X_new_tfidf)

for doc, category in zip(docs_new, predicted):
    print('%r => %s' % (doc, twenty_train.target_names[category]))

'God is love' => soc.religion.christian
'OpenGL on the GPU is fast' => comp.graphics


2. Set your text data column to `X`.

3. Set your label data column to `y`.

4. Initialize a TFIDF vectorizer.

5. With your TFIDF vectorizer, fit and transform your `X` text data. Name the output `vectorized_X`.

6. Initialize your MultinomialNB model: `clf = MultinomialNB()`

7. Fit your model with the `vectorized_X` data and the `y` labels.

8. Export your fitted model using pickle.

9. Export your fitted vectorizer using pickle.

10. Take a break.

---
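
The step-by-step list above can be sketched end to end as follows. Since the contents of the CSV are not shown here, the example builds a small made-up DataFrame in its place; the column names `text` and `label` are assumptions, and a temporary directory stands in for the app's data folder.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up rows standing in for the CSV; with a real file this would be
# pd.read_csv(...). The column names are assumptions.
articles = pd.DataFrame({
    "text": [
        "the gpu renders opengl graphics quickly",
        "shaders and textures for 3d graphics",
        "the church service and prayer on sunday",
        "faith, prayer and the bible",
    ],
    "label": ["graphics", "graphics", "religion", "religion"],
})

# Text column and label column.
X = articles["text"]
y = articles["label"]

# TfidfVectorizer combines token counting and TF-IDF weighting in one object.
vectorizer = TfidfVectorizer()
vectorized_X = vectorizer.fit_transform(X)

# Initialize and fit the classifier.
clf = MultinomialNB()
clf.fit(vectorized_X, y)

# Export the fitted model and vectorizer.
data_dir = Path(tempfile.mkdtemp())  # stand-in for the app's data folder
with open(data_dir / "my_model.pkl", "wb") as f:
    pickle.dump(clf, f)
with open(data_dir / "my_vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)

# Reload both files and classify a new document, as the web app would.
with open(data_dir / "my_model.pkl", "rb") as f:
    loaded_clf = pickle.load(f)
with open(data_dir / "my_vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)

prediction = loaded_clf.predict(loaded_vectorizer.transform(["opengl graphics on the gpu"]))
print(prediction[0])
```

With this route only two pickle files are needed, which matches the `my_model.pkl` and `my_vectorizer.pkl` entries in the directory layout.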