# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from database.
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y

In [1]:
# import libraries
import pickle
import re

import nltk
import numpy as np
import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sqlalchemy import create_engine

nltk.download(['punkt', 'wordnet'])

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [2]:
# load data from database
engine = create_engine('sqlite:///MessagesDB.db')
df = pd.read_sql("SELECT * FROM MessagesTB", engine)

# Feature: the raw message text; targets: the category columns (fifth column onward)
X = df['message']
y = df.iloc[:, 4:].astype('int')

category_names = list(df.columns[4:])
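
The bullet list above mentions `read_sql_table`, while the cell uses `read_sql` with a query. As a minimal sketch of the `read_sql_table` variant, the snippet below builds a throwaway in-memory SQLite database. The toy rows and the `id`, `original`, and `genre` column names are assumptions made for illustration; only `message`, the table name `MessagesTB`, and the "categories start at the fifth column" layout come from the cell above.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical toy data: four metadata columns first, then one column per category.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "message": ["we need water", "roads are flooded", "hello there"],
    "original": ["", "", ""],
    "genre": ["direct", "news", "social"],
    "related": [1, 1, 0],
    "water": [1, 0, 0],
})

engine = create_engine("sqlite://")  # in-memory SQLite database
toy.to_sql("MessagesTB", engine, index=False)

# read_sql_table loads a whole table by name, with no SQL string needed
df_toy = pd.read_sql_table("MessagesTB", engine)
X_toy = df_toy["message"]
y_toy = df_toy.iloc[:, 4:].astype("int")

print(list(y_toy.columns))
```

Either call should return the same DataFrame here; `read_sql_table` requires an SQLAlchemy connectable, which the notebook already creates.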

In [3]:
category_names

['related',
 'request',
 'offer',
 'aid_related',
 'medical_help',
 'medical_products',
 'search_and_rescue',
 'security',
 'military',
 'child_alone',
 'water',
 'food',
 'shelter',
 'clothing',
 'money',
 'missing_people',
 'refugees',
 'death',
 'other_aid',
 'infrastructure_related',
 'transport',
 'buildings',
 'electricity',
 'tools',
 'hospitals',
 'shops',
 'aid_centers',
 'other_infrastructure',
 'weather_related',
 'floods',
 'storm',
 'fire',
 'earthquake',
 'cold',
 'other_weather',
 'direct_report']

### 2. Write a tokenization function to process your text data

In [4]:
def tokenize(text):
    """Split a message into lowercase, lemmatized word tokens."""
    # Remove punctuation: replace anything that is not a letter or digit with a space
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text)

    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()

    clean_tokens = []
    for tok in tokens:
        # Normalize each token: lemmatize, lowercase, and strip surrounding whitespace
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)

    return clean_tokens
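
To see what the punctuation-stripping step does on its own, here is a small sketch that uses only the standard library. The helper name `strip_and_split` is hypothetical, and `str.split` stands in for `word_tokenize`; no lemmatization is applied, so this is not equivalent to `tokenize` above.

```python
import re


def strip_and_split(text):
    """Replace non-alphanumeric characters with spaces, lowercase, and split."""
    cleaned = re.sub(r'[^a-zA-Z0-9]', ' ', text)
    return cleaned.lower().split()


print(strip_and_split("Water needed!! Please send food, ASAP."))
# ['water', 'needed', 'please', 'send', 'food', 'asap']
```

Note that the same regular expression also removes every non-ASCII letter, which may matter for messages that are not in English.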
  

### 3. Build a machine learning pipeline
This machine learning pipeline should take the `message` column as input and output classification results for the 36 category columns in the dataset. You may find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.

In [22]:
pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))
])
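
As a rough illustration of how such a pipeline behaves, the sketch below fits the same three steps on a handful of made-up messages. The toy messages, the two category columns, and the use of the default `CountVectorizer` tokenizer (instead of `tokenize`, to avoid the NLTK downloads) are all assumptions for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data: one row per message, one column per category (water, floods)
toy_messages = [
    "we need water", "please send water", "water is running out",
    "the road is flooded", "flood destroyed the bridge", "hello there",
]
toy_labels = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0]])

toy_pipeline = Pipeline([
    ('vect', CountVectorizer()),    # raw text -> token counts
    ('tfidf', TfidfTransformer()),  # counts -> TF-IDF weights
    ('clf', MultiOutputClassifier(RandomForestClassifier(n_estimators=10, random_state=0))),
])

toy_pipeline.fit(toy_messages, toy_labels)
toy_pred = toy_pipeline.predict(["send water now", "the flood is rising"])

# One row per input message, one column per category
print(toy_pred.shape)  # (2, 2)
```

`MultiOutputClassifier` fits one independent copy of the wrapped estimator per target column, which is why the prediction array has one column per category.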

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline

In [23]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
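
The effect of `test_size` and `random_state` can be checked on synthetic data. This is only a sketch; the toy inputs are placeholders and not part of the dataset.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the messages and their multi-column labels
toy_X = ["message {}".format(i) for i in range(100)]
toy_y = [[i % 2, (i // 2) % 2] for i in range(100)]

X_tr, X_te, y_tr, y_te = train_test_split(toy_X, toy_y, test_size=0.33, random_state=42)
print(len(X_tr), len(X_te))

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, _, _ = train_test_split(toy_X, toy_y, test_size=0.33, random_state=42)
print(X_te == X_te2)  # True
```

Fixing `random_state` keeps the reported scores comparable between runs of the notebook.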

In [24]:
#Train pipeline
pipeline.fit(X_train, y_train)


Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...oob_score=False, random_state=None, verbose=0,
            warm_start=False),
           n_jobs=1))])

In [25]:
y_pred = pipeline.predict(X_test)
y_pred

array([[1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

### 5. Test your model
Report the F1 score, precision, and recall for each output category of the dataset. You can do this by iterating through the columns and calling sklearn's `classification_report` on each.

In [10]:
# Report precision, recall, F1 and accuracy for each output category,
# using y_pred from the pipeline.predict(X_test) call above
for i, category in enumerate(category_names):
    print('Category: {} '.format(category))
    print(classification_report(y_test.iloc[:, i].values, y_pred[:, i]))
    print('Accuracy {}\n\n'.format(accuracy_score(y_test.iloc[:, i].values, y_pred[:, i])))

Category: related 
             precision    recall  f1-score   support

          0       0.61      0.35      0.44      2054
          1       0.81      0.93      0.87      6534
          2       0.67      0.12      0.21        64

avg / total       0.76      0.78      0.76      8652

Accuracy 0.7845584835876098


Category: request 
             precision    recall  f1-score   support

          0       0.88      0.98      0.93      7180
          1       0.82      0.36      0.50      1472

avg / total       0.87      0.88      0.86      8652

Accuracy 0.8776005547850209


Category: offer 
             precision    recall  f1-score   support

          0       1.00      1.00      1.00      8614
          1       0.00      0.00      0.00        38

avg / total       0.99      1.00      0.99      8652

Accuracy 0.9956079519186315


Category: aid_related 
             precision    recall  f1-score   support

          0       0.73      0.88      0.80      5107
          1       0.75     
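
The `f1-score` column in these reports is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values printed for class 1 of `request` and `related`, so small rounding differences are possible in general. The helper name `f1_from` is hypothetical. The zero case mirrors the `offer` report, where precision and recall for class 1 are both 0.00 and the F1 score is reported as 0.00.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_from(0.82, 0.36), 2))  # request, class 1 -> 0.5
print(round(f1_from(0.81, 0.93), 2))  # related, class 1 -> 0.87
print(f1_from(0.0, 0.0))              # offer, class 1 -> 0.0
```

The `request` row shows why accuracy alone can mislead: accuracy is about 0.88, while recall for the positive class is only 0.36.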

In [27]:
target_names = y.columns
labels = np.unique(y_pred)

print(classification_report(y_test, y_pred, target_names=target_names))

ValueError: Unknown label type: (       related  request  offer  aid_related  medical_help  medical_products  \
7917         1        0      0            0             0                 0   
25322        1        0      0            0             0                 0   
22191        1        0      0            1             0                 0   
...        ...      ...    ...          ...           ...               ...   
24094        1        0      0            1             0                 0   

[8652 rows x 36 columns], array([[1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]]))
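
A likely cause of this error is that the `related` column contains a third label, 2 (the per-category report shows 64 such rows in the test set). scikit-learn then treats the target as `multiclass-multioutput`, which `classification_report` does not accept when given the whole label matrix. The sketch below reproduces the diagnosis on toy arrays. Mapping 2 to 1 with `np.clip` is only one possible workaround and assumes that label 2 can be treated as "related", which the data does not confirm; reporting one column at a time, as in the loop earlier, avoids that assumption.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy targets: the first column contains a stray label 2
y_true_toy = np.array([[1, 0], [2, 1], [0, 0], [1, 1]])
y_pred_toy = np.array([[1, 0], [1, 1], [0, 0], [1, 0]])

print(type_of_target(y_true_toy))  # multiclass-multioutput

# ASSUMPTION: treat label 2 as 1 so that every column is binary
y_true_bin = np.clip(y_true_toy, 0, 1)
print(type_of_target(y_true_bin))  # multilabel-indicator

# With binary indicator targets, one report covers all columns
print(classification_report(y_true_bin, y_pred_toy, target_names=["related", "request"]))
```

If the stray label is cleaned up before training, both `y_test` and the model's predictions stay binary and the combined report can be produced directly.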

### 6. Improve your model
Use grid search to find better parameters. 

In [29]:
pipeline.get_params()

# Search over the TF-IDF normalization and the random forest split criterion
parameters = {
    'tfidf__norm': ['l1', 'l2'],
    'clf__estimator__criterion': ['gini', 'entropy'],
}

cv = GridSearchCV(pipeline, param_grid=parameters)
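
Grid-search keys follow the `<step>__<parameter>` convention, with one extra `estimator__` level for parameters of the model wrapped by `MultiOutputClassifier`. The sketch below checks the key names with `get_params()` and runs the same grid on made-up data. The toy messages, labels, `cv=2`, and the small `n_estimators` are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

toy_pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier(n_estimators=5, random_state=0))),
])

# Valid grid keys are exactly the names returned by get_params()
param_names = toy_pipeline.get_params().keys()
print('tfidf__norm' in param_names)                # True
print('clf__estimator__criterion' in param_names)  # True

# Hypothetical toy data with two category columns (water, floods)
toy_messages = [
    "we need water", "the road is flooded", "please send water", "flood in the city",
    "water is running out", "bridge destroyed by flood", "no water here", "flooded streets",
]
toy_labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])

toy_grid = {
    'tfidf__norm': ['l1', 'l2'],
    'clf__estimator__criterion': ['gini', 'entropy'],
}
toy_cv = GridSearchCV(toy_pipeline, param_grid=toy_grid, cv=2)
toy_cv.fit(toy_messages, toy_labels)
print(toy_cv.best_params_)
```

Each grid point is refit once per fold, so this 2 x 2 grid with two folds trains the pipeline eight times plus one final refit; on the full dataset that cost grows quickly with every added parameter.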



### 7. Test your model
Show the accuracy, precision, and recall of the tuned model.

Since this project focuses on code quality, process, and pipelines, there is no minimum performance metric needed to pass. However, make sure to fine-tune your models for accuracy, precision, and recall to make your project stand out, especially for your portfolio!

In [30]:
cv.fit(X_train, y_train)

GridSearchCV(cv=None, error_score='raise',
       estimator=Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...oob_score=False, random_state=None, verbose=0,
            warm_start=False),
           n_jobs=1))]),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'tfidf__norm': ['l1', 'l2'], 'clf__estimator__criterion': ['gini', 'entropy']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [31]:
y_pred = cv.predict(X_test)

In [32]:
y_pred

array([[1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ..., 
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]])

### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF
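
One way to act on both ideas at once is sketched below: a `FeatureUnion` adds a hand-made feature next to TF-IDF, and logistic regression replaces the random forest. The `TextLengthExtractor` class, the toy data, and the choice of `LogisticRegression` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


class TextLengthExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical extra feature: number of characters in each message."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(text)] for text in X], dtype=float)


toy_model = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer()),       # TF-IDF features
        ('length', TextLengthExtractor()),  # one extra numeric column
    ])),
    ('clf', MultiOutputClassifier(LogisticRegression())),
])

# Hypothetical toy data with two category columns (water, floods)
toy_messages = [
    "we need water", "please send water", "water is running out",
    "the road is flooded", "flood destroyed the bridge", "hello there",
]
toy_labels = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0]])

toy_model.fit(toy_messages, toy_labels)
toy_pred = toy_model.predict(["send water now", "the flood is rising"])
print(toy_pred.shape)  # (2, 2)
```

Because the union is itself a pipeline step, its parts can be tuned in a grid search through names such as `features__tfidf__ngram_range`.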

In [33]:
target_names = y.columns
print(classification_report(y_test, y_pred, target_names=target_names))

ValueError: Unknown label type: (       related  request  offer  aid_related  medical_help  medical_products  \
7917         1        0      0            0             0                 0   
25322        1        0      0            0             0                 0   
22191        1        0      0            1             0                 0   
...        ...      ...    ...          ...           ...               ...   
1427                   0         0         0            0      ...         
8214                   0         0         1            0      ...         
17425                  0         0         0            0      ...         
1347                   0         0         0            0      ...         
...                  ...       ...       ...          ...      ...         
15993                  0         0         0            0      ...         
1476                   0         0         0            0      ...         
25700                  0         0         0            0      ...         
5748                   0         0         0            0      ...         
16867                  0         0         0            0      ...         
4731                   0         0         0            0      ...         
10746                  0         0         0            0      ...         
19712                  0         0         0            0      ...         
12821                  0         0         0            0      ...         
50                     0         0         0            0      ...         
9325                   0         0         0            0      ...         
9668                   0         0         0            0      ...         
23169                  0         0         0            0      ...         
1686                   0         0         0            0      ...         
22501                  0         0         0            0      ...         
4398                   0         0         0            0      ...         
7413                   0         0         0            0      ...         
16525                  0         0         0            0      ...         
8182                   0         0         0            0      ...         
12218                  0         0         0            0      ...         
14353                  0         0         0            0      ...         
20787                  0         0         0            0      ...         
5404                   0         0         0            0      ...         
9602                   0         0         0            0      ...         
8922                   0         0         0            0      ...         
3198                   0         0         0            0      ...         
5254                   0         0         0            0      ...         
4224                   0         0         0            0      ...         
9989                   0         0         0            0      ...         
24094                  0         0         1            0      ...         

       aid_centers  other_infrastructure  weather_related  floods  storm  \
7917             0                     0                0       0      0   
25322            0                     0                0       0      0   
22191            0                     0                1       0      0   
18442            0                     0                0       0      0   
1336             0                     0                0       0      0   
24449            0                     0                0       0      0   
7976             0                     0                0       0      0   
17210            0                     0                1       0      1   
14652            0                     0                1       0      0   
20339            0                     0                0       0      0   
9317             0                     0                0       0      0   
25097            0                     0                0       0      0   
19871            0                     1                0       0      0   
9246             0                     0                0       0      0   
3776             0                     0                0       0      0   
16581            0                     0                1       0      0   
20736            0                     0                0       0      0   
21706            0                     0                0       0      0   
5227             0                     0                0       0      0   
9254             0                     0                1       0      1   
10201            0                     0                1       0      0   
23850            0                     0                0       0      0   
25077            0                     0                0       0      0   
20388            0                     0                0       0      0   
26051            0                     0                0       0      0   
17686            0                     0                0       0      0   
1427             0                     0                0       0      0   
8214             0                     0                1       0      0   
17425            0                     0                0       0      0   
1347             0                     0                0       0      0   
...            ...                   ...              ...     ...    ...   
15993            0                     0                1       0      1   
1476             0                     0                0       0      0   
25700            0                     0                0       0      0   
5748             0                     0                0       0      0   
16867            0                     0                1       1      0   
4731             0                     0                0       0      0   
10746            0                     0                0       0      0   
19712            0                     0                0       0      0   
12821            0                     0                0       0      0   
50               0                     0                0       0      0   
9325             0                     0                0       0      0   
9668             0                     0                0       0      0   
23169            0                     0                0       0      0   
1686             0                     0                0       0      0   
22501            1                     0                1       1      1   
4398             0                     0                0       0      0   
7413             0                     0                0       0      0   
16525            0                     0                0       0      0   
8182             0                     0                0       0      0   
12218            0                     0                0       0      0   
14353            0                     0                1       0      0   
20787            0                     0                0       0      0   
5404             0                     0                0       0      0   
9602             0                     0                0       0      0   
8922             0                     0                0       0      0   
3198             0                     0                0       0      0   
5254             0                     0                0       0      0   
4224             0                     0                0       0      0   
9989             0                     0                1       0      0   
24094            0                     0                0       0      0   

       fire  earthquake  cold  other_weather  direct_report  
7917      0           0     0              0              0  
25322     0           0     0              0              0  
22191     0           0     1              1              0  
18442     0           0     0              0              0  
1336      0           0     0              0              0  
24449     0           0     0              0              0  
7976      0           0     0              0              0  
17210     0           0     0              0              1  
14652     0           1     0              0              0  
20339     0           0     0              0              0  
9317      0           0     0              0              0  
25097     0           0     0              0              0  
19871     0           0     0              0              0  
9246      0           0     0              0              0  
3776      0           0     0              0              1  
16581     1           0     0              0              0  
20736     0           0     0              0              0  
21706     0           0     0              0              0  
5227      0           0     0              0              0  
9254      0           0     0              1              0  
10201     0           1     0              1              0  
23850     0           0     0              0              0  
25077     0           0     0              0              1  
20388     0           0     0              0              0  
26051     0           0     0              0              0  
17686     0           0     0              0              0  
1427      0           0     0              0              0  
8214      0           0     1              0              1  
17425     0           0     0              0              0  
1347      0           0     0              0              0  
...     ...         ...   ...            ...            ...  
15993     0           0     0              0              0  
1476      0           0     0              0              1  
25700     0           0     0              0              0  
5748      0           0     0              0              0  
16867     0           0     0              0              0  
4731      0           0     0              0              0  
10746     0           0     0              0              1  
19712     0           0     0              0              0  
12821     0           0     0              0              0  
50        0           0     0              0              0  
9325      0           0     0              0              0  
9668      0           0     0              0              0  
23169     0           0     0              0              0  
1686      0           0     0              0              1  
22501     0           1     0              0              0  
4398      0           0     0              0              1  
7413      0           0     0              0              0  
16525     0           0     0              0              0  
8182      0           0     0              0              0  
12218     0           0     0              0              0  
14353     0           1     0              0              0  
20787     0           0     0              0              0  
5404      0           0     0              0              1  
9602      0           0     0              0              0  
8922      0           0     0              0              0  
3198      0           0     0              0              1  
5254      0           0     0              0              0  
4224      0           0     0              0              0  
9989      0           1     0              0              0  
24094     0           0     0              0              0  

[8652 rows x 36 columns], array([[1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ..., 
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]]))

### 9. Export your model as a pickle file

In [34]:
with open('MLclassifier.pkl', 'wb') as file:
    pickle.dump(cv, file)
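
To check the export, the file can be read back with `pickle.load`. The sketch below is a minimal round trip under stated assumptions: `save_model` and `load_model` are hypothetical helper names, a plain dictionary stands in for the fitted `cv` object, and the file is written to a temporary directory rather than next to the notebook.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable object (for example the fitted `cv`) to `path`."""
    with open(path, 'wb') as file:
        pickle.dump(model, file)


def load_model(path):
    """Read back an object written by `save_model`."""
    with open(path, 'rb') as file:
        return pickle.load(file)


# Stand-in for the fitted grid search object; an assumption for illustration only.
stand_in = {'clf__estimator__criterion': 'gini', 'tfidf__norm': 'l2'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'MLclassifier.pkl')
    save_model(stand_in, path)
    restored = load_model(path)
```

With the real model, `load_model('MLclassifier.pkl').predict(...)` should behave like `cv.predict(...)`. Note that pickle stores functions by reference, so if the pipeline uses a custom tokenizer function, that function has to be importable in the process that loads the file.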

In [35]:
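# Per-combination CV scores; `grid_scores_` was removed in scikit-learn 0.20 (use `cv_results_` there).
cv.grid_scores_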

[mean: 0.22130, std: 0.00639, params: {'clf__estimator__criterion': 'gini', 'tfidf__norm': 'l1'},
 mean: 0.22706, std: 0.00363, params: {'clf__estimator__criterion': 'gini', 'tfidf__norm': 'l2'},
 mean: 0.21726, std: 0.00196, params: {'clf__estimator__criterion': 'entropy', 'tfidf__norm': 'l1'},
 mean: 0.21504, std: 0.00168, params: {'clf__estimator__criterion': 'entropy', 'tfidf__norm': 'l2'}]
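
On scikit-learn versions where `grid_scores_` is no longer available, the same summary can be rebuilt from `cv_results_`, which exposes the keys `mean_test_score`, `std_test_score` and `params`. The sketch below assumes a hypothetical helper `format_cv_results`; the example dictionary is hand-copied from the first two rows of the output above and only stands in for a real `cv.cv_results_`.

```python
def format_cv_results(cv_results):
    """Return one 'mean/std/params' summary line per parameter combination."""
    rows = zip(cv_results['mean_test_score'],
               cv_results['std_test_score'],
               cv_results['params'])
    return ['mean: {:.5f}, std: {:.5f}, params: {}'.format(mean, std, params)
            for mean, std, params in rows]


# Stand-in for cv.cv_results_ (values copied from the output above).
example_results = {
    'mean_test_score': [0.22130, 0.22706],
    'std_test_score': [0.00639, 0.00363],
    'params': [
        {'clf__estimator__criterion': 'gini', 'tfidf__norm': 'l1'},
        {'clf__estimator__criterion': 'gini', 'tfidf__norm': 'l2'},
    ],
}

summary = format_cv_results(example_results)
for line in summary:
    print(line)
```

With a fitted search, `format_cv_results(cv.cv_results_)` should give one line per combination in the grid.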

In [36]:
cv.best_estimator_

Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...oob_score=False, random_state=None, verbose=0,
            warm_start=False),
           n_jobs=1))])

### 10. Use this notebook to complete `train.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.
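
One possible shape for such a script is sketched below. It is not the template itself: the function names, the argument names and the `--table` option are assumptions, and `build_model` is a hypothetical callable that returns the unfitted pipeline or grid search defined earlier in this notebook. The table name `MessagesTB` and the column slicing follow the loading cell at the top of the notebook.

```python
import argparse
import pickle


def parse_args(argv=None):
    """Parse the database path and the output model path from the command line."""
    parser = argparse.ArgumentParser(
        description='Train the message classifier and export it as a pickle file.')
    parser.add_argument('database_filepath', help='SQLite database file to read')
    parser.add_argument('model_filepath', help='where to write the pickled model')
    parser.add_argument('--table', default='MessagesTB', help='table holding the messages')
    return parser.parse_args(argv)


def load_data(database_filepath, table):
    """Load messages (X) and category labels (y) from the database."""
    # Imported here so that argument parsing works without these packages.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine('sqlite:///' + database_filepath)
    df = pd.read_sql_table(table, engine)
    X = df['message']
    y = df.iloc[:, 4:].astype('int')  # category columns start at position 4
    return X, y


def main(build_model, argv=None):
    """Load the data, fit the model returned by `build_model`, and pickle it."""
    args = parse_args(argv)
    X, y = load_data(args.database_filepath, args.table)
    model = build_model()
    model.fit(X, y)
    with open(args.model_filepath, 'wb') as file:
        pickle.dump(model, file)
```

Under these assumptions the script would be invoked as `python train.py MessagesDB.db MLclassifier.pkl`, with `main(build_model)` called from the script's entry point.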