# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from the database
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y
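The loading step can be tried on a toy table first. This is a minimal sketch under stated assumptions: the rows are hypothetical and only mimic the columns the notebook drops (`id`, `message`, `original`, `genre`) plus two category columns, and an in-memory SQLite database stands in for `project3.db`. Because `read_sql_table` only accepts a SQLAlchemy connectable, the sketch reads through a plain `sqlite3` connection with `read_sql_query` instead.

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows mimicking the assumed schema: id, message, original,
# genre, followed by one 0/1 column per category.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "message": ["We need water", "Storm is coming", "Hello"],
    "original": ["orig 1", "orig 2", "orig 3"],
    "genre": ["direct", "news", "social"],
    "related": [1, 1, 0],
    "request": [1, 0, 0],
})

# An in-memory SQLite database stands in for project3.db.
conn = sqlite3.connect(":memory:")
toy.to_sql("disaster1", conn, index=False)

# read_sql_table needs SQLAlchemy; with a plain sqlite3 connection,
# read_sql_query is the supported way to load the table.
df = pd.read_sql_query("SELECT * FROM disaster1", conn)
conn.close()

X = df["message"]                                            # feature: raw text
Y = df.drop(columns=["id", "message", "original", "genre"])  # targets: categories
print(X.tolist())
print(list(Y.columns))  # ['related', 'request']
```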

In [2]:
# import libraries
import os
import re
import pandas as pd
from sqlalchemy import create_engine
import matplotlib.pyplot as plt
%matplotlib inline

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer
nltk.download(['punkt', 'wordnet', 'stopwords'])

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics import classification_report

import pickle
import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [3]:
# load data from database
engine = create_engine('sqlite:///project3.db')
df = pd.read_sql_table('disaster1', con=engine)

# feature: the raw message text; targets: every category column
X = df['message']
y = df.drop(columns=['id', 'message', 'original', 'genre'])
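Before fitting, it may be worth auditing the target matrix. scikit-learn treats a label matrix whose columns are all 0/1 as "multilabel-indicator"; if any column holds another value, it infers "multiclass-multioutput", which `classification_report` does not accept for the whole matrix at once. A column that never varies also gives degenerate per-category metrics. The helper `audit_targets` and the toy values below are hypothetical; whether the real `y` contains such columns has to be checked on the data itself.

```python
import pandas as pd


def audit_targets(y: pd.DataFrame) -> dict:
    """Report category columns that are not strictly 0/1, or that never vary."""
    non_binary = [c for c in y.columns if not set(y[c].unique()) <= {0, 1}]
    constant = [c for c in y.columns if y[c].nunique() <= 1]
    return {"non_binary": non_binary, "constant": constant}


# Hypothetical toy targets: 'related' contains a 2, 'child_alone' is all zeros.
y_toy = pd.DataFrame({
    "related": [1, 0, 2, 1],
    "request": [0, 1, 0, 1],
    "child_alone": [0, 0, 0, 0],
})
print(audit_targets(y_toy))
# {'non_binary': ['related'], 'constant': ['child_alone']}
```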



### 2. Write a tokenization function to process your text data

In [4]:

# build the stop-word set and lemmatizer once, not on every token or call
STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
URL_REGEX = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

def tokenize(text):
    # replace any URLs in the text with a placeholder
    detected_urls = re.findall(URL_REGEX, text)
    for url in detected_urls:
        text = text.replace(url, "urlplaceholder")

    # lowercase, then replace punctuation with spaces
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())

    # tokenize the text
    tokens = word_tokenize(text)

    # remove stop words (NLTK's list is lowercase, so compare after lowercasing)
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]

    # lemmatize each remaining token
    clean_tokens = [lemmatizer.lemmatize(tok).strip() for tok in tokens]
    return clean_tokens
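The order of the clean-up steps matters: URLs must be masked before punctuation is stripped (otherwise the pattern no longer matches), and NLTK's English stop words are lowercase, so text has to be lowercased before the stop-word filter or words such as "The" slip through. A stdlib-only sketch of that ordering, assuming a tiny hypothetical stop-word list, a simplified URL pattern, and a whitespace split in place of `word_tokenize` (lemmatization is left out):

```python
import re

# Hypothetical miniature stop-word list; the notebook uses NLTK's English list.
STOP_WORDS = {"the", "is", "a", "in", "we", "to"}
URL_RE = re.compile(r"https?://\S+")  # simplified URL pattern (an assumption)


def normalize(text):
    """Return cleaned tokens: URLs masked, lowercased, punctuation and stop words removed."""
    text = URL_RE.sub("urlplaceholder", text)          # 1. mask URLs first
    text = re.sub(r"[^a-z0-9]", " ", text.lower())     # 2. lowercase, drop punctuation
    tokens = text.split()                              # 3. stands in for word_tokenize
    return [t for t in tokens if t not in STOP_WORDS]  # 4. stop words now match


print(normalize("The storm is in Haiti! See http://example.com/news"))
# ['storm', 'haiti', 'see', 'urlplaceholder']
```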

### 3. Build a machine learning pipeline
This machine learning pipeline should take the `message` column as input and output classification results for the other 36 categories in the dataset. You may find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.

In [5]:
# create a pipeline: a count vectorizer, a TF-IDF transformer, and a multi-output classifier
rfc_pipe = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))
])
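The pipeline above can be exercised end to end on a toy corpus to confirm the shapes involved. This is a sketch only: the messages and the two target columns are hypothetical, `CountVectorizer` uses its default tokenizer here so the block does not need NLTK data, and `n_estimators=10` with `random_state=0` are arbitrary choices for a fast, repeatable run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus with two 0/1 target columns (say 'request', 'water').
messages = ["we need water", "please send water now", "storm is over",
            "need food and water", "the weather is calm", "send help please"]
targets = np.array([[1, 1], [1, 1], [0, 0], [1, 1], [0, 0], [1, 0]])

pipe = Pipeline([
    ("vect", CountVectorizer()),   # default tokenizer; the notebook passes tokenize
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier(n_estimators=10, random_state=0))),
])
pipe.fit(messages, targets)

# MultiOutputClassifier fits one forest per target column and stacks the predictions.
pred = pipe.predict(["water please", "calm weather"])
print(pred.shape)  # (2, 2): one row per message, one column per target
```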

In [6]:
rfc_pipe.get_params()


{'memory': None,
 'steps': [('vect',
   CountVectorizer(analyzer='word', binary=False, decode_error='strict',
           dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
           lowercase=True, max_df=1.0, max_features=None, min_df=1,
           ngram_range=(1, 1), preprocessor=None, stop_words=None,
           strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
           tokenizer=<function tokenize at 0x7fb032b53950>, vocabulary=None)),
  ('tfidf',
   TfidfTransformer(norm='l2', smooth_idf=True, sublinear_tf=False, use_idf=True)),
  ('clf',
   MultiOutputClassifier(estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
               max_depth=None, max_features='auto', max_leaf_nodes=None,
               min_impurity_decrease=0.0, min_impurity_split=None,
               min_samples_leaf=1, min_samples_split=2,
               min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
               oob_score=False, random_state=None,
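The printed parameters show the forest nested inside `MultiOutputClassifier` as `estimator`. A sketch, assuming the step names used above (`vect`, `tfidf`, `clf`), of how that nesting becomes `step__param` keys for `set_params` and for the `GridSearchCV` imported at the top. The grid values are illustrative, not tuned, and the search is only constructed here, not fitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Parameters of the wrapped forest are reached through clf__estimator__.
print("clf__estimator__n_estimators" in pipe.get_params())  # True

# The same double-underscore keys work with set_params.
pipe.set_params(clf__estimator__n_estimators=50, vect__ngram_range=(1, 2))

# They also serve as param_grid keys (hypothetical, deliberately small grid).
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__estimator__n_estimators": [10, 50],
}
search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
```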

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline
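The split below keeps 75% of the rows for training. One optional addition, not used in the notebook cell, is passing `random_state` so the shuffle (and therefore every score reported later) is reproducible. A sketch on hypothetical toy data showing the resulting sizes and that features and targets stay aligned row for row:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 messages and two 0/1 target columns.
X = pd.Series([f"message {i}" for i in range(100)], name="message")
y = pd.DataFrame({"related": np.arange(100) % 2,
                  "request": (np.arange(100) // 2) % 2})

# random_state fixes the shuffle; 42 is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

print(len(X_train), len(X_test))                # 75 25
print((X_train.index == y_train.index).all())   # True: rows stay aligned
```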

In [7]:

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# train the pipeline
rfc_pipe.fit(X_train, y_train)

Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...oob_score=False, random_state=None, verbose=0,
            warm_start=False),
           n_jobs=1))])

In [8]:
y_test.head()

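| | related | request | offer | aid_related | medical_help | medical_products | search_and_rescue | security | military | child_alone | ... | aid_centers | other_infrastructure | weather_related | floods | storm | fire | earthquake | cold | other_weather | direct_report |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13346 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 10648 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11229 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23632 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 15743 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |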


In [9]:
X.head()

0    Weather update - a cold front from Cuba that c...
1              Is the Hurricane over or is it not over
2                      Looking for someone but no name
3    UN reports Leogane 80-90 destroyed. Only Hospi...
4    says: west side of Haiti, rest of the country ...
Name: message, dtype: object

### 5. Test your model
Report the F1 score, precision, and recall for each output category of the dataset. You can do this by iterating through the columns and calling scikit-learn's `classification_report` on each.
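The per-column loop described above can be sketched as follows, assuming the true labels are a DataFrame and the predictions are the bare array returned by `predict`, with columns in the same order. The two categories and their values are hypothetical. Slicing one column at a time keeps every call a single-output problem, and `output_dict=True` (available in scikit-learn 0.20 and later) returns the numbers as a dict for further aggregation.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

# Hypothetical true labels and predictions for two categories.
y_true = pd.DataFrame({"related": [1, 0, 1, 1], "request": [0, 0, 1, 1]})
y_pred = np.array([[1, 0], [0, 0], [1, 1], [0, 1]])

reports = {}
for i, col in enumerate(y_true.columns):
    # one report per category: precision, recall and F1 for each class
    print(col)
    print(classification_report(y_true[col], y_pred[:, i]))
    reports[col] = classification_report(y_true[col], y_pred[:, i], output_dict=True)
```

For `related`, class 1 has two of three positives found and no false positives, so its recall is 2/3 and its precision is 1.0.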

In [10]:
# predict on train set
y_rfc_trainpred = rfc_pipe.predict(X_train)

In [11]:
# drop the shuffled index labels so rows line up positionally with the prediction array
y_train = y_train.reset_index(drop=True)

In [12]:
y_train.head()

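| | related | request | offer | aid_related | medical_help | medical_products | search_and_rescue | security | military | child_alone | ... | aid_centers | other_infrastructure | weather_related | floods | storm | fire | earthquake | cold | other_weather | direct_report |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |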
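`train_test_split` keeps the original row labels (the `y_test.head()` output above starts at 13346), while `predict` returns a bare array with no index at all. A sketch with hypothetical values showing two ways to line them up: give the predictions the true labels' index and columns, or reset the true labels to 0..n-1 as the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical shuffled split: index labels keep their original row numbers.
y_train = pd.DataFrame({"related": [1, 0, 1], "request": [0, 0, 1]},
                       index=[13346, 10648, 11229])
y_pred = np.array([[1, 0], [0, 0], [1, 0]])  # predict() returns a bare array

# Option 1: reuse the true labels' index and columns on the predictions.
pred_df = pd.DataFrame(y_pred, index=y_train.index, columns=y_train.columns)

# Option 2 (the notebook's approach): drop the labels on the true side instead.
y_train_reset = y_train.reset_index(drop=True)

print(pred_df.index.tolist())        # [13346, 10648, 11229]
print(y_train_reset.index.tolist())  # [0, 1, 2]
```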


In [13]:
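# note: wrapping the labels in a list (columns=[y.columns.values]) turns the columns into a
# one-level MultiIndex, as the `.columns` output below shows; columns=y.columns gives a plain Index
y_rfc_trainpred = pd.DataFrame(y_rfc_trainpred, columns = [y.columns.values])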

In [22]:
y_rfc_trainpred.reset_index(drop=True, inplace=True)

In [23]:
y_rfc_trainpred.columns

MultiIndex(levels=[['aid_centers', 'aid_related', 'buildings', 'child_alone', 'clothing', 'cold', 'death', 'direct_report', 'earthquake', 'electricity', 'fire', 'floods', 'food', 'hospitals', 'infrastructure_related', 'medical_help', 'medical_products', 'military', 'missing_people', 'money', 'offer', 'other_aid', 'other_infrastructure', 'other_weather', 'refugees', 'related', 'request', 'search_and_rescue', 'security', 'shelter', 'shops', 'storm', 'tools', 'transport', 'water', 'weather_related']],
           labels=[[25, 26, 20, 1, 15, 16, 27, 28, 17, 3, 34, 12, 29, 4, 19, 18, 24, 6, 21, 14, 33, 2, 9, 32, 13, 30, 0, 22, 35, 11, 31, 10, 8, 5, 23, 7]])
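The `MultiIndex` above comes from the extra list around `y.columns.values`: pandas reads a list of arrays as one array per index level. A small sketch with two hypothetical category names comparing the wrapped form with passing the labels directly:

```python
import numpy as np
import pandas as pd

cols = pd.Index(["related", "request"])
arr = np.array([[1, 0], [0, 1]])

wrapped = pd.DataFrame(arr, columns=[cols.values])  # list around the array
direct = pd.DataFrame(arr, columns=cols)            # labels passed directly

print(type(wrapped.columns).__name__)  # MultiIndex
print(type(direct.columns).__name__)   # Index
```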

In [24]:
y_train.columns

Index(['related', 'request', 'offer', 'aid_related', 'medical_help',
       'medical_products', 'search_and_rescue', 'security', 'military',
       'child_alone', 'water', 'food', 'shelter', 'clothing', 'money',
       'missing_people', 'refugees', 'death', 'other_aid',
       'infrastructure_related', 'transport', 'buildings', 'electricity',
       'tools', 'hospitals', 'shops', 'aid_centers', 'other_infrastructure',
       'weather_related', 'floods', 'storm', 'fire', 'earthquake', 'cold',
       'other_weather', 'direct_report'],
      dtype='object')

In [25]:
# classification report on train set
print(classification_report(y_train, y_rfc_trainpred, target_names=y.columns.values))

ValueError: Unknown label type: (...)

(The error message then prints the full repr of both arguments: `y_train`, [19662 rows x 36 columns], followed by `y_rfc_trainpred`, which has the same rows but MultiIndex column headers.)
19643                 0        0        0           0      ...        
19644                 0        0        0           0      ...        
19645                 0        0        0           0      ...        
19646                 0        0        0           0      ...        
19647                 0        0        0           0      ...        
19648                 0        0        0           0      ...        
19649                 0        0        0           0      ...        
19650                 0        0        0           0      ...        
19651                 1        0        0           0      ...        
19652                 0        0        0           0      ...        
19653                 0        0        0           0      ...        
19654                 0        0        0           0      ...        
19655                 0        0        0           0      ...        
19656                 0        0        0           0      ...        
19657                 0        0        0           0      ...        
19658                 0        0        0           0      ...        
19659                 0        0        0           0      ...        
19660                 0        0        0           0      ...        
19661                 0        0        1           0      ...        

      aid_centers other_infrastructure weather_related floods storm fire  \
0               0                    0               0      0     0    0   
1               0                    0               0      0     0    0   
2               0                    0               0      0     0    0   
3               0                    0               0      0     0    0   
4               0                    0               0      0     0    0   
5               0                    0               1      0     1    0   
6               0                    0               0      0     0    0   
7               0                    0               0      0     0    0   
8               0                    0               0      0     0    0   
9               0                    0               0      0     0    0   
10              1                    1               1      1     1    1   
11              0                    0               0      0     0    0   
12              0                    1               1      0     1    0   
13              0                    0               1      0     0    0   
14              0                    0               0      0     0    0   
15              0                    0               0      0     0    0   
16              0                    0               1      1     0    0   
17              0                    0               0      0     0    0   
18              0                    0               0      0     0    0   
19              0                    0               0      0     0    0   
20              0                    0               1      0     0    0   
21              1                    0               0      0     0    0   
22              0                    1               0      0     0    0   
23              0                    0               0      0     0    0   
24              0                    0               0      0     0    0   
25              0                    0               0      0     0    0   
26              0                    0               0      0     0    0   
27              0                    0               0      0     0    0   
28              0                    0               0      0     0    0   
29              0                    0               0      0     0    0   
...           ...                  ...             ...    ...   ...  ...   
19632           0                    0               0      0     0    0   
19633           0                    0               0      0     0    0   
19634           0                    0               1      1     0    0   
19635           0                    0               1      0     0    0   
19636           0                    0               1      0     1    0   
19637           0                    0               0      0     0    0   
19638           0                    0               0      0     0    0   
19639           0                    0               0      0     0    0   
19640           0                    0               0      0     0    0   
19641           0                    0               1      1     0    0   
19642           0                    0               0      0     0    0   
19643           0                    0               0      0     0    0   
19644           0                    0               0      0     0    0   
19645           0                    0               0      0     0    0   
19646           0                    0               0      0     0    0   
19647           0                    0               0      0     1    0   
19648           0                    0               0      0     0    0   
19649           0                    0               1      0     1    0   
19650           0                    0               0      0     0    0   
19651           0                    0               1      0     0    0   
19652           0                    0               1      0     0    0   
19653           0                    0               1      0     0    0   
19654           0                    0               0      0     0    0   
19655           0                    0               0      0     0    0   
19656           1                    0               1      0     0    1   
19657           0                    0               0      0     0    0   
19658           0                    0               0      0     0    0   
19659           0                    0               0      0     0    0   
19660           0                    0               1      1     1    0   
19661           0                    0               0      0     0    0   

      earthquake cold other_weather direct_report  
0              0    0             0             0  
1              0    0             0             0  
2              0    0             0             0  
3              0    0             0             0  
4              0    0             0             0  
5              0    0             0             0  
6              0    0             0             0  
7              0    0             0             1  
8              0    0             0             0  
9              0    0             0             0  
10             1    0             0             0  
11             0    0             0             1  
12             0    0             0             0  
13             0    0             0             1  
14             0    0             0             1  
15             0    0             0             1  
16             0    0             0             0  
17             0    0             0             0  
18             0    0             0             0  
19             0    0             0             0  
20             1    0             0             1  
21             0    0             0             0  
22             0    0             0             0  
23             0    0             0             0  
24             0    0             0             0  
25             0    0             0             0  
26             0    0             0             0  
27             0    0             0             0  
28             0    0             0             0  
29             0    0             0             0  
...          ...  ...           ...           ...  
19632          0    0             0             1  
19633          0    0             0             0  
19634          0    0             0             0  
19635          1    0             0             0  
19636          0    0             0             0  
19637          0    0             0             0  
19638          0    0             0             0  
19639          0    0             0             0  
19640          0    0             0             0  
19641          0    0             0             0  
19642          0    0             0             0  
19643          0    0             0             0  
19644          0    0             0             0  
19645          0    0             0             0  
19646          0    0             0             0  
19647          0    0             0             1  
19648          0    0             0             0  
19649          0    0             0             0  
19650          0    0             0             1  
19651          1    0             0             0  
19652          1    0             0             0  
19653          0    0             0             0  
19654          0    0             0             1  
19655          0    0             0             0  
19656          0    0             0             0  
19657          0    0             0             0  
19658          0    0             0             1  
19659          0    0             0             0  
19660          0    0             1             0  
19661          0    0             0             0  

[19662 rows x 36 columns])

In [None]:
# Model accuracy on the train set, one value per category
rfc_train_accuracy = (y_rfc_trainpred == y_train).mean()
print(rfc_train_accuracy)

In [None]:
# predict on test set
y_rfc_testpred = rfc_pipe.predict(X_test)

In [None]:
y_rfc_testpred 

In [None]:
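# convert predictions from a NumPy array to a DataFrame with the same column names as y
# (columns=y.columns, not [y.columns.values], which would build a MultiIndex)
y_rfc_testpred = pd.DataFrame(y_rfc_testpred, columns=y.columns)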

In [None]:
y_rfc_testpred.head() 

In [None]:
y_test.head()

In [None]:
# reset the index of y_test so its rows line up with the prediction DataFrame's default 0..n-1 index
y_test=y_test.reset_index(drop=True)
y_test.head()
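pandas only compares two DataFrames element-wise when their index and column labels are identical, which is why the shuffled index left by `train_test_split` has to be dealt with before `(y_rfc_testpred == y_test)`. An alternative to resetting `y_test`'s index is to build the prediction frame with `y_test`'s own index and columns. A minimal sketch on invented toy labels (`y_true` and `preds` stand in for `y_test` and the raw predictions):

```python
import numpy as np
import pandas as pd

# Toy stand-ins: true labels with a shuffled index (as after train_test_split)
# and a raw NumPy prediction array with the same row order
y_true = pd.DataFrame({'related': [1, 0, 1], 'request': [0, 0, 1]}, index=[17, 3, 42])
preds = np.array([[1, 0], [0, 1], [1, 1]])

# Reuse the true labels' index and columns so the comparison lines up row by row
y_pred = pd.DataFrame(preds, columns=y_true.columns, index=y_true.index)

# Fraction of rows predicted correctly, computed separately for each column
per_label_accuracy = (y_pred == y_true).mean()
print(per_label_accuracy)
```

Building the frame this way keeps the original row labels, which can make it easier to look up the misclassified messages in `X_test` later.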

In [None]:
y.columns.values

In [None]:
len(y.columns.values)

In [None]:
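# classification report on test set, one report per category
# (classification_report cannot score a multiclass-multioutput label matrix in one call)
for i, col in enumerate(y.columns):
    print(col)
    print(classification_report(y_test.iloc[:, i], y_rfc_testpred.iloc[:, i]))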

In [None]:
# Model accuracy on the test set, one value per category
rfc_test_accuracy = (y_rfc_testpred == y_test).mean()
print(rfc_test_accuracy)
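`(y_rfc_testpred == y_test).mean()` averages each column separately, so it yields one accuracy per category. When a category is rare, a model that almost always predicts 0 for it still scores high on this measure. A stricter summary is exact-match (subset) accuracy, which counts a message as correct only when every category is right. The toy arrays below are invented for illustration:

```python
import numpy as np

# Toy labels: 4 messages x 3 categories
y_true = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 0, 0],
                   [1, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [1, 0, 0],
                   [0, 0, 0],
                   [1, 0, 0]])

# Per-category accuracy: share of messages classified correctly, column by column
per_label = (y_true == y_pred).mean(axis=0)

# Exact-match accuracy: share of messages with every category correct
exact_match = np.all(y_true == y_pred, axis=1).mean()

print(per_label)    # [1.   0.75 0.75]
print(exact_match)  # 0.5
```

Here the model never predicts the two rare categories, yet each still reaches 75% per-category accuracy; exact-match accuracy drops to 50% because two messages have at least one wrong category.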

### 6. Improve your model
Use grid search to find better parameters. 
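Grid-search keys follow scikit-learn's `<step name>__<parameter>` convention. Because `MultiOutputClassifier` keeps the wrapped model in its `estimator` parameter, the random forest's settings sit one level deeper, which is where names such as `clf__estimator__n_estimators` come from. A minimal sketch, assuming step names `vect`, `tfidf` and `clf` as in the notebook's pipelines:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier())),
])

# Every key returned here is a valid entry for GridSearchCV's param_grid
forest_params = sorted(k for k in pipe.get_params() if k.startswith('clf__estimator__'))
print(forest_params[:5])

# The same names work with set_params, e.g. to try one setting by hand
pipe.set_params(clf__estimator__n_estimators=50)
```

Listing the keys this way also shows text-processing options (for example `vect__ngram_range` or `tfidf__use_idf`) that could be added to the grid alongside the forest's parameters.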

In [48]:
parameters = {'clf__estimator__n_estimators':[100,200],
              'clf__estimator__max_depth':[5]}

# create the grid search object: 2 candidates x 3 folds = 6 fits
grid_rfc = GridSearchCV(rfc_pipe, param_grid=parameters, cv=3, verbose=2)

In [49]:
grid_rfc.fit(X_train,y_train)

Fitting 3 folds for each of 2 candidates, totalling 6 fits
[CV] clf__estimator__max_depth=5, clf__estimator__n_estimators=100 ...
[CV]  clf__estimator__max_depth=5, clf__estimator__n_estimators=100, score=0.19118095819346964, total= 1.7min
[CV] clf__estimator__max_depth=5, clf__estimator__n_estimators=100 ...


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.7min remaining:    0.0s


[CV]  clf__estimator__max_depth=5, clf__estimator__n_estimators=100, score=0.19713152273420811, total= 1.7min
[CV] clf__estimator__max_depth=5, clf__estimator__n_estimators=100 ...


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.4min remaining:    0.0s


[CV]  clf__estimator__max_depth=5, clf__estimator__n_estimators=100, score=0.19743667989014344, total= 1.7min
[CV] clf__estimator__max_depth=5, clf__estimator__n_estimators=200 ...
[CV]  clf__estimator__max_depth=5, clf__estimator__n_estimators=200, score=0.19118095819346964, total= 1.9min
[CV] clf__estimator__max_depth=5, clf__estimator__n_estimators=200 ...
[CV]  clf__estimator__max_depth=5, clf__estimator__n_estimators=200, score=0.19728410131217577, total= 2.1min
[CV] clf__estimator__max_depth=5, clf__estimator__n_estimators=200 ...
[CV]  clf__estimator__max_depth=5, clf__estimator__n_estimators=200, score=0.19728410131217577, total= 1.9min


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed: 17.3min finished


GridSearchCV(cv=3, error_score='raise',
       estimator=Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...oob_score=False, random_state=None, verbose=0,
            warm_start=False),
           n_jobs=1))]),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'clf__estimator__n_estimators': [100, 200], 'clf__estimator__max_depth': [5]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=3)

In [50]:
grid_rfc.best_params_


{'clf__estimator__max_depth': 5, 'clf__estimator__n_estimators': 100}
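With `scoring=None`, the `score` in the log above comes from the pipeline's own `score` method, which for `MultiOutputClassifier` is exact-match accuracy over all categories; that strict measure is why it sits near 0.19. Averaging the three fold scores gives about 0.195 for both 100 and 200 trees, so the choice of 100 in `best_params_` is effectively a tie. `cv_results_` makes such comparisons easier to read; a sketch on invented toy data (the messages and the two category columns are made up):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy corpus with two binary categories
messages = ['need water', 'need food and water', 'house flooded', 'storm damage',
            'water please', 'roads flooded after storm', 'send food', 'flood warning']
targets = pd.DataFrame({'water': [1, 1, 0, 0, 1, 0, 0, 0],
                        'weather_related': [0, 0, 1, 1, 0, 1, 0, 1]})

pipe = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier(random_state=0))),
])
grid = GridSearchCV(pipe, param_grid={'clf__estimator__n_estimators': [5, 10]}, cv=2)
grid.fit(messages, targets)

# One row per candidate: mean and spread of the fold scores, plus the rank
results = pd.DataFrame(grid.cv_results_)[
    ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(results)
```

Comparing `mean_test_score` with `std_test_score` shows whether the winning candidate is genuinely better or just within fold-to-fold noise.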

In [51]:
y_grid_rfc_testpred = grid_rfc.predict(X_test)


In [52]:
# classification report on test set
print('Grid tree Test Scores')
print(classification_report(y_test.values, y_grid_rfc_testpred, target_names=y.columns.values))
print('\n')

# accuracy score on test set
print('Grid tree Accuracy')
gridtree_test_accuracy = (y_grid_rfc_testpred == y_test).mean()
print(gridtree_test_accuracy)

Grid tree Test Scores


ValueError: Mix type of y not allowed, got types {'multiclass-multioutput', 'multilabel-indicator'}
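The `ValueError` means the true labels and the predictions were detected as different target types. `'multiclass-multioutput'` appears when at least one column holds a value other than 0 and 1, and `classification_report` cannot score such a matrix in a single call. In this dataset the `related` column is a likely cause, since it is commonly reported to contain the value 2; `(y > 1).sum()` shows which columns are affected. One option is to recode those values while cleaning the data (a modeling decision); another is to report each category separately, as in this sketch (the helper `per_column_reports` and the toy data are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report


def per_column_reports(y_true, y_pred):
    """Return {category name: classification_report text}, one per column.

    y_true: DataFrame of true labels, one column per category
    y_pred: array-like of predictions in the same column order
    """
    y_pred = np.asarray(y_pred)
    reports = {}
    for i, col in enumerate(y_true.columns):
        # Each column on its own is an ordinary binary or multiclass problem,
        # so a stray label such as 2 no longer breaks the report
        reports[col] = classification_report(y_true.iloc[:, i], y_pred[:, i])
    return reports


# Toy data: 'related' contains the value 2, as the real column may
y_true = pd.DataFrame({'related': [1, 2, 0, 1], 'request': [0, 1, 0, 1]})
y_pred = np.array([[1, 0], [1, 1], [0, 0], [1, 1]])

for col, report in per_column_reports(y_true, y_pred).items():
    print(col)
    print(report)
```

Calling `classification_report(y_true.values, y_pred)` on the same toy data raises a `ValueError` of the same kind as in the cell above, while the per-column version completes (with a warning for label 2, which is never predicted).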

### 7. Test your model
Show the accuracy, precision, and recall of the tuned model.  

Since this project focuses on code quality, process, and pipelines, there is no minimum performance metric needed to pass. However, make sure to fine-tune your models for accuracy, precision, and recall to make your project stand out, especially for your portfolio!

In [None]:

grid_rfc.best_params_

In [None]:
y_gridrfc_trainpred = grid_rfc.predict(X_train)
y_gridrfc_testpred = grid_rfc.predict(X_test)

In [None]:
# classification reports, one per category
# (classification_report cannot score a multiclass-multioutput label matrix in one call)
print('Grid rfc Train Scores')
for i, col in enumerate(y.columns):
    print(col)
    print(classification_report(y_train.iloc[:, i], y_gridrfc_trainpred[:, i]))
print('\n')

print('Grid rfc Test Scores')
for i, col in enumerate(y.columns):
    print(col)
    print(classification_report(y_test.iloc[:, i], y_gridrfc_testpred[:, i]))
print('\n')

# accuracy on the test set, one value per category
print('GridSearch rfc Accuracy')
gridrfc_test_accuracy = (y_gridrfc_testpred == y_test).mean()
print(gridrfc_test_accuracy)

In [None]:
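# grid_rfc was already fitted in In [49]; re-running this cell repeats the whole search (about 17 minutes in the log above)
grid_rfc.fit(X_train, y_train)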


In [None]:
# per-category change in test accuracy versus the untuned random forest (positive = tuned model is better)
gridrfc_test_accuracy - rfc_test_accuracy

### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF
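Extra features can be combined with TF-IDF through `FeatureUnion`, using a small custom transformer built on `BaseEstimator` and `TransformerMixin` (both already imported at the top of the notebook). A minimal sketch, assuming a hypothetical `MessageLengthExtractor` feature (character count per message) and invented toy data; it uses `CountVectorizer`'s default tokenization rather than the notebook's `tokenize` to stay self-contained:

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


class MessageLengthExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical extra feature: the character length of each message."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        lengths = np.array([[len(str(text))] for text in X])
        # Sparse output so FeatureUnion can stack it next to the sparse TF-IDF matrix
        return sparse.csr_matrix(lengths)


pipeline = Pipeline([
    ('features', FeatureUnion([
        ('text_pipeline', Pipeline([
            ('vect', CountVectorizer()),
            ('tfidf', TfidfTransformer()),
        ])),
        ('message_length', MessageLengthExtractor()),
    ])),
    ('clf', MultiOutputClassifier(RandomForestClassifier(n_estimators=10, random_state=0))),
])

# Toy messages and two invented categories, just to exercise the pipeline
messages = ['need water', 'need food and water urgently',
            'house flooded', 'storm damage in the town']
targets = pd.DataFrame({'water': [1, 1, 0, 0], 'weather_related': [0, 0, 1, 1]})

pipeline.fit(messages, targets)
predictions = pipeline.predict(messages)
print(predictions)
```

Tree ensembles are insensitive to the scale of the raw length feature; for a distance-based model such as the KNN pipeline below, it would likely need scaling first.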

#### MODEL 2. KNN

In [None]:
knn_pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(KNeighborsClassifier()))
])

In [None]:
knn_pipeline.get_params()

In [None]:
# train the pipeline
knn_pipeline.fit(X_train,y_train)

In [None]:
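# predict on train and test sets
y_knn_trainpred = knn_pipeline.predict(X_train)
y_knn_testpred = knn_pipeline.predict(X_test)

In [None]:
# classification report on test set, one report per category
for i, col in enumerate(y.columns):
    print(col)
    print(classification_report(y_test.iloc[:, i], y_knn_testpred[:, i]))

In [None]:
# Model accuracy on train and test sets, one value per category
knn_train_accuracy = (y_knn_trainpred == y_train).mean()
knn_test_accuracy = (y_knn_testpred == y_test).mean()
print(knn_train_accuracy)
print(knn_test_accuracy)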

### 9. Export your model as a pickle file

In [None]:
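import pickle

# 'classifier.pkl' is an example file name; train.py would take the path from the command line
model_filepath = 'classifier.pkl'
with open(model_filepath, 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(grid_rfc, f)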
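Loading the file back is the mirror image with mode `'rb'`. One caveat: pickle stores functions by reference (module and name), so a pipeline built with `tokenizer=tokenize` can only be unpickled where a `tokenize` function with the same qualified name is defined or importable. A self-contained round-trip sketch on a toy pipeline (the data and the temporary path are invented):

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy model standing in for grid_rfc
messages = ['need water', 'storm damage', 'water please', 'flood warning']
labels = [1, 0, 1, 0]
model = Pipeline([('vect', CountVectorizer()),
                  ('clf', DecisionTreeClassifier(random_state=0))])
model.fit(messages, labels)

model_filepath = os.path.join(tempfile.mkdtemp(), 'classifier.pkl')  # hypothetical path

# Save, then load in the way a separate app or script would
with open(model_filepath, 'wb') as f:
    pickle.dump(model, f)
with open(model_filepath, 'rb') as f:
    restored = pickle.load(f)

print(restored.predict(['need water']))
```

Only unpickle files from trusted sources: `pickle.load` can execute arbitrary code.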

### 10. Use this notebook to complete `train.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.
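A minimal sketch of how the notebook steps could be arranged in `train.py`. The function names (`load_data`, `build_model`, `evaluate_model`, `save_model`, `main`), the command-line layout, and the default table name `disaster1` (taken from the notebook) are assumptions rather than the course template. It reads the database with the standard-library `sqlite3` module instead of SQLAlchemy to avoid the extra dependency, and it leaves out the notebook's `tokenize` function and grid search to stay self-contained:

```python
import sys
import pickle
import sqlite3
from contextlib import closing

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Columns that are not categories (the same list the notebook drops to build y)
NON_TARGET_COLUMNS = ['id', 'message', 'original', 'genre']


def load_data(database_filepath, table_name='disaster1'):
    """Return messages X, the category matrix Y, and the category names."""
    with closing(sqlite3.connect(database_filepath)) as conn:
        df = pd.read_sql(f'SELECT * FROM {table_name}', conn)
    X = df['message']
    Y = df.drop(columns=NON_TARGET_COLUMNS)
    return X, Y, list(Y.columns)


def build_model():
    """TF-IDF features followed by one random forest per category."""
    return Pipeline([
        ('vect', CountVectorizer()),  # the notebook's KNN pipeline uses tokenizer=tokenize here
        ('tfidf', TfidfTransformer()),
        ('clf', MultiOutputClassifier(RandomForestClassifier(n_estimators=100))),
    ])


def evaluate_model(model, X_test, Y_test, category_names):
    """Print one classification report per category."""
    Y_pred = model.predict(X_test)
    for i, name in enumerate(category_names):
        print(name)
        print(classification_report(Y_test.iloc[:, i], Y_pred[:, i]))


def save_model(model, model_filepath):
    """Pickle the fitted model to model_filepath."""
    with open(model_filepath, 'wb') as f:
        pickle.dump(model, f)


def main(argv):
    """Expects argv = [script, database_filepath, model_filepath]; returns an exit code."""
    if len(argv) != 3:
        print('Usage: python train.py <database_filepath> <model_filepath>')
        return 1
    database_filepath, model_filepath = argv[1:]
    X, Y, category_names = load_data(database_filepath)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
    model = build_model()
    model.fit(X_train, Y_train)
    evaluate_model(model, X_test, Y_test, category_names)
    save_model(model, model_filepath)
    return 0


if __name__ == '__main__':
    main(sys.argv)
```

With the notebook's database this would be run as `python train.py project3.db classifier.pkl`, where `classifier.pkl` is an example output name.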