# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from database.
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y

In [1]:
# import libraries
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import re
import pickle
import nltk

nltk.download('punkt')
nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, make_scorer
from sklearn.svm import SVC

import warnings

warnings.simplefilter('ignore')

[nltk_data] Error loading punkt: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
[nltk_data] Error loading stopwords: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>


In [2]:
# load data from database
engine = create_engine('sqlite:///Messages.db')
df = pd.read_sql("SELECT * FROM Messages", engine)

X = df['message']
Y = df.drop(['id', 'message', 'original', 'genre'], axis = 1)
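
As a rough, self-contained illustration of the feature/target split above, the sketch below builds a tiny in-memory SQLite table with the standard-library `sqlite3` module instead of reading `Messages.db` with pandas. The three rows and the two label columns (`related`, `water`) are made up for illustration, and `X_demo`, `Y_demo` and `label_names` are hypothetical names.

```python
import sqlite3

# Hypothetical stand-in for Messages.db: an in-memory table with the same
# non-label columns as the notebook and only two label columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages (id INTEGER, message TEXT, original TEXT, "
    "genre TEXT, related INTEGER, water INTEGER)"
)
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "We need water", None, "direct", 1, 1),
        (2, "Roads are blocked", None, "news", 1, 0),
        (3, "Good morning", None, "social", 0, 0),
    ],
)

cursor = conn.execute("SELECT * FROM Messages ORDER BY id")
columns = [desc[0] for desc in cursor.description]
rows = [dict(zip(columns, row)) for row in cursor]
conn.close()

# Same split as in the cell above: the message text is the feature, and every
# column other than id, message, original and genre is a target label.
non_label_columns = {"id", "message", "original", "genre"}
label_names = [c for c in columns if c not in non_label_columns]

X_demo = [row["message"] for row in rows]
Y_demo = [[row[name] for name in label_names] for row in rows]
```

Each entry of `Y_demo` is one row of 0/1 labels, which is the multi-label shape that the rest of the notebook works with.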

### 2. Write a tokenization function to process your text data

In [3]:
def tokenize(text):
    """Normalize, tokenize and stem text string
    
    Args:
    text: string. String containing message for processing
       
    Returns:
    stemmed: list of strings. List containing normalized and stemmed word tokens
    """
    # Convert text to lowercase and remove punctuation
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    
    # Tokenize words
    tokens = word_tokenize(text)
    
    # Stem word tokens and remove stop words
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words("english"))  # set gives fast membership tests
    
    stemmed = [stemmer.stem(word) for word in tokens if word not in stop_words]
    
    return stemmed
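
To show what the normalization step in `tokenize` does on its own, here is a minimal sketch that needs no NLTK data. It is a simplification, not the function above: it splits on whitespace instead of calling `word_tokenize`, skips stemming, and uses a tiny hand-picked stop word list (`demo_stop_words`, hypothetical) rather than NLTK's English list.

```python
import re

def normalize_and_split(text):
    """Lowercase, replace non-alphanumeric characters with spaces, split on whitespace."""
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    return cleaned.split()

# Tiny stop word list for illustration only; NLTK's list is much longer.
demo_stop_words = {"we", "the", "in", "is", "and"}

tokens = [
    word
    for word in normalize_and_split("We need WATER, food & shelter in the camp!")
    if word not in demo_stop_words
]
print(tokens)  # ['need', 'water', 'food', 'shelter', 'camp']
```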

### 3. Build a machine learning pipeline
- You'll find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.

In [4]:
pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer = tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))
])
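
To make the first two pipeline steps more concrete, the sketch below computes term counts and TF-IDF weights by hand for three made-up, already tokenized messages. It follows what we understand to be the scikit-learn defaults (alphabetically sorted vocabulary, smoothed IDF, L2 row normalization), but it is an illustration under those assumptions and not the library's implementation; `tfidf_rows` is a hypothetical helper.

```python
import math
from collections import Counter

def tfidf_rows(tokenized_docs, min_df=1, use_idf=True):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights."""
    n_docs = len(tokenized_docs)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    # min_df drops terms that appear in fewer than min_df documents.
    vocabulary = sorted(t for t, n in doc_freq.items() if n >= min_df)

    rows = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        weights = []
        for term in vocabulary:
            # Smoothed IDF; with use_idf=False only the raw counts are used.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 if use_idf else 1.0
            weights.append(counts[term] * idf)
        norm = math.sqrt(sum(w * w for w in weights))
        rows.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, rows

# Made-up tokenized messages.
docs = [["need", "water"], ["need", "food"], ["water", "water", "flood"]]

vocab_all, rows_all = tfidf_rows(docs)
vocab_min2, rows_min2 = tfidf_rows(docs, min_df=2, use_idf=False)
print(vocab_all)   # ['flood', 'food', 'need', 'water']
print(vocab_min2)  # ['need', 'water']
```

Raising `min_df` shrinks the vocabulary to terms seen in several messages, which is one reason it can reduce fitting time.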

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline

In [5]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state = 1)

np.random.seed(17)
pipeline.fit(X_train, Y_train)

Pipeline(steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...oob_score=False, random_state=None,
            verbose=0, warm_start=False),
           n_jobs=1))])

### 5. Test your model
Report the f1 score, precision and recall on both the training set and the test set. You can use sklearn's `classification_report` function here. 

In [6]:
def get_eval_metrics(actual, predicted, col_names):
    """Calculate evaluation metrics for ML model
    
    Args:
    actual: array. Array containing actual labels.
    predicted: array. Array containing predicted labels.
    col_names: list of strings. List containing names for each of the predicted fields.
       
    Returns:
    metrics_df: dataframe. Dataframe containing the accuracy, precision, recall 
    and f1 score for a given set of actual and predicted labels.
    """
    metrics = []
    
    # Calculate evaluation metrics for each set of labels
    for i in range(len(col_names)):
        accuracy = accuracy_score(actual[:, i], predicted[:, i])
        precision = precision_score(actual[:, i], predicted[:, i])
        recall = recall_score(actual[:, i], predicted[:, i])
        f1 = f1_score(actual[:, i], predicted[:, i])
        
        metrics.append([accuracy, precision, recall, f1])
    
    # Create dataframe containing metrics
    metrics = np.array(metrics)
    metrics_df = pd.DataFrame(data = metrics, index = col_names, columns = ['Accuracy', 'Precision', 'Recall', 'F1'])
      
    return metrics_df    

In [8]:
# Calculate evaluation metrics for training set
Y_train_pred = pipeline.predict(X_train)
col_names = list(Y.columns.values)

print(get_eval_metrics(np.array(Y_train), Y_train_pred, col_names))

                        Accuracy  Precision    Recall        F1
related                 0.990267   0.992651  0.994645  0.993647
request                 0.987603   0.997442  0.930233  0.962666
offer                   0.998617   1.000000  0.715789  0.834356
aid_related             0.984325   0.994847  0.967608  0.981039
medical_help            0.988935   0.999260  0.862620  0.925926
medical_products        0.992367   0.998817  0.850806  0.918889
search_and_rescue       0.993955   0.995624  0.796848  0.885214
security                0.995543   1.000000  0.749280  0.856672
military                0.994980   0.996317  0.849294  0.916949
water                   0.995031   0.999139  0.923567  0.959868
food                    0.995390   0.998111  0.960891  0.979147
shelter                 0.993545   0.997526  0.929683  0.962411
clothing                0.998207   1.000000  0.886364  0.939759
money                   0.995492   1.000000  0.809935  0.894988
missing_people          0.996773   1.000

In [20]:
# Calculate evaluation metrics for test set
Y_test_pred = pipeline.predict(X_test)

eval_metrics0 = get_eval_metrics(np.array(Y_test), Y_test_pred, col_names)
print(eval_metrics0)

                        Accuracy  Precision    Recall        F1
related                 0.805133   0.846543  0.909603  0.876941
request                 0.887967   0.795764  0.469643  0.590679
offer                   0.996465   0.000000  0.000000  0.000000
aid_related             0.744275   0.734845  0.592758  0.656198
medical_help            0.920240   0.500000  0.067437  0.118846
medical_products        0.953896   0.666667  0.130841  0.218750
search_and_rescue       0.976026   0.363636  0.026144  0.048780
security                0.980636   0.000000  0.000000  0.000000
military                0.966959   0.785714  0.049327  0.092827
water                   0.950515   0.809211  0.295673  0.433099
food                    0.937145   0.816901  0.560773  0.665029
shelter                 0.931919   0.772000  0.333333  0.465621
clothing                0.986476   0.846154  0.113402  0.200000
money                   0.978331   0.500000  0.042553  0.078431
missing_people          0.987245   0.000

Although test accuracy is high for every category, the F1 score is very low for most of them. This is likely due to class imbalance in the dataset, as the label proportions below show:

In [10]:
# Calculate the proportion of rows in each column that have label == 1
Y.sum()/len(Y)

related                   0.764792
request                   0.171892
offer                     0.004534
aid_related               0.417243
medical_help              0.080068
medical_products          0.050446
search_and_rescue         0.027816
security                  0.018096
military                  0.033041
water                     0.064239
food                      0.112302
shelter                   0.088904
clothing                  0.015560
money                     0.023206
missing_people            0.011449
refugees                  0.033618
death                     0.045874
other_aid                 0.132396
infrastructure_related    0.065506
transport                 0.046143
buildings                 0.051214
electricity               0.020440
tools                     0.006109
hospitals                 0.010873
shops                     0.004610
aid_centers               0.011872
other_infrastructure      0.044222
weather_related           0.280352
floods              

In many cases, fewer than 5% of the messages have a label of 1, which makes these cases harder for any model to predict than they would be if the data were balanced.
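
The effect of imbalance on the metrics can be reproduced with a few lines of plain Python. The sketch below uses 1000 hypothetical messages of which 5 are positive (roughly the proportion seen for `offer`) and a classifier that always predicts 0. As in the metrics reported above, an undefined precision or recall is treated as 0; `binary_metrics` is a hypothetical helper, not part of the notebook.

```python
def binary_metrics(actual, predicted):
    """Return (accuracy, precision, recall, f1) for 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

    accuracy = (tp + tn) / len(actual)
    # Undefined ratios (no predicted or no actual positives) are reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# Hypothetical label column: 5 positives out of 1000 messages.
actual = [1] * 5 + [0] * 995
always_negative = [0] * 1000

result = binary_metrics(actual, always_negative)
print(result)  # (0.995, 0.0, 0.0, 0.0)
```

Accuracy is 99.5% even though not a single positive message is found, which is why we look at F1 rather than accuracy.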

Ideally, we would use stratified sampling to create the train and test sets (as we would if Y had a single column). However, because each data point has multiple labels, this is not practical: we would effectively need a separate train/test split for each label column, and therefore a separate model fitted to each of the y-columns. This is not something that we wish to do.
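
For reference, stratified sampling on a single binary label amounts to splitting each class separately so that both sets keep the same label ratio. The sketch below shows this with the standard library only; `stratified_split` and the 20/80 label column are hypothetical. (With a single label, scikit-learn's `train_test_split` offers a `stratify` argument for the same purpose.)

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=1):
    """Return (train_idx, test_idx) with the same label ratio in both sets."""
    rng = random.Random(seed)

    # Group row indices by label value.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        # Shuffle within each class, then cut off the test share of that class.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical single label column: 20 positives, 80 negatives.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(train_rate, test_rate)  # 0.2 0.2
```

With many label columns, each message belongs to one combination of labels, and many combinations are too rare to divide between the two sets in this way.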

### 6. Improve your model
Use grid search to find better parameters. 

In [11]:
# Define performance metric for use in grid search scoring object
def performance_metric(y_true, y_pred):
    """Calculate median F1 score for all of the output classifiers
    
    Args:
    y_true: array. Array containing actual labels.
    y_pred: array. Array containing predicted labels.
        
    Returns:
    score: float. Median F1 score for all of the output classifiers
    """
    f1_list = []
    for i in range(np.shape(y_pred)[1]):
        f1 = f1_score(np.array(y_true)[:, i], y_pred[:, i])
        f1_list.append(f1)
        
    score = np.median(f1_list)
    return score

We use the median rather than the mean of the F1 scores of the output classifiers so that grid search does not select parameters that give a small number of output classifiers very high F1 scores while the majority have F1 scores close to zero.
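
A small numerical example of this choice, using made-up per-label F1 scores for two hypothetical candidate models:

```python
from statistics import mean, median

# Made-up per-label F1 scores, for illustration only.
few_strong = [0.9, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0]               # two good labels, five at zero
evenly_modest = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25]     # every label modest

# The mean ranks few_strong higher; the median ranks evenly_modest higher.
print(mean(few_strong) > mean(evenly_modest))      # True
print(median(few_strong) < median(evenly_modest))  # True
```

Scoring by the median therefore rewards parameter sets that help at least half of the labels, at the cost of ignoring how well the best labels do.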

In [12]:
# Create grid search object

parameters = {'vect__min_df': [1, 5],
              'tfidf__use_idf':[True, False],
              'clf__estimator__n_estimators':[10, 25], 
              'clf__estimator__min_samples_split':[2, 5, 10]}

scorer = make_scorer(performance_metric)
cv = GridSearchCV(pipeline, param_grid = parameters, scoring = scorer, verbose = 10)

# Find best parameters
np.random.seed(81)
tuned_model = cv.fit(X_train, Y_train)

Fitting 3 folds for each of 24 candidates, totalling 72 fits
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=1 
[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=1, score=0.124324, total=  56.9s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=1 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.2min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=1, score=0.106195, total=  60.0s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=1 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.5min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=1, score=0.100719, total=  53.2s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  3.6min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=5, score=0.171875, total=  37.6s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  4.5min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=5, score=0.187702, total=  37.6s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.3min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=True, vect__min_df=5, score=0.166667, total=  37.6s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=1 


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:  6.2min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=1, score=0.100000, total=  50.0s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=1 


[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:  7.3min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=1, score=0.154589, total=  47.5s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=1 


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:  8.3min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=1, score=0.106713, total=  47.0s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:  9.3min remaining:    0.0s


[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=5, score=0.172249, total=  34.4s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=5 
[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=5, score=0.148649, total=  33.2s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=5 
[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, tfidf__use_idf=False, vect__min_df=5, score=0.131687, total=  34.3s
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=25, tfidf__use_idf=True, vect__min_df=1 
[CV]  clf__estimator__min_samples_split=2, clf__estimator__n_estimators=25, tfidf__use_idf=True, vect__min_df=1, score=0.182104, total= 1.5min
[CV] clf__estimator__min_samples_split=2, clf__estimator__n_estimators=25, tfidf__use_i

[CV]  clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=1, score=0.119874, total= 1.1min
[CV] clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=1 
[CV]  clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=1, score=0.122093, total= 1.2min
[CV] clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=1 
[CV]  clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=1, score=0.115226, total= 1.2min
[CV] clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=5 
[CV]  clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use_idf=False, vect__min_df=5, score=0.193103, total=  49.8s
[CV] clf__estimator__min_samples_split=5, clf__estimator__n_estimators=25, tfidf__use

[Parallel(n_jobs=1)]: Done  72 out of  72 | elapsed: 79.0min finished


In [13]:
# Get results of grid search
tuned_model.cv_results_

{'mean_fit_time': array([ 48.80229942,  30.63276331,  41.31991569,  27.63146353,
         80.56032419,  57.90921696,  84.40504519,  53.07245533,
         30.96989163,  24.45328593,  31.2448051 ,  23.87683916,
         60.95307461,  44.52400939,  59.9849844 ,  42.74574137,
         26.69366407,  22.55170949,  27.93534462,  21.98424117,
         50.55584462,  40.29529627,  55.34990438,  40.39567153]),
 'mean_score_time': array([ 8.0182933 ,  7.09069133,  6.9417549 ,  6.46107388,  8.89388887,
         8.37596361,  8.85665544,  7.91749907,  6.98731232,  6.45905209,
         6.93411422,  6.47899397,  8.88157256,  8.19009089,  8.84734567,
         7.84068759,  6.85068019,  6.5129358 ,  7.07210008,  6.53254994,
         8.85168219,  8.0248584 ,  9.16380938,  8.25459743]),
 'mean_test_score': array([ 0.11041281,  0.17541464,  0.12043398,  0.15086157,  0.14351677,
         0.19335531,  0.13239604,  0.19945974,  0.18115013,  0.18715604,
         0.14114688,  0.19347891,  0.1103511 ,  0.19999305,

In [15]:
# Best mean test score
np.max(tuned_model.cv_results_['mean_test_score'])

0.20679501377175796

In [18]:
# Parameters for best mean test score
tuned_model.best_params_

{'clf__estimator__min_samples_split': 10,
 'clf__estimator__n_estimators': 10,
 'tfidf__use_idf': True,
 'vect__min_df': 5}

The best results (with regard to median F1 score) were achieved using the following parameters:
* CountVectorizer minimum df = 5
* TfidfTransformer use_idf = True
* Random Forest Classifier number of estimators = 10
* Random Forest Classifier minimum samples split = 10
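
The log above reports 24 candidates and 72 fits. As a quick check of where those numbers come from, the sketch below enumerates the same parameter grid with `itertools.product`; it only counts combinations and does not fit anything. The fold count of 3 is taken from the log line.

```python
from itertools import product

# Same grid as in the grid search cell above.
parameters = {
    "vect__min_df": [1, 5],
    "tfidf__use_idf": [True, False],
    "clf__estimator__n_estimators": [10, 25],
    "clf__estimator__min_samples_split": [2, 5, 10],
}

# Every combination of one value per parameter is one candidate.
candidates = [dict(zip(parameters, values)) for values in product(*parameters.values())]

n_folds = 3  # as reported in the grid search log
print(len(candidates), len(candidates) * n_folds)  # 24 72

# The best parameters reported above are one of these candidates.
best = {
    "clf__estimator__min_samples_split": 10,
    "clf__estimator__n_estimators": 10,
    "tfidf__use_idf": True,
    "vect__min_df": 5,
}
print(best in candidates)  # True
```

Since every added parameter value multiplies the number of fits, and the 72 fits above took 79 minutes, the grid has to stay small.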

### 7. Test your model
Show the accuracy, precision, and recall of the tuned model.

In [21]:
# Calculate evaluation metrics for test set
tuned_pred_test = tuned_model.predict(X_test)

eval_metrics1 = get_eval_metrics(np.array(Y_test), tuned_pred_test, col_names)

print(eval_metrics1)

                        Accuracy  Precision    Recall        F1
related                 0.809897   0.840577  0.926716  0.881547
request                 0.890733   0.778990  0.509821  0.616298
offer                   0.996465   0.000000  0.000000  0.000000
aid_related             0.751959   0.702087  0.690556  0.696274
medical_help            0.920547   0.505814  0.167630  0.251809
medical_products        0.957123   0.756098  0.193146  0.307692
search_and_rescue       0.978792   0.777778  0.137255  0.233333
security                0.980636   0.250000  0.008065  0.015625
military                0.967881   0.652174  0.134529  0.223048
water                   0.956201   0.822660  0.401442  0.539580
food                    0.939142   0.788732  0.618785  0.693498
shelter                 0.935915   0.745455  0.424870  0.541254
clothing                0.988167   0.777778  0.288660  0.421053
money                   0.979253   0.650000  0.092199  0.161491
missing_people          0.987245   0.000

In [23]:
# Get summary stats for first model
eval_metrics0.describe()

       Accuracy  Precision    Recall        F1
count      35.0       35.0      35.0      35.0
mean     0.9424   0.564538  0.187315  0.241425
std    0.057651   0.333674  0.243477  0.269195
min    0.744275        0.0       0.0       0.0
25%    0.931689   0.381818  0.016632  0.032458
50%    0.955586   0.733333   0.06474  0.116667
75%    0.980559   0.813056  0.324296  0.453287
max    0.996465        1.0  0.909603  0.876941


In [24]:
# Get summary stats for tuned model
eval_metrics1.describe()

       Accuracy  Precision    Recall        F1
count      35.0       35.0      35.0      35.0
mean   0.945009   0.578224  0.248886  0.311893
std    0.055767   0.301351  0.261627  0.273151
min    0.751959        0.0       0.0       0.0
25%    0.937529   0.472225  0.033616  0.064037
50%    0.957123    0.71049  0.149378  0.242424
75%    0.981712   0.783861  0.413156  0.540417
max    0.996465        1.0  0.926716  0.881547


Tuning the model parameters has increased both the median and the mean test F1 score of the output classifiers. However, 50% of the output classifiers still have an F1 score of less than 0.24, and 25% have an F1 score of less than 0.064. This is due to low recall (i.e. the proportion of actual positive points that were correctly labelled). Ideally, we would like to improve on this.
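
To see the change label by label, the sketch below compares the test F1 scores of the first model and the tuned model. It only covers the 14 labels whose F1 values are fully visible in the two printed tables above, so it is a partial view of the 35 labels.

```python
# Test F1 per label as (first model, tuned model), copied from the tables above.
f1_scores = {
    "related": (0.876941, 0.881547),
    "request": (0.590679, 0.616298),
    "offer": (0.000000, 0.000000),
    "aid_related": (0.656198, 0.696274),
    "medical_help": (0.118846, 0.251809),
    "medical_products": (0.218750, 0.307692),
    "search_and_rescue": (0.048780, 0.233333),
    "security": (0.000000, 0.015625),
    "military": (0.092827, 0.223048),
    "water": (0.433099, 0.539580),
    "food": (0.665029, 0.693498),
    "shelter": (0.465621, 0.541254),
    "clothing": (0.200000, 0.421053),
    "money": (0.078431, 0.161491),
}

improved = [name for name, (before, after) in f1_scores.items() if after > before]
unchanged = [name for name, (before, after) in f1_scores.items() if after == before]

print(len(improved), unchanged)  # 13 ['offer']
```

Among these labels, tuning raised the F1 score for all but `offer`, which stays at zero.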

### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF

To try to improve the model further, we will replace the Random Forest Classifier in the pipeline with a polynomial-kernel SVM classifier. We chose an SVM because SVMs are often used for text categorization tasks due to their “ability to process many thousand different inputs. This opens the opportunity to use all words in a text directly as features” ([Diederich et al., 2003](https://dl.acm.org/citation.cfm?id=776982)).

To keep the number of grid search cases to a minimum, we will keep the tuned parameter values for the CountVectorizer and TfidfTransformer found in the previous section.

In [26]:
# Try using SVM instead of Random Forest Classifier
pipeline2 = Pipeline([
    ('vect', CountVectorizer(tokenizer = tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(SVC()))
])

parameters2 = {'vect__min_df': [5],
              'tfidf__use_idf':[True],
              'clf__estimator__kernel': ['poly'], 
              'clf__estimator__degree': [1, 2, 3],
              'clf__estimator__C':[1, 10, 100]}

cv2 = GridSearchCV(pipeline2, param_grid = parameters2, scoring = scorer, verbose = 10)

# Find best parameters
np.random.seed(81)
tuned_model2 = cv2.fit(X_train, Y_train)

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[CV] clf__estimator__C=1, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 
[CV]  clf__estimator__C=1, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.9min
[CV] clf__estimator__C=1, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.8min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.9min
[CV] clf__estimator__C=1, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  9.6min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.9min
[CV] clf__estimator__C=1, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed: 14.4min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.9min
[CV] clf__estimator__C=1, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed: 19.2min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.9min
[CV] clf__estimator__C=1, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 23.9min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.8min
[CV] clf__estimator__C=1, clf__estimator__degree=3, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed: 28.6min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=3, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.8min
[CV] clf__estimator__C=1, clf__estimator__degree=3, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed: 33.3min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=3, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.8min
[CV] clf__estimator__C=1, clf__estimator__degree=3, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed: 37.9min remaining:    0.0s


[CV]  clf__estimator__C=1, clf__estimator__degree=3, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.8min
[CV] clf__estimator__C=10, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed: 42.6min remaining:    0.0s


[CV]  clf__estimator__C=10, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 3.3min
[CV] clf__estimator__C=10, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 
[CV]  clf__estimator__C=10, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 3.2min
[CV] clf__estimator__C=10, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 
[CV]  clf__estimator__C=10, clf__estimator__degree=1, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 3.2min
[CV] clf__estimator__C=10, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5 
[CV]  clf__estimator__C=10, clf__estimator__degree=2, clf__estimator__kernel=poly, tfidf__use_idf=True, vect__min_df=5, score=0.000000, total= 2.9min
[CV] clf__estimator__C=10, clf__estimator__

[Parallel(n_jobs=1)]: Done  27 out of  27 | elapsed: 141.4min finished


In [27]:
# Get results of grid search
tuned_model2.cv_results_

{'mean_fit_time': array([ 118.0909942 ,  114.99976007,  112.9869291 ,  133.46948107,
         120.80593928,  112.75887338,  157.64516552,  157.77353684,
         143.85721374]),
 'mean_score_time': array([ 56.57208792,  56.29817152,  54.97068222,  61.2765375 ,
         58.40214101,  55.44194261,  71.04607312,  69.71109605,  64.53057861]),
 'mean_test_score': array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]),
 'mean_train_score': array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]),
 'param_clf__estimator__C': masked_array(data = [1 1 1 10 10 10 100 100 100],
              mask = [False False False False False False False False False],
        fill_value = ?),
 'param_clf__estimator__degree': masked_array(data = [1 2 3 1 2 3 1 2 3],
              mask = [False False False False False False False False False],
        fill_value = ?),
 'param_clf__estimator__kernel': masked_array(data = ['poly' 'poly' 'poly' 'poly' 'poly' 'poly' 'poly' 'poly' 'poly'],
              mask = [False False

In all cases, the median F1 score is 0, so grid search cannot meaningfully select between the candidates.

In [28]:
# Calculate evaluation metrics for test set
tuned_pred_test2 = tuned_model2.predict(X_test)

eval_metrics2 = get_eval_metrics(np.array(Y_test), tuned_pred_test2, col_names)

print(eval_metrics2)

                        Accuracy  Precision  Recall        F1
related                 0.763332   0.763332     1.0  0.865784
request                 0.827878   0.000000     0.0  0.000000
offer                   0.996465   0.000000     0.0  0.000000
aid_related             0.588290   0.000000     0.0  0.000000
medical_help            0.920240   0.000000     0.0  0.000000
medical_products        0.950669   0.000000     0.0  0.000000
search_and_rescue       0.976487   0.000000     0.0  0.000000
security                0.980944   0.000000     0.0  0.000000
military                0.965729   0.000000     0.0  0.000000
water                   0.936069   0.000000     0.0  0.000000
food                    0.888735   0.000000     0.0  0.000000
shelter                 0.911019   0.000000     0.0  0.000000
clothing                0.985093   0.000000     0.0  0.000000
money                   0.978331   0.000000     0.0  0.000000
missing_people          0.987398   0.000000     0.0  0.000000
refugees

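The model performs well with regard to F1 score in one case ("related") but has an F1 score of 0 for every other label shown. The recall values (1.0 for "related", where precision equals accuracy, and 0.0 elsewhere) suggest that the SVM is simply predicting the majority class of each label for every message. We could try more parameter values for the SVM to find a combination that works, but instead we will stick with the original tuned model based on the Random Forest Classifier.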
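
The pattern in the table above (recall of 1.0 for "related" and 0.0 for the other labels) is what a classifier produces when it outputs a single constant value per label. A quick way to check for this is sketched below with made-up predictions; `constant_columns` is a hypothetical helper and the 3x3 prediction matrix is not taken from the model.

```python
def constant_columns(predictions, col_names):
    """Return {label: value} for every column in which all predictions are identical."""
    flagged = {}
    for j, name in enumerate(col_names):
        values = {row[j] for row in predictions}
        if len(values) == 1:
            flagged[name] = values.pop()
    return flagged

# Made-up predictions for three messages and three labels.
demo_predictions = [
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
demo_names = ["related", "request", "water"]

flagged = constant_columns(demo_predictions, demo_names)
print(flagged)  # {'related': 1, 'request': 0}
```

The same check could be applied to the rows of a real prediction array to confirm whether a model has collapsed to constant outputs.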

### 9. Export your model as a pickle file

In [29]:
# Pickle best model
# Use a context manager so the file is closed once the model is written
with open('disaster_model.sav', 'wb') as f:
    pickle.dump(tuned_model, f)
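
As a sanity check on the export step, the sketch below writes an object with `pickle` and reads it back. It uses a plain dictionary (`model_stub`, hypothetical) in place of the fitted grid search object and a temporary directory in place of the working directory, so it runs without training anything.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted model; any picklable object works here.
model_stub = {"best_params": {"vect__min_df": 5}, "note": "placeholder for tuned_model"}

path = os.path.join(tempfile.mkdtemp(), "disaster_model.sav")

# Write the object, closing the file when done.
with open(path, "wb") as f:
    pickle.dump(model_stub, f)

# Read it back and confirm that nothing was lost.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_stub)  # True
```

One thing to keep in mind for the real model: pickle stores functions by reference, so a script that loads the pipeline will most likely need the `tokenize` function to be defined or importable under the same name.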

### 10. Use this notebook to complete `train.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.