# Using Reddit's API for Predicting Comments

In this project, we will practice two major skills: collecting data via an API request and building a binary predictor.

As we discussed in week 2, and earlier today, there are two components to starting a data science problem: the problem statement, and acquiring the data.

For this project, your problem statement will be: _What characteristics of a post on Reddit contribute most to which subreddit it belongs to?_

Your method for acquiring the data will be scraping threads from at least two subreddits. 

Once you've got the data, you will build a classification model that, using Natural Language Processing and any other relevant features, predicts which subreddit a given post belongs to.
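As a rough sketch of that workflow, a vectorizer and a classifier can be chained in a scikit-learn `Pipeline` and tuned together with `GridSearchCV`. The toy titles, the choice of `MultinomialNB`, and the grid values below are illustrative assumptions, not the data or settings used later in this notebook. Because the vectorizer sits inside the pipeline, it is refit on each training fold, so held-out text does not influence the vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in for post titles and labels (hypothetical data; 1 = second subreddit).
titles = [f"kenobi hello there number {i}" for i in range(20)] + \
         [f"kylo ren rey number {i}" for i in range(20)]
labels = [0] * 20 + [1] * 20

X_train, X_test, y_train, y_test = train_test_split(
    titles, labels, stratify=labels, random_state=0)

# The vectorizer is a pipeline step, so it is fit only on training folds.
pipe = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# Step-name prefixes route each parameter to the right pipeline step.
param_grid = {
    "vec__ngram_range": [(1, 1), (1, 2)],
    "vec__max_df": [0.75, 1.0],
    "clf__alpha": [0.5, 1.0],
}

gs = GridSearchCV(pipe, param_grid=param_grid, cv=3)
gs.fit(X_train, y_train)

print(gs.best_params_)
print("Test accuracy:", gs.score(X_test, y_test))
```

The same pattern should extend to other classifiers by swapping the `"clf"` step and its grid entries.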

### Scraping Thread Info from Reddit.com

In [1]:
import json
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
# borrowed from http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis



In [2]:
# I ended up using Node and the pushshift.io API

# the code for getting the files is located in the pushshift.js file

# I also made a function for merging all of the files in the merge.js file
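
The Node scripts themselves are not shown here. For readers who prefer Python, a minimal sketch of the same idea might look like the block below. The endpoint URL, the `subreddit`, `size`, and `before` query parameters, and the `data`, `title`, and `created_utc` response fields are assumptions based on how the Pushshift submission search was commonly used; check them against the current Pushshift documentation, since access rules may have changed. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; not taken from the notebook's pushshift.js.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_url(subreddit, before=None, size=100):
    """Build a query URL for one page of submissions (assumed parameter names)."""
    params = {"subreddit": subreddit, "size": size}
    if before is not None:
        params["before"] = before
    return BASE_URL + "?" + urlencode(params)


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_titles(subreddit, pages=2, fetch=http_get_json):
    """Page backwards in time, using the last post's timestamp as the next cutoff."""
    titles, before = [], None
    for _ in range(pages):
        posts = fetch(build_url(subreddit, before)).get("data", [])
        if not posts:
            break
        titles.extend(post["title"] for post in posts)
        before = posts[-1]["created_utc"]
    return titles
```

Passing a different `fetch` callable makes it possible to exercise the paging logic without touching the network.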

In [3]:
sm = pd.read_json("./json2/SequelMemes.json")
pm = pd.read_json("./json2/PrequelMemes.json")

In [4]:
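# Copy the title columns so the label assignment below does not write to a slice of the original frames
sm_titles = sm[["title"]].copy()
pm_titles = pm[["title"]].copy()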

sm_titles["is_sequel_meme"] = 1
pm_titles["is_sequel_meme"] = 0

meme_titles = pd.concat([pm_titles,sm_titles])
print(meme_titles.head())
print(meme_titles.tail())

                                title  is_sequel_meme
0                      Drunk Politics               0
1                 When the Fun Begins               0
2                      Just one Windu               0
3  dlmoisttlotjidnftdsaydihbpjfastmne               0
4                     Drunk Democracy               0
                                                   title  is_sequel_meme
14595                        His swoleness got him #6!!!               1
14596  Looks like someone at my local brewery is a Se...               1
14597  MAGA.... Nah! Time to make the Republic great ...               1
14598           Take On Me except it's Leia slapping Poe               1
14599           Take On Me except it's Leia slapping Poe               1




In [5]:
# Plain list of title strings for the vectorizers
corpus = meme_titles["title"].tolist()

## NLP

#### Use `CountVectorizer` or `TfidfVectorizer` from scikit-learn to create features from the thread titles and descriptions (NOTE: Not all threads have a description)
- Examine using count or binary features in the model
- Re-evaluate your models using these. Does this improve the model performance? 
- What text features are the most valuable? 
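
To make the count-versus-binary distinction concrete, here is a small sketch on made-up titles (not the Reddit data). `CountVectorizer` records how many times each term occurs by default; with `binary=True` it only records whether the term is present.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical titles, chosen so one term repeats within a title.
titles = ["hello there hello there", "general kenobi", "hello general"]

counts = CountVectorizer().fit_transform(titles)
flags = CountVectorizer(binary=True).fit_transform(titles)

# Columns are ordered alphabetically: general, hello, kenobi, there
print(counts.toarray())
print(flags.toarray())
```

Only the first row differs between the two matrices, because it is the only title in which a term appears more than once. Short titles rarely repeat words, which may be why binary and count features often score similarly on data like this.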

In [6]:


# Fit the vectorizer on our corpus
cvec = CountVectorizer()
cvec.fit(corpus)

# Transform the corpus
new_corpus = cvec.transform(corpus)

In [7]:
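# Densify only for inspection; the full matrix is large.
# On scikit-learn 1.0+ use cvec.get_feature_names_out(); get_feature_names() was removed in 1.2.
df  = pd.DataFrame(new_corpus.todense(),
                   columns=cvec.get_feature_names())
df.head()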

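|  | 00 | 000 | 00001 | 00100000 | 007 | 009 | 00am | 01 | 01100001 | 01100100 | ... | œìž | œðÿ | širl | šã | šðÿ | žirl | žã | žæ | žè | ˆì |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |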
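
The head of a dense frame this wide is almost entirely zeros, so it says little about which terms matter. One alternative, sketched here on made-up titles rather than the real corpus, is to keep the matrix sparse and sum each column to get total term counts. The variable names are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the title corpus.
titles = ["hello there", "hello there general kenobi", "general grievous"]

vec = CountVectorizer()
matrix = vec.fit_transform(titles)  # stays sparse

# Column sums give the total count of each term across all titles.
totals = np.asarray(matrix.sum(axis=0)).ravel()

# vocabulary_ maps term -> column index; sorting by index labels each column.
terms = sorted(vec.vocabulary_, key=vec.vocabulary_.get)

# Rank by descending count, breaking ties alphabetically.
top_terms = sorted(zip(terms, totals), key=lambda pair: (-pair[1], pair[0]))[:3]
print(top_terms)
```

Applied to `new_corpus` and `cvec`, the same steps would give a frequency ranking without allocating the dense matrix.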


In [8]:
# Visualize Top Features
# https://medium.com/@aneesha/visualising-top-features-in-linear-svm-with-scikit-learn-and-matplotlib-3454ab18a14d

# Visualize frequency
# http://sdsawtelle.github.io/blog/output/spam-classification-part2-vectorization-and-svm-pipeline.html
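
The cell above only links to references. As a hedged sketch of the top-features idea, the block below ranks terms by the coefficients of a logistic regression fit on made-up titles; the data, labels, and variable names are assumptions for illustration and not the notebook's results. Large positive coefficients push toward class 1 and large negative ones toward class 0.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; labels follow the notebook's convention (1 = sequel meme).
titles = ["kenobi hello there", "kenobi high ground", "general kenobi",
          "rey and kylo", "kylo ren shirtless", "rey skywalker kylo"]
labels = [0, 0, 0, 1, 1, 1]

vec = CountVectorizer()
X = vec.fit_transform(titles)
model = LogisticRegression().fit(X, labels)

# Label each coefficient with its term by sorting the vocabulary by column index.
terms = np.array(sorted(vec.vocabulary_, key=vec.vocabulary_.get))
order = np.argsort(model.coef_[0])  # most negative coefficient first

print("Most prequel-like:", terms[order[:2]])
print("Most sequel-like:", terms[order[-2:]])
```

The ordered coefficients could then be passed to a bar chart, as the linked posts do.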



In [22]:
vectorizers_start = {
    "cvec" : CountVectorizer,
    "tfidf" : TfidfVectorizer,
}

vectorizers = {}

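# Build one vectorizer per combination of max_df and n-gram range, keyed by a descriptive name.
# The n-gram-only entries are rebuilt on every max_df pass, which just overwrites the same key.
for key,vec in vectorizers_start.items():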
    vectorizers[key] = vec(stop_words='english')
    for max_df in (0.25, 0.5, 0.75):
        vectorizers[f"{key}__max_df_of_{max_df}"] = vec(stop_words='english', max_df=max_df)
        for n_gram_range in [(1, 1), (1, 2), (1, 3), (1, 4)]:
            vectorizers[f"{key}__n_gram_range_of_{n_gram_range}"] = vec(stop_words='english', ngram_range=n_gram_range)
            vectorizers[f"{key}__n_gram_range_of_{n_gram_range}_max_df_of{max_df}"] = vec(stop_words='english', ngram_range=n_gram_range, max_df=max_df)

classifiers = {
    "bnb" : BernoulliNB(), #=> Fast and good enough score
    "mnb" : MultinomialNB(), #=> Fast and good enough score
    "logr": LogisticRegression(), #=> Best score, but super slow
#     "knn" : KNeighborsClassifier(), #=> Super overfit, not great scores
#     "tree" : DecisionTreeClassifier(),
    "rfc" : RandomForestClassifier(), #=> Very overfit, not much better than Naive Bayes
#     "gbc" : GradientBoostingClassifier(), #=> Not great, worse than Naive Bayes
#     "ada" : AdaBoostClassifier(), #=> Not great, worse than Naive Bayes
#     "svm" : SVC(), #=> painstakingly slow, couldn't run on my computer :/
}

# Borrowed from https://www.kaggle.com/mayu0116/hyper-parameters-tuning-of-dtree-rf-svm-knn
# and https://optunity.readthedocs.io/en/stable/notebooks/notebooks/sklearn-automated-classification.html
hyper_parameters = {
    "bnb" : {
        "alpha": np.linspace(0.0,1.0,10)
    },
    "mnb" : {
        "alpha": np.linspace(0.0,1.0,10)
    },
    "logr": {
        "penalty" : ['l1', 'l2'],
        "C": np.logspace(0, 10, 20)
    },
    "knn" : {
        'n_neighbors':[1,2,3,4,5],
        'weights':['uniform', 'distance'],
    },
    "svm" : {
        'kernel': ['linear', 'poly', 'rbf'],
        'C': [1, 2, 10, 50],
        'gamma': [0, 1],
        'degree': [2, 5],
        'coef0': [0, 1]
    },
    "tree": {
        
    },
    "rfc":{
        
    },
    "gbc":{
        "n_estimators" : [50,100,150]
    },
    "ada":{
        "base_estimator": [BernoulliNB(), MultinomialNB(), LogisticRegression()],
        "n_estimators" : [50,100,150]
    }
}

for key,val in vectorizers.items():
    print(f"Fitting {key}")
    # Note: fitting on the full corpus before splitting exposes test-set vocabulary to the vectorizer
    val.fit(corpus)
    # Transform the corpus
    X  = val.transform(corpus)
    # Use a 1-D Series so scikit-learn does not raise a DataConversionWarning
    y = meme_titles["is_sequel_meme"]
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    for class_key, classifier in classifiers.items():
        print(f"Scores for {key} using {class_key} Classifier")
        gs = GridSearchCV(classifier, param_grid=hyper_parameters[class_key], n_jobs=-1)
        gs.fit(X_train,y_train)
        print(f"Train data: {gs.score(X_train, y_train)}")
        print(f"{gs.best_estimator_}")
        print(f"{gs.best_params_}")
        print(f"Test data: {gs.score(X_test, y_test)}")
        print(f"{gs.best_estimator_}")
        print(f"{gs.best_params_}")
        # Three blank lines between result blocks
        print("\n\n")

Fitting cvec
Scores for cvec using bnb Classifier




Train data: 0.8430602767271509
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}
Test data: 0.7528364210828666
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}



Scores for cvec using mnb Classifier




Train data: 0.8424857566907646
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7539853511417492
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for cvec using logr Classifier




Train data: 0.8680518983099536
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7529800373402269
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec using rfc Classifier




Train data: 0.9568152439316321
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7340226913686629
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__max_df_of_0.25
Scores for cvec__max_df_of_0.25 using bnb Classifier




Train data: 0.8504811605304735
BernoulliNB(alpha=0.4444444444444444, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7598736176935229
BernoulliNB(alpha=0.4444444444444444, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for cvec__max_df_of_0.25 using mnb Classifier




Train data: 0.8490927371092066
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.756426827516875
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for cvec__max_df_of_0.25 using logr Classifier




Train data: 0.8667113515583856
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7588683038920006
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__max_df_of_0.25 using rfc Classifier




Train data: 0.9548044238042802
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7465173057590119
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 1)
Scores for cvec__n_gram_range_of_(1, 1) using bnb Classifier


Train data: 0.850337530521377
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7495332471635789
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 1) using mnb Classifier




Train data: 0.8417676066452817
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}
Test data: 0.7463736895016516
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}



Scores for cvec__n_gram_range_of_(1, 1) using logr Classifier




Train data: 0.8698233350888113
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7509694097371823
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 1) using rfc Classifier




Train data: 0.9552831905012688
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7304322849346546
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 1)_max_df_of0.25
Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.25 using bnb Classifier


Train data: 0.8500502705031838
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7552778974579922
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.25 using mnb Classifier




Train data: 0.8463637669363719
MultinomialNB(alpha=0.7777777777777777, class_prior=None, fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.755995978744794
MultinomialNB(alpha=0.7777777777777777, class_prior=None, fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.25 using logr Classifier




Train data: 0.8682912816584478
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7555651299727129
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.25 using rfc Classifier




Train data: 0.9570067506104275
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7291397386184116
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 2)
Scores for cvec__n_gram_range_of_(1, 2) using bnb Classifier


Train data: 0.9219131517211663
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7610225477524055
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for cvec__n_gram_range_of_(1, 2) using mnb Classifier




Train data: 0.9152582946330253
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}
Test data: 0.7590119201493609
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}



Scores for cvec__n_gram_range_of_(1, 2) using logr Classifier




Train data: 0.9398669028582372
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.764182105414333
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 2) using rfc Classifier




Train data: 0.9557619571982573
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7347407726554646
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 2)_max_df_of0.25
Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.25 using bnb Classifier


Train data: 0.9215301383635754
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7679161281057015
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.25 using mnb Classifier




Train data: 0.9247378752333988
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7663363492747379
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.25 using logr Classifier




Train data: 0.9397232728491406
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7630331753554502
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.25 using rfc Classifier




Train data: 0.9548523004739791
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7343099238833836
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 3)
Scores for cvec__n_gram_range_of_(1, 3) using bnb Classifier


Train data: 0.9368027959975104
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7627459428407296
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 3) using mnb Classifier




Train data: 0.9351271125580505
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7617406290392073
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 3) using logr Classifier




Train data: 0.9477186766888496
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7595863851788023
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 3) using rfc Classifier




Train data: 0.9571025039498252
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.73273014505242
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 3)_max_df_of0.25
Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.25 using bnb Classifier


Train data: 0.9320151290276248
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7795490449518886
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.25 using mnb Classifier




Train data: 0.9300043089002729
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7772511848341233
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.25 using logr Classifier




Train data: 0.9469047733039689
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7805543587534108
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.25 using rfc Classifier




Train data: 0.9561449705558481
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.740916271721959
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 4)
Scores for cvec__n_gram_range_of_(1, 4) using bnb Classifier


Train data: 0.933259922439795
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7605916989803245
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 4) using mnb Classifier




Train data: 0.931632115670034
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7620278615539279
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 4) using logr Classifier




Train data: 0.9492507301192129
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7677725118483413
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 4) using rfc Classifier




Train data: 0.9540383970890984
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.733160993824501
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 4)_max_df_of0.25
Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.25 using bnb Classifier


Train data: 0.9328290324125054
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7626023265833692
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.25 using mnb Classifier




Train data: 0.9350792358883516
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7607353152376849
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.25 using logr Classifier




Train data: 0.9484368267343324
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7644693379290536
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.25 using rfc Classifier




Train data: 0.9549480538133768
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7410598879793192
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__max_df_of_0.5
Scores for cvec__max_df_of_0.5 using bnb Classifier




Train data: 0.843347536745344
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}
Test data: 0.749102398391498
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}



Scores for cvec__max_df_of_0.5 using mnb Classifier




Train data: 0.8473691770000479
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7455119919574896
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__max_df_of_0.5 using logr Classifier




Train data: 0.8673816249341696
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7521183397960649
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__max_df_of_0.5 using rfc Classifier




Train data: 0.9553310671709676
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.740916271721959
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 1)_max_df_of0.5
Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.5 using bnb Classifier



Train data: 0.8505290372001724
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7534108861123079
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.5 using mnb Classifier




Train data: 0.8506247905395701
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7502513284503806
MultinomialNB(alpha=0.5555555555555556, class_prior=None, fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.5 using logr Classifier




Train data: 0.893809546607938
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7528364210828666
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.5 using rfc Classifier




Train data: 0.9552353138315699
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7390492603762746
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 2)_max_df_of0.5
Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.5 using bnb Classifier



Train data: 0.9253602719394839
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7738043946574752
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.5 using mnb Classifier




Train data: 0.9219610283908651
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7689214419072239
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.5 using logr Classifier




Train data: 0.9399626561976349
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7762458710326009
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.5 using rfc Classifier




Train data: 0.9557619571982573
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7453683757001293
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 3)_max_df_of0.5
Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.5 using bnb Classifier



Train data: 0.9367549193278115
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.764612954186414
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.5 using mnb Classifier




Train data: 0.9264135586728587
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}
Test data: 0.7649001867011346
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.5 using logr Classifier




Train data: 0.9490113467707186
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7628895590980899
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.5 using rfc Classifier




Train data: 0.9574376406377172
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7341663076260233
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 4)_max_df_of0.5
Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.5 using bnb Classifier



Train data: 0.9375688227126922
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7663363492747379
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.5 using mnb Classifier




Train data: 0.9313927323215397
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7653310354732156
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.5 using logr Classifier




Train data: 0.9497773734859003
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.764612954186414
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.5 using rfc Classifier




Train data: 0.956527983913439
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7367514002585093
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__max_df_of_0.75
Scores for cvec__max_df_of_0.75 using bnb Classifier




Train data: 0.8523483506487288
BernoulliNB(alpha=0.4444444444444444, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7525491885681459
BernoulliNB(alpha=0.4444444444444444, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for cvec__max_df_of_0.75 using mnb Classifier




Train data: 0.8517738306123426
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7536981186270286
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for cvec__max_df_of_0.75 using logr Classifier




Train data: 0.8959639967443864
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7518311072813443
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for cvec__max_df_of_0.75 using rfc Classifier




Train data: 0.9529851103557236
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7386184116041936
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 1)_max_df_of0.75
Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.75 using bnb Classifier



Train data: 0.8431081533968497
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}
Test data: 0.749102398391498
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
{'alpha': 1.0}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.75 using mnb Classifier




Train data: 0.8424857566907646
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7468045382737326
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.75 using logr Classifier




Train data: 0.8694881984009192
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7567140600315956
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 1)_max_df_of0.75 using rfc Classifier




Train data: 0.9557140805285584
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7406290392072382
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 2)_max_df_of0.75
Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.75 using bnb Classifier



Train data: 0.9237803418394217
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7687778256498635
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.75 using mnb Classifier




Train data: 0.9324460190549145
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7679161281057015
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.75 using logr Classifier




Train data: 0.9397232728491406
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7686342093925033
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 2)_max_df_of0.75 using rfc Classifier




Train data: 0.9548044238042802
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7466609220163722
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 3)_max_df_of0.75
Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.75 using bnb Classifier



Train data: 0.9315363623306363
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7657618842452966
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.75 using mnb Classifier




Train data: 0.9331641691003973
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7637512566422519
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.75 using logr Classifier




Train data: 0.948245320055537
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7684905931351429
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 3)_max_df_of0.75 using rfc Classifier




Train data: 0.9573897639680183
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7366077840011489
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting cvec__n_gram_range_of_(1, 4)_max_df_of0.75
Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.75 using bnb Classifier



Train data: 0.9300521855699717
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7618842452965676
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.75 using mnb Classifier




Train data: 0.9312012256427443
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7656182679879362
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.75 using logr Classifier




Train data: 0.9488677167616221
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7654746517305759
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for cvec__n_gram_range_of_(1, 4)_max_df_of0.75 using rfc Classifier




Train data: 0.9549001771436779
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7308631337067356
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf
Scores for tfidf using bnb Classifier




Train data: 0.8464116436060708
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7503949447077409
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for tfidf using mnb Classifier




Train data: 0.8608703978551252
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7492460146488582
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf using logr Classifier




Train data: 0.8786326423134007
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7511130259945425
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf using rfc Classifier




Train data: 0.9559055872073539
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7426396668102829
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__max_df_of_0.25
Scores for tfidf__max_df_of_0.25 using bnb Classifier




Train data: 0.8420548666634748
BernoulliNB(alpha=0.8888888888888888, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7577193738331179
BernoulliNB(alpha=0.8888888888888888, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for tfidf__max_df_of_0.25 using mnb Classifier




Train data: 0.8536888974002969
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7558523624874336
MultinomialNB(alpha=0.6666666666666666, class_prior=None, fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for tfidf__max_df_of_0.25 using logr Classifier




Train data: 0.8493799971273999
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7498204796782996
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for tfidf__max_df_of_0.25 using rfc Classifier




Train data: 0.9558098338679561
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7462300732442912
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
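
The train and test scores in this output are hard to compare by eye. The following is a minimal sketch, not part of the original notebook, that collects them into one row per configuration and classifier. It assumes the printed output has been captured as a string; `SAMPLE_LOG` holds a few lines copied from the output above, and `parse_scores` is a hypothetical helper name. Headers that do not follow the full `Scores for ... using ... Classifier` pattern are skipped rather than guessed.

```python
import re

# Hypothetical sample: a few lines copied from the grid-search output above.
SAMPLE_LOG = """\
Fitting tfidf__max_df_of_0.25
Scores for tfidf__max_df_of_0.25 using bnb Classifier
Train data: 0.8420548666634748
Test data: 0.7577193738331179
Scores for tfidf__max_df_of_0.25 using mnb Classifier
Train data: 0.8536888974002969
Test data: 0.7558523624874336
"""

HEADER = re.compile(r"^Scores for (?P<config>.+) using (?P<clf>\w+) Classifier$")
SCORE = re.compile(r"^(?P<split>Train|Test) data: (?P<score>[0-9.]+)$")


def parse_scores(log_text):
    """Return one dict per (config, classifier) with its train and test score."""
    rows = []
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Scores for"):
            header = HEADER.match(line)
            if header:
                current = {"config": header["config"], "clf": header["clf"]}
                rows.append(current)
            else:
                # Incomplete header: ignore the scores that follow it.
                current = None
            continue
        score = SCORE.match(line)
        if score and current is not None:
            current[score["split"].lower()] = float(score["score"])
    return rows


rows = parse_scores(SAMPLE_LOG)

# Sort by test score and show the train/test gap as a rough overfitting signal.
for row in sorted(rows, key=lambda r: r["test"], reverse=True):
    gap = row["train"] - row["test"]
    print(f'{row["config"]:<25} {row["clf"]:<5} test={row["test"]:.4f} gap={gap:.4f}')
```

Sorting by test score with the gap alongside makes it easier to spot cases such as the random forest runs, where the train score is far above the test score.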



Fitting tfidf__n_gram_range_of_(1, 1)
Scores for tfidf__n_gram_range_of_(1, 1) using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.8472255469909513
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7611661640097659
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 1) using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8509120505577632
MultinomialNB(alpha=0.7777777777777777, class_prior=None, fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7617406290392073
MultinomialNB(alpha=0.7777777777777777, class_prior=None, fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for tfidf__n_gram_range_of_(1, 1) using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8756642887920716
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7637512566422519
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 1) using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.956192847225547
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7475226195605342
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 1)_max_df_of0.25
Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.25 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.8453583568726959
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7509694097371823
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.25 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8604873844975344
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7499640959356599
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.25 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8517259539426437
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7535545023696683
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.25 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.957485517307416
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.739623725405716
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 2)
Scores for tfidf__n_gram_range_of_(1, 2) using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.91966294824532
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7686342093925033
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for tfidf__n_gram_range_of_(1, 2) using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9443673098099297
MultinomialNB(alpha=0.2222222222222222, class_prior=None, fit_prior=True)
{'alpha': 0.2222222222222222}
Test data: 0.7682033606204223
MultinomialNB(alpha=0.2222222222222222, class_prior=None, fit_prior=True)
{'alpha': 0.2222222222222222}



Scores for tfidf__n_gram_range_of_(1, 2) using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9462823765978838
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7660491167600172
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 2) using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9563364772346435
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.739192876633635
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 2)_max_df_of0.25
Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.25 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.9295255422032843
BernoulliNB(alpha=0.4444444444444444, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7686342093925033
BernoulliNB(alpha=0.4444444444444444, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.25 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9388136161248624
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7677725118483413
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.25 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9458514865705941
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7611661640097659
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.25 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9580121606741036
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7420652017808416
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 3)
Scores for tfidf__n_gram_range_of_(1, 3) using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.9323981423852157
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7722246158265116
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for tfidf__n_gram_range_of_(1, 3) using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9514051802556615
MultinomialNB(alpha=0.2222222222222222, class_prior=None, fit_prior=True)
{'alpha': 0.2222222222222222}
Test data: 0.7667671980468189
MultinomialNB(alpha=0.2222222222222222, class_prior=None, fit_prior=True)
{'alpha': 0.2222222222222222}



Scores for tfidf__n_gram_range_of_(1, 3) using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9530329870254225
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7684905931351429
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 3) using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9576770239862116
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7498204796782996
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 3)_max_df_of0.25
Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.25 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.9339301958155791
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7689214419072239
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.25 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9500167568343946
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7673416630762603
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.25 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9534160003830133
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7651874192158552
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.25 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9584430507013932
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.740485422949878
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 4)
Scores for tfidf__n_gram_range_of_(1, 4) using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.9339301958155791
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7601608502082435
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 4) using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9499688801646957
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7786873474077266
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for tfidf__n_gram_range_of_(1, 4) using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9532244937042179
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7792618124371679
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 4) using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9584430507013932
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7447939106706879
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 4)_max_df_of0.25
Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.25 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.9338823191458802
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.772799080855953
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.25 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9514051802556615
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7723682320838718
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.25 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9540862737587973
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7707884532529082
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.25 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9586824340498875
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7449375269280483
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__max_df_of_0.5
Scores for tfidf__max_df_of_0.5 using bnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8501460238425815
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7501077121930203
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for tfidf__max_df_of_0.5 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8512471872456552
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7479534683326152
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for tfidf__max_df_of_0.5 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8532580073730072
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7496768634209392
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for tfidf__max_df_of_0.5 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9569109972710298
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7394801091483556
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 1)_max_df_of0.5
Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.5 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.8481352037152295
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7581502226051989
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.5 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8584765643701824
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7582938388625592
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.5 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8493799971273999
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7610225477524055
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.5 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9563843539043424
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7446502944133276
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 2)_max_df_of0.5
Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.5 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.922679178436348
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7659055005026569
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.5 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.941255326279504
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7657618842452966
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.5 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9456599798917987
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7663363492747379
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.5 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.958203667352899
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7424960505529226
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 3)_max_df_of0.5
Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.5 using bnb Classifier

  y = column_or_1d(y, warn=True)


Train data: 0.9338344424761813
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7752405572310785
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.5 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9486762100828267
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7742352434295562
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.5 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9527936036769282
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7670544305615395
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.5 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9594484607650692
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7449375269280483
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 4)_max_df_of0.5
Scores for tfidf__n_gram_range_of_(1,

  y = column_or_1d(y, warn=True)


Train data: 0.9330205390913008
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7581502226051989
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.5 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.948245320055537
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7705012207381876
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.5 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9539905204193996
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.764182105414333
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.5 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9583951740316944
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7436449806118053
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__max_df_of_0.75
Scores for tfidf__max_df_of_0.75 using bnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8505290372001724
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.756426827516875
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__max_df_of_0.75 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8607746445157275
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}
Test data: 0.7535545023696683
MultinomialNB(alpha=0.4444444444444444, class_prior=None, fit_prior=True)
{'alpha': 0.4444444444444444}



Scores for tfidf__max_df_of_0.75 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8525877339972232
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7574321413183972
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for tfidf__max_df_of_0.75 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9562407238952458
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7356024701996265
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 1)_max_df_of0.75
Scores for tfidf__n_gram_range_of_(1

  y = column_or_1d(y, warn=True)


Train data: 0.8455019868817925
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}
Test data: 0.7515438747666235
BernoulliNB(alpha=0.7777777777777777, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.7777777777777777}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.75 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8499545171637861
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}
Test data: 0.7493896309062186
MultinomialNB(alpha=0.8888888888888888, class_prior=None, fit_prior=True)
{'alpha': 0.8888888888888888}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.75 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.8517259539426437
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}
Test data: 0.7531236535975873
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
{'C': 1.0, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 1)_max_df_of0.75 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9558098338679561
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7442194456412465
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 2)_max_df_of0.75
Scores for tfidf__n_gram_range_of_(1

  y = column_or_1d(y, warn=True)


Train data: 0.9233973284818308
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}
Test data: 0.7692086744219445
BernoulliNB(alpha=0.6666666666666666, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.6666666666666666}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.75 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9412074496098052
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7705012207381876
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.75 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.962177430937904
LogisticRegression(C=11.28837891684689, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 11.28837891684689, 'penalty': 'l2'}
Test data: 0.7651874192158552
LogisticRegression(C=11.28837891684689, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 11.28837891684689, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 2)_max_df_of0.75 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.959496337434768
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7386184116041936
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 3)_max_df_of0.75
Scores for tfidf__n_gram_range_of_(1,

  y = column_or_1d(y, warn=True)


Train data: 0.9350792358883516
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.764612954186414
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.75 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9522669603102408
MultinomialNB(alpha=0.2222222222222222, class_prior=None, fit_prior=True)
{'alpha': 0.2222222222222222}
Test data: 0.7697831394513859
MultinomialNB(alpha=0.2222222222222222, class_prior=None, fit_prior=True)
{'alpha': 0.2222222222222222}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.75 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9528414803466271
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7621714778112882
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 3)_max_df_of0.75 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9587303107195864
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7440758293838863
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}



Fitting tfidf__n_gram_range_of_(1, 4)_max_df_of0.75
Scores for tfidf__n_gram_range_of_(1

  y = column_or_1d(y, warn=True)


Train data: 0.9338823191458802
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}
Test data: 0.7608789314950453
BernoulliNB(alpha=0.5555555555555556, binarize=0.0, class_prior=None,
      fit_prior=True)
{'alpha': 0.5555555555555556}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.75 using mnb Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9520754536314454
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}
Test data: 0.7664799655320982
MultinomialNB(alpha=0.3333333333333333, class_prior=None, fit_prior=True)
{'alpha': 0.3333333333333333}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.75 using logr Classifier


  y = column_or_1d(y, warn=True)


Train data: 0.9548044238042802
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}
Test data: 0.7663363492747379
LogisticRegression(C=3.3598182862837818, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
{'C': 3.3598182862837818, 'penalty': 'l2'}



Scores for tfidf__n_gram_range_of_(1, 4)_max_df_of0.75 using rfc Classifier


  self.best_estimator_.fit(X, y, **fit_params)


Train data: 0.9603581174893474
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}
Test data: 0.7381875628321126
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
{}





## Predicting subreddit using Random Forests + Another Classifier

In [10]:
# ## YOUR CODE HERE
# from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in newer scikit-learn releases
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.tree import DecisionTreeClassifier

# X = meme_titles[["title"]]
# y = meme_titles["is_sequel_meme"]  # 1-D Series; a one-column DataFrame triggers the column_or_1d warning

# X_train, X_test, y_train, y_test = train_test_split(X, y)

In [11]:
# rf = RandomForestClassifier()
# rf.fit(X_train,y_train)


#### We want to predict a binary variable - class `0` for one of your subreddits and `1` for the other.

In [12]:
## YOUR CODE HERE
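
One way to build the binary target is sketched below. It assumes the scraped posts sit in a DataFrame with a `subreddit` column; the DataFrame name `posts`, the column names, and the subreddit names `subreddit_a` and `subreddit_b` are placeholders, not the ones used in this project.

```python
import pandas as pd

# Toy stand-in for the scraped posts; names and titles are placeholders.
posts = pd.DataFrame({
    "title": ["first post", "second post", "third post", "fourth post"],
    "subreddit": ["subreddit_a", "subreddit_b", "subreddit_a", "subreddit_b"],
})

# Class 1 for subreddit_b, class 0 for subreddit_a.
posts["target"] = (posts["subreddit"] == "subreddit_b").astype(int)

# Single brackets give a 1-D Series, which is the shape scikit-learn expects for y.
y = posts["target"]
print(y.value_counts())
```

Keeping `y` one-dimensional should also avoid the `column_or_1d` warning that appears in the grid-search output earlier in this notebook.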

#### Thought experiment: What is the baseline accuracy for this model?

In [23]:
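# A coin flip (about 50%) if the two subreddits are equally represented;
# otherwise the baseline is the share of the majority class.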
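
The baseline can also be computed directly from the labels. This is a minimal sketch with made-up label lists; `baseline_accuracy` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)


balanced = [0, 1, 0, 1]  # equal class sizes: a coin flip
skewed = [0, 0, 0, 1]    # three of four posts belong to class 0

print(baseline_accuracy(balanced))  # 0.5
print(baseline_accuracy(skewed))    # 0.75
```

Any model worth keeping should beat this number on the test split.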

#### Create a `RandomForestClassifier` model to predict which subreddit a given post belongs to.

In [14]:
## YOUR CODE HERE
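
A minimal sketch of such a model, assuming post titles are the only feature. The toy titles and labels are invented for illustration, and the step names `tfidf` and `rfc` simply mirror the labels in the output above; the hyperparameters are arbitrary starting points.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Invented toy data: class 0 is one subreddit, class 1 the other.
titles = [
    "my cat sleeps all day", "cute kitten photo",
    "cat chasing a laser", "kitten adopted today",
    "python list comprehension help", "bug in my python loop",
    "how to install python packages", "python dictionary question",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# The vectorizer turns raw text into numeric features before the forest sees it.
rf_pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("rfc", RandomForestClassifier(n_estimators=100, random_state=42)),
])
rf_pipe.fit(titles, labels)

preds = rf_pipe.predict(["new kitten photo", "python loop question"])
print(preds)
```

Wrapping both steps in a `Pipeline` means the vectorizer is fitted only on whatever data is passed to `fit`, which matters once cross-validation is involved.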

#### Use cross-validation in scikit-learn to evaluate the model above. 
- Evaluate the accuracy of the model, as well as any other metrics you feel are appropriate. 
- **Bonus**: Use `GridSearchCV` with `Pipeline` to optimize your `CountVectorizer`/`TfidfVectorizer` and classification model.

In [15]:
## YOUR CODE HERE
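
The evaluation and the bonus step could look roughly like the sketch below. The data is an invented toy set, and the grid values are examples only, not the grid that produced the output earlier in this notebook. Note that the `TfidfVectorizer` parameter is spelled `ngram_range`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Invented toy data: six titles per class so 3-fold stratified CV works.
titles = [
    "my cat sleeps all day", "cute kitten photo", "cat chasing a laser",
    "kitten adopted today", "why does my cat purr", "cat knocked over a plant",
    "python list comprehension help", "bug in my python loop",
    "how to install python packages", "python dictionary question",
    "debugging a python script", "python function returns none",
]
labels = [0] * 6 + [1] * 6

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rfc", RandomForestClassifier(n_estimators=50, random_state=42)),
])

# Plain cross-validation of the untuned pipeline.
cv_scores = cross_val_score(pipe, titles, labels, cv=3, scoring="accuracy")
print("Accuracy per fold:", cv_scores)

# Grid search over vectorizer settings; keys are "<step name>__<parameter>".
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__max_df": [0.5, 0.75],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(titles, labels)
print(search.best_params_, search.best_score_)
```

`best_score_` is the mean cross-validated accuracy of the best parameter combination, so it is a fairer estimate than the training score.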

#### Repeat the model-building process using a different classifier (e.g. `MultinomialNB` or `LogisticRegression`).

In [16]:
## YOUR CODE HERE
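
As a sketch of the same process with other classifiers, the loop below fits `MultinomialNB` and `LogisticRegression` on an invented toy set and records train and test accuracy for each. The step names `mnb` and `logr` follow the labels in the output above; the hyperparameter values are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy data: class 0 is one subreddit, class 1 the other.
titles = [
    "my cat sleeps all day", "cute kitten photo", "cat chasing a laser",
    "kitten adopted today", "why does my cat purr", "cat knocked over a plant",
    "python list comprehension help", "bug in my python loop",
    "how to install python packages", "python dictionary question",
    "debugging a python script", "python function returns none",
]
labels = [0] * 6 + [1] * 6

X_train, X_test, y_train, y_test = train_test_split(
    titles, labels, test_size=0.25, stratify=labels, random_state=42
)

classifiers = {
    "mnb": MultinomialNB(alpha=0.5),
    "logr": LogisticRegression(C=1.0, solver="liblinear"),
}

scores = {}
for name, clf in classifiers.items():
    pipe = Pipeline([("tfidf", TfidfVectorizer()), (name, clf)])
    pipe.fit(X_train, y_train)
    # Store (train accuracy, test accuracy) so the gap is easy to inspect.
    scores[name] = (pipe.score(X_train, y_train), pipe.score(X_test, y_test))
    print(name, "train:", scores[name][0], "test:", scores[name][1])
```

Comparing the two numbers per model makes overfitting visible: a large gap between train and test accuracy suggests the model is memorizing the training titles.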

# Executive Summary
---
Put your executive summary in a Markdown cell below.