# Neural Network: DC Heroes Classifier
Deep Learning Module: Neural Networks
Goal: Create a multi-layer perceptron (MLP) neural network model to predict a target on a labeled dataset of your choosing, varying the MLP's hyperparameters. Then compare this model to a random forest model and describe the relative tradeoffs between complexity and accuracy.

## Data Set Description:
This folder contains the data behind the story "Comic Books Are Still Made By Men, For Men And About Men."

The data comes from DC Wikia. Characters were scraped on August 24. Appearance counts were scraped on September 2. The month and year of the first issue each character appeared in was pulled on October 6.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from collections import Counter

import os
import seaborn as sns

plt.style.use('ggplot')
from tqdm import tqdm

import re
from scipy.cluster.vq import kmeans, vq
from pylab import plot, show
from matplotlib.lines import Line2D
import matplotlib.colors as mcolors

from sklearn.cluster import KMeans
from sklearn import neighbors
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Import Perceptron.
from sklearn.linear_model import Perceptron

In [2]:
# Load Dataset
dc_hero_df = pd.read_csv('/Users/mehrunisaqayyum/Downloads/dc-wikia-data.csv')
dc_hero_df

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939.0
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986.0
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959.0
3,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987.0
4,1576,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,1237.0,"1940, April",1940.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6891,66302,Nadine West (New Earth),\/wiki\/Nadine_West_(New_Earth),Public Identity,Good Characters,,,Female Characters,,Living Characters,,,
6892,283475,Warren Harding (New Earth),\/wiki\/Warren_Harding_(New_Earth),Public Identity,Good Characters,,,Male Characters,,Living Characters,,,
6893,283478,William Harrison (New Earth),\/wiki\/William_Harrison_(New_Earth),Public Identity,Good Characters,,,Male Characters,,Living Characters,,,
6894,283471,William McKinley (New Earth),\/wiki\/William_McKinley_(New_Earth),Public Identity,Good Characters,,,Male Characters,,Living Characters,,,


In [3]:
# What are our column labels? 
dc_hero_df.columns

Index(['page_id', 'name', 'urlslug', 'ID', 'ALIGN', 'EYE', 'HAIR', 'SEX',
       'GSM', 'ALIVE', 'APPEARANCES', 'FIRST APPEARANCE', 'YEAR'],
      dtype='object')

## Model Preparation
We do not need to normalize the data when the columns are mainly dummy variables with values of 0 and 1.
We can't use dummies (a type of encoder) for the target of the Perceptron model, so we must use a different type of encoder for the predicted value, which is a categorical variable. Use LabelEncoder: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
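
As a minimal sketch of what the label encoder does, the example below encodes a few toy labels that mirror the ALIGN categories. The list `labels` is illustrative data, not the real column. Each string is mapped to its position in the sorted list of unique labels.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels mirroring the ALIGN categories (illustrative data, not the real column).
labels = ["Good Characters", "Bad Characters", "Neutral Characters", "Good Characters"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the sorted unique labels; each integer code is a position in this array.
print(list(encoder.classes_))
print(list(encoded))

# inverse_transform maps the integer codes back to the original strings.
print(list(encoder.inverse_transform(encoded)))
```

Because the classes are sorted alphabetically, "Bad Characters" becomes 0 and "Good Characters" becomes 1 in this toy example.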

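### Normalize Data
Standardize so that all variables have a mean of 0 and a standard deviation of 1.

Code to run if needed (import StandardScaler from sklearn.preprocessing first, and apply it to the numeric feature frame new_df2 created below, because new_df still contains string columns):
X = StandardScaler().fit_transform(new_df2)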
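
A hedged sketch of standardization on toy data follows. The arrays `X_train_toy` and `X_test_toy` are made-up stand-ins for numeric columns such as APPEARANCES and YEAR. The scaler is fit on the training rows only and then reused for the test rows, which is a common way to keep test information out of the fit.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy numeric features standing in for APPEARANCES and YEAR (illustrative values).
X_train_toy = np.array([[3093.0, 1939.0], [20.0, 2002.0], [8.0, 1969.0], [222.0, 1986.0]])
X_test_toy = np.array([[31.0, 1994.0]])

scaler = StandardScaler()
# Fit on the training rows only, so no information leaks from the test rows.
X_train_scaled = scaler.fit_transform(X_train_toy)
# Reuse the training mean and standard deviation for the test rows.
X_test_scaled = scaler.transform(X_test_toy)

print(X_train_scaled.mean(axis=0))  # approximately 0 per column
print(X_train_scaled.std(axis=0))   # approximately 1 per column
```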

In [4]:
# Drop columns we will not use as features: 'GSM' and 'page_id' (numeric, but only an identifier)
dc_hero_df = dc_hero_df.drop(columns = ['GSM','page_id'])

In [5]:
dc_hero_df

Unnamed: 0,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR
0,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,Living Characters,3093.0,"1939, May",1939.0
1,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,Living Characters,2496.0,"1986, October",1986.0
2,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,Living Characters,1565.0,"1959, October",1959.0
3,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,Living Characters,1316.0,"1987, February",1987.0
4,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,Living Characters,1237.0,"1940, April",1940.0
...,...,...,...,...,...,...,...,...,...,...,...
6891,Nadine West (New Earth),\/wiki\/Nadine_West_(New_Earth),Public Identity,Good Characters,,,Female Characters,Living Characters,,,
6892,Warren Harding (New Earth),\/wiki\/Warren_Harding_(New_Earth),Public Identity,Good Characters,,,Male Characters,Living Characters,,,
6893,William Harrison (New Earth),\/wiki\/William_Harrison_(New_Earth),Public Identity,Good Characters,,,Male Characters,Living Characters,,,
6894,William McKinley (New Earth),\/wiki\/William_McKinley_(New_Earth),Public Identity,Good Characters,,,Male Characters,Living Characters,,,


In [6]:
# Drop rows with NaN before running the Perceptron and Random Forest Classifier models, because both need complete values for all records.
dc_hero_df = dc_hero_df.dropna(axis=0)
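
To illustrate how much data `dropna(axis=0)` can remove, here is a small sketch on a made-up frame (`toy` and its values are assumptions for illustration). It also shows the `subset` argument, which only checks the listed columns and so keeps more rows.

```python
import numpy as np
import pandas as pd

# Made-up frame with missing values in two different columns.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "EYE": ["Blue Eyes", np.nan, "Brown Eyes"],
    "APPEARANCES": [10.0, 5.0, np.nan],
})

# axis=0 drops every row that has at least one missing value in any column.
complete_rows = toy.dropna(axis=0)
# subset= restricts the check to the listed columns, which keeps more rows.
eye_known = toy.dropna(axis=0, subset=["EYE"])

print(len(toy), len(complete_rows), len(eye_known))
```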

In [8]:
# The target variable is the character's alignment classification: 'ALIGN'
dc_hero_df['ALIGN'].value_counts()

Good Characters       1105
Bad Characters         760
Neutral Characters     231
Reformed Criminals       1
Name: ALIGN, dtype: int64

In [9]:
dc_hero_df['ALIGN'].isna().value_counts() 

False    2097
Name: ALIGN, dtype: int64

## Create dummies separately for 'X' and 'Y'

### Note: Create Dummies and encoders for feature and target columns to classify how DC characters are good or bad based on characteristics.
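
A minimal sketch of dummy encoding on a toy frame (the frame `toy` is illustrative, not the real data): `pd.get_dummies` replaces each listed column with one indicator column per category and leaves the other columns untouched.

```python
import pandas as pd

toy = pd.DataFrame({
    "SEX": ["Male Characters", "Female Characters", "Male Characters"],
    "APPEARANCES": [3093.0, 222.0, 8.0],
})

# Each listed column is replaced by one indicator column per category,
# named <column>_<category>. Depending on the pandas version the indicator
# dtype is uint8 or bool.
encoded = pd.get_dummies(toy, columns=["SEX"])
print(list(encoded.columns))
print(encoded)
```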

In [12]:
# Create dummies for the feature columns and a label encoder for the target column, to classify DC characters as good or bad based on their characteristics.
#new_df = pd.get_dummies(dc_hero_df['EYE','HAIR','ID','SEX','ALIVE'])
from sklearn.preprocessing import LabelEncoder

#dc_hero_df.info()
#new_df = pd.get_dummies(old_df['EYE'])
new_df = pd.get_dummies(dc_hero_df, columns = ['EYE','HAIR','SEX','ALIVE','ID'])


y_encoder = LabelEncoder()
new_target_df = y_encoder.fit_transform(dc_hero_df['ALIGN'])

#new_target_df= pd.get_dummies(dc_hero_df['ALIGN'])
#need drop original "EYE"
new_df.info()
new_df.head()
#new_target_df.info()
#new_target_df.head()

# Y is our new_target_df.
# 'name', 'urlslug', and 'FIRST APPEARANCE' need to be dropped because they are object (string) columns
# and can't be used in the models.
# 'ALIGN' needs to be dropped because it is the target variable we want to classify.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2097 entries, 0 to 6528
Data columns (total 46 columns):
name                          2097 non-null object
urlslug                       2097 non-null object
ALIGN                         2097 non-null object
APPEARANCES                   2097 non-null float64
FIRST APPEARANCE              2097 non-null object
YEAR                          2097 non-null float64
EYE_Amber Eyes                2097 non-null uint8
EYE_Black Eyes                2097 non-null uint8
EYE_Blue Eyes                 2097 non-null uint8
EYE_Brown Eyes                2097 non-null uint8
EYE_Gold Eyes                 2097 non-null uint8
EYE_Green Eyes                2097 non-null uint8
EYE_Grey Eyes                 2097 non-null uint8
EYE_Hazel Eyes                2097 non-null uint8
EYE_Orange Eyes               2097 non-null uint8
EYE_Photocellular Eyes        2097 non-null uint8
EYE_Pink Eyes                 2097 non-null uint8
EYE_Purple Eyes               2097 

Unnamed: 0,name,urlslug,ALIGN,APPEARANCES,FIRST APPEARANCE,YEAR,EYE_Amber Eyes,EYE_Black Eyes,EYE_Blue Eyes,EYE_Brown Eyes,...,HAIR_Violet Hair,HAIR_White Hair,SEX_Female Characters,SEX_Genderless Characters,SEX_Male Characters,ALIVE_Deceased Characters,ALIVE_Living Characters,ID_Identity Unknown,ID_Public Identity,ID_Secret Identity
0,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Good Characters,3093.0,"1939, May",1939.0,0,0,1,0,...,0,0,0,0,1,0,1,0,0,1
1,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Good Characters,2496.0,"1986, October",1986.0,0,0,1,0,...,0,0,0,0,1,0,1,0,0,1
2,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Good Characters,1565.0,"1959, October",1959.0,0,0,0,1,...,0,0,0,0,1,0,1,0,0,1
3,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Good Characters,1316.0,"1987, February",1987.0,0,0,0,1,...,0,1,0,0,1,0,1,0,1,0
4,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Good Characters,1237.0,"1940, April",1940.0,0,0,1,0,...,0,0,0,0,1,0,1,0,0,1


In [13]:
pd.Series([1,2,3,'whatever']).head(3).dtype
# Slicing did not change the dtype. The first three items are numbers, but the Series was built with a string in it, so its dtype is still object.

dtype('O')

In [14]:
pd.Series([1,2,3,'whatever']).head(3)

0    1
1    2
2    3
dtype: object

## Note/Advice: check the data type of the "profile" (feature) variables before running models.
pd.Series([1,2,3,'whatever']).dtype
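
One way to act on this advice is sketched below on a toy frame (the frame and the variable names `string_columns` and `numeric_only` are assumptions for illustration). It lists the non-numeric columns so they can be encoded or dropped before fitting.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Batman (Bruce Wayne)", "Superman (Clark Kent)"],
    "APPEARANCES": [3093.0, 2496.0],
    "YEAR": [1939.0, 1986.0],
})

# Columns holding strings cannot be passed to scikit-learn estimators directly.
# They usually show up as dtype object (newer pandas versions may use a string dtype).
string_columns = [col for col in toy.columns
                  if not pd.api.types.is_numeric_dtype(toy[col])]
numeric_only = toy.drop(columns=string_columns)

print(toy.dtypes)
print(string_columns)
print(list(numeric_only.columns))
```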

In [16]:
new_target_df

array([1, 1, 1, ..., 1, 1, 2])

In [19]:
print(*new_target_df) #see what values exist

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 0 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 2 1 1 0 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 2 1 1 1 0 1 1 1 1 1 1 1 1 2 1 1 1 0 1 0 2 0 2 0 2 1 1 1 1 0 1 1 1 1 1 1 1 2 1 1 1 1 1 1 0 1 1 1 2 1 1 1 0 1 0 2 1 1 1 1 1 1 1 1 1 1 1 2 0 1 0 2 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 0 1 2 1 2 1 0 1 0 1 1 1 1 1 0 1 2 0 1 2 0 0 0 1 1 1 1 1 0 0 2 1 1 1 1 1 2 1 1 2 0 0 0 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 3 1 0 0 1 1 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 2 1 2 0 1 1 0 1 1 1 0 1 1 1 1 0 1 0 2 1 1 1 2 1 1 0 1 1 2 1 0 2 0 0 0 0 1 1 0 1 0 1 1 1 1 2 0 0 1 0 1 1 1 1 1 2 1 1 1 1 1 1 1 0 1 1 2 2 1 0 1 0 2 1 0 1 0 2 1 1 1 1 0 1 2 2 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 0 1 1 1 2 1 1 1 1 2 1 1 0 0 1 1 0 2 1 0 1 1 2 1 1 1 0 1 2 1 1 0 1 1 1 1 1 1 1 2 1 0 1 1 1 1 1 0 1 1 1 0 2 2 0 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 0 1 0 0 

In [20]:
print(new_target_df)

[1 1 1 ... 1 1 2]


In [22]:
# Drop the string columns (listed as "non-null object" by new_df.info()) and the target column 'ALIGN'.
# Assign the result to a new DataFrame, which keeps the numeric columns and the dummies.
new_df2 = new_df.drop(columns = ['name','urlslug','FIRST APPEARANCE','ALIGN'])

#new_target_df = new_target_df.drop(columns = ['ALIGN']) 

In [23]:
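# Note: this inspects new_df (before the drop), so the string columns still appear. Run new_df2.info() to check the reduced frame.
new_df.info()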

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2097 entries, 0 to 6528
Data columns (total 46 columns):
name                          2097 non-null object
urlslug                       2097 non-null object
ALIGN                         2097 non-null object
APPEARANCES                   2097 non-null float64
FIRST APPEARANCE              2097 non-null object
YEAR                          2097 non-null float64
EYE_Amber Eyes                2097 non-null uint8
EYE_Black Eyes                2097 non-null uint8
EYE_Blue Eyes                 2097 non-null uint8
EYE_Brown Eyes                2097 non-null uint8
EYE_Gold Eyes                 2097 non-null uint8
EYE_Green Eyes                2097 non-null uint8
EYE_Grey Eyes                 2097 non-null uint8
EYE_Hazel Eyes                2097 non-null uint8
EYE_Orange Eyes               2097 non-null uint8
EYE_Photocellular Eyes        2097 non-null uint8
EYE_Pink Eyes                 2097 non-null uint8
EYE_Purple Eyes               2097 

In [24]:
# Establish X and Y
X = new_df2
Y = new_target_df

In [25]:
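# Note: without parentheses, X.info displays the bound method (shown below) rather than the summary. Call X.info() to print the summary.
X.info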

<bound method DataFrame.info of       APPEARANCES    YEAR  EYE_Amber Eyes  EYE_Black Eyes  EYE_Blue Eyes  \
0          3093.0  1939.0               0               0              1   
1          2496.0  1986.0               0               0              1   
2          1565.0  1959.0               0               0              0   
3          1316.0  1987.0               0               0              0   
4          1237.0  1940.0               0               0              1   
...           ...     ...             ...             ...            ...   
6506          1.0  1963.0               0               0              1   
6508          1.0  1962.0               0               0              0   
6521          1.0  1951.0               0               0              1   
6526          1.0  1941.0               0               0              0   
6528          1.0  1941.0               0               1              0   

      EYE_Brown Eyes  EYE_Gold Eyes  EYE_Green Eyes  EY

In [26]:
Y

array([1, 1, 1, ..., 1, 1, 2])

## Feature Selection

In [27]:
#apply SelectKBest class to extract top 5 best features
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

bestfeatures = SelectKBest(score_func=chi2, k=5)
fit = bestfeatures.fit(X,Y)
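
The fitted selector exposes the per-feature scores, which is what makes it useful for ranking. The sketch below uses synthetic data (all names and values here are assumptions, not the hero features) to show how `scores_` and `get_support()` can be read after fitting.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic, non-negative features (chi2 requires non-negative values).
rng = np.random.RandomState(0)
y_toy = rng.randint(0, 2, size=200)
X_toy = pd.DataFrame({
    "informative": y_toy * 3 + rng.randint(0, 2, size=200),  # tracks the label closely
    "noise_a": rng.randint(0, 5, size=200),
    "noise_b": rng.randint(0, 5, size=200),
})

selector = SelectKBest(score_func=chi2, k=1).fit(X_toy, y_toy)

# scores_ has one chi-squared statistic per column; higher means a stronger association.
scores = pd.Series(selector.scores_, index=X_toy.columns).sort_values(ascending=False)
print(scores)

# get_support() is a boolean mask over the columns that were kept.
selected = list(X_toy.columns[selector.get_support()])
print(selected)
```

On the real data the same pattern would apply to `fit.scores_` with `X.columns` as the index.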

### Split and Train Data
#### Observation: Y should have the same number of rows as X: 2097. And Y_train should have the same number of rows as X_train: 1572 rows (75% of 2097).

In [28]:
# Split data into train and test sets, holding out 25% for testing
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size=.25,random_state =5)
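
As a quick check of the resulting sizes, the sketch below splits placeholder arrays with the 2097 rows reported by `new_df.info()`. The arrays `X_toy` and `y_toy` are zeros used only to count rows; they are not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same row count as the cleaned data (2097 rows).
X_toy = np.zeros((2097, 3))
y_toy = np.zeros(2097, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=5)

# The test share is rounded up and the remainder goes to training.
print(X_tr.shape[0], X_te.shape[0])  # 1572 525
```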

In [29]:
X_train

Unnamed: 0,APPEARANCES,YEAR,EYE_Amber Eyes,EYE_Black Eyes,EYE_Blue Eyes,EYE_Brown Eyes,EYE_Gold Eyes,EYE_Green Eyes,EYE_Grey Eyes,EYE_Hazel Eyes,...,HAIR_Violet Hair,HAIR_White Hair,SEX_Female Characters,SEX_Genderless Characters,SEX_Male Characters,ALIVE_Deceased Characters,ALIVE_Living Characters,ID_Identity Unknown,ID_Public Identity,ID_Secret Identity
106,222.0,1986.0,0,0,0,1,0,0,0,0,...,0,0,1,0,0,0,1,0,1,0
2809,8.0,1969.0,0,0,0,1,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
1278,20.0,2002.0,0,1,0,0,0,0,0,0,...,0,0,0,0,1,0,1,0,1,0
897,31.0,1994.0,0,0,0,1,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
1034,26.0,2004.0,0,0,0,1,0,0,0,0,...,0,0,1,0,0,1,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5805,1.0,2006.0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
1191,22.0,1997.0,0,1,0,0,0,0,0,0,...,0,0,0,0,1,1,0,0,0,1
1909,13.0,1982.0,0,0,0,1,0,0,0,0,...,0,0,0,0,1,1,0,0,0,1
3261,6.0,1996.0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,1,0,1,0


### Standardized X

## Perceptron Model

In [30]:
X = new_df2
Y = new_target_df
X = X.dropna(axis=1)

In [31]:
X

Unnamed: 0,APPEARANCES,YEAR,EYE_Amber Eyes,EYE_Black Eyes,EYE_Blue Eyes,EYE_Brown Eyes,EYE_Gold Eyes,EYE_Green Eyes,EYE_Grey Eyes,EYE_Hazel Eyes,...,HAIR_Violet Hair,HAIR_White Hair,SEX_Female Characters,SEX_Genderless Characters,SEX_Male Characters,ALIVE_Deceased Characters,ALIVE_Living Characters,ID_Identity Unknown,ID_Public Identity,ID_Secret Identity
0,3093.0,1939.0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
1,2496.0,1986.0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
2,1565.0,1959.0,0,0,0,1,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
3,1316.0,1987.0,0,0,0,1,0,0,0,0,...,0,1,0,0,1,0,1,0,1,0
4,1237.0,1940.0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6506,1.0,1963.0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,1
6508,1.0,1962.0,0,0,0,0,0,0,0,1,...,0,0,1,0,0,0,1,0,1,0
6521,1.0,1951.0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,1,0,1,0
6526,1.0,1941.0,0,0,0,0,0,1,0,0,...,0,0,0,0,1,0,1,0,0,1


In [32]:
Y
# Y should have the same number of rows as X: 2097. And Y_train should have the same number of rows as X_train: 1572 rows.

array([1, 1, 1, ..., 1, 1, 2])

In [34]:
Y_train

array([1, 1, 1, ..., 1, 1, 1])

In [35]:
# Import Perceptron.
from sklearn.linear_model import Perceptron

# Establish Perceptron model.
# Allow up to 10,000 iterations because the data is not normalized.
# Older scikit-learn versions used: perceptron = Perceptron(n_iter=10000)
# If running in your own environment on scikit-learn 0.21, use max_iter instead.
# With tol=0 and n_iter_no_change=10000, training effectively runs the full 10,000 iterations.
perceptron = Perceptron(max_iter=10000, tol=0, n_iter_no_change=10000)

# Fit Perceptron.
perceptron.fit(X_train, Y_train)



Perceptron(max_iter=10000, n_iter_no_change=10000, tol=0)

In [37]:
# Accuracy on the training data (the same rows the model was fit on).
print('Score: ' + str(perceptron.score(X_train, Y_train)))

Score: 0.6062340966921119
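
The score above is measured on the training rows. A hedged sketch of scoring on held-out rows as well is shown below, using synthetic three-class data from `make_classification` as a stand-in (the data, `model`, and the variable names are assumptions for illustration, not the notebook's results).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Synthetic three-class data (a stand-in for the hero features, not the real dataset).
X_toy, y_toy = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=5)

model = Perceptron(max_iter=1000, random_state=0)
model.fit(X_tr, y_tr)

# score() returns mean accuracy. The test figure is the one that estimates
# how the model performs on rows it has not seen.
train_accuracy = model.score(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)
print(f"train: {train_accuracy:.3f}  test: {test_accuracy:.3f}")
```

In the notebook the equivalent call would be `perceptron.score(X_test, Y_test)`.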


### Visualize Perceptron Model's Border

## Random Forest Classifier Model

In [39]:
from sklearn import ensemble
from sklearn.model_selection import cross_val_score

rfc = ensemble.RandomForestClassifier()
X = new_df
Y = new_target_df
X = X.dropna(axis=1)

cross_val_score(rfc, X, Y, cv=10)

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 304, in fit
    accept_sparse="csc", dtype=DTYPE)
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 803, in check_X_y
    estimator=estimator)
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 599, in check_array
    array = np.asarray(array, order=

array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])

### Analysis: 
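This run failed. Here X was set to new_df, which still contains the string columns ('name', 'urlslug', 'ALIGN', 'FIRST APPEARANCE'), so every fold raised an error during fitting and cross validation returned nan for all 10 folds. No accuracy can be read from this output. The model needs the numeric-only features in new_df2.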

In [40]:
from sklearn import ensemble
from sklearn.model_selection import cross_val_score

rfc = ensemble.RandomForestClassifier()
X = new_df2
Y = new_target_df
X = X.dropna(axis=1)

cross_val_score(rfc, X, Y, cv=10)



array([0.52857143, 0.37142857, 0.4047619 , 0.46190476, 0.33809524,
       0.31904762, 0.39047619, 0.34449761, 0.37320574, 0.40669856])

### Analysis: 
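We re-ran the RFC model on the numeric feature set (new_df2). The scores that cross validation reports are the accuracy of the forest on each of the 10 folds. They range from about 32% to 53%, with a mean of about 39%. That is below the roughly 53% that always predicting the majority class would achieve (1105 of 2097 characters are Good Characters), so this was a weak performing classifier model.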
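
The summary figures for the fold scores can be computed directly. The sketch below copies the ten scores from the `cross_val_score` output above into an array (the variable name `scores` is an assumption) and reports their mean, spread, and range.

```python
import numpy as np

# Fold scores copied from the cross_val_score output above.
scores = np.array([0.52857143, 0.37142857, 0.4047619, 0.46190476, 0.33809524,
                   0.31904762, 0.39047619, 0.34449761, 0.37320574, 0.40669856])

print(f"mean: {scores.mean():.3f}")  # mean: 0.394
print(f"std:  {scores.std():.3f}")
print(f"min:  {scores.min():.3f}  max: {scores.max():.3f}")
```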