### Classification Quiz

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier 
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.inspection import permutation_importance

import otter
grader = otter.Notebook()

**Question 1**: Read the `nba_rookies.csv` data file into a pandas DataFrame named `rookies`.

<!--
BEGIN QUESTION
name: q1
manual: false
points: 1
-->

In [2]:
rookies = pd.read_csv('./data/nba_rookies.csv')

In [3]:
grader.check("q1")

**Question 2**: Set the `Name` column as the index.

<!--
BEGIN QUESTION
name: q2
manual: false
points: 1
-->

In [4]:
rookies.set_index('Name', inplace = True)

In [5]:
grader.check("q2")

**Question 3**: Convert the `TARGET_5Yrs` column to 0/1, mapping 'No' to 0 and 'Yes' to 1.

<!--
BEGIN QUESTION
name: q3
points: 1
-->

In [6]:
rookies['TARGET_5Yrs'] = np.where(rookies['TARGET_5Yrs'] == 'No', 0, 1)

In [7]:
grader.check("q3")
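
As a side note on the conversion above: `np.where(col == 'No', 0, 1)` sends every value that is not exactly `'No'` to 1, including typos or missing values. The sketch below uses a small made-up Series (the `'maybe'` entry is purely hypothetical, not from `nba_rookies.csv`) to contrast that with an explicit `.map`, which leaves unexpected values as `NaN` so they can be spotted.

```python
import numpy as np
import pandas as pd

# Hypothetical toy labels; 'maybe' stands in for an unexpected value
labels = pd.Series(['Yes', 'No', 'Yes', 'maybe'], name='TARGET_5Yrs')

# np.where: anything that is not 'No' becomes 1
via_where = pd.Series(np.where(labels == 'No', 0, 1), index=labels.index)

# Explicit mapping: values outside the dictionary become NaN
via_map = labels.map({'No': 0, 'Yes': 1})

print(via_where.tolist())
print(int(via_map.isna().sum()), "unmapped value(s)")
```

If the column only ever contains `'Yes'` and `'No'`, both approaches give the same result.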

**Question 4**: Define `X` as all of the feature columns and `y` as the `TARGET_5Yrs` column. Note that `X` should be a DataFrame.

<!--
BEGIN QUESTION
name: q4
points: 1
-->

In [8]:
X = rookies.drop(columns = 'TARGET_5Yrs')
y = rookies['TARGET_5Yrs']

In [9]:
grader.check("q4")
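
To illustrate the DataFrame requirement, here is a minimal sketch on a made-up two-row frame (the player names and the `PTS` column are assumptions for illustration only). Dropping the target with `drop(columns=...)` keeps a DataFrame, while selecting a single column with single brackets yields a Series.

```python
import pandas as pd

# Hypothetical miniature of the rookies table
toy = pd.DataFrame(
    {'GP': [36, 74], 'PTS': [7.4, 5.2], 'TARGET_5Yrs': [0, 1]},
    index=pd.Index(['Player A', 'Player B'], name='Name'),
)

X_toy = toy.drop(columns='TARGET_5Yrs')  # DataFrame of features
y_toy = toy['TARGET_5Yrs']               # Series holding the target
single_col_df = toy[['GP']]              # double brackets keep a DataFrame

print(type(X_toy).__name__, type(y_toy).__name__, type(single_col_df).__name__)
```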

**Question 5**: Create a train/test split. Use a random state of 22 when splitting the data.

<!--
BEGIN QUESTION
name: q5
points: 1
-->

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 22)

In [11]:
grader.check("q5")
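
The sketch below, on synthetic arrays rather than the rookies data, shows two properties of `train_test_split` that the autograder relies on: with no `test_size` given, 25% of the rows go to the test set, and a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 rows, 2 features
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0, 1] * 10)

# Same random_state twice should give the same partition
a_train, a_test, _, _ = train_test_split(X_toy, y_toy, random_state=22)
b_train, b_test, _, _ = train_test_split(X_toy, y_toy, random_state=22)

print(len(a_train), len(a_test))       # default test_size is 0.25
print(np.array_equal(a_test, b_test))  # identical test rows
```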

**Question 6**: Build a `DecisionTreeClassifier` model on your training data. Assign your model to `dtree`, and use `random_state = 22`.

<!--
BEGIN QUESTION
name: q6
points: 1
-->

In [12]:
dtree = DecisionTreeClassifier(random_state = 22)

In [13]:
dtree.fit(X_train, y_train)

DecisionTreeClassifier(random_state=22)

In [14]:
grader.check("q6")

**Question 7**: Generate predictions on your test data (`X_test`). Save these to the variable `predictions` below.

<!--
BEGIN QUESTION
name: q7
points: 1
-->

In [15]:
predictions = dtree.predict(X_test)

In [16]:
grader.check("q7")

**Question 8**: Create a `DataFrame` called `predict_df` with a single column named `predictions` that holds the predictions on your test data, indexed by player name.

<!--
BEGIN QUESTION
name: q8
points: 1
-->

In [17]:
predict_df = pd.DataFrame(predictions, index = y_test.index, columns = ['predictions'])
predict_df

| Name | predictions |
|---|---|
| David Lee | 1 |
| Sean Williams | 1 |
| Travis Williams | 1 |
| Trent Tucker | 1 |
| Brad Miller | 1 |
| ... | ... |
| Steve Scheffler | 1 |
| Leon Powe | 1 |
| Marquis Teague | 1 |
| Chris Smith | 0 |


In [18]:
grader.check("q8")
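
Because `predict` returns a bare NumPy array, the row labels have to be supplied explicitly. The sketch below uses hypothetical player names and values to show that passing the test target's index keeps the `Name` label, which in turn lets the predictions line up with the true labels in a join.

```python
import numpy as np
import pandas as pd

# Hypothetical test labels indexed by player name
y_true = pd.Series(
    [1, 0, 1],
    index=pd.Index(['Player A', 'Player B', 'Player C'], name='Name'),
)
preds = np.array([1, 1, 1])  # stand-in for a model's predict() output

# Reuse the test index so each prediction is tied to a player
df = pd.DataFrame(preds, index=y_true.index, columns=['predictions'])

# Index alignment makes a side-by-side comparison straightforward
side_by_side = df.join(y_true.rename('actual'))
print(side_by_side)
```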

**Question 9**: Compare the importance of your features using both the `.feature_importances_` attribute and the `permutation_importance` function (use your test data, 10 repeats, and `random_state = 22`). Save the name of the most important feature under each measure to the variables below.

<!--
BEGIN QUESTION
name: q9
points: 1
-->

In [27]:
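# Pair each feature name with its impurity-based importance,
# sorted ascending so the most important feature is the last row.
feat_import_df = pd.DataFrame({'name': X.columns, 
                               'importances': dtree.feature_importances_}).sort_values('importances')

# Name of the feature with the largest importance
feat_import_df['name'].iloc[-1]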

'GP'

In [38]:
# random_state = 22 as the question requires, so the shuffles are reproducible
pimports = permutation_importance(dtree, X_test, y_test, n_repeats = 10, random_state = 22, n_jobs = -1)

In [39]:
sort_idx = pimports.importances_mean.argsort()
sort_idx

array([17,  5,  4, 16, 10, 18, 13, 11, 14,  6,  3,  9,  2, 12,  1,  7,  8,
       15,  0], dtype=int64)

In [40]:
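# Reorder features by mean importance and transpose so that each column
# is a feature and each row is one of the 10 repeats.
imp_df = pd.DataFrame(pimports.importances[sort_idx].T, columns = X_test.columns[sort_idx])
imp_df.columns[-1]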

'GP'

In [41]:
feature_importance_top_feature = feat_import_df['name'].iloc[-1]
permutation_importance_top_feature = imp_df.columns[-1]

In [42]:
grader.check("q9")
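
For intuition about the two importance measures compared above, here is a minimal sketch on synthetic data in which the answer is known in advance. The column names `signal`, `noise_a`, and `noise_b` are made up: the target depends only on `signal`, so both measures should rank it first. This is an illustration under those assumptions, not a statement about the rookies data.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(22)
X_toy = pd.DataFrame({
    'signal': rng.normal(size=400),   # the only informative feature
    'noise_a': rng.normal(size=400),
    'noise_b': rng.normal(size=400),
})
y_toy = (X_toy['signal'] > 0).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X_toy, y_toy, random_state=22)
tree = DecisionTreeClassifier(random_state=22).fit(Xtr, ytr)

# Impurity-based importance, computed from the training-time splits
impurity = pd.Series(tree.feature_importances_, index=X_toy.columns)

# Permutation importance: drop in test score when one column is shuffled
perm = permutation_importance(tree, Xte, yte, n_repeats=10, random_state=22)
perm_mean = pd.Series(perm.importances_mean, index=X_toy.columns)

print(impurity.idxmax(), perm_mean.idxmax())
```

The two measures can disagree on real data, since impurity-based importance reflects the training splits while permutation importance reflects held-out performance.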

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [25]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [26]:
# Save your notebook first, then run this cell to export your submission.
grader.export()

AssertionError: nb_path not specified and > 1 notebook in working directory