# NBA Naive Bayes model 

## Introduction

In this activity, you will build your own Naive Bayes model. Naive Bayes models are useful for prediction tasks because they apply Bayes' theorem to update the probability of an outcome as new information arrives. When data is constantly evolving, modeling with Naive Bayes can help you adapt quickly and make more accurate predictions about what could occur.

For this activity, you work for a firm that provides insights for management and coaches in the National Basketball Association (NBA), a professional basketball league in North America. The league is interested in retaining players who can last in the high-pressure environment of professional basketball and help their teams succeed over time. In the previous activity, you analyzed a subset of data about NBA players and their performance records. You conducted feature engineering to determine which features would most effectively predict a player's career duration. You will now use those insights to build a model that predicts whether a player will have an NBA career lasting five years or more.

The data for this activity consists of performance statistics from each player's rookie year. There are 1,341 observations, and each observation represents a different NBA player. Your target variable is a Boolean value that indicates whether a given player will last in the league for at least five years. Since you previously performed feature engineering on this data, it is now ready for modeling.

## Step 1: Imports

### Import packages

Begin with your import statements. The key imports are `pandas` and, from `sklearn`, the `naive_bayes`, `model_selection`, and `metrics` modules.
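As a minimal sketch, the imports named above could be written as follows. The `pd` alias is a common convention and matches the `pd.read_csv` call used later in this notebook. Importing whole submodules rather than individual functions is only one of several reasonable styles.

```python
# Data manipulation.
import pandas as pd

# Modeling, data splitting, and evaluation utilities from scikit-learn.
from sklearn import naive_bayes
from sklearn import model_selection
from sklearn import metrics
```

With this style, functions are accessed through their module, for example `model_selection.train_test_split` or `metrics.accuracy_score`.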

In [None]:
# Import relevant libraries and modules.

### Load the dataset

Recall that in the feature engineering lab, you output features for the NBA player dataset along with the target variable `target_5yrs`. As shown in the following cell, the dataset is loaded for you as a DataFrame called `extracted_data`. You do not need to download the .csv file or write additional code to access the dataset. Continue with this activity by completing the following instructions.

In [None]:
# RUN THIS CELL TO IMPORT YOUR DATA.
# Load extracted_nba_players_data.csv into a DataFrame called extracted_data.

extracted_data = pd.read_csv('extracted_nba_players_data.csv')

### Display the data

Review the first 10 rows of data.

In [None]:
# Display the first 10 rows of data.

### YOUR CODE HERE ###



## Step 2: Model preparation

### Isolate your target and predictor variables
Separately define the target variable (`target_5yrs`) and the features.
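One way to separate a target column from the remaining features is sketched below. The `toy_data` DataFrame, its feature names, and its values are hypothetical stand-ins for `extracted_data`; only the `target_5yrs` column name comes from this lab.

```python
import pandas as pd

# Hypothetical stand-in for extracted_data; feature names and values are made up.
toy_data = pd.DataFrame({
    "feature_a": [12.5, 3.1, 8.7, 20.2],
    "feature_b": [0.45, 0.38, 0.51, 0.47],
    "target_5yrs": [1, 0, 0, 1],
})

# The target is the single column to predict.
y = toy_data["target_5yrs"]

# The predictors are every remaining column.
X = toy_data.drop(columns=["target_5yrs"])

print(y.head())
print(X.head())
```

Dropping the target by name, rather than selecting features by position, keeps the code correct even if the column order changes.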

In [None]:
# Define the y (target) variable.

### YOUR CODE HERE ###


# Define the X (predictor) variables.

### YOUR CODE HERE ###


### Display the first 10 rows of your target and predictor variables

Display the first 10 rows of your target and predictor variables. This will help you get a sense of how the data is structured.

In [None]:
# Display the first 10 rows of your target data.

### YOUR CODE HERE ###



**Question:** What do you observe about your target variable?


[Write your response here. Double-click (or enter) to edit.]

In [None]:
# Display the first 10 rows of your predictor variables.

### YOUR CODE HERE ###


**Question:** What do you observe about your predictor variables?

[Write your response here. Double-click (or enter) to edit.]

### Perform a split operation on your data

Divide your data into a training set (75% of the data) and a test set (25% of the data). This step is important because it reserves a portion of the data that the model does not see during training, which lets you test how well the model generalizes to new data.
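The sketch below shows a 75/25 split on synthetic data, not on the NBA dataset. The feature names, the `random_state` value, and the choice to stratify on the target are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 2 made-up features, and a balanced binary target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["feature_a", "feature_b"])
y = pd.Series([0, 1] * 50, name="target_5yrs")

# Hold out 25% of the rows for testing. stratify=y keeps the class
# proportions similar in both subsets; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each shape is (rows, columns) for X and (rows,) for y.
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

With 100 synthetic rows, this yields 75 training rows and 25 test rows. The feature and target subsets stay aligned because they are split together in one call.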

In [None]:
# Perform the split operation on your data.
# Assign the outputs as follows: X_train, X_test, y_train, y_test.

### YOUR CODE HERE ###



### Print the shape of each output 

Print the shape of each output from your train-test split. This verifies that the split worked as expected.

In [None]:
# Print the shape (rows, columns) of the output from the train-test split.

# Print the shape of X_train.

### YOUR CODE HERE ###



# Print the shape of X_test.

### YOUR CODE HERE ###



# Print the shape of y_train.

### YOUR CODE HERE ###



# Print the shape of y_test.

### YOUR CODE HERE ###



**Question:** How many rows are in each of the outputs?


[Write your response here. Double-click (or enter) to edit.]

**Question:** What was the effect of the train-test split?


[Write your response here. Double-click (or enter) to edit.]

## Step 3: Model building

**Question:** Which Naive Bayes algorithm should you use?

[Write your response here. Double-click (or enter) to edit.]

### Fit your model to your training data and predict on your test data

When you create your model, you draw on your feature engineering work by training the classifier on the `X_train` DataFrame together with the `target_5yrs` labels in `y_train`. You will then use the fitted model to predict `target_5yrs` for the observations in `X_test`.

Start by defining `nb` to be the relevant algorithm from `sklearn.naive_bayes`. Then fit your model to your training data. Use this fitted model to create predictions for your test data.
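The define, fit, and predict pattern can be sketched on synthetic data as follows. `GaussianNB` is used here only because the synthetic features are continuous. Which class from `sklearn.naive_bayes` suits the NBA features is a decision to make from your own data. The cluster locations and `random_state` are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data: two well-separated clusters of continuous features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),  # class 0
    rng.normal(loc=5.0, scale=1.0, size=(50, 2)),  # class 1
])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Gaussian Naive Bayes models each feature as normally distributed within each class.
nb = GaussianNB()

# Learn per-class means and variances from the training features and labels.
nb.fit(X_train, y_train)

# Predict labels for the held-out features only; y_test is never shown to the model.
y_pred = nb.predict(X_test)

print(y_pred[:10])
```

The model sees `y_train` only during fitting, and `y_test` is kept aside so that predictions can later be compared against it.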

In [None]:
# Assign `nb` to be the appropriate implementation of Naive Bayes.

### YOUR CODE HERE ###



# Fit the model on your training data.

### YOUR CODE HERE ###



# Apply your model to predict on your test data. Call this "y_pred".

### YOUR CODE HERE ###



## Step 4: Results and evaluation


### Leverage metrics to evaluate your model's performance

To evaluate your model's predictions, use metrics from scikit-learn that compare the actual observed values in the test set with the values your model predicted. Specifically, print the accuracy score, precision score, recall score, and F1 score for your test data and predicted values.
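To illustrate what each metric measures, the sketch below computes all four on a small set of hand-made labels. The arrays `y_true` and `y_hat` are hypothetical and unrelated to the NBA data.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hand-made labels: 4 true positives, 2 true negatives,
# 1 false positive, and 1 false negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_hat = [1, 1, 1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_hat)    # (TP + TN) / all = 6 / 8
precision = precision_score(y_true, y_hat)  # TP / (TP + FP) = 4 / 5
recall = recall_score(y_true, y_hat)        # TP / (TP + FN) = 4 / 5
f1 = f1_score(y_true, y_hat)                # harmonic mean of precision and recall

print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"f1:        {f1:.3f}")
```

In each function, the true labels come first and the predicted labels second. Swapping them leaves accuracy unchanged but exchanges precision and recall.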

In [None]:
# Print your accuracy score.

### YOUR CODE HERE ###



# Print your precision score.

### YOUR CODE HERE ###



# Print your recall score.

### YOUR CODE HERE ###



# Print your F1 score.

### YOUR CODE HERE ###



**Question:** What is the accuracy score for your model, and what does this tell you about the success of the model's performance?



[Write your response here. Double-click (or enter) to edit.]

**Question:** Can you evaluate the success of your model by using the accuracy score exclusively?


[Write your response here. Double-click (or enter) to edit.]

**Question:** What are the precision and recall scores for your model, and what do they mean? Is one of these scores more accurate than the other?


[Write your response here. Double-click (or enter) to edit.]

**Question:** What is the F1 score of your model, and what does this score mean?

[Write your response here. Double-click (or enter) to edit.]

### Gain clarity with the confusion matrix

Recall that a confusion matrix is a graphic that shows your model's true and false positives and negatives. It helps to create a visual representation of the components feeding into the metrics.

Create a confusion matrix based on your predicted values for the test set.
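As a minimal sketch, a confusion matrix for a binary problem can be built and unpacked as shown below. The label arrays are hypothetical. The plotting call is left as a comment because it requires `matplotlib` and a notebook or other display environment.

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical actual and predicted labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_hat = [1, 1, 1, 0, 0, 1, 0, 1]

# Rows are actual classes and columns are predicted classes,
# in the order given by labels.
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])

# For a binary problem, ravel() flattens the matrix to TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Wrap the matrix in a display object. In a notebook, calling disp.plot()
# draws the matrix as a heatmap; that step requires matplotlib.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
```

The four counts are the same quantities that feed the accuracy, precision, recall, and F1 calculations, so the matrix is a useful cross-check on those scores.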

In [None]:
# Construct and display your confusion matrix.

# Construct the confusion matrix for your predicted and test values.

### YOUR CODE HERE ###



# Create the display for your confusion matrix.

### YOUR CODE HERE ###



# Plot the visual in-line.

### YOUR CODE HERE ###



**Question:** What do you notice when observing your confusion matrix, and does this correlate to any of your other calculations?


[Write your response here. Double-click (or enter) to edit.]

## Considerations

**What are some key takeaways that you learned from this lab?**

[Write your response here. Double-click (or enter) to edit.]


**How would you present your results to your team?**

[Write your response here. Double-click (or enter) to edit.]


**How would you summarize your findings to stakeholders?**

[Write your response here. Double-click (or enter) to edit.]

