# Logistic Regression

## ANTICIPATED TIME


2 hours

## BEFORE YOU BEGIN


[Multiple Linear Regression](Multiple_Linear_Regression.ipynb)

## WHAT YOU WILL LEARN

- What is the difference between binary and multiclass logistic regression?
- How to use logistic regression for classification?
- How to interpret logistic regression coefficients?
- How to evaluate classification performance?


## DEFINITIONS YOU’LL NEED TO KNOW


- Regression - a model used to predict a number
- Logistic regression - a method used to predict or classify things into different groups based on data.
- Predictors - variables that are used to predict the value of a dependent/response variable
- Response - the value (a number or a category) you are trying to predict
- Classification - grouping things into categories.
- Recall - the ratio of true positives to total number of actual positives.
- Precision - measures how many of the samples the model labeled as contaminated were actually contaminated, essentially showing the accuracy of its positive predictions.
- Probability - a chance of something happening.
- Cost-sensitive classification - the idea that some mistakes are worse than others
- Threshold - cut-off point to make a decision.
- Odds - the probability of an outcome happening divided by the probability of it not happening.
- Performance evaluation - checking to see how well logistic regression works; it is done in the same way as for other classification methods.
- Coefficient - a number that displays how much a predictor variable affects a prediction.
- Binary Logistic Regression - where only two possible outcomes are predicted such as true or false.
- Multiclass Logistic Regression - used when there are more than two categories to classify into.


## SCENARIO:



Aziz knows that the group will learn a lot by using simple linear regression and multiple linear regression to make predictions. He also knows sometimes it’s good to make predictions based on categories as well as numbers. As he looks at his data, there are so many different categories that the group could look at, like: type of vehicle, age of vehicle, types of public transportation, and so much more. To think about how they can use categories to predict different outcomes, he will introduce Logistic Regression to the group. As they understand what logistic regression means, he realizes it will help the group think of the problem and solutions to the problem in new ways.


## WHAT DO I NEED TO KNOW




In the last few notebooks, we’ve been looking at ways to predict numbers.

But what happens if you want to predict a category? Like whether a student will get a certain grade (pass/fail, or A, B, C, D, F) based on different data (e.g., how often they attend school, hours of study, number of after school clubs). You’re in luck because **logistic regression** predicts the chance (probability) for a specific category based on the data we look at.

You probably already noticed "regression" in the name "logistic regression." That's because logistic regression is a type of regression - it's very similar to linear regression. There are two differences, however. The first difference is what we are predicting: a category instead of a number. When there are only two possible outcomes, this is called **binary logistic regression** (binary means two choices). Examples are ‘pass’ or ‘fail’, whether a sports team will ‘win’ or ‘lose’, or simply 0 or 1.

But sometimes **multiclass logistic regression** is used when there are more than two categories, such as a student’s grade being ‘pass’, ‘fail’, or ‘incomplete’. In the sports example, maybe a team can ‘win’, ‘lose’, or ‘tie’.

The other difference is what the **coefficients** mean. For logistic regression, the coefficients are interpreted in terms of odds, which are closely related to probability.


**How can we think about the Coefficients in Logistic Regression?**

To understand logistic regression better, it’s good to think of it as simple or multiple linear regression with one extra step. The linear part works on the log of the odds, and the logistic function (that's where logistic regression gets its name!) turns that log-odds value into a probability.

You might be more familiar with probabilities than odds, but you have likely seen odds frequently--especially in sports. For example:

- 1 to 1 means even odds
- 2 to 1 means twice as likely
- 4 to 1 means four times as likely
- and so on
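To connect odds back to probability, here is a minimal sketch. The function names `odds_to_probability` and `probability_to_odds` are made up for this illustration, and it assumes "2 to 1" means odds in favor of the outcome.

```python
def odds_to_probability(odds):
    """Convert odds in favor (for example 2 for '2 to 1') into a probability."""
    return odds / (1 + odds)


def probability_to_odds(probability):
    """Convert a probability (strictly below 1) into odds in favor."""
    return probability / (1 - probability)


# Print the probability that matches each of the odds listed above.
for odds in (1, 2, 4):
    print(f"{odds} to 1 -> probability {odds_to_probability(odds):.3f}")
```

Even odds (1 to 1) correspond to a probability of one half, and larger odds push the probability closer to 1 without ever reaching it.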

With linear regression, the predicted value can be really big, or even negative. With logistic regression, the predicted value is easier to read because it is always between 0 and 1. This is done using the logistic function, and the coefficients are interpreted as "log odds".

Each coefficient tells us how much the log odds change when the predictor changes by one unit. Log odds compare the chance of one result happening instead of another. A log odds coefficient can be changed to an odds ratio by exponentiating it. An odds ratio of 1.5 means the odds of the positive outcome are multiplied by 1.5 with a one-unit increase in the variable.

When you multiply each coefficient by its predictor and sum up the results, the model gets a log-odds score, which the logistic function converts into a probability. To convert this probability into a classification (either 0 or 1), we need to use a **threshold**. Typically we use a threshold of .5, so any prediction above .5 is considered 1, and any prediction below .5 is considered 0.
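Here is a small sketch of that chain: log odds, then probability, then a 0 or 1 decision. The intercept, coefficient, and predictor value are hypothetical numbers chosen only to make the arithmetic easy to follow; they do not come from any trained model.

```python
import math


def sigmoid(log_odds):
    """The logistic function: turns any log-odds value into a number between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical values, for illustration only.
intercept = -4.0
coefficient = 0.01      # change in log odds for a one-unit increase in the predictor
predictor_value = 500

log_odds = intercept + coefficient * predictor_value   # -4.0 + 5.0 = 1.0
probability = sigmoid(log_odds)                         # roughly 0.73

threshold = 0.5
predicted_class = 1 if probability > threshold else 0

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio = math.exp(coefficient)

print(log_odds, round(probability, 3), predicted_class, round(odds_ratio, 4))
```

Changing `threshold` changes how cautious the classification is, which is one way to handle cases where some mistakes are worse than others.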

Once we've interpreted the model output as a 1 or 0, we can measure performance using precision, recall, and f-measure (the confusion matrix), as we discussed in the [KNN Classification notebook](KNN_Classification.ipynb).
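As a reminder of how those measures are computed, here is a minimal sketch in plain Python. The function name `precision_recall_f1` and the two short lists of labels are invented for this example.

```python
def precision_recall_f1(actual, predicted):
    """Compute precision, recall, and F-measure for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 1 = contaminated, 0 = not contaminated.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, f1)
```

In this toy example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall both work out to 3 out of 4.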

## YOUR TURN


### Goal 1: Importing the pandas library

Need extra tools to help solve this problem? Well, we can bring in extra ‘libraries’ to help us do extra data science stuff. You can think of it as an ‘add-on’. In this case, we bring in pandas, which is a popular library for doing data science stuff.  

#### Blockly

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the command is “import”, which brings the add-on package in.

Bring in the IMPORT menu, which can be helpful to bring in other data tools. In this case, we're bringing in the **import** block.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **pandas**, which will bring in some cool data manipulation features.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported library a short name (an alias). This handy feature helps cut down on all the typing later on. You can call it whatever is easiest for you to remember. In the example below, we’ve named it **pd**, and we type it in the open area.


**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GcxNkjYXkAAlpKD?format=png&name=240x240)
</details>

In [1]:
#blocks code


#### Freehand

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the command is “import”, which brings the add-on package in.


**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **pandas**, which will bring in some cool data manipulation features.


**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported library a short name (an alias). This handy feature helps cut down on all the typing later on. Feel free to use whatever name you want that will help you remember it later on. In the example below, we’ve named it **pd**.


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GZmkVCYWEA4oGso?format=jpg&name=small)
</details>

**Your Turn**: Now it’s your turn! We’re going to dive into the pandas package, which helps us with some really cool data science things. First, let’s import the package and assign it to the variable “pd” to make it easier to use throughout our notebook.



In [3]:
#freehand code 


**Explanation**: *Congrats! You did it! You have now successfully imported the "pandas" package under the name "pd"*.

### Goal 2: Bringing in the dataframe

To load data into a dataframe in Python, use the pd.read_csv command to read a CSV file and store the result in a variable for easy data manipulation and analysis.

#### Blockly


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us to do cool things, let’s bring the data into a variable called **train**.

In Blockly, bring in the VARIABLES menu.



**Step 2 - Assign the dataframe to the variable you created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.

In Blockly, go to the Variables and drag the Set block for the **train** variable. This will allow us to assign the result of a function call to the variable. A function is basically code that does a specific task for us.



**Step 3 - Bring in the data**

Now we need to look at the file that has all our data. To load our dataframe, we’ll use a simple command to bring in the file we need (CSV: Comma Separated Values). Let’s say we have a file called “AirQualityTrain.csv” in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in the variable **train**.

From the Variable menu, drag a DO block using the **pd** variable and select **read_csv** as the do operation. The read_csv function reads a CSV file and returns a DataFrame object.

In our case, let’s bring in “datasets/AirQualityTrain.csv” (use the Quotes from the TEXT menu) because that is what Angelina is working with.



**Step 4: Print the variable**

Let’s see it now by ‘printing’ and showing our work.

Drag the **train** variable to the workspace, making it available for further use in our program. This step is more of a visualization step, as it allows us to see the variable in the Blockly workspace.



**Step 5 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQOYi_WsAAG6u8?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us do cool things, let’s bring the data into a variable called **train**. Think of it as a digital spreadsheet with much more power to analyze and manipulate the data!

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.



**Step 2 - Bring in the data**

Now we need to look at the file that has all our data.

To load our dataframe, we’ll use a simple command to bring in the file we need (CSV: Comma Separated Values). Let’s say we have a file called ‘AirQualityTrain.csv’ in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **train**. For this function, we write the code as `train = pd.read_csv("datasets/AirQualityTrain.csv")`, which makes the code read the csv file. This variable is now our dataframe!

In our case, let’s bring in `train = pd.read_csv("datasets/AirQualityTrain.csv")` because that is what Kiana is working with.



**Step 3 - Assign the dataframe to the variable you created**

Just like we did before, the variable name goes on the left side of the equals sign. That way, the dataframe that read_csv returns is stored under our easy to remember name, **train**.



**Step 4 - Print the variable**

Let’s see it now by ‘printing’ and showing our work.



**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GZjdif6W0AYKevK?format=png&name=small)
</details>


**Your Turn**: Now it’s your turn!  Let’s dive in and start working with the data!  We’ll begin by loading it into a dataframe, which will allow us to easily interact with and analyze the dataset.


In [4]:
#freehand code 


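| Unnamed: 0 | Contaminated | Methane | NOxEmissions | PM2.5Emissions | VOCEmissions | SO2Emissions | CO2Emissions |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 848 | 960 | 1367 | 1784 | 1745 | 2445 |
| 1 | 1 | 1063 | 968 | 1627 | 1736 | 1785 | 1888 |
| 2 | 1 | 771 | 765 | 1391 | 1692 | 1523 | 2106 |
| 3 | 1 | 536 | 624 | 1224 | 1594 | 1171 | 2158 |
| 4 | 1 | 782 | 789 | 1333 | 1732 | 1529 | 2213 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1286 | 1 | 694 | 729 | 1396 | 1568 | 1492 | 2106 |
| 1287 | 0 | 377 | 482 | 1010 | 1466 | 935 | 2242 |
| 1288 | 1 | 737 | 685 | 1292 | 1659 | 1436 | 2308 |
| 1289 | 0 | 369 | 478 | 890 | 1471 | 1003 | 2232 |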




**Explanation**: *Easy-peasy! You have now brought in the dataframe and stored it as a variable that you can reference later on. Now onto the fun part*!

The dataset could be used to train a classification model (logistic regression) that predicts whether conditions will result in contamination based on various emission levels. Alternatively, if all entries are contaminated, the data could support regression models to predict specific emission levels under contaminated conditions or clustering methods to identify patterns in emission profiles among contaminated sites.
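If you want to try `pd.read_csv` without the real file, here is a minimal sketch. It assumes a tiny in-memory stand-in for the CSV (built with `io.StringIO`) that reuses a few rows from the output above and leaves out the index column; in the notebook you would pass the file path instead.

```python
import io

import pandas as pd

# A small stand-in for datasets/AirQualityTrain.csv, using rows shown in the output above.
sample_csv = io.StringIO(
    "Contaminated,Methane,NOxEmissions,PM2.5Emissions,VOCEmissions,SO2Emissions,CO2Emissions\n"
    "1,848,960,1367,1784,1745,2445\n"
    "1,1063,968,1627,1736,1785,1888\n"
    "1,771,765,1391,1692,1523,2106\n"
    "0,377,482,1010,1466,935,2242\n"
    "0,369,478,890,1471,1003,2232\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
train = pd.read_csv(sample_csv)

print(train.shape)   # (number of rows, number of columns)
print(train.head())  # the first few rows
```

Checking `shape` and `head()` right after loading is a quick way to confirm the columns came in as expected.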


### Goal 3: Import the Plotly Express Library

We’ve already brought pandas to help with data science. Let’s bring in Plotly Express to help with some fancy-pants visualizations.

#### Blockly

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the command is “import”, which brings the add-on package in.

Bring in the IMPORT menu, which can be helpful to bring in other data tools. In this case, we're bringing in the **import** block.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out the **plotly.express** library. Plotly is a popular library in Python that provides functions for fancy-pants data visualizations.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported **plotly.express** library a short name (an alias). This handy feature helps cut down on all the typing later on. You can call it whatever is easiest for you to remember. In the example below, we’ve named it **px** so it’s easier to remember.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!  
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/Gab594HW4AA7zAX?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the command is “import”, which brings the add-on package in.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out the **plotly.express** library. Plotly is a popular library in Python that provides functions for fancy-pants data visualizations.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported library a short name (an alias). This handy feature helps cut down on all the typing later on. Feel free to use whatever name you want that will help you remember it later on. In the example below, we’ve named it **px**.



**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/Gab6AduXQAIEi5X?format=png&name=small)

</details>


**Your Turn:** It’s your turn! Let’s get that library imported.


In [5]:
#freehand code 


**Explanation**: *The line import plotly.express as px imports the Plotly Express library, a high-level interface for creating interactive and dynamic visualizations in Python*.

### Goal 4: Present a scatter plot

Scatter plots help us to look at each data point when it comes to interval ratio data. The scatter plot shows us the relationship between two variables in a data set. The independent variable is plotted on the X-axis, while the dependent variable is plotted on the Y-axis. They are super handy for finding the relationship between different numeric variables.

#### Blockly

**Step 1 -  Call the scatter function from plotly**

To make a scatterplot, we first need to call the scatter function with our plotly library (px).

From the Variables menu drag a DO block for the **px** variable. Select the "**scatter**" function. This specifies the function we want to call, which is the scatter function from the Plotly Express library (imported as "px" earlier).



**Step 2 -  Saying what data to use for the scatter plot**

In order to make a plot, we need to choose which data we want to plot from. In this case, our dataset is stored in the dataframe **train**.

For the first argument, drag from the Variable menu the **train** variable. This allows us to specify the dataframe the scatter function should look at.



**Step 3 -  Tell plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable goes along the X-axis (*across*) and the other along the Y-axis (*up and down*). In our context, we want to see the relationship between **'Methane'** and **'Contaminated'**. We will assign these variables to the two axes of the graph.

From the TEXT menu, drag the Quotes. Type the text **Methane**. This specifies Methane as the x-axis variable for the scatter plot. Also, from the TEXT menu, drag another Quotes block. Type the text **Contaminated**. This specifies Contaminated as the y-axis variable.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpKlvjXMAAZ8UQ?format=png&name=small)

</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Call the scatter function from plotly**

To make a scatterplot, we first need to call the scatter function with our plotly library (px).

`px.scatter()`


**Step 2 -  Saying what data to use for the scatter plot**

In order to make a plot, we need to choose which data we want to plot from. In this case, our dataset is stored in the dataframe **train**.

`px.scatter(train)`


**Step 3 -  Tell plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable goes along the X-axis (*across*) and the other along the Y-axis (*up and down*). In our context, we want to see the relationship between **'Methane'** and **'Contaminated'**. We will assign these variables to the two axes of the graph.

`px.scatter(train, x="Methane", y="Contaminated")`


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpKon7WMAAKuR5?format=png&name=small)

</details>

**Your Turn**: Now it’s your turn! We’re going to create a scatter plot so we can look at the relationships in the data. Let’s start by adding the x and y-axis variables.


In [1]:
#freehand code 


**Explanation**: *The scatter plot shows how methane levels relate to contamination, shown as either 0 (not contaminated) or 1 (contaminated). When methane levels are low (around 400), contamination is mostly at 0, meaning samples are not contaminated. However, as methane levels increase, almost all samples show contamination (1). This suggests that higher methane levels might be linked to contamination. A logistic regression model could help predict whether contamination occurs based on methane levels*.
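As a numeric cross-check on the pattern in the scatter plot, here is a minimal sketch that compares the average methane level for each contamination label. It assumes a tiny dataframe named `sample`, built from a few of the rows shown in the Goal 2 output rather than the full dataset.

```python
import pandas as pd

# A handful of rows copied from the dataframe output shown earlier.
sample = pd.DataFrame(
    {
        "Methane": [848, 1063, 771, 536, 782, 377, 369],
        "Contaminated": [1, 1, 1, 1, 1, 0, 0],
    }
)

# Average methane level for each label (0 = not contaminated, 1 = contaminated).
mean_methane = sample.groupby("Contaminated")["Methane"].mean()
print(mean_methane)
```

On these few rows, the contaminated samples have a much higher average methane level, which matches what the scatter plot suggests.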

### Goal 5: Present another scatter plot

Since we are interested in multiple variables, let’s see if we can find any other relationships between other variables.

#### Blockly


**Step 1 -  Call the scatter function from plotly**

To make a scatterplot, we first need to call the scatter function with our plotly library (px).

From the Variables menu, drag a DO block for the **px** variable. Select the "**scatter**" function. This specifies the function we want to call, which is the scatter function from the Plotly Express library (imported as "px" earlier).



**Step 2 -  Saying what data to use for the scatter plot**

In order to make a plot, we need to choose which data we want to plot from. In this case, our dataset is stored in the dataframe **train**.

For the first argument, drag from the Variable menu the **train** variable. This allows us to specify a dataframe and what to look at for the scatter function.



**Step 3 -  Tell plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable goes along the X-axis (*across*) and the other along the Y-axis (*up and down*). In our context, we want to see the relationship between **'CO2Emissions'** and **'Contaminated'**. We will assign these variables to the two axes of the graph.

From the TEXT menu, drag the Quotes. Type the text **CO2Emissions**. This specifies CO2Emissions as the x-axis variable for the scatter plot. Also, from the TEXT menu, drag another Quotes block. Type the text **Contaminated**. This specifies Contaminated as the y-axis variable.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpLBlLXMAAUrOZ?format=png&name=small)

</details>

In [None]:
#blocks code


#### Freehand


**Step 1 - Call the scatter function from plotly**

To make a scatterplot, we first need to call the scatter function with our plotly library (px).

`px.scatter()`


**Step 2 -  Saying what data to use for the scatter plot**

In order to make a plot, we need to choose which data we want to plot from. In this case, our dataset is stored in the dataframe **train**.

`px.scatter(train)`


**Step 3 -  Tell plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable goes along the X-axis (*across*) and the other along the Y-axis (*up and down*). In our context, we want to see the relationship between **'CO2Emissions'** and **'Contaminated'**. We will assign these variables to the two axes of the graph.

`px.scatter(train, x="CO2Emissions", y="Contaminated")`


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

`px.scatter(train, 'CO2Emissions', 'Contaminated')`
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpLEB1WkAAmLWr?format=png&name=small)
</details>


**Your Turn**: Now it’s your turn! Try creating another scatterplot.


In [2]:
#freehand code 


**Explanation**: *The scatter plot shows that when CO2 emissions are low, most samples are contaminated (indicated by 1). As CO2 emissions increase, contamination becomes less common (more points at 0). A logistic regression could help us predict contamination based on CO2 levels*.

### Goal 6: Import the linear model library

Let’s bring in a library/package to help with the logistic regression that we will use for our analysis and other data science tasks.

#### Blockly

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the command is “import”, which brings the add-on package in.

Bring in the IMPORT menu, which can be helpful to bring in other data tools. In this case, we're bringing in the **import** block.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **sklearn.linear_model**, which provides linear models, including the logistic regression model we need.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported library a short name (an alias). This handy feature helps cut down on all the typing later on. You can call it whatever is easiest for you to remember. In the example below, we’ve named it **lm**, and we type it in the open area.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbFDOz7XoAAVLtr?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the command is “import”, which brings the add-on package in.


**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **sklearn.linear_model**, which provides linear models, including the logistic regression model we need.


**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported library a short name (an alias). This handy feature helps cut down on all the typing later on. Feel free to use whatever name you want that will help you remember it later on. In the example below, we’ve named it **lm**.


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbFDMCWW8AABSiy?format=png&name=small)
</details>


**Your Turn**: Now it’s your turn! We’re going to dive into the sklearn.linear_model package, which helps us with some really cool data science things. First, let’s import the package and assign it to the variable “lm” to make it easier to use throughout our notebook.


In [8]:
#freehand code 


**Explanation**: *This imports the linear model module of the scikit-learn library, which includes methods for creating regression models such as logistic regression*.

### Goal 7: Setting up our logistic model

Let’s create a model to help with the training that we will do for our dataset.

#### Blockly


**Step 1 - Create a variable for the model**

Now that we’re all set with our new package to help us to do cool things, let’s create a variable to hold our model and call it **logreg**.

In Blockly, bring in the VARIABLES menu. On the "Variables" menu, click Create Variable and type a name for our model, **logreg**. Then, drag a "SET" block to the workspace for the created variable. This block allows us to assign a value to the variable.




**Step 2 - Create the logistic regression model**

Using the linear model library (**lm**), we call **LogisticRegression**() to create the logistic regression model.

From the Variable menu, drag a Create block for the lm variable. In the Create block's list box, select the option **LogisticRegression**. This specifies the type (class) of object we want to create, which is the **LogisticRegression** from the sklearn.linear_model module.

With that, a new **LogisticRegression** model object is created. **LogisticRegression** is a type of regression model that uses one or more predictors to predict the probability of a category.



**Step 3 - Store the classifier model in a variable**

We can now connect the **logreg** variable with the **LogisticRegression** model.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GezdbmlWwAADzc3?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Create the logistic regression model**

Using the linear model library, we call LogisticRegression() to create the model.

`lm.LogisticRegression()`



**Step 2 - Store the regression model in a variable**

Now that we’re all set with our new package to help us to do cool things, let’s store the model in a variable called **logreg**.

Just like we did before, let’s type out a variable name. This easy to remember name will hold the model we created, so we can train and use it later.

`logreg = lm.LogisticRegression()`

	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GezdSSxXgAAFKtF?format=png&name=small)

</details>


**Your Turn**: Your turn! Give it a go and see what you get!


In [9]:
#freehand code 


**Explanation**:  *You have created a logistic regression model called logreg that we can use to predict whether something falls into one of two categories (like contaminated or not contaminated) based on other data*.
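Here is a minimal sketch of what this goal produces, assuming scikit-learn is installed. The `hasattr` check is only there to show that a freshly created model has not learned any coefficients yet.

```python
import sklearn.linear_model as lm

# Create an untrained logistic regression model with scikit-learn's default settings.
logreg = lm.LogisticRegression()

print(type(logreg).__name__)      # the class of the object we created
print(hasattr(logreg, "coef_"))   # coefficients only exist after fit() has been called
```

The model is just an empty container at this point; it only becomes useful once it has been trained on data.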


### Goal 8: Train and Score the Classifier Model

Now that we’ve brought in our logistic regression model, let’s train the model to see how it will learn from the data points that we have in the file.

#### Blockly


**Step 1 - Prepare to train the model**

From the Variable menu, drag the DO block for the **logreg** variable, and select the **fit** function as the DO operation. This specifies the function we want to call, which is the fit method of the **Logistic Regression** object.



**Step 2 - Have the training features ready**

The next step for training the model is to select the features to train the classifier. In this step, we select the features and add them as a dataframe in the parameter. In this case, the model will train (learn) the classifier based on these 6 variables and use them to predict the label.

From the Lists menu, drag a dictVariable, and select the "train" variable from the list of available variables. Also, from the Lists menu, you will get a Create List block. Using the Gear icon, add up to 6 items. For each one of the items, add a Text (a Quote “” from the Text menu), as follows:  "Methane", "NOxEmissions", "PM2.5Emissions", "VOCEmissions", "SO2Emissions", and "CO2Emissions". These are the feature names applied to train (fit) the model.



**Step 3 - Have the training label ready**

So what is the label that we are trying to predict? Next, we need to add the data labels for the selected features. We add the data labels (**Contaminated** feature) as a parameter in the fit() method.

From the Lists menu, drag a dictVariable, and select the "train" variable from the list of available variables. From the Text menu, get a Quote “” block and add a Text "Contaminated". This is the target value applied to train (fit) the model.



**Step 4 - Measure the correctness of the model**

To measure the correctness of the model, we will use the score() method of our **logreg** model. Just as in the previous step, we simply replace the fit() method with the score() method. Using the fitted model, we will see how much of our training dataset it predicts correctly.

This will give us the model's correctness score (accuracy). The best possible score is 1 (i.e., 100%). A medium score might be more like .95 (95% accurate), and a not-so-great score might be .90 (90%), but what counts as good depends on the topic you are looking at.

Right-click on the "logreg.fit()" block and select "Duplicate" from the context menu. This creates a copy of the block. Within the duplicated block, click on the method dropdown menu and select "**score**" from the list of available methods. The score method takes the same inputs as fit, the training features and the label, and measures how much of the training data the model learned.



**Step 5 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpN_CHXcAAuWbb?format=png&name=small)

</details>

In [None]:
#blocks code


#### Freehand


**Step 1 - Prepare to train the model**

To train data using the regression model, we use the model and call the fit() method from it.

`logreg.fit()`



**Step 2 - Have the training features ready**

The next step for training the model is to select the features to train the model. In this step, we choose the features and add them as a parameter dataframe.

`logreg.fit(train[['Methane', 'NOxEmissions', 'PM2.5Emissions', 'VOCEmissions', 'SO2Emissions', 'CO2Emissions']])`


**Step 3 - Have the training label ready**

Next, we need to add the data labels for the selected features. We add the data labels (the **Contaminated** feature) as a second parameter in the fit() method.

`logreg.fit(train[['Methane', 'NOxEmissions', 'PM2.5Emissions', 'VOCEmissions', 'SO2Emissions', 'CO2Emissions']],train['Contaminated'])`



**Step 4 - Measure the correctness on the training dataset**

To measure the correctness of the model, we will use the score() method of the logistic regression model. Just as in the previous step, we simply replace the fit() method with the score() method. This will give us the logistic regression model’s correctness score.

`logreg.score(train[['Methane', 'NOxEmissions', 'PM2.5Emissions', 'VOCEmissions', 'SO2Emissions', 'CO2Emissions']],train['Contaminated'])`



**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpOBENXUAAkUN5?format=jpg&name=small)

</details>


**Your Turn**: Now it’s your turn to try it out!


In [3]:
#freehand code 




**Explanation**:  *You have trained a logistic regression model to predict whether a sample is contaminated based on different factors, like levels of Methane, NOx, PM2.5, VOC, SO2, and CO2 emissions. Also you have calculated the accuracy of this training. The result was 0.96, which means the model correctly predicts contamination 96% of the time on the training data, indicating it’s performing well at identifying which samples are contaminated based on these emissions*.
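The five Freehand steps above can be put together in one short cell. This is only a minimal sketch: the tiny `train` dataframe is a hypothetical stand-in typed inline for illustration (the lesson loads its real training data from a file), and `max_iter=1000` is an assumption added here to give the solver more room to converge on unscaled values.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the training data (illustration only).
train = pd.DataFrame({
    "Contaminated":   [0, 1, 1, 1, 0, 1, 1, 1, 0],
    "Methane":        [355, 749, 556, 479, 373, 543, 459, 520, 387],
    "NOxEmissions":   [433, 832, 635, 594, 435, 654, 572, 633, 489],
    "PM2.5Emissions": [917, 1438, 1166, 1192, 867, 1268, 1060, 1218, 850],
    "VOCEmissions":   [1472, 1740, 1599, 1534, 1416, 1642, 1543, 1627, 1486],
    "SO2Emissions":   [947, 1492, 1282, 1243, 912, 1192, 1185, 1103, 1005],
    "CO2Emissions":   [2245, 1866, 2144, 2012, 2286, 2099, 2124, 2137, 2262],
})

# The six feature columns used throughout the lesson.
features = ["Methane", "NOxEmissions", "PM2.5Emissions",
            "VOCEmissions", "SO2Emissions", "CO2Emissions"]

# max_iter=1000 is an assumption, not part of the lesson's code.
logreg = LogisticRegression(max_iter=1000)

# Train on the features and the label, then score on the same training data.
logreg.fit(train[features], train["Contaminated"])
train_score = logreg.score(train[features], train["Contaminated"])
print(train_score)
```

Keeping the column names in one `features` list avoids retyping them in every call to fit(), score(), and later predict().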


### Goal 9: Bringing in the Test Data

So we’ve looked at the training dataset to learn something about our data. How about applying what we learned to the rest of the data, the ‘test’ dataset, to see how good our predictions are?

#### Blockly


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us to do cool things, let’s bring the data into a variable and call it ‘**test**’.

In Blockly, bring in the VARIABLES menu.



**Step 2 - Assign the dataframe to the variable you created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.

In Blockly, go to the Variables and drag the Set block for the **test** variable. This will allow us to assign the result of a function call to the variable. A function is basically code that does a specific task for us.



**Step 3 - Bring in the data**

Now we need to look at the file that has all our data. To load our dataframe, we’ll use a simple command to bring in the file we need (a CSV, or Comma Separated Values, file). Let’s say we have a file called "AirQualityTest.csv" in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **test**.

From the Variable menu, drag a DO block using the **pd** variable, go ahead with the do operation **read_csv**. The read_csv function reads a CSV file and returns a DataFrame object.

In our case, let’s bring in the "datasets/AirQualityTest.csv" (use the Quotes from the TEXT menu) because that is what Angelina is working with.



**Step 4 - Print the variable**

Let’s see it now by ‘printing’ and showing our work.

Drag the **test** variable to the workspace, making it available for further use in our program. This step is more of a visualization step, as it allows us to see the variable in the Blockly workspace.



**Step 5 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ0zDqWUAAzB82?format=png&name=small)

</details>

In [None]:
#blocks code


#### Freehand


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us to do cool things, let’s bring the data into a variable called **test**. Think of it as a digital spreadsheet with much more power to analyze and manipulate the data!



**Step 2 - Assign the dataframe to the variable you created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.



**Step 3 - Bring in the data**

Now we need to look at the file that has all our data.

To load our dataframe, we’ll use a simple command to bring in the file we need (a CSV, or Comma Separated Values, file). Let’s say we have a file called ‘AirQualityTest.csv’ in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **test**. For this, we use the function `pd.read_csv`, which reads the CSV file. This variable is now our dataframe!

In our case, let’s bring in “datasets/AirQualityTest.csv” (wrapped in quotes) because that is what the group is working with.



**Step 4 - Print the variable**

Let’s see it now by ‘printing’ and showing our work. Retype the variable name underneath the code and it will display the data. In this case, we will type out the variable name **test**.



**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ0w41WMAA60B_?format=png&name=small)

</details>

**Your Turn**: Now it’s your turn!  Let’s dive in and start working with the data!  We’ll begin by loading it into a dataframe, which will allow us to easily interact with and analyze the dataset.


In [14]:
#freehand code 


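| Unnamed: 0 | Contaminated | Methane | NOxEmissions | PM2.5Emissions | VOCEmissions | SO2Emissions | CO2Emissions |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 355 | 433 | 917 | 1472 | 947 | 2245 |
| 1 | 1 | 749 | 832 | 1438 | 1740 | 1492 | 1866 |
| 2 | 1 | 556 | 635 | 1166 | 1599 | 1282 | 2144 |
| 3 | 1 | 479 | 594 | 1192 | 1534 | 1243 | 2012 |
| 4 | 0 | 373 | 435 | 867 | 1416 | 912 | 2286 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 549 | 1 | 543 | 654 | 1268 | 1642 | 1192 | 2099 |
| 550 | 1 | 459 | 572 | 1060 | 1543 | 1185 | 2124 |
| 551 | 1 | 520 | 633 | 1218 | 1627 | 1103 | 2137 |
| 552 | 0 | 387 | 489 | 850 | 1486 | 1005 | 2262 |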



**Explanation**: *Easy-peasy! You have now brought in the dataframe and stored it as a variable that you can reference later on. Now onto the fun part*!

*You have now loaded a test dataset. A test dataset is necessary because it allows us to check how well our model or analysis works on new, unseen data. When we train a model, we use one set of data to learn from (called the training dataset), but we also need to make sure it performs well on different data (the test dataset). This helps us know if the model can generalize to real-world situations, not just the data it was trained on*.
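The output above includes a column named `Unnamed: 0`. That name is what pandas gives a column whose header is blank, which most likely means the file stores a row index as its first column. Here is a small sketch of what is going on. The inline `csv_text` is a hypothetical stand-in for "datasets/AirQualityTest.csv" so the example runs without the real file, and using `index_col=0` is an optional choice, not something the lesson requires.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file: three rows typed inline.
# Note the header starts with a comma, so the first column has no name.
csv_text = """,Contaminated,Methane,NOxEmissions,PM2.5Emissions,VOCEmissions,SO2Emissions,CO2Emissions
0,0,355,433,917,1472,947,2245
1,1,749,832,1438,1740,1492,1866
2,1,556,635,1166,1599,1282,2144
"""

# read_csv accepts a file path or a file-like object such as StringIO.
test = pd.read_csv(io.StringIO(csv_text))
print(test.columns[0])  # the name pandas gave the blank header
print(test.shape)       # (rows, columns)

# Optional: treat that first column as the row index instead of as data.
test_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(test_indexed.shape)
```

Either way works for this lesson, because the model only ever uses the six named feature columns and the **Contaminated** column.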

### Goal 10: Predict labels for testing dataset (ie - rest of the data)

So far, we’ve taken a smaller part of all our data to train and try and learn something about it. Can we take what we’ve learned from the training and use it to predict the rest of our dataset?

#### Blockly


**Step 1 - Write out the variable name you want to use**

Now that we have a trained model and a test dataset, let’s create a variable to hold the model’s predictions and call it **predictions**.

From the Variables menu, click Create Variable, and type **predictions**. On the same menu, drag the Set block of the **predictions** variable. This variable will hold the result of the prediction.



**Step 2 - Prepare the predict operation**

So let’s take the **logreg** variable from before and try to predict the label of the new dataset (contaminated, not contaminated). Let’s start by using the predict() method from the logistic regression model.

From the Variables menu, get a DO block, for the **logreg** variable. With that, select the operation **predict**.



**Step 3 - Set the test features**

Inside the predict() method, we provide the test features from the test data. This will use the 6 features (ie - columns) to predict the labels.

From the Lists menu, drag a dictVariable, and select the "test" variable from the list of available variables. Also, from the Lists menu, you will get a Create List block. Using the Gear icon, add up to 6 items. For each one of the items, add a Text (a Quote “” from the Text menu), as follows:  "Methane", "NOxEmissions", "PM2.5Emissions", "VOCEmissions", "SO2Emissions", and "CO2Emissions". These are the feature names applied to predict the target label on the testing dataset. The output of the logistic regression prediction will be stored in the "predictions" variable in the next step.



**Step 4 - Assign the predictions to the variable you created**

Just like we did before, we will use an easy-to-remember variable name. This time, the variable will hold the model’s predictions.

In Blockly, go to the Variables and drag the Set block for the **predictions** variable. This will allow us to assign the result of a function call to the variable.

Next, we store the prediction labels into the variable **‘predictions’**. To do that, we have to connect the SET predictions variable to the **logreg.predict**() block.




**Step 5 - Display the predictions**

Let’s see it now by ‘printing’ and showing our work.

From the Variables menu, drag the **predictions** variable to the workspace. This displays the prediction labels, which are the result of the logistic regression predictions.



**Step 6 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpPMAcXkAAedl3?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand


**Step 1 - Prepare the predict operation**

So let’s take the **logreg** variable from before and try to predict the label of the new dataset (contaminated, not contaminated). Let’s start by using the predict() method from the logistic regression model.

`logreg.predict()`



**Step 2 - Set the test features**

Inside the predict() method, we provide the test features from the test data.

`logreg.predict(test[['Methane', 'NOxEmissions', 'PM2.5Emissions', 'VOCEmissions', 'SO2Emissions', 'CO2Emissions']])`



**Step 3 - Assign the predictions to the variable you created**

Next, we store the prediction labels in a variable called ‘predictions’. Just like we did before, we use an easy-to-remember variable name; this time it holds the model’s predictions.

`predictions = logreg.predict(test[['Methane', 'NOxEmissions', 'PM2.5Emissions', 'VOCEmissions', 'SO2Emissions', 'CO2Emissions']])`



**Step 4 - Display the predictions**

Finally, we display the prediction labels using ‘predictions.’

`predictions`




**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GbpPN8qWQAAnoYB?format=jpg&name=medium)

</details>



**Your Turn**: Now it’s your turn!  Use the trained model to predict the labels for the test dataset, store them in **predictions**, and display the result.


In [15]:
#freehand code 


array([0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0,
       1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0,
       0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,

**Explanation**: *Now, with the test dataset, you used the logistic regression model (logreg) to predict whether each sample in the test dataset is contaminated or not based on its levels of methane, NOx emissions, PM2.5 emissions, VOC emissions, SO2 emissions, and CO2 emissions*.
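Behind each 0 or 1 in the output is a probability. Logistic regression turns a weighted sum of the features into a probability between 0 and 1, and for two classes predict() effectively picks label 1 when that probability is above a cut-off of 0.5. The sketch below illustrates the idea with a single feature. The `intercept` and `weight` values are made up for illustration; a trained model learns its own numbers.

```python
import math

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical numbers for illustration only (not the lesson's fitted model).
intercept = -5.0
weight = 0.01
methane_levels = [355, 479, 749]

threshold = 0.5  # the cut-off point used to make the decision
labels = []
for level in methane_levels:
    probability = sigmoid(intercept + weight * level)
    label = 1 if probability >= threshold else 0
    labels.append(label)
    print(level, round(probability, 3), label)
```

Raising or lowering `threshold` is one way to act on the idea that some mistakes are worse than others: a lower cut-off flags more samples as contaminated, at the price of more false alarms.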

### Goal 11: Bringing in SKLearn Metrics to Help Look at Performance of Predictions

So we’ve tried to predict on our new dataset. How well did we do? Let’s use SKLearn Metrics to help us think through that.

#### Blockly

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the “command” is “import”, which brings the add-on package in.

Bring in the IMPORT menu, which can be helpful to bring in other data tools. In this case, we're bringing in the **import** block.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **sklearn.metrics**, which is a tool that grades your machine learning model's performance, telling you how well it did on its test.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, put the **import** and **package** together in a single variable. This handy feature helps cut down on all the typing later on. You can call it whatever is easiest for you to remember. In the example below, we’ve put everything into **metrics**, and we type it in the open area.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ5swrXUAAEJ-D?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Starting the import**

First, we need to set up a “command” to tell the computer what to do. In this case, the “command” is “import”, which brings the add-on package in.


**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **sklearn.metrics**, which is a tool that grades your machine learning model's performance, telling you how well it did on its test.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, put the ‘import’ and ‘package’ together in a single variable. This handy feature helps cut down on all the typing later on. Feel free to use whatever name you want that will help you remember it later on. In the example below, we’ve put everything into **metrics**.


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ5u3ZWsAA_df1?format=png&name=small)
</details>

**Your Turn**: Test it out yourself! We set up a command to tell our computer what to do and after our hard work we’ll run what we have to see our data science major at work!


In [16]:
#freehand code 


**Explanation**: *The metrics library provides tools to measure and evaluate the performance of machine learning models. By using `metrics`, we can check how well our model is working, like seeing how accurate it is or how well it groups data in clustering. This helps us understand if our model is doing a good job or if it needs improvement*.

## Assessing the performance of the classifier


So how well did our predictions do? Let’s measure the performance of the predictions on the testing dataset in three ways: accuracy, the confusion matrix, and precision/recall.


### Goal 12: Assessing the Performance of the Predictions on Test Dataset Using Accuracy Score

So, how well did our predictions do on our test data? Let’s calculate the accuracy score to give us an idea.

#### Blockly


**Step 1 - Call the accuracy_score() method using the metrics library**

To calculate the accuracy of the model predictions, we will use the **accuracy_score**() function from the metrics library.  

From the Variables menu, drag a DO block for the metrics variable. Select the accuracy_score function from the metrics list of operations. This function takes two inputs: the true labels and the predicted labels.



**Step 2 - Calculate logistic regression model’s accuracy**

The accuracy_score() function takes 2 parameters to calculate the accuracy score and helps measure the percentage of correct predictions. So let’s compare **Contaminated** from the test dataset and **predictions** from the model we just created.

From the Lists menu, get a dictVariable block and select the test variable. From the Text menu, get a Quote “” block to enter the label name **"Contaminated"**. This list will be used as the true labels for the accuracy calculation.

As the second parameter of the **accuracy_score** get the variable **predictions**.  The accuracy score function will calculate the accuracy of the model by comparing the true labels with the predicted labels. The result will be a score that indicates the performance of the model.



**Step 3 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ8AdSWIAEPAL_?format=png&name=small)

</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Call the accuracy_score() method using the metrics library**

To calculate the accuracy of the model predictions, we will use the accuracy_score() method from the metrics library.  This accuracy score will measure the percentage of correct predictions.

`metrics.accuracy_score()`



**Step 2 - Calculate logistic regression model’s accuracy**

The accuracy_score() function takes 2 parameters to calculate the accuracy score and helps measure the percentage of correct predictions. So let’s compare **Contaminated** from the test dataset and **predictions** from the model we just created.

`metrics.accuracy_score(test['Contaminated'],predictions)`



**Step 3 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ7bb5WMAAad1n?format=png&name=small)

</details>


**Your Turn**:  Have a go at it! Once you begin you’ll be able to assess the performance of the predictions!


In [17]:
#freehand code 


0.9657039711191335

**Explanation**: *The accuracy score tells us the percentage of correct predictions out of the total. A higher accuracy score means the model is doing a good job matching the actual labels*.
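To see what the accuracy score is doing, here is a minimal hand-written version. It is a sketch for understanding, not a replacement for `metrics.accuracy_score`; the helper name `accuracy` and the two short lists are made up for illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Toy labels for illustration only: 4 of the 5 predictions match.
true_labels = [0, 1, 1, 1, 0]
predicted_labels = [0, 1, 0, 1, 0]

toy_accuracy = accuracy(true_labels, predicted_labels)
print(toy_accuracy)
```

Accuracy treats every mistake the same way, which is why the next goals look at which kinds of mistakes the model makes.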

### Goal 13: Assessing the Performance of the Predictions on Test Dataset Using the Confusion Matrix

Before, we looked at the accuracy score to get an idea of how well our predictions did on the test data. Now let’s look at the errors (false positives and false negatives) in the confusion matrix to explore further.

#### Blockly


**Step 1 - Call the confusion_matrix() function using the metrics library**

To break down the accuracy, let’s get the numbers from the confusion matrix using the **confusion_matrix**() function from the metrics library.  

From the Variables menu, drag a DO block for the metrics variable. Select the confusion_matrix function from the metrics list of operations. This function takes two inputs: the true labels and the predicted labels.



**Step 2 - Calculate logistic regression model’s confusion matrix**

The confusion_matrix() function takes 2 parameters to explore different parts of the confusion matrix. So let’s compare **Contaminated** from the test dataset and **predictions** from the model we just created. The confusion matrix will tell us these numbers - TP (true positive), TN (true negative), FP (false positive), and FN (false negative).

From the Lists menu, get a dictVariable block and select the test variable. From the Text menu, get a Quote “” block to inform the label name ”Contaminated”. This list will be used as the true labels for the confusion matrix calculation. As the second parameter of the **confusion_matrix** get the variable **predictions**.



**Step 3 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ-UXxWgAAwYlY?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Call the confusion_matrix() function from the metrics library**

To calculate the confusion matrix of the model predictions, we will use the **confusion_matrix**() function from the metrics library.

`metrics.confusion_matrix()`



**Step 2 - Calculate logistic regression model’s confusion matrix**

The confusion_matrix() function takes 2 parameters to explore different parts of the confusion matrix. So let’s compare **Contaminated** from the test dataset and **predictions** from the model we just created. The confusion matrix will tell us these numbers - TP (true positive), TN (true negative), FP (false positive), and FN (false negative).

- Test data labels: **test[‘Contaminated’]**
- The predicted labels: **predictions**

`metrics.confusion_matrix(test['Contaminated'],predictions)`



**Step 3 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaQ-R7IW8AA9YQw?format=png&name=small)
</details>


**Your Turn**: Let’s type in the code and see what our performance looks like! What do you see?


In [18]:
#freehand code 


array([[160,  12],
       [  7, 375]])

**Explanation**: *This matrix indicates that the model correctly identified 160 instances as "Not Contaminated" and 375 instances as "Contaminated." However, it made 12 false positive errors (predicting "Contaminated" when it was actually "Not Contaminated") and 7 false negative errors (predicting "Not Contaminated" when it was actually "Contaminated"). This breakdown helps understand the model's accuracy and the types of mistakes it makes*.
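The four counts can be read straight off the matrix printed above. In scikit-learn's layout, each row is an actual label (0, then 1) and each column is a predicted label (0, then 1). The sketch below copies the matrix from the output into a plain Python list, so it runs without the dataset; the variable names are only for illustration.

```python
# Confusion matrix copied from the output above.
# Rows: actual label (0, then 1). Columns: predicted label (0, then 1).
matrix = [[160, 12],
          [7, 375]]

tn, fp = matrix[0]  # actual 0: predicted 0 (true negative), predicted 1 (false positive)
fn, tp = matrix[1]  # actual 1: predicted 0 (false negative), predicted 1 (true positive)

total = tn + fp + fn + tp
accuracy = (tn + tp) / total  # correct predictions divided by all predictions

print(tn, fp, fn, tp)
print(total, accuracy)
```

The total equals the number of rows in the test dataset, and the accuracy worked out from the four counts agrees with the accuracy score from Goal 12.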

### Goal 14: Assessing the Performance of the Predictions on the Test Dataset using recall and precision

Let’s look further by comparing our true positives with our false positives (precision). Let’s also compare our true positives with our false negatives (recall).

#### Blockly


**Step 1 - Call the classification_report() function from the metrics library**

To calculate the classification_report() for the model, we will use the classification_report() function from the metrics library.

From the Variables menu, drag a DO block for the metrics variable. Select the **classification_report** function from the metrics list of operations. This function takes two inputs: the true labels and the predicted labels.



**Step 2 - Saying what parameters to use for the classification report**

The classification_report() method takes 2 parameters to calculate the classification report.

From the Lists menu, get a dictVariable block and select the test variable. From the Text menu, get a Quote “” block to enter the label name **"Contaminated"**. This list will be used as the true labels for the classification report. As the second parameter of the **classification_report** function, get the variable **predictions**.



**Step 3 - Print the classification report**

Connect the metrics block to a Print block (from the Text menu). The classification report will include other metrics like precision and recall.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaRCnOcX0AAPsn4?format=png&name=small)
</details>

In [None]:
#blocks code


#### Freehand

**Step 1 - Call the classification_report() function from the metrics library**

To calculate the classification_report() for the model, we will use the classification_report() function from the metrics library.

`metrics.classification_report()`




**Step 2 - Saying what parameters to use for the classification report**

The classification_report() function takes 2 parameters to calculate the classification report: the true labels and the predicted labels.

`metrics.classification_report(test['Contaminated'],predictions)`



**Step 3 - Print the classification report**

Use the print function to show the results of the classification report.

`print(metrics.classification_report(test['Contaminated'],predictions))`



**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!
	
<details>
    <summary>Click to see the answer...</summary>

![](https://pbs.twimg.com/media/GaRClCEWAAAMfW3?format=png&name=small)

</details>

**Your Turn**: Let’s try it! First, we tried the accuracy score and confusion matrix, and now we have recall and precision. The possibilities are endless!

In [19]:
#freehand code 


              precision    recall  f1-score   support

           0       0.96      0.93      0.94       172
           1       0.97      0.98      0.98       382

    accuracy                           0.97       554
   macro avg       0.96      0.96      0.96       554
weighted avg       0.97      0.97      0.97       554





**Explanation**: *The classification report provides some metrics, particularly precision and recall. Precision measures how many of the samples the model labeled as contaminated were actually contaminated, essentially showing the accuracy of its positive predictions. Recall, on the other hand, reflects the model's ability to detect all actual contaminated samples, indicating how well it "found" the true cases. In this report, the precision and recall scores are high for both classes: 0.96 precision and 0.93 recall for non-contaminated samples (class 0), and 0.97 precision and 0.98 recall for contaminated samples (class 1). This demonstrates that the model is highly accurate in identifying both contaminated and non-contaminated samples. These scores mean that nearly all positive predictions made by the model were correct, and it missed very few of the actual contaminated cases*.
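The numbers in the report can be reproduced by hand from the four confusion matrix counts in Goal 13. The sketch below is for understanding only and uses plain Python; the variable names and the small `f1` helper are made up for illustration.

```python
# Counts taken from the confusion matrix in Goal 13.
tn, fp, fn, tp = 160, 12, 7, 375

# Class 1 (contaminated): positives are the samples labeled 1.
precision_1 = tp / (tp + fp)  # of everything predicted 1, how much was really 1
recall_1 = tp / (tp + fn)     # of everything really 1, how much did we find

# Class 0 (not contaminated): the same idea with the roles swapped.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(precision_0, 2), round(recall_0, 2), round(f1(precision_0, recall_0), 2))
print(round(precision_1, 2), round(recall_1, 2), round(f1(precision_1, recall_1), 2))
```

Rounded to two decimals, these values line up with the rows for class 0 and class 1 in the report above. The `support` column is simply how many test samples actually belong to each class (`tn + fp` and `fn + tp`).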

## WHAT DID YOU LEARN?


In the lesson on logistic regression, you learned a lot of valuable concepts and skills. We started by understanding what logistic regression is and how it differs from linear regression. Then, we dove into interpreting the coefficients of a logistic regression model and why they are important. We also covered the basics of probability, odds, and log odds, which are essential for grasping logistic regression. Additionally, we explored how to evaluate the performance of a logistic regression model using various metrics. Finally, you got hands-on experience with practical exercises in Python and scikit-learn, applying logistic regression to real-world datasets and making predictions. This comprehensive approach ensured that you not only understood the theory but also gained practical skills.



## WHAT’S NEXT?


[Decision Trees](Decision_Trees.ipynb)



## TELL ME MORE


- [Datawhys Logistic Regression Notebook](https://github.com/memphis-iis/datawhys-content-notebooks-python/blob/master/Logistic-regression.ipynb)
- [Datawhys Logistic Regression Problem-Solving Notebook](https://github.com/memphis-iis/datawhys-content-notebooks-python/blob/master/Logistic-regression-PS.ipynb)
