# Plotting

## ANTICIPATED LESSON TIME


2 hours


## BEFORE YOU BEGIN:


[Data Science and the Nature of Data](Datascience_and_the_Nature_of_Data.ipynb)


## WHAT YOU WILL LEARN



- What is data visualization?
- Why is data visualization important, and how do we apply it?
- How to identify the different levels of measurement (nominal, ordinal, interval, ratio).
- How to create and interpret plots like scatter, bar, line, and histograms.
- How are plots created and interpreted?
- How is data analyzed through loading and manipulating data in a dataframe? (e.g., probability distributions)


## SCENARIO



Ethan recently took a trip to the Netherlands as a foreign exchange student. He quickly noticed that most people use public transportation. He also noticed how clean the air quality is compared to his home. He learned that the Netherlands took steps to improve their public transportation system, and now, they have one of the best public transportation systems in the world, which improved their air quality over time.  

Ethan thought about his bus route to school and how much traffic he encounters when he is home. He wants to use data science to help analyze what causes the delays and potential pollution problems in his hometown. Using data science will help him analyze traffic data in both countries, such as vehicle speeds and traffic flow, to improve traffic management and reduce congestion. Comparing data from both countries will help him find solutions to improving his quality of life at home.


## WHAT DO I NEED TO KNOW?



Data can pile up really quickly. One way to make sense of it is to visualize it, which can help us see patterns we might otherwise miss. Data visualization can include things like charts, plots, graphs, and maps (and many more). You will notice that the types of variables (nominal, ordinal, interval, and ratio) that we described in the [Data Science and the Nature of Data](Data_Science_and_the_Nature_of_Data.ipynb) lesson will be really helpful for choosing certain data visualizations.

So…what are the four most often used plots?


| Kind of Visualization | What it Means | What it Looks Like |
| :-------------------- | ------------- | -----------------: |
| Scatter Plots         | *What does all my data look like?* <br><br> Scatter plots include all of the data points along the Y <br> (numbers up and down) and X (numbers across) axes | ![scatter](https://pbs.twimg.com/media/GZaad5YWsAA1eWu?format=png&name=900x900) |
| Bar Plots             |   *How many are in each category?* <br><br> Bar plots use ‘bars’ to show a value for a specific variable.<br>The Y axis shows the totals, while the X axis shows the categories. | ![bar](https://pbs.twimg.com/media/GZaaY1bWMAALgUZ?format=png&name=900x900) |
| Line Plots            |  *What do the trends look like?* <br><br> Line plots are for non-categorical variables (e.g., numeric variables).<br>You can have multiple lines on a single plot,<br>in which case each line is a category, like the bars in a bar plot. | ![line](https://pbs.twimg.com/media/GZaaQiwXUAAP6WU?format=png&name=900x900) |
| Histogram             |  *How many are in a specific number range?* <br><br>Histograms show the count of a single non-categorical variable. <br> Be careful though. While a histogram looks similar to a bar plot,<br> the X axis shows number ranges and the Y axis shows how many values fall in each range. | ![hist](https://pbs.twimg.com/media/GZaaJE3XYAAjKcJ?format=png&name=900x900) |
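
To make the histogram idea concrete, here is a minimal sketch of the counting that happens behind the scenes. It uses a few AverageSpeed values from this lesson's dataset; the range width of 20 and the helper name `bin_label` are assumptions chosen only for illustration, and a plotting library may pick different ranges on its own.

```python
from collections import Counter

# A few AverageSpeed values taken from the lesson's dataset preview
speeds = [75, 50, 60, 55, 80, 50, 25, 50, 70]
bin_width = 20  # assumed range size, chosen only for this illustration


def bin_label(value, width):
    """Return the number range (as text) that a value falls into."""
    start = (value // width) * width
    return f"{start}-{start + width - 1}"


# Count how many values land in each number range
counts = Counter(bin_label(speed, bin_width) for speed in speeds)

# Print the ranges from lowest to highest
for label in sorted(counts, key=lambda text: int(text.split("-")[0])):
    print(label, counts[label])
```

Each printed line is one bar of a histogram: the range goes on the X axis and the count goes on the Y axis.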


Our first thing to do is bring in (import) the package we want. This “package” gives us special features that help us solve Ethan’s problem. Think of it as kind of an ‘add-on’ that gives you extra data science help. We’ll give that package an easy-to-remember name to make things even more accessible.

## YOUR TURN


Now that you understand plotting and how it helps us to visualize or see data, it’s time to practice applying these concepts! As you go through this, think about how tools like scatter plots, bar plots, line plots, and histograms help us identify patterns and trends so that we can make decisions based on our data!

### Goal 1: Importing the pandas Library

We are going to do more data science stuff, so let’s bring in the pandas library again.

#### Blockly

**Step 1 - Starting the import**

First, we need to give the computer a “command” that tells it what to do. In this case, the command is “import”, which brings the add-on package in.

Bring in the IMPORT menu, which can be helpful to bring in other data tools. In this case, we're bringing in the **import** block.





**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **pandas**, which will bring in some cool data manipulation features.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported package a short nickname (an alias). This handy feature helps cut down on all the typing later on. You can call it whatever is easiest for you to remember. In the example below, we’ve named it **pd**, and we type it in the open area.




**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GcxNkjYXkAAlpKD?format=png&name=240x240)


In [4]:
#blocks code


#### Freehand


**Step 1 - Starting the import**

First, we need to give the computer a “command” that tells it what to do. In this case, the command is “import”, which brings the add-on package in.


**Step 2 - Telling what library to import:**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out **pandas**, which will bring in some cool data manipulation features.


**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported package a short nickname (an alias) using ‘as’. This handy feature helps cut down on all the typing later on. Feel free to use whatever name will help you remember it later on. In the example below, we’ve named it **pd**.


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the data science magic!


![](https://pbs.twimg.com/media/GZmkVCYWEA4oGso?format=jpg&name=small)


**Your Turn**: Now it’s your turn! We’re going to dive into the pandas package, which helps us with some really cool data science things. First, let’s import the package and assign it to the variable “pd” to make it easier to use throughout our notebook.

In [1]:
import pandas as pd

**Explanation**: *Congrats! You did it! You have now successfully imported the "pandas" package under the name "pd".*
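
As a quick check that the nickname works, the sketch below builds a tiny table through `pd`. The column names and numbers are made up purely for illustration and are not part of Ethan's data.

```python
import pandas as pd

# "pd" is just a shorter name for the same pandas package
print(pd.__name__)

# A tiny made-up table, only to show the pd nickname in action
tiny = pd.DataFrame({"City": ["A", "B"], "Buses": [12, 30]})
print(tiny.shape)  # (rows, columns)
```

Anything you could write as `pandas.something` can now be written as `pd.something`.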

### Goal 2: Import the Plotly Express Library

We’ve already brought in pandas to help with data science. Let’s bring in Plotly Express to help with some fancy-pants visualizations.


#### Blockly

**Step 1 - Starting the import**

First, we need to give the computer a “command” that tells it what to do. In this case, the command is “import”, which brings the add-on package in.

Bring in the IMPORT menu, which can be helpful to bring in other data tools. In this case, we're bringing in the **import** block.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out the **plotly.express** library. Plotly is a popular library in Python that provides functions for fancy-pants data visualizations.



**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported **plotly.express** package a short nickname (an alias). This handy feature helps cut down on all the typing later on. You can call it whatever is easiest for you to remember. In the example below, we’ve named it **px**.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GaON4A9XsAANFz2?format=png&name=240x240)

In [5]:
#blocks code


#### Freehand

**Step 1 - Starting the import**

First, we need to give the computer a “command” that tells it what to do. In this case, the command is “import”, which brings the add-on package in.



**Step 2 - Telling what library to import**

In the text area, we type the name of the library we want to import. A library is like an extra thing we bring in to give us more coding abilities. In our case, we will type out the **plotly.express** library. Plotly is a popular library in Python that provides functions for fancy-pants data visualizations.


**Step 3 - Renaming the library so it’s easy to remember**

Once you are done, give the imported package a short nickname (an alias) using ‘as’. This handy feature helps cut down on all the typing later on. Feel free to use whatever name will help you remember it later on. In the example below, we’ve named it **px**.



**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the data science magic!

![](https://pbs.twimg.com/media/GaON1ppW0AIrmJh?format=png&name=360x360)

**Your Turn**: Now it’s your turn! We’re going to bring in Plotly Express to help with those visualizations. Let’s start the import making sure to rename it before we run it.

In [5]:
import plotly.express as px

**Explanation**: *Congrats! You did it! You have now successfully imported the **plotly.express** package under the name **px**.*

### Goal 3: Bringing in the Dataframe

Let’s bring in the data that we want to look at.


#### Blockly


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us to do cool things, let’s bring the data into a variable and call it **data**. Think of it as a digital spreadsheet with much more power to analyze and manipulate the data!

In Blockly, bring in the VARIABLES menu.



**Step 2 - Assign the dataframe to the variable you created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.

In Blockly, go to the Variables and drag the Set block for the **data** variable. This will allow us to assign the result of a function call to the variable. A function is basically code that does a specific task for us.



**Step 3 - Bring in the data**

Now we need to look at the file that has all our data. To load our dataframe, we’ll use a simple command to bring in the file we need, a CSV (Comma Separated Values) file. Our file is called ‘AirQualityIndex.csv’ and sits in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **data**.

From the Variable menu, drag a DO block using the **pd** variable and select the **read_csv** operation. The **read_csv** function reads a CSV file and returns a DataFrame object.

In our case, let’s bring in "datasets/AirQualityIndex.csv" (use the Quotes from the TEXT menu) because that is the data Ethan is working with.



**Step 4 - Display the variable**

Let’s see it now by ‘displaying’ and showing our work.

Drag the **data** variable to the workspace, making it available for further use in our program. This step is more of a visualization step, allowing us to see the variable in the Blockly workspace.



**Step 5 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GhDkmMDXkAAe_5K?format=png&name=small)

In [2]:
#blocks code


|     | TrafficVolume | AverageSpeed | CO2Emissions | NoiseLevel | TrafficCondition |
| --- | ------------- | ------------ | ------------ | ---------- | ---------------- |
| 0   | 2222.222222   | 75           | 33.898305    | 33.75      | Free Flow        |
| 1   | 1666.666667   | 50           | 33.898305    | 33.75      | Free Flow        |
| 2   | 1111.111111   | 60           | 25.423729    | 33.75      | Free Flow        |
| 3   | 833.333333    | 55           | 42.372881    | 33.75      | Free Flow        |
| 4   | 1944.444444   | 80           | 33.898305    | 33.75      | Free Flow        |
| ... | ...           | ...          | ...          | ...        | ...              |
| 145 | 6666.666667   | 50           | 355.932203   | 112.50     | Heavy            |
| 146 | 5555.555556   | 25           | 338.983051   | 97.50      | Heavy            |
| 147 | 6111.111111   | 50           | 355.932203   | 101.25     | Heavy            |
| 148 | 5277.777778   | 70           | 372.881356   | 112.50     | Heavy            |


#### Freehand


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us do cool things, let’s bring the data into a variable called **data**. Think of it as a digital spreadsheet with much more power to analyze and manipulate the data!




**Step 2 - Assign the Dataframe to the Variable You Created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.


**Step 3 - Bring in the data**

Now, we need to look at the file that has all our data.

To load our dataframe, we’ll use a simple command to bring in the file we need, a CSV (Comma Separated Values) file. Our file is called ‘AirQualityIndex.csv’ and sits in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **data**. To do this, we call the function `pd.read_csv`, which reads the CSV file. This variable is now our dataframe!

In our case, let’s bring in "datasets/AirQualityIndex.csv" (wrapped in quotes) because that is the data Ethan is working with.


**Step 4 - Print the Variable**

Let’s see it now by ‘printing’ and showing our work.


**Step 5 - Run the code**
Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GhDkgM8X0AAQ2se?format=png&name=small)

**Your Turn**: Let’s dive in and start working with the data! We’ll begin by loading it into a dataframe, which will allow us to interact with and analyze the dataset easily.

In [29]:
data = pd.read_csv('datasets/AirQualityIndex.csv')
data



|     | TrafficVolume | AverageSpeed | CO2Emissions | NoiseLevel | TrafficCondition |
| --- | ------------- | ------------ | ------------ | ---------- | ---------------- |
| 0   | 2222.222222   | 75           | 33.898305    | 33.75      | Free Flow        |
| 1   | 1666.666667   | 50           | 33.898305    | 33.75      | Free Flow        |
| 2   | 1111.111111   | 60           | 25.423729    | 33.75      | Free Flow        |
| 3   | 833.333333    | 55           | 42.372881    | 33.75      | Free Flow        |
| 4   | 1944.444444   | 80           | 33.898305    | 33.75      | Free Flow        |
| ... | ...           | ...          | ...          | ...        | ...              |
| 145 | 6666.666667   | 50           | 355.932203   | 112.50     | Heavy            |
| 146 | 5555.555556   | 25           | 338.983051   | 97.50      | Heavy            |
| 147 | 6111.111111   | 50           | 355.932203   | 101.25     | Heavy            |
| 148 | 5277.777778   | 70           | 372.881356   | 112.50     | Heavy            |


**Explanation**: *Voila! The variable ‘data’ is now our dataframe! You have now brought in the data and stored it in a handy variable that you can reference later on. Now, on to the fun part!*

*The dataset has information about traffic and pollution. Each row is one observation and includes the traffic volume, the average speed, the CO₂ emissions, the noise level, and the traffic condition (Free Flow, Moderate, or Heavy). High school students can look at this data to see patterns in air pollution, understand what causes the air to be cleaner or dirtier, and learn why good air quality is essential for our health and the environment.*
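
After loading a dataframe, it often helps to take a quick look at its size and column types. The sketch below retypes three rows from the preview above as CSV text so it runs without the file; the exact layout of the real ‘AirQualityIndex.csv’ file is an assumption here, and the name `sample` is made up for this illustration.

```python
import io

import pandas as pd

# Three rows from the preview above, retyped as CSV text (assumed file layout)
csv_text = """TrafficVolume,AverageSpeed,CO2Emissions,NoiseLevel,TrafficCondition
2222.222222,75,33.898305,33.75,Free Flow
1666.666667,50,33.898305,33.75,Free Flow
6666.666667,50,355.932203,112.50,Heavy
"""

# read_csv accepts a file path or, as here, a file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # number of rows and columns
print(sample.dtypes)   # which columns are numbers and which are text
print(sample.head(2))  # just the first two rows
```

With the real file, the same three checks would work on **data** instead of `sample`.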


### Goal 4: Building a Scatter Plot

Scatter plots help us look at each data point when it comes to interval or ratio data. The scatter plot shows us the relationship between two variables in a data set. The independent variable is plotted on the X-axis, while the dependent variable is plotted on the Y-axis. They are super handy for finding the relationship between different numeric variables.

#### Blockly


**Step 1 - Call the scatter function from Plotly**

To make a scatterplot, we first need to call the scatter function with our plotly library (px).

From the Variables menu, drag a DO block for the **px** variable. Select the "**scatter**" function. This specifies the function we want to call, which is the scatter function from the Plotly Express library (imported as "**px**" earlier).



**Step 2 - Saying what data to use for the scatter plot**

In order to make a plot, we need to choose the data we want to plot. In this case, our dataset is stored in the dataframe **data**.

For the first argument, drag from the Variable menu the **data** variable. This allows us to specify a dataframe and what to look at for the scatter function.



**Step 3 - Tell Plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable will be alongside the X-axis (*across*) and another one alongside the Y-axis (*up and down*). In our context, we want to see the relationship between TrafficVolume and CO2Emissions. We will assign the variables to the 2 axes in the graph.

From the TEXT menu, drag the Quotes. Type the text **TrafficVolume**. This specifies TrafficVolume as the x-axis variable for the scatter plot. Then, from the TEXT menu, drag the Quotes again. Type the text **CO2Emissions**. This specifies CO2Emissions as the y-axis variable for the scatter plot.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GZjsNSEW8AUehfZ?format=png&name=small)

In [None]:
#blocks code


#### Freehand


**Step 1 - Call the scatter function from Plotly**

To make a scatterplot, we first need to call the scatter function with our plotly library (**px**).

`px.scatter()`


**Step 2 - Saying what data to use for the scatter plot**

In order to make a plot, we need to choose the data we want to plot. In this case, our dataset is stored in the dataframe “data”.

`px.scatter(data)`


**Step 3 - Tell Plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable will be alongside the X-axis (*across*) and another one alongside the Y-axis (*up and down*). In our context, we want to see the relationship between **TrafficVolume** and **CO2Emissions**. We will assign the variables to the 2 axes in the graph.

`px.scatter(data, x="TrafficVolume", y="CO2Emissions")`


**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GdBFjO9XMAAM0f8?format=png&name=small)

**Your Turn**: Let’s see what you can create and what it looks like. Try the code above to create your own scatter plot and see all the data.

In [7]:
px.scatter(data,'TrafficVolume','CO2Emissions')

**Explanation**: *Look at all that data! You should now have a scatter plot that shows all of your data. Do you see any interesting patterns starting to form?*

*The scatterplot shows that as Traffic Volume (number of vehicles) increases, CO₂ Emissions (carbon dioxide pollution) also rise. Each blue dot represents a specific data point, and the general trend is that more traffic leads to higher CO₂ emissions. For example, with 2,000 vehicles, emissions stay low, but with 10,000 vehicles, emissions are much higher. What do you think this shows about how traffic contributes to air pollution?*


### Goal 5: Finding the Average of the Data (Mean)

Let’s see if we can get a better understanding of our data by looking at the average (mean). First, let’s go ahead and calculate the average for the categories we see in TrafficCondition.

#### Blockly

**Step 1 - Write out the variable name you want to use for the groups**

We are going to calculate the averages (means) of the values for the categories found in the TrafficCondition column (Free Flow, Heavy, Moderate) soon. First, let’s create a new variable that will hold the data grouped by those categories.

On the "Variables" menu, click Create Variable. Type **groups** as the variable name. On the "Variables" menu, drag the SET block to the variable **groups**.



**Step 2 - Finding the categories in the column we are looking at**

So now let’s go ahead and find the categories we see in the TrafficCondition (freeflow, heavy, moderate)

From the Variable menu, drag the DO block for the data variable. Select the **groupby** DO function from the list of available functions. This specifies the function we want to call, which is the **groupby** function from the **data** object. From the TEXT menu, drag the Quote block (“”). Enter the text string "**TrafficCondition**". This specifies the column to group by in the data object.

![](https://pbs.twimg.com/media/GZjvn0jXYAIF7u-?format=png&name=small)

In [None]:
#blocks code



**Step 3 - Click blocks to code and run the code cell**

Now that we’ve got it all together, let’s run it to create the groups we’ll use to get the averages (means).

In the Blockly workspace, click on the "Blocks to Code" button to convert the blocks into executable code. Next, run the code cell to execute the code and see the results.



**Step 4 - Write out the variable name you want to use for the averages**

In Step 2, we grouped the data by the categories in TrafficCondition. Now, let’s create a variable that will hold the averages (means) for each category found in TrafficCondition.

On the "Variables" menu, click on Create Variable. Type the name averages to the new variable. This specifies the name of the variable we want to create, which is "**averages**" in this case. From the Variables menu, drag the SET block for the **averages** variable. This allows us to assign the result of a function call to the variable.



**Step 5 - Calculate the averages (means) for each category**

Now, let’s use Python to calculate the averages (means) and store them in the new variable you created (the one we called **averages**)!

From the Variables menu, drag the DO block for the **groups** variable. Select the "**mean**" from the list of available functions. This specifies the function we want to call, which is the **mean** function from the **groups** object.



**Step 6 - Print the averages (mean) for each category**

Now, let’s print the averages variable so we can see the calculated results.

From the Variables menu, drag the averages variable block to the workspace so that it will print the calculated results.



**Step 7 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GZa18vWXYAExGfy?format=png&name=360x360)


In [None]:
#blocks code


#### Freehand


**Step 1 - Write out the variable name you want to use for the groups**

We are going to calculate the averages (means) of the values for the categories found in the TrafficCondition column (Free Flow, Heavy, Moderate) soon. First, let’s create a new variable that will hold the data grouped by those categories. We write out the variable as **groups**.

`groups =`



**Step 2 - Finding the categories in the column we are looking at**

So now let’s go ahead and find the categories we see in the TrafficCondition (freeflow, heavy, moderate)

Group the data by the TrafficCondition column, a categorical variable representing different traffic states, to analyze how each condition affects other variables in the dataset. You can do this with the **groupby**() function, passing ‘TrafficCondition’ as a parameter.

`groups = data.groupby('TrafficCondition')`

In [31]:
groups = data.groupby('TrafficCondition')



![](https://pbs.twimg.com/media/GZjvYvWX0AICjtq?format=png&name=small)



**Step 3 - Run the code**

Now that we’ve got it all together, let’s run it to create the groups we’ll use to get the averages (means).



**Step 4 - Write out the variable name you want to use for the averages**

In Step 2, we grouped the data by the categories in TrafficCondition. Now, let’s create a variable that will hold the averages (means) for each category found in TrafficCondition.

`averages =`




**Step 5 - Calculate the averages (means) for each category**

Now, let’s use Python to calculate the averages (means) and store them in the new variable you created (the one we called **averages**)!

With the “groups” object, you should call the **mean**() function to calculate the average for all numerical columns.

`averages = groups.mean()`




**Step 6 - Display the averages (mean) for each category**

Now, let’s display the **averages** variable so we can see the calculated results.

`averages`





**Step 7 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GZa10s7WEAA_S6L?format=png&name=360x360)


**Your Turn**:   So those are the steps. Can you look at the code and type it in there? Run it and see what you find.


In [32]:
averages = groups.mean()
averages

| TrafficCondition | TrafficVolume | AverageSpeed | CO2Emissions | NoiseLevel |
| ---------------- | ------------- | ------------ | ------------ | ---------- |
| Free Flow        | 1961.111111   | 70.9         | 39.322034    | 35.4       |
| Heavy            | 6355.555556   | 48.7         | 385.762712   | 102.225    |
| Moderate         | 4544.444444   | 38.5         | 276.271186   | 75.975     |


**Explanation**: *In Free Flow traffic, average emissions are the lowest, about 39 units. In Heavy traffic, emissions are the highest, about 386 units, while Moderate traffic leads to medium levels of emissions, about 276 units. This indicates that heavier traffic contributes significantly more to CO₂ emissions than light or free-flowing traffic. It highlights the impact of traffic congestion on air pollution.*
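
To see what grouping and averaging do behind the scenes, here is a minimal sketch in plain Python. It uses only the nine CO2Emissions rows visible in the data preview, so its averages differ from the full table above; the name `averages_by_hand` is made up for this illustration.

```python
from collections import defaultdict

# (TrafficCondition, CO2Emissions) pairs copied from the rows shown in the preview
rows = [
    ("Free Flow", 33.898305),
    ("Free Flow", 33.898305),
    ("Free Flow", 25.423729),
    ("Free Flow", 42.372881),
    ("Free Flow", 33.898305),
    ("Heavy", 355.932203),
    ("Heavy", 338.983051),
    ("Heavy", 355.932203),
    ("Heavy", 372.881356),
]

totals = defaultdict(float)  # running sum for each category
counts = defaultdict(int)    # number of rows in each category

# "Group by": put each row's value into the bucket for its category
for condition, co2 in rows:
    totals[condition] += co2
    counts[condition] += 1

# "Mean": divide each bucket's sum by how many rows it holds
averages_by_hand = {c: totals[c] / counts[c] for c in totals}
for condition, average in averages_by_hand.items():
    print(condition, round(average, 2))
```

In pandas, the two loops above collapse into the single line `data.groupby('TrafficCondition').mean()`, which also handles every numeric column at once.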

### Goal 6: Building a Bar Chart for the Average of the Data (Mean)

Now, let's build a bar chart for the average data. We’ll use this bar chart to compare CO2 emissions averages for each traffic condition category.

#### Blockly

**Step 1 - Call the bar chart function from Plotly**

To make a bar chart, we first need to call the bar chart function with our plotly library (px).

From the Variable menu, get a DO block for the **px** variable and select the **bar** operation. From this same menu, get a variable.



**Step 2 -  Saying what data to use for the bar plot**

In order to make a plot, we need to choose the data we want to plot. In this case, our dataset is stored in the dataframe **averages**.

From the Variables menu, drag an **“averages”** variable. From this same menu, drag two Get blocks, both for the same variable, **“averages”**.



**Step 3 - Tell Plotly what columns to put on the axis**

Let’s now create a bar chart using the averages dataset with the **bar**() function from **px**, having the index on the x-axis (across) and CO2Emissions on the y-axis (up and down) from the averages dataset.

On the first Get block, select the **index** attribute, and on the second one, select the **CO2Emissions** attribute. Connect all these elements as shown below and generate a bar chart comparing CO2 emissions averages for each traffic condition category (free flow, moderate, or heavy).




**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GZkfOSJWgA4g-np?format=png&name=small)

In [None]:
#blocks code


#### Freehand


**Step 1 - Call the bar chart function from Plotly**

To make a bar chart, we first need to call the bar chart function with our plotly library (px).

`px.bar()`



**Step 2 - Saying what data to use for the bar plot**

In order to make a plot, we need to choose the data we want to plot. In this case, our dataset is stored in the dataframe **averages**.

`px.bar(averages)`



**Step 3 - Tell Plotly what columns to put on the axis**

Let’s now create a bar chart using the averages dataset with the **bar**() function from **px**, having the index on the x-axis (across) and CO2Emissions on the y-axis (up and down) from the averages dataset.

`px.bar(averages,averages.index,averages.CO2Emissions)`

![](https://pbs.twimg.com/media/GZkfJQkWcAwpd2e?format=png&name=small)

**Your Turn**: So those are the steps. Can you look at the code and type it in there? Run it and see what you find.

In [33]:
px.bar(averages,averages.index,averages.CO2Emissions)

**Explanation**: *This bar chart shows how CO₂ Emissions vary under different Traffic Conditions. When traffic is in Free Flow, the average emissions are very low, about 39 units. In Heavy traffic, emissions are the highest, about 386 units, while Moderate traffic leads to medium levels of emissions, about 276 units. This indicates that heavier traffic contributes significantly more to CO₂ emissions than light or free-flowing traffic. It highlights the impact of traffic congestion on air pollution.*

### Goal 7: Bringing in Historical Data for a Line Chart

Let’s bring in the data that we want to look at using a line chart.

#### Blockly


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us to do cool things, let’s bring the data into a variable and call it **timeseries**. Think of it as a digital spreadsheet with much more power to analyze and manipulate the data!

In Blockly, bring in the VARIABLES menu.



**Step 2 - Assign the dataframe to the variable you created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.

In Blockly, go to the Variables and drag the Set block for the **timeseries** variable. This will allow us to assign the result of a function call to the variable. A function is basically code that does a specific task for us.



**Step 3 - Bring in the data**

Now we need to look at the file that has all our data. To load our dataframe, we’ll use a simple command to bring in the file we need, a CSV (Comma Separated Values) file. Our file is called ‘AirQualityTimeseries.csv’ and sits in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **timeseries**.

From the Variable menu, drag a DO block using the **pd** variable and select the **read_csv** operation. The read_csv function reads a CSV file and returns a DataFrame object.

In our case, let’s bring in "datasets/AirQualityTimeseries.csv" (use the Quotes from the TEXT menu) because that is the data Ethan is working with.



**Step 4 - Display the variable**

Let’s see it now by ‘displaying’ and showing our work.

Drag the **timeseries** variable to the workspace, making it available for further use in our program. This step is more of a visualization step, as it allows us to see the variable in the Blockly workspace.

![](https://pbs.twimg.com/media/GboYZqqXAAEzRFT?format=jpg&name=medium)

In [None]:
#blocks code


#### Freehand


**Step 1 - Write out the variable name you want to use**

Now that we’re all set with our new package to help us to do cool things, let’s bring the data into a variable called **timeseries**. Think of it as a digital spreadsheet with much more power to analyze and manipulate the data!



**Step 2 - Assign the dataframe to the variable you created**

Just like we did before, let’s type out a variable name. Rather than type out the full file name for our data, this easy to remember name will hold the data we bring in.


**Step 3 - Bring in the data**

Now we need to look at the file that has all our data.

To load our dataframe, we’ll use a simple command to bring in the file we need, a CSV (Comma Separated Values) file. Our file is called ‘AirQualityTimeseries.csv’ and sits in the folder **‘datasets’**. We’re telling Python to read the CSV file and store it in a variable called **timeseries**. To do this, we call the function `pd.read_csv`, which reads the CSV file. This variable is now our dataframe!

In our case, let’s bring in "datasets/AirQualityTimeseries.csv" (wrapped in quotes) because that is the data Ethan is working with.


**Step 4 - Print the variable**

Let’s see it now by printing the **timeseries** variable and showing our work.


**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GboYcA7WsAAAcjk?format=jpg&name=medium)



**Your Turn**: Let’s dive in and start working with the data! We’ll begin by loading it into a dataframe, allowing us to interact with and analyze the dataset easily.

In [34]:
timeseries = pd.read_csv('datasets/AirQualityTimeseries.csv')
timeseries

Unnamed: 0,Country,Code,Year,CO2Emissions
0,Africa,,1750,0.000000e+00
1,Africa,,1751,0.000000e+00
2,Africa,,1752,0.000000e+00
3,Africa,,1753,0.000000e+00
4,Africa,,1754,0.000000e+00
...,...,...,...,...
2164,World,OWID_WRL,2018,3.676694e+10
2165,World,OWID_WRL,2019,3.704010e+10
2166,World,OWID_WRL,2020,3.500774e+10
2167,World,OWID_WRL,2021,3.681654e+10


**Explanation**: *This dataset tracks CO₂ emissions by country over time. Each row represents a specific year for a particular country and includes information on the country name, a country code, the year, and the amount of CO₂ emissions recorded in tons. Initially, emissions are often zero or very low, reflecting minimal industrial activity, but they gradually increase in later years, indicating growth in industrialization and fossil fuel use. This dataset is useful for analyzing historical trends in CO₂ emissions across different countries, helping to understand how industrial activities have contributed to climate change over time.*
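To see what `pd.read_csv` hands back without needing the real file, here is a minimal sketch that reads a few lines of CSV text from memory. The rows are copied from the output above (only a handful, not the full dataset), and the names `sample_csv`, `sample`, and `world` are made up for this illustration.

```python
import io

import pandas as pd

# A few rows copied from the output above, written as CSV text.
# This is only a tiny sample, not the full AirQualityTimeseries.csv file.
sample_csv = """Country,Code,Year,CO2Emissions
Africa,,1750,0.0
Africa,,1751,0.0
World,OWID_WRL,2020,3.500774e+10
World,OWID_WRL,2021,3.681654e+10
"""

# read_csv accepts a file path or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(sample_csv))

print(sample.shape)             # (number of rows, number of columns)
print(sample.columns.tolist())  # column names read from the first line
print(sample.head())            # the first few rows

# Keep only the rows for one region
world = sample[sample["Country"] == "World"]
print(world)
```

Checking the shape and column names right after loading is a quick way to confirm the file was read the way you expected.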


### Goal 8: Building a Line Plot

Line plots are similar to bar plots, but they are designed to show trends or changes in data over time or across a sequence. They are especially helpful for seeing the relationship between two variables, where one variable (usually on the X-axis) is continuous or time-based and the other variable is on the Y-axis.

#### Blockly


**Step 1 - Call the line function from Plotly**

To make a line plot, we first need to call the line function with our plotly library (px).

From the Variables menu, get a DO block for the **px** variable and select the **line** function.



**Step 2 - Tell what data to use for the line plot**

To make a line plot, we need to choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **timeseries**.
From the same Variables menu, drag a **timeseries** variable.



**Step 3 - Tell Plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable will go along the X-axis (*across*) and the other along the Y-axis (*up and down*). In our context, we want to see the relationship between Year and CO2Emissions, so we will assign these variables to the two axes of the graph.

From the Text menu, drag two Quote “” blocks. In the first, type the column **Year** to be our X-axis. In the second, type the column **CO2Emissions** for our Y-axis.



**Step 4 - Assign color for each line and country**

Lastly, get a Freestyle block and type **color='Country'**. Connect all these elements as shown below to generate a line chart presenting the trend of CO2 emissions over the years for each country/region.



**Step 5 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GboZRj2XUAAu68B?format=png&name=900x900)


In [None]:
#blocks code


#### Freehand


**Step 1 - Bring in the line function from the plotly library we put into px.**

To make a line plot, we first need to call the line function with our plotly library (px).

`px.line( )`



**Step 2 - Tell what data to use for the line plot**

To make a line plot, we need to choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **timeseries**.

`px.line(timeseries )`



**Step 3 - Tell Plotly what columns to put on the axis**

Identify the two variables you want to look at. One variable will go along the X-axis (*across*) and the other along the Y-axis (*up and down*). In our context, we want to see the relationship between Year and CO2Emissions, so we will assign these variables to the two axes of the graph.

Inside the parentheses, after **timeseries**, type the column name **'Year'** in quotes to be our X-axis, then the column name **'CO2Emissions'** in quotes for our Y-axis.

`px.line(timeseries,'Year','CO2Emissions')`



**Step 4 - Assign color for each line and country**

Lastly, type **color='Country'**, which will generate a line chart presenting the trend of CO2 emissions over the years for each country/region.

`px.line(timeseries,'Year','CO2Emissions',color='Country')`




**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GboZT0zW8AEe3L2?format=jpg&name=medium)

**Your Turn**: You’ve done a bar chart visualization, but let’s see if we can try out a line graph to see what happens.



In [35]:
px.line(timeseries,'Year','CO2Emissions',color='Country')

**Explanation**: *The line chart shows how global carbon dioxide (CO₂) emissions have changed over time from 1750 to today. We can see that CO₂ emissions were very low until the mid-20th century. However, around 1950, emissions started to rise quickly, especially in regions like China, the United States, and Asia. The yellow line represents the entire world's CO₂ emissions, which have skyrocketed to over 30 billion tons in recent years. This rapid increase is mainly due to industrial growth, fossil fuel use, and population growth. The graph highlights how human activities have greatly increased CO₂ emissions, contributing to climate change.*

### Goal 9: Creating a Histogram

A histogram is another useful tool for data visualization. Instead of plotting each value separately, it counts how many values of a variable fall into each of a series of ranges. Picture these ranges as 'bins', each holding a specific group of numbers; the taller the bar, the more values landed in that bin. The resulting [distributions](## "the number of times a variable has a particular value") can be [uniform](## "a flat distribution where every value is equally likely"), [normal](## "a bell curve distribution where values in the middle are most likely"), [right-skewed](## "a distribution where most of the data sits on the left-hand side and a long tail stretches out to the right"), [left-skewed](## "a distribution where most of the data sits on the right-hand side and a long tail stretches out to the left"), or [mixtures](## "a probability distribution formed by combining two or more simpler distributions"). Histograms are particularly useful for spotting patterns like outliers, extreme values, or skewness. It's like using a detective's magnifying glass to scrutinize your data!
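To make the idea of bins concrete, here is a minimal sketch that sorts numbers into bins by hand and counts how many land in each one. The values and the bin width of 100 are made up for illustration; `px.histogram` chooses its own bins automatically, so you would not normally write this yourself.

```python
from collections import Counter

# Made-up example values (not taken from the lesson's dataset)
values = [12, 48, 95, 110, 130, 180, 250, 310, 320, 390]
bin_width = 100  # assumed bin size for this illustration

# Integer division tells us which bin each value falls into:
# 0-99 goes to bin 0, 100-199 goes to bin 1, and so on.
counts = Counter(value // bin_width for value in values)

for bin_number in sorted(counts):
    low = bin_number * bin_width
    high = low + bin_width - 1
    # Print one '#' per value so the output looks like a sideways histogram
    print(f"{low:>3}-{high:<3} {'#' * counts[bin_number]}")
```

Each row of `#` characters plays the role of one bar in the histogram.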

#### Blockly

**Step 1 - Call the histogram function from Plotly**

To make a histogram, we first need to call the histogram() function with our plotly library (px).

Get a DO block for the **px** variable from the Variable menu and select the **histogram** function.



**Step 2 - Tell what data to use for the histogram**

To make a plot, we need to choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **data**.



**Step 3 - Tell Plotly what columns to look at**

So what variable do we want to look at?

From the Text menu, drag one Quote “” block. Type the attribute **CO2Emissions**.



**Step 4 - Connect the blocks to run the code**

Connect the blocks and run the code!

![](https://pbs.twimg.com/media/GZkpUdRXsAohNWf?format=png&name=900x900)

In [None]:
#blocks code


#### Freehand


**Step 1 - Call the histogram function from Plotly**

To make a histogram, we first need to call the histogram function with our plotly library (px).

`px.histogram()`



**Step 2 - Tell what data to use for the histogram**

To make a plot, we need to choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **data**.

`px.histogram(data)`



**Step 3 - Tell Plotly what columns to look at**

So what variable do we want to look at?

`px.histogram(data,'CO2Emissions')`



**Step 4 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GZkoNEuWIAIXFMG?format=png&name=small)


**Your Turn**: To plot a histogram, we first need to call the **histogram**() function with our plotly library (**px**). To make a plot, we must choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **data**. Then identify the variable you want to look at. Just like before, the numerical variable goes along the X-axis.

In [36]:
px.histogram(data,'CO2Emissions')

**Explanation**: *So, what do you see in the distribution? A common shape to look for is a bell curve, where the bars gradually rise to a peak and then fall.*

*The histogram shows the distribution of CO₂ Emissions in different amounts. Most data points are concentrated around low emissions, with over 40 measurements having CO₂ emissions close to 100. Fewer measurements are seen at higher levels, with emissions between 300 and 400 being the next most common. Very few measurements are recorded at extremely low or very high levels. This suggests that most of the time, CO₂ emissions are relatively low, but there are still notable instances of higher emissions.*


### Goal 10: Create a Segmented Histogram

The goal is to create a histogram of CO₂ Emissions segmented by Traffic Condition to visually compare how different traffic conditions (such as free flow, moderate, or heavy traffic) impact the distribution of the amount of CO₂ released. This helps identify which traffic conditions contribute the most to air pollution and can guide efforts to reduce emissions in those scenarios.

#### Blockly


**Step 1 - Call the histogram function from Plotly**

To make a segmented histogram, we first need to call the histogram function with our plotly library (px).
Get a DO block for the **px** variable from the Variable menu and select the **histogram** function.



**Step 2 - Tell what data to use for the histogram**

To make a plot, we need to choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **data**.



**Step 3 - Tell Plotly what columns to look at**

So, what variable do we want to look at?

From this Text menu, drag one Quote “” block.  Type the attribute **CO2Emissions**.



**Step 4 - Tell Python to show colors by TrafficCondition**

Let’s see if we can break it down by colors so it’s a bit easier to see.
Get a Freestyle block and type **color='TrafficCondition'**.



**Step 5 - Connect the blocks to run the code**

Connect the blocks to run the code!
  
![](https://pbs.twimg.com/media/GZks24GXMAgikWu?format=png&name=900x900)

In [None]:
#blocks code


#### Freehand

**Step 1 - Call the histogram function from Plotly**

To make a segmented histogram, we first need to call the histogram function with our plotly library (px).

`px.histogram()`




**Step 2 - Tell what data to use for the histogram**

To make a plot, we need to choose the dataframe that holds the data we want to plot. In this case, our dataset is stored in the dataframe **data**.

`px.histogram(data)`



**Step 3 - Tell Plotly what columns to look at**

So what variable do we want to look at? In this case, let’s look at CO2Emissions.

`px.histogram(data,'CO2Emissions')`



**Step 4 - Tell Python to show colors by “TrafficCondition”**

Let’s see if we can break it down by colors so it’s a bit easier to see.

`px.histogram(data,'CO2Emissions',color='TrafficCondition')`






**Step 5 - Run the code**

Hit ‘control’ and ‘enter’ at the same time to run the code!

![](https://pbs.twimg.com/media/GZksinhWUAcbi9P?format=jpg&name=medium)

**Your Turn**: Create a segmented histogram to visualize the distribution of CO₂ Emissions, using different colors to represent the various Traffic Conditions. To do that, use the **histogram**() function again, with **data** as the dataframe, 'CO2Emissions' as the numerical variable, and TrafficCondition as the categorical variable to segment by (color='TrafficCondition').

In [38]:
px.histogram(data,'CO2Emissions',color='TrafficCondition')

**Explanation**: *This graph helps show how traffic affects the distribution of pollution levels. As traffic gets heavier, emissions generally increase.*

*This new histogram shows the distribution of CO₂ Emissions under different Traffic Conditions. The bars are colored to represent three types of traffic: Free Flow (blue), Moderate (orange), and Heavy (green). Most emissions are higher when traffic is heavier, with the green bars representing the highest CO₂ emissions. In contrast, during Free-flow traffic, there are many instances of little to no emissions (blue bars near zero). Moderate traffic conditions have emissions that are mostly between 100 and 300.*
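A numeric summary can back up what the colors suggest. This sketch groups a small dataframe by traffic condition and averages the emissions in each group. The rows are invented for illustration and are not from the lesson's dataset; the column names `CO2Emissions` and `TrafficCondition` match the ones used with **data** above, and the category labels follow the explanation.

```python
import pandas as pd

# Made-up rows standing in for the lesson's `data` dataframe
example = pd.DataFrame({
    "TrafficCondition": ["Free Flow", "Free Flow", "Moderate",
                         "Moderate", "Heavy", "Heavy"],
    "CO2Emissions": [0, 40, 150, 250, 300, 400],
})

# Average emissions within each traffic condition
averages = example.groupby("TrafficCondition")["CO2Emissions"].mean()

# Sort from lowest to highest average so the comparison is easy to read
print(averages.sort_values())
```

If the averages climb from free flow to heavy traffic, that supports the pattern seen in the segmented histogram.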


## WHAT DID YOU LEARN?


In this lesson, we've covered important skills and knowledge about data visualization. We’ve learned how to create and interpret scatter plots, bar plots, line plots, and histograms. These tools help us identify patterns and trends, which are great for making decisions based on data.

We've also worked with dataframes and applied these visualization techniques to real-world scenarios. Using Ethan's experience as inspiration, you can now analyze traffic data, propose solutions, and create interactive graphs with plotly.express to help improve traffic management and reduce congestion in your hometown.


## WHAT’S NEXT?

[Descriptive Statistics](Descriptive_Statistics.ipynb)


## TELL ME MORE


- [Datawhys Plotting Notebook](https://github.com/memphis-iis/datawhys-content-notebooks-python/blob/master/Plotting.ipynb)
- [Datawhys Plotting Problem-Solving Notebook](https://github.com/memphis-iis/datawhys-content-notebooks-python/blob/master/Plotting-PS.ipynb)
- **Art**: Data visualization is an art! Check out this [Population Pyramid activity](https://www.rcboe.org/cms/lib/GA01903614/Centricity/Domain/2849/Population_Pyramids-_introduction.docx).
- **Math**: Learn about [scatter plots, bar graphs, line graphs, and histograms](https://www.mathsisfun.com/data/index.html). This resource includes step-by-step instructions on how to create and interpret these graphs, along with practice problems and real-world examples.
- **Computer Science**: Learn more about [data representation, data manipulation, and data visualization techniques](https://studio.code.org/s/explore-data-1-2021) using programming languages such as Python.
- **Career Connections**: Are you interested in becoming a data scientist? This resource offers an overview of [Data Science and the career prospects](https://www.youtube.com/watch?v=X3paOmcrTjQ&themeRefresh=1) for data scientists. This resource provides an [overview of 15 project ideas](https://www.polygence.org/blog/data-science-passion-project-ideas-for-high-school-students) for you to use your data science knowledge to explore your passion!
- [Data visualization with D3 playlist](https://www.youtube.com/watch?v=q6GWkbUWFBg&list=PLhGp6N0DI_1TSgS8KqsGQMyGlVIatwVXn) - freeCodeCamp (video)
- [Excel Data Visualization Course](https://youtu.be/VV8iRJ-DS0A?si=2Kp4Qvi2Ut0vXyT9) - freeCodeCamp (video)
- [Data Visualization](https://towardsdatascience.com/data-visualization/home) - Towards Data Science (articles)