# Introduction to Data Visualization with Seaborn

Seaborn is a powerful Python library that makes it easy to create informative and attractive data visualizations. This 4-hour course provides an introduction to how you can use Seaborn to create a variety of plots, including scatter plots, count plots, bar plots, and box plots, and how you can customize your visualizations.

You’ll explore this library and create Seaborn plots based on a variety of real-world datasets, including how air pollution in a city changes through the day and what young people like to do in their free time. These datasets give you the opportunity to see Seaborn’s advantages firsthand, including how easily you can create subplots in a single figure and have confidence intervals calculated automatically.

By the end of this course, you’ll be able to use Seaborn in various situations to explore your data and effectively communicate the results of your data analysis to others. These skills are highly sought-after for data analysts, data scientists, and any other job that may involve creating data visualizations. If you’d like to continue your learning, this course is part of several tracks, including the Data Visualization track, where you can add more libraries and techniques to your skillset.

## Introduction to Seaborn

What is Seaborn, and when should you use it? In this chapter, you will find out! Plus, you will learn how to create scatter plots and count plots with both lists of data and pandas DataFrames. You will also be introduced to one of the big advantages of using Seaborn - the ability to easily add a third variable to your plots by using color to represent different subgroups.

### 1. Introduction to Seaborn

00:00 - 00:10
Hello! Welcome to this introductory course on Seaborn! My name is Erin Case, and I'll be your instructor.

2. What is Seaborn?
00:10 - 00:30
So what is Seaborn? Seaborn is a powerful Python library for creating data visualizations. It was developed in order to make it easy to create the most common types of plots. The plot shown here can be created with just a few lines of Seaborn code.

1 Waskom, M. L. (2021). seaborn: statistical data visualization. https://seaborn.pydata.org/
3. Why is Seaborn useful?
00:30 - 00:45
This is a picture of a typical data analysis workflow. Data visualization is often a huge component of both the data exploration phase and the communication of results, so Seaborn will be very useful there.

4. Advantages of Seaborn
00:45 - 01:32
There are several tools that can be used for data visualization, but Seaborn offers several advantages. First, Seaborn's main purpose is to make data visualization easy. It was built to automatically handle a lot of complexity behind the scenes. Second, Seaborn works extremely well with pandas data structures. pandas is a Python library that is widely used for data analysis. Finally, it's built on top of Matplotlib, which is another Python visualization library. Matplotlib is extremely flexible. Seaborn allows you to take advantage of this flexibility when you need it, while avoiding the complexity that Matplotlib's flexibility can introduce.

5. Getting started
01:32 - 02:21
To get started, we'll need to import the Seaborn library. The line "import seaborn as sns" will import Seaborn as the conventionally used alias "sns". Why "sns"? The Seaborn library was apparently named after a character named Samuel Norman Seaborn from the television show "The West Wing" - thus, the standard alias is the character's initials ("sns"). We also need to import Matplotlib, which is the library that Seaborn is built on top of. We do this by typing "import matplotlib.pyplot as plt". "plt" is the alias that most people use to refer to Matplotlib, so we'll use that here as well.

6. Example 1: Scatter plot
02:21 - 03:15
Let's now dive into an example to illustrate how easily you can create visualizations using Seaborn. Here, we have data for 10 people consisting of lists of their heights in inches and their weights in pounds. Do taller people tend to weigh more? You can visualize this using a type of plot known as a scatter plot, which you'll learn more about later in the course. Use "sns.scatterplot()" to call the scatterplot function from the Seaborn library. Then, specify what to put on the x-axis and y-axis. Finally, call the "plt.show()" function from Matplotlib to display the scatter plot. This plot shows us that taller people tend to have a higher weight.
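A minimal, runnable sketch of this example follows. The transcript does not list the ten people's actual measurements, so the heights and weights below are made-up values; only the function calls come from the lesson.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up heights (inches) and weights (pounds) for 10 people;
# the course's actual values are not given in the transcript.
height = [62, 64, 69, 75, 66, 68, 65, 71, 76, 73]
weight = [120, 136, 148, 175, 137, 165, 154, 172, 200, 187]

# x and y accept plain Python lists of the same length
ax = sns.scatterplot(x=height, y=weight)

# Plain lists carry no names, so the axes are labeled by hand
ax.set(xlabel="height (in)", ylabel="weight (lb)")
plt.show()
```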

7. Example 2: Create a count plot
03:15 - 03:46
How many of our observations of heights and weights came from males vs. females? You can use another type of plot - the count plot - to investigate this. Count plots take in a categorical list and return bars that represent the number of list entries per category. Use the "countplot()" function and provide the list of every person's gender. This count plot shows that out of the 10 observations we had in our height and weight scatter plot, 6 were male and 4 were female.
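The count plot can be sketched the same way. The gender list below is hypothetical, built only to reproduce the 6-versus-4 split mentioned in the transcript.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical genders for the same 10 people: 4 "Female" and 6 "Male"
gender = ["Female", "Female", "Female", "Female",
          "Male", "Male", "Male", "Male", "Male", "Male"]

# countplot() draws one bar per distinct value; bar height = number of entries
ax = sns.countplot(x=gender)
plt.show()
```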

8. Course Preview
03:46 - 04:03
Now, those were a couple of simple examples. Throughout this course, you'll learn to make more complex visualizations such as those pictured here. More importantly, you'll learn when to use each type of visualization in order to most effectively extract and communicate insights using data.


### 2. Using pandas with Seaborn

00:00 - 00:12
Data scientists commonly use pandas to perform data analysis, so it's a huge advantage that Seaborn works extremely well with pandas data structures. Let's see how this works!

2. What is pandas?
00:12 - 00:37
pandas is a Python library for data analysis. It can easily read datasets from many types of files, including CSV and TXT files. pandas supports several types of data structures, but the most common one is the DataFrame object. When you read in a dataset with pandas, you will create a DataFrame.

3. Working with DataFrames
00:37 - 01:30
Let's look at an example. First, import the pandas library as "pd". Then, use the "read_csv" function to read the CSV file named "masculinity.csv" and create a pandas DataFrame called "df". Calling "head()" on the DataFrame will show us its first five rows. This dataset contains the results of a survey of adult men. We can see that it has four columns: "participant_id"; "age"; "how_masculine", which is that person's response to the question "how masculine or 'manly' do you feel?"; and "how_important", which is the response to the question "how important is it to you that others see you as masculine?"

4. Using DataFrames with countplot()
01:30 - 02:47
Now let's look at how to make a count plot with a DataFrame instead of a list of data. The first thing we'll do is import pandas, Matplotlib and Seaborn as we have in past examples. Then, we'll create a pandas DataFrame called "df" from the masculinity CSV file. To create a count plot with a pandas DataFrame column instead of a list of data, set x equal to the name of the column in the DataFrame - in this case, we'll use the "how_masculine" column. Then, we'll set the data parameter equal to our DataFrame, "df". After calling "plt.show()", we can see that we have a nice count plot of the values in the "how_masculine" column of our data. This plot shows us that the most common response to the question "how masculine or 'manly' do you feel?" is "somewhat", with "very" being the second most common response. Note also that because we're using a named column in the DataFrame, Seaborn automatically adds the name of the column as the x-axis label at the bottom.
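The steps from the last two slides can be sketched together. Because "masculinity.csv" is not included here, the sketch reads a small in-memory CSV with the same four column names; its rows are invented so that "Somewhat" is the most common answer and "Very" the second most common, as in the transcript.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented stand-in for masculinity.csv, using the column names from the lesson
csv_text = """participant_id,age,how_masculine,how_important
1,18 - 34,Somewhat,Somewhat
2,18 - 34,Somewhat,Not very
3,35 - 64,Very,Very
4,35 - 64,Very,Somewhat
5,65 and up,Somewhat,Somewhat
6,65 and up,Not very,Not very
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())  # first five rows

# Column name as x, DataFrame as data; the column name becomes the x-axis label
ax = sns.countplot(x="how_masculine", data=df)
plt.show()
```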

5. "Tidy" data
02:47 - 03:23
Let's pause for an important note here. Seaborn works great with pandas DataFrames, but only if the DataFrame is "tidy". "Tidy data" means that each observation has its own row and each variable has its own column. The "masculinity" DataFrame shown here is tidy because each row is a single survey response and each column holds the answers to one survey question. Making a count plot with the "how_masculine" column works just like passing in a list of that column's values.

6. "Untidy" data
03:23 - 04:11
In contrast, here is an example of an "untidy" DataFrame made from the same survey on masculinity. In this untidy DataFrame, notice how each row doesn't contain the same information. Row 0 contains the age categories, rows 1 and 7 contain the question text, and the other rows contain summary data of the responses. This will not work well with Seaborn. Unlike the tidy DataFrame, values in the "Age" column don't look like a list of age categories for each observation. Transforming untidy DataFrames into tidy ones is possible, but it's not in scope for this course. There are other DataCamp courses that can teach you how to do this.

### 3. Adding a third variable with hue

00:00 - 00:22
We saw in the last lesson that a really nice advantage of Seaborn is that it works well with pandas DataFrames. In this lesson, we'll see another big advantage that Seaborn offers: the ability to quickly add a third variable to your plots by adding color.

2. Tips dataset
00:22 - 00:56
To showcase this cool feature in Seaborn, we'll be using Seaborn's built-in tips dataset. You can access it by using the "load_dataset()" function in Seaborn and passing in the name of the dataset. These are the first five rows of the tips dataset. This dataset contains one row for each table served at a restaurant and has information about things like the bill amount, how many people were at the table, and when the table was served. Let's explore the relationship between the "total_bill" and "tip" columns using a scatter plot.

3. A basic scatter plot
00:56 - 01:22
Here is the code to generate it. The total bill per table (in dollars) is on the x-axis, and the total tip (in dollars) is on the y-axis. We can see from this plot that larger bills are associated with larger tips. What if we want to see which of the data points are smokers versus non-smokers? Seaborn makes this super easy.

4. A scatter plot with hue
01:22 - 01:50
You can set the "hue" parameter equal to the DataFrame column name "smoker", and Seaborn will automatically color each point by smoking status. Plus, it will add a legend to the plot automatically! If you don't want to use pandas, you can set it equal to a list of values instead of a column name.
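Here is a runnable sketch of a scatter plot with "hue". To avoid a network download, it uses a small made-up stand-in for the tips data with the same column names; with internet access, `sns.load_dataset("tips")` returns the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset (same column names, invented values)
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
    "smoker":     ["No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
})

# hue maps each value of the "smoker" column to its own color and adds a legend
ax = sns.scatterplot(x="total_bill", y="tip", data=tips, hue="smoker")
plt.show()
```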

5. Setting hue order
01:50 - 02:07
Hue also allows you to exert more control over the ordering and coloring of each value. The "hue_order" parameter takes in a list of values and sets the order of the values in the plot accordingly. Notice how the legend for smoker now lists "Yes" before "No".

6. Specifying hue colors
02:07 - 02:46
You can also control the colors assigned to each value using the "palette" parameter. This parameter takes in a dictionary, which is a data structure that has key-value pairs. This dictionary should map the variable values to the colors you want to represent the value. Here, we create a dictionary called "hue_colors" that maps the value "Yes" to the color black and the value "No" to the color red. When we set hue equal to "smoker" and the palette parameter equal to this dictionary, we have a scatter plot where smokers are represented with black dots and non-smokers are represented with red dots.
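The two options from the last slides, "hue_order" and "palette", can be combined in one call. The data below is a made-up stand-in for the tips dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset; the first row is a non-smoker
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
    "smoker":     ["No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
})

# Keys must match the values that actually appear in the "smoker" column
hue_colors = {"Yes": "black", "No": "red"}

ax = sns.scatterplot(
    x="total_bill", y="tip", data=tips,
    hue="smoker",
    hue_order=["Yes", "No"],  # legend lists "Yes" before "No"
    palette=hue_colors,       # smokers black, non-smokers red
)
plt.show()
```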

7. Color options
02:46 - 03:20
In the last example, we used the words "black" and "red" to define what the hue colors should be. This only works for a set of color names that are defined by Matplotlib. Here is the list of Matplotlib colors and their names. Note that for the basic colors you can use a single-letter Matplotlib abbreviation instead of the full name. You can also use an HTML color hex code instead of these Matplotlib color names, which allows you to choose any color you want.

8. Using HTML hex color codes with hue
03:20 - 03:31
Here's an example using HTML hex codes. Make sure you put the hex codes in quotes with a pound sign at the beginning.
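A sketch of the same idea with hex codes follows; the two hex values are arbitrary choices for illustration, and the data is a made-up stand-in for the tips dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset; the first row is a non-smoker
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1],
    "smoker":     ["No", "Yes", "No", "Yes", "No", "Yes"],
})

# Hex codes are strings that start with "#"
hue_colors = {"Yes": "#808080", "No": "#00FF00"}  # gray for smokers, green for non-smokers

ax = sns.scatterplot(x="total_bill", y="tip", data=tips, hue="smoker", palette=hue_colors)
plt.show()
```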

9. Using hue with count plots
03:31 - 03:57
As a final note, hue is available in most of Seaborn's plot types. For example, this count plot shows the number of observations we have for smokers versus non-smokers, and setting "hue" equal to "sex" divides these bars into subgroups of males versus females. From this plot, we can see that males outnumber females among both smokers and non-smokers in this dataset.
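The same "hue" parameter works with "countplot()". In this sketch the smoker and sex values are made up, so the bar heights will not match the real tips data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up smoker/sex values standing in for the tips dataset
tips = pd.DataFrame({
    "smoker": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes", "No"],
    "sex":    ["Male", "Male", "Female", "Male", "Male", "Female", "Male", "Female", "Female"],
})

# One group of bars per smoker value, split into one bar per sex
ax = sns.countplot(x="smoker", data=tips, hue="sex")
plt.show()
```

Every row is counted in exactly one bar, so the bar heights add up to the number of rows.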


## Visualizing Two Quantitative Variables

In this chapter, you will create and customize plots that visualize the relationship between two quantitative variables. To do this, you will use scatter plots and line plots to explore how the level of air pollution in a city changes over the course of a day and how horsepower relates to fuel efficiency in cars. You will also see another big advantage of using Seaborn - the ability to easily create subplots in a single figure!

### 1. Introduction to relational plots and subplots

Many questions in data science are centered around describing the relationship between two quantitative variables. Seaborn calls plots that visualize this relationship "relational plots".

2. Questions about quantitative variables
00:13 - 00:27
So far we've seen several examples of questions about the relationship between two quantitative variables, and we answered them with scatter plots. These examples include: "do taller people tend to weigh more?"

3. Questions about quantitative variables
00:27 - 00:34
"what's the relationship between the number of absences a student has and their final grade?"

4. Questions about quantitative variables
00:34 - 00:50
and "how does a country's GDP relate to the percent of the population that can read and write?" Because they look at the relationship between two quantitative variables, these scatter plots are all considered relational plots.

5. Visualizing subgroups
00:50 - 01:12
While looking at a relationship between two variables at a high level is often informative, sometimes we suspect that the relationship may be different within certain subgroups. In the last chapter, we started to look at subgroups by using the "hue" parameter to visualize each subgroup using a different color on the same plot.

6. Visualizing subgroups
01:12 - 01:20
In this lesson, we'll try out a different method: creating a separate plot per subgroup.

7. Introducing relplot()
01:20 - 01:54
To do this, we're going to introduce a new Seaborn function: "relplot()". "relplot()" stands for "relational plot" and enables you to visualize the relationship between two quantitative variables using either scatter plots or line plots. You've already seen scatter plots, and you'll learn about line plots later in this chapter. Using "relplot()" gives us a big advantage: the ability to create subplots in a single figure. Because of this advantage, we'll be using "relplot()" instead of "scatterplot()" for the rest of the course.

8. scatterplot() vs. relplot()
01:54 - 02:24
Let's return to our scatter plot of total bill versus tip amount from the tips dataset. On the left, we see how to create a scatter plot with the "scatterplot" function. To make it with "relplot()" instead, we change the function name to "relplot()" and use the "kind" parameter to specify what kind of relational plot to use - scatter plot or line plot. In this case, we'll set kind equal to the word "scatter".
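A side-by-side sketch of the two calls, using a made-up stand-in for the tips data. The practical difference is the return value: "scatterplot()" returns a single Matplotlib Axes, while "relplot()" returns a FacetGrid that can hold several subplots.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
})

# Axes-level function: draws onto one Axes
ax = sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()

# Figure-level function: same scatter plot, wrapped in a FacetGrid
g = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter")
plt.show()
```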

9. Subplots in columns
02:24 - 02:34
By setting "col" equal to "smoker", we get a separate scatter plot for smokers and non-smokers, arranged horizontally in columns.

10. Subplots in rows
02:34 - 02:44
If you want to arrange these vertically in rows instead, you can use the "row" parameter instead of "col".

11. Subplots in rows and columns
02:44 - 03:02
It is possible to use both "col" and "row" at the same time. Here, we set "col" equal to smoking status and "row" equal to the time of day (lunch or dinner). Now we have a subplot for each combination of these two categorical variables.
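The "col" and "row" options from the last three slides can be sketched as follows, with made-up data that contains every combination of smoking status and time of day.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
    "smoker":     ["No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
    "time":       ["Dinner", "Dinner", "Lunch", "Lunch", "Dinner", "Lunch", "Dinner", "Lunch"],
})

# One subplot per smoker value, side by side
g_col = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter", col="smoker")

# One subplot per smoker value, stacked vertically
g_row = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter", row="smoker")

# One subplot per (time, smoker) combination: 2 rows x 2 columns
g_both = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter",
                     col="smoker", row="time")
plt.show()
```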

12. Subgroups for days of the week
03:02 - 03:19
As another example, let's look at subgroups based on day of the week. There are four subplots here, which can be a lot to show in a single row. To address this, you can use the "col_wrap" parameter to specify how many subplots you want per row.

13. Wrapping columns
03:19 - 03:24
Here, we set "col_wrap" equal to two plots per row.

14. Ordering columns
03:24 - 03:35
We can also change the order of the subplots by using the "col_order" and "row_order" parameters and giving each one a list of values in the order we want.
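Wrapping and ordering can be combined in a single call. In this sketch the days appear out of order in the made-up data, so "col_order" visibly changes the layout.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset, days deliberately out of order
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
    "day":        ["Sun", "Sat", "Thur", "Fri"] * 2,
})

g = sns.relplot(
    x="total_bill", y="tip", data=tips, kind="scatter",
    col="day",
    col_wrap=2,                               # at most two subplots per row
    col_order=["Thur", "Fri", "Sat", "Sun"],  # left to right, then top to bottom
)
plt.show()
```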


### 2. Customizing scatter plots
00:00 - 00:07
So far, we've only scratched the surface of what we're able to do with scatter plots in Seaborn.

2. Scatter plot overview
00:07 - 00:59
As a reminder, scatter plots are a great tool for visualizing the relationship between two quantitative variables. We've seen a few ways to add more information to them as well, by creating subplots or plotting subgroups with different colored points. In addition to these, Seaborn allows you to add more information to scatter plots by varying the size, the style, and the transparency of the points. All of these options can be used in both the "scatterplot()" and "relplot()" functions, but we'll continue to use "relplot()" for the rest of the course since it's more flexible and allows us to create subplots. For the rest of this lesson, we'll use the tips dataset to learn how to use each customization and cover best practices for deciding which customizations to use.

3. Subgroups with point size
00:59 - 01:48
The first customization we'll talk about is point size. Here, we're creating a scatter plot of total bill versus tip amount. We want each point on the scatter plot to be sized based on the number of people in the group, with larger groups having bigger points on the plot. To do this, we'll set the "size" parameter equal to the variable name "size" from our dataset. As this example demonstrates, varying point size is best used if the variable is either a quantitative variable or a categorical variable that represents different levels of something, like "small", "medium", and "large". This plot is a bit hard to read because all of the points are of the same color.

4. Point size and hue
01:48 - 02:22
We can make it easier by using the "size" parameter in combination with the "hue" parameter. To do this, set "hue" equal to the variable name "size". Notice that because "size" is a quantitative variable, Seaborn will automatically color the points different shades of the same color instead of different colors per category value like we saw in previous plots. Now larger groups have both larger and darker points, which provides better contrast and makes the plot easier to read.
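A sketch of varying point size and color by the same quantitative column. The made-up data below reuses the tips column name "size" (number of people at the table).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset; "size" = people at the table
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
    "size":       [2, 3, 2, 1, 6, 2, 4, 5],
})

# Larger parties get bigger points (size) and darker shades (hue)
g = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter",
                size="size", hue="size")
plt.show()
```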

5. Subgroups with point style
02:22 - 02:52
The next customization we'll look at is the point style. Setting the "style" parameter to a variable name will use different point styles for each value of the variable. Here's a scatter plot we've seen before, where we use "hue" to create different colored points based on smoking status. Setting "style" equal to "smoker" allows us to better distinguish these subgroups by plotting smokers with a different point style in addition to a different color.
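Setting "hue" and "style" to the same column looks like this; the data is a made-up stand-in for the tips dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
    "smoker":     ["No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
})

# Smokers and non-smokers differ in both color and marker shape
g = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter",
                hue="smoker", style="smoker")
plt.show()
```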

6. Changing point transparency
02:52 - 03:28
The last customization we'll look at is point transparency. Setting the "alpha" parameter to a value between 0 and 1 will vary the transparency of the points in the plot, with 0 being completely transparent and 1 being completely opaque. Here, we've set "alpha" equal to 0.4. This customization can be useful when you have many overlapping points on the scatter plot, so you can see which areas of the plot have more or fewer observations.
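A sketch of the transparency setting, again on a made-up stand-in for the tips dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 31.2, 15.6, 20.4, 27.9],
    "tip":        [1.5, 2.8, 3.9, 1.2, 5.0, 2.1, 3.3, 4.4],
})

# alpha=0.4: each point is 40% opaque, so overlapping points look darker
g = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter", alpha=0.4)
plt.show()
```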

7. Let's practice!
03:28 - 03:43
This is just the beginning of what you can do to customize your Seaborn scatter plots. Make sure to check out the Seaborn documentation for more options, like choosing the exact point sizes or point styles to use in your plots. For now, let's practice what we've learned!

### 3. Introduction to line plots
00:00 - 00:09
Hello! In this video we'll dive into a new type of relational plot: line plots.

2. What are line plots?
00:09 - 00:36
In Seaborn, we have two types of relational plots: scatter plots and line plots. While each point in a scatter plot is assumed to be an independent observation, line plots are the visualization of choice when we need to track the same thing over time. A common example is tracking the value of a company's stock over time, as shown here.

3. Air pollution data
00:36 - 01:15
In this video, we'll be using data on the levels of air pollution in a city. There are many air collection stations around the city, each measuring the nitrogen dioxide level every hour for a single day. Long-term exposure to high levels of nitrogen dioxide can cause chronic lung diseases. Let's begin with the simple case where we have one data point per x-value. Here we have one row per hour over the course of the day with the average nitrogen dioxide level across all the stations in a column called "NO_2_mean".

4. Scatter plot
01:15 - 01:29
This is a scatter plot with the average nitrogen dioxide level on the y-axis and the hour of the day on the x-axis. We're tracking the same thing over time, so a line plot would be a better choice.

5. Line plot
01:29 - 01:44
By specifying "kind" equals "line", we can create a line plot and more easily see how the average nitrogen dioxide level fluctuates throughout the day.

6. Subgroups by location
01:44 - 01:59
We can also track subgroups over time with line plots. Here we have the average nitrogen dioxide level for each region (North, South, East, and West) for each hour in the day.

7. Subgroups by location
01:59 - 02:21
Setting the "style" and "hue" parameters equal to the variable name "location" creates different lines for each region that vary in both line style and color. Here, we can see that the South region tends to have slightly higher average nitrogen dioxide levels compared to the other regions.

8. Adding markers
02:21 - 02:38
Setting the "markers" parameter equal to "True" will display a marker for each data point. The marker will vary based on the subgroup you've set using the "style" parameter.

9. Turning off line style
02:38 - 02:45
If you don't want the line styles to vary by subgroup, set the "dashes" parameter equal to "False".

10. Multiple observations per x-value
02:45 - 02:56
Line plots can also be used when you have more than one observation per x-value. This dataset has a row for each station that is taking a measurement every hour.

11. Multiple observations per x-value
02:56 - 03:02
This is the scatter plot, displaying one point per observation.

12. Multiple observations per x-value
03:02 - 03:17
This is the line plot. If a line plot is given multiple observations per x-value, it will aggregate them into a single summary measure. By default, it will display the mean.

13. Multiple observations per x-value
03:17 - 04:06
Notice that Seaborn will automatically calculate a confidence interval for the mean, displayed by the shaded region. Assuming the air collection stations were randomly placed throughout the city, this dataset is a random sample of the nitrogen dioxide levels across the whole city. This confidence interval tells us that based on our sample, we can be 95% confident that the average nitrogen dioxide level for the whole city is within this range. Confidence intervals indicate the uncertainty we have about what the true mean is for the whole city. To learn more about confidence intervals, you can check out DataCamp's statistics courses.
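Seaborn estimates this interval by bootstrap resampling: it repeatedly resamples the observations at each x-value with replacement (1,000 times by default, controlled by "n_boot") and takes the middle 95% of the resampled means. The NumPy sketch below shows that idea as a simple percentile bootstrap; it is an illustration, not Seaborn's internal code, and the readings are made up.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # n_boot resamples, each drawn with replacement and the same size as the data
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the middle `level` percent of the resampled means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high


# Made-up NO2 readings from seven hypothetical stations at a single hour
readings = [18.2, 22.5, 19.9, 25.1, 21.0, 23.4, 20.7]
low, high = bootstrap_mean_ci(readings)
print(f"mean = {np.mean(readings):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

Because the interval comes from random resampling, Seaborn's shaded band can shift slightly between runs unless a seed is fixed.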

14. Replacing confidence interval with standard deviation
04:06 - 04:33
Instead of visualizing a confidence interval, we may want to see how varied the measurements of nitrogen dioxide are across the different collection stations at a given point in time. To visualize this, set the "ci" parameter equal to the string "sd" to make the shaded area represent the standard deviation, which shows the spread of the distribution of observations at each x value.

15. Turning off confidence interval
04:33 - 04:39
We can also turn off the confidence interval by setting the "ci" parameter equal to "None".

## Visualizing a Categorical and a Quantitative Variable

Categorical variables are present in nearly every dataset, but they are especially prominent in survey data. In this chapter, you will learn how to create and customize categorical plots such as box plots, bar plots, count plots, and point plots. Along the way, you will explore survey data from young people about their interests, students about their study habits, and adult men about their feelings about masculinity.

### 1. Count plots and bar plots

Welcome to Chapter 3! In this chapter, we'll focus on visualizations that involve categorical variables. The first two plots we'll look at are count plots and bar plots.

2. Categorical plots
00:15 - 01:09
Count plots and bar plots are two types of visualizations that Seaborn calls "categorical plots". Categorical plots involve a categorical variable, which is a variable that consists of a fixed, typically small number of possible values, or categories. These types of plots are commonly used when we want to make comparisons between different groups. We began to explore categorical plots in Chapter 1 with count plots. As a reminder, a count plot displays the number of observations in each category. We saw several examples of count plots in earlier chapters, like the number of men reporting that they feel masculine. Most men surveyed here feel "somewhat" or "very" masculine.

3. catplot()
01:09 - 01:33
Just like we used "relplot()" to create different types of relational plots, in this chapter we'll be using "catplot()" to create different types of categorical plots. "catplot()" offers the same flexibility that "relplot()" does, which means it will be easy to create subplots if we need to, using the same "col" and "row" parameters.

4. countplot() vs. catplot()
01:33 - 01:47
To see how "catplot()" works, let's return to the masculinity count plot. On the left, we see how we originally created a count plot with the "countplot()" function.

5. countplot() vs. catplot()
01:47 - 02:18
To make this plot with "catplot()" instead, we change the function name to "catplot()" and use the "kind" parameter to specify what kind of categorical plot to use. In this case, we'll set kind equal to the word "count".

6. Changing the order
02:18 - 02:33
Sometimes there is a specific ordering of categories that makes sense for these plots. In this case, it makes more sense for the categories to be in order from not masculine to very masculine. To change the order of the categories, create a list of category values in the order that you want them to appear, and then use the "order" parameter. This works for all types of categorical plots, not just count plots.
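A sketch of a count plot made with "catplot()" and an explicit category order. The answers are invented and deliberately listed out of order, so the "order" list is what puts the bars from "Not at all" to "Very".

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented survey answers, in no particular order
masculinity = pd.DataFrame({
    "how_masculine": ["Very", "Somewhat", "Not very", "Very", "Somewhat",
                      "Not at all", "Somewhat", "Very", "Not very", "Somewhat"],
})

# Bars are drawn left to right in this order
category_order = ["Not at all", "Not very", "Somewhat", "Very"]

g = sns.catplot(x="how_masculine", data=masculinity, kind="count",
                order=category_order)
plt.show()
```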

7. Bar plots
02:33 - 03:18
Bar plots look similar to count plots, but instead of the count of observations in each category, they show the mean of a quantitative variable among observations in each category. This bar plot uses the tips dataset and shows the average bill paid among people who visited the restaurant on each day in the dataset (Thursday through Sunday). From this, we can see that the average bill is slightly higher on the weekends. To create this bar plot, we use "catplot()". Specify the categorical variable "day" on the x-axis, the quantitative variable "total_bill" on the y-axis, and set the "kind" parameter equal to "bar".
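A sketch of this bar plot with a made-up stand-in for the tips data. Each bar's height is the mean "total_bill" for that day, which the test below checks against a pandas groupby.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset
tips = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"] * 3,
    "total_bill": [14.2, 16.1, 20.5, 21.3,
                   17.8, 15.0, 22.9, 24.1,
                   12.6, 18.4, 19.7, 23.0],
})
day_order = ["Thur", "Fri", "Sat", "Sun"]

# Bar height = mean total_bill per day; error bars = 95% CI by default
g = sns.catplot(x="day", y="total_bill", data=tips, kind="bar", order=day_order)
plt.show()
```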

8. Confidence intervals
03:18 - 03:48
Notice also that Seaborn automatically shows 95% confidence intervals for these means. Just like with line plots, these confidence intervals show us the level of uncertainty we have about these estimates. Assuming our data is a random sample of some population, we can be 95% sure that the true population mean in each group lies within the confidence interval shown.

9. Turning off confidence intervals
03:48 - 03:58
If we want to turn off these confidence intervals, we can do this by setting the "ci" parameter equal to "None" - just like we did with line plots.

10. Changing the orientation
03:58 - 04:16
Finally, you can also change the orientation of the bars in bar plots and count plots by switching the x and y parameters. However, it is fairly common practice to put the categorical variable on the x-axis.


### 2. Creating a box plot

Hello! In this video we'll learn how to create a new type of categorical plot: the box plot.

2. What is a box plot?
00:09 - 01:13
A box plot shows the distribution of quantitative data. The colored box represents the 25th to 75th percentile, and the line in the middle of the box represents the median. The whiskers give a sense of the spread of the distribution, and the floating points represent outliers. Box plots are commonly used as a way to compare the distribution of a quantitative variable across different groups of a categorical variable. To see this, let's look at this example. The box plot shown here uses the tips dataset and compares the distribution of the total bill paid per table across the different days of the week. From this box plot we can quickly see that the median bill is higher on Saturday and Sunday, but the spread of the distribution is also larger. This comparison would be much harder to do with other types of visualizations.

3. How to create a box plot
01:13 - 01:56
Now let's look at how to create a box plot in Seaborn. While Seaborn does have a "boxplot()" function, we'll be using the "catplot()" function that we introduced in an earlier lesson because it makes it easy to create subplots using the "col" and "row" parameters. We'll put the categorical variable "time" on the x-axis and the quantitative variable "total_bill" on the y-axis. Here, we want box plots, so we'll specify kind="box". That's it! We have a nice looking box plot. Next, we'll look at different ways to customize this plot.

4. Change the order of categories
01:56 - 02:09
As a reminder, "catplot" allows you to change the order of the categories using the "order" parameter. Here, we specified that "Dinner" should be shown before "Lunch".
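
A sketch of the "order" parameter on the same kind of made-up data (random values standing in for the tips dataset):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the "tips" data (random values)
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "time": rng.choice(["Lunch", "Dinner"], size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200),
})

# "order" fixes the left-to-right order of the categories
g = sns.catplot(x="time", y="total_bill", data=tips, kind="box",
                order=["Dinner", "Lunch"])
```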

5. Omitting the outliers using `sym`
02:09 - 02:29
Occasionally, you may want to omit the outliers from your box plot. You can do this using the "sym" parameter. If you pass an empty string into "sym", it will omit the outliers from your plot altogether. "Sym" can also be used to change the appearance of the outliers instead of omitting them.
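
A hedged sketch of hiding outliers. Extra keyword arguments are passed through to matplotlib's box plot, and this example uses matplotlib's `showfliers=False` rather than the lesson's `sym=""`, on the assumption that `showfliers` is the more portable option across seaborn versions (whether `sym` is still forwarded may depend on your version). The tiny dataset is made up; 60 and 90 are deliberate outliers:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Tiny hypothetical data; 60 and 90 fall far outside each group's box
tips = pd.DataFrame({
    "time": ["Lunch"] * 6 + ["Dinner"] * 6,
    "total_bill": [10, 12, 13, 14, 15, 60, 18, 20, 22, 24, 26, 90],
})

# showfliers=False is forwarded to matplotlib and omits the outlier points
g = sns.catplot(x="time", y="total_bill", data=tips, kind="box",
                showfliers=False)
```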

6. Changing the whiskers using `whis`
02:29 - 03:48
By default, the whiskers extend up to 1 point 5 times the interquartile range, or "IQR", beyond the edges of the box; points farther out are drawn as outliers. The IQR is the distance from the 25th to the 75th percentile of a distribution of data. If you want to change the way the whiskers in your box plot are defined, you can do this using the "whis" parameter. There are several options for changing the whiskers. You can change the range of the whiskers from 1 point 5 times the IQR (which is the default) to 2 times the IQR by setting "whis" equal to 2 point 0. Alternatively, you can have the whiskers define specific lower and upper percentiles by passing in a list of the lower and upper percentiles. In this example, passing in "[5, 95]" will result in the lower whisker being drawn at the 5th percentile and the upper whisker being drawn at the 95th percentile. Finally, you may just want to draw the whiskers at the min and max values. You can do this by specifying the lower percentile as 0 and the upper percentile as 100.
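
The sketch below first works through the default 1.5 × IQR rule by hand with numpy for one made-up group, then shows the three "whis" options described above. The data are hypothetical, and the hand calculation assumes linear-interpolated percentiles (numpy's default):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Worked example of the default rule for one hypothetical group
values = np.array([10, 12, 13, 14, 15, 60])
q1, q3 = np.percentile(values, [25, 75])     # 12.25 and 14.75
iqr = q3 - q1                                # 2.5
upper_limit = q3 + 1.5 * iqr                 # 18.5
# The upper whisker stops at the largest value inside the limit; 60 is an outlier
upper_whisker = values[values <= upper_limit].max()   # 15

tips = pd.DataFrame({
    "time": ["Lunch"] * 6 + ["Dinner"] * 6,
    "total_bill": [10, 12, 13, 14, 15, 60, 18, 20, 22, 24, 26, 90],
})

# Whiskers at 2.0 x IQR instead of the default 1.5 x IQR
g_2iqr = sns.catplot(x="time", y="total_bill", data=tips, kind="box", whis=2.0)
# Whiskers at the 5th and 95th percentiles
g_pct = sns.catplot(x="time", y="total_bill", data=tips, kind="box", whis=[5, 95])
# Whiskers at the min and max, so no points are drawn as outliers
g_range = sns.catplot(x="time", y="total_bill", data=tips, kind="box", whis=[0, 100])
```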

7. Changing the whiskers using `whis`
03:48 - 03:59
Here's an example where the whiskers are set to the min and max values. Note that there are no outliers, because the box and whiskers cover the entire range of the data.

### 3. Point plots

Welcome! So far we've seen several types of categorical plots including count plots, bar plots, and box plots. In this lesson, we'll see one final categorical plot: point plots.

2. What are point plots?
00:15 - 01:06
Point plots show the mean of a quantitative variable for the observations in each category, plotted as a single point. This point plot uses the tips dataset and shows the average bill among smokers versus non-smokers. The vertical bars extending above and below the mean represent the 95% confidence interval for that mean. Just like the confidence intervals we saw in line plots and bar plots, these confidence intervals show us the level of uncertainty we have about these mean estimates. Assuming our data is a random sample of some population, we can be 95% confident that the true population mean in each group lies within the confidence interval shown.
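
Seaborn estimates these intervals by bootstrapping (1,000 resamples by default, controlled by `n_boot`). Here is a rough numpy sketch of a bootstrap percentile interval on made-up bill amounts; it illustrates the idea and is not seaborn's internal code:

```python
import numpy as np

# Hypothetical bill amounts for one group (not real data)
rng = np.random.default_rng(42)
bills = rng.gamma(shape=4.0, scale=5.0, size=200)

# Resample with replacement many times and record each resample's mean
n_boot = 1000
boot_means = np.array([
    rng.choice(bills, size=bills.size, replace=True).mean()
    for _ in range(n_boot)
])

# The middle 95% of the bootstrap means gives a percentile interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={bills.mean():.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```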

3. Point plots vs. line plots
01:06 - 01:11
You may be thinking: point plots look a lot like line plots. What's the difference?

4. Point plots vs. line plots
01:11 - 01:41
Both line plots and point plots show the mean of a quantitative variable and 95% confidence intervals for the mean. However, there is a key difference. Line plots are relational plots, so both the x- and y-axes represent quantitative variables. In a point plot, one axis (usually the x-axis) represents a categorical variable, making it a categorical plot.

5. Point plots vs. bar plots
01:41 - 02:04
You may also be thinking: point plots seem to show the same information as bar plots. For each category, both show the mean of a quantitative variable and the confidence intervals for those means. When should we use one over the other? Let's look at an example using data from the masculinity survey that we've seen in prior lessons.

6. Point plots vs. bar plots
02:04 - 02:44
This is a bar plot of the percentage of men in each surveyed age group who report thinking that it's important that others see them as masculine, with subgroups based on whether they report feeling masculine or not. This is the same information, represented as a point plot. In the point plot, it's easier to compare the heights of the subgroup points when they're stacked above each other. It's also easier to compare the slopes between categories than it is to compare the heights of the bars.

7. Creating a point plot
02:44 - 03:07
Here's the code to create the point plot we just saw. Since this is a categorical plot, we use "catplot" and set "kind" equal to "point".
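
A sketch of such a point plot, assuming a randomly generated stand-in for the masculinity survey; the column names `age`, `feel_masculine`, and `masculinity_important` are hypothetical. Because the outcome is coded 0/1, each point is the proportion answering "yes":

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical survey data (random answers, made-up column names)
rng = np.random.default_rng(1)
n = 300
survey = pd.DataFrame({
    "age": rng.choice(["18 to 34", "35 to 64", "65 and up"], size=n),
    "feel_masculine": rng.choice(["Feel masculine", "Don't feel masculine"], size=n),
    # 1 = says it's important that others see them as masculine, 0 = not
    "masculinity_important": rng.integers(0, 2, size=n),
})

# kind="point" plots the mean (here, a proportion) per category and hue level
g = sns.catplot(x="age", y="masculinity_important", data=survey,
                hue="feel_masculine", kind="point",
                order=["18 to 34", "35 to 64", "65 and up"])
```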

8. Disconnecting the points
03:07 - 03:14
Sometimes we may want to remove the lines connecting each point, perhaps because we only wish to compare within a category group and not between them. To do this, set the "join" parameter equal to False.
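
A sketch using `join=False` as in the lesson, on the same kind of made-up survey data. Seaborn 0.13 deprecated `join` in favor of `linestyle="none"`, so recent versions may print a deprecation warning; treat the exact parameter spelling as version-dependent:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical survey data (random answers, made-up column names)
rng = np.random.default_rng(1)
n = 300
survey = pd.DataFrame({
    "age": rng.choice(["18 to 34", "35 to 64", "65 and up"], size=n),
    "feel_masculine": rng.choice(["Feel masculine", "Don't feel masculine"], size=n),
    "masculinity_important": rng.integers(0, 2, size=n),
})

# join=False draws the points and intervals without connecting lines
g = sns.catplot(x="age", y="masculinity_important", data=survey,
                hue="feel_masculine", kind="point", join=False,
                order=["18 to 34", "35 to 64", "65 and up"])
```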

9. Displaying the median
03:14 - 03:33
Let's return to the point plot using the tips dataset and go over a few more ways to customize your point plots. Here is the point plot of average bill comparing smokers to non-smokers.

10. Displaying the median
03:33 - 03:56
To calculate the points and confidence intervals for the median instead of the mean, import the median function from the numpy library and set "estimator" equal to it. Why might you want to use the median instead of the mean? The median is more robust to outliers, so if your dataset has a lot of outliers, the median may be a better statistic to use.
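
A minimal sketch of `estimator=median`, assuming a randomly generated stand-in for the tips data:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from numpy import median

# Hypothetical stand-in for the "tips" data (random values)
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "smoker": rng.choice(["Yes", "No"], size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200),
})

# Points and intervals are computed for the median instead of the mean
g = sns.catplot(x="smoker", y="total_bill", data=tips, kind="point",
                estimator=median)
```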

11. Customizing the confidence intervals
03:56 - 04:17
You can also customize the way that the confidence intervals are displayed. To add “caps” to the end of the confidence intervals, set the “capsize” parameter equal to the desired width of the caps. In this case, we chose a width of 0.2.
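
A sketch of adding caps with `capsize=0.2`, again on made-up data standing in for the tips dataset:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the "tips" data (random values)
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "smoker": rng.choice(["Yes", "No"], size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200),
})

# capsize sets the width of the horizontal caps on each interval
g = sns.catplot(x="smoker", y="total_bill", data=tips, kind="point",
                capsize=0.2)
```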

12. Turning off confidence intervals
04:17 - 04:24
Finally, as we saw with line plots and bar plots, you can turn the confidence intervals off by setting the "ci" parameter equal to None.
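
In seaborn 0.12 and later, the `ci` parameter was replaced by `errorbar`, so this sketch uses `errorbar=None`; with older versions, `ci=None` as described above is the equivalent. The data are a made-up stand-in:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the "tips" data (random values)
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "smoker": rng.choice(["Yes", "No"], size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200),
})

# errorbar=None (ci=None in older seaborn) turns the intervals off
g = sns.catplot(x="smoker", y="total_bill", data=tips, kind="point",
                errorbar=None)
```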

## Customizing Seaborn Plots

In this final chapter, you will learn how to add informative plot titles and axis labels, which are among the most important parts of any data visualization! You will also learn how to customize the style of your visualizations in order to orient your audience more quickly to the key takeaways. Then, you will put everything you have learned together for the final exercises of the course!

### 1. Changing plot style and color

So far we've covered how to create a variety of different plot types. Now let's learn how to customize them.

2. Why customize?
00:09 - 00:31
By default, Seaborn plots are pleasing to look at, but there are several reasons you may want to change the appearance. Changing the style of a plot can be motivated by personal preference, but it can also help improve its readability or help orient an audience more quickly to the key takeaway.

3. Changing the figure style
00:31 - 00:54
Seaborn has five preset figure styles which change the background and axes of the plot. You can refer to them by name: "white", "dark", "whitegrid", "darkgrid", and "ticks". To set one of these as the global style for all of your plots, use the "set_style()" function.
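
A minimal sketch of setting a global style. The small DataFrame is hypothetical and only there to give the style something to draw; checking `axes.grid` shows that "whitegrid" turns the background grid on and "white" turns it off:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Global style: applies to every plot created afterwards
sns.set_style("whitegrid")
whitegrid_has_grid = plt.rcParams["axes.grid"]

# Hypothetical data, just to have something to draw
df = pd.DataFrame({"group": ["a", "b", "b", "c", "c", "c"]})
g = sns.catplot(x="group", data=df, kind="count")

# Other preset names: "white", "dark", "darkgrid", "ticks"
sns.set_style("white")
white_has_grid = plt.rcParams["axes.grid"]
```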

4. Figure style: "white"
00:54 - 01:27
This is a plot we've seen before, showing the percentage of men reporting that masculinity was important to them, stratified by their age and whether or not they feel masculine. The "white" style provides clean axes with a solid white background. If we only care about the comparisons between groups or the general trend across age groups instead of the specific values, this is a good choice.

5. Figure style: "whitegrid"
01:27 - 01:42
Changing the style to "whitegrid" will add a gray grid in the background. This is useful if you want your audience to be able to determine the specific values of the plotted points instead of making higher level observations.

6. Other styles
01:42 - 01:52
The other styles are variants on these. "ticks" is similar to "white", but adds small tick marks to the x- and y-axes.

7. Other styles
01:52 - 01:55
"dark" provides a gray background,

8. Other styles
01:55 - 02:05
and "darkgrid" provides a gray background with a white grid.

9. Changing the palette
02:05 - 02:19
You can change the color of the main elements of the plot with Seaborn's "set_palette()" function. Seaborn has many preset color palettes that you can refer to by name, or you can create your own custom palette. Let's see an example.

10. Diverging palettes
02:19 - 02:50
Seaborn has a group of preset palettes called diverging palettes that are great to use if your visualization deals with a scale where the two ends of the scale are opposites and there is a neutral midpoint. Here are some examples of diverging palettes: red/blue and purple/green. Note that if you append "_r" to the palette name, you can reverse the palette.
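
A sketch of setting and reversing a diverging palette. "RdBu" (red/blue) and "PRGn" (purple/green) are matplotlib colormap names that seaborn accepts; reading the palette back with `sns.color_palette()` confirms which colors new plots will use:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# "RdBu" is a red/blue diverging palette; "PRGn" is purple/green
sns.set_palette("RdBu")
current = sns.color_palette()      # colors that new plots will cycle through

# Appending "_r" reverses the palette
reversed_rdbu = sns.color_palette("RdBu_r")
```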

11. Example (default palette)
02:50 - 02:59
To see this in action, let's return to a count plot we've seen before of the responses of men reporting how masculine they feel.

12. Example (diverging palette)
02:59 - 03:11
Setting this plot's palette to red/blue diverging provides a clearer contrast between the men who do not feel masculine and the men who do.

13. Sequential palettes
03:11 - 03:22
Another group of palettes are called sequential palettes. These are a single color (or two colors blended) moving from light to dark values.

14. Sequential palette example
03:22 - 03:42
Sequential palettes are great for emphasizing a variable on a continuous scale. One example is this plot depicting the relationship between a car's horsepower and its miles per gallon, where points grow larger and darker when the car has more cylinders.
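
A sketch of a sequential palette applied to a numeric variable. The car data are random and hypothetical (not the real mpg dataset); mapping `cylinders` to both `size` and `hue` with the "Blues" palette makes points larger and darker as the value grows:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sequential palette: light to dark shades of one hue
blues = sns.color_palette("Blues")

# Hypothetical car data (random values, not the real mpg dataset)
rng = np.random.default_rng(0)
cylinders = rng.choice([4, 6, 8], size=60)
cars = pd.DataFrame({
    "horsepower": 40 + 20 * cylinders + rng.normal(0, 10, size=60),
    "mpg": 45 - 3 * cylinders + rng.normal(0, 2, size=60),
    "cylinders": cylinders,
})

# More cylinders -> larger, darker points
g = sns.relplot(x="horsepower", y="mpg", data=cars, kind="scatter",
                size="cylinders", hue="cylinders", palette="Blues")
```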

15. Custom palettes
03:42 - 03:47
You can also create your own custom palettes by passing in a list of color names...

16. Custom palettes
03:47 - 03:54
or a list of hex color codes.
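
A sketch of both kinds of custom palette. The specific color names and hex codes are arbitrary examples chosen for illustration:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Custom palette from matplotlib color names
custom_palette = ["red", "green", "orange", "blue", "yellow", "purple"]
sns.set_palette(custom_palette)
named = sns.color_palette().as_hex()

# Custom palette from hex codes (arbitrary pastel shades)
custom_hex = ["#FBB4AE", "#B3CDE3", "#CCEBC5", "#DECBE4", "#FED9A6", "#FFFFCC"]
sns.set_palette(custom_hex)
hexed = sns.color_palette().as_hex()
```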

17. Changing the scale
03:54 - 04:09
Finally, you can change the scale of your plot by using the "set_context()" function. The scale options from smallest to largest are "paper", "notebook", "talk", and "poster".
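
A sketch comparing the four contexts. `sns.plotting_context()` returns the parameters a context would apply, so the base font size can be read off for each scale before applying one globally:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Base font size defined by each context, from smallest to largest scale
font_sizes = {ctx: sns.plotting_context(ctx)["font.size"]
              for ctx in ["paper", "notebook", "talk", "poster"]}
print(font_sizes)

# Apply a larger scale to all subsequent plots, e.g. for a presentation
sns.set_context("talk")
```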

18. Smallest context: "paper"
04:09 - 04:14
The smallest context is "paper". Seaborn's base context, which "set_theme()" uses by default, is "notebook".

19. Larger context: "talk"
04:14 - 04:25
You'll want to choose a larger scale like "talk" for posters or presentations where the audience is further away from the plot.

### 2. Adding titles and labels: Part 1
00:00 - 00:14
Welcome! In the next two lessons, we'll go over one of the most important parts of any data visualization: plot titles and axis labels.

2. Creating informative visualizations
00:14 - 01:22
We create data visualizations to communicate information, and we can't do that effectively without a clear title and informative axis labels. To see this, let's compare two versions of the same visualization. On the left, we see box plots showing the distribution of birth rates for countries in each of 11 regions. On the right, we see the same visualization with three key modifications to make it easier to understand. A title is added, which immediately orients the audience to what they're looking at. The axis labels are more informative, making it clearer that birth rate is measured per one thousand people and birth rates are measured per country in each region. Finally, the x-axis tick labels are rotated to make it clear what each region is called. Let's learn how to make these changes.

3. FacetGrid vs. AxesSubplot objects
01:22 - 01:57
Before we go into the details of adding a title, we need to understand an underlying mechanism in Seaborn. Seaborn's plot functions create two different types of objects: FacetGrids and AxesSubplots. To figure out which type of object you're working with, first assign the plot output to a variable. In the documentation, the variable is often named "g", so we'll do that here as well. Then call "type" on "g" to return the object type. This scatter plot is an AxesSubplot.
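
A sketch of checking the object type, on made-up data. In recent matplotlib releases the printed class name is `Axes` rather than `AxesSubplot` (the separate subplot class was folded into `Axes`), so the check uses `isinstance` against `matplotlib.axes.Axes`, which holds either way:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.axes

# Hypothetical bills and tips (random values)
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.gamma(4.0, 5.0, size=50),
    "tip": rng.gamma(2.0, 1.5, size=50),
})

# A single-plot function returns a matplotlib Axes object
g = sns.scatterplot(x="total_bill", y="tip", data=tips)
print(type(g))

# A function that supports subplots returns a seaborn FacetGrid
g2 = sns.relplot(x="total_bill", y="tip", data=tips, kind="scatter")
print(type(g2))
```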

4. An Empty FacetGrid
01:57 - 02:06
A FacetGrid consists of one or more AxesSubplots, which is how it supports subplots.

5. FacetGrid vs. AxesSubplot objects
02:06 - 02:28
Recall that "relplot()" and "catplot()" both support making subplots. This means that they are creating FacetGrid objects. In contrast, single-type plot functions like "scatterplot()" and "countplot()" return a single AxesSubplot object.

6. Adding a title to FacetGrid
02:28 - 03:07
Let's return to our messy plot from the beginning. Recall that "catplot()" enables subplots, so it returns a FacetGrid object. To add a title to a FacetGrid object, first assign the plot to the variable "g". After you assign the plot to "g", you can set the title using "g dot fig dot suptitle". This tells Seaborn you want to set a title for the figure as a whole.

7. Adjusting height of title in FacetGrid
03:07 - 03:20
Note that by default, the figure title might be a little low. To adjust the height of the title, you can use the "y" parameter. The default value is 0 point 98 (in figure coordinates), so setting it to 1 point 03 will place the title a little higher than the default.
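
A sketch of a figure-level title on a FacetGrid. The birth-rate values and region names are made up for illustration. Recent seaborn versions also expose the figure as `g.figure`; `g.fig` follows the lesson:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical birth rates per country by region (made-up numbers)
gdp_data = pd.DataFrame({
    "Region": ["ASIA", "ASIA", "ASIA", "EUROPE", "EUROPE", "EUROPE"],
    "Birthrate": [18.2, 22.5, 15.1, 10.3, 9.8, 11.6],
})

g = sns.catplot(x="Region", y="Birthrate", data=gdp_data, kind="box")

# Figure-level title; y=1.03 lifts it above the default position (0.98)
g.fig.suptitle("Birth rate by region", y=1.03)
```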

### 3. Adding titles and labels: Part 2
00:00 - 00:09
Hello! In this lesson, we'll continue learning how to customize plot titles and axis labels.

2. Adding a title to AxesSubplot
00:09 - 00:38
In the last lesson, we learned how to add a title to a FacetGrid object using "g dot fig dot suptitle". To add a title to an AxesSubplot object, like the one returned by the "boxplot()" function, assign the plot to a variable and use "g dot set_title". You can also use the "y" parameter here to adjust the height of the title.
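
A sketch of titling a single Axes, using made-up birth-rate data. The variable is named `g` to follow the lesson, although here it holds an Axes object:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical birth rates per country by region (made-up numbers)
gdp_data = pd.DataFrame({
    "Region": ["ASIA", "ASIA", "ASIA", "EUROPE", "EUROPE", "EUROPE"],
    "Birthrate": [18.2, 22.5, 15.1, 10.3, 9.8, 11.6],
})

# boxplot() returns a single Axes, so the title is set on the Axes itself
g = sns.boxplot(x="Region", y="Birthrate", data=gdp_data)
g.set_title("Birth rate by region", y=1.03)
```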

3. Titles for subplots
00:38 - 00:57
Now let's look at what happens if the figure has subplots. Let's say we've divided countries into two groups - group one and group two - and we've set "col" equal to "Group" to create a subplot for each group.

4. Titles for subplots
00:57 - 01:08
Since g is a FacetGrid object, using "g dot fig dot suptitle" will add a title to the figure as a whole.

5. Titles for subplots
01:08 - 01:34
To alter the subplot titles, use "g dot set_titles" to set the titles for each AxesSubplot. To include each subplot's column value in its title, write "col_name" in braces, as in "{col_name}". Here, we've created subplot titles that display as "this is group 2" and "this is group 1".
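
A sketch combining a figure title with per-subplot titles. The `Group` column and all values are hypothetical; `{col_name}` is replaced by each subplot's value of the `col` variable:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical birth rates with a made-up "Group" column for the subplots
gdp_data = pd.DataFrame({
    "Region": ["ASIA", "ASIA", "EUROPE", "EUROPE",
               "AFRICA", "AFRICA", "OCEANIA", "OCEANIA"],
    "Group": ["Group 1"] * 4 + ["Group 2"] * 4,
    "Birthrate": [18.2, 22.5, 10.3, 9.8, 35.4, 30.1, 17.0, 14.9],
})

g = sns.catplot(x="Region", y="Birthrate", data=gdp_data, kind="box",
                col="Group")

# Title for the whole figure, then a title template for each subplot
g.fig.suptitle("Birth rate by region", y=1.03)
g.set_titles("This is {col_name}")
```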

6. Adding axis labels
01:34 - 01:58
To add axis labels, assign the plot to a variable and then call its "set" method. Set the parameters "xlabel" and "ylabel" to the desired x-axis and y-axis labels, respectively. This works with both FacetGrid and AxesSubplot objects.
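
A sketch showing `set(xlabel=..., ylabel=...)` on both object types, with made-up data. On a FacetGrid, `set` applies the labels to every subplot:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical birth rates per country by region (made-up numbers)
gdp_data = pd.DataFrame({
    "Region": ["ASIA", "ASIA", "ASIA", "EUROPE", "EUROPE", "EUROPE"],
    "Birthrate": [18.2, 22.5, 15.1, 10.3, 9.8, 11.6],
})

# FacetGrid: set() applies the labels to each subplot
g = sns.catplot(x="Region", y="Birthrate", data=gdp_data, kind="box")
g.set(xlabel="Region", ylabel="Birth rate (per 1,000 people)")

# Axes: the same set() call works on a single plot
plt.figure()  # start a new figure for the Axes-level example
ax = sns.boxplot(x="Region", y="Birthrate", data=gdp_data)
ax.set(xlabel="Region", ylabel="Birth rate (per 1,000 people)")
```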

7. Rotating x-axis tick labels
01:58 - 02:35
Sometimes, like in the example we've seen in this lesson, your tick labels may overlap, making it hard to interpret the plot. One way to address this is by rotating the tick labels. To do this, we don't call a function on the plot object itself. Instead, after we create the plot, we call the matplotlib function "plt dot xticks" and set "rotation" equal to 90 degrees. This works with both FacetGrid and AxesSubplot objects.
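
A sketch of rotating tick labels, on made-up data. `plt.xticks` acts on the current axes, which for a single-plot grid is its only subplot. With several subplots, one option (an alternative to the lesson's approach) is to loop over the grid's axes and call matplotlib's `tick_params` with `labelrotation`:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical birth rates with a made-up "Group" column
gdp_data = pd.DataFrame({
    "Region": ["ASIA", "ASIA", "EUROPE", "EUROPE",
               "AFRICA", "AFRICA", "OCEANIA", "OCEANIA"],
    "Group": ["Group 1"] * 4 + ["Group 2"] * 4,
    "Birthrate": [18.2, 22.5, 10.3, 9.8, 35.4, 30.1, 17.0, 14.9],
})

# Single plot: rotate the x tick labels of the current axes
g = sns.catplot(x="Region", y="Birthrate", data=gdp_data, kind="box")
plt.xticks(rotation=90)

# Several subplots: rotate the labels on every axes in the grid
g2 = sns.catplot(x="Region", y="Birthrate", data=gdp_data, kind="box",
                 col="Group")
for ax in g2.axes.flat:
    ax.tick_params(axis="x", labelrotation=90)
```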

### 4. Putting it all together
00:00 - 00:18
In this course, we've learned a great deal about how to create effective data visualizations in Seaborn. In this lesson, we'll review what we've learned and connect the pieces together to form a cohesive picture of how to use Seaborn for future projects.

2. Getting started
00:18 - 00:50
The first thing to recall is simply how to import Seaborn and its related library, Matplotlib. To do this, write "import seaborn as sns" and "import matplotlib dot pyplot as plt". Recall also that at the end of your data visualization code, you'll call "plt dot show" to show the visualization.

3. Relational plots
00:50 - 01:25
After you've imported the appropriate libraries, the next thing to do is to choose what type of plot you want to create. Relational plots are plots that show the relationship between two quantitative variables. Examples of relational plots that we've seen in this course are scatter plots and line plots. You can create a relational plot using "relplot()" and providing it with the x-axis variable name, y-axis variable name, the tidy pandas DataFrame, and the type of plot (either scatter or line).

4. Categorical plots
01:25 - 02:08
Categorical plots are another type of plot. These describe the distribution of a quantitative variable within categories given by a categorical variable. Examples of categorical plots we've seen are bar plots, count plots, box plots, and point plots. You can create a categorical plot using "catplot()" and providing it with the x-axis variable name, y-axis variable name (if applicable), the tidy pandas DataFrame, and the type of plot (either bar, count, box, or point).

5. Adding a third variable (hue)
02:08 - 02:25
If we want to add a third dimension to our plots, we can do this in one of two ways. Setting the "hue" parameter to a variable name will create a single plot but will show subgroups that are different colors based on that variable's values.

6. Adding a third variable (row/col)
02:25 - 02:39
Alternatively, you can use the "col" and "row" parameters of "relplot()" and "catplot()" to graph each subgroup on a separate subplot in the figure.

7. Customization
02:39 - 03:00
Once you have the basic plot created, you might want to customize the plot's appearance to improve its readability. You can change the background of the plot using "set_style", the color of the main elements using "set_palette", and the scale of the plot using "set_context".

8. Adding a title
03:00 - 03:16
Finally, every plot should be given an informative title and axis labels. Recall the two types of plot objects - FacetGrids and AxesSubplots - and the way to add a title to each of them.

9. Final touches
03:16 - 03:34
Also recall how to use the "set" function with the "xlabel" and "ylabel" parameters to provide custom x- and y-axis labels, and how to use "plt.xticks" with the "rotation" parameter to rotate the x-tick labels.
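
The steps above can be combined into one short script. This is a sketch on randomly generated, hypothetical survey data with made-up column names, not the course's exercise solution:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical survey-style data (random values, made-up column names)
rng = np.random.default_rng(2)
n = 300
survey = pd.DataFrame({
    "age": rng.choice(["18 to 34", "35 to 64", "65 and up"], size=n),
    "feel_masculine": rng.choice(["Yes", "No"], size=n),
    "masculinity_important": rng.integers(0, 2, size=n),
})

# 1. Style, palette, and scale
sns.set_style("whitegrid")
sns.set_palette("RdBu")
sns.set_context("notebook")

# 2. Categorical plot with a third variable mapped to hue
g = sns.catplot(x="age", y="masculinity_important", data=survey,
                hue="feel_masculine", kind="point",
                order=["18 to 34", "35 to 64", "65 and up"])

# 3. FacetGrid, so the title goes on the figure; then axis labels
g.fig.suptitle("Share who say masculinity is important, by age", y=1.03)
g.set(xlabel="Age group", ylabel="Proportion answering yes")

# 4. Rotate the x tick labels of the current axes
plt.xticks(rotation=90)

# In a script, finish with plt.show() to display the figure
```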

10. Let's practice!
03:34 - 03:47
And that's it! You're now equipped to make impressive and effective data visualizations with Seaborn. Let's practice putting all of these steps together in the final exercises of this course.