So far, we've coded our graphs using Matplotlib. In this lesson, we'll introduce Seaborn — a Python data visualization library that builds on Matplotlib.

The graphs we've built showed at most two variables (columns):

- Time versus number of coronavirus cases
- Temperature versus bikes rented
- Slowness in traffic versus lack of electricity

Seaborn enables us to easily show more than two variables on a graph. Below, we see a graph with five variables (we'll introduce the data and explain the graph later in this lesson).

![](screen1_1.png)

Although the graph shows five variables, we generated it with a single line of code. Behind the scenes, however, Seaborn used many lines of Matplotlib code to build the graph.

Throughout the lesson, we'll use a dataset about house characteristics and sale prices. The houses were sold between 2006 and 2010 in Ames, Iowa. The first five rows appear in the `housing.head()` output further below.

Professor Dean DeCock collected the data and described the collection process in a [paper](https://www.tandfonline.com/doi/abs/10.1080/10691898.2011.11889627) published in the Journal of Statistics Education. The documentation for the dataset is available at [this link](https://s3.amazonaws.com/dq-content/307/data_description.txt).

We slightly modified the dataset for teaching purposes. You can download the original dataset from [here](https://s3.amazonaws.com/dq-content/305/AmesHousing.txt) and the modified version from our interface.

Let's read in the data.

### Exercise:

1. Read the `housing.csv` file into a pandas `DataFrame`. Assign the `DataFrame` to a variable named `housing`.
2. Inspect the first and the last five rows.
3. Use the `DataFrame.info()` method to learn some basic facts about the dataset:
   - What is the number of rows and columns?
   - Are there null values?
   - What is the data type of each column?


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
housing = pd.read_csv('housing.csv')

In [3]:
housing.head()

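| | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice | Year | Rooms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 526301100 | 20 | RL | 141.0 | 31770 | Pave | NaN | IR1 | Lvl | ... | NaN | NaN | 0 | 5 | 2010 | WD | Normal | 215000 | 1999 or older | 7 rooms or more |
| 1 | 2 | 526350040 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | ... | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal | 105000 | 1999 or older | 6 rooms or less |
| 2 | 3 | 526351010 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | ... | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 | 1999 or older | 6 rooms or less |
| 3 | 4 | 526353030 | 20 | RL | 93.0 | 11160 | Pave | NaN | Reg | Lvl | ... | NaN | NaN | 0 | 4 | 2010 | WD | Normal | 244000 | 1999 or older | 7 rooms or more |
| 4 | 5 | 527105010 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | ... | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal | 189900 | 1999 or older | 6 rooms or less |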


In [4]:
housing.tail()

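| | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice | Year | Rooms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2925 | 2926 | 923275080 | 80 | RL | 37.0 | 7937 | Pave | NaN | IR1 | Lvl | ... | GdPrv | NaN | 0 | 3 | 2006 | WD | Normal | 142500 | 1999 or older | 6 rooms or less |
| 2926 | 2927 | 923276100 | 20 | RL | NaN | 8885 | Pave | NaN | IR1 | Low | ... | MnPrv | NaN | 0 | 6 | 2006 | WD | Normal | 131000 | 1999 or older | 6 rooms or less |
| 2927 | 2928 | 923400125 | 85 | RL | 62.0 | 10441 | Pave | NaN | Reg | Lvl | ... | MnPrv | Shed | 700 | 7 | 2006 | WD | Normal | 132000 | 1999 or older | 6 rooms or less |
| 2928 | 2929 | 924100070 | 20 | RL | 77.0 | 10010 | Pave | NaN | Reg | Lvl | ... | NaN | NaN | 0 | 4 | 2006 | WD | Normal | 170000 | 1999 or older | 6 rooms or less |
| 2929 | 2930 | 924151050 | 60 | RL | 74.0 | 9627 | Pave | NaN | Reg | Lvl | ... | NaN | NaN | 0 | 11 | 2006 | WD | Normal | 188000 | 1999 or older | 7 rooms or more |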


In [5]:
housing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2930 entries, 0 to 2929
Data columns (total 84 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Order            2930 non-null   int64  
 1   PID              2930 non-null   int64  
 2   MS SubClass      2930 non-null   int64  
 3   MS Zoning        2930 non-null   object 
 4   Lot Frontage     2440 non-null   float64
 5   Lot Area         2930 non-null   int64  
 6   Street           2930 non-null   object 
 7   Alley            198 non-null    object 
 8   Lot Shape        2930 non-null   object 
 9   Land Contour     2930 non-null   object 
 10  Utilities        2930 non-null   object 
 11  Lot Config       2930 non-null   object 
 12  Land Slope       2930 non-null   object 
 13  Neighborhood     2930 non-null   object 
 14  Condition 1      2930 non-null   object 
 15  Condition 2      2930 non-null   object 
 16  Bldg Type        2930 non-null   object 
 17  House Style   
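
The three questions in the exercise can also be answered directly, without reading through the full `info()` output. The sketch below is one possible approach, not the lesson's official solution. It uses a small stand-in DataFrame (`sample`, a hypothetical name) built from a few values visible in the `head()` and `tail()` output above; in the lesson, the same calls would be run on `housing`.

```python
import numpy as np
import pandas as pd

# Stand-in for `housing`: three columns and four rows copied from the
# head()/tail() output above (index labels 0, 1, 2 and 2926).
sample = pd.DataFrame(
    {
        'MS Zoning': ['RL', 'RH', 'RL', 'RL'],
        'Lot Frontage': [141.0, 80.0, 81.0, np.nan],
        'SalePrice': [215000, 105000, 172000, 131000],
    },
    index=[0, 1, 2, 2926],
)

n_rows, n_cols = sample.shape             # number of rows and columns
null_counts = sample.isnull().sum()       # null values per column
has_nulls = sample.isnull().values.any()  # True if any value is null
column_types = sample.dtypes              # data type of each column

print(n_rows, n_cols)
print(null_counts)
print(column_types)
```

On the full dataset, the `info()` output above already hints at the answers: 2930 rows, 84 columns, and columns such as `Lot Frontage` and `Alley` with fewer than 2930 non-null values.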

On the previous screen, we introduced a graph that shows five variables (columns):

- `SalePrice`: price of the sale in USD
- `Gr Liv Area`: above grade (ground) living area in square feet
- `Overall Qual`: quality ratings of the overall material and finish of the house
- `Garage Area`: garage area in square feet
- `Rooms`: number of rooms

![](screen1_1.png)
    
We'll start by plotting only two of these variables: `SalePrice` and `Gr Liv Area`. In the code below, we do the following:

- Import `seaborn` as `sns`, the standard alias.
- Import `matplotlib.pyplot` as `plt`, the standard alias.
- Call the `sns.relplot()` function to generate the plot.
    - We pass the `housing` DataFrame to the `data` parameter.
    - We pass the column names as strings to the `x` and `y` parameters.
    - By default, the `sns.relplot()` function generates a scatter plot.
- Call `plt.show()` to display the graph (this works because Seaborn uses Matplotlib code behind the scenes, just like pandas).
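
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the lesson's exact code: `housing_sample` is a hypothetical stand-in for the `housing` DataFrame, its `SalePrice` values are taken from the `head()` output shown earlier, and its `Gr Liv Area` values are placeholders chosen only so that the example runs on its own.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for `housing`. The SalePrice values come from the
# head() output earlier in the lesson; the Gr Liv Area values are
# illustrative placeholders, not values read from the dataset.
housing_sample = pd.DataFrame({
    'Gr Liv Area': [1656, 896, 1329, 2110, 1629],
    'SalePrice': [215000, 105000, 172000, 244000, 189900],
})

# relplot() draws a scatter plot by default and returns a FacetGrid.
grid = sns.relplot(data=housing_sample, x='Gr Liv Area', y='SalePrice')

# Seaborn draws with Matplotlib, so plt.show() displays the figure.
plt.show()
```

With the real data, the call would pass `data=housing` instead of the stand-in DataFrame.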
