# Basic EDA on Lagos House Prices Data Set

The dataset contains 5,336 records of rental properties across 7 districts in Lagos. The data was sourced from the website of a Nigerian real estate company.

Follow the steps below to explore the data.

In [2]:
#import pandas
import pandas as pd

Load the dataset and save it in a DataFrame called `lagos_houses`. Your code will look like this:

```
lagos_houses = pd.read_csv('path_to_file\\filename.csv')
```

If your file is in the same folder as the notebook you are working in, your code will simply look like this:

```
lagos_houses = pd.read_csv('filename.csv')
```
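The doubled backslash in the first example and the `r'...'` prefix in the next cell are both ways of dealing with Windows path separators. Another option is Python's `pathlib`, sketched below. To keep it runnable anywhere, it writes a small hypothetical CSV into a temporary folder; the file name and contents are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical CSV content using the column names that appear later in this notebook
csv_text = (
    "location,Property_Type,price\n"
    "yaba,flat,1200000\n"
    "ajah,flat,900000\n"
    "ikeja,duplex,4000000\n"
)

with tempfile.TemporaryDirectory() as folder:
    # The / operator joins folder and file name with the correct separator for your OS,
    # so there are no backslashes to escape
    csv_path = Path(folder) / 'filename.csv'
    csv_path.write_text(csv_text)
    sample = pd.read_csv(csv_path)

print(sample.shape)
```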

In [4]:
#load data file
lagos_houses = pd.read_csv(r'lagos_house_prices_raw.csv')

Use the `head()` method to view the first few rows of the data in `lagos_houses`.

In [None]:
#use the head() method to view the first few rows of the data in lagos_houses
lagos_houses.head()

Use the `tail()` method to view the last few rows of the data in `lagos_houses`.

In [None]:
#your code here
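Both `head()` and `tail()` take an optional number of rows, which defaults to 5. The sketch below uses a small hypothetical sample rather than `lagos_houses`, so you still get to write the exercise cell yourself.

```python
import pandas as pd

# Hypothetical sample standing in for lagos_houses
sample = pd.DataFrame({
    'location': ['yaba', 'yaba', 'ajah', 'ajah', 'ikeja', 'surulere', 'gbagada'],
    'price': [1200000, 3500000, 900000, 1100000, 4000000, 1500000, 2000000],
})

first_rows = sample.head()   # first 5 rows by default
last_two = sample.tail(2)    # pass a number to choose how many rows to show
print(last_two)
```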


Check whether there is any missing data in the dataset. Your code will look like this:

`dataframe.isna().sum()`

In [None]:
#your code here
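`isna()` returns a DataFrame of `True`/`False` values, and `sum()` adds them up column by column (each `True` counts as 1). A minimal sketch on a hypothetical sample with deliberate gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with some values deliberately missing
sample = pd.DataFrame({
    'location': ['yaba', None, 'ajah', 'ikeja'],
    'price': [1200000, 900000, np.nan, np.nan],
})

missing_per_column = sample.isna().sum()            # missing values in each column
rows_with_gaps = sample.isna().any(axis=1).sum()    # rows with at least one missing value
print(missing_per_column)
print(rows_with_gaps)
```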


Use the `info()` method to view the data type and the non-null count of each column.

In [None]:
#Use the info() method to view data types and non-null counts
lagos_houses.info()
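`info()` prints its summary rather than returning it. If you want the same numbers as values you can work with, a sketch like the one below (again on a hypothetical sample) computes them directly. Notice that a whole-number column containing a missing value is stored as `float64`, which is worth keeping in mind when converting prices with `astype(int)` later on.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing price
sample = pd.DataFrame({
    'location': ['yaba', 'ajah', 'ikeja'],
    'price': [1200000, np.nan, 4000000],
})

non_null = sample.notna().sum()     # the "Non-Null Count" column of info()
null = len(sample) - non_null       # missing values per column
print(sample.dtypes)                # the "Dtype" column of info()
```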

How many listings are there per location?

On the `location` column, use the `value_counts()` method to count the listings per location.

Your code will look like this:
```
dataframe.column.value_counts()
```

Your result will look like this:

![image.png](attachment:image.png)

> **Make sure to edit the code to refer to your own variable names and column names**

In [None]:
#how many listings per location
#your code here
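By default, `value_counts()` sorts the counts from most to fewest, and `normalize=True` turns them into proportions. A small sketch on a hypothetical `location` column:

```python
import pandas as pd

# Hypothetical location column
sample = pd.DataFrame({'location': ['yaba', 'yaba', 'ajah', 'ikeja', 'yaba', 'ajah']})

counts = sample.location.value_counts()                  # listings per location, largest first
shares = sample.location.value_counts(normalize=True)    # fraction of all listings
print(counts)
print(shares.round(2))
```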


Use a bar plot to represent the count of listings per location.

Add `.plot.bar()` after your `value_counts()` code. Your code will look like this:

```
dataframe.column.value_counts().plot.bar()
```

Your bar chart will look like this:

![image.png](attachment:image.png)

In [7]:
#plot location counts
#your code here:
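`plot.bar()` returns a matplotlib `Axes` object, so you can keep customising the chart after it is drawn, for example by adding axis labels. A sketch on a hypothetical column; the `matplotlib.use('Agg')` line only lets it run outside a notebook and should be left out in Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; leave this out in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical location column
sample = pd.DataFrame({'location': ['yaba', 'yaba', 'ajah', 'ikeja', 'yaba', 'ajah']})

ax = sample.location.value_counts().plot.bar()   # one bar per location
ax.set_xlabel('location')
ax.set_ylabel('number of listings')
plt.tight_layout()
```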


We can create summary statistics per location. At a glance, we can see the:

* count
* mean
* standard deviation
* minimum value
* 25th percentile
* median (50th percentile)
* 75th percentile
* maximum value

The `describe()` method computes these summary statistics; combined with `groupby('location')`, it produces one row per location. Run the code cell below to see it.

In [None]:
#summary statistics per location
lagos_houses.groupby('location').price.describe()

Are your numbers showing in scientific notation and you don't like it? :) 

Add `.astype(int)` after `describe()` to show the numbers as whole integers written out in full instead.

Rerun the cell after you've done this and check the difference.
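One caveat: `describe()` reports the standard deviation of a location with a single listing as `NaN`, and `astype(int)` raises an error on `NaN`. With 5,336 listings this may never happen in `lagos_houses`, but if it does, changing the display format leaves the numbers untouched instead. A sketch on a hypothetical sample where ikeja has only one listing:

```python
import pandas as pd

# Hypothetical sample; ikeja deliberately has a single listing
sample = pd.DataFrame({
    'location': ['yaba', 'yaba', 'ajah', 'ajah', 'ikeja'],
    'price': [1200000, 3500000, 900000, 1100000, 4000000],
})

stats = sample.groupby('location').price.describe()

try:
    stats.astype(int)
    converted = True
except ValueError:   # ikeja's std is NaN, which cannot be stored as an integer
    converted = False

# Alternative: keep the floats and only change how they are printed
pd.set_option('display.float_format', '{:,.0f}'.format)
print(stats)
pd.reset_option('display.float_format')
```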

Now, let's create a few pretty visuals with the **seaborn** and **matplotlib** libraries.

In [None]:
#import both visualization packages
import seaborn as sns
import matplotlib.pyplot as plt


Let's create a boxplot. Remember to use the documentation to view the parameters you can pass to any function you call.

For instance, if you press **SHIFT+TAB** with your cursor inside the parentheses of `sns.boxplot()`, the documentation pops up and looks something like this:

![image.png](attachment:image.png)

Notice the parameters `x`, `y` and `data`.

* Assign your DataFrame `lagos_houses` to the `data` parameter
* Assign `'location'` to the `x` parameter
* Assign `'price'` to the `y` parameter

Your code should look like this:

`sns.boxplot(x=column_for_x, y=column_for_y, data=dataframe_name)`

In [None]:
#view distribution of prices by location
#your code here


plt.show()
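To read a boxplot, it helps to know what it draws: the box spans the 25th to 75th percentile (the interquartile range, IQR) with a line at the median, and by default seaborn extends the whiskers to the furthest points within 1.5 × IQR of the box, plotting anything beyond that as an individual outlier point. The sketch below computes those numbers with pandas for one hypothetical set of prices.

```python
import pandas as pd

# Hypothetical prices for a single location, including one unusually expensive listing
prices = pd.Series([800000, 1000000, 1200000, 1300000, 1500000, 9000000])

q1, median, q3 = prices.quantile([0.25, 0.5, 0.75])   # box edges and middle line
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr    # whiskers cannot reach past these limits
upper_limit = q3 + 1.5 * iqr
outliers = prices[(prices < lower_limit) | (prices > upper_limit)]
print(median, iqr, list(outliers))
```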

* What location has the most expensive houses?
* What location has the least expensive houses?

In [None]:
#Your answers here

#1. 
#2.
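If you would like to check what you read off the chart, one option is to compare the median price of each location; medians are less affected than means by a handful of very expensive listings. A sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical sample standing in for lagos_houses
sample = pd.DataFrame({
    'location': ['yaba', 'yaba', 'ajah', 'ajah', 'ikeja'],
    'price': [1200000, 3500000, 900000, 1100000, 4000000],
})

median_price = sample.groupby('location').price.median().sort_values()
most_expensive = median_price.idxmax()    # location with the highest median price
least_expensive = median_price.idxmin()   # location with the lowest median price
print(median_price)
```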

In [None]:
#what can we learn from a pairplot of the numeric columns?
sns.pairplot(lagos_houses)
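`sns.pairplot()` draws a scatter plot for every pair of numeric columns (with each column's distribution on the diagonal), so text columns such as `location` are left out. A numeric companion is the correlation matrix of the same columns. The `bedrooms` column below is hypothetical; check `lagos_houses.info()` for the numeric columns your data actually has.

```python
import pandas as pd

# Hypothetical sample; 'bedrooms' is an assumed numeric column for illustration
sample = pd.DataFrame({
    'location': ['yaba', 'ajah', 'ikeja', 'ikorodu'],
    'bedrooms': [1, 2, 3, 4],
    'price': [1000000, 1800000, 3100000, 3900000],
})

numeric = sample.select_dtypes(include='number')   # the columns pairplot would use
correlations = numeric.corr()                      # Pearson correlation for each pair
print(correlations.round(2))
```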

### Let's make it a bit more complex

The cells below contain more complex visualization code. For learning purposes, I want you to see a few more possibilities and notice the kinds of questions these charts can help you answer.

In [None]:
#create a pivot comparing prices of different property types at various locations
#use the unstack() method after a groupby()
pivot = lagos_houses.groupby(['location','Property_Type']).price.mean().astype(int).unstack('Property_Type')
#display the pivot
pivot
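The same table can also be built in a single call with `pivot_table()`. The sketch below compares both approaches on a hypothetical sample. If a location has no listings of some property type (here, duplexes in ajah), `unstack()` fills the gap with `NaN`, which turns that column back into floats even though `.astype(int)` ran earlier in the cell above.

```python
import pandas as pd

# Hypothetical sample; ajah deliberately has no duplex listings
sample = pd.DataFrame({
    'location': ['yaba', 'yaba', 'ajah', 'ajah', 'ikeja'],
    'Property_Type': ['flat', 'duplex', 'flat', 'flat', 'duplex'],
    'price': [1200000, 3500000, 900000, 1100000, 4000000],
})

# groupby + unstack, as in the cell above (without astype)
via_groupby = sample.groupby(['location', 'Property_Type']).price.mean().unstack('Property_Type')

# the same table in one call
via_pivot_table = sample.pivot_table(values='price', index='location',
                                     columns='Property_Type', aggfunc='mean')
print(via_pivot_table)
```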

In [None]:
#plot the pivot table

pivot.plot.bar(figsize=(15,5)) #the figsize parameter sets the width and height of the chart, in inches
plt.ticklabel_format(style='plain',axis='y') #write out the prices on y axis in full not scientific notation

plt.show()
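Since `figsize` is given as `(width, height)` in inches, `(15,5)` makes a wide, short chart. A sketch that confirms this on a hypothetical pivot; as before, `matplotlib.use('Agg')` is only for running it outside Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; leave this out in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivot: mean price per property type (columns) in each location (rows)
pivot = pd.DataFrame({'duplex': [2500000, 4000000, 3500000],
                      'flat': [1000000, 1300000, 1200000]},
                     index=['ajah', 'ikeja', 'yaba'])

ax = pivot.plot.bar(figsize=(15, 5))          # one group of bars per location
ax.ticklabel_format(style='plain', axis='y')  # full numbers on the y axis
width, height = ax.figure.get_size_inches()
print(width, height)
```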

In [None]:
locations = ['yaba', 'ajah', 'surulere', 'gbagada', 'ikeja', 'lekki phase 1', 'ikorodu']
fig, ax = plt.subplots(len(locations), figsize=(8,20)) #set up one plot per location on the same canvas

for i, location in enumerate(locations):
    #distribution of prices for each property type in this location
    sns.boxplot(y='Property_Type', x='price', data=lagos_houses[lagos_houses.location==location], ax=ax[i])
    ax[i].ticklabel_format(style='plain', axis='x') #write out numbers in full instead of scientific notation
    ax[i].set_title(location.title()) #e.g. 'lekki phase 1' becomes 'Lekki Phase 1'

plt.tight_layout() #prevent overlapping charts
plt.show()
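The cell above lists the seven locations by hand, so a typo in one name would give that panel no data. One alternative is to loop over `groupby('location')`, which takes the location names from the data itself. The sketch below uses a hypothetical sample and, to stay runnable with just pandas and matplotlib, swaps the boxplots for horizontal bars of the median price per property type.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; leave this out in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for lagos_houses
sample = pd.DataFrame({
    'location': ['yaba', 'yaba', 'ajah', 'ajah', 'ikeja'],
    'Property_Type': ['flat', 'duplex', 'flat', 'flat', 'duplex'],
    'price': [1200000, 3500000, 900000, 1100000, 4000000],
})

groups = sample.groupby('location')   # one group per location found in the data
fig, axes = plt.subplots(groups.ngroups, figsize=(8, 3 * groups.ngroups), squeeze=False)

for ax, (location, group) in zip(axes[:, 0], groups):
    # median price per property type in this location, as horizontal bars
    group.groupby('Property_Type').price.median().plot.barh(ax=ax)
    ax.ticklabel_format(style='plain', axis='x')   # full numbers instead of scientific notation
    ax.set_title(location.title())

plt.tight_layout()
```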

### Can you keep going?