# Exploratory Data Analysis on Housing Dataset

## Congratulations!!

<img src="https://curriculum-content.s3.amazonaws.com/data-science/images/awesome.gif" alt='image of a man motioning in celebration of your progress'>

## Introduction

In this lesson, you will apply what you have learned in a short project. You will explore the provided dataset and gather insights to share in a short presentation. Some starter code is provided, but please explore other features and look for more meaningful insights.

The ultimate purpose of exploratory analysis is not just to learn about the data, but to help an organization perform better. Explicitly relate your findings to business needs by recommending actions that you think an investor should take if they were shopping for property in this area.

## Objectives
* Perform basic data analysis
* Communicate analysis results

## Dataset Overview

The dataset provides information about various properties, including price, number of bedrooms and bathrooms, square footage, location, and more. The data is in this repository, in the file `housing_data.csv`. Below is a description of each column (feature):

* id - Unique identifier for a house
* date - Date the house was sold
* price - Price (the prediction target)
* bedrooms - Number of bedrooms per house
* bathrooms - Number of bathrooms/bedrooms
* sqft_living - Square footage of the home
* sqft_lot - Square footage of the lot
* floors - Total floors (levels) in the house
* waterfront - Whether the house has a view to a waterfront
* view - Has been viewed
* condition - How good the overall condition is
* grade - Overall grade given to the housing unit, based on the King County grading system
* sqft_above - Square footage of the house apart from the basement
* sqft_basement - Square footage of the basement
* yr_built - Year built
* yr_renovated - Year the house was renovated
* zipcode - Zip code
* lat - Latitude coordinate
* long - Longitude coordinate
* sqft_living15 - Square footage of interior living space for the nearest 15 neighbors
* sqft_lot15 - Square footage of the land lots of the nearest 15 neighbors

## Getting Started
Begin by loading the dataset and taking a quick look at its structure.

```python
import io

import pandas as pd

# Load the raw CSV text (mode 'r' reads the file; mode 'w' would overwrite it)
with open('housing_data.csv', 'r') as f:
    data = f.read()

# Parse the raw text into a pandas DataFrame
df = pd.read_csv(io.StringIO(data))

# Display the first few rows of the dataset
df.head()
```

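`head()` shows only a few rows. A few other pandas calls summarize the table's size and column types. The sketch below runs on a tiny made-up DataFrame so that it stands on its own. The values are hypothetical and are not taken from `housing_data.csv`. To inspect the real data, use the `df` loaded above instead.

```python
import pandas as pd

# Tiny made-up stand-in for housing_data.csv (hypothetical values, illustration only)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [300000.0, 450000.0, 520000.0],
    "bedrooms": [2, 3, 4],
    "sqft_living": [900, 1500, 2100],
})

# Number of rows and columns
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Column names with their inferred data types
print(df.dtypes)

# Compact overview: non-null counts per column, dtypes, and memory usage
df.info()
```

Comparing the non-null counts from `info()` with the row count is a quick first check for missing values.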


## Data Cleaning
Before you dive into the analysis, make sure the data is clean and ready for exploration. Take some time to inspect your data and handle missing values and anomalies. Use the following code cell for any data cleaning.

```python
# Data cleaning
# Handle missing values and anomalies
```

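Here is one possible starting point, as a minimal sketch on made-up data. The helper `basic_clean` is hypothetical. It drops exact duplicate rows and rows with no `price`. It also treats a missing `waterfront` value as 0 (no waterfront view). That last rule is an assumption, so check it against the real data before relying on it. Note that only fully identical rows count as duplicates here: a house sold twice shares an `id` but has a different `date`, so it is kept.

```python
import numpy as np
import pandas as pd

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy of df.

    - drops rows that are exact duplicates across all columns
    - drops rows with no price (the prediction target)
    - ASSUMPTION: a missing 'waterfront' value means no waterfront view (0)
    """
    cleaned = df.drop_duplicates().copy()
    cleaned = cleaned.dropna(subset=["price"])
    if "waterfront" in cleaned.columns:
        cleaned["waterfront"] = cleaned["waterfront"].fillna(0)
    return cleaned

# Made-up rows: one exact duplicate, one missing price, one missing waterfront
raw = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "price": [400000.0, 400000.0, np.nan, 650000.0],
    "waterfront": [0.0, 0.0, np.nan, np.nan],
})

# Missing values per column before cleaning
print(raw.isna().sum())

clean = basic_clean(raw)
print(clean)
```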

## Descriptive Statistics
Now, let's calculate some descriptive statistics to understand the distribution of numerical columns in the dataset. Use the following code cell to get some basic information about your data.

```python
# Descriptive statistics
# Calculate mean, median, standard deviation, percentiles, etc.
```

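`DataFrame.describe` computes most of these statistics in one call, and the `percentiles` argument adds extra cut points. The sketch below uses made-up values, not results from the housing data. It also shows why it helps to compare the mean with the median: a single expensive property pulls the mean well above the typical price.

```python
import pandas as pd

# Made-up values standing in for two numeric housing columns
df = pd.DataFrame({
    "price": [250000, 300000, 350000, 400000, 1200000],
    "sqft_living": [1000, 1200, 1500, 1800, 4000],
})

# count, mean, std, min, the requested percentiles, and max for each numeric column
summary = df.describe(percentiles=[0.25, 0.5, 0.75, 0.9])
print(summary)

# The median is less sensitive than the mean to a few very expensive homes
print("mean price:  ", df["price"].mean())    # 500000.0
print("median price:", df["price"].median())  # 350000.0
```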

## Categorical Insights
Exploring categorical features can provide valuable insights into property distribution across different categories. Start by analyzing the distribution of property conditions.

```python
# Categorical insights
# Count occurrences of different categories.
```

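For a rating-like column such as `condition`, `value_counts` gives the number of properties in each category, and `normalize=True` turns those counts into shares. Sorting by index keeps the ratings in order rather than ordering them by frequency. This is a sketch on made-up ratings. The variable name `condition_counts` is the same one the bar chart further down plots.

```python
import pandas as pd

# Made-up condition ratings (illustration only)
df = pd.DataFrame({"condition": [3, 3, 4, 5, 3, 4, 2, 3]})

# Number of properties per rating, ordered by rating
condition_counts = df["condition"].value_counts().sort_index()
print(condition_counts)

# Share of properties per rating
condition_share = df["condition"].value_counts(normalize=True).sort_index()
print(condition_share.round(3))
```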


## Visualization: Property Price Distribution Histogram
Visualizing the distribution of property prices can help us understand the price range that is most common in the market.

```python
import matplotlib.pyplot as plt

# Visualization: Property Price Distribution Histogram
plt.figure(figsize=(10, 6))
plt.hist(df['price'], bins=20, color='blue', alpha=0.7)
plt.title('Property Price Distribution')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
```

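To name the most common price range instead of reading it off the chart, you can compute the bins directly. `plt.hist` computes its bins with `numpy.histogram`, so using `bins=20` here should reproduce the same bins as the plot. The helper `most_common_price_range` and the prices are hypothetical. In this example, one expensive outlier stretches the range and so widens every bin.

```python
import numpy as np
import pandas as pd

def most_common_price_range(prices, bins=20):
    """Hypothetical helper: return (low, high, count) for the fullest histogram bin."""
    counts, edges = np.histogram(prices, bins=bins)
    i = int(counts.argmax())  # if several bins tie, the first one wins
    return edges[i], edges[i + 1], int(counts[i])

# Made-up prices with one very expensive outlier
prices = pd.Series([300000, 305000, 310000, 312000, 500000, 1000000])
low, high, count = most_common_price_range(prices)
print(f"Most common range: ${low:,.0f} to ${high:,.0f} ({count} properties)")
```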

## Visualization: Property Size vs. Price Scatter Plot
Let's create a scatter plot to explore the relationship between property size and price.

```python
# Visualization: Property Size vs. Price Scatter Plot
plt.figure(figsize=(10, 6))
plt.scatter(df['sqft_living'], df['price'], alpha=0.5)
plt.title('Property Size vs. Price')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.show()
```

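A correlation coefficient puts a number on what the scatter plot shows. The sketch below computes Pearson's r (linear association) and Spearman's rank correlation. Spearman is computed here as Pearson on ranks, so SciPy is not needed. It also computes the median price per square foot, which an investor might use to compare listings. All values are made up.

```python
import pandas as pd

# Made-up sizes and prices (illustration only)
df = pd.DataFrame({
    "sqft_living": [800, 1200, 1500, 2000, 2600, 3500],
    "price": [200000, 280000, 330000, 420000, 560000, 900000],
})

# Pearson correlation: strength of a straight-line relationship
pearson = df["sqft_living"].corr(df["price"])

# Spearman correlation: Pearson applied to the ranks (captures any monotonic trend)
spearman = df["sqft_living"].rank().corr(df["price"].rank())

# Median price per square foot
price_per_sqft = (df["price"] / df["sqft_living"]).median()

print(f"Pearson r:  {pearson:.3f}")
print(f"Spearman:   {spearman:.3f}")
print(f"Median $/sqft: {price_per_sqft:.2f}")
```

Correlation shows only that two columns move together. It does not separate the effect of size from other factors such as location.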

## Visualization: Geographic Distribution using Latitude and Longitude
This scatter plot will visualize the geographic distribution of properties using latitude and longitude, with color indicating property prices.

```python
# Visualization: Geographic Distribution using Latitude and Longitude
plt.figure(figsize=(12, 8))
plt.scatter(df['long'], df['lat'], alpha=0.5, c=df['price'], cmap='coolwarm')
plt.title('Geographic Distribution of Properties')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.colorbar(label='Price')
plt.show()
```

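The color scale shows roughly where prices are high. Grouping by `zipcode` turns that impression into a ranked table. The sketch below uses made-up zip codes and prices. It reports both the median price and the number of sales per zip code, because a median based on only a few sales is not very reliable.

```python
import pandas as pd

# Made-up zip codes and prices (illustration only)
df = pd.DataFrame({
    "zipcode": [98001, 98001, 98004, 98004, 98004, 98103],
    "price": [300000, 340000, 1500000, 1700000, 1300000, 650000],
})

# Sales count and median price per zip code, most expensive first
by_zip = (
    df.groupby("zipcode")["price"]
      .agg(["count", "median"])
      .sort_values("median", ascending=False)
)
print(by_zip)
```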

## Visualization: Property Condition Bar Chart
A bar chart can provide insights into the distribution of properties across different conditions.

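```python
import matplotlib.pyplot as plt

# Visualization: Property Condition Bar Chart
# Count properties per condition rating, ordered by rating
condition_counts = df['condition'].value_counts().sort_index()

plt.figure(figsize=(10, 6))
condition_counts.plot(kind='bar', color='green', alpha=0.7)
plt.title('Property Condition Distribution')
plt.xlabel('Condition')
plt.ylabel('Number of Properties')
plt.xticks(rotation=0)
plt.show()
```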

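To link condition to value, you can compare the typical price at each condition rating. This is a minimal sketch on made-up data. A difference between groups does not show that condition alone causes it, because size and location may also differ between the groups.

```python
import pandas as pd

# Made-up condition ratings and prices (illustration only)
df = pd.DataFrame({
    "condition": [2, 3, 3, 4, 4, 5],
    "price": [250000, 400000, 440000, 450000, 470000, 600000],
})

# Number of sales and median price for each condition rating
median_by_condition = df.groupby("condition")["price"].agg(["count", "median"])
print(median_by_condition)
```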

## Visualization: Number of Bedrooms and Bathrooms Correlation
Finally, let's explore the correlation between the number of bedrooms and bathrooms in properties using a scatter plot.

```python
# Visualization: Number of Bedrooms and Bathrooms Correlation
plt.figure(figsize=(10, 6))
plt.scatter(df['bedrooms'], df['bathrooms'], alpha=0.5, color='purple')
plt.title('Bedrooms vs. Bathrooms Correlation')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Number of Bathrooms')
plt.show()
```

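Bedroom and bathroom counts take only a few distinct values, so many points in the scatter plot sit on top of each other. A correlation coefficient plus the average number of bathrooms per bedroom count can summarize the relationship more clearly. The sketch below runs on made-up values.

```python
import pandas as pd

# Made-up bedroom and bathroom counts (illustration only)
df = pd.DataFrame({
    "bedrooms": [1, 2, 2, 3, 3, 3, 4, 4],
    "bathrooms": [1.0, 1.0, 1.5, 1.75, 2.0, 2.25, 2.5, 3.0],
})

# Pearson correlation between the two counts
r = df["bedrooms"].corr(df["bathrooms"])
print(f"Pearson r: {r:.3f}")

# How many homes have each bedroom count, and their average number of bathrooms
bath_by_bed = df.groupby("bedrooms")["bathrooms"].agg(["count", "mean"])
print(bath_by_bed)
```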

## Presentation Prompt
Now that you have completed the analysis and visualizations, prepare a short presentation describing the insights you obtained. Focus on the key findings, trends, and patterns you observed. Keep the presentation concise and informative, and highlight the investor-relevant insights drawn from the data.