<a href="https://colab.research.google.com/github/geonextgis/Geospatial_Data_Science_with_Python/blob/main/03_Exploratory_Spatial_Data_Analysis/01_Exploratory_Data_Visualization_New.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Import Required Libraries**

In [3]:
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

## **Setting Up the Current Working Directory**

In [4]:
# Checking the current working directory
os.getcwd()

'/content'

In [5]:
# Changing the current working directory
file_path = r"D:\Coding\Git Repository\Geospatial_Data_Science_with_Python\Datasets"
os.chdir(file_path)
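# Building the sub-folder paths with os.path.join instead of manual "\\" concatenation
csv_path = os.path.join(file_path, "CSVs")
shp_path = os.path.join(file_path, "Shapafiles")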

FileNotFoundError: ignored
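
The hard-coded Windows path above does not exist on Colab, which is why the cell raises `FileNotFoundError`. A minimal sketch of a more portable setup with `pathlib` follows. The helper name `dataset_paths` and the `Datasets` base folder are assumptions for illustration, and the subfolder names are simply copied from the cell above.

```python
from pathlib import Path


def dataset_paths(base_dir):
    """Return (csv_dir, shp_dir) under base_dir without changing the working directory."""
    base = Path(base_dir)
    # Subfolder names are copied from the cell above; adjust them to your own layout.
    return base / "CSVs", base / "Shapafiles"


# Hypothetical base folder: replace it with wherever the datasets actually live.
csv_dir, shp_dir = dataset_paths(Path.cwd() / "Datasets")
housing_csv = csv_dir / "housing.csv"
print(housing_csv, "exists:", housing_csv.exists())
```

Building paths this way avoids `os.chdir` and the Windows-only `"\\"` separator, so the same cell can run locally and on Colab once the base folder is set.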

In [None]:
# Checking the new current working directory
os.getcwd()

## **Reading the Data**

**Dataset Description:**<br>
This is the dataset used in the second chapter of Aurélien Géron's book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires only rudimentary data cleaning, has an easily understandable list of variables, and sits at a comfortable size between being too toyish and too cumbersome.

The data contains information from the 1990 California census. So although it may not help you with predicting current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people about the basics of machine learning.

**Content:**<br>
1. longitude: A measure of how far west a house is; a more negative value is farther west
2. latitude: A measure of how far north a house is; a higher value is farther north
3. housing_median_age: Median age of a house within a block; a lower number is a newer building
4. total_rooms: Total number of rooms within a block
5. total_bedrooms: Total number of bedrooms within a block
6. population: Total number of people residing within a block
7. households: Total number of households (a household is a group of people residing within a home unit) in a block
8. median_income: Median income for households within a block of houses (measured in tens of thousands of US Dollars)
9. median_house_value: Median house value for households within a block (measured in US Dollars)
10. ocean_proximity: Location of the house relative to the ocean or sea
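
Because median income is stored in tens of thousands of US Dollars, it helps to see the conversion once. The sketch below uses made-up values, not rows from `housing.csv`, and assumes the CSV column is named `median_income` in snake_case, as the later `ocean_proximity` cells suggest.

```python
import pandas as pd

# Made-up values for illustration only; they are not rows from housing.csv.
sample = pd.DataFrame({"median_income": [8.3252, 3.8462, 1.5]})

# The column is stored in tens of thousands of US Dollars, so scale by 10,000.
sample["median_income_usd"] = sample["median_income"] * 10_000
print(sample)
```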

In [None]:
# Reading the housing.csv data with pandas
housing = pd.read_csv(os.path.join(csv_path, "housing.csv"))
# Checking the name of the columns
housing.columns

In [None]:
# Checking the first 5 rows of the data
housing.head()

In [None]:
# Checking the shape of the dataframe
housing.shape

## **Conducting Exploratory Data Analysis (EDA)**

In [None]:
# Checking the non-null values in each column
housing.info()

In [None]:
# Cleaning the data
housing.dropna(inplace=True)
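
Before dropping rows, it can be useful to see how many values are missing and how many rows the cleaning step removes. This is a minimal sketch on a toy frame with invented values; the name `toy` stands in for `housing`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `housing`; the values are invented.
toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, np.nan, 190.0],
})

missing_per_column = toy.isna().sum()  # number of NaNs in each column
rows_before = len(toy)
cleaned = toy.dropna()                 # returns a new frame; `toy` is untouched
print(missing_per_column)
print(f"dropped {rows_before - len(cleaned)} of {rows_before} rows")
```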

In [None]:
# Checking the shape of the dataframe
housing.shape

In [None]:
# Checking the value counts of the ocean_proximity column
housing["ocean_proximity"].value_counts()

In [None]:
# Defining a dictionary to encode the values of ocean_proximity column from string to int
ocean_proximity_dict = {"ISLAND": 0, "NEAR BAY": 1, "NEAR OCEAN": 2, "INLAND": 3, "<1H OCEAN": 4}
# Encoding the ocean_proximity column
encoded_ocean_proximity = housing["ocean_proximity"].replace(ocean_proximity_dict)
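
A dictionary-based encoding only works for labels that appear in the dictionary. The sketch below uses `Series.map`, a swapped-in alternative to `replace`, to surface any label the dictionary misses. The labels are invented, and `"LAKESIDE"` is a hypothetical category added to trigger the check.

```python
import pandas as pd

ocean_proximity_dict = {"ISLAND": 0, "NEAR BAY": 1, "NEAR OCEAN": 2, "INLAND": 3, "<1H OCEAN": 4}

# Invented labels for illustration; "LAKESIDE" is deliberately not in the dictionary.
labels = pd.Series(["INLAND", "NEAR BAY", "LAKESIDE", "<1H OCEAN"])

encoded = labels.map(ocean_proximity_dict)   # unmapped labels become NaN
unmapped = labels[encoded.isna()].unique()   # original labels with no code
print("unmapped labels:", list(unmapped))
```

With `map`, an unexpected category shows up as `NaN` instead of silently staying a string, which makes the later null check meaningful.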

In [None]:
# Creating a copy of housing dataframe
housing_copy = housing.copy()

In [None]:
# Setting the encoded values of ocean_proximity column
housing_copy["ocean_proximity"] = encoded_ocean_proximity
# Checking the first 5 rows of the new housing_copy dataframe
housing_copy.head()

In [None]:
# Checking the non-null values in each column of the new housing_copy dataframe
housing_copy.info()

In [None]:
# Dropping the rows with null values
housing_copy.dropna(inplace=True)
# Resetting the index
housing_copy.reset_index(inplace=True, drop=True)

In [None]:
# Checking the final dataframe
housing_copy.head()

In [None]:
# Checking the dataframe information
housing_copy.info()

In [None]:
# Describing the dataframe
housing_copy.describe()

In [None]:
# Create a visual representation of the data
housing_copy.hist(bins=50, figsize=(20, 18))

## **Exploratory Spatial Data Analysis (ESDA)**

In [None]:
# Converting the pandas dataframe to a geopandas dataframe
housing_gdf = gpd.GeoDataFrame(
    housing_copy,
    geometry=gpd.points_from_xy(housing_copy.longitude, housing_copy.latitude, crs=4326),
)

In [None]:
# Checking the CRS of the geodataframe
housing_gdf.crs
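
`points_from_xy` expects x (longitude) first and y (latitude) second, and swapped columns are easy to miss. A minimal sanity check under EPSG:4326 bounds can be sketched as follows. The coordinates are invented, and the second row is swapped on purpose.

```python
import pandas as pd

# Invented coordinates; the second row has longitude and latitude swapped on purpose.
toy = pd.DataFrame({"longitude": [-122.23, 34.05], "latitude": [37.88, -118.24]})

# In EPSG:4326, longitude must lie in [-180, 180] and latitude in [-90, 90].
lon_ok = toy["longitude"].between(-180, 180)
lat_ok = toy["latitude"].between(-90, 90)
suspect = toy[~(lon_ok & lat_ok)]  # rows with at least one out-of-range coordinate
print(suspect)
```

This range check only catches swaps where a value falls outside the valid latitude range, so it is a first filter and not a full validation.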

In [8]:
import leafmap

In [9]:
m = leafmap.Map()

In [10]:
m

Map(center=[20, 0], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text…