## Geospatial Data Visualization

In [None]:
%%html
<style>
    /* Jupyter */
    .rendered_html table,
    /* Jupyter Lab*/
    div[data-mime-type="text-markdown"] table {
        margin-left: 0
    }
</style>

### Import the necessary Libraries

In [None]:
# Basic libraries
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Geospatial data visualization libraries
# To install folium for mapping, uncomment the next line:
# !pip install folium
import folium
from folium.plugins import MarkerCluster

### Import property attributes dataset

In [None]:
df = pd.read_csv("innercity.csv")

### Check a few rows of the loaded dataset to confirm the data loaded correctly

In [None]:
df.head()

- The dataset contains property data from King County, Washington, USA.

- King County is the most populous county in Washington and the 12th-most populous in the United States.

- The data is densely clustered around the Seattle-Bellevue-Renton-Kent-Federal Way-Tacoma area, an urban agglomeration.

- The property data contains 21,613 observations and 23 variables [22 independent variables and 1 target variable (price)]; most are numerical or date/time attributes that describe property characteristics.

### Exploratory Data Analysis

#### Check the shape of the dataframe

In [None]:
df.shape

The property data contains **21,613** observations and **23** variables: **22 independent variables** and **1 target variable (price)**. Most are numerical or date/time attributes. Each entry describes one property through its number of bedrooms and bathrooms, measurements (area, number of floors), year built, aesthetic attributes (`coast`, `sight`) and location (`lat`, `long`, `zipcode`).

#### Check the names of the fields in the data

In [None]:
df.columns

The table below lists these attributes with their descriptions:

#### Attribute Information

| Attribute        | Data Type | Description                                                                                                     |
|:------------------|:-----------|:-----------------------------------------------------------------------------------------------------------------|
| cid              | Numeric   | A notation for a house                                                                                          |
| dayhours         | Date/Time | Date house was sold                                                                                             |
| price            | Numeric   | Price is prediction target                                                                                      |
| room_bed         | Numeric   | Number of Bedrooms/House                                                                                        |
| room_bath        | Numeric   | Number of bathrooms/bedrooms                                                                                    |
| living_measure   | Numeric   | Square footage of the home                                                                                      |
| lot_measure      | Numeric   | Square footage of the lot                                                                                       |
| ceil             | Numeric   | Total floors (levels) in house                                                                                  |
| coast            | Numeric   | House which has a view to a waterfront                                                                          |
| sight            | Numeric   | Has been viewed                                                                                                 |
| condition        | Numeric   | How good the condition is (Overall)                                                                             |
| quality          | Numeric   | Grade given to the housing unit, based on grading system                                                        |
| ceil_measure     | Numeric   | Square footage of house apart from basement                                                                     |
| basement_measure | Numeric   | Square footage of the basement                                                                                   |
| yr_built         | Numeric   | Built Year                                                                                                      |
| yr_renovated     | Numeric   | Year when house was renovated                                                                                   |
| zipcode          | Numeric   | Zip                                                                                                             |
| lat              | Numeric   | Latitude coordinate                                                                                             |
| long             | Numeric   | Longitude coordinate                                                                                            |
| living_measure15 | Numeric   | Living room area in 2015 (implies some renovations); this might or might not have affected the lot size area    |
| lot_measure15    | Numeric   | Lot size area in 2015 (implies some renovations)                                                                |
| furnished        | Numeric   | Based on the quality of room                                                                                    |
| total_area       | Numeric   | Measure of both living and lot                                                                                  |
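
The table lists `dayhours` as Date/Time, but `pd.read_csv` loads such a column as plain text unless told otherwise. Below is a minimal sketch of the conversion, assuming the values follow the compact `YYYYMMDDTHHMMSS` pattern. The sample values and the `sold_on` column name are made up for illustration, and the format string should be adjusted if the file differs.

```python
import pandas as pd

# Hypothetical sample mimicking the assumed layout of the `dayhours` column.
sample = pd.DataFrame({"dayhours": ["20150427T000000", "20140502T000000"]})

# Parse the strings into timestamps; a value that does not match the
# assumed format raises an error instead of being silently mis-parsed.
sample["sold_on"] = pd.to_datetime(sample["dayhours"], format="%Y%m%dT%H%M%S")

print(sample["sold_on"].dt.year.tolist())
```

Once parsed, the `.dt` accessor exposes year, month and weekday for grouping or filtering by sale date.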

#### Check the data types and null values present in the fields

In [None]:
df.info()

*No null values are present in the data*

#### Count of unique values in each field

In [None]:
df.nunique()

#### Check for missing values

In [None]:
df.isnull().sum()

*No missing values are present in the data*

#### Check rows with missing values

In [None]:
df[df.isnull().any(axis=1)]

*The selection is empty: no row contains a missing value. No duplicate rows exist either, and hence no duplicate removal step is required*
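
Duplicate rows can be counted directly with `DataFrame.duplicated`. Below is a minimal sketch on a made-up frame (the `toy` data and both variable names are illustrative, not taken from the dataset). It separates fully identical rows from rows that only repeat a `cid`.

```python
import pandas as pd

# Toy frame standing in for `df`: rows 2 and 3 are identical,
# and cid 103 appears twice with different prices.
toy = pd.DataFrame(
    {
        "cid": [101, 102, 102, 103, 103],
        "price": [300000, 450000, 450000, 500000, 520000],
    }
)

# Rows that repeat an earlier row across every column.
full_duplicates = int(toy.duplicated().sum())

# Rows that repeat an earlier `cid`, regardless of the other columns.
repeated_ids = int(toy.duplicated(subset="cid").sum())

print(full_duplicates, repeated_ids)
```

Keeping the two counts separate matters because a repeated `cid` with different values in the other columns may be a legitimate second record rather than a duplicate.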

#### Viewing the data statistics

In [None]:
df1 = df.describe().transpose()
dfStyler = df1.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

The `describe` output shows that several fields contain 0 values. Whether these zeros are meaningful or need cleansing requires further data exploration. A few outliers are also present: for example, 75% of the values in `room_bed` are within 3 bedrooms, yet the maximum is 33. Such values need to be treated with an appropriate strategy. Skewness is present in the data as well.
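
The three observations above (zeros, outliers, skewness) can be screened column by column. Below is a minimal sketch using the common 1.5 × IQR rule, which is one of several possible outlier criteria and not necessarily the one intended by this notebook. The helper `screen_column` and the sample values are hypothetical.

```python
import pandas as pd


def screen_column(values: pd.Series) -> dict:
    """Summarise zeros, IQR-rule outliers and skewness for one numeric column."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {
        "zeros": int((values == 0).sum()),
        "outliers": outliers.tolist(),
        "skew": float(values.skew()),
    }


# Made-up bedroom counts with one extreme value, for illustration only.
room_bed = pd.Series([0, 2, 3, 3, 3, 4, 4, 5, 33])
report = screen_column(room_bed)
print(report["zeros"], report["outliers"])
```

Applied to the real data, the same helper could be run over each numeric column, for example `screen_column(df["room_bed"])`, before deciding whether a flagged value is an error or a genuine extreme.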

### Geospatial Data Visualization using Folium Library

#### Map with simple marker

In [None]:
# Create map: basemap - OpenStreetMap
property_map = folium.Map(
    location=[df['lat'].mean(), 
              df['long'].mean()], 
    zoom_start=11,
    control_scale=True
)

# Add one circle marker per property; the popup shows its identifier (cid)
for row in df.itertuples():
    folium.CircleMarker(
        location=[row.lat, row.long],
        radius=3,
        popup=str(row.cid),
        color='blue',
        opacity=0.2
    ).add_to(property_map)
    
property_map

#### Cluster Map

In [None]:
# Create map with clustering: basemap - OpenStreetMap
property_map = folium.Map(
    location=[df['lat'].mean(), 
              df['long'].mean()], 
    zoom_start=11,
    control_scale=True)

mc = MarkerCluster()

# Create a marker for each row in the dataframe; each popup shows the row's unique identifier (cid)
for row in df.itertuples():
    mc.add_child(folium.Marker(location=[row.lat, row.long], popup=str(row.cid)))
property_map.add_child(mc)
property_map

# Other background maps can be selected through the `tiles` argument of folium.Map.
# Availability depends on the installed folium version (some of these have been
# dropped from newer releases). Options have included:
#   "OpenStreetMap" -- default option
#   "Mapbox Bright" (limited levels of zoom for free tiles)
#   "Mapbox Control Room" (limited levels of zoom for free tiles)
#   "Stamen" (Terrain, Toner, and Watercolor)
#   "Cloudmade" (must pass API key)
#   "Mapbox" (must pass API key)
#   "CartoDB" (positron and dark_matter)
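
The popups above carry only the `cid`. Below is a minimal sketch of building a richer label from the same row tuple, assuming the `price` and `room_bed` fields from the attribute table are wanted. The helper `popup_text` and the sample values are hypothetical.

```python
from collections import namedtuple


def popup_text(row) -> str:
    """Build a short HTML label from a row that exposes cid, price and room_bed."""
    return (
        f"<b>cid:</b> {row.cid}<br>"
        f"<b>price:</b> {row.price:,.0f}<br>"
        f"<b>bedrooms:</b> {row.room_bed}"
    )


# Stand-in for one tuple yielded by df.itertuples(); the values are made up.
Row = namedtuple("Row", ["cid", "price", "room_bed"])
label = popup_text(Row(cid=1234567890, price=450000.0, room_bed=3))
print(label)
```

Inside the marker loop, passing `popup=popup_text(row)` to `folium.Marker` would show this label instead of the bare identifier.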

#### Heatmap

In [None]:
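# Create heat map
from folium.plugins import HeatMap

# Note: 'Stamen Terrain' tiles have been hosted by Stadia Maps since 2023 and may
# not load without an API key; 'OpenStreetMap' or 'CartoDB positron' are alternatives.
property_map = folium.Map(
    location=[df['lat'].mean(),
              df['long'].mean()],
    tiles='Stamen Terrain',
    attr='Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL',
    zoom_start=11,
    control_scale=True
)

# Weight each unique (lat, long) pair by the number of properties at that location
df['count'] = 1
property_heatmap = HeatMap(
    data=df[['lat', 'long', 'count']].groupby(['lat', 'long']).sum().reset_index().values.tolist(),
    name='Heatmap',
    radius=10,
    min_opacity=0.1,
    max_zoom=16
).add_to(property_map)

# The layer control lets the heat map be toggled on and off
folium.LayerControl().add_to(property_map)
property_map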
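
The heat map input is a list of `[latitude, longitude, weight]` rows. Below is a minimal sketch of that aggregation step on made-up coordinates. It uses `groupby(...).size()` as an alternative that avoids adding a helper column to the dataframe; the `toy`, `weights` and `heat_data` names are illustrative.

```python
import pandas as pd

# Made-up coordinates; two properties share the first location.
toy = pd.DataFrame(
    {"lat": [47.61, 47.61, 47.45], "long": [-122.33, -122.33, -122.20]}
)

# Count properties per unique coordinate pair.
weights = toy.groupby(["lat", "long"]).size().reset_index(name="count")

# Convert to the nested-list layout: one [lat, long, count] row per location.
heat_data = weights.values.tolist()
print(heat_data)
```

A different weight, such as the summed `price` per location, could be substituted for the count if the map should emphasise value rather than density.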
