# Freestyle Python exercises

Trying out some basic Python programming and plotting.

![img](https://o.aolcdn.com/images/dims?quality=85&image_uri=http%3A%2F%2Fo.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F196d822091dd62d7ab7eac6e7ecdda3f%2F203421819%2Fpythonbootcamp_editorial.jpg&client=amp-blogside-v2&signature=b49b56f1df65b7a855aee2d6b5b85b34f56196f3)

In [None]:
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import geopandas as geo
from geopandas import GeoDataFrame
from shapely.geometry import Point

In [None]:
%matplotlib inline
plt.rcParams['figure.figsize'] = [20, 8]

## Let's cook the data!

Load the CSV file from a local path with pandas.

In [None]:
estate = pd.read_csv("data/Sacramentorealestatetransactions.csv", sep=",")

How can you view the data you just loaded?
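One possible way to inspect a freshly loaded dataframe is sketched below. The tiny `estate` frame is made-up stand-in data so the snippet runs without the CSV; with the real file you would call the same methods on the frame returned by `pd.read_csv`.

```python
import pandas as pd

# Made-up stand-in for the real CSV (illustration only).
estate = pd.DataFrame({
    "street": ["1 A ST", "2 B AVE", "3 C WAY"],
    "city": ["SACRAMENTO", "LINCOLN", "SACRAMENTO"],
    "price": [100000, 250000, 175000],
})

first_rows = estate.head(2)    # first n rows (default is 5)
n_rows, n_cols = estate.shape  # (number of rows, number of columns)
summary = estate.describe()    # count/mean/std/min/quartiles/max of numeric columns
estate.info()                  # prints column names, dtypes and non-null counts
```

In a notebook, leaving `estate.head()` as the last line of a cell renders the rows as a table.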

## Explore the data

### How many property types are there?

List the unique "type" of the properties from the dataframe.
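A minimal sketch of one approach, using made-up values for the "type" column: `unique()` lists the distinct values, `nunique()` counts them, and `value_counts()` shows how often each occurs.

```python
import pandas as pd

# Made-up sample values (illustration only).
estate = pd.DataFrame(
    {"type": ["Residential", "Condo", "Residential", "Multi-Family"]}
)

types = estate["type"].unique()         # distinct values, in order of first appearance
n_types = estate["type"].nunique()      # number of distinct values
counts = estate["type"].value_counts()  # rows per property type
```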

### Find the date range of the data (sale dates)

The column "sale_date" holds the sale date of each record in the dataframe. Let's find the earliest and latest dates.
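A sketch of one way to do this: parse the strings into timestamps, then take the minimum and maximum. The sample strings and the format string are assumptions about how "sale_date" is stored (here, text such as `Wed May 21 00:00:00 EDT 2008`); adjust the format to whatever the real column contains.

```python
import pandas as pd

# Made-up sample rows; the date layout is an assumption about the CSV.
estate = pd.DataFrame({"sale_date": [
    "Wed May 21 00:00:00 EDT 2008",
    "Fri May 16 00:00:00 EDT 2008",
    "Mon May 19 00:00:00 EDT 2008",
]})

# "EDT" is matched as literal text, so the result is timezone-naive.
sale_dates = pd.to_datetime(
    estate["sale_date"], format="%a %b %d %H:%M:%S EDT %Y"
)

first_sale, last_sale = sale_dates.min(), sale_dates.max()
```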

### How many properties are smaller than 60 m^2?

Use the conversion 1 m^2 = 10.764 sq ft.
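One possible solution is to convert the 60 m^2 threshold to square feet and filter with a boolean mask. The column name `sq__ft` and the treatment of `0` as "area unknown" are assumptions about the dataset, not something stated in this notebook.

```python
import pandas as pd

SQFT_PER_SQM = 10.764  # conversion factor given in the exercise

# Made-up areas in square feet; the column name "sq__ft" is an assumption.
estate = pd.DataFrame({"sq__ft": [500, 836, 645, 1200, 0]})

limit_sqft = 60 * SQFT_PER_SQM  # 60 m^2 expressed in square feet

# Assumption: an area of 0 means the size was not recorded, so skip those rows.
known_area = estate["sq__ft"] > 0
small = estate[known_area & (estate["sq__ft"] < limit_sqft)]
n_small = len(small)
```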

### Find the highest, lowest, and mean prices of properties in each city

Aggregations come into play.

In [None]:
price_by_state = estate.groupby("city") ..........

In [None]:
price_by_state[:10]
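The elided aggregation could be filled in several ways. The sketch below is one option, on made-up data, and uses the hypothetical name `price_by_city`. Passing a dict to `agg` yields two-level columns, which is the shape that a lookup such as `["price"]["mean"]` expects.

```python
import pandas as pd

# Made-up sample data (illustration only).
estate = pd.DataFrame({
    "city": ["SACRAMENTO", "SACRAMENTO", "LINCOLN"],
    "price": [100000, 200000, 300000],
})

# One row per city; columns are ("price", "max"), ("price", "min"), ("price", "mean").
price_by_city = estate.groupby("city").agg({"price": ["max", "min", "mean"]})

# Selecting the two levels in turn gives a Series of mean prices indexed by city.
mean_price = price_by_city["price"]["mean"]
```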

### Find how much more expensive each property is than its city's average price

Use the mean price of each city computed earlier.

In [None]:
mean_city_price = price_by_state["price"]["mean"]
estate_price = estate[["street","type","city", "price","latitude","longitude"]]

In [None]:
estate = pd.merge(estate_price, mean_city_price, how="inner", on="city")

In [None]:
estate[:5]

In [None]:
estate["%"] = .......
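One way to fill in the blank is the relative difference from the city mean, expressed in percent. The sketch assumes the merged frame has the columns "price" and "mean", as the merge above suggests; the numbers are made up.

```python
import pandas as pd

# Made-up rows shaped like the merged frame: each property carries its city's mean.
estate = pd.DataFrame({
    "city": ["SACRAMENTO", "SACRAMENTO", "LINCOLN"],
    "price": [100000, 300000, 250000],
    "mean": [200000.0, 200000.0, 250000.0],
})

# Positive: above the city average; negative: below it.
estate["%"] = (estate["price"] - estate["mean"]) / estate["mean"] * 100
```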

### A more complex calculation

How many properties in each city are more expensive than the city's average by at least 25%?

In [None]:
city_count = estate.groupby("city").size().to_frame("c")
city_count.head()

And what percentage of properties in each city are that expensive?
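Both questions can be answered from a single boolean mask, as in this sketch. It assumes a "%" column holding each property's relative difference from its city's mean; the data is made up. Summing the mask per city counts the expensive properties, and averaging it gives their share.

```python
import pandas as pd

# Made-up data: "%" is the relative difference from the city mean, in percent.
estate = pd.DataFrame({
    "city": ["A", "A", "A", "A", "B", "B"],
    "%": [30.0, -10.0, 25.0, -45.0, 10.0, -10.0],
})

is_expensive = estate["%"] >= 25  # True where at least 25% above average

# True counts as 1, so sum() counts rows and mean() gives the fraction.
expensive_count = is_expensive.groupby(estate["city"]).sum()
expensive_share = is_expensive.groupby(estate["city"]).mean() * 100
```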

### Distance from the city centre

Let's apply a function to each row.

In [None]:
city_centre = pd.read_csv("data/city_centre.csv", sep=",")
city_centre[:5]

In [None]:
def dist(lat1, lng1, lat2, lng2):
    r = 6371 # km
    dlat = math.radians(lat2-lat1)
    dlng = math.radians(lng2-lng1) 
    a = math.sin(dlat/2) * math.sin(dlat/2) + \
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * \
        math.sin(dlng/2) * math.sin(dlng/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    return r * c

In [None]:
def find_dist(r):
    return dist(r["latitude"], r["longitude"], r["latitude_centre"], r["longitude_centre"])

In [None]:
estate = pd.merge(estate, city_centre, how="inner", on="city", suffixes=("", "_centre"))

In [None]:
estate["dist"] = .......
estate[:10]
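A row-wise `apply` is one way to fill in the blank. The sketch below repeats the haversine helper so that it runs on its own, and uses made-up coordinates in which the second property sits exactly one degree of latitude north of its city centre.

```python
import math
import pandas as pd

def dist(lat1, lng1, lat2, lng2):
    """Great-circle (haversine) distance in km between two lat/lng points."""
    r = 6371  # mean Earth radius in km
    dlat = math.radians(lat2 - lat1)
    dlng = math.radians(lng2 - lng1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlng / 2) ** 2)
    return 2 * r * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Made-up coordinates shaped like the merged frame.
estate = pd.DataFrame({
    "latitude": [38.5816, 39.5816],
    "longitude": [-121.4944, -121.4944],
    "latitude_centre": [38.5816, 38.5816],
    "longitude_centre": [-121.4944, -121.4944],
})

# axis=1 passes each row to the function as a Series.
estate["dist"] = estate.apply(
    lambda r: dist(r["latitude"], r["longitude"],
                   r["latitude_centre"], r["longitude_centre"]),
    axis=1,
)
```

Row-wise `apply` is easy to read but slow on large frames; a vectorised NumPy version of the same formula would be the usual next step.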

### Go graphical

Let's plot the geographical distribution of the price / relative price to the city average / ...

In [None]:
# shapely's Point takes (x, y), i.e. (longitude, latitude), so that maps plot north-up
coords = [Point(coord) for coord in zip(estate["longitude"], estate["latitude"])]

In [None]:
gf = geo.GeoDataFrame(estate, geometry=coords)

In [None]:
gf[gf["%"] > 0].plot(column="%", legend=True)

In [None]:
gf.plot(column="price", legend=True)

### Plot distance from the centre

### Plot histogram of prices by city

Can you change the following code to plot price histograms by property type instead of by city?

In [None]:
for city in ["LINCOLN", "SACRAMENTO", "ELVERTA"]:
    x = estate[estate["city"] == city]["price"]
    plt.hist(x, bins=25, alpha=0.5, label=city)
plt.legend(loc="upper right")
plt.show()

Try plotting the price histogram for each property type.
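One possible take on the exercise: instead of hard-coding the categories, iterate over `groupby("type")` so that every property type gets its own histogram. The data is made up, and the `Agg` backend line is only there so the sketch runs outside a notebook; drop it when using `%matplotlib inline`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data (illustration only).
estate = pd.DataFrame({
    "type": ["Residential", "Condo", "Residential", "Condo", "Multi-Family"],
    "price": [200000, 120000, 260000, 150000, 310000],
})

fig, ax = plt.subplots()
# groupby yields (key, sub-frame) pairs, one per property type.
for prop_type, group in estate.groupby("type"):
    ax.hist(group["price"], bins=25, alpha=0.5, label=prop_type)
ax.legend(loc="upper right")

labels = ax.get_legend_handles_labels()[1]  # one legend entry per type
```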

### Extract street names

In [None]:
dist_city["str"] = .......
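A sketch of one approach, assuming that "street" holds addresses of the form `<house number> <street name>` and that the goal is to drop the leading number. It writes to `estate` rather than `dist_city`, because the latter is not defined in the cells shown here. The addresses are made up.

```python
import pandas as pd

# Made-up addresses in the assumed "<number> <street name>" layout.
estate = pd.DataFrame(
    {"street": ["3526 HIGH ST", "51 OMAHA CT", "2796 BRANCH ST"]}
)

# Remove the leading digits and the whitespace that follows them.
estate["str"] = estate["street"].str.replace(r"^\d+\s+", "", regex=True)
```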

## Clustering

In [None]:
from sklearn.cluster import KMeans

In [None]:
# n_jobs is no longer accepted by KMeans in scikit-learn 1.0 and later, so it is omitted
kmeans = KMeans(n_clusters=8, max_iter=80)

In [None]:
trainset = estate[["latitude", "longitude"]].values
trainset.shape

In [None]:
cluster_kmeans = kmeans.fit_predict(trainset)

In [None]:
estate["cluster_kmeans"] = cluster_kmeans.tolist()

Let's plot the clustered data!

In [None]:
for i in range(8):
    # TODO: select the rows of cluster i and scatter-plot them, e.g.
    # plt.scatter(lng, lat, label=i)
    pass

plt.legend(loc="upper right")
plt.show()
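One way the loop body could be completed: filter the rows of each cluster and scatter them with longitude on the x-axis. The frame below is made up and already carries a "cluster_kmeans" column, so the sketch does not depend on scikit-learn. The `Agg` backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up points with precomputed cluster labels (illustration only).
estate = pd.DataFrame({
    "latitude": [38.5, 38.6, 38.7, 38.8],
    "longitude": [-121.5, -121.4, -121.3, -121.2],
    "cluster_kmeans": [0, 0, 1, 2],
})

fig, ax = plt.subplots()
for i in sorted(estate["cluster_kmeans"].unique()):
    members = estate[estate["cluster_kmeans"] == i]
    # Each scatter call gets the next colour in matplotlib's default cycle.
    ax.scatter(members["longitude"], members["latitude"], label=str(i))
ax.legend(loc="upper right")

n_groups = len(ax.collections)  # one scatter collection per cluster
```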

### More exercises (Jupyter Notebooks)

- [Idiomatic Python samples](https://nbviewer.jupyter.org/github/jerry-git/learn-python3/blob/master/notebooks/intermediate/notebooks/idiomatic_misc1.ipynb)
- [Python standard library samples](https://nbviewer.jupyter.org/github/jerry-git/learn-python3/blob/master/notebooks/beginner/notebooks/std_lib.ipynb)
- [Python list samples](https://github.com/ksjpswaroop/Learn2Code/blob/master/Learn2Code%20-%20Part%203%20-%20Lists.ipynb~)
- [100 Pandas puzzles](https://github.com/ajcr/100-pandas-puzzles/blob/master/100-pandas-puzzles-with-solutions.ipynb)
- [Scipy intro samples](https://github.com/iitmcvg/Python-Exercises/blob/master/Exercise%204%20-%20Scipy.ipynb)