# Exercise 4 from the Intro to Python GIS course
https://github.com/AutoGIS-2017/Exercise-4

In [7]:
import geopandas as gpd
import pandas as pd
from geopandas.tools import geocode
from shapely.geometry import Point, LineString, Polygon
import fiona

## Problem 1: Join accessibility datasets into a grid and visualize them by using a classifier (6 points)

**Steps:**

 - Download a dataset from [**here**](https://github.com/Automating-GIS-processes/Lesson-4-Classification-overlay/raw/master/data/dataE4.zip) that includes 7 text files containing accessibility data for the Helsinki Region and a Shapefile containing a Polygon grid that can be used to visualize and analyze the data spatially. The datasets are:
 
     - `travel_times_to_[XXXXXXX]_[NAME-OF-THE-CENTER].txt` including travel times and road network distances to a specific shopping center
     - `MetropAccess_YKR_grid_EurefFIN.shp` including the Polygon grid with a `YKR_ID` column that can be used to join the grid with the accessibility data

 - Read those travel time data files (one by one) with Pandas and select only the following columns from them:
    
    - pt_r_tt
    - car_r_t
    - from_id
    - to_id
  
 - Visualize the **classified** travel times (Public transport AND Car) of at least one of the shopping centers using the classification methods that we went through in the [lesson materials](https://automating-gis-processes.github.io/2017/lessons/L4/reclassify.html). You need to classify the data into a new column in your GeoDataFrame. For classification, you can either:
 
    - Use the [common classifiers from pysal](https://automating-gis-processes.github.io/2017/lessons/L4/reclassify.html#classification-based-on-common-classifiers)
 
    - Or create your own [custom classifier](https://automating-gis-processes.github.io/2017/lessons/L4/reclassify.html#creating-a-custom-classifier). If you create your own, remember to document how it works! Write a general description of it and comment your code as well.
 
 - Upload the map(s) you have visualized into your own Exercise 4 repository (they don't need to be pretty). If visualizing takes forever (the computer instance can be a bit slow), it is enough to visualize only one map using plotting in Geopandas. If it is really slow, you can also do the visualization using Quantum GIS in the computer instance or even ArcGIS in the GIS-lab.
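The custom classifier option in the steps above could look roughly like the sketch below. It uses `pd.cut` on a small synthetic table rather than the real files. The class edges are hypothetical, and treating `-1` as a "no data" marker is an assumption about the travel time files, not something stated in the exercise.

```python
import pandas as pd

# Synthetic travel times in minutes; -1 is ASSUMED to mark "no data".
times = pd.DataFrame({
    "from_id": [1, 2, 3, 4, 5],
    "pt_r_tt": [12, 35, 61, -1, 95],
})

# Hypothetical class edges: 0-15, 15-30, 30-60, 60-90 and over 90 minutes.
bins = [0, 15, 30, 60, 90, float("inf")]
labels = [1, 2, 3, 4, 5]

# Mask the assumed no-data value so that it is not assigned to any class.
valid = times["pt_r_tt"].where(times["pt_r_tt"] >= 0)

# Each bin includes its upper edge; include_lowest=True also keeps 0 minutes.
times["pt_class"] = pd.cut(valid, bins=bins, labels=labels, include_lowest=True)
print(times)
```

The new `pt_class` column can then be passed as the `column` argument when plotting the joined GeoDataFrame.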

In [42]:
stores = ['5878070_Jumbo', '5878087_Dixi', '5902043_Myyrmanni', '5944003_Itis', '5975373_Forum', '5978593_Iso_omena', '5980260_Ruoholahti']
stores_files = ['TravelTimes_to_' + storeName + '.txt' for storeName in stores]

# Read each travel time file and keep only the columns needed for the exercise.
columns = ['pt_r_tt', 'car_r_t', 'from_id', 'to_id']
stores_dfs = [
    pd.read_csv('./dataE4/' + fileName, delimiter=';')[columns]
    for fileName in stores_files
]

stores_dfs[1].head()

   from_id    to_id  walk_t  walk_d  car_r_t  car_r_d  car_m_t  car_m_d  \
0  5785640  5878070     318   22279       39    23233       34    23233   
1  5785641  5878070     281   19662       39    23444       34    23444   
2  5785642  5878070     282   19742       45    17666       41    17666   
3  5785643  5878070     286   20034       46    24695       40    24698   
4  5787544  5878070     311   21789       38    22807       33    22807   

   pt_r_t  pt_r_tt  pt_r_d  pt_m_t  pt_m_tt  pt_m_d  
0     101      131   24276     106      138   22627  
1     108      129   26134     109      137   22833  
2     109      129   26251     111      137   22951  
3     114      138   26544     115      141   23244  
4      98      115   25438      90      113   22138  
   from_id    to_id  walk_t  walk_d  car_r_t  car_r_d  car_m_t  car_m_d  \
0  5785640  5878087     350   24513       45    27882       40    27890   
1  5785641  5878087     321   22438       45    28093       40    28100   


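|   | pt_r_tt | car_r_t | from_id | to_id |
|---|---------|---------|---------|-------|
| 0 | 134 | 45 | 5785640 | 5878087 |
| 1 | 130 | 45 | 5785641 | 5878087 |
| 2 | 130 | 52 | 5785642 | 5878087 |
| 3 | 140 | 48 | 5785643 | 5878087 |
| 4 | 118 | 44 | 5787544 | 5878087 |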


In [30]:
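def relativePublicTransitToCar(row, output_col):
    # Ratio of public transport time to car time; undefined when car time is 0.
    if row['car_r_t'] == 0:
        pttToCar = None
    else:
        pttToCar = row['pt_r_tt'] / row['car_r_t']
    row[output_col] = pttToCar
    # apply(axis=1) builds its result from the return values, so return the row.
    return row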

In [31]:
exampleStoreDF = stores_dfs[0]
exampleStoreDF['publicTransitToCarTime'] = None
exampleStoreDF = exampleStoreDF.apply(relativePublicTransitToCar, output_col='publicTransitToCarTime', axis=1)
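The same public transport to car ratio can also be computed with column arithmetic instead of a row-wise `apply`, which is usually faster on a large grid. This is a minimal sketch on synthetic values; the data frame `df` is a stand-in, not one of the exercise files.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one travel time table.
df = pd.DataFrame({
    "pt_r_tt": [131, 129, 40, 50],
    "car_r_t": [39, 0, 20, 25],
})

# Replace zero car times with NaN so the division yields NaN instead of inf.
car_time = df["car_r_t"].replace(0, np.nan)
df["publicTransitToCarTime"] = df["pt_r_tt"] / car_time
print(df)
```

Rows with a car time of 0 end up as `NaN`, which matches the `None` branch of the row-wise function.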


In [40]:
data = gpd.read_file('./dataE4/MetropAccess_YKR_grid_EurefFIN.shp')
data.head()

|   | x | y | YKR_ID | geometry |
|---|---|---|--------|----------|
| 0 | 381875.0 | 6697880.0 | 5785640 | POLYGON ((382000.0001388059 6697750.000128186,... |
| 1 | 382125.0 | 6697880.0 | 5785641 | POLYGON ((382250.00013875 6697750.000128181, 3... |
| 2 | 382375.0 | 6697880.0 | 5785642 | POLYGON ((382500.0001386951 6697750.000128172,... |
| 3 | 382625.0 | 6697880.0 | 5785643 | POLYGON ((382750.0001386406 6697750.000128165,... |
| 4 | 381125.0 | 6697630.0 | 5787544 | POLYGON ((381250.000138978 6697500.000128254, ... |
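With the grid loaded, the next step in Problem 1 is to join the travel times to it on `YKR_ID` = `from_id`. The sketch below shows the join on plain pandas tables so that it runs without the data files. The `grid` table is a stand-in that holds only the join key, and the assumption is that the same `merge` call applies when the left side is the GeoDataFrame read above.

```python
import pandas as pd

# Stand-in for the grid: only the join key is kept here. The real grid
# read with gpd.read_file() also carries a geometry column.
grid = pd.DataFrame({"YKR_ID": [5785640, 5785641, 5785642, 5785643]})

# A few travel time rows; one grid cell is left out on purpose.
travel = pd.DataFrame({
    "from_id": [5785640, 5785641, 5785642],
    "to_id": [5878087] * 3,
    "pt_r_tt": [134, 130, 130],
    "car_r_t": [45, 45, 52],
})

# A left join keeps every grid cell, even one without a travel time row.
joined = grid.merge(travel, how="left", left_on="YKR_ID", right_on="from_id")
print(joined)
```

Cells without a match get `NaN` travel times, so they can be dropped or shown as missing before classification.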


## Problem 2: Calculate and visualize the dominance areas of shopping centers (9 points)

In this problem, the aim is to define the dominance area for each of those shopping centers based on travel time. 

One way to proceed with the problem is to:

 - iterate over the accessibility files one by one
 - rename the travel time columns so that they can be identified 
   - you can include e.g. the `to_id` number as part of the column name (then the column name could be e.g. "pt_r_tt_5987221")
 - Join those columns into MetropAccess_YKR_grid_EurefFIN.shp where `YKR_ID` in the grid corresponds to `from_id` in the travel time data file. At the end you should have a GeoDataFrame whose columns show the travel times to the different shopping centers.
 - For each row, find the **minimum** value of **all** pt_r_tt_XXXXXX columns and insert that value into a new column called `min_time_pt`. You can now also parse the `to_id` value from the name of the column that had the minimum travel time (i.e. parse the last number series from the column name) and insert that value **as a number** into a column called `dominant_service`. In this way, we are able to determine the "closest" shopping center for each grid cell and visualize it either by travel times or by using the `YKR_ID` number of the shopping center (i.e. the number series that was used in the column name).
 - Visualize the travel times of our `min_time_pt` column using a [common classifier from pysal](https://automating-gis-processes.github.io/2017/lessons/L4/reclassify.html#classification-based-on-common-classifiers) (you can choose which one).
 - Also visualize the values in the `dominant_service` column (no need to use any specific classifier). Notice that the value should be a number. If it is still text, you need to convert it first.
 - Upload the map(s) you have visualized into your own Exercise 4 repository (they don't need to be pretty).
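The steps above can be sketched as follows on a tiny synthetic grid. The `tables` dictionary and its values are hypothetical stand-ins for the travel time files, and plain pandas tables are used so the example runs without the data. If the real files use a no-data marker, it would need to be masked before taking the minimum.

```python
import pandas as pd

# Stand-in for the grid, holding only the join key.
grid = pd.DataFrame({"YKR_ID": [1, 2, 3]})

# Hypothetical travel time tables, one per shopping center (keyed by to_id).
tables = {
    5878070: pd.DataFrame({"from_id": [1, 2, 3], "pt_r_tt": [30, 50, 20]}),
    5878087: pd.DataFrame({"from_id": [1, 2, 3], "pt_r_tt": [40, 25, 20]}),
}

for to_id, table in tables.items():
    # Make the column identifiable, e.g. "pt_r_tt_5878070".
    renamed = table.rename(columns={"pt_r_tt": f"pt_r_tt_{to_id}"})
    grid = grid.merge(renamed, how="left", left_on="YKR_ID", right_on="from_id")
    grid = grid.drop(columns="from_id")

pt_cols = [c for c in grid.columns if c.startswith("pt_r_tt_")]
grid["min_time_pt"] = grid[pt_cols].min(axis=1)

# idxmin returns the name of the column holding the row minimum (the first
# one on ties); the to_id is the text after the last underscore.
grid["dominant_service"] = (
    grid[pt_cols].idxmin(axis=1).str.rsplit("_", n=1).str[-1].astype(int)
)
print(grid)
```

Because `dominant_service` is converted with `astype(int)`, it is already numeric when it is plotted.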

## Problem 3: How many people live under the dominance area of each shopping center? (5 points)

Take advantage of last week's materials and find out how many people live in the dominance area of each shopping center. You should first aggregate your dominance areas into unified geometries using the [`dissolve()`](http://geopandas.org/aggregation_with_dissolve.html#dissolve-example) function in Geopandas.
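The population count itself is a group-wise sum over `dominant_service`. The sketch below shows only that tabular part on synthetic values. The `population` column name and its numbers are hypothetical, and it assumes a population value has already been attached to each grid cell.

```python
import pandas as pd

# Synthetic grid cells with a dominant shopping center and a population count.
cells = pd.DataFrame({
    "YKR_ID": [1, 2, 3, 4],
    "dominant_service": [5878070, 5878087, 5878070, 5878087],
    "population": [120, 80, 30, 0],  # hypothetical column name and values
})

# Sum the population of all cells that share the same dominant center.
pop_per_center = cells.groupby("dominant_service")["population"].sum()
print(pop_per_center)
```

On a GeoDataFrame, `dissolve(by="dominant_service", aggfunc="sum")` should give the same sums while also merging the cell polygons of each dominance area into one geometry.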