# 1. Introduction to UrbanSim and a Basic Example (Residential Price Hedonic)

* 싸이그래머 / DeepCity - Urban Data Analysis: UrbanSim
* 김무성

------------------------------

# UrbanSim
* [1] New version of UrbanSim, a platform for modeling metropolitan real estate markets - https://github.com/UDST/urbansim
* Ref.
    - [2] UrbanSim - Modeling Urban Development for Land Use, Transportation, and Environmental Planning - https://astro.temple.edu/~jmennis/Courses/GUS_0150/readings/Waddell02.pdf
    - [3] Architecture for Modular Microsimulation of Real Estate Markets and Transportation - https://arxiv.org/abs/1807.01148
* [4] An UrbanSim for San Francisco: an example implementation of the new framework - https://github.com/UDST/sanfran_urbansim

# ActivitySim
* [5] An Open Platform for Activity-Based Travel Modeling - https://github.com/ActivitySim/activitysim

-------------------------

# Basic Example - Residential Price Hedonic
* https://udst.github.io/urbansim/examples.html#basic-example-residential-price-hedonic

In [33]:
%ls

01_Intro_And_UrbanSim-Hedonic_Example.ipynb


In [34]:
%mkdir data

In [35]:
%ls

01_Intro_And_UrbanSim-Hedonic_Example.ipynb  data/


In [5]:
%ls data

In [8]:
%ls ../

notebooks/  sanfran_urbansim/  urbansim/


In [10]:
%ls ../sanfran_urbansim/

assumptions.py  Estimation.ipynb       models.py         variables.py
configs/        Exploration.ipynb      README.md
data/           Hedonic Example.ipynb  Simulation.ipynb
dataset.py      ipython.lnk            utils.py


In [11]:
%ls ../sanfran_urbansim/data/

sanfran_public.h5  zones.json


In [12]:
%cp ../sanfran_urbansim/data/sanfran_public.h5 data/

In [13]:
%ls data

sanfran_public.h5


In [25]:
%cp ../sanfran_urbansim/utils.py .

In [26]:
%ls

01_Intro_And_UrbanSim-Hedonic_Example.ipynb  data/  utils.py


--------------------

In [14]:
import os
import pandas as pd
import numpy as np
import orca
from urbansim.models import RegressionModel
from urbansim.utils import misc

## Set the location of the HDFStore as an injectable called "store"

In [15]:
orca.add_injectable("store", pd.HDFStore(os.path.join(misc.data_dir(), "sanfran_public.h5"), mode="r"))

## Specify table sources and broadcasts that will be used later

In [16]:
@orca.table('buildings')
def buildings(store):
    df = store['buildings']
    return df

@orca.table('zones')
def zones(store):
    df = store['zones']
    return df

@orca.table('households')
def households(store):
    df = store['households']
    return df

@orca.table('parcels')
def parcels(store):
    df = store['parcels']
    return df

orca.broadcast('zones', 'buildings', cast_index=True, onto_on='zone_id')
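
To make the broadcast concrete, here is a minimal pandas-only sketch of the join it describes: each building row picks up the attributes of the zone named in its `zone_id` column. The toy frames and their values are invented for illustration, and the real `orca.merge_tables` also resolves computed columns and column selection, which this sketch ignores.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are invented for illustration).
buildings = pd.DataFrame(
    {"zone_id": [10, 10, 20], "residential_units": [1, 4, 2]},
    index=pd.Index([101, 102, 103], name="building_id"),
)
zones = pd.DataFrame(
    {"ave_income": [11.2, 10.9]},
    index=pd.Index([10, 20], name="zone_id"),
)

# cast_index=True: the join key on the "cast" side (zones) is its index.
# onto_on='zone_id': the join key on the "onto" side (buildings) is a column.
merged = buildings.merge(zones, left_on="zone_id", right_index=True, how="left")
print(merged)
```

Every building keeps its own row and index; zone-level values are simply repeated for all buildings in the same zone.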

## Specify the computed columns

In [18]:
@orca.column('households', 'income_quartile', cache=True)
def income_quartile(households):
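    # labels=False returns integer quartile codes (0-3); the `.labels`
    # attribute used with older pandas is no longer available.
    return pd.Series(pd.qcut(households.income, 4, labels=False),
                     index=households.index)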

@orca.column('households', 'zone_id', cache=True)
def zone_id(households, buildings):
    return misc.reindex(buildings.zone_id, households.building_id)

@orca.column('zones', 'ave_unit_sqft')
def ave_unit_sqft(buildings, zones):
    s = buildings.unit_sqft[buildings.general_type == "Residential"]\
        .groupby(buildings.zone_id).quantile().apply(np.log1p)
    return s.reindex(zones.index).fillna(s.quantile())

@orca.column('zones', 'ave_lot_sqft')
def ave_lot_sqft(buildings, zones):
    s = buildings.unit_lot_size.groupby(buildings.zone_id).quantile().apply(np.log1p)
    return s.reindex(zones.index).fillna(s.quantile())

@orca.column('zones', 'sum_residential_units')
def sum_residential_units(buildings):
    return buildings.residential_units.groupby(buildings.zone_id).sum().apply(np.log1p)

@orca.column('zones', 'ave_income')
def ave_income(households, zones):
    s = households.income.groupby(households.zone_id).quantile().apply(np.log1p)
    return s.reindex(zones.index).fillna(s.quantile())

orca.add_injectable("building_type_map", {
    1: "Residential",
    2: "Residential",
    3: "Residential",
    4: "Office",
    5: "Hotel",
    6: "School",
    7: "Industrial",
    8: "Industrial",
    9: "Industrial",
    10: "Retail",
    11: "Retail",
    12: "Residential",
    13: "Retail",
    14: "Office"
})

@orca.column('buildings', 'zone_id', cache=True)
def zone_id(buildings, parcels):
    return misc.reindex(parcels.zone_id, buildings.parcel_id)

@orca.column('buildings', 'general_type', cache=True)
def general_type(buildings, building_type_map):
    return buildings.building_type_id.map(building_type_map)

@orca.column('buildings', 'unit_sqft', cache=True)
def unit_sqft(buildings):
    return buildings.building_sqft / buildings.residential_units.replace(0, 1)

@orca.column('buildings', 'unit_lot_size', cache=True)
def unit_lot_size(buildings, parcels):
    return misc.reindex(parcels.parcel_size, buildings.parcel_id) / \
        buildings.residential_units.replace(0, 1)
    
@orca.column('parcels', 'parcel_size', cache=True)
def parcel_size(parcels):
    return parcels.shape_area * 10.764
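
Two details of the computed columns are easy to miss. First, `groupby(...).quantile()` with no argument returns the 0.5 quantile, so the `ave_*` columns are log-transformed zone medians rather than means. Second, `misc.reindex` looks up one table's values through a foreign key held in another table. The sketch below shows both on toy data; `reindex_sketch` is a hypothetical stand-in written for this example, not urbansim's implementation.

```python
import numpy as np
import pandas as pd

# Toy data (invented): two parcels in different zones, five buildings.
parcels_zone_id = pd.Series([10, 20], index=pd.Index([1, 2], name="parcel_id"))
buildings = pd.DataFrame(
    {"parcel_id": [1, 1, 2, 2, 2],
     "unit_sqft": [800.0, 1200.0, 500.0, 900.0, 4000.0]},
    index=pd.Index([101, 102, 103, 104, 105], name="building_id"),
)

def reindex_sketch(values, foreign_key):
    """Look up `values` (indexed by id) for each entry of `foreign_key`
    and return a Series aligned with `foreign_key`'s index."""
    return pd.Series(values.loc[foreign_key.values].values,
                     index=foreign_key.index)

# Each building inherits the zone of its parcel.
buildings["zone_id"] = reindex_sketch(parcels_zone_id, buildings["parcel_id"])

by_zone = buildings["unit_sqft"].groupby(buildings["zone_id"])
zone_median = by_zone.quantile()             # default q=0.5, i.e. the median
ave_unit_sqft = zone_median.apply(np.log1p)  # same transform as the notebook
print(zone_median)
print(by_zone.mean())
```

In zone 20 the single 4000 sqft building pulls the mean well above the median, which is one reason a median can be the more robust zone summary.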

## Configure the model

In [19]:
rm = RegressionModel(
    fit_filters=[
        'unit_lot_size > 0',
        'year_built > 1000',
        'year_built < 2020',
        'unit_sqft > 100',
        'unit_sqft < 20000'
    ],
    predict_filters=[
        "general_type == 'Residential'"
    ],
    model_expression='np.log1p(residential_sales_price) ~ I(year_built < 1940)'
        '+ I(year_built > 2005) + np.log1p(unit_sqft) + np.log1p(unit_lot_size)'
        '+ sum_residential_units + ave_lot_sqft + ave_unit_sqft + ave_income',
    ytransform=np.exp
)
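
The filters are plain strings. As far as I understand, they are joined with `and` and evaluated against the DataFrame in the style of `DataFrame.query`, so a row is used for fitting only if it passes every condition. The sketch below assumes that behavior; `apply_filters` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

fit_filters = [
    "unit_lot_size > 0",
    "year_built > 1000",
    "year_built < 2020",
    "unit_sqft > 100",
    "unit_sqft < 20000",
]

def apply_filters(df, filters):
    """Hypothetical helper: keep rows that satisfy every filter string."""
    if not filters:
        return df
    return df.query(" and ".join(filters))

# Toy rows (invented): only the first one passes all five conditions.
toy = pd.DataFrame({
    "unit_lot_size": [1500.0, 0.0, 2500.0, 3000.0],
    "year_built": [1927.0, 1950.0, 8687.0, 1990.0],  # 8687 is an implausible year
    "unit_sqft": [1350.0, 900.0, 1200.0, 50.0],
})
kept = apply_filters(toy, fit_filters)
print(kept)
```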

## Get the data - merge buildings and zones (note that orca performs the merge based on the broadcast)

In [20]:
merged_df = orca.merge_tables(target="buildings", tables=["buildings", "zones"], columns=rm.columns_used()) 

In [21]:
merged_df.head()

| building_id | year_built | residential_sales_price | unit_lot_size | general_type | zone_id | unit_sqft | ave_lot_sqft | ave_unit_sqft | sum_residential_units | ave_income |
|---|---|---|---|---|---|---|---|---|---|---|
| 66515 | 1900.0 | 568.475987 | 1263.125026 | Residential | 103.0 | 1585.0 | 7.293847 | 7.153834 | 7.95156 | 11.156265 |
| 65547 | 1930.0 | 455.136612 | 5852.075522 | Residential | 103.0 | 3314.0 | 7.293847 | 7.153834 | 7.95156 | 11.156265 |
| 66514 | 1906.0 | 544.091933 | 832.842163 | Residential | 103.0 | 1650.0 | 7.293847 | 7.153834 | 7.95156 | 11.156265 |
| 66655 | 1900.0 | 694.135544 | 725.82646 | Residential | 103.0 | 941.5 | 7.293847 | 7.153834 | 7.95156 | 11.156265 |
| 65727 | 1960.0 | 712.225493 | 474.725479 | Residential | 103.0 | 739.666687 | 7.293847 | 7.153834 | 7.95156 | 11.156265 |


In [22]:
merged_df.describe()

| | year_built | residential_sales_price | unit_lot_size | zone_id | unit_sqft | ave_lot_sqft | ave_unit_sqft | sum_residential_units | ave_income |
|---|---|---|---|---|---|---|---|---|---|
| count | 149488.0 | 138407.0 | 152605.0 | 152605.0 | 152605.0 | 152605.0 | 152605.0 | 152605.0 | 152605.0 |
| mean | 1932.578003 | 543.022467 | 4678.895 | 120.381996 | 3582.393 | 7.673279 | 7.172032 | 7.612719 | 11.077892 |
| std | 50.238934 | 171.802835 | 206989.5 | 49.635498 | 32563.92 | 0.460024 | 0.223331 | 0.445305 | 0.285365 |
| min | 1791.0 | 14.502545 | 0.0585616 | 1.0 | 0.0041841 | 5.843264 | 4.214594 | 1.098612 | 9.392745 |
| 25% | 1911.0 | 460.778445 | 1472.359 | 81.0 | 1040.0 | 7.374665 | 7.058758 | 7.295735 | 11.034906 |
| 50% | 1927.0 | 528.836079 | 2498.522 | 130.0 | 1350.0 | 7.823854 | 7.147559 | 7.736307 | 11.156265 |
| 75% | 1947.0 | 596.269347 | 3001.338 | 167.0 | 1860.0 | 7.986542 | 7.259116 | 7.936303 | 11.225257 |
| max | 8687.0 | 10138.582062 | 65544970.0 | 190.0 | 4701100.0 | 12.454458 | 9.675912 | 8.502688 | 11.779136 |


## Fill NaNs - UrbanSim expects you to handle NaNs yourself

In [27]:
import utils
merged_df["year_built"] = merged_df.year_built.fillna(merged_df.year_built.quantile())
merged_df["residential_sales_price"] = merged_df.residential_sales_price.fillna(0)
merged_df["general_type"] = merged_df.general_type.fillna(merged_df.general_type.value_counts().idxmax())
_ = utils.deal_with_nas(merged_df)
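
A regression cannot be fit on rows with missing or infinite values, so it helps to count them before and after filling. `utils.deal_with_nas` comes from the sanfran_urbansim repository and its body is not shown here; the sketch below is a hypothetical check in the same spirit, not that function. The fill strategy mirrors the cell above: median for a numeric column, most frequent value for a categorical one.

```python
import numpy as np
import pandas as pd

def count_missing(df):
    """Hypothetical helper: per-column count of NaN or +/-inf values."""
    bad = df.isna() | df.isin([np.inf, -np.inf])
    return bad.sum()

# Toy frame (invented values) with one problem per column.
toy = pd.DataFrame({
    "year_built": [1900.0, np.nan, 1960.0],
    "unit_sqft": [1585.0, np.inf, 739.7],
    "general_type": ["Residential", None, "Residential"],
})

missing = count_missing(toy)
print(missing[missing > 0])

# Series.quantile() defaults to q=0.5, so this fills with the median.
toy["year_built"] = toy["year_built"].fillna(toy["year_built"].quantile())
# Fill the categorical column with its most frequent value.
toy["general_type"] = toy["general_type"].fillna(
    toy["general_type"].value_counts().idxmax()
)
```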

## Fit and report

In [29]:
rm.fit(merged_df).summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | np.log1p(residential_sales_price) | R-squared: | 0.399 |
| Model: | OLS | Adj. R-squared: | 0.399 |
| Method: | Least Squares | F-statistic: | 12400.0 |
| Date: | Mon, 07 Jan 2019 | Prob (F-statistic): | 0.0 |
| Time: | 10:04:17 | Log-Likelihood: | -252440.0 |
| No. Observations: | 149409 | AIC: | 504900.0 |
| Df Residuals: | 149400 | BIC: | 505000.0 |
| Df Model: | 8 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -3.8664 | 0.170 | -22.761 | 0.000 | -4.199 | -3.533 |
| I(year_built < 1940)[T.True] | 0.0011 | 0.007 | 0.144 | 0.885 | -0.013 | 0.016 |
| I(year_built > 2005)[T.True] | -0.0632 | 0.049 | -1.298 | 0.194 | -0.159 | 0.032 |
| np.log1p(unit_sqft) | -1.4830 | 0.007 | -210.907 | 0.000 | -1.497 | -1.469 |
| np.log1p(unit_lot_size) | -0.1476 | 0.006 | -23.219 | 0.000 | -0.160 | -0.135 |
| sum_residential_units | 0.0986 | 0.008 | 11.972 | 0.000 | 0.082 | 0.115 |
| ave_lot_sqft | -0.2179 | 0.010 | -21.293 | 0.000 | -0.238 | -0.198 |
| ave_unit_sqft | 0.9376 | 0.020 | 45.828 | 0.000 | 0.898 | 0.978 |
| ave_income | 1.4192 | 0.015 | 92.812 | 0.000 | 1.389 | 1.449 |

| | | | |
|---|---|---|---|
| Omnibus: | 89673.667 | Durbin-Watson: | 1.798 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 819859.558 |
| Skew: | -2.841 | Prob(JB): | 0.0 |
| Kurtosis: | 12.97 | Cond. No. | 1010.0 |
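
Because both the price and `unit_sqft` enter the model through `np.log1p`, the coefficient on `np.log1p(unit_sqft)` can be read roughly as an elasticity for values far from zero. The arithmetic sketch below uses the fitted coefficient from the summary and holds every other variable fixed; `price_ratio` is a hypothetical helper, and 1350 is simply the median `unit_sqft` from the `describe()` output.

```python
import numpy as np

coef_unit_sqft = -1.4830  # coefficient on np.log1p(unit_sqft) from the summary

def price_ratio(coef, x_old, x_new):
    """Ratio of (1 + predicted price) after vs. before changing x,
    with all other variables held fixed."""
    return np.exp(coef * (np.log1p(x_new) - np.log1p(x_old)))

# Effect of 10% more floor area per unit, starting from the median unit size.
ratio = price_ratio(coef_unit_sqft, 1350.0, 1350.0 * 1.10)
print(f"(1 + price) is multiplied by {ratio:.3f}")
```

This reading describes the fitted association only; with an R-squared of 0.399 and no controls beyond those listed, it should not be taken as a causal effect.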


## Predict and report

In [30]:
rm.predict(merged_df).describe()

count    1.407820e+05
mean     1.157916e+03
std      4.153603e+04
min      1.361599e-03
25%      2.529656e+02
50%      3.803416e+02
75%      5.547495e+02
max      1.251726e+07
dtype: float64
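
One detail worth checking when reading these predictions: the model is fit on `np.log1p(price)` but mapped back with `ytransform=np.exp`. The exact inverse of `np.log1p` is `np.expm1`, so `np.exp` returns `price + 1`. The offset is small relative to the values here, and the quick check below, on made-up prices, shows its size.

```python
import numpy as np

# Illustrative prices (invented), spanning a range similar to the data.
prices = np.array([0.0, 14.5, 528.8, 10138.6])
logged = np.log1p(prices)

back_exp = np.exp(logged)      # what ytransform=np.exp returns
back_expm1 = np.expm1(logged)  # exact inverse of log1p

print(back_exp - prices)    # approximately 1 everywhere
print(back_expm1 - prices)  # approximately 0 everywhere
```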

In [31]:
rm.predict(merged_df)

building_id
66515       360.450275
65547        96.331840
66514       361.120867
66655       846.276923
65727      1286.549276
66505       651.477698
66456      1443.009788
65535      1037.913054
66522       289.315590
65732      1338.533686
66357      2049.828057
65163      2478.963790
66593      1674.377331
66618      1563.286354
65945      2113.306241
65041      1029.652607
66299      2052.795187
66650       638.677935
65421       163.866953
66653       393.321059
66809       676.124509
65531      1627.889249
64896       286.476864
65534      2327.335977
66584       703.725446
65207       383.587160
64976       939.723052
63542       757.732830
65474       377.552921
63019        67.700260
              ...     
116864      919.380365
119278      644.489446
119279      644.473536
117917      606.084489
118487      365.706460
117918      644.672369
119284     2360.506241
119283     2415.185478
118202      479.091462
118486      163.496328
118539      606.147686
117234      644.589575

--------------------

# References
* [1] New version of UrbanSim, a platform for modeling metropolitan real estate markets - https://github.com/UDST/urbansim
* [2] UrbanSim - Modeling Urban Development for Land Use, Transportation, and Environmental Planning - https://astro.temple.edu/~jmennis/Courses/GUS_0150/readings/Waddell02.pdf
* [3] Architecture for Modular Microsimulation of Real Estate Markets and Transportation - https://arxiv.org/abs/1807.01148
* [4] An UrbanSim for San Francisco: an example implementation of the new framework - https://github.com/UDST/sanfran_urbansim
* [5] An Open Platform for Activity-Based Travel Modeling - https://github.com/ActivitySim/activitysim