# **Typologies DF**

## Expected Features

| feature name | type | description |
|---|---|---|
| `building` | str | unique ID for a specific building |
| `asset` | str | describes the programme of the building |
| `typology` | str | describes the subcategory of that building asset |
| `plot` | str | specific plot code for the building |
| `area` | float | total floor area of the building |
| `gfa` | float | area calculated by applying `efficiency` to `area` |
| `efficiency` | float | efficiency expressed as a fraction of 1 (1.0 = 100%, 0.5 = 50%, 0.1 = 10%) |
| `facade` | float | envelope area of the building |
| `height` | float | height of the building |
| `floors` | int | number of floors in the building |

The dataframe is supplemented by the specialist dataset: `brief_metrics`\
Typology dataframes should contain the building typologies as well as any other location-based identification, such as plot zone and building type. The naming should be informed by the specialist dataset.
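
As a rough illustration of the table above, the expected features can be checked programmatically. This is a minimal sketch: `check_typology_features` and `EXPECTED_FEATURES` are hypothetical names and not part of the `etl` package. The `asset`, `typology`, `efficiency`, `height` and `floors` values in the example row are made up for illustration.

```python
import pandas as pd
from pandas.api import types as pdt

# Column names and kinds copied from the feature table; the checker is hypothetical.
EXPECTED_FEATURES = {
    "building": "str", "asset": "str", "typology": "str", "plot": "str",
    "area": "float", "gfa": "float", "efficiency": "float",
    "facade": "float", "height": "float", "floors": "int",
}


def check_typology_features(df):
    """Return a list of problems; an empty list means the frame matches the table."""
    problems = []
    for name, kind in EXPECTED_FEATURES.items():
        if name not in df.columns:
            problems.append(f"missing column: {name}")
            continue
        col = df[name]
        ok = {
            "str": pdt.is_object_dtype(col) or pdt.is_string_dtype(col),
            "float": pdt.is_float_dtype(col),
            "int": pdt.is_integer_dtype(col),
        }[kind]
        if not ok:
            problems.append(f"{name}: expected {kind}, got {col.dtype}")
    # efficiency is a fraction of 1 (1.0 = 100%), so it must stay within [0, 1]
    if "efficiency" in df.columns and pdt.is_float_dtype(df["efficiency"]):
        if not df["efficiency"].between(0.0, 1.0).all():
            problems.append("efficiency: values must lie between 0.0 and 1.0")
    return problems


# Illustrative row; only building, plot and area are taken from the model data.
example = pd.DataFrame({
    "building": ["Tower 03"], "asset": ["Commercial"], "typology": ["Office"],
    "plot": ["YH090703-04"], "area": [2896.125039], "gfa": [2896.125039 * 0.8],
    "efficiency": [0.8], "facade": [861.0], "height": [4.0], "floors": [1],
})
problems = check_typology_features(example)
print(problems)
```

Returning a list of messages rather than raising on the first failure makes it easier to see every missing or mistyped column in one pass.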

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# import modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plot

In [3]:
from etl.extract import ProjectZero

In [4]:
# import projectzero data
from etl.extract import ProjectZero
data = ProjectZero().get_data()

# view keys
data.keys()

# df_model instance
df_model = data['hz_model'].copy()
df_model.head()

Unnamed: 0,ID,Typology,Area,Plot,Building,Colour
0,be49f53d-2a79-4d59-a0f4-39cfbccac1ef,Commercial,1696.329992,YH090703-05,,#FF8DA2
1,186a8323-9d4e-4307-8d0a-32207f5acf58,Commercial,2896.125039,YH090703-04,Tower 03,#FF8DA2
2,2fce15ca-9547-40c0-b47c-831838e39a1f,Commercial,2056.249041,YH090703-04,Tower 03,#FF8DA2
3,179dbd06-4328-4b56-8b6c-e3bdf2979253,Commercial,2056.249041,YH090703-04,Tower 03,#FF8DA2
4,e8dfed03-a169-40ae-a274-c37d146433d9,Commercial,1696.329992,YH090703-05,,#FF8DA2


## 1. `get_building_typology`

This method should return a dataframe that has the following features:\
`building`, `asset`, `typology` (optional), `plot`, `area`

In [5]:
# Drop unnecessary columns
df_model = df_model.drop(columns=['ID','Colour'])

In [6]:
df_model.head()

Unnamed: 0,Typology,Area,Plot,Building
0,Commercial,1696.329992,YH090703-05,
1,Commercial,2896.125039,YH090703-04,Tower 03
2,Commercial,2056.249041,YH090703-04,Tower 03
3,Commercial,2056.249041,YH090703-04,Tower 03
4,Commercial,1696.329992,YH090703-05,


In [7]:
# select the string-based columns (everything except the numeric 'Area')
columns = list(df_model.columns)
str_columns = [column for column in columns if column != 'Area']

# replace single-space placeholders with empty strings
for column in str_columns:
    df_model[column] = df_model[column].replace(' ', '')


In [8]:
# dropping empty values for building column
df_model = df_model[df_model.Building != '']

df_model

Unnamed: 0,Typology,Area,Plot,Building
1,Commercial,2896.125039,YH090703-04,Tower 03
2,Commercial,2056.249041,YH090703-04,Tower 03
3,Commercial,2056.249041,YH090703-04,Tower 03
5,Commercial,2896.125039,YH090703-04,Tower 03
6,Commercial,2896.125039,YH090703-04,Tower 03
...,...,...,...,...
564,Retail,70.068490,YH090703-01,Podium
565,Retail,106.068490,YH090703-01,Podium
566,Retail,70.068583,YH090703-01,Podium
567,Retail,192.468583,YH090703-01,Podium


In [9]:
# Group by building, typology and plot, summing the area of each group
df_grouped = df_model.groupby(['Building', 'Typology', 'Plot']).sum(numeric_only=False).reset_index().sort_values('Building')
df_grouped.Area = df_grouped.Area.astype(int)
df_grouped

Unnamed: 0,Building,Typology,Plot,Area
0,Podium,Commercial,YH090703-07,9938
1,Podium,Landscape,YH090703-01,3194
2,Podium,Landscape,YH090703-04,5805
3,Podium,Landscape,YH090703-07,4288
4,Podium,Plaza,YH090703-01,8870
5,Podium,Plaza,YH090703-04,10758
6,Podium,Plaza,YH090703-07,14475
7,Podium,Retail,YH090703-01,12776
8,Podium,Retail,YH090703-04,24729
9,Podium,Retail,YH090703-07,13042
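
The cells above can be collected into a single function. This is a minimal sketch, assuming the model frame has the `Building`, `Typology`, `Plot` and `Area` columns shown earlier. It only lower-cases the column names; mapping the model's `Typology` values onto `asset` is left open, since that naming is meant to come from the specialist dataset.

```python
import pandas as pd


def get_building_typology(df_model):
    """Sum floor area per building, typology and plot (sketch, not the etl implementation)."""
    df = df_model[["Building", "Typology", "Plot", "Area"]].copy()
    # Treat missing or whitespace-only building names as unassigned and drop them.
    df["Building"] = df["Building"].fillna("").astype(str).str.strip()
    df = df[df["Building"] != ""]
    grouped = (
        df.groupby(["Building", "Typology", "Plot"], as_index=False)["Area"]
        .sum()
        .sort_values("Building")
        .reset_index(drop=True)
    )
    # Truncate to whole units, as in the notebook cell above.
    grouped["Area"] = grouped["Area"].astype(int)
    return grouped.rename(columns=str.lower)


# First five rows of the model data shown earlier (blank building as a single space).
sample = pd.DataFrame({
    "ID": ["be49f53d", "186a8323", "2fce15ca", "179dbd06", "e8dfed03"],
    "Typology": ["Commercial"] * 5,
    "Area": [1696.329992, 2896.125039, 2056.249041, 2056.249041, 1696.329992],
    "Plot": ["YH090703-05", "YH090703-04", "YH090703-04", "YH090703-04", "YH090703-05"],
    "Building": [" ", "Tower 03", "Tower 03", "Tower 03", " "],
})
result = get_building_typology(sample)
print(result)
```

Stripping whitespace before filtering covers both empty strings and the single-space placeholders that the notebook replaces by hand.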


## 2. `get_building_facade`

This method should return a dataframe that has the following features:\
`building`, `area`, `facade`

In [10]:
# df_model instance
df_model = data['hz_model'].copy()
df_model.head()

Unnamed: 0,ID,Typology,Area,Plot,Building,Colour
0,be49f53d-2a79-4d59-a0f4-39cfbccac1ef,Commercial,1696.329992,YH090703-05,,#FF8DA2
1,186a8323-9d4e-4307-8d0a-32207f5acf58,Commercial,2896.125039,YH090703-04,Tower 03,#FF8DA2
2,2fce15ca-9547-40c0-b47c-831838e39a1f,Commercial,2056.249041,YH090703-04,Tower 03,#FF8DA2
3,179dbd06-4328-4b56-8b6c-e3bdf2979253,Commercial,2056.249041,YH090703-04,Tower 03,#FF8DA2
4,e8dfed03-a169-40ae-a274-c37d146433d9,Commercial,1696.329992,YH090703-05,,#FF8DA2


In [11]:
from etl.utils import estimate_facade_area

# if the facade area is not in df, generate an estimate with utils.estimate_facade_area
if 'Facade' not in df_model.columns:
    df_model['Facade'] = estimate_facade_area(df_model, 4)

df_model.head()

Unnamed: 0,ID,Typology,Area,Plot,Building,Colour,Facade
0,be49f53d-2a79-4d59-a0f4-39cfbccac1ef,Commercial,1696.329992,YH090703-05,,#FF8DA2,658
1,186a8323-9d4e-4307-8d0a-32207f5acf58,Commercial,2896.125039,YH090703-04,Tower 03,#FF8DA2,861
2,2fce15ca-9547-40c0-b47c-831838e39a1f,Commercial,2056.249041,YH090703-04,Tower 03,#FF8DA2,725
3,179dbd06-4328-4b56-8b6c-e3bdf2979253,Commercial,2056.249041,YH090703-04,Tower 03,#FF8DA2,725
4,e8dfed03-a169-40ae-a274-c37d146433d9,Commercial,1696.329992,YH090703-05,,#FF8DA2,658
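
The implementation of `estimate_facade_area` is not shown here. The `Facade` values above happen to be consistent with a square floor plate (perimeter `4 * sqrt(Area)`) multiplied by a height of 4 and truncated to an integer, so the sketch below uses that as a hypothetical stand-in. Both `estimate_facade_area_sketch` and the reading of the second argument as a floor height are assumptions, not the `etl.utils` API.

```python
import numpy as np
import pandas as pd


def estimate_facade_area_sketch(df, floor_height):
    """Hypothetical stand-in for etl.utils.estimate_facade_area (square floor plate)."""
    perimeter = 4 * np.sqrt(df["Area"])
    # Truncate to whole units, which matches the values displayed above.
    return (perimeter * floor_height).astype(int)


def get_building_facade(df_model, floor_height=4):
    """Sum area and facade per building (sketch under the assumptions above)."""
    df = df_model.copy()
    if "Facade" not in df.columns:
        df["Facade"] = estimate_facade_area_sketch(df, floor_height)
    # Drop rows that are not assigned to a building.
    df = df[df["Building"].fillna("").str.strip() != ""]
    out = df.groupby("Building", as_index=False)[["Area", "Facade"]].sum()
    return out.rename(columns=str.lower)


# First five rows of the model data shown earlier.
sample = pd.DataFrame({
    "Area": [1696.329992, 2896.125039, 2056.249041, 2056.249041, 1696.329992],
    "Building": ["", "Tower 03", "Tower 03", "Tower 03", ""],
})
facade_per_row = estimate_facade_area_sketch(sample, 4).tolist()
facade_per_building = get_building_facade(sample)
print(facade_per_row)
print(facade_per_building)
```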


## 3. `get_building_floors`

This method should return a dataframe that has the following features:\
`building`, `height`, `floors`

In [12]:
df_model.isna().sum()

ID          0
Typology    0
Area        0
Plot        0
Building    0
Colour      0
Facade      0
dtype: int64
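
The notebook does not yet show how `height` and `floors` are derived. One possible approach is sketched below under two loud assumptions: each row of the model is treated as one floor plate of its building, and every floor has the same floor-to-floor height. Neither is confirmed by the data, and buildings with several typologies on one level would be over-counted. The `floor_height` parameter is hypothetical.

```python
import pandas as pd


def get_building_floors(df_model, floor_height=4.0):
    """Estimate floors and height per building (sketch; assumes one row per floor plate)."""
    # Drop rows that are not assigned to a building.
    df = df_model[df_model["Building"].fillna("").str.strip() != ""]
    # Assumption: the number of rows per building equals its number of floors.
    floors = df.groupby("Building").size().rename("floors").reset_index()
    # Assumption: a constant floor-to-floor height for every floor.
    floors["height"] = floors["floors"] * floor_height
    floors = floors.rename(columns={"Building": "building"})
    return floors[["building", "height", "floors"]]


# First five rows of the model data shown earlier.
sample = pd.DataFrame({
    "Area": [1696.329992, 2896.125039, 2056.249041, 2056.249041, 1696.329992],
    "Building": ["", "Tower 03", "Tower 03", "Tower 03", ""],
})
floors_df = get_building_floors(sample)
print(floors_df)
```

If the model later exposes a level or elevation attribute, counting distinct levels per building would be a more reliable basis than counting rows.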

## 4. `get_building_areas`

This method should return a dataframe that has the following features:\
`building`, `area`, `efficiency`, `gfa`

In [13]:
df_model.isna().sum()

ID          0
Typology    0
Area        0
Plot        0
Building    0
Colour      0
Facade      0
dtype: int64
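
Following the feature table, `gfa` is the `area` with `efficiency` applied, that is `gfa = area * efficiency`. The sketch below shows that calculation per building. The single `efficiency` default of 0.8 is a hypothetical placeholder; real values would presumably come from the specialist dataset.

```python
import pandas as pd


def get_building_areas(df_model, efficiency=0.8):
    """Sum area per building and apply an efficiency factor (sketch)."""
    # Drop rows that are not assigned to a building.
    df = df_model[df_model["Building"].fillna("").str.strip() != ""]
    out = df.groupby("Building", as_index=False)["Area"].sum()
    out = out.rename(columns=str.lower)
    # Assumption: one efficiency value for all buildings, as a fraction of 1.
    out["efficiency"] = float(efficiency)
    out["gfa"] = out["area"] * out["efficiency"]
    return out[["building", "area", "efficiency", "gfa"]]


# First five rows of the model data shown earlier.
sample = pd.DataFrame({
    "Area": [1696.329992, 2896.125039, 2056.249041, 2056.249041, 1696.329992],
    "Building": ["", "Tower 03", "Tower 03", "Tower 03", ""],
})
areas_df = get_building_areas(sample)
print(areas_df)
```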