# Data preparation for Numerical Taxonomy of Urban Form

This notebook is a template for preparing data for morphometric assessment and for generating a taxonomy.

## Code requiring changes before running

Input data:
 - OpenStreetMap (OSM) data exported from QGIS (city boundaries and roads)
 - Open Buildings footprints
 - building footprints from Koordinates
 - or any other data source

Every input layer must have a geometry column.
 
This notebook runs the preparation on the data used in the El Paso and Ciudad Juárez case study. You can replace the sample with your own data, provided that they have been cleaned to the required standard.
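What "cleaned" means is not spelled out here. A minimal sketch of a pre-flight check, assuming it means at least a `geometry` column, a defined CRS, and no missing or invalid geometries; `check_layer` is a hypothetical helper, not part of geopandas:

```python
import geopandas as gpd


def check_layer(gdf) -> list[str]:
    """Return a list of problems found in one input layer (hypothetical helper)."""
    if not isinstance(gdf, gpd.GeoDataFrame):
        return ["not a GeoDataFrame"]
    # The notebook always works with a column named 'geometry'
    if "geometry" not in gdf.columns:
        return ["no 'geometry' column"]

    problems = []
    # Without a CRS the layers cannot be reprojected or joined reliably
    if gdf.crs is None:
        problems.append("CRS is not set")

    geoms = gdf.geometry
    missing = geoms.isna()
    if missing.any():
        problems.append(f"{int(missing.sum())} missing geometries")

    # Check emptiness and validity only on the geometries that are present
    present = geoms[~missing]
    if present.is_empty.any():
        problems.append(f"{int(present.is_empty.sum())} empty geometries")
    invalid = ~present.is_valid
    if invalid.any():
        problems.append(f"{int(invalid.sum())} invalid geometries")
    return problems
```

An empty list means none of these particular problems was found; it does not guarantee the data meet every requirement of the later morphometric steps.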

The data are stored in `../data`, with each source in its own folder for convenience.

First we import all required libraries.

In [1]:
import pandas as pd
import geopandas as gpd
import shapely

Import the building footprints from Open Buildings.

In [2]:
# read as many files as necessary
open_buildings_data_1 = pd.read_csv('Open Buildings/86f_buildings.csv')
open_buildings_data_2 = pd.read_csv('Open Buildings/86d_buildings.csv')

# concatenate the DataFrames if necessary
open_buildings_data = pd.concat([open_buildings_data_1, open_buildings_data_2])
open_buildings_data = open_buildings_data.reset_index(drop=True)

# create a GeoDataFrame from the WKT strings in the 'geometry' column
open_buildings_data = gpd.GeoDataFrame(
    geometry=gpd.GeoSeries.from_wkt(open_buildings_data['geometry']),
    crs='EPSG:4326'
)
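With more than two tiles, listing the files by hand gets tedious. The sketch below wraps the same concatenate-and-parse steps in a hypothetical helper, `open_buildings_to_gdf`. It assumes each CSV has a WKT `geometry` column (as the cell above does) and, only when `min_confidence` is given, a numeric `confidence` column as in the Open Buildings CSV tiles; any threshold you pass is your own choice.

```python
import geopandas as gpd
import pandas as pd


def open_buildings_to_gdf(frames, min_confidence=None):
    """Combine Open Buildings tiles into one GeoDataFrame (hypothetical helper).

    `frames` is a list of DataFrames read from the CSV tiles.
    """
    # Stack the tiles with a fresh 0..n-1 index
    data = pd.concat(frames, ignore_index=True)
    # Optionally drop low-confidence detections before parsing geometries
    if min_confidence is not None:
        data = data.loc[data["confidence"] >= min_confidence]
    # Parse the WKT strings; keep only the geometry, as the notebook does
    return gpd.GeoDataFrame(
        geometry=gpd.GeoSeries.from_wkt(data["geometry"]),
        crs="EPSG:4326",
    )
```

For example, `open_buildings_to_gdf([pd.read_csv(p) for p in sorted(Path('Open Buildings').glob('*.csv'))])` (with `from pathlib import Path`) would read every tile in the folder.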

Import the El Paso and Ciudad Juárez boundary data from QGIS.

In [3]:
# read as many files as necessary
area = gpd.read_file('QGIS/boundaries.gpkg')

# concatenate the DataFrames if necessary

# creating GeoDataFrame
area = gpd.GeoDataFrame(
    geometry=area['geometry'], 
    crs='EPSG:4326'
)
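The cell assumes the GeoPackage is already in EPSG:4326. Passing `crs=` to the `GeoDataFrame` constructor only labels the geometries and does not reproject them; if they already carry a different CRS, geopandas raises an error. A minimal sketch of a more defensive variant, using the hypothetical helper name `geometry_only`, reprojects instead and treats a missing CRS as EPSG:4326 (an assumption you should check against your data):

```python
import geopandas as gpd


def geometry_only(gdf, crs="EPSG:4326"):
    """Keep only the geometry column and express it in `crs` (hypothetical helper)."""
    out = gpd.GeoDataFrame(geometry=gdf.geometry)
    if out.crs is None:
        # No CRS stored in the file: assume the coordinates are already in `crs`
        return out.set_crs(crs)
    # Reproject; this simply returns a copy if the CRS already matches
    return out.to_crs(crs)
```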

Restrict the buildings to a bounding box around the study area to speed up the subsequent calculations.

In [4]:
# bounding box around the study area (longitude -108 to -105, latitude 31 to 33)
polygon = shapely.Polygon(
    ((-105., 33.),
    (-108., 33.),
    (-108., 31.),
    (-105., 31.))
)

# keep only the buildings that lie entirely inside the bounding box
open_buildings_data['target'] = open_buildings_data.within(polygon)
open_buildings_data = open_buildings_data.loc[open_buildings_data['target']]
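The same rectangle can be written more compactly with `shapely.box(xmin, ymin, xmax, ymax)`. The sketch below, on hypothetical toy footprints, compares the exact `within` test used above with the `.cx` coordinate indexer, which keeps every geometry that intersects the box. The two differ only for footprints that straddle the edge of the box.

```python
import geopandas as gpd
import shapely

# Same rectangle as the polygon above: longitude -108..-105, latitude 31..33
bbox = shapely.box(-108.0, 31.0, -105.0, 33.0)

# Hypothetical toy footprints
buildings = gpd.GeoDataFrame(
    geometry=[
        shapely.box(-106.50, 31.75, -106.49, 31.76),  # inside
        shapely.box(-106.40, 31.70, -106.39, 31.71),  # inside
        shapely.box(-105.01, 32.00, -104.99, 32.01),  # straddles the east edge
        shapely.box(-99.00, 19.40, -98.99, 19.41),    # far outside
    ],
    crs="EPSG:4326",
)

# Exact test, as in the cell above: the whole footprint must lie in the box
inside = buildings[buildings.within(bbox)]

# .cx[xmin:xmax, ymin:ymax] keeps every footprint that intersects the box
inside_cx = buildings.cx[-108.0:-105.0, 31.0:33.0]
```

Here `inside` holds the two interior footprints, while `inside_cx` also keeps the one crossing the edge.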

Keep only the buildings located within the city boundaries.

In [5]:
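# attach the boundary that each building lies within (no match gives NaN)
open_buildings_data = open_buildings_data.sjoin(area, how='left', predicate='within')
# keep only the matched buildings and drop the join helper column
open_buildings_data = open_buildings_data.loc[open_buildings_data['index_right'].notnull()]
open_buildings_data = open_buildings_data.drop(columns='index_right')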
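A left join followed by a `notnull` filter is equivalent to an inner join. A sketch on hypothetical toy data; the extra de-duplication line is an assumption that only matters if the boundary polygons overlap, because a footprint within two boundaries would otherwise appear twice:

```python
import geopandas as gpd
import shapely

# Hypothetical toy boundaries standing in for the two cities
area = gpd.GeoDataFrame(
    geometry=[shapely.box(0, 0, 10, 10), shapely.box(20, 0, 30, 10)],
    crs="EPSG:4326",
)
# Hypothetical toy footprints: one in each city, one in between
buildings = gpd.GeoDataFrame(
    geometry=[
        shapely.box(1, 1, 2, 2),
        shapely.box(21, 1, 22, 2),
        shapely.box(12, 1, 13, 2),
    ],
    crs="EPSG:4326",
)

# An inner join keeps only the footprints that matched a boundary
inside = buildings.sjoin(area, how="inner", predicate="within")
# Keep one row per footprint, then drop the join helper column
inside = inside[~inside.index.duplicated()].drop(columns="index_right")
```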

Import the building footprints from Koordinates.

In [6]:
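# read as many files as necessary (forward slashes work on every OS)
koordinates_data = gpd.read_file('Koordinates/el-paso-county-texas-building-footprints.shp')
# reproject to the CRS used by the other layers
koordinates_data = koordinates_data.to_crs(crs='EPSG:4326')
# split multi-part footprints into single polygons and renumber the rows
koordinates_data = koordinates_data.explode(index_parts=False)
koordinates_data = koordinates_data.reset_index(drop=True)

# concatenate the DataFrames if necessary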

# creating GeoDataFrame
koordinates_data = gpd.GeoDataFrame(
    geometry=koordinates_data['geometry'], 
    crs='EPSG:4326'
)
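`explode(index_parts=False)` gives every part of a multi-part footprint the label of its original row, so the index is no longer unique. Note that `reset_index` returns a new object rather than modifying the frame in place, so its result must be assigned. A toy illustration (hypothetical data):

```python
import geopandas as gpd
import shapely

# Hypothetical toy layer: one two-part footprint and one single footprint
footprints = gpd.GeoDataFrame(
    geometry=[
        shapely.MultiPolygon([shapely.box(0, 0, 1, 1), shapely.box(2, 0, 3, 1)]),
        shapely.box(5, 0, 6, 1),
    ],
    crs="EPSG:4326",
)

exploded = footprints.explode(index_parts=False)
# Both parts of the MultiPolygon keep the label 0, so the index is [0, 0, 1]
labels_before = list(exploded.index)

# reset_index returns a new frame; assign it to get a unique 0..n-1 index
exploded = exploded.reset_index(drop=True)
```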

Remove the Open Buildings footprints that intersect a Koordinates footprint, so that each building is represented only once.

In [7]:
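# left join: buildings without an intersecting Koordinates footprint get NaN in 'index_right'
open_buildings_data = open_buildings_data.sjoin(koordinates_data, how='left', predicate='intersects')
# keep only those unmatched buildings
open_buildings_data = open_buildings_data.loc[open_buildings_data['index_right'].isnull()]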
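The left join creates one row per intersecting pair and then keeps the rows without a match, which amounts to an anti-join. An alternative sketch, not the method used in the case study, queries the spatial index directly and never duplicates rows; passing an array of geometries to `sindex.query` assumes a recent geopandas version. With the `intersects` predicate, footprints that merely touch also count as duplicates, as with the default predicate of `sjoin`. The toy data are hypothetical.

```python
import geopandas as gpd
import numpy as np
import shapely

# Hypothetical toy footprints from the two sources, in the same CRS
open_buildings = gpd.GeoDataFrame(
    geometry=[
        shapely.box(0, 0, 1, 1),  # overlaps a Koordinates footprint
        shapely.box(5, 5, 6, 6),  # only in Open Buildings
        shapely.box(8, 8, 9, 9),  # only in Open Buildings
    ],
    crs="EPSG:4326",
)
koordinates = gpd.GeoDataFrame(
    geometry=[shapely.box(0.5, 0.5, 1.5, 1.5), shapely.box(20, 20, 21, 21)],
    crs="EPSG:4326",
)

# Bulk query: row 0 holds positions in open_buildings,
# row 1 the positions of the intersecting Koordinates footprints
hits = koordinates.sindex.query(open_buildings.geometry, predicate="intersects")

# Drop every Open Buildings footprint that intersects at least one Koordinates one
keep = np.ones(len(open_buildings), dtype=bool)
keep[np.unique(hits[0])] = False
deduplicated = open_buildings[keep]
```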

Combine the Open Buildings and Koordinates footprints into a single GeoDataFrame and drop the helper columns.

In [8]:
buildings = pd.concat([open_buildings_data, koordinates_data])
buildings = buildings.drop(columns=['target', 'index_right'])
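`pd.concat` keeps each source's original index, so the combined frame can contain repeated labels. A minimal sketch, on hypothetical toy frames, of the same step with `ignore_index=True`, which renumbers the rows, and `errors="ignore"`, which tolerates a helper column being absent:

```python
import geopandas as gpd
import pandas as pd
import shapely

# Hypothetical toy frames: both start with index 0
open_buildings_part = gpd.GeoDataFrame(
    {"target": [True]}, geometry=[shapely.box(0, 0, 1, 1)], crs="EPSG:4326"
)
koordinates_part = gpd.GeoDataFrame(
    geometry=[shapely.box(2, 0, 3, 1), shapely.box(4, 0, 5, 1)], crs="EPSG:4326"
)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating labels
combined = pd.concat([open_buildings_part, koordinates_part], ignore_index=True)
# errors="ignore" skips helper columns that are not present
combined = combined.drop(columns=["target", "index_right"], errors="ignore")
```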

Import the El Paso and Ciudad Juárez road data from QGIS.

In [9]:
# read as many files as necessary
streets = gpd.read_file('QGIS/roads.shp')

# concatenate the DataFrames if necessary

# creating GeoDataFrame
streets = gpd.GeoDataFrame(
    geometry=streets['geometry'], 
    crs='EPSG:4326'
)

At this point you can also apply further processing to the street data.
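As one illustration of such processing (an assumption, not a step taken in the case study), the hypothetical helper `tidy_streets` below splits multi-part lines, drops non-line or empty geometries, and removes duplicated lines. Duplicates are detected by comparing the WKB of normalized geometries, a simple exact-match criterion; near-duplicates with slightly different coordinates are not caught.

```python
import geopandas as gpd
import pandas as pd
import shapely


def tidy_streets(streets: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Hypothetical cleaning step: single-part, non-empty, de-duplicated lines."""
    # Split MultiLineStrings into their individual LineStrings
    out = streets.explode(index_parts=False).reset_index(drop=True)
    # Keep only non-empty LineStrings (drops stray points or polygons)
    out = out[(out.geom_type == "LineString") & ~out.is_empty]
    # Normalize first so that equivalent lines produce identical WKB keys
    keys = pd.Series(
        [shapely.normalize(geom).wkb for geom in out.geometry], index=out.index
    )
    return out[~keys.duplicated()].reset_index(drop=True)
```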

Export both GeoDataFrames as separate layers of a single GeoPackage for further work.

In [10]:
buildings.to_file('data.gpkg', layer='buildings', driver='GPKG')
streets.to_file('data.gpkg', layer='streets', driver='GPKG')