## GEO 877 - Spatial Algorithms 
### Group Kirsteina 
#### Tamara, Joya, Andrejs, Djordje

### Data


#### Description
For our analysis, we will be using the following data:
- Park data - Grünflächen (green spaces), Stadt Zürich - **BOUNDARY DATA**
    - 4 data types: Parks, Sports Areas, Cemeteries, Other
    - We will use Parks, Sports Areas, and Cemeteries as designated "parks"
    - We are considering adding Forests to the "parks" set
        - Could someone please download it? The download does not work for me: https://www.stadt-zuerich.ch/geodaten/download/111
    - **Can we find a shapefile somewhere that has just green areas as polygons?** We could use it to calculate the % of each park covered by green space


- Fountain data - **EVALUATION DATA**
    - Brunnen - public fountains (drinking water and heat relief)
    - Stillgewässer - standing water bodies such as ponds (heat relief)
    - **Can we find a shapefile somewhere that has just water areas as polygons?** We could use it to calculate the % of each park covered by water


- ZüriWC data - **EVALUATION DATA**
    - Location of publicly accessible WCs


- Spielpark data - **EVALUATION DATA**
    - Location of kids' playgrounds


- LIDAR data - Canopy Height - **EVALUATION DATA**
    - Download link: https://www.stadt-zuerich.ch/geodaten/download/Baumhoehen_2022__CHM_aus_Lidar_ 


- Sozialhilfe (social assistance) data - **EVALUATION DATA**
    - Social assistance quota; still unclear whether this is the percentage of residents receiving social assistance or some other measure


#### Data Structure

| PolygonID | Polygon Info | Green Area | Water Area | WC | Fountain | Playground | Sozialhilfe | Canopy Height |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| X1 | coordinates | % of park area | % of park area | yes/no boolean | yes/no boolean | yes/no boolean | quota* | height* |

*needs more thought and discussion
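A minimal sketch of how this per-park table could be held in pandas, assuming one row per dissolved `pflegeareal` and keeping the geometry ("Polygon Info") in the GeoDataFrame's own geometry column. The column names and dtypes are our assumptions; quota and canopy height are plain floats only because their definition is still open.

```python
import pandas as pd

# Hypothetical schema for the per-park result table (one row per park polygon)
PARK_TABLE_DTYPES = {
    "pflegeareal": "string",          # park identifier (PolygonID)
    "green_area_pct": "float64",      # % of park area covered by green space
    "water_area_pct": "float64",      # % of park area covered by water
    "has_wc": "bool",
    "has_fountain": "bool",
    "has_playground": "bool",
    "sozialhilfe_quota": "float64",   # still to be defined
    "canopy_height": "float64",       # still to be defined
}


def empty_park_table() -> pd.DataFrame:
    """Return an empty table with the columns and dtypes above."""
    return pd.DataFrame({col: pd.Series(dtype=dt) for col, dt in PARK_TABLE_DTYPES.items()})


table = empty_park_table()
print(table.dtypes)
```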


### Analysis Workflow - Preliminary


1. Create park polygons:
    1. Download forest (Wald) data
    2. Remove "other" category
    3. Add forest (Wald) data to the park polygons
    4. Finalize all polygons - ensure boundaries are clean and consistent (valid geometries, no gaps or overlaps)
2. LIDAR Data:
    1. Prepare LIDAR data
    2. Clip the LIDAR data to the parks (intersection of park extent and LIDAR) to remove excess data
3. Calculations:
    1. Park area as green space - %
    2. Park area as water space - %
4. Data Merge:
    1. Green area - join data to each park
    2. Water area - join data to each park
    3. Socialshilfe - join data to each park
5. Point in Polygon:
    1. WC - each park receives a numeric value
    2. Fountain - each park receives a numeric value
    3. Playground - each park receives a numeric value
6. Point in Polygon - translated:
    1. For each (WC, Fountain, Playground), assign yes/no based on the count from step 5 (0 = no, >0 = yes)
7. LIDAR Data
    1. **Determine a way to use this data** that is in line with the paper
8. Clustering


### Production

```python
# Packages

from geospatial import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely import wkt
```

#### 1. Polygon creation

```python
# Data import

parks = gpd.read_file('data/grunflaschen_csv/data/gsz.gruenflaechen.csv')
#parks['produkt'].unique()

#forests = pd.read_csv
#...
```

```python
# Data cleaning
# Filter out "640 Weitere Freiräume" (other open spaces); keeps parks, sports areas and cemeteries.
# Kept as a plain DataFrame; it is converted to a GeoDataFrame with a CRS in the next cell.
dfParks = pd.DataFrame(parks[parks['produkt'] != "640 Weitere Freiräume"])

dfParks['produkt'].unique()
#parks.info()

# array(['610 Parkanlagen', '630 Sport- und Badeanlagen', '620 Friedhöfe'],
#       dtype=object)
```

```python
# Convert to a GeoDataFrame (Swiss LV95 coordinates, EPSG:2056)
gdfParks = gpd.GeoDataFrame(dfParks, geometry="geometry", crs="EPSG:2056")
# Group and dissolve by 'pflegeareal' (maintenance area): all parts of a park become one (multi)polygon
gdf_grouped = gdfParks.dissolve(by="pflegeareal")
# Reset the index to make 'pflegeareal' a column again
gdf_grouped = gdf_grouped.reset_index()
# Add an area column (m², since EPSG:2056 uses metres)
gdf_grouped['area'] = gdf_grouped.geometry.area
# Sort by area, largest first. After the dissolve each 'pflegeareal' occurs only once,
# so drop_duplicates is only a safeguard here
largest_geometry = gdf_grouped.sort_values(by='area', ascending=False).drop_duplicates(subset='pflegeareal', keep='first')
# Save the result as a shapefile, and the names and areas as a CSV
# (shapefile field names are limited to 10 characters, so longer column names are truncated on write)
largest_geometry.to_file('data/largest_geometry.shp')
largest_geometry[['pflegeareal', 'area']].to_csv('data/largest_geometry.csv', index=False)

# Print the result
print(largest_geometry)
# PROBLEM: the dissolved table has the geometry but not the location of the parks,
# so we need to merge it with the original data to get the park locations.
```


```
              pflegeareal                                           geometry  \
110           FH Sihlfeld  MULTIPOLYGON (((2.68e+06 1.25e+06, 2.68e+06 1....   
410  Sportzentrum Hardhof  MULTIPOLYGON (((2.68e+06 1.25e+06, 2.68e+06 1....   
8               Allmend I  MULTIPOLYGON (((2.68e+06 1.24e+06, 2.68e+06 1....   
105           FH Nordheim  MULTIPOLYGON (((2.68e+06 1.25e+06, 2.68e+06 1....   
98            FH Eichbühl  POLYGON ((2.68e+06 1.25e+06, 2.68e+06 1.25e+06...   
..                    ...                                                ...   
30          Bellariaplatz  POLYGON ((2.68e+06 1.25e+06, 2.68e+06 1.25e+06...   
480       Weggebiet LE911  POLYGON ((2.68e+06 1.24e+06, 2.68e+06 1.24e+06...   
190          Hauseranlage  POLYGON ((2.69e+06 1.25e+06, 2.69e+06 1.25e+06...   
349     Schaffhauserplatz  MULTIPOLYGON (((2.68e+06 1.25e+06, 2.68e+06 1....   
506          Zelglianlage  POLYGON ((2.68e+06 1.25e+06, 2.68e+06 1.25e+06...   

    objektidentifikator          produk
```



#### 2. LiDAR data preparation

```python
# The file is very large (505 MB), so it was uploaded to a cloud provider (Google Drive)
import gdown

# Google Drive file ID
file_id = "1kDGaIJPOZITEBPkJTaSSKi3eEWbwXpkl"
# Construct the download URL
url = f"https://drive.google.com/uc?id={file_id}"

# Download the file
output_path = "your_file.tif"  # Change name/extension as needed
gdown.download(url, output_path, quiet=False)

print(f"Downloaded to {output_path}")
```
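A minimal sketch of workflow step 2 (keep only the LiDAR cells that fall inside a park), assuming the canopy height model has already been read into a NumPy array together with its upper-left corner and cell size (for example from the GeoTIFF's transform). A cell is kept when its centre lies inside the park polygon, tested with `shapely.contains_xy` (Shapely 2.0 or later). The function name and the 2 m threshold for counting a cell as tree canopy are hypothetical; they are one possible answer to step 7, not something taken from the paper.

```python
import numpy as np
import shapely
from shapely.geometry import Polygon


def canopy_stats(chm, x_origin, y_origin, cell_size, park, canopy_threshold=2.0):
    """Summarise the canopy heights of the raster cells whose centres fall inside `park`.

    chm: 2-D array of canopy heights, rows running north to south.
    x_origin, y_origin: coordinates of the raster's upper-left corner (e.g. EPSG:2056).
    cell_size: square cell size in map units (metres).
    canopy_threshold: hypothetical height (m) above which a cell counts as tree canopy.
    """
    n_rows, n_cols = chm.shape
    # Cell-centre coordinates: x grows to the east, y decreases to the south
    xs = x_origin + (np.arange(n_cols) + 0.5) * cell_size
    ys = y_origin - (np.arange(n_rows) + 0.5) * cell_size
    xx, yy = np.meshgrid(xs, ys)
    inside = shapely.contains_xy(park, xx, yy)  # boolean mask with the same shape as chm
    heights = chm[inside]
    if heights.size == 0:
        return {"mean_height": np.nan, "max_height": np.nan, "canopy_share": np.nan}
    return {
        "mean_height": float(heights.mean()),
        "max_height": float(heights.max()),
        "canopy_share": float((heights > canopy_threshold).mean()),
    }


# Toy example: 4 x 4 raster with 1 m cells and a park covering the western half
chm = np.array([[0.0, 5.0, 0.0, 0.0],
                [10.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0],
                [3.0, 0.0, 0.0, 0.0]])
park = Polygon([(0, 0), (2, 0), (2, 4), (0, 4)])
print(canopy_stats(chm, x_origin=0.0, y_origin=4.0, cell_size=1.0, park=park))
```

For the full 505 MB raster, reading only the window around each park (for example with `rasterio.mask.mask(..., crop=True)`) would avoid loading the whole file into memory.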


#### 3. % Space Calculations (this could also be implemented as an algorithm)

```python
# Area calculation of all parks: already done (the 'area' column in step 1)
# Area within area (polygon in polygon); the surfacing of the parks (Chaussierung, i.e. gravel surfaces) needs to be considered
# For trails in parks, can we also consider other line features?
```
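A minimal sketch of the green/water share per park, assuming green areas and water areas are available as polygons in the same CRS as the parks (EPSG:2056). The share is the overlap area divided by the park area; the cover polygons are merged with `unary_union` first so that overlapping pieces are not counted twice. The function name and the toy geometries are made up for illustration.

```python
from shapely.geometry import box
from shapely.ops import unary_union


def cover_percentage(park, cover_polygons):
    """Percentage of `park`'s area covered by the union of `cover_polygons`."""
    if park.area == 0:
        return 0.0
    cover = unary_union(cover_polygons)  # merge overlaps so no area is counted twice
    return 100.0 * park.intersection(cover).area / park.area


park = box(0, 0, 10, 10)                      # 100 m² toy park
ponds = [box(0, 0, 5, 2), box(4, 0, 6, 2)]    # two overlapping toy water polygons
print(cover_percentage(park, ponds))
```

For whole layers, `geopandas.overlay(parks, cover, how="intersection")` followed by a group-by on the park ID computes the same overlaps in one pass.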

#### 4. Data Merging/Joining
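A minimal sketch of step 4, assuming every per-park result is a table keyed by `pflegeareal`. It also assumes, hypothetically, that the social assistance quota is reported per district, so each park needs a district column to join on; the column names `kreis` and `quota` and all values are placeholders.

```python
import pandas as pd

parks = pd.DataFrame({"pflegeareal": ["Park A", "Park B", "Park C"],
                      "kreis": [3, 4, 3]})                      # hypothetical district column
shares = pd.DataFrame({"pflegeareal": ["Park A", "Park C"],
                       "green_area_pct": [80.0, 65.0],
                       "water_area_pct": [5.0, 0.0]})
sozialhilfe = pd.DataFrame({"kreis": [3, 4], "quota": [4.1, 6.3]})  # placeholder values

# Left joins keep every park, even those without a match in the other tables;
# `validate` raises an error if a key is unexpectedly duplicated.
merged = (parks
          .merge(shares, on="pflegeareal", how="left", validate="one_to_one")
          .merge(sozialhilfe, on="kreis", how="left", validate="many_to_one"))
print(merged)
```

Calling `.merge` on the GeoDataFrame itself (e.g. `largest_geometry`) returns a GeoDataFrame, so the park geometry is kept through the join.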

#### 5. Point in Polygon (Algorithm)
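A minimal sketch of steps 5 and 6 written out as the algorithm itself: the even-odd (ray casting) rule decides whether a point lies inside a polygon ring, the points inside each park are counted, and a count greater than 0 becomes "yes". Parks are simplified here to single rings without holes, and points lying exactly on an edge may fall either way; both are assumptions of this sketch, and the park names and coordinates are made up. For the real data, `geopandas.sjoin(points, parks, predicate="within")` should give the same counts much faster.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: cast a ray from (x, y) towards +x and count the edges it crosses.

    ring: list of (x, y) vertices; the closing edge back to ring[0] is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Edge straddles the horizontal line through y (the strict > makes each
        # vertex count for only one of its two edges; horizontal edges are skipped)
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_park(parks, points):
    """Step 5: return {park_id: number of points inside that park}."""
    return {pid: sum(point_in_polygon(px, py, ring) for px, py in points)
            for pid, ring in parks.items()}


def to_yes_no(counts):
    """Step 6: 0 -> False (no), > 0 -> True (yes)."""
    return {pid: n > 0 for pid, n in counts.items()}


parks = {"Park A": [(0, 0), (4, 0), (4, 4), (0, 4)],
         "Park B": [(10, 0), (14, 0), (12, 3)],
         "Park C": [(20, 0), (22, 0), (22, 2), (20, 2)]}
wcs = [(1, 1), (3, 2), (12, 1), (20, 20)]
counts = count_points_per_park(parks, wcs)
print(counts)             # {'Park A': 2, 'Park B': 1, 'Park C': 0}
print(to_yes_no(counts))  # {'Park A': True, 'Park B': True, 'Park C': False}
```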

#### X. Clustering (We could attempt an algorithm)
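No clustering method has been chosen yet. As one possible starting point, here is a minimal k-means (Lloyd's algorithm) sketch in NumPy over the per-park feature table. The choice of k-means, the deterministic farthest-first initialisation, the z-score standardisation and the feature columns are all assumptions; boolean columns are cast to 0/1 first, and the feature values are made up. Once the features are settled, a library implementation such as `sklearn.cluster.KMeans` would be the usual choice.

```python
import numpy as np


def standardise(X):
    """Z-score each column so percentages and 0/1 flags get comparable weight."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)


def farthest_first(X, k):
    """Deterministic start: row 0, then repeatedly the row farthest from the chosen ones."""
    idx = [0]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].copy()


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        # Squared Euclidean distance of every row to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if its cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Toy rows (made up): [green_area_pct, water_area_pct, has_wc, has_playground]
features = np.array([[90, 5, 1, 1], [85, 10, 1, 1], [88, 8, 1, 0],
                     [20, 0, 0, 0], [25, 2, 0, 0], [15, 1, 0, 1]], dtype=float)
labels, centroids = kmeans(standardise(features), k=2)
print(labels)
```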