### Assignment
#### Choose a dataset
You get to decide which dataset you want to work on. The dataset must be different from the ones used in previous homework assignments. You can work on a problem from your job, or something you are interested in. You may also obtain a dataset from sites such as Kaggle, Data.Gov, the Census Bureau, USGS, or other open data portals.
Select one of the methodologies studied in weeks 1-10 and another methodology from weeks 11-15 to apply to the new dataset you selected.

#### To complete this task:
- Describe the problem you are trying to solve.
- Describe your dataset and what you did to prepare the data for analysis.
- Describe the methodologies you used to analyze the data.
- Explain the purpose of the analysis performed.
- Draw conclusions from your analysis. Please be sure to address the business impact (it could be of any domain) of your solution.

#### Deliverable
Your final presentation (essay or video) should include:
- The traditional R or Python file and essay.
- An essay (minimum 500 words) or a video (5 to 8 minute recording).
- The execution and explanation of your code. The video can be recorded on any platform of your choice (YouTube, Free Cam).

---

## Overview
For this assignment I decided to leverage my capstone project to also cover the requirements for this DATA 622 Final Project. While this file attempts to place everything needed into one coherent document, the Brightspace submission includes several additional files and links that provide full context if needed. The dataset used in the final analysis was custom built from a variety of publicly available datasets using several different enrichment techniques. In short, this document is a condensed version of my capstone project, submitted for DATA 622 Assignment 4.

## The Problem
There has been substantial research into the effects of urban trees on the surrounding environment, specifically their role in countering extreme heat and cold. Urban trees have been found to dampen the effects of extreme heat by providing shade and through evapotranspiration. There are also findings that trees help block wind, which tends to dampen the impact of extreme cold as well. As weather extremes become increasingly common due to climate change, further study of the impact of urban trees is needed, particularly across different regions, to better understand how trees can help different cities. For this project, data specific to New York City was used to help identify any potential impact trees have on the energy use intensity of buildings.

## Preparing the Data
This project uses a custom-built dataset derived from multiple public datasets and resources. In short, there were three main phases of gathering, enriching, and processing data to distill the final working dataset.

#### Part 1 - Buildings Data & Energy Usage (BuildingWork_Part1.ipynb)
The first section of the data made use of Local Law 84 (LL84) energy benchmarking data, which was ingested via the NYC Open Data Socrata API for all available years but ultimately restricted to 2010 and 2017. This restriction aligns the buildings data with the second primary data source, the tree canopy change data. The main feature of interest in the LL84 data was the weather-normalized EUI: each reporting building's energy use intensity, normalized to offset weather variation between years. The raw data included a large number of administrative and reporting fields, which needed to be sifted through and dropped.

The data was filtered to keep observations metered at the whole-building or whole-property level, while allowing some metering fields to remain null, as these fields were only introduced in later reporting years. The buildings kept for this analysis were those that are predominantly residential properties.

The identifying dimensions of each building, such as unique identifiers (e.g., BBL, property ID, addresses), were enriched using methods such as self-joins and public APIs. Once enriched, these identifying columns were used to geocode each building and obtain a latitude/longitude point wherever the raw data had nulls. These points were then used to add spatially oriented data such as census tracts and building footprint geometries. Additional building information, such as number of floors, zoning classifications, and construction year, was added via the MapPLUTO API. Lastly, LiDAR-derived canopy change data, which classifies the city into areas of canopy gain, loss, or no change between 2010 and 2017, was ingested and joined to the building data to categorize each building into one of those canopy categories.

A 50-foot buffer was applied around each building's footprint geometry, so that only shifts in canopy coverage within 50 feet of the footprint were considered. The final dataset from this section was limited to buildings whose buffered footprints intersected the canopy data.
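
The LL84 filtering steps described above can be sketched in pandas. This is a minimal illustration rather than the notebook's actual code: the snake_case column names (`year`, `metered_areas_energy`, `primary_property_type`, `weather_normalized_eui`) and the set of residential property labels are assumptions, since the real LL84 field names vary across reporting years.

```python
import pandas as pd

# Assumed category labels; real LL84 values should be checked per reporting year.
WHOLE_LEVELS = {"Whole Building", "Whole Property"}
RESIDENTIAL_TYPES = {"Multifamily Housing", "Residence Hall/Dormitory"}


def filter_ll84(df: pd.DataFrame) -> pd.DataFrame:
    """Keep 2010/2017 residential records metered at the whole-building/property level."""
    in_years = df["year"].isin([2010, 2017])
    # Null metering values are kept because the field only appears in later years.
    metering = df["metered_areas_energy"]
    whole_metered = metering.isin(WHOLE_LEVELS) | metering.isna()
    residential = df["primary_property_type"].isin(RESIDENTIAL_TYPES)
    keep = ["bbl", "year", "primary_property_type", "weather_normalized_eui"]
    return df.loc[in_years & whole_metered & residential, keep].reset_index(drop=True)


# Toy records, invented for illustration only
raw = pd.DataFrame({
    "bbl": [1, 2, 3, 4],
    "year": [2010, 2017, 2015, 2017],
    "metered_areas_energy": ["Whole Building", None, "Whole Building", "Another configuration"],
    "primary_property_type": ["Multifamily Housing", "Multifamily Housing", "Multifamily Housing", "Office"],
    "weather_normalized_eui": [85.0, 90.5, 70.0, 120.0],
})
filtered = filter_ll84(raw)
print(filtered["bbl"].tolist())  # [1, 2]
```

A real "predominantly residential" rule would more likely compare floor-area shares across property uses; the single-label membership test here is a simplification.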
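
The 50-foot buffer join against the canopy change polygons can be sketched with GeoPandas. This assumes both layers can be projected to EPSG:2263 (NY State Plane Long Island, in US survey feet, so buffer distances are in feet) and that the canopy layer stores its class in a column named `canopy_class`; that column name and the `bbl` key are assumptions.

```python
import geopandas as gpd
from shapely.geometry import box

BUFFER_FT = 50  # EPSG:2263 is in US survey feet, so this buffer is 50 ft


def tag_canopy(buildings: gpd.GeoDataFrame, canopy: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Attach canopy change classes found within 50 ft of each building footprint."""
    buildings = buildings.to_crs(2263)
    canopy = canopy.to_crs(2263)
    buffered = buildings.copy()
    buffered["geometry"] = buildings.geometry.buffer(BUFFER_FT)
    # Inner join drops buildings whose buffered footprint touches no canopy polygon.
    joined = gpd.sjoin(
        buffered,
        canopy[["canopy_class", "geometry"]],
        how="inner",
        predicate="intersects",
    )
    return joined.drop(columns="index_right")


# Toy geometries in EPSG:2263, invented for illustration only
buildings = gpd.GeoDataFrame(
    {"bbl": [1, 2]},
    geometry=[box(0, 0, 100, 100), box(1000, 1000, 1100, 1100)],
    crs=2263,
)
canopy = gpd.GeoDataFrame({"canopy_class": ["Gain"]}, geometry=[box(130, 0, 200, 100)], crs=2263)
tagged = tag_canopy(buildings, canopy)
print(tagged[["bbl", "canopy_class"]])
```

The result carries the buffered geometry, and a building touching several canopy polygons appears once per match, so a rule for reducing those matches to a single category per building would still be needed; the text above does not say which rule was used.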

#### Part 2 - Tree Count Data (TreeWork_Part2.ipynb)
The second section of processing focused on city-level forestry and tree data. NYC street tree inventory data and forestry work order data from NYC Open Data were ingested and processed. The data was filtered to keep only trees plausibly present during the 2010–2017 period: newly planted trees that appear only very late in the series were dropped as too young to have a meaningful effect on shading or wind, while dead trees, stumps, and records with unusable coordinates were removed entirely. Finally, the forestry work order data was used to find and flag tree removals, capturing the shift in tree inventory between 2010 and 2017. The result is an estimated tree count for NYC in 2010 and 2017, two years that fall outside the NYC Tree Census years.
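
One plausible way to express the tree filtering and removal flagging is sketched below. Everything schema-related is hypothetical: the columns `tree_id`, `status`, `latitude`, `longitude`, `planted_year`, `work_type`, and `year`, the `"Tree Removal"` label, the NYC bounding box, and the 2015 planting cutoff (the exact threshold is not stated above). The sketch also assumes the inventory snapshot still lists trees that were later removed.

```python
import pandas as pd

# Rough NYC bounding box used to reject unusable coordinates (an assumption).
LAT_RANGE = (40.49, 40.92)
LON_RANGE = (-74.27, -73.68)


def clean_trees(trees: pd.DataFrame, last_planting_year: int = 2015) -> pd.DataFrame:
    """Drop dead trees, stumps, bad coordinates, and trees planted too late to matter."""
    alive = ~trees["status"].isin(["Dead", "Stump"])
    coords_ok = trees["latitude"].between(*LAT_RANGE) & trees["longitude"].between(*LON_RANGE)
    # Trees with no planting year are assumed to predate the study window.
    old_enough = trees["planted_year"].isna() | (trees["planted_year"] <= last_planting_year)
    return trees[alive & coords_ok & old_enough].copy()


def flag_removals(trees: pd.DataFrame, work_orders: pd.DataFrame) -> pd.DataFrame:
    """Mark presence in 2010 and 2017, treating 2010-2017 removal work orders as losses."""
    removals = work_orders[
        work_orders["work_type"].eq("Tree Removal") & work_orders["year"].between(2010, 2017)
    ]
    out = trees.copy()
    out["in_2010"] = out["planted_year"].isna() | (out["planted_year"] <= 2010)
    out["in_2017"] = ~out["tree_id"].isin(removals["tree_id"])
    return out


# Toy inventory and work orders, invented for illustration only
trees = pd.DataFrame({
    "tree_id": [1, 2, 3, 4],
    "status": ["Alive", "Stump", "Alive", "Alive"],
    "latitude": [40.75, 40.70, 0.0, 40.80],
    "longitude": [-73.98, -73.95, 0.0, -73.95],
    "planted_year": [None, 2008, 2005, 2012],
})
orders = pd.DataFrame({"tree_id": [1], "work_type": ["Tree Removal"], "year": [2014]})
inventory = flag_removals(clean_trees(trees), orders)
print(inventory["in_2010"].sum(), inventory["in_2017"].sum())  # 1 1
```

Summing the two boolean columns gives the estimated 2010 and 2017 tree counts for whatever area the rows cover.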

#### Part 3 - Putting it Together & Aggregation (Analysis_Part3.ipynb)
The third and final step of the data preparation and processing phase is where the Part 1 and Part 2 data are combined into a final product. Using a spatial join, the tree counts were joined to the building data. As with the canopy data, any tree within 50 feet of a building's footprint geometry was included. In short, this step aggregates a tree count for each building in the dataset to support the impact analysis.
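
A minimal sketch of the per-building tree count, assuming building footprints and tree points can both be projected to EPSG:2263 (feet) and that buildings are keyed by a `bbl` column. It buffers each footprint by 50 ft and counts intersecting tree points; running it once on the 2010 tree set and once on the 2017 set would give both counts.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def count_trees_near_buildings(
    buildings: gpd.GeoDataFrame, trees: gpd.GeoDataFrame, buffer_ft: float = 50
) -> gpd.GeoDataFrame:
    """Count tree points within `buffer_ft` of each building footprint (EPSG:2263)."""
    buildings = buildings.to_crs(2263)
    trees = trees.to_crs(2263)
    zones = buildings[["bbl", "geometry"]].copy()
    zones["geometry"] = zones.geometry.buffer(buffer_ft)
    # One row per (tree, building) pair; a tree near two buildings counts for both.
    pairs = gpd.sjoin(trees[["geometry"]], zones, how="inner", predicate="intersects")
    counts = pairs.groupby("bbl").size()
    out = buildings.copy()
    # Buildings with no nearby trees get a count of 0 instead of NaN.
    out["tree_count"] = out["bbl"].map(counts).fillna(0).astype(int)
    return out


# Toy data in EPSG:2263, invented for illustration only
buildings = gpd.GeoDataFrame(
    {"bbl": [1, 2]},
    geometry=[box(0, 0, 100, 100), box(1000, 1000, 1100, 1100)],
    crs=2263,
)
trees = gpd.GeoDataFrame(
    geometry=[Point(120, 50), Point(130, 50), Point(500, 500), Point(1050, 1050)],
    crs=2263,
)
result = count_trees_near_buildings(buildings, trees)
print(result[["bbl", "tree_count"]])
```

In the toy data the first footprint picks up the two trees 20 and 30 ft away, the second picks up the tree inside it, and the tree at (500, 500) is out of range for both.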

## Data Analysis and Modeling
The work outlined in the "Preparing the Data" section above was carried out across multiple scripts. For the purposes of this assignment, only the analysis code is included below, and the finalized working dataset is read in from CSV.

```python
import io

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
```

```python
# Read in the finalized working dataset
df = pd.read_csv("FinalWorkingData_20251207.csv")
```

#### Exploratory Analysis Work

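```python
# seaborn and LinearSegmentedColormap are used below but were not imported earlier
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap

# Project color palette
colors = {
    "mint_cream": "#F1F7ED",
    "dark_slate_grey": "#243E36",
    "muted_teal": "#7CA982",
    "frosted_mint": "#E0EEC6",
    "old_gold": "#C2A83E",
}

# Global plot styling
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)

# Consistent colors for the three canopy change categories
canopy_palette = {
    "Gain": colors["muted_teal"],
    "Loss": colors["old_gold"],
    "No Change": colors["dark_slate_grey"],
}

# Choropleth colormap (buildings per tract)
bldg_cmap = LinearSegmentedColormap.from_list(
    "bldg_cmap",
    [colors["mint_cream"], colors["muted_teal"]],
)
```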


```python
# Download the 2010 census tracts (CT2010) from NYC Open Data as GeoJSON
CT2010_URL = "https://data.cityofnewyork.us/resource/bmjq-373p.geojson?$limit=50000"
resp = requests.get(CT2010_URL, timeout=60)
resp.raise_for_status()

# Read into a GeoDataFrame and project to EPSG:2263 (NY State Plane Long Island, US feet)
ct2010 = gpd.read_file(io.BytesIO(resp.content)).to_crs(2263)

# Quick look
ct2010.head()
```
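
The `bldg_cmap` colormap is labelled for a buildings-per-tract choropleth. One way to compute those counts is a point-in-polygon spatial join, sketched below under two unverified assumptions: that `df` stores building coordinates as WGS 84 `latitude`/`longitude` columns, and that the tract identifier column is `boro_ct2010`. Both are exposed as parameters so they can be swapped for the real names.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def buildings_per_tract(
    df: pd.DataFrame,
    tracts: gpd.GeoDataFrame,
    tract_id: str = "boro_ct2010",
    lat: str = "latitude",
    lon: str = "longitude",
) -> gpd.GeoDataFrame:
    """Count building points falling inside each census tract polygon."""
    points = gpd.GeoDataFrame(
        df, geometry=gpd.points_from_xy(df[lon], df[lat]), crs=4326
    ).to_crs(tracts.crs)
    hits = gpd.sjoin(points, tracts[[tract_id, "geometry"]], how="inner", predicate="within")
    counts = hits.groupby(tract_id).size()
    out = tracts.copy()
    # Tracts with no buildings get 0 so they still render on the choropleth.
    out["n_buildings"] = out[tract_id].map(counts).fillna(0).astype(int)
    return out


# Toy tracts and buildings in EPSG:4326, invented for illustration only
tracts = gpd.GeoDataFrame(
    {"boro_ct2010": ["A", "B"]},
    geometry=[box(-74.0, 40.7, -73.9, 40.8), box(-73.9, 40.7, -73.8, 40.8)],
    crs=4326,
)
bldgs = pd.DataFrame({"latitude": [40.75, 40.72, 40.75], "longitude": [-73.95, -73.96, -73.85]})
tract_counts = buildings_per_tract(bldgs, tracts)
print(tract_counts[["boro_ct2010", "n_buildings"]])
```

With the notebook's own objects, something like `buildings_per_tract(df, ct2010).plot(column="n_buildings", cmap=bldg_cmap, legend=True)` would then draw the map, provided the assumed column names match.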