# Department of Environmental Protection

DEP is a key agency for building out any kind of climate adaptation strategy.  For starters I am looking at:

  - 311 service requests
  - MS4
  - Pratt/CSO
  - Catch basins

I will continue with the approach of exploring the content and filtering it down to what supports my end story.

In [None]:
from IPython.display import IFrame
IFrame('https://www1.nyc.gov/site/dep/index.page', width=1200, height=1000)

Recall that in previous work I created the DEP 311 service requests data set.

At this point I am going to focus on building out one year's worth of service requests (2021) to support further analysis.  I always like to add some focus (time and space) to minimize my cognitive overhead!

In [None]:
%%time
import geopandas as gpd
dep_gdf = gpd.read_parquet('../data/processed/311/dep-clean-geo.parq')

We can do a standard look-see at the data set.

In [None]:
dep_gdf.shape

In [None]:
dep_gdf.info(verbose=True, show_counts=True)

In [None]:
print(dep_gdf['Created Date'].min())
print(dep_gdf['Created Date'].max())

We should also look at the types of service requests addressed by DEP.

In [None]:
dep_gdf['Complaint Type'].value_counts()

I would like to look at the data from last year for starters.

In [None]:
# Vectorized year extraction; assumes 'Created Date' has a datetime dtype.
dep_gdf['year'] = dep_gdf['Created Date'].dt.year

In [None]:
dep_gdf['year'].value_counts()

So let's build the geodataframe for 2021.

In [None]:
gdf_2021 = dep_gdf[dep_gdf['year'] == 2021].copy()

In [None]:
gdf_2021['Complaint Type'].value_counts()

Interesting?

In [None]:
len(gdf_2021)

Next is a text description of when each 311 request was created.  It includes the date and the day of the week, which can be useful for temporal display/analysis.

In [None]:
gdf_2021['day'] = gdf_2021['Created Date'].apply(lambda dt: dt.date().strftime("%m-%d (%A)"))
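To see what that format string produces, here is a minimal sketch using only the standard library.  The sample timestamp is made up for illustration and is not a record from the data set; `%A` gives the full weekday name in the current locale (English by default).

```python
from datetime import datetime

# Hypothetical timestamp, not taken from the DEP data.
sample_created = datetime(2021, 9, 1, 21, 30)

# Same format string as above: zero-padded month and day, then the weekday name.
day_label = sample_created.date().strftime("%m-%d (%A)")
print(day_label)  # 09-01 (Wednesday)
```

Because month and day are zero-padded, these labels sort chronologically as plain strings within a single year, which is handy for axis labels.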

At this point I know (extra credit for you to figure out) that some of the values in the Location column (used to add geometries) are None.  I want to drop those records so that every remaining one can be displayed on a map.

In [None]:
with_loc_2021_df = gdf_2021[gdf_2021.Location.notnull()]

In [None]:
len(gdf_2021)

In [None]:
_ - len(with_loc_2021_df)

In [None]:
_ / len(gdf_2021)

So I can live with this.  Only a small number of records lack a Location, and they are an even smaller share of the total!
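The count and share above are computed with IPython's `_` (the output of the previous cell), so they are only right when those cells run in order, one directly after the other.  Here is a minimal sketch of the same arithmetic with explicit names, using a tiny made-up frame in place of `gdf_2021` (all values are invented for illustration):

```python
import pandas as pd

# Toy stand-in for gdf_2021; the rows are hypothetical.
toy_df = pd.DataFrame({
    "Complaint Type": ["Sewer", "Water System", "Sewer", "Noise"],
    "Location": ["(40.7, -74.0)", None, "(40.6, -73.9)", "(40.8, -73.9)"],
})

# Keep only rows that have a Location, as in the cell above.
with_loc = toy_df[toy_df["Location"].notnull()]

n_total = len(toy_df)
n_missing = n_total - len(with_loc)   # rows dropped for lack of a Location
share_missing = n_missing / n_total   # a fraction between 0 and 1

print(f"{n_missing} of {n_total} rows lack a Location ({share_missing:.1%})")
```

Naming the intermediate values keeps the result stable even if cells are re-run out of order.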

We can save this for later processing.

In [None]:
with_loc_2021_df.to_parquet('../data/processed/DEP/2021-311.parq')

Summary:  We now have a file with DEP 311 requests for 2021.  The file is 'clean' in the sense that every record has a Location, so each one can be placed on a map.

# MS4

This section will look at the [MS4 data](https://data.cityofnewyork.us/Environment/Municipal-Separate-Storm-Sewer-System-MS4-Data/j57c-rqtq).

In [None]:
IFrame("https://www1.nyc.gov/site/dep/water/municipal-separate-storm-sewer-system.page", width=1200, height=800)

In [None]:
import fiona

In [None]:
fiona.listlayers('../data/MS4OpenData.gdb/')

In [None]:
drainage_areas_gdf = gpd.read_file('../data/MS4OpenData.gdb/',
                                   driver='FileGDB',
                                   layer='MS4DRAINAGEAREAS')

In [None]:
outfalls_gdf = gpd.read_file('../data/MS4OpenData.gdb/',
                             driver='FileGDB',
                             layer='MS4OUTFALLS')

db2_gdf = gpd.read_file('../data/MS4OpenData.gdb/',
                        driver='FileGDB',
                        layer='MSWFACILITIES')

db3_gdf = gpd.read_file('../data/MS4OpenData.gdb/',
                        driver='FileGDB',
                        layer='SPDESPERMITS')

db4_gdf = gpd.read_file('../data/MS4OpenData.gdb/',
                        driver='FileGDB',
                        layer='BLUEBELTBMP')

In [None]:
len(outfalls_gdf)

I am not really sure what to keep for this first pass, so I should revisit this at some point.

I am going to save the drainage areas and outfalls for starters.

In [None]:
len(drainage_areas_gdf)

In [None]:
drainage_areas_gdf.explore()

As is the case with many of these datasets, the number of rows is small but the complexity of the geometries seems to burden browser-based maps!

In [None]:
#drainage_areas_gdf['poly_count'] = drainage_areas_gdf['geometry'].apply(lambda val: len(val.geoms))

In [None]:
#drainage_areas_gdf['poly_count'].value_counts()
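One caveat if the commented-out cells above are revived: in Shapely, only multi-part geometries (such as `MultiPolygon`) have a `.geoms` attribute, so `len(val.geoms)` raises an error on a plain `Polygon`.  Below is a minimal sketch of helpers that handle both cases and also count vertices, as a rough proxy for how heavy a geometry is to render.  The helper names `part_count` and `vertex_count` are hypothetical, and the squares are toy data rather than MS4 drainage areas.

```python
from shapely.geometry import MultiPolygon, Polygon


def part_count(geom):
    """Number of parts: len(.geoms) for multi-part geometries, else 1."""
    return len(geom.geoms) if hasattr(geom, "geoms") else 1


def vertex_count(geom):
    """Total ring coordinates in a Polygon or MultiPolygon."""
    polygons = geom.geoms if hasattr(geom, "geoms") else [geom]
    return sum(
        len(poly.exterior.coords) + sum(len(ring.coords) for ring in poly.interiors)
        for poly in polygons
    )


# Toy geometries for illustration only.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
far_square = Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])
two_squares = MultiPolygon([square, far_square])

print(part_count(square), vertex_count(square))            # 1 5
print(part_count(two_squares), vertex_count(two_squares))  # 2 10
```

On the real data this could be applied as `drainage_areas_gdf['geometry'].apply(part_count)`, assuming every geometry is polygonal and none is missing.  Note that each ring repeats its first coordinate to close itself, which is why a square counts five vertices.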

In [None]:
outfalls_gdf.explore()

Finally, save these two files in the processed directory.

In [None]:
drainage_areas_gdf.to_parquet('../data/processed/DEP/ms4-drainage.parq')
outfalls_gdf.to_parquet('../data/processed/DEP/ms4-outfalls.parq')

# Pratt/SAVI

Can told me about a project that uses 'older' stormwater/sewage data than MS4.  A bit of research led me to the following page.

I am going to leave this alone for now, but will consider revisiting it.

In [None]:
IFrame("https://openseweratlas.tumblr.com/data", height=1000, width=1200)

# Catch Basins

I was just reading (2022/05/22) the [resiliency plan](https://www1.nyc.gov/assets/orr/pdf/publications/stormwater-resiliency-plan.pdf) and I think I need to include catch basins.  There are a lot of them, so in this section I am simply going to read the zipped shapefile and save it as parquet.

In [None]:
catch_basins_gdf = gpd.read_file('../data/raw/DEP/NYCDEP Citywide Catch Basins.zip')

In [None]:
catch_basins_gdf.to_parquet('../data/processed/DEP/catch-basins.parq')