# Max Kilman's Spatial Data Exploration 
## Mapping Methane-producing Waste Disposal Sites in LAC
### Assignment 2 Due Jan. 21st
In this small-scale mapping project, I use open-source geographic data from Los Angeles County (LAC), [Methane Producing Landfills](https://data.lacounty.gov/datasets/lacounty::methane-producing-landfills/about), to map the locations of methane-producing waste disposal sites in LAC with `geopandas`. I will explore the dataset using commands such as `.shape`, `.info()`, `.head()`, `.plot()`, `.value_counts()`, and `.query()`. Finally, I will use the geometry column and `folium` to map these sites. Overall, this mini-project relates to one of my areas of interest, namely environmental health.

First, I will import the necessary modules (`geopandas` and `pandas`) to conduct my data exploration.

In [2]:
import pandas as pd
import geopandas as gpd

Next, I will read the LAC data, stored as a [shapefile](https://desktop.arcgis.com/en/arcmap/latest/manage-data/shapefiles/what-is-a-shapefile.htm), into the notebook using the [`read_file`](https://geopandas.readthedocs.io/en/latest/docs/user_guide/io.html) command.

In [3]:
# read and add shp file to notebook

meth_ds = gpd.read_file('data/Methane_Producing_Landfills.shp')

ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/share/proj failed


To be sure, I will verify the data type:

In [4]:
# data type is...
type(meth_ds)

geopandas.geodataframe.GeoDataFrame

Next, I will take a look at the first 5 and last 5 rows of my dataset using the `head()` and `tail()` commands:

In [5]:
# Examining the first 5 rows of data 
meth_ds.head()

Unnamed: 0,OBJECTID,NUM,SITE_NUM,NAME,SUP_DISTRI,SHAPEAREA,SHAPELEN,geometry
0,2,204-157-752,2012,Russell Moe Landfill,"3, 5",2165331.0,6624.595836,"POLYGON ((-118.40271 34.29349, -118.40102 34.2..."
1,3,114-117-550,1949,Gladding McBean Dump,3,448791.6,3035.853919,"POLYGON ((-118.47156 34.02893, -118.47001 34.0..."
2,4,054-197-404,1889,Cal Compact LF/Metro 2000,2,7352565.0,12231.683028,"POLYGON ((-118.27164 33.84618, -118.27150 33.8..."
3,5,084-157-446,1946,General Tile,2,848635.0,3980.914743,"POLYGON ((-118.41062 33.92218, -118.40942 33.9..."
4,6,084-157-442,1897,Centinela Ave Dump,2,1036129.0,4192.380859,"POLYGON ((-118.37257 33.97973, -118.37182 33.9..."


In [6]:
# Examining the last 5 rows of data 
meth_ds.tail()

Unnamed: 0,OBJECTID,NUM,SITE_NUM,NAME,SUP_DISTRI,SHAPEAREA,SHAPELEN,geometry
641,8645,,3625,R Big Continental,3.0,7814.160373,313.760521,"POLYGON ((-118.59319 34.24285, -118.59316 34.2..."
642,8646,,3626,Rancho Disposal Service,,7814.171696,313.760748,"POLYGON ((-117.48904 34.07459, -117.48901 34.0..."
643,8647,TS,3627,Pomona Valley Transfer Station,1.0,7814.167756,313.760669,"POLYGON ((-117.72853 34.05216, -117.72850 34.0..."
644,8648,,274,City of Culver City (Waste Hauler),2.0,7814.167756,313.760669,"POLYGON ((-118.38859 34.01598, -118.38856 34.0..."
645,8649,,3865,H. M. Holloway,,7814.167833,313.760671,"POLYGON ((-119.77248 35.65796, -119.77245 35.6..."


It will also be helpful for me to know the data types for each column, so I will use the `dtypes` command:

In [7]:
# checking data types
meth_ds.dtypes

OBJECTID         int64
NUM             object
SITE_NUM        object
NAME            object
SUP_DISTRI      object
SHAPEAREA      float64
SHAPELEN       float64
geometry      geometry
dtype: object

It turns out that the `info` command is more useful: in addition to the data types, it reports the non-null count for each column, which lets me check for missing values.

In [8]:
# info command to check for null values
meth_ds.info()


<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 646 entries, 0 to 645
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   OBJECTID    646 non-null    int64   
 1   NUM         294 non-null    object  
 2   SITE_NUM    646 non-null    object  
 3   NAME        646 non-null    object  
 4   SUP_DISTRI  556 non-null    object  
 5   SHAPEAREA   646 non-null    float64 
 6   SHAPELEN    646 non-null    float64 
 7   geometry    646 non-null    geometry
dtypes: float64(2), geometry(1), int64(1), object(4)
memory usage: 40.5+ KB
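The non-null counts above can also be read the other way around, as missing-value counts per column. Going by the output, `NUM` has 646 - 294 = 352 missing values and `SUP_DISTRI` has 646 - 556 = 90. A minimal sketch of that check follows, using a small made-up table (`toy` and its values are hypothetical stand-ins, not the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for the landfill table; values are invented for illustration
toy = pd.DataFrame(
    {
        "NAME": ["Site A", "Site B", "Site C"],
        "NUM": ["001-001-001", None, None],
        "SUP_DISTRI": ["2", "3", None],
    }
)

# isna() flags missing cells; sum() counts the flags column by column
missing = toy.isna().sum()
print(missing)
```

Running `meth_ds.isna().sum()` on the real dataframe should give the same kind of per-column summary.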


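Only `NUM` (294 non-null) and `SUP_DISTRI` (556 non-null) have missing values; the remaining columns, including `NAME` and `geometry`, are complete for all 646 rows, so the data looks acceptable for the exploration phase. Now, I can get a sense of the number of rows and columns I'll be working with. The most basic command for this is `.shape`: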

In [9]:
# number of rows and columns
meth_ds.shape

(646, 8)

Looks like the dataset has 8 columns. To learn more about what those columns are, I'll use the `.columns.to_list()` command.

In [10]:
meth_ds.columns.to_list()

['OBJECTID',
 'NUM',
 'SITE_NUM',
 'NAME',
 'SUP_DISTRI',
 'SHAPEAREA',
 'SHAPELEN',
 'geometry']

For this project, I will only consider two columns–`NAME` and `geometry`. I will get a sense of the different values for the first column under consideration:

In [11]:
# NAME column info
meth_ds['NAME'].sample(5)

119    Nu-Way Arrow Reclamation, Inc. (2006+)
340              Waste Management - Lancaster
94                          Walnut Creek Dump
617                 Amswede Organic Recycling
546            Athens Services (Waste Hauler)
Name: NAME, dtype: object

The NAME column includes 646 rows of disposal site names.

Now, I'll trim the dataframe so that it includes only the columns relevant to the map I aim to create:

In [12]:
# keep only the two columns needed for the map;
# .copy() makes the result an independent dataframe
meth_dsn = meth_ds[['NAME','geometry']].copy()

After trimming the dataframe, I saved the result as a new dataframe using `.copy()`. Because the copy is independent of the original, I can keep editing `meth_dsn` as I continue my exploration without changing `meth_ds`.

In [51]:
site_count = meth_dsn['NAME'].value_counts()
site_count

NAME
American Waste Industries                 2
Lopez Canyon Environmental Center         2
Mission Recycling/West Coast Recycling    2
Rob's Roll-Off & Recycling                2
Evergreen Nursery                         2
                                         ..
L.A. County Flood Control Sawpit          1
30th Street East Pit                      1
Mission Canyon #1-3                       1
Three Points Dump Site                    1
H. M. Holloway                            1
Name: count, Length: 641, dtype: int64

By using the `.value_counts()` command, I can see that the 646 rows contain only 641 unique names: a handful of names appear twice in my dataset and thus likely correspond to more than one disposal site.

Let's say I want to find out the coordinates or geometry for a specific disposal site to verify my assumption above. I can use the `query` command and input the specific column value I'm interested in: 

In [52]:
# simple query for American Waste Industries 
meth_dsn.query("NAME == 'American Waste Industries'")

Unnamed: 0,NAME,geometry
491,American Waste Industries,"POLYGON ((-118.37433 34.23776, -118.37431 34.2..."
506,American Waste Industries,"POLYGON ((-118.37659 34.23278, -118.37656 34.2..."


I know from using `.value_counts()` that there are two sites under the `NAME` American Waste Industries. The `query` command above has allowed me to identify the distinct locations (each stored as a polygon) of those two sites.
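Rather than querying one name at a time, all of the repeated names could be pulled out in a single step. The sketch below uses a small invented table (`names` and its values are hypothetical), and combines `value_counts()` with `duplicated(keep=False)`, which flags every row whose value occurs more than once:

```python
import pandas as pd

# Hypothetical stand-in for meth_dsn['NAME']; values are invented for illustration
names = pd.DataFrame({"NAME": ["Site A", "Site B", "Site A", "Site C"]})

# Names that occur more than once
counts = names["NAME"].value_counts()
repeated = counts[counts > 1].index.tolist()

# Every row that shares its name with at least one other row
dupes = names[names["NAME"].duplicated(keep=False)]
print(repeated)
print(dupes)
```

Applied to the real data, the equivalent would be `meth_dsn[meth_dsn['NAME'].duplicated(keep=False)]`, which should list each repeated site together with its geometry.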

Now that I have verified that my dataframe contains basic information on the name and location of each LAC waste disposal site, I'll import `folium` and prepare the dataframe for mapping:

In [14]:
import folium

Given that this dataframe lacks dedicated latitude and longitude columns, I'll center my map on the average coordinates for LAC:

In [15]:
lat = 34.01187554166666
long = -118.24713253124999
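As an alternative to hard-coded coordinates, a map center could be derived from the data itself. The sketch below is only an illustration on two made-up rectangles (`sites`, `center_lat`, and `center_lon` are hypothetical names); it takes the midpoint of the bounding box reported by `total_bounds`:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the site polygons, in longitude/latitude (EPSG:4326)
sites = gpd.GeoDataFrame(
    {"NAME": ["Site A", "Site B"]},
    geometry=[
        box(-118.5, 34.0, -118.4, 34.1),
        box(-118.3, 33.8, -118.2, 33.9),
    ],
    crs="EPSG:4326",
)

# total_bounds returns (minx, miny, maxx, maxy) for the whole dataframe
minx, miny, maxx, maxy = sites.total_bounds

# folium expects [latitude, longitude], i.e. [y, x]
center_lat = (miny + maxy) / 2
center_lon = (minx + maxx) / 2
print(center_lat, center_lon)
```

One caveat for this dataset: the last row shown by `tail()` (H. M. Holloway) sits near longitude -119.77 and latitude 35.66, well away from the other rows shown, so a bounds-based center on the full data would likely be pulled away from central LAC. A fixed center, as used here, avoids that.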

Now, I'll create an empty base map and make sure the dataframe uses the latitude/longitude coordinate system (EPSG:4326) that `folium` expects:

In [16]:
m = folium.Map()

In [19]:
meth_df = meth_dsn.to_crs(epsg=4326)
print(meth_df.crs)
meth_df.head()

EPSG:4326


Unnamed: 0,NAME,geometry
0,Russell Moe Landfill,"POLYGON ((-118.40271 34.29349, -118.40102 34.2..."
1,Gladding McBean Dump,"POLYGON ((-118.47156 34.02893, -118.47001 34.0..."
2,Cal Compact LF/Metro 2000,"POLYGON ((-118.27164 33.84618, -118.27150 33.8..."
3,General Tile,"POLYGON ((-118.41062 33.92218, -118.40942 33.9..."
4,Centinela Ave Dump,"POLYGON ((-118.37257 33.97973, -118.37182 33.9..."


In [53]:
# center the map on LAC and draw each site polygon as a map layer
m = folium.Map(location=[lat, long], tiles='Cartodb Positron', zoom_start=10)
folium.Choropleth(
    geo_data=meth_df,
    name="Methane Producing Disposal Sites",
).add_to(m)
m

I had hoped to add a tooltip listing the name of each disposal site. Unfortunately, because the geometry column holds polygons rather than points, I could not pass the geometries directly to `folium.Marker`, which expects a single latitude/longitude pair. In future assignments, I would like to learn how to convert a polygon into a point location that `folium` will accept. Despite searching for an answer, I came up empty-handed, so the map plotted above offers fairly minimal descriptive information: the `folium.Choropleth` layer locates the sites geographically but does not indicate their names.