# Spatial dataframes from CSV files
If a CSV file includes coordinates - either a coordinate pair representing a point location, or a series of coordinate pairs depicting a line or a polygon's perimeter - then we can use those coordinates to construct a geometric object and thus create a spatially enabled dataframe, which in GeoPandas is referred to as a <u>geodataframe</u>.

Here we focus on the steps involved in going from raw coordinate data stored in a field of a CSV file to a spatial dataframe. We first look at the techniques using <u>GeoPandas</u> and then those using the <u>ArcGIS Python API</u>. 
<font color='brown'>(*Note that GeoPandas refers to spatial dataframes as "geodataframes" and the ArcGIS Python API sometimes refers to them as "spatially enabled dataframes"; I will use those terms interchangeably...*)</font>

In exploring the **GeoPandas** technique, we discuss the hierarchy of components that go into adding spatial elements to a dataframe, from geometries to geoseries and finally to geodataframes, and see where the Shapely package (installed as one of GeoPandas' dependencies) is used in the process.

Then, we see that the same process of converting a CSV file to a spatial dataframe using the **ArcGIS Python API** is a bit easier at first, but that the spatial dataframe produced is a bit more difficult to manipulate. 

We'll examine this process with a simple example of creating a point spatial dataframe from a CSV file containing latitude and longitude coordinates. The data we'll use in this exercise is electric vehicle charging locations in North Carolina ([source](https://afdc.energy.gov/data_download)).

## 1. Constructing a Pandas dataframe from the CSV file
The first step in creating the geodataframe from the CSV file - for both the *Geopandas* and *ArcGIS API* methods - is to read it in as a simple Pandas dataframe. 
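
As a minimal sketch of this step, the cell below reads a small in-memory stand-in for the CSV so that it runs on its own. The station names and coordinates are made up for illustration; only the `Latitude` and `Longitude` column names mirror the ones used later in this notebook.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the charging-station CSV (values are made up)
csv_text = """Station Name,Latitude,Longitude
Station A,35.99,-78.90
Station B,35.78,-78.64
"""

# read_csv accepts a file path or any file-like object
df_sample = pd.read_csv(io.StringIO(csv_text))

# Confirm the coordinate columns were read in as numbers
print(df_sample.dtypes)
```

With the real data, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV file.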

In [None]:
#Import pandas 


In [None]:
#Read the EV Charging station data into a Pandas dataframe


In [None]:
#Examine the columns, noting the data include "Latitude" and "Longitude" columns


---
## GeoPandas
* http://geopandas.org/data_structures.html
* http://geopandas.org/io.html

### 2. Creating geometries from latitude and longitude coordinates
Now that we have our dataframe with its coordinate values, the next step is to convert these raw coordinate values into geometric objects, points in our case. This is done with the `shapely` package. First, we'll demonstrate how this is done with a single coordinate pair, and then reveal a nifty way to do this for all coordinate pairs in our dataframe.

#### Creating a single point geometry from a single coordinate pair
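
A minimal sketch of the idea, assuming a made-up coordinate pair rather than a record from the dataframe. The key detail is the argument order: Shapely treats the first value as x (longitude) and the second as y (latitude).

```python
from shapely.geometry import Point

# Hypothetical coordinate pair in decimal degrees (not taken from the dataset)
lat, lng = 35.99, -78.90

# Shapely expects x (longitude) first, then y (latitude)
pt = Point(lng, lat)

# The coordinates are available as attributes of the point
print(pt.x, pt.y)
print(pt.geom_type)
```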

In [None]:
#Extract latitude and longitude values from our first record


In [None]:
#Import the Point class from shapely's geometry module


In [None]:
#Construct a shapely point from our XY coordinates


In [None]:
#Display the point


Ok, we now have a point object. What we next need to do is repeat this process for all records in our dataframe, storing the geometries in a new list. 

We could simply iterate through all rows in our dataframe (e.g. using Pandas' `iterrows()` function). However, a more elegant and efficient method exists using Python's "list comprehension" syntax. (See more [here](https://www.pythonforbeginners.com/basics/list-comprehensions-in-python) on list comprehension...)

#### Creating a list of point geometries by iterating through all records

In [None]:
#Old style:
thePoints = []
for i,row in df_EVStations.iterrows():
    theLat = row['Latitude']
    theLng = row['Longitude']
    thePoint = Point(theLng,theLat)
    thePoints.append(thePoint)
len(thePoints)

#### Creating a list of point geometries by iterating through all records - *using list comprehension*

In [None]:
#New style: Using list comprehension
thePoints = [Point(xy) for xy in zip(df_EVStations['Longitude'],df_EVStations['Latitude'])]
len(thePoints)

---
#### → Understanding *list comprehension*
*A lot is going on in the above statement. Let's pause and break it down...*

* First, the `zip(df_EVStations['Longitude'],df_EVStations['Latitude'])` code creates a Python "zip" object, which combines two (or more) collections by position: the first items are paired together, then the second items, and so on. Take a look:

In [None]:
#Zip the two columns of data such that they share a common index

#Convert the zip object to a list

#Reveal the first 3 objects in the list


* The second action in the statement is a `for` loop that iterates through each item in the new `zip` object, assigning the current value in each iteration (i.e. a coordinate pair) to the variable named `xy`.
* The third action is constructing a Point object from this coordinate pair, again done within the `for` loop.
* Finally, note that the entire statement is enclosed in brackets, just like any Python list. This is meaningful because the result of each iteration becomes one item in a new list, which we assign to the variable `thePoints`.
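
The breakdown above can be sketched in plain Python, with no dataframe or Shapely involved. The two short lists and the `describe()` helper are hypothetical stand-ins for the dataframe columns and for `Point(xy)`.

```python
# Hypothetical stand-ins for the two dataframe columns
lngs = [-78.90, -78.64, -80.84]
lats = [35.99, 35.78, 35.23]

def describe(xy):
    """Stand-in for Point(xy): turn a coordinate pair into a text label."""
    lng, lat = xy
    return f"lng={lng}, lat={lat}"

# zip pairs the items by position: first with first, second with second, ...
pairs = list(zip(lngs, lats))

# The steps written as an explicit loop
labels_loop = []
for xy in zip(lngs, lats):
    labels_loop.append(describe(xy))

# The same steps written as a list comprehension
labels_comp = [describe(xy) for xy in zip(lngs, lats)]

print(pairs[:2])
print(labels_loop == labels_comp)
```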

*List comprehension is a clever scripting technique. Some find it harder to read than an explicit loop, but it is widely used in Python scripts...*

---

### 3. Creating the geodataframe
We are almost there! 

The remaining step in the GeoPandas method is to convert our existing Pandas dataframe to a GeoPandas *geo*dataframe. To do this we simply call the GeoPandas `GeoDataFrame` constructor, passing the original dataframe, the list of geometries corresponding to each row in this dataframe, and the <u>coordinate reference system</u> or **crs** to which our geometries are referenced.

These coordinate reference systems can actually take many forms. But most often, you'll just use the format shown below, replacing the `4326` with the "WKID" of any coordinate reference system listed at https://spatialreference.org.  
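
A minimal sketch of the conversion on a made-up two-row dataframe. It assumes a reasonably recent GeoPandas version, where the crs can be given as the string `'EPSG:4326'`; the dictionary form is the one used elsewhere in this notebook.

```python
import pandas as pd
import geopandas
from shapely.geometry import Point

# Hypothetical two-row dataframe (values are made up)
df_small = pd.DataFrame({
    'Station Name': ['Station A', 'Station B'],
    'Latitude': [35.99, 35.78],
    'Longitude': [-78.90, -78.64],
})

# One point per row, longitude (x) first
points = [Point(xy) for xy in zip(df_small['Longitude'], df_small['Latitude'])]

# Assumption: a recent GeoPandas that accepts the 'EPSG:4326' (WGS84) string
gdf_small = geopandas.GeoDataFrame(df_small, geometry=points, crs='EPSG:4326')

print(type(gdf_small))
print(gdf_small.crs)
```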

In [None]:
#Import geopandas


In [None]:
#Create a coordinate reference system dictionary for WGS84 (WKID=4326)


In [None]:
#Create the spatial dataframe from the Pandas dataframe, the geometry collection and crs


#Display the type of the object created


* Explore the geodataframe...

In [None]:
#Show info for the dataframe; note the new column at the end


In [None]:
#Examine the data; note the last column contains Shapely point geometries


* Visualize the data... (more info [here](https://geopandas.org/mapping.html))

In [None]:
#Create a simple plot


In [None]:
#Preview of some plot visualization options...


## Geopandas -- all in one place
So, let's look at all those steps in one short script - good for reference.

In [None]:
#Import packages
import pandas as pd
import geopandas
from shapely.geometry import Point

#Read the CSV file into a Pandas dataframe
df = pd.read_csv('./data/NC_Charging_Stations.csv')

#Create a list of point geometries from the appropriate columns
geomList = [Point(xy) for xy in zip(df['Longitude'],df['Latitude'])]

#Specify coordinate reference system of our data in the form of a dictionary
# (4326 is WGS84; newer GeoPandas versions prefer the string 'EPSG:4326')
crs_WGS84 = {'init':'epsg:4326'}

#Upgrade the dataframe to a spatial dataframe, assigning it to the WGS84 crs
gdf = geopandas.GeoDataFrame(df,geometry=geomList,crs=crs_WGS84)
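
As an alternative to building the geometry list by hand, GeoPandas also provides a `points_from_xy()` helper. This is a different technique from the list comprehension used in this notebook, sketched here on made-up data and assuming a GeoPandas version that includes the helper and accepts the `'EPSG:4326'` string.

```python
import pandas as pd
import geopandas

# Hypothetical two-row dataframe (values are made up)
df_alt = pd.DataFrame({
    'Latitude': [35.99, 35.78],
    'Longitude': [-78.90, -78.64],
})

# points_from_xy takes the x (longitude) values first, then the y (latitude) values
gdf_alt = geopandas.GeoDataFrame(
    df_alt,
    geometry=geopandas.points_from_xy(df_alt['Longitude'], df_alt['Latitude']),
    crs='EPSG:4326',
)

print(gdf_alt.geometry.geom_type.unique())
```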

---
---
## ArcGIS Python API
* https://developers.arcgis.com/python/guide/introduction-to-the-spatially-enabled-dataframe/#Accessing-GIS-data
* https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#arcgis.features.GeoAccessor.from_xy

### 1. Create the Pandas dataframe from the CSV data
As above, saved as `df_EVStations`
### 2. Create a "Spatially Enabled Dataframe" from the Pandas dataframe
The ArcGIS Python API offers a simpler method for converting CSV coordinate data to a spatial dataframe, one that combines the two steps above. This involves the `from_xy()` method of the API's `GeoAccessor` class ([link](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#arcgis.features.GeoAccessor.from_xy)).

In [None]:
#Import the GeoAccessor class from the arcgis API
from arcgis import GeoAccessor

In [None]:
#Re-read the EV Charging station data into a Pandas dataframe
df_EVStations = pd.read_csv('./data/NC_Charging_Stations.csv')

In [None]:
#Convert to a spatially enabled dataframe using the from_xy() method
sdf_EVStations = GeoAccessor.from_xy(df_EVStations, x_column='Longitude', y_column='Latitude', sr=4326)
type(sdf_EVStations)

What's interesting is that the above operation returns what still looks like a *Pandas* dataframe. However, this dataframe is now associated with the ArcGIS *GeoAccessor* object, which we reach by appending `.spatial` to the dataframe (an attribute that is not part of standard Pandas dataframes):

In [None]:
#Reveal the new geoaccessor object linked to the dataframe
type(sdf_EVStations.spatial)
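
To make the accessor idea less mysterious, here is a sketch of how such a namespace can be attached to a dataframe using pandas' own extension mechanism. It assumes `.spatial` is registered in a similar way. The `bounds_demo` accessor is entirely hypothetical and is not part of pandas or the ArcGIS API.

```python
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("bounds_demo")
class BoundsDemoAccessor:
    """Hypothetical accessor used only to illustrate the mechanism."""

    def __init__(self, pandas_obj):
        # pandas passes in the dataframe the accessor is attached to
        self._df = pandas_obj

    @property
    def bbox(self):
        """Return (min lng, min lat, max lng, max lat) for the dataframe."""
        return (self._df['Longitude'].min(), self._df['Latitude'].min(),
                self._df['Longitude'].max(), self._df['Latitude'].max())

# Hypothetical coordinates (values are made up)
df_acc = pd.DataFrame({'Latitude': [35.99, 35.78], 'Longitude': [-78.90, -78.64]})

# The accessor is reached the same way as .spatial: by appending its name
print(df_acc.bounds_demo.bbox)
```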

Thus, appending `.spatial` to our *spatially enabled* dataframe gives us access to a number of new properties and methods. This [link](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#geoaccessor) lists them, and below we show a few.

In [None]:
#Show the full extent of the sdf


In [None]:
#Show the full extent as a bounding box


In [None]:
#Show its spatial reference


In [None]:
#Reproject to UTM Zone 17N (wkid = 26917)
sdf_EVStations_utm = sdf_EVStations.copy(deep=True) #We first need to copy to a new SDF
sdf_EVStations_utm.spatial.project({'wkid': 26917})
sdf_EVStations_utm.spatial.sr

In [None]:
#Plot the points
sdf_EVStations.spatial.plot()

https://developers.arcgis.com/python/guide/visualizing-data-with-the-spatially-enabled-dataframe/

In [None]:
#Plot the points, with some embellishment
sdf_EVStations.spatial.plot(
    renderer_type='u', #Set to show each unique value
    col='ZIP',         #Set the field with unique values
    marker_size=5,
    line_width=.5,
)

## Recap 
Both GeoPandas and the ArcGIS Python API give us the ability to import CSV data containing coordinates into spatial dataframes. We've seen the simplest example in action, i.e., bringing in point features. Polyline and polygon features are a bit trickier but can be done fairly easily using GeoPandas if the CSV includes a column listing the point coordinates that make up the vertices of the polyline or polygon.
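
As a rough sketch of the polygon case, suppose each row stores its vertices in a single text column as `"x y;x y;..."`. That layout, the `Vertices` column name, and the `parse_vertices()` helper are all assumptions made for illustration; a real CSV may encode vertices differently.

```python
import pandas as pd
import geopandas
from shapely.geometry import Polygon

# Hypothetical layout: one row per polygon, vertices stored as "x y;x y;..."
df_poly = pd.DataFrame({
    'Name': ['Unit square'],
    'Vertices': ['0 0;1 0;1 1;0 1'],
})

def parse_vertices(text):
    """Turn 'x y;x y;...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split()) for pair in text.split(';')]

# Build one Shapely polygon per row, then attach them as the geometry column
polygons = [Polygon(parse_vertices(text)) for text in df_poly['Vertices']]
gdf_poly = geopandas.GeoDataFrame(df_poly, geometry=polygons, crs='EPSG:4326')

print(gdf_poly.geometry.geom_type.iloc[0])
```

For polylines, the same pattern should work with Shapely's `LineString` in place of `Polygon`.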

A cool example: https://medium.com/geoai/house-hunting-the-data-scientist-way-b32d93f5a42f