# Spatial Dataframes 1b: Creating them using the ArcGIS API for Python
ENV 859 - Fall 2022  
© John Fay, Duke University

<h3 style="background-color:Yellow;">Note: This notebook should be run using the default Conda environment included with ArcGIS Pro</h3>

### The ArcGIS Python API vs GeoPandas
In ***Spatial Dataframes 1a*** we focused on reading data into spatial dataframes using the **GeoPandas** package. Here we explore an alternative: the **ArcGIS API for Python**. Why are there two packages? What's the difference? 

Both packages are built on top of Pandas, so a spatial dataframe in either one keeps all the functionality of a standard Pandas dataframe. Both also introduce geometry as a new data type, and attaching these geometries to a dataframe enables various types of spatial analysis. 

The key difference lies in where the two packages originated and how they evolved. GeoPandas builds on the Shapely library for working with geometric objects and the Fiona library for reading and writing common spatial data formats. Deeper down, these rely on open source engines: Shapely uses GEOS for geometric operations, and Fiona uses GDAL (the Geospatial Data Abstraction Library) to read and write spatial data. 

The ArcGIS API for Python, in contrast, originated as a cloud-based version of ESRI's ArcPy package. ESRI has long been developing cyberinfrastructure for web-based access to and processing of spatial data, and it has built a host of web development tools for doing this (for example, the ArcGIS API for JavaScript). Recognizing the popularity of Python, however, ESRI more recently developed the ArcGIS API for Python as an alternative way of working with online spatial data. Included in this Python-based offering is ESRI's own version of the spatial dataframe, which it calls the "***spatially enabled dataframe***". 

The bottom line is that these two Python objects, GeoPandas' geodataframe and ESRI's spatially enabled dataframe, are quite similar but have some key differences. They also evolve at different paces. Which one should you use? That depends on what you are doing, as each has its own advantages and limitations. The best approach is to get comfortable with both and see how they compare across different tasks...

## The Lesson - Constructing Spatial Dataframes with the ArcGIS API
Similar to our last lesson using GeoPandas, we'll explore the techniques for importing data in various formats into spatial dataframes. We'll use the same datasets as that lesson so you can easily compare and contrast the methods.

The source formats we examine include:
1. [A delimited text file (e.g. CSV) containing coordinate columns and a known coordinate reference system](#1.1---Creating-spatial-dataframes-from-CSV-files-using-the-ArcGIS-Python-API)
2. [An existing feature class in the form of a shapefile or within a geodatabase](#1.2:-Creating-spatial-dataframes-from-existing-feature-classes)
3. [Other formats: GeoJSON files, KML, and [kind of] GeoDatabases](#1.3---Creating-spatial-dataframes-from-other-file-formats)

### 1.1 - Creating spatial dataframes from CSV files using the ArcGIS Python API
We revisit the electric vehicle charging locations in North Carolina obtained from the Alternative Fuels Data Center ([link](https://afdc.energy.gov/data_download)). 

Importing a CSV file into a "spatially enabled dataframe" is done via the ArcGIS API's [***GeoAccessor***](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#geoaccessor) object, whose `from_xy()` function converts a Pandas dataframe with coordinate columns into a spatial dataframe. This differs from the GeoPandas workflow in that we don't have to create a GeoSeries of geometries first; we simply specify the X and Y coordinate columns.
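To make the idea concrete, here is a minimal pure-Pandas sketch of what a `from_xy()`-style conversion adds to a dataframe: a `SHAPE` value per row holding an Esri JSON-style point with `x`, `y`, and a `spatialReference`, the same structure visible in the `SHAPE` column of `sdf`. The helper `add_point_shape()` is hypothetical and uses plain dictionaries; the real `from_xy()` stores ArcGIS geometry objects and does considerably more.

```python
import pandas as pd

def add_point_shape(df, x_col, y_col, wkid=4326, shape_col="SHAPE"):
    """Hypothetical helper: return a copy of df with an Esri JSON-style point
    dictionary in shape_col, built from the x_col and y_col columns."""
    out = df.copy()
    out[shape_col] = [
        {"x": float(x), "y": float(y), "spatialReference": {"wkid": wkid}}
        for x, y in zip(out[x_col], out[y_col])
    ]
    return out

# Two rows taken from the charging-station preview in this notebook
stations = pd.DataFrame({
    "Station Name": ["City of Raleigh - Municipal Building", "City of Raleigh - Downtown"],
    "Latitude": [35.778416, 35.77435],
    "Longitude": [-78.64347, -78.642287],
})

# X is longitude and Y is latitude, the same order passed to from_xy() below
points = add_point_shape(stations, "Longitude", "Latitude", wkid=4326)
print(points.loc[0, "SHAPE"])
```

Note that the coordinate columns stay in place; the geometry is simply an extra column, which is also what the `sdf.head()` output shows.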

#### Step 1. Import the GeoAccessor object and read in the CSV file
We could simply import the `arcgis` package in its entirety, but it is a rather large package, so we often import only the parts we need. In this case, we just want the GeoAccessor object, which is part of the `features` submodule of the `arcgis` package.

We also import the Pandas package, which we use to read the CSV data into a standard dataframe. Then we invoke the GeoAccessor's `from_xy()` function to "upgrade" that dataframe into a spatial dataframe. 

In [1]:
#Import pandas and the arcgis GeoAccessor object
import pandas as pd
from arcgis.features import GeoAccessor

In [2]:
#Read the data into a Pandas dataframe
df = pd.read_csv('../data/NC_Charging_Stations.csv')

In [4]:
#Review the from_xy() function's syntax, then preview the dataframe
GeoAccessor.from_xy?
df.head()

|   | ID | Fuel Type Code | Station Name | City | State | ZIP | Status Code | Latitude | Longitude | Facility Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39016 | ELEC | City of Raleigh - Municipal Building | Raleigh | NC | 27601 | E | 35.778416 | -78.64347 | STREET_PARKING |
| 1 | 39017 | ELEC | City of Raleigh - Downtown | Raleigh | NC | 27601 | E | 35.77435 | -78.642287 | STREET_PARKING |
| 2 | 40066 | ELEC | Modern Nissan - Concord | Concord | NC | 28027 | E | 35.392063 | -80.622777 | CAR_DEALER |
| 3 | 40067 | ELEC | Fred Anderson Nissan | Fayetteville | NC | 28304 | E | 35.042419 | -78.956747 | CAR_DEALER |
| 4 | 40068 | ELEC | Vann Yorks High Point Nissan | High Point | NC | 27260 | E | 35.937981 | -79.996012 | CAR_DEALER |


##### ►Spatial references with the GeoAccessor object
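As with coordinate reference systems in GeoPandas, spatial references here are identified by well-known IDs (WKIDs). Below we pass WKID 4326, the WGS 84 latitude/longitude system, via the `sr` parameter. 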

In [6]:
#Convert the dataframe to a spatially enabled dataframe using its Longitude/Latitude columns (WGS 84, WKID 4326)
sdf = GeoAccessor.from_xy(df, "Longitude", "Latitude", sr=4326)

#### Step 2. Explore our new object

In [7]:
#View a few records
sdf.head()

|   | ID | Fuel Type Code | Station Name | City | State | ZIP | Status Code | Latitude | Longitude | Facility Type | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39016 | ELEC | City of Raleigh - Municipal Building | Raleigh | NC | 27601 | E | 35.778416 | -78.64347 | STREET_PARKING | `{"spatialReference": {"wkid": 4326}, "x": -78....` |
| 1 | 39017 | ELEC | City of Raleigh - Downtown | Raleigh | NC | 27601 | E | 35.77435 | -78.642287 | STREET_PARKING | `{"spatialReference": {"wkid": 4326}, "x": -78....` |
| 2 | 40066 | ELEC | Modern Nissan - Concord | Concord | NC | 28027 | E | 35.392063 | -80.622777 | CAR_DEALER | `{"spatialReference": {"wkid": 4326}, "x": -80....` |
| 3 | 40067 | ELEC | Fred Anderson Nissan | Fayetteville | NC | 28304 | E | 35.042419 | -78.956747 | CAR_DEALER | `{"spatialReference": {"wkid": 4326}, "x": -78....` |
| 4 | 40068 | ELEC | Vann Yorks High Point Nissan | High Point | NC | 27260 | E | 35.937981 | -79.996012 | CAR_DEALER | `{"spatialReference": {"wkid": 4326}, "x": -79....` |


In [8]:
#Examine the data type of the object we just created
type(sdf)

pandas.core.frame.DataFrame

*►Hmmm... that's odd. It's still a Pandas dataframe, or so it appears.* 

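The ArcGIS API handles dataframes a bit differently. Rather than creating a new dataframe subclass (as GeoPandas does with its GeoDataFrame), it adds a `spatial` *accessor* to Pandas dataframes. The object we created *is* a plain Pandas dataframe, but we can now append `.spatial` to it to access its spatial capabilities...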

In [9]:
#Examine the "spatially enabled" dataframe
type(sdf.spatial)

arcgis.features.geo._accessor.GeoAccessor

In [13]:
#Examine a property of this object: the name of its geometry column
sdf.spatial.name

'SHAPE'
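The `.spatial` attribute relies on Pandas' extension mechanism: a class registered with `pd.api.extensions.register_dataframe_accessor()` becomes available as an attribute on every dataframe, while the dataframe itself stays an ordinary `DataFrame`. The sketch below registers a hypothetical accessor called `geo_demo` with a simplified `name` property, loosely mirroring `sdf.spatial.name`. It is only an illustration of the accessor pattern, not how the ArcGIS API actually detects its geometry column.

```python
import pandas as pd

# Register a hypothetical accessor; the name "geo_demo" is made up for illustration
@pd.api.extensions.register_dataframe_accessor("geo_demo")
class GeoDemoAccessor:
    def __init__(self, pandas_obj):
        # Pandas passes in the dataframe the accessor is attached to
        self._obj = pandas_obj

    @property
    def name(self):
        """Name of the first column whose values are all geometry-like dicts, else None."""
        for col in self._obj.columns:
            values = self._obj[col]
            if len(values) > 0 and all(
                isinstance(v, dict) and "spatialReference" in v for v in values
            ):
                return col
        return None

df = pd.DataFrame({
    "ID": [39016, 39017],
    "SHAPE": [
        {"x": -78.64347, "y": 35.778416, "spatialReference": {"wkid": 4326}},
        {"x": -78.642287, "y": 35.77435, "spatialReference": {"wkid": 4326}},
    ],
})

print(type(df).__name__)           # DataFrame: the object itself is unchanged
print(type(df.geo_demo).__name__)  # GeoDemoAccessor: the attached accessor
print(df.geo_demo.name)            # SHAPE
```

This is why `type(sdf)` reports a plain Pandas dataframe while `type(sdf.spatial)` reports the GeoAccessor.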

In [None]:
#Plot the data
sdf.spatial.plot()

### 1.2: Creating spatial dataframes from existing feature classes
As in the GeoPandas lesson, we now look at getting existing feature classes, e.g. shapefiles, into spatial dataframes, this time using the ArcGIS API for Python. 

The dataset we'll use represents major river basins of North Carolina (source: https://data-ncdenr.opendata.arcgis.com/datasets/ncdenr::major-river-basins), a copy of which has been downloaded into the data folder as `Major_Basins.shp`. 

#### Step 1. Importing shapefiles using `from_featureclass()`
Importing feature classes with the ArcGIS API is easy with the GeoAccessor's `from_featureclass()` function. 

In [14]:
#Explore the GeoAccessor's from_featureclass() command
GeoAccessor.from_featureclass?

In [16]:
#Read the shapefile into a spatially enabled dataframe
sdf_shp = GeoAccessor.from_featureclass('../data/Major_Basins.shp')

In [17]:
#Examine the data
sdf_shp.head()

|   | FID | FID_1 | Basin | Sq_Miles | Acres | Name | PlanLink | SHAPE_Leng | SHAPE_Area | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | BRD | 1513.894812 | 968892.7 | Broad | https://deq.nc.gov/about/divisions/water-resou... | 558125.8 | -5910819000.0 | `{"rings": [[[-9213866.2715, 4173023.750500001]...` |
| 1 | 1 | 2 | CAT | 3285.405145 | 2102659.0 | Catawba | https://deq.nc.gov/about/divisions/water-resou... | 856740.0 | -12897940000.0 | `{"rings": [[[-9094141.1919, 4320798.406099997]...` |
| 2 | 2 | 3 | CHO | 1298.283191 | 830901.2 | Chowan | https://deq.nc.gov/about/divisions/water-resou... | 466571.5 | -5193827000.0 | `{"rings": [[[-8531329.2013, 4376184.236199997]...` |
| 3 | 3 | 4 | CPF | 9163.594976 | 5864701.0 | Cape Fear | https://deq.nc.gov/about/divisions/water-resou... | 1392877.0 | -35644020000.0 | `{"rings": [[[-8873544.7811, 4349237.935599998]...` |
| 4 | 4 | 5 | FBR | 2828.806116 | 1810436.0 | French Broad | https://deq.nc.gov/about/divisions/water-resou... | 721696.7 | -11123080000.0 | `{"rings": [[[-9151379.6031, 4322288.984700002]...` |
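Each `SHAPE` value in the output above is an Esri JSON-style polygon: a `rings` list in which each ring is a list of `[x, y]` vertex pairs, with the first vertex repeated at the end to close the ring. As a sketch of how that structure can be read, the functions below compute each ring's signed area with the shoelace formula. The helper names and the toy square are hypothetical; with this formula, clockwise rings (Esri's convention for outer rings) come out negative.

```python
def ring_signed_area(ring):
    """Shoelace formula for one closed ring of [x, y] vertices.
    Positive for counterclockwise rings, negative for clockwise ones."""
    total = 0.0
    # The ring is closed (last vertex == first), so consecutive pairs cover every edge
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def polygon_ring_areas(esri_polygon):
    """Signed area of every ring in an Esri JSON-style polygon dictionary."""
    return [ring_signed_area(ring) for ring in esri_polygon["rings"]]

# Hypothetical 1 x 1 square with a single clockwise ring
square = {
    "rings": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
    "spatialReference": {"wkid": 102100},
}
print(polygon_ring_areas(square))  # [-1.0]
```

Areas come out in the squared units of the data's coordinate system.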


In [21]:
#What is the spatial reference of the data?
sdf_shp.spatial.sr

{'wkid': 102100, 'latestWkid': 3857}
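The spatial reference comes back as a dictionary. Here `wkid` 102100 is ESRI's older identifier for Web Mercator, and `latestWkid` 3857 is the current (EPSG) code for the same system. A small hypothetical helper, assuming the value behaves like a dictionary as the output above suggests, can prefer `latestWkid` when present and fall back to `wkid` otherwise:

```python
def preferred_wkid(sr):
    """Hypothetical helper: return the most current WKID in a spatial reference dict."""
    # Use latestWkid if the dictionary has it; otherwise fall back to wkid
    return sr.get("latestWkid", sr.get("wkid"))

print(preferred_wkid({"wkid": 102100, "latestWkid": 3857}))  # 3857
print(preferred_wkid({"wkid": 4326}))                       # 4326
```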

In [23]:
#Plot the data...
sdf_shp.spatial.plot()


### 1.3 - Creating spatial dataframes from other file formats

A look at the other "`from_`" functions associated with the GeoAccessor object reveals which other data sources we can read into ArcGIS spatially enabled dataframes. GeoJSON and KML are NOT among them. A GeoDataFrame is, however (via `from_geodataframe()`), so we could potentially read these files into a GeoPandas geodataframe and then convert that into a spatially enabled dataframe.
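GeoJSON itself is plain JSON, so for simple point data the conversion can also be sketched without either spatial library. The hypothetical function below turns the features of a GeoJSON `FeatureCollection` into rows of a Pandas dataframe, translating each Point's `[longitude, latitude]` coordinates into an Esri JSON-style point like those in the `SHAPE` column of `sdf`. It assumes Point geometries in WGS 84 (WKID 4326, the GeoJSON default) and skips every other geometry type; the GeoPandas route described above is the more general option.

```python
import json
import pandas as pd

def geojson_points_to_dataframe(geojson_text, shape_col="SHAPE"):
    """Hypothetical helper: GeoJSON Point features -> dataframe with an Esri JSON-style point column."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        geom = feature.get("geometry")
        if geom is None or geom.get("type") != "Point":
            continue  # this sketch handles Point features only
        lon, lat = geom["coordinates"][:2]  # GeoJSON order is [longitude, latitude]
        row = dict(feature.get("properties") or {})
        row[shape_col] = {"x": lon, "y": lat, "spatialReference": {"wkid": 4326}}
        rows.append(row)
    return pd.DataFrame(rows)

# A one-feature sample built from the first charging station in this notebook
sample = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"ID": 39016, "City": "Raleigh"},
   "geometry": {"type": "Point", "coordinates": [-78.64347, 35.778416]}}]}"""

df_geojson = geojson_points_to_dataframe(sample)
print(df_geojson.loc[0, "SHAPE"])
```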

In [25]:
#List the GeoAccessor's "from_" import options
[name for name in dir(GeoAccessor) if name.startswith('from_')]

Recall, however, that the ArcGIS Python API was developed for cloud-based computing; as a result, it is quite adept at working with online resources. For example, you may have come across pages like this one:  
<https://services.nconemap.gov/secure/rest/services>  

This page lists a number of spatial (and sometimes non-spatial) datasets served online. Clicking a link labeled "FeatureServer" reveals the feature layers associated with that service. For example, the service at  
https://services.nconemap.gov/secure/rest/services/NC1Map_Regional_Boundaries/FeatureServer  
contains two layers: state boundaries (layer 0) and county boundaries (layer 1).

The ArcGIS API can import these layers as spatial dataframes fairly easily from their web addresses. This is done by first connecting to each layer with a ***FeatureLayer*** object and then passing that object to the GeoAccessor's `from_layer()` function...
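Behind the scenes, a hosted feature layer is exposed through the ArcGIS REST API, and its features can be requested from the layer's `query` operation. The sketch below only builds such a request URL with the standard library (it sends nothing over the network). `where`, `outFields`, and `f` are standard query parameters, but whether `FeatureLayer` issues exactly this request is an assumption here, and the helper name `layer_query_url()` is hypothetical.

```python
from urllib.parse import urlencode

def layer_query_url(layer_url, where="1=1", out_fields="*", fmt="json"):
    """Hypothetical helper: build an ArcGIS REST 'query' request URL for one layer."""
    # where="1=1" selects every feature; outFields="*" returns every attribute
    params = {"where": where, "outFields": out_fields, "f": fmt}
    return layer_url.rstrip("/") + "/query?" + urlencode(params)

service_url = "https://services.nconemap.gov/secure/rest/services/NC1Map_Regional_Boundaries/FeatureServer"
county_url = f"{service_url}/1"  # layer 1 = county boundaries
print(layer_query_url(county_url))
```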

In [26]:
#Import the FeatureLayer class from the arcgis package
from arcgis.features import FeatureLayer

In [27]:
#Set the URLs where the feature layers are hosted (layer 0 = states, layer 1 = counties)
state_layer_url = 'https://services.nconemap.gov/secure/rest/services/NC1Map_Regional_Boundaries/FeatureServer/0'
county_layer_url = 'https://services.nconemap.gov/secure/rest/services/NC1Map_Regional_Boundaries/FeatureServer/1'

In [30]:
#Connect to each hosted layer, then read its features into a spatially enabled dataframe
stateLayer = FeatureLayer(state_layer_url)
countyLayer = FeatureLayer(county_layer_url)
dfState = GeoAccessor.from_layer(stateLayer)
dfCounty = GeoAccessor.from_layer(countyLayer)


In [35]:
#Explore the first few records of the state layer
dfState.head()
#type(dfState.spatial)
#dfCounty.head()

|   | SHAPE | Shape__Area | Shape__Length | day_adm | month_adm | objectid | onemap_sdeadmin_usa_states_area | order_adm | perimeter | state | state_fips | statesp020 | year_adm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `{"rings": [[[-2744517.7946000025, 5879742.8082...` | 1979286000000.0 | 29922890.0 | 3.0 | January | 1 | 267.357 | 49 | 374.768 | Alaska | 2 | 2.0 | 1959.0 |
| 1 | `{"rings": [[[-2657753.126500003, 5868445.36879...` | 3382181.0 | 12074.83 | 3.0 | January | 2 | 0.0 | 49 | 0.224 | Alaska | 2 | 3.0 | 1959.0 |
| 2 | `{"rings": [[[-2653616.897699997, 5848470.94200...` | 1576285.0 | 6235.42 | 3.0 | January | 3 | 0.0 | 49 | 0.118 | Alaska | 2 | 4.0 | 1959.0 |
| 3 | `{"rings": [[[-2650007.5568000004, 5838549.7893...` | 2656461.0 | 15243.32 | 3.0 | January | 4 | 0.0 | 49 | 0.276 | Alaska | 2 | 5.0 | 1959.0 |
| 4 | `{"rings": [[[-2644430.9363999963, 5825202.4289...` | 2361460.0 | 9267.466 | 3.0 | January | 5 | 0.0 | 49 | 0.167 | Alaska | 2 | 6.0 | 1959.0 |


In [36]:
#Get the spatial reference
dfState.spatial.sr

{'wkid': 32119, 'latestWkid': 32119}

In [37]:
dfCounty.spatial.sr

{'wkid': 32119, 'latestWkid': 32119}