## Using NumPy with ArcGIS: *FeatureClass to NumPy Arrays*

This notebook demonstrates how NumPy facilitates manipulation of feature class attribute data. It is not an in-depth introduction to NumPy, but it should familiarize you with what NumPy is and how it can be used with ArcGIS feature classes. The links below provide more in-depth reading on NumPy and its use with feature classes.

Resources:
* https://jakevdp.github.io/PythonDataScienceHandbook/index.html#2.-Introduction-to-NumPy
* http://pro.arcgis.com/en/pro-app/arcpy/data-access/featureclasstonumpyarray.htm

In [1]:
#Import arcpy and numpy
import arcpy
import numpy as np

In [None]:
#Point to the HUC12.shp feature class in the Data folder (ensure it exists)
huc12_fc = '../Data/HUC12.shp'
print (arcpy.Exists(huc12_fc))

In [None]:
#List the fields contained in the "huc12_fc" feature class
[f.name for f in arcpy.ListFields(huc12_fc)]

* Here, we convert the feature class to a NumPy array using ArcPy's [`FeatureClassToNumPyArray`](http://pro.arcgis.com/en/pro-app/arcpy/data-access/featureclasstonumpyarray.htm) function.

In [None]:
#List the fields we want to convert
fieldList = ["Shape@XY","HUC_8","HUC_12","ACRES"]
arrHUCS = arcpy.da.FeatureClassToNumPyArray(huc12_fc,fieldList)

In [None]:
#Display the type of the returned object
type(arrHUCS)

* Now that the attribute data is in a NumPy array, we can perform a variety of operations on it. But first, let's inspect the array's properties.

In [None]:
#Reveal documentation on the ndarray object
arrHUCS?

In [None]:
#How many records does it contain
arrHUCS.size

In [None]:
#What are the data types stored in this array
print (arrHUCS.dtype)

In [None]:
#Or, just what are the names of the "columns"
print (arrHUCS.dtype.names)

Looking at the data types, you'll notice that this NumPy array holds several different data types.

The ndarray object is a specific type of NumPy array: a **structured array**. (See https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html) A *structured array* can be thought of as a collection of individual ndarrays, all of the same length, with each one referenced by a field name. In other words, it's structured much like an attribute table where each field/column is its own ndarray.

Our `arrHUCS` structured array has 4 embedded "sub" arrays with the names `SHAPE@XY`, `HUC_8`, `HUC_12`, and `ACRES`.
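To make the idea of a structured array concrete without needing arcpy, here is a minimal sketch that builds one by hand. The array name `demo`, the coordinates, the HUC codes, and the acreages are all made up for illustration; only the field names mirror the ones described above.

```python
import numpy as np

# Hypothetical dtype imitating the fields described above.
huc_dtype = np.dtype([
    ("SHAPE@XY", "<f8", (2,)),  # an x, y coordinate pair per record
    ("HUC_8", "<U8"),           # 8-character text code
    ("HUC_12", "<U12"),         # 12-character text code
    ("ACRES", "<f8"),           # area stored as a float
])

# Three made-up records, one tuple per row.
demo = np.array(
    [
        ((1.0, 2.0), "03040103", "030401030101", 100.0),
        ((3.0, 4.0), "03040103", "030401030102", 200.0),
        ((5.0, 6.0), "03040104", "030401040101", 300.0),
    ],
    dtype=huc_dtype,
)

print(demo.dtype.names)        # the field ("column") names
print(demo["ACRES"])           # one field across all records
print(demo[0])                 # one record across all fields
print(demo["SHAPE@XY"].shape)  # the coordinate field holds two values per record
```

Indexing with a field name returns a regular ndarray for that column, while indexing with an integer returns a single record.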

Now, let's see what we can do with this...

### Selecting specific rows/columns/values from our NumPy array
* NumPy arrays allow **slicing**, much like familiar Python lists, enabling us to quickly nab subsets of data.
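The slicing rules can be tried on a small hand-built array first. This is only a sketch: `demo` and its values are hypothetical stand-ins for `arrHUCS`, chosen so the results are easy to check by eye.

```python
import numpy as np

# Made-up stand-in for arrHUCS with two of its fields.
demo = np.array(
    [
        ("030401030101", 100.0),
        ("030401030102", 200.0),
        ("030401030103", 300.0),
        ("030401040101", 400.0),
    ],
    dtype=[("HUC_12", "<U12"), ("ACRES", "<f8")],
)

middle_rows = demo[1:3]        # rows 1 and 2; the stop index is excluded
one_value = demo[1]["ACRES"]   # pick the row first, then the field
same_value = demo["ACRES"][1]  # or pick the field first, then the row

print(middle_rows)
print(one_value, same_value)
```

Either indexing order reaches the same value, so use whichever reads more naturally.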

In [None]:
#Show the first row of data
print (arrHUCS[0]) 

In [None]:
#Show all data from the first 5 rows of data
print (arrHUCS[0:5])     

In [None]:
#YOU TRY IT: Show all data from rows 10 thru 15
print(arrHUCS[])

In [None]:
#Show the value of the 5th row in the 'HUC_8' field
arrHUCS[4]['HUC_8']

In [None]:
#YOU TRY IT: Show the value in the ACRES field of the last row


In [None]:
#List all the HUC12s in the dataset
print (arrHUCS['HUC_12'])

In [None]:
#YOU TRY IT: List all the ACRES values in the dataset


### Calculations
* We can also do rapid calculations with the data...
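As a small illustration of what "rapid calculations" means here, the sketch below applies arithmetic to a whole array at once. The `acres` values are made up, and the constant name `ACRES_PER_HECTARE` is just a label chosen for this example (one hectare is roughly 2.47105 acres).

```python
import numpy as np

# Made-up areas, in acres.
acres = np.array([100.0, 200.0, 300.0])

ACRES_PER_HECTARE = 2.47105

# The division is applied to every element in one step; no loop is needed.
hectares = acres / ACRES_PER_HECTARE

print(acres.mean())        # summary statistics are methods on the array
print(acres.max())
print(hectares.round(2))   # areas expressed in hectares
```

The result of the division is a new array of the same length, so it can be summarized or sliced just like the original.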

In [None]:
#List the mean area of all HUCs
arrHUCS['ACRES'].mean()

In [None]:
#List all the ACRES values, but in hectares (1 ha = 2.47105 acres)
arrHUCS['ACRES'] / 2.47105

In [None]:
#YOU TRY IT: What is the total area of all HUC12s, in hectares


### Subsetting data with NumPy
We can also subset records in our array, which we will do as a two-step process.
1. First, we create a boolean **mask array**, that is, an array of true and false values where a record is true if a condition is met.
2. Then, we **apply this mask** to our original array to isolate the records where the mask is true.
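The two steps can be sketched on a tiny hand-built array, which makes it easy to see which records survive. The array `demo` and its values are hypothetical; the last lines additionally show, as an aside, that conditions can be combined with `&` when each one is wrapped in parentheses.

```python
import numpy as np

# Made-up stand-in for arrHUCS with two of its fields.
demo = np.array(
    [("03040103", 100.0), ("03040104", 200.0), ("03040103", 300.0)],
    dtype=[("HUC_8", "<U8"), ("ACRES", "<f8")],
)

# Step 1: build the boolean mask, one True/False value per record.
mask = demo["HUC_8"] == "03040103"

# Step 2: apply the mask to keep only the records where it is True.
selected = demo[mask]

print(mask)
print(selected)

# Conditions can be combined; each one needs its own parentheses.
large_in_huc8 = demo[(demo["HUC_8"] == "03040103") & (demo["ACRES"] > 150)]
print(large_in_huc8)
```

Because `True` counts as 1, `mask.sum()` gives the number of matching records without building the subset.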

In [None]:
#First we make a boolean mask and show the first 10 records
arrMask = (arrHUCS["HUC_8"] == '03040103')
arrMask[:10]

In [None]:
#Now we apply the mask to isolate records where this is true
arrSelectedHUC8 = arrHUCS[arrMask]

In [None]:
#The original array had 201 records, how many records does this have? 
print (arrSelectedHUC8.size)

In [None]:
#Print the 11th row of our selected records
arrSelectedHUC8[10]

In [None]:
#Calculate the mean area of these HUCs
arrSelectedHUC8['ACRES'].mean()

In [None]:
#Plot a histogram of HUC_12 areas
%matplotlib inline
import matplotlib.pyplot as plt
#import seaborn; seaborn.set()  # set plot style

In [None]:
plt.hist(arrHUCS['ACRES']);
plt.title('Area Distribution of HUC_12s')
plt.xlabel('Area (acres)')
plt.ylabel('Number of HUC_12s');

### Recap
Converting our feature attribute table to a NumPy array opens the door to rapid computations using NumPy's speedy capabilities, a vast improvement over using arcpy's cursor objects!
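To give a feel for the difference in style, the sketch below computes the same total two ways. It does not use arcpy: a plain Python loop stands in for the record-by-record pattern a cursor would follow, and the `acres` values are made up.

```python
import numpy as np

# Made-up areas: 1.0, 2.0, ..., 1000.0
acres = np.arange(1, 1001, dtype=float)

# Record-by-record accumulation, the pattern a cursor loop would follow.
loop_total = 0.0
for value in acres:
    loop_total += value

# The same total computed in a single vectorized call.
vector_total = acres.sum()

print(loop_total, vector_total)
```

Both approaches give the same answer; the vectorized call does the looping in compiled code, which is generally where NumPy's speed comes from.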