# Power of Arrow
 - intro to Arrow
 - using Arrow with ArcGIS

## intro to Arrow
 - what is Apache Arrow
    - [arrow.apache.org/overview](https://arrow.apache.org/overview/)
    - [Apache Arrow page on wikipedia](https://en.wikipedia.org/wiki/Apache_Arrow)
 - more about Arrow
    - [github.com/apache/arrow](https://github.com/apache/arrow) 
    - 1000 contributors
    - Apache License 2.0
    - commercial support by **Voltron Data**
 - goals
    - interop and connectivity
    - high-performance query and processing


## using Arrow with ArcGIS
 - installing Arrow (`pyarrow` is included in `arcgispro-py3`)
 - getting data from ArcGIS to Arrow
 - getting data from Arrow to ArcGIS
 

In [None]:
import pyarrow as pa
import arcpy
import os
import sys

# getting data from ArcGIS to Arrow

 - `arcpy.da.TableToArrowTable`
 - [documentation for TableToArrowTable](https://pro.arcgis.com/en/pro-app/latest/arcpy/data-access/tabletoarrowtable.htm)

In [None]:
# describe data: list each field's name and type in the counties feature class
cwd = os.getcwd()
fc = os.path.join(cwd, r"f.gdb\counties")
for field in arcpy.ListFields(fc):
    print(f"{field.name} {field.type}")

In [None]:
# inspired by arcpy.da's cursors and NumPy import/export (e.g. TableToNumPyArray)
patable = arcpy.da.TableToArrowTable(fc, ["NAME", "Shape", "POP2000"])

In [None]:
patable

In [None]:
patable.shape

In [None]:
# structs and ops: list what pyarrow.compute exposes (dunder names skipped)
import pyarrow.compute as pc

for name in dir(pc):
    if "__" in name:
        continue
    print(name)

In [None]:
pc.sum(patable["POP2000"])

# more about Arrow
 - arrow.apache.org's [pyarrow documentation](https://arrow.apache.org/docs/python/)
 - ArcGIS Pro 3.1 / Server 11.1 ship with Arrow **1.0.1**
 - ArcGIS Pro *next* / Server *next* are evaluating Arrow **11.0.0**

## pyarrow 1.0.0
 - `pyarrow.compute` has **55** members

## pyarrow 11.0.0
 - joins (`Table.join`)
 - group-by aggregation (`Table.group_by(...).aggregate(...)`)
 - `pyarrow.compute` has **334** members

In [None]:
print(sys.prefix)            # which Python environment this kernel is running in
print(f"{pa.__version__=}")  # which pyarrow version is loaded

### create env
 - a separate conda environment with pyarrow 11; arcpy is not available in it
 - `conda create -p c:\envs\arrow11 pyarrow=11.0.0 -c conda-forge`

# getting data from Arrow to ArcGIS
 - Demo data: [TLC Trip Record Data](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page)

In [None]:
import pyarrow.parquet as pq
table = pq.read_table("yellow_tripdata_2021-12.parquet")
table

In [None]:
print(table.shape)

In [None]:
# read only three columns, and only trips whose drop-off zone (DOLocationID) is 230
table = pq.read_table("yellow_tripdata_2021-12.parquet",
                      columns=["PULocationID", "DOLocationID", "fare_amount"],
                      filters=[("DOLocationID", "=", 230)])
print(table.shape)

## Arrow tables as input to arcpy geoprocessing (GP) tools
 - *as input to many row-based operations*
 - `arcpy.management.CopyRows`
 - `arcpy.management.CopyFeatures`

In [None]:
# an Arrow table can be ingested directly by GP tools
arcpy.env.overwriteOutput = True
out_tab = os.path.join(cwd, r"f.gdb\yellow_tripdata_2021_12")
arcpy.management.CopyRows(table, out_tab)  # the Arrow table is passed as in_rows

in_zones = os.path.join(cwd, r"poarrow\taxi_zones\taxi_zones.shp")
out_tab_stats = os.path.join(cwd, r"f.gdb\yellow_tripdata_2021_12_stats")

# with a more recent pyarrow (e.g. 11.0.0), this aggregation could be done in Arrow itself
arcpy.analysis.Statistics(
    in_table=out_tab,
    out_table=out_tab_stats,
    statistics_fields="PULocationID COUNT;fare_amount MIN;fare_amount MAX;fare_amount STD",
    case_field="PULocationID")

# More projects using Arrow
 - Esri
 - RAPIDS.AI
     - GPU-backed DataFrames (cuDF) from NVIDIA
 - Polars
     - "Lightning-fast DataFrame library for Rust and Python"
     - [www.pola.rs/](https://www.pola.rs/)
 - pandas 2.0
     - built on NumPy, now with an optional Arrow (pyarrow) backend
     - [datapythonista.me/blog](https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i) by Marc Garcia
 - Wes McKinney
     - [Apache Arrow and the "10 Things I Hate About pandas"](https://wesmckinney.com/blog/apache-arrow-pandas-internals/)
     - [What is a pyarrow table? Will it be a replacement for pandas dataframes?](https://stackoverflow.com/questions/52873072/what-is-a-pyarrow-table-will-it-be-a-replacement-for-pandas-dataframes)