# PyGw Showcase

This notebook demonstrates some of the utilities provided by the `pygw` Python package.

In this guide, we will show how you can use `pygw` to easily:
- **Define** a data schema for GeoTools SimpleFeature/Vector data (aka create a new data type)
- **Create** instances of the new type
- **Create** a RocksDB GeoWave Data Store
- **Register** a DataType Adapter & Index to the data store for your new data type
- **Write** user-created data into the GeoWave Data Store
- **Query** data out of the data store


In [None]:
%pip install ../../../../python/src/main/python

### Loading state capitals test data set
Load state capitals from CSV

In [1]:
import csv

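# newline="" is the csv module's recommended way to open files, so that
# line endings inside quoted fields are handled by the reader itself
with open("../../../java-api/src/main/resources/stateCapitals.csv", encoding="utf-8-sig", newline="") as f:
    reader = csv.reader(f)
    raw_data = list(reader)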
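
The `utf-8-sig` encoding used above is presumably there because the CSV file starts with a UTF-8 byte order mark (BOM). The sketch below shows the difference on a small in-memory stand-in for such a file. The `raw_bytes` sample and the `read_rows` helper are hypothetical and only serve the illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for a CSV file that was saved with a BOM.
# Encoding with "utf-8-sig" prepends the byte order mark to the bytes.
raw_bytes = "Alabama,Montgomery,-86.2460375,32.343799\n".encode("utf-8-sig")


def read_rows(data, encoding):
    """Parse CSV bytes using the given text encoding."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="") as f:
        return list(csv.reader(f))


rows_plain = read_rows(raw_bytes, "utf-8")    # BOM stays in the first field
rows_sig = read_rows(raw_bytes, "utf-8-sig")  # BOM is stripped while decoding

print(repr(rows_plain[0][0]))
print(repr(rows_sig[0][0]))
```

With plain `utf-8`, the first state name would carry an invisible `\ufeff` prefix, which would then end up in the `state_name` attribute of the first feature.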


In [2]:
# Let's take a look at what the data looks like
raw_data[0]

['Alabama',
 'Montgomery',
 '-86.2460375',
 '32.343799',
 '1846',
 '155.4',
 '205764',
 'scala']

For the purposes of this exercise, we will use the state name (`[0]`), capital name (`[1]`), longitude (`[2]`), latitude (`[3]`), and the year that the capital was established (`[4]`).
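
As a quick illustration of the column mapping described above, here is a minimal sketch that converts one raw row into a typed record. The `CapitalRecord` type and `parse_row` helper are hypothetical names introduced for this example and are not part of `pygw`. Since the data only provides a year, January 1 is assumed as a placeholder date.

```python
from datetime import datetime
from typing import NamedTuple


class CapitalRecord(NamedTuple):
    """Hypothetical typed view of the five columns used in this exercise."""
    state_name: str
    capital_name: str
    longitude: float
    latitude: float
    established: datetime


def parse_row(row):
    """Convert one raw CSV row (a list of strings) into a CapitalRecord."""
    return CapitalRecord(
        state_name=row[0],
        capital_name=row[1],
        longitude=float(row[2]),
        latitude=float(row[3]),
        # Only the year is available, so January 1 is used as a placeholder
        established=datetime(int(row[4]), 1, 1),
    )


# The first row of the data set, as printed above
sample_row = ['Alabama', 'Montgomery', '-86.2460375', '32.343799',
              '1846', '155.4', '205764', 'scala']
record = parse_row(sample_row)
print(record.capital_name, record.longitude, record.latitude, record.established.year)
```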

### Creating a new SimpleFeatureType for the state capitals data set

We can define a data schema for our data by using a `SimpleFeatureTypeBuilder` to build a `SimpleFeatureType`.

We can use the convenience methods defined in `AttributeDescriptor` to define each field of the feature type.

In [3]:
from pygw.geotools import SimpleFeatureTypeBuilder
from pygw.geotools import AttributeDescriptor

# Create the feature type builder
type_builder = SimpleFeatureTypeBuilder()
# Set the name of the feature type
type_builder.set_name("StateCapitals")
# Add the attributes
type_builder.add(AttributeDescriptor.point("location"))
type_builder.add(AttributeDescriptor.string("state_name"))
type_builder.add(AttributeDescriptor.string("capital_name"))
type_builder.add(AttributeDescriptor.date("established"))
# Build the feature type
state_capitals_type = type_builder.build_feature_type()


### Creating features for each data point using our new SimpleFeatureType

`pygw` allows you to create `SimpleFeature` instances for a `SimpleFeatureType` using a `SimpleFeatureBuilder`.

The `SimpleFeatureBuilder` allows us to specify all of the attributes of a feature, and then build it by providing a feature ID.  For this exercise, we will use the index of each row in the data as its unique feature ID.  We will use `shapely` to create the geometries for each feature.


In [4]:
from pygw.geotools import SimpleFeatureBuilder
from shapely.geometry import Point
from datetime import datetime

feature_builder = SimpleFeatureBuilder(state_capitals_type)

features = []
for idx, capital in enumerate(raw_data):
    state_name = capital[0]
    capital_name = capital[1]
    longitude = float(capital[2])
    latitude = float(capital[3])
    established = datetime(int(capital[4]), 1, 1)
    
    feature_builder.set_attr("location", Point(longitude, latitude))
    feature_builder.set_attr("state_name", state_name)
    feature_builder.set_attr("capital_name", capital_name)
    feature_builder.set_attr("established", established)
    
    feature = feature_builder.build(str(idx))
    
    features.append(feature)

### Creating a data store

Now that we have a set of `SimpleFeature` instances, let's create a data store to write them into.  `pygw` supports all of the data store types that GeoWave supports.  To create one, first construct the `DataStoreOptions` variant that defines the parameters of the data store, then pass those options to a `DataStoreFactory` to construct the `DataStore`.  In this example we will create a new RocksDB data store.

In [5]:
from pygw.store import DataStoreFactory
from pygw.store.rocksdb import RocksDBOptions

# Specify the options for the data store
options = RocksDBOptions()
options.set_geowave_namespace("geowave.example")
# NOTE: Directory is relative to the JVM working directory.
options.set_directory("./datastore")
# Create the data store
datastore = DataStoreFactory.create_data_store(options)

#### An aside: `help()`

Much of `pygw` is well-documented, and Python's built-in `help` function can be useful for figuring out what a `pygw` instance can do. Let's try it out on our data store.

In [6]:
help(datastore)

Help on DataStore in module pygw.store.data_store object:

class DataStore(pygw.base.geowave_object.GeoWaveObject)
 |  DataStore(java_ref)
 |  
 |  This class models the DataStore interface methods.
 |  
 |  Method resolution order:
 |      DataStore
 |      pygw.base.geowave_object.GeoWaveObject
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, java_ref)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  add_index(self, type_name, *indices)
 |      Add new indices for the given type. If there is data in other indices for this type, for
 |      consistency it will need to copy all of the data into the new indices, which could be a long
 |      process for lots of data.
 |      
 |      Args:
 |          type_name (str): Name of data type to register indices to.
 |          *indices (pygw.index.index.Index): Index to add.
 |  
 |  add_type(self, type_adapter, *initial_indices)
 |      Add this type to the data store. This only ne

### Adding our data to the data store

To store data into our data store, we first have to register a `DataTypeAdapter` for our simple feature data and create an index that defines how the data is queried.  GeoWave supports simple feature data through the use of a `FeatureDataAdapter`.  All that is needed for a `FeatureDataAdapter` is a `SimpleFeatureType`.  We will also add both spatial and spatial/temporal indices.

In [7]:
from pygw.geotools import FeatureDataAdapter

# Create an adapter for our feature type
state_capitals_adapter = FeatureDataAdapter(state_capitals_type)

In [8]:
from pygw.index import SpatialIndexBuilder
from pygw.index import SpatialTemporalIndexBuilder

# Add a spatial index
spatial_idx = SpatialIndexBuilder().set_name_override("spatial_idx").create_index()

# Add a spatial/temporal index
spatial_temporal_idx = SpatialTemporalIndexBuilder().set_name_override("spatial_temporal_idx").create_index()

In [9]:
# Now we can add our type to the data store with our spatial and spatial/temporal indices
datastore.add_type(state_capitals_adapter, spatial_idx, spatial_temporal_idx)

In [10]:
# Check that we've successfully registered an index and type
registered_types = datastore.get_types()

for t in registered_types:
    print(t.get_type_name())

StateCapitals


In [11]:
registered_indices = datastore.get_indices(state_capitals_adapter.get_type_name())

for i in registered_indices:
    print(i.get_name())

spatial_idx
spatial_temporal_idx


### Writing data to our store
Now our data store is ready to receive our feature data.  To do this, we must create a `Writer` for our data type.

In [12]:
# Create a writer for our data
writer = datastore.create_writer(state_capitals_adapter.get_type_name())

In [13]:
# Writing data to the data store
for ft in features:
    writer.write(ft)

In [14]:
# Close the writer when we are done with it
writer.close()
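
Because the writer is created, used, and closed in separate cells, an error during writing would leave it open. One way to guard against that is `contextlib.closing`, which calls `close()` on any object when the block exits. The sketch below uses a hypothetical `RecordingWriter` stand-in so that it runs without a data store. It assumes only that the real writer exposes `write()` and `close()`, as used above.

```python
from contextlib import closing


class RecordingWriter:
    """Hypothetical stand-in for a data store writer (not a pygw class)."""

    def __init__(self):
        self.written = []
        self.closed = False

    def write(self, item):
        if self.closed:
            raise ValueError("writer is already closed")
        self.written.append(item)

    def close(self):
        self.closed = True


def write_all(writer, items):
    """Write every item, closing the writer even if a write fails."""
    with closing(writer) as w:
        for item in items:
            w.write(item)


demo_writer = RecordingWriter()
write_all(demo_writer, ["feature-0", "feature-1"])
print(demo_writer.written, demo_writer.closed)
```

With the real objects from this notebook, the equivalent call would be `write_all(datastore.create_writer(state_capitals_adapter.get_type_name()), features)`.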

### Querying our store to make sure the data was ingested properly
`pygw` supports querying data in the same fashion as the Java API.  You can use a `VectorQueryBuilder` to create queries on simple feature data sets.  We will use one now to query all of the state capitals in the data store.

In [15]:
from pygw.query import VectorQueryBuilder

# Create the query builder
query_builder = VectorQueryBuilder()

# When you don't supply any constraints to the query builder, everything will be queried
query = query_builder.build()

# Execute the query
results = datastore.query(query)

The result returned above is a closeable iterator of `SimpleFeature` objects.  Let's define a function that prints some information about each feature and then closes the iterator when we are finished with it.

In [16]:
def print_results(results):
    try:
        for result in results:
            capital_name = result.get_attribute("capital_name")
            state_name = result.get_attribute("state_name")
            established = result.get_attribute("established")
            print("{}, {} was established in {}".format(capital_name, state_name, established.year))
    finally:
        # Close the iterator, even if printing fails part-way through
        results.close()

In [17]:
# Print the results
print_results(results)

Honolulu, Hawaii was established in 1845
Phoenix, Arizona was established in 1889
Baton Rouge, Louisiana was established in 1880
Jackson, Mississippi was established in 1821
Austin, Texas was established in 1839
Topeka, Kansas was established in 1856
Oklahoma City, Oklahoma was established in 1910
Little Rock, Arkansas was established in 1821
Jefferson City, Missouri was established in 1826
Des Moines, Iowa was established in 1857
Saint Paul, Minnesota was established in 1849
Lincoln, Nebraska was established in 1867
Pierre, South Dakota was established in 1889
Cheyenne, Wyoming was established in 1869
Denver, Colorado was established in 1867
Santa Fe, New Mexico was established in 1610
Salt Lake City, Utah was established in 1858
Boise, Idaho was established in 1865
Salem, Oregon was established in 1855
Carson City, Nevada was established in 1861
Sacramento, California was established in 1854
Juneau, Alaska was established in 1906
Olympia, Washington was established in 1853
Helena, Mo

### Constraining the results
Querying all of the data can be useful occasionally, but most of the time we will want to filter the data to only return results that we are interested in.  `pygw` supports several types of constraints to make querying data as flexible as possible.

#### CQL Constraints
One way you might want to query the data is using a simple CQL query.

In [18]:
# A CQL expression for capitals that are in the northeastern part of the US
cql_expression = "BBOX(location, -87.83,36.64,-66.74,48.44)"
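
The arguments of `BBOX` are the geometry attribute followed by the minimum x, minimum y, maximum x, and maximum y of the box, which here means west, south, east, and north in degrees. The following minimal sketch builds the same expression from named bounds and applies the equivalent test in plain Python. The `bbox_expression` and `in_bbox` helpers are hypothetical, and the Boston coordinates are approximate values that are not taken from the data set.

```python
def bbox_expression(attribute, west, south, east, north):
    """Build a CQL BBOX expression for the given attribute and bounds."""
    return f"BBOX({attribute}, {west},{south},{east},{north})"


def in_bbox(lon, lat, west, south, east, north):
    """Plain-Python version of the test that the BBOX expression describes."""
    return west <= lon <= east and south <= lat <= north


# (west, south, east, north) of the box used in this example
bounds = (-87.83, 36.64, -66.74, 48.44)
expression = bbox_expression("location", *bounds)
print(expression)

# Montgomery, Alabama (from the first row of the data) lies south of the box
print(in_bbox(-86.2460375, 32.343799, *bounds))
# Boston, Massachusetts (approximate coordinates) lies inside it
print(in_bbox(-71.06, 42.36, *bounds))
```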

In [19]:
# Create the query builder
query_builder = VectorQueryBuilder()
query_builder.add_type_name(state_capitals_adapter.get_type_name())

# If we want, we can tell the query builder to use the spatial index, since we aren't using time
query_builder.index_name(spatial_idx.get_name())

# Get the constraints factory
constraints_factory = query_builder.constraints_factory()
# Create the cql constraints
constraints = constraints_factory.cql_constraints(cql_expression)

# Set the constraints and build the query
query = query_builder.constraints(constraints).build()
# Execute the query
results = datastore.query(query)

In [20]:
# Display the results
print_results(results)

Augusta, Maine was established in 1832
Montpelier, Vermont was established in 1805
Boston, Massachusetts was established in 1630
Concord, New Hampshire was established in 1808
Providence, Rhode Island was established in 1900
Hartford, Connecticut was established in 1875
Dover, Delaware was established in 1777
Richmond, Virginia was established in 1780
Annapolis, Maryland was established in 1694
Harrisburg, Pennsylvania was established in 1812
Trenton, New Jersey was established in 1784
Albany, New York was established in 1797
Columbus, Ohio was established in 1816
Lansing, Michigan was established in 1847
Indianapolis, Indiana was established in 1825
Frankfort, Kentucky was established in 1792
Charleston, West Virginia was established in 1885


#### Spatial/Temporal Constraints
You may also want to constrain the data by both space and time using the `SpatialTemporalConstraintsBuilder`.  For this example, we will query all capitals that were established after 1800 within 10 degrees of Washington DC.

In [21]:
# Create the query builder
query_builder = VectorQueryBuilder()
query_builder.add_type_name(state_capitals_adapter.get_type_name())

# We can tell the builder to use the spatial/temporal index
query_builder.index_name(spatial_temporal_idx.get_name())

# Get the constraints factory
constraints_factory = query_builder.constraints_factory()
# Create the spatial/temporal constraints builder
constraints_builder = constraints_factory.spatial_temporal_constraints()
# Create the spatial constraint geometry.
washington_dc_buffer = Point(-77.035, 38.894).buffer(10.0)
# Set the spatial constraint
constraints_builder.spatial_constraints(washington_dc_buffer)
# Set the temporal constraint
constraints_builder.add_time_range(datetime(1800, 1, 1), datetime.now())
# Build the constraints
constraints = constraints_builder.build()

# Set the constraints and build the query
query = query_builder.constraints(constraints).build()
# Execute the query
results = datastore.query(query)

In [22]:
# Display the results
print_results(results)

Harrisburg, Pennsylvania was established in 1812
Columbus, Ohio was established in 1816
Indianapolis, Indiana was established in 1825
Montpelier, Vermont was established in 1805
Concord, New Hampshire was established in 1808
Providence, Rhode Island was established in 1900
Hartford, Connecticut was established in 1875
Charleston, West Virginia was established in 1885
Atlanta, Georgia was established in 1868
Augusta, Maine was established in 1832
Lansing, Michigan was established in 1847
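
Note that `buffer(10.0)` is measured in coordinate units, so the search area is a circle of roughly 10 degrees around Washington DC rather than a fixed ground distance. The sketch below approximates the same test in plain Python by treating longitude and latitude as planar coordinates. It is a rough illustration and not how GeoWave evaluates the query. The `roughly_matches` helper is hypothetical, and the Atlanta and Richmond coordinates are approximate values that are not taken from the data set.

```python
import math
from datetime import datetime

WASHINGTON_DC = (-77.035, 38.894)  # (longitude, latitude), as used above


def roughly_matches(lon, lat, established, radius_deg=10.0,
                    start=datetime(1800, 1, 1), end=None):
    """Approximate the spatial/temporal constraint in plain Python."""
    end = end or datetime.now()
    # Planar distance in degrees, mirroring a buffer in coordinate units
    distance_deg = math.hypot(lon - WASHINGTON_DC[0], lat - WASHINGTON_DC[1])
    return distance_deg <= radius_deg and start <= established <= end


# Montgomery (first row of the data): more than 10 degrees away
print(roughly_matches(-86.2460375, 32.343799, datetime(1846, 1, 1)))
# Atlanta (approximate coordinates): inside the circle, established 1868
print(roughly_matches(-84.39, 33.75, datetime(1868, 1, 1)))
# Richmond (approximate coordinates): close by, but established in 1780
print(roughly_matches(-77.44, 37.54, datetime(1780, 1, 1)))
```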


#### Filter Factory Constraints
We can also use the `FilterFactory` to create more complicated filters.  For example, suppose we want to find all of the capitals within 500 miles of Washington DC that contain the letter L and were established after 1830.

In [23]:
from pygw.query import FilterFactory

# Create the filter factory
filter_factory = FilterFactory()

# Create a filter that passes when the capital location is within 500 miles of the
# literal location of Washington DC
location_prop = filter_factory.property("location")
washington_dc_lit = filter_factory.literal(Point(-77.035, 38.894))
distance_km = 500 * 1.609344 # Convert miles to kilometers
distance_filter = filter_factory.dwithin(location_prop, washington_dc_lit, distance_km, "kilometers")

# Create a filter that passes when the capital name contains the letter L.
capital_name_prop = filter_factory.property("capital_name")
name_filter = filter_factory.like(capital_name_prop, "*l*")

# Create a filter that passes when the established date is after 1830
established_prop = filter_factory.property("established")
date_lit = filter_factory.literal(datetime(1830, 1, 1))
date_filter = filter_factory.after(established_prop, date_lit)

# Combine the name, distance, and date filters
combined_filter = filter_factory.and_([distance_filter, name_filter, date_filter])

# Create the query builder
query_builder = VectorQueryBuilder()
query_builder.add_type_name(state_capitals_adapter.get_type_name())

# Get the constraints factory
constraints_factory = query_builder.constraints_factory()
# Create the filter constraints
constraints = constraints_factory.filter_constraints(combined_filter)

# Set the constraints and build the query
query = query_builder.constraints(constraints).build()
# Execute the query
results = datastore.query(query)

In [24]:
# Display the results
print_results(results)

Lansing, Michigan was established in 1847
Atlanta, Georgia was established in 1868
Charleston, West Virginia was established in 1885
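
The results above include Lansing, whose only L is uppercase, which suggests that the `like` filter matched the `*l*` pattern without regard to case. A minimal sketch of that matching behavior using the standard library is shown below. The `like_ignore_case` helper is hypothetical and only approximates the filter semantics by treating `*` as a wildcard for any run of characters.

```python
from fnmatch import fnmatchcase


def like_ignore_case(value, pattern):
    """Case-insensitive wildcard match, where * matches any run of characters."""
    return fnmatchcase(value.lower(), pattern.lower())


for name in ["Lansing", "Atlanta", "Charleston", "Boston", "Richmond"]:
    print(name, like_ignore_case(name, "*l*"))
```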
