# GBDX Vector Datasets

This short tutorial covers the use of various GBDX vector datasets. For examples of querying external resources to pull in vector data, see this [notebook](https://notebooks.geobigdata.io/hub/notebooks/5bd884c1f2eb065c15f00e56). We recommend completing the Postman setup [notebook](https://notebooks.geobigdata.io/hub/notebooks/5bae0c65e9c92b5d7a00f174) before this tutorial if you wish to run the API calls shown here.

## Vector Services

Vector search lets users query a large volume of vector data quickly by location, keyword, date/time, and other fields. The API provides access to a wide variety of vector data, including map data and social media records.

### Endpoints

- **General Vector Services** - for use with any application.

- **ESRI Vector Services** - endpoints designed to work best with ESRI software such as ArcMap.

- **Vector Write** - publish local shapefile or JSON data to the Vector Services index for later use or for sharing with other users.

- **Vector Aggregations** - query large areas by location, keyword, or time to narrow a search for a specific dataset. Also enables analytics such as change detection.

- **Facets** - discover which values are available for a set of vector fields in a given geographic area.

- **Export Service** - export vector datasets to an Amazon S3 bucket for download.

### List Fields

The following API call will retrieve a list of fields that we can use to query Vector Services:

`https://vector.geobigdata.io/insight-vector/api/facets/fields`

Results:

```json
[
    "item_date",
    "ingest_date",
    "ingest_source",
    "source",
    "item_type",
    "format",
    "text",
    "name",
    "geom_type"
]
```

### List Vector Sources

The following API call will retrieve a list of the vector sources available for an area of interest:

`https://vector.geobigdata.io/insight-vector/api/vectors/sources?left=12.4008&right=12.692&upper=55.7758&lower=55.5488`

Results:

```json
{
    "data": [
        {
            "name": "OSM",
            "count": 378725
        },
        {
            "name": "Twitter",
            "count": 120336
        },
        {
            "name": "GDELT",
            "count": 115289
        },
        {
            "name": "GBDX_INGEST_ALPHA",
            "count": 687
        },
        {
            "name": "Gazetteer",
            "count": 390
        },
        {
            "name": "VEDA",
            "count": 5
        },
        {
            "name": "admin1-states-provinces",
            "count": 1
        }
    ],
    "shards": 2557
}
```
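
As a minimal sketch, the request URL above can be assembled from a bounding box and the response summarized per source. The helper names `sources_url` and `share_by_source` are hypothetical, the sample payload is abbreviated from the results shown above, and the HTTP request itself (which requires GBDX authentication) is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://vector.geobigdata.io/insight-vector/api/vectors"


def sources_url(left, lower, right, upper):
    """Build the 'list vector sources' URL for a bounding box (hypothetical helper)."""
    params = {"left": left, "right": right, "upper": upper, "lower": lower}
    return BASE_URL + "/sources?" + urlencode(params)


def share_by_source(response):
    """Return {source name: fraction of all features} from a sources response."""
    total = sum(item["count"] for item in response["data"])
    return {item["name"]: item["count"] / total for item in response["data"]}


# Abbreviated copy of the response shown above
sample = {
    "data": [
        {"name": "OSM", "count": 378725},
        {"name": "Twitter", "count": 120336},
        {"name": "GDELT", "count": 115289},
    ],
    "shards": 2557,
}

url = sources_url(left=12.4008, lower=55.5488, right=12.692, upper=55.7758)
shares = share_by_source(sample)
print(url)
print(max(shares, key=shares.get))  # source with the most features
```

Swapping `/sources` for `/OSM/types` in the same pattern would give the URL used in the next section.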

### List Vector Types

The following API call will retrieve a list of the OSM vector types available for an area of interest. For other data sources simply swap "OSM" in the URL to "Twitter", "Gazetteer", etc.

`https://vector.geobigdata.io/insight-vector/api/vectors/OSM/types?left=12.4008&right=12.692&upper=55.7758&lower=55.5488`

Results:

```json
{
    "data": [
        {
            "name": "Location",
            "count": 204968
        },
        {
            "name": "Building",
            "count": 56310
        },
        {
            "name": "Road",
            "count": 30076
        },
        {
            "name": "Pedestrian",
            "count": 16399
        },
        {
            "name": "Uncategorized",
            "count": 7833
        },
        {
            "name": "Barrier (Fence)",
            "count": 5142
        },
        {
            "name": "House",
            "count": 4689
        },
        {
            "name": "Parking",
            "count": 3882
        },
        {
            "name": "Tree",
            "count": 3472
        },
        {
            "name": "Transportation - Uncategorized",
            "count": 2560
        }
    ],
    "shards": 5961
}
```

From the information above we can format a query to retrieve data from GBDX.
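
The queries used in the rest of this tutorial follow a `field:value AND field:value` pattern, with field names taken from the List Fields call and values from the source and type listings. A minimal sketch of composing such a string is shown below. The `build_query` helper is hypothetical (it is not part of gbdxtools), and quoting values that contain whitespace is an assumption based on the `item_type:"Cell Towers"` query used later.

```python
def build_query(**fields):
    """Join field:value pairs with AND; quote values containing whitespace.

    Hypothetical helper, not part of gbdxtools.
    """
    clauses = []
    for field, value in fields.items():
        value = str(value)
        if any(ch.isspace() for ch in value):
            value = '"' + value + '"'
        clauses.append(field + ":" + value)
    return " AND ".join(clauses)


query = build_query(ingest_source="OSM", geom_type="Polygon", item_type="Building")
print(query)
```

The resulting string could then be passed as the `query` argument of `gbdx.vectors.query`, as in the cells that follow.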

In [1]:
# Install dependencies
!pip install -qq folium==0.7.0
!pip install -qq python-geohash
!pip install -qq geojson
!pip install -qq pandas

import time
import json
import folium
import geohash
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize

from decimal import Decimal
from shapely.geometry import box
from shapely.geometry import shape
from geojson import Polygon, Feature, FeatureCollection
from gbdxtools import Interface

gbdx = Interface()

## Datasets

### Gazetteer

USGS and NGA GeoNames data. Coverage is global and the data is static.

The code block below demonstrates how to query GBDX for gazetteer data and display the results on a web map. Click on a point on the map to display the name and type attributes.

In [2]:
# define the area of interest
bbox = [12.557158470153809,55.67826194312847,12.57406711578369,55.686972023983884]
aoi = box(*bbox)
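
One detail that is easy to get wrong: the bounding box is ordered `[min_lon, min_lat, max_lon, max_lat]` (the argument order of shapely's `box`), while `folium.Map` expects its `location` as `[lat, lon]`. The sketch below derives a map center from the bounding box; `bbox_center` is a hypothetical helper for illustration.

```python
def bbox_center(bbox):
    """Return [lat, lon] of the center of a [min_lon, min_lat, max_lon, max_lat] box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [(min_lat + max_lat) / 2, (min_lon + max_lon) / 2]


bbox = [12.557158470153809, 55.67826194312847, 12.57406711578369, 55.686972023983884]
center = bbox_center(bbox)
print([round(v, 3) for v in center])
```

The result lands close to the hard-coded `location` used for the map in the next cell.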

In [3]:
start_time = time.time()
query = gbdx.vectors.query(aoi.wkt, query='ingest_source:Gazetteer',
                              index='vector-gazetteer*', count=500)
print('Runtime: ', round(Decimal(time.time() - start_time), 4),'s')

# build geojson feature collection
data = {
    'type': 'FeatureCollection',
    'features': query
}

m = folium.Map(
    location=[55.682, 12.568],
    zoom_start=16
)

fg = folium.FeatureGroup(name='Gazetteer')

# add features to folium feature group
for feature in data['features']:
    lon = feature['geometry']['coordinates'][0]
    lat = feature['geometry']['coordinates'][1]
    fname = feature['properties']['attributes']['fname']
    ftype = feature['properties']['attributes']['ftype']
    fg.add_child(folium.Marker(location=[lat,lon], popup=(folium.Popup('NAME: ' + fname + ', TYPE: ' + ftype))))
    
m.add_child(fg)

m

Runtime:  0.1068 s


We can then save the query results to our workspace with the following cell. After saving a GeoJSON file, we can download it by clicking the Jupyter icon at the top of the notebook to open the file browser and selecting the file.

In [4]:
with open ('gazetteer_query.geojson', 'w') as output:
    json.dump(data, output)
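
To confirm that the file was written correctly, it can be read back and compared with the original dictionary. The sketch below uses a stand-in `data` dictionary with a single made-up feature and writes to a temporary directory, so it can run without a GBDX query; it assumes the query results are plain JSON-serializable dictionaries, as the `json.dump` call above implies.

```python
import json
import os
import tempfile

# Stand-in for the `data` dictionary built above (one hypothetical feature)
data = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [12.568, 55.682]},
            "properties": {"attributes": {"fname": "Example", "ftype": "Park"}},
        }
    ],
}

path = os.path.join(tempfile.mkdtemp(), "gazetteer_query.geojson")
with open(path, "w") as output:
    json.dump(data, output)

# Read the file back and confirm nothing was lost
with open(path) as saved:
    reloaded = json.load(saved)

print(reloaded["type"], len(reloaded["features"]))
```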

### ACLED

The Armed Conflict Location & Event Data Project (ACLED) is a disaggregated conflict collection, analysis and crisis mapping project. ACLED collects the dates, actors, types of violence, locations, and fatalities of all reported political violence and protest events across Africa, South Asia, South East Asia, the Middle East, Europe, and Latin America. Political violence and protest includes events that occur within civil wars and periods of instability, public protest and regime breakdown. ACLED's aim is to capture the forms, actors, dates and locations of political violence and protest as it occurs across states. The ACLED team conducts analysis to describe, explore and test conflict scenarios, and makes both data and analysis freely available for public use.

The code block below demonstrates how to query GBDX for ACLED data and display the results on a web map. Click on a point on the map to display its event type.

In [5]:
# define the area of interest
bbox = [10.107421874999998,2.855262784366583,14.3701171875,5.922044619883305]
aoi = box(*bbox)

In [6]:
start_time = time.time()
query = gbdx.vectors.query(aoi.wkt, query='ingest_source:ACLED',
                              index='vector-acled', count=100)
print('Runtime: ', round(Decimal(time.time() - start_time), 4),'s')

# build geojson feature collection
data = {
    'type': 'FeatureCollection',
    'features': query
}

m = folium.Map(
    location=[4.5, 12],
    zoom_start=8
)

fg = folium.FeatureGroup(name='ACLED')

# add features to folium feature group
for feature in data['features']:
    lon = feature['geometry']['coordinates'][0]
    lat = feature['geometry']['coordinates'][1]
    item_type = feature['properties']['item_type'][0]
    fg.add_child(folium.Marker(location=[lat,lon], popup=(folium.Popup(item_type))))
    
m.add_child(fg)

m

Runtime:  0.0832 s
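
Beyond mapping, the same results can be summarized by event type. The sketch below counts the first `item_type` entry of each feature, mirroring the `['item_type'][0]` access in the cell above. The sample features and their type names are made up for illustration; real results would come from `gbdx.vectors.query`.

```python
from collections import Counter

# Hypothetical features shaped like the ACLED results used above,
# where properties.item_type is a list of strings
features = [
    {"properties": {"item_type": ["Protests"]}},
    {"properties": {"item_type": ["Battles"]}},
    {"properties": {"item_type": ["Protests"]}},
    {"properties": {"item_type": []}},
]


def count_item_types(features):
    """Count the first item_type of each feature, skipping empty lists."""
    counts = Counter()
    for feature in features:
        item_types = feature["properties"].get("item_type") or []
        if item_types:
            counts[item_types[0]] += 1
    return counts


counts = count_item_types(features)
print(counts.most_common())
```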


### GDELT

Supported by Google Jigsaw, the GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

The code block below demonstrates how to query GBDX for GDELT data and display the results on a web map. Click on a point on the map to display attributes.

In [7]:
# define the area of interest
bbox = [10.107421874999998,2.855262784366583,14.3701171875,5.922044619883305]
aoi = box(*bbox)

In [8]:
start_time = time.time()
query = gbdx.vectors.query(aoi.wkt, query='ingest_source:GDELT',
                              index='vector-gdelt-*', count=100)
print('Runtime: ', round(Decimal(time.time() - start_time), 4),'s')

# build geojson feature collection
data = {
    'type': 'FeatureCollection',
    'features': query
}

m = folium.Map(
    location=[4.5, 12],
    zoom_start=8
)

fg = folium.FeatureGroup(name='GDELT')

# add features to folium feature group
for feature in data['features']:
    lon = feature['geometry']['coordinates'][0]
    lat = feature['geometry']['coordinates'][1]
    dte = feature['properties']['item_date']
    des = feature['properties']['attributes']['CAMEOCodeDescription']
    fg.add_child(folium.Marker(location=[lat,lon], popup=(folium.Popup('DESC:' + des + ', DATE: ' + dte))))
    
m.add_child(fg)

m

Runtime:  0.2133 s
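
The popup text above is built by string concatenation, which raises an error if an attribute is missing or `None`. A more defensive variant might look like the sketch below; `popup_text` is a hypothetical helper, the `"n/a"` placeholder is an arbitrary choice, and the sample features are made up.

```python
def popup_text(feature):
    """Build 'DESC: ..., DATE: ...' text, tolerating missing values."""
    props = feature.get("properties", {})
    attrs = props.get("attributes") or {}
    des = attrs.get("CAMEOCodeDescription") or "n/a"
    dte = props.get("item_date") or "n/a"
    return "DESC: {}, DATE: {}".format(des, dte)


# Hypothetical features: one complete, one with no attributes
complete = {
    "properties": {
        "item_date": "2018-10-01T00:00:00Z",
        "attributes": {"CAMEOCodeDescription": "Make statement"},
    }
}
sparse = {"properties": {"item_date": None}}

print(popup_text(complete))
print(popup_text(sparse))
```

The returned string could be passed to `folium.Popup` in place of the concatenated expression.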


### HGIS

Human Landscape, DigitalGlobe | Radiant's Human Geography Information Surveys, are comprehensive geodatabases with rich attribution and metadata, detailing core human geography themes across Country and Metro scale, in analysis ready format. Leveraging DigitalGlobe high-resolution imagery significantly enriches publicly available data sources, resulting in DigitalGlobe unique surveys that enable analysts to develop responses to a wide range of complex geospatial taskings, effectively reducing operating costs and accelerating time-to-mission.

The code block below demonstrates how to query GBDX for HGIS cell tower points and display the results on a web map. Click on a point on the map to display its operator and network attributes.

In [9]:
# define the area of interest
bbox = [-5.450094528639852,9.74640175912148,2.547952346360148,15.40548120703602]
aoi = box(*bbox)

In [10]:
start_time = time.time()

# query for cell towers
query = gbdx.vectors.query(aoi.wkt, query='ingest_source:HGIS 2.0 AND geom_type:Point AND item_type:"Cell Towers"',
                              index='vector-hgis2-burkinafaso', count=500)
print('Runtime: ', round(Decimal(time.time() - start_time), 4),'s')

# build geojson feature collection
data = {
    'type': 'FeatureCollection',
    'features': query
}

m = folium.Map(
    location=[11.5, -2.5],
    zoom_start=8
)

fg = folium.FeatureGroup(name='HGIS')

# add features to folium feature group
for feature in data['features']:
    feat = folium.GeoJson(feature['geometry'])
    feat.add_child(folium.Popup( 'OPERATOR: ' + str(feature['properties']['attributes']['OPERATOR']) + 
                                 ' NETWORK: '  + str(feature['properties']['attributes']['NETWORK'])
                               ))
    fg.add_child(feat)
    
m.add_child(fg)

m

Runtime:  0.0674 s


Let's run an aggregation query to see full coverage of HGIS data.

In [11]:
bbox = [-180, -90, 180, 90]

aoi = box(*bbox).wkt

agg = 'geohash:3;terms:item_type'
query = 'ingest_source:HGIS 2.0'
features = gbdx.vectors.aggregate_query(aoi, agg, query, index='vector-hgis2-*', count=1000000)

test_list = []

for i in features:
    for j in i['terms']:
        # each term is a geohash cell with its feature count
        count = j['count']
        geo_id = j['term']
        geo_dic = geohash.bbox(geo_id)
        # build a fresh closed ring (first vertex repeated) for this cell
        ring = [(geo_dic['e'], geo_dic['n']),
                (geo_dic['w'], geo_dic['n']),
                (geo_dic['w'], geo_dic['s']),
                (geo_dic['e'], geo_dic['s']),
                (geo_dic['e'], geo_dic['n'])]
        geo_feature = Feature(geometry=Polygon([ring]))
        geo_feature['properties'] = {'count': count, 'geohash': geo_id}
        test_list.append(geo_feature)
        
feature_collection = FeatureCollection(test_list)
fc_copy = feature_collection
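
To make the geohash step less opaque, here is a pure-Python sketch of what the loop above relies on: decoding a geohash string into its bounding box and turning that box into a GeoJSON polygon. `geohash_bbox` is a stand-in for `geohash.bbox`, written under the assumption of the standard geohash encoding (base-32 characters, bits alternating between longitude and latitude), and `cell_feature` is a hypothetical helper that returns a plain dictionary rather than a `geojson.Feature`.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bbox(code):
    """Decode a geohash into a dict with 's', 'w', 'n', 'e' edges (degrees)."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate, starting with longitude
    for char in code:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            is_lon = not is_lon
    return {"s": lat[0], "w": lon[0], "n": lat[1], "e": lon[1]}


def cell_feature(code, count):
    """Build a GeoJSON polygon feature (plain dict) for one geohash cell."""
    b = geohash_bbox(code)
    ring = [
        [b["e"], b["n"]],
        [b["w"], b["n"]],
        [b["w"], b["s"]],
        [b["e"], b["s"]],
        [b["e"], b["n"]],  # repeat the first vertex to close the ring
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"count": count, "geohash": code},
    }


cell = geohash_bbox("u3b")
print(cell)  # {'s': 54.84375, 'w': 11.25, 'n': 56.25, 'e': 12.65625}
```

At precision 3, each cell spans 1.40625 degrees in both directions, which is why `geohash:3` gives a coarse global grid. The cell `u3b` contains the Copenhagen area of interest used earlier in this tutorial.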

In [12]:
import pandas as pd
from pandas.io.json import json_normalize
import geopandas as gpd

df = pd.DataFrame(json_normalize(fc_copy['features']))

m = folium.Map(location=[0, 0], zoom_start=2)

folium.Choropleth(
    geo_data=feature_collection,
    name='choropleth',
    data=df,
    columns=['properties.geohash', 'properties.count'],
    key_on='properties.geohash',
    fill_color='RdYlBu',
    fill_opacity=0.6,
    line_opacity=0.3,
    legend_name='Geohash Agg',
    highlight=True
    
).add_to(m)


folium.LayerControl().add_to(m)
m

### OpenStreetMap

OpenStreetMap data, global coverage, last update 02/2018.

The code block below demonstrates how to query GBDX for OSM building polygon data and display the results on a web map. To query Point or Linestring geometries instead, change the geom_type value in the query. Click on a building polygon on the map to display its name, if available.

**Note:** we set the count value to 500, so the results are a subset of the data actually available.

In [13]:
# define the area of interest
bbox = [12.557158470153809,55.67826194312847,12.57406711578369,55.686972023983884]
aoi = box(*bbox)

In [14]:
start_time = time.time()
query = gbdx.vectors.query(aoi.wkt, query='ingest_source:OSM AND geom_type:Polygon AND item_type:Building',
                              index='vector-osm-*', count=500)
print('Runtime: ', round(Decimal(time.time() - start_time), 4),'s')

# build geojson feature collection
data = {
    'type': 'FeatureCollection',
    'features': query
}

m = folium.Map(
    location=[55.682, 12.568],
    zoom_start=16
)

fg = folium.FeatureGroup(name='Openstreetmap')

# add features to folium feature group
for feature in data['features']:
    feat = folium.GeoJson(feature['geometry'])
    feat.add_child(folium.Popup('NAME: ' + str(feature['properties']['name'])))
    fg.add_child(feat)
    
m.add_child(fg)

m

Runtime:  0.4774 s


We can then save the query results to our workspace with the following cell. After saving a GeoJSON file, we can download it by clicking the Jupyter icon at the top of the notebook to open the file browser and selecting the file.

In [15]:
with open ('openstreetmap_query.geojson', 'w') as output:
    json.dump(data, output)

### Twitter

Twitter data, global coverage.

The code block below demonstrates how to query GBDX for tweets and display the results as a heat map.

**Note:** we set the count value to 500, so the results are a subset of the data actually available.

See [here](https://gbdxdocs.digitalglobe.com/docs/twitter-attributes) for a list of attributes that may be queried.

In [16]:
start_time = time.time()
query = gbdx.vectors.query(aoi.wkt, query='ingest_source:Twitter AND item_type:tweet',
                              index='vector-*', count=500)
print('Runtime: ', round(Decimal(time.time() - start_time), 4),'s')

Runtime:  5.6208 s


In [17]:
tweet_centroids = []
for tweet in query:
    tweet_geom = shape(tweet['geometry'])
    tweet_centroids.append((tweet_geom.centroid.coords[0][1],tweet_geom.centroid.coords[0][0]))
    
from folium import plugins
from folium.plugins import HeatMap

m = folium.Map(location=[55.682, 12.568],
                    zoom_start = 15) 

# Plot it on the map
HeatMap(tweet_centroids).add_to(m)

m
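
The cell above swaps each centroid from GeoJSON order (lon, lat) to the (lat, lon) order that `HeatMap` expects. For readers without shapely, the sketch below shows the same idea in pure Python for Point and Polygon geometries. The helper names are hypothetical, and for polygons only the exterior ring is used, so results may differ from shapely's centroid when a polygon has holes.

```python
def ring_centroid(ring):
    """Area-weighted centroid (lon, lat) of a closed ring via the shoelace formula."""
    area2 = cx = cy = 0.0
    for a, b in zip(ring, ring[1:]):
        cross = a[0] * b[1] - b[0] * a[1]
        area2 += cross
        cx += (a[0] + b[0]) * cross
        cy += (a[1] + b[1]) * cross
    if area2 == 0:  # degenerate ring: fall back to the vertex mean
        pts = ring[:-1] or ring
        return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    return (cx / (3 * area2), cy / (3 * area2))


def heatmap_point(geometry):
    """Return (lat, lon) for a GeoJSON Point or Polygon geometry."""
    if geometry["type"] == "Point":
        lon, lat = geometry["coordinates"][:2]
    elif geometry["type"] == "Polygon":
        lon, lat = ring_centroid(geometry["coordinates"][0])
    else:
        raise ValueError("unsupported geometry type: " + geometry["type"])
    return (lat, lon)


point = {"type": "Point", "coordinates": [12.568, 55.682]}
square = {
    "type": "Polygon",
    "coordinates": [[[12, 55], [13, 55], [13, 56], [12, 56], [12, 55]]],
}
print(heatmap_point(point))
print(heatmap_point(square))
```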

The next cell demonstrates a query to aggregate all tweets within the last day by geohash.

In [18]:
bbox = [-180, -90, 180, 90]
aoi = box(*bbox)

agg = 'geohash:3;terms:item_type'
query = 'ingest_source:Twitter AND item_type:tweet AND item_date:[now-24h TO now]'
features = gbdx.vectors.aggregate_query(aoi.wkt, agg, query, index='vector-*', count=1000)

test_list = []

for i in features:
    for j in i['terms']:
        # each term is a geohash cell with its tweet count
        count = j['count']
        geo_id = j['term']
        geo_dic = geohash.bbox(geo_id)
        # build a fresh closed ring (first vertex repeated) for this cell
        ring = [(geo_dic['e'], geo_dic['n']),
                (geo_dic['w'], geo_dic['n']),
                (geo_dic['w'], geo_dic['s']),
                (geo_dic['e'], geo_dic['s']),
                (geo_dic['e'], geo_dic['n'])]
        geo_feature = Feature(geometry=Polygon([ring]))
        geo_feature['properties'] = {'count': count, 'geohash': geo_id}
        test_list.append(geo_feature)
        
count_max = 0
total_tweets = 0
for bucket in test_list:
    count = bucket['properties']['count']
    if count > count_max:
        count_max = count
    total_tweets = total_tweets + count
    
print('Total tweets:', "{:,}".format(total_tweets))
print('Max tweet count in a single cell:', "{:,}".format(count_max))

#create a color bar legend
num_levels = 5

# choose "linear" or "log" format for the color bar spacing
legend_format = 'log'

if legend_format == 'linear':
    
    levels = np.arange(num_levels+1)*(round(count_max,-3)+1000)/num_levels
    
elif legend_format == 'log':
    
    levels = (np.logspace(0,3,5,base=2)-1)
    levels = levels/np.max(levels)*(round(count_max,-3)+1000)

levels = [int(i) for i in levels]

feature_collection = FeatureCollection(test_list)
fc_copy = feature_collection

df = pd.DataFrame(json_normalize(fc_copy['features']))

m = folium.Map(location=[0, 0], zoom_start=2)

folium.Choropleth(
    geo_data=feature_collection,
    name='choropleth',
    data=df,
    columns=['properties.geohash', 'properties.count'],
    key_on='properties.geohash',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.1,
    legend_name='Tweets',
    threshold_scale=levels,
    highlight=True
    
).add_to(m)

folium.LayerControl().add_to(m)

m

Total tweets: 3,727,797
Max tweet count in a single cell: 180,332
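
The color bar break points in the cell above can be easier to follow with concrete numbers. The sketch below reproduces the linear and log spacing in pure Python (without numpy) for the maximum count printed above. `legend_levels` is a hypothetical helper; the `log_points` parameter stands in for the literal `5` passed to `np.logspace` in the original cell.

```python
def legend_levels(count_max, legend_format="log", num_levels=5, log_points=5):
    """Return ascending integer break points from 0 up to a padded maximum."""
    top = round(count_max, -3) + 1000  # round to thousands, then pad by 1000
    if legend_format == "linear":
        raw = [i * top / num_levels for i in range(num_levels + 1)]
    elif legend_format == "log":
        # exponents evenly spaced from 0 to 3, as in np.logspace(0, 3, log_points, base=2)
        powers = [2 ** (3 * i / (log_points - 1)) - 1 for i in range(log_points)]
        raw = [p / max(powers) * top for p in powers]
    else:
        raise ValueError("legend_format must be 'linear' or 'log'")
    return [int(v) for v in raw]


linear = legend_levels(180332, legend_format="linear")
log = legend_levels(180332, legend_format="log")
print(linear)  # [0, 36200, 72400, 108600, 144800, 181000]
print(log)
```

The log spacing packs more break points into the low end of the range, which helps when a few cells hold most of the tweets and the rest would otherwise share a single color.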
