This notebook shows the essential queries for retrieving individual SGCN state lists from the structure created in the sgcn schema of the experimental GC2 instance. The whole system has been rebuilt from the source repository. The "Process SGCN repository source files.ipynb" notebook (and its corresponding py script) in this repo executes that process, starting with all the source files in the ScienceBase Repository.

We are also making some tweaks to the design of the SWAP application. We want to stick with the overall philosophy that the state pages of the app always show exactly what the states submitted. The National List shows what we add in the process by aligning with taxonomic authorities and making judgments on how we group the information. To aid in this process and provide full transparency, we added a few properties to the sgcn.sgcn table so that each record traces back to its original source. These include sourcid (ScienceBase item URI/URL) along with sourcefileurl and sourcefilename (the actual file processed by the code to produce the data for a given state and year).

Data for the states can be pulled from a database view or its corresponding ElasticSearch index. The view uses the following SQL:
```sql
SELECT sgcn_state, scientificname_submitted AS scientificname,
(array_agg(commonname_submitted ORDER BY sgcn_year DESC))[1] AS commonname,
(array_agg(taxonomicgroup_submitted ORDER BY sgcn_year DESC))[1] AS taxonomicgroup,
sum(((sgcn_year = 2005))::integer) AS sgcn2005,
sum(((sgcn_year = 2015))::integer) AS sgcn2015
FROM sgcn.sgcn
GROUP BY sgcn_state,scientificname_submitted
```  
This view makes the following choices:
* Group on the scientific name submitted by the state
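* Use the most recent common name and taxonomic group provided for a species (2015 takes precedence over 2005)
* Count the records from each list year in sgcn2005 and sgcn2015, so a nonzero value means the species was on that year's list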
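
The grouping choices above can be sketched in plain Python. This is only an illustration of the logic, not the code behind the view: the sample rows and the `state_list` helper are hypothetical, and only the column names are taken from the SQL.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for sgcn.sgcn (not real data)
rows = [
    {"sgcn_state": "Nebraska", "scientificname_submitted": "Examplea fictus",
     "commonname_submitted": "Old Name", "taxonomicgroup_submitted": "Plants",
     "sgcn_year": 2005},
    {"sgcn_state": "Nebraska", "scientificname_submitted": "Examplea fictus",
     "commonname_submitted": "New Name", "taxonomicgroup_submitted": "Vascular Plants",
     "sgcn_year": 2015},
    {"sgcn_state": "Nebraska", "scientificname_submitted": "Examplea alter",
     "commonname_submitted": "Other Name", "taxonomicgroup_submitted": "Birds",
     "sgcn_year": 2005},
]

def state_list(rows):
    """Group by (state, submitted scientific name), as the view's GROUP BY does."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["sgcn_state"], row["scientificname_submitted"])].append(row)

    result = []
    for (state, name), members in groups.items():
        # Equivalent of (array_agg(... ORDER BY sgcn_year DESC))[1]
        latest = max(members, key=lambda r: r["sgcn_year"])
        result.append({
            "sgcn_state": state,
            "scientificname": name,
            "commonname": latest["commonname_submitted"],
            "taxonomicgroup": latest["taxonomicgroup_submitted"],
            # Equivalent of sum((sgcn_year = N)::integer)
            "sgcn2005": sum(r["sgcn_year"] == 2005 for r in members),
            "sgcn2015": sum(r["sgcn_year"] == 2015 for r in members),
        })
    return result

summary = state_list(rows)
```

In this made-up sample, the species submitted in both years ends up with the 2015 common name and taxonomic group, and with a count of 1 in each year column.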

In [1]:
import requests
from IPython.display import display

In [2]:
# Available States
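# DISTINCT alone returns each state once; the extra GROUP BY was redundant
q = "SELECT DISTINCT sgcn_state FROM sgcn.sgcn"

# Pass the SQL through params so that requests URL-encodes it
r = requests.get("https://gc2.mapcentia.com/api/v1/sql/bcb", params={"q": q}).json()

display(r)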

{'_execution_time': 0.121,
 'auth_check': {'auth_level': 'Write',
  'checked_relations': ['sgcn.sgcn'],
  'session': None,
  'success': True},
 'features': [{'properties': {'sgcn_state': 'Alabama'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Indiana'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Minnesota'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'South Carolina'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Louisiana'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'California'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'New Mexico'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'New Hampshire'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'American Samoa'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Connecticut'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Alaska'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Nevada'}, 'type': 'Feature'},
  {'properties': {'sgcn_state': 'Oklahoma'}, 

In [3]:
# ElasticSearch API query for a state; size (page length) and from (offset) control pagination
stateName = 'Nebraska'

stateListQuery = "https://gc2.mapcentia.com/api/v1/elasticsearch/search/bcb/sgcn/sgcn_statelists?q={%22query%22:{%22match%22:{%22properties.sgcn_state%22:%22"+stateName+"%22}}}&size=25&from=50"

stateList = requests.get(stateListQuery).json()

display(stateList)


{'_shards': {'failed': 0, 'successful': 5, 'total': 5},
 'hits': {'hits': [{'_id': 'AVxZqadvUuPNezaKDMV0',
    '_index': 'bcb_sgcn_sgcn_statelists',
    '_score': 3.771065,
    '_source': {'properties': {'commonname': 'Curve-Pod Fumewort',
      'scientificname': 'Corydalis curvisiliqua ssp. occidentalis',
      'sgcn2005': 1,
      'sgcn2015': 0,
      'sgcn_state': 'Nebraska',
      'taxonomicgroup': 'Vascular Plants'},
     'type': 'Feature'},
    '_type': 'sgcn_statelists'},
   {'_id': 'AVxZqadvUuPNezaKDMV6',
    '_index': 'bcb_sgcn_sgcn_statelists',
    '_score': 3.771065,
    '_source': {'properties': {'commonname': 'Plains Frostweed',
      'scientificname': 'Crocanthemum bicknellii',
      'sgcn2005': 0,
      'sgcn2015': 1,
      'sgcn_state': 'Nebraska',
      'taxonomicgroup': 'Plants'},
     'type': 'Feature'},
    '_type': 'sgcn_statelists'},
   {'_id': 'AVxZqadvUuPNezaKDMV7',
    '_index': 'bcb_sgcn_sgcn_statelists',
    '_score': 3.771065,
    '_source': {'properties': {
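
The `size` and `from` parameters in the query above can be wrapped in a small paging helper. This is a minimal sketch, not part of the GC2 API: `page_url` and `iter_hits` are hypothetical names, the query JSON is percent-encoded with `urllib.parse.quote` instead of hand-written `%22` escapes, and the loop assumes the last page is the first one that returns fewer than `page_size` hits.

```python
import json
from urllib.parse import quote

BASE = "https://gc2.mapcentia.com/api/v1/elasticsearch/search/bcb/sgcn/sgcn_statelists"

def page_url(state_name, page, page_size=25):
    """Build the search URL for a zero-based page number."""
    query = {"query": {"match": {"properties.sgcn_state": state_name}}}
    q = quote(json.dumps(query, separators=(",", ":")))
    return f"{BASE}?q={q}&size={page_size}&from={page * page_size}"

def iter_hits(state_name, fetch, page_size=25):
    """Yield every hit for a state, requesting one page at a time.

    fetch is any callable that takes a URL and returns the parsed JSON
    response, for example: lambda url: requests.get(url).json()
    """
    page = 0
    while True:
        hits = fetch(page_url(state_name, page, page_size))["hits"]["hits"]
        yield from hits
        if len(hits) < page_size:  # a short (or empty) page means we are done
            break
        page += 1
```

With the default page size, `page_url("Nebraska", 2)` ends in `&size=25&from=50`, the same offset used in the cell above. Passing the fetch function in as an argument keeps the helper easy to try without network access.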

## Aggregations (facets)
The ES index for the state lists is set up to support aggregations on taxonomicgroup for faceted searching in the system. The aggregations are added to the query DSL using the following:
```json
{
  "aggs": {
    "taxgroup": {
      "terms": {
        "field": "properties.taxonomicgroup"
      }
    }
  }
}
```
See the [ElasticSearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) on aggregations for more details.
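
As an alternative to writing the percent-encoded string by hand, the query and aggregation can be built as a Python dict and encoded in one step. This is a sketch under assumptions: `aggs_url` is a hypothetical helper, and it assumes the endpoint accepts the same JSON in the `q` parameter when it is fully percent-encoded. The aggregation name `taxgroup` is arbitrary; it only has to match the key read back from the response.

```python
import json
from urllib.parse import quote

def aggs_url(state_name, base_url):
    """Return a search URL carrying a match query plus a terms aggregation."""
    body = {
        "query": {"match": {"properties.sgcn_state": state_name}},
        "aggs": {
            "taxgroup": {"terms": {"field": "properties.taxonomicgroup"}}
        },
    }
    # json.dumps guarantees balanced braces; quote percent-encodes the result
    return base_url + "?q=" + quote(json.dumps(body))

url = aggs_url(
    "Nebraska",
    "https://gc2.mapcentia.com/api/v1/elasticsearch/search/bcb/sgcn/sgcn_statelists",
)
```

Building the body as a dict avoids the unbalanced-brace and quoting mistakes that are easy to make in a long concatenated string.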

In [5]:
# Query for the specified state name and add in the aggregations
queryWithAggs = "https://gc2.mapcentia.com/api/v1/elasticsearch/search/bcb/sgcn/sgcn_statelists?q={%22query%22:{%22match%22:{%22properties.sgcn_state%22:%22"+stateName+"%22}},%22aggs%22:%20{%22taxgroup%22:%20{%22terms%22:%20{%22field%22:%20%22properties.taxonomicgroup%22}}}}"
rAggs = requests.get(queryWithAggs).json()

print("Taxonomic Group")
for bucket in rAggs["aggregations"]["taxgroup"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])

Taxonomic Group
plants 573
insects 141
vascular 116
birds 113
mammals 36
fish 33
reptiles 28
mollusks 14
amphibians 4
bivalves 4
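
The bucket keys above are lowercase single words, which suggests that the taxonomicgroup field is analyzed into tokens, so a value such as "Vascular Plants" would be counted under both `plants` and `vascular`. If that is the case, the counts can overlap and should not be summed. A small helper for turning the buckets into a lookup table might look like the sketch below; `bucket_counts` is a hypothetical name, and the sample response is a trimmed stand-in using the first three buckets printed above.

```python
def bucket_counts(response, agg_name="taxgroup"):
    """Return {bucket key: doc_count} for a terms aggregation in an ES response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}

# Trimmed stand-in for rAggs, copied from the first three buckets printed above
sample = {"aggregations": {"taxgroup": {"buckets": [
    {"key": "plants", "doc_count": 573},
    {"key": "insects", "doc_count": 141},
    {"key": "vascular", "doc_count": 116},
]}}}

counts = bucket_counts(sample)
```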
