# SGCN Search

We changed nearly everything about how this search works under the new GC2 instance on the DataDistillery. Because we put logic into building out TIR Common Properties (see that under the tir repo), we can now simply search an index created from the TIR core table. SGCN searches can facet on taxonomic group, taxonomic rank, and match method. The first thing every search has to do is limit itself to the SGCN registrants (source=SGCN).

```json
{
  "query": {
    "match": {
      "properties.source": "SGCN"
    }
  }
}
```

All the other information that we previously chunked out into separate fields from the HStore properties is still there, but it now lives in JSONB data fields in the tir table. Initially we are just piping all of that into ElasticSearch via GC2 to see how searches behave. We will likely need to parse out the properties we care about again in some fashion to make them more usable, because the JSONB data structures are thrown into ElasticSearch as escaped text strings and are likely to produce weird results. Alternatively, we may want to either a) pull the plug on the GC2 way of piping to ElasticSearch and move to a different architecture, or b) look at the GC2 codebase again to see whether we could contribute some new thinking about kinds of PostgreSQL-stored data they had not considered.
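
As a stopgap on the client side, the escaped strings can be decoded after a search returns. This is a minimal sketch, assuming the JSONB columns arrive as JSON-encoded strings inside `properties`; `decode_jsonb_properties` is a hypothetical helper, not part of GC2.

```python
import json

def decode_jsonb_properties(properties):
    """Return a copy of properties with JSON-encoded string values parsed."""
    decoded = {}
    for key, value in properties.items():
        # Only attempt to parse strings that look like JSON objects or arrays.
        if isinstance(value, str) and value.lstrip().startswith(("{", "[")):
            try:
                decoded[key] = json.loads(value)
                continue
            except ValueError:
                pass  # Not valid JSON after all; keep the raw string.
        decoded[key] = value
    return decoded

# Hypothetical record shaped like one hit's properties.
sample = {
    "id": 11924,
    "commonname": "cylindrical papershell",
    "sgcn": '{"swap2005": true, "taxonomicgroup": "Mollusks"}',
}
print(decode_jsonb_properties(sample)["sgcn"]["taxonomicgroup"])
```

This does not help with searching or faceting inside ElasticSearch itself; it only makes the returned documents easier to work with.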

A couple of field name changes to watch for:

- taxonomicauthorityid changed to authorityid
- taxonomicrank changed to rank

In [5]:
import requests
from IPython.display import display
import pandas as pd

In [2]:
#Class to render tables
class ListTable(list):
    def _repr_html_(self):
        html = ["<table>"]
        for row in self:
            html.append("<tr>")
            
            for col in row:
                html.append("<td>{0}</td>".format(col))
            
            html.append("</tr>")
        html.append("</table>")
        return ''.join(html)

This query returns results from the Elasticsearch index for the tir.tir table. It requests only the first 25 results, so the SWAP online app will need to paginate. I included the taxonomic authority ID as a reference. Those IDs point to ITIS or WoRMS services that return a machine-readable response and do not support content negotiation, so to include them in the UI we would need to translate each ID into something meant for humans.
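
One way to make an authority ID usable in a UI is to extract the identifier and build a link to a browsable page. This is a sketch for ITIS-style IDs only, assuming they look like `http://services.itis.gov/?q=tsn:80148`; the report URL pattern in `ITIS_REPORT_URL` is an assumption to verify before use, and `itis_report_link` is a hypothetical helper. WoRMS IDs are not handled because their format is not shown here.

```python
import re

# Assumed pattern for the human-readable ITIS report page; verify before use.
ITIS_REPORT_URL = (
    "https://www.itis.gov/servlet/SingleRpt/SingleRpt"
    "?search_topic=TSN&search_value={tsn}"
)

def itis_report_link(authorityid):
    """Return a browsable ITIS link for a tsn-style authority ID, else None."""
    match = re.search(r"tsn:(\d+)", authorityid or "")
    if match is None:
        return None  # Not an ITIS TSN reference (for example, a WoRMS ID).
    return ITIS_REPORT_URL.format(tsn=match.group(1))

print(itis_report_link("http://services.itis.gov/?q=tsn:80148"))
```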

In [11]:
sgcnNationalListURL = 'https://gc2.datadistillery.org/api/v1/elasticsearch/search/bcb/sgcn/sgcn_search?size=25'
sgcnNationalList = requests.get(sgcnNationalListURL).json()

# Each hit's properties is a flat dict of scalar values. Passing one such dict
# to pd.DataFrame raises "ValueError: If using all scalar values, you must pass
# an index", so build the frame from a list of dicts (one row per hit) instead.
rows = [hit["_source"]["properties"] for hit in sgcnNationalList['hits']['hits']]
results = pd.DataFrame(rows)

display(results)
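
For the pagination mentioned above, Elasticsearch's query DSL accepts `from` and `size` keys alongside `query`. The sketch below only builds the request bodies; whether the GC2 endpoint passes `from` through unchanged is an assumption that has not been verified here, and `page_bodies` is a hypothetical helper.

```python
import json

def page_bodies(total_hits, page_size=25):
    """Yield one query body (as a JSON string) per page of SGCN results."""
    for offset in range(0, total_hits, page_size):
        body = {
            "query": {"match": {"properties.source": "SGCN"}},
            "from": offset,     # index of the first hit on this page
            "size": page_size,  # number of hits per page
        }
        yield json.dumps(body)

# 60 hits at 25 per page need three pages (offsets 0, 25, 50).
for body in page_bodies(60):
    print(body)
```

Depending on the Elasticsearch version and index settings, very deep `from` offsets may be rejected, so a long result list might need a different paging mechanism.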

## Aggregations (facets)
The ES index for the national list is set up to support aggregations on taxonomicgroup, rank, and matchmethod for faceted searching in the system. The aggregations are added to the query DSL using the following:
```json
{
  "query": {
    "match": {
      "properties.source": "SGCN"
    }
  },
  "aggs": {
    "taxrank": {
      "terms": {
        "field": "properties.rank"
      }
    },
    "taxgroup": {
      "terms": {
        "field": "properties.taxonomicgroup"
      }
    },
    "matchmethod": {
      "terms": {
        "field": "properties.matchmethod"
      }
    }
  }
}
```
See the [ElasticSearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) on aggregations for more details.
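
The same query body can be assembled in Python rather than written by hand, which keeps the facet list in one place. This is a sketch; `build_facet_query` is a hypothetical helper, and the facet names simply mirror the JSON above.

```python
import json

def build_facet_query(facets, source="SGCN"):
    """Build a query body with one terms aggregation per facet."""
    return {
        "query": {"match": {"properties.source": source}},
        "aggs": {
            name: {"terms": {"field": field}} for name, field in facets.items()
        },
    }

# Aggregation name -> indexed field, as in the JSON above.
facets = {
    "taxrank": "properties.rank",
    "taxgroup": "properties.taxonomicgroup",
    "matchmethod": "properties.matchmethod",
}
print(json.dumps(build_facet_query(facets), indent=2))
```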

### NOTE:
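We still have the problem here that the not_analyzed flag set in the ElasticSearch GUI from GC2 does not seem to keep the aggregation properties from being split into separate words. In the output below, for example, `rank`, `taxonomic`, and `unknown` each show the same count (1413), which suggests that a single multi-word value was tokenized into three buckets.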

In [15]:
queryWithAggs = 'https://gc2.datadistillery.org/api/v1/elasticsearch/search/bcb/tir/tir?q={"query": {"match": {"properties.source": "SGCN"}},"aggs": {"taxrank": {"terms": {"field": "properties.rank"}},"taxgroup": {"terms": {"field": "properties.taxonomicgroup"}},"matchmethod": {"terms": {"field": "properties.matchmethod"}}}}'
rAggs = requests.get(queryWithAggs).json()

print ("Taxonomic Rank")
for bucket in rAggs["aggregations"]["taxrank"]["buckets"]:
    print (bucket["key"], bucket["doc_count"])
print ("----")
print ("Taxonomic Group")
for bucket in rAggs["aggregations"]["taxgroup"]["buckets"]:
    print (bucket["key"], bucket["doc_count"])
print ("----")
print ("Match Method")
for bucket in rAggs["aggregations"]["matchmethod"]["buckets"]:
    print (bucket["key"], bucket["doc_count"])


Taxonomic Rank
species 14316
rank 1413
taxonomic 1413
unknown 1413
subspecies 1345
genus 506
variety 373
family 196
order 33
class 5
----
Taxonomic Group
plants 4307
insects 4142
other 2076
fish 1929
mollusks 1791
birds 1249
invertebrates 1243
mammals 823
reptiles 660
unknown 473
----
Match Method
match 15874
exact 14482
accepted 1663
followed 1663
tsn 1592
legacy 748
matched 665
not 665
fuzzy 644
aphiaid 71
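
For a faceted UI, the three print loops can be collapsed into one function that returns the counts as data. This is a sketch, assuming the response has the `aggregations` / `buckets` shape used above; `facet_counts` is a hypothetical helper.

```python
def facet_counts(response):
    """Map each aggregation name to a list of (key, doc_count) pairs."""
    return {
        name: [(bucket["key"], bucket["doc_count"]) for bucket in agg["buckets"]]
        for name, agg in response.get("aggregations", {}).items()
    }

# Trimmed sample shaped like the response above.
sample = {"aggregations": {"taxrank": {"buckets": [
    {"key": "species", "doc_count": 14316},
    {"key": "subspecies", "doc_count": 1345},
]}}}
print(facet_counts(sample))
```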


### List of submitting states by year

We added another step to the process that assembles SGCN information by unique scientific name into the TIR: the submitting states are now grouped into lists by year. From an ElasticSearch result set, you can pull out the states by year and put them on a map or other visualization.

In [9]:
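# Both conditions go in a single "must" array; repeating the "must" key in one JSON object would leave only one of them in effect.
queryForSpecies = 'https://gc2.datadistillery.org/api/v1/elasticsearch/search/bcb/tir/tir?q={"query": {"bool": {"must": [{"match": {"properties.source": "SGCN"}}, {"match": {"properties.scientificname": "Anodontoides ferussacianus"}}]}}}'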
rSpecies = requests.get(queryForSpecies).json()

display (rSpecies)


{'_shards': {'failed': 0, 'successful': 5, 'total': 5},
 'hits': {'hits': [{'_id': '11924',
    '_index': 'bcb_tir_tir',
    '_score': 14.656125,
    '_source': {'properties': {'authorityid': 'http://services.itis.gov/?q=tsn:80148',
      'cachedate': '2017-07-06',
      'commonname': 'cylindrical papershell',
      'id': 11924,
      'itis': '{"tsn": "80148", "rank": "Species", "unit1": "Anodontoides", "unit2": "ferussacianus", "usage": "valid", "kingdom": "Animalia", "nameWInd": "Anodontoides ferussacianus", "synonyms": ["80148:$Anodontoides denigratus$Anodonta ferussaciana$Anodonta buchanensis$Anodonta subcylindracea$Anodonta argentea$Anodonta ferruginea$Anodon plicatus$Anodonta denigrata$Anodonta oblita$Anodonta modesta$Anodontoides birgei$Anodontoides denigrata$"], "cacheDate": "2017-07-06T13:37:01.130212", "hierarchy": [{"name": "Animalia", "rank": "Kingdom"}, {"name": "Bilateria", "rank": "Subkingdom"}, {"name": "Protostomia", "rank": "Infrakingdom"}, {"name": "Lophozoa", "rank"

In [11]:
q = "SELECT * FROM tir.tir WHERE source='SGCN' AND scientificname='Anodontoides ferussacianus'"
r = requests.get("https://gc2.datadistillery.org/api/v1/sql/bcb?q="+q).json()

for feature in r["features"]:
    display (feature["properties"]["registration"])
    display (feature["properties"]["sgcn"])

'{"source": "SGCN", "commonnames": [{"commonname": "Cumberland Papershell"}], "followTaxonomy": true, "scientificname": "Anodontoides denigratus", "registrationDate": "2017-07-05T13:43:54.980032", "taxonomicLookupProperty": "scientificname"}'

'{"swap2005": true, "dateCached": "2017-08-06T12:14:19.020956", "stateLists": [{"states": "Tennessee,Kentucky", "sgcn_year": 2015}, {"states": "Kentucky", "sgcn_year": 2005}], "taxonomicgroup": "Mollusks"}'

'{"source": "SGCN", "commonnames": [{"commonname": "Cylindrical Papershell"}], "followTaxonomy": true, "scientificname": "Anodontiodes ferussacianus", "registrationDate": "2017-07-05T13:43:52.673967", "taxonomicLookupProperty": "scientificname"}'

'{"swap2005": true, "dateCached": "2017-08-04T10:53:42.307185", "stateLists": [{"states": "Ohio", "sgcn_year": 2015}], "taxonomicgroup": "Mollusks"}'

'{"source": "SGCN", "commonnames": [{"commonname": "Cylinder"}, {"commonname": "Cylinder (Cylindrical Papershell)"}, {"commonname": "cylindrical papershell"}, {"commonname": "Cylindrical papershell"}, {"commonname": "Cylindrical Papershell"}], "followTaxonomy": true, "scientificname": "Anodontoides ferussacianus", "registrationDate": "2017-07-05T13:43:57.127727", "taxonomicLookupProperty": "scientificname"}'

'{"swap2005": true, "dateCached": "2017-08-04T10:08:08.741159", "stateLists": [{"states": "Vermont,Iowa,Wyoming,Pennsylvania,Missouri,Kansas,Colorado,West Virginia,Michigan", "sgcn_year": 2005}, {"states": "Pennsylvania,Colorado,Missouri,Iowa,Wyoming,Vermont,Kansas", "sgcn_year": 2015}], "taxonomicgroup": "Mollusks"}'
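
To get from these strings to something a map can use, the `stateLists` entries can be parsed and merged by year. This is a sketch, assuming each `sgcn` value is a JSON string with the `stateLists` structure shown above; `states_by_year` is a hypothetical helper, and merging across all registered name variants of a species is a design choice rather than something the TIR prescribes.

```python
import json
from collections import defaultdict

def states_by_year(sgcn_values):
    """Merge stateLists from several sgcn JSON strings into {year: sorted states}."""
    merged = defaultdict(set)
    for raw in sgcn_values:
        for entry in json.loads(raw).get("stateLists", []):
            # "states" is a comma-separated string, not a JSON list.
            states = [s.strip() for s in entry["states"].split(",") if s.strip()]
            merged[entry["sgcn_year"]].update(states)
    return {year: sorted(states) for year, states in merged.items()}

# Trimmed from the first two sgcn values shown above.
sample = [
    '{"stateLists": [{"states": "Tennessee,Kentucky", "sgcn_year": 2015}, {"states": "Kentucky", "sgcn_year": 2005}]}',
    '{"stateLists": [{"states": "Ohio", "sgcn_year": 2015}]}',
]
print(states_by_year(sample))
```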