# Demonstration of awsgnssroutils

Authors: Amy McVey (amcvey@aer.com) and Stephen Leroy (sleroy@aer.com) \
Version: 1.1 \
Date: 12 December 2022

This module contains utilities to query the AWS Registry of Open Data repository
of GNSS radio occultation data. It does so using database files posted in the
AWS repository. Specifically, the module defines two classes: *RODatabaseClient* and 
*OccList*. The former creates a gateway to the metadata on GNSS radio occultation 
(RO) soundings hosted in the AWS Registry of Open Data and enables a local mirror 
repository of that metadata if one is desired. The latter creates instances of 
lists of RO soundings and their associated metadata and offers several methods to 
manipulate those metadata and even download RO data files. 

## Installation
The following non-standard modules must be installed: boto3, s3fs, numpy.
The exercise at the end of this demonstration additionally imports matplotlib and cartopy.

## Functionality
This module defines two classes: RODatabaseClient and OccList. The first 
creates a portal to a database of RO metadata, and the second is an instance 
of a list of radio occultations (ROs). Each is described below. Before the 
heart of the demonstration begins, though, first import necessary modules, 
including RODatabaseClient. 

In [None]:
from awsgnssroutils import RODatabaseClient
import os
import json

### RODatabaseClient: 
Create an instance of a portal to the metadata on all RO data in the AWS 
Registry of Open Data. The keyword "repository" provides the option of 
keeping a repository of the RO metadata on the local file system. For 
example,

In [None]:
rodb = RODatabaseClient()
print( rodb )

creates a database interface directly to the AWS S3 bucket to access 
the metadata. This interface is slow but requires no local disk space. 

In [None]:
HOME = os.path.expanduser( "~" )
repository = os.path.join( HOME, "local/rodatabase" )
rodb = RODatabaseClient( repository=repository )
print( rodb )

also creates a database interface but with a local repository of the 
metadata in the directory "~/local/rodatabase". It is far more efficient than 
the direct access method if a copy of requested metadata is already 
in the local repository. 

In [None]:
if False: 
    HOME = os.path.expanduser( "~" )
    repository = os.path.join( HOME, "local/rodatabase" )
    rodb = RODatabaseClient( repository=repository, update=True )
    print( rodb )

By specifying "update" as True, the local repository is updated at the 
instantiation of rodb. The update compares metadata in the repository 
of metadata on the local file system to the same metadata files in the
AWS Registry of Open Data and updates the local metadata as needed. 
The update does not add any "new" metadata files to the local repository. 

An update can be computationally expensive, however. We recommend executing 
an update=True instantiation of RODatabaseClient only periodically. The 
contents of the metadatabase files in the AWS Registry of Open Data rarely 
change, and so it should not be necessary to update the local mirror at 
every instantiation of RODatabaseClient. 
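
One way to update only periodically is to record when the last update happened and compare against a maximum age. The following is a minimal sketch of that idea, not part of awsgnssroutils: the helper names `needs_update` and `mark_updated`, the stamp file, and the 30-day default are all hypothetical choices made for illustration.

```python
import os
import time

def needs_update(stamp_path, max_age_days=30.0):
    """Return True if the stamp file is missing or older than max_age_days.

    Hypothetical helper: the stamp file is just an empty file whose
    modification time records when the local mirror was last updated.
    """
    if not os.path.exists(stamp_path):
        return True
    age_seconds = time.time() - os.path.getmtime(stamp_path)
    return age_seconds > max_age_days * 86400.0

def mark_updated(stamp_path):
    """Create the stamp file if needed and set its modification time to now."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)
```

The result of `needs_update( stamp_path )` could then be passed as the `update` keyword when instantiating RODatabaseClient, followed by a call to `mark_updated( stamp_path )` whenever an update was actually performed. The directory that holds the stamp file must already exist.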

There are two methods to create a list of occultations through the 
database client. One is to perform an inquiry in which missions and/or 
a date-time range is specified, and a second is to restore a previously 
saved list of RO data. 

In [None]:
occlist = rodb.query( missions="champ" )
print( occlist )

generates an OccList containing metadata on all CHAMP RO data. The inquiry 
can be performed instead over a range in time. The date-time fields are 
always ISO format times...

In [None]:
occlist = rodb.query( datetimerange=("2019-06-01","2019-07-01") )
print( occlist )

creates an OccList of metadata for all RO soundings in the month of June, 
2019, regardless of mission. 
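
The date-time range for a whole calendar month can be generated rather than typed by hand. The sketch below uses only the Python standard library; `month_range` is a hypothetical helper name, and it assumes the convention of the query above, in which the end of the range is the first day of the following month.

```python
from datetime import date

def month_range(year, month):
    """Return (start, end) ISO date strings spanning one calendar month.

    The end string is the first day of the following month, mirroring
    the ("2019-06-01","2019-07-01") pattern used for June 2019.
    """
    start = date(year, month, 1)
    # December rolls over into January of the next year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (start.isoformat(), end.isoformat())

print(month_range(2019, 6))   # ('2019-06-01', '2019-07-01')
```

The returned tuple has the same form as the value given to the `datetimerange` keyword above.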

The other way to create an OccList is by restoring a previously 
saved OccList: 

In [None]:
savefile = "old_occlist.json"
occlist.save( savefile )
occlist1 = rodb.restore( savefile )

in which the old OccList was saved in a JSON format file. 

### OccList:

An instance of the class OccList contains the metadata on a list of RO 
soundings along with pointers to the RO data files in the AWS Registry of 
Open Data S3 bucket. AWS functionality is completely embedded in the 
methods of the OccList class. Those methods include the ability to 
subset/filter the list according to geolocation and time, 
GNSS transmitter/constellation, GNSS receiver, whether it is a rising or a 
setting occultation, etc. It also includes the ability to combine 
instances of OccList, save the OccList to a JSON format file for future 
restoration by RODatabaseClient.restore, and even download RO data files. 

In order to filter an OccList previously generated by 
RODatabaseClient.query or RODatabaseClient.restore, use the OccList.filter 
method: 

In [None]:
champoccs = rodb.query( missions="champ" )
print( champoccs )
champoccs2003 = champoccs.filter( datetimerange=("2003-01-01","2004-01-01") )
print( champoccs2003 )

illustrates how to apply a filter in date-time, retaining all CHAMP RO 
metadata for the year 2003. Filtering can be done in longitude and latitude 
as well: 

In [None]:
champoccs_US = champoccs.filter( longituderange=(-110,-70), latituderange=(25,55) )
print( champoccs_US )

and even those can be subset by local time (a.k.a. solar time): 

In [None]:
champoccs_US_midnight = champoccs_US.filter( localtimerange=(22,2) )
print( champoccs_US_midnight )

in which the local time range is given in hours and can wrap around 
midnight. Other filter options are for the GNSS constellation used as 
transmitters ("G" for GPS, "R" for GLONASS, "E" for Galileo, "C" for 
BeiDou), for individual transmitters ("G01", etc.), for individual 
receivers ("cosmic1c1", "metopb", etc.), and for occultation 'geometry' 
("rising" vs. "setting"). 
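
To make the wrap-around behavior of a local time range concrete, here is a minimal numpy sketch of how such a test can be written. It is an illustration only, not the implementation used by OccList.filter: the function name `in_localtime_range` is hypothetical, and treating both endpoints as inclusive is an assumption.

```python
import numpy as np

def in_localtime_range(localtimes, start, end):
    """Boolean mask of local times (hours) that fall within [start, end].

    If start > end, the range is taken to wrap around midnight, so that
    (22, 2) selects 22:00 through 24:00 and 00:00 through 02:00.
    Inclusive endpoints are an assumption of this sketch.
    """
    localtimes = np.asarray(localtimes, dtype=float)
    if start <= end:
        # Ordinary range: both conditions must hold.
        return (localtimes >= start) & (localtimes <= end)
    # Wrapped range: either side of midnight qualifies.
    return (localtimes >= start) | (localtimes <= end)

hours = np.array([21.5, 22.0, 23.9, 0.5, 2.0, 3.0, 12.0])
mask = in_localtime_range(hours, 22, 2)
print(hours[mask])   # the values 22.0, 23.9, 0.5 and 2.0 are retained
```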

One can get information on the metadata in an OccList using the 
OccList.info method. For instance, if you want to get a listing of all of 
the Spire receiver satellites, do 

In [None]:
spire = rodb.query( missions="spire" )
print( f"Number of Spire occultations found: {spire.size}" )
spire_receivers = spire.info( "receiver" )
print( spire_receivers )

Notice that OccList.size contains a count of the number of RO entries in this 
instance of OccList. The first step in this process could be time consuming if the Spire 
metadata do not already reside on the local file system and the rodb object 
does not interface with a local repository. One can also get a list of the 
GNSS transmitters tracked by Spire on a particular day by 

In [None]:
spire_day = spire.filter( datetimerange=("2021-12-01","2021-12-02") )
spire_day_transmitters = spire_day.info("transmitter")
print( spire_day_transmitters )

which will give a list of all GNSS transmitters tracked by all Spire 
satellites on December 1, 2021. The spire_day list can be split up between 
rising and setting RO soundings as well: 

In [None]:
spire_day_rising = spire_day.filter( geometry="rising" )
print( "spire_day_rising = " + str(spire_day_rising) )
spire_day_setting = spire_day.filter( geometry="setting" )
print( "spire_day_setting = " + str(spire_day_setting) )

Then it is possible to save the spire metadata OccList to a JSON file 
for future restoration by 

In [None]:
spire.save( "spire_metadata.json" )

The metadata also contain pointers to the RO sounding data files in the 
AWS Open Data bucket. To get information on the data files available, 
use the OccList.info( "filetype" ) method. For example, to find out the 
types of RO data files available for the month of June, 2009: 

In [None]:
June2009 = rodb.query( datetimerange=("2009-06-01","2009-07-01") )
filetype_dict = June2009.info( "filetype" )
print( json.dumps( filetype_dict, indent="  ") )

which will return a dictionary with the AWS-native RO file types as keys 
with corresponding values being the counts of each. The file types have the 
format "{processing_center}_{file_type}" in which "processing_center" is an 
RO processing center that contributed to the AWS repository ("ucar", 
"romsaf", "jpl") and the "file_type" is one of "calibratedPhase", 
"refractivityRetrieval", or "atmosphericRetrieval". 
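
Because the keys follow the "{processing_center}_{file_type}" pattern, they are easy to take apart, for example to total the counts per processing center. The sketch below is self-contained and does not call awsgnssroutils; `split_filetype` and `counts_by_center` are hypothetical helper names, and the counts in `example` are made up purely for illustration.

```python
def split_filetype(key):
    """Split "{processing_center}_{file_type}" into its two parts."""
    center, sep, file_type = key.partition("_")
    if not sep or not center or not file_type:
        raise ValueError(f"unexpected file type key: {key!r}")
    return center, file_type

def counts_by_center(filetype_counts):
    """Sum the per-file-type counts for each processing center."""
    totals = {}
    for key, count in filetype_counts.items():
        center, _ = split_filetype(key)
        totals[center] = totals.get(center, 0) + count
    return totals

# Made-up counts, standing in for a dictionary shaped like filetype_dict.
example = {
    "ucar_calibratedPhase": 10,
    "ucar_refractivityRetrieval": 8,
    "romsaf_refractivityRetrieval": 5,
}
print(counts_by_center(example))   # {'ucar': 18, 'romsaf': 5}
```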

The values of the longitude, latitude, datetime, and localtimes of the RO 
soundings in an OccList can be obtained using the OccList.values() method: 

In [None]:
longitudes = June2009.values( "longitude" )
latitudes = June2009.values( "latitude" )
localtimes = June2009.values( "localtime" )

each of these variables being a masked numpy ndarray. 
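
Masked arrays carry a boolean mask alongside the data, and numpy reductions skip the masked entries. The following sketch shows a few common operations on synthetic arrays; the values are invented for illustration, and which entries awsgnssroutils actually masks is not assumed here.

```python
import numpy as np

# Synthetic stand-ins for arrays like those returned by OccList.values();
# NaN entries are converted into masked entries for the demonstration.
lats = np.ma.masked_invalid([10.0, np.nan, -45.0, 60.0])
lons = np.ma.masked_invalid([120.0, 30.0, np.nan, -75.0])

# Number of unmasked latitude values.
print(lats.count())

# Keep only the soundings for which both coordinates are unmasked.
valid = ~(np.ma.getmaskarray(lats) | np.ma.getmaskarray(lons))
lats_valid = lats[valid].compressed()   # plain ndarray without a mask
lons_valid = lons[valid].compressed()
print(lats_valid, lons_valid)

# Reductions such as mean() ignore masked entries.
print(lats.mean())
```

Calling `compressed()` before plotting or further computation avoids carrying masked placeholders into code that does not understand masks.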

Finally, RO data files themselves can be downloaded for subsequent 
scientific analysis using the OccList.download() method. If one wishes to 
download all of the RO bending angle data contributed by the ROM SAF to the archive 
for the day of June 5, 2012, one need only execute the commands 

In [None]:
day_list = rodb.query( datetimerange=("2012-06-05T00:00:00","2012-06-05T23:59:59") )
files = day_list.download( "romsaf_refractivityRetrieval", "datadir" )
print( files )

which will download all files of type "refractivityRetrieval" contributed by 
the ROM SAF into the directory "datadir". All of the files will be placed into 
just one directory. If instead one wants to download the files maintaining 
the AWS directory structure, use the keyword "keep_aws_structure" in the 
method call: 

In [None]:
# Assign the return value so that print( files ) reflects this download.
files = day_list.download( "romsaf_refractivityRetrieval", "datadir",
            keep_aws_structure=True )
print( files )

## Exercise
Plot the geolocation of all GNSS RO soundings on June 1, 2012. 

In [None]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

plt.clf()
cmap = plt.get_cmap( "nipy_spectral" )
ax = plt.axes( projection=ccrs.PlateCarree() )
ax.coastlines( color="black" )

occlist = rodb.query( datetimerange=("2012-06-01","2012-06-02") )
print( f"Found {occlist.size} soundings." )
missions = occlist.info( "mission" )

for imission, mission in enumerate(missions): 
    occlist_by_mission = occlist.filter( missions=mission )
    lons = occlist_by_mission.values( "longitude" )
    lats = occlist_by_mission.values( "latitude" )
    color = cmap( (imission+0.5)/len(missions) )
    ax.scatter( lons, lats, color=color, marker="o", s=0.2 , label=mission )
    
ax.legend(loc="lower left", fontsize="x-small")
plt.show()