# Introduction to AWS cloud computing and GNSS RO Data in the AWS Open Data Registry

**Authors:** Stephen Leroy and Amy McVey

**Date:** September 13, 2022

This notebook was written for instruction and experimentation at the workshop. 
It introduces cloud computing and shows how it can be applied to GNSS radio 
occultation (RO) data in the AWS Open Data Registry. Feel free to write your own 
code based on this notebook in your EC2 instance, where Python is already 
available. 

- AWS portal to the RO repository in the Open Data Registry: 
    https://registry.opendata.aws/gnss-ro-opendata/

- GitHub support material: 
    http://github.com/gnss-ro/aws-opendata/


### Portal to Open Data Registry of RO data with s3fs 

The variable "s3" is an s3fs object, which makes the contents of S3 buckets appear as if they were in a Linux-like file system. The AWS Open Data Registry is free to browse and download, so you need to tell AWS that no authentication ("signature") is necessary for access. That is why signature_version is set to UNSIGNED. The RO data are hosted in AWS region "us-east-1", and that needs to be declared as well.

In [None]:
import s3fs
from botocore import UNSIGNED

AWS_region = "us-east-1"
s3 = s3fs.S3FileSystem( 
        client_kwargs = { 'region_name': AWS_region },  
        config_kwargs = { 'signature_version': UNSIGNED }
        )   


Try a few s3fs methods. First, list all RO missions contributed by UCAR. 

In [None]:
import os
ucar_missions = sorted( [ os.path.split(p)[1] for p in s3.ls("gnss-ro-data/contributed/v1.1/ucar/") ] )
print( ucar_missions )

Now list all refractivityRetrieval files, which contain the bending angle 
profiles retrieved by UCAR, for the Metop satellites on January 1, 2020. 

In [None]:
files = sorted( s3.ls( "gnss-ro-data/contributed/v1.1/ucar/metop/refractivityRetrieval/2020/01/01/" ) )
print( "\n".join( files ) )
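
The two listings above suggest a regular layout for the S3 prefixes. Judging only from those two paths, a prefix appears to be built from a processing center, a mission, a file type, and a date. The helper below is a minimal sketch under that assumption; the name `ro_prefix` and its parameters are hypothetical and are not part of s3fs or of the GitHub utilities.

```python
from datetime import date

def ro_prefix(center, mission, file_type, day, version="v1.1"):
    """Build an S3 prefix in the layout inferred from the paths used above.

    This layout is an assumption based on the two example paths in this
    notebook; check it against a real listing before relying on it.
    """
    return (
        f"gnss-ro-data/contributed/{version}/{center}/{mission}/"
        f"{file_type}/{day:%Y/%m/%d}/"
    )

# Reproduce the prefix used for the Metop listing on January 1, 2020.
prefix = ro_prefix("ucar", "metop", "refractivityRetrieval", date(2020, 1, 1))
print(prefix)
```

The resulting string can be passed to `s3.ls(...)` in the same way as the literal path above.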

### Using a DynamoDB database

The AWS Open Data Registry hosts data only as publicly accessible S3 
storage; it does not allow publication of a database service. At home, 
you will need to use our GitHub utilities to construct your own 
private database from the contents of 
s3://gnss-ro-data/dynamo/v1.1/export_subsets,
which will take considerable time. At this workshop, we have made 
AER's own internal DynamoDB database of RO data available so you 
can learn how to query a DynamoDB database. Authentication has been 
enabled behind the scenes. 

First, open a session, which is the portal into the AWS Universe.


In [None]:
import boto3
session = boto3.Session( region_name=AWS_region )

At home, this call may fail because the security tokens that grant access to your own database are not present in the computing environment. In that case, specify the saml2aws profile you currently use for access to the database through the keyword *profile_name* in the call to boto3.Session. 
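
One way to handle both situations is to pass *profile_name* only when a profile is actually needed. The following is a minimal sketch, not a prescribed method: the helper names `session_kwargs` and `make_session` are hypothetical, and the profile name shown in the comment is a placeholder for whatever profile you have configured. Only `boto3.Session` and its keywords `region_name` and `profile_name` are real boto3 names.

```python
def session_kwargs(region_name, profile_name=None):
    """Collect keyword arguments for boto3.Session.

    profile_name is included only when given, so that the default
    credential lookup in the environment is used otherwise.
    """
    kwargs = {"region_name": region_name}
    if profile_name:
        kwargs["profile_name"] = profile_name
    return kwargs

def make_session(region_name, profile_name=None):
    """Create a boto3 session (hypothetical convenience wrapper)."""
    import boto3  # imported here so session_kwargs works without boto3
    return boto3.Session(**session_kwargs(region_name, profile_name))

# With tokens already in the environment:
print(session_kwargs("us-east-1"))

# With a named profile ("my-profile" is a placeholder, not a real profile):
print(session_kwargs("us-east-1", profile_name="my-profile"))
```

Calling `make_session("us-east-1")` is then equivalent to the `boto3.Session( region_name=AWS_region )` call above.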

Commission the AWS service "dynamodb" by creating a "dynamodb" 
Python resource. 


In [None]:
dynamodb_resource = session.resource( "dynamodb" )

Directly access a specific DynamoDB database, or "table". 

In [None]:
table = dynamodb_resource.Table( "gnss-ro-data-stagingv1_1" )

Every query requires a specific partition key, labeled as 
"leo-ttt" (receiver - dash - transmitter). This is how you 
define a "key" for the Metop-A receiver and occultations of 
GPS PRN 1 ("G01"): 


In [None]:
from boto3.dynamodb.conditions import Key
partitionkey = Key('leo-ttt').eq( "metopa-G01" )

Every query requires a sort key as well, but the sort key 
can be loosely defined, such as greater than or less than a value 
or between two values. Keep in mind that every combination of 
unique partition key and unique sort key must point to just one occultation 
sounding. Because an occultation is uniquely defined by the 
receiving satellite, the transmitting satellite, and the time 
of the sounding (precise to within a few minutes), all of those 
values must be contained in the partition key and sort key when 
combined. In our case, the partition key contains the receiver 
and the transmitter; therefore, the sort key must contain the time
of the occultation sounding, and it does so under the label "date-time": 

In [None]:
sortkey = Key('date-time').between( "2020-01-01-00-00", "2020-01-31-23-59" )
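
The bounds passed to *between* are plain strings, so the comparison is lexicographic. That works because every field is zero-padded to a fixed width, which makes string order agree with chronological order. As a minimal sketch, assuming the sort key format is exactly the "YYYY-MM-DD-HH-MM" pattern seen in the two strings above, the bounds for any month can be generated rather than typed by hand. The names `SORT_KEY_FORMAT` and `month_bounds` are hypothetical.

```python
import calendar
from datetime import datetime

# Format assumed from the example strings "2020-01-01-00-00" and
# "2020-01-31-23-59"; it is not taken from the database documentation.
SORT_KEY_FORMAT = "%Y-%m-%d-%H-%M"

def month_bounds(year, month):
    """Return (first, last) sort-key strings covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]   # number of days in month
    first = datetime(year, month, 1, 0, 0)
    last = datetime(year, month, last_day, 23, 59)
    return first.strftime(SORT_KEY_FORMAT), last.strftime(SORT_KEY_FORMAT)

low, high = month_bounds(2020, 1)
print(low, high)
```

The pair can then be used as `Key('date-time').between( low, high )`.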

Query the database. Notice how the partition key and sort key are combined with "&" to form a compound key. 

In [None]:
ret = table.query( KeyConditionExpression = partitionkey & sortkey )
print( 'Query results found:', ret['Count'] )

A filter can be applied to the results behind the scenes after 
AWS performs the query and before the results are returned. The filter 
tests the information stored in each RO item in the database. 

In [None]:
from decimal import Decimal
from boto3.dynamodb.conditions import Attr

Filter by region. Notice how compound filters are formed, using the 
binary "and" ("&") operator. The binary "or" operator ("|") and the 
logical "not" unary operator ("~") are also available. 

In [None]:
filters = Attr('longitude').between( Decimal(120.0), Decimal(150.0) )
filters = filters & Attr('latitude').between( Decimal(-35.0), Decimal(-15.0) )
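
The bounds are wrapped in `Decimal` because DynamoDB handles numbers as decimals rather than as Python floats. The values used above convert exactly, but that is not true of every float: `Decimal` applied to a float reproduces the binary floating-point value digit for digit, and such inexact values may be rejected by boto3 or may not compare as intended. A minimal sketch of one common workaround, going through the string representation, follows. The helper name `to_decimal` is hypothetical.

```python
from decimal import Decimal

def to_decimal(value):
    """Convert a number to Decimal through its string representation.

    This keeps the digits as they are written (0.1 stays 0.1) instead of
    the exact binary expansion of the float.
    """
    return Decimal(str(value))

print(Decimal(120.0))     # exact, because 120.0 is representable in binary
print(Decimal(0.1))       # long expansion of the nearest binary float
print(to_decimal(0.1))    # the digits as written
```

For bounds such as 120.0 and -35.0 the two approaches give equal values, so the cell above is unaffected. The difference matters only for bounds with fractional parts, such as 0.1 or 120.25... although 120.25 happens to be exact in binary, 0.1 is not.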

Filter for setting occultations only. 

In [None]:
filters = filters & Attr("setting").eq("True")

Filter for the availability of bending angle data as produced by 
the processing_center. Currently, *processing_center* can take on 
the values "ucar" or "romsaf". 

In [None]:
processing_center = "ucar"
filters = filters & Attr(f"{processing_center}_refractivityRetrieval").exists()

Query the database. 

In [None]:
ret = table.query( KeyConditionExpression = partitionkey & sortkey,
        FilterExpression = filters )
print( 'Query results found:', ret['Count'] )

The returned dictionary *ret* contains two keys of interest: 
*Count* and *Items*. *Count* is an integer count of the number 
of database items (occultation soundings) found by the query, and *Items* 
is a list of dictionaries, each dictionary containing all of the 
keys and information for one occultation sounding. 
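
A third key can matter for larger queries. DynamoDB returns at most 1 MB of data per call to *query*, and when more results remain the response also contains *LastEvaluatedKey*. Passing that value back as *ExclusiveStartKey* retrieves the next page. The following is a minimal sketch of such a loop; the function name `query_all` is hypothetical, while `query`, `LastEvaluatedKey`, and `ExclusiveStartKey` are the names used by the DynamoDB API.

```python
def query_all(table, **query_kwargs):
    """Run table.query repeatedly until no further pages remain.

    table is expected to behave like a boto3 DynamoDB Table resource.
    Returns the concatenated list of items from all pages.
    """
    items = []
    while True:
        ret = table.query(**query_kwargs)
        items += ret["Items"]
        last_key = ret.get("LastEvaluatedKey")
        if last_key is None:
            break                       # No more pages.
        # Ask for the page that starts after the last item seen.
        query_kwargs["ExclusiveStartKey"] = last_key
    return items
```

With the objects defined in this notebook, a call would look like `query_all( table, KeyConditionExpression = partitionkey & sortkey, FilterExpression = filters )`. Whether pagination is actually triggered depends on how much data a single partition key and date range holds.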

Notice that the query above returned occultations involving Metop-A and 
GPS PRN 1 ("G01") only. Now search for all Metop occultations in the same 
time period. A nested loop is necessary in order to specify all possible 
partition keys (over receiver and transmitter). 

In [None]:
receivers = [ "metopa", "metopb", "metopc" ]
transmitters = [ f"G{i:02d}" for i in range(1,33) ]

allitems = []       # Initialize the cumulative output list. 

for receiver in receivers:
    for transmitter in transmitters:
        partitionkey = Key('leo-ttt').eq( f"{receiver}-{transmitter}" )
        ret = table.query(
                KeyConditionExpression = partitionkey & sortkey,
                FilterExpression = filters
                )
        allitems += ret['Items']    # Append to output list. 

print( f"There were {len(allitems):d} soundings found." )

Congratulations! You have obtained a listing of all Metop occultation 
soundings, setting only, over Australia for the month of January 2020. 
Now let's download all of those retrievals. 

In [None]:
local_paths = []
os.makedirs('ro_data', exist_ok=True)

for item in allitems: 
    refractivityRetrieval_file = item[f"{processing_center}_refractivityRetrieval"]
    s3_path = f"gnss-ro-data/{refractivityRetrieval_file}"
    local_path = os.path.join('ro_data',os.path.split( s3_path )[1])
    local_paths.append( local_path )
    if os.path.exists(local_path): continue
    print( f"Downloading {local_path}" )
    ret = s3.download( s3_path, local_path )

### Simple analysis example

Let's plot all of the "unoptimized", ionospheric-corrected 
bending angles. 

In [None]:
from netCDF4 import Dataset
import matplotlib.pyplot as plt 
from matplotlib.ticker import MultipleLocator
import numpy as np

Make the plot look nice. 

In [None]:
axeslinewidth = 0.5 
plt.rcParams.update( {
  'font.family': "Times New Roman", 
  'font.size': 10,  
  'font.weight': "normal",
  'xtick.major.width': axeslinewidth,
  'xtick.minor.width': axeslinewidth,
  'ytick.major.width': axeslinewidth,
  'ytick.minor.width': axeslinewidth,
  'axes.linewidth': axeslinewidth } )

Analysis program follows...

In [None]:
#  Create the figure and axes. 

fig = plt.figure( figsize=(6,6) )
ax = fig.add_axes( [ 0.12, 0.12, 0.84, 0.86 ] )

#  Define the x axis. 

ax.set( 
    xlabel = "Calibrated bending angle [mrad]", 
    xticks = np.arange( 0.0, 40.01, 5.0 ), 
    xlim = ( 0.0, 40.0 ) 
    )
ax.xaxis.set_minor_locator( MultipleLocator( 1.0 ) )

#  Define the y axis. 

ax.set( 
    ylabel = "Impact height [km]", 
    yticks = np.arange( 0.0, 40.001, 10.0 ), 
    ylim = ( 0.0, 40.0 )
    )
ax.yaxis.set_minor_locator( MultipleLocator( 2.0 ) )

#  Loop over files, converting impact parameter to impact height 
#  in km and bending angle to milliradians. 

for local_path in local_paths:

    #  Read data. 

    d = Dataset( local_path, 'r' )
    bending_angle = d.variables['bendingAngle'][:] * 1000.0
    radius_of_curvature = d.variables['radiusOfCurvature'].getValue()
    impact_height = ( d.variables['impactParameter'][:] - radius_of_curvature ) / 1000.0
    d.close()

    #  Plot bending angle profile. 

    ax.plot( bending_angle, impact_height, lw=0.2 )

#  Save the figure. 

fname = "all_profiles.eps"
print( f"Saving figure to {fname}" )
fig.savefig( fname, format='eps')

If you wish only to display the figure and not save it, comment out the call to 
*.savefig(...)* above. 