#  Operationalized Rotation-Collocation for Finding Collocated RO and Nadir-scanner Soundings

This demonstration shows how to use the utilities associated with the collocation-finding algorithms: 
how to query for occultations, how to define a nadir-scanning radiance instrument, how to 
find collocations by brute force, how to find collocations by the rotation-collocation method, how to 
populate sounder data before extracting calibrated measurements, how to extract the collocated nadir-scanner 
sounding measurements for the found collocations, and how to save collocation data to an output 
NetCDF file. 

## Prerequisites

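You will need to set defaults for awsgnssroutils.database, Space-Track, the EUMETSAT Data Store, and 
NASA Earthdata. The first is necessary for accessing GNSS radio occultation (RO) data in the AWS Registry 
of Open Data; the second for accessing the two-line element (TLE) data that describe satellite orbits; the 
third for accessing AMSU-A radiance data from the Metop satellites; and the fourth for accessing JPSS data, 
including ATMS and CrIS data. 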

### awsgnssroutils.database 

The awsgnssroutils.database API is designed with efficiency in mind. Toward that end, copies of RO 
metadata records are stored on the local file system, either by pre-population or as requested in 
querying the metadata for the existence of RO soundings according to mission, satellite, geolocation, etc. 

To set defaults for awsgnssroutils.database, if you haven't done so already, execute the following 
commands in a Python session. 

In [None]:
import os
from awsgnssroutils.database import setdefaults

HOME = os.path.expanduser("~")
repository_directory = os.path.join( HOME, "local/rodatabase" )
rodata_root_directory = os.path.join( HOME, "Data/rodata" )

setdefaults( repository=repository_directory, rodata=rodata_root_directory, version="v1.1" )

This code defines the "repository", the directory on the local file system where the RO metadata 
reside. The metadata are not the same as the RO data themselves, which include the profile retrievals: 
the metadata only record geolocation and other characteristics of the soundings. The code also defines 
"rodata", the directory where the RO data themselves are downloaded for data analysis. 

The code above roots both the metadata "repository" and the data download directories in the user's home directory. 
Individual users should feel free to establish the metadata repository and the rodata roots wherever they like. It 
might be preferable to establish the rodata root on a scratch drive if one is available. There is no need for 
the rodata directory to be on backed-up or any other guaranteed-storage media. 
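
As a minimal sketch of the scratch-drive suggestion, the helper below prefers a scratch directory when one is advertised and falls back to the home directory otherwise. The function name `pick_rodata_root` and the `SCRATCH` environment variable are assumptions made for illustration; they are not part of awsgnssroutils, and your system may advertise scratch space differently.

```python
import os

def pick_rodata_root(scratch_env="SCRATCH", environ=None):
    """Return a root directory for RO data downloads, preferring scratch space.

    The environment variable name is hypothetical; adjust it to your system.
    """
    environ = os.environ if environ is None else environ
    scratch = environ.get(scratch_env)
    # Use the scratch area only if it is defined and actually exists.
    if scratch and os.path.isdir(scratch):
        base = scratch
    else:
        base = os.path.expanduser("~")
    return os.path.join(base, "Data", "rodata")

rodata_root_directory = pick_rodata_root()
print(rodata_root_directory)
```

The resulting path could then be passed as the `rodata` argument of `setdefaults` in place of the home-directory path used above.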

Finally, it is worth the user's time to pre-populate the RO metadata. This step will take several minutes to 
run now, but it will greatly accelerate all RO database queries in the future. 

In [None]:
from awsgnssroutils.database import populate
populate()

### Space-Track

[Space-Track](https://www.space-track.org) provides two-line element (TLE) data for most if not all scientific and 
weather satellites. The rotation-collocation algorithm takes advantage of these TLE data, and access to the data 
available at Space-Track is automatic. The user of the rotation-collocation code need only establish an account 
on the Space-Track website. Set the defaults for Space-Track access by declaring your Space-Track username and 
password and the directory on the local file system where you would like the TLE data to be stored. 


In [None]:
import os
from awsgnssroutils.collocation.core.spacetrack import setdefaults

HOME = os.path.expanduser( "~" )
spacetrack_root = os.path.join( HOME, "Data", "spacetrack" )
#  Substitute your own Space-Track username and password. 
setdefaults( spacetrack_root, "username", "password" )
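
Typing a password directly into a cell leaves it in the saved file. One alternative, sketched below, is to read the credentials from environment variables and to prompt for them only when they are missing. The helper name `spacetrack_credentials` and the variable names `SPACETRACK_USERNAME` and `SPACETRACK_PASSWORD` are assumptions chosen for this example; they are not defined by awsgnssroutils.

```python
import os
from getpass import getpass

def spacetrack_credentials(environ=None, ask=getpass):
    """Return (username, password) from the environment, prompting if absent.

    The environment variable names are hypothetical conventions.
    """
    environ = os.environ if environ is None else environ
    username = environ.get("SPACETRACK_USERNAME") or ask("Space-Track username: ")
    password = environ.get("SPACETRACK_PASSWORD") or ask("Space-Track password: ")
    return username, password

# Intended use with the setdefaults call shown above:
#   username, password = spacetrack_credentials()
#   setdefaults( spacetrack_root, username, password )
```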

### EUMETSAT Data Store

You will need to obtain an account on the EUMETSAT Data Store in order to access Metop AMSU-A
data. You can register for an account at http://eoportal.eumetsat.int/cas/login if you haven't 
done so already. Once you have obtained an account, you will have to configure your 
Linux/Unix user account using the following command: 
    
```
% eumdac set-credentials ConsumerKey ConsumerSecret
```

The ConsumerKey and the ConsumerSecret can be found at https://api.eumetsat.int/api-key/ after 
you have already set up your account. 

Next you'll have to define a root for the storage of EUMETSAT Data Store data on the local 
file system. Do that using the *setdefaults* method in the **eumetsat** module. As with the RO data, 
you should feel free to tweak the code below to root the EUMETSAT Data Store downloads on a 
scratch disk. 

In [None]:
import os
from awsgnssroutils.collocation.core.eumetsat import setdefaults

HOME = os.path.expanduser( "~" )
path = os.path.join( HOME, "Data", "eumdac" )
setdefaults( path )

### NASA Earthdata

You will need to obtain an account on NASA Earthdata in order to access JPSS data, including 
ATMS and CrIS data. You can sign up for an account, if you don't already have one, on 
http://earthdata.nasa.gov. Once you have that account, you must define your Earthdata 
username and password. Don't worry, all such defaults are protected from all other users. 

In [None]:
import os
from awsgnssroutils.collocation.core.nasa_earthdata import setdefaults

HOME = os.path.expanduser( "~" )
path = os.path.join( HOME, "Data", "earthdata" )
setdefaults( path, earthdatalogin=("username","password") )

## Demonstration

This begins the demonstration of RO and nadir-scanner collocation finding. First, set the parameters 
that define the scenario: the time period over which you wish to scan for collocations, the RO data processing 
center from which to download RO data, and the spatial and temporal collocation tolerances. 


In [None]:
from datetime import datetime, timedelta 

#  RO processing center. The choices are "ucar", "romsaf", and "jpl". Only 
#  "ucar" provides cosmic2 and Spire data in addition to Metop data. 
#  "romsaf" does provide Metop RO data. 

ro_processing_center = "ucar"                  # Choice of "ucar", "romsaf", "jpl"

#  Time period. Defined as a 2-tuple (or 2-element list) of instances of 
#  datetime.datetime. Each is interpreted as UTC time. 

day = datetime( year=2023, month=7, day=5 )    # July 5, 2023
nextday = day + timedelta(days=1)
datetimerange = ( day, nextday )

#  Spatial and temporal tolerances. These are used primarily for the brute 
#  force collocation algorithm. 

time_tolerance = 600                           # 10 min/600 sec
spatial_tolerance = 150.0e3                    # m
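
To make the two tolerances concrete, the sketch below tests whether one occultation and one sounder footprint fall within both of them. It assumes a spherical Earth and the haversine formula for the separation; the functions `great_circle_distance` and `within_tolerances` are illustrative only, and the criterion actually applied inside awsgnssroutils may be formulated differently.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS = 6371.0e3   # mean Earth radius in meters (spherical approximation)

def great_circle_distance(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS * asin(min(1.0, sqrt(a)))

def within_tolerances(occ, footprint, time_tolerance, spatial_tolerance):
    """occ and footprint are (longitude, latitude, datetime) tuples."""
    seconds_apart = abs((occ[2] - footprint[2]).total_seconds())
    meters_apart = great_circle_distance(occ[0], occ[1], footprint[0], footprint[1])
    return seconds_apart <= time_tolerance and meters_apart <= spatial_tolerance

t0 = datetime(2023, 7, 5, 12, 0, 0)
occ = (0.0, 0.0, t0)
near = (1.0, 0.0, t0 + timedelta(minutes=5))   # about 111 km and 300 s away
far = (2.0, 0.0, t0 + timedelta(minutes=5))    # about 222 km and 300 s away

print(within_tolerances(occ, near, 600, 150.0e3))   # True
print(within_tolerances(occ, far, 600, 150.0e3))    # False
```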


Create the portals to the RO metadata database, the Space-Track TLE repository, and the 
EUMETSAT Data Store. 

In [None]:
from awsgnssroutils.database import RODatabaseClient
from awsgnssroutils.collocation.core.spacetrack import Spacetrack
from awsgnssroutils.collocation.core.eumetsat import EUMETSATDataStore

db = RODatabaseClient()
st = Spacetrack()
eumetsat_data_access = EUMETSATDataStore()

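Define the nadir-scanner satellite and instrument. The classes that define the instruments are 
stored in the *instruments* dictionary, keyed by instrument name. An instrument definition involves 
the following quantities: 
1. Satellite name
2. Approximate radius of orbit
3. Maximum scan angle for scanning
4. Time between consecutive scans
5. Number of footprints in each scan 
6. Angular spacing in scan angle between neighboring footprints in a scan
7. The portal to the EUMETSAT Data Store
8. The portal to the Space-Track repository

The call below passes only the satellite name and the two portals explicitly; the orbit and scan-geometry 
quantities do not appear in the call and are presumably set by the instrument class itself. 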

In [None]:
from awsgnssroutils.collocation.instruments import instruments

MetopAMSUA = instruments['Metop_AMSUA']['class']
MetopB_AMSUA = MetopAMSUA( 'Metop-B', eumetsat_data_access, spacetrack=st )
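
The scan-geometry quantities listed above are related to one another: for a cross-track scanner whose footprints are evenly spaced and symmetric about nadir, the number of footprints and the angular spacing together fix the maximum scan angle. The sketch below illustrates that relationship. The helper `footprint_scan_angles` is hypothetical, and the numbers are nominal AMSU-A values (30 footprints stepped by 10/3 degrees) assumed here for illustration rather than read from the instrument class.

```python
def footprint_scan_angles(n_footprints, angular_spacing):
    """Scan angles in degrees for evenly spaced footprints symmetric about nadir."""
    half = (n_footprints - 1) / 2.0
    return [(i - half) * angular_spacing for i in range(n_footprints)]

# Nominal AMSU-A scan pattern, assumed for illustration.
angles = footprint_scan_angles(30, 10.0 / 3.0)

print(f"{len(angles)} footprints, scan angles from {angles[0]:.2f} to {angles[-1]:.2f} degrees")
# 30 footprints, scan angles from -48.33 to 48.33 degrees
```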

Now query the RO metadata database for occultation soundings. The result is an instance of 
OccList. 

In [None]:
from time import time

print( "Querying occultation database" )

tbegin = time()
occs = db.query( receivers="metopb", datetimerange=[ dt.isoformat() for dt in datetimerange ], 
                availablefiletypes=f'{ro_processing_center}_refractivityRetrieval', silent=True )
tend = time()

print( "  - number found = {:}".format( occs.size ) )
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

Now implement the rotation-collocation algorithm. The temporal and 
spatial tolerances are only used for internal checking purposes. The last 
argument in the call to *rotation_collocation* specifies the number of 
sub-occultations to use for collocation finding. Using just two sub-occultations 
works for most cases. Only when the temporal tolerance exceeds 30 minutes do 
more than 2 sub-occultations need to be specified. The number of sub-occultations 
should correspond to the temporal tolerance divided by 30 minutes. 

In [None]:
from time import time
from awsgnssroutils.collocation.core.rotation_collocation import rotation_collocation

print( "Executing rotation-collocation" )

tbegin = time()
collocations_rotation = rotation_collocation( MetopB_AMSUA, occs, 
        time_tolerance, spatial_tolerance, 2 )
tend = time()

print( "  - number found = {:}".format( len( collocations_rotation ) ) )
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

So far we have simply found the number of RO-nadir scanner collocations. We have also 
stored some information that will help us find the actual data values for the nadir-scanner 
observations. Before we download those data, however, we first need to populate the Metop-B AMSU-A 
root with the relevant data files. This is how it's done. Note that downloading from the EUMETSAT 
Data Store can be a **very** time-consuming operation. We've tried to make this as efficient as 
possible by first searching the local repository before requesting data downloads from the 
EUMETSAT Data Store. Hence, the first time you execute this code, it could take many minutes. 
Subsequent searches will take only a few seconds provided you do not flush the EUMETSAT Data Store 
local repository (as defined above). 

In [None]:
print( "Populating Metop AMSU-A local repository" )

tbegin = time()
MetopB_AMSUA.populate( datetimerange )
tend = time()

print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )
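
The speed-up on repeated runs comes from checking the local repository before asking the remote store for anything. The general pattern can be sketched as follows. This is not the code used by awsgnssroutils: `fetch_cached`, the stand-in `fake_download`, and the file name are all hypothetical and serve only to show the look-locally-first logic.

```python
import os
import tempfile

def fetch_cached(filename, local_root, download):
    """Return the local path of filename, downloading it only if it is absent.

    `download` is a hypothetical callable taking (filename, destination_path).
    """
    path = os.path.join(local_root, filename)
    if not os.path.exists(path):
        os.makedirs(local_root, exist_ok=True)
        download(filename, path)
    return path

# Stand-in downloader that records each request and writes a small file.
calls = []

def fake_download(filename, path):
    calls.append(filename)
    with open(path, "w") as f:
        f.write("placeholder")

root = tempfile.mkdtemp()
fetch_cached("granule_0001.nat", root, fake_download)   # not present: downloaded
fetch_cached("granule_0001.nat", root, fake_download)   # present: no download
print(len(calls))   # 1
```

Flushing the local repository removes the cached files, so the next call has to download them again.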

Now we extract the nadir-scanner calibrated (level 1B) observations for the collocations. The data are 
stored as xarray Datasets. Feel free to examine *cdata* when done. It contains the *occid* identifier 
for the collocation (strictly speaking, it identifies the occultation) and the occultation and sounder data. 

In [None]:
print( "Extracting collocation data" )
tbegin = time()
for collocation in collocations_rotation: 
    occid = collocation.get_data( ro_processing_center )
tend = time()
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )
cdata = collocations_rotation[0].data

Finally, we write the collocation data to an output NetCDF file. The collocation data 
are stored in a group structure. At the highest level, each group corresponds to a 
collocation. Each collocation group contains two sub-groups: one that contains 
occultation data and one that contains nadir-scanner data. All are annotated with attributes
as appropriate. 

In [None]:
file = "collocations.nc"
tbegin = time()
print( f"Writing to output file {file}" )
collocations_rotation.write_to_netcdf( file )
tend = time()
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

The following code illustrates how to use the brute-force collocation-finding algorithm. It then 
computes a confusion matrix that compares the rotation-collocation results against the brute-force 
results in order to assess the performance of the rotation-collocation algorithm. 

In [None]:
from awsgnssroutils.collocation.core.brute_force import brute_force
from awsgnssroutils.collocation.core.collocation import collocation_confusion

print( "Executing brute force" )
tbegin = time()
collocations_sorted = brute_force( MetopB_AMSUA, occs, time_tolerance, spatial_tolerance, progressbar=False )
tend = time()
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

#  Confusion matrix. 

confusion = collocation_confusion( occs, collocations_sorted, collocations_rotation )

print( '\nConfusion matrix\n================' )
print( '{:12s}{:^12s}{:^12s}'.format( "", "Positive", "Negative" ) )
print( '{:12s}{:^12d}{:^12d}'.format( "True", confusion['true_positive'], confusion['true_negative'] ) )
print( '{:12s}{:^12d}{:^12d}'.format( "False", confusion['false_positive'], confusion['false_negative'] ) )
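
The four entries of a confusion matrix can be understood with plain sets. The sketch below assumes that each occultation is identified by an ID, that the brute-force result is treated as the reference, and that the rotation-collocation result is the candidate being scored. It is an illustration only: `confusion_counts` is a hypothetical helper, and `collocation_confusion` may count collocations differently.

```python
def confusion_counts(all_ids, reference_ids, candidate_ids):
    """Count agreement between a reference and a candidate set of collocated IDs."""
    everything = set(all_ids)
    reference = set(reference_ids)
    candidate = set(candidate_ids)
    return {
        "true_positive": len(reference & candidate),    # found by both
        "false_positive": len(candidate - reference),   # candidate only
        "false_negative": len(reference - candidate),   # reference only
        "true_negative": len(everything - reference - candidate),  # found by neither
    }

# Made-up occultation IDs for illustration.
all_occs = ["a", "b", "c", "d", "e", "f"]
brute = ["a", "b", "c"]
rotation = ["b", "c", "d"]

counts = confusion_counts(all_occs, brute, rotation)
print(counts)
# {'true_positive': 2, 'false_positive': 1, 'false_negative': 1, 'true_negative': 2}
```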

## ATMS and the Earthdata DAAC

This section demonstrates the use of the NASAEarthdata class to access the NASA Earthdata 
DAAC portal (and the earthaccess API). It also uses the ATMS instrument class. 

In [None]:
#  Access to RO data, Space-Track TLEs, and NASA Earthdata DAAC

from awsgnssroutils.database import RODatabaseClient
from awsgnssroutils.collocation.core.spacetrack import Spacetrack
from awsgnssroutils.collocation.core.nasa_earthdata import NASAEarthdata

db = RODatabaseClient()
st = Spacetrack()
nasa_earthdata_access = NASAEarthdata()

#  Define JPSS-1 ATMS instrument. 

from awsgnssroutils.collocation.instruments import instruments

JPSS1_ATMS = instruments['JPSS_ATMS']['class']( "JPSS-1", nasa_earthdata_access, spacetrack=st )

#  Time interval for collocation finding. 

from datetime import datetime, timedelta

day = datetime( year=2023, month=6, day=5 )
nextday = day + timedelta(days=1)
datetimerange = ( day, nextday )

#  Collocation tolerances. 

time_tolerance = 600                           # 10 min/600 sec
spatial_tolerance = 150.0e3                    # m

#  Get occultation geolocations. 

ro_processing_center = "ucar"
ro_mission = "cosmic2"

from time import time

print( "Querying occultation database" )

tbegin = time()
occs = db.query( missions=ro_mission, datetimerange=[ dt.isoformat() for dt in datetimerange ], 
                availablefiletypes=f'{ro_processing_center}_refractivityRetrieval', silent=True )
tend = time()

print( "  - number found = {:}".format( occs.size ) )
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

#  Exercise rotation-collocation. 

from awsgnssroutils.collocation.core.rotation_collocation import rotation_collocation

print( "Executing rotation-collocation" )

tbegin = time()
collocations_rotation = rotation_collocation( JPSS1_ATMS, occs, 
        time_tolerance, spatial_tolerance, 2 )
tend = time()

print( "  - number found = {:}".format( len( collocations_rotation ) ) )
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

#  Populate ATMS data. 

tbegin = time()
JPSS1_ATMS.populate( datetimerange )
tend = time()

print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )

#  Extract data. 

print( "Extracting collocation data" )
tbegin = time()
for collocation in collocations_rotation: 
    occid = collocation.get_data( ro_processing_center )
tend = time()
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )
cdata = collocations_rotation[0].data

#  Save to output file. 

file = "cosmic2_collocations.nc"
tbegin = time()
print( f"Writing to output file {file}" )
collocations_rotation.write_to_netcdf( file )
tend = time()
print( "  - elapsed time = {:10.3f} s".format( tend-tbegin ) )
