## Using NEXRAD data from Amazon Web Services

Amazon has a cloud storage solution called Simple Storage Service, or S3. S3 is a distributed object store with very low latency, especially when compared to "shopping cart" ordering systems like NCEI. See http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

There are many tools that can be used to interact with S3 stores, or "buckets". One such tool in Python is Boto.
From the website: http://boto3.readthedocs.io/en/latest/

"Boto is the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2. Boto provides an easy to use, object-oriented API as well as low-level direct service access."


In [14]:
# Let's start with some imports

# Py-ART, simply the best software around.. Give those guys a grant
import pyart

# Boto3 is the AWS SDK for Python
import boto3

# botocore is the low-level library that boto3 is built on;
# disable_signing lets us make anonymous (unsigned) requests
from botocore.handlers import disable_signing

# Temporary files in Python.. A very useful module
import tempfile

# datetime module.. very handy!
from datetime import datetime

You can use Boto and S3 to store all kinds of data, open or closed. In fact, Amazon has some very sophisticated methods for setting rules. But the real game changer for the radar community came with the CRADA (Cooperative Research and Development Agreement) between Amazon, NOAA and Unidata, which put all NEXRAD data into an S3 bucket [1].


The NEXRAD Level II archive data is hosted in the “noaa-nexrad-level2” Amazon S3 bucket in S3’s US East region. The address for the public bucket is:

http://noaa-nexrad-level2.s3.amazonaws.com

https://noaa-nexrad-level2.s3.amazonaws.com

Each volume scan file is its own object in Amazon S3. Object keys follow this layout:

`/<Year>/<Month>/<Day>/<NEXRAD Station>/<filename>`
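Because the date and station are encoded in the key, a listing prefix can be built from a `datetime` and a station ID. The following is a minimal sketch; `make_prefix` is a hypothetical helper, not part of Boto. It assumes that keys, as S3 lists them, carry no leading slash.

```python
from datetime import datetime


def make_prefix(when, station):
    """Return the key prefix for one station on one (GMT) day.

    Hypothetical helper: the layout <Year>/<Month>/<Day>/<Station>/ is taken
    from the description above, without the leading slash.
    """
    return '{0:%Y/%m/%d}/{1}/'.format(when, station.upper())


print(make_prefix(datetime(2011, 5, 20), 'KVNX'))
```

The resulting string can be passed as the `Prefix` argument when filtering a bucket's objects.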

All files in the archive use the same compressed format (.gz). A typical data file name is KAKQ20010101_080138.gz. The file naming convention is:

GGGGYYYYMMDD_TTTTTT

Where:

- GGGG = ground station ID (map of ground stations)
- YYYY = year
- MM = month
- DD = day
- TTTTTT = time when data started to be collected (GMT)

Note that the 2015 files have an additional field in the file name: “_V06” is appended to the end. An example is KABX20150303_001050_V06.gz.
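The naming convention can be turned into a small parser. This is only a sketch: `parse_name` and the regular expression are assumptions based on the convention described above, not something provided by NOAA or Boto, and file names that do not fit the pattern are rejected with a `ValueError`.

```python
import re
from datetime import datetime

# Pattern assumed from the convention above: GGGGYYYYMMDD_TTTTTT,
# an optional version field such as _V06, and an optional .gz extension.
_NAME = re.compile(
    r'^(?P<station>[A-Z0-9]{4})'
    r'(?P<stamp>\d{8}_\d{6})'
    r'(?P<version>_V\d+)?'
    r'(?:\.gz)?$'
)


def parse_name(key):
    """Return (station, start_time) for a file name or a full object key."""
    name = key.rsplit('/', 1)[-1]  # drop any <Year>/<Month>/<Day>/<Station>/ path
    match = _NAME.match(name)
    if match is None:
        raise ValueError('unexpected file name: {0}'.format(name))
    start = datetime.strptime(match.group('stamp'), '%Y%m%d_%H%M%S')
    return match.group('station'), start


print(parse_name('KAKQ20010101_080138.gz'))
print(parse_name('KABX20150303_001050_V06.gz'))
```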

In [2]:
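# So we start with the bucket name and region
bucket = "noaa-nexrad-level2"
region = "us-east-1"
# Create an S3 resource (boto3's high-level, object-oriented interface)
s3 = boto3.resource('s3', region_name=region)
# Disable request signing so the public bucket can be read without AWS credentials
s3.meta.client.meta.events.register('choose-signer.s3.*', disable_signing)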

In [3]:
#So now we connect to the bucket with the radar data
aws_radar = s3.Bucket(bucket)

Now, indexing the whole bucket would be stupid. Let's do the math: 150 radars × ~6 files per hour × 24 hours × 365 days × 15 (plus) years is roughly 118 million files. Fortunately, Boto has some nice indexing utilities.
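The back-of-the-envelope estimate can be checked in a few lines. The inputs are the rough figures quoted above (150 radars, about 6 files per hour, 15 years), and leap days are ignored.

```python
radars = 150
files_per_hour = 6      # roughly one volume scan every 10 minutes
hours_per_day = 24
days_per_year = 365     # ignoring leap days
years = 15

total_files = radars * files_per_hour * hours_per_day * days_per_year * years
print('{0:,}'.format(total_files))  # prints 118,260,000
```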

Buckets contain objects, and the objects can be used to do and find out a whole bunch of things. We are going to use the `objects` collection of our bucket resource, filtered by key prefix, to return only objects from the 20th of May 2011 from the Vance (KVNX) radar.

In [7]:
for obj in aws_radar.objects.filter(Prefix='2011/05/20/KVNX/'):
    print('{0}:{1}'.format(aws_radar.name, obj.key))

noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_000023_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_000442_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_000901_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_001320_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_001740_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_002201_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_002620_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_003040_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_003459_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_003918_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_004338_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_004758_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_005219_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_005639_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_010100_V06.gz
noaa-nexrad-level2:2011/05/20/KVNX/KVNX20110520_010520_V06.gz
...

We can also dump all the keys into a list:

In [13]:
my_list_of_keys = [this_object.key for this_object in aws_radar.objects.filter(Prefix='2011/05/20/KVNX/')]
print(my_list_of_keys[1])

2011/05/20/KVNX/KVNX20110520_000442_V06.gz


In [28]:
# Characters 20-34 of each key hold the timestamp (YYYYMMDD_HHMMSS)
my_datetime = datetime.strptime(my_list_of_keys[1][20:35], '%Y%m%d_%H%M%S')
print(my_datetime)
# Parse the timestamp of every object for this radar and day
my_list_of_datetimes = []
for obj in aws_radar.objects.filter(Prefix='2011/05/20/KVNX/'):
    try:
        my_list_of_datetimes.append(datetime.strptime(obj.key[20:35], '%Y%m%d_%H%M%S'))
    except ValueError:
        pass  # usually a tar file left in the bucket

2011-05-20 00:04:42
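One thing to watch with the cell above: when a key fails to parse and is skipped, the list of datetimes ends up shorter than the list of keys, so positions in the two lists no longer line up. A possible alternative, sketched below, keeps each parsed time paired with its key. `parse_keys` is a hypothetical helper, and the example keys (including `leftover.tar`) are made up for illustration.

```python
from datetime import datetime


def parse_keys(keys):
    """Return (start_time, key) pairs, skipping keys that do not parse."""
    pairs = []
    for key in keys:
        name = key.rsplit('/', 1)[-1]  # file name without the path
        try:
            # Characters 4-18 of the file name hold YYYYMMDD_HHMMSS
            start = datetime.strptime(name[4:19], '%Y%m%d_%H%M%S')
        except ValueError:
            continue  # e.g. a tar file left in the bucket
        pairs.append((start, key))
    return pairs


example_keys = [
    '2011/05/20/KVNX/KVNX20110520_000023_V06.gz',
    '2011/05/20/KVNX/leftover.tar',  # made-up key that should be skipped
    '2011/05/20/KVNX/KVNX20110520_105809_V06.gz',
]
pairs = parse_keys(example_keys)

# The pair closest to a target time carries its own key, so no index lookup is needed
target = datetime(2011, 5, 20, 11, 0)
closest_time, closest_key = min(pairs, key=lambda pair: abs(pair[0] - target))
print(closest_time, closest_key)
```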


In [30]:
def nearest(items, pivot):
    # Return the element of items that is closest to pivot
    return min(items, key=lambda x: abs(x - pivot))

In [35]:
desired_time = datetime(2011,5,20,11,0)
my_nearest = nearest(my_list_of_datetimes, desired_time)

print('nearest: ', my_nearest, ' desired: ', desired_time)
print('index: ', my_list_of_datetimes.index(my_nearest))
print('key: ', my_list_of_keys[my_list_of_datetimes.index(my_nearest)])


nearest:  2011-05-20 10:58:09  desired:  2011-05-20 11:00:00
index:  151
key:  2011/05/20/KVNX/KVNX20110520_105809_V06.gz
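With a key in hand, the object can be pulled down into a temporary file. The sketch below is one possible way to do this, not the only one: `fetch_to_tempfile` is a hypothetical helper, and it assumes `bucket` behaves like a Boto3 `Bucket` resource such as `aws_radar`, whose `download_file(Key, Filename)` method writes the object to a local path.

```python
import tempfile


def fetch_to_tempfile(bucket, key, suffix='.gz'):
    """Download one object into a named temporary file and return the file object.

    The file is deleted when the returned object is closed, so keep a
    reference to it for as long as the data is needed.
    """
    localfile = tempfile.NamedTemporaryFile(suffix=suffix)
    bucket.download_file(key, localfile.name)
    return localfile
```

For example, `localfile = fetch_to_tempfile(aws_radar, my_list_of_keys[151])` should fetch the scan nearest to the desired time, after which `pyart.io.read(localfile.name)` can be tried to load it as a radar object. Reopening a `NamedTemporaryFile` by name while it is still open works on Unix-like systems but may fail on Windows.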


[1] Ansari, S., S. Del Greco, E. Kearns, O. Brown, S. Wilkins, M. Ramamurthy, J. Weber, R. May, J. Sundwall, J. Layton, A. Gold, A. Pasch, and V. Lakshmanan, 2018: Unlocking the potential of NEXRAD data through NOAA’s Big Data Partnership. Bull. Amer. Meteor. Soc., 99, 189–204, https://doi.org/10.1175/BAMS-D-16-0021.1