# Welcome to the cloud products notebook!
#### **Audience:** Anybody with a computer and an interest in cloud data.
#### **Intent:** Build familiarity with the process of requesting NOAA datasets from the AWS, Azure, and Google clouds.
This is a short tutorial on using the "requests" library together with data hosted by a Cloud Service Provider (CSP: AWS, Azure, or Google) to find and use a desired dataset. The example in this notebook uses GOES-16 data, but once you are familiar with the structure of the code that requests data from the cloud, you can adapt it to any open NOAA dataset in the cloud.

# AWS

How to find datasets on AWS:     
1. Go to the NODD page (https://www.noaa.gov/nodd/datasets), locate your dataset, and click on the "Amazon Web Services" link.   
2. Under "Resources on AWS", locate your desired bucket. Then, click "Browse Bucket."
3. Click through your desired product, date, and time, until you see the list of files.
4. To the right of the "AWS S3 Explorer" text at the top of your screen, you'll see the file path you followed. Adjust the name_string variable in the cell below to reflect your new product path. You can either keep the generic variables and fill them in with your specific request (as shown below), or copy the specific file path directly into name_string and delete the generic variable definitions (e.g. name_string = "noaa-goes18/ABI-L2-ACHAC/2023/003/02").

**For this tutorial:** GOES (16, 17, 18) buckets: https://registry.opendata.aws/noaa-goes/
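
The GOES paths in this notebook are organized as product/year/day-of-year/hour, so a calendar date has to be converted to a zero-padded day of year before it can go into name_string. Here is a minimal sketch of that conversion using only the standard library. The helper name `goes_path_parts` is hypothetical and is not part of any NOAA or AWS tooling.

```python
from datetime import datetime


def goes_path_parts(when):
    """Return (year, day_of_year, hour) as zero-padded strings for a datetime."""
    # %Y is the four-digit year, %j the three-digit day of year,
    # and %H the two-digit hour on a 24-hour clock.
    return when.strftime("%Y"), when.strftime("%j"), when.strftime("%H")


# 4 January 2023 at 17:00 is the example request used in this notebook.
year, day_of_year, hour = goes_path_parts(datetime(2023, 1, 4, 17))
name_string = f"noaa-goes16/ABI-L2-LSTM/{year}/{day_of_year}/{hour}"
print(name_string)  # noaa-goes16/ABI-L2-LSTM/2023/004/17
```

The same three strings can be dropped into the year, day_of_year, and hour variables of any of the cells in this notebook.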

In [None]:
import requests
import s3fs
import netCDF4

# Select your desired variables by looking at the product documentation (see instructions above!)
satellite = "noaa-goes16"
product = "ABI-L2-LSTM"
year = "2023"
day_of_year = "004"
hour = "17"

name_string = f'{satellite}/{product}/{year}/{day_of_year}/{hour}'

# Using name_string as a file path, search for all files that match your criteria.
aws = s3fs.S3FileSystem(anon=True)
data_files = aws.ls(name_string, refresh=True)

# See which files your search produced, and return the file name.
# Then, use the requests library to download each file that met your search criteria.
for f in data_files:

    print(f)

    # Strip the leading "<bucket>/" so that only the object key remains.
    fname = f[len(satellite)+1:]
    resp = requests.get(f'https://{satellite}.s3.amazonaws.com/{fname}')
    resp.raise_for_status()  # Stop early if the download failed.

    # Read the file in as a netCDF file, and you may begin analysis.
    nc = netCDF4.Dataset(fname, memory=resp.content)
    
    # your analysis here.

# Each pass through the loop rebinds nc to the newest file, so the previous dataset, which lives only in memory, is lost.
# So, either perform your analysis within the 'for' loop or store each netCDF dataset (for example, in a list) so that it is available outside of the loop.
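
One way to make each file available outside of the loop is to store the results in a dictionary keyed by file name. The sketch below assumes a caller-supplied `loader` function (a hypothetical name) that turns one file name into a dataset. Failed downloads are recorded rather than stopping the run.

```python
def load_all(file_names, loader):
    """Apply loader to every file name and keep the results.

    Returns (datasets, failures): datasets maps file name -> loaded object,
    failures maps file name -> the exception raised while loading it.
    """
    datasets = {}
    failures = {}
    for name in file_names:
        try:
            datasets[name] = loader(name)
        except Exception as exc:  # Broad on purpose: one bad file should not end the run.
            failures[name] = exc
    return datasets, failures


# Stand-in loader for illustration only; it just measures the length of the name.
datasets, failures = load_all(["a.nc", "bb.nc"], loader=len)
print(datasets)  # {'a.nc': 4, 'bb.nc': 5}
```

In the AWS cell, `loader` could be a small function that performs the requests.get call and returns netCDF4.Dataset(name, memory=resp.content). Keeping many datasets open holds all of their bytes in memory, so this approach probably suits a handful of files better than a full day of data.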

# Azure

How to find datasets on Azure:     
1. Go to the NODD page (https://www.noaa.gov/nodd/datasets), locate your dataset, and click on the "Microsoft Azure" link.   
2. Scroll down for documentation of product availability and file path. 
3. Adjust the name_string variable in the cell below to reflect your new product path. You can either keep the generic variables and fill them in with your specific request (as shown below), or copy the specific file path directly into name_string and delete the generic variable definitions (e.g. name_string = "ABI-L2-ACHAC/2023/003/02/").
4. Set the "satellite" variable, which holds the container name, to the name of the container as documented on Azure's dataset page.

**For this tutorial:** GOES (16, 17, 18) containers: https://microsoft.github.io/AIforEarthDataSets/data/goes-r.html

In [None]:
import requests
from azure.storage.blob import ContainerClient
import netCDF4

# Select your desired variables by looking at the product documentation (see instructions above!)
satellite = "noaa-goes16"
product = "ABI-L2-LSTM"
year = "2023"
day_of_year = "004"
hour = "17"

name_string = f'{product}/{year}/{day_of_year}/{hour}/'

# Using name_string as a file path, search for all files that match your criteria.
container_client = ContainerClient(account_url='https://goeseuwest.blob.core.windows.net',
                                   container_name=satellite, credential=None)

# See which files your search produced, and return the file name.
# Then, use the requests library to download each file that met your search criteria.
for f in container_client.walk_blobs(name_starts_with=name_string, delimiter='/'):

    print(f.name)

    fname = f.name
    resp = requests.get(f'https://goeseuwest.blob.core.windows.net/{satellite}/{fname}')
    resp.raise_for_status()  # Stop early if the download failed.

    # Read the file in as a netCDF file, and you may begin analysis.
    nc = netCDF4.Dataset(fname, memory=resp.content)
    
    # your analysis here
    
# Each pass through the loop rebinds nc to the newest file, so the previous dataset, which lives only in memory, is lost.
# So, either perform your analysis within the 'for' loop or store each netCDF dataset (for example, in a list) so that it is available outside of the loop.
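
If name_string stops short of the hour level, a delimited listing can return virtual-directory entries (names ending in "/") alongside, or instead of, files, and requesting those as files will fail. Here is a minimal sketch of a filter that keeps only netCDF file names. The helper name `netcdf_names` is hypothetical, and the sketch assumes that data files end in `.nc`.

```python
def netcdf_names(names):
    """Keep names that look like netCDF files; drop directory-style prefixes."""
    return [n for n in names if not n.endswith("/") and n.lower().endswith(".nc")]


listing = [
    "ABI-L2-LSTM/2023/004/17/",            # virtual directory, not a file
    "ABI-L2-LSTM/2023/004/17/example.nc",  # hypothetical file name
]
print(netcdf_names(listing))  # ['ABI-L2-LSTM/2023/004/17/example.nc']
```

In the Azure cell, this could wrap the listing as netcdf_names(b.name for b in container_client.walk_blobs(name_starts_with=name_string, delimiter='/')), since the helper accepts any iterable of strings.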

# Google

How to find datasets on Google:     
1. Go to the NODD page (https://www.noaa.gov/nodd/datasets), locate your dataset, and click on the "Google" link.   
2. From the Google product page, click the blue "VIEW DATASET" button. From here, you may select a project and discover your desired file path.
3. Adjust the name_string variable in the cell below to reflect your new product path. You can either keep the generic variables and fill them in with your specific request (as shown below), or copy the specific file path directly into name_string and delete the generic variable definitions (e.g. name_string = "ABI-L2-ACHAC/2023/003/02/").
4. Be sure to change the bucket name (in this example, the bucket name is stored in the "satellite" variable) to reflect your product choice. 

**For this tutorial:** GOES (16, 17, 18) buckets: https://console.cloud.google.com/marketplace/product/noaa-public/goes

In [None]:
import requests
from google.cloud import storage
import netCDF4

# Select your desired variables by looking at the product documentation (see instructions above!)
satellite = "gcp-public-data-goes-16"
product = "ABI-L2-LSTM"
year = "2023"
day_of_year = "004"
hour = "17"

name_string = f'{product}/{year}/{day_of_year}/{hour}/'

# Using name_string as a file path, search for all files that match your criteria.
client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name = satellite)

# Iterating over the listing fetches every page of results, without relying on private methods.
blobs = bucket.list_blobs(prefix=name_string, delimiter="/")

# See which files your search produced, and return the file name and URL.
# Then, use the requests library to download each file that met your search criteria.
for blob in blobs:
    print(blob.name)

    url_link = blob.media_link
    fname = blob.name
    resp = requests.get(url_link)
    resp.raise_for_status()  # Stop early if the download failed.

    # Read the file in as a netCDF file, and you may begin analysis.
    nc = netCDF4.Dataset(fname, memory=resp.content)
    
    # your analysis here

# Each pass through the loop rebinds nc to the newest file, so the previous dataset, which lives only in memory, is lost.
# So, either perform your analysis within the 'for' loop or store each netCDF dataset (for example, in a list) so that it is available outside of the loop.
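
The three cells differ mainly in how a listed file name becomes a download URL. The sketch below collects those patterns in one place. The AWS and Azure patterns are the ones used in the cells above. The Google pattern (storage.googleapis.com followed by the bucket and object name) is an assumption added here as an alternative to the mediaLink field, and the function name `public_url` is hypothetical.

```python
from urllib.parse import quote


def public_url(provider, container, name, azure_account="goeseuwest"):
    """Build an anonymous download URL for one file on the given provider."""
    path = quote(name)  # Percent-encode unusual characters; "/" is left as-is.
    if provider == "aws":
        return f"https://{container}.s3.amazonaws.com/{path}"
    if provider == "azure":
        return f"https://{azure_account}.blob.core.windows.net/{container}/{path}"
    if provider == "google":
        return f"https://storage.googleapis.com/{container}/{path}"
    raise ValueError(f"unknown provider: {provider!r}")


# Hypothetical file name, used only to show the three URL shapes.
name = "ABI-L2-LSTM/2023/004/17/example.nc"
for provider, container in [("aws", "noaa-goes16"),
                            ("azure", "noaa-goes16"),
                            ("google", "gcp-public-data-goes-16")]:
    print(public_url(provider, container, name))
```

Each returned URL can be passed straight to requests.get, as in the loops above.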