> From the PO.DAAC Cookbook. To access the GitHub version of the notebook, follow [this link](https://github.com/podaac/tutorials/blob/master/notebooks/SearchDownload_SWOTviaCMR.ipynb).

# Search and Download Simulated SWOT Data via the Common Metadata Repository (CMR)
#### *Author: Cassandra Nickles, PO.DAAC*

## Summary
This notebook finds and downloads simulated SWOT data programmatically via CMR. It searches for the desired data by shapefile extent but can be modified to search by other criteria, such as a bounding box.

## Requirements
### 1. Compute environment 
This tutorial can be run in the following environments:
- **AWS instance running in us-west-2**: NASA Earthdata Cloud data in S3 can be directly accessed via temporary credentials; this access is limited to requests made within the US West (Oregon) (code: `us-west-2`) AWS region.
- **Local compute environment** e.g. laptop, server: this tutorial can be run on your local machine

### 2. Earthdata Login

An Earthdata Login account is required to access data, and to discover restricted data, in the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. The account is free to create and takes only a moment to set up.

### 3. netrc File

You will need a `.netrc` file containing your NASA Earthdata Login credentials. A `.netrc` file can be created manually in a text editor and saved to your home directory. For additional information see: [Authentication for NASA Earthdata tutorial](https://nasa-openscapes.github.io/2021-Cloud-Workshop-AGU/tutorials/02_NASA_Earthdata_Authentication.html). If you do not have this file, a code block has been added below as a workaround.
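
As an alternative to a text editor, the file can also be written from Python. The following is a minimal sketch, not part of the original notebook: the helper name `write_netrc_entry` is hypothetical, the credentials are placeholders, and the demonstration writes to a throwaway temporary file rather than to your real `~/.netrc`.

```python
import netrc
import os
import stat
import tempfile
from pathlib import Path


def write_netrc_entry(path, machine, login, password):
    """Append one `machine` entry to a netrc-style file (hypothetical helper)."""
    path = Path(path)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"machine {machine} login {login} password {password}\n")
    # Restrict the file to owner read/write, since it stores a password
    path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return path


# Demonstration with placeholder credentials in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_path = write_netrc_entry(
    os.path.join(demo_dir, "demo_netrc"),
    "urs.earthdata.nasa.gov",
    "my_username",
    "my_password",
)

# Read the entry back with the standard-library parser to confirm it is valid
login, _, password = netrc.netrc(file=str(demo_path)).authenticators(
    "urs.earthdata.nasa.gov"
)
```

To create a real file, the path would instead point at `.netrc` (or `_netrc` on Windows) in your home directory.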

### Import libraries

In [1]:
import requests
import json
import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
from urllib.request import urlretrieve
from json import dumps

In this notebook, authentication is set up in the cell below, which also serves as a workaround if you do not yet have a netrc file.

In [2]:
from urllib import request
from http.cookiejar import CookieJar
from getpass import getpass
import netrc
from platform import system
from os.path import join, isfile, basename, abspath, expanduser

def setup_earthdata_login_auth(endpoint: str='urs.earthdata.nasa.gov'):
    netrc_name = "_netrc" if system()=="Windows" else ".netrc"
    try:
        username, _, password = netrc.netrc(file=join(expanduser('~'), netrc_name)).authenticators(endpoint)
    except (FileNotFoundError, TypeError):
        print('Please provide your Earthdata Login credentials for access.')
        print('Your info will only be passed to %s and will not be exposed in Jupyter.' % (endpoint))
        username = input('Username: ')
        password = getpass('Password: ')
    manager = request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, endpoint, username, password)
    auth = request.HTTPBasicAuthHandler(manager)
    jar = CookieJar()
    processor = request.HTTPCookieProcessor(jar)
    opener = request.build_opener(auth, processor)
    request.install_opener(opener)
    
setup_earthdata_login_auth('urs.earthdata.nasa.gov')

Please provide your Earthdata Login credentials for access.
Your info will only be passed to urs.earthdata.nasa.gov and will not be exposed in Jupyter.


Username:  nickles
Password:  ···········


### Search Common Metadata Repository (CMR) for SWOT sample data links by Shapefile
We want to find the SWOT sample files that cross our region of interest. For this tutorial, we use a shapefile of the United States, which returns 44 granules over land. Each dataset has its own unique collection ID. For the SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1 dataset, the collection ID can be found [here](https://podaac.jpl.nasa.gov/dataset/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1).

**Sample SWOT Hydrology Datasets and Associated Collection IDs:**
1. **River Vector Shapefile** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1 - **C2263384307-POCLOUD**

2. **Lake Vector Shapefile** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1 - **C2263384453-POCLOUD**
    
3. **Raster NetCDF** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1 - **C2263383790-POCLOUD**

4. **Water Mask Pixel Cloud NetCDF** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1 - **C2263383386-POCLOUD**
    
5. **Water Mask Pixel Cloud Vector Attribute NetCDF** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1 - **C2263383657-POCLOUD**
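
For scripted use, the list above can be kept as a lookup table. This is a small sketch: the dictionary and the helper `concept_id_for` are hypothetical names, while the short names and collection IDs are copied from the list.

```python
# Short names and collection IDs taken from the list of sample datasets
SWOT_SAMPLE_COLLECTIONS = {
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1": "C2263384307-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1": "C2263384453-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1": "C2263383790-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1": "C2263383386-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1": "C2263383657-POCLOUD",
}


def concept_id_for(short_name):
    """Return the collection ID for a dataset short name (hypothetical helper)."""
    try:
        return SWOT_SAMPLE_COLLECTIONS[short_name]
    except KeyError:
        known = ", ".join(sorted(SWOT_SAMPLE_COLLECTIONS))
        raise ValueError(f"Unknown dataset {short_name!r}; expected one of: {known}")


river_id = concept_id_for("SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1")
```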

In [3]:
# the URL of the CMR service
cmr_url = 'https://cmr.earthdata.nasa.gov/search/granules.json'

#The shapefile we want to use in our search
shp_file = open('../resources/US_shapefile.zip', 'rb')

#need to declare the file and the type we are uploading
files = {'shapefile':('US_shapefile.zip',shp_file, 'application/shapefile+zip')}

#used to define parameters such as the concept-id and things like temporal searches
parameters = {'collection_concept_id':'C2263384307-POCLOUD', #insert desired collection ID here
             'page_size': 2000} #default will only return 10 granules, so we set it to the max

#request the granules from this collection that align with the shapefile
response = requests.post(cmr_url, params=parameters, files=files)

#If you want to search by bounding box instead of shapefile, use the following instead:
#parameters = {'collection_concept_id':'C2263384307-POCLOUD',
#             'page_size': 2000, 
#             'bounding_box':"-124.848974,24.396308,-66.885444,49.384358"} 
#response = requests.post(cmr_url, params=parameters)

if len(response.json()['feed']['entry'])>0:
    print(len(response.json()['feed']['entry'])) #print out number of files found
    #print(dumps(response.json()['feed']['entry'][0], indent=2)) #print out the first file information

44
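
The search parameters used above can also be assembled by a small helper, which makes it easier to switch between spatial and temporal filters. This is a sketch only: `build_cmr_params` is a hypothetical name, and the date range in the example is illustrative rather than taken from the notebook. `bounding_box` (west, south, east, north) and `temporal` (start, end) are CMR granule search parameters.

```python
def build_cmr_params(concept_id, bounding_box=None, temporal=None, page_size=2000):
    """Return a dict of CMR granule-search query parameters (hypothetical helper)."""
    params = {"collection_concept_id": concept_id, "page_size": page_size}
    if bounding_box is not None:
        # CMR expects: lower-left lon, lower-left lat, upper-right lon, upper-right lat
        west, south, east, north = bounding_box
        params["bounding_box"] = ",".join(str(v) for v in (west, south, east, north))
    if temporal is not None:
        # CMR expects an ISO 8601 range written as "start,end"
        start, end = temporal
        params["temporal"] = f"{start},{end}"
    return params


params = build_cmr_params(
    "C2263384307-POCLOUD",
    bounding_box=(-124.848974, 24.396308, -66.885444, 49.384358),
    temporal=("2022-08-01T00:00:00Z", "2022-08-31T23:59:59Z"),  # illustrative dates
)
```

The resulting dictionary could be passed as `params=` to `requests.post`, in the same way as the `parameters` dictionary in the search cell.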


### Get Download links from CMR search results

In [4]:
downloads = []
for r in response.json()['feed']['entry']:
    for l in r['links']:
        #if the link contains the following, it is the download link we want
        if 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/' in l['href']: 
            #if the link has "Reach" instead of "Node" in the name, we want to download it for the swath use case
            if 'Reach' in l['href']:
                downloads.append(l['href'])
print(len(downloads)) #should end up with half the number of files above since we only need reach files, not node files

22
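
The filtering step can be wrapped in a function so it can be checked without calling CMR. The sketch below is an assumption-laden illustration: `select_download_links` is a hypothetical name, the sample entries and file names are made up, and it uses `startswith`, which is slightly stricter than the substring test in the cell above.

```python
ARCHIVE_PREFIX = "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"


def select_download_links(entries, keyword="Reach", prefix=ARCHIVE_PREFIX):
    """Collect hrefs that start with `prefix` and contain `keyword` (hypothetical helper)."""
    links = []
    for entry in entries:
        for link in entry.get("links", []):
            href = link.get("href", "")
            if href.startswith(prefix) and keyword in href:
                links.append(href)
    return links


# Made-up entries that mimic the shape of response.json()['feed']['entry']
sample_entries = [
    {"links": [
        {"href": ARCHIVE_PREFIX + "EXAMPLE_COLLECTION/example_RiverSP_Reach.zip"},
        {"href": "s3://example-bucket/example_RiverSP_Reach.zip"},
    ]},
    {"links": [
        {"href": ARCHIVE_PREFIX + "EXAMPLE_COLLECTION/example_RiverSP_Node.zip"},
    ]},
]

reach_links = select_download_links(sample_entries)
node_links = select_download_links(sample_entries, keyword="Node")
```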


### Download the Data into a folder

In [5]:
#Create folder to house downloaded data 
folder = Path("SWOT_sample_files")
folder.mkdir(parents=True, exist_ok=True)

In [6]:
for f in downloads:
    urlretrieve(f, f"{folder}/{os.path.basename(f)}")
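
When re-running the notebook, it can be convenient to skip files that are already on disk. The following is one possible sketch, not the notebook's own method: `download_missing` is a hypothetical helper, and the demonstration swaps in a fake `retrieve` function and a placeholder URL so that it runs without network access or credentials.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download_missing(urls, folder, retrieve=urlretrieve):
    """Download each URL into `folder` unless a file with that name already exists."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        target = folder / os.path.basename(url)
        if target.exists():
            continue  # already downloaded on a previous run
        retrieve(url, str(target))
        fetched.append(target.name)
    return fetched


def fake_retrieve(url, filename):
    """Stand-in for urlretrieve used only in this demonstration."""
    Path(filename).write_text("placeholder")


demo_folder = Path(tempfile.mkdtemp()) / "SWOT_sample_files"
demo_urls = ["https://example.invalid/data/example_Reach.zip"]  # placeholder URL

first_run = download_missing(demo_urls, demo_folder, retrieve=fake_retrieve)
second_run = download_missing(demo_urls, demo_folder, retrieve=fake_retrieve)
```

With the default `retrieve=urlretrieve`, the function would rely on the opener installed by the authentication cell, just as the loop above does.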

### Shapefiles come in a .zip format, and need to be unzipped in the existing folder

In [7]:
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        zip_ref = zipfile.ZipFile(f"{folder}/{item}") # create zipfile object
        zip_ref.extractall(folder) # extract file to dir
        zip_ref.close() # close file
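
An equivalent way to write the unzip step is with `pathlib` and a context manager, which closes each archive even if extraction fails. This is a sketch under stated assumptions: `unzip_all` is a hypothetical name, and the demonstration builds a tiny placeholder archive in a temporary directory instead of using the downloaded SWOT files.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_all(folder):
    """Extract every .zip archive in `folder` into `folder`; return extracted names."""
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        # The with-statement closes the archive automatically
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
    return extracted


# Demonstration with a placeholder archive
demo = Path(tempfile.mkdtemp())
with zipfile.ZipFile(demo / "example.zip", "w") as zf:
    zf.writestr("example.shp", b"placeholder")

names = unzip_all(demo)
```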

In [8]:
os.listdir(folder)

['SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.dbf',
 'SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.prj',
 'SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.shp',
 'SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.shp.xml',
 'SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.shx',
 'SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.dbf',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.prj',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.shp',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.shp.xml',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.shx',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T12021