> From the PO.DAAC Cookbook. To access the GitHub version of this notebook, follow [this link](https://github.com/podaac/tutorials/blob/master/notebooks/SearchDownload_SWOTviaCMR.ipynb).

# Search and Download Simulated SWOT Data via `earthaccess`
#### *Author: Cassandra Nickles, PO.DAAC*

## Summary
This notebook finds and downloads simulated SWOT data programmatically with the `earthaccess` Python library. For more information about `earthaccess`, visit: https://nsidc.github.io/earthaccess/

## Requirements
### 1. Compute environment 
This tutorial can be run in the following environment:
- **Local compute environment**: your own machine, e.g. a laptop or server

### 2. Earthdata Login

An Earthdata Login account is required to access data, and to discover restricted data, from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. The account is free and takes only a moment to set up.

### Import libraries

In [1]:
import requests
import json
import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
from urllib.request import urlretrieve
from json import dumps
import earthaccess
from earthaccess import Auth, DataCollections, DataGranules, Store

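The cell below authenticates with Earthdata Login. With `strategy="interactive"` you are prompted for your username and password, and `persist=True` saves the credentials to a `.netrc` file so that later runs do not prompt again.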

In [None]:
auth = earthaccess.login(strategy="interactive", persist=True)

### Search for SWOT sample data links
We want to find the SWOT sample files that cross our region of interest, in this case a bounding box around the contiguous United States.

Each dataset has its own unique collection ID. For the SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1 dataset, the collection ID is listed [here](https://podaac.jpl.nasa.gov/dataset/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1).

**Sample SWOT Hydrology Datasets and Associated Collection IDs:**
1. **River Vector Shapefile** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1 - **C2263384307-POCLOUD**

2. **Lake Vector Shapefile** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1 - **C2263384453-POCLOUD**
    
3. **Raster NetCDF** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1 - **C2263383790-POCLOUD**

4. **Water Mask Pixel Cloud NetCDF** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1 - **C2263383386-POCLOUD**
    
5. **Water Mask Pixel Cloud Vector Attribute NetCDF** - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1 - **C2263383657-POCLOUD**
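
The list above can also be kept as a small lookup so that a search only needs the dataset short name. This is a minimal sketch: the names `SWOT_SAMPLE_COLLECTIONS` and `concept_id_for` are hypothetical, introduced here for illustration, and are not part of `earthaccess`.

```python
# Hypothetical lookup built from the list above:
# dataset short name -> collection (concept) ID.
SWOT_SAMPLE_COLLECTIONS = {
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1": "C2263384307-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1": "C2263384453-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1": "C2263383790-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1": "C2263383386-POCLOUD",
    "SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1": "C2263383657-POCLOUD",
}


def concept_id_for(short_name):
    """Return the collection ID for one of the sample datasets listed above."""
    try:
        return SWOT_SAMPLE_COLLECTIONS[short_name]
    except KeyError:
        raise KeyError(f"Unknown sample dataset: {short_name!r}") from None


# Look up the river vector shapefile collection used in the search below.
river_id = concept_id_for("SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1")
print(river_id)
```

The returned string can then be passed as the `concept_id` argument of `earthaccess.search_data`.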

In [3]:
#earthaccess data search
results = earthaccess.search_data(concept_id="C2263384307-POCLOUD", bounding_box=(-124.848974,24.396308,-66.885444,49.384358))

Granules found: 46
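
The `bounding_box` tuple in the search above is ordered as (west longitude, south latitude, east longitude, north latitude), in decimal degrees. A small check like the one below can catch swapped coordinates before a search is run. This is only a sketch: `validate_bbox` is a hypothetical helper written for this example, not an `earthaccess` function.

```python
def validate_bbox(bbox):
    """Check a (west, south, east, north) bounding box in decimal degrees.

    Hypothetical helper for illustration; returns the box unchanged if valid.
    """
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if south >= north:
        raise ValueError("south latitude must be less than north latitude")
    # west > east is not rejected, since a box may cross the antimeridian.
    return bbox


# The bounding box used in the search above.
us_bbox = validate_bbox((-124.848974, 24.396308, -66.885444, 49.384358))
print(us_bbox)
```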


### Get Download links from `earthaccess` search results

In [4]:
#add desired links to a list
#keep only the links with "Reach" (not "Node") in the name, which are the ones needed for the swath use case
downloads = []
for g in results:
    for link in g.data_links():
        if 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/' in link and 'Reach' in link:
            downloads.append(link)

print(len(downloads))

23


This leaves us with 23 links, half of the 46 granules returned by the search.
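
The filtering step can be tried offline on a handful of strings. The sketch below uses made-up placeholder links (they are not real granule URLs) that share the protected-bucket prefix from the cell above, and keeps only those with "Reach" in the name.

```python
PROTECTED_PREFIX = (
    "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"
)

# Hypothetical placeholder links, for illustration only.
links = [
    PROTECTED_PREFIX + "example/example_Reach_001.zip",
    PROTECTED_PREFIX + "example/example_Node_001.zip",
    PROTECTED_PREFIX + "example/example_Reach_002.zip",
    PROTECTED_PREFIX + "example/example_Node_002.zip",
]

# Keep links served from the protected bucket whose name contains "Reach".
reach_links = [
    link for link in links
    if link.startswith(PROTECTED_PREFIX) and "Reach" in link
]

print(len(reach_links), "of", len(links), "links kept")
```

Note that this sketch uses `str.startswith` for the prefix, which is slightly stricter than the substring test in the notebook cell.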

### Download the Data into a folder

In [5]:
#Create folder to house downloaded data (no error if it already exists)
folder = Path("SWOT_sample_files")
folder.mkdir(parents=True, exist_ok=True)

In [6]:
#download data
earthaccess.download(downloads, "./SWOT_sample_files")

QUEUEING TASKS | :   0%|          | 0/23 [00:00<?, ?it/s]

File SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_104_NA_20220807T205936_20220807T210016_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_121_NA_20220808T115628_20220808T120311_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_065_NA_20220806T115630_20220806T120114_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_147_NA_20220809T101525_20220809T101639_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_162_NA_20220809T224722_20220809T225058_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_175_NA_20220810T101607_20220810T101940_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_132_NA_20220808T210018_20220808T210252_PGA0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_007_177_NA_20220810T120102_2022081

PROCESSING TASKS | :   0%|          | 0/23 [00:00<?, ?it/s]

COLLECTING RESULTS | :   0%|          | 0/23 [00:00<?, ?it/s]

['SWOT_L2_HR_RiverSP_Reach_007_022_NA_20220804T224145_20220804T224402_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_065_NA_20220806T115630_20220806T120114_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_104_NA_20220807T205936_20220807T210016_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_121_NA_20220808T115628_20220808T120311_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_132_NA_20220808T210018_20220808T210252_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_147_NA_20220809T101525_20220809T101639_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_162_NA_20220809T224722_20220809T225058_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_175_NA_20220810T101607_20220810T101940_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_177_NA_20220810T120102_20220810T120420_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_203_NA_20220811T101614_20220811T102211_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_205_NA_20220811T120350_20220811T120457_PGA0_0

### Shapefiles come in .zip format and need to be unzipped into the existing folder

In [7]:
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        # open the archive; the with-block closes it even if extraction fails
        with zipfile.ZipFile(f"{folder}/{item}") as zip_ref:
            zip_ref.extractall(folder) # extract files to dir
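
The unzip step can be checked without downloading anything. The sketch below builds a stand-in archive in a temporary directory, extracts it the same way, and lists the resulting `.shp` files. The archive and member names are hypothetical placeholders, not real SWOT file names.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_folder = Path(tmp)

    # Build a stand-in archive with empty, hypothetical shapefile parts.
    archive = demo_folder / "example_Reach_001.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("example_Reach_001.shp", b"")
        zf.writestr("example_Reach_001.shx", b"")

    # Extract every .zip archive found in the folder into that same folder.
    for zip_path in demo_folder.glob("*.zip"):
        with zipfile.ZipFile(zip_path) as zip_ref:
            zip_ref.extractall(demo_folder)

    # Collect just the .shp files, which are the entry point of each shapefile.
    shapefiles = sorted(p.name for p in demo_folder.glob("*.shp"))

print(shapefiles)
```

Filtering on `*.shp` is one way to pick out the files to open from a folder that also holds the `.zip`, `.shx`, `.prj`, and `.shp.xml` files.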

In [8]:
os.listdir(folder)

['SWOT_L2_HR_RiverSP_Reach_007_147_NA_20220809T101525_20220809T101639_PGA0_01.shx',
 'SWOT_L2_HR_RiverSP_Reach_007_205_NA_20220811T120350_20220811T120457_PGA0_01.shx',
 'SWOT_L2_HR_RiverSP_Reach_007_175_NA_20220810T101607_20220810T101940_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_440_NA_20220819T210905_20220819T211311_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_300_NA_20220814T210504_20220814T210907_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_483_NA_20220821T102527_20220821T102706_PGA0_01.shp.xml',
 'SWOT_L2_HR_RiverSP_Reach_007_177_NA_20220810T120102_20220810T120420_PGA0_01.prj',
 'SWOT_L2_HR_RiverSP_Reach_007_427_NA_20220819T101956_20220819T102559_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_483_NA_20220821T102527_20220821T102706_PGA0_01.zip',
 'SWOT_L2_HR_RiverSP_Reach_007_147_NA_20220809T101525_20220809T101639_PGA0_01.shp',
 'SWOT_L2_HR_RiverSP_Reach_007_104_NA_20220807T205936_20220807T210016_PGA0_01.shp.xml',
 'SWOT_L2_HR_RiverSP_Reach_007_162_NA_20220809T224722_20220809T22505