> From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow [this link](https://github.com/podaac/tutorials/blob/master/notebooks/GIS/SWOTsample_CSVconversion.ipynb).

# SWOT Shapefile Data Conversion to CSV

### Notebook showcasing how to merge/concatenate multiple shapefiles into a single file.
- Convert the merged shapefile to a csv file.
- Optionally query the new dataset on a variable of the user's choice, such as 'reach_id' or water surface elevation ('wse').
- Export the query result as a csv file or shapefile.

### Import libraries

In [1]:
import requests
import json
import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
from urllib.request import urlretrieve
from json import dumps

## Before you start

Before you begin this tutorial, make sure you have an Earthdata Login account, which is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register. The account is free and takes only a moment to set up.

You will also need a netrc file containing your NASA Earthdata Login credentials in order to execute this notebook. A netrc file can be created manually in a text editor and saved to your home directory. For additional information see: [Authentication for NASA Earthdata](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/04_NASA_Earthdata_Authentication.html#authentication-via-netrc-file)
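
As a minimal sketch of creating such a file from Python instead of a text editor, the snippet below appends one `machine ... login ... password ...` entry and reads it back. The helper name `write_netrc_entry` and the placeholder credentials are hypothetical, and the demonstration writes to a temporary folder rather than your real home directory.

```python
import os
import tempfile
from netrc import netrc
from pathlib import Path


def write_netrc_entry(path, machine, login, password):
    """Append one netrc entry to `path` and restrict the file to its owner."""
    path = Path(path)
    with path.open("a") as fh:
        fh.write(f"machine {machine} login {login} password {password}\n")
    # Owner-only read/write, since the file holds a plain-text password.
    os.chmod(path, 0o600)
    return path


# Demonstration against a temporary file, not the real ~/.netrc.
demo_dir = tempfile.mkdtemp()
demo_file = write_netrc_entry(
    Path(demo_dir) / ".netrc",
    "urs.earthdata.nasa.gov",
    "my_username",  # placeholder, replace with your own login
    "my_password",  # placeholder, replace with your own password
)

# Read the entry back the same way the authentication cell does.
login, _, password = netrc(str(demo_file)).authenticators("urs.earthdata.nasa.gov")
print(login)
```

To write the real file, you would pass the path to `.netrc` (or `_netrc` on Windows) in your home directory instead of the temporary one.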

In this notebook, the cell below handles authentication. It reads your netrc file if one exists and otherwise prompts for your credentials, which is a workaround if you do not yet have a netrc file.

In [2]:
from urllib import request
from http.cookiejar import CookieJar
from getpass import getpass
from netrc import netrc
from platform import system
from os.path import join, isfile, basename, abspath, expanduser

def setup_earthdata_login_auth(endpoint: str='urs.earthdata.nasa.gov'):
    netrc_name = "_netrc" if system()=="Windows" else ".netrc"
    try:
        username, _, password = netrc(file=join(expanduser('~'), netrc_name)).authenticators(endpoint)
    except (FileNotFoundError, TypeError):
        print('Please provide your Earthdata Login credentials for access.')
        print('Your info will only be passed to %s and will not be exposed in Jupyter.' % (endpoint))
        username = input('Username: ')
        password = getpass('Password: ')
    manager = request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, endpoint, username, password)
    auth = request.HTTPBasicAuthHandler(manager)
    jar = CookieJar()
    processor = request.HTTPCookieProcessor(jar)
    opener = request.build_opener(auth, processor)
    request.install_opener(opener)
    
setup_earthdata_login_auth('urs.earthdata.nasa.gov')

Please provide your Earthdata Login credentials for access.
Your info will only be passed to urs.earthdata.nasa.gov and will not be exposed in Jupyter.


Username:  nickles
Password:  ···········


### Search Common Metadata Repository (CMR) for SWOT sample data links by Shapefile
We want to find the SWOT sample files that cross our region of interest. For this tutorial, we use a shapefile of the United States, which finds 44 granules in total. Each dataset has its own unique collection ID. For the SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1 dataset, we can find the collection ID [here](https://podaac.jpl.nasa.gov/dataset/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1).

In [3]:
# the URL of the CMR service
cmr_url = 'https://cmr.earthdata.nasa.gov/search/granules.json'

#The shapefile we want to use in our search
shp_file = open('../resources/US_shapefile.zip', 'rb')

#need to declare the file and the type we are uploading
files = {'shapefile':('US_shapefile.zip',shp_file, 'application/shapefile+zip')}

#used to define parameters such as the concept-id and things like temporal searches
parameters = {'collection_concept_id':'C2263384307-POCLOUD',
              'page_size': 2000} #default will only return 10 granules, so we set it to the max
#could also add a bounding box to the parameters:
#'bounding_box':"-124.848974,24.396308,-66.885444,49.384358"

#request the granules from this collection that align with the shapefile
response = requests.post(cmr_url, params=parameters, files=files)

if len(response.json()['feed']['entry'])>0:
    print(len(response.json()['feed']['entry'])) #print out number of files found
    #print(dumps(response.json()['feed']['entry'][0], indent=2)) #print out the first file information

44


### Get Download links from CMR search results

In [4]:
downloads = []
for r in response.json()['feed']['entry']:
    for l in r['links']:
        #if the link contains the following, it is the download link we want
        if 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/' in l['href']: 
            #if the link has "Reach" instead of "Node" in the name, we want to download it for the swath use case
            if 'Reach' in l['href']:
                downloads.append(l['href'])
print(len(downloads)) #should end up with half the number of files above since we only need reach files, not node files

22
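
The filtering step above can be sketched as a small standalone function. This is an illustration only: the function name `select_reach_links` and the sample entries are made up, and the sketch checks that the link starts with the archive prefix, whereas the cell above only checks that the prefix appears somewhere in the link.

```python
def select_reach_links(entries, prefix, keyword="Reach"):
    """Return hrefs that start with `prefix` and contain `keyword`."""
    selected = []
    for entry in entries:
        for link in entry.get("links", []):
            href = link.get("href", "")
            if href.startswith(prefix) and keyword in href:
                selected.append(href)
    return selected


prefix = "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"

# Made-up entries that only mimic the shape of response.json()['feed']['entry'].
sample_entries = [
    {"links": [{"href": prefix + "demo/example_Reach_001.zip"},
               {"href": "s3://some-bucket/demo/example_Reach_001.zip"}]},
    {"links": [{"href": prefix + "demo/example_Node_001.zip"}]},
    {"links": []},
]

reach_links = select_reach_links(sample_entries, prefix)
print(len(reach_links))
```

Only the first made-up link passes both checks; the `s3://` link fails the prefix check and the Node file fails the keyword check.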


### Download the Data into a folder

In [5]:
#Create folder to house downloaded data
folder = Path("SWOT_sample_files")
folder.mkdir(parents=True, exist_ok=True)

In [6]:
for f in downloads:
    urlretrieve(f, f"{folder}/{os.path.basename(f)}")
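
Re-running the download cell fetches every file again. One possible variation, sketched below under the assumption that a file already on disk is complete, skips links whose target file exists. The helper name `download_missing` and its `fetch` parameter are hypothetical; `fetch` defaults to `urlretrieve` and is replaced here by a fake so the example runs without network access or credentials.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download_missing(urls, folder, fetch=urlretrieve):
    """Download each URL into `folder` unless the target file already exists."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        target = folder / os.path.basename(url)
        if target.exists():
            continue  # assume an existing file is a finished download
        fetch(url, target)
        fetched.append(target)
    return fetched


def fake_fetch(url, target):
    """Stand-in for urlretrieve that just writes a placeholder file."""
    Path(target).write_text("placeholder")


# Made-up URLs; in the notebook you would pass the `downloads` list instead.
demo_folder = Path(tempfile.mkdtemp())
urls = ["https://example.com/a_Reach.zip", "https://example.com/b_Reach.zip"]

first = download_missing(urls, demo_folder, fetch=fake_fetch)
second = download_missing(urls, demo_folder, fetch=fake_fetch)
print(len(first), len(second))
```

The second call fetches nothing because both files are already present. A partially downloaded file would also be skipped, so this check is only appropriate when interrupted downloads are cleaned up by hand.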

### Unzip shapefiles in existing folder

In [7]:
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        zip_ref = zipfile.ZipFile(f"{folder}/{item}") # create zipfile object
        zip_ref.extractall(folder) # extract file to dir
        zip_ref.close() # close file
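
The same extraction loop can be written with `pathlib` and a `with` block, which closes each archive even if extraction raises an error. This is a sketch rather than a replacement for the cell above; the function name `extract_zips` and the demo archive are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zips(folder):
    """Extract every .zip in `folder` into `folder`; return extracted names."""
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        # The context manager closes the archive when the block ends.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
    return extracted


# Build a tiny demo archive in a temporary folder so the sketch runs offline.
demo_folder = Path(tempfile.mkdtemp())
with zipfile.ZipFile(demo_folder / "demo.zip", "w") as zf:
    zf.writestr("demo.shp", "placeholder")
    zf.writestr("demo.dbf", "placeholder")

names = extract_zips(demo_folder)
print(names)
```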

### Merging two separate shapefiles into one

In [8]:
# Read shapefiles
SWOT_1 = gpd.read_file(folder / 'SWOT_L2_HR_RiverSP_Reach_007_037_NA_20220805T115553_20220805T120212_PGA0_01.shp')
SWOT_2 = gpd.read_file(folder / 'SWOT_L2_HR_RiverSP_Reach_007_065_NA_20220806T115630_20220806T120114_PGA0_01.shp')
 
# Merge/Combine multiple shapefiles into one
SWOT_Merge = pd.concat([SWOT_1, SWOT_2])
 
#Export merged geodataframe into shapefile
SWOT_Merge.to_file(folder / 'SWOT_Merge.shp')

### Merging multiple shapefiles from within a folder

In [9]:
# State filename extension to look for within folder, in this case .shp which is the shapefile
shapefiles = folder.glob("*.shp")

# Merge/Combine multiple shapefiles in folder into one
gdf = pd.concat([
    gpd.read_file(shp)
    for shp in shapefiles
]).pipe(gpd.GeoDataFrame)

# Export merged geodataframe into shapefile
gdf.to_file(folder / 'SWOTReaches.shp')
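
Concatenation keeps each input's original row labels, so a merged table can contain repeated index values. The sketch below uses two small, made-up pandas DataFrames as stand-ins for the shapefile tables and shows how `ignore_index=True` renumbers the rows instead.

```python
import pandas as pd

# Made-up stand-ins for two tables read from separate shapefiles.
pass_a = pd.DataFrame({"reach_id": ["a1", "a2"], "wse": [10.0, 20.0]})
pass_b = pd.DataFrame({"reach_id": ["b1", "b2"], "wse": [30.0, 40.0]})

# Default behaviour: each frame keeps its own 0..n-1 labels.
kept = pd.concat([pass_a, pass_b])

# ignore_index=True assigns a fresh 0..n-1 index to the combined rows.
renumbered = pd.concat([pass_a, pass_b], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Whether to renumber is a design choice: keeping the original labels records each row's position in its source file, while a fresh index makes every row label unique.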

### Converting to CSV

Converting merged geodataframe into a csv file. 

In [10]:
gdf.to_csv(folder / 'csvmerge.csv')

### Querying a Shapefile

If you want to search for a specific reach id, or for river reaches whose attributes (such as reach length) meet a condition, you can run an attribute query on the merged GeoDataFrame with its `query` method.

Queries can use comparison operators (>, <, ==, >=, <=).

You can zoom in on a particular river reach by specifying its reach_id, or look for duplicate, overlapping river reaches.

In [11]:
reach = gdf.query("reach_id == '74292500301'")
reach

| | reach_id | time | time_tai | time_str | p_lat | p_lon | river_name | wse | wse_u | wse_r_u | ... | p_width | p_wid_var | p_n_nodes | p_dist_out | p_length | p_maf | p_dam_id | p_n_ch_max | p_n_ch_mod | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 74292500301 | -1000000000000.0 | -1000000000000.0 | no_data | 40.063235 | -98.551296 | no_data | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 54.0 | 387.837794 | 47 | 3200409.359 | 9496.587434 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-98.50490 40.06789, -98.50525 40.0... |
| 308 | 74292500301 | -1000000000000.0 | -1000000000000.0 | no_data | 40.063235 | -98.551296 | no_data | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 54.0 | 387.837794 | 47 | 3200409.359 | 9496.587434 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-98.50490 40.06789, -98.50525 40.0... |
| 262 | 74292500301 | -1000000000000.0 | -1000000000000.0 | no_data | 40.063235 | -98.551296 | no_data | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 54.0 | 387.837794 | 47 | 3200409.359 | 9496.587434 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-98.50490 40.06789, -98.50525 40.0... |
| 51 | 74292500301 | -1000000000000.0 | -1000000000000.0 | no_data | 40.063235 | -98.551296 | no_data | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 54.0 | 387.837794 | 47 | 3200409.359 | 9496.587434 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-98.50490 40.06789, -98.50525 40.0... |
| 308 | 74292500301 | -1000000000000.0 | -1000000000000.0 | no_data | 40.063235 | -98.551296 | no_data | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 54.0 | 387.837794 | 47 | 3200409.359 | 9496.587434 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-98.50490 40.06789, -98.50525 40.0... |


In [12]:
WSE = gdf.query('wse > 75')
WSE

| | reach_id | time | time_tai | time_str | p_lat | p_lon | river_name | wse | wse_u | wse_r_u | ... | p_width | p_wid_var | p_n_nodes | p_dist_out | p_length | p_maf | p_dam_id | p_n_ch_max | p_n_ch_mod | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 263 | 77158000011 | 713275000.0 | 713275000.0 | 2022-08-08T11:5628Z | 25.297171 | -108.473158 | no_data | 123.71461 | -1000000000000.0 | 0.0 | ... | 69.5 | 1719.195048 | 49 | 9731.61 | 9731.609922 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-108.49317 25.28405, -108.49287 25... |
| 119 | 73282800021 | 713441800.0 | 713441800.0 | 2022-08-10T10:1658Z | 33.634414 | -87.209808 | no_data | 88.18387 | -1000000000000.0 | 4.2635 | ... | 211.5 | 3285.033201 | 57 | 687962.665 | 11346.636403 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-87.23478 33.62552, -87.23452 33.6... |
| 630 | 74267700121 | 713441900.0 | 713441900.0 | 2022-08-10T10:1834Z | 38.778477 | -84.10726 | no_data | 134.81383 | -1000000000000.0 | 2.6857 | ... | 669.0 | 2311.101872 | 57 | 2560861.191 | 11466.933285 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-84.17021 38.79320, -84.16986 38.7... |
| 34 | 73290000041 | 714511800.0 | 714511800.0 | 2022-08-22T19:3017Z | 30.597928 | -88.626436 | no_data | 118.64166 | -1000000000000.0 | 15.47494 | ... | 105.0 | 754.311517 | 52 | 67960.8 | 10424.745294 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-88.60566 30.58840, -88.60597 30.5... |
| 242 | 74253000021 | 714511800.0 | 714511700.0 | 2022-08-22T19:2912Z | 34.018836 | -90.967538 | no_data | 91.37639 | -1000000000000.0 | 4.93354 | ... | 968.0 | 67506.844891 | 50 | 1108109.937 | 9988.011659 | -1000000000000.0 | 0 | 4 | 1 | LINESTRING (-91.01678 33.99997, -91.01645 34.0... |
| 658 | 74291500071 | 714511700.0 | 714511600.0 | 2022-08-22T19:2746Z | 38.843434 | -92.441821 | no_data | 76.10944 | -1000000000000.0 | 9.76231 | ... | 408.0 | 4018.894985 | 59 | 2280021.224 | 11890.852274 | -1000000000000.0 | 0 | 2 | 1 | LINESTRING (-92.39134 38.81822, -92.39168 38.8... |
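
In the output above, several columns hold -1000000000000.0 alongside `no_data` strings, which suggests a placeholder rather than a measurement. A "greater than" query such as `wse > 75` never matches that value, but a "less than" query would. The sketch below uses a small made-up pandas DataFrame and an assumed constant named `FILL_VALUE` to show one way of excluding the placeholder explicitly.

```python
import pandas as pd

# Assumption: this number marks missing data, as the output above suggests.
FILL_VALUE = -1_000_000_000_000.0

# Made-up rows for illustration only.
df = pd.DataFrame({
    "reach_id": ["r1", "r2", "r3"],
    "wse": [123.7, FILL_VALUE, 12.5],
})

# A plain "less than" comparison also matches the placeholder row.
naive = df.query("wse < 75")

# Adding an inequality against the placeholder removes it.
filtered = df.query(f"wse < 75 and wse != {FILL_VALUE}")

print(naive["reach_id"].tolist())     # ['r2', 'r3']
print(filtered["reach_id"].tolist())  # ['r3']
```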


### Converting to CSV

Converting the queried variable into a csv file.

In [13]:
reach.to_csv(folder / 'reach.csv')

In [14]:
WSE.to_csv(folder / 'WSE.csv')