> From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow [this link](https://github.com/podaac/tutorials/blob/master/notebooks/GIS/SWOTshp_CSVconversion.ipynb).

# SWOT Shapefile Data Conversion to CSV

### This notebook shows how to merge (concatenate) multiple shapefiles into a single file.
- Convert the merged shapefile to a CSV file.
- Optionally query the merged dataset by a variable of your choice, such as 'reach_id' or water surface elevation ('wse').
- Export the queried subset as a CSV file or shapefile.

### Import libraries

Note: while SWOT data access is restricted, you need to install the development version of earthaccess.

In [1]:
import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
import earthaccess
from earthaccess import Auth, DataCollections, DataGranules, Store

## Before you start

Before you begin this tutorial, make sure you have an Earthdata Login account, which is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register for an Earthdata Login account. It is free to create and only takes a moment to set up.

In [2]:
auth = earthaccess.login() 

EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 12/22/2023
Using .netrc file for EDL


### Search for SWOT data
Let's start our search for River Vector Shapefiles in April 2023 for a particular pass, pass 013. SWOT files come in "reach" and "node" versions in the same collection; here we want the 10 km reaches rather than the nodes. We will also only get files for North America ('NA'). Each dataset has its own short name associated with it; for the SWOT River shapefiles, it is SWOT_L2_HR_RiverSP_1.1.

In [3]:
# Retrieve granules for the dates we want by passing the collection short name, temporal bounds, and a granule name filter to the `earthaccess.search_data` function (for restricted data, a search `count` may also need to be specified)
results = earthaccess.search_data(short_name = 'SWOT_L2_HR_RIVERSP_1.1', 
                                  temporal = ('2023-04-08 00:00:00', '2023-04-25 23:59:59'),
                                  granule_name = '*Reach*_013_NA*') # here we filter by Reach files (not node), pass #13 and continent code=NA

Granules found: 17


During the fast sampling orbit for SWOT, the same passes were observed daily, so finding 17 files in this 18-day window makes sense! During the science orbit, a pass will be repeated once every 21 days. A particular location may, however, be observed by different passes within those 21 days. See the [SWOT swath visualizer](https://swot.jpl.nasa.gov/mission/swath-visualizer/) for your location!
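To see what the `granule_name` wildcard is selecting, the sketch below applies the same pattern to a few file names locally with Python's `fnmatch`. The file names are made up for illustration (they only imitate the product naming layout), and the sketch assumes that `*` in the search behaves like a shell-style wildcard.

```python
from fnmatch import fnmatchcase

# Same pattern as in the search above: Reach files, pass 013, continent NA
PATTERN = "*Reach*_013_NA*"

# Hypothetical file names that imitate the RiverSP layout
# (product type, cycle, pass, continent); they are not real granules.
example_names = [
    "SWOT_L2_HR_RiverSP_Reach_484_013_NA_example.zip",
    "SWOT_L2_HR_RiverSP_Node_484_013_NA_example.zip",   # node file
    "SWOT_L2_HR_RiverSP_Reach_484_013_EU_example.zip",  # other continent
    "SWOT_L2_HR_RiverSP_Reach_484_014_NA_example.zip",  # other pass
]

def select(names, pattern=PATTERN):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]

matches = select(example_names)
print(matches)
```

Only the first name passes all three parts of the filter; changing `Reach` to `Node` or `013` to another pass number in the pattern selects the other files instead.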

### Download the Data into a folder

In [4]:
earthaccess.download(results, "../datasets/data_downloads/SWOT_files")
folder = Path("../datasets/data_downloads/SWOT_files")

 Getting 17 granules, approx download size: 0.04 GB


QUEUEING TASKS | :   0%|          | 0/17 [00:00<?, ?it/s]

PROCESSING TASKS | :   0%|          | 0/17 [00:00<?, ?it/s]

COLLECTING RESULTS | :   0%|          | 0/17 [00:00<?, ?it/s]

### Unzip shapefiles in existing folder

In [5]:
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        with zipfile.ZipFile(folder / item) as zip_ref: # open zipfile object (closed automatically)
            zip_ref.extractall(folder) # extract files to dir
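A shapefile is really a set of files: besides the `.shp` geometry file, readers also need the `.shx` index and the `.dbf` attribute table. As a minimal sketch, the helpers below (hypothetical names, not part of the original notebook) list which of those pieces, if any, are missing from each downloaded archive before you try to read it.

```python
import zipfile
from pathlib import Path

# Extensions every shapefile needs: geometry, index, and attribute table
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}

def missing_components(zip_path):
    """Return the required shapefile extensions that a zip archive lacks."""
    with zipfile.ZipFile(zip_path) as archive:
        present = {Path(name).suffix.lower() for name in archive.namelist()}
    return REQUIRED_EXTENSIONS - present

def check_folder(folder):
    """Map each incomplete zip file name in `folder` to its missing extensions."""
    report = {}
    for zip_path in sorted(Path(folder).glob("*.zip")):
        missing = missing_components(zip_path)
        if missing:
            report[zip_path.name] = missing
    return report
```

Calling `check_folder(folder)` on the download directory returns an empty dictionary when every archive is complete.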

### Merging multiple shapefiles from within a folder
Let's merge all the shapefiles we downloaded. This approach is ideal for a small number of granules, but if you're looking to create a large time series, consider using the PO.DAAC Hydrocron tool.

In [6]:
# State filename extension to look for within folder, in this case .shp which is the shapefile
shapefiles = folder.glob("*.shp")

# Merge/Combine multiple shapefiles in folder into one
gdf = pd.concat([
    gpd.read_file(shp)
    for shp in shapefiles
]).pipe(gpd.GeoDataFrame)

# Export merged geodataframe into shapefile
gdf.to_file(folder / 'SWOTReaches.shp')
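Each file is read with its own row labels, so a plain concatenation keeps repeated index values and no record of which granule a row came from. The sketch below shows one possible way to handle both, using small pandas tables as stand-ins for the output of `gpd.read_file`; the helper `concat_with_source` and the column name `src_file` are hypothetical, and the same calls are expected to behave the same way on GeoDataFrames.

```python
import pandas as pd

def concat_with_source(frames_by_name):
    """Concatenate frames, tagging each row with the name of its source."""
    tagged = [
        # Hypothetical column name, kept short because shapefile
        # field names are limited to 10 characters
        frame.assign(src_file=name)
        for name, frame in frames_by_name.items()
    ]
    # ignore_index=True replaces the per-file row labels with 0..n-1
    return pd.concat(tagged, ignore_index=True)

# Small stand-in tables; real input would come from gpd.read_file(shp)
frames = {
    "file_a.shp": pd.DataFrame({"reach_id": ["1", "2"], "wse": [10.0, 11.0]}),
    "file_b.shp": pd.DataFrame({"reach_id": ["1", "2"], "wse": [10.5, 11.5]}),
}
merged = concat_with_source(frames)
print(merged)
```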

### Querying a Shapefile

Let's get the attributes for a particular reach from the merged shapefile. If you want to search for a specific reach ID or a specific length of river reach, you can do so with an attribute query in GeoPandas. Here, we'll look at a river reach on Cook Slough in Washington (the `p_lat`/`p_lon` values in the output place it near 48.2°N, 122.2°W), ID: 78310700041. River IDs can be identified in the [SWORD Database](https://www.swordexplorer.com/).

In [7]:
reach = gdf.query("reach_id == '78310700041'")
reach

| | reach_id | time | time_tai | time_str | p_lat | p_lon | river_name | wse | wse_u | wse_r_u | ... | p_wid_var | p_n_nodes | p_dist_out | p_length | p_maf | p_dam_id | p_n_ch_max | p_n_ch_mod | p_low_slp | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 256 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 250 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 250 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 250 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 250 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 33 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 134 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 238 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 256 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
| 256 | 78310700041 | -1000000000000.0 | -1000000000000.0 | no_data | 48.199575 | -122.175914 | Cook Slough | -1000000000000.0 | -1000000000000.0 | -1000000000000.0 | ... | 567.328 | 66 | 29443.922 | 13254.395103 | -1000000000000.0 | 0 | 3 | 1 | 0 | LINESTRING (-122.24281 48.19859, -122.24248 48... |
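The `-1000000000000.0` entries and `no_data` strings in the output above appear to be fill values marking missing observations rather than real measurements. As a minimal sketch, assuming that any numeric value at or below `-1e11` is a fill value (an assumption based only on the printed output), the hypothetical helper below replaces them with `NaN` so that rows with a usable `wse` can be selected. The sample rows are invented stand-ins, not SWOT data.

```python
import pandas as pd

FILL_THRESHOLD = -1e11  # assumption: anything at or below this is a fill value

def mask_fill_values(df, threshold=FILL_THRESHOLD):
    """Return a copy where numeric fill values are replaced with NaN."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # Keep values above the threshold; everything else becomes NaN
    out[numeric_cols] = out[numeric_cols].where(out[numeric_cols] > threshold)
    return out

# Stand-in rows; the real table would be the `reach` selection above
sample = pd.DataFrame({
    "reach_id": ["78310700041", "78310700041"],
    "wse": [-1e12, 12.3],  # 12.3 is an invented value for illustration
    "p_n_nodes": [66, 66],
})
cleaned = mask_fill_values(sample)
valid = cleaned.dropna(subset=["wse"])  # rows with a usable wse value
print(valid)
```

With the real data, `mask_fill_values(reach)` followed by `dropna(subset=["wse"])` would leave only the passes that produced a water surface elevation for this reach.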


### Converting to CSV

We can convert the merged time series GeoDataFrame for this reach into a CSV file.

In [8]:
reach.to_csv(folder / 'csv_78310700041.csv')
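When the CSV is read back with pandas, two details are worth checking: the row labels are written as an extra unnamed column unless `index=False` is passed, and `reach_id` is parsed as an integer unless its type is stated. The sketch below illustrates both with a small stand-in table and an in-memory buffer instead of a file on disk.

```python
import io
import pandas as pd

# Stand-in for the queried table; reach_id is stored as text, as in the query above
table = pd.DataFrame({"reach_id": ["78310700041"], "river_name": ["Cook Slough"]})

buffer = io.StringIO()
table.to_csv(buffer, index=False)  # index=False omits the row-label column

# Default parsing infers an integer column for reach_id
buffer.seek(0)
as_inferred = pd.read_csv(buffer)

# Passing dtype keeps the identifier as text
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={"reach_id": str})

print(as_inferred.dtypes)
print(as_text.dtypes)
```

Keeping `reach_id` as text means a string comparison such as `reach_id == '78310700041'` still works on the reloaded table.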