> From the PO.DAAC Cookbook. To access the GitHub version of the notebook, follow [this link](https://github.com/podaac/tutorials/blob/master/notebooks/GIS/SWOTshp_CSVconversion.ipynb).

# SWOT Shapefile Data Conversion to CSV

### Notebook showcasing how to merge/concatenate multiple shapefiles into a single file.
- Utilizing the merged shapefile and converting it to a csv file.
- Option to query the new dataset based on the user's choice, for example 'reach_id' or water surface elevation ('wse').
- Using the queried variable to export it as a csv or shapefile.

### Import libraries

In [1]:
import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
import earthaccess
from earthaccess import Auth, DataCollections, DataGranules, Store

## Before you start

Before you begin this tutorial, make sure you have an Earthdata Login account, which is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register for an Earthdata Login account. It is free to create and only takes a moment to set up.

In [3]:
auth = earthaccess.login() 

We are already authenticated with NASA EDL


### Search for SWOT data
Let's start our search for River Vector Shapefiles in North America. SWOT files come in "reach" and "node" versions in the same collection; here we want the 10 km reaches rather than the nodes. We will also only get files for North America ('NA'), and we can call out a specific cycle or pass number in the granule name. Each dataset has its own short name associated with it; for the SWOT River shapefiles, it is SWOT_L2_HR_RiverSP_2.0.

In [40]:
results = earthaccess.search_data(short_name = 'SWOT_L2_HR_RIVERSP_2.0', 
                                  #temporal = ('2024-01-01 00:00:00', '2024-01-31 23:59:59'), # can also specify by time
                                  granule_name = '*Reach*_008_*_NA*', # here we filter by Reach files (not node), cycle=008, continent code=NA
                                  count = 2000) # necessary to specify count while the dataset is not fully public

#this portion of the code chunk below is only necessary while the dataset is not fully public
if len(results) >0:
    print(f"Actual granules returned: {len(results)}")

Granules found: 0
Actual granules returned: 98
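
To see which files a `granule_name` pattern like the one above would keep, the wildcard can be tried against a few file names locally. This is a minimal sketch that assumes the search wildcard behaves like shell-style globbing, which the standard-library `fnmatch` module approximates. The first name is taken from the download log in this notebook; the other three are hypothetical variants made up for the comparison.

```python
from fnmatch import fnmatchcase

# Same wildcard as in the search cell
pattern = "*Reach*_008_*_NA*"

# In these names the fields after "Reach" appear to be cycle, pass, continent.
names = [
    # real name from the download log
    "SWOT_L2_HR_RiverSP_Reach_008_007_NA_20231214T122936_20231214T122941_PIC0_01.zip",
    # hypothetical: node file instead of reach file
    "SWOT_L2_HR_RiverSP_Node_008_007_NA_20231214T122936_20231214T122941_PIC0_01.zip",
    # hypothetical: different cycle number
    "SWOT_L2_HR_RiverSP_Reach_009_007_NA_20240104T122936_20240104T122941_PIC0_01.zip",
    # hypothetical: different continent code
    "SWOT_L2_HR_RiverSP_Reach_008_007_EU_20231214T122936_20231214T122941_PIC0_01.zip",
]

# Keep only the names that satisfy the wildcard
matches = [name for name in names if fnmatchcase(name, pattern)]
for name in matches:
    print(name)
```

Only the first name passes the filter, because it is the only one that contains `Reach`, `_008_`, and `_NA` in that order.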


During the science orbit, a pass will be repeated once every 21 days. However, a particular location may be observed by different passes within those 21 days. See the [SWOT swath visualizer](https://swot.jpl.nasa.gov/mission/swath-visualizer/) for your location!

### Download the Data into a folder

In [45]:
earthaccess.download(results, "../datasets/data_downloads/SWOT_files")
folder = Path("../datasets/data_downloads/SWOT_files")

 Getting 98 granules, approx download size: 0.37 GB


File SWOT_L2_HR_RiverSP_Reach_008_007_NA_20231214T122936_20231214T122941_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_009_NA_20231214T141139_20231214T141150_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_011_NA_20231214T160019_20231214T160023_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_013_NA_20231214T174955_20231214T174957_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_020_NA_20231214T231628_20231214T231630_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_022_NA_20231215T005824_20231215T005835_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_024_NA_20231215T024027_20231215T024038_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_035_NA_20231215T122656_20231215T122702_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_037_NA_20231215T141227_20231215T141231_PIC0_01.zip already downloaded
File SWOT_L2_HR_RiverSP_Reach_008_039_NA_20231215T160129_2023121


### Unzip shapefiles in existing folder

In [46]:
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        with zipfile.ZipFile(folder / item) as zip_ref: # open the archive; it is closed automatically
            zip_ref.extractall(folder) # extract files to dir

### Opening multiple shapefiles from within a folder
Let's open all the shapefiles we've downloaded and combine them into one dataframe. This approach is ideal for a small number of granules, but if you're looking to create large time series, consider using the PO.DAAC Hydrocron tool.

In [47]:
# Initialize list of shapefiles containing all dates
SWOT_HR_shps = []

# Loop through queried granules to stack all acquisition dates
for granule in results:
    # Take the file name from the granule's first external download link
    filename = granule.data_links(access='external')[0].split("/")[-1]
    filename_shp = filename.replace('.zip', '.shp')
    # Build the path with pathlib so it works on any operating system
    filename_shp_path = folder / filename_shp
    SWOT_HR_shps.append(gpd.read_file(filename_shp_path))
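
The path handling in the loop above can be tried on its own, without any downloads. This is a small sketch under stated assumptions: the URL is a hypothetical placeholder (only the file name at its end comes from the download log), and it assumes each extracted shapefile shares the base name of its zip archive, as the loop does.

```python
from pathlib import Path

# Same folder as used for the downloads
folder = Path("../datasets/data_downloads/SWOT_files")

# Hypothetical download link; only the final path component matters here
url = ("https://example.com/data/"
       "SWOT_L2_HR_RiverSP_Reach_008_007_NA_20231214T122936_20231214T122941_PIC0_01.zip")

# Last URL component is the archive name
zip_name = url.rsplit("/", 1)[-1]

# Swap the extension and join with the folder in an OS-independent way
shp_path = folder / Path(zip_name).with_suffix(".shp")
print(shp_path)
```

Joining with `/` on a `Path` avoids hard-coding a backslash or forward slash, so the same cell runs on Windows, macOS, and Linux.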

In [48]:
# Combine granules from all acquisition dates into one dataframe
SWOT_HR_df = gpd.GeoDataFrame(pd.concat(SWOT_HR_shps, ignore_index=True))

# Sort dataframe by reach_id and time
SWOT_HR_df = SWOT_HR_df.sort_values(['reach_id', 'time'])

SWOT_HR_df

Unnamed: 0,reach_id,time,time_tai,time_str,p_lat,p_lon,river_name,wse,wse_u,wse_r_u,...,p_wid_var,p_n_nodes,p_dist_out,p_length,p_maf,p_dam_id,p_n_ch_max,p_n_ch_mod,p_low_slp,geometry
13494,71120000013,7.562248e+08,7.562248e+08,2023-12-18T14:26:06Z,59.459741,-96.000230,no_data,-1.000000e+12,-1.000000e+12,-1.000000e+12,...,19905.993,3,653.398,653.397541,-1.000000e+12,0,2,1,0,"LINESTRING (-95.99995 59.46243, -95.99940 59.4..."
18839,71120000013,7.563430e+08,7.563431e+08,2023-12-19T23:17:17Z,59.459741,-96.000230,no_data,1.157180e+02,9.000000e-02,7.300000e-04,...,19905.993,3,653.398,653.397541,-1.000000e+12,0,2,1,0,"LINESTRING (-95.99995 59.46243, -95.99940 59.4..."
32359,71120000013,7.571693e+08,7.571694e+08,2023-12-29T12:49:05Z,59.459741,-96.000230,no_data,1.173534e+02,9.053000e-02,9.820000e-03,...,19905.993,3,653.398,653.397541,-1.000000e+12,0,2,1,0,"LINESTRING (-95.99995 59.46243, -95.99940 59.4..."
37716,71120000013,7.572876e+08,7.572876e+08,2023-12-30T21:40:07Z,59.459741,-96.000230,no_data,1.157887e+02,9.000000e-02,7.800000e-04,...,19905.993,3,653.398,653.397541,-1.000000e+12,0,2,1,0,"LINESTRING (-95.99995 59.46243, -95.99940 59.4..."
13495,71120000043,7.562248e+08,7.562248e+08,2023-12-18T14:26:06Z,59.397128,-95.967273,no_data,1.253760e+02,1.141500e-01,7.022000e-02,...,324408.065,10,8886.628,2080.696533,-1.000000e+12,0,2,1,0,"LINESTRING (-95.96211 59.38835, -95.96235 59.3..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51919,78390800065,7.576077e+08,7.576077e+08,2024-01-03T14:34:24Z,59.392797,-135.888390,Chilkat River,3.445510e+01,6.670500e-01,6.609500e-01,...,29519.805,51,50548.355,10120.656225,-1.000000e+12,0,4,1,0,"LINESTRING (-135.84229 59.36173, -135.84261 59..."
3456,78390800075,-1.000000e+12,-1.000000e+12,no_data,59.451254,-135.988862,Chilkat River,-1.000000e+12,-1.000000e+12,-1.000000e+12,...,1833.987,51,60661.310,10112.954582,-1.000000e+12,0,2,1,0,"LINESTRING (-135.94191 59.41844, -135.94192 59..."
51920,78390800075,7.576077e+08,7.576077e+08,2024-01-03T14:34:24Z,59.451254,-135.988862,Chilkat River,4.109530e+01,4.597660e+00,4.596780e+00,...,1833.987,51,60661.310,10112.954582,-1.000000e+12,0,2,1,0,"LINESTRING (-135.94191 59.41844, -135.94192 59..."
3457,78390800085,7.559232e+08,7.559233e+08,2023-12-15T02:40:20Z,59.527589,-136.036081,Chilkat River,4.755120e+01,7.233770e+00,7.233210e+00,...,829.967,53,71338.685,10677.375498,-1.000000e+12,0,2,1,0,"LINESTRING (-136.01371 59.56408, -136.01388 59..."
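
The table above contains entries such as `-1.000000e+12` and `no_data`, which look like placeholders for missing observations rather than measurements. Below is a minimal sketch of masking them before any analysis, using a toy `pandas` table with illustrative values. The cutoff `FILL_THRESHOLD` and the toy fill value are assumptions chosen for this example, so check the product documentation for the exact fill values.

```python
import pandas as pd

# Toy stand-in for the merged table; values are illustrative only
toy = pd.DataFrame({
    "reach_id": ["71120000013", "71120000013", "71120000043"],
    "time_str": ["2023-12-18T14:26:06Z", "2023-12-19T23:17:17Z", "no_data"],
    "wse": [-999999999999.0, 115.718, -999999999999.0],
})

# Assumed cutoff: anything below it is treated as a fill value.
# A threshold is used instead of an exact match because the displayed
# -1.000000e+12 may be a rounded representation.
FILL_THRESHOLD = -1e11

masked = toy.copy()
# Replace fill values with NaN so they drop out of statistics and plots
masked["wse"] = masked["wse"].where(masked["wse"] > FILL_THRESHOLD)

# Keep only rows with a usable water surface elevation
valid = masked.dropna(subset=["wse"])
print(valid)
```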


### Querying a Shapefile

Let's get the attributes from a particular reach of the merged shapefile. You can search for a specific reach ID, or for any other attribute such as reach length, with an attribute query in GeoPandas. Here, we'll look at the river reach with ID 71140500231. Reach IDs can be identified in the [SWORD Database](https://www.swordexplorer.com/).

In [49]:
reach = SWOT_HR_df.query("reach_id == '71140500231'")
reach

Unnamed: 0,reach_id,time,time_tai,time_str,p_lat,p_lon,river_name,wse,wse_u,wse_r_u,...,p_wid_var,p_n_nodes,p_dist_out,p_length,p_maf,p_dam_id,p_n_ch_max,p_n_ch_mod,p_low_slp,geometry
17577,71140500231,756311200.0,756311200.0,2023-12-19T14:26:16Z,57.973509,-99.129638,no_data,292.0628,0.13304,0.09797,...,3864.764,94,522118.972,18789.226273,-1000000000000.0,0,2,1,0,"LINESTRING (-99.02466 57.94790, -99.02517 57.9..."
26363,71140500231,756515900.0,756516000.0,2023-12-21T23:18:50Z,57.973509,-99.129638,no_data,294.2075,137.7645,137.76447,...,3864.764,94,522118.972,18789.226273,-1000000000000.0,0,2,1,0,"LINESTRING (-99.02466 57.94790, -99.02517 57.9..."
32413,71140500231,757169300.0,757169300.0,2023-12-29T12:48:25Z,57.973509,-99.129638,no_data,292.0892,0.1276,0.09045,...,3864.764,94,522118.972,18789.226273,-1000000000000.0,0,2,1,0,"LINESTRING (-99.02466 57.94790, -99.02517 57.9..."
41590,71140500231,757374100.0,757374100.0,2023-12-31T21:40:59Z,57.973509,-99.129638,no_data,292.9485,41.28973,41.28963,...,3864.764,94,522118.972,18789.226273,-1000000000000.0,0,2,1,0,"LINESTRING (-99.02466 57.94790, -99.02517 57.9..."
45130,71140500231,757460500.0,757460500.0,2024-01-01T21:41:39Z,57.973509,-99.129638,no_data,302.197,259.38265,259.38264,...,3864.764,94,522118.972,18789.226273,-1000000000000.0,0,2,1,0,"LINESTRING (-99.02466 57.94790, -99.02517 57.9..."
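
The same `query` syntax can combine the reach ID with a condition on another variable. As a sketch, the toy table below copies the `wse` and `wse_u` columns from the output above and keeps only observations with a small uncertainty. The cutoff of 1.0 on `wse_u` is an arbitrary value picked for illustration, not a recommended quality filter.

```python
import pandas as pd

# Values copied from the query output above (subset of columns)
reach_toy = pd.DataFrame({
    "reach_id": ["71140500231"] * 5,
    "time_str": [
        "2023-12-19T14:26:16Z",
        "2023-12-21T23:18:50Z",
        "2023-12-29T12:48:25Z",
        "2023-12-31T21:40:59Z",
        "2024-01-01T21:41:39Z",
    ],
    "wse": [292.0628, 294.2075, 292.0892, 292.9485, 302.197],
    "wse_u": [0.13304, 137.7645, 0.1276, 41.28973, 259.38265],
})

# reach_id is compared as a string, as in the notebook's query;
# the wse_u cutoff is an illustrative assumption
good = reach_toy.query("reach_id == '71140500231' and wse_u < 1.0")
print(good[["time_str", "wse", "wse_u"]])
```

With these values, two of the five observations remain, and their elevations agree to within a few centimeters.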


### Converting to CSV

We can convert the merged timeseries geodataframe for this reach into a csv file. 

In [50]:
reach.to_csv(folder / 'csv_71140500231.csv')
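
When the CSV is read back later, `pandas` will by default parse a column of digits as integers, so the reach ID needs an explicit string type if it is to match queries like the one used earlier. The sketch below shows a round trip on a toy one-row table written to a temporary directory. The geometry text is a plain-string stand-in with hypothetical coordinates, and `index=False` is a choice made here to avoid writing the row index as an extra unnamed column.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy one-row table; the geometry string is a hypothetical placeholder
toy = pd.DataFrame({
    "reach_id": ["71140500231"],
    "wse": [292.0628],
    "geometry": ["LINESTRING (-99.02466 57.9479, -99.02517 57.948)"],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "csv_71140500231.csv"
    # Skip the row index so no unnamed column appears in the file
    toy.to_csv(csv_path, index=False)
    # Read reach_id back as text instead of letting it become an integer
    back = pd.read_csv(csv_path, dtype={"reach_id": str})

print(back)
```

To export a shapefile instead of a CSV, GeoPandas' `to_file` method on the GeoDataFrame is the analogous call.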