![](https://img.shields.io/badge/PO.DAAC-Contribution-%20?color=grey&labelColor=blue)

> From the PO.DAAC Cookbook. To access the GitHub version of the notebook, follow [this link](https://github.com/podaac/tutorials/blob/master/notebooks/GIS/SWOTshp_CSVconversion.ipynb).

# SWOT Shapefile Data Conversion to CSV

### Notebook showcasing how to merge/concatenate multiple lake shapefiles into a single file.
- Converting the merged shapefile to a CSV file.
- Optionally querying the new dataset by a variable of the user's choice, such as 'lake_id' or water surface elevation ('wse').
- Exporting the queried subset as a CSV file or shapefile.

**Note: PO.DAAC will be expanding the [Hydrocron API](https://podaac.github.io/hydrocron/intro.html), which we built for rivers, to include lakes. For now, here is a workaround to produce lake time series from shapefiles!**

_Authors: Nicholas Tarpinian, Cassie Nickles, NASA PO.DAAC_

### Import libraries

In [3]:
import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
import earthaccess

## Before you start

Before you begin this tutorial, make sure you have an Earthdata Login account, which is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register. An account is free and only takes a moment to set up.

In [4]:
auth = earthaccess.login() 

### Search for SWOT data
Let's start by searching for Lake Vector Shapefiles in North America. SWOT lake files come in "Prior", "Obs", and "Unassigned" versions within the same collection; here we want the Prior files, which are based on the Prior Lake Database. We will restrict the search to North America (continent code 'NA') and to a specific pass number. Each dataset has its own short name; for the SWOT Lake shapefiles, it is SWOT_L2_HR_LakeSP_2.0.

In [8]:
results = earthaccess.search_data(short_name = 'SWOT_L2_HR_LAKESP_2.0', 
                                  temporal = ('2024-01-01 00:00:00', '2024-05-30 23:59:59'), # can also specify by time
                                  granule_name = '*Prior*_009_NA*') # here we filter by Prior files (not observed or Unassigned), pass=009, continent code=NA

Granules found: 4
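
The `granule_name` wildcard behaves much like a shell glob. As a rough local illustration (not a description of how the search service is implemented), the sketch below applies the same pattern to a few file names with Python's `fnmatch`. The names are made up for this example and only imitate the file type, cycle, pass, and continent layout; they are not real granules.

```python
from fnmatch import fnmatchcase

# Hypothetical granule names, invented for illustration only
names = [
    "SWOT_L2_HR_LakeSP_Prior_010_009_NA_20240204T000000_20240204T000500_XXXX_01",
    "SWOT_L2_HR_LakeSP_Obs_010_009_NA_20240204T000000_20240204T000500_XXXX_01",
    "SWOT_L2_HR_LakeSP_Prior_010_009_EU_20240204T000000_20240204T000500_XXXX_01",
    "SWOT_L2_HR_LakeSP_Prior_010_016_NA_20240204T000000_20240204T000500_XXXX_01",
]

# Same pattern as in the search: Prior files, pass 009, continent NA
pattern = "*Prior*_009_NA*"
matches = [n for n in names if fnmatchcase(n, pattern)]
print(matches)
```

Only the first name matches: the second is an "Obs" file, the third covers a different continent, and the fourth is a different pass.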


During the science orbit, each pass repeats once every 21 days. A given location may nevertheless be observed by more than one pass within that 21-day cycle.
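
As a rough check of the repeat interval, the sketch below takes the four `time_str` values that this notebook reports further down for `lake_id` 7120818372 on pass 009 and computes the gap between consecutive observations. Dividing each gap by the nearest whole number of cycles is an assumption made here for illustration.

```python
from datetime import datetime

# time_str values reported for lake_id 7120818372 (pass 009) in this notebook
time_strs = [
    "2024-01-04T11:05:46Z",
    "2024-02-15T04:35:57Z",
    "2024-03-27T22:06:05Z",
    "2024-04-17T18:51:11Z",
]

times = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in time_strs]

# Gap between consecutive observations, in days
gaps_days = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]

for gap in gaps_days:
    n_cycles = round(gap / 21)  # nearest whole number of ~21-day cycles
    print(f"{gap:.2f} days = {n_cycles} cycle(s) of {gap / n_cycles:.2f} days")
```

The gaps come out as whole multiples of a little under 21 days. A gap of two cycles means the results contain no observation of this lake from the cycle in between.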

### Download the Data into a folder

In [9]:
earthaccess.download(results, "../datasets/data_downloads/SWOT_files/")
folder = Path("../datasets/data_downloads/SWOT_files")

 Getting 4 granules, approx download size: 0.27 GB



### Unzip shapefiles in existing folder

In [10]:
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        with zipfile.ZipFile(folder / item) as zip_ref: # open the archive; it is closed automatically
            zip_ref.extractall(folder) # extract files to dir

### Opening multiple shapefiles from within a folder
Let's open all the shapefiles we've downloaded and combine them into one dataframe. This approach is ideal for a small number of granules, but if you're looking to create large time series, consider using the PO.DAAC Hydrocron tool.

In [11]:
# Initialize list of shapefiles containing all dates
SWOT_HR_shps = []

# Loop through queried granules to stack all acquisition dates
for granule in results:
    link = granule.data_links(access='external')[0] # download URL of the granule's zip archive
    filename_shp = link.split("/")[-1].replace('.zip', '.shp') # the shapefile shares the archive's base name
    filename_shp_path = folder / filename_shp # pathlib join works on any operating system
    SWOT_HR_shps.append(gpd.read_file(filename_shp_path))
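
The file name handling in the loop above can also be sketched with `pathlib` and `urllib.parse` instead of string replacement. The download link below is hypothetical and stands in for whatever `data_links` returns. Like the loop above, the sketch assumes the `.shp` file inside each archive shares the archive's base name.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Hypothetical download link, invented for illustration only
link = "https://example.com/data/SWOT_L2_HR_LakeSP_Prior_010_009_NA_example_01.zip"
folder = Path("../datasets/data_downloads/SWOT_files")

# Take the last component of the URL path, then swap the .zip suffix for .shp
zip_name = PurePosixPath(urlparse(link).path).name
shp_path = folder / Path(zip_name).with_suffix(".shp")
print(shp_path.name)
```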

In [12]:
# Combine granules from all acquisition dates into one dataframe
SWOT_HR_df = gpd.GeoDataFrame(pd.concat(SWOT_HR_shps, ignore_index=True))

# Sort dataframe by lake_id and time
SWOT_HR_df = SWOT_HR_df.sort_values(['lake_id', 'time'])

SWOT_HR_df

| | lake_id | reach_id | obs_id | overlap | n_overlap | time | time_tai | time_str | wse | wse_u | ... | lake_name | p_res_id | p_lon | p_lat | p_ref_wse | p_ref_area | p_date_t0 | p_ds_t0 | p_storage | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42426 | 7120818372 | no_data | 712239L000048 | 100 | 1 | 7.576815e+08 | 7.576816e+08 | 2024-01-04T11:05:46Z | 4.374300e+02 | 6.300000e-02 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1.000000e+12 | 0.0207 | no_data | -1.000000e+12 | -1.000000e+12 | POLYGON ((-93.41107 47.75727, -93.41081 47.757... |
| 97969 | 7120818372 | no_data | 712239L000028 | 100 | 1 | 7.612870e+08 | 7.612870e+08 | 2024-02-15T04:35:57Z | 4.373460e+02 | 7.900000e-02 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1.000000e+12 | 0.0207 | no_data | -1.000000e+12 | -1.000000e+12 | POLYGON ((-93.41115 47.75730, -93.41090 47.757... |
| 156135 | 7120818372 | no_data | 712239L000025 | 100 | 1 | 7.648924e+08 | 7.648924e+08 | 2024-03-27T22:06:05Z | 4.377310e+02 | 5.600000e-02 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1.000000e+12 | 0.0207 | no_data | -1.000000e+12 | -1.000000e+12 | POLYGON ((-93.41200 47.75739, -93.41174 47.757... |
| 215861 | 7120818372 | no_data | 712239L000027 | 92 | 1 | 7.666951e+08 | 7.666951e+08 | 2024-04-17T18:51:11Z | 4.376710e+02 | 2.300000e-02 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1.000000e+12 | 0.0207 | no_data | -1.000000e+12 | -1.000000e+12 | POLYGON ((-93.41199 47.75734, -93.41174 47.757... |
| 42460 | 7120818382 | no_data | 712239L000049 | 99 | 1 | 7.576815e+08 | 7.576816e+08 | 2024-01-04T11:05:46Z | 4.257550e+02 | 2.100000e-02 | ... | RAT LAKE | -99999999 | -93.357210 | 47.749259 | -1.000000e+12 | 0.1989 | no_data | -1.000000e+12 | -1.000000e+12 | POLYGON ((-93.35303 47.75061, -93.35273 47.750... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 206470 | 7710076862 | no_data | 771190R000017 | 86 | 1 | 7.666946e+08 | 7.666946e+08 | 2024-04-17T18:43:01Z | 2.810479e+03 | 3.900000e-02 | ... | no_data | -99999999 | -99.916944 | 19.984955 | -1.000000e+12 | 0.0459 | no_data | -1.000000e+12 | -1.000000e+12 | MULTIPOLYGON (((-99.91661 19.98636, -99.91650 ... |
| 36628 | 7710076872 | no_data | no_data | no_data | no_data | -1.000000e+12 | -1.000000e+12 | no_data | -1.000000e+12 | -1.000000e+12 | ... | no_data | -99999999 | -100.263287 | 20.393474 | -1.000000e+12 | 0.0513 | no_data | -1.000000e+12 | -1.000000e+12 | |
| 73000 | 7710076872 | no_data | no_data | no_data | no_data | -1.000000e+12 | -1.000000e+12 | no_data | -1.000000e+12 | -1.000000e+12 | ... | no_data | -99999999 | -100.263287 | 20.393474 | -1.000000e+12 | 0.0513 | no_data | -1.000000e+12 | -1.000000e+12 | |
| 130255 | 7710076872 | no_data | no_data | no_data | no_data | -1.000000e+12 | -1.000000e+12 | no_data | -1.000000e+12 | -1.000000e+12 | ... | no_data | -99999999 | -100.263287 | 20.393474 | -1.000000e+12 | 0.0513 | no_data | -1.000000e+12 | -1.000000e+12 | |
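
The row labels in the output above are not consecutive because `sort_values` reorders rows but keeps the labels that `pd.concat(..., ignore_index=True)` assigned. A minimal sketch of the same concatenate-then-sort step, using two small toy dataframes in place of real granules (all values are invented for illustration):

```python
import pandas as pd

# Toy stand-ins for two granules; values are illustrative, not real SWOT data
granule_a = pd.DataFrame({
    "lake_id": ["B", "A"],
    "time": [100.0, 100.0],
    "wse": [12.1, 30.4],
})
granule_b = pd.DataFrame({
    "lake_id": ["A", "B"],
    "time": [200.0, 200.0],
    "wse": [30.6, 12.0],
})

# ignore_index=True renumbers rows 0..n-1 instead of repeating each frame's own index
merged = pd.concat([granule_a, granule_b], ignore_index=True)

# Sorting by lake_id, then time, groups each lake's observations chronologically
merged = merged.sort_values(["lake_id", "time"])
print(merged)
```

If consecutive labels are preferred after sorting, `merged.reset_index(drop=True)` is one option.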


### Querying a Shapefile

Let's get the attributes for a particular lake from the merged dataframe. To search for a specific `lake_id`, look up Lake IDs in the [Prior Lake Database (PLD)](https://hydroweb.next.theia-land.fr/): add the PLD layer in Hydroweb.next to see the lakes that SWOT products are based on.

In [13]:
reach = SWOT_HR_df.query("lake_id == '7120818372'")
reach

| | lake_id | reach_id | obs_id | overlap | n_overlap | time | time_tai | time_str | wse | wse_u | ... | lake_name | p_res_id | p_lon | p_lat | p_ref_wse | p_ref_area | p_date_t0 | p_ds_t0 | p_storage | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42426 | 7120818372 | no_data | 712239L000048 | 100 | 1 | 757681500.0 | 757681600.0 | 2024-01-04T11:05:46Z | 437.43 | 0.063 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1000000000000.0 | 0.0207 | no_data | -1000000000000.0 | -1000000000000.0 | POLYGON ((-93.41107 47.75727, -93.41081 47.757... |
| 97969 | 7120818372 | no_data | 712239L000028 | 100 | 1 | 761287000.0 | 761287000.0 | 2024-02-15T04:35:57Z | 437.346 | 0.079 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1000000000000.0 | 0.0207 | no_data | -1000000000000.0 | -1000000000000.0 | POLYGON ((-93.41115 47.75730, -93.41090 47.757... |
| 156135 | 7120818372 | no_data | 712239L000025 | 100 | 1 | 764892400.0 | 764892400.0 | 2024-03-27T22:06:05Z | 437.731 | 0.056 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1000000000000.0 | 0.0207 | no_data | -1000000000000.0 | -1000000000000.0 | POLYGON ((-93.41200 47.75739, -93.41174 47.757... |
| 215861 | 7120818372 | no_data | 712239L000027 | 92 | 1 | 766695100.0 | 766695100.0 | 2024-04-17T18:51:11Z | 437.671 | 0.023 | ... | no_data | -99999999 | -93.412112 | 47.756423 | -1000000000000.0 | 0.0207 | no_data | -1000000000000.0 | -1000000000000.0 | POLYGON ((-93.41199 47.75734, -93.41174 47.757... |
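
The same `query` call can filter on a numeric column such as `wse`. The merged table shows rows without an observation carrying a large negative placeholder (displayed as roughly -1e12), so a numeric filter usually needs to exclude those first. The sketch below uses a toy dataframe, an arbitrary elevation threshold, and a cutoff of -1e11 for the placeholder. The cutoff is an assumption: the stored fill value may differ slightly from what the table displays, which is why the sketch compares against a threshold instead of testing equality.

```python
import pandas as pd

# Toy data loosely modeled on the table above; not real SWOT output
df = pd.DataFrame({
    "lake_id": ["7120818372", "7120818372", "7710076872"],
    "time_str": ["2024-01-04T11:05:46Z", "2024-03-27T22:06:05Z", "no_data"],
    "wse": [437.430, 437.731, -1e12],  # -1e12 stands in for the fill value
})

# Drop placeholder rows first (assumed cutoff), then keep elevations above a threshold
valid = df.query("wse > -1e11")
high = valid.query("wse > 437.5")
print(high)
```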


### Converting to CSV

We can convert the queried time series geodataframe for this lake into a CSV file.

In [None]:
reach.to_csv(folder / 'csv_7120818372.csv') # write the queried lake's time series to CSV
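
Two details are worth knowing when the CSV is read back. By default `to_csv` also writes the row labels, which reappear as an `Unnamed: 0` column on reading, and `read_csv` parses an all-digit `lake_id` as an integer, so a string comparison like the query above would no longer match. A minimal sketch with a toy dataframe (the file name is hypothetical and the values are illustrative):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy time series for one lake; values are illustrative only
series = pd.DataFrame({
    "lake_id": ["7120818372", "7120818372"],
    "time_str": ["2024-01-04T11:05:46Z", "2024-02-15T04:35:57Z"],
    "wse": [437.430, 437.346],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "lake_timeseries.csv"  # hypothetical file name
    series.to_csv(csv_path, index=False)  # index=False omits the row labels

    # Without dtype, pandas parses lake_id as an integer
    as_default = pd.read_csv(csv_path)
    # Passing dtype keeps lake_id as text, matching the merged dataframe
    as_string = pd.read_csv(csv_path, dtype={"lake_id": str})

print(as_default["lake_id"].dtype, as_string["lake_id"].dtype)
```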