## Step2: SOILSCAPE data

### This notebook performs the following:
- Reads csv file with info about the SOILSCAPE sites and NISAR track/frames relevant to each EASEGrid cell
- For each NISAR track/frame
    - Gets NISAR dates and retrievals 
    - Gets SOILSCAPE data for relevant dates
    - Writes to csv file

### Cite data as:
- A. Melebari et al., "CYGNSS SoilSCAPE Sites: Sensor Calibration and Data Analysis," IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 2023, pp. 4628-4630, doi: 10.1109/IGARSS52108.2023.10282411.
- Use the digital object identifier (DOI) provided in the id attribute when citing this data. See https://podaac.jpl.nasa.gov/CitingPODAAC
- DOI: 10.5067/CSCAP-L1V10

### SOILSCAPE contacts:
- Amer Melebari, Ruzbeh Akbar, Erik Hodges, Darren McKague, Christopher S. Ruf, Agnelo Silva, Mahta Moghaddam
- amelebar@usc.edu, rakbar@mit.edu, ehodges@usc.edu, dmckague@umich.edu, cruf@umich.edu, agnelors@gmail.com, mahta@usc.edu

### More info at:
- https://soilscape.usc.edu/sites-and-data/

### Necessary files
- soilscape_site_nodeLocations.csv - generated by hand using info from the main SOILSCAPE site (name,node#,lon,lat)

### Notes
- The CSV download link has the form https://soilscape.usc.edu/?csv_download_moisture=1&csv_download_moisture_site_id=25&valid_only=1&start_date=2022-05-04&end_date=2022-05-20
- The download fails when the requested date range exceeds some unknown threshold; a month of data seems fine.
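
The link format above can be assembled programmatically, and a long date range can be split into shorter windows so that each request stays under the failure threshold. The following is a minimal sketch, assuming a 30-day window is safe (the actual limit is unknown). `build_moisture_url` and `split_date_range` are hypothetical helper names, not part of the SOILSCAPE site or of this notebook.

```python
import datetime
from urllib.parse import urlencode

SOILSCAPE_ROOT = 'https://soilscape.usc.edu/'

def build_moisture_url(site_id, start, end, root=SOILSCAPE_ROOT):
    """Build a CSV download link of the form shown in the notes above."""
    query = urlencode({
        'csv_download_moisture': 1,
        'csv_download_moisture_site_id': site_id,
        'valid_only': 1,
        'start_date': start.strftime('%Y-%m-%d'),
        'end_date': end.strftime('%Y-%m-%d'),
    })
    return root + '?' + query

def split_date_range(start, end, max_days=30):
    """Yield (window_start, window_end) pairs covering start..end inclusive.

    Each window spans at most max_days days. The default of 30 is an
    assumption based on the note that a month of data seems fine.
    """
    one_day = datetime.timedelta(days=1)
    cur = start
    while cur <= end:
        win_end = min(cur + datetime.timedelta(days=max_days) - one_day, end)
        yield cur, win_end
        cur = win_end + one_day

# Example: one URL per window for site 25
windows = list(split_date_range(datetime.date(2022, 5, 4),
                                datetime.date(2022, 7, 20)))
urls = [build_moisture_url(25, s, e) for s, e in windows]
```

Each URL in `urls` can then be downloaded in turn and the resulting tables concatenated.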


In [27]:
import os
import re
from pathlib import Path
import datetime
import pandas as pd
import geopandas as gpd
import shapely
import numpy as np
import h5py
import matplotlib
import matplotlib.pyplot as plt

from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

import glob

from utilsCalVal import EASEconvert,readWalnutGulch
import setParams as p

In [25]:
soilscapePath      = p.soilscapePath
modNames           = p.modNames
# Use which depth sensors?
allDepths          = [5,10,20,30]
targetDepth        = 0 #index of which depth to use
startDate          = p.startDate
endDate            = p.endDate

#Step1 output: easegrid cells and info
tmpOut             = soilscapePath+'temp/' #for temporary unzipped files
soilscapeSitesPath = p.soilscapeSitesPath

#api info: root:
soilscapeAPI        = 'https://soilscape.usc.edu/'

In [26]:
#check files and make initial directory
if not(os.path.isdir(soilscapePath)):
    print('Error: Must run Step1 first')
if not(os.path.isfile(soilscapeSitesPath)):
    print('Error: need metadata file '+soilscapeSitesPath+' made in Step1')
#make the temp directory after the checks: parents=True also creates soilscapePath, which would hide a missing Step1
Path(tmpOut).mkdir(parents=True, exist_ok=True)

## Read info file

In [None]:
sitesDF      = pd.read_csv(soilscapeSitesPath,index_col=None)  #sorted by EASEGRID cells, not individual SOILSCAPE sites
nsites       = len(sitesDF)

## For each site, for each track/frame, get SAR retrievals and metadata
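
The `tracks` and `frames` columns are stored in the CSV as bracketed, comma-separated strings, so each row has to be converted back to integers before looping over track/frame pairs. A minimal sketch of that step follows. `parse_int_list` is a hypothetical helper, and the sample string is illustrative rather than a real set of track numbers.

```python
def parse_int_list(text):
    """Convert a string such as '[12, 34]' into a list of integers."""
    inner = text.strip().strip('[]')
    if not inner.strip():
        return []  # an empty '[]' entry yields no tracks/frames
    return [int(tok) for tok in inner.split(',')]

example = parse_int_list('[12, 34, 56]')
```

Wrapping the result in `np.array(...)` gives the same kind of integer array the loop in this notebook works with.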


In [None]:

for i in range(nsites):
    ezr        = sitesDF['EASEGridRowIndex'][i]
    ezc        = sitesDF['EASEGridColIndex'][i]
    framecount = sitesDF['framecount'][i]
    tracks=sitesDF['tracks'][i][1:-1] #take off [ ]
    tracks=np.fromstring(tracks,dtype='int',sep=',')
    
    frames=sitesDF['frames'][i][1:-1] #take off [ ]
    frames=np.fromstring(frames,dtype='int',sep=',')
    
    for j in range(framecount):
        #if Luckyhills or Kendall, read in Walnut Gulch area A
        if ezr<17310 and ezr>17303:
            dates,retr,rete,retq = readWalnutGulch(ezr,ezc)
        
            trackDF=pd.DataFrame(data=dates,columns=['datetimeUTC'])  
            for k in range(len(modNames)):
                trackDF[modNames[k]]          = retr[:,k]
                trackDF[modNames[k]+'stddev'] = rete[:,k]
                trackDF[modNames[k]+'Qflag']  = retq[:,k]

            for k in range(len(dates)):
                #request SOILSCAPE data from 1 day before to 2 days after each retrieval date
                d1=datetime.datetime.strftime(dates[k]+datetime.timedelta(days=-1),'%Y-%m-%d')
                d2=datetime.datetime.strftime(dates[k]+datetime.timedelta(days=2),'%Y-%m-%d')
                apiCall=soilscapeAPI+'?csv_download_moisture=1&csv_download_moisture_site_id='+str(sitesDF['siteID'][i])+'&valid_only=1&start_date='+d1+'&end_date='+d2
                
                print(apiCall)

In [33]:
apiCall='https://soilscape.usc.edu/?csv_download_moisture=1&csv_download_moisture_site_id=15&valid_only=1&start_date=2022-07-16&end_date=2022-07-19'

In [35]:


with urlopen(apiCall) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        zfile.extractall(tmpOut)


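#glob lists every csv in tmpOut in arbitrary order, so csvfile[0] is not guaranteed to be the file just extracted
csvfile  = np.array(glob.glob(tmpOut+'*.csv'))

newDF      = pd.read_csv(csvfile[0],index_col=None)  #SOILSCAPE moisture data from the downloaded csv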
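
Because `glob` returns every CSV already sitting in `tmpOut`, `csvfile[0]` can point at a file left over from an earlier download. One alternative, sketched here under the assumption that the downloaded archive holds one or more CSV members, is to read them straight from the zip in memory. `read_csvs_from_zip` is a hypothetical helper, and the demo archive below is built locally with made-up values rather than fetched from the SOILSCAPE server.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

def read_csvs_from_zip(zip_bytes, **read_csv_kwargs):
    """Return {member name: DataFrame} for every CSV inside a zip archive."""
    frames = {}
    with ZipFile(BytesIO(zip_bytes)) as zfile:
        for name in zfile.namelist():
            if name.lower().endswith('.csv'):
                with zfile.open(name) as fh:
                    frames[name] = pd.read_csv(fh, **read_csv_kwargs)
    return frames

# Self-contained demo: build a small archive in memory (made-up values)
buf = BytesIO()
with ZipFile(buf, 'w') as zf:
    zf.writestr('demo_soil.csv', 'timestamp,moisture\n2022-07-16 00:00,0.12\n')

demo = read_csvs_from_zip(buf.getvalue())
```

In this notebook the call would look like `read_csvs_from_zip(zipresp.read())` inside the `urlopen` block, which ties each table to the request that produced it.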


In [31]:
csvfile  = np.array(glob.glob(tmpOut+'*.csv'))
print(csvfile[0])
newDF      = pd.read_csv(csvfile[0],index_col=None)  #SOILSCAPE moisture data from the downloaded csv



/home/jovyan/SMCalValdir/SOILSCAPE/temp/jr-3_soil_20220504_20220520.csv


In [36]:
print(newDF)

     Unnamed: 0  Unnamed: 1  Unnamed: 2  Unnamed: 3  Unnamed: 4  Unnamed: 5   
0           NaN         NaN         NaN         NaN         NaN         NaN  \
1           NaN         NaN         NaN         NaN         NaN         NaN   
2           NaN         NaN         NaN         NaN         NaN         NaN   
3           NaN         NaN         NaN         NaN         NaN         NaN   
4           NaN         NaN         NaN         NaN         NaN         NaN   
..          ...         ...         ...         ...         ...         ...   
168         NaN         NaN         NaN         NaN         NaN         NaN   
169         NaN         NaN         NaN         NaN         NaN         NaN   
170         NaN         NaN         NaN         NaN         NaN         NaN   
171         NaN         NaN         NaN         NaN         NaN         NaN   
172         NaN         NaN         NaN         NaN         NaN         NaN   

     Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9