<a href="https://colab.research.google.com/github/sea-surface-teleconnections/sea-surface-teleconnections/blob/main/Reference_Cloud_Podaac_S3_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PODAAC CLOUD API S3 Datasets ⭐

---
---





>[PODAAC CLOUD API S3 Datasets ⭐](#scrollTo=x86uByJgZzzb)

>>[Adding Subscriber Repo Python script](#scrollTo=JNoLuS7TahOZ)

>[Listing Available datasets](#scrollTo=dkhet2UB8U_n)

>>[Define function to determine environment](#scrollTo=2db7ad42)

>[GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI](#scrollTo=UyDqAxFrfdGq)

>>[Connecting to DataSets](#scrollTo=fA5dp8Xsc1KX)

>>>[Code Block Repo Function 👽](#scrollTo=OGxeEOIJdfm3)

>>>[Collecting Sets via S3 API from Main&Archived in Cloud ♎](#scrollTo=rseWFdrrd1Bn)

>[Finding DataSet Based On Keywords ⚡](#scrollTo=EiJw4XGwkUhC)

>[✌ Slicing and Indexing Variables:](#scrollTo=0MGJv6k7lAn_)

>>[Finding Earliest Date Example](#scrollTo=gIW2IeYllftl)

>[Subscribing to datasets](#scrollTo=m7aq0s38mgi1)

>>[Attempting to download straight to Google Colab](#scrollTo=2lmyKHihfhJ0)

>[DataSubscriber UseCase](#scrollTo=yPfCFA4uMCHT)

>[Direct Access to S3 Token](#scrollTo=uIXFrNnwjnMD)

>[Authenticate](#scrollTo=RW9OApO7_opa)

>[Creating Interactive Display Inline](#scrollTo=67dcQLr3i_v1)



In [None]:
!pip install pyspark
!pip install podaac-data-subscriber
!pip install prettyprinter
!pip install utils
!pip install datetime 
import pandas as pd
import numpy as np
import pyspark.pandas as ps
from pyspark.sql import SparkSession

Using Pyspark, Podaac, Subscriber

In [None]:
!git clone -l -s https://github.com/podaac/data-subscriber.git clonedgit

%cd clonedgit
!ls

In [None]:
%cd subscriber/
!ls

/content/clonedgit/subscriber
__init__.py	  podaac_data_downloader.py
podaac_access.py  podaac_data_subscriber.py


## Adding Subscriber Repo Python script 



---



---



*   1. Export the script to the desktop
*   2. Import the script back into Google Colab to save it



In [None]:
import googleapiclient
from google.colab import files

# Open the script cloned from GitHub in Colab's file viewer.
# Save it from the viewer first, then import it back into the notebook.
files.view('podaac_data_subscriber.py')

<IPython.core.display.Javascript object>



---



In [None]:
!git clone -l -s https://github.com/podaac/Data-Recipes.git datarecipes

In [None]:
# Accessing Existing collection in PODAAC CLOUD AWS S3
import utils
import pprint
import xarray as xr
from datetime import datetime
import warnings

warnings.simplefilter(action='ignore')

In [None]:
!git clone https://github.com/podaac/Data-Recipes.git recipes
%cd recipes
!ls

Cloning into 'recipes'...
remote: Enumerating objects: 45, done.
remote: Counting objects: 100% (45/45), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 45 (delta 19), reused 27 (delta 10), pack-reused 0
Unpacking objects: 100% (45/45), done.
/content/clonedgit/subscriber/recipes
dataset-introduction  LICENSE  README.md


In [None]:
%cd dataset-introduction/

/content/clonedgit/subscriber/recipes/dataset-introduction


In [None]:
import googleapiclient
from google.colab import files

# Open utils.py from the Data-Recipes clone in Colab's file viewer.
files.view('utils.py')

<IPython.core.display.Javascript object>

In [None]:
# Do not need to run 
files.upload()

In [None]:
!pip install recipes

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting recipes
  Downloading recipes-0.1.tar.gz (926 bytes)
Building wheels for collected packages: recipes
  Building wheel for recipes (setup.py) ... done
  Created wheel for recipes: filename=recipes-0.1-py3-none-any.whl size=1435 sha256=65ca371f29b751c81fd08b1a33c8c0a84ceebb50c4a529629a72b5f598749b38
  Stored in directory: /root/.cache/pip/wheels/8b/10/2f/29a89c1d0a768aaed396e73eacd10ff356a8e6f25e4465ee2e
Successfully built recipes
Installing collected packages: recipes
Successfully installed recipes-0.1


In [None]:
import requests
from pprint import pprint

CMR_OPS = 'https://cmr.earthdata.nasa.gov/search'
collection_url = 'https://cmr.earthdata.nasa.gov/search/collections'
var_url = "https://cmr.earthdata.nasa.gov/search/variables"

In [None]:
#urls = ['s3://' + f for f in fs.glob("s3://noaa-goes16/ABI-L2-SSTF/2020/210/*/*.nc")]



---



# **Listing Available datasets**



---



---



In [None]:
#@title Base Function to run
"""
Some ground level functions
"""

import requests
from pprint import pprint
CMR_OPS = 'https://cmr.earthdata.nasa.gov/search'
collection_url = 'https://cmr.earthdata.nasa.gov/search/collections'
var_url = "https://cmr.earthdata.nasa.gov/search/variables"

def find_dataset(provider='podaac',
                 keywords=['swot','level-2']):
    """
    Find a list of collections/datasets that match all the keywords from the keywords list.
    
    
    """
    import pandas as pd

    if 'podaac' in provider.lower().replace('.',''):
        provider='POCLOUD'
        
    response = requests.get(collection_url,params={'cloud_hosted': 'True',
                                        'has_granules': 'True',
                                        'provider': provider,
                                        'page_size':2000,},
                                headers={'Accept': 'application/json', } )
    
    collections = response.json()['feed']['entry']
    
    entries={}
    entries['short_name']=[]
    entries['long_name']=[]
    entries['concept_id']=[]
    entries['time_start']=[]
    entries['time_end']=[]
    
    
    for collection in collections:
        
        title="%s %s %s"%(collection["short_name"],collection["dataset_id"][:97],collection["id"])
        match=1
        for kw in keywords:
            match *= kw.lower() in title.lower()
            
        if match==1:
            entries['short_name'].append(collection["short_name"])
            entries['concept_id'].append(collection["id"])
            entries['long_name'].append(collection["dataset_id"])
            entries['time_start'].append(collection["time_start"])
            try:
                entries['time_end'].append(collection["time_end"])
            except:
                entries['time_end'].append(['NaT/Present'])
    
    return pd.DataFrame(entries)

def all_pocloud_dataset(provider='podaac'):
    """
    a list of all POCLOUD collections.
    """
    import pandas as pd

    if 'podaac' in provider.lower().replace('.',''):
        provider='POCLOUD'
        
    response = requests.get(collection_url,params={'cloud_hosted': 'True',
                                        'provider': provider,
                                        'page_size':2000,},
                                headers={'Accept': 'application/json', } )
    
    collections = response.json()['feed']['entry']
    
    entries={}
    entries['short_name']=[]
    entries['long_name']=[]
    entries['concept_id']=[]
    entries['time_start']=[]
    entries['time_end']=[]
    
    
    for collection in collections:
        
        title="%s %s %s"%(collection["short_name"],collection["dataset_id"][:97],collection["id"])
        match=1
        entries['short_name'].append(collection["short_name"])
        entries['concept_id'].append(collection["id"])
        entries['long_name'].append(collection["dataset_id"])
        entries['time_start'].append(collection["time_start"][:10])
        try:
            entries['time_end'].append(collection["time_end"][:10])
        except:
            entries['time_end'].append(['NaT/Present'])
    
    return pd.DataFrame(entries)

def direct_s3(provider='podaac'):
    import requests,s3fs
    s3_cred_endpoint = {
        'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
        'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'}

    temp_creds_url = s3_cred_endpoint[provider]
    creds = requests.get(temp_creds_url).json()
    s3 = s3fs.S3FileSystem(anon=False,
                           key=creds['accessKeyId'],
                           secret=creds['secretAccessKey'], 
                           token=creds['sessionToken'])
    return s3

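The keyword filter inside `find_dataset` can be tried without a network call. The sketch below is only an illustration: `matches_all_keywords` and `filter_collections` are hypothetical helper names, and the two sample records are copied from the collection listing in this notebook rather than fetched from CMR.

```python
def matches_all_keywords(title, keywords):
    """Return True when every keyword occurs in the title, ignoring case."""
    lowered = title.lower()
    return all(kw.lower() in lowered for kw in keywords)


# Sample records copied from the collection listing in this notebook.
sample_collections = [
    {"short_name": "MUR-JPL-L4-GLOB-v4.1",
     "dataset_id": "GHRSST Level 4 MUR Global Foundation Sea Surface "
                   "Temperature Analysis (v4.1)",
     "id": "C1996881146-POCLOUD"},
    {"short_name": "ASCATB-L2-25km",
     "dataset_id": "MetOp-B ASCAT Level 2 25.0km Ocean Surface Wind "
                   "Vectors in Full Orbit Swath",
     "id": "C2075141559-POCLOUD"},
]


def filter_collections(collections, keywords):
    """Keep the short names whose combined title matches all keywords."""
    hits = []
    for c in collections:
        # Same title construction as find_dataset: short name, long name, concept id.
        title = "%s %s %s" % (c["short_name"], c["dataset_id"][:97], c["id"])
        if matches_all_keywords(title, keywords):
            hits.append(c["short_name"])
    return hits


print(filter_collections(sample_collections, ["sea surface temperature"]))
```

Using `all(...)` gives the same result as multiplying booleans into `match`, but it stops at the first keyword that fails.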

In [None]:
import utils
import warnings
import pandas as pd
import time
pd.set_option('display.max_rows', None)
pd.set_option('max_colwidth', 150)
warnings.simplefilter(action='ignore')

# The function expects a provider name (mapped to POCLOUD), not the credentials URL.
data = all_pocloud_dataset('podaac')
display(data)



Unnamed: 0,short_name,long_name,concept_id,time_start,time_end
0,MODIS_A-JPL-L2P-v2019.0,GHRSST Level 2P Global Sea Surface Skin Temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA Aqua satellite (GDS2),C1940473819-POCLOUD,2002-07-04,[NaT/Present]
1,MODIS_T-JPL-L2P-v2019.0,GHRSST Level 2P Global Sea Surface Skin Temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA Terra satellite (GDS2),C1940475563-POCLOUD,2000-02-24,[NaT/Present]
2,ASCATB-L2-25km,MetOp-B ASCAT Level 2 25.0km Ocean Surface Wind Vectors in Full Orbit Swath,C2075141559-POCLOUD,2012-10-29,[NaT/Present]
3,ASCATC-L2-25km,MetOp-C ASCAT Level 2 25.0km Ocean Surface Wind Vectors in Full Orbit Swath,C2075141638-POCLOUD,2019-10-22,[NaT/Present]
4,VIIRS_NPP-STAR-L3U-v2.80,GHRSST Level 3U NOAA STAR SST v2.80 from VIIRS on S-NPP Satellite,C2147485059-POCLOUD,2012-02-01,[NaT/Present]
5,MUR-JPL-L4-GLOB-v4.1,GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1),C1996881146-POCLOUD,2002-05-31,[NaT/Present]
6,VIIRS_N20-OSPO-L2P-v2.61,GHRSST Level 2P OSPO dataset v2.61 from VIIRS on the NOAA-20 satellite (GDS v2),C1996880450-POCLOUD,2018-01-05,[NaT/Present]
7,VIIRS_NPP-OSPO-L2P-v2.61,GHRSST Level 2P OSPO dataset v2.61 from VIIRS on S-NPP Satellite (GDS v2),C1996880725-POCLOUD,2012-02-01,[NaT/Present]
8,JASON_CS_S6A_L2_ALT_HR_STD_OST_NRT_F,Sentinel-6A MF Jason-CS L2 P4 Altimeter High Resolution (HR) NRT Ocean Surface Topography,C1968979566-POCLOUD,2020-12-07,[NaT/Present]
9,VIIRS_N20-STAR-L3U-v2.80,GHRSST Level 3U NOAA STAR SST v2.80 from VIIRS on NOAA-20 Satellite,C2147488020-POCLOUD,2018-01-05,[NaT/Present]


AttributeError: ignored

In [None]:
import requests
from platform import system
from netrc import netrc
from getpass import getpass
from urllib import request
from http.cookiejar import CookieJar
from os.path import join, expanduser

TOKEN_DATA = ("<token>"
              "<username>%s</username>"
              "<password>%s</password>"
              "<client_id>PODAAC CMR Client</client_id>"
              "<user_ip_address>%s</user_ip_address>"
              "</token>")


def setup_earthdata_login_auth(urs: str='urs.earthdata.nasa.gov', cmr: str='cmr.earthdata.nasa.gov'):

    # GET URS LOGIN INFO FROM NETRC OR USER PROMPTS:
    netrc_name = "_netrc" if system()=="Windows" else ".netrc"
    try:
        username, _, password = netrc(file=join(expanduser('~'), netrc_name)).authenticators(urs)
        print("# Your URS credentials were securely retrieved from your .netrc file.")
    except (FileNotFoundError, TypeError):
        print('# Please provide your Earthdata Login credentials for access.')
        print('# Your info will only be passed to %s and will not be exposed in Jupyter.' % (urs))
        username = input('Username: ')
        password = getpass('Password: ')

    # SET UP URS AUTHENTICATION FOR HTTP DOWNLOADS:
    manager = request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, urs, username, password)
    auth = request.HTTPBasicAuthHandler(manager)
    jar = CookieJar()
    processor = request.HTTPCookieProcessor(jar)
    opener = request.build_opener(auth, processor)
    request.install_opener(opener)

    # GET TOKEN TO ACCESS RESTRICTED CMR METADATA:
    ip = requests.get("https://ipinfo.io/ip").text.strip()
    r = requests.post(
        url="https://%s/legacy-services/rest/tokens" % cmr,
        data=TOKEN_DATA % (str(username), str(password), ip),
        headers={'Content-Type': 'application/xml', 'Accept': 'application/json'}
    )
    return r.json()['token']['id']

# Provide URS credentials for HTTP download auth & CMR token retrieval:
_token = setup_earthdata_login_auth()

# The credential setup code above comes from this PO.DAAC tutorial:
# https://github.com/podaac/tutorials/blob/master/notebooks/SWOT-EA-2021/Estuary_explore_inCloud_zarr.ipynb

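The `.netrc` lookup at the top of `setup_earthdata_login_auth` can be checked on its own. The following is a minimal sketch, assuming a throwaway file with made-up credentials; `read_netrc_login` is a hypothetical helper, not part of any PO.DAAC package. It also shows why the original catches `TypeError`: `authenticators()` returns `None` for an unknown host, and unpacking `None` fails.

```python
import os
import tempfile
from netrc import netrc


def read_netrc_login(path, host="urs.earthdata.nasa.gov"):
    """Return (username, password) for host, or None when nothing is stored."""
    try:
        entry = netrc(file=path).authenticators(host)
    except FileNotFoundError:
        return None  # no netrc file at this path
    if entry is None:
        return None  # file exists but has no entry for this host
    username, _, password = entry
    return username, password


# Demonstration with a temporary file and made-up credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".netrc")
    with open(demo_path, "w") as fh:
        fh.write("machine urs.earthdata.nasa.gov login demo_user password demo_pass\n")
    demo_login = read_netrc_login(demo_path)
    other_host_login = read_netrc_login(demo_path, host="example.invalid")
    missing_login = read_netrc_login(os.path.join(tmp, "absent"))

print(demo_login)
```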

In [None]:
import pandas as pd

df = pd.DataFrame(data)
df

Unnamed: 0,short_name,long_name,concept_id,time_start,time_end
0,MODIS_A-JPL-L2P-v2019.0,GHRSST Level 2P Global Sea Surface Skin Temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA Aqua satellite (GDS2),C1940473819-POCLOUD,2002-07-04,[NaT/Present]
1,MODIS_T-JPL-L2P-v2019.0,GHRSST Level 2P Global Sea Surface Skin Temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA Terra satellite (GDS2),C1940475563-POCLOUD,2000-02-24,[NaT/Present]
2,ASCATB-L2-25km,MetOp-B ASCAT Level 2 25.0km Ocean Surface Wind Vectors in Full Orbit Swath,C2075141559-POCLOUD,2012-10-29,[NaT/Present]
3,ASCATC-L2-25km,MetOp-C ASCAT Level 2 25.0km Ocean Surface Wind Vectors in Full Orbit Swath,C2075141638-POCLOUD,2019-10-22,[NaT/Present]
4,VIIRS_NPP-STAR-L3U-v2.80,GHRSST Level 3U NOAA STAR SST v2.80 from VIIRS on S-NPP Satellite,C2147485059-POCLOUD,2012-02-01,[NaT/Present]
5,MUR-JPL-L4-GLOB-v4.1,GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1),C1996881146-POCLOUD,2002-05-31,[NaT/Present]
6,VIIRS_N20-OSPO-L2P-v2.61,GHRSST Level 2P OSPO dataset v2.61 from VIIRS on the NOAA-20 satellite (GDS v2),C1996880450-POCLOUD,2018-01-05,[NaT/Present]
7,VIIRS_NPP-OSPO-L2P-v2.61,GHRSST Level 2P OSPO dataset v2.61 from VIIRS on S-NPP Satellite (GDS v2),C1996880725-POCLOUD,2012-02-01,[NaT/Present]
8,JASON_CS_S6A_L2_ALT_HR_STD_OST_NRT_F,Sentinel-6A MF Jason-CS L2 P4 Altimeter High Resolution (HR) NRT Ocean Surface Topography,C1968979566-POCLOUD,2020-12-07,[NaT/Present]
9,VIIRS_N20-STAR-L3U-v2.80,GHRSST Level 3U NOAA STAR SST v2.80 from VIIRS on NOAA-20 Satellite,C2147488020-POCLOUD,2018-01-05,[NaT/Present]


In [None]:
# Calling short name to collect metadata
grace_ShortName = "TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06_V2"
grace_ShortName

'TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06_V2'

In [None]:
# Useful examples in the CMR Search API documentation (JSON response format):
# https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#json


In [None]:
# Collect dataset
#r = requests.get(url="https://cmr.earthdata.nasa.gov/search/collections.umm_json", 


# Find collections by provider short name (case-insensitive match)
!curl "https://cmr.earthdata.nasa.gov/search/collections?provider_short_name\[\]=SHORT_5&options\[provider_short_name\]\[ignore_case\]=true"


<?xml version="1.0" encoding="UTF-8"?><results><hits>0</hits><took>12</took><references></references></results>

Find collections by entry id




In [None]:
!curl "https://cmr.earthdata.nasa.gov/search/collections?entry_id\[\]=SHORT_V5"

<?xml version="1.0" encoding="UTF-8"?><results><hits>0</hits><took>17</took><references></references></results>

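The XML that these `curl` calls return can be summarized in Python. The sketch below parses the response text copied from the provider short name query above; `summarize_cmr_references` is a hypothetical helper name, and only the elements visible in that output (`hits`, `took`, `reference`, `name`) are assumed.

```python
import xml.etree.ElementTree as ET

# Response text copied from the provider_short_name query above.
sample_response = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<results><hits>0</hits><took>12</took><references></references></results>"
)


def summarize_cmr_references(xml_text):
    """Extract the hit count, the reported 'took' value, and reference names."""
    # Encode to bytes so the XML declaration's encoding is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    names = [ref.findtext("name") for ref in root.iter("reference")]
    return {
        "hits": int(root.findtext("hits")),
        "took": int(root.findtext("took")),
        "names": names,
    }


summary = summarize_cmr_references(sample_response)
print(summary)
```

A hit count of 0 here means the query matched no collections, which is what both example queries above returned.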


---



---



In [None]:
!curl "https://cmr.earthdata.nasa.gov/search/collections?downloadable=true"


<?xml version="1.0" encoding="UTF-8"?><results><hits>39913</hits><took>21</took><references><reference><name>"The Omnivores Dilemma": The Effect of Autumn Diet on Winter Physiology and Condition of Juvenile Antarctic Krill</name><id>C1934541400-SCIOPS</id><location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1934541400-SCIOPS/2</location><revision-id>2</revision-id></reference><reference><name>0.5 hour 1 M HCl extraction data for the Windmill Islands marine sediments</name><id>C1214305813-AU_AADC</id><location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1214305813-AU_AADC/11</location><revision-id>11</revision-id></reference><reference><name>1-100Hz ULF/ELF Electromagnetic Wave Observation at Syowa Station</name><id>C1214590112-SCIOPS</id><location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1214590112-SCIOPS/7</location><revision-id>7</revision-id></reference><reference><name>10 m firn temperature data: LGB traverses 1990-95</name><id>C1214313574-AU_AADC</id><l

In [None]:
# Example OnlineResource entry for granule G550016-GHRC
# (https://cmr.earthdata.nasa.gov:443/search/concepts/G550016-GHRC):
# <OnlineResource>
#     <URL>http://ghrc.nsstc.nasa.gov/opendap/ssmi/f14/monthly/</URL>
#     <Type>OPeNDAP</Type>
# </OnlineResource>

In [None]:
!curl https://cmr.earthdata.nasa.gov:443/search/concepts/G550016-GHRC.atom

# Relevant link element in the Atom response:
# <link href="http://ghrc.nsstc.nasa.gov/opendap/ssmi/f14/monthly/" hreflang="en-US" title="(OPeNDAP)" rel="http://esipfed.org/ns/fedsearch/1.1/data#"></link>

Valid sort keys for collection searches (from the CMR Search API documentation):

*   entry_title
*   dataset_id - alias for entry_title
*   short_name
*   entry_id
*   start_date
*   end_date
*   platform
*   instrument
*   sensor
*   provider
*   revision_date
*   score - document relevance score, defaults to descending. See Document Scoring.
*   has_granules - Sorts collections by whether they have granules or not. Collections with granules are sorted before collections without granules.
*   has_granules_or_cwic - Sorts collections by whether they have granules or a CWIC consortium. Collections with granules or a CWIC consortium are sorted before collections without granules or a CWIC consortium.
*   usage_score - Sorts collections by usage. The usage score comes from the EMS metrics, which are ingested into the CMR.
*   ongoing - Sorts collections by fuzzy collection end date in relation to the configured ongoing-days. Any end date after today minus the configured ongoing-days (30 by default) is considered ongoing. Any end date before that is not ongoing.

Examples of sorting by start_date in descending order (most recent data first) and ascending order (note: the + must be escaped as %2B):

curl "https://cmr.earthdata.nasa.gov/search/collections?sort_key\[\]=-start_date"
curl "https://cmr.earthdata.nasa.gov/search/collections?sort_key\[\]=%2Bstart_date"

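The escaping rule for `+` is easy to get wrong by hand, so the query string can be built with the standard library instead. This is a sketch under the assumption that CMR accepts percent-encoded brackets in `sort_key[]`, as is usual for query strings; `sorted_collections_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

CMR_COLLECTIONS = "https://cmr.earthdata.nasa.gov/search/collections"


def sorted_collections_url(sort_key, descending=False, **params):
    """Build a collections search URL with a sort_key[] parameter."""
    prefix = "-" if descending else "+"
    query = dict(params)
    query["sort_key[]"] = prefix + sort_key
    # urlencode percent-encodes "+" as %2B and the brackets as %5B%5D.
    return CMR_COLLECTIONS + "?" + urlencode(query)


print(sorted_collections_url("start_date", descending=True))
print(sorted_collections_url("start_date", provider="POCLOUD"))
```

The returned string can be passed to `curl` or to `requests.get` unchanged.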

In [None]:
!pip install requests
!pip install s3fs
!pip install awscli
!pip install --upgrade s3fs
!pip install pandas

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting s3fs
  Downloading s3fs-2022.5.0-py3-none-any.whl (27 kB)
Collecting aiobotocore~=2.3.0
  Downloading aiobotocore-2.3.3.tar.gz (65 kB)
Collecting aiohttp<=4
  Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting fsspec==2022.5.0
  Downloading fsspec-2022.5.0-py3-none-any.whl (140 kB)
Collecting botocore<1.24.22,>=1.24.21
  Downloading botocore-1.24.21-py3-none-any.whl (8.6 MB)
Collecting aioitertools>=0.5.1
  Downloading aioitertools-0.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting awscli
  Downloading awscli-1.25.17-py3-none-any.whl (3.9 MB)
Collecting docutils<0.17,>=0.10
  Downloading docutils-0.16-py2.py3-none-any.whl (548 kB)
Collecting rsa<4.8,>=3.1.2
  Downloading rsa-4.7.2-py3-none-any.whl (34 kB)
Collecting s3transfer<0.7.0,>=0.6.0
  Downloading s3transfer-0.6.0-py3-none-any.whl (79 kB)
Collecting colorama<0.4.5,>=0.2.5
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting botocore==1.27.17
  Downloading botocore-1.27.17-py3-none-any.whl (8.9 MB)
Installing collected packages: botocore, s3transfer, rsa, docutils, colorama, awscli
  Attempting uninstall: botocore
    Found existing in

In [None]:
!git clone https://github.com/fsspec/s3fs.git 
%cd s3fs

Cloning into 's3fs'...
remote: Enumerating objects: 3962, done.
remote: Counting objects: 100% (806/806), done.
remote: Compressing objects: 100% (145/145), done.
remote: Total 3962 (delta 744), reused 661 (delta 657), pack-reused 3156
Receiving objects: 100% (3962/3962), 951.56 KiB | 1.84 MiB/s, done.
Resolving deltas: 100% (2699/2699), done.
/content/clonedgit/subscriber/recipes/dataset-introduction/s3fs


In [None]:
import awscli
import setuptools
import os 
import pprint

In [None]:
# URL of the PO.DAAC endpoint that issues temporary S3 credentials
temp_creds_url = 'https://archive.podaac.earthdata.nasa.gov/s3credentials'



---



## Define function to determine environment

In [None]:
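def environment():
    # get_ipython() is only defined inside IPython/Jupyter sessions,
    # so a NameError means the code runs as a plain script.
    try:
        get_ipython()
        return "notebook"
    except NameError:
        return "server"
environment()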

'notebook'

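Since this notebook targets Colab, it can help to tell Colab apart from a local Jupyter session. The sketch below is one possible refinement, not the author's function: checking `sys.modules` for `google.colab` is a common convention rather than an official API, and the `ZMQInteractiveShell` class name is what Jupyter kernels typically report.

```python
import sys


def detect_environment():
    """Return 'colab', 'notebook', 'ipython', or 'script' (best effort)."""
    # Assumption: Colab preloads the google.colab package into sys.modules.
    if "google.colab" in sys.modules:
        return "colab"
    try:
        shell_name = get_ipython().__class__.__name__
    except NameError:
        return "script"  # get_ipython is undefined outside IPython
    if shell_name == "ZMQInteractiveShell":
        return "notebook"  # Jupyter kernel
    return "ipython"  # terminal IPython or another front end


print(detect_environment())
```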


---



---



In [None]:
!pip install panel==0.12.6 hvplot==0.7.3

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting panel==0.12.6
  Downloading panel-0.12.6-py2.py3-none-any.whl (12.9 MB)
Collecting hvplot==0.7.3
  Downloading hvplot-0.7.3-py2.py3-none-any.whl (3.1 MB)
Collecting bokeh<2.5.0,>=2.4.0
  Downloading bokeh-2.4.3-py3-none-any.whl (18.5 MB)
Installing collected packages: bokeh, panel, hvplot
  Attempting uninstall: bokeh
    Found existing installation: bokeh 2.3.3
    Uninstalling bokeh-2.3.3:
      Successfully uninstalled bokeh-2.3.3
  Attempting uninstall: panel
    Found existing installation: panel 0.12.1
    Uninstalling panel-0.12.1:
      Successfully uninstalled panel-0.12.1
Successfully installed bokeh-2.4.3 hvplot-0.7.3 panel-0.12.6


In [None]:
import hvplot
import holoviews as hv

In [None]:
#https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/05_Data_Access_Direct_S3.html

# GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI




---



> https://github.com/podaac/Data-Recipes/blob/main/dataset-introduction/AVHRR_OI-NCEI-L4-GLOB-v2.1.ipynb





---



## Connecting to DataSets

In [None]:
!pip install utils

In [None]:
!pip install datetime

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# Connecting to datasets
import utils
import pprint
import xarray as xr
from datetime import datetime
import warnings

warnings.simplefilter(action='ignore')


### Code Block Repo Function 👽
---



---



In [None]:
#@title Repo Block Run Code
"""
Some ground level functions
"""

import requests
from pprint import pprint
CMR_OPS = 'https://cmr.earthdata.nasa.gov/search'
collection_url = 'https://cmr.earthdata.nasa.gov/search/collections'
var_url = "https://cmr.earthdata.nasa.gov/search/variables"

def find_dataset(provider='podaac',
                 keywords=['swot','level-2']):
    """
    Find a list of collections/datasets that match all the keywords from the keywords list.
    
    
    """
    import pandas as pd

    if 'podaac' in provider.lower().replace('.',''):
        provider='POCLOUD'
        
    response = requests.get(collection_url,params={'cloud_hosted': 'True',
                                        'has_granules': 'True',
                                        'provider': provider,
                                        'page_size':2000,},
                                headers={'Accept': 'application/json', } )
    
    collections = response.json()['feed']['entry']
    
    entries={}
    entries['short_name']=[]
    entries['long_name']=[]
    entries['concept_id']=[]
    entries['time_start']=[]
    entries['time_end']=[]
    
    
    for collection in collections:
        
        title="%s %s %s"%(collection["short_name"],collection["dataset_id"][:97],collection["id"])
        match=1
        for kw in keywords:
            match *= kw.lower() in title.lower()
            
        if match==1:
            entries['short_name'].append(collection["short_name"])
            entries['concept_id'].append(collection["id"])
            entries['long_name'].append(collection["dataset_id"])
            entries['time_start'].append(collection["time_start"])
            try:
                entries['time_end'].append(collection["time_end"])
            except:
                entries['time_end'].append(['NaT/Present'])
    
    return pd.DataFrame(entries)

def all_pocloud_dataset(provider='podaac'):
    """
    a list of all POCLOUD collections.
    """
    import pandas as pd

    if 'podaac' in provider.lower().replace('.',''):
        provider='POCLOUD'
        
    response = requests.get(collection_url,params={'cloud_hosted': 'True',
                                        'provider': provider,
                                        'page_size':2000,},
                                headers={'Accept': 'application/json', } )
    
    collections = response.json()['feed']['entry']
    
    entries={}
    entries['short_name']=[]
    entries['long_name']=[]
    entries['concept_id']=[]
    entries['time_start']=[]
    entries['time_end']=[]
    
    
    for collection in collections:
        
        title="%s %s %s"%(collection["short_name"],collection["dataset_id"][:97],collection["id"])
        match=1
        entries['short_name'].append(collection["short_name"])
        entries['concept_id'].append(collection["id"])
        entries['long_name'].append(collection["dataset_id"])
        entries['time_start'].append(collection["time_start"][:10])
        try:
            entries['time_end'].append(collection["time_end"][:10])
        except:
            entries['time_end'].append(['NaT/Present'])
    
    return pd.DataFrame(entries)

def direct_s3(provider='podaac'):
    import requests,s3fs
    s3_cred_endpoint = {
        'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
        'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'}

    temp_creds_url = s3_cred_endpoint[provider]
    creds = requests.get(temp_creds_url).json()
    s3 = s3fs.S3FileSystem(anon=False,
                           key=creds['accessKeyId'],
                           secret=creds['secretAccessKey'], 
                           token=creds['sessionToken'])
    return s3

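The credentials returned to `direct_s3` are temporary, so a long session may need to request new ones. The sketch below assumes the credentials JSON also carries an `expiration` field holding an ISO-style timestamp; that field name and format are assumptions here, and `credentials_expired` is a hypothetical helper. The sample dictionary contains placeholder values only.

```python
from datetime import datetime, timedelta, timezone


def credentials_expired(creds, now=None, margin=timedelta(minutes=5)):
    """Return True when the credentials expire within the safety margin."""
    now = now or datetime.now(timezone.utc)
    # Assumption: 'expiration' looks like '2022-06-27 13:00:00+00:00'.
    expires = datetime.fromisoformat(creds["expiration"])
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    return now >= expires - margin


# Placeholder credentials for illustration only.
sample_creds = {
    "accessKeyId": "EXAMPLE",
    "secretAccessKey": "EXAMPLE",
    "sessionToken": "EXAMPLE",
    "expiration": "2022-06-27 13:00:00+00:00",
}

noon = datetime(2022, 6, 27, 12, 0, tzinfo=timezone.utc)
print(credentials_expired(sample_creds, now=noon))
```

When the check returns True, calling `direct_s3()` again would fetch a fresh set of keys.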



---



---



### Collecting Sets via S3 API from Main&Archived in Cloud ♎


---



In [None]:
!pip install numpy
import numpy as np
import collections

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# Importing and identifying S3 credentials
# podaac_s3 = 'podaac-ops-cumulus-protected'
# 395 total collections in PODAAC CLOUD (POCLOUD) as of 03/25/2022.

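With the bucket name from the comment above, a glob pattern for one collection can be assembled from its short name. This is a sketch that assumes granules sit under a prefix equal to the collection short name; that layout is an assumption, and `collection_glob` is a hypothetical helper.

```python
PODAAC_BUCKET = "podaac-ops-cumulus-protected"


def collection_glob(short_name, pattern="*.nc", bucket=PODAAC_BUCKET):
    """Build an s3:// glob pattern for the files of one collection."""
    # Assumed layout: s3://<bucket>/<collection short name>/<granule files>
    return "s3://%s/%s/%s" % (bucket, short_name, pattern)


print(collection_glob("MUR-JPL-L4-GLOB-v4.1"))
```

The resulting pattern could then be handed to the `glob` method of the `s3fs.S3FileSystem` object returned by `direct_s3()`.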



---



---



---



# Finding DataSet Based On Keywords ⚡


---

---





> Options:

*   User Input
*   Drop Down
*   G-Form 






In [None]:
# Search POCLOUD collections by keyword
sst_data = find_dataset(keywords=['Sea Surface Temperature'])
display(sst_data)

Unnamed: 0,short_name,long_name,concept_id,time_start,time_end
0,MUR-JPL-L4-GLOB-v4.1,GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1),C1996881146-POCLOUD,2002-05-31T21:00:00.000Z,[NaT/Present]
1,VIIRS_NPP-NAVO-L2P-v3.0,GHRSST Level 2P 1 m Depth Global Sea Surface Temperature version 3.0 from the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP s...,C1996881636-POCLOUD,2018-01-30T17:51:49.000Z,[NaT/Present]
2,AVHRR_SST_METOP_B-OSISAF-L2P-v1.0,GHRSST Level 2P sub-skin Sea Surface Temperature from the Advanced Very High Resolution Radiometer (AVHRR) on Metop satellites (currently Metop-B)...,C2036880717-POCLOUD,2016-01-19T08:07:03.000Z,[NaT/Present]
3,SEVIRI_IO_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Indian-Ocean (IO) sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG in GDS2 ...,C2036877550-POCLOUD,2017-03-28T13:30:00.000Z,[NaT/Present]
4,SEVIRI_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Atlantic sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG at 0 degree longi...,C2036878243-POCLOUD,2004-06-01T00:00:00.000Z,[NaT/Present]
5,MUR25-JPL-L4-GLOB-v04.2,GHRSST Level 4 MUR 0.25deg Global Foundation Sea Surface Temperature Analysis (v4.2),C2036880657-POCLOUD,2002-08-31T21:00:00.000Z,[NaT/Present]
6,OSTIA-UKMO-L4-GLOB-v2.0,GHRSST Level 4 OSTIA Global Foundation Sea Surface Temperature Analysis (GDS version 2),C2036877535-POCLOUD,2006-12-31T00:00:00.000Z,[NaT/Present]
7,CMC0.1deg-CMC-L4-GLOB-v3.0,GHRSST Level 4 CMC0.1deg Global Foundation Sea Surface Temperature Analysis (GDS version 2),C2036881720-POCLOUD,2016-01-01T00:00:00.000Z,[NaT/Present]
8,K10_SST-NAVO-L4-GLOB-v01,GHRSST Level 4 K10_SST Global 10 km Analyzed Sea Surface Temperature from Naval Oceanographic Office (NAVO) in GDS2.0,C2036881956-POCLOUD,2019-01-09T00:00:00.000Z,[NaT/Present]
9,AMSR2-REMSS-L2P-v8a,GHRSST Level 2P Global Subskin Sea Surface Temperature version 8a from the Advanced Microwave Scanning Radiometer 2 on the GCOM-W satellite,C2036880594-POCLOUD,2012-07-02T19:00:44.000Z,[NaT/Present]


In [None]:
#find_dataset(keywords=['Sea Surface Temperature'])
Sea_Surface_Temperature = find_dataset(keywords=['Sea Surface Temperature'])
display(Sea_Surface_Temperature)

Unnamed: 0,short_name,long_name,concept_id,time_start,time_end
0,MUR-JPL-L4-GLOB-v4.1,GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1),C1996881146-POCLOUD,2002-05-31T21:00:00.000Z,[NaT/Present]
1,VIIRS_NPP-NAVO-L2P-v3.0,GHRSST Level 2P 1 m Depth Global Sea Surface Temperature version 3.0 from the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP s...,C1996881636-POCLOUD,2018-01-30T17:51:49.000Z,[NaT/Present]
2,AVHRR_SST_METOP_B-OSISAF-L2P-v1.0,GHRSST Level 2P sub-skin Sea Surface Temperature from the Advanced Very High Resolution Radiometer (AVHRR) on Metop satellites (currently Metop-B)...,C2036880717-POCLOUD,2016-01-19T08:07:03.000Z,[NaT/Present]
3,SEVIRI_IO_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Indian-Ocean (IO) sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG in GDS2 ...,C2036877550-POCLOUD,2017-03-28T13:30:00.000Z,[NaT/Present]
4,SEVIRI_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Atlantic sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG at 0 degree longi...,C2036878243-POCLOUD,2004-06-01T00:00:00.000Z,[NaT/Present]
5,MUR25-JPL-L4-GLOB-v04.2,GHRSST Level 4 MUR 0.25deg Global Foundation Sea Surface Temperature Analysis (v4.2),C2036880657-POCLOUD,2002-08-31T21:00:00.000Z,[NaT/Present]
6,OSTIA-UKMO-L4-GLOB-v2.0,GHRSST Level 4 OSTIA Global Foundation Sea Surface Temperature Analysis (GDS version 2),C2036877535-POCLOUD,2006-12-31T00:00:00.000Z,[NaT/Present]
7,CMC0.1deg-CMC-L4-GLOB-v3.0,GHRSST Level 4 CMC0.1deg Global Foundation Sea Surface Temperature Analysis (GDS version 2),C2036881720-POCLOUD,2016-01-01T00:00:00.000Z,[NaT/Present]
8,K10_SST-NAVO-L4-GLOB-v01,GHRSST Level 4 K10_SST Global 10 km Analyzed Sea Surface Temperature from Naval Oceanographic Office (NAVO) in GDS2.0,C2036881956-POCLOUD,2019-01-09T00:00:00.000Z,[NaT/Present]
9,AMSR2-REMSS-L2P-v8a,GHRSST Level 2P Global Subskin Sea Surface Temperature version 8a from the Advanced Microwave Scanning Radiometer 2 on the GCOM-W satellite,C2036880594-POCLOUD,2012-07-02T19:00:44.000Z,[NaT/Present]


# ✌ Slicing and Indexing Variables:


---
> Time :
> Coordinates :
> Temp

In [None]:
import datetime as dt
# Alternatively: from datetime import date
d1 = dt.date(2020, 11, 19)
d1

In [None]:
print(d1.year)
print(d1.month)
print(d1.day)

2020
11
19




---



---



## Finding Earliest Date Example

In [None]:
earliest = Sea_Surface_Temperature['time_start'].min() # Earliest date
#newest = Sea_Surface_Temperature['time_end'].max() # Latest date

#print(newest)
print(earliest)

1854-01-01T00:00:00.000Z
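
`min()` on the `time_start` column compares ISO 8601 strings lexicographically, which matches chronological order only because every value shares the same format. The sketch below parses the column into real timestamps, so that comparisons and slicing by date also work. It assumes a small hand-built stand-in for `Sea_Surface_Temperature`: the `catalog` name is illustrative and its three rows are copied from the listing above.

```python
import pandas as pd

# Illustrative stand-in for Sea_Surface_Temperature (rows copied from the listing above)
catalog = pd.DataFrame({
    "short_name": [
        "MUR-JPL-L4-GLOB-v4.1",
        "SEVIRI_SST-OSISAF-L3C-v1.0",
        "K10_SST-NAVO-L4-GLOB-v01",
    ],
    "time_start": [
        "2002-05-31T21:00:00.000Z",
        "2004-06-01T00:00:00.000Z",
        "2019-01-09T00:00:00.000Z",
    ],
})

# Parse the ISO 8601 strings into timezone-aware timestamps
catalog["time_start"] = pd.to_datetime(catalog["time_start"], utc=True)

# Row with the earliest start time
earliest_row = catalog.loc[catalog["time_start"].idxmin()]
print(earliest_row["short_name"], earliest_row["time_start"].year)

# Boolean indexing on real timestamps: collections that started before 2010
before_2010 = catalog[catalog["time_start"] < pd.Timestamp("2010-01-01", tz="UTC")]
print(list(before_2010["short_name"]))
```

Once the column holds timestamps, the `.dt` accessor (for example `catalog["time_start"].dt.year`) gives the same year/month/day fields shown for `d1` earlier.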


In [None]:
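# Drop rows that contain NaN/None, then flag placeholder values.
# Note: dropna() does not treat the "[NaT/Present]" string as missing.

df = Sea_Surface_Temperature.dropna(how='any', axis=0).reset_index()

missing_vals = ["NA", "", "[NaT/Present]", None, np.nan]
missing = df.isin(missing_vals)
missing.head()

df.fillna(0).head()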

Unnamed: 0,index,short_name,long_name,concept_id,time_start,time_end
0,0,MUR-JPL-L4-GLOB-v4.1,GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1),C1996881146-POCLOUD,2002-05-31T21:00:00.000Z,[NaT/Present]
1,1,VIIRS_NPP-NAVO-L2P-v3.0,GHRSST Level 2P 1 m Depth Global Sea Surface Temperature version 3.0 from the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP s...,C1996881636-POCLOUD,2018-01-30T17:51:49.000Z,[NaT/Present]
2,2,AVHRR_SST_METOP_B-OSISAF-L2P-v1.0,GHRSST Level 2P sub-skin Sea Surface Temperature from the Advanced Very High Resolution Radiometer (AVHRR) on Metop satellites (currently Metop-B)...,C2036880717-POCLOUD,2016-01-19T08:07:03.000Z,[NaT/Present]
3,3,SEVIRI_IO_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Indian-Ocean (IO) sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG in GDS2 ...,C2036877550-POCLOUD,2017-03-28T13:30:00.000Z,[NaT/Present]
4,4,SEVIRI_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Atlantic sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG at 0 degree longi...,C2036878243-POCLOUD,2004-06-01T00:00:00.000Z,[NaT/Present]
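
`dropna()` only removes real missing values (`NaN`/`None`). The `[NaT/Present]` placeholder in `time_end` is an ordinary string, which is why every row above survives. A minimal sketch of one way to handle this, assuming an illustrative two-row `catalog` frame rather than the notebook's `df`, converts the placeholder into a real missing value first:

```python
import pandas as pd

# Illustrative two-row frame; values copied from the table above
catalog = pd.DataFrame({
    "short_name": ["MUR-JPL-L4-GLOB-v4.1", "OSTIA-UKMO-L4-GLOB-v2.0"],
    "time_start": ["2002-05-31T21:00:00.000Z", "2006-12-31T00:00:00.000Z"],
    "time_end": ["[NaT/Present]", "[NaT/Present]"],
})

# The placeholder is a plain string, so dropna() keeps both rows
rows_kept_by_dropna = len(catalog.dropna())

# Replace the placeholder with NaN so pandas recognises it as missing
is_ongoing = catalog["time_end"].eq("[NaT/Present]")
cleaned = catalog.assign(time_end=catalog["time_end"].mask(is_ongoing))

missing_after_cleaning = int(cleaned["time_end"].isna().sum())
print(rows_kept_by_dropna, missing_after_cleaning)
```

Keeping the boolean `is_ongoing` column-like mask around can be useful on its own, since it marks collections that are still being produced.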




---



---



[pandas time series user guide](https://pandas.pydata.org/docs/user_guide/timeseries.html)

In [None]:
short_name = 'NOAA Smith and Reynolds Extended Reconstructed Sea Surface Temperature (ERSST) Level 4 Monthly Version 5 Dataset in netCDF'

In [None]:
import pprint

In [None]:
# One example: list the netCDF files of a single collection.
# This assumes `s3` (presumably an authenticated s3fs.S3FileSystem) and
# `podaac_s3` (the bucket path) were defined in an earlier cell;
# without them the cell raises a NameError.
short_name = "MUR-JPL-L4-GLOB-v4.1"
fns = sorted(s3.glob(podaac_s3 + '/%s/*nc' % short_name))
print('There are %i files in this dataset.' % len(fns))
print('The first five files are:')
pprint.pprint(fns[:5])
print('The last five files are:')
pprint.pprint(fns[-5:])

NameError: ignored

# Subscribing to datasets 

In [None]:
!pip install podaac_data_subscriber
#!pip install podaac

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import pytest
import os
from os.path import exists
from subscriber import podaac_data_downloader as pdd
import shutil
from pathlib import Path

In [None]:
def test_downloader_GRACE_with_SHA_512(tmpdir):
    # start with empty directory
    directory_str = str(tmpdir)
    assert len( os.listdir(directory_str) ) == 0

    # run the command once -> should download the file. Note the modified time for the file
    args = create_downloader_args(f"-c GRACEFO_L2_CSR_MONTHLY_0060 -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:01Z -d {str(tmpdir)} --limit 1 --verbose -e 00".split())
    pdd.run(args)
    assert len( os.listdir(directory_str) ) > 0
    filename = directory_str + "/" + os.listdir(directory_str)[0]
    modified_time_1 = os.path.getmtime(filename)
    print( modified_time_1 )

In [None]:
# Getting a URL from this dataset
!pip install urlextract
!pip install wget
from urllib.parse import urlparse
import wget

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting urlextract
  Downloading urlextract-1.6.0-py3-none-any.whl (20 kB)
Collecting uritools
  Downloading uritools-4.0.0-py3-none-any.whl (10 kB)
Collecting platformdirs
  Downloading platformdirs-2.5.2-py3-none-any.whl (14 kB)
Installing collected packages: uritools, platformdirs, urlextract
Successfully installed platformdirs-2.5.2 uritools-4.0.0 urlextract-1.6.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wget
  Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9675 sha256=c50c48d1151284aa907e657d15fb70f5bc249625254ea726cfa703d36f4935bb
  Stored in directory: /root/.cache/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c13e3c02
Successfully built wget


In [None]:
from sys import meta_path
from numpy.ma.core import get_data
from traitlets.traitlets import List
from os import listdir
#from traitlets.traitlets import List
from urllib.request import url2pathname
#get_origin(Literal[38])

In [None]:
# Extracting URLs
# URLExtract only finds URLs that are embedded in text. A dataset short name
# contains no URL, so the result below is an empty list.

from urlextract import URLExtract
extractor = URLExtract()
urls = extractor.find_urls('MUR-JPL-L4-GLOB-v4.1')
print(urls) 

[]
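
Because a short name carries no URL, granule URLs have to come from a catalog search instead. The sketch below only builds a search URL and sends no request. The parameter names (`page_size`, `sort_key`, `provider`, `ShortName`, `temporal`) mirror those used by the downloader script later in this notebook; the endpoint constant and the `build_granule_query` helper are assumptions made for illustration and should be checked against the CMR documentation.

```python
from urllib.parse import urlencode

# Assumed CMR granule-search endpoint; confirm against the CMR API documentation
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def build_granule_query(short_name, start, end, provider="POCLOUD", page_size=10):
    """Return a search URL for one collection and time window (hypothetical helper)."""
    params = [
        ("page_size", page_size),
        ("sort_key", "-start_date"),
        ("provider", provider),
        ("ShortName", short_name),
        ("temporal", f"{start},{end}"),
    ]
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


query_url = build_granule_query(
    "MUR-JPL-L4-GLOB-v4.1",
    "2020-01-01T00:00:00Z",
    "2020-01-02T00:00:00Z",
)
print(query_url)
```

The download URLs would then be read from the `RelatedUrls` entries of each returned item, as the downloader script does.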




---



## Attempting to download straight to Google Colab



---



In [None]:
df

Unnamed: 0,index,short_name,long_name,concept_id,time_start,time_end
0,0,MUR-JPL-L4-GLOB-v4.1,GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1),C1996881146-POCLOUD,2002-05-31T21:00:00.000Z,[NaT/Present]
1,1,VIIRS_NPP-NAVO-L2P-v3.0,GHRSST Level 2P 1 m Depth Global Sea Surface Temperature version 3.0 from the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP s...,C1996881636-POCLOUD,2018-01-30T17:51:49.000Z,[NaT/Present]
2,2,AVHRR_SST_METOP_B-OSISAF-L2P-v1.0,GHRSST Level 2P sub-skin Sea Surface Temperature from the Advanced Very High Resolution Radiometer (AVHRR) on Metop satellites (currently Metop-B)...,C2036880717-POCLOUD,2016-01-19T08:07:03.000Z,[NaT/Present]
3,3,SEVIRI_IO_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Indian-Ocean (IO) sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG in GDS2 ...,C2036877550-POCLOUD,2017-03-28T13:30:00.000Z,[NaT/Present]
4,4,SEVIRI_SST-OSISAF-L3C-v1.0,GHRSST Level 3C Atlantic sub-skin Sea Surface Temperature from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on MSG at 0 degree longi...,C2036878243-POCLOUD,2004-06-01T00:00:00.000Z,[NaT/Present]
5,5,MUR25-JPL-L4-GLOB-v04.2,GHRSST Level 4 MUR 0.25deg Global Foundation Sea Surface Temperature Analysis (v4.2),C2036880657-POCLOUD,2002-08-31T21:00:00.000Z,[NaT/Present]
6,6,OSTIA-UKMO-L4-GLOB-v2.0,GHRSST Level 4 OSTIA Global Foundation Sea Surface Temperature Analysis (GDS version 2),C2036877535-POCLOUD,2006-12-31T00:00:00.000Z,[NaT/Present]
7,7,CMC0.1deg-CMC-L4-GLOB-v3.0,GHRSST Level 4 CMC0.1deg Global Foundation Sea Surface Temperature Analysis (GDS version 2),C2036881720-POCLOUD,2016-01-01T00:00:00.000Z,[NaT/Present]
8,8,K10_SST-NAVO-L4-GLOB-v01,GHRSST Level 4 K10_SST Global 10 km Analyzed Sea Surface Temperature from Naval Oceanographic Office (NAVO) in GDS2.0,C2036881956-POCLOUD,2019-01-09T00:00:00.000Z,[NaT/Present]
9,9,AMSR2-REMSS-L2P-v8a,GHRSST Level 2P Global Subskin Sea Surface Temperature version 8a from the Advanced Microwave Scanning Radiometer 2 on the GCOM-W satellite,C2036880594-POCLOUD,2012-07-02T19:00:44.000Z,[NaT/Present]


In [None]:
#Direct Access using S3

In [None]:
#-s 20200101 -t curl -f 20200710 -x OSCAR_L4_OC_third-deg



---



---



---



# DataSubscriber UseCase 


---



> podaac-data-downloader
> * https://github.com/podaac/data-subscriber/blob/main/Downloader.md
> * https://github.com/podaac/data-subscriber/blob/main/BUILD.md



In [None]:
#podaac-data-downloader -c SENTINEL-1A_SLC -d myData -f
#pdd -c SENTINEL-1A_SLC -d myData -f

!pip install poetry

In [None]:
import poetry
import pytest

In [None]:
#@title Test Case
import pytest
import os
from os.path import exists
from subscriber import podaac_data_downloader as pdd
import shutil
from pathlib import Path

# REGRESSION TEST CURRENTLY REQUIRES A .NETRC file for CMR/Data Download

def create_downloader_args(args):
    parser = pdd.create_parser()
    args2 = parser.parse_args(args)
    return args2


# Test the downloader on MUR25 data with --limit 1: exactly one file should
# end up in the output directory.
@pytest.mark.regression
def test_downloader_limit_MUR():
    shutil.rmtree('./MUR25-JPL-L4-GLOB-v04.2', ignore_errors=True)
    args2 = create_downloader_args('-c MUR25-JPL-L4-GLOB-v04.2 -d ./MUR25-JPL-L4-GLOB-v04.2  -sd 2020-01-01T00:00:00Z -ed 2020-01-30T00:00:00Z --limit 1'.split())
    pdd.run(args2)
    # count number of files downloaded...
    assert len([name for name in os.listdir('./MUR25-JPL-L4-GLOB-v04.2') if os.path.isfile('./MUR25-JPL-L4-GLOB-v04.2/' + name)])==1
    shutil.rmtree('./MUR25-JPL-L4-GLOB-v04.2')

# Test the downloader on MUR25 data for start/stop, yyyy/mm/dd dir structure,
# and offset. Running it a second time should not re-download the files unless
# a checksum has changed or --force is given.
@pytest.mark.regression
def test_downloader_MUR():
    shutil.rmtree('./MUR25-JPL-L4-GLOB-v04.2', ignore_errors=True)
    args2 = create_downloader_args('-c MUR25-JPL-L4-GLOB-v04.2 -d ./MUR25-JPL-L4-GLOB-v04.2  -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:00Z -dymd --offset 4'.split())
    pdd.run(args2)
    assert exists('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    assert exists('./MUR25-JPL-L4-GLOB-v04.2/2020/01/02/20200102090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    t1 = os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    t2 = os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/02/20200102090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')

    # this part of the test should not re-download the files unless the --force
    # option is used.
    pdd.run(args2)
    assert t1 == os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    assert t2 == os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/02/20200102090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')

    # Update a file to change the checksum, then re-download
    os.remove('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    Path('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc').touch()
    pdd.run(args2)
    assert t1 != os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    assert t2 == os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/02/20200102090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')

    t1 = os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')

    # Set the args to --force to re-download those data
    args2 = create_downloader_args('-c MUR25-JPL-L4-GLOB-v04.2 -d ./MUR25-JPL-L4-GLOB-v04.2  -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:00Z -dymd --offset 4 -f'.split())
    pdd.run(args2)
    assert t1 != os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/01/20200101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')
    assert t2 != os.path.getmtime('./MUR25-JPL-L4-GLOB-v04.2/2020/01/02/20200102090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc')

    shutil.rmtree('./MUR25-JPL-L4-GLOB-v04.2')


@pytest.mark.regression
def test_downloader_GRACE_with_SHA_512(tmpdir):
    # start with empty directory
    directory_str = str(tmpdir)
    assert len( os.listdir(directory_str) ) == 0

    # run the command once -> should download the file. Note the modified time for the file
    args = create_downloader_args(f"-c GRACEFO_L2_CSR_MONTHLY_0060 -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:01Z -d {str(tmpdir)} --limit 1 --verbose -e 00".split())
    pdd.run(args)
    assert len( os.listdir(directory_str) ) > 0
    filename = directory_str + "/" + os.listdir(directory_str)[0]
    modified_time_1 = os.path.getmtime(filename)
    print( modified_time_1 )

    # run the command again -> should not redownload the file. The modified time for the file should not change
    pdd.run(args)
    modified_time_2 = os.path.getmtime(filename)
    print( modified_time_2 )
    assert modified_time_1 == modified_time_2
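
The tests above depend on one rule: a file is fetched again only when it is missing, when its checksum no longer matches, or when `--force` is set. The sketch below illustrates that rule on a local temporary file. It is not the implementation of `pa.checksum_does_match`; the helper names `file_md5` and `should_download` are hypothetical, and MD5 is an assumed algorithm chosen for brevity.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large granules do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_download(path, expected_md5, force=False):
    """Hypothetical decision rule: fetch if forced, missing, or checksum differs."""
    if force or not os.path.exists(path):
        return True
    return file_md5(path) != expected_md5


with tempfile.TemporaryDirectory() as workdir:
    granule = os.path.join(workdir, "granule.nc")
    with open(granule, "wb") as handle:
        handle.write(b"original bytes")
    expected = file_md5(granule)

    unchanged = should_download(granule, expected)          # file matches
    forced = should_download(granule, expected, force=True)  # --force equivalent

    # Emptying the file changes its checksum, like Path(...).touch() after os.remove()
    with open(granule, "wb") as handle:
        handle.write(b"")
    after_edit = should_download(granule, expected)

print(unchanged, forced, after_edit)
```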

In [None]:
import tempfile

# Outside pytest there is no tmpdir fixture, so create a temporary directory by hand
tmpdir = tempfile.mkdtemp()
directory_str = str(tmpdir)

# run the command once -> should download the file. Note the modified time for the file
args = create_downloader_args(f"-c GRACEFO_L2_CSR_MONTHLY_0060 -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:01Z -d {directory_str} --limit 1 --verbose -e 00".split())
pdd.run(args)
assert len(os.listdir(directory_str)) > 0
filename = directory_str + "/" + os.listdir(directory_str)[0]
modified_time_1 = os.path.getmtime(filename)
print(modified_time_1)

# run the command again -> should not redownload the file. The modified time for the file should not change
pdd.run(args)
modified_time_2 = os.path.getmtime(filename)
print(modified_time_2)
assert modified_time_1 == modified_time_2

NameError: ignored

In [None]:
#!/usr/bin/env python3
import argparse
import logging
import os
import sys
from datetime import datetime, timedelta
from os import makedirs
from os.path import isdir, basename, join, exists
from urllib.error import HTTPError
from urllib.request import urlretrieve

from subscriber import podaac_access as pa

__version__ = pa.__version__

page_size = 2000
edl = pa.edl
cmr = pa.cmr
token_url = pa.token_url

# The lines below are to get the IP address. You can make this static and
# assign a fixed value to the IPAddr variable

def parse_cycles(cycle_input):
    # if cycle_input is None:
    #     return None
    # if isinstance(cycle_input, list):
    #     return cycle_input
    # return [int(cycle_input)]
    return


def validate(args):
    if args.search_cycles is None and args.startDate is None and args.endDate is None:
        raise ValueError(
            "Error parsing command line arguments: one of [--start-date and --end-date] or [--cycles] are required")  # noqa E501
    if args.search_cycles is not None and args.startDate is not None:
        raise ValueError(
            "Error parsing command line arguments: only one of -sd/--start-date and --cycles are allowed")  # noqa E501
    if args.search_cycles is not None and args.endDate is not None:
        raise ValueError(
            "Error parsing command line arguments: only one of -ed/--end-date and --cycles are allowed")  # noqa E50
    if None in [args.endDate, args.startDate] and args.search_cycles is None:
        raise ValueError(
            "Error parsing command line arguments: Both --start-date and --end-date must be specified")  # noqa E50


def create_parser():
    # Initialize parser
    parser = argparse.ArgumentParser(prog='PO.DAAC bulk-data downloader')

    # Adding Required arguments
    parser.add_argument("-c", "--collection-shortname", dest="collection", required=True,
                        help="The collection shortname for which you want to retrieve data.")  # noqa E501
    parser.add_argument("-d", "--data-dir", dest="outputDirectory", required=True,
                        help="The directory where data products will be downloaded.")  # noqa E501

    # Required through validation
    parser.add_argument("--cycle", required=False, dest="search_cycles",
                        help="Cycle number for determining downloads. can be repeated for multiple cycles",
                        action='append', type=int)
    parser.add_argument("-sd", "--start-date", required=False, dest="startDate",
                        help="The ISO date time after which data should be retrieved. For Example, --start-date 2021-01-14T00:00:00Z")  # noqa E501
    parser.add_argument("-ed", "--end-date", required=False, dest="endDate",
                        help="The ISO date time before which data should be retrieved. For Example, --end-date 2021-01-14T00:00:00Z")  # noqa E501

    # Adding optional arguments
    parser.add_argument("-f", "--force", dest="force", action="store_true", help = "Flag to force downloading files that are listed in CMR query, even if the file exists and checksum matches")  # noqa E501

    # spatiotemporal arguments
    parser.add_argument("-b", "--bounds", dest="bbox",
                        help="The bounding rectangle to filter result in. Format is W Longitude,S Latitude,E Longitude,N Latitude without spaces. Due to an issue with parsing arguments, to use this command, please use the -b=\"-180,-90,180,90\" syntax when calling from the command line. Default: \"-180,-90,180,90\".",
                        default=None)  # noqa E501

    # Arguments for how data are stored locally - much processing is based on
    # the underlying directory structure (e.g. year/Day-of-year)
    parser.add_argument("-dc", dest="cycle", action="store_true",
                        help="Flag to use cycle number for directory where data products will be downloaded.")  # noqa E501
    parser.add_argument("-dydoy", dest="dydoy", action="store_true",
                        help="Flag to use start time (Year/DOY) of downloaded data for directory where data products will be downloaded.")  # noqa E501
    parser.add_argument("-dymd", dest="dymd", action="store_true",
                        help="Flag to use start time (Year/Month/Day) of downloaded data for directory where data products will be downloaded.")  # noqa E501
    parser.add_argument("-dy", dest="dy", action="store_true",
                        help="Flag to use start time (Year) of downloaded data for directory where data products will be downloaded.")  # noqa E501
    parser.add_argument("--offset", dest="offset",
                        help="Flag used to shift timestamp. Units are in hours, e.g. 10 or -10.")  # noqa E501

    parser.add_argument("-e", "--extensions", dest="extensions",
                        help="The extensions of products to download. Default is [.nc, .h5, .zip, .tar.gz]",
                        default=None, action='append')  # noqa E501
    parser.add_argument("--process", dest="process_cmd",
                        help="Processing command to run on each downloaded file (e.g., compression). Can be specified multiple times.",
                        action='append')

    parser.add_argument("--version", action="version", version='%(prog)s ' + __version__,
                        help="Display script version information and exit.")  # noqa E501
    parser.add_argument("--verbose", dest="verbose", action="store_true", help="Verbose mode.")  # noqa E501
    parser.add_argument("-p", "--provider", dest="provider", default='POCLOUD',
                        help="Specify a provider for collection search. Default is POCLOUD.")  # noqa E501

    parser.add_argument("--limit", dest="limit", default=None, type=int,
                        help="Integer limit for number of granules to download. Useful in testing. Defaults to no limit.")  # noqa E501

    return parser


def run(args=None):
    if args is None:
        parser = create_parser()
        args = parser.parse_args()

    try:
        pa.validate(args)

        # download specific validations
        # cannot specify all three options (start, end, cycle)
        # must specify start/end together
        # if cycle, then no sd/ed can be given, and vice versa
        validate(args)

    except ValueError as v:
        logging.error(str(v))
        exit(1)

    pa.setup_earthdata_login_auth(edl)
    token = pa.get_token(token_url, 'podaac-subscriber', edl)

    provider = args.provider
    start_date_time = args.startDate
    end_date_time = args.endDate
    search_cycles = args.search_cycles
    short_name = args.collection
    extensions = args.extensions
    process_cmd = args.process_cmd
    data_path = args.outputDirectory

    download_limit = None
    if args.limit is not None and args.limit > 0:
        download_limit = args.limit

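    # Default to no shift so ts_shift is always defined for -dy/-dydoy/-dymd
    ts_shift = timedelta(hours=0)
    if args.offset:
        ts_shift = timedelta(hours=int(args.offset))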

    # Error catching for output directory specifications
    # Must specify -d output path or one time-based output directory flag

    if sum([args.cycle, args.dydoy, args.dymd, args.dy]) > 1:
        # `parser` only exists when run() parsed the arguments itself
        create_parser().error('Too many output directory flags specified, '
                              'Please specify exactly one flag '
                              'from -dc, -dy, -dydoy, or -dymd')

    # This cell will replace the timestamp above with the one read from the `.update` file in the data directory, if it exists.

    if not isdir(data_path):
        logging.info("NOTE: Making new data directory at " + data_path + " (This is the first run.)")
        makedirs(data_path, exist_ok=True)

    if search_cycles is not None:
        cmr_cycles = search_cycles
        params = [
            ('page_size', page_size),
            ('sort_key', "-start_date"),
            ('provider', provider),
            ('ShortName', short_name),
            ('token', token),
        ]
        for v in cmr_cycles:
            params.append(("cycle[]", v))
        if args.verbose:
            logging.info("cycles: " + str(cmr_cycles))

    else:
        temporal_range = pa.get_temporal_range(start_date_time, end_date_time,
                                               datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"))  # noqa E501
        params = [
            ('page_size', page_size),
            ('sort_key', "-start_date"),
            ('provider', provider),
            ('ShortName', short_name),
            ('temporal', temporal_range),
        ]
        if args.verbose:
            logging.info("Temporal Range: " + temporal_range)

    if args.verbose:
        logging.info("Provider: " + provider)
    if args.bbox is not None:
        params.append(('bounding_box', args.bbox))

    # If 401 is raised, refresh token and try one more time
    try:
        results = pa.get_search_results(params, args.verbose)
    except HTTPError as e:
        if e.code == 401:
            token = pa.refresh_token(token, 'podaac-subscriber')
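            # params is a list of (key, value) tuples, so replace the token entry
            params = [p for p in params if p[0] != 'token'] + [('token', token)]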
            results = pa.get_search_results(params, args.verbose)
        else:
            raise e

    if args.verbose:
        logging.info(str(results['hits']) + " granules found for " + short_name)  # noqa E501

    if any([args.dy, args.dydoy, args.dymd]):
        file_start_times = pa.parse_start_times(results)
    elif args.cycle:
        cycles = pa.parse_cycles(results)

    downloads_all = []
    downloads_data = [[u['URL'] for u in r['umm']['RelatedUrls'] if
                       u['Type'] == "GET DATA" and ('Subtype' not in u or u['Subtype'] != "OPENDAP DATA")] for r in
                      results['items']]
    downloads_metadata = [[u['URL'] for u in r['umm']['RelatedUrls'] if u['Type'] == "EXTENDED METADATA"] for r in
                          results['items']]
    checksums = pa.extract_checksums(results)

    for f in downloads_data:
        downloads_all.append(f)
    for f in downloads_metadata:
        downloads_all.append(f)

    downloads = [item for sublist in downloads_all for item in sublist]

    if len(downloads) >= page_size:
        logging.warning("Only the most recent " + str(
            page_size) + " granules will be downloaded; try adjusting your search criteria (suggestion: reduce time period or spatial region of search) to ensure you retrieve all granules.")

    # filter list based on extension
    if not extensions:
        extensions = pa.extensions
    filtered_downloads = []
    for f in downloads:
        for extension in extensions:
            if f.lower().endswith(extension):
                filtered_downloads.append(f)

    downloads = filtered_downloads

    # https://github.com/podaac/data-subscriber/issues/33
    # Make this a non-verbose message
    # if args.verbose:
    logging.info("Found " + str(len(downloads)) + " total files to download")
    if download_limit:
        logging.info("Limiting downloads to " + str(args.limit) + " total files")
    if args.verbose:
        logging.info("Downloading files with extensions: " + str(extensions))

    # NEED TO REFACTOR THIS, A LOT OF STUFF in here
    # Finish by downloading the files to the data directory in a loop.
    # Overwrite `.update` with a new timestamp on success.
    success_cnt = failure_cnt = skip_cnt = 0
    for f in downloads:
        try:
            # -d flag, args.outputDirectory
            output_path = join(data_path, basename(f))
            # -dy, args.dy, -dydoy, args.dydoy and -dymd, args.dymd
            if any([args.dy, args.dydoy, args.dymd]):
                output_path = pa.prepare_time_output(
                    file_start_times, data_path, f, args, ts_shift)
            # -dc flag
            if args.cycle:
                output_path = pa.prepare_cycles_output(
                    cycles, data_path, f)

            # decide if we should actually download this file (e.g. we may already have the latest version)
            if(exists(output_path) and not args.force and pa.checksum_does_match(output_path, checksums)):
                logging.info(str(datetime.now()) + " SKIPPED: " + f)
                skip_cnt += 1
                continue

            urlretrieve(f, output_path)
            pa.process_file(process_cmd, output_path, args)
            logging.info(str(datetime.now()) + " SUCCESS: " + f)
            success_cnt = success_cnt + 1

            #if limit is set and we're at or over it, stop downloading
            if download_limit and success_cnt >= download_limit:
                break

        except Exception:
            logging.warning(str(datetime.now()) + " FAILURE: " + f, exc_info=True)
            failure_cnt = failure_cnt + 1

    logging.info("Downloaded Files: " + str(success_cnt))
    logging.info("Failed Files:     " + str(failure_cnt))
    logging.info("Skipped Files:    " + str(skip_cnt))
    pa.delete_token(token_url, token)
    logging.info("END\n\n")




def main():
    log_level = os.environ.get('PODAAC_LOGLEVEL', 'INFO').upper()
    logging.basicConfig(stream=sys.stdout,
                        format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
                        level=log_level)
    logging.debug("Log level set to " + log_level)

    try:
        run()
    except Exception as e:
        logging.exception("Uncaught exception occurred during execution.")
        exit(hash(e))


if __name__ == '__main__':
    main()
    

NameError: ignored

usage: PO.DAAC bulk-data downloader [-h] -c COLLECTION -d OUTPUTDIRECTORY
                                    [--cycle SEARCH_CYCLES] [-sd STARTDATE]
                                    [-ed ENDDATE] [-f] [-b BBOX] [-dc]
                                    [-dydoy] [-dymd] [-dy] [--offset OFFSET]
                                    [-e EXTENSIONS] [--process PROCESS_CMD]
                                    [--version] [--verbose] [-p PROVIDER]
                                    [--limit LIMIT]
PO.DAAC bulk-data downloader: error: the following arguments are required: -c/--collection-shortname, -d/--data-dir


SystemExit: ignored

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
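
The `SystemExit` above arises because `parse_args()` with no argument reads `sys.argv`, which in a notebook holds the kernel's own arguments rather than `-c` and `-d`. Passing an explicit list avoids this. The sketch below shows the idea with a reduced stand-in parser: `make_demo_parser` is a hypothetical helper that copies only three of the flags defined in `create_parser()`.

```python
import argparse


def make_demo_parser():
    """Reduced stand-in for create_parser(): only -c, -d and --limit."""
    parser = argparse.ArgumentParser(prog="demo-downloader")
    parser.add_argument("-c", "--collection-shortname", dest="collection", required=True)
    parser.add_argument("-d", "--data-dir", dest="outputDirectory", required=True)
    parser.add_argument("--limit", dest="limit", default=None, type=int)
    return parser


# An explicit argument list is parsed instead of sys.argv
demo_args = make_demo_parser().parse_args(
    "-c MUR25-JPL-L4-GLOB-v04.2 -d ./MUR25-JPL-L4-GLOB-v04.2 --limit 1".split()
)
print(demo_args.collection, demo_args.outputDirectory, demo_args.limit)
```

In the notebook, the same pattern is what `create_downloader_args(...)` followed by `pdd.run(args)` does, so calling `main()` is not needed.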


In [None]:
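# create_downloader_args() needs an argument list; -c and -d are required
create_downloader_args('-c MUR25-JPL-L4-GLOB-v04.2 -d ./MUR25-JPL-L4-GLOB-v04.2 -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:00Z --limit 1'.split())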

# Direct Access to S3 Token

In [None]:
!pip -q install boto3


In [None]:
import pytest
import os
from os.path import exists
from subscriber import podaac_data_downloader as pdd
import shutil
from pathlib import Path



---



# Authenticate


---



> Visit https://archive.podaac.earthdata.nasa.gov/s3credentials to get your access_key, secret_access_key, and token. Paste the response into the s3_credential variable below.



In [None]:
#https://archive.podaac.earthdata.nasa.gov/s3credentials
import json
import xarray as xr
%matplotlib inline

# Log in at the s3credentials endpoint above and paste the JSON response into the 's3_credential' variable here:
s3_credential = '{"accessKeyId": "", "secretAccessKey": "", "sessionToken": "", "expiration": "2022-06-26 22:50:58+00:00"}'
creds = json.loads(s3_credential)

[2022-07-30 09:54:29,774] {utils.py:160} INFO - NumExpr defaulting to 4 threads.
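
The pasted response includes an `expiration` field, so the credentials are temporary and are worth checking before use. A minimal sketch, assuming the response keeps the format of the placeholder above (the `is_expired` helper and the `sample_*` names are illustrative, not part of any PO.DAAC library):

```python
import json
from datetime import datetime, timezone

# Same shape as the placeholder above, with empty secrets
sample_response = (
    '{"accessKeyId": "", "secretAccessKey": "", "sessionToken": "", '
    '"expiration": "2022-06-26 22:50:58+00:00"}'
)
sample_creds = json.loads(sample_response)


def is_expired(credentials, now=None):
    """Return True when the credentials' expiration time has passed."""
    now = now or datetime.now(timezone.utc)
    expires_at = datetime.fromisoformat(credentials["expiration"])
    return now >= expires_at


# Checked against the timestamp of the log line above (2022-07-30)
print(is_expired(sample_creds, now=datetime(2022, 7, 30, tzinfo=timezone.utc)))
```

If the check returns `True`, a fresh response has to be fetched from the s3credentials endpoint before creating the client.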


In [None]:
# Client library
import boto3

s3_client = boto3.client(
    's3',
    aws_access_key_id=creds["accessKeyId"],
    aws_secret_access_key=creds["secretAccessKey"],
    aws_session_token=creds["sessionToken"]
)

NameError: ignored

In [None]:

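#s3_client.download_file()
print(df)

# pdd is a module and cannot be called directly; pass parsed arguments to pdd.run().
# The date range and --limit below are illustrative values.
short_name = df.loc[0, 'short_name']
args = create_downloader_args(f'-c {short_name} -d ./{short_name} -sd 2020-01-01T00:00:00Z -ed 2020-01-02T00:00:00Z --limit 1'.split())
pdd.run(args)

#pdd('GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1)',)
#df3 = files.download('NOAA Smith and Reynolds Extended Reconstructed...')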

NameError: ignored

In [None]:
df.from_records(df[:])

In [None]:
#s3_client.download_file("podaac-ops-cumulus-protected", "ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-01_ECCO_V4r4_latlon_0p50deg.nc","ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-01_ECCO_V4r4_latlon_0p50deg.nc")
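# Listing requires a bucket; the bucket and prefix are taken from the commented call above
s3_client.list_objects_v2(Bucket="podaac-ops-cumulus-protected", Prefix="ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/", MaxKeys=10)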
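
`download_file` takes the bucket, the object key, and a local file name as three separate arguments. When a granule is known as a single `s3://` URI, it has to be split first. A small sketch of that step, where `split_s3_uri` is a hypothetical helper and the URI is assembled from the bucket and key in the commented call above:

```python
import posixpath
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key, local file name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    key = parsed.path.lstrip("/")
    return parsed.netloc, key, posixpath.basename(key)


bucket, key, filename = split_s3_uri(
    "s3://podaac-ops-cumulus-protected/ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/"
    "ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-01_ECCO_V4r4_latlon_0p50deg.nc"
)
print(bucket)
print(key)
print(filename)

# With valid credentials: s3_client.download_file(bucket, key, filename)
```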



---



# Creating Interactive Display Inline


In [None]:
# asyncio ships with Python, so it does not need to be installed
!pip install aiobotocore
!pip install botocore
!pip install --upgrade s3fs

In [None]:
%matplotlib inline

In [None]:
!pip install panel==0.12.6 hvplot==0.7.3

In [None]:
import panel as pn

pn.extension('tabulator', sizing_mode="stretch_width")

In [None]:
import hvplot.pandas
import holoviews as hv
hv.extension('bokeh')