<a href="https://colab.research.google.com/github/sea-surface-teleconnections/sea-surface-teleconnections/blob/main/Master_Harmony_GraphQL_to_GeoTiff.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ✨ ⭐ Master Notebook ⭐ 🌈


---



---


### Datasets ⚡



---


---


Ocean:

- 🌊 [GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS version 2) from NCEI](https://podaac.jpl.nasa.gov/dataset/AVHRR_OI-NCEI-L4-GLOB-v2.0?ids=Processing%20Levels:Keywords&values=4%20-%20Gridded%20Model%20Output::Oceans:Ocean%20Temperature&provider=PODAAC)

Land/Atmosphere:

- [GOES](https://www.ospo.noaa.gov/Products/imagery/archive.html)
- [MODIS: Cloud/Vegetation](https://search.earthdata.nasa.gov/search?fi=MODIS)
- [MERRA-2](https://disc.gsfc.nasa.gov/datasets?project=MERRA-2)
- SMOS/SMAP Soil Moisture



---


---


> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hV52V0hTNy-pOkXZOpi0J2uuLqZag3QO?usp=sharing)

> [Zarr Cloud-Native Resources](https://github.com/zarr-developers/tutorials/blob/main/zarr_cloud_native_geospatial_2022.ipynb)





> [Zarr Resources](https://github.com/podaac/tutorials/blob/master/notebooks/SWOT-EA-2021/Estuary_explore_inCloud_zarr.ipynb)




>[Zarr Dataset Sample Examples](https://notebooks.githubusercontent.com/view/ipynb?color_mode=auto&commit=c7ee47ad1bc9f925d276c310223c631547284368&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f676973742f727369676e656c6c2d757367732f64353335313238393263366339666262623232613761653866376437386465652f7261772f633765653437616431626339663932356432373663333130323233633633313534373238343336382f74687265655f7a6172722e6970796e62&logged_in=false&nwo=rsignell-usgs%2Fd53512892c6c9fbbb22a7ae8f7d78dee&path=three_zarr.ipynb&repository_id=111446341&repository_type=Gist)
---


---
Zarr is about 10x faster than NetCDF in cloud object storage. Using 40 cores (20 dask workers), we were able to pull NetCDF data from Google Cloud Storage at a rate of about 500 MB/s. Using the Zarr format, we could reach about 5000 MB/s (5 GB/s) with the same number of dask workers.
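
As a rough sanity check on the figures quoted above, the speedup and the per-worker throughput can be worked out directly. The numbers come from the text; the variable names are illustrative only.

```python
# Throughput figures quoted in the text (MB/s), measured with 20 dask workers.
netcdf_mb_per_s = 500
zarr_mb_per_s = 5000
n_workers = 20

# Ratio of the two aggregate rates, and the share handled by each worker.
speedup = zarr_mb_per_s / netcdf_mb_per_s
netcdf_per_worker = netcdf_mb_per_s / n_workers
zarr_per_worker = zarr_mb_per_s / n_workers

print(f"Zarr speedup over NetCDF: {speedup:.0f}x")
print(f"Per worker: NetCDF {netcdf_per_worker:.0f} MB/s, Zarr {zarr_per_worker:.0f} MB/s")
```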


# Install Modules




---



---


Direct access:

- PO.DAAC Drive: https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2
- OPeNDAP data (the OPeNDAP base directory location for the collection): https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/
- THREDDS (THREDDS Data Server access for this dataset): https://thredds.jpl.nasa.gov/thredds/catalog_ghrsst_gds2.html?dataset=AVHRR_OI-NCEI-L4-GLOB-v2.0
- Web Service (Search Granule): https://podaac.jpl.nasa.gov/ws/search/granule/?datasetId=PODAAC-GHAAO-4BC02
- Format: NetCDF

---



In [6]:
#@title Module Install
# time and json ship with Python, so they are not installed with pip.
!pip install s3fs
!pip install requests
!pip install numpy
!pip install pandas
!pip install xarray
!pip install cartopy
!pip install zarr
!pip install "urllib3!=1.25.0"

In [7]:
#@title Module Imports
import s3fs
import time
import requests
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import cartopy.crs as ccrs
import cartopy
import zarr
from IPython.display import HTML
from json import dumps
from json import loads



# Setting Endpoints for Harmony API 


---


> Set a few endpoints for use during the remainder of the workflow:
>
> - `cmr = "cmr.earthdata.nasa.gov"`
> - `urs = "urs.earthdata.nasa.gov"`
> - `harmony = "harmony.earthdata.nasa.gov"`




---



In [8]:
#@title End Point Links
cmr = "cmr.earthdata.nasa.gov"
urs = "urs.earthdata.nasa.gov"
harmony = "harmony.earthdata.nasa.gov"
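
The CMR hostname set above is later combined with a search path and a response format. A minimal sketch of that URL construction follows; `cmr_search_url` is a hypothetical helper introduced here for illustration, not part of any NASA library.

```python
cmr = "cmr.earthdata.nasa.gov"

def cmr_search_url(concept_type, fmt="umm_json", host=cmr):
    """Build a CMR search URL such as https://<host>/search/collections.umm_json.

    Hypothetical helper for this notebook; it only formats a string.
    """
    return f"https://{host}/search/{concept_type}.{fmt}"

collections_url = cmr_search_url("collections")
granules_url = cmr_search_url("granules")
print(collections_url)
print(granules_url)
```

These are the same two URLs that the `requests.get` calls later in the notebook spell out inline.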



---



# Metadata

> https://podaac.jpl.nasa.gov/dataset/AVHRR_OI-NCEI-L4-GLOB-v2.0

In [9]:
#@title Short Name Query
grace_ShortName = "AVHRR_OI-NCEI-L4-GLOB-v2.0"
grace_ShortName

'AVHRR_OI-NCEI-L4-GLOB-v2.0'

In [10]:
#@title Module Harmony Client Install
# harmony-py provides the `harmony` module imported below; the unrelated PyPI
# package named `harmony` is not needed. datetime and pprint ship with Python.
!pip install requests
!pip install -U harmony-py
!pip install xarray
!pip install s3fs

from harmony import BBox
from harmony import Client
from harmony import Collection
from harmony import Request 
from harmony import LinkType
from harmony.config import Environment
import requests
from pprint import pprint
import datetime as dt
import s3fs
import xarray as xr



### 1. Let's use the CMR API

### 2. Inspect the access and service options that exist for the collection

---




In [11]:
#@title CMR API
# Use the CMR API to inspect service metadata:
url = 'https://cmr.earthdata.nasa.gov/search'
# Search by collection to inspect the access and service options that exist:
collection_url = f'{url}/collections'

We are going to focus on the GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI.

Let's first save its short name and concept ID as variables that we can use later on once we request data from Harmony. The collection's CMR listing gives:

- Short name: `AVHRR_OI-NCEI-L4-GLOB-v2.1`
- Title: GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI
- Concept ID: `C2036881712-POCLOUD`
- Start time: 2016-01-01T00:00:00.000Z

In [12]:
short_name= 'AVHRR_OI-NCEI-L4-GLOB-v2.1'
concept_id = 'C2036881712-POCLOUD'

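❌ Replace the `CHANGEME` placeholders in the next cell with your Earthdata Login username and password before running, and remove them again before pushing the notebook. ❌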


---



In [13]:
harmony_client = Client(auth=('CHANGEME', 'CHANGEME'))
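
One way to avoid hard-coding credentials in the notebook is to read them from environment variables and prompt only when they are missing. This is a sketch under assumptions: `earthdata_credentials` and the variable names `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` are hypothetical choices for this example, not something harmony-py defines.

```python
import os
from getpass import getpass

def earthdata_credentials(env=os.environ, prompt=getpass):
    """Return (username, password), prompting only for values not set in `env`.

    EARTHDATA_USERNAME and EARTHDATA_PASSWORD are hypothetical variable names
    chosen for this sketch.
    """
    username = env.get("EARTHDATA_USERNAME") or input("Earthdata username: ")
    password = env.get("EARTHDATA_PASSWORD") or prompt("Earthdata password: ")
    return username, password

# Usage (commented out so the sketch does not prompt when run non-interactively):
# harmony_client = Client(auth=earthdata_credentials())
```

With this approach the tuple passed to `Client(auth=...)` never appears in the saved notebook.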

We will view the top-level metadata for this collection to see what additional service and variable metadata exist.



---




# Collection (dataset) using `requests`


---


Get the UMM Collection metadata using requests.get:

In [14]:
response = requests.get(url=f"https://{cmr}/search/collections.umm_json", 
                        params={
                            'concept_id': concept_id,
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
response = response.json()

In [15]:
response['hits']

1

There should be only one result. Select and print its CMR Search metadata:

In [16]:
grace_coll_meta = response['items'][0]['meta']
grace_coll_meta

{'associations': {'services': ['S2004184019-POCLOUD'],
  'tools': ['TL2108419875-POCLOUD'],
  'variables': ['V2146304112-POCLOUD',
   'V2110155274-POCLOUD',
   'V2112015409-POCLOUD',
   'V2110155270-POCLOUD',
   'V2112015413-POCLOUD',
   'V2146304110-POCLOUD',
   'V2110155268-POCLOUD',
   'V2112015411-POCLOUD',
   'V2110155272-POCLOUD']},
 'concept-id': 'C2036881712-POCLOUD',
 'concept-type': 'collection',
 'deleted': False,
 'format': 'application/vnd.nasa.cmr.umm+json',
 'has-formats': True,
 'has-spatial-subsetting': True,
 'has-temporal-subsetting': True,
 'has-transforms': False,
 'has-variables': True,
 'native-id': 'GHRSST+Level+4+AVHRR_OI+Global+Blended+Sea+Surface+Temperature+Analysis+(GDS2)+from+NCEI',
 'provider-id': 'POCLOUD',
 'revision-date': '2022-06-16T16:36:42.938Z',
 'revision-id': 12,
 's3-links': ['podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/',
  'podaac-ops-cumulus-public/AVHRR_OI-NCEI-L4-GLOB-v2.1/'],
 'user-id': 'wenhaoli'}
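
The `meta` dictionary printed above can be condensed into the few fields that matter for the rest of the workflow. The sketch below works on a trimmed copy of that output (the variables list is shortened); `summarize_collection_meta` is a hypothetical helper written for this example.

```python
# Trimmed copy of the `meta` dictionary printed above (variables list shortened).
sample_meta = {
    "associations": {
        "services": ["S2004184019-POCLOUD"],
        "tools": ["TL2108419875-POCLOUD"],
        "variables": ["V2146304112-POCLOUD", "V2110155274-POCLOUD"],
    },
    "concept-id": "C2036881712-POCLOUD",
    "s3-links": [
        "podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/",
        "podaac-ops-cumulus-public/AVHRR_OI-NCEI-L4-GLOB-v2.1/",
    ],
}

def summarize_collection_meta(meta):
    """Pull the associated services, variable count, and S3 prefixes out of `meta`."""
    associations = meta.get("associations", {})
    return {
        "concept_id": meta.get("concept-id"),
        "services": associations.get("services", []),
        "n_variables": len(associations.get("variables", [])),
        "s3_prefixes": [f"s3://{link}" for link in meta.get("s3-links", [])],
    }

summary = summarize_collection_meta(sample_meta)
print(summary)
```

In the notebook itself the same function could be called on `grace_coll_meta` instead of the trimmed sample.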



# Granule (file)

Get the UMM Granule metadata using requests.get:

In [17]:
response = requests.get(url=f"https://{cmr}/search/granules.umm_json", 
                        params={
                            'concept_id': concept_id,
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
response_gran = response.json()

In [18]:
grace_gran = response.json()
grace_gran['hits']

2409

In [19]:
grace_gran['items'][0]['meta']

{'concept-id': 'G2049048962-POCLOUD',
 'concept-type': 'granule',
 'format': 'application/vnd.nasa.cmr.umm+json',
 'native-id': '20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1',
 'provider-id': 'POCLOUD',
 'revision-date': '2021-11-15T17:38:42.624Z',
 'revision-id': 2}

The query matched 2,409 granules (`hits`), but CMR returns only one page of results per request. The cell above prints the CMR Search metadata (`meta`) for the first granule on that page.
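
Because a single request returns only one page, retrieving every granule means repeating the query with CMR's `page_size` and `page_num` parameters. The sketch below only builds the parameter dictionaries and makes no network calls; `cmr_page_params` is a hypothetical helper, and the default of 2000 assumes CMR's documented maximum page size.

```python
import math

def cmr_page_params(concept_id, hits, page_size=2000):
    """Return one CMR query-parameter dict per page needed to cover `hits` results."""
    n_pages = math.ceil(hits / page_size)
    return [
        {"concept_id": concept_id, "page_size": page_size, "page_num": page}
        for page in range(1, n_pages + 1)  # CMR page numbers start at 1
    ]

pages = cmr_page_params("C2036881712-POCLOUD", hits=2409)
print(len(pages))
print(pages[-1])
```

Each dictionary can be passed as `params=` to the same `requests.get` call used above, and the `items` lists from the responses concatenated.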



---



The other component in each result (from the list of items) is the UMM metadata, accessible from the umm key. Print the RelatedUrls metadata field for the granule:

In [20]:
import json
from json import dumps

# Serializing json 
json_object = json.dumps(grace_gran['items'][0]['umm']['RelatedUrls'], indent = 4)

In [21]:
print(json_object)

[
    {
        "URL": "s3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc",
        "Type": "GET DATA VIA DIRECT ACCESS",
        "Description": "This link provides direct download access via S3 to the granule."
    },
    {
        "URL": "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-public/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc.md5",
        "Description": "Download 20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc.md5",
        "Type": "EXTENDED METADATA"
    },
    {
        "URL": "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc",
        "Description": "Download 20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc",
        "Type": "GET DATA"
    },
    {
        "URL": "https:/

We want the URL corresponding to 'Type': 'GET DATA'. Select the URL from the appropriate item in the list, then print it:

---



In [22]:
grace_url = grace_gran['items'][0]['umm']['RelatedUrls'][2]['URL']
grace_url

'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc'
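
Selecting the URL by position (`[2]`) depends on the order of `RelatedUrls`, which may differ between granules. A more defensive sketch picks the entry by its `Type` field instead; `get_data_url` is a hypothetical helper, and the sample list reuses the first three entries printed above.

```python
name = "20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
archive = "https://archive.podaac.earthdata.nasa.gov"

# First three RelatedUrls entries from the output above (descriptions omitted).
related_urls = [
    {"URL": f"s3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/{name}",
     "Type": "GET DATA VIA DIRECT ACCESS"},
    {"URL": f"{archive}/podaac-ops-cumulus-public/AVHRR_OI-NCEI-L4-GLOB-v2.1/{name}.md5",
     "Type": "EXTENDED METADATA"},
    {"URL": f"{archive}/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/{name}",
     "Type": "GET DATA"},
]

def get_data_url(entries, url_type="GET DATA", scheme="https://"):
    """Return the first URL whose Type matches exactly, or None if there is none."""
    for entry in entries:
        if entry.get("Type") == url_type and entry.get("URL", "").startswith(scheme):
            return entry["URL"]
    return None

data_url = get_data_url(related_urls)
print(data_url)
```

In the notebook, `get_data_url(grace_gran['items'][0]['umm']['RelatedUrls'])` would take the place of the hard-coded index.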


>

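# Downloading a regular NetCDF file from the cloud
Then do a regular HTTPS download. The file sits in a protected bucket, so the request typically needs Earthdata Login credentials (for example, a `~/.netrc` entry for `urs.earthdata.nasa.gov`).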




In [23]:
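r = requests.get(grace_url)
r.raise_for_status()  # stop here if the download failed (for example, missing Earthdata Login credentials)
with open('GHRSST_Level4_AVHRR_OI_Global Blended_SST_Analysis.nc', 'wb') as f:
    f.write(r.content)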
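
The cells further down open the file under its original granule name rather than the name written here. One way to keep the two consistent is to derive the local filename from the URL. A small sketch, where `local_filename` is a hypothetical helper:

```python
import os
from urllib.parse import urlparse

def local_filename(url):
    """Return the last path component of a URL, ignoring any query string."""
    return os.path.basename(urlparse(url).path)

grace_url = (
    "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"
    "AVHRR_OI-NCEI-L4-GLOB-v2.1/"
    "20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
)
filename = local_filename(grace_url)
print(filename)
```

Passing `filename` to `open(...)` in the download cell would save the granule under the name the archive uses.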


# Reading Large Files


---


Read Metadata Information First


---



In [24]:
%matplotlib inline
from netCDF4 import Dataset    


In [29]:
data = Dataset('/content/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1 (1).nc', 'r')
# print some metadata
print(data)
data.close()

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.6, ACDD-1.3
    title: NOAA/NCEI 1/4 Degree Daily Optimum Interpolation Sea Surface Temperature (OISST) Analysis, Version 2 - Final
    id: NCEI-L4LRblend-GLOB-AVHRR_OI
    references: Reynolds, et al.(2009) What is New in Version 2. Available at http://www.ncdc.noaa.gov/sites/default/files/attachments/Reynolds2009_oisst_daily_v02r00_version2-features.pdf;Daily 1/4 Degree Optimum Interpolation Sea Surface Temperature (OISST) - Climate Algorithm Theoretical Basis Document, NOAA Climate Data Record Program CDRP-ATBD-0303 Rev. 2 (2013). Available at http://www1.ncdc.noaa.gov/pub/data/sds/cdr/CDRs/Sea_Surface_Temperature_Optimum_Interpolation/AlgorithmDescription.pdf.
    institution: NOAA/NESDIS/NCEI
    creator_name: NCEI Products and Services
    creator_email: ncei.orders@noaa.gov
    creator_url: http://www.ncdc.noaa.gov/oisst
    gds_version_id: v2.0r5
    netcdf_version_id: 4.

In [30]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive






---



In [31]:
!pip install netCDF4
!pip install nctoolkit
import datetime
import os
import warnings

import netCDF4
import netCDF4 as nc4
import nctoolkit as nc
import numpy as np
import pandas as pd
from netCDF4 import num2date

warnings.filterwarnings('ignore')


Please install CDO version 1.9.7 or above: https://code.mpimet.mpg.de/projects/cdo/ or https://anaconda.org/conda-forge/cdo


In [32]:
!pip install pydap
import pydap.client



In [33]:
from pydap.client import open_url
dataset = open_url('https://opendap.jpl.nasa.gov/opendap/hyrax/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2002/152/20020601161248-REMSS-L2P_GHRSST-SSTsubskin-AMSRE-l2b_v07a_r00414.dat-v02.0-fv01.0.nc')
# OPENDAP DATA	https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/

https://opendap.jpl.nasa.gov/opendap

In [34]:
list(dataset.keys())

['lat', 'lon', 'time', 'sea_surface_temperature', 'sst_dtime', 'dt_analysis', 'sses_bias', 'sses_standard_deviation', 'l2p_flags', 'quality_level', 'wind_speed', 'diurnal_amplitude', 'cool_skin', 'water_vapor', 'cloud_liquid_water', 'rain_rate']


# Reading MetaData from DataSet in 3 Lines



---



In [37]:
import xarray as xr

In [38]:
import pandas as pd

In [None]:
data = xr.open_dataset('/content/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1 (1).nc')
data

In [None]:
!pip install leafmap 
!pip install localtileserver

In [42]:
import leafmap
filename = '/content/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1 (1).nc'

# Reading directly with leafmap


---



In [43]:
data_two = leafmap.read_netcdf(filename)

In [None]:
print(data_two)

# Convert the NetCDF


---



*   GeoTIFF
*   GeoJSON
*   CSV



# Converting the Dataset to a pandas DataFrame and CSV



---



In [45]:
import xarray as xr
import pandas as pd

In [46]:
# Open the NetCDF file with xarray before converting it to a pandas DataFrame

ds1 = xr.open_dataset(filename)

In [52]:
ds2 = ds1.to_dataframe()

In [53]:
ds2

| lat | lon | time | nv | lat_bnds | lon_bnds | analysed_sst | analysis_error | mask | sea_ice_fraction |
|---|---|---|---|---|---|---|---|---|---|
| -89.875 | -179.875 | 2016-01-01 | 0 | -90.00 | -180.00 | NaN | NaN | 2.0 | NaN |
| -89.875 | -179.875 | 2016-01-01 | 1 | -89.75 | -179.75 | NaN | NaN | 2.0 | NaN |
| -89.875 | -179.625 | 2016-01-01 | 0 | -90.00 | -179.75 | NaN | NaN | 2.0 | NaN |
| -89.875 | -179.625 | 2016-01-01 | 1 | -89.75 | -179.50 | NaN | NaN | 2.0 | NaN |
| -89.875 | -179.375 | 2016-01-01 | 0 | -90.00 | -179.50 | NaN | NaN | 2.0 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 89.875 | 179.375 | 2016-01-01 | 1 | 90.00 | 179.50 | 271.410004 | 0.3 | 1.0 | 1.0 |
| 89.875 | 179.625 | 2016-01-01 | 0 | 89.75 | 179.50 | 271.419983 | 0.3 | 1.0 | 1.0 |
| 89.875 | 179.625 | 2016-01-01 | 1 | 90.00 | 179.75 | 271.419983 | 0.3 | 1.0 | 1.0 |
| 89.875 | 179.875 | 2016-01-01 | 0 | 89.75 | 179.75 | 271.429993 | 0.3 | 1.0 | 1.0 |


In [54]:
# Open the NetCDF file (xr.open_dataset), convert it to a DataFrame (to_dataframe),
# and write it to a CSV file (to_csv). The lat/lon/time/nv values live in the
# DataFrame index, so keep the index when writing or the coordinates are lost.
ds2.to_csv('saved_frame_one.csv')
