# Search for datasets coincident with a list of points

A physical oceanographer is interested in satellite-derived sea surface temperature and ocean color along an ARGO drift track.

Similar use cases include selecting data coincident with a cruise, with ice mass balance buoys in the Arctic and Antarctic, or with the MOSAIC experiment.

## Learning objectives
1. Convert a list of coordinates into a GeoJSON file.
2. Write a query for the NASA CMR API.
3. Submit the query and interpret the response.
4. Order datasets returned by the query.
5. Visualize the results.

## Convert a list of coordinates to a GeoJSON file

There are two steps to this: first, read the list of coordinates; second, write the coordinates to a GeoJSON file.

We'll use `pandas` to read the file containing the coordinates because it offers a simple way to read delimited text files, such as comma-separated values (`csv`).  The `GeoPandas` package, which extends `pandas` into the spatial realm, is then used to write a GeoJSON file.

_If you are not familiar with `pandas` it's worth exploring._

### What is GeoJSON?

[__GeoJSON__](https://geojson.org/) is an open standard data format for simple geographic features, such as points, lines and polygons, along with their non-spatial attributes.
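
To make the format concrete, here is a minimal sketch that builds a single GeoJSON point feature by hand using only the standard library. The helper name `point_feature` is made up for illustration, and the values are taken from the first row of the ARGO file used in this notebook. Note that GeoJSON stores coordinates with longitude first, then latitude.

```python
import json

def point_feature(lon, lat, properties):
    """Return a GeoJSON Feature (as a dict) for a single point.

    Hypothetical helper for illustration only. GeoJSON orders
    coordinates as [longitude, latitude].
    """
    return {
        "type": "Feature",
        "properties": properties,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }

# One float position wrapped in a FeatureCollection
feature = point_feature(-47.105, 58.033, {"FloatID": 6901170, "DAC": "bodc"})
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=1))
```

The non-spatial attributes go in `properties`, and the spatial information goes in `geometry`.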

In [8]:
import pandas as pd
import geopandas as gpd

Before reading a file, it is always useful to have a look at it, especially a text file, because it might not be formatted nicely or might contain strange characters that you need to deal with. On UNIX-flavoured machines you can use `head` to look at the first few lines of a file.  On Windows, you can open the file in a text editor such as `notepad` (__Check this is the best tool__), but make sure you don't save the file, and __do not use a word processor__ because it will likely change the file.

_I use `head`.  In Jupyter notebooks, a `!` at the beginning of a line runs the rest of that line as a shell command._

In [13]:
!head argo_locations.csv

Date    	Latitude	Longitude	FloatID	DAC
 20201101	58.033	-47.105	6901170	bodc
 20201101	56.267	-54.291	4902510	meds
 20201101	57.178	-53.264	4902509	meds
 20201101	57.389	-51.571	4902505	meds
 20201101	59.921	-50.340	4902471	meds
 20201101	54.456	-50.419	3901669	coriolis
 20201101	54.339	-47.566	6901191	bodc
 20201101	66.973	-57.687	6902952	coriolis
 20201101	58.859	-58.043	6901194	bodc
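
If `head` is not available (for example on Windows), a few lines of Python do the same job. This is a minimal sketch using only the standard library; `peek` is a hypothetical helper name, not part of any package.

```python
from itertools import islice

def peek(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r") as f:
        # islice stops after n lines, so large files are not loaded in full
        return [line.rstrip("\n") for line in islice(f, n)]

# Example (assumes argo_locations.csv is in the working directory):
# for line in peek("argo_locations.csv"):
#     print(repr(line))  # repr() makes tabs and stray characters visible
```

Printing with `repr()` shows tabs as `\t`, which helps when deciding how the columns are delimited.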


We can learn a number of things from the file listing above.  The file has a header row, and the columns are separated by whitespace, which could be multiple spaces or tabs.  `pandas.read_csv` can deal with this if the `delim_whitespace` keyword argument is set to `True`.  (This keyword is deprecated as of pandas 2.2; `sep=r'\s+'` is the equivalent.)  Setting `header=0` tells `pandas.read_csv` to use row 0 as column headings.

In [17]:
argo_df = pd.read_csv('argo_locations.csv', header=0, delim_whitespace=True)  # df is shorthand for DataFrame
argo_df.head()  # df.tail() prints the last few lines

| | Date | Latitude | Longitude | FloatID | DAC |
|---|---|---|---|---|---|
| 0 | 20201101 | 58.033 | -47.105 | 6901170 | bodc |
| 1 | 20201101 | 56.267 | -54.291 | 4902510 | meds |
| 2 | 20201101 | 57.178 | -53.264 | 4902509 | meds |
| 3 | 20201101 | 57.389 | -51.571 | 4902505 | meds |
| 4 | 20201101 | 59.921 | -50.34 | 4902471 | meds |


Converting the `pandas.DataFrame` to a GeoPandas dataframe is done by passing it to the `geopandas.GeoDataFrame` constructor.  We need to tell the constructor which columns of `argo_df` contain spatial geometry information, which we do by building point geometries with `geopandas.points_from_xy`.  Note that in the arguments to `geopandas.points_from_xy`, the x coordinate is _Longitude_ and the y coordinate is _Latitude_.

In [18]:
argo_gdf = gpd.GeoDataFrame(argo_df, geometry=gpd.points_from_xy(argo_df.Longitude, argo_df.Latitude))
argo_gdf.head()

| | Date | Latitude | Longitude | FloatID | DAC | geometry |
|---|---|---|---|---|---|---|
| 0 | 20201101 | 58.033 | -47.105 | 6901170 | bodc | POINT (-47.10500 58.03300) |
| 1 | 20201101 | 56.267 | -54.291 | 4902510 | meds | POINT (-54.29100 56.26700) |
| 2 | 20201101 | 57.178 | -53.264 | 4902509 | meds | POINT (-53.26400 57.17800) |
| 3 | 20201101 | 57.389 | -51.571 | 4902505 | meds | POINT (-51.57100 57.38900) |
| 4 | 20201101 | 59.921 | -50.34 | 4902471 | meds | POINT (-50.34000 59.92100) |


`argo_gdf` looks similar to `argo_df` but it has a __geometry__ column.  This is the magic sauce that turns a dataframe into a geospatial dataframe.

It's worth taking a quick look at the GeoJSON object, if only to take the mystery out of it.  You can see that the object contains a collection of _features_.  Each of these _features_ holds information about an ARGO float on a given date.  The column entries (_attributes_) for each float are listed as _properties_, and the spatial information is the _geometry_.

In [29]:
import json
print(json.dumps(json.loads(argo_gdf.to_json()), indent=1))

{
 "type": "FeatureCollection",
 "features": [
  {
   "id": "0",
   "type": "Feature",
   "properties": {
    "DAC": "bodc",
    "Date": 20201101,
    "FloatID": 6901170,
    "Latitude": 58.033,
    "Longitude": -47.105
   },
   "geometry": {
    "type": "Point",
    "coordinates": [
     -47.105,
     58.033
    ]
   }
  },
  {
   "id": "1",
   "type": "Feature",
   "properties": {
    "DAC": "meds",
    "Date": 20201101,
    "FloatID": 4902510,
    "Latitude": 56.266999999999996,
    "Longitude": -54.291000000000004
   },
   "geometry": {
    "type": "Point",
    "coordinates": [
     -54.291000000000004,
     56.266999999999996
    ]
   }
  },
  {
   "id": "2",
   "type": "Feature",
   "properties": {
    "DAC": "meds",
    "Date": 20201101,
    "FloatID": 4902509,
    "Latitude": 57.178000000000004,
    "Longitude": -53.263999999999996
   },
   "geometry": {
    "type": "Point",
    "coordinates": [
     -53.263999999999996,
     57.178000000000004
    ]
   }
  },
  {
   "id": "3",

`argo_gdf` can be written to a GeoJSON formatted file using the `to_file` method.

_Is it worth covering how to convert between projections?  For example, users might have data in Polar Stereographic or USGS Quad coordinates._

In [30]:
argo_gdf.to_file('argo-data.geojson', driver='GeoJSON')
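
A quick way to check the file that was written is to read it back with the standard `json` module and count the features. The sketch below assumes the file is a GeoJSON `FeatureCollection` of points, like the one produced in this notebook; `summarize_geojson` is a hypothetical helper, not a GeoPandas function.

```python
import json

def summarize_geojson(path):
    """Return the feature count and lon/lat ranges of a point FeatureCollection."""
    with open(path) as f:
        collection = json.load(f)
    # Each point geometry stores its coordinates as [longitude, latitude]
    coords = [feat["geometry"]["coordinates"] for feat in collection["features"]]
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return {
        "n_features": len(coords),
        "lon_range": (min(lons), max(lons)),
        "lat_range": (min(lats), max(lats)),
    }

# Example (assumes argo-data.geojson exists in the working directory):
# print(summarize_geojson("argo-data.geojson"))
```

If the latitude range falls outside -90 to 90, the x and y columns were probably swapped when the geometry was built.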

While we've gone through this step by step, coordinate data can be converted from a text file to a GeoJSON file in three lines of code.
```
argo_df = pd.read_csv('argo_locations.csv', header=0, delim_whitespace=True)
argo_gdf = gpd.GeoDataFrame(argo_df, geometry=gpd.points_from_xy(argo_df.Longitude, argo_df.Latitude))
argo_gdf.to_file('argo-data.geojson', driver='GeoJSON')
```

## Submit a query via the CMR API
_CMR_ is the __Common Metadata Repository__.  It is a metadata system that catalogs Earth Science data and associated service metadata records. These metadata records can be discovered and accessed through programmatic interfaces leveraging standard protocols and an Application Programming Interface (API).

1. _Need to search by date as well_
2. _How do we find `collection_concept_id`?_
3. _We want SST and ocean colour_
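
On the first two notes, a possible starting point: CMR granule searches accept a `temporal` parameter of the form `start,end` in ISO 8601, and concept IDs can be looked up through the CMR collections search endpoint. The sketch below only builds the parameters and does not send anything. The helper names are hypothetical, the one-day window is an arbitrary choice, and `"ATL07"` is used as an example short name only because it matches the ICESat-2 collection queried in this section.

```python
from datetime import datetime, timedelta

COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"

def temporal_range(yyyymmdd, days=1):
    """Build a CMR `temporal` value covering `days` days from an integer date."""
    start = datetime.strptime(str(yyyymmdd), "%Y%m%d")
    end = start + timedelta(days=days) - timedelta(seconds=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)},{end.strftime(fmt)}"

def collection_query(short_name, version=None):
    """Build query parameters for a CMR collection search by short name."""
    params = {"short_name": short_name}
    if version is not None:
        params["version"] = version
    return params

# The Date column in the ARGO file is an integer such as 20201101
print(temporal_range(20201101))  # 2020-11-01T00:00:00Z,2020-11-01T23:59:59Z

# To look up a concept ID (requires network access and the requests package):
# import requests
# r = requests.get(COLLECTIONS_URL, params=collection_query("ATL07", "003"))
# print([entry["id"] for entry in r.json()["feed"]["entry"]])
```

The string returned by `temporal_range` can be added to the granule search parameters under the key `"temporal"`.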

In [32]:
import requests

search_url = "https://cmr.earthdata.nasa.gov/search/granules"
output_format = "json"
parameters = {
    "scroll": "true",
    "page_size": 100,
    # set any search criteria here
    "collection_concept_id": "C1706334166-NSIDC_ECS",
}

# Upload the GeoJSON file as the search geometry; the with-block closes the file afterwards
with open("argo-data.geojson", "rb") as geojson_file:
    files = {"shapefile": ("argo-data.geojson", geojson_file, "application/geo+json")}
    response = requests.post(f"{search_url}.{output_format}", data=parameters, files=files)

print("status:", response.status_code)
print("hits:", response.headers["CMR-Hits"])

status: 200
hits: 500
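
The `CMR-Hits` header reports the total number of matching granules, which can exceed the number returned in a single page. Here is a minimal sketch of how these numbers might be interpreted and how a few fields could be pulled out of each entry. The helper names are hypothetical, and the sample entry copies its field names and values from the ICESat-2 response shown in this section.

```python
import math

def pages_needed(hits, page_size):
    """Number of requests needed to retrieve all hits at a given page size."""
    # CMR-Hits arrives as a string in the HTTP headers, hence int()
    return math.ceil(int(hits) / page_size)

def summarize_entries(entries):
    """Keep only the file name and time span of each granule entry."""
    return [
        (e.get("producer_granule_id"), e.get("time_start"), e.get("time_end"))
        for e in entries
    ]

print(pages_needed("500", 100))  # 5

# A cut-down stand-in for response.json()["feed"]["entry"]
sample = [{
    "producer_granule_id": "ATL07-01_20181014062057_02390101_003_02.h5",
    "time_start": "2018-10-14T06:41:54.069Z",
    "time_end": "2018-10-14T06:45:52.208Z",
}]
for name, start, end in summarize_entries(sample):
    print(name, start, end)
```

With 500 hits and a page size of 100, the first response holds only one fifth of the matching granules.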

In [36]:
response.json()["feed"]["entry"][0]

{'producer_granule_id': 'ATL07-01_20181014062057_02390101_003_02.h5',
 'time_start': '2018-10-14T06:41:54.069Z',
 'orbit': {'ascending_crossing': '138.47314550293703',
  'start_lat': '27',
  'start_direction': 'A',
  'end_lat': '27',
  'end_direction': 'D'},
 'updated': '2020-06-11T09:32:36.910Z',
 'orbit_calculated_spatial_domains': [{'equator_crossing_date_time': '2018-10-14T06:20:56.558Z',
   'equator_crossing_longitude': '138.47314550293703',
   'orbit_number': '440'}],
 'dataset_id': 'ATLAS/ICESat-2 L3A Sea Ice Height V003',
 'data_center': 'NSIDC_ECS',
 'title': 'SC:ATL07.003:181300929',
 'coordinate_system': 'ORBIT',
 'time_end': '2018-10-14T06:45:52.208Z',
 'id': 'G1814912564-NSIDC_ECS',
 'original_format': 'ISO-SMAP',
 'granule_size': '27.4623594284',
 'browse_flag': True,
 'polygons': [['79.82054625244331 121.0806052101708 84.3733695246644 -25.22175702582129 84.48987791811967 -28.337309444853723 79.88471463831527 122.88291576068897 79.82054625244331 121.0806052101708']],
 'co