# Overview of Notebooks

* [HAPI_00.ipynb](HAPI_00.ipynb) - Introduction
* [HAPI_01.ipynb](HAPI_01.ipynb) - Basics
* **[HAPI_02.ipynb](HAPI_02.ipynb) - Data structures (this Notebook)**
* [HAPI_03.ipynb](HAPI_03.ipynb) - Plotting
* [HAPI_04.ipynb](HAPI_04.ipynb) - Problems

# Setup

In [None]:
# Show Matplotlib plots in page instead of opening a window
%matplotlib inline 
# Have Matplotlib create vector (svg) instead of raster (png) images
%config InlineBackend.figure_formats = ['svg'] 

# Misc. configuration
import warnings
# See https://github.com/boto/boto3/issues/454 for an explanation of the following warning
warnings.simplefilter("ignore", ResourceWarning)

# Data Model

A request for data using

```python
data, meta = hapi(server, dataset, parameters, start, stop)
```

returns `data`, a [NumPy `ndarray` with named fields](https://docs.scipy.org/doc/numpy-1.15.1/user/quickstart.html), and `meta`, a Python dictionary, by making requests to the HAPI-compliant data server `server`. The structure of `meta` mirrors the structure of the JSON metadata response from a HAPI server.

Internally, `hapi()` makes a request to a HAPI server, which returns a CSV stream in which the first column is a timestamp, and subsequent columns are data measured or associated with that timestamp. The columns are mapped to one or more parameters (that may be multi-dimensional arrays) using the metadata associated with the request for CSV data. Note that `hapi()` requests a much faster HAPI Binary stream from a server if possible. For more information on the HAPI server specification, see https://github.com/hapi-server/data-specification. 
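The mapping from CSV columns to parameters can be illustrated with a minimal sketch. This is not the parser used by `hapiclient`; the two CSV records, the field layout, and the name `parsed` are assumptions chosen to resemble a request for a scalar and a three-component vector.

```python
import numpy as np

# Two hypothetical CSV records: time, one scalar, then three vector components
csv_text = (
    "1970-01-01T00:00:00.000Z,0.0,1.0,2.0,3.0\n"
    "1970-01-01T00:00:01.000Z,0.5,4.0,5.0,6.0\n"
)

# Field layout that the metadata would imply for this (assumed) request
dtype = [('Time', 'S24'), ('scalar', '<f8'), ('vector', '<f8', (3,))]

rows = [line.split(',') for line in csv_text.strip().splitlines()]
parsed = np.zeros(len(rows), dtype=dtype)
for i, row in enumerate(rows):
    parsed['Time'][i] = row[0].encode('ascii')           # column 0 -> Time
    parsed['scalar'][i] = float(row[1])                  # column 1 -> scalar
    parsed['vector'][i] = [float(v) for v in row[2:5]]   # columns 2-4 -> vector

print(parsed)
```

Each CSV record becomes one element of the array, and the three vector columns are grouped under a single named field.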

# Extracting Data

In [None]:
from hapiclient import hapi

server     = 'http://hapi-server.org/servers/TestData2.0/hapi'
dataset    = 'dataset1'
parameters = 'scalar,vector'
start      = '1970-01-01T00:00:00Z'
stop       = '1970-01-01T00:00:10Z'
opts       = {'logging': False, 'usecache': True, 'cachedir': './hapicache'}

data,meta = hapi(server,dataset,parameters,start,stop,**opts)

`data` is a NumPy `ndarray` with named fields of `Time`, `scalar`, and `vector`. The array has 10 elements (one for each time value), and each element is a record (similar to a tuple) that holds the values of all parameters at that time.
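The structure of such an array can be inspected without contacting a server. The sketch below builds an empty array with the same field layout; the name `example` and the zero-filled values are assumptions for illustration only.

```python
import numpy as np

# Same field layout as the request above, filled with zeros (illustrative only)
example = np.zeros(10, dtype=[('Time', 'S24'), ('scalar', '<f8'), ('vector', '<f8', (3,))])

print(example.dtype.names)      # names of the fields
print(example.shape)            # one element per time value
print(example['vector'].shape)  # time values by vector components
print(type(example[0]))         # type of a single record
```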

In [None]:
data

Access all values for parameter `Time`

In [None]:
data['Time']

Convert elements of `data['Time']` to Python `datetime` objects

In [None]:
from hapiclient import hapitime2datetime
dateTimes = hapitime2datetime(data['Time'])
dateTimes

Convert elements in `data['Time']` to Unicode strings 

In [None]:
TimeStamps = data['Time'].astype('U')
TimeStamps

Access all values for parameter `vector`

In [None]:
data['vector']

Access all parameters at first timestamp.

In [None]:
data[0]

Access value of `vector` at second timestamp.

In [None]:
data['vector'][1] 

Access value of second component of `vector` at second timestamp

In [None]:
data['vector'][1,1]
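The same indexing extends to slices. As a small sketch on a simulated array (the name `sim` and its values are assumptions, not server data), one component can be selected at all timestamps, or all components at a range of timestamps:

```python
import numpy as np

# Simulated three-record array with a three-component vector parameter
sim = np.zeros(3, dtype=[('Time', 'S24'), ('vector', '<f8', (3,))])
sim['vector'] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# Second component of vector at all timestamps
second_component = sim['vector'][:, 1]

# All components of vector at the first two timestamps
first_two = sim['vector'][0:2]

print(second_component)
print(first_two)
```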

<div style="background-color:yellow">
<h3>Problem 02a</h3>

<p>Starting with the following script, find the average radial distance of the moon on the first 9 days of January of 2022. (To avoid 100+ users requesting data from the same data server, do not modify <code>start</code> and <code>stop</code>; the data required to solve this problem is locally cached and <code>hapi()</code> will use this cached data by default.)</p>
</div>

In [None]:
from hapiclient import hapi

server     = 'https://hapi-server.org/servers/SSCWeb/hapi'
dataset    = 'moon'
parameters = 'X_GEO'
# Do not modify start/stop. See note above.
# HAPI start dates/times are inclusive, so the first returned timestamp could be on start.
start      = '2022-01-01T00:00:00.000Z'
# HAPI stop dates/times are exclusive, so last returned timestamp will be before stop.
stop       = '2022-01-10T00:00:00.000Z' 
opts       = {'logging': False, 'usecache': True, 'cachedir': './hapicache'}

data, meta = hapi(server, dataset, parameters, start, stop, **opts)

data

# Your code here

# Time Representation

A HAPI-compliant server represents time as an ISO 8601 string (with several constraints - see the [HAPI specification](https://github.com/hapi-server/data-specification/blob/master/hapi-dev/HAPI-data-access-spec-dev.md#representation-of-time)). `hapi()` reads these into a NumPy array of [Python byte literals](https://stackoverflow.com/a/6273618). To convert byte literals to Python `datetime` objects, the function [`hapitime2datetime`](https://github.com/hapi-server/client-python/blob/master/hapiclient/hapi.py) can be used. Internally, this function uses `pandas.to_datetime` for parsing if possible. Otherwise it falls back to a manual method for parsing.
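To make the conversion concrete, here is a minimal sketch of parsing byte-literal timestamps with only the standard library. It is not the method used by `hapitime2datetime`; it assumes every timestamp has the exact form `YYYY-MM-DDTHH:MM:SS.fffZ`, whereas HAPI allows several other ISO 8601 forms. The names `times`, `fmt`, and `parsed_times` are illustrative.

```python
from datetime import datetime, timezone

import numpy as np

# Simulated time values as a NumPy array of byte literals
times = np.array([b'1970-01-01T00:00:00.000Z', b'1970-01-01T00:00:01.000Z'])

# Assumed fixed format; other valid HAPI time formats would need other patterns
fmt = '%Y-%m-%dT%H:%M:%S.%fZ'

parsed_times = [
    datetime.strptime(t.decode('ascii'), fmt).replace(tzinfo=timezone.utc)
    for t in times
]
print(parsed_times)
```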

In [None]:
from hapiclient import hapi
from hapiclient import hapitime2datetime

server     = 'http://hapi-server.org/servers/TestData2.0/hapi'
dataset    = 'dataset1'
parameters = 'scalar,vector'
start      = '1970-01-01T00:00:00Z'
stop       = '1970-01-01T00:00:10Z'
opts       = {'logging': False, 'usecache': True, 'cachedir': './hapicache'}

data, meta = hapi(server, dataset, parameters, start, stop, **opts)

In [None]:
print(data['Time'])

# In the above, we assumed the time parameter name is 'Time'.
# In general, use the fact that the first parameter in the meta['parameters']
# list is always the time parameter:
time_name = meta['parameters'][0]['name']
print(f'\nDataset time parameter name: "{time_name}"\n') 
print(data[time_name])

# A fast way to compare times is to use byte comparison (and the fact that ISO 8601 strings
# in the same format sort in order of increasing time).
print('\nTime value less than next?')
print(data[time_name][0:-1] < data[time_name][1:])

In [None]:
date_times = hapitime2datetime(data['Time'])
date_times

In [None]:
# Convert from Python bytes to Unicode (regular Python 3 strings)
time_strings = data['Time'].astype('U')
print(time_strings)
print('\nFirst time value: ' + time_strings[0])

In [None]:
# Create a custom formatted time string
# See https://docs.python.org/3/library/datetime.html for more on datetime object manipulation
print(date_times[0].strftime('%Y-%j at %H hours, %M minutes, %S seconds, and %f microseconds'))

# Convert to Pandas DataFrame

A NumPy `ndarray` can be converted to a Pandas DataFrame using [the `pandas.DataFrame` constructor](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

In [None]:
from hapiclient import hapi
from hapiclient import hapitime2datetime

server     = 'http://hapi-server.org/servers/TestData2.0/hapi'
dataset    = 'dataset1'
parameters = 'scalar,vector'
start      = '1970-01-01T00:00:00Z'
stop       = '1970-01-01T00:00:10Z'
opts       = {'logging': False, 'usecache': True, 'cachedir': './hapicache'}

data, meta = hapi(server,dataset,parameters,start,stop, **opts)

import pandas

# Put each parameter into a DataFrame
df_Time   = pandas.DataFrame(hapitime2datetime(data['Time']))
df_scalar = pandas.DataFrame(data['scalar'])
df_vector = pandas.DataFrame(data['vector'])

# Place parameter DataFrames into a single DataFrame
df = pandas.concat([df_Time, df_scalar, df_vector], axis=1)

# Name columns (more generally, one would want to obtain the column labels from meta, if available)
df.columns = ['Time', 'scalar','vector_x', 'vector_y', 'vector_z']

# Set Time to be index
df.set_index('Time', inplace=True)

df
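As the comment in the cell above notes, column labels would ideally be derived from `meta`. The following is a hedged sketch of one way to do that: the helper `column_labels`, the `name_index` labelling scheme, and the `sim_meta` dictionary are all assumptions for illustration and are not part of `hapiclient`. Components are labelled with the last index varying fastest, which matches how NumPy flattens arrays by default.

```python
import itertools

# Hypothetical metadata in the style of a HAPI info response
sim_meta = {'parameters': [
    {'name': 'Time', 'type': 'isotime', 'length': 24},
    {'name': 'scalar', 'type': 'double'},
    {'name': 'vector', 'type': 'double', 'size': [3]},
]}

def column_labels(meta):
    """Return one label per column, e.g. vector_0, vector_1, vector_2."""
    labels = []
    for parameter in meta['parameters']:
        size = parameter.get('size')
        if size is None:
            # Parameter occupies a single column
            labels.append(parameter['name'])
        else:
            # One column per component; last index varies fastest
            for index in itertools.product(*[range(n) for n in size]):
                suffix = '_'.join(str(i) for i in index)
                labels.append(parameter['name'] + '_' + suffix)
    return labels

print(column_labels(sim_meta))
```

The resulting list could then be assigned to `df.columns` in place of the hard-coded labels.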

<div style="background-color:yellow">
<h3>Problem 02b</h3>

<a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html">Using <code>DataFrame</code> methods</a>, modify the code above to
 
<p>1. find the mean and standard deviation of each column and</p>
<p>2. find the time that <code>scalar</code> is a maximum.</p>
</div>

# Convert to NDCube

HAPI data arrays can be converted to [SunPy NDCubes](https://docs.sunpy.org/projects/ndcube/en/stable/introduction.html) using [the `ndcube.NDCube` class](https://docs.sunpy.org/projects/ndcube/en/stable/api/ndcube.NDCube.html#ndcube.NDCube).

In the example below, we also define a WCS object for the time axis and attach the timestamps as an extra coordinate.

In [None]:
from hapiclient import hapi
from hapiclient import hapitime2datetime

server     = 'http://hapi-server.org/servers/TestData2.0/hapi'
dataset    = 'dataset1'
parameters = 'scalar,vector'
start      = '1970-01-01T00:00:00Z'
stop       = '1970-01-01T00:00:10Z'
opts       = {'logging': False, 'usecache': True, 'cachedir': './hapicache'}

data, meta = hapi(server, dataset, parameters, start, stop, **opts)

dateTimes = hapitime2datetime(data['Time'])

import astropy.wcs

my_wcs = astropy.wcs.WCS({"CTYPE1": "TIME", 
                          "CUNIT1": "s", 
                          "CDELT1": 1, 
                          "CRPIX1": 0, 
                          "CRVAL1": 0, 
                          "NAXIS1": 10})

import ndcube
cube = ndcube.NDCube(data['scalar'], my_wcs)

from astropy.time import Time
t = Time(dateTimes)
cube.extra_coords.add('time', 0, t)

print(cube)

# Generating Data (optional)

For testing, it may be useful to create a simulated HAPI data response in Python. A HAPI response of

```
1970-01-01T00:00:00.000Z, 1.,2.,3.
1970-01-01T00:00:01.000Z, 4.,5.,6.
```

where the metadata indicates there is one parameter named `vector` with `size=[3]` and `type=double` could be created by

In [None]:
import numpy as np
data = np.ndarray(shape=(2), dtype=[('Time', 'S24'), ('vector', '<f8', (3,))])

# Populate: method 1
data['Time'] = np.array([b'1970-01-01T00:00:00.000Z', b'1970-01-01T00:00:01.000Z'])
data['vector'] = np.array([[1.0,2.0,3.0],[4.0,5.0,6.0]])

# Populate: method 2
data[0] = (b'1970-01-01T00:00:00.000Z', [1.0,2.0,3.0])
data[1] = (b'1970-01-01T00:00:01.000Z', [4.0,5.0,6.0])

data

A HAPI response of

```
1970-01-01T00:00:00.000Z, 1.,2.,3.,4.,5.,6.,7.,8.,9.
1970-01-01T00:00:01.000Z, 11.,12.,13.,14.,15.,16.,17.,18.,19.
```
where the metadata indicates there is one parameter named `matrix` with `size=[3,3]` and `type=double` could be created by

In [None]:
import numpy as np

# Allocate
data = np.ndarray(shape=(2), dtype=[('Time', 'S24'), ('matrix', '<f8', (3,3,))])

# Populate
data['Time'] = np.array([b'1970-01-01T00:00:00.000Z', b'1970-01-01T00:00:01.000Z'])
data['matrix'] = np.array( [ [[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]], [[11.0,12.0,13.0],[14.0,15.0,16.0],[17.0,18.0,19.0]]] )

data

A response with multiple parameters, e.g., both the `vector` and `matrix` parameters considered above,

```
1970-01-01T00:00:00.000Z, 1.,2.,3.,  1.,2.,3.,4.,5.,6.,7.,8.,9.
1970-01-01T00:00:01.000Z, 4.,5.,6., 11.,12.,13.,14.,15.,16.,17.,18.,19.
```

can be created by populating

In [None]:
data = np.ndarray(shape=(2), dtype=[('Time', 'S24'), ('vector', '<f8', (3,)), ('matrix', '<f8', (3,3,))])
data['Time'] = np.array([b'1970-01-01T00:00:00.000Z', b'1970-01-01T00:00:01.000Z'])
data['vector'] = np.array([[1.0,2.0,3.0],[4.0,5.0,6.0]])
data['matrix'] = np.array( [ [[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]], [[11.0,12.0,13.0],[14.0,15.0,16.0],[17.0,18.0,19.0]]] )
data
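The `dtype` lists above were written by hand. As a sketch of how they could be generated from metadata instead, the helper below builds the field list from a dictionary shaped like a HAPI info response. The function `dtype_from_meta`, the `NUMPY_TYPES` mapping, and `sim_meta` are assumptions for illustration; `hapiclient` may construct its dtypes differently.

```python
import numpy as np

# Hypothetical metadata in the style of a HAPI info response
sim_meta = {'parameters': [
    {'name': 'Time', 'type': 'isotime', 'length': 24},
    {'name': 'vector', 'type': 'double', 'size': [3]},
    {'name': 'matrix', 'type': 'double', 'size': [3, 3]},
]}

# Assumed mapping from HAPI numeric types to NumPy type strings
NUMPY_TYPES = {'double': '<f8', 'integer': '<i4'}

def dtype_from_meta(meta):
    """Build a list of (name, type[, shape]) tuples usable as a NumPy dtype."""
    fields = []
    for p in meta['parameters']:
        if p['type'] in ('isotime', 'string'):
            # Fixed-length byte strings
            fields.append((p['name'], 'S' + str(p['length'])))
        elif 'size' in p:
            # Multi-component parameter
            fields.append((p['name'], NUMPY_TYPES[p['type']], tuple(p['size'])))
        else:
            fields.append((p['name'], NUMPY_TYPES[p['type']]))
    return fields

sim = np.zeros(2, dtype=dtype_from_meta(sim_meta))
print(sim.dtype)
```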


# Metadata

The metadata returned by `hapi()` is a straightforward mapping of the JSON metadata from a HAPI server.  Earlier we showed the metadata for a dataset; now we look at (a) creating a list of all HAPI servers and (b) asking any specific HAPI server which datasets it has available.

## Listing all Servers

Calling `hapi()` with no arguments returns a list of all current HAPI servers. The list matches the content of
[servers.txt](https://github.com/hapi-server/data-specification/blob/master/servers.txt).

In [None]:
from hapiclient import hapi

servers = hapi() # servers is an array of URLs
display(servers)

## Listing all Datasets from a Server

For a given server - in this example, CDAWeb - you can fetch the full list of dataset ids it serves. For this example, `hapi()` internally makes a request to [https://cdaweb.gsfc.nasa.gov/hapi/catalog](https://cdaweb.gsfc.nasa.gov/hapi/catalog).

In [None]:
from hapiclient import hapi

server = 'https://cdaweb.gsfc.nasa.gov/hapi'
meta = hapi(server)

# Display first 5 entries
display(meta['catalog'][0:5])
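Because the catalog is a list of dictionaries, ordinary list comprehensions can be used to search it. The sketch below uses a small hand-written catalog so that it runs without a network connection; apart from `AC_H0_MFI`, the dataset ids and the name `catalog_meta` are placeholders, not a statement about what CDAWeb serves.

```python
# Hypothetical catalog response with the same shape as meta above
catalog_meta = {'catalog': [
    {'id': 'AC_H0_MFI'},
    {'id': 'AC_EXAMPLE_2'},   # placeholder id
    {'id': 'XX_EXAMPLE_3'},   # placeholder id
]}

# Extract all dataset ids
ids = [entry['id'] for entry in catalog_meta['catalog']]

# Keep only ids that start with a given prefix
ac_ids = [dataset_id for dataset_id in ids if dataset_id.startswith('AC_')]

print(ids)
print(ac_ids)
```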

## Listing all Parameters in a Dataset

Each dataset's metadata is available from a query, without needing to fetch the actual data. For this example, `hapi()` internally makes a request to [https://cdaweb.gsfc.nasa.gov/hapi/info?id=AC_H0_MFI](https://cdaweb.gsfc.nasa.gov/hapi/info?id=AC_H0_MFI).

In [None]:
from hapiclient import hapi

server  = 'https://cdaweb.gsfc.nasa.gov/hapi'
dataset = 'AC_H0_MFI'
meta = hapi(server, dataset)
display(meta)
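The `parameters` list in the metadata can be summarized with a short loop. This sketch uses a hand-written dictionary named `info_meta`; its units and sizes are illustrative assumptions and should be checked against the actual server response.

```python
# Hypothetical info response; values are illustrative, not taken from CDAWeb
info_meta = {'parameters': [
    {'name': 'Time', 'type': 'isotime', 'units': 'UTC', 'length': 24},
    {'name': 'Magnitude', 'type': 'double', 'units': 'nT'},
    {'name': 'BGSEc', 'type': 'double', 'units': 'nT', 'size': [3]},
]}

# Collect parameter names; the first one is the time parameter
names = [p['name'] for p in info_meta['parameters']]

# Print name, units, and size (size is absent for scalar parameters)
for p in info_meta['parameters']:
    print(p['name'], p.get('units'), p.get('size'))
```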

## Listing Parameter Metadata

One can request metadata for a subset of the parameters in a dataset (rather than all parameters as in the last example) by adding `parameters` to the call. For this example, `hapi()` internally makes a request to [https://cdaweb.gsfc.nasa.gov/hapi/info?id=AC_H0_MFI&parameters=Magnitude,BGSEc](https://cdaweb.gsfc.nasa.gov/hapi/info?id=AC_H0_MFI&parameters=Magnitude,BGSEc).

(Note that HAPI allows non-standard server-specific keys in `meta`, which are prefixed by `x_`.  This is similar to the Python convention where variables and methods prefixed with an underscore indicate that they are for internal use only and may change.)

Here we fetch metadata for the `Magnitude` and `BGSEc` parameters in the `AC_H0_MFI` dataset from `CDAWeb`.

In [None]:
from hapiclient import hapi

server     = 'https://cdaweb.gsfc.nasa.gov/hapi'
dataset    = 'AC_H0_MFI'
parameters = 'Magnitude,BGSEc'

meta = hapi(server,dataset,parameters)

display(meta)
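The request URLs quoted in this section follow a simple pattern. As a sketch, a URL of that form can be assembled with the standard library; the helper `info_url` is hypothetical, is not how `hapiclient` builds its requests, and assumes the `id` query parameter shown in the URLs above.

```python
from urllib.parse import urlencode

def info_url(server, dataset, parameters=None):
    """Build an info request URL of the form shown in the text above."""
    query = {'id': dataset}
    if parameters:
        query['parameters'] = parameters
    # Keep commas unescaped so the parameter list stays readable
    return server + '/info?' + urlencode(query, safe=',')

url = info_url('https://cdaweb.gsfc.nasa.gov/hapi', 'AC_H0_MFI', 'Magnitude,BGSEc')
print(url)
```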

----
Next up, [plotting data](HAPI_03.ipynb)
----