# Adding data

This notebook shows how to add data with the datalayer.

Every document requires several parameters, and all of them must be defined before new data can be added.
These required parameters are the ones shown in the next example.
Any other parameters may be added to the document as well.

The data is added with the following method:

In [6]:
from hera import datalayer

projectName = "addDataExample" # must be a string
documentType = "ExampleData" # must be a string
desc = {"description_A": "A", "description_B": "B"} # must be a dictionary. Contains descriptors of the data.
dataFormat = "geopandas" # must be a string. The allowed data formats are given below.
resource = "/mnt/public/New-MAPI-data/BNTL_MALE_ARZI/BNTL_MALE_ARZI/RELIEF/CONTOUR.shp" # A dynamic field that points to a specific file in a folder.

datalayer.Measurements.addDocument(projectName=projectName, desc=desc, type=documentType, dataFormat=dataFormat, resource=resource)

Note that the desc dictionary must not contain a key named "type".
The allowed data formats are:

- string
- time
- HDF
- dict
- netcdf_xarray
- JSON_dict
- JSON_pandas
- geopandas 
- parquet

The data format indicates how to read the data, so it must correspond to the type of data located in the resource.
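
The constraints described above (string parameters, a dictionary of descriptors without a "type" key, and a data format from the allowed list) can be checked before calling addDocument. The sketch below is only an illustration: `check_document_args` and `ALLOWED_DATA_FORMATS` are hypothetical names that are not part of hera, and the checks hera itself performs may differ.

```python
# Hypothetical pre-flight check; these names are NOT part of hera.
ALLOWED_DATA_FORMATS = {
    "string", "time", "HDF", "dict", "netcdf_xarray",
    "JSON_dict", "JSON_pandas", "geopandas", "parquet",
}

def check_document_args(projectName, documentType, desc, dataFormat):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    # The three identifying parameters must all be strings.
    for name, value in (("projectName", projectName),
                        ("documentType", documentType),
                        ("dataFormat", dataFormat)):
        if not isinstance(value, str):
            problems.append(f"{name} must be a string")
    # desc must be a dictionary and must not carry a "type" key.
    if not isinstance(desc, dict):
        problems.append("desc must be a dictionary")
    elif "type" in desc:
        problems.append('desc must not contain the key "type"')
    # The data format must be one of the formats listed above.
    if isinstance(dataFormat, str) and dataFormat not in ALLOWED_DATA_FORMATS:
        problems.append(f"unknown dataFormat: {dataFormat!r}")
    return problems

ok = check_document_args("addDataExample", "ExampleData",
                         {"description_A": "A", "description_B": "B"}, "geopandas")
bad = check_document_args("addDataExample", "ExampleData",
                          {"type": "oops"}, "csv")
print(ok)
print(bad)
```

The first call returns an empty list; the second reports both the forbidden "type" key and the unknown format.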

The added document can be loaded as presented in the "Getting data" notebook.

# Getting data
This notebook shows how to get the data with the datalayer.

In [1]:
from hera import datalayer
import pandas

After importing the datalayer, you can get the data that fits your requirements.
The example below gets the experimental data of the Haifa campaign between two dates, measured at station Check_Post by the Sonic instrument at a height of 9 m.

In [2]:
projectName = 'Haifa'
station = 'Check_Post'
instrument = 'Sonic'
height = 9
start = pandas.Timestamp('2015-08-01')
end = pandas.Timestamp('2015-08-02')
data = datalayer.Measurements.getData(projectName=projectName,
                                      station=station,
                                      instrument=instrument,
                                      height=height,
                                      start__lte=end,
                                      end__gte=start
                                      )[start:end]
print(data)

Dask DataFrame Structure:
                               u        v        w        T
npartitions=3                                              
2015-08-01 00:00:00.000  float64  float64  float64  float64
2015-08-01 13:53:22.550      ...      ...      ...      ...
2015-08-02 00:00:00.000      ...      ...      ...      ...
2015-08-02 00:00:00.000      ...      ...      ...      ...
Dask Name: loc, 15 tasks
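
The `start__lte=end` and `end__gte=start` arguments in the query above can be read as an interval-overlap test, assuming the double-underscore suffixes are MongoEngine-style comparison operators and that each document stores `start` and `end` fields. Under that assumption, a document matches when its time range shares at least one instant with the requested window; the trailing `[start:end]` then trims the loaded data to the window itself. A minimal sketch of the overlap logic, using made-up document ranges:

```python
from datetime import datetime

def overlaps(doc_start, doc_end, query_start, query_end):
    """True when [doc_start, doc_end] and [query_start, query_end] share an instant."""
    # Mirrors start__lte=end (doc_start <= query_end)
    # and end__gte=start (doc_end >= query_start).
    return doc_start <= query_end and doc_end >= query_start

query_start = datetime(2015, 8, 1)
query_end = datetime(2015, 8, 2)

# Hypothetical document time ranges, for illustration only.
docs = {
    "july": (datetime(2015, 7, 1), datetime(2015, 7, 31)),
    "summer": (datetime(2015, 6, 1), datetime(2015, 9, 1)),
    "august_2nd_on": (datetime(2015, 8, 2), datetime(2015, 8, 10)),
}

matching = [name for name, (s, e) in docs.items()
            if overlaps(s, e, query_start, query_end)]
print(matching)
```

Only the "july" range is rejected, because it ends before the window starts; a range that merely touches the window boundary still matches.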


The data is returned as a dask dataframe.
To get the data as a pandas dataframe, call its 'compute' method.

In [3]:
data = data.compute()
print(data)

                            u     v     w      T
Time                                            
2015-08-01 00:00:00.000  0.75 -0.80 -0.17  26.90
2015-08-01 00:00:00.050  0.77 -0.77 -0.12  26.85
2015-08-01 00:00:00.100  0.77 -0.77 -0.12  26.85
2015-08-01 00:00:00.150  0.72 -0.85 -0.14  26.89
2015-08-01 00:00:00.200  0.72 -0.85 -0.14  26.89
2015-08-01 00:00:00.250  0.72 -0.85 -0.14  26.87
2015-08-01 00:00:00.300  0.76 -0.81 -0.13  26.85
2015-08-01 00:00:00.350  0.76 -0.81 -0.13  26.85
2015-08-01 00:00:00.400  0.75 -0.91 -0.15  26.85
2015-08-01 00:00:00.450  0.75 -0.88 -0.16  26.80
2015-08-01 00:00:00.500  0.79 -0.84 -0.13  26.82
2015-08-01 00:00:00.550  0.79 -0.84 -0.13  26.82
2015-08-01 00:00:00.600  0.75 -0.91 -0.15  26.82
2015-08-01 00:00:00.650  0.75 -0.91 -0.15  26.82
2015-08-01 00:00:00.700  0.75 -0.91 -0.15  26.80
2015-08-01 00:00:00.750  0.75 -0.91 -0.15  26.80
2015-08-01 00:00:00.800  0.78 -0.85 -0.11  26.85
2015-08-01 00:00:00.850  0.75 -0.91 -0.15  26.82
2015-08-01 00:00:00.

Alternatively, you can pass the argument 'usePandas' with the value True to get the data directly as a pandas dataframe rather than a dask one.
(This should be used only when the data is small.)

In [4]:
projectName = 'Haifa'
station = 'Check_Post'
instrument = 'Sonic'
height = 9
start = pandas.Timestamp('2015-08-01')
end = pandas.Timestamp('2015-08-02')
data = datalayer.Measurements.getData(projectName=projectName,
                                      station=station,
                                      instrument=instrument,
                                      height=height,
                                      start__lte=end,
                                      end__gte=start,
                                      usePandas=True
                                      )[start:end]
print(data)

                            u     v     w      T
Time                                            
2015-08-01 00:00:00.000  0.75 -0.80 -0.17  26.90
2015-08-01 00:00:00.050  0.77 -0.77 -0.12  26.85
2015-08-01 00:00:00.100  0.77 -0.77 -0.12  26.85
2015-08-01 00:00:00.150  0.72 -0.85 -0.14  26.89
2015-08-01 00:00:00.200  0.72 -0.85 -0.14  26.89
2015-08-01 00:00:00.250  0.72 -0.85 -0.14  26.87
2015-08-01 00:00:00.300  0.76 -0.81 -0.13  26.85
2015-08-01 00:00:00.350  0.76 -0.81 -0.13  26.85
2015-08-01 00:00:00.400  0.75 -0.91 -0.15  26.85
2015-08-01 00:00:00.450  0.75 -0.88 -0.16  26.80
2015-08-01 00:00:00.500  0.79 -0.84 -0.13  26.82
2015-08-01 00:00:00.550  0.79 -0.84 -0.13  26.82
2015-08-01 00:00:00.600  0.75 -0.91 -0.15  26.82
2015-08-01 00:00:00.650  0.75 -0.91 -0.15  26.82
2015-08-01 00:00:00.700  0.75 -0.91 -0.15  26.80
2015-08-01 00:00:00.750  0.75 -0.91 -0.15  26.80
2015-08-01 00:00:00.800  0.78 -0.85 -0.11  26.85
2015-08-01 00:00:00.850  0.75 -0.91 -0.15  26.82
2015-08-01 00:00:00.
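
Whether the data is small enough for 'usePandas' can be judged with a rough size estimate. The sketch below assumes uniform sampling every 0.05 s (20 Hz, inferred from the timestamps printed above), four float64 columns, and an 8-byte timestamp index; `estimate_frame_bytes` is a hypothetical helper, and the real memory use of a pandas dataframe will differ somewhat.

```python
def estimate_frame_bytes(duration_s, rate_hz, n_columns, value_bytes=8, index_bytes=8):
    """Rough in-memory size: one index entry plus n_columns values per row."""
    n_rows = int(duration_s * rate_hz)
    return n_rows * (n_columns * value_bytes + index_bytes)

one_day_s = 24 * 60 * 60
# One day of u, v, w, T sampled at an assumed 20 Hz.
size = estimate_frame_bytes(one_day_s, rate_hz=20, n_columns=4)
print(f"{size / 1e6:.1f} MB")
```

Under these assumptions one day of data is about 69 MB, which fits comfortably in memory, while a campaign of several months would be better left as a dask dataframe.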

# Using Project 

The Project class simplifies access to the different documents of a project.

Define the project with:

In [None]:
from hera.datalayer import Project 

p = Project(projectName="testProject")

p.simulations