# Prerequisites

- Install the [Cognite Python SDK](https://github.com/cognitedata/cognite-sdk-python)
- Make sure that you have received an API key from http://openindustrialdata.com/ 
- Set the API key as an environment variable. See the instructions in the Cognite Python SDK [GitHub repository](https://github.com/cognitedata/cognite-sdk-python)

# Get started

First, we import everything we'll need for this mini-intro.

In [None]:
%pylab notebook
import numpy as np
import pandas as pd
import os

from datetime import datetime

from cognite import CogniteClient

# The API key is needed to authenticate
Use your API key to authenticate. Make sure that you have first set the API key as an environment variable.

Your API key is specific to a project. Here we work with data from Open Industrial Data, whose project is called "publicdata".

In [None]:
client = CogniteClient(api_key=os.environ['PUBLICDATA_API_KEY'])
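If the environment variable is missing, `os.environ[...]` fails with a bare `KeyError`. Here is a minimal sketch of a friendlier check, assuming the variable name `PUBLICDATA_API_KEY` used above. `require_env` is a hypothetical helper written for this intro, not part of the Cognite SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            "Environment variable {!r} is not set. Set it to your API key "
            "before creating the client.".format(name)
        )
    return value


# Usage (hypothetical helper; pass the result as api_key to CogniteClient):
# api_key = require_env('PUBLICDATA_API_KEY')
```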

# Data in the Cognite Data Platform is structured around assets

The data in the Cognite Data Platform is structured by assets, where an asset can be a specific piece of equipment or an equipment type. 

We'll get all assets with get_assets(). You can also include a description or asset name as a parameter -- see more information in the reference docs.


In [None]:
assets = client.assets.get_assets()
assets

The API returns response objects. We have added a .to_pandas() method in the Python SDK which makes it easier to view the data.

In [None]:
assets_df = assets.to_pandas()
assets_df

Looking at the table, we see that each asset has a name, a description, metadata, and its own ID. This ID is generated by Cognite and is unique for each asset.
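Once the assets are in a DataFrame, ordinary pandas filtering helps narrow the table down. The sketch below uses a toy stand-in for `assets_df`. The column names `id`, `name`, and `description` and all of the rows are assumptions for illustration, so check the columns of your own table first.

```python
import pandas as pd

# Toy stand-in for assets_df; the column names and rows are made up for illustration.
toy_assets_df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['A-100', 'B-200', 'C-300'],
    'description': ['suction scrubber', 'discharge cooler', 'scrubber level transmitter'],
})

# Keep rows whose description mentions "scrubber" (case-insensitive, NaN-safe).
mask = toy_assets_df['description'].str.contains('scrubber', case=False, na=False)
scrubber_rows = toy_assets_df[mask]
print(scrubber_rows[['id', 'name', 'description']])
```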

But what now? That is a big table and it is hard to know what we are looking at.

# How to explore data

In order to explore the data, you can:
1. Use what you know about the physical system to immediately fetch the relevant data 
2. Navigate the asset hierarchy


## 1. When you know the physical system you can immediately zoom in on the relevant data
Looking at the diagram on (TODO: CREATE LINK WITH DIAGRAM), we might want to investigate the scrubber further.


The following P&IDs are uploaded to the Cognite Data Platform.

- PH-25578-P-4110006-001: 1st stage lube oil
- PH-25578-P-4110010-001: 1st stage dry gas seal
- PH-25578-P-4110119-001: stage 1 - P & I diagram
- PH-ME-P-0151-001: 1st stage suction cooler
- PH-ME-P-0152-001: 1st stage suction scrubber
- PH-ME-P-0153-001: 1st stage compressor
- PH-ME-P-0156-001: 1st stage compressor. Temperature and vibration monitoring
- PH-ME-P-0156-002: 1st stage compressor. Temperature and vibration monitoring
- PH-ME-0160-001: 1st stage discharge cooler

We can have a look at the P&ID for the 1st stage suction scrubber.

In [None]:
scrubber_file_name = 'PH-ME-P-0152-001'
client.files.list_files(name=scrubber_file_name).to_pandas()

In [None]:
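# Look up the file by name and take the id from the first row of the result
scrubber_files = client.files.list_files(name=scrubber_file_name).to_pandas()
scrubber_file_id = scrubber_files.id[0]

# Now download the file using the file id
client.files.download_file(scrubber_file_id)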

If you run the code yourself, you will get a new download url which you can use to look at the P&ID in the browser.

Skilled engineers can look at the P&ID and understand how the system works. I asked an engineer and was told that I could look at timeseries for the following: 

- The scrubber level working setpoint (tag name = 'VAL_23-LIC-92521:Control Module:YR')
- The scrubber level measured value (tag name = 'VAL_23-LIC-92521:Z.X.Value')
- The scrubber level output (tag name = 'VAL_23-LIC-92521:Z.Y.Value')

### Specify the data

We can get the relevant data for the scrubber level setpoint, measured value, and output.
 


In [None]:
scrubber_level_working_setpoint = 'VAL_23-LIC-92521:Control Module:YR'
scrubber_level_measured_value  = 'VAL_23-LIC-92521:Z.X.Value'
scrubber_level_output  = 'VAL_23-LIC-92521:Z.Y.Value'
all_ts_names = [scrubber_level_working_setpoint, scrubber_level_measured_value, scrubber_level_output]
print(all_ts_names)

One way to pull data is using the function get_datapoints_frame. See reference documentation, http://cognite-sdk-python.readthedocs.io/en/latest/cognite.html#module-cognite.v05.timeseries.

This function returns datapoints for the given timeseries on a shared set of timestamps, so you don't have to interpolate the timeseries onto common timestamps yourself.

- Specifying a start and end time gives you data for the desired time range.
- A granularity of 1 hour returns one aggregate value per hour, computed over all data points in that hour.
- Providing multiple aggregates returns one column per timeseries and aggregate, here the average, min, and max.
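To build intuition for what a granularity and a list of aggregates mean, here is a local pandas analogy on synthetic data. It is only a sketch: it does not call the API and makes no claim about how the platform computes aggregates. The series name `toy_series` is made up, and the `name|aggregate` column labels mirror the naming used in the plotting cell further down.

```python
import numpy as np
import pandas as pd

# Synthetic raw signal: one value per minute for three hours (illustrative data only).
index = pd.date_range('2018-07-01', periods=180, freq='min')
raw = pd.Series(np.arange(180, dtype=float), index=index)

# One row per hour, one column per aggregate.
hourly = raw.resample('1h').agg(['mean', 'min', 'max'])
hourly.columns = ['toy_series|average', 'toy_series|min', 'toy_series|max']
print(hourly)
```

Each hour of 60 raw points collapses into a single row, which is why coarser granularities return much smaller frames.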

In [None]:
start = datetime(2018, 7, 1)
end = '1d-ago'
data = client.datapoints.get_datapoints_frame(all_ts_names, start=start, end=end, granularity='1h', aggregates=['average', 'min', 'max'])

### Investigate the data
The data is returned with the timeseries for the different aggregates in the columns.

In [None]:
data.head()

Substitute missing values before plotting the data. Pandas has useful functionality for this, e.g. see https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.fillna.html.

In [None]:
data = data.fillna(method='ffill')
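As a small illustration of what forward-filling does, the sketch below applies it to a toy column. The column name `level` and the values are made up. Note that a missing value at the very start has nothing before it to copy, so it stays missing.

```python
import numpy as np
import pandas as pd

# Toy column with gaps, including one at the very start (illustrative values only).
toy = pd.DataFrame({'level': [np.nan, 50.0, np.nan, np.nan, 52.0, np.nan]})

# Forward-fill: each gap takes the last valid value before it.
# DataFrame.ffill() gives the same result as fillna(method='ffill').
filled = toy.ffill()
print(filled)
```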

### Visualize the data

In [None]:
T = pd.to_datetime(data.timestamp, unit='ms')
plt.figure(figsize=(10, 5))
plt.plot(T, data[scrubber_level_working_setpoint+'|average'].values, label='setpoint')
plt.plot(T, data[scrubber_level_measured_value+'|average'].values, label='measured value')
plt.plot(T, data[scrubber_level_output+'|average'].values, label='level output')
plt.legend()
plt.show()

Wooh, you've made a plot! Now what does this data mean?

Background:
- The blue line is the desired level in the scrubber. 
- The orange line is the actual, measured value in the scrubber.
- The green line is the output from the scrubber to control the level in the scrubber.

Ideally, the orange line lies on top of the blue line, i.e. the actual measured value is equal to the desired level in the scrubber. The green line adjusts so that the measured value matches the desired value.
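One way to put a number on how closely the measured value follows the setpoint is the mean absolute difference between the two columns. This is a sketch on synthetic values, not real scrubber data. The column labels follow the `name|average` pattern used in the plotting cell, and the choice of mean absolute error as the metric is an assumption made for this example.

```python
import pandas as pd

setpoint_col = 'VAL_23-LIC-92521:Control Module:YR|average'
measured_col = 'VAL_23-LIC-92521:Z.X.Value|average'

# Synthetic values for illustration only; not real scrubber data.
toy_data = pd.DataFrame({
    setpoint_col: [50.0, 50.0, 50.0, 55.0],
    measured_col: [49.0, 51.0, 50.0, 52.0],
})

# Positive error means the measured level is above the setpoint.
error = toy_data[measured_col] - toy_data[setpoint_col]
mean_abs_error = error.abs().mean()
print(mean_abs_error)
```

A value close to zero means the two lines overlap most of the time.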

We then see that 


## 2. Navigating the asset hierarchy is useful when you don't know what you're looking at

We can start by navigating up to the root of the asset hierarchy.

In [None]:
# select an asset id from the table above
first_row_asset_id = 3111454725058294
client.assets.get_asset(first_row_asset_id).to_pandas()

In [None]:
# look at the parentId
parent_id = 4650652196144007
client.assets.get_asset(parent_id).to_pandas()

In [None]:
# Cool, we found the Valhall platform asset! Let's look at the parent of that
parent_id = 6687602007296940
client.assets.get_asset(parent_id).to_pandas()
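The manual steps above, reading the parent ID and looking it up again, can be sketched as a loop. The example works on a plain dictionary of made-up IDs rather than on the API, and `path_to_root` is a hypothetical helper, not an SDK function. With real data you could build such a mapping from the `id` and `parentId` columns of an assets table.

```python
def path_to_root(asset_id, parent_of):
    """Return the ids from `asset_id` up to the root, following parent links."""
    path = [asset_id]
    seen = {asset_id}
    while parent_of.get(path[-1]) is not None:
        parent_id = parent_of[path[-1]]
        if parent_id in seen:  # guard against a malformed, cyclic table
            raise ValueError('cycle detected at asset {}'.format(parent_id))
        path.append(parent_id)
        seen.add(parent_id)
    return path


# Toy parent links (child id -> parent id); the ids are made up for illustration.
toy_parent_of = {3: 2, 2: 1, 1: None}
print(path_to_root(3, toy_parent_of))
```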


We navigated all the way up to the Aker BP project! Note that we could also find that node by passing a description when getting assets. For example:

In [None]:
client.assets.get_assets(description="Aker BP").to_pandas()

You can see that the get_assets() query returns the same asset that we found by navigating the asset tree. Now that we are at the top, we can also navigate down the asset hierarchy.

In [None]:
# Explore the subtree below the root
akerbp_asset_id = 6687602007296940
akerbp_subtree = client.assets.get_asset_subtree(akerbp_asset_id, depth=4)
akerbp_subtree.to_pandas()

Can you spot which assets have which parents in the table above?
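If the table is too large to read by eye, grouping the rows by parent makes the structure easier to see. The sketch below uses a toy stand-in for `akerbp_subtree.to_pandas()`. The column names `id`, `parentId`, and `name` and all values are assumptions for illustration, so adjust them to match the columns in your own table.

```python
import pandas as pd

# Toy stand-in for the subtree table; ids, names and column names are made up.
toy_subtree_df = pd.DataFrame({
    'id': [10, 11, 12, 13],
    'parentId': [None, 10, 10, 11],
    'name': ['root', 'child-a', 'child-b', 'grandchild'],
})

# Map each parent id to the names of its direct children.
children_by_parent = {}
for row in toy_subtree_df.itertuples(index=False):
    if pd.notna(row.parentId):  # the root has no parent
        children_by_parent.setdefault(int(row.parentId), []).append(row.name)

# Print each parent's name next to its children.
id_to_name = dict(zip(toy_subtree_df['id'], toy_subtree_df['name']))
for parent_id, child_names in children_by_parent.items():
    print(id_to_name[parent_id], '->', child_names)
```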

## Congrats, you've gotten started using the Cognite Data Platform! What's next?

You can find ideas for what to model or explore on http://openindustrialdata.com/.

