<a href="https://colab.research.google.com/github/naderkhash/Capstone-Three/blob/main/v2_Introduction_to_Cognite_Python_SDK_DE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Read the Cognite Learn content before running code examples.

##1. Environment Set Up

###Install the Cognite SDK package

If you receive the following errors:

`ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.`

`ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.`

You can disregard them and do not need to click "Restart Runtime".

In [None]:
!pip install "cognite-sdk>=1.1.10"
!pip install --upgrade numpy

###Import other required packages

In [None]:
%matplotlib inline

import os
from datetime import datetime, timedelta
from getpass import getpass

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

from cognite.client import CogniteClient

### Connect to Cognite Data Fusion
All queries to the Cognite API are sent through this client object.

When prompted for your API key, use the stored key generated previously in the course.

In [None]:
client = CogniteClient(api_key=getpass("Open Industrial Data API-KEY: "),
                       project="publicdata", client_name="OID_example")

##2. Retrieving Lists of Assets

###List assets
The `client.assets.list()` function retrieves up to `limit` assets (20 in the call below) and returns them as an `AssetList`.

In [None]:
client.assets.list(limit=20)

###Search Assets
The `client.assets.search()` function allows you to search by a specific property of the asset, including its name, parent, etc.

###Fuzzy Search by name
A search by name also returns assets whose names are similar to the query but do not match it exactly.

In [None]:
asset_name = "23-HA-9103"
assets = client.assets.search(name=asset_name)
assets[:5]
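
To build intuition for what "similar in name" means, the sketch below ranks a few made-up asset names by string similarity using Python's standard `difflib` module. This is only a local analogy: the real matching happens server-side in CDF and its scoring is not described here. All names except `23-HA-9103` are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical asset names; only "23-HA-9103" appears in this notebook.
candidate_names = ["45-PT-0001", "23-HA-9114", "23-HA-9103-M01", "23-HA-9103"]

# Rank candidates by similarity to the query and keep the three closest.
# NOTE: difflib is a stand-in here, not the algorithm CDF uses.
matches = get_close_matches("23-HA-9103", candidate_names, n=3, cutoff=0.6)
print(matches)
```

The exact match ranks first, near matches follow, and the unrelated name falls below the cutoff.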

###Specific Search
The `client.assets.retrieve()` function returns the same information for a single asset, identified by its ID or external ID.

In [None]:
asset_id = [a.id for a in assets if a.name==asset_name][0]
client.assets.retrieve(id=asset_id)
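
Because the search is fuzzy, indexing the filtered list with `[0]` raises an `IndexError` when no result matches the name exactly. A slightly more defensive pattern is sketched below. The `FakeAsset` class and the `find_exact_id` helper are hypothetical stand-ins so the snippet runs without a CDF connection.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeAsset:
    """Hypothetical stand-in for an SDK asset; only `id` and `name` are modeled."""
    id: int
    name: str


def find_exact_id(assets: Iterable[FakeAsset], name: str) -> Optional[int]:
    """Return the id of the first asset whose name matches exactly, else None."""
    return next((a.id for a in assets if a.name == name), None)


# Made-up search results: one near match and one exact match.
results = [FakeAsset(id=1, name="23-HA-9103-M01"), FakeAsset(id=2, name="23-HA-9103")]

exact_id = find_exact_id(results, "23-HA-9103")
missing_id = find_exact_id(results, "no-such-asset")
print(exact_id, missing_id)
```

Returning `None` lets the caller decide how to handle a missing asset instead of failing inside the list comprehension.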

##3. Events


As with assets, we can list events. We will not cover events in depth here, but they can also be filtered and searched.

In [None]:
client.events.list(limit=7)

##4. Asset Hierarchy and Relationships

We now retrieve the main asset of interest together with all of its descendants. The main asset is listed first, followed by its descendants in the subsequent rows.

In [None]:
subtree = client.assets.retrieve_subtree(id=asset_id)
subtree[:5]
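
Conceptually, a subtree is the root asset followed by everything reachable from it through parent links. The sketch below reproduces that ordering on a tiny in-memory hierarchy. The `collect_subtree` helper and the asset IDs are hypothetical, and the traversal order used by the SDK may differ.

```python
from collections import defaultdict, deque

# Hypothetical (asset_id, parent_id) pairs; None marks a root asset.
links = [(1, None), (2, 1), (3, 1), (4, 2), (5, None)]


def collect_subtree(root_id, links):
    """Return root_id followed by all of its descendants, breadth-first."""
    children = defaultdict(list)
    for asset_id, parent_id in links:
        children[parent_id].append(asset_id)

    ordered, queue = [], deque([root_id])
    while queue:
        current = queue.popleft()
        ordered.append(current)
        queue.extend(children[current])
    return ordered


subtree_ids = collect_subtree(1, links)
print(subtree_ids)
```

Asset 5 is a separate root, so it does not appear in the subtree of asset 1.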

##5. Collecting Time Series Data

###Compile a list of time series objects under the asset
For each of the assets in the subtree we retrieved, we get the associated time series objects and merge them into a single `TimeSeriesList` object.

In [None]:
all_timeseries = subtree.time_series()
print(len(all_timeseries),'time series in subtree')
all_timeseries[:5]

If you are curious about which asset a time series is attached to, you can retrieve more information about the asset with the `retrieve` function. Note that the property is called `asset_id`, following typical Python style, while `assetId` is used in the underlying API objects and tabular outputs.

In [None]:
client.assets.retrieve(id=all_timeseries[0].asset_id)
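
The naming difference is mechanical: camelCase names in the API correspond to snake_case attributes in Python. A minimal sketch of that mapping follows, using two hypothetical helper functions (the SDK has its own internal conversion, which is not shown here).

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase name such as 'assetId' to 'asset_id'."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def to_camel_case(name: str) -> str:
    """Convert a snake_case name such as 'asset_id' to 'assetId'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


print(to_snake_case("assetId"), to_snake_case("externalId"))
print(to_camel_case("asset_id"), to_camel_case("external_id"))
```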

###View datapoints for specific time series
Datapoints are retrieved by the external ID of a time series, shown in the `externalId` column of the time series listing above.

In [None]:
client.datapoints.retrieve(external_id="pi:160184", start="10d-ago", end="now")[:10]
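
Strings such as `"10d-ago"` describe a point in time relative to now. The hypothetical `parse_relative_time` helper below sketches how such a string can be resolved to a `datetime`. It illustrates the format only and is not the SDK's own parser; the unit letters `s`, `m`, `h`, `d` and `w` are assumed.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit letters and the matching timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_relative_time(text: str, now: datetime) -> datetime:
    """Resolve 'now' or '<N><unit>-ago' relative to the given reference time."""
    if text == "now":
        return now
    match = re.fullmatch(r"(\d+)([smhdw])-ago", text)
    if match is None:
        raise ValueError(f"Unsupported time string: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})


# A fixed reference time keeps the example reproducible.
reference = datetime(2018, 8, 31, tzinfo=timezone.utc)
start = parse_relative_time("10d-ago", reference)
end = parse_relative_time("now", reference)
print(start.isoformat(), end.isoformat())
```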

##6. Use Cases of CDF Data

###Collect datapoints from CDF
The external IDs of the input time series are defined in the `in_ts_exids` list below, and the external ID of the output time series in `out_ts_exid`.

In [None]:
in_ts_exids = ["pi:160182", "pi:160697", "pi:160882"]
out_ts_exid = "pi:160696"

###Retrieve Data Points from CDF
Most object types in the Python SDK have a `to_pandas` method, which converts the result to a pandas DataFrame. To retrieve aggregates, such as the average over each time period, you can use `client.datapoints.retrieve_dataframe` to get a pandas DataFrame directly.

In [None]:
ts_exids = in_ts_exids + [out_ts_exid]

train_start_date = datetime(2018, 8, 1)

train_end_date = train_start_date + timedelta(days=30)

datapoints_df = client.datapoints.retrieve_dataframe(external_id=ts_exids,
                                                     aggregates=['average'],
                                                     granularity='1m',
                                                     start=train_start_date,
                                                     end=train_end_date,
                                                     include_aggregate_name=False
                                                     )
# Forward-fill gaps; fillna(method="ffill") is deprecated in recent pandas versions.
datapoints_df.ffill(inplace=True)
datapoints_df.head()
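
Forward filling replaces each missing value with the most recent earlier value in the same column. Values missing at the very start stay missing, because there is nothing to carry forward. A minimal sketch on made-up data (the column name and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up one-minute averages with gaps, indexed by timestamp.
index = pd.date_range("2018-08-01", periods=5, freq="min")
demo_df = pd.DataFrame({"pi:demo": [np.nan, 1.0, np.nan, np.nan, 4.0]}, index=index)

# Carry the last observed value forward into each gap.
filled_df = demo_df.ffill()
print(filled_df)
```

If leading gaps matter for a later modelling step, they need separate handling, for example by dropping those rows.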

There are also shortcuts for filling the DataFrame when using interpolation or count aggregates, such as the `complete="fill"` option used in the next cell. Note that without the `include_aggregate_name=False` option, the aggregate name is appended to the external ID to form a unique column name.

In [None]:
datapoints_df_interp = client.datapoints.retrieve_dataframe(external_id=ts_exids[0:2],
                                                           aggregates=['interpolation','count'],
                                                           granularity='1h',
                                                           start=train_start_date,
                                                           end=train_end_date,
                                                           complete="fill"
                                                          )
datapoints_df_interp.head()
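
When the aggregate name is part of the column label, it is often convenient to split it off again, for example to group columns by time series. The sketch below assumes labels of the form `<external id>|<aggregate>`. The `|` separator is an assumption, so check `datapoints_df_interp.columns` before relying on it.

```python
# Hypothetical column labels, assuming "<external id>|<aggregate>" naming.
columns = [
    "pi:160182|interpolation",
    "pi:160182|count",
    "pi:160697|interpolation",
    "pi:160697|count",
]

# Group aggregate names by external id. rpartition splits on the last "|",
# so an external id that itself contains "|" is still kept intact.
aggregates_by_exid = {}
for label in columns:
    exid, _, aggregate = label.rpartition("|")
    aggregates_by_exid.setdefault(exid, []).append(aggregate)

print(aggregates_by_exid)
```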

###Visualize the Time Series Data
The bottom-right plot is the output time series; the other three are the inputs used to estimate it.

In [None]:
cols = datapoints_df.columns

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10,10))
for i, col in enumerate(cols):
    datapoints_df.loc[:, [col]].plot(ax=axes.ravel()[i])