# Part 2: Time Series Forecasting with the HPE Machine Learning Development Environment
author: Andrew Mendez, andrew.mendez@hpe.com

Version: 0.0.1

Date: 3.14.23

In this notebook, we create an end-to-end MLOps pipeline to train and deploy a deep learning model for time series forecasting. Specifically, the model forecasts energy demand.

In [1]:
# Connect to deployed pachyderm application
!pachctl connect pachd-peer.pachyderm.svc.cluster.local:30653
# list current projects
!pachctl list project

Context 'pachd-peer.pachyderm.svc.cluster.local:30653' set as active
ACTIVE PROJECT                    CREATED       DESCRIPTION
       pipeline-finbert           10 months ago Tyler - Legacy FinBERT PDK Demo
       pipeline-hpe-fsi-retrieval 9 months ago  Tyler - PDK demo of for HPE FSI RAG/Retrieval Demo
       pdk-dogs-and-cats          6 months ago  Tyler - Legacy Brain Dogs and Cats Demo
       pdk-brain-mri              6 months ago  Tyler - Legacy Brain MRI PDK Demo
       starcoder                  5 months ago  Tyler - A fine-tuned version of the huggingface starcoder model with titanML serving
       playground_tp              5 months ago  Tanguy -  Pachyderm tutorial
       object-detection-demo      4 months ago  -
       Test-TensorRT-LLM          4 months ago  Tanguy - Testing model optimization with TensorRT-LLM and deployment with Triton
       deploy-rag                 3 months ago  -
       deploy-rag-finetune        3 months ago  -
       test-catdog-pipe-test     

In [2]:
!pachctl config get active-context

pachd-peer.pachyderm.svc.cluster.local:30653


In [3]:
!pachctl version

COMPONENT           VERSION             
pachctl             2.9.0               
pachd               2.9.0               


In [4]:
# Create the Pachyderm project
!pachctl create project energy-forecasting
# Set pachctl's active context to the energy-forecasting project
!pachctl config update context --project energy-forecasting

project "energy-forecasting" already exists
project energy-forecasting already exists
editing the currently active context "pachd-peer.pachyderm.svc.cluster.local:30653"


In [5]:
# !wget https://raw.githubusercontent.com/unit8co/darts/master/datasets/energy_dataset.csv

In [6]:
!pachctl create repo data

In [7]:
!pachctl create repo code

In [8]:
!pachctl create repo model

In [9]:
!pachctl put file model@master: -r -f models/TCN_model.pt
!pachctl put file model@master: -r -f models/TCN_model.pt.ckpt



In [10]:
%%capture
!pachctl put file data@master: -r -f data/energy_dataset.csv

In [11]:
%%capture
!pachctl put file code@master: -r -f code/

## Process

In [12]:
%%writefile process.yaml
pipeline:
    name: 'process'
description: 'Preprocess the energy dataset with scripts/process.sh'
input:
    cross:
        - pfs: 
            repo: 'data'
            branch: 'master'
            glob: '/'
        - pfs: 
            repo: 'code'
            branch: 'master'
            glob: '/'
transform:
    image: mendeza/python38_process:0.2
    cmd: 
        - '/bin/sh'
    stdin: 
    # - "while :; do echo 'Hello'; sleep 5 ; done"
    - 'pip install darts;'
    - 'bash /pfs/code/code/scripts/process.sh'
autoscaling: False

Overwriting process.yaml


In [13]:
!pachctl create pipeline -f process.yaml
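
Both inputs in `process.yaml` use `glob: '/'`, so each repo reaches the pipeline as a single datum, and the `cross` of two single-datum inputs yields exactly one datum. In other words, the whole pipeline runs once per new commit. The sketch below is a simplified model of that behaviour, not Pachyderm's implementation; `datums_for_glob`, `cross`, and the file listings are hypothetical names and values chosen for illustration.

```python
from itertools import product


def datums_for_glob(paths, glob):
    """Simplified model of how a glob pattern splits a repo into datums.

    '/'  -> one datum containing every file in the repo
    '/*' -> one datum per top-level file or directory
    """
    if glob == "/":
        return [tuple(sorted(paths))]
    if glob == "/*":
        top_level = sorted({"/" + p.strip("/").split("/")[0] for p in paths})
        return [(entry,) for entry in top_level]
    raise ValueError(f"glob {glob!r} is not covered by this sketch")


def cross(*inputs):
    """A cross input pairs every datum of one input with every datum of the others."""
    return list(product(*inputs))


# Hypothetical repo listings, used only to illustrate the counting.
data_files = ["/energy_dataset.csv", "/energy_dataset2.csv"]
code_files = ["/code/scripts/process.sh", "/code/scripts/train.sh"]

# As in process.yaml: both inputs use glob '/'.
datums = cross(datums_for_glob(data_files, "/"), datums_for_glob(code_files, "/"))
print(len(datums))

# For comparison: glob '/*' on the data repo would give one datum per CSV file.
per_file = cross(datums_for_glob(data_files, "/*"), datums_for_glob(code_files, "/"))
print(len(per_file))
```

With `glob: '/'` the job always reprocesses the full repo contents, which fits a workflow where the model is retrained on all available data.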

## Train

In [14]:
%%writefile train.yaml
pipeline:
    name: 'train'
description: 'Train the forecasting model on the processed data with scripts/train.sh'
input:
    cross:
        - pfs: 
            repo: 'process'
            branch: 'master'
            glob: '/'
        - pfs: 
            repo: 'code'
            branch: 'master'
            glob: '/'
        - pfs: 
            repo: 'model'
            branch: 'master'
            glob: '/'
transform:
    image: mendeza/python38_process:0.2
    cmd: 
        - '/bin/sh'
    stdin: 
    # - "while :; do echo 'Hello'; sleep 5 ; done"
    - 'pip install darts'
    - 'bash /pfs/code/code/scripts/train.sh'
autoscaling: False

Overwriting train.yaml


In [15]:
!pachctl create pipeline -f train.yaml
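
Inside the `train` container, each input repo is mounted under `/pfs/<input name>` (the input name defaults to the repo name), and whatever the job writes to `/pfs/out` is committed to the pipeline's output repo, here `train`. This explains the doubled `code/code` in the script path: the first `code` is the mount point of the repo, the second is the directory inside it. The sketch below only does the path arithmetic; `pfs_path` is a hypothetical helper, not part of any Pachyderm client.

```python
from pathlib import PurePosixPath


def pfs_path(input_name, repo_path):
    """Map a path inside an input repo to its location in the pipeline container."""
    return PurePosixPath("/pfs") / input_name / repo_path.lstrip("/")


# Input repos of the train pipeline, as listed in train.yaml.
TRAIN_INPUTS = ["process", "code", "model"]

# Mount point of each input repo.
for name in TRAIN_INPUTS:
    print(pfs_path(name, "/"))

# The training script lives at /code/scripts/train.sh inside the code repo.
print(pfs_path("code", "/code/scripts/train.sh"))

# Files written here become the next commit of the `train` output repo.
OUTPUT_DIR = PurePosixPath("/pfs/out")
print(OUTPUT_DIR)
```

The same rule applies downstream: the `val` pipeline reads the trained model from `/pfs/train`, because `train` is the name of the repo it takes as input.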

## Validate

In [16]:
%%writefile val.yaml
pipeline:
    name: 'val'
description: 'Validate the trained model on the processed data with scripts/val.sh'
input:
    cross:
        - pfs: 
            repo: 'process'
            branch: 'master'
            glob: '/'
        - pfs: 
            repo: 'train'
            branch: 'master'
            glob: '/'
        - pfs: 
            repo: 'code'
            branch: 'master'
            glob: '/'
transform:
    image: mendeza/python38_process:0.2
    cmd: 
        - '/bin/sh'
    stdin:
    # - "while :; do echo 'Hello'; sleep 5 ; done"
    - 'pip install darts'
    - 'bash /pfs/code/code/scripts/val.sh'
autoscaling: False

Overwriting val.yaml


In [17]:
!pachctl create pipeline -f val.yaml

## Retrain end to end with new data

In [None]:
%%capture
!pachctl put file data@master: -r -f data/energy_dataset2.csv
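
Because the pipelines are chained through their input repos, this single commit to `data` is enough to rerun `process`, then `train`, then `val`. A minimal sketch of that reasoning follows, using the input lists from the three specs above. `downstream_of` is a hypothetical helper, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Input repos of each pipeline, copied from process.yaml, train.yaml and val.yaml.
PIPELINE_INPUTS = {
    "process": ["data", "code"],
    "train": ["process", "code", "model"],
    "val": ["process", "train", "code"],
}


def downstream_of(repo, pipeline_inputs):
    """Return the pipelines affected by a new commit to `repo`, in run order."""
    # static_order() lists every node after all of its inputs.
    run_order = TopologicalSorter(pipeline_inputs).static_order()
    affected = {repo}
    triggered = []
    for node in run_order:
        is_pipeline = node in pipeline_inputs
        if is_pipeline and affected & set(pipeline_inputs[node]):
            affected.add(node)
            triggered.append(node)
    return triggered


print(downstream_of("data", PIPELINE_INPUTS))   # new dataset
print(downstream_of("model", PIPELINE_INPUTS))  # new pretrained checkpoint
```

A commit to `model` skips `process` under this model, since `process` reads only `data` and `code`.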

## Clean up Environment

In [None]:
!pachctl delete pipeline val
!pachctl delete pipeline train
!pachctl delete pipeline process
!pachctl delete repo data
!pachctl delete repo code
!pachctl delete repo model
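
The deletion order above is the reverse of the run order: a pipeline is removed only after everything that consumes its output, and the source repos go last. The sketch below derives the same command list from the pipeline inputs, assuming Python 3.9 or later for `graphlib`; `cleanup_commands` is a hypothetical helper that only builds strings and does not call `pachctl`.

```python
from graphlib import TopologicalSorter

# Input repos of each pipeline, copied from the three pipeline specs.
PIPELINE_INPUTS = {
    "process": ["data", "code"],
    "train": ["process", "code", "model"],
    "val": ["process", "train", "code"],
}
# Repos created by hand with `pachctl create repo`.
SOURCE_REPOS = ["data", "code", "model"]


def cleanup_commands(pipeline_inputs, source_repos):
    """Build delete commands: consumers first, then the source repos."""
    run_order = [
        node
        for node in TopologicalSorter(pipeline_inputs).static_order()
        if node in pipeline_inputs
    ]
    commands = [f"pachctl delete pipeline {name}" for name in reversed(run_order)]
    commands += [f"pachctl delete repo {name}" for name in source_repos]
    return commands


commands = cleanup_commands(PIPELINE_INPUTS, SOURCE_REPOS)
for command in commands:
    print(command)
```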