add financial time series example #252
```
vcs.xml
kubeflow_ks_app/*
kubeflow_repo/*
```
Using Kubeflow for Financial Time Series
====================

This repository is linked to a series of blogposts.
The first blogpost can be found at https://blog.ml6.eu/using-kubeflow-for-financial-time-series-18580ef5df0b.

The open-source project [Kubeflow](https://www.kubeflow.org/) is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
This repository walks through the exploration, training and serving of a machine learning model by leveraging Kubeflow's main components.
As an example, we will use the [Machine Learning with Financial Time Series Data](https://cloud.google.com/solutions/machine-learning-with-financial-time-series-data) use case.
### Pre-requisites
**Note:** You will need a Linux or Mac environment with Python 3.6.x and the following requirements installed:
* Install the [Cloud SDK](https://cloud.google.com/sdk/)
* Install [gcloud](https://cloud.google.com/sdk/gcloud/)
* Install [ksonnet](https://ksonnet.io/#get-started) version 0.11.0 or later
* Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
* Access to a Google Cloud project and its GKE resources

### Installing Kubeflow on GKE
We will first create a cluster named 'kubeflow' on Google Kubernetes Engine.
```
gcloud container clusters create kubeflow --zone [ZONE] --machine-type n1-standard-2 --scopes=https://www.googleapis.com/auth/cloud-platform
gcloud container clusters get-credentials kubeflow --zone [ZONE] --project [PROJECT-NAME]
kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=[EMAIL-ADDRESS]
```

The set of commands above creates a cluster, connects our local environment to the cluster and changes the permissions on the cluster to allow Kubeflow to run properly.
Note that we have to define the scopes explicitly since Kubernetes v1.10. If we dropped the scopes argument, the machines in the cluster would be heavily restricted in using Google Cloud APIs to connect to other Google Cloud services such as Google Cloud Storage, BigQuery etc.
Our cluster is now up and running and properly set up to install Kubeflow.
```
export KUBEFLOW_VERSION=0.2.2
curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh | bash
```
Note that it requires only a single command to deploy Kubeflow to an existing cluster.
Once the script is finished, you should see two new folders in your directory.
```
$ tree
.
├── kubeflow_ks_app
└── kubeflow_repo
```
Next, we can easily verify the status of the pods by running ```kubectl get pods```:
```
NAME                                        READY   STATUS
ambassador-788655d76f-8fkpv                 2/2     Running
ambassador-788655d76f-fvjld                 2/2     Running
ambassador-788655d76f-t4xqt                 2/2     Running
centraldashboard-6665fc46cb-jrwvf           1/1     Running
spartakus-volunteer-9c546f4db-5pztt         1/1     Running
tf-hub-0                                    1/1     Running
tf-job-dashboard-644865ddff-fbwnw           1/1     Running
tf-job-operator-v1alpha2-75bcb7f5f7-fgf9h   1/1     Running
```

You can verify that the corresponding services were also created by checking the output of ```kubectl get svc```.

### Exploration via tf-hub
The tf-hub component of Kubeflow allows us to leverage Jupyter notebooks to investigate the data and start building a feasible machine learning model for our problem statement.
In order to access this component, we will set up port-forwarding between the tf-hub pod and our local machine.
```
POD=`kubectl get pods --selector=app=tf-hub | awk '{print $1}' | tail -1`
kubectl port-forward $POD 8000:8000 2>&1 >/dev/null &
```
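As an aside, the shell pipeline above simply takes the first column of the last non-empty line of the ```kubectl get pods``` output. A Python sketch of the same logic (the helper name ```last_pod_name``` is made up for illustration and is not part of this repository):

```python
def last_pod_name(kubectl_output):
    """Mimic `awk '{print $1}' | tail -1`: return the first column
    of the last non-empty line of `kubectl get pods` output."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    return lines[-1].split()[0]

sample = (
    "NAME       READY   STATUS\n"
    "tf-hub-0   1/1     Running\n"
)
print(last_pod_name(sample))  # -> tf-hub-0
```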
You should now be able to access JupyterHub via ```localhost:8000```.
After filling in a username and password, you are prompted to select parameters to spawn a Jupyter notebook.
In this case, we will just set the ```image``` to ```gcr.io/kubeflow-images-public/tensorflow-1.8.0-notebook-cpu:v0.2.1``` and hit spawn.

Once the Jupyter notebook instance is ready, we will launch a terminal to install the required packages that our code uses.
In order to launch a terminal, click 'new' > 'terminal' and subsequently install the required packages.
```
pip3 install google-cloud-bigquery==1.5.0 --user
```

Our Jupyter notebook instance should now be ready to run the code from the slightly adjusted notebook ```Machine Learning with Financial Time Series Data.ipynb```, which is available in this repository.
You can simply upload the notebook and walk through it step by step to better understand the problem and the suggested solution(s).
In this example, the goal is not to focus on the notebook itself but rather on how this notebook is translated into more scalable training jobs and, later on, serving.

### Training at scale with tf-jobs
The next step is to re-factor the notebook code into Python scripts, which can then be containerized into a Docker image.
In the folder ```tensorflow-model``` of this repository you can find these scripts together with a ```Dockerfile```.
Subsequently, we will build a Docker image on Google Cloud by running the following command:

```
cd tensorflow-model/
gcloud builds submit --tag gcr.io/<project-name>/<image-name>/cpu:v1 .
```

Now that we have an image ready in Google Container Registry, it's time to launch a training job.

```
cd kubeflow_ks_app/
ks generate tf-job-simple train
```
This ksonnet prototype needs to be slightly modified to fit our needs. You can simply use an updated version of this prototype by copying it from the repository:
```
cp ../tensorflow-model/CPU/train.jsonnet ./components/train.jsonnet
```

Now we need to define the parameters which are currently set as placeholders in the training job prototype.
Note that this introduces a flexible and clean way of working: by changing the parameters you can easily launch another training job without maintaining multiple YAML files in your repository.

```
export TRAINING_NAME=trainingjob1
ks param set train name $TRAINING_NAME
ks param set train namespace "default"
export TRAIN_PATH=gcr.io/<project-name>/<image-name>/cpu:v1
ks param set train image $TRAIN_PATH
ks param set train workingDir "opt/workdir"
ks param set train args -- python,run_train.py,--model=FlatModel,--epochs=50000,--version=1
```
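After these commands, the ```components/params.libsonnet``` file should contain an entry along these lines (an illustrative sketch; the exact layout generated by ksonnet may differ slightly):

```
{
  components: {
    train: {
      name: "trainingjob1",
      namespace: "default",
      image: "gcr.io/<project-name>/<image-name>/cpu:v1",
      workingDir: "opt/workdir",
      args: "python,run_train.py,--model=FlatModel,--epochs=50000,--version=1",
    },
  },
}
```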

You can verify the parameter settings in ```params.libsonnet``` in the directory ```kubeflow_ks_app/components```.
This file keeps track of all the parameters used to instantiate components from prototypes.
In order to submit our tf-job, we need to add our cloud cluster as an environment.
Next, we can launch the tf-job to our cloud environment and follow the progress via the logs of the pod.

```
ks env add cloud
ks apply cloud -c train
POD_NAME=$(kubectl get pods --selector=tf_job_key=$TRAINING_NAME,tf-replica-type=worker \
      --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
kubectl logs -f $POD_NAME
```

In the logs you can see that the trained model is being exported to Google Cloud Storage. This saved model will be used later on for serving requests. With these parameters, the accuracy is approximately 67%.
Alternatively, you can also port-forward the ambassador and check the progress on ```localhost:8080```.
The ambassador functions as the central point of the Kubeflow deployment and monitors the different components. From the ambassador, you can access the Jupyter notebooks, tf-jobs and Kubernetes resources.

```
POD=`kubectl get pods --selector=service=ambassador | awk '{print $1}' | tail -1`
kubectl port-forward $POD 8080:80 2>&1 >/dev/null &
```

### Deploy and serve with tf-serving
Once the model is trained, the next step is to deploy it and serve requests.
Kubeflow comes with a tf-serving module which you can use to deploy your model with only a few commands.
```
ks generate tf-serving serve --name=tf-serving
BUCKET_NAME=saved-models
ks param set serve modelPath gs://$BUCKET_NAME/
ks apply cloud -c serve
```

After running these commands, a deployment and a service will be launched on Kubernetes that enable you to easily send requests to get predictions from your model.
We will do a local test via gRPC to illustrate how to get results from this serving component. Once the pod is up, we can set up port-forwarding to our localhost.
```
POD=`kubectl get pods --selector=app=tf-serving | awk '{print $1}' | tail -1`
kubectl port-forward $POD 9000:9000 2>&1 >/dev/null &
```

Now the only thing we need to do is send a request to ```localhost:9000``` with the input the saved model expects, and it will return a prediction.
The saved model expects a time series of closing stock prices and outputs the prediction as a 0 (S&P closes positive) or 1 (S&P closes negative), together with the version of the saved model, which was recorded when the model was saved.
Let's start with a script that populates a request with random numbers to test the service.
```
cd ../tensorflow-model
pip3 install numpy tensorflow-serving-api
python request.py
```

The output should return an integer, 0 or 1 as explained above, and a string that represents the version.
There is another script available that builds a more practical request, with time series data of closing stock prices for a certain date.
In this script, the same date is used as the one used at the end of the notebook ```Machine Learning with Financial Time Series Data.ipynb```, for comparison purposes.

```
pip3 install pandas
python request.py
```

The response should indicate that the S&P index is expected to close positive, but from the actual data (which is inspected in the notebook mentioned above) we can see that it actually closed negative that day.
Let's get back to training and see if we can improve our accuracy.
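Interpreting the raw response boils down to mapping the two pieces described above, the 0/1 prediction and the version string, onto a readable summary. A minimal sketch (the helper name ```interpret_response``` is made up for illustration and does not exist in the repository):

```python
def interpret_response(prediction, version):
    """Map the serving response to a readable summary:
    0 means the S&P closes positive, 1 means it closes negative."""
    label = "positive" if prediction == 0 else "negative"
    return "model version %s predicts the S&P index closes %s" % (version, label)

print(interpret_response(1, "2"))  # -> model version 2 predicts the S&P index closes negative
```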

### Running another tf-job and serving update
Most likely a single training job will never be sufficient. It is very common to create a continuous training pipeline to iterate training and verify the output.
Submitting another training job with Kubeflow is very easy.
By simply adjusting the parameters, we can instantiate another component from the ```train.jsonnet``` prototype.
This time, we will train a more complex neural network with several hidden layers.
```
cd ../kubeflow_ks_app
export TRAINING_NAME=trainingjob2
ks param set train name $TRAINING_NAME
ks param set train args -- python,run_train.py,--model=DeepModel,--epochs=50000,--version=2
ks apply cloud -c train
```

Verify the training progress via the logs, or use ```kubectl describe tfjobs trainingjob2```.
```
POD_NAME=$(kubectl get pods --selector=tf_job_key=$TRAINING_NAME,tf-replica-type=worker \
      --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
kubectl logs -f $POD_NAME
```

You should notice that the training now takes a few minutes instead of less than one minute; however, the accuracy is now close to 77%.
Our training job uploads the trained model to the serving directory of our running tf-serving component.
The tf-serving component watches this serving directory and automatically loads the model in the folder with the highest version (as an integer).
Since the newer version has a higher number than the previous one, our tf-serving component should have switched to this new model.
Let's see if we get a response from the new version and if the new model gets it right this time.

```
cd ../tensorflow-model
python request.py
```

The response returns the updated version number '2' and predicts the correct output 1, which means the S&P index closes negative. Hurray!
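The version-selection behavior described above can be sketched as follows (illustrative only; TF Serving implements this internally, and ```pick_serving_version``` is a hypothetical helper):

```python
from pathlib import Path

def pick_serving_version(base_dir):
    """Return the highest integer-named subdirectory of base_dir,
    mirroring how tf-serving decides which saved model to load."""
    versions = [int(p.name) for p in Path(base_dir).iterdir()
                if p.is_dir() and p.name.isdigit()]
    return max(versions) if versions else None
```

With version folders ```1``` and ```2``` present in the serving directory, this returns 2, matching the switch to the newer model we just observed.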

### Expose
In the previous section we used a prediction service on our Kubeflow cluster which, by default, is only available from within the cluster.
It is also possible to expose the service so it can be accessed from outside the cluster.
The following command will expose the prediction service on a fixed external IP address and create a load balancer to orchestrate the traffic.

```
kubectl expose deployment tf-serving --type=LoadBalancer --port=9000
```

### Running tf-job on a GPU

Can we also run the tf-job on a GPU?
Imagine the training job does not just take a few minutes but rather hours or days.
We will need another image that installs ```tensorflow-gpu``` and has the necessary drivers.

```
cp GPU/Dockerfile ./Dockerfile
gcloud builds submit --tag gcr.io/<project-name>/<image-name>/gpu:v1 .
export TRAIN_PATH_GPU=gcr.io/<project-name>/<image-name>/gpu:v1
```

The ```train.jsonnet``` will also need to be slightly adjusted to make it flexible enough to run on GPUs as well.
You can simply copy the adjusted jsonnet by running the following command.

```
cp GPU/train.jsonnet ../kubeflow_ks_app/components/train.jsonnet
```

Now we have to add a GPU node pool to our cloud cluster.
We will create a separate pool and install the necessary NVIDIA GPU device drivers.
For more instructions on how to handle GPUs on Kubernetes, see https://cloud.google.com/kubernetes-engine/docs/how-to/gpus.

```
gcloud container node-pools create gpu-pool --accelerator type=nvidia-tesla-k80,count=1 --zone europe-west1-b --cluster kubeflow --num-nodes 1 --min-nodes 1 --max-nodes 1 --enable-autoscaling --scopes=https://www.googleapis.com/auth/cloud-platform
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```

Subsequently, the parameters must be updated to fit the new prototype in ```train.jsonnet```.
```
cd ../kubeflow_ks_app
export TRAINING_NAME=trainingjob3
ks param set train name $TRAINING_NAME
ks param delete train image
ks param set train cpuImage $TRAIN_PATH
ks param set train gpuImage $TRAIN_PATH_GPU
ks param set train num_gpu 1
```
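Conceptually, the essential change in the GPU variant of the prototype is picking the GPU image and requesting a GPU resource on the container, something along these lines (an illustrative sketch based on the parameters above and standard Kubernetes GPU scheduling, not the actual contents of ```GPU/train.jsonnet```):

```
containers: [
  {
    args: args,
    image: if params.num_gpu > 0 then params.gpuImage else params.cpuImage,
    name: "tensorflow",
    workingDir: "/opt/workdir",
    resources: {
      limits: {
        "nvidia.com/gpu": params.num_gpu,
      },
    },
  },
],
```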
Next, we can deploy the tf-job to our GPU by simply running the following command.
```
ks apply cloud -c train
```
Once the pod is up, you can check the logs and verify that the training time is significantly reduced compared to the previous tf-job.
```
POD_NAME=$(kubectl get pods --selector=tf_job_key=$TRAINING_NAME,tf-replica-type=worker \
      --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
kubectl logs -f $POD_NAME
```

### Clean up
```
kubectl delete tfjobs trainingjob1
kubectl delete tfjobs trainingjob2
kubectl delete tfjobs trainingjob3
ks delete cloud -c train
ks delete cloud -c serve
ks delete cloud -c kubeflow-core
gcloud container clusters delete kubeflow
```

### Summary
Kubeflow makes it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere, and supports the full lifecycle of an ML product, including iteration via Jupyter notebooks.
It removes the need for expertise in a large number of areas, reducing the barrier to entry for developing and maintaining ML products.

If you want to get started with Kubeflow, make sure to check out the updated information on https://www.kubeflow.org.

```
FROM python:3.6
MAINTAINER Sven Degroote "sven.degroote@ml6.eu"

# Install python - pip
RUN apt-get update -y && \
    apt-get install -y python3-pip

# Install packages (steps to work around package incompatibility)
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 install tensorflow
RUN pip3 install --upgrade google-cloud-bigquery

COPY . /opt/workdir
WORKDIR /opt/workdir
```

```
local env = std.extVar("__ksonnet/environments");
local params = std.extVar("__ksonnet/params").components.train;

local k = import "k.libsonnet";

local name = params.name;
local namespace = env.namespace;
local image = params.image;

local argsParam = params.args;
local args =
  if argsParam == "null" then
    []
  else
    std.split(argsParam, ",");

local tfjob = {
  apiVersion: "kubeflow.org/v1alpha2",
  kind: "TFJob",
  metadata: {
    name: name,
    namespace: namespace,
  },
  spec: {
    tfReplicaSpecs: {
      Worker: {
        replicas: 1,
        template: {
          spec: {
            containers: [
              {
                args: args,
                image: image,
                name: "tensorflow",
                workingDir: "/opt/workdir",
              },
            ],
            restartPolicy: "OnFailure",
          },
        },
      },
    },
  },
};

k.core.v1.list.new([
  tfjob,
])
```
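A note on the ```argsParam``` handling in this prototype: the literal string ```"null"``` yields an empty argument list, and anything else is split on commas into the container's argument list. In Python terms (an illustrative sketch; ```ksonnet_args``` is a made-up name):

```python
def ksonnet_args(args_param):
    """Mirror the jsonnet logic: the literal string "null" means no
    arguments, otherwise split the comma-separated string into a list."""
    if args_param == "null":
        return []
    return args_param.split(",")

print(ksonnet_args("python,run_train.py,--model=FlatModel,--epochs=50000,--version=1"))
```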

```
FROM python:3.6
MAINTAINER Sven Degroote "sven.degroote@ml6.eu"

# Install python - pip
RUN apt-get update -y && \
    apt-get install -y python3-pip

# Install packages
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 install tensorflow
RUN pip3 install --upgrade google-cloud-bigquery

COPY . /opt/workdir
WORKDIR /opt/workdir
```

```
FROM tensorflow/tensorflow:1.8.0-devel-gpu-py3
MAINTAINER Sven Degroote "sven.degroote@ml6.eu"

RUN apt-get update -y && \
    apt-get install -y build-essential python-numpy python-dev python3-pip python3 wget

RUN pip3 install --upgrade pip setuptools

# Install packages
COPY requirements.txt .
RUN pip3 install -r requirements.txt
RUN pip3 uninstall protobuf -y
RUN pip3 install tensorflow-gpu==1.8.0
RUN pip3 install --upgrade google-cloud-bigquery

# Verify that tensorflow is installed
RUN pip3 show tensorflow

COPY . /opt/workdir
WORKDIR /opt/workdir
```