add financial time series example #252
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here (e.g. "I signed it!") and we'll verify it.

What to do if you already signed the CLA

Individual signers
* It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data (https://cla.developers.google.com/clas) and verify that your email is set on your git commits (https://help.github.com/articles/setting-your-email-in-git/).

Corporate signers
* Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (public version: https://opensource.google.com/docs/cla/#troubleshoot).
* The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data (https://cla.developers.google.com/clas) and verify that your email is set on your git commits (https://help.github.com/articles/setting-your-email-in-git/).
* The email used to register you as an authorized contributor must also be attached to your GitHub account (https://github.com/settings/emails).

I signed it!
CLAs look good, thanks!
Thanks for the PR @Svendegroote91! 🎉
Two things:
1. Would you mind doing another close read through the docs to look for grammatical (e.g. capitalization) issues or any other minor fixes (missing links)?
2. What do you think about adding some tests? Maybe that would help this example stay up to date as dependencies change. A local test would probably be sufficient, but extra credit if you can test the submission and resulting status of a job on Kubeflow!
In the meantime I'll try running your example, this is really cool!
financial_time_series/README.md
Outdated
### Installing kubeflow on GKE
We will first create a cluster named 'kubeflow' on google kubernetes engine.
```
gcloud container clusters create kubeflow --zone [ZONE] --machine-type n1-standard-2 --scopes=https://www.googleapis.com/auth/cloud-platform
```
Is it possible these setup steps will change in the future? If not, no problem. If so, it would be good to reference the central kubeflow setup docs to save on maintenance.
Normally these steps should not change. For deploying Kubeflow, I see that the steps on the getting-started webpage are slightly different, but in essence they do the same as the single-line deployment script. I will add the central Kubeflow setup link as another option for how to deploy Kubeflow.
financial_time_series/README.md
Outdated
```
$ tree
.
├── kubeflo_ks_app
```
typo?
Indeed - I will correct this.
Once the Jupyter Notebook instance is ready, we will launch a terminal to install the required packages that our code uses.
In order to launch a terminal, click 'new' > 'terminal' and subsequently install the required packages.
```
pip3 install google-cloud-bigquery==1.5.0 --user
```
You may not need the --user flag with the most recent notebook container, @pdmack?
Good call. Also, just FYI, this might have been related to a change to the base Jupyter containers that got pushed out, because I just started seeing errors like this myself, and to get around them you have to use the --user flag.
financial_time_series/README.md
Outdated
pip3 install google-cloud-bigquery==1.5.0 --user
```

Our Jupyter Notebook instance should be ready to run the code from the slightly adjusted notebook which is available on the github repository ```Machine Learning with Financial Time Series Data.ipynb```.
Missing link?
I will reference the link from the original Google page at the top of the README, in the introduction section, and clarify here that the slightly adjusted notebook is available in this repository.
financial_time_series/README.md
Outdated
Our Jupyter Notebook instance should be ready to run the code from the slightly adjusted notebook which is available on the github repository ```Machine Learning with Financial Time Series Data.ipynb```.
You can simply upload the notebook and walk through it step by step to better understand the problem and suggested solution(s).
In this blogpost, the goal is not focus on the notebook itself but rather on how this notebook is being translated in more scalable training jobs and later on serving.
In this example? Perhaps you're posting this as a blog post elsewhere which is great!
financial_time_series/README.md
Outdated
In this blogpost, the goal is not focus on the notebook itself but rather on how this notebook is being translated in more scalable training jobs and later on serving.

### Training at scale with tf-jobs
The next step is to 're-factor' the notebook code into python scripts which can then be containerized onto a docker image.
Capital P Python, capital D Docker.
Noted :-)
financial_time_series/README.md
Outdated
### Training at scale with tf-jobs
The next step is to 're-factor' the notebook code into python scripts which can then be containerized onto a docker image.
In the folder ```tensorflow-model``` on the github repository you can find these scripts together with a ```Dockerfile```.
Product and company names in caps, e.g. GitHub, Google Cloud Storage, Google Container Registry. Acronyms in all caps (e.g. GRPC, GKE). Other examples throughout.
Noted
financial_time_series/README.md
Outdated
====================

This repository is linked to a series of blogposts.
The first blogpost can be found on: https://blog.ml6.eu/using-kubeflow-for-financial-time-series-18580ef5df0b
Link instead of raw URL (or maybe this renders fine idk).
Noted - I removed the link to the blogpost as it's not relevant in this context.
financial_time_series/README.md
Outdated
The open-source project [Kubeflow](https://www.kubeflow.org/) is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
This repository walks through the exploration, training and serving of a machine learning model by leveraging Kubeflow's main components.
As an example, we will use the [Machine Learning with Financial Time Series Data](https://cloud.google.com/solutions/machine-learning-with-financial-time-series-data) use case
trailing period.
Noted
/ok-to-test
@Svendegroote91 Looks like the CI is showing linter errors; you can find the .pylintrc file we use here: https://github.com/kubeflow/examples/blob/master/.pylintrc
Hi Christopher,
Thanks for reviewing this PR.
I'll have a look at your comments and the results from the CI test and update the code accordingly.
Indeed, I need to reword some blocks because this code was originally written to support a blogpost :-).
I'll also look into adding some tests.
Best regards,
Sven
Hi Christopher,
I ran the pylint script and cleaned up all the remarks.
[screenshot]
However, when I push to the remote, I see that Pylint still has remarks on my code:
[screenshot]
What is the reason for this discrepancy, and what can I do to make the test pass?
Best regards,
Sven
@Svendegroote91 Are you providing the .pylintrc file (in the root of kubeflow/examples) when running your pylint command locally? I think the command is
Force-pushed dc25169 to 494b121.
Force-pushed 494b121 to eb9dab1.
Reviewable status: 0 of 17 files reviewed, 15 unresolved discussions (waiting on @cwbeitel, @Svendegroote91, and @zjj2wry)
financial_time_series/README.md, line 74 at r1 (raw file):
Previously, Svendegroote91 wrote…
I will reference the link from the original google page on the top of the README, in the introduction section and clarify here that the slightly adjusted notebook is available in this repository.
How about a relative link to the notebook available in this repo?
financial_time_series/tensorflow_model/models.py, line 1 at r3 (raw file):
"""
Add a license and make doc string single line? Same with other python files. I.e. Python files 😉
financial_time_series/tensorflow_model/models.py, line 8 at r3 (raw file):
class FlatModel():
  """ Model that contains only single layer"""
Did you mean for the doc strings to have leading spaces?
financial_time_series/tensorflow_model/preprocess.py, line 10 at r3 (raw file):
def load_data(tickers):
  """
Maybe on something like this it would be nice to see a description as well (same with others)?
financial_time_series/tensorflow_model/run_train.py, line 25 at r3 (raw file):
def upload_to_storage(bucket, export_path):
This is fine, or you might tgz it and then upload that.
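A minimal sketch of the tgz-then-upload variant suggested here. The helper names `archive_export` and `upload_archive` are made up for illustration; the `bucket` argument is assumed to be a `google.cloud.storage` bucket object, as in the PR's `upload_to_storage`:

```python
import os
import tarfile


def archive_export(export_path):
    """Pack an export directory into a single .tgz next to it."""
    archive_path = export_path.rstrip("/") + ".tgz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the export dir
        tar.add(export_path, arcname=os.path.basename(export_path))
    return archive_path


def upload_archive(bucket, archive_path):
    """Upload the archive as one object instead of many small files.

    `bucket` is assumed to be a google.cloud.storage Bucket instance.
    """
    blob = bucket.blob(os.path.basename(archive_path))
    blob.upload_from_filename(archive_path)
```

Uploading one archive instead of a tree of small files cuts the number of GCS requests, at the cost of needing to unpack on the consumer side.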
financial_time_series/tensorflow_model/run_train.py, line 112 at r3 (raw file):
# create signature for TensorFlow Serving
logging.warning('Exporting model for tensorflow-serving...')
Hmm maybe these warning level logs can be info and the info level logs can be debug?
financial_time_series/tensorflow_model/run_train.py, line 158 at r3 (raw file):
if __name__ == '__main__':
  logging.info("Starting training")
Hmm maybe we can use tf.logging instead?
financial_time_series/tensorflow-model/Dockerfile, line 11 at r1 (raw file):
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 install tensorflow
Same comment as with other tf dockerfiles, can build from base tf image unless you e.g. feel the image is overly large (I think it includes jupyter?)
financial_time_series/tensorflow-model/requirements.txt, line 3 at r1 (raw file):
google-cloud-storage==1.10.0
pandas
numpy
Want to tag these to specific versions?
financial_time_series/tensorflow-model/CPU/Dockerfile, line 1 at r1 (raw file):
FROM python:3.6
You might use from tensorflow/tensorflow...-cpu as you were doing with the GPU container.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 2 at r1 (raw file):
FROM tensorflow/tensorflow:1.8.0-devel-gpu-py3
MAINTAINER Sven Degroote "sven.degroote@ml6.eu"
Good practice but do you think it's necessary to include the maintainer line when this is included in a larger repo? Idk. Looks like the standard for kubeflow is along these lines https://github.com/kubeflow/kubeflow/blob/master/components/jupyterhub/docker/Dockerfile. If someone has a problem with the dockerfile hopefully they'll file an issue in this repo instead of contacting you directly. But you can still include your name and contact info in the primary examples readme and in the readme for this example.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 5 at r1 (raw file):
RUN apt-get update -y && \
  apt-get install -y build-essential python-numpy python-dev python3-pip python3 wget
Are there deps for numpy that are obtained via apt-get that you can't obtain via the standard pip install?
financial_time_series/tensorflow-model/GPU/Dockerfile, line 13 at r1 (raw file):
RUN pip3 install -r requirements.txt
RUN pip3 uninstall protobuf -y
RUN pip3 install tensorflow-gpu==1.8.0
If we're building from the TF GPU base image you can remove this, but it's a no-op at the moment.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 17 at r1 (raw file):
# Verify that tensorflow is installed
RUN pip3 show tensorflow
This fails silently if tf is not installed, how about python -c 'import tensorflow' or something similar?
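The point is that `pip3 show` exits successfully even when it prints nothing, whereas an actual import fails loudly. A hedged sketch of that kind of check (the helper name is made up; in a Dockerfile the equivalent would simply be `RUN python3 -c "import tensorflow"`):

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported, False otherwise."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Fail the build with a non-zero exit code if TensorFlow is missing.
    if not module_available("tensorflow"):
        sys.exit("tensorflow is not installed")
```

Because `sys.exit` with a string message exits non-zero, a `RUN` step using this check would abort the Docker build instead of silently continuing.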
financial_time_series/tensorflow-model/GPU/train.jsonnet, line 1 at r1 (raw file):
local env = std.extVar("__ksonnet/environments");
This is one of the things I was hoping to see unit tested. You can do this with a small Python wrapper that uses ksonnet to generate a job YAML, then uses the Kubernetes Python API (https://github.com/kubernetes-client/python) to submit the job (e.g. https://github.com/cwbeitel/tk/blob/master/tk/kube.py#L629), wait for completion (e.g. https://github.com/cwbeitel/tk/blob/master/tk/kube.py#L557), and then check the resulting status code (i.e. self.assertEqual(results.get("status", {}).get("state", {}), "Succeeded")).
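The wait-and-assert half of that flow can be sketched without a cluster. In this sketch, `wait_for_job`, its `get_status` callable, and the injected `sleep` are all illustrative names (not the actual tk/kube.py code); a real implementation would pass a `get_status` that polls the Kubernetes API for the TFJob's custom-resource status:

```python
import time


def wait_for_job(get_status, timeout_s=600, poll_s=5, sleep=time.sleep):
    """Poll `get_status` until the job reaches a terminal state or times out.

    `get_status` is a zero-argument callable returning a dict shaped like a
    TFJob status, e.g. {"status": {"state": "Running"}}.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = get_status().get("status", {}).get("state")
        if state in ("Succeeded", "Failed"):
            return state
        sleep(poll_s)
    raise TimeoutError("job did not reach a terminal state")


def assert_job_succeeded(get_status):
    """The assertion the review asks for, as a reusable helper."""
    assert wait_for_job(get_status) == "Succeeded"
```

Injecting `get_status` and `sleep` keeps the polling logic unit-testable with fakes, while the production test plugs in real API calls.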
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, 15 unresolved discussions (waiting on @Svendegroote91, @cwbeitel, and @zjj2wry)
financial_time_series/README.md, line 74 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
How about a relative link to the notebook available in this repo?
I will add a relative link.
financial_time_series/tensorflow_model/models.py, line 1 at r3 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Add a license and make doc string single line? Same with other python files. I.e. Python files 😉
OK
financial_time_series/tensorflow_model/models.py, line 8 at r3 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Did you mean for the doc strings to have leading spaces?
Done.
financial_time_series/tensorflow_model/preprocess.py, line 10 at r3 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Maybe on something like this it would be nice to see a description as well (same with others)?
Done.
financial_time_series/tensorflow_model/run_train.py, line 112 at r3 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Hmm maybe these warning level logs can be info and the info level logs can be debug?
Done.
financial_time_series/tensorflow_model/run_train.py, line 158 at r3 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Hmm maybe we can use tf.logging instead?
I'll remove this, it's obsolete as we have logging in the training script itself and from tf.
financial_time_series/tensorflow-model/Dockerfile, line 11 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Same comment as with other tf dockerfiles, can build from base tf image unless you e.g. feel the image is overly large (I think it includes jupyter?)
Done.
financial_time_series/tensorflow-model/requirements.txt, line 3 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Want to tag these to specific versions?
Done.
financial_time_series/tensorflow-model/CPU/Dockerfile, line 1 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
You might use from tensorflow/tensorflow...-cpu as you were doing with the GPU container.
Done.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 2 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Good practice but do you think it's necessary to include the maintainer line when this is included in a larger repo? Idk. Looks like the standard for kubeflow is along these lines https://github.com/kubeflow/kubeflow/blob/master/components/jupyterhub/docker/Dockerfile. If someone has a problem with the dockerfile hopefully they'll file an issue in this repo instead of contacting you directly. But you can still include your name and contact info in the primary examples readme and in the readme for this example.
Done.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 5 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Are there deps for numpy that are obtained via apt-get that you can't obtain via the standard pip install?
Done.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 13 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
If we're building from the tf gpu base image you can remove this but it a no-op at the moment.
Done.
financial_time_series/tensorflow-model/GPU/Dockerfile, line 17 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
This fails silently if tf is not installed, how about python -c 'import tensorflow' or something similar?
Done.
financial_time_series/tensorflow-model/GPU/train.jsonnet, line 1 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
This is one of the things I was hoping to see unit tested. You can do this with a small python wrapper that uses ksonnet to generate a job YAML then uses the Kubernetes Python api (https://github.com/kubernetes-client/python) to submit (e.g. https://github.com/cwbeitel/tk/blob/master/tk/kube.py#L629) the job, wait for completion (e.g. https://github.com/cwbeitel/tk/blob/master/tk/kube.py#L557), and then check the resulting status code (i.e. self.assertEqual(results.get("status", {}).get("state", {}), "Succeeded")).
I'll have a look and see if I can put something together that performs these actions.
How do you prefer to structure this test inside the code base? Should this be a separate script (e.g. unit_test.py) that is executed manually and throws an AssertionError if not working?
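The generate/submit/check loop described above can be sketched without a live cluster. Both helpers below are hypothetical, and the actual submission step (here only mentioned in a comment) would use the Kubernetes Python client's `BatchV1Api`, omitted so the sketch stays self-contained:

```python
def build_job_manifest(name, image, command):
    """Build a minimal Kubernetes Job manifest as a plain dict.

    In the real test this dict would come from `ks show` output instead.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command},
                    ],
                    "restartPolicy": "Never",
                },
            },
        },
    }

def job_succeeded(status):
    """Check the status dict that a poll/wait loop would return."""
    return status.get("status", {}).get("state") == "Succeeded"

# Submission (not shown) would be roughly:
#   kubernetes.client.BatchV1Api().create_namespaced_job(namespace, manifest)
# followed by polling until job_succeeded(...) or a timeout.
manifest = build_job_manifest(
    "train-smoke", "tf-model:latest", ["python", "run_train.py"])
```

The assertion in the surrounding comment maps directly onto `job_succeeded`, which is the part worth unit testing locally.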
Reviewable status: 0 of 19 files reviewed, 21 unresolved discussions (waiting on @cwbeitel, @Svendegroote91, and @zjj2wry)
financial_time_series/README.md, line 26 at r6 (raw file):
git clone https://github.com/Svendegroote91/examples.git
Why are you cloning your personal repo?
Why aren't you just telling users to follow the instructions on kubeflow.org?
For GKE
https://v0-2.kubeflow.org/docs/started/getting-started-gke/
Note I think you're using 0.2 so you should point them at the 0.2 branch
financial_time_series/README.md, line 33 at r6 (raw file):
Note that we had to define the scopes specifically since Kubernetes v1.10.
If you follow our instructions for deploying Kubeflow, it should be deployed with the proper scopes.
financial_time_series/README.md, line 41 at r6 (raw file):
export KUBEFLOW_VERSION=0.2.2
I think 0.2.5 is the latest for the 0.2 series
financial_time_series/README.md, line 131 at r6 (raw file):
You can verify the parameter settings in the params.libsonnet in the directorykubeflow_ks_app/components.
missing a space after "directory"
financial_time_series/README.md, line 224 at r6 (raw file):
### Expose In the previous section we used a prediction service on our Kubeflow cluster which is by default only available from within the cluster.
Can you drop this section please?
Exposing an IP publicly is not secure and I would prefer not to talk about that in our docs. Anyone who understands the security implications of doing so likely already knows how to do so.
Predictions can be made securely from outside the cluster using IAP.
financial_time_series/README.md, line 257 at r6 (raw file):
gcloud container node-pools create gpu-pool --accelerator type=nvidia-tesla-k80,count=1 --zone europe-west1-b --cluster kubeflow --num-nodes 1 --min-nodes 1 --max-nodes 1 --enable-autoscaling --scopes=https://www.googleapis.com/auth/cloud-platform
The preferred approach would be to update the deployment manager config for your cluster. But we can fix that in a follow-on PR.
financial_time_series/tensorflow-model/GPU/train.jsonnet, line 1 at r1 (raw file):
Previously, Svendegroote91 wrote…
I'll have a look and see if I can put something together that performs these actions.
How do you prefer to structure this test inside the code base? Should this be a separate script (e.g. unit_test.py) that is executed manually and throws an AssertionError if not working?
Use the unittest module and the assert functions to test values.
That way, the test will fail if the values don't match.
Once you have that we can add it to our E2E tests.
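A minimal unittest-based test along the lines requested might look like the following; note that `train_test_split` here is a stand-in re-implementation for illustration, since the real function lives in `preprocess.py`:

```python
import unittest

def train_test_split(rows, train_test_ratio=0.8):
    """Stand-in for preprocess.train_test_split: split rows by ratio."""
    cutoff = int(len(rows) * train_test_ratio)
    return rows[:cutoff], rows[cutoff:]

class TrainTestSplitTest(unittest.TestCase):

    def test_split_sizes(self):
        train, test = train_test_split(list(range(10)), 0.8)
        self.assertEqual(len(train), 8)
        self.assertEqual(len(test), 2)

    def test_split_preserves_order(self):
        train, test = train_test_split(list(range(10)), 0.5)
        self.assertEqual(train + test, list(range(10)))

# Run the suite programmatically; failed assertions mark the run as failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TrainTestSplitTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Saved as e.g. `preprocess_test.py` (importing the real function instead of the stand-in), it would run via `python -m unittest preprocess_test` and exit non-zero on failure, which is what an E2E harness needs.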
Please create an OWNERS file and add yourself as an approver. I made some minor comments. /assign @texasmichelle Overall this LGTM; do you want to review? |
Reviewable status: 0 of 19 files reviewed, 19 unresolved discussions (waiting on @cwbeitel, @Svendegroote91, and @zjj2wry)
financial_time_series/README.md, line 74 at r1 (raw file):
Previously, Svendegroote91 wrote…
I will add a relative link.
Let's make this a relative link to the kubeflow/examples location of the notebook, e.g. here.
Instead of to your personal repo which might change.
financial_time_series/tensorflow_model/metrics.py, line 1 at r5 (raw file):
"""
Let's have all the module docs follow """This is a description fitting on one line...\n maybe more details ...""" and in general the formatting style illustrated here: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
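A hedged example of the Google/napoleon docstring style being requested; the module summary, function name, and arguments are illustrative, not taken from the actual `metrics.py`:

```python
"""Calculates confusion-matrix-derived metrics for the classifier.

More detail can follow the one-line summary after a blank line.
"""

def precision(true_pos, false_pos):
    """Compute precision from confusion-matrix counts.

    Args:
        true_pos (int): Number of true positives.
        false_pos (int): Number of false positives.

    Returns:
        float: Precision, i.e. true_pos / (true_pos + false_pos).
    """
    return true_pos / (true_pos + false_pos)
```

The key conventions are the one-line summary ending in a period, a blank line before the detail, and `Args:`/`Returns:` sections that sphinxcontrib-napoleon can render.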
financial_time_series/tensorflow_model/metrics.py, line 7 at r5 (raw file):
def tf_calc_confusion_matrix(actuals, predictions): """
Description please
financial_time_series/tensorflow_model/metrics.py, line 67 at r5 (raw file):
def tf_calc_confusion_metrics(true_pos, true_neg, false_pos, false_neg): """
Description please
financial_time_series/tensorflow_model/metrics.py, line 98 at r5 (raw file):
def tf_confusion_metrics(model, actual_classes, session, feed_dict): """ calculates confusion metrics when training
Caps and leading space
financial_time_series/tensorflow_model/preprocess.py, line 42 at r5 (raw file):
def preprocess_data(closing_data): """
Description please
financial_time_series/tensorflow_model/preprocess.py, line 112 at r5 (raw file):
def train_test_split(training_test_data, train_test_ratio=0.8): """
Description please
financial_time_series/tensorflow_model/run_train.py, line 46 at r5 (raw file):
def run_training(args): """
Description please
financial_time_series/tensorflow_model/run_train.py, line 129 at r5 (raw file):
def main(): """ main """
No description needed, should be covered by the module doc string.
financial_time_series/tensorflow_model/serving_requests/request.py, line 13 at r5 (raw file):
""" function that gets the prediction for a certain date in the test set Args:
Docs please
financial_time_series/tensorflow_model/serving_requests/request_helper.py, line 12 at r5 (raw file):
def send_request(input_tensor): """
Description and return docs please.
financial_time_series/tensorflow-model/Dockerfile, line 11 at r1 (raw file):
Previously, Svendegroote91 wrote…
Done.
Let's use 2018 and The Kubeflow Authors as they do in https://github.com/kubeflow/kubeflow/blob/430d4f7ae16af389ac570182af519f6571f160cf/build/boilerplate/boilerplate.txt
financial_time_series/tensorflow-model/GPU/train.jsonnet, line 1 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Use the unittest module and the assert functions to test values.
This test will fail.
Once you have that we can add it to our E2E tests.
As far as structure I would like to see a *_test.py file for each primary python file, i.e. metrics_test.py, models_test.py, preprocess_test.py, and train_test.py. Don't worry about testing request.py; that's a little dicey and we have an issue open for it. In the case of training, start with local training with a trivial number of steps and network size, and you probably only need to worry about testing the ksonnet generate/submit/check status loop after having that.
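The "trivial number of steps and network size" suggestion amounts to a smoke test: run a tiny training loop and assert that it completes and the loss decreases. The pure-Python gradient-descent loop below is only a stand-in for the real TensorFlow model, to show the shape of such a test without a TF dependency:

```python
def train_tiny_model(steps=20, lr=0.1):
    """Fit y = w * x on a toy dataset by gradient descent on the MSE loss."""
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # true weight is 2.0
    w, losses = 0.0, []
    for _ in range(steps):
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        losses.append(loss)
        w -= lr * grad
    return w, losses

w, losses = train_tiny_model()
# Smoke-test style assertions: training ran and the loss went down.
assert len(losses) == 20
assert losses[-1] < losses[0]
```

In `train_test.py` the same assertions would wrap a call to the real `run_training` with a tiny step count, which keeps the test fast enough for CI.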
To summarize my remaining comments are (1) function and module docs and aesthetics and (2) local unit tests, and (3) tests of a remote run using config generated by ksonnet. @jlewi @texasmichelle What do you think of dropping (3) from this PR given that so many of the other examples have this need and this could be solved with some shared tooling (instead of asking data scientists to do it)? E.g. ks_test_runner.py component_name |
I think its @texasmichelle 's call |
Would you mind just addressing the remaining comments of mine on the instructions? Since we haven't been very diligent about requiring tests I don't think we should require them for this PR (although they would be nice:). We would certainly appreciate help keeping this sample up to date and refining it over time. |
/assign @jlewi |
Reviewable status: 0 of 20 files reviewed, 19 unresolved discussions (waiting on @cwbeitel, @Svendegroote91, @jlewi, and @zjj2wry)
financial_time_series/README.md, line 74 at r1 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Let's make this a relative link to the kubeflow/examples location of the notebook, e.g. here.
Instead of to your personal repo which might change.
Done.
financial_time_series/README.md, line 26 at r6 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Why are you cloning your personal repo?
Why aren't you just telling users to follow the instructions on kubeflow.org? For GKE:
https://v0-2.kubeflow.org/docs/started/getting-started-gke/ Note I think you're using 0.2 so you should point them at the 0.2 branch
Updated to the kubeflow/examples repo & linked to the v0.2 branch.
I use the minikube deployment script instead of the general instructions for GKE but I have now clearly mentioned why and where the deployment instructions for GKE can be found (section right below 'Deploying Kubeflow on GKE').
financial_time_series/README.md, line 33 at r6 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
If you follow our instructions for deploying Kubeflow; it should be deployed with the proper scopes
Correct, this is related to the previous comment. I now mention that when using the deployment script for GKE, the cluster will be created automatically with the appropriate scopes.
financial_time_series/README.md, line 41 at r6 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
I think 0.2.5 is the latest for the 0.2 series
I have not yet tested the full workflow with 0.2.5 and would like to stick with 0.2.2 for now, until I find some time to test whether everything still works with v0.2.5.
financial_time_series/README.md, line 131 at r6 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
missing a space after "directory"
Done.
financial_time_series/README.md, line 224 at r6 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Can you drop this section please?
Exposing an ip publicly is not secure and I would prefer not to talk about that in our docs. Anyone who likely understands the security implications of doing so likely already knows how to do so.
Predictions can be made securely from outside the cluster using IAP.
Done.
financial_time_series/README.md, line 257 at r6 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
The preffered approach would be to update the deployment manager config for your cluster. But we can fix that in a follow on PR.
I would be interested to learn how to do it with the deployment manager config, and can include it in a follow-up PR. Where can I find information on this topic?
financial_time_series/tensorflow_model/metrics.py, line 1 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Let's have all the module docs follow """This is a description fitting on one line...\n maybe more details ...""" and in general the formatting style illustrated here: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
Done.
financial_time_series/tensorflow_model/metrics.py, line 7 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Description please
Done.
financial_time_series/tensorflow_model/metrics.py, line 67 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Description please
Done.
financial_time_series/tensorflow_model/metrics.py, line 98 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Caps and leading space
Done.
financial_time_series/tensorflow_model/preprocess.py, line 42 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Description please
Done.
financial_time_series/tensorflow_model/preprocess.py, line 112 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Description please
Done.
financial_time_series/tensorflow_model/run_train.py, line 46 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Description please
Done.
financial_time_series/tensorflow_model/run_train.py, line 129 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
No description needed, should be covered by the module doc string.
Done.
financial_time_series/tensorflow_model/serving_requests/request.py, line 13 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Docs please
Done.
financial_time_series/tensorflow_model/serving_requests/request_helper.py, line 12 at r5 (raw file):
Previously, cwbeitel (Christopher Beitel) wrote…
Description and return docs please.
Done.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jlewi. The full list of commands accepted by this bot can be found here. The pull request process is described here. |
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
Just seeing this now - thanks @Svendegroote91 & @chrisheecho for putting this together! Agreed on not requiring tests, but it's a high priority for follow-on work. I support shared tooling for use with other examples. |
And thanks to @cwbeitel for the detailed reviews! |
* add financial time series example
* fix ReadMe comments
* fix PyLint remarks
* clean up based on PR remarks
* Completing docstrings and fixing PR remarks
This PR includes the addition of a financial time series example.
The initiative to develop a financial time series demo was discussed with: