
Friction log for TFX Chicago taxi cab example on minikube #594

Closed
jlewi opened this issue Apr 5, 2018 · 6 comments

Comments

@jlewi
Contributor

jlewi commented Apr 5, 2018

We'd love for some folks to try running the TFX Chicago Taxi Example
https://github.com/tensorflow/model-analysis/tree/master/examples/chicago_taxi

and provide a log of any pain points or problems they encounter while trying to run it.

We are especially interested in the experience of deploying and running on minikube, as that's where folks will start.

The goal would be to start with deploying minikube and capture any information needed to properly set up minikube for Kubeflow (e.g., disk size). As noted in #502, we want to provide guidance on how to set up minikube for Kubeflow.
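As a starting point, minikube's resource flags let you size the VM up front. The flag values below are illustrative assumptions, not tested recommendations; the flags themselves (--cpus, --memory, --disk-size) are standard minikube options:

```shell
# Start minikube sized for Kubeflow. Values here are guesses for
# illustration, not verified minimums; the default 20 GB disk proved
# too small in the reports below, so give it extra headroom.
minikube start \
  --cpus=4 \
  --memory=8192 \
  --disk-size=40g
```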

Then go through all the steps in the example for running locally and note any problems or difficulties you encounter.

It would also be fantastic to run it on other Kubeflow distributions.

@Maerville
Contributor

Maerville commented Apr 6, 2018

  1. If, while installing Kubeflow, you also want to install JupyterHub, the default 20 GB minikube disk size is not enough because of the large tf image.
  2. kubectl get svc -n=${NAMESPACE} -- the output this command shows in the tutorial is outdated; it now also includes the k8s dashboard and the Ambassador services.
  3. ks param set kubeflow-core jupyterNotebookPVCMount /home/jovyan/work -- this path is also funny.

@jlewi
Contributor Author

jlewi commented Apr 7, 2018

@Maerville Thank you very much.

The Jupyter notebook always runs as user jovyan, which is why you have that path.

@Maerville
Contributor

  1. Used tf image version 1.7.0-cpu.

  2. Opened a terminal in JupyterHub and cloned the repository https://github.com/tensorflow/model-analysis

  3. Opened chicago_taxi_tfma_local_playground.ipynb
    Visualization: Plots -- the kernel died.
    Initial resources were 1 CPU and 1Gi of memory. The kernel also died with 2Gi and 3Gi; 4Gi finally worked.
    Shut down the notebook.

  4. Opened a terminal and ran the steps in the "Running the local example" section.
    bash ./preprocess_local.sh -- fails because it uses the bare "python" command: python points to /opt/conda/bin/python, while the Jupyter kernels point to /opt/conda/envs/ipykernel_py2/bin/python. The latter is the one with all the dependencies.
    After changing the python path in preprocess_local.sh, the script worked.
    The same change was needed in train_local.sh and process_tfma_local.sh.

  5. Ran all cells in chicago_taxi_tfma.ipynb successfully.

  6. bash ./start_model_server_local.sh
    ERROR: This script requires Docker
    [kinda obvious haha]
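The script fix in step 4 can be sketched as a one-time rewrite of the three scripts. The interpreter path is the one reported in this log (it may differ on other images), and the sed pattern assumes the scripts invoke a bare "python" at the start of a line:

```shell
# Point the three local scripts at the Python env that actually has
# the dependencies (path as reported in this log; adjust as needed).
PYBIN=/opt/conda/envs/ipykernel_py2/bin/python
for f in preprocess_local.sh train_local.sh process_tfma_local.sh; do
  # Only rewrite lines that invoke the bare "python" command.
  if [ -f "$f" ]; then
    sed -i "s|^python |$PYBIN |" "$f"
  fi
done
```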

Conclusion:

  1. Minimum 4Gi of memory for JupyterHub (even 2Gi was enough for training; the problems started when plotting the results).
  2. Need to alias python to a "normal" python (normal meaning one that has all the dependencies).

This test was performed on Minikube.
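Conclusion 2 could also be handled without editing each script: put the env that has the dependencies first on PATH. The env path below is the one reported in this log and is an assumption for other images; a minimal sketch:

```shell
# Make the fully-provisioned Python resolve first on PATH.
# The env path is the one reported in this log; it may differ
# on other Jupyter images.
export PATH=/opt/conda/envs/ipykernel_py2/bin:$PATH
```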

@Maerville
Contributor

Performed the same test on kubeadm (one machine, with 4 CPUs and 16Gi provided to JupyterHub).
Everything works successfully, but the python link also had to be changed.

@jlewi
Contributor Author

jlewi commented Apr 18, 2018

Fantastic, thank you.

@jlewi
Contributor Author

jlewi commented May 10, 2018

/close
