| 1 | ++++ |
| 2 | +title = "Kubeflow Overview" |
| 3 | +description = "How Kubeflow helps you organize your ML workflow" |
| 4 | +weight = 10 |
| 5 | ++++ |
| 6 | + |
| 7 | +<!-- |
| 8 | +Note for authors: The source of the diagrams is held in Google Slides decks, |
| 9 | +in the "Doc diagrams" folder in the public Kubeflow shared drive. |
| 10 | +--> |
| 11 | + |
| 12 | +This guide introduces Kubeflow as a platform for developing and deploying a |
| 13 | +machine learning (ML) system. |
| 14 | + |
| 15 | +Kubeflow is a platform for data scientists who want to build and experiment with |
| 16 | +ML pipelines. Kubeflow is also for ML engineers and operational teams who want |
| 17 | +to deploy ML systems to various environments for development, testing, and |
| 18 | +production-level serving. |
| 19 | + |
| 20 | +## Conceptual overview |
| 21 | + |
| 22 | +Kubeflow is *the ML toolkit for Kubernetes*. |
| 23 | +The following diagram shows Kubeflow as a platform for arranging the |
| 24 | +components of your ML system on top of Kubernetes: |
| 25 | + |
| 26 | +<img src="/docs/images/kubeflow-overview-platform-diagram.svg" |
| 27 | + alt="An architectural overview of Kubeflow on Kubernetes" |
| 28 | + class="mt-3 mb-3 border border-info rounded"> |
| 29 | + |
| 30 | +Kubeflow builds on [Kubernetes](https://kubernetes.io/) as a system for |
| 31 | +deploying, scaling, and managing complex systems. |
| 32 | + |
| 33 | +Using the Kubeflow configuration interfaces (see [below](#interfaces)), you can |
| 34 | +specify the ML tools required for your workflow. Then you can deploy the |
| 35 | +workflow to various cloud, local, and on-premises platforms for experimentation and |
| 36 | +for production use. |
| 37 | + |
| 38 | +## Introducing the ML workflow |
| 39 | + |
| 40 | +When you develop and deploy an ML system, the ML workflow typically consists of |
| 41 | +several stages. Developing an ML system is an iterative process. |
| 42 | +You need to evaluate the output of various stages of the ML workflow, and apply |
| 43 | +changes to the model and parameters when necessary to ensure the model keeps |
| 44 | +producing the results you need. |
| 45 | + |
| 46 | +For the sake of simplicity, the following diagram |
| 47 | +shows the workflow stages in sequence. The arrow at the end of the workflow |
| 48 | +points back into the flow to indicate the iterative nature of the process: |
| 49 | + |
| 50 | +<img src="/docs/images/kubeflow-overview-workflow-diagram-1.svg" |
| 51 | + alt="A typical machine learning workflow" |
| 52 | + class="mt-3 mb-3 border border-info rounded"> |
| 53 | + |
| 54 | +Looking at the stages in more detail: |
| 55 | + |
| 56 | +* In the experimental phase, you develop your model based on initial |
| 57 | + assumptions, and test and update the model iteratively to produce the |
| 58 | + results you're looking for: |
| 59 | + |
| 60 | + * Identify the problem you want the ML system to solve. |
| 61 | + * Collect and analyze the data you need to train your ML model. |
| 62 | + * Choose an ML framework and algorithm, and code the initial version of your |
| 63 | + model. |
| 64 | + * Experiment with the data and with training your model. |
| 65 | + * Tune the model hyperparameters to ensure the most efficient processing and the |
| 66 | + most accurate results possible. |
| 67 | + |
| 68 | +* In the production phase, you deploy a system that performs the following |
| 69 | + processes: |
| 70 | + |
| 71 | + * Transform the data into the format that your training system needs. |
| 72 | + To ensure that your model behaves consistently during training and |
| 73 | + prediction, the transformation process must be the same in the experimental |
| 74 | + and production phases. |
| 75 | + * Train the ML model. |
| 76 | + * Serve the model for online prediction or for running in batch mode. |
| 77 | + * Monitor the model's performance, and feed the results into your processes |
| 78 | + for tuning or retraining the model. |
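
The requirement that data transformation be identical in the experimental and production phases can be illustrated with a minimal sketch. It is not part of Kubeflow: the function names `fit_transform_params` and `transform`, the sample values, and the choice of mean/standard-deviation normalization are all assumptions made for illustration. The idea is to compute the transformation parameters once from the training data, persist them, and reuse the same function at serving time.

```python
import json
import statistics
from typing import Dict, List


def fit_transform_params(values: List[float]) -> Dict[str, float]:
    """Compute normalization parameters from the training data only (hypothetical helper)."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def transform(value: float, params: Dict[str, float]) -> float:
    """Apply the same normalization in both training and serving (hypothetical helper)."""
    # Fall back to a scale of 1.0 for a constant feature to avoid dividing by zero.
    scale = params["stdev"] or 1.0
    return (value - params["mean"]) / scale


# Experimental phase: fit the parameters and transform the training data.
training_values = [2.0, 4.0, 6.0, 8.0]  # illustrative data, not from the source
params = fit_transform_params(training_values)
train_features = [transform(v, params) for v in training_values]

# Persist the parameters alongside the model so that serving can reload them.
serialized = json.dumps(params)

# Production phase: reload the parameters and reuse the same transform function.
serving_params = json.loads(serialized)
serving_feature = transform(4.0, serving_params)
```

Because serving reuses the persisted parameters and the same `transform` function, an input seen during training maps to the same feature value at prediction time.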
| 79 | + |
| 80 | +## Kubeflow components in the ML workflow |
| 81 | + |
| 82 | +The next diagram adds Kubeflow to the workflow, showing which Kubeflow |
| 83 | +components are useful at each stage: |
| 84 | + |
| 85 | +<img src="/docs/images/kubeflow-overview-workflow-diagram-2.svg" |
| 86 | + alt="Where Kubeflow fits into a typical machine learning workflow" |
| 87 | + class="mt-3 mb-3 border border-info rounded"> |
| 88 | + |
| 89 | +To learn more, read the following guides to the Kubeflow components: |
| 90 | + |
| 91 | +* Kubeflow includes services for spawning and managing |
| 92 | + [Jupyter notebooks](/docs/notebooks/). Use notebooks for interactive data |
| 93 | + science and experimenting with ML workflows. |
| 94 | + |
| 95 | +* [Kubeflow Pipelines](/docs/pipelines/pipelines-overview/) is a platform for |
| 96 | + building, deploying, and managing multi-step ML workflows based on Docker |
| 97 | + containers. |
| 98 | + |
| 99 | +* Kubeflow offers several [components](/docs/components/) that you can use |
| 100 | + to build your ML training, hyperparameter tuning, and serving workloads across |
| 101 | + multiple platforms. |
| 102 | + |
| 103 | +## Example of a specific ML workflow |
| 104 | + |
| 105 | +The following diagram shows a simple example of a specific ML workflow that you |
| 106 | +can use to train and serve a model on the MNIST dataset: |
| 107 | + |
| 108 | +<img src="/docs/images/kubeflow-gcp-e2e-tutorial-simplified.svg" |
| 109 | + alt="ML workflow for training and serving an MNIST model" |
| 110 | + class="mt-3 mb-3 border border-info rounded"> |
| 111 | + |
| 112 | +For details of the workflow and to run the system yourself, see the |
| 113 | +[end-to-end tutorial for Kubeflow on GCP](/docs/gke/gcp-e2e/). |
| 114 | + |
| 115 | +<a id="interfaces"></a> |
| 116 | +## Kubeflow interfaces |
| 117 | + |
| 118 | +This section introduces the interfaces that you can use to interact with |
| 119 | +Kubeflow and to build and run your ML workflows on Kubeflow. |
| 120 | + |
| 121 | +### Kubeflow user interface (UI) |
| 122 | + |
| 123 | +The Kubeflow UI looks like this: |
| 124 | + |
| 125 | +<img src="/docs/images/central-ui.png" |
| 126 | + alt="The Kubeflow UI" |
| 127 | + class="mt-3 mb-3 border border-info rounded"> |
| 128 | + |
| 129 | +The UI offers a central dashboard that you can use to access the components |
| 130 | +of your Kubeflow deployment. Read |
| 131 | +[how to access the UI](/docs/other-guides/accessing-uis/). |
| 132 | + |
| 133 | +### Kubeflow command line interface (CLI) |
| 134 | + |
| 135 | +**Kfctl** is the Kubeflow CLI that you can use to install and configure |
| 136 | +Kubeflow. Read about kfctl in the guide to |
| 137 | +[configuring Kubeflow](/docs/other-guides/kustomize/). |
| 138 | + |
| 139 | +The Kubernetes CLI, **kubectl**, is useful for running commands against your |
| 140 | +Kubeflow cluster. You can use kubectl to deploy applications, inspect and manage |
| 141 | +cluster resources, and view logs. Read about kubectl in the [Kubernetes |
| 142 | +documentation](https://kubernetes.io/docs/tasks/tools/install-kubectl/). |
| 143 | + |
| 144 | +### Kubeflow APIs and SDKs |
| 145 | + |
| 146 | +Various components of Kubeflow offer APIs and Python SDKs. See the following |
| 147 | +sets of reference documentation: |
| 148 | + |
| 149 | +* [Kubeflow reference docs](/docs/reference/) for guides to the Kubeflow |
| 150 | + Metadata API and SDK, the PyTorchJob CRD, and the TFJob CRD. |
| 151 | +* [Pipelines reference docs](/docs/pipelines/reference/) for the Kubeflow |
| 152 | + Pipelines API and SDK, including the Kubeflow Pipelines domain-specific |
| 153 | + language (DSL). |
| 154 | +* [Fairing reference docs](/docs/fairing/reference/) for the Kubeflow Fairing |
| 155 | + SDK. |
| 156 | + |
| 157 | +## Next steps |
| 158 | + |
| 159 | +Learn how to [install Kubeflow](/docs/started/getting-started/) in |
| 160 | +your chosen environment (local, cloud, or on-premises). |