install kubeflow
huataihuang committed Aug 30, 2023
1 parent d6fe7d6 commit d2fb2f4
Showing 12 changed files with 102 additions and 4 deletions.
Original file line number Diff line number Diff line change
@@ -147,4 +147,5 @@ Namespace LimitRange
- `Resource Management for Pods and Containers <https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/>`_
- `Kubernetes best practices: Resource requests and limits <https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits>`_
- `A Deep Dive into Kubernetes Metrics — Part 3 Container Resource Metrics <https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e66>`_
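- `Understanding Kubernetes Limits and Requests <https://sysdig.com/blog/kubernetes-limits-requests/>`_ This sysdig blog post is very vivid and provides a good illustrated example; it is worth studying
- `Understanding Kubernetes Limits and Requests <https://sysdig.com/blog/kubernetes-limits-requests/>`_ This sysdig blog post is very vivid and provides a good illustrated example; it is worth studying. sysdig also has a similar article, `Kubernetes OOM and CPU Throttling <https://sysdig.com/blog/troubleshoot-kubernetes-oom/>`_, which uses diagrams to explain how OOM kills and CPU throttling work; to be studied later
- `Configure Default Memory Requests and Limits for a Namespace <https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/>`_ Official Kubernetes documentation that supplements the method described above for setting a default memory limit on a namespace; the same documentation series also includes examples of CPU constraints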
1 change: 1 addition & 0 deletions source/kubernetes/concepts/scheduling/index.rst
@@ -11,6 +11,7 @@ Kubernetes Scheduling
assign_pod_node.rst
kube-scheduling-framework.rst
kube-scheduling-tuning.rst
node_pressure_eviction.rst

.. only:: subproject and html

20 changes: 20 additions & 0 deletions source/kubernetes/concepts/scheduling/node_pressure_eviction.rst
@@ -0,0 +1,20 @@
.. _node_pressure_eviction:

=========================
Node Pressure Eviction
=========================

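Node pressure eviction is the process by which the ``kubelet`` proactively terminates Pods to reclaim resources on a node. The kubelet monitors resources such as memory, disk space, and filesystem inodes on the cluster's nodes. When one or more of these resources reach a specific consumption level, the kubelet can proactively fail one or more Pods on the node to reclaim resources.

The kubelet uses several parameters to make eviction decisions:

- Eviction signals
- Eviction thresholds
- Monitoring intervals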
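
The relationship between an eviction signal and its threshold can be illustrated with a minimal shell sketch. It is not kubelet code: ``below_threshold`` and ``check_signal`` are hypothetical helper names, and the 10% figure mirrors the kubelet's default hard eviction threshold for ``nodefs.available`` on Linux; the actual values on a node depend on its kubelet configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the kubelet): succeeds when the available
# share of a resource has dropped below a percentage threshold.
# usage: below_threshold AVAILABLE TOTAL THRESHOLD_PERCENT
below_threshold() {
    local available=$1 total=$2 threshold_pct=$3
    # Integer arithmetic: available/total < pct/100  <=>  available*100 < total*pct
    [ $(( available * 100 )) -lt $(( total * threshold_pct )) ]
}

# Report the state of one signal, for example nodefs.available at 10%.
check_signal() {
    local name=$1 available=$2 total=$3 threshold_pct=$4
    if below_threshold "$available" "$total" "$threshold_pct"; then
        echo "$name: threshold crossed"
    else
        echo "$name: ok"
    fi
}

# Example: 8 units free out of 100, compared against a 10% threshold.
check_signal nodefs.available 8 100 10
```

The sample call prints ``nodefs.available: threshold crossed`` because 8 free units out of 100 is below 10%. The kubelet re-evaluates its signals at the configured monitoring interval, so a crossed threshold is detected periodically rather than instantly.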


References
==========

- `Kubernetes Documentation / Concepts / Scheduling, Preemption and Eviction / Node-pressure Eviction <https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/node-pressure-eviction/>`_
- `Under Disk Pressure <https://neilcameronwhite.medium.com/under-disk-pressure-34b5ba4284b6>`_
5 changes: 4 additions & 1 deletion source/kubernetes/deploy/kustomize.rst
@@ -24,7 +24,10 @@ kustomize

choco install kustomize

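- An installation method that works on every platform is to download directly from the `official kustomize releases <https://github.com/kubernetes-sigs/kustomize/releases>`_.
- An installation method that works on every platform is to download directly from the `official kustomize releases <https://github.com/kubernetes-sigs/kustomize/releases>`_; the official install script is the recommended way to do so:

.. literalinclude:: kustomize/install_kustomize_script
:caption: Run the official binary install script (requires a very reliable network connection); it places the ``kustomize`` binary for the current OS in the current directory

- It can also be installed with Go v1.10.1 or later::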

@@ -0,0 +1 @@
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
1 change: 1 addition & 0 deletions source/kubernetes/kubeflow/index.rst
@@ -9,6 +9,7 @@ Kubeflow - Kubernetes machine learning workflow platform

intro_kubeflow.rst
install_kubeflow.rst
install_kubeflow_single_command.rst
charmed/index

.. only:: subproject and html
7 changes: 6 additions & 1 deletion source/kubernetes/kubeflow/install_kubeflow.rst
@@ -25,7 +25,12 @@ Kubeflow installation methods
Kubeflow Manifests
----------------------

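``Kubeflow Manifests`` deploys Kubeflow on a plain Kubernetes cluster using :ref:`kustomize` and :ref:`kubectl`. It is a lower-level and more complex deployment technique, which I will practice after gaining some experience
``Kubeflow Manifests`` deploys Kubeflow on a plain Kubernetes cluster using :ref:`kustomize` and :ref:`kubectl`. It is a lower-level and more complex deployment technique.

My plan is to deploy ``Kubeflow Manifests`` iteratively on the :ref:`y-k8s` cluster that was deployed with :ref:`kubespray`:

- For GPU nodes, use :ref:`vgpu` to split a single :ref:`tesla_p10` into two, simulating two :ref:`gpu_k8s` nodes in the cluster
- Start by trying :ref:`install_kubeflow_single_command`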

References
==========
51 changes: 51 additions & 0 deletions source/kubernetes/kubeflow/install_kubeflow_single_command.rst
@@ -0,0 +1,51 @@
.. _install_kubeflow_single_command:

========================================
Install Kubeflow with a Single Command
========================================

Prerequisites
=============

- Use a default :ref:`k8s_storage`: there are actually not many kinds of provisioner, and I am considering the following types:

- :ref:`cephfs`
- :ref:`linux_iscsi`
- :ref:`nfs`
- :ref:`ceph_rbd`
- :ref:`k8s_local`

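- :ref:`kustomize` 5.0.3 or later:

.. literalinclude:: ../deploy/kustomize/install_kustomize_script
:caption: Run the official binary install script (requires a very reliable network connection); it places the ``kustomize`` binary for the current OS in the current directory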

- ``kubectl``
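
Whether an installed kustomize meets the 5.0.3 minimum can be checked with a small version comparison. This is only a sketch: ``version_ge`` and ``check_kustomize_version`` are hypothetical helpers, and the comparison relies on ``sort -V`` (version sort) from GNU coreutils. The version string is passed in by hand because the output format of ``kustomize version`` may differ between releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Uses `sort -V`, so the smaller of the two versions ends up on the first line.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum version, taken from the prerequisite list.
REQUIRED_KUSTOMIZE="5.0.3"

check_kustomize_version() {
    local installed=${1#v}   # strip a leading "v", e.g. v5.1.0 -> 5.1.0
    if version_ge "$installed" "$REQUIRED_KUSTOMIZE"; then
        echo "kustomize $installed is new enough"
    else
        echo "kustomize $installed is older than $REQUIRED_KUSTOMIZE"
        return 1
    fi
}

# Example with a hand-written version string.
check_kustomize_version v5.1.0
```

The sample call prints ``kustomize 5.1.0 is new enough``; a value such as ``v4.5.7`` makes the function report an outdated version and return a non-zero status.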

Installation
============

- Clone the repository and enter the ``manifests`` directory:

.. literalinclude:: install_kubeflow_single_command/install_kubeflow
:language: bash
:caption: Install Kubeflow with a single command

.. note::

The installation is so concise that it deserves applause... as for software delivery at my company...

- After the installation completes, it may take some time for all Pods to become ready; the following commands confirm their status:

.. literalinclude:: install_kubeflow_single_command/check_kubeflow
:language: bash
:caption: Check whether all installed ``kubeflow``-related Pods are ready

.. note::

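The virtual machines in my :ref:`y-k8s` deployment use minimal virtual disks, so I ran into an awkward problem: :ref:`node_pressure_eviction`, that is, running Pods being evicted because of insufficient disk space. When the Pod readiness check above revealed problems, I resolved them by expanding the disks as described in :ref:`ceph_extend_rbd_drive_with_libvirt_xfs`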
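
The per-namespace readiness check can also be scripted. The following is a minimal sketch that assumes ``kubectl wait`` is acceptable for this purpose; ``wait_for_kubeflow`` is a hypothetical helper name and the 600s default timeout is an arbitrary choice. The namespace list is taken from the ``check_kubeflow`` snippet. Note that ``kubectl wait --for=condition=Ready`` reports an error for a namespace without Pods and never succeeds for Pods that have already run to completion, so a failure is a prompt to inspect that namespace rather than proof of a broken installation.

```bash
#!/usr/bin/env bash
# Namespaces taken from the check_kubeflow snippet.
KUBEFLOW_NAMESPACES="cert-manager istio-system auth knative-eventing knative-serving kubeflow kubeflow-user-example-com"

# Hypothetical helper: block until every Pod in each namespace is Ready,
# and return non-zero as soon as one namespace fails or times out.
# usage: wait_for_kubeflow [TIMEOUT]   (TIMEOUT defaults to 600s)
wait_for_kubeflow() {
    local timeout=${1:-600s} ns
    for ns in $KUBEFLOW_NAMESPACES; do
        echo "waiting for pods in $ns"
        kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$timeout" || return 1
    done
}
```

After sourcing the snippet, ``wait_for_kubeflow 300s`` waits up to 300 seconds per namespace. If it fails on a node with a small disk, checking for evicted Pods in that namespace is a reasonable next step.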

References
==========

- `Kubeflow Manifests: Install with a single command <https://github.com/kubeflow/manifests#install-with-a-single-command>`_
@@ -0,0 +1,7 @@
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
@@ -0,0 +1,5 @@
git clone git@github.com:kubeflow/manifests.git
cd manifests

# Only the following single command is needed for the installation
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
2 changes: 1 addition & 1 deletion source/kubernetes/storage/k8s_local.rst
@@ -1,6 +1,6 @@
.. _k8s_local:

=====================================
Deploying local storage in Kubernetes
Kubernetes local storage
=====================================

3 changes: 3 additions & 0 deletions source/linux/storage/iscsi/index.rst
@@ -7,6 +7,9 @@ Linux iSCSI
.. toctree::
:maxdepth: 1

../zfs/admin/zfs_iscsi.rst
../../../ceph/rbd/ceph_iscsi/index

.. only:: subproject and html

Indices