<span style="font-size:25px">Docker</span> is an open-source platform that automates the deployment and management of applications using containers. Containers are lightweight, portable, self-contained units that package software with its dependencies so that it behaves consistently across computing environments.

**Key Concepts and Working Principles:**
- **Images:** Docker images are read-only templates used to create containers. They contain the application code, libraries, and dependencies needed to run the application.
- **Containers:** Containers are runnable instances created from Docker images. They run as isolated processes on the host and can communicate with other containers over configured networks.
- **Dockerfile:** A Dockerfile is a text file of instructions that defines the steps needed to create a Docker image. It includes instructions to choose a base image, configure the environment, and set up the application.
- **Registry:** Docker Registry is a repository for Docker images. Docker Hub is a popular public registry, and private registries can be set up for secure storage and sharing.
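
To make the image, container, Dockerfile, and registry terms concrete, here is a minimal sketch that renders a Dockerfile and prints the CLI commands that would build, run, and publish it. The app layout (`app.py`, `requirements.txt`), the tag `demo-app:0.1`, and the registry host `registry.example.com` are assumptions made up for illustration; the script only prints the commands and does not call Docker.

```python
def render_dockerfile(base_image: str, port: int) -> str:
    """Return the text of a minimal Dockerfile for a hypothetical Python app."""
    lines = [
        "# Start from a read-only base image pulled from a registry.",
        f"FROM {base_image}",
        "# Work inside /app in the image filesystem.",
        "WORKDIR /app",
        "# Install dependencies first so this layer is cached between builds.",
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        "# Copy the application code into the image.",
        "COPY . .",
        "# Document the port the app listens on.",
        f"EXPOSE {port}",
        "# Default command executed when a container starts from this image.",
        'CMD ["python", "app.py"]',
    ]
    return "\n".join(lines) + "\n"


IMAGE_TAG = "demo-app:0.1"             # hypothetical image name and tag
REGISTRY = "registry.example.com"      # hypothetical private registry

dockerfile = render_dockerfile("python:3.12-slim", 8000)
print(dockerfile)

# Dockerfile -> image -> container -> registry, as shell commands.
print(f"docker build -t {IMAGE_TAG} .")
print(f"docker run --rm -p 8000:8000 {IMAGE_TAG}")
print(f"docker tag {IMAGE_TAG} {REGISTRY}/{IMAGE_TAG}")
print(f"docker push {REGISTRY}/{IMAGE_TAG}")
```

Copying `requirements.txt` before the rest of the source is a common layering choice: the dependency layer is rebuilt only when the requirements change, not on every code edit.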

**Why Docker:**
- **Portability:** Docker containers can run on any machine with a compatible container runtime and CPU architecture, providing consistency across development, testing, and production environments.
- **Isolation:** Containers isolate applications and their dependencies, avoiding conflicts and ensuring that changes made in one container don't affect others.
- **Efficiency:** Containers share the host system's OS kernel, making them lightweight and quick to start, stop, and scale.
- **Simplified Deployment:** Docker automates application deployment, making it easier to manage, update, and roll back versions.

Overall, Docker simplifies the development, deployment, and scaling of applications by providing a standardized, portable, and efficient environment through containerization.


## Kubernetes Overview<sub>(https://github.com/CUKykkim/k8s_object/blob/main/bigdata_engineering_k8s_object.ipynb)</sub>

**Kubernetes** is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It provides a framework to run distributed systems resiliently.

### Key Features

1. **Automated Deployment and Scaling:**
   Kubernetes automates the deployment and scaling of application containers, ensuring efficient resource utilization and high availability.

2. **Container Orchestration:**
   It schedules containers onto nodes according to their resource requirements and coordinates their lifecycle across the cluster.

3. **Self-Healing:**
   Kubernetes continuously monitors the state of applications and automatically restarts containers or launches new ones to replace failed instances, enhancing system resilience.

4. **Service Discovery and Load Balancing:**
   It offers automated service discovery and load balancing for applications, making it easy to manage network traffic.

5. **Configuration Management:**
   Kubernetes allows for flexible configuration management, enabling changes to be made without modifying the application's source code.
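
The self-healing behaviour described above rests on a control loop that compares desired state with observed state and corrects the difference. The following is a toy sketch of that idea in plain Python, not Kubernetes code; `reconcile` and the `web-N` pod names are hypothetical.

```python
import itertools


def reconcile(desired_replicas: int, running: list[str], new_names) -> list[str]:
    """One pass of a toy control loop: make the running set match the desired count."""
    pods = list(running)
    # Too few pods: start replacements until the desired count is reached.
    while len(pods) < desired_replicas:
        pods.append(next(new_names))
    # Too many pods: stop the surplus.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


# Generator of fresh, never-reused pod names: web-0, web-1, ...
names = (f"web-{i}" for i in itertools.count())

pods = reconcile(3, [], names)       # initial rollout
print(pods)

pods.remove("web-1")                 # simulate a crashed pod
pods = reconcile(3, pods, names)     # the loop notices and replaces it
print(pods)
```

The real controllers run this comparison continuously against the API server rather than once per call, but the shape of the logic is the same.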

### How Kubernetes Works

Kubernetes uses a control plane and worker node architecture (the control plane was historically called the master):

- **Master Node (Control Plane):**
  The control plane manages the Kubernetes cluster, overseeing scheduling, deployments, and scaling. It includes components like the API server, controller manager, scheduler, and etcd (a distributed key-value store that holds cluster state and configuration).

- **Worker Node:**
  Worker nodes host the actual applications. Each worker node runs a container runtime (e.g., containerd or CRI-O; Docker Engine needs the cri-dockerd adapter since Kubernetes 1.24), a kubelet (which communicates with the control plane's API server), and kube-proxy (which maintains network rules for Services).

- **Pods:**
  Pods are the smallest deployable units in Kubernetes and can host one or more containers. Containers within a pod share the same network namespace, so they can reach each other over `localhost`.

- **Deployments:**
  Deployments allow you to describe the desired state for your application, managing the deployment and scaling of pods automatically.
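
As an illustration of "desired state", the sketch below builds a Deployment manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The application name `web`, the image `nginx:1.27`, and the helper `deployment_manifest` are assumptions chosen for the example.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int, port: int) -> dict:
    """Build an apps/v1 Deployment that keeps `replicas` pods of `image` running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            # Desired number of pod replicas.
            "replicas": replicas,
            # The Deployment manages pods whose labels match this selector.
            "selector": {"matchLabels": dict(labels)},
            # Pod template: every replica is created from this description.
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


manifest = deployment_manifest("web", "nginx:1.27", replicas=3, port=80)
print(json.dumps(manifest, indent=2))
```

Saving the output to a file such as `deployment.json` and running `kubectl apply -f deployment.json` would submit it to a cluster. The selector must match the pod template labels, which is why both are derived from the same `labels` value here.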

### Key Concepts and Terminology

- **Pods:** The basic execution unit in Kubernetes, comprising one or more containers.
- **Services:** An abstraction to expose a set of pods as a network service.
- **ReplicaSets:** Ensure that a specified number of pod replicas are running at any given time.
- **Namespace:** A way to divide cluster resources among multiple users, teams, or projects.
- **Labels and Selectors:** Used to organize and select subsets of objects in Kubernetes.
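
Labels and selectors are what tie these objects together: a Service, for instance, routes traffic to whichever pods carry the labels its selector names. Below is a minimal sketch of equality-based selector matching; the pod names and labels are invented, and set-based selectors (`In`, `NotIn`, `Exists`) are left out.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods and their labels.
pods = {
    "web-0": {"app": "web", "tier": "frontend"},
    "web-1": {"app": "web", "tier": "frontend"},
    "db-0": {"app": "db", "tier": "backend"},
}

# A Service selecting on app=web would target these pods.
service_selector = {"app": "web"}
endpoints = sorted(name for name, labels in pods.items() if matches(service_selector, labels))
print(endpoints)
```

Because matching is by label rather than by pod name, replacement pods created by a ReplicaSet are picked up automatically as long as they carry the same labels.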

### Why Use Kubernetes

- **Scalability:** Kubernetes facilitates scaling applications quickly to handle increased load.
- **Resilience and High Availability:** It enhances application reliability and availability through automated failover and self-healing mechanisms.
- **Resource Efficiency:** Kubernetes places containers on nodes according to their declared CPU and memory requests, which improves utilization and can lower cost.
- **Portability:** It provides a consistent environment, allowing applications to run reliably across different infrastructure setups.
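
For the scalability point, the Horizontal Pod Autoscaler documentation gives the replica calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. A minimal sketch of that arithmetic follows; the function and parameter names are illustrative, and the tolerance band and stabilization behaviour of the real controller are deliberately ignored.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count suggested by the HPA formula, without tolerance or stabilization."""
    return math.ceil(current_replicas * current_metric / target_metric)


# 4 pods averaging 90% CPU against a 60% target: scale out.
scale_out = desired_replicas(4, 90.0, 60.0)
# 4 pods averaging 30% CPU against a 60% target: scale in.
scale_in = desired_replicas(4, 30.0, 60.0)
print(scale_out, scale_in)
```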

Kubernetes has become a standard for managing containerized applications, streamlining the deployment and management of modern, scalable, and highly available systems.


## Kubeflow

**Kubeflow** is an open-source machine learning (ML) platform that runs on Kubernetes.

### Features:
1. **Kubernetes Native**: Kubeflow is designed to run and manage ML workloads on Kubernetes, building on its scheduling, scaling, and portability.

2. **Comprehensive ML Toolkit**: Kubeflow integrates various tools and libraries for ML workflows, making it useful for creating and executing ML pipelines.

3. **Community Support**: Supported by an active community, it receives continuous updates and improvements, providing users with stability and reliability.

4. **Scalability**: Scales across cloud, on-premises, or hybrid environments, wherever a Kubernetes cluster is available.

### Use Cases:
The main purposes of using Kubeflow are as follows:

- **ML Workflow Management**: Easily manage complex ML/DL workflows, efficiently handling scaling, monitoring, debugging, and tracking.

- **Enhanced Productivity**: ML engineers and data scientists can reduce model development and deployment times and work more efficiently using Kubeflow.

- **Manage Complex ML Pipelines**: Chain data preprocessing, model training, and serving into a single pipeline and manage the flow of data between stages.

### Key Terms and Concepts:
- **Pipeline**: A structure that defines and manages each stage of an ML workflow.

- **Component**: An executable unit in an ML pipeline, often referring to containerized ML workloads.

- **Notebook**: A web-based environment used for data exploration, visualization, experimentation, and model development.

- **Artifact**: A file or object representing data, model, metadata, or various outcomes.

- **Experiment**: A workspace that groups related pipeline runs and their observations for a specific goal.
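
The relationship between pipelines, components, and artifacts can be sketched in plain Python. This is a toy model, not the Kubeflow Pipelines SDK: the component functions, the `PIPELINE` table, and `run_pipeline` are hypothetical, and each "component" is an ordinary function rather than a container.

```python
from graphlib import TopologicalSorter


def preprocess(raw):
    """Component: scale values into the range 0..1."""
    peak = max(raw)
    return [value / peak for value in raw]


def train(features):
    """Component: 'train' a trivial model that stores the feature mean."""
    return {"mean": sum(features) / len(features)}


def evaluate(model):
    """Component: report a score derived from the model."""
    return {"score": round(model["mean"], 3)}


# Pipeline: step name -> (component, upstream step whose artifact it consumes).
PIPELINE = {
    "preprocess": (preprocess, None),
    "train": (train, "preprocess"),
    "evaluate": (evaluate, "train"),
}


def run_pipeline(pipeline, pipeline_input):
    """Run each component after its upstream step and collect the artifacts."""
    graph = {
        step: ({upstream} if upstream else set())
        for step, (_, upstream) in pipeline.items()
    }
    artifacts = {}
    for step in TopologicalSorter(graph).static_order():
        component, upstream = pipeline[step]
        source = artifacts[upstream] if upstream else pipeline_input
        artifacts[step] = component(source)
    return artifacts


artifacts = run_pipeline(PIPELINE, [2, 4, 6, 8])
print(artifacts["evaluate"])
```

In this model, one call to `run_pipeline` corresponds to a single run, and an experiment would be a collection of such runs with different inputs or parameters.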

Kubeflow, through these features and use cases, supports efficient management and operation of ML workloads, enhancing productivity in ML/DL projects.
