TOPIC: Docker
1. Scenario: You are building a microservices-based application using Docker. Design a Docker Compose file that sets up three containers: a web server container, a database container, and a cache container. Ensure that the containers can communicate with each other properly.

version: '3.8'

services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
    depends_on:
      - db
      - cache
    networks:
      - app-network

  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: myapp_db
      MYSQL_USER: user
      MYSQL_PASSWORD: password
    ports:
      - "3306:3306"
    networks:
      - app-network

  cache:
    image: redis:latest
    ports:
      - "6379:6379"
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

The web server (NGINX), database (MySQL), and cache (Redis) containers are part of the same network app-network, allowing them to communicate internally.
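The depends_on option makes Compose start the db and cache containers before the web container. It controls start order only: it does not wait until MySQL or Redis is actually ready to accept connections. The ports entries on db and cache publish those services to the host; containers on app-network reach each other by service name (db, cache) without them.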
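
Because a started container is not necessarily a ready service, the web application may need to wait for its dependencies itself. The following is a minimal sketch of such a wait loop, assuming the application is written in Python and uses the Compose service names as hostnames. The function name wait_for_port and its parameters are hypothetical, not part of Docker or Compose.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect shows that something is listening. It does
            # not prove the service has finished its own initialization.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable host, or connect timeout.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the web container this could be called as wait_for_port("db", 3306) and wait_for_port("cache", 6379) before the application opens its real connections. A TCP check is only a rough readiness signal; a health check that runs a real query is stricter.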

2. Scenario: You want to scale your Docker containers dynamically based on the incoming traffic. Write a Python script that utilizes Docker SDK to monitor the CPU usage of a container and automatically scales the number of replicas based on a threshold.

import docker
import time

# Initialize the Docker client from the environment (DOCKER_HOST, etc.)
client = docker.from_env()

def get_container_cpu_usage(container_name):
    """Return the container's CPU usage in percent (100% = one full core)."""
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)
    cpu_stats = stats['cpu_stats']
    precpu_stats = stats['precpu_stats']
    cpu_delta = cpu_stats['cpu_usage']['total_usage'] - precpu_stats['cpu_usage']['total_usage']
    # system_cpu_usage can be missing from the previous sample
    system_delta = cpu_stats.get('system_cpu_usage', 0) - precpu_stats.get('system_cpu_usage', 0)
    # percpu_usage is absent on cgroup v2 hosts, so prefer online_cpus
    num_cpus = cpu_stats.get('online_cpus') or len(cpu_stats['cpu_usage'].get('percpu_usage') or []) or 1
    if system_delta > 0.0 and cpu_delta > 0.0:
        return (cpu_delta / system_delta) * num_cpus * 100.0
    return 0.0

def get_replica_count(service_name):
    """Return the replica count currently configured for a replicated service."""
    service = client.services.get(service_name)
    return service.attrs['Spec']['Mode']['Replicated']['Replicas']

def scale_service(service_name, replicas):
    """Scale the Swarm service if the requested count differs from the current one."""
    if get_replica_count(service_name) != replicas:
        print(f"Scaling service {service_name} to {replicas} replicas...")
        client.services.get(service_name).scale(replicas)

def monitor_and_scale(container_name, service_name, threshold, max_replicas):
    while True:
        cpu_usage = get_container_cpu_usage(container_name)
        print(f"CPU usage for {container_name}: {cpu_usage:.2f}%")

        # Read the current count on every pass; the original code used this
        # variable without ever defining it
        current_replicas = get_replica_count(service_name)

        if cpu_usage > threshold:
            scale_service(service_name, min(max_replicas, current_replicas + 1))
        elif cpu_usage < threshold / 2 and current_replicas > 1:
            scale_service(service_name, max(1, current_replicas - 1))

        time.sleep(10)

# Set the container to monitor and service to scale
container_name = "web_container"
service_name = "web_service"
cpu_threshold = 70.0
max_replicas = 10

# Monitor and scale
monitor_and_scale(container_name, service_name, cpu_threshold, max_replicas)

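This script polls the CPU usage of one container every 10 seconds and adjusts a Docker Swarm service by one replica at a time: it adds a replica when usage exceeds the threshold and removes one when usage falls below half the threshold. It requires the Docker host to be in Swarm mode and the service to run in replicated mode.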
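
The CPU calculation and the scaling rule can be separated from the Docker calls and checked on their own. The following is a sketch under assumptions: cpu_percent and desired_replicas are hypothetical helper names, the stats dictionary is a hand-made sample shaped like the Docker stats payload, and the numbers in it are invented for illustration.

```python
def cpu_percent(stats):
    """Compute CPU usage in percent from one Docker stats sample (a dict)."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    # Prefer online_cpus; fall back to the per-CPU list, then to a single CPU.
    num_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    if system_delta > 0 and cpu_delta > 0:
        return cpu_delta / system_delta * num_cpus * 100.0
    return 0.0


def desired_replicas(current, cpu_usage, threshold, max_replicas, min_replicas=1):
    """Step one replica up above the threshold, one down below half of it."""
    if cpu_usage > threshold:
        return min(max_replicas, current + 1)
    if cpu_usage < threshold / 2:
        return max(min_replicas, current - 1)
    return current


# Invented sample: the container used 250 of 1000 elapsed system ticks on 2 CPUs.
sample = {
    "cpu_stats": {"cpu_usage": {"total_usage": 750}, "system_cpu_usage": 2000, "online_cpus": 2},
    "precpu_stats": {"cpu_usage": {"total_usage": 500}, "system_cpu_usage": 1000},
}
usage = cpu_percent(sample)
print(usage, desired_replicas(3, usage, threshold=70.0, max_replicas=10))
```

The gap between the scale-up threshold and the scale-down threshold (half of it) keeps the replica count from flapping when usage hovers near a single cutoff.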

3. Scenario: You have a Docker image stored on a private registry. Develop a script in Bash that authenticates with the registry, pulls the latest version of the image, and runs a container based on that image.

#!/bin/bash

# Docker registry credentials (placeholders; read real secrets from the
# environment or a secret store rather than hard-coding them)
REGISTRY_URL="myregistry.com"
USERNAME="myusername"
PASSWORD="mypassword"
IMAGE_NAME="$REGISTRY_URL/myimage:latest"

# Authenticate with the private Docker registry.
# --password-stdin keeps the password out of the process list and shell history.
echo "Authenticating with Docker registry..."
if ! printf '%s' "$PASSWORD" | docker login "$REGISTRY_URL" -u "$USERNAME" --password-stdin; then
  echo "Failed to authenticate!"
  exit 1
fi

# Pull the latest version of the image
echo "Pulling the latest image..."
if ! docker pull "$IMAGE_NAME"; then
  echo "Failed to pull $IMAGE_NAME!"
  exit 1
fi

# Run a new container based on the image
echo "Running the container..."
docker run -d --name mycontainer "$IMAGE_NAME"

This script logs in to a private Docker registry, pulls the latest version of the specified image, and runs a container from it.

TOPIC: Airflow
1. Scenario: You have a data pipeline that requires executing a shell command as part of a task. Create an Airflow DAG that includes a BashOperator to execute a specific shell command.
Airflow DAG with BashOperator:
from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x path; bash_operator is the deprecated 1.x module
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

with DAG('bash_operator_example', default_args=default_args, schedule_interval='@daily') as dag:
    run_shell_command = BashOperator(
        task_id='run_shell_command',
        bash_command='echo "Running shell command in Airflow!"'
    )

This DAG runs a simple shell command using the BashOperator.

2. Scenario: You want to create dynamic tasks in Airflow based on a list of inputs. Design an Airflow DAG that generates tasks dynamically using PythonOperator, where each task processes an element from the input list.

Dynamic Task Creation in Airflow DAG:

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x path; python_operator is the deprecated 1.x module
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

def process_data(item):
    print(f"Processing {item}")

input_list = ['item1', 'item2', 'item3']

with DAG('dynamic_task_example', default_args=default_args, schedule_interval='@daily') as dag:
    for item in input_list:
        task = PythonOperator(
            task_id=f'process_{item}',
            python_callable=process_data,
            op_args=[item]
        )

This DAG creates a task for each element in input_list using the PythonOperator.
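
Task IDs built from input values must be valid and unique: Airflow accepts only alphanumeric characters, dashes, dots, and underscores in a task_id, and two tasks in one DAG cannot share an ID. The following is a small sketch of how input items could be turned into safe IDs before the loop creates the operators. The helper names make_task_id and unique_task_ids are hypothetical, and the sketch uses only the standard library, not Airflow itself.

```python
import re


def make_task_id(item, prefix="process_"):
    """Build a task_id from an input item, replacing unsupported characters."""
    # Keep letters, digits, underscore, dot, and dash; replace everything else.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", str(item))
    return f"{prefix}{safe}"


def unique_task_ids(items):
    """Return one task_id per item, adding a numeric suffix to repeated IDs."""
    seen = {}
    result = []
    for item in items:
        base = make_task_id(item)
        count = seen.get(base, 0)
        seen[base] = count + 1
        # Simple suffixing; it does not guard against an input that already
        # looks like a suffixed ID.
        result.append(base if count == 0 else f"{base}_{count}")
    return result


print(unique_task_ids(["item1", "sales 2023/Q1", "sales 2023 Q1"]))
```

In the DAG, the loop would then pair each item with its generated ID, for example by iterating over zip(input_list, unique_task_ids(input_list)).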

3. Scenario: You need to set up a complex task dependency in Airflow, where Task B should start only if Task A has successfully completed. Implement this dependency using the "TriggerDagRunOperator" in Airflow.

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.operators.empty import EmptyOperator  # Airflow 2.3+; replaces the deprecated DummyOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

with DAG('trigger_dagrun_example', default_args=default_args, schedule_interval='@daily') as dag:
    task_a = EmptyOperator(task_id='task_a')

    trigger_b = TriggerDagRunOperator(
        task_id='trigger_task_b',
        trigger_dag_id='target_dag',
    )

    task_a >> trigger_b

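This DAG triggers another DAG (target_dag) only after task_a completes successfully, because the default trigger rule runs a task only when all of its upstream tasks have succeeded. Task B is assumed to live in target_dag, which must exist and be unpaused for the triggered run to execute.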

TOPIC: Sqoop
1. Scenario: You want to import data from an Oracle database into Hadoop using Sqoop, but you only need to import specific columns from a specific table. Write a Sqoop command that performs the import, including the necessary arguments for column selection and table mapping.
sqoop import \
  --connect jdbc:oracle:thin:@//your_oracle_server:1521/your_db \
  --username your_username \
  --password your_password \
  --table your_table_name \
  --columns "column1,column2,column3" \
  --target-dir /user/hadoop/your_output_dir

2. Scenario: You have a requirement to perform an incremental import of data from a MySQL database into Hadoop using Sqoop. Design a Sqoop command that imports only the new or updated records since the last import.

sqoop import \
  --connect jdbc:mysql://your_mysql_server:3306/your_db \
  --username your_username \
  --password your_password \
  --table your_table_name \
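  --incremental lastmodified \
  --check-column last_modified_column \
  --last-value '2023-01-01 00:00:00' \
  --merge-key your_primary_key \
  --target-dir /user/hadoop/your_output_dir

The lastmodified mode picks up rows whose last_modified_column is newer than --last-value, so it captures updated rows as well as new ones. The append mode only suits tables where rows are added with an increasing key and never changed. The --merge-key column (your_primary_key is a placeholder) lets Sqoop fold updated rows into the data already in the target directory.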
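
The idea behind an incremental import of new or updated records can be shown with a toy model: select the rows whose check column is later than the last imported value, overwrite or add them by key, and remember the new high-water mark for the next run. This is an illustration of the concept only, not Sqoop's implementation. The function name incremental_merge and the field names id and last_modified are hypothetical.

```python
from datetime import datetime


def incremental_merge(existing, source_rows, last_value, key="id", check="last_modified"):
    """Merge rows changed after last_value into existing; return (rows, new_last_value)."""
    merged = {row[key]: row for row in existing}
    new_last = last_value
    for row in source_rows:
        if row[check] > last_value:
            # A new key is inserted; a known key is replaced by the newer row.
            merged[row[key]] = row
            new_last = max(new_last, row[check])
    return sorted(merged.values(), key=lambda r: r[key]), new_last


existing = [
    {"id": 1, "name": "alpha", "last_modified": datetime(2022, 12, 1)},
    {"id": 2, "name": "beta", "last_modified": datetime(2022, 12, 15)},
]
source = [
    {"id": 1, "name": "alpha", "last_modified": datetime(2022, 12, 1)},      # unchanged
    {"id": 2, "name": "beta-v2", "last_modified": datetime(2023, 1, 5)},     # updated
    {"id": 3, "name": "gamma", "last_modified": datetime(2023, 1, 7)},       # new
]
rows, next_last_value = incremental_merge(existing, source, datetime(2023, 1, 1))
print([r["name"] for r in rows], next_last_value)
```

The returned high-water mark plays the role of the value passed to --last-value on the following run.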


3. Scenario: You need to export data from Hadoop to a Microsoft SQL Server database using Sqoop. Develop a Sqoop command that exports the data, considering factors like database connection details, table mapping, and appropriate data types.

sqoop export \
  --connect "jdbc:sqlserver://your_sql_server:1433;databaseName=your_db" \
  --username your_username \
  --password your_password \
  --table your_table_name \
  --export-dir /user/hadoop/your_hadoop_data \
  --input-fields-terminated-by ',' \
  --update-mode allowinsert

These Sqoop commands handle various data integration scenarios, from Oracle to MySQL and SQL Server.