TOPIC: Docker
1. Scenario: You are building a microservices-based application using Docker. Design a Docker Compose file that sets up three containers: a web server container, a database container, and a cache container. Ensure that the containers can communicate with each other properly.
2. Scenario: You want to scale your Docker containers dynamically based on incoming traffic. Write a Python script that uses the Docker SDK to monitor the CPU usage of a container and automatically scale the number of replicas when usage crosses a threshold.
3. Scenario: You have a Docker image stored on a private registry. Develop a script in Bash that authenticates with the registry, pulls the latest version of the image, and runs a container based on that image.


In [None]:
version: '3'
services:
  webserver:
    image: nginx:latest
    ports:
      - "80:80"  # quoted so YAML always reads HOST:CONTAINER as a string
    depends_on:
      - database
      - cache

  database:
    image: mysql:latest
    environment:
      - MYSQL_ROOT_PASSWORD=your_root_password
      - MYSQL_DATABASE=your_database
      - MYSQL_USER=your_user
      - MYSQL_PASSWORD=your_password

  cache:
    image: redis:latest


In [None]:
docker-compose up
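
Compose attaches the three services to a default network on which each service name resolves as a hostname, so a container in this stack can reach the others as `database` and `cache` without extra network configuration. The sketch below shows how application code running inside one of these containers might build its connection settings. The helper name `connection_settings` and the environment variables `DB_HOST` and `CACHE_HOST` are assumptions made for illustration; 3306 and 6379 are the default MySQL and Redis ports.

```python
import os


def connection_settings(env=None):
    """Build connection URLs that address the other containers by service name."""
    env = os.environ if env is None else env
    # Hostnames default to the service names used in the Compose file.
    db_host = env.get("DB_HOST", "database")
    cache_host = env.get("CACHE_HOST", "cache")
    db_user = env.get("MYSQL_USER", "your_user")
    db_name = env.get("MYSQL_DATABASE", "your_database")
    return {
        "database": f"mysql://{db_user}@{db_host}:3306/{db_name}",
        "cache": f"redis://{cache_host}:6379/0",
    }


if __name__ == "__main__":
    for name, url in connection_settings().items():
        print(f"{name}: {url}")
```

Because the hostnames come from the Compose service names, renaming a service in the Compose file means updating the defaults here as well.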


In [None]:
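import time

import docker

# Docker SDK client
client = docker.from_env()

# Service details (replica scaling applies to Swarm services, not single containers)
service_name = "your_service_name"
threshold = 70  # CPU usage threshold in percentage

# Get the service object
service = client.services.get(service_name)


def container_cpu_percent(container):
    # Derive CPU usage (%) from one stats snapshot of the container
    stats = container.stats(stream=False)
    cpu, precpu = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = (cpu["cpu_usage"]["total_usage"]
                 - precpu.get("cpu_usage", {}).get("total_usage", 0))
    system_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100.0


while True:
    # Get the average CPU usage of the service's containers on this node
    containers = client.containers.list(
        filters={"label": f"com.docker.swarm.service.name={service_name}"}
    )
    readings = [container_cpu_percent(c) for c in containers]
    cpu_percent = sum(readings) / len(readings) if readings else 0.0

    # Scale the number of replicas based on the threshold
    if cpu_percent > threshold:
        service.scale(2)  # Scale up to 2 replicas
        print("Scaling up...")
    else:
        service.scale(1)  # Scale down to 1 replica
        print("Scaling down...")

    # Sleep for a specific interval before checking again
    time.sleep(10)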
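
The loop above switches between one and two replicas on every reading, which can flap when usage hovers near the threshold. One possible refinement, sketched here under assumptions, is to move one replica at a time within fixed bounds and to scale down only once usage drops well below the threshold. The function name `desired_replicas` and the parameters `min_replicas`, `max_replicas`, and `scale_down_margin` are hypothetical and not part of the Docker SDK.

```python
def desired_replicas(cpu_percent, current, threshold=70.0,
                     min_replicas=1, max_replicas=5, scale_down_margin=20.0):
    """Return the replica count to request for one CPU reading."""
    if cpu_percent > threshold:
        target = current + 1          # busy: add one replica
    elif cpu_percent < threshold - scale_down_margin:
        target = current - 1          # clearly idle: remove one replica
    else:
        target = current              # inside the dead band: leave as is
    # Clamp to the allowed range so the count never leaves [min, max].
    return max(min_replicas, min(max_replicas, target))


if __name__ == "__main__":
    replicas = 1
    for reading in [35.0, 82.5, 91.0, 64.0, 12.0]:
        replicas = desired_replicas(reading, replicas)
        print(f"cpu={reading:5.1f}% -> replicas={replicas}")
```

The result of `desired_replicas` could be passed to the scaling call in the loop in place of the hard-coded 1 and 2.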


In [None]:
#!/bin/bash
set -euo pipefail  # stop at the first failed command instead of running on

# Registry authentication details
registry_url="your_registry_url"
registry_username="your_username"
registry_password="your_password"

# Docker image details
image_name="your_image_name"
container_name="your_container_name"

# Authenticate with the private registry (the password is piped through stdin
# so it does not show up in the process list)
printf '%s' "$registry_password" | docker login "$registry_url" --username "$registry_username" --password-stdin

# Pull the latest version of the image from the registry
docker pull "$registry_url/$image_name:latest"

# Run a container based on the image
docker run --name "$container_name" -d "$registry_url/$image_name:latest"
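
The same login, pull, and run sequence can also be driven from Python with the Docker SDK used earlier in this section. This is a minimal sketch, not a replacement for the Bash script: the helper names `image_reference` and `pull_and_run` are assumptions, and the `docker` package is imported lazily so that the reference-building helper works even where the SDK is not installed.

```python
def image_reference(registry_url, image_name, tag="latest"):
    """Build the fully qualified image reference: registry/image:tag."""
    return f"{registry_url.rstrip('/')}/{image_name}:{tag}"


def pull_and_run(registry_url, username, password, image_name, container_name):
    """Log in to the registry, pull the image, and start a detached container."""
    import docker  # lazy import: only needed when talking to the Docker daemon

    client = docker.from_env()
    client.login(username=username, password=password, registry=registry_url)
    reference = image_reference(registry_url, image_name)
    client.images.pull(reference)
    return client.containers.run(reference, name=container_name, detach=True)


if __name__ == "__main__":
    print(image_reference("your_registry_url", "your_image_name"))
```

In practice the password would typically come from an environment variable or a secrets store rather than from a literal in the script.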


TOPIC: Airflow
1. Scenario: You have a data pipeline that requires executing a shell command as part of a task. Create an Airflow DAG that includes a BashOperator to execute a specific shell command.
2. Scenario: You want to create dynamic tasks in Airflow based on a list of inputs. Design an Airflow DAG that generates tasks dynamically using PythonOperator, where each task processes an element from the input list.
3. Scenario: You need to set up a complex task dependency in Airflow, where Task B should start only if Task A has successfully completed. Implement this dependency using the "TriggerDagRunOperator" in Airflow.


In [None]:
from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path
from datetime import datetime

# Define the DAG
dag = DAG(
    dag_id='execute_shell_command',
    start_date=datetime(2022, 1, 1),
    schedule_interval=None
)

# Define the BashOperator task
execute_command_task = BashOperator(
    task_id='execute_command',
    bash_command='your_shell_command',
    dag=dag
)


In [None]:
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path
from datetime import datetime

# List of inputs
input_list = ['input1', 'input2', 'input3']

# Function to process an input
def process_input(item):
    # Process the input (named `item` to avoid shadowing the built-in input())
    print(f"Processing input: {item}")

# Define the DAG
dag = DAG(
    dag_id='dynamic_tasks',
    start_date=datetime(2022, 1, 1),
    schedule_interval=None
)

# Generate tasks dynamically
for item in input_list:
    task_id = f"process_input_{item}"
    task = PythonOperator(
        task_id=task_id,
        python_callable=process_input,
        op_args=[item],
        dag=dag
    )
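
Task IDs built straight from input values only work while those values are unique and contain characters Airflow accepts (to my knowledge: letters, digits, dashes, dots, and underscores, up to 250 characters). The sketch below shows one way to derive safe IDs before the loop creates the operators. The helper names `make_task_id` and `unique_task_ids` are hypothetical, and the code is plain Python with no Airflow dependency.

```python
import re


def make_task_id(value, prefix="process_input_"):
    """Turn an arbitrary input value into a task ID made of safe characters."""
    # Collapse every run of disallowed characters into a single underscore.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(value)).strip("_")
    return f"{prefix}{safe}"[:250]


def unique_task_ids(values):
    """Map each input value to a task ID, failing fast on collisions."""
    seen = {}
    for value in values:
        task_id = make_task_id(value)
        if task_id in seen:
            raise ValueError(
                f"{value!r} and {seen[task_id]!r} both map to {task_id!r}"
            )
        seen[task_id] = value
    return list(seen)


if __name__ == "__main__":
    print(unique_task_ids(["input1", "input2", "sales 2022/01"]))
```

Checking for collisions up front gives a clear error at DAG parse time instead of a duplicate-task failure later.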


In [None]:
from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator  # Airflow 2.x import path
from datetime import datetime

# Define the DAG
dag = DAG(
    dag_id='task_dependency',
    start_date=datetime(2022, 1, 1),
    schedule_interval=None
)

# Define Task A
task_a = ...

# Define Task B
task_b = ...

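# Define the TriggerDagRunOperator. It runs only after Task A succeeds and must
# point at a different DAG: triggering 'task_dependency' from inside itself
# would start a new run of this DAG on every run.
trigger_task_b = TriggerDagRunOperator(
    task_id='trigger_task_b',
    trigger_dag_id='your_target_dag_id',
    execution_date="{{ execution_date }}",
    dag=dag
)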

# Set the task dependencies
task_a >> trigger_task_b >> task_b


TOPIC: Sqoop
1. Scenario: You want to import data from an Oracle database into Hadoop using Sqoop, but you only need to import specific columns from a specific table. Write a Sqoop command that performs the import, including the necessary arguments for column selection and table mapping.
2. Scenario: You have a requirement to perform an incremental import of data from a MySQL database into Hadoop using Sqoop. Design a Sqoop command that imports only the new or updated records since the last import.
3. Scenario: You need to export data from Hadoop to a Microsoft SQL Server database using Sqoop. Develop a Sqoop command that exports the data, considering factors like database connection details, table mapping, and appropriate data types.


In [None]:
sqoop import \
  --connect jdbc:oracle:thin:@<oracle_host>:<oracle_port>:<oracle_sid> \
  --username <username> \
  --password <password> \
  --table <table_name> \
  --columns "<column1>,<column2>,<column3>" \
  --target-dir <target_directory>


In [None]:
sqoop import \
  --connect jdbc:mysql://<mysql_host>:<mysql_port>/<database_name> \
  --username <username> \
  --password <password> \
  --table <table_name> \
  --incremental lastmodified \
  --check-column <timestamp_column> \
  --last-value <last_import_timestamp> \
  --merge-key <primary_key_column> \
  --target-dir <target_directory>
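
An incremental import is only correct if `--last-value` is carried over from one run to the next. The sketch below shows one way a wrapper script might persist that value in a small JSON state file and assemble the argument list. The function names, the state-file format, and the use of `--password-file` instead of `--password` are assumptions made for illustration; nothing here invokes Sqoop itself.

```python
import json
from pathlib import Path


def load_last_value(state_file, default="0"):
    """Return the value recorded by the previous run, or a default on first run."""
    path = Path(state_file)
    if not path.exists():
        return default
    return json.loads(path.read_text())["last_value"]


def save_last_value(state_file, value):
    """Record the highest check-column value imported in this run."""
    Path(state_file).write_text(json.dumps({"last_value": str(value)}))


def incremental_import_args(connect, username, password_file, table,
                            check_column, last_value, target_dir, mode="append"):
    """Build the argument list for one incremental sqoop import."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    last = load_last_value("sqoop_state.json")
    print(" ".join(incremental_import_args(
        "jdbc:mysql://<mysql_host>:<mysql_port>/<database_name>", "<username>",
        "<password_file>", "<table_name>", "<column_name>", last,
        "<target_directory>")))
```

Sqoop's saved jobs (`sqoop job --create`) can track the last imported value on their own, so a hand-rolled state file like this is mainly useful when the import is orchestrated from outside Sqoop.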


In [None]:
sqoop export \
  --connect "jdbc:sqlserver://<sqlserver_host>:<sqlserver_port>;database=<database_name>" \
  --username <username> \
  --password <password> \
  --table <table_name> \
  --export-dir <hadoop_directory> \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n' \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'
