TOPIC: Docker
1. Scenario: You are building a microservices-based application using Docker. Design a Docker Compose file that sets up three containers: a web server container, a database container, and a cache container. Ensure that the containers can communicate with each other properly.
2. Scenario: You want to scale your Docker containers dynamically based on incoming traffic. Write a Python script that uses the Docker SDK for Python to monitor the CPU usage of a container and automatically scale the number of replicas when usage crosses a threshold.
3. Scenario: You have a Docker image stored on a private registry. Develop a script in Bash that authenticates with the registry, pulls the latest version of the image, and runs a container based on that image.


In [None]:
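version: '3'
services:
  web:
    build: ./web
    ports:
      - "80:80"
    # depends_on controls start order only; it does not wait for the services to be ready.
    depends_on:
      - database
      - cache
  # All three services join the project's default network and can reach
  # each other by service name (for example, database:3306 and cache:6379).
  database:
    image: mysql:latest
    environment:
      - MYSQL_ROOT_PASSWORD=your_password
  cache:
    image: redis:latest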
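
On the default network that Compose creates for a project, each service is reachable by its service name. The following is a minimal sketch of a reachability check that the web container could run at startup. The helper names `can_connect` and `check_services` and the timeout are assumptions, and the ports are the MySQL and Redis defaults (3306 and 6379).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False


# Hostnames match the service names in the Compose file; ports are the image defaults.
SERVICES = {"database": 3306, "cache": 6379}


def check_services(services=SERVICES):
    """Map each service name to whether its port accepted a connection."""
    return {name: can_connect(name, port) for name, port in services.items()}
```

Calling `check_services()` from inside the web container returns a dictionary such as `{"database": ..., "cache": ...}` with one boolean per service. Outside the Compose network the names do not resolve, so both values are False.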


In [None]:
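import docker

client = docker.from_env()

def get_container_cpu_percent(container_id):
    """Return CPU usage as a percentage, where 100% is one full CPU core."""
    container = client.containers.get(container_id)
    stats = container.stats(stream=False)
    cpu_stats = stats['cpu_stats']
    precpu_stats = stats['precpu_stats']
    # The counters are cumulative, so usage is the change between two samples.
    cpu_delta = cpu_stats['cpu_usage']['total_usage'] - precpu_stats['cpu_usage']['total_usage']
    system_delta = cpu_stats.get('system_cpu_usage', 0) - precpu_stats.get('system_cpu_usage', 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    online_cpus = cpu_stats.get('online_cpus') or 1
    return (cpu_delta / system_delta) * online_cpus * 100

def scale_containers(container_id, service_name, replicas, threshold):
    cpu_percent = get_container_cpu_percent(container_id)
    if cpu_percent > threshold:
        print(f"CPU usage ({cpu_percent:.1f}%) exceeded the threshold. Scaling up...")
        scale_up(service_name, replicas + 1)
    else:
        print(f"CPU usage ({cpu_percent:.1f}%) is at or below the threshold. No scaling required.")

def scale_up(service_name, replicas):
    # A single container has no scale() method; replicas belong to a Swarm service,
    # so this requires Docker to be running in Swarm mode.
    service = client.services.get(service_name)
    service.scale(replicas)

container_id = 'your_container_id'
service_name = 'your_service_name'
replicas = 3
threshold = 80

scale_containers(container_id, service_name, replicas, threshold)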
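
Docker reports CPU counters cumulatively, so a usage percentage is normally derived from the difference between two samples (`cpu_stats` and `precpu_stats`), as `docker stats` does. The sketch below separates that arithmetic and the scaling decision from the Docker calls so both can be checked without a daemon. The helper names, the `max_replicas` cap, and the sample numbers are assumptions made for illustration.

```python
def cpu_percent_from_stats(stats):
    """Compute CPU usage (%) from one stats payload; 100% is one full core."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    online_cpus = cpu.get("online_cpus") or 1
    return cpu_delta / system_delta * online_cpus * 100.0


def desired_replicas(current, cpu_percent, threshold, max_replicas=10):
    """Add one replica when usage exceeds the threshold, capped at max_replicas."""
    if cpu_percent > threshold:
        return min(current + 1, max_replicas)
    return current


# Hypothetical payload: the container used 250 ms of CPU while the host
# accumulated 1 s of CPU time across 2 cores.
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 750_000_000},
        "system_cpu_usage": 3_000_000_000,
        "online_cpus": 2,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 500_000_000},
        "system_cpu_usage": 2_000_000_000,
    },
}

usage = cpu_percent_from_stats(sample)
print(f"CPU usage: {usage:.1f}%")
print(f"Replicas: {desired_replicas(3, usage, threshold=80)}")
```

With this sample the usage is 250 ms / 1 s × 2 cores × 100 = 50%, which is below the threshold of 80, so the replica count stays at 3.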


In [None]:
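#!/usr/bin/env bash
set -euo pipefail

DOCKER_REGISTRY_URL="your_registry_url"
DOCKER_REGISTRY_USERNAME="your_username"
DOCKER_REGISTRY_PASSWORD="your_password"
DOCKER_IMAGE_NAME="your_image_name"

# Pass the password on stdin so it does not appear in the process list.
echo "$DOCKER_REGISTRY_PASSWORD" | docker login -u "$DOCKER_REGISTRY_USERNAME" --password-stdin "$DOCKER_REGISTRY_URL"
# Pull the latest tag explicitly, then start a detached container from it.
docker pull "$DOCKER_REGISTRY_URL/$DOCKER_IMAGE_NAME:latest"
docker run -d "$DOCKER_REGISTRY_URL/$DOCKER_IMAGE_NAME:latest"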


TOPIC: Airflow
1. Scenario: You have a data pipeline that requires executing a shell command as part of a task. Create an Airflow DAG that includes a BashOperator to execute a specific shell command.
2. Scenario: You want to create dynamic tasks in Airflow based on a list of inputs. Design an Airflow DAG that generates tasks dynamically using PythonOperator, where each task processes an element from the input list.
3. Scenario: You need to set up a complex task dependency in Airflow, where Task B should start only if Task A has successfully completed. Implement this dependency using the "TriggerDagRunOperator" in Airflow.


In [None]:
from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2 import path
from datetime import datetime

dag = DAG(
    'execute_shell_command',
    description='DAG to execute a shell command',
    schedule_interval=None,
    start_date=datetime(2023, 7, 18),
    catchup=False
)

execute_command = BashOperator(
    task_id='execute_command',
    bash_command='your_shell_command_here',
    dag=dag
)


In [None]:
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2 import path
from datetime import datetime

dag = DAG(
    'dynamic_tasks',
    description='DAG to generate tasks dynamically',
    schedule_interval=None,
    start_date=datetime(2023, 7, 18),
    catchup=False
)

def process_input(input_value):
    print(f"Processing input: {input_value}")

input_list = ['input1', 'input2', 'input3']

for input_value in input_list:
    task_id = f'task_{input_value}'
    task = PythonOperator(
        task_id=task_id,
        python_callable=process_input,
        op_kwargs={'input_value': input_value},
        dag=dag
    )


In [None]:
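from airflow import DAG
# DummyOperator was used below without an import; on Airflow 2.3+ EmptyOperator replaces it.
from airflow.operators.dummy import DummyOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from datetime import datetime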

dag = DAG(
    'complex_task_dependency',
    description='DAG with complex task dependency',
    schedule_interval=None,
    start_date=datetime(2023, 7, 18),
    catchup=False
)

trigger_dag_run = TriggerDagRunOperator(
    task_id='trigger_dag_run',
    trigger_dag_id='target_dag_id',
    dag=dag
)

task_a = DummyOperator(
    task_id='task_a',
    dag=dag
)

task_b = DummyOperator(
    task_id='task_b',
    dag=dag
)

task_a >> trigger_dag_run
trigger_dag_run >> task_b


TOPIC: Sqoop
1. Scenario: You want to import data from an Oracle database into Hadoop using Sqoop, but you only need to import specific columns from a specific table. Write a Sqoop command that performs the import, including the necessary arguments for column selection and table mapping.
2. Scenario: You have a requirement to perform an incremental import of data from a MySQL database into Hadoop using Sqoop. Design a Sqoop command that imports only the new or updated records since the last import.
3. Scenario: You need to export data from Hadoop to a Microsoft SQL Server database using Sqoop. Develop a Sqoop command that exports the data, considering factors like database connection details, table mapping, and appropriate data types.


In [None]:
sqoop import \
--connect jdbc:oracle:thin:@<oracle_host>:<oracle_port>:<oracle_sid> \
--username <username> \
--password <password> \
--table <table_name> \
--columns "<column1>,<column2>,<column3>" \
--target-dir <target_directory>


In [None]:
sqoop import \
--connect jdbc:mysql://<mysql_host>/<database_name> \
--username <username> \
--password <password> \
--table <table_name> \
--incremental lastmodified \
--check-column <timestamp_column> \
--last-value <last_import_timestamp> \
--merge-key <primary_key_column> \
--target-dir <target_directory>
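
An incremental import needs the previous run's high-water mark for `--last-value`. The following is a minimal sketch of one way to assemble the argument list and keep that value in a small state file between runs. The function names, the JSON state-file layout, the default start value, and the use of `--password-file` instead of `--password` are assumptions, and the `sqoop` binary is not invoked here. Sqoop's saved jobs (`sqoop job --create`) can also track the last value for you.

```python
import json
from pathlib import Path


def load_last_value(state_file, default="1970-01-01 00:00:00"):
    """Return the stored high-water mark, or the default on the first run."""
    path = Path(state_file)
    if not path.exists():
        return default
    return json.loads(path.read_text())["last_value"]


def save_last_value(state_file, value):
    """Persist the high-water mark for the next run."""
    Path(state_file).write_text(json.dumps({"last_value": value}))


def build_incremental_import(connect, username, password_file, table,
                             check_column, last_value, target_dir,
                             mode="lastmodified", merge_key=None):
    """Build the sqoop import argument list for an incremental run."""
    if mode not in ("append", "lastmodified"):
        raise ValueError(f"unsupported incremental mode: {mode}")
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", last_value,
        "--target-dir", target_dir,
    ]
    if merge_key is not None:
        # Lets updated rows replace older copies already in the target directory.
        args += ["--merge-key", merge_key]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`, after which `save_last_value` records the new high-water mark once the import succeeds.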


In [None]:
sqoop export \
--connect "jdbc:sqlserver://<sql_server_host>:<sql_server_port>;database=<database_name>" \
--username <username> \
--password <password> \
--table <table_name> \
--export-dir <export_directory> \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n' \
--input-null-string '\\N' \
--input-null-non-string '\\N'
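
The export options above describe the file layout Sqoop expects in the export directory: comma-separated fields, newline-terminated records, and `\N` for null values. The sketch below writes records in that layout. The helper names are hypothetical, and it rejects fields that contain the delimiter or a newline rather than escaping them.

```python
def format_export_row(values, delimiter=",", null_token="\\N"):
    """Render one record as a delimited line, writing None as the null token."""
    fields = []
    for value in values:
        text = null_token if value is None else str(value)
        if delimiter in text or "\n" in text:
            raise ValueError(f"field {text!r} contains a delimiter or newline")
        fields.append(text)
    return delimiter.join(fields)


def write_export_file(path, rows):
    """Write rows to path, one record per line terminated by a newline."""
    with open(path, "w", newline="\n") as handle:
        for row in rows:
            handle.write(format_export_row(row) + "\n")
```

For example, `format_export_row([1, None, "abc"])` produces the line `1,\N,abc`, which the export would load as a row whose second column is NULL.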
