TOPIC: Docker
1. Scenario: You are building a microservices-based application using Docker. Design a Docker Compose file that sets up three containers: a web server container, a database container, and a cache container. Ensure that the containers can communicate with each other properly.
2. Scenario: You want to scale your Docker containers dynamically based on incoming traffic. Write a Python script that uses the Docker SDK for Python to monitor the CPU usage of a container and automatically scale the number of replicas when usage crosses a threshold.
3. Scenario: You have a Docker image stored on a private registry. Develop a script in Bash that authenticates with the registry, pulls the latest version of the image, and runs a container based on that image.


In [None]:
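# Docker Compose file for three microservices: web server, database, and cache.
# Services on the same network reach each other by service name,
# e.g. the web server connects to database:3306 and cache:6379.

version: '3'
services:
  webserver:
    image: your-webserver-image
    ports:
      - "80:80"
    # depends_on controls start order only; it does not wait for readiness
    depends_on:
      - database
      - cache
    networks:
      - backend

  database:
    image: your-database-image
    # Published for host access; other services do not need this mapping
    ports:
      - "3306:3306"
    networks:
      - backend

  cache:
    image: your-cache-image
    ports:
      - "6379:6379"
    networks:
      - backend

networks:
  backend: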
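
Whether the containers can actually reach each other can be checked from inside the web server container. The following is a minimal sketch, assuming the service names and ports from the Compose file above; `can_connect` and `check_services` are hypothetical helper names, not part of Docker or Compose.

```python
import socket

# Service names and ports are assumed to match the Compose file above.
SERVICES = {"database": 3306, "cache": 6379}

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names
        return False

def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: can_connect(name, port) for name, port in services.items()}
```

Running `check_services()` inside the `webserver` container would report which of the two backing services accept connections. A successful TCP connection only shows that the port is open, not that the database or cache is ready to serve requests.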


In [None]:
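import docker
import time

def cpu_percent(stats):
    # Docker reports cumulative counters, so usage is the change in container
    # CPU time relative to the change in system CPU time between two samples.
    cpu = stats['cpu_stats']
    precpu = stats['precpu_stats']
    cpu_delta = cpu['cpu_usage']['total_usage'] - precpu['cpu_usage']['total_usage']
    system_delta = cpu.get('system_cpu_usage', 0) - precpu.get('system_cpu_usage', 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    # Fall back to the per-CPU list on daemons that do not report online_cpus
    online_cpus = cpu.get('online_cpus') or len(cpu['cpu_usage'].get('percpu_usage') or [1])
    return (cpu_delta / system_delta) * online_cpus * 100

def scale_up():
    # Placeholder: replace with your actual scale-up logic
    pass

def scale_down():
    # Placeholder: replace with your actual scale-down logic
    pass

def monitor_cpu_usage(container_id):
    client = docker.from_env()
    container = client.containers.get(container_id)
    while True:
        # stream=False returns a single stats snapshot as a dict
        stats = container.stats(stream=False)
        cpu_percentage = cpu_percent(stats)

        print(f"CPU Usage: {cpu_percentage:.1f}%")

        if cpu_percentage > 80:
            scale_up()  # Add replicas when usage exceeds 80%
        elif cpu_percentage < 50:
            scale_down()  # Remove replicas when usage drops below 50%

        time.sleep(10)  # Monitor every 10 seconds

# Example usage
container_id = 'your-container-id'  # Replace with the ID of your Docker container
monitor_cpu_usage(container_id)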
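
The scaling functions above are left as placeholders. One way to fill in the decision part is a pure function that turns a CPU percentage into a target replica count. This is a sketch under assumptions: the name `desired_replicas`, the one-step-at-a-time policy, and the `min_replicas`/`max_replicas` bounds are illustrative and do not come from the Docker SDK. The 80% and 50% thresholds match the script above.

```python
def desired_replicas(cpu_percentage, current, high=80.0, low=50.0,
                     min_replicas=1, max_replicas=5):
    """Return the replica count the thresholds call for, moving one step at a time."""
    if cpu_percentage > high:
        target = current + 1   # Above the high threshold: add one replica
    elif cpu_percentage < low:
        target = current - 1   # Below the low threshold: remove one replica
    else:
        target = current       # Between thresholds: leave the count unchanged
    # Clamp so the service never drops below the minimum or exceeds the maximum
    return max(min_replicas, min(max_replicas, target))
```

Keeping the decision separate from the side effect makes the thresholds easy to test. The returned count could then be handed to whatever performs the scaling, for example `docker service scale` when the container belongs to a Swarm service.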


In [None]:
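#!/bin/bash
set -euo pipefail  # Stop on the first failed command or unset variable

REGISTRY_URL="your-registry-url"
IMAGE_NAME="your-image-name"
CONTAINER_NAME="your-container-name"

# Authenticate with the private registry.
# REGISTRY_USERNAME and REGISTRY_PASSWORD are assumed to be set in the environment;
# --password-stdin keeps the password out of the process list and shell history.
printf '%s' "$REGISTRY_PASSWORD" | docker login -u "$REGISTRY_USERNAME" --password-stdin "$REGISTRY_URL"

# Pull the latest version of the image
docker pull "$REGISTRY_URL/$IMAGE_NAME:latest"

# Run a container based on the image
docker run -d --name "$CONTAINER_NAME" "$REGISTRY_URL/$IMAGE_NAME:latest"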



TOPIC: Airflow
1. Scenario: You have a data pipeline that requires executing a shell command as part of a task. Create an Airflow DAG that includes a BashOperator to execute a specific shell command.
2. Scenario: You want to create dynamic tasks in Airflow based on a list of inputs. Design an Airflow DAG that generates tasks dynamically using PythonOperator, where each task processes an element from the input list.
3. Scenario: You need to set up a complex task dependency in Airflow, where Task B should start only if Task A has successfully completed. Implement this dependency using the "TriggerDagRunOperator" in Airflow.


In [None]:
from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x path; bash_operator is the deprecated 1.x module
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

dag = DAG(
    'shell_command_dag',
    default_args=default_args,
    schedule_interval='0 0 * * *',  # Run once daily at midnight
    catchup=False  # Do not backfill runs between start_date and today
)

bash_task = BashOperator(
    task_id='execute_shell_command',
    bash_command='echo "Hello, World!"',  # Replace with your shell command
    dag=dag
)


In [None]:
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x path; python_operator is the deprecated 1.x module
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

dag = DAG(
    'dynamic_tasks_dag',
    default_args=default_args,
    schedule_interval=None  # Disable automatic scheduling
)

def process_input(input_value):
    # Code to process the input value
    print(f"Processing input: {input_value}")

input_list = [1, 2, 3, 4, 5]  # Replace with your list of inputs

for input_value in input_list:
    task_id = f'process_input_{input_value}'
    task = PythonOperator(
        task_id=task_id,
        python_callable=process_input,
        op_kwargs={'input_value': input_value},
        dag=dag
    )


In [None]:
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

dag = DAG(
    'complex_dependency_dag',
    default_args=default_args,
    schedule_interval=None  # Disable automatic scheduling
)

def dummy_function():
    pass

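# Task A: trigger another DAG ('subdag_id' is a placeholder for its DAG ID)
trigger_task = TriggerDagRunOperator(
    task_id='trigger_task',
    trigger_dag_id='subdag_id',
    wait_for_completion=True,  # Airflow 2.0+: wait until the triggered run finishes
    dag=dag
)

# Task B: runs only after Task A succeeds (default trigger rule is all_success)
dummy_task = PythonOperator(
    task_id='dummy_task',
    python_callable=dummy_function,
    dag=dag
)

trigger_task >> dummy_task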



TOPIC: Sqoop
1. Scenario: You want to import data from an Oracle database into Hadoop using Sqoop, but you only need to import specific columns from a specific table. Write a Sqoop command that performs the import, including the necessary arguments for column selection and table mapping.
2. Scenario: You have a requirement to perform an incremental import of data from a MySQL database into Hadoop using Sqoop. Design a Sqoop command that imports only the new or updated records since the last import.
3. Scenario: You need to export data from Hadoop to a Microsoft SQL Server database using Sqoop. Develop a Sqoop command that exports the data, considering factors like database connection details, table mapping, and appropriate data types.


In [None]:
sqoop import \
  --connect jdbc:oracle:thin:@<host>:<port>:<database> \
  --username <username> \
  --password <password> \
  --table <table_name> \
  --columns "<column1>,<column2>,<column3>" \
  --target-dir <target_directory>


In [None]:
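# "lastmodified" picks up rows inserted or updated since --last-value;
# "append" would only pick up new rows with a larger check-column value.
# --merge-key lets Sqoop merge updated rows into an existing target directory.
sqoop import \
  --connect jdbc:mysql://<host>:<port>/<database> \
  --username <username> \
  --password <password> \
  --table <table_name> \
  --incremental lastmodified \
  --check-column <last_modified_timestamp_column> \
  --last-value <last_imported_timestamp> \
  --merge-key <primary_key_column> \
  --target-dir <target_directory>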
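
When the incremental import runs on a schedule, the `--last-value` argument changes on every run, so it can help to assemble the command programmatically. The sketch below builds the argument list for the import above. The function name `build_incremental_import` and the host, database, table, and path values in the example are hypothetical. It passes the password with Sqoop's `--password-file` option instead of putting it on the command line.

```python
import shlex

def build_incremental_import(connect, username, password_file, table,
                             mode, check_column, last_value, target_dir,
                             merge_key=None):
    """Build the argument list for an incremental `sqoop import`."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--target-dir", target_dir,
    ]
    if merge_key is not None:
        # Used with lastmodified to merge updated rows into existing output
        args += ["--merge-key", merge_key]
    return args

# Hypothetical example values; replace with your own connection details.
cmd = build_incremental_import(
    connect="jdbc:mysql://db.example.com:3306/sales",
    username="etl_user",
    password_file="/user/etl/.sqoop_password",
    table="orders",
    mode="lastmodified",
    check_column="updated_at",
    last_value="2023-01-01 00:00:00",
    target_dir="/data/sales/orders",
    merge_key="order_id",
)
print(shlex.join(cmd))  # Shell-quoted form, suitable for logging
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where Sqoop is installed, which avoids shell quoting problems with values such as timestamps that contain spaces.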


In [None]:
sqoop export \
  --connect "jdbc:sqlserver://<server_name>:<port>;database=<database_name>" \
  --username <username> \
  --password <password> \
  --table <table_name> \
  --export-dir <source_directory> \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n'
