# Cloud Computing Week Demo

This notebook contains short demonstrations and explanations for each topic in this week's cloud computing module. Every code cell is paired with a markdown description so you can walk through the concepts during a live session or self-paced review.

## 2.1 Cloud Computing History
The following code assembles a short timeline of major milestones that led to modern cloud computing. Use it to introduce the evolution from time-sharing mainframes to today's serverless platforms.

In [None]:
events = [
    ("1960s", "Mainframe time-sharing enables multiple users to share expensive computing resources."),
    ("1990s", "Early application service providers deliver software over the internet."),
    ("2006", "Amazon Web Services launches EC2 and S3, marking the start of mainstream IaaS."),
    ("2010s", "Cloud-native architectures, containers, and DevOps accelerate cloud adoption."),
    ("2020s", "Serverless, edge computing, and AI services become foundational cloud offerings."),
]

for period, milestone in events:
    print(f"{period:<8} - {milestone}")


## 2.2 Defining Cloud Computing
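This example contrasts an on-premises deployment with a cloud-based alternative using a simple cost model: a fixed monthly base cost plus an hourly usage charge. The on-premises option carries a high fixed cost and a low hourly rate, while the pay-as-you-go cloud option has a low fixed cost and a higher hourly rate, so the cheaper choice depends on how many hours the workload actually runs.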

In [None]:
from dataclasses import dataclass
from typing import List

@dataclass
class DeploymentOption:
    name: str
    base_monthly_cost: float
    cost_per_hour: float

    def monthly_cost(self, usage_hours: List[float]) -> float:
        variable = sum(hours * self.cost_per_hour for hours in usage_hours)
        return self.base_monthly_cost + variable

workload_hours = [120, 140, 160, 130]  # Usage hours for each of the four weeks in the month
on_prem = DeploymentOption("On-Premises", base_monthly_cost=2800, cost_per_hour=2.5)
cloud = DeploymentOption("Cloud", base_monthly_cost=500, cost_per_hour=4.0)

for option in (on_prem, cloud):
    cost = option.monthly_cost(workload_hours)
    print(f"{option.name:12}: ${cost:,.2f}")

savings = on_prem.monthly_cost(workload_hours) - cloud.monthly_cost(workload_hours)
print(f"Monthly savings with cloud: ${savings:,.2f}")
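
The crossover point between the two options follows from setting the two cost expressions equal. The sketch below reuses the illustrative rates from the cell above; `break_even_hours` is a hypothetical helper introduced here, not part of the original demo.

```python
def break_even_hours(base_a: float, rate_a: float, base_b: float, rate_b: float) -> float:
    """Hours at which option A and option B cost the same.

    Solves base_a + rate_a * h == base_b + rate_b * h for h.
    """
    if rate_a == rate_b:
        raise ValueError("Equal hourly rates: the cost lines never cross")
    return (base_b - base_a) / (rate_a - rate_b)


# Illustrative figures copied from the demo above (A = on-premises, B = cloud).
hours = break_even_hours(base_a=2800, rate_a=2.5, base_b=500, rate_b=4.0)
print(f"Break-even at about {hours:,.0f} hours per month")
```

Under these assumed rates, the 550 hours in the demo workload sit well below the break-even point, which is why the cloud option comes out cheaper. A workload billed for more hours than the break-even value would favor the on-premises option.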


## 2.3 Virtualization
Virtualization partitions a physical host into multiple isolated virtual machines. The snippet below models a hypervisor's capacity bookkeeping: it admits a VM only when enough vCPU and memory remain, and it tracks what has been allocated.

In [None]:
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class VirtualMachine:
    name: str
    vcpu: int
    memory_gb: int

@dataclass
class Hypervisor:
    total_vcpu: int
    total_memory_gb: int
    vms: Dict[str, VirtualMachine] = field(default_factory=dict)

    def allocate_vm(self, vm: VirtualMachine) -> bool:
        if self.available_vcpu() >= vm.vcpu and self.available_memory() >= vm.memory_gb:
            self.vms[vm.name] = vm
            return True
        return False

    def available_vcpu(self) -> int:
        used = sum(vm.vcpu for vm in self.vms.values())
        return self.total_vcpu - used

    def available_memory(self) -> int:
        used = sum(vm.memory_gb for vm in self.vms.values())
        return self.total_memory_gb - used

    def utilization_snapshot(self) -> Dict[str, int]:
        return {
            "allocated_vcpu": self.total_vcpu - self.available_vcpu(),
            "allocated_memory_gb": self.total_memory_gb - self.available_memory(),
        }

host = Hypervisor(total_vcpu=32, total_memory_gb=128)

for vm in [
    VirtualMachine("web", vcpu=4, memory_gb=16),
    VirtualMachine("api", vcpu=8, memory_gb=32),
    VirtualMachine("analytics", vcpu=12, memory_gb=48),
]:
    success = host.allocate_vm(vm)
    print(f"Provisioning {vm.name:10}: {'success' if success else 'insufficient capacity'}")

print("Remaining capacity:", host.available_vcpu(), "vCPU,", host.available_memory(), "GB RAM")
print("Utilization snapshot:", host.utilization_snapshot())
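
The demo above only exercises the success path. As a minimal sketch of the rejection path, the block below applies the same admission rule with plain functions; the `batch` VM, the `release` helper, and the tuple layout are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical host capacity and VM sizes as (vCPU, memory in GB).
capacity: Tuple[int, int] = (32, 128)
allocated: Dict[str, Tuple[int, int]] = {}


def free() -> Tuple[int, int]:
    """Return the unallocated (vCPU, memory_gb) on the host."""
    used_cpu = sum(cpu for cpu, _ in allocated.values())
    used_mem = sum(mem for _, mem in allocated.values())
    return capacity[0] - used_cpu, capacity[1] - used_mem


def allocate(name: str, vcpu: int, memory_gb: int) -> bool:
    """Admit the VM only if both resources fit in the remaining capacity."""
    free_cpu, free_mem = free()
    if vcpu <= free_cpu and memory_gb <= free_mem:
        allocated[name] = (vcpu, memory_gb)
        return True
    return False


def release(name: str) -> None:
    """Return a VM's resources to the pool."""
    allocated.pop(name, None)


allocate("web", 4, 16)
allocate("api", 8, 32)
allocate("analytics", 12, 48)

first_try = allocate("batch", 16, 32)   # rejected: only 8 vCPU are free
release("analytics")                    # frees 12 vCPU and 48 GB
second_try = allocate("batch", 16, 32)  # accepted: 20 vCPU are free now

print("First attempt:", first_try)
print("Second attempt:", second_try)
print("Free after placement:", free())
```

The request fails on vCPU even though memory would fit, which shows that admission requires every resource dimension to have room.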


## 2.4 Economies of Scale
Cloud providers spread fixed infrastructure costs across more usage as demand grows, which lowers the unit cost. The code computes the cost per 10,000 requests at several traffic levels to show this effect.

In [None]:
def cost_per_unit(fixed_cost: float, variable_cost: float, requests: int) -> float:
    total_cost = fixed_cost + variable_cost * requests
    return total_cost / (requests / 10_000)

traffic_levels = [50_000, 100_000, 500_000, 1_000_000]
fixed = 1200
variable = 0.0025

for requests in traffic_levels:
    unit_cost = cost_per_unit(fixed, variable, requests)
    print(f"{requests:>9,} requests -> ${unit_cost:,.2f} per 10k requests")
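
Writing `cost_per_unit` as a formula makes the effect explicit. With fixed cost $F$, variable cost $v$ per request, and $n$ requests, the cost per 10,000 requests is:

$$
c(n) = \frac{F + v\,n}{n / 10^{4}} = \frac{10^{4} F}{n} + 10^{4} v
$$

The first term shrinks as $n$ grows while the second stays constant. With the demo values ($F = 1200$, $v = 0.0025$), the unit cost falls from 265 dollars at 50,000 requests to 37 dollars at 1,000,000 requests, and it can never drop below the floor of $10^{4} v = 25$ dollars per 10,000 requests.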


## 2.5 Obstacles to Cloud Computing
Security, compliance, and organizational readiness all influence adoption. This snippet assigns each obstacle an illustrative severity score between 0 and 1, then lists the obstacles from most to least severe alongside a suggested mitigation.

In [None]:
obstacles = {
    "Security": 0.8,
    "Compliance": 0.6,
    "Vendor Lock-In": 0.5,
    "Skill Gaps": 0.7,
}

mitigations = {
    "Security": "Adopt zero-trust controls and continuous monitoring.",
    "Compliance": "Map regulatory controls to shared responsibility models.",
    "Vendor Lock-In": "Design with open standards and multi-cloud abstractions.",
    "Skill Gaps": "Invest in training and adopt managed services where possible.",
}

for name, severity in sorted(obstacles.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:14} risk score: {severity:.1f} -> {mitigations[name]}")


## 2.6 Elasticity
Elastic systems right-size infrastructure as demand fluctuates. The example scales application instances up or down in response to incoming request volume.

In [None]:
from math import ceil

request_profile = [120, 250, 430, 390, 210, 160]
capacity_per_instance = 120

instance_plan = []
for minute, requests in enumerate(request_profile, start=1):
    instances_needed = max(1, ceil(requests / capacity_per_instance))
    instance_plan.append(instances_needed)
    print(f"Minute {minute}: {requests:3d} requests -> {instances_needed} instance(s)")

print("Total instance-minutes consumed:", sum(instance_plan))
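
To see what elasticity saves, it helps to compare the plan above with a fleet sized for the peak minute and left running. The sketch below assumes the same illustrative traffic profile and capacity; `static_plan` and the other names are introduced here for the comparison.

```python
from math import ceil

# Same illustrative traffic profile and per-instance capacity as the demo above.
request_profile = [120, 250, 430, 390, 210, 160]
capacity_per_instance = 120

# Elastic: provision exactly what each minute needs (at least one instance).
elastic_plan = [max(1, ceil(r / capacity_per_instance)) for r in request_profile]

# Static: provision for the peak minute and keep that fleet running throughout.
peak_instances = max(elastic_plan)
static_plan = [peak_instances] * len(request_profile)

elastic_total = sum(elastic_plan)
static_total = sum(static_plan)
saved = 1 - elastic_total / static_total

print("Elastic instance-minutes:", elastic_total)
print("Static instance-minutes: ", static_total)
print(f"Reduction from elasticity: {saved:.0%}")
```

The gap between the two totals grows with the ratio of peak to average demand, so spiky workloads benefit most.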


## 2.7 Amazon EC2 and S3 (Conceptual Demo)
We can estimate costs without real AWS credentials. The code models a simple architecture with one EC2 instance and one S3 bucket to show how usage drives monthly spend. The rates are illustrative on-demand figures, and the model ignores data transfer and other charges.

In [None]:
from dataclasses import dataclass

@dataclass
class EC2Instance:
    instance_type: str
    hourly_rate: float
    hours_per_month: int

    def monthly_cost(self) -> float:
        return self.hourly_rate * self.hours_per_month

@dataclass
class S3Bucket:
    storage_gb: float
    storage_price_per_gb: float
    requests_per_month: int
    request_price_per_1k: float

    def monthly_cost(self) -> float:
        storage_cost = self.storage_gb * self.storage_price_per_gb
        request_cost = (self.requests_per_month / 1000) * self.request_price_per_1k
        return storage_cost + request_cost

ec2 = EC2Instance("t3.medium", hourly_rate=0.0416, hours_per_month=24 * 30)
s3 = S3Bucket(storage_gb=250, storage_price_per_gb=0.023, requests_per_month=800_000, request_price_per_1k=0.0004)

total_cost = ec2.monthly_cost() + s3.monthly_cost()
print(f"EC2 monthly estimate: ${ec2.monthly_cost():.2f}")
print(f"S3 monthly estimate:  ${s3.monthly_cost():.2f}")
print(f"Combined architecture: ${total_cost:.2f}")


## 2.8 The Business of Cloud Computing
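Cloud business models balance subscription revenue against infrastructure and support costs. The code projects annual gross margin in dollars (revenue minus infrastructure and support costs) under different pricing strategies, treating each per-customer figure as an annual amount.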

In [None]:
def project_margin(customers: int, price_per_customer: float, infra_cost_per_customer: float, support_cost_per_customer: float) -> float:
    revenue = customers * price_per_customer
    cost = customers * (infra_cost_per_customer + support_cost_per_customer)
    return revenue - cost

scenarios = [
    {"customers": 500, "price": 49, "infra": 12, "support": 8},
    {"customers": 500, "price": 59, "infra": 12, "support": 10},
    {"customers": 750, "price": 45, "infra": 11, "support": 9},
]

for scenario in scenarios:
    margin = project_margin(
        scenario["customers"],
        scenario["price"],
        scenario["infra"],
        scenario["support"],
    )
    print(
        f"Customers: {scenario['customers']:3d}, Price: ${scenario['price']:>2} -> Annual gross margin: ${margin:,.0f}"
    )


## 2.9 Big Data Requires Parallel Computing
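Large log datasets benefit from parallel processing. This demo splits log data into chunks, uses a thread pool to count errors in each chunk, and then sums the partial counts. In CPython the global interpreter lock prevents threads from running this CPU-bound counting truly in parallel, so the cell illustrates the split-and-aggregate pattern rather than a speedup; a process pool or a cluster would provide real parallelism.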

In [None]:
from concurrent.futures import ThreadPoolExecutor
import random

random.seed(7)

levels = ["INFO", "DEBUG", "WARN", "ERROR"]
log_lines = [random.choice(levels) for _ in range(50_000)]
chunk_size = 5_000


def count_errors(chunk):
    return sum(1 for entry in chunk if entry == "ERROR")


chunks = [log_lines[i : i + chunk_size] for i in range(0, len(log_lines), chunk_size)]
with ThreadPoolExecutor(max_workers=4) as executor:
    error_counts = list(executor.map(count_errors, chunks))

print("Chunk-level error counts:", error_counts)
print("Total errors detected:", sum(error_counts))


## 2.10 MapReduce Example: Word Count
MapReduce splits work into map and reduce phases. The code below tokenizes documents, emits an intermediate (word, 1) pair for every token, and then reduces the pairs into a global word count. Only words that occur more than once are printed.

In [None]:
from collections import defaultdict
import re

documents = [
    "Cloud computing enables elastic scaling of resources.",
    "MapReduce is a programming model for processing large datasets in the cloud.",
    "Elastic workloads benefit from distributed systems like MapReduce.",
]

word_pattern = re.compile(r"[A-Za-z']+")

intermediate = defaultdict(list)
for doc in documents:
    for match in word_pattern.findall(doc.lower()):
        intermediate[match].append(1)

word_counts = {word: sum(counts) for word, counts in intermediate.items()}
for word, count in sorted(word_counts.items()):
    if count > 1:
        print(f"{word:12}: {count}")
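
The cell above folds the map and shuffle steps into one loop. As a minimal sketch, the same word count can be written with the three phases as separate functions; the function names and the two-document corpus are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple
import re

WORD = re.compile(r"[A-Za-z']+")


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every token in one document."""
    for word in WORD.findall(doc.lower()):
        yield word, 1


def shuffle_phase(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: collapse one key's values into a single total."""
    return word, sum(counts)


docs = ["the cloud scales", "the cloud is elastic"]  # hypothetical corpus
pairs = [pair for doc in docs for pair in map_phase(doc)]
grouped = shuffle_phase(pairs)
totals = dict(reduce_phase(word, counts) for word, counts in grouped.items())

print(pairs[:3])
print(totals)
```

Keeping the phases separate shows where a real framework would distribute work: each `map_phase` call can run on a different machine, and each key in `grouped` can go to a different reducer.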


## 2.11 General Form of a MapReduce Task
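This reusable helper mirrors the generic MapReduce contract: `map_fn` turns each input item into zero or more (key, value) pairs, the helper groups the values by key, and `reduce_fn` is called once per key with all of its values. The example uses it to compute the average latency per service.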

In [None]:
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable, Any, Dict, List, Tuple

MapFn = Callable[[Any], Iterable[Tuple[str, Any]]]
ReduceFn = Callable[[str, List[Any]], Any]


def map_reduce(dataset: Iterable[Any], map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[str, Any]:
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for item in dataset:
        for key, value in map_fn(item):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


requests = [
    {"service": "auth", "duration_ms": 120},
    {"service": "auth", "duration_ms": 140},
    {"service": "payments", "duration_ms": 220},
    {"service": "payments", "duration_ms": 260},
    {"service": "search", "duration_ms": 95},
]


def request_mapper(event):
    yield event["service"], event["duration_ms"]


def average_reducer(service, durations):
    return round(mean(durations), 1)

average_latency = map_reduce(requests, request_mapper, average_reducer)
print("Average latency per service:")
for service, latency in average_latency.items():
    print(f"  {service:9} -> {latency} ms")
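
In the notation commonly used to describe MapReduce, the two user-supplied functions have the following shapes, where $k_1, v_1$ are input keys and values and $k_2, v_2$ are intermediate ones:

$$
\begin{aligned}
\text{map}:&\quad (k_1, v_1) \rightarrow \text{list}(k_2, v_2) \\
\text{reduce}:&\quad (k_2, \text{list}(v_2)) \rightarrow \text{list}(v_2)
\end{aligned}
$$

The `map_reduce` helper above is a simplified instance of this form: `map_fn` receives a whole record rather than a key-value pair, and `reduce_fn` returns a single value per key rather than a list.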


## 2.12 Hash Functions and Hash Tables
Hash tables provide fast key lookups (constant time on average), which makes them a common building block in distributed systems. The class below implements open addressing with linear probing to show how collisions are handled.

In [None]:
class HashTable:
    def __init__(self, size: int = 8):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def _hash(self, key: str) -> int:
        return sum(ord(ch) for ch in key) % self.size

    def insert(self, key: str, value):
        index = self._hash(key)
        start_index = index
        while self.keys[index] not in (None, key):
            index = (index + 1) % self.size
            if index == start_index:
                raise RuntimeError("Hash table is full")
        self.keys[index] = key
        self.values[index] = value

    def get(self, key: str):
        index = self._hash(key)
        start_index = index
        while self.keys[index] is not None:
            if self.keys[index] == key:
                return self.values[index]
            index = (index + 1) % self.size
            if index == start_index:
                break
        raise KeyError(key)


routing_table = HashTable(size=10)
routing_table.insert("api-server", {"ip": "10.0.0.5"})
routing_table.insert("worker-01", {"ip": "10.0.0.21"})
routing_table.insert("worker-02", {"ip": "10.0.0.22"})

print("Lookup api-server ->", routing_table.get("api-server"))
print("Lookup worker-02 ->", routing_table.get("worker-02"))
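
The three keys above happen to land in different slots, so the probing loop never runs. The sketch below forces a collision using the same character-sum hash; the key `worker-10` and the helper names are hypothetical and chosen only because anagrams share a character sum.

```python
from typing import List, Optional


def simple_hash(key: str, size: int) -> int:
    """Same character-sum hash as the demo above."""
    return sum(ord(ch) for ch in key) % size


def probe_insert(slots: List[Optional[str]], key: str) -> int:
    """Insert key using linear probing and return the slot it landed in."""
    size = len(slots)
    index = simple_hash(key, size)
    for _ in range(size):
        if slots[index] is None or slots[index] == key:
            slots[index] = key
            return index
        index = (index + 1) % size  # slot taken: try the next one
    raise RuntimeError("Hash table is full")


slots: List[Optional[str]] = [None] * 10

# Anagrams share a character sum, so these two keys have the same home slot.
home_01 = simple_hash("worker-01", 10)
home_10 = simple_hash("worker-10", 10)
slot_01 = probe_insert(slots, "worker-01")
slot_10 = probe_insert(slots, "worker-10")

print("Home slots:", home_01, home_10)
print("Actual slots:", slot_01, slot_10)
```

The second key ends up one slot past its home position. This also exposes a weakness of the character-sum hash: it ignores character order, so production hash functions mix in each character's position to spread such keys apart.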


## 2.13 MapReduce Details
Combiners reduce shuffle volume by aggregating mapper outputs locally. The snippet compares raw mapper emissions with the optimized combiner workflow.

In [None]:
from collections import defaultdict

documents = [
    "server1 ERROR disk full",
    "server2 INFO ok",
    "server1 ERROR disk full",
    "server3 WARN latency",
]


def mapper(log_line):
    # Emit (server, 1) for every log line, regardless of its level.
    server, *_ = log_line.split()
    yield (server, 1)


def combiner(emissions):
    partial = defaultdict(int)
    for key, value in emissions:
        partial[key] += value
    return list(partial.items())


# Mapper emissions without combiner
raw_emissions = [emission for line in documents for emission in mapper(line)]
print("Raw mapper emissions:", raw_emissions)

# With a combiner applied to each mapper's local output (one mapper per input split)
split_size = 3  # Lines per input split; chosen so that one split contains a repeated key
splits = [documents[i : i + split_size] for i in range(0, len(documents), split_size)]
combined = []
for split in splits:
    emissions = [emission for line in split for emission in mapper(line)]
    combined.extend(combiner(emissions))

print("After combiner (reduced shuffle):", combined)
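
Summing counts works as a combiner because addition is associative and commutative. An average is not: averaging per-mapper averages gives the wrong answer when the mappers see different numbers of records. A common workaround, sketched below with hypothetical latency data, is to ship `(sum, count)` pairs and divide only in the reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, Tuple[int, int]]  # (service, (sum_ms, count))


def mapper(record: Tuple[str, int]) -> Pair:
    """Turn one (service, duration_ms) record into a (sum, count) pair."""
    service, duration_ms = record
    return service, (duration_ms, 1)


def combine(pairs: Iterable[Pair]) -> List[Pair]:
    """Merge (sum, count) pairs per key; safe to apply any number of times."""
    partial: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for key, (total, count) in pairs:
        partial[key][0] += total
        partial[key][1] += count
    return [(key, (total, count)) for key, (total, count) in partial.items()]


# Hypothetical input splits, one per mapper task.
splits = [
    [("auth", 100), ("auth", 200), ("search", 90)],
    [("auth", 300), ("search", 110)],
]

# Each mapper combines its own output before the shuffle.
shuffled = [pair for split in splits for pair in combine(map(mapper, split))]

# The reducer merges the partial pairs and only then divides.
averages = {key: total / count for key, (total, count) in combine(shuffled)}

print("Records shuffled:", len(shuffled))
print("Averages:", averages)
```

Here the first mapper's `auth` average is 150 ms and the second's is 300 ms; averaging those two would give 225 ms, whereas the true mean over all three `auth` records is 200 ms.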


## 2.14 MapReduce Details (Partitioning)
Partitioners decide which reducer processes each key. Here, keys are routed based on their first letter to simulate custom partitioning strategies.

In [None]:
def partitioner(key: str, num_reducers: int) -> int:
    return (ord(key[0].lower()) - ord('a')) % num_reducers

keys = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
assignments = {key: partitioner(key, num_reducers=3) for key in keys}

print("Reducer assignments:")
for key, reducer_id in assignments.items():
    print(f"  Key '{key}' -> reducer {reducer_id}")
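
First-letter routing can leave reducers idle or overloaded when keys cluster on a few letters. Hadoop's default partitioner instead hashes the whole key and takes the result modulo the number of reducers. The block below is a minimal Python sketch of that idea, not Hadoop's implementation; `hash_partitioner` is a hypothetical name, and `zlib.crc32` stands in for the key's hash code.

```python
import zlib
from collections import Counter


def hash_partitioner(key: str, num_reducers: int) -> int:
    """Route a key by a stable hash of the whole key, not just its first letter."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in
    # hash() for str, which is randomized per process by default.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


keys = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
assignments = {key: hash_partitioner(key, num_reducers=3) for key in keys}
load = Counter(assignments.values())

for key, reducer_id in assignments.items():
    print(f"  Key '{key}' -> reducer {reducer_id}")
print("Keys per reducer:", dict(load))
```

With only six keys the spread can still be uneven, but a whole-key hash tends to balance load as the number of distinct keys grows. A custom partitioner remains useful when related keys must reach the same reducer.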


## 2.15 MapReduce Implementation in Hadoop
Hadoop MapReduce jobs are typically written in Java and run on the JVM, but we can outline the execution flow in Python. The code prints a simplified sequence of Hadoop job stages along with sample configuration parameters.

In [None]:
job_steps = [
    "1. Client submits job configuration to ResourceManager",
    "2. Input files stored as HDFS blocks are divided into logical input splits",
    "3. Map tasks execute on nodes close to the data (data locality)",
    "4. Intermediate output is shuffled and sorted",
    "5. Reduce tasks aggregate results and write to HDFS",
    "6. Job history is recorded for monitoring and auditing",
]

for step in job_steps:
    print(step)

job_configuration = {
    "mapreduce.job.reduces": 4,
    "mapreduce.input.fileinputformat.split.maxsize": 134_217_728,  # 128 MB; the property is specified in bytes
    "mapreduce.output.fileoutputformat.compress": True,
}

print()  # Add a blank line before showing configuration values
print("Sample Hadoop configuration parameters:")
for key, value in job_configuration.items():
    print(f"  {key}: {value}")
