The goal is to simulate a data stream that emits randomly sized batches of records every 25 seconds for 2 minutes. The Python scripts below do this with a customizable record structure:

# Real-World Streaming Data Structures: Samples
Streaming data typically comes from real-time systems that generate data continuously or in bursts, and its structure depends on the domain. Here are example streaming record structures from several industries:

1. Web Analytics (User Behavior)
Use case: Tracking user actions on a website in real-time (like Google Analytics)

```json
{
  "event_id": "e123456",
  "user_id": "u78910",
  "event_type": "page_view",
  "timestamp": "2025-05-22T15:24:00Z",
  "url": "/product/123",
  "referrer": "/home",
  "device": "mobile",
  "location": "New York, USA"
}
```
🔁 Often streamed to tools like Kafka, Kinesis, or Google Pub/Sub.


2. IoT Sensor Data (Smart Factory / Agriculture / Vehicles)
Use case: Real-time sensor monitoring
```json
{
  "device_id": "sensor-001",
  "timestamp": "2025-05-22T15:25:12Z",
  "temperature": 23.5,
  "humidity": 45.2,
  "pressure": 1013,
  "status": "active"
}
```

🔧 Common in predictive maintenance and environment monitoring.

3. Financial Transactions (FinTech / Banking)
Use case: Fraud detection, real-time auditing
```json
{
  "transaction_id": "tx987654",
  "account_id": "acc123456",
  "timestamp": "2025-05-22T15:26:45Z",
  "amount": 1500.75,
  "currency": "USD",
  "merchant": "Amazon",
  "transaction_type": "debit",
  "location": "San Francisco, USA"
}
```
🔍 Streamed for fraud detection and real-time dashboards.

4. Ride-Sharing / Delivery Tracking
Use case: Location tracking and ETA prediction (like Uber or DoorDash)
```json
{
  "ride_id": "ride123",
  "driver_id": "driver456",
  "timestamp": "2025-05-22T15:27:30Z",
  "location": {
    "lat": 37.7749,
    "lon": -122.4194
  },
  "speed_kmph": 45,
  "status": "en_route"
}
```
📡 GPS updates streamed every few seconds.

5. Healthcare Monitoring (Wearables / Patient Vitals)
Use case: Real-time health alert systems
```json
{
  "patient_id": "p00123",
  "timestamp": "2025-05-22T15:28:00Z",
  "heart_rate": 72,
  "blood_pressure": "120/80",
  "oxygen_level": 98,
  "device_status": "ok"
}
```
❤️ Used in remote patient monitoring systems.
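
As a rough sketch of how one of these structures could be filled with synthetic values, the function below builds a record shaped like the IoT sensor example (item 2). The function name `generate_sensor_record`, the value ranges, and the status choices are illustrative assumptions, not taken from any real device.

```python
import random
from datetime import datetime, timezone


def generate_sensor_record(device_id="sensor-001"):
    """Build one synthetic record shaped like the IoT sensor example.

    All value ranges below are assumptions chosen for illustration.
    """
    # UTC timestamp in the same "...Z" style as the sample records
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "device_id": device_id,
        "timestamp": timestamp,
        "temperature": round(random.uniform(15.0, 35.0), 1),  # assumed range
        "humidity": round(random.uniform(30.0, 70.0), 1),     # assumed range
        "pressure": random.randint(990, 1030),                # assumed range
        "status": random.choice(["active", "inactive"]),      # assumed values
    }


print(generate_sensor_record())
```

The other structures could be generated the same way by swapping in their field names and plausible value ranges.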



In [1]:
import random

# Option 1
- random.randint(10, 50): Chooses how many records to generate this iteration.

- Each record has a unique id using the iteration and record number (e.g., "3-7").

- You can customize the record structure as needed.

In [2]:
def generate_records(n):
    all_records = []  # This will store all generated records for all iterations

    for i in range(1, n + 1):
        num_records = random.randint(10, 50)
        print(f"Iteration {i}: Generating {num_records} records.")

        records = [
            {"id": f"{i}-{j}", "value": random.randint(100, 999)}
            for j in range(1, num_records + 1)
        ]

        print(f"Generated records: {records}")

        all_records.extend(records)
        # Optional: print records for each iteration
        # for record in records:
        #     print(record)

    return all_records


In [3]:
# Example usage
n = 5
all_data = generate_records(n)
print(f"\nTotal records generated: {len(all_data)}")


Iteration 1: Generating 29 records.
Generated records: [{'id': '1-1', 'value': 150}, {'id': '1-2', 'value': 757}, {'id': '1-3', 'value': 159}, {'id': '1-4', 'value': 686}, {'id': '1-5', 'value': 522}, {'id': '1-6', 'value': 109}, {'id': '1-7', 'value': 473}, {'id': '1-8', 'value': 982}, {'id': '1-9', 'value': 829}, {'id': '1-10', 'value': 558}, {'id': '1-11', 'value': 138}, {'id': '1-12', 'value': 355}, {'id': '1-13', 'value': 299}, {'id': '1-14', 'value': 594}, {'id': '1-15', 'value': 298}, {'id': '1-16', 'value': 787}, {'id': '1-17', 'value': 662}, {'id': '1-18', 'value': 318}, {'id': '1-19', 'value': 134}, {'id': '1-20', 'value': 680}, {'id': '1-21', 'value': 835}, {'id': '1-22', 'value': 299}, {'id': '1-23', 'value': 311}, {'id': '1-24', 'value': 229}, {'id': '1-25', 'value': 940}, {'id': '1-26', 'value': 978}, {'id': '1-27', 'value': 259}, {'id': '1-28', 'value': 526}, {'id': '1-29', 'value': 421}]
Iteration 2: Generating 43 records.
Generated records: [{'id': '2-1', 'value': 452}

# Option 2
✅ Features Implemented:
- Runs for 2 minutes (duration = 120 seconds)

- Emits data every 25 seconds (interval = 25 seconds)

- Generates 10–50 records per batch

- Each record includes: id, name, timestamp, age, salary, and job_title
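
To see how many batches these settings produce, it may help to work through the arithmetic. The sketch below assumes a loop that starts a new batch only while the elapsed time is below `duration`, sleeps for `interval` after every batch, and spends negligible time generating each batch. The helper names are hypothetical.

```python
import math


def expected_batches(duration, interval):
    """Count batches started at t = 0, interval, 2*interval, ... while t < duration."""
    return math.ceil(duration / interval)


def approximate_runtime(duration, interval):
    """Approximate wall-clock seconds, counting the sleep after the last batch."""
    return expected_batches(duration, interval) * interval


print(expected_batches(120, 25))     # 5 batches, at roughly t = 0, 25, 50, 75, 100
print(approximate_runtime(120, 25))  # 125 seconds
```

Under these assumptions the run lasts about 125 seconds rather than exactly 120, because the loop condition is only re-checked after each sleep.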


✅ Customizing It:
You can easily change:

- Duration (simulate_streaming(duration=...))

- Emission interval (simulate_streaming(interval=...))

- Record structure in generate_record()

In [4]:
import random
import time
from datetime import datetime

In [5]:
# Sample data for name and job title generation
names = ["Alice", "Bob", "Charlie", "Diana", "Eve", "Frank", "Grace", "Heidi", "Ivan", "Judy"]
job_titles = ["Engineer", "Manager", "Analyst", "Developer", "Consultant", "Designer", "Architect"]


In [6]:
def generate_record(batch_num, record_num):
    return {
        "id": f"{batch_num}-{record_num}",
        "name": random.choice(names),
        "timestamp": datetime.utcnow().isoformat(),
        "age": random.randint(22, 60),
        "salary": random.randint(50000, 150000),
        "job_title": random.choice(job_titles)
    }

In [None]:
def simulate_streaming(duration=120, interval=25):
    start_time = time.time()
    batch_num = 1

    while (time.time() - start_time) < duration:
        num_records = random.randint(10, 50)
        print(f"\nBatch {batch_num}: Generating {num_records} records at {datetime.now().isoformat()}")

        records = [generate_record(batch_num, i + 1) for i in range(num_records)]

        # Print a preview of the first few records
        for r in records[:5]:
            print(r)
        if len(records) > 5:
            print(f"... {len(records) - 5} more records.")

        batch_num += 1
        time.sleep(interval)

    print("\n✅ Simulation complete.")
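
Because `simulate_streaming` sleeps for the full interval after each batch, the time spent generating and printing records is added on top, so batches drift slightly later over time, and the loop also sleeps once more after the final batch. One possible alternative, sketched below, schedules each batch against a fixed start time using `time.monotonic()`. The name `run_on_schedule` and its `emit` callback are hypothetical and not part of the original script.

```python
import time


def run_on_schedule(emit, duration=120.0, interval=25.0):
    """Call emit(batch_num) at offsets 0, interval, 2*interval, ... below duration.

    Each batch is scheduled relative to the start time, so time spent
    inside emit() does not push later batches back. Returns the batch count.
    """
    start = time.monotonic()
    batch_num = 0
    while True:
        offset = batch_num * interval
        if offset >= duration:
            break  # the next slot falls outside the window, so stop without sleeping
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        batch_num += 1
        emit(batch_num)
    return batch_num


# Short demo values so the example finishes in a fraction of a second
run_on_schedule(lambda n: print(f"Batch {n}"), duration=0.25, interval=0.1)
```

With `duration=120` and `interval=25`, this sketch would emit at about 0, 25, 50, 75, and 100 seconds and return right after the fifth batch.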

In [8]:
# Run simulation
simulate_streaming(duration=120, interval=25)




Batch 1: Generating 42 records at 2025-05-23T04:43:52.509755
{'id': '1-1', 'name': 'Heidi', 'timestamp': '2025-05-23T04:43:52.509809', 'age': 23, 'salary': 133821, 'job_title': 'Consultant'}
{'id': '1-2', 'name': 'Grace', 'timestamp': '2025-05-23T04:43:52.509817', 'age': 50, 'salary': 75572, 'job_title': 'Analyst'}
{'id': '1-3', 'name': 'Judy', 'timestamp': '2025-05-23T04:43:52.509820', 'age': 45, 'salary': 106805, 'job_title': 'Designer'}
{'id': '1-4', 'name': 'Frank', 'timestamp': '2025-05-23T04:43:52.509823', 'age': 33, 'salary': 81732, 'job_title': 'Designer'}
{'id': '1-5', 'name': 'Alice', 'timestamp': '2025-05-23T04:43:52.509825', 'age': 28, 'salary': 81110, 'job_title': 'Architect'}
... 37 more records.

Batch 2: Generating 44 records at 2025-05-23T04:44:17.514699
{'id': '2-1', 'name': 'Frank', 'timestamp': '2025-05-23T04:44:17.515211', 'age': 28, 'salary': 101601, 'job_title': 'Manager'}
{'id': '2-2', 'name': 'Judy', 'timestamp': '2025-05-23T04:44:17.515241', 'age': 24, 'salar

# ✅ Healthcare Monitoring Stream Simulator
This Python script simulates real-time streaming of patient vitals, emitting a batch of records every 25 seconds for 2 minutes, similar to how a smartwatch or medical wearable might send periodic updates.

## 📋 Record Structure
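Each record will contain the fields below. The generator also adds a `record_id` field of the form "batch-record" (e.g., "1-1"):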
```json
{
  "patient_id": "p00123",
  "timestamp": "2025-05-22T15:28:00Z",
  "heart_rate": 72,
  "blood_pressure": "120/80",
  "oxygen_level": 98,
  "device_status": "ok"
}
```

## ✅ Python Code

In [11]:
import random
import time
from datetime import datetime

# Simulated patient pool
patient_ids = [f"p{i:05d}" for i in range(1, 21)]  # 20 patients: p00001 to p00020

def generate_health_record(batch_num, record_num):
    systolic = random.randint(100, 140)
    diastolic = random.randint(60, 90)

    return {
        "record_id": f"{batch_num}-{record_num}",
        "patient_id": random.choice(patient_ids),
        "timestamp": datetime.now().isoformat(),
        "heart_rate": random.randint(60, 100),                # BPM
        "blood_pressure": f"{systolic}/{diastolic}",          # mmHg
        "oxygen_level": random.randint(95, 100),              # %
        "device_status": random.choice(["ok", "low_battery", "error"])
    }

def simulate_health_stream(duration=120, interval=25):
    start_time = time.time()
    batch_num = 1

    print("🔄 Starting healthcare data stream simulation...\n")

    while (time.time() - start_time) < duration:
        num_records = random.randint(10, 50)
        print(f"\n📦 Batch {batch_num}: Generating {num_records} health records at {datetime.now().isoformat()}")

        records = [generate_health_record(batch_num, i + 1) for i in range(num_records)]

        # Print a few sample records
        for r in records[:3]:
            print(r)
        if len(records) > 3:
            print(f"... {len(records) - 3} more records.")

        batch_num += 1
        time.sleep(interval)

    print("\n✅ Health data stream simulation completed.")




In [12]:
# Run the simulation
simulate_health_stream(duration=120, interval=25)

🔄 Starting healthcare data stream simulation...


📦 Batch 1: Generating 36 health records at 2025-05-22T23:53:12.679659
{'record_id': '1-1', 'patient_id': 'p00015', 'timestamp': '2025-05-22T23:53:12.680085', 'heart_rate': 78, 'blood_pressure': '104/88', 'oxygen_level': 100, 'device_status': 'ok'}
{'record_id': '1-2', 'patient_id': 'p00009', 'timestamp': '2025-05-22T23:53:12.680101', 'heart_rate': 75, 'blood_pressure': '107/78', 'oxygen_level': 95, 'device_status': 'error'}
{'record_id': '1-3', 'patient_id': 'p00008', 'timestamp': '2025-05-22T23:53:12.680111', 'heart_rate': 91, 'blood_pressure': '127/84', 'oxygen_level': 96, 'device_status': 'error'}
... 33 more records.

📦 Batch 2: Generating 40 health records at 2025-05-22T23:53:37.680775
{'record_id': '2-1', 'patient_id': 'p00008', 'timestamp': '2025-05-22T23:53:37.681742', 'heart_rate': 62, 'blood_pressure': '102/69', 'oxygen_level': 95, 'device_status': 'low_battery'}
{'record_id': '2-2', 'patient_id': 'p00017', 'timestamp': '2025-

🛠️ Want to Extend This?
You could add:

- JSON or CSV writing (for downstream ingestion)

- Simulated alert conditions (e.g., if heart rate > 120 → “send alert”)

- Push to Kafka, MQTT, or a cloud stream (if building end-to-end)
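
As a starting point for the first two ideas, here is a minimal sketch of an alert check and a JSON Lines writer. The function names `find_alerts` and `write_jsonl`, the output file name, and the hand-written sample records are assumptions for illustration. The 120 BPM threshold follows the example in the list above.

```python
import json
import os
import tempfile


def find_alerts(records, max_heart_rate=120):
    """Return the records whose heart rate exceeds the threshold."""
    return [r for r in records if r["heart_rate"] > max_heart_rate]


def write_jsonl(records, path):
    """Append records to a file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")


# Hand-written sample records, trimmed to the fields used here
sample = [
    {"record_id": "1-1", "patient_id": "p00001", "heart_rate": 72},
    {"record_id": "1-2", "patient_id": "p00002", "heart_rate": 131},
]

alerts = find_alerts(sample)
print([r["record_id"] for r in alerts])  # ['1-2']

# Write to a fresh temporary directory so repeated runs do not accumulate lines
out_path = os.path.join(tempfile.mkdtemp(), "health_stream.jsonl")
write_jsonl(sample, out_path)
print(f"Wrote {len(sample)} records to {out_path}")
```

Note that `generate_health_record()` draws heart rates from 60 to 100, so the simulated stream would never trigger this alert unless that range is widened.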