# <center> <img src="../labs/img/ITESOLogo.png" alt="ITESO" width="480" height="130"> </center>
# <center> **Departamento de Electrónica, Sistemas e Informática** </center>
---
## <center> **Procesamiento de Datos Masivos** </center>
---
### <center> **Spring 2025** </center>
---
## <center> **Final Project (Website Activity) Producer** </center>


**Team**:
- Luis Raul Acosta Mendoza 
- Samantha Abigail Quintero 
- Arturo Benajamin Vergara Romo
    
**Professor**: Dr. Pablo Camarillo Ramirez

## Overview
This notebook simulates website user behavior by generating three types of events and sending each type to its own Kafka topic:
- Page Views
- Click Events
- User Interactions

## 1. Imports and Dependencies

In [10]:
from gatubelxs.page_views import generate_page_view
from gatubelxs.click_events import generate_click_event
from gatubelxs.user_interaction import generate_user_interaction
from kafka import KafkaProducer
from gatubelxs.producer_monitor import ProducerMonitor
from random import randint
import json
import time

## 2. Kafka Configuration

In [11]:
KAFKA_SERVER = '963ee95a1f99:9093'  # Kafka broker address as host:port
TOPIC_PAGE_VIEWS = "page_views"
TOPIC_CLICK_EVENTS = "click_events"
TOPIC_USER_INTERACTIONS = "user_interactions"

NUMBER_SESSION = 400  # Number of web sessions to simulate
MAX_EVENTS_PER_SESSION = 10  # Upper bound per event type (not per session in total)
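
These two constants fix the expected size of the run. The estimate below is a back-of-the-envelope sketch that assumes the `randint` ranges used by `simulate_session` in Section 4: 1 to 10 page views, and 0 to 10 clicks and interactions per session.

```python
NUMBER_SESSION = 400
MAX_EVENTS_PER_SESSION = 10

# The mean of randint(a, b) is (a + b) / 2 because both ends are inclusive.
mean_page_views = (1 + MAX_EVENTS_PER_SESSION) / 2    # randint(1, 10)
mean_clicks = (0 + MAX_EVENTS_PER_SESSION) / 2        # randint(0, 10)
mean_interactions = (0 + MAX_EVENTS_PER_SESSION) / 2  # randint(0, 10)

expected_per_session = mean_page_views + mean_clicks + mean_interactions
expected_total = NUMBER_SESSION * expected_per_session

print(f"Expected events per session: {expected_per_session}")  # 15.5
print(f"Expected events in total: {expected_total:.0f}")       # 6200
```

The run reported in Section 6 produced 6,114 messages, which is close to this expectation of 6,200.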

## 3. Producer Setup

In [12]:
# Kafka producer: each event dict is serialized to UTF-8 encoded JSON
producer = KafkaProducer(
    bootstrap_servers=KAFKA_SERVER,
    value_serializer=lambda m: json.dumps(m).encode('utf-8')
)
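
To see what the `value_serializer` does without a running broker, the same lambda can be applied by hand. This is a small sketch: the `serialize` name and the sample message are made up for illustration.

```python
import json


def serialize(message: dict) -> bytes:
    """Same transformation as the value_serializer lambda above."""
    return json.dumps(message).encode("utf-8")


sample = {"user_id": "u-1", "session_id": "s-1", "price": 207.98}
payload = serialize(sample)

print(payload)                # the bytes that would be sent as the Kafka message value
print(len(payload), "bytes")  # message size on the wire, before any compression

# A consumer reverses the process to recover the original dict.
restored = json.loads(payload.decode("utf-8"))
print(restored == sample)
```

Because the payload is JSON text, any consumer that can decode UTF-8 and parse JSON can read the events, regardless of its language.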

In [13]:
producer_monitor = ProducerMonitor(producer)
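
`ProducerMonitor` comes from the team's `gatubelxs` package and its implementation is not shown here. A minimal sketch of a wrapper with the same two methods used in this notebook (`send_with_monitoring` and `print_final_summary`) might look as follows. The class `ProducerMonitorSketch`, its attributes, and the `FakeProducer` test double are assumptions for illustration, not the actual implementation.

```python
import json
import time
from collections import defaultdict


class ProducerMonitorSketch:
    """Hypothetical stand-in: counts messages and bytes per topic."""

    def __init__(self, producer):
        self.producer = producer
        self.start_time = time.time()
        self.messages = defaultdict(int)    # topic -> number of messages
        self.bytes_sent = defaultdict(int)  # topic -> serialized size in bytes

    def send_with_monitoring(self, topic, message):
        # Measure the size the JSON serializer would produce, then delegate.
        size = len(json.dumps(message).encode("utf-8"))
        self.messages[topic] += 1
        self.bytes_sent[topic] += size
        return self.producer.send(topic, message)

    def print_final_summary(self):
        print("=== Final Producer Summary ===")
        print(f"Total Runtime: {time.time() - self.start_time:.2f} seconds")
        print(f"Total Messages: {sum(self.messages.values()):,}")
        print(f"Total Data: {sum(self.bytes_sent.values()):,} bytes")
        for topic, count in self.messages.items():
            total = self.bytes_sent[topic]
            print(f"\n{topic}:")
            print(f"  Messages: {count:,}")
            print(f"  Total Data: {total:,} bytes")
            print(f"  Average Message Size: {total / count:.2f} bytes")


class FakeProducer:
    """Test double that records sends instead of talking to Kafka."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


fake = FakeProducer()
monitor = ProducerMonitorSketch(fake)
monitor.send_with_monitoring("page_views", {"page_url": "/home"})
monitor.send_with_monitoring("click_events", {"element": "buy"})
monitor.print_final_summary()
```

Wrapping the producer rather than subclassing it keeps the monitoring logic testable with a fake producer, as in the usage lines above.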

## 4. Session Simulation

In [14]:
import uuid

def simulate_session():
    user_id = str(uuid.uuid4())
    session_id = str(uuid.uuid4())

    n_page_views = randint(1, MAX_EVENTS_PER_SESSION)
    n_clicks = randint(0, MAX_EVENTS_PER_SESSION)
    n_interactions = randint(0, MAX_EVENTS_PER_SESSION)

    # Click funnel stages and the progress cut-offs between them:
    # browse below 0.6, cart from 0.6 up to 0.85, checkout from 0.85 on.
    stages = ["browse", "cart", "checkout"]
    stage_thresholds = [0.6, 0.85]

    for _ in range(n_page_views):
        evt = generate_page_view(user_id, session_id)
        producer_monitor.send_with_monitoring(TOPIC_PAGE_VIEWS, evt)
        print(f"[{TOPIC_PAGE_VIEWS}] ->", evt)
    
    for i in range(n_clicks):
        # Fraction of this session's clicks already sent, in [0, 1).
        progress = i / max(n_clicks, 1)
        if progress < stage_thresholds[0]:
            stage = stages[0]
        elif progress < stage_thresholds[1]:
            stage = stages[1]
        else:
            stage = stages[2]
        evt = generate_click_event(user_id, session_id, user_stage=stage)
        producer_monitor.send_with_monitoring(TOPIC_CLICK_EVENTS, evt)
        print(f"[{TOPIC_CLICK_EVENTS}] ->", evt)
    
    for _ in range(n_interactions):
        evt = generate_user_interaction(user_id, session_id)
        producer_monitor.send_with_monitoring(TOPIC_USER_INTERACTIONS, evt)
        print(f"[{TOPIC_USER_INTERACTIONS}] ->", evt)
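
The click loop maps each click's position within the session to a funnel stage. The sketch below isolates that mapping in a pure function so it can be inspected without Kafka. The function name `stage_for_click` and the constant `STAGE_THRESHOLDS` are introduced here for illustration; the cut-offs 0.6 and 0.85 are the ones used in `simulate_session`.

```python
STAGE_THRESHOLDS = (0.6, 0.85)  # same cut-offs as in simulate_session


def stage_for_click(i: int, n_clicks: int) -> str:
    """Return the funnel stage for the i-th click (0-based) out of n_clicks."""
    progress = i / max(n_clicks, 1)
    if progress < STAGE_THRESHOLDS[0]:
        return "browse"
    if progress < STAGE_THRESHOLDS[1]:
        return "cart"
    return "checkout"


stages_for_ten = [stage_for_click(i, 10) for i in range(10)]
print(stages_for_ten)
# ['browse', 'browse', 'browse', 'browse', 'browse', 'browse',
#  'cart', 'cart', 'cart', 'checkout']

# Smallest number of clicks for which the last click reaches "checkout".
min_clicks_for_checkout = next(
    n for n in range(1, 11) if stage_for_click(n - 1, n) == "checkout"
)
print(min_clicks_for_checkout)  # 7
```

One consequence of this rule is that the last click has progress `(n - 1) / n`, so a session needs at least 7 clicks before any click is tagged `checkout`. Sessions with fewer clicks end in `browse` or `cart`.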

## 5. Data Generation and Sending

In [15]:
try:
    print("Starting to produce sessions...")
    for _ in range(NUMBER_SESSION):
        simulate_session()
        time.sleep(2)  # Pause 2 seconds between sessions
except KeyboardInterrupt:
    print("Interrupted by user; stopping.")
finally:
    # Deliver any buffered messages before closing the connection
    producer.flush()
    producer.close()

Starting to produce sessions...
[page_views] -> {'user_id': 'dfe93b2c-d127-4a1d-b478-2a16dd5e8834', 'session_id': '7599bc0b-0e72-4a19-94b5-768c7646ad56', 'page_url': '/product/book_4004', 'referrer_url': 'blog/tags', 'category': 'books', 'price': 207.98, 'timestamp': '2025-05-13T03:51:39.435679'}
[page_views] -> {'user_id': 'dfe93b2c-d127-4a1d-b478-2a16dd5e8834', 'session_id': '7599bc0b-0e72-4a19-94b5-768c7646ad56', 'page_url': '/product/book_4004', 'referrer_url': 'category/main/tag', 'category': 'books', 'price': 207.98, 'timestamp': '2025-05-13T03:51:39.447635'}
[page_views] -> {'user_id': 'dfe93b2c-d127-4a1d-b478-2a16dd5e8834', 'session_id': '7599bc0b-0e72-4a19-94b5-768c7646ad56', 'page_url': '/product/book_4004', 'referrer_url': 'tags', 'category': 'books', 'price': 207.98, 'timestamp': '2025-05-13T03:51:39.447907'}
[page_views] -> {'user_id': 'dfe93b2c-d127-4a1d-b478-2a16dd5e8834', 'session_id': '7599bc0b-0e72-4a19-94b5-768c7646ad56', 'page_url': '/product/book_4004', 'referrer_u

## 6. Final Results

In [16]:
producer_monitor.print_final_summary()


=== Final Producer Summary ===
Total Runtime: 813.70 seconds
Total Messages: 6,114
Total Data: 1,683,564 bytes

Per-Topic Analysis:

page_views:
  Messages: 2,153
  Total Data: 545,620 bytes
  Average Message Size: 253.42 bytes

click_events:
  Messages: 2,014
  Total Data: 582,719 bytes
  Average Message Size: 289.33 bytes

user_interactions:
  Messages: 1,947
  Total Data: 555,225 bytes
  Average Message Size: 285.17 bytes
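
The figures in this summary can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the numbers printed above; the variable names are chosen for this example.

```python
# (messages, bytes) per topic, copied from the summary above.
summary = {
    "page_views": (2153, 545_620),
    "click_events": (2014, 582_719),
    "user_interactions": (1947, 555_225),
}
runtime_s = 813.70

total_messages = sum(count for count, _ in summary.values())
total_bytes = sum(size for _, size in summary.values())

# The per-topic figures add up to the reported totals.
assert total_messages == 6114
assert total_bytes == 1_683_564

for topic, (count, size) in summary.items():
    print(f"{topic}: {size / count:.2f} bytes per message")

print(f"Throughput: {total_messages / runtime_s:.2f} messages/s")
print(f"Throughput: {total_bytes / runtime_s:.0f} bytes/s")
```

The throughput is low by design: the 2-second pause after each of the 400 sessions accounts for 800 of the 813.70 seconds, so the run measures the simulated traffic pattern rather than the broker's capacity.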
