# Generate Events
---
The goal of this notebook is to create the customer events dataset. The data is intentionally generated with inconsistencies to simulate **broken tracking**.


* Intentional problems:
  * Nonexistent users
  * Duplicate events
  * Invalid timestamps

---

Fields of the **events** table:

* **event_id:** *Unique identifier of the event*
* **user_id:** *ID of the user who triggered the event*
* **campaign_id:** *ID of the associated campaign*
* **event_type:** *Type of event ("view" or "click")*
* **event_timestamp:** *Date and time of the event, as an ISO 8601 string*

In [0]:
%pip install faker

Collecting faker
  Downloading faker-40.1.0-py3-none-any.whl.metadata (16 kB)
Downloading faker-40.1.0-py3-none-any.whl (2.0 MB)
Installing collected packages: faker
Successfully installed faker-40.1.0
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
from faker import Faker
import random

In [0]:
# Setup
random.seed(42)
fake = Faker()
fake.seed_instance(42)
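
The setup seeds two generators because the `random` module and the `Faker` instance each keep their own state. The sketch below uses only the standard library to illustrate the idea: the same seed yields the same sequence of draws, which is what makes reruns of the notebook produce the same rows. The helper name `draw` is hypothetical and not part of the notebook.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random floats from a generator created with the given seed."""
    rng = random.Random(seed)  # independent generator with its own state
    return [rng.random() for _ in range(n)]


same_a = draw(42)
same_b = draw(42)  # same seed, so the same sequence is expected
other = draw(7)    # different seed, so a different sequence is expected

print(same_a == same_b)
print(same_a == other)
```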

#### Data Generation

In [0]:
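events = []

# Build 100,000 synthetic events; every ID is a random UUID
for _ in range(100_000):
    events.append({
        "event_id" : fake.uuid4(),
        # Roughly 10% of the events get no user_id
        "user_id" : fake.uuid4() if random.random() > 0.1 else None,
        "campaign_id" : fake.uuid4(),
        "event_type" : random.choice(["view", "click"]),
        # Timestamp as an ISO 8601 string
        "event_timestamp" : fake.iso8601()
    })

# The schema is inferred from the dictionaries
df_events = spark.createDataFrame(events)
display(df_events.limit(5))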

campaign_id,event_id,event_timestamp,event_type,user_id
bd9c66b3-ad3c-4d6d-9a3d-1fa7bc8960a9,bdd640fb-0667-4ad1-9c80-317fa3b1799d,2019-12-17T16:17:54.240000,view,23b8c1e9-3924-46de-beb1-3b9046685257
8fadc1a6-06cb-4fb3-9a1d-e644815ef6d1,0822e8f3-6c03-4199-972a-846916419f82,1981-02-18T19:27:46.798518,view,3b8faa18-37f8-488b-97fc-695a07a0ca6e
c241330b-01a9-471f-9e8a-774bcf36d58b,6b65a6a4-8b81-48f6-b38a-088ca65ed389,2015-02-15T08:35:56.806669,view,47378190-96da-4dac-b2ff-5d2a386ecbe0
6142ea7d-17be-4111-9a2a-73ed562b0f79,47229389-571a-4876-ac30-7511b2b9437a,1975-06-02T03:10:48.916006,view,c37459ee-f50b-4a63-b71e-cd7b27cd8130
759cde66-bacf-43d0-8b1f-9163ce9ff57f,43b7a3a6-9a8d-4a03-980d-7b71d8f56413,2000-01-11T10:21:06.373450,view,
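
As far as the code shows, the generation loop leaves about 10% of the `user_id` values empty, but it does not appear to repeat any `event_id` or emit an unparseable timestamp. The sketch below is one possible way to mix in the other two problems listed at the top of the notebook. It is an illustration under assumptions, not the notebook's method: `make_clean_events`, `inject_problems`, the 5% rates, and the bad timestamp strings are all hypothetical, and the standard library stands in for Faker so the block runs anywhere.

```python
import random
import uuid
from datetime import datetime, timedelta


def make_clean_events(n, rng):
    """Build n well-formed events (hypothetical stand-in for the Faker loop)."""
    base = datetime(2024, 1, 1)

    def new_id():
        # Random version-4 UUID drawn from the seeded generator
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))

    return [
        {
            "event_id": new_id(),
            "user_id": new_id(),
            "campaign_id": new_id(),
            "event_type": rng.choice(["view", "click"]),
            "event_timestamp": (
                base + timedelta(seconds=rng.randrange(86_400 * 365))
            ).isoformat(),
        }
        for _ in range(n)
    ]


def inject_problems(events, rng, dup_rate=0.05, bad_ts_rate=0.05):
    """Return a new list with duplicated rows and unparseable timestamps mixed in."""
    broken = []
    for event in events:
        row = dict(event)  # copy so the input list is left untouched
        if rng.random() < bad_ts_rate:
            # Assumed examples of "invalid" values: empty, free text, impossible date
            row["event_timestamp"] = rng.choice(
                ["", "not-a-date", "2024-13-45T99:99:99"]
            )
        broken.append(row)
        if rng.random() < dup_rate:
            broken.append(dict(row))  # exact duplicate, same event_id
    return broken


rng = random.Random(42)
clean = make_clean_events(1_000, rng)
broken = inject_problems(clean, rng)
print(len(clean), len(broken))
```

Applying the injection as a separate pass keeps the clean generator simple and makes each defect rate easy to tune on its own.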


* Write to the RAW layer

In [0]:
BASE_PATH = "/Volumes/main/lakehouse_marketing/raw"

df_events.write\
    .mode("overwrite")\
    .option("header", "true")\
    .csv(f"{BASE_PATH}/events")

* Validation

In [0]:
dbutils.fs.ls("/Volumes/main/lakehouse_marketing/raw")

[FileInfo(path='dbfs:/Volumes/main/lakehouse_marketing/raw/campaigns/', name='campaigns/', size=0, modificationTime=1767132667209),
 FileInfo(path='dbfs:/Volumes/main/lakehouse_marketing/raw/events/', name='events/', size=0, modificationTime=1767132667209),
 FileInfo(path='dbfs:/Volumes/main/lakehouse_marketing/raw/users/', name='users/', size=0, modificationTime=1767132667209)]
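
The listing above only confirms that the `events/` folder exists. A content-level check could count the three kinds of problems the notebook sets out to simulate. The following is a minimal sketch in plain Python over a list of dictionaries; `is_valid_iso`, `profile_events`, and the sample rows are hypothetical, and on the real dataset the same counts would more likely be computed with Spark.

```python
from collections import Counter
from datetime import datetime


def is_valid_iso(ts):
    """True when ts is a string that datetime.fromisoformat can parse."""
    if not isinstance(ts, str):
        return False
    try:
        datetime.fromisoformat(ts)
        return True
    except ValueError:
        return False


def profile_events(rows):
    """Count missing users, repeated event_ids, and unparseable timestamps."""
    id_counts = Counter(row["event_id"] for row in rows)
    return {
        "rows": len(rows),
        "missing_user_id": sum(1 for row in rows if not row.get("user_id")),
        # Every occurrence beyond the first counts as a duplicate row
        "duplicate_event_rows": sum(c - 1 for c in id_counts.values() if c > 1),
        "invalid_timestamp": sum(
            1 for row in rows if not is_valid_iso(row.get("event_timestamp"))
        ),
    }


# Tiny hand-made sample: one duplicate, one missing user, one bad timestamp
sample = [
    {"event_id": "e1", "user_id": "u1", "event_timestamp": "2019-12-17T16:17:54.240000"},
    {"event_id": "e1", "user_id": "u1", "event_timestamp": "2019-12-17T16:17:54.240000"},
    {"event_id": "e2", "user_id": None, "event_timestamp": "2000-01-11T10:21:06.373450"},
    {"event_id": "e3", "user_id": "u3", "event_timestamp": "not-a-date"},
]
report = profile_events(sample)
print(report)
```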