# Generate Campaigns
---
O objetivo deste notebook é criar a base de campanhas, os dados intencionalmente serão gerados com inconsistências.


* Canal: `email`, `EMAIL`, 
* Datas invertidas
* Campanhas duplicadas

---

Campos da tabela **campaigns**:

* **campaign_id:** *Representa o id da campanha*
* **campaign_name:** *Nome da campanha*
* **channel:** *Canal de origem ("E-mail", "Social")*
* **start_date:** *Inicio da campanha*
* **end_date:** *Fim da campanha*

In [0]:
%pip install faker

Collecting faker
  Downloading faker-40.1.0-py3-none-any.whl.metadata (16 kB)
Downloading faker-40.1.0-py3-none-any.whl (2.0 MB)
[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.0 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m45.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: faker
Successfully installed faker-40.1.0
[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m


In [0]:
from faker import Faker
import random 

In [0]:
# Setup
random.seed(42)
fake = Faker()
fake.seed_instance(42)

#### Geração dos Dados

In [0]:
channels = ["email", "EMAIL", "e-mail", "social", "SOCIAL"]

campaigns = []

for i in range(30):

    start = fake.date_between("-6m", "today")
    end = fake.date_between("-6m", "today")

    campaigns.append({
        "campaign_id" : fake.uuid4(),
        "campaign_name" : f"Campaign_{i}",
        "channel" : random.choice(channels),
        "start_date" : start.strftime("%Y-%m-%d"),
        "end_date" : end.strftime("%Y-%m-%d")
    })

df_campaigns = spark.createDataFrame(campaigns)
display(df_campaigns.limit(5))

campaign_id,campaign_name,channel,end_date,start_date
eb2263dd-87c5-421e-ac24-a3c5c754108f,Campaign_0,EMAIL,2025-12-29,2025-12-29
5cec4eb5-edd9-4831-9ca3-5cfb04fc6d82,Campaign_1,SOCIAL,2025-12-29,2025-12-29
3da9c2a9-0ed4-4f1a-bd4c-bf374eb93eff,Campaign_2,EMAIL,2025-12-29,2025-12-29
d0e6e660-7c69-4ee1-bb5e-4bcf15ed6269,Campaign_3,EMAIL,2025-12-29,2025-12-29
a8e56e0c-20de-435d-a031-d750c40db9b4,Campaign_4,social,2025-12-29,2025-12-29


#### Escrita na RAW

In [0]:
BASE_PATH = "/Volumes/main/lakehouse_marketing/raw"

df_campaigns.write\
    .mode("overwrite")\
    .option("header", "true")\
    .csv(f"{BASE_PATH}/campaigns")

* Validação

In [0]:
dbutils.fs.ls(f"{BASE_PATH}")

[FileInfo(path='dbfs:/Volumes/main/lakehouse_marketing/raw/campaigns/', name='campaigns/', size=0, modificationTime=1767128411886),
 FileInfo(path='dbfs:/Volumes/main/lakehouse_marketing/raw/users/', name='users/', size=0, modificationTime=1767128411886)]