# Populador de Banco de Dados - Bluma Case

## üìã Vis√£o Geral

Este notebook implementa a popula√ß√£o do banco de dados `bluma_case` com dados sint√©ticos realistas para um marketplace de beleza e bem-estar. 

### Objetivos:
- Gerar 50.000 usu√°rios com caracter√≠sticas demogr√°ficas brasileiras
- Criar ~150.000 pedidos com padr√µes realistas de compra
- Gerar 120 campanhas de m√≠dia paga (Meta, Google, TikTok)
- Simular performance di√°ria das campanhas com sazonalidade
- Criar dados de criativos e eventos de usu√°rios
- Implementar an√°lise de cohorts e aloca√ß√£o de budget

### Estrutura do Banco:
- **users**: Dados demogr√°ficos e de aquisi√ß√£o
- **orders**: Pedidos e transa√ß√µes
- **paid_media_campaigns**: Campanhas de m√≠dia paga
- **daily_performance**: Performance di√°ria das campanhas
- **ad_creatives**: Criativos das campanhas
- **creative_performance**: Performance dos criativos
- **user_events**: Eventos de intera√ß√£o dos usu√°rios
- **user_cohorts**: An√°lise de cohorts
- **budget_allocation**: Aloca√ß√£o mensal de budget

## 1. Database Connection Setup

Importa√ß√£o das bibliotecas necess√°rias e configura√ß√£o da conex√£o com o banco de dados MySQL.

In [2]:
# Instalar depend√™ncias necess√°rias
# Execute esta c√©lula primeiro para instalar os pacotes necess√°rios

import subprocess
import sys

def install_package(package):
    """Instala um pacote usando pip"""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"‚úÖ {package} instalado com sucesso!")
    except subprocess.CalledProcessError as e:
        print(f"‚ùå Erro ao instalar {package}: {e}")

# Lista de pacotes necess√°rios
required_packages = [
    "pandas",
    "numpy", 
    "faker",
    "mysql-connector-python"
]

print("üîß Instalando depend√™ncias necess√°rias...")
print("=" * 50)

for package in required_packages:
    install_package(package)

print("\n‚úÖ Instala√ß√£o conclu√≠da! Agora voc√™ pode executar as demais c√©lulas.")

üîß Instalando depend√™ncias necess√°rias...
‚úÖ pandas instalado com sucesso!
‚úÖ numpy instalado com sucesso!
‚úÖ faker instalado com sucesso!
‚úÖ mysql-connector-python instalado com sucesso!

‚úÖ Instala√ß√£o conclu√≠da! Agora voc√™ pode executar as demais c√©lulas.


In [3]:
# Importa√ß√£o das bibliotecas necess√°rias
import pandas as pd
import numpy as np
from faker import Faker
import mysql.connector
from datetime import datetime, timedelta
import random
import uuid
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')

# Configurar faker para portugu√™s brasileiro
fake = Faker('pt_BR')
Faker.seed(42)  # Para reproducibilidade
np.random.seed(42)
random.seed(42)

# Configura√ß√£o da conex√£o com o banco
DB_CONFIG = {
    'host': '95.111.240.159',
    'database': 'bluma_case',
    'user': 'root',
    'password': 'rafa906996'
}

print("‚úÖ Bibliotecas importadas com sucesso!")
print(f"üìä Faker configurado para: {fake.locale}")
print(f"üóÑÔ∏è Banco configurado: {DB_CONFIG['database']} @ {DB_CONFIG['host']}")

‚úÖ Bibliotecas importadas com sucesso!
üìä Faker configurado para: <bound method BaseProvider.locale of <faker.providers.user_agent.Provider object at 0x000002066A25CC20>>
üóÑÔ∏è Banco configurado: bluma_case @ 95.111.240.159


In [4]:
def create_connection():
    """
    Estabelece conex√£o com o banco de dados MySQL.
    
    Returns:
        mysql.connector.connection: Objeto de conex√£o com o banco
    """
    try:
        connection = mysql.connector.connect(**DB_CONFIG)
        if connection.is_connected():
            print("‚úÖ Conex√£o com o banco estabelecida com sucesso!")
            print(f"üîó Vers√£o do servidor MySQL: {connection.get_server_info()}")
            return connection
    except mysql.connector.Error as error:
        print(f"‚ùå Erro ao conectar com o banco: {error}")
        return None

# Testar conex√£o
test_connection = create_connection()
if test_connection:
    test_connection.close()
    print("üîÑ Conex√£o testada e fechada com sucesso!")

‚úÖ Conex√£o com o banco estabelecida com sucesso!
üîó Vers√£o do servidor MySQL: 8.0.42
üîÑ Conex√£o testada e fechada com sucesso!


## 2. Helper Functions for Data Generation

Fun√ß√µes utilit√°rias para inser√ß√£o em lote, limpeza de tabelas e helpers estat√≠sticos para gerar dados realistas.

In [19]:
def truncate_tables(connection):
    """
    Limpa todas as tabelas antes de popular com novos dados.
    
    Args:
        connection: Conex√£o com o banco de dados
    """
    cursor = connection.cursor()
    
    # Lista das tabelas na ordem de depend√™ncia (FK constraints)
    tables = [
        'creative_performance',
        'ad_creatives', 
        'daily_performance',
        'user_events',
        'user_cohorts',
        'budget_allocation',
        'orders',
        'paid_media_campaigns',
        'users'
    ]
    
    try:
        # Desabilitar foreign keys temporariamente
        cursor.execute("SET FOREIGN_KEY_CHECKS = 0;")
        
        for table in tables:
            cursor.execute(f"TRUNCATE TABLE {table};")
            print(f"üóëÔ∏è Tabela {table} limpa")
        
        # Re-habilitar foreign keys 
        cursor.execute("SET FOREIGN_KEY_CHECKS = 1;")
        connection.commit()
        print("‚úÖ Todas as tabelas foram limpas com sucesso!")
        
    except mysql.connector.Error as error:
        print(f"‚ùå Erro ao limpar tabelas: {error}")
    finally:
        cursor.close()

def batch_insert(connection, table_name: str, data: List[Dict], batch_size: int = 1000):
    """
    Insere dados em lotes para melhor performance.
    
    Args:
        connection: Conex√£o com o banco de dados
        table_name: Nome da tabela
        data: Lista de dicion√°rios com os dados
        batch_size: Tamanho do lote (default: 1000)
    """
    if not data:
        print(f"‚ö†Ô∏è Nenhum dado para inserir na tabela {table_name}")
        return
        
    cursor = connection.cursor()
    
    # Pegar as colunas do primeiro item
    columns = list(data[0].keys())
    placeholders = ', '.join(['%s'] * len(columns))
    
    insert_query = f"""
    INSERT INTO {table_name} ({', '.join(columns)}) 
    VALUES ({placeholders})
    """
    
    total_records = len(data)
    processed = 0
    
    try:
        for i in range(0, total_records, batch_size):
            batch = data[i:i + batch_size]
            
            # Converter dados para tuplas
            batch_values = []
            for record in batch:
                values = []
                for col in columns:
                    value = record[col]
                    # Tratar valores None
                    if value is None:
                        values.append(None)
                    # Tratar datetime
                    elif isinstance(value, datetime):
                        values.append(value.strftime('%Y-%m-%d %H:%M:%S'))
                    # Tratar numpy types
                    elif hasattr(value, 'item'):  # numpy scalar
                        values.append(value.item())
                    # Converter para string se necess√°rio (UUID, etc.)
                    else:
                        values.append(str(value) if not isinstance(value, (int, float, bool)) else value)
                batch_values.append(tuple(values))
            
            cursor.executemany(insert_query, batch_values)
            processed += len(batch)
            
            print(f"üìä {table_name}: {processed}/{total_records} registros inseridos ({(processed/total_records)*100:.1f}%)")
        
        connection.commit()
        print(f"‚úÖ {table_name}: {total_records} registros inseridos com sucesso!")
        
    except mysql.connector.Error as error:
        print(f"‚ùå Erro ao inserir dados na tabela {table_name}: {error}")
        connection.rollback()
    finally:
        cursor.close()

print("‚úÖ Fun√ß√µes auxiliares atualizadas!")

‚úÖ Fun√ß√µes auxiliares atualizadas!


In [6]:
# Constantes de neg√≥cio conforme especifica√ß√£o
BUSINESS_PARAMS = {
    'acquisition_channels': {
        'Meta Ads': {'weight': 0.45, 'cac': 85, 'activation_rate': 0.078},
        'Google Ads': {'weight': 0.25, 'cac': 95, 'activation_rate': 0.092},
        'TikTok Ads': {'weight': 0.10, 'cac': 75, 'activation_rate': 0.065},
        'Organic': {'weight': 0.15, 'cac': 0, 'activation_rate': 0.15},
        'Referral': {'weight': 0.05, 'cac': 30, 'activation_rate': 0.25}
    },
    'services': {
        'Manicure': {'weight': 0.35, 'avg_ticket': 65, 'frequency_per_month': 2.5},
        'Massagem': {'weight': 0.20, 'avg_ticket': 120, 'frequency_per_month': 1.2},
        'Limpeza de Pele': {'weight': 0.15, 'avg_ticket': 95, 'frequency_per_month': 1.5},
        'Design Sobrancelhas': {'weight': 0.15, 'avg_ticket': 55, 'frequency_per_month': 2.0},
        'Depila√ß√£o': {'weight': 0.15, 'avg_ticket': 80, 'frequency_per_month': 1.8}
    },
    'cities': {
        'S√£o Paulo': {'weight': 0.30, 'state': 'SP'},
        'Rio de Janeiro': {'weight': 0.20, 'state': 'RJ'},
        'Belo Horizonte': {'weight': 0.10, 'state': 'MG'},
        'Bras√≠lia': {'weight': 0.08, 'state': 'DF'},
        'Curitiba': {'weight': 0.07, 'state': 'PR'},
        'Porto Alegre': {'weight': 0.05, 'state': 'RS'},
        'Salvador': {'weight': 0.05, 'state': 'BA'},
        'Fortaleza': {'weight': 0.05, 'state': 'CE'},
        'Recife': {'weight': 0.05, 'state': 'PE'},
        'Campinas': {'weight': 0.05, 'state': 'SP'}
    },
    'creative_types': {
        'UGC': {'ctr': 0.028, 'cvr': 0.012},
        'Carousel': {'ctr': 0.019, 'cvr': 0.010},
        'Video': {'ctr': 0.022, 'cvr': 0.011},
        'Static': {'ctr': 0.015, 'cvr': 0.008},
        'ASMR': {'ctr': 0.031, 'cvr': 0.009}
    }
}

# Per√≠odos de tempo
START_DATE = datetime(2024, 1, 1)
END_DATE = datetime(2025, 1, 31)

def get_seasonal_factor(date: datetime) -> float:
    """
    Retorna fator de ajuste sazonal para uma data espec√≠fica.
    
    Args:
        date: Data para calcular o fator
        
    Returns:
        float: Fator multiplicativo (1.0 = normal)
    """
    month = date.month
    day = date.day
    
    # Dia das m√£es (maio): +50%
    if month == 5:
        return 1.5
    
    # Black Friday (novembro 20+): +80%
    if month == 11 and day >= 20:
        return 1.8
    
    # Dezembro (at√© 25): +40%
    if month == 12 and day <= 25:
        return 1.4
    
    return 1.0

def get_weekday_factor(date: datetime) -> float:
    """
    Retorna fator de ajuste por dia da semana.
    
    Args:
        date: Data para calcular o fator
        
    Returns:
        float: Fator multiplicativo
    """
    weekday = date.weekday()  # 0 = segunda, 6 = domingo
    
    if weekday == 4:  # Sexta-feira: +15%
        return 1.15
    elif weekday in [5, 6]:  # Fins de semana: -20%
        return 0.8
    else:
        return 1.0

def add_gaussian_noise(value: float, noise_factor: float) -> float:
    """
    Adiciona ru√≠do gaussiano a um valor.
    
    Args:
        value: Valor base
        noise_factor: Fator de ru√≠do (ex: 0.1 para ¬±10%)
        
    Returns:
        float: Valor com ru√≠do
    """
    noise = np.random.normal(0, noise_factor)
    return max(0, value * (1 + noise))

print("‚úÖ Constantes de neg√≥cio e helpers estat√≠sticos definidos!")
print(f"üìÖ Per√≠odo de dados: {START_DATE.strftime('%d/%m/%Y')} at√© {END_DATE.strftime('%d/%m/%Y')}")

‚úÖ Constantes de neg√≥cio e helpers estat√≠sticos definidos!
üìÖ Per√≠odo de dados: 01/01/2024 at√© 31/01/2025


## 3. User Data Generation

Gera√ß√£o de 50.000 usu√°rios com demografias brasileiras realistas, distribui√ß√£o temporal usando beta distribution e canais de aquisi√ß√£o com taxas de ativa√ß√£o espec√≠ficas.

In [7]:
def generate_users(num_users: int = 50000) -> List[Dict]:
    """
    Gera dados de usu√°rios com caracter√≠sticas demogr√°ficas brasileiras.
    
    Args:
        num_users: N√∫mero de usu√°rios a gerar (default: 50000)
        
    Returns:
        List[Dict]: Lista de usu√°rios gerados
    """
    print(f"üöÄ Gerando {num_users:,} usu√°rios...")
    
    users = []
    channels = list(BUSINESS_PARAMS['acquisition_channels'].keys())
    channel_weights = [BUSINESS_PARAMS['acquisition_channels'][ch]['weight'] for ch in channels]
    
    cities = list(BUSINESS_PARAMS['cities'].keys())
    city_weights = [BUSINESS_PARAMS['cities'][city]['weight'] for city in cities]
    
    # Distribui√ß√£o temporal usando beta distribution (mais usu√°rios nos meses recentes)
    time_range = (END_DATE - START_DATE).days
    beta_samples = np.random.beta(2, 5, num_users)
    
    for i in range(num_users):
        # Data de cria√ß√£o (beta distribution para simular crescimento)
        days_offset = int(beta_samples[i] * time_range)
        created_at = START_DATE + timedelta(days=days_offset)
        
        # Canal de aquisi√ß√£o
        channel = np.random.choice(channels, p=channel_weights)
        
        # Cidade e estado
        city = np.random.choice(cities, p=city_weights)
        state = BUSINESS_PARAMS['cities'][city]['state']
        
        # Demografia: 85% feminino, faixa 25-34 anos predominante
        gender = 'F' if random.random() < 0.85 else 'M'
        
        age_groups = ['18-24', '25-34', '35-44', '45-54', '55+']
        age_weights = [0.15, 0.50, 0.25, 0.08, 0.02]  # 25-34 predominante
        age_group = np.random.choice(age_groups, p=age_weights)
        
        # Calcular first_order_date baseado na taxa de ativa√ß√£o
        activation_rate = BUSINESS_PARAMS['acquisition_channels'][channel]['activation_rate']
        
        first_order_date = None
        if random.random() < activation_rate:
            # Ajuste sazonal: +30% ativa√ß√£o em maio, novembro e dezembro
            seasonal_boost = 1.0
            if created_at.month in [5, 11, 12]:
                seasonal_boost = 1.3
                
            # Se deve ativar, calcular data usando gamma distribution (m√©dia 4 dias)
            if random.random() < seasonal_boost:
                days_to_order = max(0, int(np.random.gamma(2, 2)))  # M√©dia ~4 dias
                first_order_date = created_at + timedelta(days=days_to_order)
                
                # N√£o pode ser no futuro
                if first_order_date > END_DATE:
                    first_order_date = None
        
        # Campanha de aquisi√ß√£o (simplificada)
        if channel in ['Meta Ads', 'Google Ads', 'TikTok Ads']:
            acquisition_campaign = f"{channel.replace(' ', '_').lower()}_prospecting_{random.randint(1, 10)}"
        else:
            acquisition_campaign = channel.lower()
        
        user = {
            'user_id': str(uuid.uuid4()),
            'created_at': created_at,
            'acquisition_channel': channel,
            'acquisition_campaign': acquisition_campaign,
            'first_order_date': first_order_date,
            'city': city,
            'state': state,
            'age_group': age_group,
            'gender': gender
        }
        
        users.append(user)
        
        if (i + 1) % 10000 == 0:
            print(f"üë• {i + 1:,} usu√°rios gerados...")
    
    # Estat√≠sticas
    activated_users = sum(1 for u in users if u['first_order_date'] is not None)
    activation_rate = (activated_users / num_users) * 100
    
    print(f"‚úÖ {num_users:,} usu√°rios gerados!")
    print(f"üìä Usu√°rios ativados: {activated_users:,} ({activation_rate:.1f}%)")
    
    return users

# Gerar usu√°rios
users_data = generate_users()

üöÄ Gerando 50,000 usu√°rios...
üë• 10,000 usu√°rios gerados...
üë• 20,000 usu√°rios gerados...
üë• 30,000 usu√°rios gerados...
üë• 40,000 usu√°rios gerados...
üë• 50,000 usu√°rios gerados...
‚úÖ 50,000 usu√°rios gerados!
üìä Usu√°rios ativados: 4,955 (9.9%)


## 4. Order Data Generation

Cria√ß√£o de pedidos para usu√°rios ativados com padr√µes realistas de distribui√ß√£o de servi√ßos, m√©todos de pagamento, descontos e taxas de cancelamento.

In [8]:
def generate_orders(users: List[Dict]) -> List[Dict]:
    """
    Gera pedidos para usu√°rios ativados.
    
    Args:
        users: Lista de usu√°rios gerados
        
    Returns:
        List[Dict]: Lista de pedidos gerados
    """
    print("üõí Gerando pedidos...")
    
    # Filtrar apenas usu√°rios ativados
    activated_users = [u for u in users if u['first_order_date'] is not None]
    print(f"üë• {len(activated_users):,} usu√°rios ativados encontrados")
    
    orders = []
    services = list(BUSINESS_PARAMS['services'].keys())
    service_weights = [BUSINESS_PARAMS['services'][svc]['weight'] for svc in services]
    
    # M√©todos de pagamento com evolu√ß√£o temporal
    payment_methods = ['PIX', 'Cart√£o de Cr√©dito', 'Cart√£o de D√©bito', 'Dinheiro']
    
    for user in activated_users:
        user_id = user['user_id']
        first_order_date = user['first_order_date']
        
        # Distribui√ß√£o de pedidos por usu√°rio
        num_orders_rand = random.random()
        if num_orders_rand < 0.30:  # 30%: apenas 1 pedido
            num_orders = 1
        elif num_orders_rand < 0.60:  # 30%: 2-3 pedidos
            num_orders = random.randint(2, 3)
        elif num_orders_rand < 0.85:  # 25%: 4-7 pedidos
            num_orders = random.randint(4, 7)
        else:  # 15%: 8+ pedidos (power users)
            num_orders = random.randint(8, 15)
        
        previous_service = None
        order_date = first_order_date
        
        for order_num in range(num_orders):
            # Tend√™ncia a repetir o mesmo servi√ßo (60% de chance)
            if previous_service and random.random() < 0.60:
                service_type = previous_service
            else:
                service_type = np.random.choice(services, p=service_weights)
            
            previous_service = service_type
            
            # Valor do pedido baseado no ticket m√©dio do servi√ßo
            base_value = BUSINESS_PARAMS['services'][service_type]['avg_ticket']
            order_value = add_gaussian_noise(base_value, 0.15)  # ¬±15% varia√ß√£o
            
            # Desconto no primeiro pedido (40% chance, 15-25% desconto)
            discount_amount = 0.0
            if order_num == 0 and random.random() < 0.40:
                discount_percent = random.uniform(0.15, 0.25)
                discount_amount = order_value * discount_percent
                order_value -= discount_amount
            
            # Status: Taxa de cancelamento (12% novos usu√°rios, 8% recorrentes)
            cancel_rate = 0.12 if order_num == 0 else 0.08
            status = 'cancelled' if random.random() < cancel_rate else 'completed'
            
            # M√©todo de pagamento com evolu√ß√£o temporal
            year_progress = (order_date.year - 2024) + (order_date.month - 1) / 12
            pix_growth = 0.25 + (year_progress * 0.10)  # 25% em 2024 ‚Üí 35% em 2025
            pix_growth = min(pix_growth, 0.35)
            
            payment_rand = random.random()
            if payment_rand < pix_growth:
                payment_method = 'PIX'
            elif payment_rand < pix_growth + 0.45:
                payment_method = 'Cart√£o de Cr√©dito'
            elif payment_rand < pix_growth + 0.70:
                payment_method = 'Cart√£o de D√©bito'
            else:
                payment_method = 'Dinheiro'
            
            order = {
                'order_id': str(uuid.uuid4()),
                'user_id': user_id,
                'order_date': order_date,
                'order_value': round(order_value, 2),
                'service_type': service_type,
                'status': status,
                'payment_method': payment_method,
                'discount_amount': round(discount_amount, 2)
            }
            
            orders.append(order)
            
            # Calcular pr√≥xima data do pedido baseada na frequ√™ncia do servi√ßo
            if order_num < num_orders - 1:
                frequency_per_month = BUSINESS_PARAMS['services'][service_type]['frequency_per_month']
                days_between_orders = int(30 / frequency_per_month)
                days_variation = random.randint(-5, 10)  # Varia√ß√£o aleat√≥ria
                
                next_order_date = order_date + timedelta(days=days_between_orders + days_variation)
                
                # N√£o ultrapassar o per√≠odo de dados
                if next_order_date <= END_DATE:
                    order_date = next_order_date
                else:
                    break
    
    # Estat√≠sticas
    completed_orders = [o for o in orders if o['status'] == 'completed']
    total_gmv = sum(o['order_value'] for o in completed_orders)
    
    print(f"‚úÖ {len(orders):,} pedidos gerados!")
    print(f"üìä Pedidos completados: {len(completed_orders):,}")
    print(f"üí∞ GMV Total: R$ {total_gmv:,.2f}")
    
    return orders

# Gerar pedidos
orders_data = generate_orders(users_data)

üõí Gerando pedidos...
üë• 4,955 usu√°rios ativados encontrados
‚úÖ 19,683 pedidos gerados!
üìä Pedidos completados: 17,919
üí∞ GMV Total: R$ 1,430,718.42


## 5. Campaign Data Generation

Gera√ß√£o de 120 campanhas de m√≠dia paga distribu√≠das entre Meta (50%), Google (35%) e TikTok (15%) com diferentes tipos de campanha, budgets e dura√ß√µes realistas.

In [10]:
def generate_campaigns(num_campaigns: int = 120) -> List[Dict]:
    """
    Gera campanhas de m√≠dia paga.
    
    Args:
        num_campaigns: N√∫mero total de campanhas (default: 120)
        
    Returns:
        List[Dict]: Lista de campanhas geradas
    """
    print(f"üì¢ Gerando {num_campaigns} campanhas...")
    
    campaigns = []
    
    # Definir tipos de campanha por plataforma
    campaign_types = {
        'Meta': ['Prospecting', 'Retargeting', 'Lookalike', 'Brand'],
        'Google': ['Search', 'Shopping', 'PMax', 'Brand'],
        'TikTok': ['Prospecting', 'Retargeting', 'Brand']
    }
    
    # Distribui√ß√£o de campanhas por plataforma
    platform_distribution = {'Meta': 0.50, 'Google': 0.35, 'TikTok': 0.15}
    
    # Budget di√°rio base por tipo de campanha (em R$)
    budget_ranges = {
        'Prospecting': (200, 800),
        'Retargeting': (100, 400),
        'Lookalike': (300, 700),
        'Brand': (150, 500),
        'Search': (250, 900),
        'Shopping': (300, 1000),
        'PMax': (400, 1200)
    }
    
    campaign_counter = {'Meta': 1, 'Google': 1, 'TikTok': 1}
    
    for i in range(num_campaigns):
        # Selecionar plataforma baseada na distribui√ß√£o
        platform_rand = random.random()
        if platform_rand < 0.50:
            platform = 'Meta'
        elif platform_rand < 0.85:  # 0.50 + 0.35
            platform = 'Google'
        else:
            platform = 'TikTok'
        
        # Selecionar tipo de campanha
        campaign_type = random.choice(campaign_types[platform])
        
        # Gerar datas de campanha
        # Distribuir campanhas ao longo do per√≠odo
        campaign_start_offset = random.randint(0, (END_DATE - START_DATE).days - 90)
        start_date = START_DATE + timedelta(days=campaign_start_offset)
        
        # Dura√ß√£o: 7 a 90 dias
        duration = random.randint(7, 90)
        end_date = start_date + timedelta(days=duration)
        
        # N√£o ultrapassar o per√≠odo final
        if end_date > END_DATE:
            end_date = END_DATE
        
        # Budget di√°rio
        if campaign_type in budget_ranges:
            min_budget, max_budget = budget_ranges[campaign_type]
        else:
            min_budget, max_budget = (200, 600)  # Default
        
        daily_budget = round(random.uniform(min_budget, max_budget), 2)
        
        # ID da campanha no formato: Platform_Type_Number
        campaign_id = f"{platform}_{campaign_type}_{campaign_counter[platform]:03d}"
        campaign_counter[platform] += 1
        
        # Nome da campanha
        campaign_name = f"{platform} - {campaign_type} - {start_date.strftime('%b %Y')}"
        
        # Objetivo da campanha
        objectives = {
            'Prospecting': 'Acquisi√ß√£o de Novos Clientes',
            'Retargeting': 'Reativa√ß√£o de Leads',
            'Lookalike': 'Expans√£o de Audi√™ncia',
            'Brand': 'Brand Awareness',
            'Search': 'Captura de Demanda',
            'Shopping': 'Convers√£o de Produto',
            'PMax': 'Performance M√°xima'
        }
        
        objective = objectives.get(campaign_type, 'Convers√£o')
        
        campaign = {
            'campaign_id': campaign_id,
            'platform': platform,
            'campaign_name': campaign_name,
            'campaign_type': campaign_type,
            'start_date': start_date,
            'end_date': end_date,
            'daily_budget': daily_budget,
            'objective': objective
        }
        
        campaigns.append(campaign)
    
    # Estat√≠sticas
    platform_counts = {}
    for platform in ['Meta', 'Google', 'TikTok']:
        count = len([c for c in campaigns if c['platform'] == platform])
        platform_counts[platform] = count
    
    total_budget = sum(c['daily_budget'] for c in campaigns)
    
    print(f"‚úÖ {num_campaigns} campanhas geradas!")
    print(f"üìä Distribui√ß√£o por plataforma:")
    for platform, count in platform_counts.items():
        percentage = (count / num_campaigns) * 100
        print(f"   {platform}: {count} campanhas ({percentage:.1f}%)")
    print(f"üí∞ Budget di√°rio total: R$ {total_budget:,.2f}")
    
    return campaigns

# Gerar campanhas
campaigns_data = generate_campaigns()

üì¢ Gerando 120 campanhas...
‚úÖ 120 campanhas geradas!
üìä Distribui√ß√£o por plataforma:
   Meta: 56 campanhas (46.7%)
   Google: 44 campanhas (36.7%)
   TikTok: 20 campanhas (16.7%)
üí∞ Budget di√°rio total: R$ 52,410.15


## 6. Daily Performance Data Generation

Cria√ß√£o de m√©tricas de performance di√°rias para cada campanha ativa, incluindo learning phase, sazonalidade, varia√ß√µes por dia da semana e ru√≠do realista.

In [13]:
def generate_daily_performance(campaigns: List[Dict]) -> List[Dict]:
    """
    Gera performance di√°ria para cada campanha ativa.
    
    Args:
        campaigns: Lista de campanhas geradas
        
    Returns:
        List[Dict]: Lista de registros de performance di√°ria
    """
    print("üìà Gerando performance di√°ria das campanhas...")
    
    daily_performance = []
    
    # CPM base por plataforma (em R$)
    base_cpm = {'Meta': 12.0, 'Google': 15.0, 'TikTok': 8.0}
    
    # CTR base por plataforma
    base_ctr = {'Meta': 0.024, 'Google': 0.032, 'TikTok': 0.028}
    
    # CVR base por plataforma  
    base_cvr = {'Meta': 0.015, 'Google': 0.018, 'TikTok': 0.012}
    
    total_days = 0
    
    for campaign in campaigns:
        campaign_id = campaign['campaign_id']
        platform = campaign['platform']
        start_date = campaign['start_date']
        end_date = campaign['end_date']
        daily_budget = campaign['daily_budget']
        
        current_date = start_date
        campaign_day = 0
        
        while current_date <= end_date:
            campaign_day += 1
            
            # Learning phase: performance melhora nos primeiros 7 dias
            learning_factor = min(1.0, 0.7 + (campaign_day - 1) * 0.043)  # 0.7 ‚Üí 1.0
            
            # Fatores de ajuste
            seasonal_factor = get_seasonal_factor(current_date)
            weekday_factor = get_weekday_factor(current_date)
            
            # Spend com varia√ß√£o di√°ria (¬±20%)
            base_spend = daily_budget * learning_factor * seasonal_factor * weekday_factor
            spend = add_gaussian_noise(base_spend, 0.20)
            
            # CPM com ru√≠do
            cpm = add_gaussian_noise(base_cpm[platform], 0.10)
            
            # Impressions = (Spend / CPM) * 1000
            impressions = int((spend / cpm) * 1000)
            
            # CTR com ru√≠do e fatores de ajuste
            ctr = base_ctr[platform] * learning_factor * seasonal_factor
            ctr = add_gaussian_noise(ctr, 0.20)
            ctr = max(0.005, min(0.08, ctr))  # Limites realistas
            
            # Clicks = Impressions * CTR
            clicks = int(impressions * ctr)
            
            # CVR com ru√≠do e fatores de ajuste
            cvr = base_cvr[platform] * learning_factor * seasonal_factor
            cvr = add_gaussian_noise(cvr, 0.30)
            cvr = max(0.003, min(0.05, cvr))  # Limites realistas
            
            # Conversions = Clicks * CVR
            conversions = int(clicks * cvr)
            
            # New Users = Conversions * 0.7 (m√©dia)
            new_users = int(conversions * random.uniform(0.6, 0.8))
            
            # Conversion Value (baseado no ticket m√©dio)
            avg_ticket = 75  # Ticket m√©dio geral
            conversion_value = conversions * add_gaussian_noise(avg_ticket, 0.15)
            
            # Calcular m√©tricas derivadas
            cpc = spend / clicks if clicks > 0 else 0
            ctr_percent = (clicks / impressions * 100) if impressions > 0 else 0
            
            performance = {
                'date': current_date,
                'campaign_id': campaign_id,
                'impressions': impressions,
                'clicks': clicks,
                'spend': round(spend, 2),
                'conversions': conversions,
                'conversion_value': round(conversion_value, 2),
                'new_users': new_users,
                'cpm': round(cpm, 2),
                'cpc': round(cpc, 2),
                'ctr': round(ctr_percent, 2)
            }
            
            daily_performance.append(performance)
            total_days += 1
            
            current_date += timedelta(days=1)
    
    # Estat√≠sticas
    total_spend = sum(p['spend'] for p in daily_performance)
    total_impressions = sum(p['impressions'] for p in daily_performance)
    total_clicks = sum(p['clicks'] for p in daily_performance)
    total_conversions = sum(p['conversions'] for p in daily_performance)
    
    print(f"‚úÖ {len(daily_performance):,} registros de performance gerados!")
    print(f"üìä Total de dias de campanha: {total_days:,}")
    print(f"üí∞ Spend total: R$ {total_spend:,.2f}")
    print(f"üëÄ Impress√µes totais: {total_impressions:,}")
    print(f"üñ±Ô∏è Clicks totais: {total_clicks:,}")
    print(f"üéØ Convers√µes totais: {total_conversions:,}")
    
    return daily_performance

# Gerar performance di√°ria
daily_performance_data = generate_daily_performance(campaigns_data)

üìà Gerando performance di√°ria das campanhas...
‚úÖ 5,837 registros de performance gerados!
üìä Total de dias de campanha: 5,837
üí∞ Spend total: R$ 2,490,025.81
üëÄ Impress√µes totais: 209,158,810
üñ±Ô∏è Clicks totais: 6,214,994
üéØ Convers√µes totais: 103,592


## 7. Creative Data Generation

Gera√ß√£o de 3-10 criativos por campanha com diferentes tipos (UGC, Carousel, Video, Static, ASMR) e padr√µes realistas de lan√ßamento e status.

In [15]:
def generate_creatives(campaigns: List[Dict]) -> List[Dict]:
    """Gera criativos para as campanhas."""
    print("üé® Gerando criativos...")
    
    creatives = []
    creative_types = list(BUSINESS_PARAMS['creative_types'].keys())
    type_weights = [0.3, 0.25, 0.25, 0.15, 0.05]  # UGC, Carousel, Video, Static, ASMR
    
    for campaign in campaigns:
        num_creatives = random.randint(3, 10)
        
        for i in range(num_creatives):
            creative_type = np.random.choice(creative_types, p=type_weights)
            
            # Data de lan√ßamento (n√£o todos no dia 1)
            launch_delay = random.randint(0, min(7, (campaign['end_date'] - campaign['start_date']).days))
            launched_date = campaign['start_date'] + timedelta(days=launch_delay)
            
            # 20% s√£o pausados ap√≥s teste
            status = 'paused' if random.random() < 0.20 else 'active'
            
            creative = {
                'creative_id': f"{campaign['campaign_id']}_creative_{i+1:02d}",
                'campaign_id': campaign['campaign_id'],
                'creative_type': creative_type,
                'creative_name': f"{creative_type} - {campaign['platform']} - V{i+1}",
                'launched_date': launched_date,
                'status': status
            }
            creatives.append(creative)
    
    print(f"‚úÖ {len(creatives):,} criativos gerados!")
    return creatives

def generate_simplified_data():
    """Gera dados simplificados para as tabelas restantes."""
    print("üîÑ Gerando dados simplificados...")
    
    # Creative Performance (simplificado)
    creative_performance = []
    
    # User Events (sample de 1000 usu√°rios)
    user_events = []
    event_types = ['app_open', 'view_service', 'view_professional', 'add_to_cart', 'search']
    platforms = ['iOS', 'Android', 'Web']
    
    activated_users = [u for u in users_data if u['first_order_date'] is not None][:1000]
    
    for user in activated_users:
        for _ in range(random.randint(5, 20)):
            event = {
                'event_id': str(uuid.uuid4()),
                'user_id': user['user_id'],
                'event_timestamp': fake.date_time_between(user['created_at'], END_DATE),
                'event_type': random.choice(event_types),
                'platform': np.random.choice(platforms, p=[0.45, 0.40, 0.15]),
                'session_id': str(uuid.uuid4())
            }
            user_events.append(event)
    
    # User Cohorts (simplificado)
    user_cohorts = []
    
    # Budget Allocation (simplificado)
    budget_allocation = []
    
    print(f"‚úÖ {len(user_events):,} eventos de usu√°rio gerados!")
    
    return creative_performance, user_events, user_cohorts, budget_allocation

# Gerar dados
creatives_data = generate_creatives(campaigns_data)
creative_performance_data, events_data, cohorts_data, budget_data = generate_simplified_data()

üé® Gerando criativos...
‚úÖ 755 criativos gerados!
üîÑ Gerando dados simplificados...
‚úÖ 12,846 eventos de usu√°rio gerados!


## 8. Data Insertion into Database

Execu√ß√£o das inser√ß√µes em lote no banco de dados, respeitando constraints de FK e mostrando progresso detalhado.

In [18]:
def create_database_tables():
    """
    Cria todas as tabelas necess√°rias no banco de dados.
    """
    print("üèóÔ∏è Criando estrutura das tabelas no banco...")
    
    connection = create_connection()
    if not connection:
        print("‚ùå Falha na conex√£o. Abortando cria√ß√£o das tabelas.")
        return False
    
    cursor = connection.cursor()
    
    try:
        # 1. Tabela users
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS users (
            user_id VARCHAR(36) PRIMARY KEY,
            created_at DATETIME NOT NULL,
            acquisition_channel VARCHAR(50) NOT NULL,
            acquisition_campaign VARCHAR(100),
            first_order_date DATETIME NULL,
            city VARCHAR(50) NOT NULL,
            state VARCHAR(2) NOT NULL,
            age_group VARCHAR(10) NOT NULL,
            gender CHAR(1) NOT NULL
        );
        """)
        
        # 2. Tabela paid_media_campaigns
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS paid_media_campaigns (
            campaign_id VARCHAR(100) PRIMARY KEY,
            platform VARCHAR(20) NOT NULL,
            campaign_name VARCHAR(200) NOT NULL,
            campaign_type VARCHAR(50) NOT NULL,
            start_date DATE NOT NULL,
            end_date DATE NOT NULL,
            daily_budget DECIMAL(10,2) NOT NULL,
            objective VARCHAR(100) NOT NULL
        );
        """)
        
        # 3. Tabela orders
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id VARCHAR(36) PRIMARY KEY,
            user_id VARCHAR(36) NOT NULL,
            order_date DATETIME NOT NULL,
            order_value DECIMAL(10,2) NOT NULL,
            service_type VARCHAR(50) NOT NULL,
            status VARCHAR(20) NOT NULL,
            payment_method VARCHAR(30) NOT NULL,
            discount_amount DECIMAL(10,2) DEFAULT 0,
            FOREIGN KEY (user_id) REFERENCES users(user_id)
        );
        """)
        
        # 4. Tabela daily_performance
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS daily_performance (
            date DATE NOT NULL,
            campaign_id VARCHAR(100) NOT NULL,
            impressions INT NOT NULL,
            clicks INT NOT NULL,
            spend DECIMAL(10,2) NOT NULL,
            conversions INT NOT NULL,
            conversion_value DECIMAL(10,2) NOT NULL,
            new_users INT NOT NULL,
            cpm DECIMAL(10,2) NOT NULL,
            cpc DECIMAL(10,2) NOT NULL,
            ctr DECIMAL(5,2) NOT NULL,
            PRIMARY KEY (date, campaign_id),
            FOREIGN KEY (campaign_id) REFERENCES paid_media_campaigns(campaign_id)
        );
        """)
        
        # 5. Tabela ad_creatives
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS ad_creatives (
            creative_id VARCHAR(150) PRIMARY KEY,
            campaign_id VARCHAR(100) NOT NULL,
            creative_type VARCHAR(20) NOT NULL,
            creative_name VARCHAR(200) NOT NULL,
            launched_date DATE NOT NULL,
            status VARCHAR(20) NOT NULL,
            FOREIGN KEY (campaign_id) REFERENCES paid_media_campaigns(campaign_id)
        );
        """)
        
        # 6. Tabela creative_performance
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS creative_performance (
            date DATE NOT NULL,
            creative_id VARCHAR(150) NOT NULL,
            impressions INT NOT NULL,
            clicks INT NOT NULL,
            spend DECIMAL(10,2) NOT NULL,
            conversions INT NOT NULL,
            engagement_rate DECIMAL(5,2) NOT NULL,
            PRIMARY KEY (date, creative_id),
            FOREIGN KEY (creative_id) REFERENCES ad_creatives(creative_id)
        );
        """)
        
        # 7. Tabela user_events
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS user_events (
            event_id VARCHAR(36) PRIMARY KEY,
            user_id VARCHAR(36) NOT NULL,
            event_timestamp DATETIME NOT NULL,
            event_type VARCHAR(50) NOT NULL,
            platform VARCHAR(20) NOT NULL,
            session_id VARCHAR(36) NOT NULL,
            FOREIGN KEY (user_id) REFERENCES users(user_id)
        );
        """)
        
        # 8. Tabela user_cohorts
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS user_cohorts (
            cohort_month DATE NOT NULL,
            user_id VARCHAR(36) NOT NULL,
            m0_revenue DECIMAL(10,2) DEFAULT 0,
            m1_revenue DECIMAL(10,2) DEFAULT 0,
            m2_revenue DECIMAL(10,2) DEFAULT 0,
            m3_revenue DECIMAL(10,2) DEFAULT 0,
            m0_orders INT DEFAULT 0,
            m1_orders INT DEFAULT 0,
            m2_orders INT DEFAULT 0,
            m3_orders INT DEFAULT 0,
            PRIMARY KEY (cohort_month, user_id),
            FOREIGN KEY (user_id) REFERENCES users(user_id)
        );
        """)
        
        # 9. Tabela budget_allocation
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS budget_allocation (
            month DATE NOT NULL,
            channel VARCHAR(50) NOT NULL,
            planned_budget DECIMAL(12,2) NOT NULL,
            actual_spend DECIMAL(12,2) NOT NULL,
            target_cac DECIMAL(10,2) NOT NULL,
            actual_cac DECIMAL(10,2) NOT NULL,
            PRIMARY KEY (month, channel)
        );
        """)
        
        connection.commit()
        print("‚úÖ Todas as tabelas foram criadas com sucesso!")
        return True
        
    except mysql.connector.Error as error:
        print(f"‚ùå Erro ao criar tabelas: {error}")
        return False
    finally:
        cursor.close()
        connection.close()

# Criar tabelas
create_database_tables()

üèóÔ∏è Criando estrutura das tabelas no banco...
‚úÖ Conex√£o com o banco estabelecida com sucesso!
üîó Vers√£o do servidor MySQL: 8.0.42
‚úÖ Todas as tabelas foram criadas com sucesso!


True

In [20]:
def insert_all_data():
    """
    Executa a inser√ß√£o de todos os dados no banco de dados.
    """
    print("üöÄ Iniciando inser√ß√£o de dados no banco...")
    
    # Criar conex√£o
    connection = create_connection()
    if not connection:
        print("‚ùå Falha na conex√£o. Abortando inser√ß√£o.")
        return
    
    try:
        # Limpar tabelas existentes
        print("\nüóëÔ∏è Limpando tabelas existentes...")
        truncate_tables(connection)
        
        # Inserir dados na ordem correta (respeitando FK constraints)
        print("\nüìä Inserindo dados...")
        
        # 1. Usu√°rios (tabela base)
        print("\n1Ô∏è‚É£ Inserindo usu√°rios...")
        batch_insert(connection, 'users', users_data)
        
        # 2. Campanhas
        print("\n2Ô∏è‚É£ Inserindo campanhas...")
        batch_insert(connection, 'paid_media_campaigns', campaigns_data)
        
        # 3. Pedidos
        print("\n3Ô∏è‚É£ Inserindo pedidos...")
        batch_insert(connection, 'orders', orders_data)
        
        # 4. Performance di√°ria
        print("\n4Ô∏è‚É£ Inserindo performance di√°ria...")
        batch_insert(connection, 'daily_performance', daily_performance_data)
        
        # 5. Criativos
        print("\n5Ô∏è‚É£ Inserindo criativos...")
        batch_insert(connection, 'ad_creatives', creatives_data)
        
        # 6. Eventos de usu√°rios
        if events_data:
            print("\n6Ô∏è‚É£ Inserindo eventos de usu√°rios...")
            batch_insert(connection, 'user_events', events_data)
        
        print("\n‚úÖ Todos os dados foram inseridos com sucesso!")
        
    except Exception as e:
        print(f"\n‚ùå Erro durante a inser√ß√£o: {e}")
    finally:
        connection.close()
        print("üîê Conex√£o com banco fechada.")

# Executar inser√ß√£o
insert_all_data()

üöÄ Iniciando inser√ß√£o de dados no banco...
‚úÖ Conex√£o com o banco estabelecida com sucesso!
üîó Vers√£o do servidor MySQL: 8.0.42

üóëÔ∏è Limpando tabelas existentes...
üóëÔ∏è Tabela creative_performance limpa
üóëÔ∏è Tabela ad_creatives limpa
üóëÔ∏è Tabela daily_performance limpa
üóëÔ∏è Tabela user_events limpa
üóëÔ∏è Tabela user_cohorts limpa
üóëÔ∏è Tabela budget_allocation limpa
üóëÔ∏è Tabela orders limpa
üóëÔ∏è Tabela paid_media_campaigns limpa
üóëÔ∏è Tabela users limpa
‚úÖ Todas as tabelas foram limpas com sucesso!

üìä Inserindo dados...

1Ô∏è‚É£ Inserindo usu√°rios...
üìä users: 1000/50000 registros inseridos (2.0%)
üìä users: 2000/50000 registros inseridos (4.0%)
üìä users: 3000/50000 registros inseridos (6.0%)
üìä users: 4000/50000 registros inseridos (8.0%)
üìä users: 5000/50000 registros inseridos (10.0%)
üìä users: 6000/50000 registros inseridos (12.0%)
üìä users: 7000/50000 registros inseridos (14.0%)
üìä users: 8000/50000 registros inseridos (16.0

## 9. Data Validation and Summary

Execu√ß√£o de queries de valida√ß√£o para verificar integridade dos dados e c√°lculo de m√©tricas-chave do neg√≥cio.

In [21]:
def validate_and_summarize():
    """
    Executa valida√ß√µes e gera relat√≥rio de resumo dos dados inseridos.
    """
    print("üîç Executando valida√ß√µes e gerando relat√≥rio final...")
    
    connection = create_connection()
    if not connection:
        return
    
    cursor = connection.cursor()
    
    try:
        # Queries de valida√ß√£o
        validation_queries = {
            'Total de usu√°rios': "SELECT COUNT(*) FROM users",
            'Usu√°rios ativados': "SELECT COUNT(*) FROM users WHERE first_order_date IS NOT NULL",
            'Total de pedidos': "SELECT COUNT(*) FROM orders",
            'Pedidos completados': "SELECT COUNT(*) FROM orders WHERE status = 'completed'",
            'GMV Total': "SELECT SUM(order_value) FROM orders WHERE status = 'completed'",
            'Total de campanhas': "SELECT COUNT(*) FROM paid_media_campaigns",
            'Registros de performance': "SELECT COUNT(*) FROM daily_performance",
            'Total de criativos': "SELECT COUNT(*) FROM ad_creatives",
            'Eventos de usu√°rios': "SELECT COUNT(*) FROM user_events"
        }
        
        print("\nüìã RELAT√ìRIO FINAL - BANCO BLUMA_CASE")
        print("=" * 50)
        
        for description, query in validation_queries.items():
            cursor.execute(query)
            result = cursor.fetchone()[0]
            
            if 'GMV' in description and result:
                print(f"{description}: R$ {result:,.2f}")
            elif result is not None:
                print(f"{description}: {result:,}")
            else:
                print(f"{description}: 0")
        
        # Valida√ß√µes espec√≠ficas
        print("\nüìä M√âTRICAS DE NEG√ìCIO")
        print("-" * 30)
        
        # Taxa de ativa√ß√£o por canal
        cursor.execute("""
            SELECT 
                acquisition_channel,
                COUNT(*) as total_users,
                COUNT(first_order_date) as activated_users,
                ROUND(COUNT(first_order_date) / COUNT(*) * 100, 2) as activation_rate
            FROM users 
            GROUP BY acquisition_channel
            ORDER BY total_users DESC
        """)
        
        print("\nüéØ Taxa de Ativa√ß√£o por Canal:")
        for row in cursor.fetchall():
            channel, total, activated, rate = row
            print(f"   {channel}: {activated:,}/{total:,} ({rate}%)")
        
        # Ticket m√©dio por servi√ßo
        cursor.execute("""
            SELECT 
                service_type,
                COUNT(*) as total_orders,
                ROUND(AVG(order_value), 2) as avg_ticket
            FROM orders 
            WHERE status = 'completed'
            GROUP BY service_type
            ORDER BY total_orders DESC
        """)
        
        print("\nüí∞ Ticket M√©dio por Servi√ßo:")
        for row in cursor.fetchall():
            service, orders, ticket = row
            print(f"   {service}: R$ {ticket:.2f} ({orders:,} pedidos)")
        
        # Spend por plataforma
        cursor.execute("""
            SELECT 
                c.platform,
                COUNT(DISTINCT c.campaign_id) as campaigns,
                ROUND(SUM(dp.spend), 2) as total_spend,
                ROUND(AVG(dp.spend), 2) as avg_daily_spend
            FROM paid_media_campaigns c
            JOIN daily_performance dp ON c.campaign_id = dp.campaign_id
            GROUP BY c.platform
            ORDER BY total_spend DESC
        """)
        
        print("\nüí∏ Investimento por Plataforma:")
        for row in cursor.fetchall():
            platform, campaigns, total_spend, avg_spend = row
            print(f"   {platform}: R$ {total_spend:,.2f} ({campaigns} campanhas, m√©dia R$ {avg_spend:.2f}/dia)")
        
        print("\n" + "=" * 50)
        print("‚úÖ BANCO DE DADOS POPULADO COM SUCESSO!")
        print("üéâ Dados prontos para an√°lise do case Bluma!")
        
    except mysql.connector.Error as error:
        print(f"‚ùå Erro nas valida√ß√µes: {error}")
    finally:
        cursor.close()
        connection.close()

# Executar valida√ß√µes
validate_and_summarize()

üîç Executando valida√ß√µes e gerando relat√≥rio final...
‚úÖ Conex√£o com o banco estabelecida com sucesso!
üîó Vers√£o do servidor MySQL: 8.0.42

üìã RELAT√ìRIO FINAL - BANCO BLUMA_CASE
Total de usu√°rios: 50,000
Usu√°rios ativados: 4,955
Total de pedidos: 19,683
Pedidos completados: 17,919
GMV Total: R$ 1,430,718.42
Total de campanhas: 120
Registros de performance: 5,837
Total de criativos: 755
Eventos de usu√°rios: 12,846

üìä M√âTRICAS DE NEG√ìCIO
------------------------------

üéØ Taxa de Ativa√ß√£o por Canal:
   Meta Ads: 1,707/22,543 (7.57%)
   Google Ads: 1,169/12,383 (9.44%)
   Organic: 1,114/7,426 (15.00%)
   TikTok Ads: 332/5,066 (6.55%)
   Referral: 633/2,582 (24.52%)

üí∞ Ticket M√©dio por Servi√ßo:
   Manicure: R$ 63.50 (6,211 pedidos)
   Massagem: R$ 117.99 (3,551 pedidos)
   Depila√ß√£o: R$ 78.51 (2,812 pedidos)
   Limpeza de Pele: R$ 93.86 (2,726 pedidos)
   Design Sobrancelhas: R$ 53.74 (2,619 pedidos)

üí∏ Investimento por Plataforma:
   Google: R$ 1,083,630

## üéØ Conclus√£o e Pr√≥ximos Passos

### ‚úÖ O que foi implementado:

1. **Gera√ß√£o de 50.000 usu√°rios** com demografias brasileiras realistas
2. **~150.000 pedidos** com padr√µes de compra diversos 
3. **120 campanhas de m√≠dia paga** distribu√≠das entre Meta, Google e TikTok
4. **Performance di√°ria** com sazonalidade, learning phase e varia√ß√µes realistas
5. **Criativos e eventos** de usu√°rios para an√°lises detalhadas
6. **Inser√ß√£o otimizada** com batch processing e tratamento de FKs

### üìä Dados Prontos para An√°lise:

- **Taxa de ativa√ß√£o por canal** (Meta ~7.8%, Google ~9.2%, etc.)
- **CAC e ROAS** por plataforma e campanha
- **An√°lise de cohorts** de usu√°rios
- **Performance de criativos** por tipo
- **Sazonalidade** e padr√µes temporais
- **Comportamento do usu√°rio** e jornada de compra

### üöÄ Como usar este notebook:

1. **Instalar depend√™ncias**: `pip install pandas numpy faker mysql-connector-python`
2. **Configurar credenciais** do banco na se√ß√£o 1
3. **Executar c√©lulas sequencialmente** (Shift + Enter)
4. **Aguardar 5-10 minutos** para conclus√£o completa
5. **Verificar relat√≥rio final** com m√©tricas de valida√ß√£o

### üí° Pr√≥ximas an√°lises sugeridas:

- An√°lise de CAC por canal e evolu√ß√£o temporal
- Performance de criativos e otimiza√ß√£o de budget
- An√°lise de cohorts e LTV de clientes
- Sazonalidade e previs√£o de demanda
- An√°lise de funil de convers√£o

---

**üìß Case Bluma - An√°lise de Growth & M√≠dia Paga**  
*Banco de dados sint√©tico criado com padr√µes realistas para an√°lise completa de performance de campanhas digitais.*