# Spotify API Data Extraction

This notebook extracts data from the Spotify API and saves it to a Unity Catalog Volume.

## Flow:
1. Authenticates with the Spotify API using Client Credentials
2. Runs search queries (genres, artists, years)
3. Deduplicates tracks
4. Saves the JSON to the Unity Catalog Volume

## Prerequisites:
- Databricks Secrets configured (instructions below)

## ⚙️ IMPORTANT: Configure Databricks Secrets

**Before running this notebook, you need to create the secrets.**

Run these commands in the **Databricks CLI** (local terminal):

```bash
# 1. Create the scope (only once)
databricks secrets create-scope spotify

# 2. Add client_id
databricks secrets put-secret spotify client_id
# Paste your SPOTIFY_CLIENT_ID when prompted

# 3. Add client_secret
databricks secrets put-secret spotify client_secret
# Paste your SPOTIFY_CLIENT_SECRET when prompted
```

**Alternative (via the Databricks UI):**
1. Go to Databricks Workspace -> Compute -> Secrets
2. Create a Secret Scope (e.g. `spotify`)
3. Add the `client_id` and `client_secret` secrets using the Databricks CLI or a notebook (as detailed in `docs/DATABRICKS_SECRETS_SETUP.md`).

> **Note**: The UI does not allow adding secret values directly, for security reasons. Use the CLI or a notebook to add the values.
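
Before running the remaining cells, it can help to confirm that both keys exist in the scope. The following is a minimal sketch, not part of the original notebook: `missing_secret_keys` and `REQUIRED_KEYS` are hypothetical names, and the scope name `spotify` is assumed from the commands above. Inside a Databricks notebook the list of available keys could come from `dbutils.secrets.list("spotify")`; a plain list stands in for it here so the example runs anywhere.

```python
from typing import Iterable, List

# Key names this notebook expects in the secret scope.
REQUIRED_KEYS = ("client_id", "client_secret")


def missing_secret_keys(available: Iterable[str],
                        required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required key names that are absent from `available`, in order."""
    present = set(available)
    return [key for key in required if key not in present]


# In a Databricks notebook (assumption: the scope is named "spotify"):
#   available = [s.key for s in dbutils.secrets.list("spotify")]
# A plain list is used here as a stand-in for that call.
available = ["client_id"]
missing = missing_secret_keys(available)
if missing:
    print(f"Missing secrets: {missing}")
else:
    print("All required secrets are present")
```

Only key names are inspected; the secret values themselves are never read or printed.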

## 1. Imports and Configuration

In [0]:
import base64
import json
import time
import requests
from datetime import datetime, timezone
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Configuration
VOLUME_PATH = "/Volumes/spotify_analytics/landing/raw_data"  # Unity Catalog Volume path
SPOTIFY_BASE_URL = "https://api.spotify.com/v1"
SPOTIFY_AUTH_URL = "https://accounts.spotify.com/api/token"

print("✅ Imports loaded")
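
Unity Catalog volume paths follow the pattern `/Volumes/<catalog>/<schema>/<volume>/...`, so a typo in `VOLUME_PATH` can be caught before any API call is made. The sketch below is one possible check; `VolumeLocation` and `parse_volume_path` are hypothetical helpers, not part of the notebook or of any Databricks library.

```python
from typing import NamedTuple


class VolumeLocation(NamedTuple):
    catalog: str
    schema: str
    volume: str
    subpath: str  # path inside the volume, "" when the path points at the volume root


def parse_volume_path(path: str) -> VolumeLocation:
    """Split a /Volumes/<catalog>/<schema>/<volume>[/...] path into its parts."""
    parts = path.strip("/").split("/")
    if (not path.startswith("/")
            or len(parts) < 4
            or parts[0] != "Volumes"
            or not all(parts[1:4])):
        raise ValueError(f"Not a Unity Catalog volume path: {path!r}")
    return VolumeLocation(parts[1], parts[2], parts[3], "/".join(parts[4:]))


location = parse_volume_path("/Volumes/spotify_analytics/landing/raw_data")
print(location.catalog, location.schema, location.volume)
```

The check is purely syntactic: it does not confirm that the catalog, schema, or volume actually exist.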

## 2. Load Credentials (Databricks Secrets)

In [0]:
# Load credentials from Databricks Secrets
try:
    SPOTIFY_CLIENT_ID = dbutils.secrets.get(scope="spotify", key="client_id")
    SPOTIFY_CLIENT_SECRET = dbutils.secrets.get(scope="spotify", key="client_secret")
    print("✅ Credentials loaded successfully from Databricks Secrets")
except Exception as e:
    print(f"❌ ERROR: Could not load credentials. {e}")
    print("\n🔧 Follow the instructions in the markdown cell above to configure the secrets.")
    raise

## 3. SpotifyClient Class (Authentication and Requests)

In [0]:
class SpotifyClient:
    """Client for interacting with the Spotify Web API using OAuth 2.0 Client Credentials."""

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.access_token = None
        self.token_expires_at = 0
        self.session = self._create_session()

    def _create_session(self):
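        """Creates a requests session with a retry strategy."""
        session = requests.Session()
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["GET", "POST"],
            # Return the last response once retries are exhausted instead of
            # raising, so callers can inspect the status code themselves
            # (for example the 429 handling in search_tracks).
            raise_on_status=False
        )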
        adapter = HTTPAdapter(max_retries=retry_strategy)
        session.mount("http://", adapter)
        session.mount("https://", adapter)
        return session

    def _authenticate(self):
        """Obtains an access token using the Client Credentials flow."""
        print("🔑 Authenticating with the Spotify API...")
        
        auth_string = f"{self.client_id}:{self.client_secret}"
        auth_bytes = auth_string.encode("utf-8")
        auth_base64 = base64.b64encode(auth_bytes).decode("utf-8")

        headers = {
            "Authorization": f"Basic {auth_base64}",
            "Content-Type": "application/x-www-form-urlencoded"
        }
        data = {"grant_type": "client_credentials"}

        response = self.session.post(SPOTIFY_AUTH_URL, headers=headers, data=data, timeout=10)
        response.raise_for_status()

        token_data = response.json()
        self.access_token = token_data["access_token"]
        expires_in = token_data["expires_in"]
        self.token_expires_at = time.time() + expires_in

        print(f"✅ Authentication successful! Token expires in {expires_in} seconds.")

    def _ensure_token_valid(self):
        """Ensures we have a valid access token, refreshing it if necessary."""
        if not self.access_token or time.time() >= self.token_expires_at - 60:
            self._authenticate()

    def _get_headers(self):
        """Returns the headers for API requests."""
        self._ensure_token_valid()
        return {
            "Authorization": f"Bearer {self.access_token}",
            "Content-Type": "application/json"
        }

    def search_tracks(self, query: str, limit: int = 50):
        """Searches for tracks using the search endpoint."""
        print(f"🔍 Searching tracks: '{query}'")
        
        url = f"{SPOTIFY_BASE_URL}/search"
        params = {
            "q": query,
            "type": "track",
            "limit": min(limit, 50),
            "market": "US"
        }

        response = self.session.get(url, headers=self._get_headers(), params=params, timeout=15)
        
        # Rate limit handling
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 5))
            print(f"⏳ Rate limit reached. Waiting {retry_after} seconds...")
            time.sleep(retry_after)
            return self.search_tracks(query, limit)

        response.raise_for_status()
        data = response.json()
        items = data.get("tracks", {}).get("items", [])
        
        print(f"   ✅ Found {len(items)} tracks")
        return items

print("✅ SpotifyClient defined")
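
The rate-limit branch in `search_tracks` retries by calling itself, with no upper bound on the number of attempts. A bounded loop is one alternative. The sketch below illustrates the idea in isolation and is not the notebook's implementation: `call_with_rate_limit` is a hypothetical helper, and a `(status_code, headers)` tuple stands in for a `requests.Response` so the example runs without network access.

```python
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers): a simplified stand-in for a requests.Response
Reply = Tuple[int, Mapping[str, str]]


def call_with_rate_limit(
    send: Callable[[], Reply],
    max_attempts: int = 3,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Reply:
    """Call `send` until it stops returning 429, at most `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt < max_attempts:
            # Honor the server's Retry-After value (seconds) when present.
            sleep(float(headers.get("Retry-After", default_wait)))
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")


# Demo with canned replies and a fake sleep that only records the wait times.
replies = iter([(429, {"Retry-After": "2"}), (200, {})])
waits = []
result = call_with_rate_limit(lambda: next(replies), sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the helper testable; in the notebook the default `time.sleep` would apply.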

## 4. Initialize Client

In [0]:
# Initialize the Spotify client
client = SpotifyClient(
    client_id=SPOTIFY_CLIENT_ID,
    client_secret=SPOTIFY_CLIENT_SECRET
)

print("✅ Spotify client initialized")
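
Creating the client makes no network call; the token is requested lazily on the first API request and refreshed once it is within 60 seconds of expiring. The two pieces of logic involved, the Basic authorization header and the expiry check, can be sketched as standalone functions. `basic_auth_header` and `token_needs_refresh` are hypothetical names that mirror `_authenticate` and `_ensure_token_valid`, and the credentials shown are placeholders.

```python
import base64
import time
from typing import Optional


def basic_auth_header(client_id: str, client_secret: str) -> str:
    """Build the HTTP Basic value: base64("client_id:client_secret")."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def token_needs_refresh(access_token: Optional[str],
                        expires_at: float,
                        now: Optional[float] = None,
                        margin: float = 60.0) -> bool:
    """True when there is no token or it expires within `margin` seconds."""
    current = time.time() if now is None else now
    return not access_token or current >= expires_at - margin


# Placeholder credentials, for illustration only.
print(basic_auth_header("my-client-id", "my-client-secret"))
print(token_needs_refresh(None, expires_at=0.0))
print(token_needs_refresh("token", expires_at=time.time() + 3600))
```

The 60-second margin avoids sending a request with a token that expires while the request is in flight.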

## 5. Define Search Queries

Customize the queries below as needed:

In [0]:
# Search queries - CUSTOMIZE HERE!
SEARCH_QUERIES = [
    "genre:pop year:2024",
    "genre:rock year:2024",
    "genre:hip-hop year:2024",
    "The Weeknd",
    "Taylor Swift",
    "Billie Eilish",
]

TRACKS_PER_QUERY = 20  # Maximum: 50

print(f"📋 Queries configured: {len(SEARCH_QUERIES)}")
for q in SEARCH_QUERIES:
    print(f"   - {q}")
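
When the list grows, the `genre:... year:...` entries could be generated instead of typed by hand. The following is a small sketch under that assumption; `build_queries` is a hypothetical helper, and it only reproduces the query format already used in `SEARCH_QUERIES`.

```python
from itertools import product
from typing import Iterable, List


def build_queries(genres: Iterable[str],
                  years: Iterable[int],
                  artists: Iterable[str] = ()) -> List[str]:
    """Combine every genre with every year, then append free-text artist queries."""
    queries = [f"genre:{genre} year:{year}" for genre, year in product(genres, years)]
    queries.extend(artists)
    return queries


generated = build_queries(
    genres=["pop", "rock", "hip-hop"],
    years=[2024],
    artists=["The Weeknd", "Taylor Swift", "Billie Eilish"],
)
for q in generated:
    print(q)
```

With these arguments the result matches the hand-written `SEARCH_QUERIES` list in content and order.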

## 6. Run Extraction

In [0]:
print("="*60)
print("🚀 STARTING SPOTIFY DATA EXTRACTION")
print("="*60)

all_tracks = []
seen_track_ids = set()

# Fetch tracks using multiple queries
for query in SEARCH_QUERIES:
    try:
        tracks = client.search_tracks(query, TRACKS_PER_QUERY)
        
        # Deduplicate
        for track in tracks:
            track_id = track.get("id")
            if track_id and track_id not in seen_track_ids:
                seen_track_ids.add(track_id)
                all_tracks.append(track)
                
    except Exception as e:
        print(f"   ⚠️ Error in query '{query}': {e}")
        continue

print(f"\n✅ Total unique tracks collected: {len(all_tracks)}")

if not all_tracks:
    raise RuntimeError("No tracks were collected!")
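
The deduplication step keeps the first occurrence of each track `id` across all queries and skips items without an `id`. Pulled out as a standalone function, the same logic can be checked without calling the API. This is an illustrative sketch: `dedupe_tracks` is a hypothetical name and the sample tracks are invented.

```python
from typing import Any, Dict, Iterable, List

Track = Dict[str, Any]


def dedupe_tracks(batches: Iterable[Iterable[Track]]) -> List[Track]:
    """Keep the first occurrence of each track id, preserving arrival order."""
    seen_ids = set()
    unique: List[Track] = []
    for batch in batches:
        for track in batch:
            track_id = track.get("id")
            if track_id and track_id not in seen_ids:
                seen_ids.add(track_id)
                unique.append(track)
    return unique


# Invented sample data: two query results that overlap on id "a".
pop_results = [{"id": "a", "name": "Song A"}, {"id": "b", "name": "Song B"}]
artist_results = [{"id": "a", "name": "Song A"}, {"name": "No id"}, {"id": "c", "name": "Song C"}]
print([t["id"] for t in dedupe_tracks([pop_results, artist_results])])
```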

## 7. Prepare Final Payload

In [0]:
# Prepare the payload with metadata
payload = {
    "extraction_metadata": {
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "method": "search",
        "queries": SEARCH_QUERIES,
        "total_tracks": len(all_tracks)
    },
    "items": all_tracks
}

print("✅ Payload prepared")
print(f"   - Timestamp: {payload['extraction_metadata']['timestamp']}")
print(f"   - Total tracks: {payload['extraction_metadata']['total_tracks']}")
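
The metadata timestamp is an ISO 8601 string in UTC with a trailing `Z`. A downstream reader can turn it back into a timezone-aware `datetime` by reversing the same substitution. The sketch below shows the round trip; `format_utc` and `parse_utc` are hypothetical helper names.

```python
from datetime import datetime, timezone


def format_utc(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def parse_utc(text: str) -> datetime:
    """Parse a timestamp produced by format_utc back into an aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


stamp = format_utc(datetime.now(timezone.utc))
print(stamp)
print(parse_utc(stamp))
```

Replacing `Z` before parsing keeps the sketch compatible with Python versions whose `datetime.fromisoformat` does not accept the `Z` suffix.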

## 8. Save to the Unity Catalog Volume

In [0]:
# Generate the file name with a timestamp
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
filename = f"spotify_data_raw_{timestamp}.json"
full_path = f"{VOLUME_PATH}/{filename}"

# Save the JSON to the Volume
dbutils.fs.put(
    full_path,
    json.dumps(payload, indent=2, ensure_ascii=False),
    overwrite=True
)

# Check the file size
file_info = dbutils.fs.ls(VOLUME_PATH)
file_size = [f.size for f in file_info if f.name == filename][0] / 1024  # KB

print("="*60)
print("✅ EXTRACTION COMPLETED SUCCESSFULLY!")
print("="*60)
print(f"📁 File saved: {full_path}")
print(f"📊 Size: {file_size:.2f} KB")
print(f"🎵 Total tracks: {len(all_tracks)}")
print("="*60)
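
Beyond the size check, the saved file could be read back to confirm it is valid JSON and that `total_tracks` matches the number of `items`. The following sketch assumes the compute environment exposes Unity Catalog volumes as regular file paths, so that plain Python I/O works on `full_path`. `verify_payload_file` is a hypothetical helper, and a temporary directory stands in for `VOLUME_PATH` so the example runs outside Databricks.

```python
import json
import tempfile
from pathlib import Path


def verify_payload_file(path) -> int:
    """Load a saved payload and check its declared track count against its items."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    declared = payload["extraction_metadata"]["total_tracks"]
    actual = len(payload["items"])
    if declared != actual:
        raise ValueError(f"total_tracks={declared} but file contains {actual} items")
    return actual


# Stand-in for the Volume: write a tiny sample payload to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "spotify_data_raw_sample.json"
    sample = {
        "extraction_metadata": {"total_tracks": 1},
        "items": [{"id": "track-1", "name": "Canção"}],
    }
    sample_path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")
    print(f"Verified {verify_payload_file(sample_path)} track(s)")
```

In the notebook, the call would be `verify_payload_file(full_path)`, under the file-path assumption stated above.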