In [None]:
!pip install cloudscraper playwright
!playwright install

Collecting cloudscraper
  Downloading cloudscraper-1.2.71-py2.py3-none-any.whl.metadata (19 kB)
Collecting playwright
  Downloading playwright-1.53.0-py3-none-manylinux1_x86_64.whl.metadata (3.5 kB)
Collecting pyee<14,>=13 (from playwright)
  Downloading pyee-13.0.0-py3-none-any.whl.metadata (2.9 kB)
Downloading cloudscraper-1.2.71-py2.py3-none-any.whl (99 kB)
Downloading playwright-1.53.0-py3-none-manylinux1_x86_64.whl (45.8 MB)
Downloading pyee-13.0.0-py3-none-any.whl (15 kB)
Installing collected packages: pyee, playwright, cloudscraper
Successfully installed cloudscraper-1.2.71 playwright-1.53.0 pyee-13.0.0
Downloading Chromium 138.0.7204.23 (playwright build v1179) from https://cdn.playwright.dev/dbazure/download/playwright/builds/ch

# 📂 **Mount Google Drive and Install Dependencies**

- Mounts Google Drive for persistent storage.
- Installs the required libraries:
  - **cloudscraper**: works around common blocks on HTTP requests.
  - **playwright**: automated browsing of web pages for data collection.

# 📦 **Imports and Configuration**

- Imports the essential modules:
  - `requests`, `json`, `pandas` (handling structured data).
  - `asyncio`, `nest_asyncio` (asynchronous execution).
  - `playwright` (asynchronous web browsing for collecting dynamic data).

- Configures `pandas` to display all columns and the full contents of each cell.

# 🌐 **Endpoints for Table Extraction**

- Lists the Sofascore API endpoints for the standings table of each season.
- Each item holds the championship name, the query URL, and the corresponding season.
- Adjust the list as needed to include or exclude specific seasons.
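
The endpoint entries all follow the same URL pattern, so they could also be generated rather than typed by hand. The sketch below is only an illustration: `build_endpoints` and `BASE_URL` are hypothetical names that do not exist in the notebook, and the tournament and season IDs are two of the Premier League values listed in it.

```python
# Hypothetical helper: builds endpoint dicts from a tournament ID and season IDs.
BASE_URL = (
    "https://www.sofascore.com/api/v1/unique-tournament/"
    "{tournament}/season/{season}/standings/total"
)

def build_endpoints(name, tournament_id, season_ids):
    """Return one endpoint dict per season, in the shape used by the notebook."""
    return [
        {
            "championship": f"{name} {season}",
            "url": BASE_URL.format(tournament=tournament_id, season=season),
            "season": season,
        }
        for season in season_ids
    ]

# Two Premier League seasons taken from the notebook's own list.
endpoints = build_endpoints("Premier League", 17, [52186, 41886])
print(endpoints[0]["url"])
```

Adding or removing a season then means editing a list of integers instead of a full dictionary literal.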

# 🛠️ **Helper Functions for Table Processing**

## 📑 `extract_table_rows`
- Extracts the essential rows from the table returned by the API.
- Organizes the key statistics (position, wins, points, goals, etc.) for each team.

## 🖥️ `serialize_table`
- Converts the data rows into a Markdown-style tabular format.
- Makes the data easy to read when generating examples.
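
To make the serialized format concrete, here is a small worked example. It is a sketch under assumptions: `serialize_rows` is a hypothetical, trimmed-down stand-in for `serialize_table` that takes the headers as a parameter, and the two rows contain made-up values.

```python
def serialize_rows(rows, headers):
    """Render a list of dicts as pipe-delimited lines, header line first."""
    lines = ["| " + " | ".join(headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)

# Toy rows with made-up values, only to show the output shape.
rows = [
    {"Posição": 1, "Team": "Time A", "Pontos": 6},
    {"Posição": 2, "Team": "Time B", "Pontos": 3},
]
table_text = serialize_rows(rows, ["Posição", "Team", "Pontos"])
print(table_text)
# | Posição | Team | Pontos |
# | 1 | Time A | 6 |
# | 2 | Time B | 3 |
```

The output has no `|---|` separator line under the header, so it is a pipe-delimited layout rather than a strict Markdown table.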

# 🔍 **Functions for Automatic QA Example Generation**

## 🧩 `make_logic`
- Defines simplified logical forms that express the reasoning behind each question.
- Supports several question types: cell, superlative, difference, aggregation, average, and distance.

## 📚 `generate_qa_examples`
- Generates questions and answers automatically from the serialized tables.
- The generated question types include:
  - Selection of a specific cell.
  - Superlative questions (team with the most points, goals, etc.).
  - Differences between pairs of teams.
  - Aggregations (sum of the points of the top teams).
  - Per-match averages.
  - Distance to critical zones (first place, last place, relegation, and qualification).
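
The distance questions can be illustrated with a short sketch. It assumes that "distance" means the absolute points gap between a team and the team sitting at a reference position. `distance_to_position` is a hypothetical helper, the rows are made-up values, and the relegation or qualification cut-off position would have to be supplied per league.

```python
def distance_to_position(table_rows, team, target_position):
    """Absolute points gap between `team` and the team at `target_position` (1-based)."""
    by_name = {row["Team"]: row for row in table_rows}
    by_position = {row["Posição"]: row for row in table_rows}
    return abs(by_name[team]["Pontos"] - by_position[target_position]["Pontos"])

# Toy standings with made-up values.
rows = [
    {"Posição": 1, "Team": "Time A", "Pontos": 10},
    {"Posição": 2, "Team": "Time B", "Pontos": 7},
    {"Posição": 3, "Team": "Time C", "Pontos": 4},
    {"Posição": 4, "Team": "Time D", "Pontos": 1},
]

gap_to_leader = distance_to_position(rows, "Time C", 1)        # |4 - 10| = 6
gap_to_last = distance_to_position(rows, "Time B", len(rows))  # |7 - 1| = 6
print(gap_to_leader, gap_to_last)
```

Passing the target position as a parameter keeps the helper independent of league-specific rules such as how many teams are relegated.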

# 🚀 **Main Function (fetch_and_generate)**

- Starts a browser through Playwright to collect dynamic data directly from the Sofascore page.
- Iterates over the defined endpoints, extracting the JSON data of the standings tables.
- Generates descriptive metadata to give context to each table.
- Stores the generated examples in a Pandas DataFrame and exports them as CSV for later use in model training.
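
The iteration step can be sketched independently of the browser. In the example below, `collect_tables` and `fake_fetch` are hypothetical names: the fetcher is passed in as a coroutine so that a Playwright-backed one could be swapped in, and the stand-in used here returns made-up data and fails for one URL to show how a broken season could be skipped.

```python
import asyncio

async def collect_tables(endpoints, fetch_json):
    """Fetch each endpoint in turn and skip the ones that raise."""
    tables = []
    for ep in endpoints:
        try:
            data = await fetch_json(ep["url"])
        except Exception as exc:  # one failing season should not stop the run
            print(f"Skipping {ep['championship']}: {exc}")
            continue
        tables.append(
            {"championship": ep["championship"], "season": ep["season"], "data": data}
        )
    return tables

async def fake_fetch(url):
    """Stand-in fetcher with made-up data; fails for URLs ending in /bad."""
    if url.endswith("/bad"):
        raise RuntimeError("no standings")
    return {"standings": [{"rows": []}]}

demo_endpoints = [
    {"championship": "Demo 1", "url": "https://example.invalid/ok", "season": 1},
    {"championship": "Demo 2", "url": "https://example.invalid/bad", "season": 2},
]
tables = asyncio.run(collect_tables(demo_endpoints, fake_fetch))
print([t["championship"] for t in tables])  # ['Demo 1']
```

In a plain script `asyncio.run` works as written. Inside Colab, where an event loop is already running, the coroutine would be awaited directly or run after applying `nest_asyncio`.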

# ⏳ **Asynchronous Execution and Saving**

- Applies `nest_asyncio` so that asynchronous code can run in the Colab environment.
- Runs the main function asynchronously, collecting and saving all examples.
- Saves the complete dataset as CSV (`hierarchical_qa_dataset_all_seasons.csv`).
- Displays the number of examples per task type and championship for validation.
- Exports the final balanced dataset (`hierarchical_qa_dataset_balanced_all_seasons.csv`) as CSV.
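
The notes above do not say how the balanced dataset is produced. One common option, shown here purely as an assumption, is to downsample every task type to the size of the smallest one. `balance_by_key` is a hypothetical helper and the examples are made-up records.

```python
import random
from collections import Counter, defaultdict

def balance_by_key(examples, key, seed=0):
    """Downsample each group (by `key`) to the size of the smallest group."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[key]].append(example)
    target = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Made-up records: five of one task type, two of another.
examples = (
    [{"task_type": "Cell Selection", "id": i} for i in range(5)]
    + [{"task_type": "Superlative", "id": i} for i in range(2)]
)
balanced = balance_by_key(examples, "task_type")
counts = Counter(example["task_type"] for example in balanced)
print(dict(counts))  # {'Cell Selection': 2, 'Superlative': 2}
```

Downsampling discards examples from the larger groups. Whether that trade-off is acceptable depends on how many examples the smallest task type has.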

In [None]:
import requests
import json
import pandas as pd
import os
import cloudscraper
import asyncio
import nest_asyncio
from playwright.async_api import async_playwright
from playwright.async_api import TimeoutError as PlaywrightTimeoutError
from pprint import pprint

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)

# Mount Google Drive (for Google Colab)
from google.colab import drive
drive.mount('/content/drive')

# Destination folder on Google Drive
output_folder = "/content/drive/My Drive/dados_rag_new"
os.makedirs(output_folder, exist_ok=True)

# List of endpoints, each with the championship name
# endpoints = [
#     {"championship": "Brasileirão Betano", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/58766/standings/total"},
#     {"championship": "Premier League", "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/61627/standings/total"},
#     {"championship": "La Liga", "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/61643/standings/total"},
#     {"championship": "Bundesliga", "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/63516/standings/total"},
#     {"championship": "Serie A", "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/63515/standings/total"},
#     {"championship": "Ligue 1", "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/61736/standings/total"},
#     {"championship": "Liga Portugal", "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/63670/standings/total"},
#     {"championship": "Liga Profesional de Fútbol", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/70268/standings/total"},
#     {"championship": "Eredivisie", "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/61666/standings/total"}
# ]

endpoints = [
    {"championship": "Brasileirão Betano 48982", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/48982/standings/total", "season": 48982},
    {"championship": "Brasileirão Betano 40557", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/40557/standings/total", "season": 40557},
    {"championship": "Brasileirão Betano 36166", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/36166/standings/total", "season": 36166},
    {"championship": "Brasileirão Betano 27591", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/27591/standings/total", "season": 27591},
    {"championship": "Brasileirão Betano 22931", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/22931/standings/total", "season": 22931},
    {"championship": "Brasileirão Betano 16183", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/16183/standings/total", "season": 16183},
    {"championship": "Brasileirão Betano 13100", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/13100/standings/total", "season": 13100},
    {"championship": "Brasileirão Betano 11429", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/11429/standings/total", "season": 11429},
    {"championship": "Brasileirão Betano 10173", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/10173/standings/total", "season": 10173},
    {"championship": "Brasileirão Betano 7778", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/7778/standings/total", "season": 7778},
    {"championship": "Brasileirão Betano 6075", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/6075/standings/total", "season": 6075},
    {"championship": "Brasileirão Betano 4438", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/4438/standings/total", "season": 4438},
    {"championship": "Brasileirão Betano 3311", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/3311/standings/total", "season": 3311},
    {"championship": "Brasileirão Betano 2684", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/2684/standings/total", "season": 2684},
    {"championship": "Brasileirão Betano 2079", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/2079/standings/total", "season": 2079},
    {"championship": "Brasileirão Betano 1223", "url": "https://www.sofascore.com/api/v1/unique-tournament/325/season/1223/standings/total", "season": 1223},

    {"championship": "Premier League 52186",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/52186/standings/total", "season": 52186},
    {"championship": "Premier League 41886",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/41886/standings/total", "season": 41886},
    {"championship": "Premier League 37036",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/37036/standings/total", "season": 37036},
    {"championship": "Premier League 29415",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/29415/standings/total", "season": 29415},
    {"championship": "Premier League 23776",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/23776/standings/total", "season": 23776},
    {"championship": "Premier League 17359",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/17359/standings/total", "season": 17359},
    {"championship": "Premier League 13380",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/13380/standings/total", "season": 13380},
    {"championship": "Premier League 11733",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/11733/standings/total", "season": 11733},
    {"championship": "Premier League 10356",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/10356/standings/total", "season": 10356},
    {"championship": "Premier League 8186",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/8186/standings/total", "season": 8186},
    {"championship": "Premier League 6311",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/6311/standings/total", "season": 6311},
    {"championship": "Premier League 4710",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/4710/standings/total", "season": 4710},
    {"championship": "Premier League 3391",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/3391/standings/total", "season": 3391},
    {"championship": "Premier League 2746",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/2746/standings/total", "season": 2746},
    {"championship": "Premier League 2139",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/2139/standings/total", "season": 2139},
    {"championship": "Premier League 1544",   "url": "https://www.sofascore.com/api/v1/unique-tournament/17/season/1544/standings/total", "season": 1544},

    {"championship": "La Liga 52376",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/52376/standings/total", "season": 52376},
    {"championship": "La Liga 42409",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/42409/standings/total", "season": 42409},
    {"championship": "La Liga 37223",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/37223/standings/total", "season": 37223},
    {"championship": "La Liga 32501",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/32501/standings/total", "season": 32501},
    {"championship": "La Liga 24127",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/24127/standings/total", "season": 24127},
    {"championship": "La Liga 18020",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/18020/standings/total", "season": 18020},
    {"championship": "La Liga 13662",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/13662/standings/total", "season": 13662},
    {"championship": "La Liga 11906",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/11906/standings/total", "season": 11906},
    {"championship": "La Liga 10495",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/10495/standings/total", "season": 10495},
    {"championship": "La Liga 8578",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/8578/standings/total", "season": 8578},
    {"championship": "La Liga 6559",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/6559/standings/total", "season": 6559},
    {"championship": "La Liga 4959",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/4959/standings/total", "season": 4959},
    {"championship": "La Liga 3502",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/3502/standings/total", "season": 3502},
    {"championship": "La Liga 2896",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/2896/standings/total", "season": 2896},
    {"championship": "La Liga 2252",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/2252/standings/total", "season": 2252},
    {"championship": "La Liga 1587",          "url": "https://www.sofascore.com/api/v1/unique-tournament/8/season/1587/standings/total", "season": 1587},

    {"championship": "Bundesliga 52608",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/52608/standings/total", "season": 52608},
    {"championship": "Bundesliga 42268",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/42268/standings/total", "season": 42268},
    {"championship": "Bundesliga 37166",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/37166/standings/total", "season": 37166},
    {"championship": "Bundesliga 28210",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/28210/standings/total", "season": 28210},
    {"championship": "Bundesliga 23538",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/23538/standings/total", "season": 23538},
    {"championship": "Bundesliga 17597",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/17597/standings/total", "season": 17597},
    {"championship": "Bundesliga 13477",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/13477/standings/total", "season": 13477},
    {"championship": "Bundesliga 11818",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/11818/standings/total", "season": 11818},
    {"championship": "Bundesliga 10419",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/10419/standings/total", "season": 10419},
    {"championship": "Bundesliga 8238",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/8238/standings/total", "season": 8238},
    {"championship": "Bundesliga 6303",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/6303/standings/total", "season": 6303},
    {"championship": "Bundesliga 4792",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/4792/standings/total", "season": 4792},
    {"championship": "Bundesliga 3405",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/3405/standings/total", "season": 3405},
    {"championship": "Bundesliga 2811",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/2811/standings/total", "season": 2811},
    {"championship": "Bundesliga 2188",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/2188/standings/total", "season": 2188},
    {"championship": "Bundesliga 1557",       "url": "https://www.sofascore.com/api/v1/unique-tournament/35/season/1557/standings/total", "season": 1557},

    {"championship": "Serie A 52760",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/52760/standings/total", "season": 52760},
    {"championship": "Serie A 42415",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/42415/standings/total", "season": 42415},
    {"championship": "Serie A 37475",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/37475/standings/total", "season": 37475},
    {"championship": "Serie A 32523",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/32523/standings/total", "season": 32523},
    {"championship": "Serie A 24644",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/24644/standings/total", "season": 24644},
    {"championship": "Serie A 17932",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/17932/standings/total", "season": 17932},
    {"championship": "Serie A 13768",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/13768/standings/total", "season": 13768},
    {"championship": "Serie A 11966",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/11966/standings/total", "season": 11966},
    {"championship": "Serie A 10596",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/10596/standings/total", "season": 10596},
    {"championship": "Serie A 8618",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/8618/standings/total", "season": 8618},
    {"championship": "Serie A 6797",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/6797/standings/total", "season": 6797},
    {"championship": "Serie A 5145",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/5145/standings/total", "season": 5145},
    {"championship": "Serie A 3639",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/3639/standings/total", "season": 3639},
    {"championship": "Serie A 2930",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/2930/standings/total", "season": 2930},
    {"championship": "Serie A 2324",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/2324/standings/total", "season": 2324},
    {"championship": "Serie A 1552",          "url": "https://www.sofascore.com/api/v1/unique-tournament/23/season/1552/standings/total", "season": 1552},

    {"championship": "Ligue 1 52571",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/52571/standings/total", "season": 52571},
    {"championship": "Ligue 1 42273",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/42273/standings/total", "season": 42273},
    {"championship": "Ligue 1 37167",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/37167/standings/total", "season": 37167},
    {"championship": "Ligue 1 28222",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/28222/standings/total", "season": 28222},
    {"championship": "Ligue 1 23872",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/23872/standings/total", "season": 23872},
    {"championship": "Ligue 1 17279",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/17279/standings/total", "season": 17279},
    {"championship": "Ligue 1 13384",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/13384/standings/total", "season": 13384},
    {"championship": "Ligue 1 11648",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/11648/standings/total", "season": 11648},
    {"championship": "Ligue 1 10373",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/10373/standings/total", "season": 10373},
    {"championship": "Ligue 1 8122",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/8122/standings/total", "season": 8122},
    {"championship": "Ligue 1 6271",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/6271/standings/total", "season": 6271},
    {"championship": "Ligue 1 4616",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/4616/standings/total", "season": 4616},
    {"championship": "Ligue 1 3380",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/3380/standings/total", "season": 3380},
    {"championship": "Ligue 1 2719",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/2719/standings/total", "season": 2719},
    {"championship": "Ligue 1 2120",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/2120/standings/total", "season": 2120},
    {"championship": "Ligue 1 1542",          "url": "https://www.sofascore.com/api/v1/unique-tournament/34/season/1542/standings/total", "season": 1542},

    {"championship": "Liga Portugal 52769",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/52769/standings/total", "season": 52769},
    {"championship": "Liga Portugal 42655",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/42655/standings/total", "season": 42655},
    {"championship": "Liga Portugal 37358",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/37358/standings/total", "season": 37358},
    {"championship": "Liga Portugal 32456",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/32456/standings/total", "season": 32456},
    {"championship": "Liga Portugal 24150",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/24150/standings/total", "season": 24150},
    {"championship": "Liga Portugal 17714",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/17714/standings/total", "season": 17714},
    {"championship": "Liga Portugal 13539",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/13539/standings/total", "season": 13539},
    {"championship": "Liga Portugal 11924",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/11924/standings/total", "season": 11924},
    {"championship": "Liga Portugal 10453",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/10453/standings/total", "season": 10453},
    {"championship": "Liga Portugal 8382",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/8382/standings/total", "season": 8382},
    {"championship": "Liga Portugal 6483",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/6483/standings/total", "season": 6483},
    {"championship": "Liga Portugal 4907",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/4907/standings/total", "season": 4907},
    {"championship": "Liga Portugal 3462",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/3462/standings/total", "season": 3462},
    {"championship": "Liga Portugal 2832",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/2832/standings/total", "season": 2832},
    {"championship": "Liga Portugal 2256",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/2256/standings/total", "season": 2256},
    {"championship": "Liga Portugal 1781",    "url": "https://www.sofascore.com/api/v1/unique-tournament/238/season/1781/standings/total", "season": 1781},

    {"championship": "Liga Profesional de Fútbol 70268", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/70268/standings/total", "season": 70268},
    {"championship": "Liga Profesional de Fútbol 57478", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/57478/standings/total", "season": 57478},
    {"championship": "Liga Profesional de Fútbol 47647", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/47647/standings/total", "season": 47647},
    {"championship": "Liga Profesional de Fútbol 41884", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/41884/standings/total", "season": 41884},
    {"championship": "Liga Profesional de Fútbol 37231", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/37231/standings/total", "season": 37231},
    {"championship": "Liga Profesional de Fútbol 24239", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/24239/standings/total", "season": 24239},
    {"championship": "Liga Profesional de Fútbol 18113", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/18113/standings/total", "season": 18113},
    {"championship": "Liga Profesional de Fútbol 13950", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/13950/standings/total", "season": 13950},
    {"championship": "Liga Profesional de Fútbol 12117", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/12117/standings/total", "season": 12117},
    {"championship": "Liga Profesional de Fútbol 11237", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/11237/standings/total", "season": 11237},
    {"championship": "Liga Profesional de Fútbol 9651", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/9651/standings/total", "season": 9651},
    {"championship": "Liga Profesional de Fútbol 8338", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/8338/standings/total", "season": 8338},
    {"championship": "Liga Profesional de Fútbol 6455", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/6455/standings/total", "season": 6455},
    {"championship": "Liga Profesional de Fútbol 5103", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/5103/standings/total", "season": 5103},
    {"championship": "Liga Profesional de Fútbol 3613", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/3613/standings/total", "season": 3613},
    {"championship": "Liga Profesional de Fútbol 2887", "url": "https://www.sofascore.com/api/v1/unique-tournament/155/season/2887/standings/total", "season": 2887},

    {"championship": "Eredivisie 52554",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/52554/standings/total", "season": 52554},
    {"championship": "Eredivisie 42256",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/42256/standings/total", "season": 42256},
    {"championship": "Eredivisie 36890",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/36890/standings/total", "season": 36890},
    {"championship": "Eredivisie 29186",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/29186/standings/total", "season": 29186},
    {"championship": "Eredivisie 23873",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/23873/standings/total", "season": 23873},
    {"championship": "Eredivisie 17353",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/17353/standings/total", "season": 17353},
    {"championship": "Eredivisie 13399",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/13399/standings/total", "season": 13399},
    {"championship": "Eredivisie 11777",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/11777/standings/total", "season": 11777},
    {"championship": "Eredivisie 10370",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/10370/standings/total", "season": 10370},
    {"championship": "Eredivisie 8170",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/8170/standings/total", "season": 8170},
    {"championship": "Eredivisie 6267",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/6267/standings/total", "season": 6267},
    {"championship": "Eredivisie 4746",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/4746/standings/total", "season": 4746},
    {"championship": "Eredivisie 3432",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/3432/standings/total", "season": 3432},
    {"championship": "Eredivisie 2745",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/2745/standings/total", "season": 2745},
    {"championship": "Eredivisie 2144",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/2144/standings/total", "season": 2144},
    {"championship": "Eredivisie 1711",       "url": "https://www.sofascore.com/api/v1/unique-tournament/37/season/1711/standings/total", "season": 1711},
]

def extract_table_rows(data):
    """Flatten the first standings table of the API response into a list of row dicts."""
    rows = data["standings"][0]["rows"]
    table = []
    for row in rows:
        table.append({
            "Posição": row.get("position", 0),
            "Team": row.get("team", {}).get("name", ""),
            "Jogos": row.get("matches", 0),
            "Vitórias": row.get("wins", 0),
            "Empates": row.get("draws", 0),
            "Derrotas": row.get("losses", 0),
            "Gols_Marcados": row.get("scoresFor", 0),
            "Gols_Sofridos": row.get("scoresAgainst", 0),
            # Strip only the leading '+' so negative goal differences keep their sign.
            "Saldo_de_Gols": int(row.get("scoreDiffFormatted", "0").replace('+', '')),
            "Pontos": row.get("points", 0)
        })
    return table


def serialize_table(table_rows):
    headers = ["Posição", "Team", "Pontos", "Jogos", "Vitórias", "Empates", "Derrotas", "Gols_Marcados", "Gols_Sofridos", "Saldo_de_Gols"]
    lines = ["| " + " | ".join(headers) + " |"]
    for row in table_rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def make_logic(form_type, **kwargs):
    # Build the simplified logical-form string for the given question type
    if form_type == 'cell':
        return f"(filter tree {kwargs['team']}) -> (filter tree {kwargs['metric']}) -> (filter level TOP 2)"
    if form_type == 'superlative':
        return f"(filter tree {kwargs['metric']}) -> (filter level TOP 2) -> (argmax 1)"
    if form_type in ['difference', 'distance']:
        return f"(filter tree {kwargs['team1']} {kwargs.get('team2', '')}) -> (filter tree {kwargs['metric']}) -> (difference)"
    if form_type == 'aggregation':
        return f"(filter tree {kwargs.get('metric','Pontos')}) -> (filter level LEFT_{kwargs.get('k',4)}) -> (sum)"
    if form_type == 'average':
        return f"(filter tree {kwargs['metric']}) -> (filter level TOP 2) -> (average)"
    return ""


def generate_qa_examples(championship, table_metadata, table_rows, serialized_table, season):
    instruction = "Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado."
    examples = []
    ref = f"na tabela de classificação do campeonato {championship}, na temporada {season},"

    # 1. Cell selection
    for row in table_rows:
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Qual é o saldo de gols do time {row['Team']} {ref}? Responda somente o valor.",
            'response': str(row['Saldo_de_Gols']),
            'logical_form': make_logic('cell', team=row['Team'], metric='Saldo_de_Gols'),
            'task_type': 'Cell Selection',
            'championship': championship,
            'season': season
        })

    # 2. Superlative across key metrics
    metrics = ['Vitórias', 'Empates', 'Derrotas', 'Gols_Marcados', 'Gols_Sofridos', 'Pontos']
    for metric in metrics:
        top = max(table_rows, key=lambda x: x[metric])['Team']
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Qual time tem o maior número de {metric.lower()} {ref}? Responda somente o nome do time.",
            'response': top,
            'logical_form': make_logic('superlative', metric=metric),
            'task_type': 'Superlative',
            'championship': championship,
            'season': season
        })

    # 3. Differences for multiple pairs/metrics
    pairs = [(i, j) for i in range(len(table_rows)) for j in range(i+1, len(table_rows))][:10]
    for i, j in pairs:
        for metric in ['Pontos', 'Saldo_de_Gols']:
            diff = abs(table_rows[i][metric] - table_rows[j][metric])
            examples.append({
                'instruction': instruction,
                'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
                'question': f"Qual é a diferença de {metric.lower().replace('_',' ')} entre {table_rows[i]['Team']} e {table_rows[j]['Team']} {ref}? Responda somente o valor.",
                'response': str(diff),
                'logical_form': make_logic('difference', team1=table_rows[i]['Team'], team2=table_rows[j]['Team'], metric=metric),
                'task_type': 'Difference',
                'championship': championship,
                'season': season
            })

    # 4. Aggregation for various K
    for k in [2, 4, 6, 8]:
        total = sum(r['Pontos'] for r in table_rows[:k])
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Qual é o total de pontos dos {k} primeiros times {ref}? Responda somente o valor.",
            'response': str(total),
            'logical_form': make_logic('aggregation', metric='Pontos', k=k),
            'task_type': 'Aggregation',
            'championship': championship,
            'season': season
        })

    # 5. Averages for metrics
    for metric in ['Gols_Marcados', 'Gols_Sofridos', 'Pontos']:
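        # Treat teams with zero games played as average 0 to avoid a ZeroDivisionError
        avg_team = max(table_rows, key=lambda x: x[metric] / x['Jogos'] if x['Jogos'] else 0)['Team']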
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Qual time teve a maior média de {metric.lower().replace('_',' ')} por jogo {ref}? Responda somente o nome do time.",
            'response': avg_team,
            'logical_form': make_logic('average', metric=metric),
            'task_type': 'Average',
            'championship': championship,
            'season': season
        })

    # 6. Distance to critical thresholds
    # Thresholds use fixed positions: 4th place for qualification, 4th from bottom for relegation
    points_first = table_rows[0]['Pontos']
    points_last = table_rows[-1]['Pontos']
    points_releg = table_rows[-4]['Pontos']  # relegation cut-off position
    points_qualify = table_rows[3]['Pontos']  # qualification cut-off position
    for row in table_rows:
        pts = row['Pontos']
        team = row['Team']
        # Distance to the first-placed team
        diff_first = points_first - pts
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Quantos pontos o time {team} está atrás do primeiro colocado {ref}? Responda somente o valor.",
            'response': str(diff_first),
            'logical_form': make_logic('distance', team1=team, team2=table_rows[0]['Team'], metric='Pontos'),
            'task_type': 'Distance to First',
            'championship': championship,
            'season': season
        })
        # Distance to the last-placed team
        diff_last = pts - points_last
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Quantos pontos o time {team} está acima do último colocado {ref}? Responda somente o valor.",
            'response': str(diff_last),
            'logical_form': make_logic('distance', team1=team, team2=table_rows[-1]['Team'], metric='Pontos'),
            'task_type': 'Distance to Last',
            'championship': championship,
            'season': season
        })
        # Distance to the relegation zone (negative when the team is already below the cut-off)
        diff_releg = pts - points_releg
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Quantos pontos faltam para o time {team} entrar na zona de rebaixamento {ref}? Responda somente o valor.",
            'response': str(diff_releg),
            'logical_form': make_logic('distance', team1=team, team2=table_rows[-4]['Team'], metric='Pontos'),
            'task_type': 'Distance to Relegation',
            'championship': championship,
            'season': season
        })
        # Distance to the qualification zone (negative when the team is already above the cut-off)
        diff_qual = points_qualify - pts
        examples.append({
            'instruction': instruction,
            'input': f"[TLE] {table_metadata}\n[TAB] {serialized_table}",
            'question': f"Quantos pontos faltam para o time {team} chegar na zona de classificação {ref}? Responda somente o valor.",
            'response': str(diff_qual),
            'logical_form': make_logic('distance', team1=team, team2=table_rows[3]['Team'], metric='Pontos'),
            'task_type': 'Distance to Qualification',
            'championship': championship,
            'season': season
        })

    return examples

async def fetch_and_generate(endpoints):
    nest_asyncio.apply()
    todas_examples = []
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        for ep in endpoints:
            champ = ep['championship']
            url = ep['url']
            season = ep['season']
            page_url = url.replace('/api/v1', '')

            try:
                # Try to load the page so the fetch below can be issued from inside it
                await page.goto(page_url, wait_until='networkidle', timeout=30000)
                data = await page.evaluate(
                    """async url => {
                        const res = await fetch(url, {headers: {'Accept': 'application/json'}});
                        return res.json();
                    }""",
                    url
                )

            except PlaywrightTimeoutError as e:
                # Log to the console and skip this endpoint
                print(f"[Timeout] {champ} (season {season}) em {page_url}: {e}")
                continue

            meta = data['standings'][0]['tournament']
            metadata = f"Esta é a tabela de classificação do campeonato {meta.get('name')} do país {meta.get('category', {}).get('country', {}).get('name')} na temporada {season}"

            rows = extract_table_rows(data)
            serialized = serialize_table(rows)
            examples = generate_qa_examples(champ, metadata, rows, serialized, season)

            todas_examples.extend(examples)

        await browser.close()

    # Convert to a DataFrame and save it as CSV
    df = pd.DataFrame(todas_examples)
    csv_path = os.path.join(output_folder, 'hierarchical_qa_dataset_all_seasons.csv')
    df.to_csv(csv_path, index=False, encoding='utf-8')
    print(f"Dataset salvo em: {csv_path}")
    return df

# Run the pipeline (df.head() is called in the next cell, where its output is actually displayed)
if __name__ == '__main__':
    nest_asyncio.apply()
    df = asyncio.run(fetch_and_generate(endpoints))

Mounted at /content/drive
[Timeout] Premier League 17359 (season 17359) em https://www.sofascore.com/unique-tournament/17/season/17359/standings/total: Page.goto: Timeout 30000ms exceeded.
Call log:
  - navigating to "https://www.sofascore.com/unique-tournament/17/season/17359/standings/total", waiting until "networkidle"

Dataset salvo em: /content/drive/My Drive/dados_rag_new/hierarchical_qa_dataset_all_seasons.csv
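
The run above skipped one Premier League season after a 30-second navigation timeout. One possible way to recover such endpoints is to retry a few times before giving up. The sketch below is a generic helper and not part of the original notebook: `retry_async`, `attempts`, and `base_delay` are hypothetical names, and it retries on `asyncio.TimeoutError` by default so that it runs without Playwright installed.

```python
import asyncio


async def retry_async(make_coro, attempts=3, base_delay=1.0,
                      exceptions=(asyncio.TimeoutError,)):
    """Await make_coro() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await make_coro()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: let the caller decide whether to skip
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


# Demo with a fake operation that times out twice and then succeeds.
calls = {"n": 0}


async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise asyncio.TimeoutError("simulated timeout")
    return "ok"


result = asyncio.run(retry_async(flaky, base_delay=0.01))
print(result, calls["n"])
```

Inside `fetch_and_generate`, the navigation could then be wrapped as `await retry_async(lambda: page.goto(page_url, wait_until='networkidle', timeout=30000), exceptions=(PlaywrightTimeoutError,))`, keeping the existing `except` branch as the final fallback.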


In [None]:
print(df.shape)
df.head()

(18759, 8)


Unnamed: 0,instruction,input,question,response,logical_form,task_type,championship,season
0,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Brasileirão Betano do país Brazil na temporada 48982\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | Palmeiras | 70 | 38 | 20 | 10 | 8 | 64 | 33 | 31 |\n| 2 | Grêmio | 68 | 38 | 21 | 5 | 12 | 63 | 56 | 7 |\n| 3 | Atlético Mineiro | 66 | 38 | 19 | 9 | 10 | 52 | 32 | 20 |\n| 4 | Flamengo | 66 | 38 | 19 | 9 | 10 | 56 | 42 | 14 |\n| 5 | Botafogo | 64 | 38 | 18 | 10 | 10 | 58 | 37 | 21 |\n| 6 | Red Bull Bragantino | 62 | 38 | 17 | 11 | 10 | 49 | 35 | 14 |\n| 7 | Fluminense | 56 | 38 | 16 | 8 | 14 | 51 | 47 | 4 |\n| 8 | Athletico | 56 | 38 | 14 | 14 | 10 | 51 | 43 | 8 |\n| 9 | Internacional | 55 | 38 | 15 | 10 | 13 | 46 | 45 | 1 |\n| 10 | Fortaleza | 54 | 38 | 15 | 9 | 14 | 45 | 44 | 1 |\n| 11 | São Paulo | 53 | 38 | 14 | 11 | 13 | 40 | 38 | 2 |\n| 12 | Cuiabá | 51 | 38 | 14 | 9 | 15 | 40 | 39 | 1 |\n| 13 | Corinthians | 50 | 38 | 12 | 14 | 12 | 47 | 48 | 1 |\n| 14 | Cruzeiro | 47 | 38 | 11 | 14 | 13 | 35 | 32 | 3 |\n| 15 | Vasco da Gama | 45 | 38 | 12 | 9 | 17 | 41 | 51 | 10 |\n| 16 | Bahia | 44 | 38 | 12 | 8 | 18 | 50 | 53 | 3 |\n| 17 | Santos | 43 | 38 | 11 | 10 | 17 | 39 | 64 | 25 |\n| 18 | Goiás | 38 | 38 | 9 | 11 | 18 | 36 | 53 | 17 |\n| 19 | Coritiba | 30 | 38 | 8 | 6 | 24 | 41 | 73 | 32 |\n| 20 | América Mineiro | 24 | 38 | 5 | 9 | 24 | 42 | 81 | 39 |,"Qual é o saldo de gols do time Palmeiras na tabela de classificação do campeonato Brasileirão Betano 48982, na temporada 48982,? Responda somente o valor.",31,(filter tree Palmeiras) -> (filter tree Saldo_de_Gols) -> (filter level TOP 2),Cell Selection,Brasileirão Betano 48982,48982
1,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Brasileirão Betano do país Brazil na temporada 48982\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | Palmeiras | 70 | 38 | 20 | 10 | 8 | 64 | 33 | 31 |\n| 2 | Grêmio | 68 | 38 | 21 | 5 | 12 | 63 | 56 | 7 |\n| 3 | Atlético Mineiro | 66 | 38 | 19 | 9 | 10 | 52 | 32 | 20 |\n| 4 | Flamengo | 66 | 38 | 19 | 9 | 10 | 56 | 42 | 14 |\n| 5 | Botafogo | 64 | 38 | 18 | 10 | 10 | 58 | 37 | 21 |\n| 6 | Red Bull Bragantino | 62 | 38 | 17 | 11 | 10 | 49 | 35 | 14 |\n| 7 | Fluminense | 56 | 38 | 16 | 8 | 14 | 51 | 47 | 4 |\n| 8 | Athletico | 56 | 38 | 14 | 14 | 10 | 51 | 43 | 8 |\n| 9 | Internacional | 55 | 38 | 15 | 10 | 13 | 46 | 45 | 1 |\n| 10 | Fortaleza | 54 | 38 | 15 | 9 | 14 | 45 | 44 | 1 |\n| 11 | São Paulo | 53 | 38 | 14 | 11 | 13 | 40 | 38 | 2 |\n| 12 | Cuiabá | 51 | 38 | 14 | 9 | 15 | 40 | 39 | 1 |\n| 13 | Corinthians | 50 | 38 | 12 | 14 | 12 | 47 | 48 | 1 |\n| 14 | Cruzeiro | 47 | 38 | 11 | 14 | 13 | 35 | 32 | 3 |\n| 15 | Vasco da Gama | 45 | 38 | 12 | 9 | 17 | 41 | 51 | 10 |\n| 16 | Bahia | 44 | 38 | 12 | 8 | 18 | 50 | 53 | 3 |\n| 17 | Santos | 43 | 38 | 11 | 10 | 17 | 39 | 64 | 25 |\n| 18 | Goiás | 38 | 38 | 9 | 11 | 18 | 36 | 53 | 17 |\n| 19 | Coritiba | 30 | 38 | 8 | 6 | 24 | 41 | 73 | 32 |\n| 20 | América Mineiro | 24 | 38 | 5 | 9 | 24 | 42 | 81 | 39 |,"Qual é o saldo de gols do time Grêmio na tabela de classificação do campeonato Brasileirão Betano 48982, na temporada 48982,? Responda somente o valor.",7,(filter tree Grêmio) -> (filter tree Saldo_de_Gols) -> (filter level TOP 2),Cell Selection,Brasileirão Betano 48982,48982
2,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Brasileirão Betano do país Brazil na temporada 48982\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | Palmeiras | 70 | 38 | 20 | 10 | 8 | 64 | 33 | 31 |\n| 2 | Grêmio | 68 | 38 | 21 | 5 | 12 | 63 | 56 | 7 |\n| 3 | Atlético Mineiro | 66 | 38 | 19 | 9 | 10 | 52 | 32 | 20 |\n| 4 | Flamengo | 66 | 38 | 19 | 9 | 10 | 56 | 42 | 14 |\n| 5 | Botafogo | 64 | 38 | 18 | 10 | 10 | 58 | 37 | 21 |\n| 6 | Red Bull Bragantino | 62 | 38 | 17 | 11 | 10 | 49 | 35 | 14 |\n| 7 | Fluminense | 56 | 38 | 16 | 8 | 14 | 51 | 47 | 4 |\n| 8 | Athletico | 56 | 38 | 14 | 14 | 10 | 51 | 43 | 8 |\n| 9 | Internacional | 55 | 38 | 15 | 10 | 13 | 46 | 45 | 1 |\n| 10 | Fortaleza | 54 | 38 | 15 | 9 | 14 | 45 | 44 | 1 |\n| 11 | São Paulo | 53 | 38 | 14 | 11 | 13 | 40 | 38 | 2 |\n| 12 | Cuiabá | 51 | 38 | 14 | 9 | 15 | 40 | 39 | 1 |\n| 13 | Corinthians | 50 | 38 | 12 | 14 | 12 | 47 | 48 | 1 |\n| 14 | Cruzeiro | 47 | 38 | 11 | 14 | 13 | 35 | 32 | 3 |\n| 15 | Vasco da Gama | 45 | 38 | 12 | 9 | 17 | 41 | 51 | 10 |\n| 16 | Bahia | 44 | 38 | 12 | 8 | 18 | 50 | 53 | 3 |\n| 17 | Santos | 43 | 38 | 11 | 10 | 17 | 39 | 64 | 25 |\n| 18 | Goiás | 38 | 38 | 9 | 11 | 18 | 36 | 53 | 17 |\n| 19 | Coritiba | 30 | 38 | 8 | 6 | 24 | 41 | 73 | 32 |\n| 20 | América Mineiro | 24 | 38 | 5 | 9 | 24 | 42 | 81 | 39 |,"Qual é o saldo de gols do time Atlético Mineiro na tabela de classificação do campeonato Brasileirão Betano 48982, na temporada 48982,? Responda somente o valor.",20,(filter tree Atlético Mineiro) -> (filter tree Saldo_de_Gols) -> (filter level TOP 2),Cell Selection,Brasileirão Betano 48982,48982
3,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Brasileirão Betano do país Brazil na temporada 48982\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | Palmeiras | 70 | 38 | 20 | 10 | 8 | 64 | 33 | 31 |\n| 2 | Grêmio | 68 | 38 | 21 | 5 | 12 | 63 | 56 | 7 |\n| 3 | Atlético Mineiro | 66 | 38 | 19 | 9 | 10 | 52 | 32 | 20 |\n| 4 | Flamengo | 66 | 38 | 19 | 9 | 10 | 56 | 42 | 14 |\n| 5 | Botafogo | 64 | 38 | 18 | 10 | 10 | 58 | 37 | 21 |\n| 6 | Red Bull Bragantino | 62 | 38 | 17 | 11 | 10 | 49 | 35 | 14 |\n| 7 | Fluminense | 56 | 38 | 16 | 8 | 14 | 51 | 47 | 4 |\n| 8 | Athletico | 56 | 38 | 14 | 14 | 10 | 51 | 43 | 8 |\n| 9 | Internacional | 55 | 38 | 15 | 10 | 13 | 46 | 45 | 1 |\n| 10 | Fortaleza | 54 | 38 | 15 | 9 | 14 | 45 | 44 | 1 |\n| 11 | São Paulo | 53 | 38 | 14 | 11 | 13 | 40 | 38 | 2 |\n| 12 | Cuiabá | 51 | 38 | 14 | 9 | 15 | 40 | 39 | 1 |\n| 13 | Corinthians | 50 | 38 | 12 | 14 | 12 | 47 | 48 | 1 |\n| 14 | Cruzeiro | 47 | 38 | 11 | 14 | 13 | 35 | 32 | 3 |\n| 15 | Vasco da Gama | 45 | 38 | 12 | 9 | 17 | 41 | 51 | 10 |\n| 16 | Bahia | 44 | 38 | 12 | 8 | 18 | 50 | 53 | 3 |\n| 17 | Santos | 43 | 38 | 11 | 10 | 17 | 39 | 64 | 25 |\n| 18 | Goiás | 38 | 38 | 9 | 11 | 18 | 36 | 53 | 17 |\n| 19 | Coritiba | 30 | 38 | 8 | 6 | 24 | 41 | 73 | 32 |\n| 20 | América Mineiro | 24 | 38 | 5 | 9 | 24 | 42 | 81 | 39 |,"Qual é o saldo de gols do time Flamengo na tabela de classificação do campeonato Brasileirão Betano 48982, na temporada 48982,? Responda somente o valor.",14,(filter tree Flamengo) -> (filter tree Saldo_de_Gols) -> (filter level TOP 2),Cell Selection,Brasileirão Betano 48982,48982
4,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Brasileirão Betano do país Brazil na temporada 48982\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | Palmeiras | 70 | 38 | 20 | 10 | 8 | 64 | 33 | 31 |\n| 2 | Grêmio | 68 | 38 | 21 | 5 | 12 | 63 | 56 | 7 |\n| 3 | Atlético Mineiro | 66 | 38 | 19 | 9 | 10 | 52 | 32 | 20 |\n| 4 | Flamengo | 66 | 38 | 19 | 9 | 10 | 56 | 42 | 14 |\n| 5 | Botafogo | 64 | 38 | 18 | 10 | 10 | 58 | 37 | 21 |\n| 6 | Red Bull Bragantino | 62 | 38 | 17 | 11 | 10 | 49 | 35 | 14 |\n| 7 | Fluminense | 56 | 38 | 16 | 8 | 14 | 51 | 47 | 4 |\n| 8 | Athletico | 56 | 38 | 14 | 14 | 10 | 51 | 43 | 8 |\n| 9 | Internacional | 55 | 38 | 15 | 10 | 13 | 46 | 45 | 1 |\n| 10 | Fortaleza | 54 | 38 | 15 | 9 | 14 | 45 | 44 | 1 |\n| 11 | São Paulo | 53 | 38 | 14 | 11 | 13 | 40 | 38 | 2 |\n| 12 | Cuiabá | 51 | 38 | 14 | 9 | 15 | 40 | 39 | 1 |\n| 13 | Corinthians | 50 | 38 | 12 | 14 | 12 | 47 | 48 | 1 |\n| 14 | Cruzeiro | 47 | 38 | 11 | 14 | 13 | 35 | 32 | 3 |\n| 15 | Vasco da Gama | 45 | 38 | 12 | 9 | 17 | 41 | 51 | 10 |\n| 16 | Bahia | 44 | 38 | 12 | 8 | 18 | 50 | 53 | 3 |\n| 17 | Santos | 43 | 38 | 11 | 10 | 17 | 39 | 64 | 25 |\n| 18 | Goiás | 38 | 38 | 9 | 11 | 18 | 36 | 53 | 17 |\n| 19 | Coritiba | 30 | 38 | 8 | 6 | 24 | 41 | 73 | 32 |\n| 20 | América Mineiro | 24 | 38 | 5 | 9 | 24 | 42 | 81 | 39 |,"Qual é o saldo de gols do time Botafogo na tabela de classificação do campeonato Brasileirão Betano 48982, na temporada 48982,? Responda somente o valor.",21,(filter tree Botafogo) -> (filter tree Saldo_de_Gols) -> (filter level TOP 2),Cell Selection,Brasileirão Betano 48982,48982
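
Each `input` value shown above packs the title after `[TLE]` and a pipe-delimited table after `[TAB]`. For spot checks it can help to parse that text back into rows. The following is a minimal sketch, assuming the layout seen in this output (one header line, one line per team, integer statistics); `parse_input` is a hypothetical helper, not something defined in the notebook. The sample uses three rows copied from the Brasileirão table above.

```python
def parse_input(text):
    """Split a '[TLE] ... [TAB] ...' string into (title, list of row dicts)."""
    title_part, table_part = text.split("\n[TAB] ", 1)
    title = title_part.replace("[TLE] ", "", 1).strip()
    lines = [ln for ln in table_part.split("\n") if ln.strip()]
    header = [c.strip() for c in lines[0].strip().strip("|").split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.strip().strip("|").split("|")]
        row = dict(zip(header, cells))
        # Every column except the team name holds an integer.
        rows.append({k: (v if k == "Team" else int(v)) for k, v in row.items()})
    return title, rows


sample = (
    "[TLE] Esta é a tabela de classificação do campeonato Brasileirão Betano "
    "do país Brazil na temporada 48982\n"
    "[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas "
    "| Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n"
    "| 1 | Palmeiras | 70 | 38 | 20 | 10 | 8 | 64 | 33 | 31 |\n"
    "| 15 | Vasco da Gama | 45 | 38 | 12 | 9 | 17 | 41 | 51 | 10 |\n"
    "| 20 | América Mineiro | 24 | 38 | 5 | 9 | 24 | 42 | 81 | 39 |"
)

title, rows = parse_input(sample)

# Flag rows whose listed goal difference differs from goals scored minus goals conceded.
mismatched = [
    r["Team"] for r in rows
    if r["Saldo_de_Gols"] != r["Gols_Marcados"] - r["Gols_Sofridos"]
]
print(mismatched)
```

On this sample the two teams that conceded more goals than they scored are flagged, which suggests the goal difference loses its sign somewhere upstream; that is worth confirming against `extract_table_rows`.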


In [None]:
df.tail()

Unnamed: 0,instruction,input,question,response,logical_form,task_type,championship,season
18754,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Eredivisie do país Netherlands na temporada 1711\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | AZ Alkmaar | 80 | 34 | 25 | 5 | 4 | 66 | 22 | 44 |\n| 2 | FC Twente | 69 | 34 | 20 | 9 | 5 | 62 | 31 | 31 |\n| 3 | AFC Ajax | 68 | 34 | 21 | 5 | 8 | 74 | 41 | 33 |\n| 4 | PSV Eindhoven | 65 | 34 | 19 | 8 | 7 | 71 | 33 | 38 |\n| 5 | SC Heerenveen | 60 | 34 | 17 | 9 | 8 | 66 | 57 | 9 |\n| 6 | FC Groningen | 56 | 34 | 17 | 5 | 12 | 53 | 36 | 17 |\n| 7 | Feyenoord | 45 | 34 | 12 | 9 | 13 | 54 | 46 | 8 |\n| 8 | NAC Breda | 45 | 34 | 13 | 6 | 15 | 44 | 54 | 10 |\n| 9 | FC Utrecht | 44 | 34 | 11 | 11 | 12 | 41 | 44 | 3 |\n| 10 | Vitesse | 43 | 34 | 11 | 10 | 13 | 41 | 48 | 7 |\n| 11 | NEC Nijmegen | 42 | 34 | 9 | 15 | 10 | 41 | 40 | 1 |\n| 12 | Willem II Tilburg | 37 | 34 | 10 | 7 | 17 | 35 | 58 | 23 |\n| 13 | Sparta Rotterdam | 35 | 34 | 9 | 8 | 17 | 46 | 66 | 20 |\n| 14 | ADO Den Haag | 32 | 34 | 8 | 8 | 18 | 41 | 58 | 17 |\n| 15 | Heracles Almelo | 32 | 34 | 7 | 11 | 16 | 35 | 53 | 18 |\n| 16 | Roda JC Kerkrade | 30 | 34 | 7 | 9 | 18 | 38 | 58 | 20 |\n| 17 | De Graafschap | 30 | 34 | 7 | 9 | 18 | 24 | 58 | 34 |\n| 18 | FC Volendam | 29 | 34 | 7 | 8 | 19 | 38 | 67 | 29 |,"Quantos pontos faltam para o time De Graafschap chegar na zona de classificação na tabela de classificação do campeonato Eredivisie 1711, na temporada 1711,? Responda somente o valor.",35,(filter tree De Graafschap PSV Eindhoven) -> (filter tree Pontos) -> (difference),Distance to Qualification,Eredivisie 1711,1711
18755,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Eredivisie do país Netherlands na temporada 1711\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | AZ Alkmaar | 80 | 34 | 25 | 5 | 4 | 66 | 22 | 44 |\n| 2 | FC Twente | 69 | 34 | 20 | 9 | 5 | 62 | 31 | 31 |\n| 3 | AFC Ajax | 68 | 34 | 21 | 5 | 8 | 74 | 41 | 33 |\n| 4 | PSV Eindhoven | 65 | 34 | 19 | 8 | 7 | 71 | 33 | 38 |\n| 5 | SC Heerenveen | 60 | 34 | 17 | 9 | 8 | 66 | 57 | 9 |\n| 6 | FC Groningen | 56 | 34 | 17 | 5 | 12 | 53 | 36 | 17 |\n| 7 | Feyenoord | 45 | 34 | 12 | 9 | 13 | 54 | 46 | 8 |\n| 8 | NAC Breda | 45 | 34 | 13 | 6 | 15 | 44 | 54 | 10 |\n| 9 | FC Utrecht | 44 | 34 | 11 | 11 | 12 | 41 | 44 | 3 |\n| 10 | Vitesse | 43 | 34 | 11 | 10 | 13 | 41 | 48 | 7 |\n| 11 | NEC Nijmegen | 42 | 34 | 9 | 15 | 10 | 41 | 40 | 1 |\n| 12 | Willem II Tilburg | 37 | 34 | 10 | 7 | 17 | 35 | 58 | 23 |\n| 13 | Sparta Rotterdam | 35 | 34 | 9 | 8 | 17 | 46 | 66 | 20 |\n| 14 | ADO Den Haag | 32 | 34 | 8 | 8 | 18 | 41 | 58 | 17 |\n| 15 | Heracles Almelo | 32 | 34 | 7 | 11 | 16 | 35 | 53 | 18 |\n| 16 | Roda JC Kerkrade | 30 | 34 | 7 | 9 | 18 | 38 | 58 | 20 |\n| 17 | De Graafschap | 30 | 34 | 7 | 9 | 18 | 24 | 58 | 34 |\n| 18 | FC Volendam | 29 | 34 | 7 | 8 | 19 | 38 | 67 | 29 |,"Quantos pontos o time FC Volendam está atrás do primeiro colocado na tabela de classificação do campeonato Eredivisie 1711, na temporada 1711,? Responda somente o valor.",51,(filter tree FC Volendam AZ Alkmaar) -> (filter tree Pontos) -> (difference),Distance to First,Eredivisie 1711,1711
18756,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Eredivisie do país Netherlands na temporada 1711\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | AZ Alkmaar | 80 | 34 | 25 | 5 | 4 | 66 | 22 | 44 |\n| 2 | FC Twente | 69 | 34 | 20 | 9 | 5 | 62 | 31 | 31 |\n| 3 | AFC Ajax | 68 | 34 | 21 | 5 | 8 | 74 | 41 | 33 |\n| 4 | PSV Eindhoven | 65 | 34 | 19 | 8 | 7 | 71 | 33 | 38 |\n| 5 | SC Heerenveen | 60 | 34 | 17 | 9 | 8 | 66 | 57 | 9 |\n| 6 | FC Groningen | 56 | 34 | 17 | 5 | 12 | 53 | 36 | 17 |\n| 7 | Feyenoord | 45 | 34 | 12 | 9 | 13 | 54 | 46 | 8 |\n| 8 | NAC Breda | 45 | 34 | 13 | 6 | 15 | 44 | 54 | 10 |\n| 9 | FC Utrecht | 44 | 34 | 11 | 11 | 12 | 41 | 44 | 3 |\n| 10 | Vitesse | 43 | 34 | 11 | 10 | 13 | 41 | 48 | 7 |\n| 11 | NEC Nijmegen | 42 | 34 | 9 | 15 | 10 | 41 | 40 | 1 |\n| 12 | Willem II Tilburg | 37 | 34 | 10 | 7 | 17 | 35 | 58 | 23 |\n| 13 | Sparta Rotterdam | 35 | 34 | 9 | 8 | 17 | 46 | 66 | 20 |\n| 14 | ADO Den Haag | 32 | 34 | 8 | 8 | 18 | 41 | 58 | 17 |\n| 15 | Heracles Almelo | 32 | 34 | 7 | 11 | 16 | 35 | 53 | 18 |\n| 16 | Roda JC Kerkrade | 30 | 34 | 7 | 9 | 18 | 38 | 58 | 20 |\n| 17 | De Graafschap | 30 | 34 | 7 | 9 | 18 | 24 | 58 | 34 |\n| 18 | FC Volendam | 29 | 34 | 7 | 8 | 19 | 38 | 67 | 29 |,"Quantos pontos o time FC Volendam está acima do último colocado na tabela de classificação do campeonato Eredivisie 1711, na temporada 1711,? Responda somente o valor.",0,(filter tree FC Volendam FC Volendam) -> (filter tree Pontos) -> (difference),Distance to Last,Eredivisie 1711,1711
18757,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Eredivisie do país Netherlands na temporada 1711\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | AZ Alkmaar | 80 | 34 | 25 | 5 | 4 | 66 | 22 | 44 |\n| 2 | FC Twente | 69 | 34 | 20 | 9 | 5 | 62 | 31 | 31 |\n| 3 | AFC Ajax | 68 | 34 | 21 | 5 | 8 | 74 | 41 | 33 |\n| 4 | PSV Eindhoven | 65 | 34 | 19 | 8 | 7 | 71 | 33 | 38 |\n| 5 | SC Heerenveen | 60 | 34 | 17 | 9 | 8 | 66 | 57 | 9 |\n| 6 | FC Groningen | 56 | 34 | 17 | 5 | 12 | 53 | 36 | 17 |\n| 7 | Feyenoord | 45 | 34 | 12 | 9 | 13 | 54 | 46 | 8 |\n| 8 | NAC Breda | 45 | 34 | 13 | 6 | 15 | 44 | 54 | 10 |\n| 9 | FC Utrecht | 44 | 34 | 11 | 11 | 12 | 41 | 44 | 3 |\n| 10 | Vitesse | 43 | 34 | 11 | 10 | 13 | 41 | 48 | 7 |\n| 11 | NEC Nijmegen | 42 | 34 | 9 | 15 | 10 | 41 | 40 | 1 |\n| 12 | Willem II Tilburg | 37 | 34 | 10 | 7 | 17 | 35 | 58 | 23 |\n| 13 | Sparta Rotterdam | 35 | 34 | 9 | 8 | 17 | 46 | 66 | 20 |\n| 14 | ADO Den Haag | 32 | 34 | 8 | 8 | 18 | 41 | 58 | 17 |\n| 15 | Heracles Almelo | 32 | 34 | 7 | 11 | 16 | 35 | 53 | 18 |\n| 16 | Roda JC Kerkrade | 30 | 34 | 7 | 9 | 18 | 38 | 58 | 20 |\n| 17 | De Graafschap | 30 | 34 | 7 | 9 | 18 | 24 | 58 | 34 |\n| 18 | FC Volendam | 29 | 34 | 7 | 8 | 19 | 38 | 67 | 29 |,"Quantos pontos faltam para o time FC Volendam entrar na zona de rebaixamento na tabela de classificação do campeonato Eredivisie 1711, na temporada 1711,? Responda somente o valor.",-3,(filter tree FC Volendam Heracles Almelo) -> (filter tree Pontos) -> (difference),Distance to Relegation,Eredivisie 1711,1711
18758,"Esta é uma tarefa de respostas a perguntas sobre uma tabela de classificação de um campeonato de futebol (Hierarchical Table QA). Com base nos dados apresentados, realize a leitura da tabela relativa ao campeonato citado e responda à pergunta, dizendo somente o valor perguntado.",[TLE] Esta é a tabela de classificação do campeonato Eredivisie do país Netherlands na temporada 1711\n[TAB] | Posição | Team | Pontos | Jogos | Vitórias | Empates | Derrotas | Gols_Marcados | Gols_Sofridos | Saldo_de_Gols |\n| 1 | AZ Alkmaar | 80 | 34 | 25 | 5 | 4 | 66 | 22 | 44 |\n| 2 | FC Twente | 69 | 34 | 20 | 9 | 5 | 62 | 31 | 31 |\n| 3 | AFC Ajax | 68 | 34 | 21 | 5 | 8 | 74 | 41 | 33 |\n| 4 | PSV Eindhoven | 65 | 34 | 19 | 8 | 7 | 71 | 33 | 38 |\n| 5 | SC Heerenveen | 60 | 34 | 17 | 9 | 8 | 66 | 57 | 9 |\n| 6 | FC Groningen | 56 | 34 | 17 | 5 | 12 | 53 | 36 | 17 |\n| 7 | Feyenoord | 45 | 34 | 12 | 9 | 13 | 54 | 46 | 8 |\n| 8 | NAC Breda | 45 | 34 | 13 | 6 | 15 | 44 | 54 | 10 |\n| 9 | FC Utrecht | 44 | 34 | 11 | 11 | 12 | 41 | 44 | 3 |\n| 10 | Vitesse | 43 | 34 | 11 | 10 | 13 | 41 | 48 | 7 |\n| 11 | NEC Nijmegen | 42 | 34 | 9 | 15 | 10 | 41 | 40 | 1 |\n| 12 | Willem II Tilburg | 37 | 34 | 10 | 7 | 17 | 35 | 58 | 23 |\n| 13 | Sparta Rotterdam | 35 | 34 | 9 | 8 | 17 | 46 | 66 | 20 |\n| 14 | ADO Den Haag | 32 | 34 | 8 | 8 | 18 | 41 | 58 | 17 |\n| 15 | Heracles Almelo | 32 | 34 | 7 | 11 | 16 | 35 | 53 | 18 |\n| 16 | Roda JC Kerkrade | 30 | 34 | 7 | 9 | 18 | 38 | 58 | 20 |\n| 17 | De Graafschap | 30 | 34 | 7 | 9 | 18 | 24 | 58 | 34 |\n| 18 | FC Volendam | 29 | 34 | 7 | 8 | 19 | 38 | 67 | 29 |,"Quantos pontos faltam para o time FC Volendam chegar na zona de classificação na tabela de classificação do campeonato Eredivisie 1711, na temporada 1711,? Responda somente o valor.",36,(filter tree FC Volendam PSV Eindhoven) -> (filter tree Pontos) -> (difference),Distance to Qualification,Eredivisie 1711,1711
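
The four distance rows for FC Volendam shown above can be reproduced by hand from the Eredivisie points column, which is a quick way to sanity-check the threshold logic in `generate_qa_examples`. The sketch below copies the points from that table and applies the same index choices as the notebook (first place, last place, fourth from bottom, fourth place); the variable names are only illustrative.

```python
# Points column of the Eredivisie 1711 table, top to bottom (copied from the output above).
points = [80, 69, 68, 65, 60, 56, 45, 45, 44, 43, 42, 37, 35, 32, 32, 30, 30, 29]

points_first = points[0]     # leader
points_last = points[-1]     # bottom team
points_releg = points[-4]    # fourth from bottom, the notebook's relegation cut-off
points_qualify = points[3]   # fourth place, the notebook's qualification cut-off

pts = points[-1]  # FC Volendam, the last row
distances = {
    "Distance to First": points_first - pts,
    "Distance to Last": pts - points_last,
    "Distance to Relegation": pts - points_releg,
    "Distance to Qualification": points_qualify - pts,
}
print(distances)
```

The values match the `response` column above (51, 0, -3, 36). The negative relegation distance means the team is already below the cut-off, so consumers of the dataset may want to read negative values as "already inside the zone".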


In [None]:
df.shape

(18759, 8)

In [None]:
# Show the number of examples per task type
print("\nContagem por task_type:")
print(df["task_type"].value_counts())


Contagem por task_type:
task_type
Difference                   2860
Cell Selection               2808
Distance to Relegation       2808
Distance to Last             2808
Distance to First            2808
Distance to Qualification    2808
Superlative                   858
Aggregation                   572
Average                       429
Name: count, dtype: int64
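
The counts above are uneven: `Difference` has 2860 examples while `Average` has 429. If a balanced dataset is the goal, one option is to downsample every task type to the size of the smallest one. The sketch below works on a list of example dicts like the one built in `fetch_and_generate`; `balance_by_task` and its `seed` parameter are hypothetical and not part of the original notebook.

```python
import random
from collections import defaultdict


def balance_by_task(examples, key="task_type", seed=42):
    """Downsample each group to the size of the smallest group."""
    if not examples:
        return []
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)
    target = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for name in sorted(groups):
        balanced.extend(rng.sample(groups[name], target))
    return balanced


# Toy input: 5 'Difference' examples and 2 'Average' examples.
toy = [{"task_type": "Difference", "id": i} for i in range(5)]
toy += [{"task_type": "Average", "id": i} for i in range(2)]

balanced = balance_by_task(toy)
counts = {t: sum(ex["task_type"] == t for ex in balanced)
          for t in ("Difference", "Average")}
print(counts)
```

Downsampling discards data, so with the real counts every task type would shrink to 429 examples. On a DataFrame, a similar effect should be achievable with `df.groupby('task_type').sample(n=..., random_state=...)`.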


In [None]:
# Show the number of examples per championship
print("\nContagem por championship:")
print(df["championship"].value_counts())


Contagem por championship:
championship
Liga Profesional de Fútbol 9651     183
Liga Profesional de Fútbol 12117    183
Liga Profesional de Fútbol 57478    173
Liga Profesional de Fútbol 47647    173
Liga Profesional de Fútbol 41884    173
                                   ... 
Liga Portugal 3462                  113
Liga Portugal 4907                  113
Liga Portugal 6483                  113
Liga Profesional de Fútbol 11237    108
Liga Profesional de Fútbol 70268    108
Name: count, Length: 143, dtype: int64
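
The per-championship totals follow directly from table size. Reading the loops in `generate_qa_examples`, a table with n teams yields n cell-selection questions, 6 superlatives, up to 10 pairs times 2 metrics of differences, 4 aggregations, 3 averages, and 4n distance questions, which is 5n + 33 once n is at least 5. The sketch below encodes that count (the function name is illustrative, and it assumes at least 4 teams so the threshold indexing works) and checks it against the numbers printed above.

```python
def examples_per_table(n_teams):
    """Number of QA examples generate_qa_examples emits for a table with n_teams rows."""
    n_pairs = min(10, n_teams * (n_teams - 1) // 2)  # only the first 10 team pairs are used
    return (
        n_teams          # cell selection
        + 6              # superlatives
        + 2 * n_pairs    # differences (Pontos and Saldo_de_Gols)
        + 4              # aggregations (k = 2, 4, 6, 8)
        + 3              # averages
        + 4 * n_teams    # four distance questions per team
    )


for n in (15, 16, 28, 30):
    print(n, examples_per_table(n))

# Cross-check with the task_type counts: 2808 team rows across 143 tables.
total = 5 * 2808 + 33 * 143
print(total)
```

Under this reading, the 183-example entries correspond to 30-team tables, the 113-example entries to 16-team tables, and the 108-example entries to 15-team tables; the computed total equals the 18759 rows reported by `df.shape`.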


In [None]:
# Note: df is written unchanged here; no rebalancing step runs between generation and this export.
csv_path = os.path.join(output_folder, 'hierarchical_qa_dataset_balanced_all_seasons.csv')
df.to_csv(csv_path, index=False, encoding='utf-8')
print(f"Dataset balanceado salvo em: {csv_path}")

Dataset balanceado salvo em: /content/drive/My Drive/dados_rag_new/hierarchical_qa_dataset_balanced_all_seasons.csv
