<a href="https://colab.research.google.com/github/mikaelaraujo/seotools/blob/main/Structured_Data_Extractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Author: Mikael Araújo
# Version: 1.0
# Extracts metadata from the structured data (<script type="application/ld+json"> blocks) of a URL.
# The labels extracted are: headline, articleBody, and keywords.
# The code also sends a browser-like User-Agent header to work around HTTP 403 responses.
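
The header comment mentions working around 403 responses, and the next cell does so with a single fixed User-Agent. One possible extension, sketched here under stated assumptions, retries with another User-Agent when the server still answers 403. The names `USER_AGENTS` and `fetch_html` are hypothetical, the second User-Agent string is only illustrative, and the sketch uses the standard library's `urllib` instead of `requests` so that it runs without extra dependencies.

```python
import urllib.error
import urllib.request

# Hypothetical list of browser-like User-Agent strings, tried in order.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def fetch_html(url, user_agents=USER_AGENTS, opener=urllib.request.urlopen, timeout=10):
    """Return the page body as bytes, moving to the next User-Agent on HTTP 403."""
    last_error = None
    for user_agent in user_agents:
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            if error.code != 403:
                raise  # only a 403 triggers a retry; other errors propagate
            last_error = error
    if last_error is None:
        raise ValueError("user_agents must not be empty")
    raise last_error  # every User-Agent was rejected with 403
```

The `opener` parameter exists only so the function can be exercised without network access; in normal use the default `urllib.request.urlopen` is enough. Some servers return 403 for reasons unrelated to the User-Agent, in which case no header change will help.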

In [None]:
import requests
import json
from bs4 import BeautifulSoup

def extract_data_from_url(url):
  """
  Extracts 'headline', 'articleBody', and 'keywords' from the structured data
  (<script type="application/ld+json"> blocks) of a URL.

  Args:
    url: The URL of the web page.

  Returns:
    A dictionary with the extracted data, or None if the request fails or
    no block contains all three labels.
  """
  try:
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    response = requests.get(url, headers=headers, timeout=10)  # timeout avoids hanging on a stalled server
    response.raise_for_status()  # Raises an exception for 4xx and 5xx status codes

    soup = BeautifulSoup(response.content, 'html.parser')
    script_tags = soup.find_all('script', type="application/ld+json")

    for script_tag in script_tags:
      try:
        json_data = json.loads(script_tag.string or "")  # .string can be None; "" is rejected as invalid JSON
        if isinstance(json_data, list):
          for item in json_data:
            if isinstance(item, dict) and 'headline' in item and 'articleBody' in item and 'keywords' in item:
              return {
                  'headline': item.get('headline'),
                  'articleBody': item.get('articleBody'),
                  'keywords': item.get('keywords')
              }
        elif isinstance(json_data, dict) and 'headline' in json_data and 'articleBody' in json_data and 'keywords' in json_data:
          return {
              'headline': json_data.get('headline'),
              'articleBody': json_data.get('articleBody'),
              'keywords': json_data.get('keywords')
          }
      except json.JSONDecodeError:
        continue  # Skip blocks that are not valid JSON

    return None  # No block contained all three labels

  except requests.exceptions.RequestException as e:
    print(f"Error accessing the URL: {e}")
    return None

# Example usage
url = "https://coingape.com/whats-behind-bitcoin-price-drop-political-chaos-or-us-fed-meeting/"
data = extract_data_from_url(url)

if data:
  print(data)
else:
  print("Data not found.")


{'headline': 'What’s Behind Bitcoin Price Drop? Political Chaos or US Fed Meeting?', 'articleBody': "Bitcoin price dropped below $60,000 to a low of $58,112 today. It has made the crypto community puzzled as the 50 bps Fed rate cuts speculation sparked rallies in stocks, crypto, and gold recently. Moreover, the political chaos surrounding the 2024 US presidential election continues to impact markets, alongside assassination attempts on Donald Trump.\r\nBitcoin Price Fell Amid Assassination Attempt on Donald Trump\r\nRepublican presidential candidate Donald Trump faced another assassination attempt near his Florida golf course on Sunday. Multiple shots were fired in his vicinity and Secret Service is still investigating the incident.\r\n\r\nPolitical events involving presidential candidates Kamala Harris or Donald Trump can influence market sentiment. Bitcoin price jumped after an assassination attempt on Trump in Pennsylvania on July 13 as it boosted his chances against President Joe B
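
The function above only inspects the top-level JSON object or the items of a top-level list. Some pages nest their entities deeper, for example under an `@graph` key, where that check would find nothing. A minimal sketch of a recursive search that handles such nesting follows; `find_article_fields`, `WANTED_FIELDS`, and the sample payload are hypothetical and are not taken from the page fetched above.

```python
import json

# The three labels the notebook extracts.
WANTED_FIELDS = ("headline", "articleBody", "keywords")


def find_article_fields(node, wanted=WANTED_FIELDS):
    """Depth-first search for the first dict that contains every wanted key."""
    if isinstance(node, dict):
        if all(key in node for key in wanted):
            return {key: node[key] for key in wanted}
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # strings, numbers, and None have nothing to search
    for child in children:
        found = find_article_fields(child, wanted)
        if found is not None:
            return found
    return None


# Hypothetical JSON-LD payload with the article nested under "@graph".
sample = json.loads("""
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "WebSite", "name": "Example"},
   {"@type": "NewsArticle", "headline": "Example headline",
    "articleBody": "Example body.", "keywords": "a, b"}
 ]}
""")
print(find_article_fields(sample))
```

Inside `extract_data_from_url`, a helper like this could be called on each parsed `json_data` value in place of the separate list and dict branches.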

In [None]:
# prompt: Create a code snippet that displays the extracted labels separately

if data:
  print("Headline:", data['headline'])
  print("\n")
  print("Keywords:", data['keywords'])
  print("\n")
  print("Article Body:\n", data['articleBody'])

else:
  print("Data not found.")

Headline: What’s Behind Bitcoin Price Drop? Political Chaos or US Fed Meeting?


Keywords: Bitcoin (BTC) Price, donald trump, US Fed meeting, US Federal Reserve


Article Body:
 Bitcoin price dropped below $60,000 to a low of $58,112 today. It has made the crypto community puzzled as the 50 bps Fed rate cuts speculation sparked rallies in stocks, crypto, and gold recently. Moreover, the political chaos surrounding the 2024 US presidential election continues to impact markets, alongside assassination attempts on Donald Trump.
Bitcoin Price Fell Amid Assassination Attempt on Donald Trump
Republican presidential candidate Donald Trump faced another assassination attempt near his Florida golf course on Sunday. Multiple shots were fired in his vicinity and Secret Service is still investigating the incident.

Political events involving presidential candidates Kamala Harris or Donald Trump can influence market sentiment. Bitcoin price jumped after an assassination attempt on Trump in Penn