Program: Safe Goodreads Scraping for Book Suggestions
Description: This program scrapes Goodreads for book suggestions in a safe way that respects the site's terms of use.
Author: Fernando Pisot
Date: 18/07/25

Note: This program uses responsible scraping practices: setting an appropriate User-Agent, following the rules in robots.txt, and limiting request frequency to avoid overloading the server.
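
The robots.txt rule mentioned in the note can be checked programmatically with the standard library. The sketch below is one possible approach, not part of the original notebook: the helper name `is_allowed`, the agent name `my-scraper`, and the sample robots.txt content are all hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only (not Goodreads' real file)
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/list/show/1"))
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/private/page"))
```

Against a live site, the same parser can load the file itself with `parser.set_url(...)` followed by `parser.read()`, after which `can_fetch` is called in the same way.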

In [None]:
%pip install requests beautifulsoup4 pandas






In [10]:
# Initial setup

import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
from IPython.display import display, HTML  # Needed for display(HTML(...)); omitting it raises a NameError


In [None]:

# Adjustable parameters
URL = "https://www.goodreads.com/list/show/1.Best_Books_Ever"
MIN_RATING = 4.3           # Minimum average rating a book must have
MIN_RATINGS_COUNT = 10000  # Minimum number of ratings a book must have
DELAY = 3  # Seconds between requests
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
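
The delay between requests only matters when more than one page is fetched. As a minimal sketch of how such a delay could be applied across several URLs, the hypothetical helper `fetch_politely` below pauses between consecutive requests. The helper name and its `fetch` and `sleep` parameters are assumptions introduced here for illustration; they do not appear in the original notebook.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing `delay` seconds between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        pages.append(fetch(url))
    return pages


# Demonstration with a stand-in fetcher, so no network access is needed
demo = fetch_politely(["page-1", "page-2"], fetch=lambda u: f"<html>{u}</html>", delay=0)
print(demo)
```

In the notebook, `fetch` could be something like `lambda u: requests.get(u, headers=HEADERS, timeout=10).text`, with `delay=DELAY`. Passing the fetcher and sleeper in as arguments keeps the throttling logic easy to test offline.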

## Scraping function

In [11]:
def scrape_goodreads_list(url):
    """Scrape one Goodreads list page and return the books that pass the filters."""
    books = []
    try:
        print(f"🔍 Scrapeando: {url}")
        # The timeout keeps the request from hanging indefinitely on a stalled connection
        response = requests.get(url, headers=HEADERS, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, "html.parser")
        book_rows = soup.select("tr[itemtype='http://schema.org/Book']")

        for row in book_rows:
            title = row.select_one("a.bookTitle span").text.strip()
            author = row.select_one("a.authorName span").text.strip()
            book_url = "https://www.goodreads.com" + row.select_one("a.bookTitle")["href"]  # Link to the book page

            # Rating text, which may start with a label such as "really liked it"
            rating_text = row.select_one("span.minirating").text.strip()

            try:
                # Extract the average rating (e.g. "really liked it 4.00 avg rating" -> 4.0)
                avg_rating = float(rating_text.split("avg rating")[0].strip().split()[-1])
                # Extract the number of ratings (e.g. "— 1,842,083 ratings" -> 1842083)
                ratings_count = int(rating_text.split("—")[-1].replace("ratings", "").replace(",", "").strip())
            except (IndexError, ValueError):
                # Skip rows whose rating text does not have the expected shape
                print(f"⚠️ Error al procesar: {title} - Texto crudo: '{rating_text}'")
                continue

            if avg_rating >= MIN_RATING and ratings_count >= MIN_RATINGS_COUNT:
                books.append({
                    "Título": title,
                    "Autor": author,
                    "Rating": avg_rating,
                    "N° Reseñas": ratings_count,
                    "Enlace": f'<a href="{book_url}" target="_blank">Ver libro</a>'  # HTML link
                })

        time.sleep(DELAY)  # Pause so that a following request is not sent immediately
        print(f"✅ ¡Libros encontrados: {len(books)}!")
        return pd.DataFrame(books)

    except Exception as e:
        # Any network error or missing page element ends up here; return an empty result
        print(f"❌ Error inesperado: {e}")
        return pd.DataFrame()
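
The rating extraction above relies on chained `split` calls. As an alternative sketch, the same two values can be pulled out with a single regular expression, which returns `None` when the text has an unexpected shape instead of raising. The helper name `parse_minirating` is hypothetical, and the pattern assumes the text format shown in the comments of the function above.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "4.00 avg rating — 1,842,083 ratings"
_MINIRATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s+avg rating\s+—\s+(\d[\d,]*)\s+ratings?")


def parse_minirating(text: str) -> Optional[Tuple[float, int]]:
    """Return (average rating, number of ratings), or None if the text does not match."""
    match = _MINIRATING_RE.search(text)
    if match is None:
        return None
    avg_rating = float(match.group(1))
    ratings_count = int(match.group(2).replace(",", ""))  # drop thousands separators
    return avg_rating, ratings_count


print(parse_minirating("really liked it 4.00 avg rating — 1,842,083 ratings"))
# (4.0, 1842083)
```

Isolating the parsing in a small pure function makes it possible to test it against sample strings without sending any request.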

## Running and displaying the results

In [12]:
# Run the scraper and show the results
df_books = scrape_goodreads_list(URL)
if not df_books.empty:
    # Render with clickable links (Jupyter only)
    display(HTML(df_books.to_html(escape=False, render_links=True)))
else:
    print("No se encontraron libros con los filtros aplicados.")


🔍 Scrapeando: https://www.goodreads.com/list/show/1.Best_Books_Ever
✅ ¡Libros encontrados: 28!


| | Título | Autor | Rating | N° Reseñas | Enlace |
|---|---|---|---|---|---|
| 0 | The Hunger Games (The Hunger Games, #1) | Suzanne Collins | 4.35 | 9574357 | Ver libro |
| 1 | Harry Potter and the Order of the Phoenix (Harry Potter, #5) | J.K. Rowling | 4.5 | 3684567 | Ver libro |
| 2 | The Book Thief | Markus Zusak | 4.39 | 2797610 | Ver libro |
| 3 | J.R.R. Tolkien 4-Book Boxed Set: The Hobbit and The Lord of the Rings | J.R.R. Tolkien | 4.61 | 141374 | Ver libro |
| 4 | The Giving Tree | Shel Silverstein | 4.38 | 1208292 | Ver libro |
| 5 | The Lightning Thief (Percy Jackson and the Olympians, #1) | Rick Riordan | 4.31 | 3330268 | Ver libro |
| 6 | Harry Potter and the Order of the Phoenix (Harry Potter, #5) | J.K. Rowling | 4.5 | 3684567 | Ver libro |
| 7 | Gone with the Wind | Margaret Mitchell | 4.31 | 1257259 | Ver libro |
| 8 | The Little Prince | Antoine de Saint-Exupéry | 4.33 | 2369627 | Ver libro |
| 9 | Anne of Green Gables (Anne of Green Gables, #1) | L.M. Montgomery | 4.32 | 1077910 | Ver libro |
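
Because `to_html(escape=False)` turns off HTML escaping for every cell, any markup inside scraped text is rendered as-is. One way to reduce that risk, sketched here under the assumption that only the link column needs raw HTML, is to escape values before building the anchor tag. The helper name `make_link` is hypothetical and not part of the original notebook.

```python
import html


def make_link(url: str, label: str = "Ver libro") -> str:
    """Build an anchor tag with both the URL and the label HTML-escaped."""
    safe_url = html.escape(url, quote=True)
    safe_label = html.escape(label)
    return f'<a href="{safe_url}" target="_blank">{safe_label}</a>'


print(make_link("https://www.goodreads.com/book/show/1?a=1&b=2"))
# <a href="https://www.goodreads.com/book/show/1?a=1&amp;b=2" target="_blank">Ver libro</a>
```

The same `html.escape` call could be applied to titles and author names before they are stored, so that only the link column carries markup.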
