# Extracurricular Activity [12]
### *Numerical Methods*
## Web Scraping

* **Name:** Michael Enríquez
* **Date:** Tuesday, January 7, 2025
---

## What Is Web Scraping?

Web scraping is an automated technique for extracting information from websites. It relies on programs or scripts that navigate web pages, download their content, and process it to obtain structured data such as text, images, or links. In most cases the content analyzed is the page's HTML code; once it has been parsed, the data of interest is exported in a format such as JSON or CSV.
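The export step described above can be sketched with the Python standard library alone. This is a minimal illustration, not part of the original notebook: the sample records and the helper names `to_json` and `to_csv` are assumptions, shaped like the data a scraper might return.

```python
import csv
import io
import json

# Hypothetical sample records, shaped like the result of a scraping run.
records = [
    {"text": "Try not to become a man of success. Rather become a man of value.",
     "author": "Albert Einstein"},
    {"text": "It is better to be hated for what you are than to be loved for what you are not.",
     "author": "André Gide"},
]


def to_json(rows):
    """Serialize the records as a JSON array, keeping non-ASCII characters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["text", "author"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
print(json_text)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; in practice the same writers could target a file opened with `encoding="utf-8"` and `newline=""`.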

---

## Testing Two Different Python Libraries

*Test site:* https://quotes.toscrape.com/

1. BeautifulSoup
A library that makes it easier to parse HTML and XML documents and extract data from them.

In [2]:
from bs4 import BeautifulSoup
import requests

# URL of the website
url = "https://quotes.toscrape.com/"
response = requests.get(url)
response.raise_for_status()  # Stop early if the request failed

# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

# Extract quotes and authors
quotes = soup.find_all("div", class_="quote")

for quote in quotes:
    text = quote.find("span", class_="text").get_text()
    author = quote.find("small", class_="author").get_text()
    print(f"Quote: {text}\nAuthor: {author}\n")

Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein

Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling

Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein

Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen

Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe

Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein

Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide

Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison

Quote
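Printing is convenient for a quick check, but collecting the results as records makes them easier to export later. The following is a minimal sketch under stated assumptions: `SAMPLE_HTML` is a hypothetical fragment that mimics the markup of the test site so the code runs offline, and `extract_quotes` is an illustrative helper name. It uses BeautifulSoup's CSS selector methods `select` and `select_one` instead of `find_all` and `find`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment mimicking the markup of quotes.toscrape.com.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">“Try not to become a man of success. Rather become a man of value.”</span>
  <span>by <small class="author">Albert Einstein</small></span>
  <div class="tags">
    <a class="tag" href="/tag/adulthood/">adulthood</a>
    <a class="tag" href="/tag/success/">success</a>
  </div>
</div>
"""


def extract_quotes(html):
    """Return one dictionary per quote block found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for quote in soup.select("div.quote"):
        records.append({
            "text": quote.select_one("span.text").get_text(strip=True),
            "author": quote.select_one("small.author").get_text(strip=True),
            "tags": [tag.get_text(strip=True) for tag in quote.select("div.tags a.tag")],
        })
    return records


quotes = extract_quotes(SAMPLE_HTML)
print(quotes)
```

The same function could be given `response.content` from a live request; the inline fragment is used here only so the example does not depend on the network.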

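2. Scrapy

A more powerful and extensible scraping framework, suited to crawling many pages. Scrapy does not execute JavaScript by itself, so pages that are rendered dynamically in the browser need an additional rendering tool.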

In [1]:
from scrapy.crawler import CrawlerProcess
from scrapy import Spider

# Define the spider
class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ['https://quotes.toscrape.com/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            text = quote.css('span.text::text').get()
            author = quote.css('small.author::text').get()
            tags = quote.css('div.tags a.tag::text').getall()

            # Print the results to the console
            print(f"Quote: {text}")
            print(f"Author: {author}")
            print(f"Tags: {tags}")
            print("-" * 50)

# Configure and run the crawl
process = CrawlerProcess()
process.crawl(QuotesSpider)
process.start()

2025-01-07 18:59:08 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: scrapybot)
2025-01-07 18:59:08 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.11.0, Python 3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)], pyOpenSSL 24.3.0 (OpenSSL 3.4.0 22 Oct 2024), cryptography 44.0.0, Platform Windows-11-10.0.26100-SP0
2025-01-07 18:59:08 [scrapy.addons] INFO: Enabled addons:
[]
2025-01-07 18:59:08 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2025-01-07 18:59:08 [scrapy.extensions.telnet] INFO: Telnet Password: 292b159f8ee7469e
2025-01-07 18:59:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2025-01-07 18:59:08 [scrapy.crawler] INFO: Overridden settings:
{}
2025-01-07 18:59:09 [scrapy.middleware] INFO: Enabled downloader mi

Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: ['change', 'deep-thoughts', 'thinking', 'world']
--------------------------------------------------
Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: ['abilities', 'choices']
--------------------------------------------------
Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: ['inspirational', 'life', 'live', 'miracle', 'miracles']
--------------------------------------------------
Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: ['aliteracy', 'books', 'classic', 'humor']
--------------------------------------------------
Quote: “Imperfection is beauty,

## Tests with a Page of My Choice

*Website:* http://books.toscrape.com/

In [4]:
import requests
from bs4 import BeautifulSoup

# URL of the Books to Scrape page
url = 'http://books.toscrape.com/'

# Send a GET request to the page
response = requests.get(url)
response.raise_for_status()  # Make sure the request succeeded

# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find the book titles
productos = soup.find_all('h3')

# Extract and display the book titles
for producto in productos:
    nombre = producto.find('a')['title']
    print(f"Book title: {nombre}")

2025-01-07 19:10:29 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): books.toscrape.com:80
2025-01-07 19:10:30 [urllib3.connectionpool] DEBUG: http://books.toscrape.com:80 "GET / HTTP/1.1" 200 51294

Book title: A Light in the Attic
Book title: Tipping the Velvet
Book title: Soumission
Book title: Sharp Objects
Book title: Sapiens: A Brief History of Humankind
Book title: The Requiem Red
Book title: The Dirty Little Secrets of Getting Your Dream Job
Book title: The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
Book title: The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
Book title: The Black Maria
Book title: Starving Hearts (Triangular Trade Trilogy, #1)
Book title: Shakespeare's Sonnets
Book title: Set Me Free
Book title: Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Book title: Rip it Up and Start Again
Book title: Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Book title: Olio
Book title: Mesaerion: The Best Science Fiction Stories 1800-1849
Bo
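The cell above reads the `title` attribute of each link rather than its visible text. A likely reason, stated here as an assumption about the site's markup, is that the visible link text is abbreviated for long names while the attribute carries the full title. The sketch below illustrates the difference on a hypothetical inline fragment; `book_titles` is an illustrative helper that falls back to the link text when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first link has a title attribute and abbreviated text,
# the second link has no title attribute at all.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/example-1/index.html"
         title="Sapiens: A Brief History of Humankind">Sapiens: A Brief History ...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="catalogue/example-2/index.html">Olio</a></h3>
</article>
"""


def book_titles(html):
    """Return the full title of every book heading in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for heading in soup.find_all("h3"):
        link = heading.find("a")
        if link is None:
            continue  # Skip headings that do not wrap a link
        # Prefer the title attribute; fall back to the visible text if it is absent.
        titles.append(link.get("title") or link.get_text(strip=True))
    return titles


titles = book_titles(SAMPLE_HTML)
print(titles)
```

Using `link.get("title")` instead of `link['title']` avoids a `KeyError` when a link lacks the attribute, which makes the loop more tolerant of markup changes.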