In [None]:
This script scrapes the title, author, and score of each book on the Goodreads list "Best Philosophical Fiction" and saves the information to a CSV file.

# Steps

1. The first thing I did was install the required libraries using pip through Command Prompt:
    pip install bs4
    pip install requests 
    
    The bs4 (Beautiful Soup 4) library makes it easy to scrape information from web pages.
    It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
    
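    The Requests package makes the HTTP requests that web scraping depends on: it sends a request to a specified URL and returns the server's response.
    Goodreads returned a successful response containing the page's HTML when I scraped it, but many websites block automated requests or prohibit scraping in their terms of use.
    The script also uses pandas to build the table and write the CSV file; if it is missing, it can be installed the same way (pip install pandas).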
    
2. Inspecting the website:
    The "Best Philosophical Fiction" list I scraped can be found under this link: https://www.goodreads.com/list/show/1599.Best_Philosophical_Fiction.
    This list had 3 pages and 281 books at the time it was scraped.
    Because it has 3 pages, I thought the best way to scrape it would be to define a function that scrapes a single page and call it once for each of the 3 pages.
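
To make the "searching the parse tree" idea from step 1 concrete, here is a minimal sketch that runs Beautiful Soup on a small hand-written HTML string instead of a live page. The markup and the placeholder titles and authors are assumptions made up for illustration; they only imitate the tags and class names the scraper looks for, and the real Goodreads markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML that imitates the structure the scraper expects.
# The titles and authors are placeholders, not real list entries.
SAMPLE_HTML = """
<table>
  <tr itemtype="http://schema.org/Book">
    <td>
      <a class="bookTitle" href="#">Example Title A</a>
      <a class="authorName" href="#">Example Author A</a>
    </td>
  </tr>
  <tr itemtype="http://schema.org/Book">
    <td>
      <a class="bookTitle" href="#">Example Title B</a>
      <a class="authorName" href="#">Example Author B</a>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching row; find returns the first match inside a row.
rows = soup.find_all("tr", itemtype="http://schema.org/Book")
sample_books = [
    {
        "title": row.find("a", class_="bookTitle").get_text(strip=True),
        "author": row.find("a", class_="authorName").get_text(strip=True),
    }
    for row in rows
]
print(sample_books)
```

Trying selectors on a saved or hand-written snippet like this is a cheap way to check them before sending requests to the site.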

In [1]:
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

In [2]:
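# Function to scrape book details (title, author, score) from one list page
def scrape_books(url):
    # Time out instead of hanging, and raise an error on a failed HTTP status
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = bs(response.content, 'html.parser')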

    books = []
    book_elements = soup.find_all('tr', itemtype='http://schema.org/Book')
    
    for book_element in book_elements:
        title = book_element.find('a', class_='bookTitle').get_text(strip=True)
        author = book_element.find('a', class_='authorName').get_text(strip=True)
        score = book_element.find('span', class_='smallText uitext').get_text(strip=True)

        books.append({
            'title': title,
            'author': author,
            'score': score
        })
    
    return books
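
The 'score' value above is kept as raw text. As a possible follow-up, the sketch below turns that text into two integers. It assumes the text looks like "score: 12,345, and 126 people voted", possibly with the spaces collapsed by get_text(strip=True); that format is an assumption and should be checked against the scraped values. The helper name parse_score is hypothetical and not part of the original script.

```python
import re

# Assumed format of the scraped text: "score: 12,345, and 126 people voted".
# Whitespace is optional everywhere because strip=True may collapse it.
SCORE_PATTERN = re.compile(
    r"score:\s*([\d,]+),\s*and\s*([\d,]+)\s*(?:person|people)\s*voted"
)

def parse_score(text):
    """Return (score, votes) as integers, or None if the text does not match."""
    match = SCORE_PATTERN.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting to int.
    score, votes = (int(group.replace(",", "")) for group in match.groups())
    return score, votes
```

Returning None for unmatched text keeps a surprise in the markup from raising an error in the middle of a scrape.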

In [3]:
# Scrape data from all three pages
all_books = []
for page in range(1, 4):
    page_url = f'https://www.goodreads.com/list/show/1599.Best_Philosophical_Fiction?page={page}'
    page_books = scrape_books(page_url)
    all_books.extend(page_books)
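
The loop above could also be wrapped in a function that pauses between requests, which is gentler on the server. This is only a sketch: scrape_all_pages, its parameters, and the one-second default delay are assumptions rather than part of the original script. The page-scraping function is passed in as an argument so the loop can be tried without network access.

```python
import time

BASE_URL = 'https://www.goodreads.com/list/show/1599.Best_Philosophical_Fiction'

def scrape_all_pages(scrape_page, last_page, delay_seconds=1.0):
    """Call scrape_page(url) for pages 1..last_page, pausing between requests."""
    collected = []
    for page in range(1, last_page + 1):
        collected.extend(scrape_page(f'{BASE_URL}?page={page}'))
        if page < last_page:
            # Wait before the next request; no wait is needed after the last page.
            time.sleep(delay_seconds)
    return collected

# With the function from In [2], usage would look like:
# all_books = scrape_all_pages(scrape_books, last_page=3)
```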

In [4]:
# Convert the list of dictionaries to a DataFrame
books_df = pd.DataFrame(all_books)

In [5]:
# Save the DataFrame to a CSV file
csv_file = 'philosophical_books.csv'
books_df.to_csv(csv_file, index=False)