Solve the following two exercises using BeautifulSoup and Pandas. Please add more code cells as desired.

### Exercise 1
Scrape the page ``Inspirational.html`` and provide the following information:

1. List the quotes by "Albert Einstein", along with the tags assigned to them
2. List all the quotes tagged with "love"
3. List all authors of quotes tagged as "inspirational"

**Note:** You do not need to go through pagination. The first page suffices.

In [None]:
# Exercise 1: Scraping Inspirational.html
from bs4 import BeautifulSoup
import pandas as pd

# Load the HTML file
with open('Inspirational.html', 'r', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Get all quotes
quotes = soup.find_all('div', class_='quote')

# 1. List the quotes by "Albert Einstein" and their tags
print("=" * 60)
print("1. Quotes by Albert Einstein:")
print("=" * 60)
for quote in quotes:
    author = quote.find('small', class_='author').text
    if author == 'Albert Einstein':
        text = quote.find('span', class_='text').text
        tags = [tag.text for tag in quote.find_all('a', class_='tag')]
        print(f"\nQuote: {text}")
        print(f"Tags: {tags}")

# 2. List all quotes tagged with "love"
print("\n" + "=" * 60)
print("2. Quotes tagged with 'love':")
print("=" * 60)
for quote in quotes:
    tags = [tag.text for tag in quote.find_all('a', class_='tag')]
    if 'love' in tags:
        text = quote.find('span', class_='text').text
        author = quote.find('small', class_='author').text
        print(f"\nQuote: {text}")
        print(f"Author: {author}")

# 3. List all authors of quotes tagged as "inspirational"
print("\n" + "=" * 60)
print("3. Authors of quotes tagged as 'inspirational':")
print("=" * 60)
inspirational_authors = []
for quote in quotes:
    tags = [tag.text for tag in quote.find_all('a', class_='tag')]
    if 'inspirational' in tags:
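        author = quote.find('small', class_='author').text
        # Skip authors already listed so each name appears only once
        if author not in inspirational_authors:
            inspirational_authors.append(author)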
        
print(inspirational_authors)

1. Quotes by Albert Einstein:

Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Tags: ['change', 'deep-thoughts', 'thinking', 'world']

Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Tags: ['inspirational', 'life', 'live', 'miracle', 'miracles']

Quote: “Try not to become a man of success. Rather become a man of value.”
Tags: ['adulthood', 'success', 'value']

2. Quotes tagged with 'love':

Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide

3. Authors of quotes tagged as 'inspirational':
['Albert Einstein', 'Marilyn Monroe', 'Thomas A. Edison']
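
Since the exercises ask for BeautifulSoup and Pandas, the same three questions can also be answered from a DataFrame. The following is a minimal sketch, not the reference solution: `SAMPLE_HTML`, `quotes_to_frame`, and `has_tag` are hypothetical names, and the inline markup is invented sample data that only mimics the `div.quote`, `span.text`, `small.author`, and `a.tag` elements selected above.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup (not the real page), shaped like the
# elements the solution above selects.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">Quote A</span>
  <small class="author">Author One</small>
  <a class="tag" href="#">love</a>
  <a class="tag" href="#">life</a>
</div>
<div class="quote">
  <span class="text">Quote B</span>
  <small class="author">Author Two</small>
  <a class="tag" href="#">inspirational</a>
</div>
<div class="quote">
  <span class="text">Quote C</span>
  <small class="author">Author One</small>
  <a class="tag" href="#">inspirational</a>
  <a class="tag" href="#">love</a>
</div>
"""


def quotes_to_frame(html):
    """Parse quote blocks into one DataFrame row per quote."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for quote in soup.find_all("div", class_="quote"):
        rows.append({
            "text": quote.find("span", class_="text").get_text(strip=True),
            "author": quote.find("small", class_="author").get_text(strip=True),
            "tags": [a.get_text(strip=True) for a in quote.find_all("a", class_="tag")],
        })
    return pd.DataFrame(rows)


def has_tag(df, tag):
    """Boolean mask: True where the row's tag list contains `tag`."""
    return df["tags"].apply(lambda tags: tag in tags)


df_quotes = quotes_to_frame(SAMPLE_HTML)

# 1. Quotes (with tags) by a given author
by_author = df_quotes[df_quotes["author"] == "Author One"][["text", "tags"]]

# 2. Quotes carrying the tag "love"
love_quotes = df_quotes[has_tag(df_quotes, "love")]["text"].tolist()

# 3. Unique authors of quotes tagged "inspirational"
inspirational_authors = sorted(
    df_quotes[has_tag(df_quotes, "inspirational")]["author"].unique()
)

print(by_author)
print(love_quotes)
print(inspirational_authors)
```

To try this on the real page, pass the contents of ``Inspirational.html`` to `quotes_to_frame` and filter on `'Albert Einstein'` instead of the placeholder author.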


### Exercise 2
Scrape the page ``ScrapeBooks.html`` and provide the code for the following:
1. List the title and price of all books with 1 or 2 stars
2. Create a boxplot of the distribution of prices of the books displayed on the first page
3. Create a scatterplot of "number of stars" vs "price"
4. Compare the prices of those books with 1 or 2 stars and those with 3 or more using Mann-Whitney
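
For item 4, it may help to see what the Mann-Whitney U statistic actually counts before calling a library routine. The sketch below is an illustration only: `mann_whitney_u` is a hypothetical helper written for this note, the prices are made-up values rather than data from the page, and it computes only the U statistics, not a p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) by direct pairwise comparison.

    U_x counts the pairs (a, b), a from x and b from y, in which a > b;
    tied pairs contribute 0.5. U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Made-up prices for illustration (not taken from ScrapeBooks.html)
low_star_sample = [10.0, 20.0, 30.0]
high_star_sample = [15.0, 25.0, 35.0, 45.0]

u_low, u_high = mann_whitney_u(low_star_sample, high_star_sample)
print(u_low, u_high)  # 3.0 9.0
```

A U value near half of `len(x) * len(y)` suggests neither group tends to have higher prices; values near 0 or near the maximum suggest one group dominates. In practice a library implementation that also reports a p-value is preferable to this pairwise loop.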


In [None]:
# Exercise 2: Scraping ScrapeBooks.html
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu

# Load the HTML file
with open('ScrapeBooks.html', 'r', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Get all books
books = soup.find_all('article', class_='product_pod')

# Star rating mapping
star_map = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

# Extract data from all books
book_data = []
for book in books:
    rating_class = book.find('p', class_='star-rating')['class']
    title = book.find('h3').find('a')['title']
    price_text = book.find('p', class_='price_color').text
    price = float(price_text.replace('£', ''))
    
    # Get star rating
    stars = 0
    for star_name, star_num in star_map.items():
        if star_name in rating_class:
            stars = star_num
            break
    
    book_data.append({
        'title': title,
        'price': price,
        'stars': stars
    })

df_books = pd.DataFrame(book_data)
print("All books data:")
print(df_books)

# 1. List the title and price of all books with 1 or 2 stars
print("\n" + "=" * 60)
print("1. Books with 1 or 2 stars:")
print("=" * 60)
# between(1, 2) excludes books whose rating could not be parsed (stars == 0)
low_rated = df_books[df_books['stars'].between(1, 2)][['title', 'price']]
print(low_rated)

# 2. Create a boxplot with the distribution of prices
print("\n" + "=" * 60)
print("2. Boxplot of price distribution:")
print("=" * 60)
plt.figure(figsize=(8, 6))
plt.boxplot(df_books['price'])
plt.ylabel('Price (£)')
plt.title('Book Price Distribution')
plt.show()

# 3. Create a scatterplot of stars vs price
print("\n" + "=" * 60)
print("3. Scatterplot of Stars vs Price:")
print("=" * 60)
plt.figure(figsize=(8, 6))
plt.scatter(df_books['stars'], df_books['price'])
plt.xlabel('Number of Stars')
plt.ylabel('Price (£)')
plt.title('Stars vs Price')
plt.xticks([1, 2, 3, 4, 5])
plt.show()

# 4. Compare prices using Mann-Whitney test
print("\n" + "=" * 60)
print("4. Mann-Whitney U test comparing prices:")
print("=" * 60)
# between(1, 2) excludes books whose rating could not be parsed (stars == 0)
low_star_prices = df_books[df_books['stars'].between(1, 2)]['price']
high_star_prices = df_books[df_books['stars'] >= 3]['price']

print(f"Books with 1-2 stars: {len(low_star_prices)} books")
print(f"Mean price: £{low_star_prices.mean():.2f}")
print(f"\nBooks with 3+ stars: {len(high_star_prices)} books")
print(f"Mean price: £{high_star_prices.mean():.2f}")

stat, pvalue = mannwhitneyu(low_star_prices, high_star_prices, alternative='two-sided')
print(f"\nMann-Whitney U statistic: {stat}")
print(f"P-value: {pvalue}")

if pvalue < 0.05:
    print("\nConclusion: There is a statistically significant difference in prices between low-rated (1-2 stars) and high-rated (3+ stars) books.")
else:
    print("\nConclusion: There is no statistically significant difference in prices between low-rated (1-2 stars) and high-rated (3+ stars) books.")

All books data:
                                                title  price  stars
0                                A Light in the Attic  51.77      3
1                                  Tipping the Velvet  53.74      1
2                                          Soumission  50.10      1
3                                       Sharp Objects  47.82      4
4               Sapiens: A Brief History of Humankind  54.23      5
5                                     The Requiem Red  22.65      1
6   The Dirty Little Secrets of Getting Your Dream...  33.34      4
7   The Coming Woman: A Novel Based on the Life of...  17.93      3
8   The Boys in the Boat: Nine Americans and Their...  22.60      4
9                                     The Black Maria  52.15      1
10     Starving Hearts (Triangular Trade Trilogy, #1)  13.99      2
11                              Shakespeare's Sonnets  20.66      4
12                                        Set Me Free  17.46      5
13  Scott Pilgrim's Precious Lit