# Web Scraping

This Python project scrapes `http://quotes.toscrape.com` to extract quotes, authors, and tags. It uses requests to fetch the page and BeautifulSoup to navigate the site's HTML structure, then runs a few simple text analyses on the collected data.


#### Importing necessary libraries

In [1]:
import requests
from bs4 import BeautifulSoup

#### Setting the URL for the web scraping


In [2]:
url = "http://quotes.toscrape.com"

#### Making a request to the website

In [3]:
response = requests.get(url)
response

<Response [200]>
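
A `200` status means the request succeeded, but a scraper usually should not take that for granted. As a minimal sketch (the helper name `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the original notebook), the request could be wrapped so that network failures and HTTP error codes surface immediately:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, raising on network or HTTP errors.

    `fetch_html` and the default timeout are hypothetical choices for
    illustration; adjust them to the project's needs.
    """
    # A timeout prevents the call from hanging indefinitely
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` would then either return the page text or raise a `requests` exception that the caller can handle.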

#### Parsing the HTML content of the page

In [4]:
soup = BeautifulSoup(response.content, "html.parser")

#### Extracting quote elements using BeautifulSoup

In [7]:
quote_elements = soup.find_all("span", class_="text")
quote_elements

[<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>,
 <span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>,
 <span class="text" itemprop="text">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>,
 <span class="text" itemprop="text">“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”</span>,
 <span class="text" itemprop="text">“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”</span>,
 <span class="text" itemprop="text">“Try not to become a man of success. Rather become a man of value.”</span>,
 <span class="text" itemprop="text">“It is better to be hated for what you are than to be loved for what you are not.
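
The cell above collects only the quote text. The sketch below shows one way the same approach might be extended to authors and tags. It runs on a small hand-written HTML sample rather than the live page, and the `quote`, `author`, and `tag` class names are assumptions about the site's markup that should be checked against the real HTML before use.

```python
from bs4 import BeautifulSoup

# Hand-written sample that imitates the assumed page structure
sample_html = """
<div class="quote">
  <span class="text">“Sample quote one.”</span>
  <span>by <small class="author">Author One</small></span>
  <div class="tags"><a class="tag">life</a> <a class="tag">humor</a></div>
</div>
<div class="quote">
  <span class="text">“Sample quote two.”</span>
  <span>by <small class="author">Author Two</small></span>
  <div class="tags"><a class="tag">books</a></div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

records = []
for block in sample_soup.find_all("div", class_="quote"):
    records.append({
        # Searching inside each block keeps text, author, and tags aligned
        "text": block.find("span", class_="text").get_text(strip=True),
        "author": block.find("small", class_="author").get_text(strip=True),
        "tags": [a.get_text(strip=True) for a in block.find_all("a", class_="tag")],
    })

for record in records:
    print(record)
```

Iterating over each enclosing block, instead of calling `find_all` separately for quotes, authors, and tags, avoids mismatched lists if one quote lacks an author or tags.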

#### Displaying Original Quotes


In [12]:
print(" --- Original Quotes --- ")
for quote in quote_elements:
    print("\n")
    print(quote.text)

 --- Original Quotes --- 


“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”


“It is our choices, Harry, that show what we truly are, far more than our abilities.”


“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”


“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”


“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”


“Try not to become a man of success. Rather become a man of value.”


“It is better to be hated for what you are than to be loved for what you are not.”


“I have not failed. I've just found 10,000 ways that won't work.”


“A woman is like a tea bag; you never know how strong it is until it's in hot water.”


“A day without sunshine is like, you know, night.”
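
Printing is fine for inspection, but later steps are easier if the quotes are kept as plain strings. A minimal sketch, using two of the quotes above as hard-coded sample data (in the notebook the list would be built from `quote_elements`), that also strips the surrounding curly quotation marks:

```python
# Sample data copied from the output above; in the notebook this could be
# [element.text for element in quote_elements]
raw_quotes = [
    "“A day without sunshine is like, you know, night.”",
    "“I have not failed. I've just found 10,000 ways that won't work.”",
]

# str.strip removes the listed characters from both ends only
clean_quotes = [quote.strip("“”") for quote in raw_quotes]

for quote in clean_quotes:
    print(quote)
```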


#### Converting Quotes to Uppercase

In [14]:
print("-- Uppercase Quotes --")
for quote in quote_elements:
    print("\n")
    print(quote.text.upper())

-- Uppercase Quotes --


“THE WORLD AS WE HAVE CREATED IT IS A PROCESS OF OUR THINKING. IT CANNOT BE CHANGED WITHOUT CHANGING OUR THINKING.”


“IT IS OUR CHOICES, HARRY, THAT SHOW WHAT WE TRULY ARE, FAR MORE THAN OUR ABILITIES.”


“THERE ARE ONLY TWO WAYS TO LIVE YOUR LIFE. ONE IS AS THOUGH NOTHING IS A MIRACLE. THE OTHER IS AS THOUGH EVERYTHING IS A MIRACLE.”


“THE PERSON, BE IT GENTLEMAN OR LADY, WHO HAS NOT PLEASURE IN A GOOD NOVEL, MUST BE INTOLERABLY STUPID.”


“IMPERFECTION IS BEAUTY, MADNESS IS GENIUS AND IT'S BETTER TO BE ABSOLUTELY RIDICULOUS THAN ABSOLUTELY BORING.”


“TRY NOT TO BECOME A MAN OF SUCCESS. RATHER BECOME A MAN OF VALUE.”


“IT IS BETTER TO BE HATED FOR WHAT YOU ARE THAN TO BE LOVED FOR WHAT YOU ARE NOT.”


“I HAVE NOT FAILED. I'VE JUST FOUND 10,000 WAYS THAT WON'T WORK.”


“A WOMAN IS LIKE A TEA BAG; YOU NEVER KNOW HOW STRONG IT IS UNTIL IT'S IN HOT WATER.”


“A DAY WITHOUT SUNSHINE IS LIKE, YOU KNOW, NIGHT.”


#### Counting the Number of Quotes and the Words in Each

In [15]:
# Assumes quote_elements is already defined and holds the quote elements

quote_count = 0
word_counts = []

for quote in quote_elements:
    words = quote.text.split()  # Split the quote into whitespace-separated words
    word_count = len(words)  # Count the words in this quote
    word_counts.append(word_count)  # Store the count for later cells
    quote_count += 1

print(f"Total number of Quotes: {quote_count}")


Total number of Quotes: 10
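
The same counts can be obtained more compactly with `len()` and a list comprehension. This is a sketch on two hard-coded sample quotes; the names `sample_quotes` and `sample_word_counts` are illustrative and not part of the notebook.

```python
# Two quotes copied from the page output, used here as sample data
sample_quotes = [
    "“I have not failed. I've just found 10,000 ways that won't work.”",
    "“A day without sunshine is like, you know, night.”",
]

# len() gives the number of quotes directly
sample_quote_count = len(sample_quotes)

# One word count per quote, splitting on whitespace
sample_word_counts = [len(quote.split()) for quote in sample_quotes]

print(f"Total number of Quotes: {sample_quote_count}")
print(sample_word_counts)
```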


#### Displaying Word Count for Each Quote and Identifying the Longest Quote

In [20]:
for index, count in enumerate(word_counts, 1):
    print(f"Word count in quote number {index}: {count}")

# Identifying the Longest Quote
longest_quote = ""
for quote in quote_elements:
    text = quote.text
    if len(text) > len(longest_quote):
        longest_quote = text
        
print("\n")
print("The longest quote is:")
print(longest_quote)
print("\n")
print(f"Character count: {len(longest_quote)}")

Word count in quote number 1: 21
Word count in quote number 2: 16
Word count in quote number 3: 26
Word count in quote number 4: 19
Word count in quote number 5: 16
Word count in quote number 6: 14
Word count in quote number 7: 19
Word count in quote number 8: 12
Word count in quote number 9: 19
Word count in quote number 10: 9


The longest quote is:
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”


Character count: 131
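
The explicit loop for the longest quote can also be written with the built-in `max()` and a `key` function. A minimal sketch on three sample quotes taken from the output above (the variable names are illustrative):

```python
sample_quotes = [
    "“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”",
    "“Try not to become a man of success. Rather become a man of value.”",
    "“A day without sunshine is like, you know, night.”",
]

# key=len compares the strings by character count;
# on a tie, max() returns the first of the longest items
longest_sample = max(sample_quotes, key=len)

print("The longest quote is:")
print(longest_sample)
print(f"Character count: {len(longest_sample)}")
```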


#### Counting Word Frequency Across All Quotes

In [23]:
word_frequency = {}
for quote in quote_elements:
    # split() keeps case and attached punctuation, so "The" and "“The" are counted separately
    words = quote.text.split()
    for word in words:
        word_frequency[word] = word_frequency.get(word, 0) + 1

# Sort the word-frequency dictionary by frequency in descending order
sorted_word_frequency = dict(sorted(word_frequency.items(), key=lambda x: x[1], reverse=True))

#### Displaying Word Frequency

In [24]:
for word, frequency in sorted_word_frequency.items():
    print(f"{word}: {frequency}")

is: 12
a: 7
be: 6
to: 5
our: 4
you: 4
as: 3
it: 3
of: 3
what: 3
than: 3
are: 3
not: 3
“The: 2
we: 2
have: 2
without: 2
“It: 2
that: 2
ways: 2
though: 2
in: 2
it's: 2
better: 2
absolutely: 2
become: 2
man: 2
for: 2
“A: 2
world: 1
created: 1
process: 1
thinking.: 1
It: 1
cannot: 1
changed: 1
changing: 1
thinking.”: 1
choices,: 1
Harry,: 1
show: 1
truly: 1
are,: 1
far: 1
more: 1
abilities.”: 1
“There: 1
only: 1
two: 1
live: 1
your: 1
life.: 1
One: 1
nothing: 1
miracle.: 1
The: 1
other: 1
everything: 1
miracle.”: 1
person,: 1
gentleman: 1
or: 1
lady,: 1
who: 1
has: 1
pleasure: 1
good: 1
novel,: 1
must: 1
intolerably: 1
stupid.”: 1
“Imperfection: 1
beauty,: 1
madness: 1
genius: 1
and: 1
ridiculous: 1
boring.”: 1
“Try: 1
success.: 1
Rather: 1
value.”: 1
hated: 1
loved: 1
not.”: 1
“I: 1
failed.: 1
I've: 1
just: 1
found: 1
10,000: 1
won't: 1
work.”: 1
woman: 1
like: 1
tea: 1
bag;: 1
never: 1
know: 1
how: 1
strong: 1
until: 1
hot: 1
water.”: 1
day: 1
sunshine: 1
like,: 1
know,: 1
night.”: 1
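
In the output above, tokens such as `“The` and `The`, or `thinking.` and `thinking.”`, are counted separately because case and punctuation are preserved. One possible refinement, sketched here on two sample quotes with `collections.Counter`, is to lowercase each token and strip surrounding punctuation first. The `normalize` helper is a hypothetical name introduced for illustration.

```python
import string
from collections import Counter

sample_quotes = [
    "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
    "“It is better to be hated for what you are than to be loved for what you are not.”",
]

# ASCII punctuation plus the curly quotation marks used on the page
strip_chars = string.punctuation + "“”"


def normalize(word):
    """Lowercase a token and strip punctuation from both ends."""
    return word.strip(strip_chars).lower()


normalized_counts = Counter(
    normalize(word)
    for quote in sample_quotes
    for word in quote.split()
    if normalize(word)  # skip tokens that were punctuation only
)

# most_common() returns (word, count) pairs sorted by descending count
for word, frequency in normalized_counts.most_common(5):
    print(f"{word}: {frequency}")
```

Stripping only the ends of each token keeps internal apostrophes intact, so a word like `it's` would remain a single token.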


#### Identifying Unique Words Across All Quotes

In [32]:
unique_words = set()
for sentence in quote_elements:
    words = sentence.text.split()
    unique_words.update(words)

print("Unique words in quotes:\n", unique_words)

Unique words in quotes:
 {'have', 'it', '“Try', 'not.”', 'your', 'lady,', '“A', '“It', '“There', 'know,', 'genius', 'pleasure', 'strong', 'be', 'that', 'know', 'we', 'of', 'like', 'are', 'changed', 'success.', 'two', 'water.”', 'process', 'and', 'loved', 'in', 'cannot', 'gentleman', 'sunshine', 'thinking.', 'man', 'you', 'is', 'madness', 'world', 'miracle.”', 'good', 'life.', 'better', 'ways', 'more', 'than', 'must', 'who', 'has', 'value.”', 'Rather', 'for', 'ridiculous', 'far', 'are,', 'day', 'or', '“I', 'It', "it's", "won't", 'miracle.', 'hated', 'though', 'absolutely', 'our', 'like,', 'novel,', 'work.”', 'night.”', 'The', 'other', 'found', 'only', 'just', '“The', 'tea', 'until', 'hot', 'live', 'person,', 'thinking.”', 'Harry,', 'intolerably', 'never', 'as', '10,000', 'boring.”', 'become', 'abilities.”', 'bag;', 'not', 'woman', 'beauty,', "I've", 'changing', 'created', 'everything', '“Imperfection', 'show', 'what', 'failed.', 'choices,', 'nothing', 'stupid.”', 'truly', 'to', 'One', '
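
A Python `set` has no defined order, so the printed result above can differ between runs. If a reproducible listing is wanted, the set can be passed through `sorted()`. A small sketch on one sample quote (the names are illustrative):

```python
sample_quote = "“Try not to become a man of success. Rather become a man of value.”"

# A set keeps each distinct token once
sample_unique = set(sample_quote.split())

# sorted() returns a list in a stable, code-point order
sorted_unique = sorted(sample_unique)

print(f"{len(sorted_unique)} unique words:")
print(sorted_unique)
```

Sorting is by Unicode code point, so capitalized words come before lowercase ones and tokens starting with a curly quotation mark come last.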

#### Calculating Average Characters per Word in Quotes

In [27]:
total_words = sum(word_counts)
# Total characters (spaces and punctuation included) divided by total words
average_chars_per_word = sum(len(sentence.text) for sentence in quote_elements) / total_words
print("Average characters per word in quotes:", average_chars_per_word)

Average characters per word in quotes: 5.233918128654971
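
The figure above divides the total character count of the quotes, spaces and punctuation included, by the number of words, so it overstates the length of a typical word. A stricter per-word average might be sketched as follows, using one sample quote and stripping surrounding punctuation from each token (the variable names are illustrative):

```python
import string

sample_quote = "“A day without sunshine is like, you know, night.”"

# ASCII punctuation plus the curly quotation marks used on the page
strip_chars = string.punctuation + "“”"

# Strip punctuation from both ends of each token and drop empty results
sample_words = [word.strip(strip_chars) for word in sample_quote.split()]
sample_words = [word for word in sample_words if word]

# Sum of individual word lengths divided by the number of words
sample_average = sum(len(word) for word in sample_words) / len(sample_words)

print(f"Average word length: {sample_average:.2f}")  # prints "Average word length: 4.11"
```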
