**This notebook shows how to do web scraping with BeautifulSoup, using the website [toscrape.com](https://quotes.toscrape.com/) as the example**

In [1]:
# Install required libraries
# !pip install beautifulsoup4
# !pip install requests

In [2]:
import requests
from bs4 import BeautifulSoup

In [3]:
# Download the page, then parse its HTML with Python's built-in parser
website = requests.get('https://quotes.toscrape.com/')
soup = BeautifulSoup(website.text, 'html.parser')

**First, extract the title of the website**

In [5]:
title = soup.title
print(title)

<title>Quotes to Scrape</title>


**This returns the whole ```<title>``` tag rather than just the title of the website. To get only the title, use the ```.text``` attribute**

In [6]:
print(title.text)

Quotes to Scrape
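The difference between a tag and its text can also be checked offline. The sketch below parses a hard-coded HTML string (an assumption standing in for the downloaded page) and compares the tag itself, its ```.text``` attribute, and ```get_text(strip=True)```, which additionally trims surrounding whitespace. The names `demo_soup`, `title_tag`, `title_text` and `title_clean` are illustrative only.

```python
from bs4 import BeautifulSoup

# A tiny hard-coded page standing in for the real response (assumption for illustration).
html = "<html><head><title>Quotes to Scrape</title></head><body></body></html>"
demo_soup = BeautifulSoup(html, "html.parser")

title_tag = demo_soup.title                    # the whole <title> element (a Tag object)
title_text = title_tag.text                    # only the text inside the tag
title_clean = title_tag.get_text(strip=True)   # same text, surrounding whitespace removed

print(title_tag)
print(title_text)
print(title_clean)
```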


**NOTE:** To scrape more data from the website, you need to know the basics of HTML and CSS. It is enough to know what tags, classes and other such basic terms are :)

**To scrape more data, you need to know which element lies in which tag. Right-click on the web page and click 'Inspect': the browser shows the HTML of the page, where you can see which element lies in which tag**

### Scraping the header

**Using ```find``` method**

In [10]:
soup.find('h1').text.strip()

'Quotes to Scrape'

**Using ```CSS-Selectors```**

In [20]:
soup.select_one('.col-md-8 > h1').text.strip()

'Quotes to Scrape'

There are several ways to scrape the same element from a website, such as:
 - Using Find Method
 - Using CSS Selectors
 - Extracting children from parent element
 - Extracting siblings of an element
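
The four approaches listed above can be compared side by side on a small piece of HTML. This is only a sketch: the markup is a simplified, hand-written stand-in for the page header (an assumption, not the full site), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page header (assumed markup, not the real page).
html = """
<div class="col-md-8"><h1><a href="/">Quotes to Scrape</a></h1></div>
<div class="col-md-4"><p><a href="/login">Login</a></p></div>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# 1. find: the first tag with the given name
via_find = demo_soup.find("h1").get_text(strip=True)

# 2. CSS selector: an <h1> that is a direct child of .col-md-8
via_css = demo_soup.select_one(".col-md-8 > h1").get_text(strip=True)

# 3. Children: the direct child tags of a parent element
parent = demo_soup.find("div", class_="col-md-8")
via_children = [child.get_text(strip=True) for child in parent.find_all(recursive=False)]

# 4. Siblings: the next <div> on the same level as the parent
via_sibling = parent.find_next_sibling("div").get_text(strip=True)

print(via_find, via_css, via_children, via_sibling)
```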

**In this notebook, I use only ```CSS-Selectors``` and the ```find``` method, as they are easier to understand**

### Extracting the first Quote

**The first quote lies in a tag with the class "text"**

In [26]:
soup.find(class_ = "text").text.strip()

'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'

In [27]:
quotes = soup.find_all(class_ = 'text')
quotes

[<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>,
 <span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>,
 <span class="text" itemprop="text">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>,
 <span class="text" itemprop="text">“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”</span>,
 <span class="text" itemprop="text">“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”</span>,
 <span class="text" itemprop="text">“Try not to become a man of success. Rather become a man of value.”</span>,
 <span class="text" itemprop="text">“It is better to be hated for what you are than to be loved for what you are not.

**This returns a list of all the tags with class = "text", but we want only the quote text**

**Extracting quotes text from the above list**

In [29]:
for quote in quotes:
    print(quote.text,"\n")

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” 

“It is our choices, Harry, that show what we truly are, far more than our abilities.” 

“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.” 

“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.” 

“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.” 

“Try not to become a man of success. Rather become a man of value.” 

“It is better to be hated for what you are than to be loved for what you are not.” 

“I have not failed. I've just found 10,000 ways that won't work.” 

“A woman is like a tea bag; you never know how strong it is until it's in hot water.” 

“A day without sunshine is like, you know, night.” 



**Scraping the author name of each quote in the same manner**

In [33]:
# Equivalent CSS selector: soup.select('.author')
authors = soup.find_all(class_ = 'author')
authors

[<small class="author" itemprop="author">Albert Einstein</small>,
 <small class="author" itemprop="author">J.K. Rowling</small>,
 <small class="author" itemprop="author">Albert Einstein</small>,
 <small class="author" itemprop="author">Jane Austen</small>,
 <small class="author" itemprop="author">Marilyn Monroe</small>,
 <small class="author" itemprop="author">Albert Einstein</small>,
 <small class="author" itemprop="author">André Gide</small>,
 <small class="author" itemprop="author">Thomas A. Edison</small>,
 <small class="author" itemprop="author">Eleanor Roosevelt</small>,
 <small class="author" itemprop="author">Steve Martin</small>]

In [34]:
for author in authors:
    print(author.text)

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin
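
Scraping quotes and authors into two separate lists works here, but the two lists are only matched by position. A possible alternative, sketched below, is to loop over each quote's container and read both fields from it. The container class ```quote``` and the shortened markup are assumptions made for this offline example; check the real structure with 'Inspect' before relying on it.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (assumption): one container per quote.
html = """
<div class="quote">
  <span class="text" itemprop="text">“Try not to become a man of success. Rather become a man of value.”</span>
  <span>by <small class="author" itemprop="author">Albert Einstein</small></span>
</div>
<div class="quote">
  <span class="text" itemprop="text">“A day without sunshine is like, you know, night.”</span>
  <span>by <small class="author" itemprop="author">Steve Martin</small></span>
</div>
"""
demo_soup = BeautifulSoup(html, "html.parser")

records = []
for block in demo_soup.select(".quote"):
    # Read both fields from the same container so they always stay paired
    records.append({
        "quote": block.select_one(".text").get_text(strip=True),
        "author": block.select_one(".author").get_text(strip=True),
    })

for record in records:
    print(record["author"], "-", record["quote"])
```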


**The tags of each quote can be extracted in the same manner; you can try that yourself :)**
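
To check your own attempt, here is one possible sketch. It runs on hand-written sample markup, so the class names ```tags``` and ```tag```, as well as the tag values themselves, are assumptions for illustration and should be confirmed with 'Inspect' on the real page.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (assumption): tag links inside a "tags" container.
html = """
<div class="quote">
  <span class="text">“A day without sunshine is like, you know, night.”</span>
  <span>by <small class="author">Steve Martin</small></span>
  <div class="tags">
    Tags:
    <a class="tag" href="/tag/humor/page/1/">humor</a>
    <a class="tag" href="/tag/obvious/page/1/">obvious</a>
    <a class="tag" href="/tag/simile/page/1/">simile</a>
  </div>
</div>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# One list of tag names per quote container
quote_tags = [
    [link.get_text(strip=True) for link in block.select(".tags .tag")]
    for block in demo_soup.select(".quote")
]
print(quote_tags)
```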

_________________________________________________________________________________________________________________________

**Now let's try to extract the data from multiple pages**

In [36]:
soup.select_one(".next > a")

<a href="/page/2/">Next <span aria-hidden="true">→</span></a>

**We can see that each page has a ```Next``` button inside an element with the class ```next```, which in turn contains the link to the next page**
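
One way to use this link is to keep following it until a page has no ```Next``` button. The sketch below is a minimal version of that idea, not a definitive crawler: `next_page_url`, `crawl`, `fetch`, `max_pages` and the two fake pages are hypothetical names and data introduced here so the example runs without network access.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://quotes.toscrape.com/"

def next_page_url(soup, current_url):
    """Return the absolute URL behind the Next button, or None on the last page."""
    link = soup.select_one(".next > a")
    if link is None or not link.get("href"):
        return None
    # The href is relative ("/page/2/"), so resolve it against the current URL
    return urljoin(current_url, link["href"])

def crawl(start_url, fetch, max_pages=20):
    """Collect quote texts page by page; `fetch` is any callable mapping a URL to HTML."""
    url, quotes, visited = start_url, [], 0
    while url and visited < max_pages:   # max_pages guards against endless loops
        soup = BeautifulSoup(fetch(url), "html.parser")
        quotes.extend(tag.get_text(strip=True) for tag in soup.find_all(class_="text"))
        url = next_page_url(soup, url)
        visited += 1
    return quotes

# Offline demonstration with two fake pages (made-up content).
fake_site = {
    "https://quotes.toscrape.com/":
        '<span class="text">first</span><li class="next"><a href="/page/2/">Next</a></li>',
    "https://quotes.toscrape.com/page/2/":
        '<span class="text">second</span>',
}
all_quotes = crawl(BASE_URL, fake_site.__getitem__)
print(all_quotes)
```

Against the live site, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, ideally with a short pause between requests.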