## Practice Problems on Web Scraping - BeautifulSoup

### Problem 1 : Print Body tag
You are given the HTML content of a webpage. Your task is to:  
Print the entire contents of the body tag (including the body tag itself).  
NOTE: The HTML content is provided in the variable html.

In [1]:
html = '<!DOCTYPE html><html><head><title>Learning Beautiful Soup</title></head>\
<body><h1> About Us </h1><div class = "first_div"><p>Coding Ninjas Website</p>\
<a href="https://www.codingninjas.in/">Link to Coding Ninjas Website</a>\
<ul><li>This</li><li>is</li><li>an</li><li>unordered</li><li>list.</li></ul>\
</div><p id = "template_p">This is a template paragraph tag</p>\
<a href = "https://www.facebook.com/codingninjas/">\
This is the link of our Facebook Page</a></body></html>'

In [5]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
print(data.body)

<body><h1> About Us </h1><div class="first_div"><p>Coding Ninjas Website</p><a href="https://www.codingninjas.in/">Link to Coding Ninjas Website</a><ul><li>This</li><li>is</li><li>an</li><li>unordered</li><li>list.</li></ul></div><p id="template_p">This is a template paragraph tag</p><a href="https://www.facebook.com/codingninjas/">This is the link of our Facebook Page</a></body>


### Problem 2 : Attributes of div tag
You are given the HTML content of a webpage. Your task is to:  
Print the names of all attributes of the first div tag on the page.  
NOTE: The HTML content is provided in the variable html.  

In [7]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
attr = data.div.attrs
for i in attr:
    print(i)

class
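
The loop above iterates over the attribute dictionary, which yields its keys. A minimal sketch of what `attrs` holds, using a hypothetical snippet (not the page from the problem) with two attributes:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
snippet = '<div class="first_div box" id="main"><p>text</p></div>'
soup = BeautifulSoup(snippet, 'html.parser')

attrs = soup.div.attrs      # plain dict: attribute name -> value
attr_names = list(attrs)    # iterating a dict yields its keys
print(attr_names)
print(attrs['class'])       # class is multi-valued, so the value is a list
print(attrs['id'])          # single-valued attributes are plain strings
```

Because class can hold several values, BeautifulSoup stores it as a list even when only one class is present.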


### Problem 3 : Strings of li
You are given the HTML content of a webpage. Your task is to:  
Print the strings (text only, without tag names) of all li tags, separated by a space.  
NOTE: The HTML content is provided in the variable html.

In [11]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
l = data.find_all('li')
for i in l :
    print(i.string,end=' ')

This is an unordered list. 
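
Printing with end=' ' leaves a trailing space after the last item. If that matters, one possible alternative is to join the strings first. This sketch uses a hypothetical list, not the page from the problem:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
snippet = '<ul><li>This</li><li>is</li><li>a</li><li>list.</li></ul>'
soup = BeautifulSoup(snippet, 'html.parser')

# str.join puts the separator only between items, never after the last one
line = ' '.join(li.get_text() for li in soup.find_all('li'))
print(line)
```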

### Problem 4 : href of A tag
You are given the HTML content of a webpage. Your task is to:  
Print the href of every <a> tag on the page, one per line.  
NOTE: The HTML content is provided in the variable html.  

In [23]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
l = data.find_all('a')
for i in l :
    print(i.attrs['href'])

https://www.codingninjas.in/
https://www.facebook.com/codingninjas/


### Problem 5 : Descendants and children
You are given the HTML content of a webpage. Your task is to:  
Print the difference between the number of descendants and the number of children of the html tag.  
NOTE: The HTML content is provided in the variable html.  

In [28]:
html = '<!DOCTYPE html><html><head><title>Navigate Parse Tree</title></head>\
<body><h1>This is your Assignment</h1><a href = "https://www.google.com">This is a link that will take you to Google</a>\
<ul><li><p> This question is given to test your knowledge of <b>Web Scraping</b></p>\
<p>Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.</p></li>\
<li id = "li2">This is an li tag given to you for scraping</li>\
<li>This li tag gives you the various ways to get data from a website\
<ol><li class = "list_or">Using API of the website</li><li>Scrape data using BeautifulSoup</li><li>Scrape data using Selenium</li>\
<li>Scrape data using Scrapy</li></ol></li>\
<li class = "list_or"><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">\
Clicking on this takes you to the documentation of BeautifulSoup</a>\
<a href="https://selenium-python.readthedocs.io/" id="anchor">Clicking on this takes you to the documentation of Selenium</a>\
</li></ul></body></html>'

In [26]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
child = list(data.html.children)
desc = list(data.html.descendants)
print(len(desc)-len(child))

32
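
The two counts differ because children covers only the direct children of a tag, while descendants walks the whole subtree and also counts text nodes. A small sketch on a hypothetical document makes the difference easy to count by hand:

```python
from bs4 import BeautifulSoup

# Hypothetical document for illustration only.
snippet = ('<html><head><title>T</title></head>'
           '<body><p>Hi <b>there</b></p></body></html>')
soup = BeautifulSoup(snippet, 'html.parser')

children = list(soup.html.children)        # head, body
descendants = list(soup.html.descendants)  # head, title, 'T', body, p, 'Hi ', b, 'there'

n_children = len(children)
n_descendants = len(descendants)
print(n_descendants - n_children)
```

Here the html tag has 2 children and 8 descendants, four of which are tags nested deeper and two of which are strings.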


### Problem 6 : Name of tags with ID
You are given the HTML content of a webpage. Your task is to:  
Print the name of every tag that has an id attribute, one per line.  
NOTE: The HTML content is provided in the variable html.

In [40]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
l = data.find_all(id=True)
for i in l:
    print(i.name)

li
a


### Problem 7 : Next Sibling
You are given the HTML content of a webpage. Your task is to:  
Print the full content of every next sibling of the tag whose id is “li2”, one per line.  
NOTE: Content means the complete HTML of the tag.  

In [57]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
l = data.find(id = "li2").next_siblings 
for i in list(l):
    print(i)


<li>This li tag gives you the various ways to get data from a website<ol><li class="list_or">Using API of the website</li><li>Scrape data using BeautifulSoup</li><li>Scrape data using Selenium</li><li>Scrape data using Scrapy</li></ol></li>
<li class="list_or"><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">Clicking on this takes you to the documentation of BeautifulSoup</a><a href="https://selenium-python.readthedocs.io/" id="anchor">Clicking on this takes you to the documentation of Selenium</a></li>
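
The HTML in this problem has no whitespace between tags, so next_siblings yields only tags. On pages that do contain spaces or newlines between tags, the whitespace shows up as extra string siblings. A sketch on a hypothetical list, comparing next_siblings with find_next_siblings:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with whitespace between the li tags.
snippet = '<ul><li id="first">A</li> <li>B</li>\n<li>C</li></ul>'
soup = BeautifulSoup(snippet, 'html.parser')
first = soup.find(id='first')

all_siblings = list(first.next_siblings)      # includes the ' ' and '\n' strings
tag_siblings = first.find_next_siblings('li') # only the li tags

print(len(all_siblings), len(tag_siblings))
for tag in tag_siblings:
    print(tag)
```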


### Problem 8 : Parents of title
You are given the HTML content of a webpage. Your task is to:  
Print the content of every parent of the title tag, one per line.  
NOTE: Content means the whole HTML of the tag.  

In [96]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
l = data.title
for parent in l.parents:
    print(parent)

<head><title>Navigate Parse Tree</title></head>
<html><head><title>Navigate Parse Tree</title></head><body><h1>This is your Assignment</h1><a href="https://www.google.com">This is a link that will take you to Google</a><ul><li><p> This question is given to test your knowledge of <b>Web Scraping</b></p><p>Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.</p></li><li id="li2">This is an li tag given to you for scraping</li><li>This li tag gives you the various ways to get data from a website<ol><li class="list_or">Using API of the website</li><li>Scrape data using BeautifulSoup</li><li>Scrape data using Selenium</li><li>Scrape data using Scrapy</li></ol></li><li class="list_or"><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">Clicking on this takes you to the documentation of BeautifulSoup</a><a href="https://selenium-python.readthedocs.io/" id="anchor">Clicking on this takes you to the doc
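
Note that parents does not stop at the html tag: the last item it yields is the BeautifulSoup object itself, whose name is '[document]'. A minimal sketch on a hypothetical document that lists only the parent names:

```python
from bs4 import BeautifulSoup

# Hypothetical document for illustration only.
snippet = '<html><head><title>T</title></head><body></body></html>'
soup = BeautifulSoup(snippet, 'html.parser')

parent_names = [parent.name for parent in soup.title.parents]
print(parent_names)
```

Printing the names instead of the full tags is a quick way to check how far up the tree the loop goes.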

### Problem 9 : Next Element
You are given the HTML content of a webpage. Your task is to:  
Print the string inside the second <a> tag using BeautifulSoup's next_element property.  

In [68]:
from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
second = data.find_all('a')[1]
print(second.next_element)

Clicking on this takes you to the documentation of BeautifulSoup
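
next_element returns whatever was parsed immediately after the opening tag, which for a tag that starts with text is its first string. It is not the same as string or get_text() when the tag has more than one child. A sketch on a hypothetical pair of links:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second link contains text plus a nested tag.
snippet = '<a href="#one">first</a><a href="#two">second <b>bold</b></a>'
soup = BeautifulSoup(snippet, 'html.parser')
second = soup.find_all('a')[1]

first_node = second.next_element  # the first thing parsed inside the tag
only_string = second.string       # None, because the tag has two children
full_text = second.get_text()     # all text in the subtree, concatenated

print(repr(first_node), only_string, repr(full_text))
```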


### Problem 10 : Book Names from First Page
Print the titles of all 20 books on the first page of the http://books.toscrape.com/ website.

In [6]:
from bs4 import BeautifulSoup
import requests
response = requests.get('http://books.toscrape.com/')
#response
html_data = response.text
data = BeautifulSoup(html_data, 'html.parser')
#print(data.prettify())
books=data.find_all(class_='product_pod')
for i in books:
    print(i.h3.a['title'])

A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
The Black Maria
Starving Hearts (Triangular Trade Trilogy, #1)
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Rip it Up and Start Again
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Olio
Mesaerion: The Best Science Fiction Stories 1800-1849
Libertarianism for Beginners
It's Only the Himalayas


### Problem 11 : All Categories
Print the names of all categories listed on the http://books.toscrape.com/ website.

In [1]:
from bs4 import BeautifulSoup
import requests
response = requests.get('http://books.toscrape.com/')
html_data = response.text
data = BeautifulSoup(html_data, 'html.parser')
cat = data.find(class_='side_categories')
c=cat.find_all('a')
for i in c:
    if i.string.strip() != "Books":
        print(i.string.strip())

Travel
Mystery
Historical Fiction
Sequential Art
Classics
Philosophy
Romance
Womens Fiction
Fiction
Childrens
Religion
Nonfiction
Music
Default
Science Fiction
Sports and Games
Add a comment
Fantasy
New Adult
Young Adult
Science
Poetry
Paranormal
Art
Psychology
Autobiography
Parenting
Adult Fiction
Humor
Horror
History
Food and Drink
Christian Fiction
Business
Biography
Thriller
Contemporary
Spirituality
Academic
Self Help
Historical
Christian
Suspense
Short Stories
Novels
Health
Politics
Cultural
Erotica
Crime


### Problem 12 : All Book Names
Print the titles of all books on the first 10 pages of the http://books.toscrape.com/ website.

In [None]:
from bs4 import BeautifulSoup
import requests

In [3]:
all_urls = ['http://books.toscrape.com/catalogue/page-1.html']
cur_url='http://books.toscrape.com/catalogue/page-1.html'
base_url='http://books.toscrape.com/catalogue/'

response = requests.get(cur_url)
i=0
while response.status_code==200:
    data = BeautifulSoup(response.text, 'html.parser')
    next_page = data.find(class_='next')
    if i==9:
        break
    next_page_url = base_url+next_page.a['href']
    #print(next_page_url)
    all_urls.append(next_page_url)
    cur_url = next_page_url
    response = requests.get(cur_url)
    i+=1

In [5]:
for url in all_urls:
    response = requests.get(url)
    html_data = response.text
    data = BeautifulSoup(html_data, 'html.parser')
    books=data.find_all(class_='product_pod')
    # use a separate name so the inner loop does not overwrite the page URL
    for book in books:
        print(book.h3.a['title'])

A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
The Black Maria
Starving Hearts (Triangular Trade Trilogy, #1)
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Rip it Up and Start Again
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Olio
Mesaerion: The Best Science Fiction Stories 1800-1849
Libertarianism for Beginners
It's Only the Himalayas
In Her Wake
How Music Works
Foolproof Preserving: A Guide to Small Batch Jams, Jellies, Pickles, Condiments, and More: A Foolproof Guide to Making Small Batch Jams, Jellies, Pickles, Condiments, and More
Chase Me (Paris Nights #2)
Black Dust
Birdsong: A Story in Pictures
A

Modern Romance
Miss Peregrine’s Home for Peculiar Children (Miss Peregrine’s Peculiar Children #1)
Louisa: The Extraordinary Life of Mrs. Adams
Little Red
Library of Souls (Miss Peregrine’s Peculiar Children #3)
Large Print Heart of the Pride
I Had a Nice Time And Other Lies...: How to find love & sh*t like that
Hollow City (Miss Peregrine’s Peculiar Children #2)
Grumbles
Full Moon over Noah’s Ark: An Odyssey to Mount Ararat and Beyond
Frostbite (Vampire Academy #2)
Follow You Home
First Steps for New Christians (Print Edition)
Finders Keepers (Bill Hodges Trilogy #2)
Fables, Vol. 1: Legends in Exile (Fables #1)
Eureka Trivia 6.0
Drive: The Surprising Truth About What Motivates Us
Done Rubbed Out (Reightman & Bailey #1)
Doing It Over (Most Likely To #1)
Deliciously Ella Every Day: Quick and Easy Recipes for Gluten-Free Snacks, Packed Lunches, and Simple Meals
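
Titles that contain a curly apostrophe can come out garbled when UTF-8 bytes are decoded with a single-byte encoding. This can happen with response.text, because requests falls back to ISO-8859-1 when the server's Content-Type header carries no charset. The sketch below reproduces the effect offline with cp1252 and a hypothetical title. A possible remedy in the scraper, stated here as an assumption rather than tested against the site, is to pass response.content (bytes) to BeautifulSoup so that it detects the encoding itself.

```python
# Hypothetical title containing a right single quotation mark (U+2019).
title = 'Miss Peregrine\u2019s Home'

raw = title.encode('utf-8')       # the bytes a UTF-8 page would send
garbled = raw.decode('cp1252')    # decoding with the wrong single-byte codec
repaired = garbled.encode('cp1252').decode('utf-8')  # undo the wrong decode

print(garbled)
print(repaired)
```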


### Problem 13 : Book Details
Find and print the details of all books on the first 2 pages of the http://books.toscrape.com/ website.  
The details are: title of the book, book page URL, price (as a float, without any currency or other symbol), and quantity in stock (as an integer). Save all the details in a dataframe and print them in the required format.  
Note: Remove the trailing zeros from the price of each book.

In [12]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re

allPages = ['http://books.toscrape.com/catalogue/page-1.html','http://books.toscrape.com/catalogue/page-2.html']
column_names = ['Title', 'Link', 'Price', 'Quantity in Stock']
base_url = 'http://books.toscrape.com/catalogue/'

allBooks=[]
for page_url in allPages:
    response = requests.get(page_url)
    data = BeautifulSoup(response.text, 'html.parser')
    book = data.find_all(class_='product_pod')
    # use a separate name so the inner loop does not overwrite the page URL
    for b in book:
        b_url = base_url + b.h3.a['href']
        allBooks.append(b_url)

        
Book_details=[]
for i in allBooks:
    response = requests.get(i)
    data = BeautifulSoup(response.text, 'html.parser')
    title = data.h1.string
    price = data.find(class_='price_color').string
    qty = data.find(class_='instock availability')
    qty = qty.contents[-1].strip()
    # raw strings keep \d from being treated as a string escape
    qty = int(re.search(r'\d+',qty).group())
    price = float(re.search(r'[\d.]+',price).group())
    Book_details.append([title, i, price, qty])
    
#for i in Book_details:
#print(*i)
df = pd.DataFrame(Book_details,columns=column_names)
for i in range(len(df)):
    print(df['Title'][i],df['Link'][i],df['Price'][i],df['Quantity in Stock'][i])
    
    


A Light in the Attic http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html 51.77 22
Tipping the Velvet http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html 53.74 20
Soumission http://books.toscrape.com/catalogue/soumission_998/index.html 50.1 20
Sharp Objects http://books.toscrape.com/catalogue/sharp-objects_997/index.html 47.82 20
Sapiens: A Brief History of Humankind http://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html 54.23 20
The Requiem Red http://books.toscrape.com/catalogue/the-requiem-red_995/index.html 22.65 19
The Dirty Little Secrets of Getting Your Dream Job http://books.toscrape.com/catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html 33.34 19
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull http://books.toscrape.com/catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html 17.93 19
The Boy

In [11]:
import requests

from bs4 import BeautifulSoup

import pandas as pd

allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']

column_names = ['Title', 'Link', 'Price', 'Quantity in Stock']



base_url="http://books.toscrape.com/catalogue/"
book_webpages=[]


for i in allPages:
    response=requests.get(i)
    data=BeautifulSoup(response.text,"html.parser")
    
    for j in data.find_all(class_="product_pod") :
        book_webpages.append(base_url+j.h3.a["href"])
# print(book_webpages)
Title=[]
Link=[]
Price=[]
Quantity=[]
 
for link in book_webpages:
    r=requests.get(link)
    data=BeautifulSoup(r.text,"html.parser")
    
    tag=data.find(class_="col-sm-6 product_main")
    
    Title.append(tag.h1.string.strip())
    Link.append(link)
    string=tag.p.string.strip()
    Price.append(float(string[2:]))
    l=list(tag.find(class_="instock availability").strings)
    s=l[1].strip()
    Quantity.append(int(s[10:(len(s)-11)]))

df=pd.DataFrame({"title":Title,"link":Link,"price":Price,"quantity":Quantity})
# print(df)
for row in range(len(df.title)) :
    ans=df.iloc[row,:].values
    for i in ans:
        print(i,end=" ")
    print()


A Light in the Attic http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html 51.77 22 
Tipping the Velvet http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html 53.74 20 
Soumission http://books.toscrape.com/catalogue/soumission_998/index.html 50.1 20 
Sharp Objects http://books.toscrape.com/catalogue/sharp-objects_997/index.html 47.82 20 
Sapiens: A Brief History of Humankind http://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html 54.23 20 
The Requiem Red http://books.toscrape.com/catalogue/the-requiem-red_995/index.html 22.65 19 
The Dirty Little Secrets of Getting Your Dream Job http://books.toscrape.com/catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html 33.34 19 
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull http://books.toscrape.com/catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html 17.93 19 
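
Both solutions turn display strings into numbers, one with regular expressions and one with fixed slice positions. The regex route tolerates small changes in the surrounding text. A standalone sketch with two hypothetical helpers, parse_price and parse_stock, run on sample strings shaped like the ones on the book pages:

```python
import re

def parse_price(text):
    """Return the first decimal number in text as a float, or None."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    return float(match.group()) if match else None

def parse_stock(text):
    """Return the count from 'In stock (N available)', or 0 if absent."""
    match = re.search(r'\((\d+) available\)', text)
    return int(match.group(1)) if match else 0

price = parse_price('£51.77')
qty = parse_stock('In stock (22 available)')
print(price, qty)
print(parse_price('£50.10'))        # float() drops the trailing zero
print(parse_stock('Out of stock'))  # no count present
```

Converting with float() is what removes trailing zeros, so no extra formatting step is needed for that requirement.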

## Assignment

### Problem 1 : Print the data of first 3 movies
From this https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt link,  
Find and print the name and genre of the first 3 titles

In [26]:
from bs4 import BeautifulSoup
import requests
response = requests.get('https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt')
data = BeautifulSoup(response.text,"html.parser")
title = data.find_all(class_='lister-item-header')[:3]
genre = data.find_all(class_='genre')[:3]
for i in range(len(title)):
    print(title[i].find('a').string+' ;',end=' ')
    print(genre[i].string.strip())

Avengers: Infinity War ; Action, Adventure, Sci-Fi
Black Panther ; Action, Adventure, Sci-Fi
Deadpool 2 ; Action, Adventure, Comedy


### Problem 2 : titles with most votes
Link to use https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt  
For each year from 2010 to 2014, print the name of the title with the highest number of votes  
Note : Print the titles line wise starting from year 2010  

In [6]:
from bs4 import BeautifulSoup
import requests
urls = ['https://www.imdb.com/search/title/?release_date=2010-01-01,2010-12-31&sort=num_votes,desc&ref_=adv_prv', 'https://www.imdb.com/search/title/?release_date=2011-01-01,2011-12-31&sort=num_votes,desc&ref_=adv_prv',
       'https://www.imdb.com/search/title/?release_date=2012-01-01,2012-12-31&sort=num_votes,desc&ref_=adv_prv', 'https://www.imdb.com/search/title/?release_date=2013-01-01,2013-12-31&sort=num_votes,desc&ref_=adv_prv',
       'https://www.imdb.com/search/title/?release_date=2014-01-01,2014-12-31&sort=num_votes,desc&ref_=adv_prv']
for i in urls:
    res = requests.get(i)
    data = BeautifulSoup(res.text,"html.parser")
    a = data.find(class_ = "lister-item mode-advanced")
    name = a.find(class_ = "lister-item-content").a.string
    print(name)

Inception
Game of Thrones
The Dark Knight Rises
The Wolf of Wall Street
Interstellar


### Problem 3 : Title with maximum duration
Link to use https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt  
Out of the first 250 titles with the highest number of votes in 2018, find the title with the maximum duration.

In [7]:
from bs4 import BeautifulSoup
import requests
import time
from random import randint

In [9]:
dct = {}
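for i in range(1,202,50):
    # Assumption: the start query parameter selects the first title of each 50-item page;
    # the original request used page=1 on every iteration and never used i.
    res = requests.get('https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&start=' + str(i) + '&ref_=adv_nxt')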
    data = BeautifulSoup(res.text,"html.parser")
    tags = data.find_all('div',class_='lister-item')
    for j in tags:
        if j.find('span',class_='runtime'):
            head = j.find('h3',class_='lister-item-header')
            dur = j.find('span',class_='runtime')
            t= int(dur.text.strip().split(' ')[0])
            dct[head.a.string] = t
    time.sleep(randint(0,3))
maxdur = -1
maxname = None
for k,v in dct.items():
    if v>maxdur:
        maxdur = v
        maxname = k
print(maxname,maxdur)
            
    

The Haunting of Hill House 572
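
The runtime parsing and the search for the longest title can be tried without any network access. This sketch uses hypothetical titles and runtime strings, plus a hypothetical helper runtime_minutes, and lets max do the comparison instead of a manual loop:

```python
def runtime_minutes(text):
    """'572 min' -> 572; returns None when there is no leading integer."""
    first = text.strip().split(' ')[0]
    return int(first) if first.isdigit() else None

# Hypothetical sample data shaped like the runtime spans on the page.
samples = {'Title A': '149 min', 'Title B': ' 572 min ', 'Title C': ''}

durations = {}
for name, raw in samples.items():
    minutes = runtime_minutes(raw)
    if minutes is not None:       # skip titles without a usable runtime
        durations[name] = minutes

# max over the keys, ranked by their stored duration
longest = max(durations, key=durations.get)
print(longest, durations[longest])
```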


### Problem 4 : Applications of AI
From this website : https://en.wikipedia.org/wiki/Artificial_intelligence  
Find and print all applications of AI (As present in Contents of the page)  
Note : Print applications line wise

In [2]:
from bs4 import BeautifulSoup
import requests

In [4]:
response = requests.get('https://en.wikipedia.org/wiki/Artificial_intelligence')
data = BeautifulSoup(response.text,"html.parser")
b1 = data.find_all(class_='toclevel-1 tocsection-36')
for i in b1:
    for j in i.ul.find_all(class_='toctext'):
        print(j.string)
print('')




### Problem 5 : Image with maximum area
From this website : https://en.wikipedia.org/wiki/Artificial_intelligence  
Find and print the src of the <img> tag which occupies the maximum area on the page.  
Note : Ignore images that don't have height or width attributes

In [49]:
from bs4 import BeautifulSoup
import requests

response = requests.get('https://en.wikipedia.org/wiki/Artificial_intelligence')
data = BeautifulSoup(response.text,"html.parser")
all_tag = data.find_all('img')
max_area = -1
url = ''
for i in all_tag :
    if i.has_attr('height') and i.has_attr('width'):
        if int(i['height']) * int(i['width']) > max_area:
            max_area = int(i['height']) * int(i['width'])
            url = i['src']
print(url)

//upload.wikimedia.org/wikipedia/commons/thumb/1/13/Joseph_Ayerle_portrait_of_Ornella_Muti_%28detail%29%2C_calculated_by_Artificial_Intelligence_%28AI%29_technology.jpg/220px-Joseph_Ayerle_portrait_of_Ornella_Muti_%28detail%29%2C_calculated_by_Artificial_Intelligence_%28AI%29_technology.jpg
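
The printed src starts with //, which is a protocol-relative URL, and int() would raise on a width such as "100%". A sketch under stated assumptions (hypothetical page URL, hypothetical images, and a hypothetical helper named area) that skips unusable sizes and resolves the src to a full URL with urljoin:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

page_url = 'https://example.org/wiki/Page'  # hypothetical page address
snippet = ('<img src="//img.example.org/a.png" width="100" height="50">'
           '<img src="/static/b.png" width="220" height="300">'
           '<img src="c.png" width="100%" height="40">'
           '<img src="d.png">')
soup = BeautifulSoup(snippet, 'html.parser')

def area(img):
    """Width times height, or None when either is missing or not an integer."""
    w, h = img.get('width', ''), img.get('height', '')
    return int(w) * int(h) if w.isdigit() and h.isdigit() else None

sized = [img for img in soup.find_all('img') if area(img) is not None]
largest = max(sized, key=area)

full_src = urljoin(page_url, largest['src'])  # resolve against the page URL
print(full_src)
```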


### Problem 6 : Quotes with tag humor
Find all the quotes that have the tag as "humor" from this website http://quotes.toscrape.com/  

In [56]:
import requests
from bs4 import BeautifulSoup

base = 'http://quotes.toscrape.com'

response = requests.get(base + '/')
q_st=[]

while response.status_code == 200 :
    data = BeautifulSoup(response.text, 'html.parser')
    for quote in data.find_all(class_ = 'quote'):
        # collect the text of every tag link attached to this quote
        tags = [t.text.strip() for t in quote.find_all('a',class_='tag')]
        if 'humor' in tags:
            q_st.append(quote.find(class_='text'))

    next_page = data.find(class_='next')
    if next_page is None :
        break
    # the next-page href already starts with '/', so base must not end with one
    response = requests.get(base + next_page.a['href'])

for txt in q_st:
    print(txt.text.strip())

“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
“Anyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.”
“Beauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.”
“All you need is love. But a little chocolate now and then doesn't hurt.”
“Remember, we're madly in love, so it's all right to kiss me anytime you feel like it.”
“Some people never go crazy. What truly horrible lives they must lead.”
“The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.”
“Think left and think right and think low and think high. Oh, the thinks you can think up if only you try!”
“The reason I talk to myself is because I’m the only one whose answers I accept.”
“I am free of all prejudice. I hate
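
Building the next-page address by string concatenation only works when exactly one slash ends up between the base and the href. urllib.parse.urljoin from the standard library handles both root-relative and page-relative links. The sketch assumes the next-page hrefs look like /page/2/ on the quotes site and page-2.html on the books site:

```python
from urllib.parse import urljoin

base = 'http://quotes.toscrape.com/'
href = '/page/2/'                   # assumed shape of the next-page link

concatenated = base + href          # ends up with a double slash
joined = urljoin(base, href)        # root-relative href replaces the path

# A page-relative href is resolved against the current page's directory.
relative = urljoin('http://books.toscrape.com/catalogue/page-1.html',
                   'page-2.html')

print(concatenated)
print(joined)
print(relative)
```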

### Problem 7 : Print all authors
Find and print the names of all the different authors from all pages of this http://quotes.toscrape.com/ website    
Note : Print the names of all authors line wise sorted in dictionary order  

In [45]:
import requests
from bs4 import BeautifulSoup

all_urls = ['http://quotes.toscrape.com/page/1/']
base = 'http://quotes.toscrape.com'

response = requests.get(all_urls[0])

while response.status_code == 200 :
    data = BeautifulSoup(response.text, 'html.parser')
    if data.find(class_ = 'next') is None : 
        break
        
    url = data.find(class_ = 'next').a['href']
    all_urls.append(base+url)  
    response = requests.get(all_urls[-1])
    
auth_name = []
for i in all_urls :
    response = requests.get(i)
    data = BeautifulSoup(response.text, 'html.parser')

    for j in data.find_all(class_ = 'quote'):
        name = j.find(class_ = 'author').string
        if name not in auth_name:
            auth_name.append(name)
            
for name in sorted(auth_name):
    print(name)


Albert Einstein
Alexandre Dumas fils
Alfred Tennyson
Allen Saunders
André Gide
Ayn Rand
Bob Marley
C.S. Lewis
Charles Bukowski
Charles M. Schulz
Douglas Adams
Dr. Seuss
E.E. Cummings
Eleanor Roosevelt
Elie Wiesel
Ernest Hemingway
Friedrich Nietzsche
Garrison Keillor
George Bernard Shaw
George Carlin
George Eliot
George R.R. Martin
Harper Lee
Haruki Murakami
Helen Keller
J.D. Salinger
J.K. Rowling
J.M. Barrie
J.R.R. Tolkien
James Baldwin
Jane Austen
Jim Henson
Jimi Hendrix
John Lennon
Jorge Luis Borges
Khaled Hosseini
Madeleine L'Engle
Marilyn Monroe
Mark Twain
Martin Luther King Jr.
Mother Teresa
Pablo Neruda
Ralph Waldo Emerson
Stephenie Meyer
Steve Martin
Suzanne Collins
Terry Pratchett
Thomas A. Edison
W.C. Fields
William Nicholson
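
Checking `name not in auth_name` scans the whole list for every quote. As an alternative sketch, a set removes duplicates directly and sorted returns the names in order. The names below are a small hypothetical sample:

```python
# Hypothetical sample with one repeated author.
names = ['Jane Austen', 'Albert Einstein', 'Jane Austen', 'André Gide']

unique_sorted = sorted(set(names))  # set drops duplicates, sorted orders them
for name in unique_sorted:
    print(name)
```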


### Problem 8 : Birth Date of authors
Find the birth dates of authors whose names start with 'J' from this http://quotes.toscrape.com/ website  
Note : Print a dictionary containing the name as key and the birth date as value. The names of the authors should be sorted alphabetically.

In [55]:
import requests
from bs4 import BeautifulSoup

authors = {}
for i in range(1,11):
    response = requests.get('http://quotes.toscrape.com/page/'+ str(i) + '/')
    data = BeautifulSoup(response.text, 'html.parser')
    for aut in data.select('.author') :
        if aut.text[0] == 'J':
            authors[aut.text] = aut.next_sibling.next_sibling['href']
bdate ={}
for author in sorted(authors) :
    page = requests.get('http://quotes.toscrape.com'+ authors[author])
    data = BeautifulSoup(page.text, 'html.parser')
    for i in data.select('.author-born-date'):
        bdate[author] = i.text
print(bdate)

{'J.D. Salinger': 'January 01, 1919', 'J.K. Rowling': 'July 31, 1965', 'J.M. Barrie': 'May 09, 1860', 'J.R.R. Tolkien': 'January 03, 1892', 'James Baldwin': 'August 02, 1924', 'Jane Austen': 'December 16, 1775', 'Jim Henson': 'September 24, 1936', 'Jimi Hendrix': 'November 27, 1942', 'John Lennon': 'October 09, 1940', 'Jorge Luis Borges': 'August 24, 1899'}


### Problem 9 : Quotes by Albert Einstein
Find all the quotes by Albert Einstein (in the order they appear on the page) from this website http://quotes.toscrape.com/  
Note : Fetch data from all the pages.  

In [40]:
import requests
from bs4 import BeautifulSoup

all_urls = ['http://quotes.toscrape.com/page/1/']
base = 'http://quotes.toscrape.com'

response = requests.get(all_urls[0])

while response.status_code == 200 :
    data = BeautifulSoup(response.text, 'html.parser')
    
    if data.find(class_ = 'next') is None : 
        break
        
    url = data.find(class_ = 'next').a['href']
    all_urls.append(base+url)  
    response = requests.get(all_urls[-1])
    
for i in all_urls :
    response = requests.get(i)
    data = BeautifulSoup(response.text, 'html.parser')

    for j in data.find_all(class_ = 'quote'):
        s = j.find(class_ = 'author').string
        if s == 'Albert Einstein' :
            s= j.find(class_ = 'text').string
            print(s)

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“Try not to become a man of success. Rather become a man of value.”
“If you can't explain it to a six year old, you don't understand it yourself.”
“If you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.”
“Logic will get you from A to Z; imagination will get you everywhere.”
“Any fool can know. The point is to understand.”
“Life is like riding a bicycle. To keep your balance, you must keep moving.”
“If I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.”
“Anyone who has never made a mistake has never tried anything new.”
