## Print the data of the first 3 movies

From this page: https://www.imdb.com/search/title?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt

Find and print the name and genre of the first 3 titles.

In [1]:
from bs4 import BeautifulSoup
import requests

res = requests.get('https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt')
data = BeautifulSoup(res.text, 'html.parser')
titles = data.find_all(class_='lister-item-content') # a bs4.element.ResultSet, one entry per title
for item in titles[:3]:
    name = item.a.string.strip() # the first link in each entry is the title
    genre = item.find(class_='genre').string.strip() # look up the genre inside the same entry so each pair stays aligned
    print(name, ';', genre)

Avengers: Infinity War ; Action, Adventure, Sci-Fi
Black Panther ; Action, Adventure, Sci-Fi
Deadpool 2 ; Action, Adventure, Comedy
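Collecting titles and genres with two separate `find_all()` calls and pairing them by index only stays aligned if every title lists a genre. A minimal pure-Python sketch, using hypothetical records in place of parsed HTML, shows how index pairing can drift while a per-item lookup does not:

```python
# Hypothetical records standing in for parsed title entries;
# the second one has no genre listed.
items = [
    {"title": "Title A", "genre": "Action"},
    {"title": "Title B"},
    {"title": "Title C", "genre": "Comedy"},
]

# Parallel lists, like calling find_all() twice on the whole page.
titles = [item["title"] for item in items]
genres = [item["genre"] for item in items if "genre" in item]
paired_by_index = list(zip(titles, genres))  # zip stops at the shorter list

# Per-item lookup, like calling find(class_='genre') inside each title entry.
paired_per_item = [(item["title"], item.get("genre")) for item in items]

print(paired_by_index)  # Title B ends up paired with Comedy
print(paired_per_item)  # Title B is paired with None
```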


## Titles with the most votes

Link to use: https://www.imdb.com/search/title?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt

Print the name of the title with the highest number of votes for each year from 2010 to 2014.

In [3]:
from bs4 import BeautifulSoup
import requests

base_url1 = 'https://www.imdb.com/search/title/?release_date='
base_url2 = '&sort=num_votes,desc&page=1&ref_=adv_nxt'

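for i in range(2010, 2015): # the range end is exclusive, so this covers 2010-2014
    res = requests.get(base_url1 + str(i) + base_url2)
    data = BeautifulSoup(res.text, 'html.parser')
    print(data.h3.a.string) # the first <h3> on the page holds the top-voted title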

Inception
Game of Thrones
The Dark Knight Rises
The Wolf of Wall Street
Interstellar
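Gluing the year between two string fragments works, but the standard library's `urllib.parse.urlencode` can assemble the same query string from a dict, which avoids hand-escaping mistakes. A small sketch, assuming the same query parameters as the cell above; `search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

BASE = "https://www.imdb.com/search/title/"

def search_url(year):
    """Build the most-voted search URL for one release year (hypothetical helper)."""
    params = {
        "release_date": year,
        "sort": "num_votes,desc",
        "page": 1,
        "ref_": "adv_nxt",
    }
    # safe="," keeps the comma in "num_votes,desc" as-is instead of encoding it as %2C
    return BASE + "?" + urlencode(params, safe=",")

for year in range(2010, 2015):
    print(search_url(year))
```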


## Title with the maximum duration

Link to use: https://www.imdb.com/search/title?release_date=2018&sort=num_votes,desc&page=1&ref_=adv_nxt

Out of the first 250 titles with the highest number of votes in 2018, find the title with the maximum duration.

In [4]:
from bs4 import BeautifulSoup
import requests
import re

# The link above would need a special case for the first page, so we use a URL
# whose start= offset works the same way for every page:
# https://www.imdb.com/search/title/?release_date=2018-01-01,2018-12-31&sort=num_votes,desc&start=1&ref_=adv_nxt

url1 = 'https://www.imdb.com/search/title/?release_date=2018-01-01,2018-12-31&sort=num_votes,desc&start='
url2 = '&ref_=adv_nxt'

max_dur = 0
title_name = None
for i in range(5): # each request returns 50 results, so 5 requests cover the first 250
    curr_url = url1 + str(i*50+1) + url2 # start = 1, 51, 101, 151, 201
    res = requests.get(curr_url)
    data = BeautifulSoup(res.text, 'html.parser')

    movies = data.find_all(class_='lister-item-content')
    for movie in movies: # up to 50 results per page
        title = movie.h3.a.string
        duration = movie.find(class_='runtime')
        if duration is not None: # one of the 250 titles has no runtime listed
            # raw string avoids the invalid '\d' escape warning; e.g. '572 min' -> 572
            dur = int(re.search(r'\d+', duration.string).group())
            if dur > max_dur:
                max_dur = dur
                title_name = title

print(title_name, max_dur)

The Haunting of Hill House 572
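The two pieces of arithmetic in this cell, the 1-based `start=` offsets and the minutes pulled out of a runtime string, can be checked without any network access. A sketch on hypothetical rows; `start_offsets` and `parse_runtime` are illustrative helper names, not part of any library:

```python
import re

def start_offsets(total, per_page):
    """1-based start= values for each results page: 1, 51, 101, ..."""
    return [page * per_page + 1 for page in range(total // per_page)]

def parse_runtime(text):
    """Extract the minutes from a runtime string such as '120 min'; None if absent."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None

# Hypothetical (title, runtime text) rows; one has no runtime, like the real page.
rows = [("Title A", "90 min"), ("Title B", None), ("Title C", "120 min")]

timed = []
for title, runtime in rows:
    minutes = parse_runtime(runtime)
    if minutes is not None:  # skip titles without a runtime
        timed.append((title, minutes))

longest = max(timed, key=lambda row: row[1])

print(start_offsets(250, 50))
print(longest)
```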


## Image with the maximum area

From this website: https://en.wikipedia.org/wiki/Artificial_intelligence

Find and print the `src` of the `<img>` tag that occupies the maximum area on the page.

#### Note
Ignore images that lack a height or width attribute.

In [5]:
from bs4 import BeautifulSoup
import requests

res = requests.get('https://en.wikipedia.org/wiki/Artificial_intelligence')
data = BeautifulSoup(res.text,'html.parser')

images = data.find_all('img')
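max_area = 0
src = None
for img in images:
    w = img.get('width')
    h = img.get('height')
    # ignore images missing either attribute (or with non-numeric values such as '100%')
    if w and h and w.isdigit() and h.isdigit():
        area = int(w)*int(h)
        if area > max_area:
            max_area = area
            src = img.get('src')
print(src)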

//upload.wikimedia.org/wikipedia/commons/thumb/1/13/Joseph_Ayerle_portrait_of_Ornella_Muti_%28detail%29%2C_calculated_by_Artificial_Intelligence_%28AI%29_technology.jpg/220px-Joseph_Ayerle_portrait_of_Ornella_Muti_%28detail%29%2C_calculated_by_Artificial_Intelligence_%28AI%29_technology.jpg
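For comparison, the same "largest width times height" rule can be sketched with only the standard library's `html.parser.HTMLParser` instead of BeautifulSoup. This is a swapped-in parser, run on a hypothetical HTML snippet; `ImgCollector` and `largest_image_src` are illustrative names:

```python
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collects the attribute dict of every <img> start tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "img":
            self.images.append(dict(attrs))

def largest_image_src(html):
    parser = ImgCollector()
    parser.feed(html)
    best_src, best_area = None, 0
    for attrs in parser.images:
        w, h = attrs.get("width"), attrs.get("height")
        # Skip images missing either attribute, or with non-numeric values like "100%".
        if not (w and h and w.isdigit() and h.isdigit()):
            continue
        area = int(w) * int(h)
        if area > best_area:
            best_src, best_area = attrs.get("src"), area
    return best_src

sample = """
<img src="small.png" width="20" height="20">
<img src="big.png" width="220" height="300">
<img src="nosize.png">
<img src="percent.png" width="100%" height="50">
"""
print(largest_image_src(sample))
```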


## Quotes with the tag humor

Find all the quotes tagged "humor" on this website: http://quotes.toscrape.com/

In [6]:
from bs4 import BeautifulSoup
import requests

base_url = 'http://quotes.toscrape.com'
curr_url = 'http://quotes.toscrape.com/page/1/'
while True:
    res = requests.get(curr_url)
    data = BeautifulSoup(res.text,'html.parser')

    quotes = data.find_all(class_='quote')
    for q in quotes:
        tags = q.find(class_='keywords').get('content') # or q.find(class_='keywords')['content']
        tag_list = tags.split(',')
        if "humor" in tag_list:
            print(q.find(class_='text').string)
    
    next_page = data.find(class_='next')
    if next_page is None:
        break
    
    curr_url = base_url + next_page.a['href']

“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
“Anyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.”
“Beauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.”
“All you need is love. But a little chocolate now and then doesn't hurt.”
“Remember, we're madly in love, so it's all right to kiss me anytime you feel like it.”
“Some people never go crazy. What truly horrible lives they must lead.”
“The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.”
“Think left and think right and think low and think high. Oh, the thinks you can think up if only you try!”
“The reason I talk to myself is because I’m the only one whose answers I accept.”
“I am free of all prejudice. I hate
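This cell and the two below all repeat the same "follow the `next` link until there is none" loop. A sketch of that pattern as a generator, driven by a hypothetical in-memory site instead of HTTP requests (`iter_pages` and `FAKE_SITE` are illustrative names), also shows why the keyword string is split before testing for "humor":

```python
def iter_pages(fetch, first_url):
    """Yield each page, following its 'next' link until there is none.

    fetch(url) is assumed to return (page, next_url_or_None).
    """
    url = first_url
    while url is not None:
        page, url = fetch(url)
        yield page

# Hypothetical two-page site: each page is a list of (quote text, keywords content).
FAKE_SITE = {
    "/page/1/": ([("Q1", "humor,life"), ("Q2", "humorous,love")], "/page/2/"),
    "/page/2/": ([("Q3", "books,humor")], None),
}

humor_quotes = []
for page in iter_pages(FAKE_SITE.__getitem__, "/page/1/"):
    for text, keywords in page:
        # Split before testing: a plain substring check would also match "humorous".
        if "humor" in keywords.split(","):
            humor_quotes.append(text)

print(humor_quotes)
```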

## Print all authors

Find and print the names of all the different authors across all pages of this website: http://quotes.toscrape.com/

Note: Print the author names one per line, sorted in dictionary order.

In [7]:
from bs4 import BeautifulSoup
import requests

base_url = 'http://quotes.toscrape.com'
curr_url = 'http://quotes.toscrape.com/page/1/'
authors = set() # a set keeps each author name only once
while True:
    res = requests.get(curr_url)
    data = BeautifulSoup(res.text, 'html.parser')

    quotes = data.find_all(class_='quote')
    for q in quotes:
        author = q.find(class_='author')
        if author is not None:
            authors.add(author.string)

    next_page = data.find(class_='next')
    if next_page is None:
        break

    curr_url = base_url + next_page.a['href']

for author in sorted(authors): # sorted() returns a new list in alphabetical order
    print(author)

Albert Einstein
Alexandre Dumas fils
Alfred Tennyson
Allen Saunders
André Gide
Ayn Rand
Bob Marley
C.S. Lewis
Charles Bukowski
Charles M. Schulz
Douglas Adams
Dr. Seuss
E.E. Cummings
Eleanor Roosevelt
Elie Wiesel
Ernest Hemingway
Friedrich Nietzsche
Garrison Keillor
George Bernard Shaw
George Carlin
George Eliot
George R.R. Martin
Harper Lee
Haruki Murakami
Helen Keller
J.D. Salinger
J.K. Rowling
J.M. Barrie
J.R.R. Tolkien
James Baldwin
Jane Austen
Jim Henson
Jimi Hendrix
John Lennon
Jorge Luis Borges
Khaled Hosseini
Madeleine L'Engle
Marilyn Monroe
Mark Twain
Martin Luther King Jr.
Mother Teresa
Pablo Neruda
Ralph Waldo Emerson
Stephenie Meyer
Steve Martin
Suzanne Collins
Terry Pratchett
Thomas A. Edison
W.C. Fields
William Nicholson
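Deduplicating and sorting can also be written as `sorted(set(...))`. Python compares strings by code point, so uppercase letters sort before lowercase ones; passing `key=str.casefold` gives a case-insensitive order if that is what "dictionary order" should mean. A sketch on hypothetical names; `unique_sorted` is an illustrative helper:

```python
def unique_sorted(names):
    """Drop duplicates, then sort in Python's default (code-point) order."""
    return sorted(set(names))

# Hypothetical scraped names, including a duplicate and a lowercase entry.
scraped = ["Mark Twain", "Albert Einstein", "Mark Twain", "albert camus"]

print(unique_sorted(scraped))                      # lowercase entry sorts last
print(sorted(set(scraped), key=str.casefold))      # case-insensitive order
```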


## Birth date of authors

Find the birth dates of the authors whose names start with 'J' on this website: http://quotes.toscrape.com/

Note: Print a dictionary with the author's name as key and the birth date as value. The author names should be sorted alphabetically.

In [None]:
from bs4 import BeautifulSoup
import requests

base_url = 'http://quotes.toscrape.com'
curr_url = 'http://quotes.toscrape.com/page/1/'
authors = {} # author name -> birth date
while True:
    res = requests.get(curr_url)
    data = BeautifulSoup(res.text, 'html.parser')

    quotes = data.find_all(class_='quote')
    for q in quotes:
        author = q.find(class_='author')
        if author is not None and author.string.startswith('J') and author.string not in authors:
            about_author = base_url + q.find('a')['href'] # the first link in a quote is the author's "(about)" page
            response = requests.get(about_author)
            auth_data = BeautifulSoup(response.text, 'html.parser')

            birth = auth_data.find(class_='author-born-date').string
            authors[str(author.string)] = str(birth) # no trailing comma, so the value is a str, not a tuple

    next_page = data.find(class_='next')
    if next_page is None:
        break

    curr_url = base_url + next_page.a['href']

# dicts keep insertion order (Python 3.7+), so rebuilding from sorted items prints the names alphabetically
print(dict(sorted(authors.items())))
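Since Python 3.7, dicts preserve insertion order, so a dict rebuilt from its sorted items prints with its keys in alphabetical order, in the same `{'name': 'date', ...}` shape that a hand-written printing loop would produce. A sketch with hypothetical names and dates; `sorted_dict` is an illustrative helper:

```python
def sorted_dict(d):
    """Return a copy of d whose keys are in sorted order."""
    return dict(sorted(d.items()))

# Hypothetical names and dates, inserted in scrape order rather than alphabetical order.
births = {"Jane Doe": "January 01, 1900", "J. Smith": "February 02, 1901"}
print(sorted_dict(births))
```

This relies on `repr` quoting strings with single quotes, which holds as long as no name or date contains a `'` character.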