# Web Scraping Lab

This notebook contains web scraping exercises to practise your scraping skills.

**Tips:**

- Check the response status code for each request to ensure you have obtained the intended content.
- Print the response text in each request to understand the kind of info you are getting and its format.
- Check for patterns in the response text to extract the data/info requested in each question.
- Visit the URLs below and inspect their source code with Chrome DevTools. You'll need to identify the HTML tags, special class names, etc. used in the HTML content you are expected to extract.
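
The first tip can be sketched with the standard library's `http.HTTPStatus`. The helper names `is_success` and `describe_status` are hypothetical and exist only for this illustration.

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True for any 2xx status code."""
    return 200 <= status_code < 300


def describe_status(status_code):
    """Return the code with its standard reason phrase, e.g. '404 Not Found'."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # The code is not one the standard library knows about
        phrase = "Unknown status"
    return f"{status_code} {phrase}"


for code in (200, 404, 503):
    verdict = "safe to parse" if is_success(code) else "check the request first"
    print(describe_status(code), "->", verdict)
```

With `requests`, you would pass `response.status_code` to these helpers before looking at `response.text`.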

**Resources**:
- [Requests library](http://docs.python-requests.org/en/master/#the-user-guide)
- [Beautiful Soup Doc](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib](https://docs.python.org/3/library/urllib.html#module-urllib)
- [re lib](https://docs.python.org/3/library/re.html)
- [lxml lib](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)

#### Below are the libraries and modules you may need. `requests`, `BeautifulSoup` and `pandas` are already imported for you. Feel free to use additional libraries if you prefer.

In [14]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

#### Download, parse (using BeautifulSoup), and print the content from the Trending Developers page from GitHub:

In [15]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/developers'

In [16]:
# your code here
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")


#### Display the names of the trending developers retrieved in the previous step.

Your output should be a Python list of developer names. The names should not contain any HTML tags.

**Instructions:**

1. Find out the html tag and class names used for the developer names. You can achieve this using Chrome DevTools.

1. Use BeautifulSoup to extract all the html elements that contain the developer names.

1. Use string manipulation techniques to remove extra whitespace and line breaks (i.e. `\n`) from the *text* of each HTML element. Store the clean names in a list.

1. Print the list of names.

Your output should look like this:

```
['trimstray (@trimstray)',
 'joewalnes (JoeWalnes)',
 'charlax (Charles-AxelDein)',
 'ForrestKnight (ForrestKnight)',
 'revery-ui (revery-ui)',
 'alibaba (Alibaba)',
 'Microsoft (Microsoft)',
 'github (GitHub)',
 'facebook (Facebook)',
 'boazsegev (Bo)',
 'google (Google)',
 'cloudfetch',
 'sindresorhus (SindreSorhus)',
 'tensorflow',
 'apache (TheApacheSoftwareFoundation)',
 'DevonCrawford (DevonCrawford)',
 'ARMmbed (ArmMbed)',
 'vuejs (vuejs)',
 'fastai (fast.ai)',
 'QiShaoXuan (Qi)',
 'joelparkerhenderson (JoelParkerHenderson)',
 'torvalds (LinusTorvalds)',
 'CyC2018',
 'komeiji-satori (神楽坂覚々)',
 'script-8']
 ```
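
The cleaning step from the instructions can be sketched with plain string methods. The raw strings below are made up to mimic the padding that scraped text usually carries, and `clean_text` is a hypothetical helper name. Unlike the sample output above, this version keeps a single space between words.

```python
def clean_text(raw):
    """Collapse every run of spaces, tabs and line breaks into one space."""
    return " ".join(raw.split())


# Hypothetical raw element texts, padded the way scraped HTML text often is
raw_names = ["\n      David Tolnay\n    ", "\n  Mr.doob \n\n"]

clean_names = [clean_text(raw) for raw in raw_names]
print(clean_names)
```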

In [22]:
# your code here
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

# Hovering over a developer in Chrome DevTools on https://github.com/trending/developers
# shows the tags needed: <h1> holds the name and the main repo, <p> holds the username.
tags = ['h1', 'p']

# The first two matches are the page header and introduction, so skip them.
text = [element.text for element in soup.find_all(tags)][2:]

# Strip the surrounding whitespace from each string.
new = [item.strip() for item in text]

# Split a list into consecutive chunks of n items.
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

# Pass 3 to chunks to get lists of 3 items (name, user and repo).
newest_text = list(chunks(new, 3))

# Format each chunk as 'name (username)'.
result = [chunk[0] + ' (' + chunk[1] + ')' for chunk in newest_text]

result[:20]

['David Tolnay (dtolnay)',
 'Stephen Celis (stephencelis)',
 'Henrik Rydgård (hrydgard)',
 'Stefan Prodan (stefanprodan)',
 'Matthias Urhahn (d4rken)',
 'Shreyas Patil (PatilShreyas)',
 'Mr.doob (mrdoob)',
 'Christian Muehlhaeuser (muesli)',
 'Anuken (Mindustry)',
 'MathewSachin (Captura)',
 'laurent22 (joplin)',
 'cketti (OkHttpWithContentUri)',
 'aneagoie (ztm-python-cheat-sheet)',
 'yajra (laravel-datatables)',
 'munen (emacs.d)',
 'cpp-httplib (Jared Palmer)',
 'razzle (PySimpleGUI)',
 'PySimpleGUI (Andrew Gallant)',
 'ripgrep (John Arundel)',
 'script (Sindre Sorhus)']
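
Toward the end of the output above, names and handles drift out of step (for example `'cpp-httplib (Jared Palmer)'`), which is what fixed-size chunking produces when a developer contributes fewer than three items. One way around this, sketched here on a small inline sample, is to read each developer card separately. The `<article>` wrapper and the sample markup are assumptions for illustration, not GitHub's actual page structure.

```python
from bs4 import BeautifulSoup

# Made-up markup: one card per developer, the second card has no username
sample_html = """
<article>
  <h1><a href="/dtolnay">
      David Tolnay
  </a></h1>
  <p><a href="/dtolnay">dtolnay</a></p>
</article>
<article>
  <h1><a href="/PySimpleGUI">PySimpleGUI</a></h1>
</article>
<article>
  <h1><a href="/mrdoob">Mr.doob</a></h1>
  <p><a href="/mrdoob">
      mrdoob
  </a></p>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

developers = []
for card in soup.find_all("article"):
    name_tag = card.find("h1")
    handle_tag = card.find("p")
    if name_tag is None:
        continue  # not a developer card
    name = name_tag.get_text(strip=True)
    if handle_tag is not None:
        developers.append(f"{name} ({handle_tag.get_text(strip=True)})")
    else:
        developers.append(name)  # a missing username cannot shift later entries

print(developers)
```

Because every lookup is scoped to one card, a missing field affects only that entry.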

In [39]:
table = soup.find_all('h2', {'class': 'f3 text-normal'})
trending_devs = [dev.text.strip().replace(' ', '').replace('\n\n', ' ') for dev in table]
trending_devs

[]

#### Display the trending Python repositories in GitHub.

The steps to solve this problem are similar to the previous one, except that you need to find the repository names instead of the developer names.

In [40]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/python?since=daily'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

In [41]:
# your code here
articles = soup.find_all('article')
repo = []
articles
for a in articles:
    clean = a.text.strip().replace('\n\n','').split()
    if clean[0] != 'Popular':
        repo.append(clean[1] + clean[2] + clean[3])
print(repo)

['minimaxir/big-list-of-naughty-strings', 'rusty1s/pytorch_geometric', 'espnet/espnet', 'public-apis/public-apis', 'donnemartin/system-design-primer', 'ranjian0/building_tool', 'sherlock-project/sherlock🔎', 'OpenMined/PySyft', 'open-mmlab/mmdetection', 'xillwillx/skiptracer', 'allenai/allennlp', 'explosion/spaCy💫', 'zylo117/Yet-Another-EfficientDet-Pytorch', 'renatoviolin/next_word_prediction', 'anandpawara/Real_Time_Image_Animation', 'pytorch/fairseq', 'lyhue1991/eat_tensorflow2_in_30_days', 'zhanghang1989/ResNeSt', 'Rapptz/discord.py', 'google-research/big_transfer', 'TachibanaYoshino/AnimeGAN', 'shengqiangzhang/examples-of-web-crawlers', 'hunglc007/tensorflow-yolov4-tflite', 'ianzhao05/textshot', 'bitcoinbook/bitcoinbook']


In [42]:
html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")


#On the h1 tag I find the name of the repo
tags = ['h1']
text = [element.text for element in soup.find_all(tags)][1:]
#First element does not interest us. Check the result
#text

#Cleaning the results (step 1)
new = []

for item in text: 
    x = item.strip()     
    new.append(x)

#Check the results
new

#Cleaning (step 2)
newest_list = []

for element in new:
    item = element.replace('\n', '')
    item2 = item.replace('     ', '')
    newest_list.append(item2)

newest_list

['minimaxir / big-list-of-naughty-strings',
 'rusty1s / pytorch_geometric',
 'espnet / espnet',
 'public-apis / public-apis',
 'donnemartin / system-design-primer',
 'ranjian0 / building_tool',
 'sherlock-project / sherlock',
 'OpenMined / PySyft',
 'open-mmlab / mmdetection',
 'xillwillx / skiptracer',
 'allenai / allennlp',
 'explosion / spaCy',
 'zylo117 / Yet-Another-EfficientDet-Pytorch',
 'renatoviolin / next_word_prediction',
 'anandpawara / Real_Time_Image_Animation',
 'pytorch / fairseq',
 'lyhue1991 / eat_tensorflow2_in_30_days',
 'zhanghang1989 / ResNeSt',
 'Rapptz / discord.py',
 'google-research / big_transfer',
 'TachibanaYoshino / AnimeGAN',
 'shengqiangzhang / examples-of-web-crawlers',
 'hunglc007 / tensorflow-yolov4-tflite',
 'ianzhao05 / textshot',
 'bitcoinbook / bitcoinbook']
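
If you prefer the compact `owner/repo` form, the two cleaning passes can be folded into a single regular expression. This is a sketch with a hypothetical helper name, `normalize_repo`; the second input string is made up to show that line breaks around the slash are handled too.

```python
import re


def normalize_repo(name):
    """Remove whitespace around the slash: 'owner / repo' -> 'owner/repo'."""
    return re.sub(r"\s*/\s*", "/", name.strip())


scraped = [
    "minimaxir / big-list-of-naughty-strings",
    "Rapptz /\n      discord.py",  # made-up spacing, as raw element text might look
]

repos = [normalize_repo(name) for name in scraped]
print(repos)
```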

#### Display all the image links from Walt Disney wikipedia page.

In [38]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/Walt_Disney'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

In [39]:
# your code here
images = soup.find_all("img")
result = []
for i in images:
    result.append('https:' + i['src'])
print(result)

['https://upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png', 'https://upload.wikimedia.org/wikipedia/en/thumb/8/8c/Extended-protection-shackle.svg/20px-Extended-protection-shackle.svg.png', 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG', 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Walt_Disney_1942_signature.svg/150px-Walt_Disney_1942_signature.svg.png', 'https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Walt_Disney_envelope_ca._1921.jpg/220px-Walt_Disney_envelope_ca._1921.jpg', 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Newman_Laugh-O-Gram_%281921%29.webm/220px-seek%3D2-Newman_Laugh-O-Gram_%281921%29.webm.jpg', 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Trolley_Troubles_poster.jpg/170px-Trolley_Troubles_poster.jpg', 'https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Walt_Disney_and_his_cartoon_creation_%22Mickey_Mouse%22_-_
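
Prefixing every `src` with `'https:'` works only while all sources are scheme-relative (`//host/path`). A more general sketch resolves each source against the page URL with `urllib.parse.urljoin`. The second and third sources below are hypothetical and are included only to show the other two cases.

```python
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Walt_Disney"

sources = [
    # Scheme-relative, as in the output above
    "//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png",
    # Hypothetical root-relative path
    "/static/images/example.png",
    # Hypothetical absolute URL, which urljoin leaves unchanged
    "https://example.org/already-absolute.png",
]

image_urls = [urljoin(page_url, src) for src in sources]
for image_url in image_urls:
    print(image_url)
```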

#### Retrieve an arbitrary Wikipedia page for "Python" and create a list of links on that page.

In [22]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/Python'

In [23]:
# your code here
html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")
table = soup.find_all('a')
for link in table:
    if 'href' in link.attrs:
        print(link['href'])

#mw-head
#p-search
https://en.wiktionary.org/wiki/Python
https://en.wiktionary.org/wiki/python
#Snakes
#Ancient_Greece
#Media_and_entertainment
#Computing
#Engineering
#Roller_coasters
#Vehicles
#Weaponry
#People
#Other_uses
#See_also
/w/index.php?title=Python&action=edit&section=1
/wiki/Pythonidae
/wiki/Python_(genus)
/w/index.php?title=Python&action=edit&section=2
/wiki/Python_(mythology)
/wiki/Python_of_Aenus
/wiki/Python_(painter)
/wiki/Python_of_Byzantium
/wiki/Python_of_Catana
/w/index.php?title=Python&action=edit&section=3
/wiki/Python_(film)
/wiki/Pythons_2
/wiki/Monty_Python
/wiki/Python_(Monty)_Pictures
/w/index.php?title=Python&action=edit&section=4
/wiki/Python_(programming_language)
/wiki/CPython
/wiki/CMU_Common_Lisp
/wiki/PERQ#PERQ_3
/w/index.php?title=Python&action=edit&section=5
/w/index.php?title=Python&action=edit&section=6
/wiki/Python_(Busch_Gardens_Tampa_Bay)
/wiki/Python_(Coney_Island,_Cincinnati,_Ohio)
/wiki/Python_(Efteling)
/w/index.php?title=Python&action=edi
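
The exercise asks for a list, whereas the cell above prints each link. A possible variant, sketched on a small inline sample, collects the links into a list, skips in-page anchors such as `#Snakes`, and drops duplicates. The sample markup is made up; only the `href` values are taken from the output above.

```python
from bs4 import BeautifulSoup

sample_html = """
<a href="#Snakes">Snakes</a>
<a name="top">anchor without href</a>
<a href="/wiki/Pythonidae">Pythonidae</a>
<a href="https://en.wiktionary.org/wiki/Python">Wiktionary</a>
<a href="/wiki/Pythonidae">Pythonidae again</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

links = []
# href=True keeps only the <a> tags that actually carry an href attribute
for anchor in soup.find_all("a", href=True):
    href = anchor["href"]
    if href.startswith("#") or href in links:
        continue  # skip in-page anchors and links already collected
    links.append(href)

print(links)
```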

#### Find the number of titles that have changed in the United States Code since its last release point.

In [47]:
# This is the url you will scrape in this exercise
url = 'http://uscode.house.gov/download/download.shtml'

In [48]:
# your code here
# In the page source, titles with changes are the only ones marked with the class "usctitlechanged", so count those.
txt = requests.get(url).text
count = txt.count('class="usctitlechanged"')
print(f'Number of titles changed: {count}')

Number of titles changed: 11
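
Counting a raw substring depends on the exact attribute spelling in the page source. As an alternative sketch, BeautifulSoup can match on the class itself. Only the class name `usctitlechanged` comes from the cell above; the surrounding markup and the other class name are invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up markup; only the class "usctitlechanged" is taken from the notebook
sample_html = """
<div>
  <div class="usctitle">Title A</div>
  <div class="usctitlechanged">Title B</div>
  <div class="usctitle">Title C</div>
  <div class="usctitlechanged">Title D</div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ matches any element whose class list contains the given name
changed = soup.find_all(class_="usctitlechanged")
changed_titles = [tag.get_text(strip=True) for tag in changed]

print(f"Number of titles changed: {len(changed)}")
print(changed_titles)
```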


#### Create a Python list with the names on the FBI's Top Ten Most Wanted list.

In [49]:
# This is the url you will scrape in this exercise
url = 'https://www.fbi.gov/wanted/topten'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

In [51]:
# your code here
names = soup.find_all("h3", attrs={"class":"title"})
result = [n.text.strip() for n in names]
result

['JASON DEREK BROWN',
 'ALEXIS FLORES',
 'JOSE RODOLFO VILLARREAL-HERNANDEZ',
 'EUGENE PALMER',
 'RAFAEL CARO-QUINTERO',
 'ROBERT WILLIAM FISHER',
 'BHADRESHKUMAR CHETANBHAI PATEL',
 'ALEJANDRO ROSALES CASTILLO',
 'ARNOLDO JIMENEZ',
 'YASER ABDEL SAID']

#### Display the date, time, latitude, longitude and region name of the 20 latest earthquakes reported by the EMSC as a pandas dataframe.

In [36]:
# This is the url you will scrape in this exercise
url = 'https://www.emsc-csem.org/Earthquake/'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

In [37]:
# your code here
html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")
earthquakes = soup.find('tbody', {'id': 'tbody'}).find_all("tr")

nelem = 20
latest_earthquakes = []

for earthquake in earthquakes[:nelem]:
    # Date and time
    date, time = earthquake.find('td', {'class': 'tabev6'}).find('a').text.split()
    # Latitude and longitude
    lat_deg, lon_deg = earthquake.find_all('td', {'class': 'tabev1'})
    lat_dir, lon_dir, magnitude = earthquake.find_all('td', {'class': 'tabev2'})
    lat_deg = f"{lat_deg.text.strip()} {lat_dir.text.strip()}"
    lon_deg = f"{lon_deg.text.strip()} {lon_dir.text.strip()}"
    # Region
    region = earthquake.find('td', {'class': 'tb_region'}).text.strip()
    # Create list of information and append
    earthquake_summary = [date, time, lat_deg, lon_deg, region]
    latest_earthquakes.append(earthquake_summary)

df = pd.DataFrame(latest_earthquakes, columns=['Date', 'Time', 'Latitude', 'Longitude', 'Region'])
df

Unnamed: 0,Date,Time,Latitude,Longitude,Region
0,2020-11-02,11:09:30.4,32.21 N,116.06 W,"BAJA CALIFORNIA, MEXICO"
1,2020-11-02,11:06:38.0,1.98 S,119.35 E,"SULAWESI, INDONESIA"
2,2020-11-02,10:57:36.8,37.94 N,27.11 E,WESTERN TURKEY
3,2020-11-02,10:50:02.0,24.02 S,67.25 W,"SALTA, ARGENTINA"
4,2020-11-02,10:41:29.3,37.85 N,26.96 E,"DODECANESE ISLANDS, GREECE"
5,2020-11-02,10:37:57.2,37.84 N,26.82 E,"DODECANESE ISLANDS, GREECE"
6,2020-11-02,10:31:56.1,38.05 N,27.28 E,WESTERN TURKEY
7,2020-11-02,10:30:29.4,37.80 N,27.13 E,WESTERN TURKEY
8,2020-11-02,10:05:55.0,24.18 S,67.47 W,"SALTA, ARGENTINA"
9,2020-11-02,09:59:40.6,29.77 S,71.56 W,"OFFSHORE COQUIMBO, CHILE"
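
The latitude and longitude columns above are strings such as `32.21 N`. If you want to sort or plot them, an optional follow-up step is to convert them to signed numbers. The helper `to_signed` is hypothetical and assumes every value has the form `<number> <N|S|E|W>`.

```python
def to_signed(coordinate):
    """Convert '32.21 N' or '116.06 W' to a signed float (south and west are negative)."""
    value, hemisphere = coordinate.split()
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * float(value)


# Values taken from the first two rows of the output above
latitudes = [to_signed(c) for c in ("32.21 N", "1.98 S")]
longitudes = [to_signed(c) for c in ("116.06 W", "119.35 E")]
print(latitudes, longitudes)
```

On the dataframe, the same helper could be applied with `df['Latitude'].map(to_signed)`.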


#### Count the number of tweets by a given Twitter account.
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** for account names not found.
<br>***Hint:*** the program should count the number of tweets for any provided account.

In [34]:
# Twitter no longer serves this data as static HTML; its pages are rendered with JavaScript.

# This is the url you will scrape in this exercise
# You will need to append the account handle to this url
url = 'https://twitter.com/'

In [35]:
# your code here

username = input('Please, input your username: ')
# Drop a leading '@' so the handle can be appended to the url
html = requests.get(url + username.lstrip('@')).content
soup = BeautifulSoup(html, "lxml")

try:
    tweet_box = soup.find('li', {'class': 'ProfileNav-item ProfileNav-item--tweets is-active'})
    tweets = tweet_box.find('a').find('span', {'class': 'ProfileNav-value'})
    print("{} has {} tweets.".format(username, tweets.get('data-count')))
except AttributeError:
    # find() returns None when an element is missing, so the next attribute access fails
    print('Account name not found...')

Please, input your username: @ironhackams
Account name not found...
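
The try/except requirement can be exercised offline on a small inline sample. This sketch assumes the legacy profile markup that the selectors above target; as noted in the cell comments, live pages are now rendered with JavaScript, so it will not match them. The function names and the `data-count` value are hypothetical.

```python
from bs4 import BeautifulSoup


def profile_url(handle, base="https://twitter.com/"):
    """Build the profile URL, tolerating a leading '@' in the handle."""
    return base + handle.strip().lstrip("@")


def count_from_profile(html):
    """Return the tweet count from legacy profile markup, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    try:
        nav_item = soup.find("li", class_="ProfileNav-item--tweets")
        value = nav_item.find("span", class_="ProfileNav-value")
        return int(value["data-count"])
    except (AttributeError, TypeError, KeyError, ValueError):
        # A missing element (None) or a missing or non-numeric attribute ends up here
        return None


# Made-up sample in the shape of the legacy markup
legacy_html = """
<li class="ProfileNav-item ProfileNav-item--tweets is-active">
  <a><span class="ProfileNav-value" data-count="1234">1,234</span></a>
</li>
"""

print(profile_url("@ironhackams"))
print(count_from_profile(legacy_html))
print(count_from_profile("<html><body></body></html>"))
```

Catching specific exception types, rather than using a bare `except`, keeps unrelated problems such as network errors visible.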


#### Count the number of followers of a given Twitter account.
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** for account names not found.
<br>***Hint:*** the program should count the followers for any provided account.

In [45]:
# Twitter no longer serves this data as static HTML; its pages are rendered with JavaScript.

# This is the url you will scrape in this exercise
# You will need to append the account handle to this url
url = 'https://twitter.com/'


In [46]:
# your code here

username = input('Please, input your username: ')
# Drop a leading '@' so the handle can be appended to the url
html = requests.get(url + username.lstrip('@')).content
soup = BeautifulSoup(html, "lxml")

try:
    followers_box = soup.find('li', {'class': 'ProfileNav-item ProfileNav-item--followers'})
    followers = followers_box.find('a').find('span', {'class': 'ProfileNav-value'})
    print("{} has {} followers.".format(username, followers.get('data-count')))
except AttributeError:
    # find() returns None when an element is missing, so the next attribute access fails
    print('Account name not found...')

Please, input your username: ironhackams
Account name not found...


#### List all language names and number of related articles in the order they appear in wikipedia.org.

In [11]:
# This is the url you will scrape in this exercise
url = 'https://www.wikipedia.org/'

In [12]:
# your code here


html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")

languages = soup.find_all('a', {'class': 'link-box'})
for language in languages:
    print(language.text.strip())

English
6 168 000+ articles
Español
1 630 000+ artículos
日本語
1 231 000+ 記事
Deutsch
2 486 000+ Artikel
Русский
1 665 000+ статей
Français
2 254 000+ articles
Italiano
1 639 000+ voci
中文
1 150 000+ 條目
Português
1 044 000+ artigos
العربية
1 068 000+ مقالة


#### Create a list of the different kinds of datasets available on data.gov.uk.

In [13]:
# This is the url you will scrape in this exercise
url = 'https://data.gov.uk/'


In [14]:
# your code here


html = requests.get(url).content
soup = BeautifulSoup(html,"lxml")
topics = soup.find_all('h2')
for topic in topics:
    print(topic.text)

Tell us whether you accept cookies
Data topics
Support links


#### Display the top 10 languages by number of native speakers stored in a pandas dataframe.

In [41]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'

In [42]:
# your code here


html = requests.get(url).content
soup = BeautifulSoup(html,"lxml")
languages = soup.find('table', {'class': 'wikitable sortable'}).find_all('a', attrs={'title': True})
lang = []
for i in range(10):
    lang.append(languages[i].text)
lang_df = pd.DataFrame(lang)
lang_df

Unnamed: 0,0
0,Mandarin Chinese
1,Sino-Tibetan
2,Sinitic
3,Spanish
4,Indo-European
5,Romance
6,English
7,Indo-European
8,Germanic
9,Hindi


## Bonus
#### Scrape a certain number of tweets of a given Twitter account.

In [52]:
# Twitter no longer serves this data as static HTML; its pages are rendered with JavaScript.

# This is the url you will scrape in this exercise
# You will need to append the account handle to this url
url = 'https://twitter.com/'

In [53]:
# your code here

username = input('Please, input your username: ')
n_tweets = int(input('Input number of tweets to scrape: '))
html = requests.get(url + username).content
soup = BeautifulSoup(html, "lxml")

all_tweets = soup.find_all('div', {'class':'tweet'})

if all_tweets:
    for tweet in all_tweets[0:n_tweets]:
        name = tweet.find('span', {'class': 'FullNameGroup'}).find('strong')
        username = tweet.find('span', {'class': 'username'})
        time = tweet.find('small', {'class': 'time'})
        content = tweet.find('p', {'class': 'TweetTextSize TweetTextSize--normal js-tweet-text tweet-text'})
        statistics = tweet.find('div', {'class': 'ProfileTweet-actionCountList u-hiddenVisually'})
        
        print(f'\n{name.text} {username.text} {time.text.strip()}')
        print(content.text)
        print(statistics.text.strip().replace('\n', ' '))
else:
    print('Account name not found or tweet list is empty...')

Please, input your username: ironhackams
Input number of tweets to scrape: 3
Account name not found or tweet list is empty...


#### Display IMDb's Top 250 data (movie name, initial release, director name and stars) as a pandas dataframe.

In [25]:
# This is the url you will scrape in this exercise 
url = 'https://www.imdb.com/chart/top'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

In [26]:
# your code here
html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")

movies = soup.find_all('td', {'class':'titleColumn'})
titles = [movie.find('a').text for movie in movies]
years = [movie.find('span').text[1:-1] for movie in movies]
directors = [movie.find('a').get('title').split(',')[0][:-7] for movie in movies]
actors = [' & '.join(movie.find('a').get('title').split(',')[1:]) for movie in movies]

movies_dict = {'Title': titles, 'Release': years, 'Director': directors, 'Actors': actors}

movies_df = pd.DataFrame(movies_dict)
movies_df

Unnamed: 0,Title,Release,Director,Actors
0,The Shawshank Redemption,1994,Frank Darabont,Tim Robbins & Morgan Freeman
1,The Godfather,1972,Francis Ford Coppola,Marlon Brando & Al Pacino
2,The Godfather: Part II,1974,Francis Ford Coppola,Al Pacino & Robert De Niro
3,The Dark Knight,2008,Christopher Nolan,Christian Bale & Heath Ledger
4,12 Angry Men,1957,Sidney Lumet,Henry Fonda & Lee J. Cobb
...,...,...,...,...
245,The Battle of Algiers,1966,Gillo Pontecorvo,Brahim Hadjadj & Jean Martin
246,The Terminator,1984,James Cameron,Arnold Schwarzenegger & Linda Hamilton
247,Aladdin,1992,Ron Clements,Scott Weinger & Robin Williams
248,Winter Sleep,2014,Nuri Bilge Ceylan,Haluk Bilginer & Melisa Sözen


#### Display the movie name, year and a brief summary of 10 random movies from IMDb's Top 250 as a pandas dataframe.

In [27]:
#This is the url you will scrape in this exercise
url = 'http://www.imdb.com/chart/top'


In [28]:
# your code here


from random import shuffle

n_random = 10

html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")
movies = soup.find_all('td', {'class':'titleColumn'})

shuffle(movies)

titles = [movie.find('a').text for movie in movies[0:n_random]]
years = [movie.find('span').text[1:-1] for movie in movies[0:n_random]]
links_to_movies = [movie.find('a').get('href') for movie in movies[0:n_random]]

summary = []
for link in links_to_movies:
    html = requests.get('https://www.imdb.com' + link).content
    soup = BeautifulSoup(html, "lxml")
    summary.append(soup.find('div', {'class':'summary_text'}).text.strip())

movies_dict = {'Title': titles, 'Release': years, 'Summary': summary}

movies_df = pd.DataFrame(movies_dict)
movies_df



Unnamed: 0,Title,Release,Summary
0,Up,2009,78-year-old Carl Fredricksen travels to Paradi...
1,Fight Club,1999,An insomniac office worker and a devil-may-car...
2,To Be or Not to Be,1942,"During the Nazi occupation of Poland, an actin..."
3,American Beauty,1999,A sexually frustrated suburban father has a mi...
4,"Lock, Stock and Two Smoking Barrels",1998,A botched card game in London triggers four fr...
5,Aladdin,1992,A kindhearted street urchin and a power-hungry...
6,3 Idiots,2009,Two friends are searching for their long lost ...
7,Room,2015,"Held captive for 7 years in an enclosed space,..."
8,Stand by Me,1986,"After the death of one of his friends, a write..."
9,Downfall,2004,"Traudl Junge, the final secretary for Adolf Hi..."
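
Shuffling the whole list and slicing it works, but it reorders the original list in place. As an alternative sketch, `random.sample` picks distinct items and leaves the source untouched. The titles below are taken from the output above; the fixed seed is there only to make the example repeatable.

```python
import random

titles = ["Up", "Fight Club", "American Beauty", "Aladdin", "3 Idiots", "Room"]

# A seeded generator gives the same picks on every run; drop the seed for real randomness
rng = random.Random(42)
picked = rng.sample(titles, k=3)

print(picked)
print(titles)  # the original order is preserved
```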


#### Find the live weather report (temperature, wind speed, description and weather) of a given city.

In [29]:
# your code here


city = input('Enter the city: ').lower()
url = 'http://api.openweathermap.org/data/2.5/weather?'+'q='+city+'&APPID=b35975e18dc93725acb092f7272cc6b8&units=metric'
weather_json = requests.get(url).json()

print("\n{}'s temperature: {}°C ".format(city.capitalize(), weather_json['main']['temp']))
print("Wind speed: {} m/s".format(weather_json['wind']['speed']))
print("Description: {}".format(weather_json['weather'][0]['description'].capitalize()))
print("Weather: {}".format(weather_json['weather'][0]['main'].capitalize()))

Enter the city: Pittsburgh

Pittsburgh's temperature: 0.07°C 
Wind speed: 5.7 m/s
Description: Overcast clouds
Weather: Clouds
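
Concatenating the query string by hand breaks for city names that contain spaces or other special characters. The sketch below builds the URL with `urllib.parse.urlencode` and reads the fields defensively. `build_url`, `summarize`, the `YOUR_API_KEY` placeholder and both sample payloads are hypothetical; the payload shape mirrors the keys used in the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_url(city, api_key):
    """Return the request URL with every query parameter properly encoded."""
    query = urlencode({"q": city, "APPID": api_key, "units": "metric"})
    return f"{BASE_URL}?{query}"


def summarize(weather_json):
    """Extract the four requested fields, or return None if any of them is missing."""
    try:
        return {
            "temperature": weather_json["main"]["temp"],
            "wind_speed": weather_json["wind"]["speed"],
            "description": weather_json["weather"][0]["description"].capitalize(),
            "weather": weather_json["weather"][0]["main"],
        }
    except (KeyError, IndexError, TypeError):
        return None


# Sample payloads shaped like the keys accessed above (values from the printed output)
sample = {
    "main": {"temp": 0.07},
    "wind": {"speed": 5.7},
    "weather": [{"main": "Clouds", "description": "overcast clouds"}],
}
error_payload = {"cod": "404", "message": "city not found"}

print(build_url("New York", "YOUR_API_KEY"))
print(summarize(sample))
print(summarize(error_payload))
```

With `requests`, the same parameters can be passed as `requests.get(BASE_URL, params={...})`, which performs the encoding for you.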


#### Find the book name, price and stock availability as a pandas dataframe.

In [30]:
# This is the url you will scrape in this exercise. 
# It is a fictional bookstore created to be scraped. 
url = 'http://books.toscrape.com/'

In [31]:
# your code here

html = requests.get(url).content
soup = BeautifulSoup(html, "lxml")
books = soup.find_all('article', {'class': 'product_pod'})

titles = [book.find('h3').text for book in books]
prices = [book.find('p', {'class': 'price_color'}).text for book in books]
stock = [book.find('p', {'class': 'instock availability'}).text.strip() for book in books]

books_dict = {'Title': titles, 'Price': prices, 'Stock': stock}

books_df = pd.DataFrame(books_dict)
books_df

Unnamed: 0,Title,Price,Stock
0,A Light in the ...,£51.77,In stock
1,Tipping the Velvet,£53.74,In stock
2,Soumission,£50.10,In stock
3,Sharp Objects,£47.82,In stock
4,Sapiens: A Brief History ...,£54.23,In stock
5,The Requiem Red,£22.65,In stock
6,The Dirty Little Secrets ...,£33.34,In stock
7,The Coming Woman: A ...,£17.93,In stock
8,The Boys in the ...,£22.60,In stock
9,The Black Maria,£52.15,In stock
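
Several titles in the output above are truncated (`A Light in the ...`) because the visible link text is shortened, and the prices are still strings. The sketch below, run on a small inline sample, reads the link's `title` attribute instead and converts the price to a number. The sample assumes the link carries the full title in that attribute, and its `href` is invented.

```python
from bs4 import BeautifulSoup

# Sample card modelled on the classes used above; the href is made up
sample_html = """
<article class="product_pod">
  <h3><a href="catalogue/example/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability">
    In stock
  </p>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for book in soup.find_all("article", class_="product_pod"):
    link = book.find("h3").find("a")
    # Prefer the full title from the attribute, fall back to the visible text
    title = link.get("title", link.get_text(strip=True))
    price_text = book.find("p", class_="price_color").get_text(strip=True)
    price = float(price_text.lstrip("£"))
    stock = book.find("p", class_="availability").get_text(strip=True)
    rows.append({"Title": title, "Price": price, "Stock": stock})

print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`.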
