# Web Scraping Lab

This notebook contains web scraping exercises to help you practise your scraping skills.

**Tips:**

- Check the response status code for each request to ensure you have obtained the intended content.
- Print the response text in each request to understand the kind of info you are getting and its format.
- Check for patterns in the response text to extract the data/info requested in each question.
- Visit the URLs below and inspect their source code with Chrome DevTools. You'll need to identify the HTML tags, special class names, etc. used in the HTML content you are expected to extract.
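
The first tip can be sketched with the standard library alone. The helper names `describe_status` and `is_success` are hypothetical; they assume you pass in the integer from `response.status_code`, and they only label the code without making any request.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a readable label such as '404 Not Found' for an HTTP status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Non-standard code that the standard library does not know about.
        phrase = "Unknown"
    return f"{status_code} {phrase}"


def is_success(status_code):
    """True for 2xx codes, which usually mean the intended content was returned."""
    return 200 <= status_code < 300


for code in (200, 404):
    verdict = "success" if is_success(code) else "check the request"
    print(describe_status(code), "->", verdict)
```

With `requests`, calling `response.raise_for_status()` is an alternative if you would rather have an exception raised for 4xx and 5xx responses.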

**Resources**:
- [Requests library](http://docs.python-requests.org/en/master/#the-user-guide)
- [Beautiful Soup Doc](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib](https://docs.python.org/3/library/urllib.html#module-urllib)
- [re lib](https://docs.python.org/3/library/re.html)
- [lxml lib](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)

#### Below are the libraries and modules you may need. `requests`, `BeautifulSoup` and `pandas` are already imported for you. If you prefer to use additional libraries, feel free to do so.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

#### Download, parse (using BeautifulSoup), and print the content from the Trending Developers page from GitHub:

In [2]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/developers'
response = requests.get(url)
response

<Response [200]>

In [3]:
html = response.content
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(html, 'html.parser')

#### Display the names of the trending developers retrieved in the previous step.

Your output should be a Python list of developer names. No name should contain any HTML tags.

**Instructions:**

1. Find out the html tag and class names used for the developer names. You can achieve this using Chrome DevTools.

1. Use BeautifulSoup to extract all the html elements that contain the developer names.

1. Use string manipulation techniques to replace whitespaces and linebreaks (i.e. `\n`) in the *text* of each html element. Use a list to store the clean names.

1. Print the list of names.

Your output should look like this:

```
['trimstray (@trimstray)',
 'joewalnes (JoeWalnes)',
 'charlax (Charles-AxelDein)',
 'ForrestKnight (ForrestKnight)',
 'revery-ui (revery-ui)',
 'alibaba (Alibaba)',
 'Microsoft (Microsoft)',
 'github (GitHub)',
 'facebook (Facebook)',
 'boazsegev (Bo)',
 'google (Google)',
 'cloudfetch',
 'sindresorhus (SindreSorhus)',
 'tensorflow',
 'apache (TheApacheSoftwareFoundation)',
 'DevonCrawford (DevonCrawford)',
 'ARMmbed (ArmMbed)',
 'vuejs (vuejs)',
 'fastai (fast.ai)',
 'QiShaoXuan (Qi)',
 'joelparkerhenderson (JoelParkerHenderson)',
 'torvalds (LinusTorvalds)',
 'CyC2018',
 'komeiji-satori (神楽坂覚々)',
 'script-8']
 ```
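
Step 3 of the instructions can be sketched as follows. This is one possible approach, not the only one: the hypothetical helper `clean_text` collapses every run of whitespace into a single space, whereas the sample output above removes the spaces inside names entirely. The raw strings are made-up stand-ins for the `.text` of an HTML element.

```python
def clean_text(raw):
    """Collapse runs of spaces, tabs and line breaks into single spaces."""
    # str.split() with no argument splits on any whitespace and drops empties.
    return " ".join(raw.split())


raw_names = ["\n            Vadim Demedes\n", "\n  Kent  C.\tDodds \n"]
clean_names = [clean_text(name) for name in raw_names]
print(clean_names)  # ['Vadim Demedes', 'Kent C. Dodds']
```

If you want the names with no spaces at all, `"".join(raw.split())` follows the same idea.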

In [4]:
soup.find_all('h1',attrs={"class":"h3 lh-condensed"})

[<h1 class="h3 lh-condensed">
 <a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":697676,"originating_url":"https://github.com/trending/developers","user_id":null}}' data-hydro-click-hmac="24f1e07d51c83be96fbfa52a77edfd19be96ea3a4952fe00aba0d8e7c8c6d8f0" href="/vadimdemedes">
             Vadim Demedes
 </a>
 </h1>,
 <h1 class="h3 lh-condensed">
 <a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":173661,"originating_url":"https://github.com/trending/developers","user_id":null}}' data-hydro-click-hmac="e5f8b488975347353146deff5d76f57f9c5823c58166af008ff3dade354f7a3a" href="/kripken">
             Alon Zakai
 </a>
 </h1>,
 <h1 class="h3 lh-condensed">
 <a data-hydro-click='{"event_

In [5]:
soup.find_all('h1',attrs={"class":"h3 lh-condensed"})[0]

<h1 class="h3 lh-condensed">
<a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":697676,"originating_url":"https://github.com/trending/developers","user_id":null}}' data-hydro-click-hmac="24f1e07d51c83be96fbfa52a77edfd19be96ea3a4952fe00aba0d8e7c8c6d8f0" href="/vadimdemedes">
            Vadim Demedes
</a>
</h1>

In [6]:
soup.find_all('h1',attrs={"class":"h3 lh-condensed"})[0].text.strip()

'Vadim Demedes'

In [7]:
names= [name.text.strip() for name in soup.find_all('h1',attrs={"class":"h3 lh-condensed"})]
names

['Vadim Demedes',
 'Alon Zakai',
 'Kentaro Wada',
 'Tianon Gravi',
 'Iain Collins',
 'Hadley Wickham',
 'Björn Rabenstein',
 'Caleb Porzio',
 'James Agnew',
 'William Candillon',
 'Łukasz Magiera',
 'Taylor Otwell',
 'Ivan Paulovich',
 'John Ryan',
 'Ines Montani',
 'Mislav Marohnić',
 'Filip Skokan',
 'Arvid Norberg',
 'Rico Suter',
 'Kent C. Dodds',
 'Patrick Hulce',
 'Paul Beusterien',
 'Minko Gechev',
 'Greg Bergé',
 'Han Xiao']

In [8]:
soup.find_all('p')[2].text.strip()

'kripken'

In [9]:
tag_h1= soup.find_all('h1',attrs={"class":"h3 lh-condensed"})
tag_h1[0].find('a')['href'].replace("/","")

'vadimdemedes'

In [10]:
tag_h1[0]

<h1 class="h3 lh-condensed">
<a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":697676,"originating_url":"https://github.com/trending/developers","user_id":null}}' data-hydro-click-hmac="24f1e07d51c83be96fbfa52a77edfd19be96ea3a4952fe00aba0d8e7c8c6d8f0" href="/vadimdemedes">
            Vadim Demedes
</a>
</h1>

In [11]:
tag_h1[0].find('a')

<a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":697676,"originating_url":"https://github.com/trending/developers","user_id":null}}' data-hydro-click-hmac="24f1e07d51c83be96fbfa52a77edfd19be96ea3a4952fe00aba0d8e7c8c6d8f0" href="/vadimdemedes">
            Vadim Demedes
</a>

In [12]:
tag_h1[0].find('a')['href']

'/vadimdemedes'

---

In [13]:
# List of all developers as 'username (Display Name)'
[f"{name.find('a')['href'].replace('/', '')} ({name.text.strip()})"
 for name in soup.find_all('h1', attrs={"class": "h3 lh-condensed"})]

['vadimdemedes (Vadim Demedes)',
 'kripken (Alon Zakai)',
 'wkentaro (Kentaro Wada)',
 'tianon (Tianon Gravi)',
 'iaincollins (Iain Collins)',
 'hadley (Hadley Wickham)',
 'beorn7 (Björn Rabenstein)',
 'calebporzio (Caleb Porzio)',
 'jamesagnew (James Agnew)',
 'wcandillon (William Candillon)',
 'magik6k (Łukasz Magiera)',
 'taylorotwell (Taylor Otwell)',
 'ivanpaulovich (Ivan Paulovich)',
 'johnpryan (John Ryan)',
 'ines (Ines Montani)',
 'mislav (Mislav Marohnić)',
 'panva (Filip Skokan)',
 'arvidn (Arvid Norberg)',
 'RicoSuter (Rico Suter)',
 'kentcdodds (Kent C. Dodds)',
 'patrickhulce (Patrick Hulce)',
 'paulb777 (Paul Beusterien)',
 'mgechev (Minko Gechev)',
 'gregberge (Greg Bergé)',
 'hanxiao (Han Xiao)']

#### Display the trending Python repositories in GitHub.

The steps to solve this problem are similar to the previous one, except that you need to find the repository names instead of the developer names.

In [14]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/python?since=daily'

In [15]:
response = requests.get(url)
response

<Response [200]>

In [16]:
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [17]:
tag_h1 = soup.find_all('h1',attrs={"class":"h3 lh-condensed"})[0]
tag_h1

<h1 class="h3 lh-condensed">
<a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":268039885,"originating_url":"https://github.com/trending/python?since=daily","user_id":null}}' data-hydro-click-hmac="444200bd1160752af74f720cf48d3af3565e7bc6ca7028ed2d7b689eaa5987ae" href="/schenkd/nginx-ui">
<svg aria-hidden="true" class="octicon octicon-repo mr-1 text-gray" color="gray" height="16" mr="1" version="1.1" viewbox="0 0 16 16" width="16"><path d="M2 2.5A2.5 2.5 0 014.5 0h8.75a.75.75 0 01.75.75v12.5a.75.75 0 01-.75.75h-2.5a.75.75 0 110-1.5h1.75v-2h-8a1 1 0 00-.714 1.7.75.75 0 01-1.072 1.05A2.495 2.495 0 012 11.5v-9zm10.5-1V9h-8c-.356 0-.694.074-1 .208V2.5a1 1 0 011-1h8zM5 12.25v3.25a.25.25 0 00.4.2l1.45-1.087a.25.25 0 01.3 0L8.6 15.7a.25.25 0 00.4-.2v-3.25a.25.25 0 00-.25-.25h-3.5a.25.25 0 00-.25.25z" fill-rule="evenodd"></path>

In [18]:
tag_h1.find_all('a')[0]['href']

'/schenkd/nginx-ui'

In [19]:
[rep_name.find_all('a')[0]['href'] for rep_name in soup.find_all('h1',attrs={"class":"h3 lh-condensed"})] 

['/schenkd/nginx-ui',
 '/vt-vl-lab/3d-photo-inpainting',
 '/keras-team/keras',
 '/google/jax',
 '/EdyJ/blender-to-unity-fbx-exporter',
 '/sqlmapproject/sqlmap',
 '/fighting41love/funNLP',
 '/minivision-ai/photo2cartoon',
 '/psf/black',
 '/opencv/open_model_zoo',
 '/python/typeshed',
 '/2020PB/police-brutality',
 '/wbt5/real-url',
 '/Kr1s77/awesome-python-login-model',
 '/facebookresearch/moco',
 '/trailofbits/algo',
 '/jackzhenguo/python-small-examples',
 '/docker/compose',
 '/stellargraph/stellargraph',
 '/saltstack/salt',
 '/celery/celery',
 '/dbader/schedule',
 '/dxa4481/truffleHog',
 '/d2l-ai/d2l-en',
 '/python/cpython']
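
Each `href` above has the form `/owner/repo`. As a small sketch, assuming that shape always holds, the hypothetical helper `split_repo_href` separates the two parts so you can display the repository name on its own.

```python
def split_repo_href(href):
    """Split '/owner/repo' into an (owner, repo) tuple."""
    owner, repo = href.strip("/").split("/", 1)
    return owner, repo


hrefs = ["/schenkd/nginx-ui", "/keras-team/keras", "/psf/black"]
for owner, repo in map(split_repo_href, hrefs):
    print(f"{repo} (by {owner})")
```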

#### Display all the image links from the Walt Disney Wikipedia page.

In [20]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/Walt_Disney'
response = requests.get(url)
response

<Response [200]>

In [21]:
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [22]:
soup.find_all("img")[0]

<img alt="This is a featured article. Click here for more information." data-file-height="438" data-file-width="462" decoding="async" height="19" src="//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/30px-Cscr-featured.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/40px-Cscr-featured.svg.png 2x" width="20"/>

In [23]:
soup.find_all("img")[0]['src']

'//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png'

In [24]:
[link['src'] for link in soup.find_all("img")]

['//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png',
 '//upload.wikimedia.org/wikipedia/en/thumb/8/8c/Extended-protection-shackle.svg/20px-Extended-protection-shackle.svg.png',
 '//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG',
 '//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Walt_Disney_1942_signature.svg/150px-Walt_Disney_1942_signature.svg.png',
 '//upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Walt_Disney_envelope_ca._1921.jpg/220px-Walt_Disney_envelope_ca._1921.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Newman_Laugh-O-Gram_%281921%29.webm/220px-seek%3D2-Newman_Laugh-O-Gram_%281921%29.webm.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Trolley_Troubles_poster.jpg/170px-Trolley_Troubles_poster.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/7/71/Walt_Disney_and_his_cartoon_creation_%22Mickey_Mouse%22_-_National_Board_of_Review_Magazine.jpg/170
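
The `src` values above start with `//`, so they are protocol-relative and cannot be opened as they stand. A minimal sketch of turning them into absolute URLs with the standard library follows; the helper name `to_absolute` is hypothetical, and the page URL is the one used in this exercise.

```python
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Walt_Disney"


def to_absolute(link, base=page_url):
    """Resolve a relative or protocol-relative link against the page URL."""
    # For '//host/path' values urljoin borrows the scheme (https) from the base.
    return urljoin(base, link)


src = "//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG"
print(to_absolute(src))
```

The same call also resolves site-relative links such as `/wiki/...` against the page's host.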

#### Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page.

In [25]:
# This is the url you will scrape in this exercise
url ='https://en.wikipedia.org/wiki/Python' 
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [26]:
# soup.find_all('a')[0]['href']   # raises KeyError: the first <a> has no href

In [27]:
soup.find_all('a')[2]['href']     # in-page anchor, not a link to another page

'#p-search'

In [28]:
soup.find_all('a')[3]['href']     # from the 4th element on, the hrefs are real links

'https://en.wiktionary.org/wiki/Python'

In [29]:
# for item in soup.find_all('a'):
#     if 'href' in item.attrs:
#         print(item['href'])

In [30]:
for item in soup.find_all('a'):
    print(item)

# Note that the first <a> element has no 'href' attribute,
# so item['href'] must be wrapped in try/except

<a id="top"></a>
<a class="mw-jump-link" href="#mw-head">Jump to navigation</a>
<a class="mw-jump-link" href="#p-search">Jump to search</a>
<a class="extiw" href="https://en.wiktionary.org/wiki/Python" title="wiktionary:Python">Python</a>
<a class="extiw" href="https://en.wiktionary.org/wiki/python" title="wiktionary:python">python</a>
<a href="#Snakes"><span class="tocnumber">1</span> <span class="toctext">Snakes</span></a>
<a href="#Ancient_Greece"><span class="tocnumber">2</span> <span class="toctext">Ancient Greece</span></a>
<a href="#Media_and_entertainment"><span class="tocnumber">3</span> <span class="toctext">Media and entertainment</span></a>
<a href="#Computing"><span class="tocnumber">4</span> <span class="toctext">Computing</span></a>
<a href="#Engineering"><span class="tocnumber">5</span> <span class="toctext">Engineering</span></a>
<a href="#Roller_coasters"><span class="tocnumber">5.1</span> <span class="toctext">Roller coasters</span></a>
<a href="#Vehicles"><span clas

In [31]:
links_lst = []
for item in soup.find_all('a'):
    try:
        href = item['href']
    except KeyError:
        continue  # this <a> has no href attribute
    # keep internal wiki links and absolute URLs only
    if href.startswith(('/wiki', 'http')):
        links_lst.append(href)
links_lst = list(set(links_lst))  # drop duplicates (order is not preserved)
links_lst

['/wiki/Special:RecentChanges',
 'https://uk.wikipedia.org/wiki/%D0%9F%D1%96%D1%84%D0%BE%D0%BD',
 'https://www.wikidata.org/wiki/Special:EntityPage/Q747452#sitelinks-wikipedia',
 'https://pl.wikipedia.org/wiki/Pyton',
 '/wiki/Python_of_Aenus',
 'https://en.wikipedia.org/w/index.php?title=Python&oldid=963092579',
 '/wiki/Portal:Current_events',
 '/wiki/Python_(film)',
 '/wiki/Python_(Coney_Island,_Cincinnati,_Ohio)',
 '/wiki/Category:Animal_common_name_disambiguation_pages',
 '/wiki/PERQ#PERQ_3',
 'https://la.wikipedia.org/wiki/Python_(discretiva)',
 '/wiki/Category:Disambiguation_pages',
 'https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Python&namespace=0',
 'https://lb.wikipedia.org/wiki/Python',
 '/wiki/Python_(missile)',
 '/wiki/Python_(Busch_Gardens_Tampa_Bay)',
 '/wiki/Help:Contents',
 'https://af.wikipedia.org/wiki/Python',
 '/wiki/Cython',
 '/wiki/Python_(genus)',
 'https://en.wiktionary.org/wiki/Python',
 'https://ur.wikipedia.org/wiki/%D9%BE%D8%A7%D8%A6%DB%8C%
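
Deduplicating with `list(set(...))` scrambles the order in which the links appear on the page, as the output above shows. If page order matters to you, one alternative is sketched below; the helper name `unique_in_order` and the sample list are illustrative only.

```python
def unique_in_order(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


sample_links = ["/wiki/Cython", "/wiki/Python_(film)", "/wiki/Cython", "/wiki/Python_(genus)"]
print(unique_in_order(sample_links))
```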

#### Find the number of titles that have changed in the United States Code since its last release point.

In [32]:
# This is the url you will scrape in this exercise
url = 'http://uscode.house.gov/download/download.shtml'
response=requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [33]:
# Note that Titles in bold have been changed since the last release point
tag_div= soup.find_all("div", attrs={'class':'usctitlechanged'})[0]
tag_div

<div class="usctitlechanged" id="us/usc/t22">

          Title 22 - Foreign Relations and Intercourse

        </div>

In [34]:
tag_div.text.strip()

'Title 22 - Foreign Relations and Intercourse'

In [35]:
# Titles shown in bold (class 'usctitlechanged') have changed since the last release point
[title.text.strip() for title in soup.find_all("div", attrs={'class':'usctitlechanged'})]

['Title 22 - Foreign Relations and Intercourse']
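
The exercise asks for the number of changed titles, which is simply the length of that list. The sketch below works on a copy of the output above rather than on the live page. The helper `parse_title` is hypothetical, and it assumes every entry follows the `Title <number> - <name>` pattern.

```python
import re

# Sample copied from the output above
changed_titles = ["Title 22 - Foreign Relations and Intercourse"]


def parse_title(text):
    """Split 'Title 22 - Name' into (number, name); return None if it does not match."""
    match = re.match(r"Title\s+(\w+)\s+-\s+(.+)", text)
    return (match.group(1), match.group(2)) if match else None


print("Number of changed titles:", len(changed_titles))
print([parse_title(title) for title in changed_titles])
```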

#### Create a Python list with the names of the FBI's Ten Most Wanted.

In [36]:
# This is the url you will scrape in this exercise
url = 'https://www.fbi.gov/wanted/topten'
response=requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [37]:
soup.find_all('h3')[0].text.strip()

'YASER ABDEL SAID'

In [38]:
[most_wanted_name.text.strip() for most_wanted_name in soup.find_all('h3')]

['YASER ABDEL SAID',
 'ALEXIS FLORES',
 'EUGENE PALMER',
 'SANTIAGO VILLALBA MEDEROS',
 'RAFAEL CARO-QUINTERO',
 'ROBERT WILLIAM FISHER',
 'BHADRESHKUMAR CHETANBHAI PATEL',
 'ALEJANDRO ROSALES CASTILLO',
 'ARNOLDO JIMENEZ',
 'JASON DEREK BROWN']

#### Display the info for the 20 latest earthquakes (date, time, latitude, longitude and region name) reported by the EMSC as a pandas dataframe.

In [39]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [40]:
# This is the url you will scrape in this exercise
url = 'https://www.emsc-csem.org/Earthquake/'
response=requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [41]:
soup.find_all("td",attrs={"class":"tabev6"})[0].find('a').text.split() # date and time samples

['2020-06-24', '15:17:50.0']

---

In [42]:
soup.find_all('td',attrs={"class":"tabev1"})[0].text.strip()  # latitude sample

'20.86'

In [43]:
soup.find_all('td',attrs={"class":"tabev1"})[1].text.strip()  # longitude sample

'69.11'

In [44]:
soup.find_all('td',attrs={"class":"tabev2"})[0].text.strip()  # Latitude N / S

'S'

In [45]:
soup.find_all('td',attrs={"class":"tabev2"})[1].text.strip()  # longitude W / E

'W'

In [46]:
soup.find_all('td',attrs={"class":"tabev2"})[2].text.strip()  # Magnitude sample

'2.9'

In [47]:
soup.find_all('td',attrs={"class":"tabev3"})[0].text          # depth sample

'106'

In [48]:
soup.find_all('td',attrs={"class":"tb_region"})[0].text.strip() # region name sample

'TARAPACA, CHILE'

In [49]:
# list comprehension to retrieve date and time 
lst_date_and_time = [item.find('a').text.split() for item in soup.find_all("td",attrs={"class":"tabev6"})]
print(lst_date_and_time)

[['2020-06-24', '15:17:50.0'], ['2020-06-24', '15:07:39.9'], ['2020-06-24', '14:58:06.6'], ['2020-06-24', '14:27:49.9'], ['2020-06-24', '13:52:07.9'], ['2020-06-24', '13:48:44.6'], ['2020-06-24', '13:43:53.0'], ['2020-06-24', '13:37:44.0'], ['2020-06-24', '13:34:57.4'], ['2020-06-24', '13:34:30.0'], ['2020-06-24', '13:25:38.3'], ['2020-06-24', '13:24:05.6'], ['2020-06-24', '13:14:33.7'], ['2020-06-24', '12:51:21.8'], ['2020-06-24', '12:32:10.0'], ['2020-06-24', '12:27:49.5'], ['2020-06-24', '12:27:11.6'], ['2020-06-24', '12:19:38.2'], ['2020-06-24', '12:16:16.0'], ['2020-06-24', '12:14:52.3'], ['2020-06-24', '12:11:18.0'], ['2020-06-24', '12:10:00.0'], ['2020-06-24', '12:02:42.0'], ['2020-06-24', '12:00:49.0'], ['2020-06-24', '11:54:25.0'], ['2020-06-24', '11:54:19.0'], ['2020-06-24', '11:52:33.6'], ['2020-06-24', '11:51:44.0'], ['2020-06-24', '11:43:26.8'], ['2020-06-24', '11:14:11.0'], ['2020-06-24', '11:08:43.3'], ['2020-06-24', '10:55:15.0'], ['2020-06-24', '10:52:24.0'], ['2020-06

In [50]:
# Split the [date, time] pairs into two parallel lists
lst_date = [pair[0] for pair in lst_date_and_time]
lst_time = [pair[1] for pair in lst_date_and_time]
print(lst_date)
print(lst_time)

['2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24', '2020-06-24']
['15:17:50.0', '15:07:39.9', '14:58:06.6', '14:27:49.9', '13:52:07.9', '13:48:44.6', '13:43:53.0', '13:37:44.0', '13:34:57.4', '13:34:30.0', '13:25:38.3', '13:24:05.6', '13:14:33.7', '12:51:21.8', '12:32:10.0', '12:27:49.5', '12:27:11.6', '12:19:38.2', '12:16:16.0', '12:14:52.3', '12:11:18.0', '12:
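
The dates and times are still plain strings at this point. If you want to sort or compare them, one option is to combine each pair into a `datetime` object. This is a sketch that assumes the `YYYY-MM-DD` and `HH:MM:SS.f` formats seen in the output above; `to_datetime` is a hypothetical helper name.

```python
from datetime import datetime


def to_datetime(date_str, time_str):
    """Combine strings like '2020-06-24' and '15:07:39.9' into a datetime."""
    # %f accepts one to six fractional digits, so '.9' becomes 900000 microseconds.
    return datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S.%f")


pairs = [["2020-06-24", "15:17:50.0"], ["2020-06-24", "15:07:39.9"]]
timestamps = [to_datetime(d, t) for d, t in pairs]
print(timestamps[0] > timestamps[1])  # True: the first row is the most recent
```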

In [51]:
df_date_time = pd.DataFrame([item.find('a').text.split() for item in soup.find_all("td",attrs={"class":"tabev6"})], 
             columns=['Date', 'Time'])

In [52]:
# Note:
# soup.find_all('td',attrs={"class":"tabev1"})[0].text.strip()  # Latitude sample
# soup.find_all('td',attrs={"class":"tabev1"})[1].text.strip()  # Longitude sample


# soup.find_all('td',attrs={"class":"tabev2"})[0].text.strip()  # Latitude N / S
# soup.find_all('td',attrs={"class":"tabev2"})[1].text.strip()  # Longitude W / E
# soup.find_all('td',attrs={"class":"tabev2"})[2].text.strip()  # Magnitude sample

In [53]:
coordinates = [lat.text.strip() for lat in soup.find_all('td',attrs={"class":"tabev1"})]
print(coordinates) 

['20.86', '69.11', '34.37', '25.84', '56.49', '34.54', '58.46', '156.58', '19.41', '155.28', '38.19', '117.80', '16.03', '95.84', '17.93', '68.83', '38.26', '38.79', '11.43', '86.02', '17.95', '67.08', '19.20', '155.50', '44.25', '115.05', '44.52', '115.20', '47.54', '9.20', '17.99', '66.82', '31.65', '104.35', '18.01', '66.72', '30.46', '71.59', '5.79', '153.75', '17.83', '66.89', '6.27', '73.14', '66.37', '18.69', '18.23', '70.61', '66.39', '18.72', '15.68', '96.50', '37.08', '37.53', '66.39', '18.71', '34.23', '119.23', '20.41', '69.82', '44.13', '115.10', '19.66', '108.67', '20.17', '69.11', '14.02', '145.24', '60.52', '151.36', '18.05', '66.84', '44.54', '115.25', '33.45', '28.29', '28.23', '69.81', '15.63', '96.26', '15.67', '96.17', '16.01', '96.20', '51.84', '173.97', '38.19', '117.83', '0.81', '126.69', '41.74', '16.04', '19.15', '155.48', '16.11', '95.29', '14.99', '93.80', '17.99', '66.83']


In [54]:
# Latitudes and longitudes alternate in `coordinates`:
# even positions hold latitudes, odd positions hold longitudes
lst_lat = coordinates[0::2]
lst_long = coordinates[1::2]
print("Latitude (numerical): ", lst_lat, "\n\n"
      "Longitude (numerical): ", lst_long)

Latitude (numerical):  ['20.86', '34.37', '56.49', '58.46', '19.41', '38.19', '16.03', '17.93', '38.26', '11.43', '17.95', '19.20', '44.25', '44.52', '47.54', '17.99', '31.65', '18.01', '30.46', '5.79', '17.83', '6.27', '66.37', '18.23', '66.39', '15.68', '37.08', '66.39', '34.23', '20.41', '44.13', '19.66', '20.17', '14.02', '60.52', '18.05', '44.54', '33.45', '28.23', '15.63', '15.67', '16.01', '51.84', '38.19', '0.81', '41.74', '19.15', '16.11', '14.99', '17.99'] 

Longitude (numerical):  ['69.11', '25.84', '34.54', '156.58', '155.28', '117.80', '95.84', '68.83', '38.79', '86.02', '67.08', '155.50', '115.05', '115.20', '9.20', '66.82', '104.35', '66.72', '71.59', '153.75', '66.89', '73.14', '18.69', '70.61', '18.72', '96.50', '37.53', '18.71', '119.23', '69.82', '115.10', '108.67', '69.11', '145.24', '151.36', '66.84', '115.25', '28.29', '69.81', '96.26', '96.17', '96.20', '173.97', '117.83', '126.69', '16.04', '155.48', '95.29', '93.80', '66.83']


In [55]:
# Retrieve the hemisphere letter (N / S) of each latitude
lat_NS = [item.text.strip() for item in soup.find_all('td', attrs={"class": "tabev2"})
          if item.text.strip() in ("N", "S")]
print(lat_NS)
len(lat_NS)

['S', 'N', 'N', 'N', 'N', 'N', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'S', 'S', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'S', 'N', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'S', 'N', 'N', 'N', 'N', 'N']


50

In [56]:
# Retrieve the hemisphere letter (W / E) of each longitude
long_WE = [item.text.strip() for item in soup.find_all('td', attrs={"class": "tabev2"})
           if item.text.strip() in ("W", "E")]
print(long_WE)

['W', 'E', 'W', 'W', 'W', 'W', 'W', 'W', 'E', 'W', 'W', 'W', 'W', 'W', 'E', 'W', 'W', 'W', 'W', 'E', 'W', 'W', 'W', 'W', 'W', 'W', 'E', 'W', 'W', 'W', 'W', 'W', 'W', 'E', 'W', 'W', 'W', 'E', 'W', 'W', 'W', 'W', 'W', 'W', 'E', 'E', 'W', 'W', 'W', 'W']


In [57]:
# Append the hemisphere letter to each numeric value, e.g. '20.86 S'
lst_latitude = [f"{value} {hemi}" for value, hemi in zip(lst_lat, lat_NS)]
lst_longitude = [f"{value} {hemi}" for value, hemi in zip(lst_long, long_WE)]
print(lst_latitude)
print(lst_longitude)

['20.86 S', '34.37 N', '56.49 N', '58.46 N', '19.41 N', '38.19 N', '16.03 N', '17.93 S', '38.26 N', '11.43 N', '17.95 N', '19.20 N', '44.25 N', '44.52 N', '47.54 N', '17.99 N', '31.65 N', '18.01 N', '30.46 S', '5.79 S', '17.83 N', '6.27 N', '66.37 N', '18.23 N', '66.39 N', '15.68 N', '37.08 N', '66.39 N', '34.23 N', '20.41 S', '44.13 N', '19.66 N', '20.17 S', '14.02 N', '60.52 N', '18.05 N', '44.54 N', '33.45 N', '28.23 S', '15.63 N', '15.67 N', '16.01 N', '51.84 N', '38.19 N', '0.81 S', '41.74 N', '19.15 N', '16.11 N', '14.99 N', '17.99 N']
['69.11 W', '25.84 E', '34.54 W', '156.58 W', '155.28 W', '117.80 W', '95.84 W', '68.83 W', '38.79 E', '86.02 W', '67.08 W', '155.50 W', '115.05 W', '115.20 W', '9.20 E', '66.82 W', '104.35 W', '66.72 W', '71.59 W', '153.75 E', '66.89 W', '73.14 W', '18.69 W', '70.61 W', '18.72 W', '96.50 W', '37.53 E', '18.71 W', '119.23 W', '69.82 W', '115.10 W', '108.67 W', '69.11 W', '145.24 E', '151.36 W', '66.84 W', '115.25 W', '28.29 E', '69.81 W', '96.26 W'
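
Strings such as `'20.86 S'` are readable but awkward to plot or compute with. A common convention, assumed here rather than required by the exercise, is to store south and west as negative numbers. The helper name `to_signed` is hypothetical.

```python
def to_signed(coordinate):
    """Convert '20.86 S' or '69.11 W' to a signed float (south and west are negative)."""
    value, hemisphere = coordinate.split()
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * float(value)


print(to_signed("20.86 S"), to_signed("25.84 E"))  # -20.86 25.84
```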

---

In [58]:
# Retrieving the list of magnitudes
lst_magnitude=[item.text.strip() for item in soup.find_all('td',attrs={"class":"tabev2"}) 
               if item.text.strip() not in ['N','S','W','E']]
print(lst_magnitude)

['2.9', '3.1', '4.3', '2.7', '2.1', '2.0', '4.0', '3.8', '2.3', '3.8', '2.8', '2.0', '2.1', '2.3', '1.9', '2.6', '2.9', '2.3', '2.6', '4.8', '2.9', '3.5', '3.5', '2.6', '3.0', '4.1', '2.0', '3.6', '2.1', '3.0', '2.4', '4.1', '2.7', '4.5', '2.5', '2.5', '2.0', '3.1', '3.3', '3.8', '4.0', '3.8', '4.1', '2.2', '3.8', '2.2', '2.2', '3.9', '3.8', '2.1']


In [59]:
# Retrieving the list of depths
lst_depth = [item.text for item in soup.find_all('td',attrs={"class":"tabev3"})]
print(lst_depth)

['106', '7', '2', '188', '1', '4', '57', '185', '7', '156', '11', '35', '5', '10', '2', '16', '5', '14', '9', '10', '11', '191', '4', '46', '12', '22', '7', '20', '16', '60', '8', '10', '104', '116', '35', '10', '7', '10', '115', '13', '29', '18', '60', '2', '10', '20', '36', '17', '12', '16']


In [60]:
# Retrieving list of region name
lst_region_name = [reg_name.text.strip() for reg_name in soup.find_all('td',attrs={"class":"tb_region"})]
print(lst_region_name)

['TARAPACA, CHILE', 'CRETE, GREECE', 'REYKJANES RIDGE', 'ALASKA PENINSULA', 'ISLAND OF HAWAII, HAWAII', 'NEVADA', 'OAXACA, MEXICO', 'LA PAZ, BOLIVIA', 'EASTERN TURKEY', 'NEAR COAST OF NICARAGUA', 'PUERTO RICO REGION', 'ISLAND OF HAWAII, HAWAII', 'SOUTHERN IDAHO', 'SOUTHERN IDAHO', 'SWITZERLAND', 'PUERTO RICO', 'WESTERN TEXAS', 'PUERTO RICO', 'COQUIMBO, CHILE', 'NEW IRELAND REGION, P.N.G.', 'PUERTO RICO REGION', 'NORTHERN COLOMBIA', 'ICELAND REGION', 'DOMINICAN REPUBLIC REGION', 'ICELAND REGION', 'OAXACA, MEXICO', 'CENTRAL TURKEY', 'ICELAND REGION', 'GREATER LOS ANGELES AREA, CALIF.', 'TARAPACA, CHILE', 'SOUTHERN IDAHO', 'REVILLA GIGEDO ISLANDS REGION', 'TARAPACA, CHILE', 'ROTA REGION, N. MARIANA ISLANDS', 'KENAI PENINSULA, ALASKA', 'PUERTO RICO', 'SOUTHERN IDAHO', 'EASTERN MEDITERRANEAN SEA', 'ATACAMA, CHILE', 'OFFSHORE OAXACA, MEXICO', 'OFFSHORE OAXACA, MEXICO', 'OAXACA, MEXICO', 'ANDREANOF ISLANDS, ALEUTIAN IS.', 'NEVADA', 'MOLUCCA SEA', 'SOUTHERN ITALY', 'ISLAND OF HAWAII, HAWAII', 

In [61]:
pd.DataFrame({"Date"             : lst_date,
              "Time"             : lst_time,
              "Latitude degrees" : lst_latitude,
              "Longitude degrees": lst_longitude,
              "Depth km"         : lst_depth,
              "Magnitude"        : lst_magnitude,
              "Region Name"      : lst_region_name})

Unnamed: 0,Date,Time,Latitude degrees,Longitude degrees,Depth km,Magnitude,Region Name
0,2020-06-24,15:17:50.0,20.86 S,69.11 W,106,2.9,"TARAPACA, CHILE"
1,2020-06-24,15:07:39.9,34.37 N,25.84 E,7,3.1,"CRETE, GREECE"
2,2020-06-24,14:58:06.6,56.49 N,34.54 W,2,4.3,REYKJANES RIDGE
3,2020-06-24,14:27:49.9,58.46 N,156.58 W,188,2.7,ALASKA PENINSULA
4,2020-06-24,13:52:07.9,19.41 N,155.28 W,1,2.1,"ISLAND OF HAWAII, HAWAII"
5,2020-06-24,13:48:44.6,38.19 N,117.80 W,4,2.0,NEVADA
6,2020-06-24,13:43:53.0,16.03 N,95.84 W,57,4.0,"OAXACA, MEXICO"
7,2020-06-24,13:37:44.0,17.93 S,68.83 W,185,3.8,"LA PAZ, BOLIVIA"
8,2020-06-24,13:34:57.4,38.26 N,38.79 E,7,2.3,EASTERN TURKEY
9,2020-06-24,13:34:30.0,11.43 N,86.02 W,156,3.8,NEAR COAST OF NICARAGUA


#### Count the number of tweets by a given Twitter account.
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** to handle account names that are not found. 
<br>***Hint:*** the program should count the number of tweets for any provided account.

In [62]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'
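
Before requesting a profile page, it can help to validate the handle and build the full URL. The sketch below covers only that step and the try/except pattern the exercise asks for; it makes no request and does not count tweets. The names `HANDLE_PATTERN` and `profile_url` are hypothetical, and the handle rule (1 to 15 letters, digits or underscores) is an assumption you should verify.

```python
import re

# Assumption: handles are 1-15 characters of letters, digits or underscores.
HANDLE_PATTERN = re.compile(r"[A-Za-z0-9_]{1,15}")


def profile_url(handle, base_url="https://twitter.com/"):
    """Build the profile URL for a handle, accepting an optional leading '@'."""
    name = handle.strip().lstrip("@")
    if not HANDLE_PATTERN.fullmatch(name):
        raise ValueError(f"Not a valid handle: {handle!r}")
    return base_url + name


for candidate in ("@github", "not a handle!"):
    try:
        print(profile_url(candidate))
    except ValueError as error:
        print("Skipped:", error)
```

In the real exercise you would pass `input("Handle: ")` instead of the fixed candidates and also catch the error raised when the page itself is not found.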

In [63]:
# your code here

#### Find the number of followers of a given Twitter account.
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** to handle account names that are not found. 
<br>***Hint:*** the program should count the followers for any provided account.

In [64]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [65]:
# your code here

#### List all language names and number of related articles in the order they appear in wikipedia.org.

In [66]:
# This is the url you will scrape in this exercise
url = 'https://www.wikipedia.org/'
response=requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [67]:
# language data sample
soup.find_all('div',attrs={"class":"central-featured"})[0].find_all('strong')[0].text 

'English'

In [68]:
lst_language = [lang.text for lang in soup.find_all('div',attrs={"class":"central-featured"})[0].find_all('strong')]
print(lst_language)

['English', '日本語', 'Español', 'Deutsch', 'Русский', 'Français', 'Italiano', '中文', 'Português', 'Polski']


In [69]:
# Related article sample
soup.find_all('div',attrs={"class":"central-featured"})[0].find_all('bdi')[0].text.replace('\xa0'," ")

'6 105 000+'

In [70]:
lst_num_article = [num_article.text.replace('\xa0'," ") 
                   for num_article in soup.find_all('div',attrs={"class":"central-featured"})[0].find_all('bdi')]
print(lst_num_article) 

['6 105 000+', '1 213 000+', '1 606 000+', '2 446 000+', '1 637 000+', '2 229 000+', '1 615 000+', '1 125 000+', '1 036 000+', '1 416 000+']
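
The article counts above are strings with spaces and a trailing `+`, so they cannot be sorted numerically as they are. A minimal sketch of converting them to integers follows; `parse_article_count` is a hypothetical helper that keeps only the digits and therefore also copes with non-breaking spaces.

```python
def parse_article_count(text):
    """Convert a string such as '6 105 000+' to the integer 6105000."""
    # Keep only the digits; this drops spaces, non-breaking spaces and the '+'.
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits)


counts = [parse_article_count(s) for s in ["6 105 000+", "1 213 000+"]]
print(counts)  # [6105000, 1213000]
```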


In [71]:
pd.DataFrame({'Languages'          : lst_language,
              'Nº Related Articles': lst_num_article})

Unnamed: 0,Languages,Nº Related Articles
0,English,6 105 000+
1,日本語,1 213 000+
2,Español,1 606 000+
3,Deutsch,2 446 000+
4,Русский,1 637 000+
5,Français,2 229 000+
6,Italiano,1 615 000+
7,中文,1 125 000+
8,Português,1 036 000+
9,Polski,1 416 000+


#### Create a list of the different kinds of datasets available on data.gov.uk.

In [72]:
# This is the url you will scrape in this exercise
url = 'https://data.gov.uk/'
response=requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [73]:
# Dataset sample
soup.find_all('h2')[0].text

'Business and economy'

In [74]:
lst_datasets = [dataset.text for dataset in soup.find_all('h2')]
print(lst_datasets)

['Business and economy', 'Crime and justice', 'Defence', 'Education', 'Environment', 'Government', 'Government spending', 'Health', 'Mapping', 'Society', 'Towns and cities', 'Transport']


#### Display the top 10 languages by number of native speakers stored in a pandas dataframe.

In [75]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'
response=requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

In [76]:
pd.read_html(url)[0].head(10)

Unnamed: 0,Rank,Language,Speakers(millions),% of World pop.(March 2019)[8],Language familyBranch
0,1,Mandarin Chinese,918.0,11.922,Sino-TibetanSinitic
1,2,Spanish,480.0,5.994,Indo-EuropeanRomance
2,3,English,379.0,4.922,Indo-EuropeanGermanic
3,4,Hindi (Sanskritised Hindustani)[9],341.0,4.429,Indo-EuropeanIndo-Aryan
4,5,Bengali,228.0,2.961,Indo-EuropeanIndo-Aryan
5,6,Portuguese,221.0,2.87,Indo-EuropeanRomance
6,7,Russian,154.0,2.0,Indo-EuropeanBalto-Slavic
7,8,Japanese,128.0,1.662,JaponicJapanese
8,9,Western Punjabi[10],92.7,1.204,Indo-EuropeanIndo-Aryan
9,10,Marathi,83.1,1.079,Indo-EuropeanIndo-Aryan
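
The table read from Wikipedia keeps footnote markers such as `[9]` and `[10]` in the language names and in one column header. A minimal sketch for removing them with `re` follows, assuming the markers always take the form of a number in square brackets. `strip_footnotes` is a hypothetical helper name.

```python
import re

# Matches Wikipedia-style footnote markers such as "[8]" or "[10]".
FOOTNOTE = re.compile(r"\[\d+\]")


def strip_footnotes(text):
    """Remove bracketed numeric footnote markers from a string."""
    return FOOTNOTE.sub("", text).strip()


# Sample values copied from the output above.
names = ['Hindi (Sanskritised Hindustani)[9]', 'Western Punjabi[10]', 'Marathi']
clean_names = [strip_footnotes(name) for name in names]
clean_header = strip_footnotes('% of World pop.(March 2019)[8]')
print(clean_names)
print(clean_header)
```

On the dataframe itself, the same function could be passed to `Series.map` for the `Language` column.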


## Bonus
#### Scrape a certain number of tweets of a given Twitter account.

In [77]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

#### Display IMDB's top 250 data (movie name, initial release, director name and stars) as a pandas dataframe.

In [78]:
# This is the url you will scrape in this exercise 
url = 'https://www.imdb.com/chart/top'
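response = requests.get(url)
html = response.content
# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')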

In [79]:
# movie name sample
soup.find_all('td',attrs={"class":"titleColumn"})[0].find_all('a')[0].text

'Um Sonho de Liberdade'

In [80]:
lst_movies = [movie.find_all('a')[0].text for movie in soup.find_all('td',attrs={"class":"titleColumn"})]
print(lst_movies)

['Um Sonho de Liberdade', 'O Poderoso Chefão', 'O Poderoso Chefão II', 'Batman: O Cavaleiro das Trevas', '12 Homens e uma Sentença', 'A Lista de Schindler', 'O Senhor dos Anéis: O Retorno do Rei', 'Pulp Fiction: Tempo de Violência', 'Três Homens em Conflito', 'O Senhor dos Anéis: A Sociedade do Anel', 'Clube da Luta', 'Forrest Gump: O Contador de Histórias', 'A Origem', 'Star Wars, Episódio V: O Império Contra-Ataca', 'O Senhor dos Anéis: As Duas Torres', 'Matrix', 'Os Bons Companheiros', 'Um Estranho no Ninho', 'Os Sete Samurais', 'Seven: Os Sete Crimes Capitais', 'A Vida é Bela', 'Cidade de Deus', 'O Silêncio dos Inocentes', 'A Felicidade Não se Compra', 'Guerra nas Estrelas', 'O Resgate do Soldado Ryan', 'A Viagem de Chihiro', 'Parasita', 'À Espera de um Milagre', 'Interestelar', 'O Profissional', 'Os Suspeitos', 'Harakiri', 'O Rei Leão', 'O Pianista', 'De Volta para o Futuro', 'O Exterminador do Futuro 2: O Julgamento Final', 'A Outra História Americana', 'Tempos Modernos', 'Psicos

In [81]:
# initial release sample
soup.find_all('td',attrs={"class":"titleColumn"})[0].find_all('span')[0].text.strip("()")

'1994'

In [82]:
lst_init_rel = [init_rel.find_all('span')[0].text.strip("()") 
                for init_rel in soup.find_all('td',attrs={"class":"titleColumn"})]
print(lst_init_rel)

['1994', '1972', '1974', '2008', '1957', '1993', '2003', '1994', '1966', '2001', '1999', '1994', '2010', '1980', '2002', '1999', '1990', '1975', '1954', '1995', '1997', '2002', '1991', '1946', '1977', '1998', '2001', '2019', '1999', '2014', '1994', '1995', '1962', '1994', '2002', '1985', '1991', '1998', '1936', '1960', '2000', '1931', '2006', '2011', '2014', '2006', '1968', '1988', '1942', '1954', '1988', '1979', '1979', '2000', '1981', '2019', '1940', '2006', '2012', '1957', '1980', '2008', '2018', '1950', '1997', '2018', '1964', '2003', '1957', '2012', '1984', '2019', '1986', '2016', '2017', '1999', '1995', '1981', '2009', '1995', '1963', '1984', '2007', '1983', '1992', '2009', '1997', '2019', '1968', '1958', '2018', '2000', '1931', '2016', '2004', '1941', '2012', '1971', '1987', '1948', '1959', '1971', '1921', '2000', '1952', '1983', '1976', '2001', '1962', '2010', '1973', '1952', '1927', '2011', '1965', '1944', '1960', '2010', '1962', '1989', '2009', '1997', '1995', '1975', '1985',

In [83]:
# director name and stars sample
soup.find_all('td',attrs={"class":"titleColumn"})[0].find_all('a')[0]['title']

'Frank Darabont (dir.), Tim Robbins, Morgan Freeman'

In [84]:
lst_names = [names.find_all('a')[0]['title'] for names in soup.find_all('td',attrs={"class":"titleColumn"})]
print(lst_names)

['Frank Darabont (dir.), Tim Robbins, Morgan Freeman', 'Francis Ford Coppola (dir.), Marlon Brando, Al Pacino', 'Francis Ford Coppola (dir.), Al Pacino, Robert De Niro', 'Christopher Nolan (dir.), Christian Bale, Heath Ledger', 'Sidney Lumet (dir.), Henry Fonda, Lee J. Cobb', 'Steven Spielberg (dir.), Liam Neeson, Ralph Fiennes', 'Peter Jackson (dir.), Elijah Wood, Viggo Mortensen', 'Quentin Tarantino (dir.), John Travolta, Uma Thurman', 'Sergio Leone (dir.), Clint Eastwood, Eli Wallach', 'Peter Jackson (dir.), Elijah Wood, Ian McKellen', 'David Fincher (dir.), Brad Pitt, Edward Norton', 'Robert Zemeckis (dir.), Tom Hanks, Robin Wright', 'Christopher Nolan (dir.), Leonardo DiCaprio, Joseph Gordon-Levitt', 'Irvin Kershner (dir.), Mark Hamill, Harrison Ford', 'Peter Jackson (dir.), Elijah Wood, Ian McKellen', 'Lana Wachowski (dir.), Keanu Reeves, Laurence Fishburne', 'Martin Scorsese (dir.), Robert De Niro, Ray Liotta', 'Milos Forman (dir.), Jack Nicholson, Louise Fletcher', 'Akira Kuros

In [85]:
pd.DataFrame({"movie name"             :lst_movies , 
              "initial release"        :lst_init_rel, 
              "director name and stars":lst_names})

Unnamed: 0,movie name,initial release,director name and stars
0,Um Sonho de Liberdade,1994,"Frank Darabont (dir.), Tim Robbins, Morgan Fre..."
1,O Poderoso Chefão,1972,"Francis Ford Coppola (dir.), Marlon Brando, Al..."
2,O Poderoso Chefão II,1974,"Francis Ford Coppola (dir.), Al Pacino, Robert..."
3,Batman: O Cavaleiro das Trevas,2008,"Christopher Nolan (dir.), Christian Bale, Heat..."
4,12 Homens e uma Sentença,1957,"Sidney Lumet (dir.), Henry Fonda, Lee J. Cobb"
...,...,...,...
245,Munna Bhai M.B.B.S.,2003,"Rajkumar Hirani (dir.), Sanjay Dutt, Arshad Warsi"
246,Neon Genesis Evangelion: O Fim do Evangelho,1997,"Hideaki Anno (dir.), Megumi Ogata, Megumi Haya..."
247,Aladdin,1992,"Ron Clements (dir.), Scott Weinger, Robin Will..."
248,Lagaan: Era uma Vez na Índia,2001,"Ashutosh Gowariker (dir.), Aamir Khan, Raghuvi..."
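
The exercise asks for the director and the stars, but the `title` attribute delivers both in a single string. The sketch below separates them, assuming the director is always the entry tagged `(dir.)` and that no person's name contains a comma. `split_credits` is a hypothetical helper name.

```python
def split_credits(credits):
    """Split 'Director (dir.), Star A, Star B' into (director, [stars])."""
    marker = "(dir.)"
    director = None
    stars = []
    for person in (part.strip() for part in credits.split(",")):
        if person.endswith(marker):
            # Drop the '(dir.)' tag and the space in front of it.
            director = person[:-len(marker)].strip()
        else:
            stars.append(person)
    return director, stars


# Sample value copied from the output above.
director, stars = split_credits('Frank Darabont (dir.), Tim Robbins, Morgan Freeman')
print(director)
print(stars)
```

Running this over `lst_names` would give two lists that can be stored as separate dataframe columns.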


#### Display the movie name, year and a brief summary of 10 random movies from IMDb's top chart as a pandas dataframe.

In [86]:
#This is the url you will scrape in this exercise
url = 'http://www.imdb.com/chart/top'
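response = requests.get(url)
html = response.content
# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')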

In [87]:
pd.read_html(url)[0].head(10)

Unnamed: 0.1,Unnamed: 0,Rank & Title,IMDb Rating,Your Rating,Unnamed: 4
0,,1. Um Sonho de Liberdade (1994),9.2,12345678910 NOT YET RELEASED Seen,
1,,2. O Poderoso Chefão (1972),9.1,12345678910 NOT YET RELEASED Seen,
2,,3. O Poderoso Chefão II (1974),9.0,12345678910 NOT YET RELEASED Seen,
3,,4. Batman: O Cavaleiro das Trevas (2008),9.0,12345678910 NOT YET RELEASED Seen,
4,,5. 12 Homens e uma Sentença (1957),8.9,12345678910 NOT YET RELEASED Seen,
5,,6. A Lista de Schindler (1993),8.9,12345678910 NOT YET RELEASED Seen,
6,,7. O Senhor dos Anéis: O Retorno do Rei (2003),8.9,12345678910 NOT YET RELEASED Seen,
7,,8. Pulp Fiction: Tempo de Violência (1994),8.8,12345678910 NOT YET RELEASED Seen,
8,,9. Três Homens em Conflito (1966),8.8,12345678910 NOT YET RELEASED Seen,
9,,10. O Senhor dos Anéis: A Sociedade do Anel ...,8.8,12345678910 NOT YET RELEASED Seen,
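
The `Rank & Title` column returned by `pd.read_html` packs rank, title and year into one string. A minimal sketch for splitting it with a regular expression follows, assuming each cell looks like `'<rank>. <title> (<year>)'`. Cells that do not fit that shape, such as the truncated last row of the display above, yield `None`. `parse_rank_title` is a hypothetical helper name.

```python
import re

# rank, a dot, the title (lazy match), then a four-digit year in parentheses
RANK_TITLE = re.compile(r"^(\d+)\.\s+(.+?)\s+\((\d{4})\)$")


def parse_rank_title(cell):
    """Return (rank, title, year) for a 'Rank & Title' cell, or None if it does not match."""
    match = RANK_TITLE.match(cell.strip())
    if match is None:
        return None
    rank, title, year = match.groups()
    return int(rank), title, int(year)


# Sample values copied from the output above.
first = parse_rank_title('1. Um Sonho de Liberdade (1994)')
truncated = parse_rank_title('10. O Senhor dos Anéis: A Sociedade do Anel ...')
print(first)
print(truncated)
```

To pick random movies rather than the first ten, `DataFrame.sample(10)` could replace `.head(10)` in the cell above.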


#### Find the live weather report (temperature, wind speed, description and weather) of a given city.

In [None]:
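# https://openweathermap.org/current
city = input('Enter the city: ')
url = 'http://api.openweathermap.org/data/2.5/weather'
# Passing the query as params lets requests URL-encode city names with spaces or accents
params = {'q': city, 'APPID': 'b35975e18dc93725acb092f7272cc6b8', 'units': 'metric'}
response = requests.get(url, params=params)
response.json()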

In [None]:
temperature = response.json()['main']['temp']
temperature

In [None]:
wind_speed = response.json()['wind']['speed']
wind_speed

In [None]:
description = response.json()['weather'][0]['description']
description

In [None]:
weather = response.json()['weather'][0]['main']
weather

In [None]:
pd.DataFrame({"Name of City" : [city],
              "Temperature"  : [temperature],
              "Wind Speed"   : [wind_speed],
              "Description"  : [description],
              "Weather"      : [weather]})

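The cells above index straight into `response.json()`, which raises a `KeyError` when the city is not found. The sketch below gathers the same four fields in one function and checks for an error first. It assumes the API reports success through a `cod` field equal to 200 and puts the error text in `message`; the payload values are made up for illustration, and `summarise_weather` is a hypothetical helper name.

```python
def summarise_weather(payload):
    """Extract the fields used in this exercise from a current-weather payload."""
    # Assumption: 'cod' is 200 on success (it may arrive as an int or a string).
    if str(payload.get("cod")) != "200":
        raise ValueError(payload.get("message", "unexpected response"))
    first = payload["weather"][0]
    return {
        "Temperature": payload["main"]["temp"],
        "Wind Speed": payload["wind"]["speed"],
        "Description": first["description"],
        "Weather": first["main"],
    }


# Illustrative payload with made-up values, shaped like the keys used above.
sample_payload = {
    "cod": 200,
    "main": {"temp": 12.3},
    "wind": {"speed": 4.1},
    "weather": [{"main": "Clouds", "description": "overcast clouds"}],
}
report = summarise_weather(sample_payload)
print(report)
```

With a live response, `summarise_weather(response.json())` would return the values needed for the dataframe in a single call.
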
#### Find the book name, price and stock availability as a pandas dataframe.

In [None]:
# This is the url you will scrape in this exercise.
# It is a fictional bookstore created to be scraped.
url = 'http://books.toscrape.com/'
response = requests.get(url)
html = response.content
# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

In [None]:
# book name sample
soup.find_all('h3')[0].find("a")["title"]

In [None]:
lst_book_name = [item.find("a")["title"] for item in soup.find_all('h3')]
print(lst_book_name)

In [None]:
# price sample
soup.find_all('div', attrs={"class": "product_price"})[0].find('p').text

In [None]:
lst_price = [price.find('p').text for price in soup.find_all('div', attrs={"class": "product_price"})]
print(lst_price)

In [None]:
# availability sample
soup.find_all('div',{"class":"product_price"})[0].find_all('p',attrs={"class":"instock availability"})[0].text.strip()

In [None]:
lst_stock = [item.find_all('p',attrs={"class":"instock availability"})[0].text.strip() 
             for item in soup.find_all('div',{"class":"product_price"})]
print(lst_stock)

In [None]:
pd.DataFrame({"book name": lst_book_name, 
              "price"    : lst_price,
              "stock availability": lst_stock })

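The scraped prices are text, so they need converting before they can be compared or averaged. A minimal sketch follows, assuming each price is a currency symbol followed by a decimal number such as `'£51.77'`. The sample strings are assumptions, since the cells above show no output, and `parse_price` is a hypothetical helper name.

```python
import re


def parse_price(text):
    """Return the numeric part of a price string such as '£51.77' as a float."""
    # Searching for the number skips the currency symbol and any stray
    # characters left over from a mis-decoded page.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


# Assumed sample values; the second mimics a mis-decoded pound sign.
prices = [parse_price(item) for item in ['£51.77', 'Â£53.74']]
print(prices)
```

Applied to `lst_price`, the same comprehension would yield a numeric `price` column.
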
In [None]:
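# The first page is ready. This loop builds a dataframe for every remaining page
# and concatenates them with the dataframe built from the first page.
# Example page URL: http://books.toscrape.com/catalogue/page-2.html

frames = [pd.DataFrame({"book name": lst_book_name,
                        "price": lst_price,
                        "stock availability": lst_stock})]
for i in range(2, 51):
    url = f'http://books.toscrape.com/catalogue/page-{i}.html'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    prices = soup.find_all('div', attrs={"class": "product_price"})
    # Same extraction as for the first page, applied to the current page
    frames.append(pd.DataFrame({
        "book name": [item.find("a")["title"] for item in soup.find_all('h3')],
        "price": [price.find('p').text for price in prices],
        "stock availability": [price.find_all('p', attrs={"class": "instock availability"})[0].text.strip()
                               for price in prices]}))

df_books = pd.concat(frames, ignore_index=True)
df_books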