## Lab | Web Scraping Single Page
    Business goal:
    Check the case_study_gnod.md file.

    Make sure you've understood the big picture of your project:

        > the goal of the company (Gnod),
        > their current product (Gnoosic),
        > their strategy, and
        > how your project fits into this context.
    
    Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

    Instructions - Scraping popular songs
    Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song will have to be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will enjoy a recommendation of another currently popular song more.

    You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

    It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

### Importing Libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

### Loading the data

In [2]:
url = "https://www.billboard.com/charts/hot-100"

## Getting the html code of the web page

In [3]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200
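
A bare `requests.get(url)` has no timeout by default and does not raise on an error status, so a failed request can go unnoticed until parsing. Below is a minimal sketch of a more defensive fetch. The helper name `fetch_html` and the `User-Agent` value are assumptions made for illustration, not part of the lab.

```python
import requests

# Illustrative header value; an assumption, not something the lab or the site specifies.
HEADERS = {"User-Agent": "gnod-lab-scraper/0.1"}


def fetch_html(url, timeout=10, session=None):
    """Return the raw page bytes, raising on timeouts and on 4xx/5xx responses."""
    http = session or requests  # accepts a requests.Session (or a test double)
    response = http.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # turns 4xx/5xx status codes into exceptions
    return response.content
```

The returned bytes could then be passed to `BeautifulSoup(..., "html.parser")` exactly as in the next cell.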

## Parsing the html code

In [4]:
soup = BeautifulSoup(response.content, "html.parser")
soup

<!DOCTYPE html>

<html class="" lang="">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
<title>The Hot 100 Chart | Billboard</title>
<meta content="The Hot 100 Chart" name="title" property="title">
<meta content="@billboard" name="twitter:site"/>
<meta content="Billboard" property="og:site_name">
<meta content="article" property="og:type">
<link href="/manifest.json" rel="manifest"/>
<style>
        .chart-pro-access {
            background-image: url('https://www.billboard.com/assets/1606143657/images/piano/chart-pro-access-mb.png?ed9f7c62d54f7662233f');
        }

        @media (min-width: 769px) {
            .chart-pro-access {
                background-image: url('https://www.billboard.com/assets/1606143657/images/piano/chart-pro-access-dk.png?ed9f7c62d54f7662233f');
            }
        }
    </style>
<script async="async" data-cfasync="false" src="ht

In [5]:
#charts > div > div.chart-list__wrapper > div > ol >
    #li:nth-child(1) > button > span.chart-element__rank.flex--column.flex--xy-center.flex--no-shrink

#charts > div > div.chart-list__wrapper > div > ol > li:nth-child(1) > button > span.chart-element__information >
    #span.chart-element__information__song.text--truncate.color--primary
    
#charts > div > div.chart-list__wrapper > div > ol > li:nth-child(1) > button > span.chart-element__information >
    #span.chart-element__information__artist.text--truncate.color--secondary

In [6]:
soup.select("span.chart-element__information__song.text--truncate.color--primary")

[<span class="chart-element__information__song text--truncate color--primary">Mood</span>,
 <span class="chart-element__information__song text--truncate color--primary">Positions</span>,
 <span class="chart-element__information__song text--truncate color--primary">I Hope</span>,
 <span class="chart-element__information__song text--truncate color--primary">Laugh Now Cry Later</span>,
 <span class="chart-element__information__song text--truncate color--primary">Blinding Lights</span>,
 <span class="chart-element__information__song text--truncate color--primary">Lemonade</span>,
 <span class="chart-element__information__song text--truncate color--primary">Holy</span>,
 <span class="chart-element__information__song text--truncate color--primary">Dakiti</span>,
 <span class="chart-element__information__song text--truncate color--primary">Savage Love (Laxed - Siren Beat)</span>,
 <span class="chart-element__information__song text--truncate color--primary">For The Night</span>,
 <span class="

In [7]:
songs = []


num_iter = len(soup.select("span.chart-element__information__song.text--truncate.color--primary"))

# iterate through the result set and retrieve all the data
for i in range(num_iter):
    songs.append(soup.select("span.chart-element__information__song.text--truncate.color--primary")[i].get_text())

In [8]:
artists = []


num_iter2 = len(soup.select("span.chart-element__information__artist.text--truncate.color--secondary"))

# iterate through the result set and retrieve all the data
for i in range(num_iter2):
    artists.append(soup.select("span.chart-element__information__artist.text--truncate.color--secondary")[i].get_text())

In [9]:
rankings = []


num_iter3 = len(soup.select("span.chart-element__rank__number"))

# iterate through the result set and retrieve all the data
for i in range(num_iter3):
    rankings.append(soup.select("span.chart-element__rank__number")[i].get_text())

In [10]:
print(rankings)

['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100']
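
The three loops above build `songs`, `artists` and `rankings` independently, so the lists only line up if every chart entry contains all three elements. A sketch of a row-wise alternative follows: it walks each `li` once and reads the three fields from that row. The HTML here is a small hand-written stand-in that reuses the class names from the selectors above; the exact nesting on the real page is an assumption.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the chart markup (assumed nesting, real class names).
sample_html = """
<ol>
  <li><button>
    <span class="chart-element__rank"><span class="chart-element__rank__number">1</span></span>
    <span class="chart-element__information">
      <span class="chart-element__information__song">Mood</span>
      <span class="chart-element__information__artist">24kGoldn Featuring iann dior</span>
    </span>
  </button></li>
  <li><button>
    <span class="chart-element__rank"><span class="chart-element__rank__number">2</span></span>
    <span class="chart-element__information">
      <span class="chart-element__information__song">Positions</span>
      <span class="chart-element__information__artist">Ariana Grande</span>
    </span>
  </button></li>
</ol>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for item in sample_soup.select("ol > li"):
    rank = item.select_one("span.chart-element__rank__number")
    song = item.select_one("span.chart-element__information__song")
    artist = item.select_one("span.chart-element__information__artist")
    # Skip rows that lack any of the three fields instead of misaligning the columns.
    if rank is None or song is None or artist is None:
        continue
    rows.append((rank.get_text(strip=True),
                 song.get_text(strip=True),
                 artist.get_text(strip=True)))

print(rows)
```

A list of tuples like `rows` can be handed to `pd.DataFrame(rows, columns=["rankings", "songs", "artists"])`.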


## Constructing the dataframe

In [11]:
# each list becomes a column
top_100 = pd.DataFrame({"rankings":rankings,
                       "songs":songs,
                       "artists":artists,
                      })

top_100.head()

Unnamed: 0,rankings,songs,artists
0,1,Mood,24kGoldn Featuring iann dior
1,2,Positions,Ariana Grande
2,3,I Hope,Gabby Barrett Featuring Charlie Puth
3,4,Laugh Now Cry Later,Drake Featuring Lil Durk
4,5,Blinding Lights,The Weeknd
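
The scraped rankings are strings, so sorting or comparing them works lexicographically (`'10'` sorts before `'2'`). The sketch below converts the column to integers, using a three-row stand-in frame built from the values shown above; `sample` is a hypothetical name.

```python
import pandas as pd

# Small stand-in for the scraped lists; values are taken from the output above.
sample = pd.DataFrame({
    "rankings": ["1", "2", "3"],
    "songs": ["Mood", "Positions", "I Hope"],
    "artists": [
        "24kGoldn Featuring iann dior",
        "Ariana Grande",
        "Gabby Barrett Featuring Charlie Puth",
    ],
})

# pd.to_numeric parses the strings; by default it raises on values it cannot parse.
sample["rankings"] = pd.to_numeric(sample["rankings"])

print(sample.dtypes)
```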


### Decade End (2010s)

In [12]:
url_2 = "https://www.billboard.com/charts/decade-end/hot-100"

## Getting the html code of the web page

In [13]:
response_2 = requests.get(url_2)
response_2.status_code

200

## Parsing the html code

In [14]:
soup_2 = BeautifulSoup(response_2.content, "html.parser")
soup_2

<!DOCTYPE html>

<html class="" lang="">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
<title>Hot 100 Songs - Decade-End | Billboard</title>
<meta content="Hot 100 Songs - Decade-End" name="title" property="title">
<meta content="See Billboard's rankings of this decades's most popular songs, albums, and artists." name="description" property="description">
<meta content="https://www.billboard.com/assets/1606143657/images/ye-charts/charts-ye-share-fb.jpg?472a790e67f42f9b25d0" name="og:image" property="og:image">
<meta content="https://www.billboard.com/assets/1606143657/images/ye-charts/charts-ye-share-twitter.jpg?472a790e67f42f9b25d0" name="twitter:image" property="twitter:image"/>
<meta content="@billboard" name="twitter:site"/>
<meta content="Billboard" property="og:site_name">
<meta content="article" property="og:type"/>
<script async="async" data-cfasync="f

In [15]:
#main > div.container.container--xxlight-grey.container--no-side-padding > div > div:nth-child(1) > div > article:nth-child(1) >
    #div.ye-chart-item__primary-row.decade-end-chart-item__no-expand > div.ye-chart-item__rank

#main > div.container.container--xxlight-grey.container--no-side-padding > div > div:nth-child(1) > div > article:nth-child(1) > 
    #div.ye-chart-item__primary-row.decade-end-chart-item__no-expand > div.ye-chart-item__text > div.ye-chart-item__title
    
#main > div.container.container--xxlight-grey.container--no-side-padding > div > div:nth-child(1) > div > article:nth-child(1) >
    #div.ye-chart-item__primary-row.decade-end-chart-item__no-expand > div.ye-chart-item__text > div.ye-chart-item__artist

In [32]:
soup_2.select("div.ye-chart-item__rank")

[<div class="ye-chart-item__rank">
 1
 </div>,
 <div class="ye-chart-item__rank">
 2
 </div>,
 <div class="ye-chart-item__rank">
 3
 </div>,
 <div class="ye-chart-item__rank">
 4
 </div>,
 <div class="ye-chart-item__rank">
 5
 </div>,
 <div class="ye-chart-item__rank">
 6
 </div>,
 <div class="ye-chart-item__rank">
 7
 </div>,
 <div class="ye-chart-item__rank">
 8
 </div>,
 <div class="ye-chart-item__rank">
 9
 </div>,
 <div class="ye-chart-item__rank">
 10
 </div>,
 <div class="ye-chart-item__rank">
 11
 </div>,
 <div class="ye-chart-item__rank">
 12
 </div>,
 <div class="ye-chart-item__rank">
 13
 </div>,
 <div class="ye-chart-item__rank">
 14
 </div>,
 <div class="ye-chart-item__rank">
 15
 </div>,
 <div class="ye-chart-item__rank">
 16
 </div>,
 <div class="ye-chart-item__rank">
 17
 </div>,
 <div class="ye-chart-item__rank">
 18
 </div>,
 <div class="ye-chart-item__rank">
 19
 </div>,
 <div class="ye-chart-item__rank">
 20
 </div>,
 <div class="ye-chart-item__rank">
 21
 </div>,
 

In [23]:
songs_2 = []


num_iter_2 = len(soup_2.select("div.ye-chart-item__title"))

# iterate through the result set and retrieve all the data
for i in range(num_iter_2):
    songs_2.append(soup_2.select("div.ye-chart-item__title")[i].get_text())

In [26]:
artists_2 = []


num_iter_3 = len(soup_2.select("div.ye-chart-item__artist"))

# iterate through the result set and retrieve all the data
for i in range(num_iter_3):
    artists_2.append(soup_2.select("div.ye-chart-item__artist")[i].get_text())

In [35]:
rankings_2 = []


num_iter_4 = len(soup_2.select("div.ye-chart-item__rank"))

# iterate through the result set and retrieve all the data
for i in range(num_iter_4):
    rankings_2.append(soup_2.select("div.ye-chart-item__rank")[i].get_text())
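
Each loop above calls `soup_2.select(...)` again on every iteration and leaves the surrounding whitespace in the text. One possible shortcut is a small helper that runs the selector once and strips each result. `select_texts` is a hypothetical name, and the HTML is a hand-written stand-in using the class names from the cells above.

```python
from bs4 import BeautifulSoup


def select_texts(soup, selector):
    """Run the selector once and return the stripped text of every match."""
    return [element.get_text(strip=True) for element in soup.select(selector)]


# Hand-written stand-in for two chart entries (assumed layout, real class names).
sample_html = """
<article>
  <div class="ye-chart-item__rank">
  1
  </div>
  <div class="ye-chart-item__title">Uptown Funk!</div>
  <div class="ye-chart-item__artist">Mark Ronson Featuring Bruno Mars</div>
</article>
<article>
  <div class="ye-chart-item__rank">
  3
  </div>
  <div class="ye-chart-item__title"> Shape Of You</div>
  <div class="ye-chart-item__artist">Ed Sheeran</div>
</article>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

sample_ranks = select_texts(sample_soup, "div.ye-chart-item__rank")
sample_titles = select_texts(sample_soup, "div.ye-chart-item__title")
sample_artists = select_texts(sample_soup, "div.ye-chart-item__artist")

print(sample_ranks, sample_titles, sample_artists)
```

Because `get_text(strip=True)` removes leading and trailing whitespace, the separate newline clean-up step would no longer be needed for these columns.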

### Fixing the format

In [37]:
rankings_2

['\n1\n',
 '\n2\n',
 '\n3\n',
 '\n4\n',
 '\n5\n',
 '\n6\n',
 '\n7\n',
 '\n8\n',
 '\n9\n',
 '\n10\n',
 '\n11\n',
 '\n12\n',
 '\n13\n',
 '\n14\n',
 '\n15\n',
 '\n16\n',
 '\n17\n',
 '\n18\n',
 '\n19\n',
 '\n20\n',
 '\n21\n',
 '\n22\n',
 '\n23\n',
 '\n24\n',
 '\n25\n',
 '\n26\n',
 '\n27\n',
 '\n28\n',
 '\n29\n',
 '\n30\n',
 '\n31\n',
 '\n32\n',
 '\n33\n',
 '\n34\n',
 '\n35\n',
 '\n36\n',
 '\n37\n',
 '\n38\n',
 '\n39\n',
 '\n40\n',
 '\n41\n',
 '\n42\n',
 '\n43\n',
 '\n44\n',
 '\n45\n',
 '\n46\n',
 '\n47\n',
 '\n48\n',
 '\n49\n',
 '\n50\n',
 '\n51\n',
 '\n52\n',
 '\n53\n',
 '\n54\n',
 '\n55\n',
 '\n56\n',
 '\n57\n',
 '\n58\n',
 '\n59\n',
 '\n60\n',
 '\n61\n',
 '\n62\n',
 '\n63\n',
 '\n64\n',
 '\n65\n',
 '\n66\n',
 '\n67\n',
 '\n68\n',
 '\n69\n',
 '\n70\n',
 '\n71\n',
 '\n72\n',
 '\n73\n',
 '\n74\n',
 '\n75\n',
 '\n76\n',
 '\n77\n',
 '\n78\n',
 '\n79\n',
 '\n80\n',
 '\n81\n',
 '\n82\n',
 '\n83\n',
 '\n84\n',
 '\n85\n',
 '\n86\n',
 '\n87\n',
 '\n88\n',
 '\n89\n',
 '\n90\n',
 '\n91\n',
 '\n92\n

In [39]:
rankings_2 = list(map(lambda x: x.replace("\n",""), rankings_2))
rankings_2

['1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '18',
 '19',
 '20',
 '21',
 '22',
 '23',
 '24',
 '25',
 '26',
 '27',
 '28',
 '29',
 '30',
 '31',
 '32',
 '33',
 '34',
 '35',
 '36',
 '37',
 '38',
 '39',
 '40',
 '41',
 '42',
 '43',
 '44',
 '45',
 '46',
 '47',
 '48',
 '49',
 '50',
 '51',
 '52',
 '53',
 '54',
 '55',
 '56',
 '57',
 '58',
 '59',
 '60',
 '61',
 '62',
 '63',
 '64',
 '65',
 '66',
 '67',
 '68',
 '69',
 '70',
 '71',
 '72',
 '73',
 '74',
 '75',
 '76',
 '77',
 '78',
 '79',
 '80',
 '81',
 '82',
 '83',
 '84',
 '85',
 '86',
 '87',
 '88',
 '89',
 '90',
 '91',
 '92',
 '93',
 '94',
 '95',
 '96',
 '97',
 '98',
 '99',
 '100']

In [40]:
songs_2 = list(map(lambda x: x.replace("\n",""), songs_2))
songs_2

['Uptown Funk!',
 'Party Rock Anthem',
 'Shape Of You',
 'Closer',
 'Girls Like You',
 'We Found Love',
 ' Old Town Road',
 'Somebody That I Used To Know',
 'Despacito',
 'Rolling In The Deep',
 'Sunflower (Spider-Man: Into The Spider-Verse)',
 'Without Me',
 'Call Me Maybe',
 'Blurred Lines',
 'Perfect',
 'Sicko Mode',
 'All About That Bass',
 ' Royals',
 "God's Plan",
 'Moves Like Jagger',
 'Happy',
 'Just The Way You Are',
 'Rockstar',
 'TiK ToK',
 'See You Again',
 'Dark Horse',
 'Thrift Shop',
 'One More Night',
 'We Are Young',
 "That's What I Like",
 'The Hills',
 'All Of Me',
 'Happier',
 'Shake It Off',
 'One Dance',
 'Radioactive',
 'Sexy And I Know It',
 'Someone Like You',
 'Counting Stars',
 'E.T.',
 'Trap Queen',
 'Love Yourself',
 'Firework',
 'Give Me Everything',
 'Locked Out Of Heaven',
 'Love The Way You Lie',
 'Thinking Out Loud',
 'Sorry',
 'California Gurls',
 'Dynamite',
 'Lucid Dreams',
 'Hello',
 'Work',
 'Grenade',
 'Hey, Soul Sister',
 'I Like It',
 'Wake Me 

In [41]:
artists_2 = list(map(lambda x: x.replace("\n",""), artists_2))
artists_2

['Mark Ronson Featuring Bruno Mars',
 'LMFAO Featuring Lauren Bennett & GoonRock',
 'Ed Sheeran',
 'The Chainsmokers Featuring Halsey',
 'Maroon 5 Featuring Cardi B',
 'Rihanna Featuring Calvin Harris',
 'Lil Nas X Featuring Billy Ray Cyrus',
 'Gotye Featuring Kimbra',
 'Luis Fonsi & Daddy Yankee Featuring Justin Bieber',
 'Adele',
 'Post Malone & Swae Lee',
 'Halsey',
 'Carly Rae Jepsen',
 'Robin Thicke Featuring T.I. + Pharrell',
 'Ed Sheeran',
 'Travis Scott',
 'Meghan Trainor',
 'Lorde',
 'Drake',
 'Maroon 5 Featuring Christina Aguilera',
 'Pharrell Williams',
 'Bruno Mars',
 'Post Malone Featuring 21 Savage',
 'Ke$ha',
 'Wiz Khalifa Featuring Charlie Puth',
 'Katy Perry Featuring Juicy J',
 'Macklemore & Ryan Lewis Featuring Wanz',
 'Maroon 5',
 'fun. Featuring Janelle Monae',
 'Bruno Mars',
 'The Weeknd',
 'John Legend',
 'Marshmello & Bastille',
 'Taylor Swift',
 'Drake Featuring WizKid & Kyla',
 'Imagine Dragons',
 'LMFAO',
 'Adele',
 'OneRepublic',
 'Katy Perry Featuring Kanye

## Constructing the dataframe

In [42]:
# each list becomes a column
top_2010s_100 = pd.DataFrame({"rankings":rankings_2,
                       "songs":songs_2,
                       "artists":artists_2,
                      })

top_2010s_100.head()

Unnamed: 0,rankings,songs,artists
0,1,Uptown Funk!,Mark Ronson Featuring Bruno Mars
1,2,Party Rock Anthem,LMFAO Featuring Lauren Bennett & GoonRock
2,3,Shape Of You,Ed Sheeran
3,4,Closer,The Chainsmokers Featuring Halsey
4,5,Girls Like You,Maroon 5 Featuring Cardi B


## Practicing Web Scraping

##### Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page: url ='https://en.wikipedia.org/wiki/Python'

In [43]:
url_3 = "https://en.wikipedia.org/wiki/Python"

## Getting the html code of the web page

In [45]:
response_3 = requests.get(url_3)
response_3.status_code

200

## Parsing the html code

In [46]:
soup_3 = BeautifulSoup(response_3.content, "html.parser")
soup_3

<!DOCTYPE html>

<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Python - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"29543186-9912-4240-b57b-bbbcfe53f776","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Python","wgTitle":"Python","wgCurRevisionId":987482924,"wgRevisionId":987482924,"wgArticleId":46332325,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Disambiguation pages with short descriptions","Short description is different from Wikidata","All article disambiguation pages","All disambiguation pages","Animal common name disambiguatio

In [5]:
##mw-content-text > div.mw-parser-output > ul:nth-child(7) > li:nth-child(1) > a

In [47]:
soup_3.select("li:nth-child(1) > a")

[<a class="mw-redirect" href="/wiki/Pythons" title="Pythons">Pythons</a>,
 <a href="/wiki/Python_(genus)" title="Python (genus)"><i>Python</i> (genus)</a>,
 <a href="#Computing"><span class="tocnumber">1</span> <span class="toctext">Computing</span></a>,
 <a href="/wiki/Python_(programming_language)" title="Python (programming language)">Python (programming language)</a>,
 <a href="/wiki/Python_of_Aenus" title="Python of Aenus">Python of Aenus</a>,
 <a href="/wiki/Python_(Efteling)" title="Python (Efteling)">Python (Efteling)</a>,
 <a href="/wiki/Python_(automobile_maker)" title="Python (automobile maker)">Python (automobile maker)</a>,
 <a href="/wiki/Python_(missile)" title="Python (missile)">Python (missile)</a>,
 <a href="/wiki/PYTHON" title="PYTHON">PYTHON</a>,
 <a href="/wiki/Python_(Monty)_Pictures" title="Python (Monty) Pictures">Python (Monty) Pictures</a>,
 <a href="/wiki/Cython" title="Cython">Cython</a>,
 <a href="/wiki/Category:Disambiguation_pages" title="Category:Disambi

In [52]:
links = []

num_iter_5 = len(soup_3.select("li:nth-child(1) > a"))

for i in range(num_iter_5): 
    links.append(soup_3.select("li:nth-child(1) > a")[i].get_text())

In [55]:
links = links[:-1]

In [56]:
links

['Pythons',
 'Python (genus)',
 '1 Computing',
 'Python (programming language)',
 'Python of Aenus',
 'Python (Efteling)',
 'Python (automobile maker)',
 'Python (missile)',
 'PYTHON',
 'Python (Monty) Pictures',
 'Cython',
 'Disambiguation pages',
 'Disambiguation pages with short descriptions',
 'Article',
 'Read',
 'Main page',
 'Help',
 'What links here',
 'Download as PDF',
 'Wikimedia Commons',
 'Afrikaans',
 'Privacy policy']
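
The list above holds the link texts of the first item of each list on the page. If the goal is the link targets for every anchor, a sketch along the following lines may be closer to the task. It uses a hand-written HTML sample based on the anchors printed earlier; `page_links` is a hypothetical name.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://en.wikipedia.org/wiki/Python"

# Hand-written sample; the first three anchors mirror the output shown earlier.
sample_html = """
<ul>
  <li><a class="mw-redirect" href="/wiki/Pythons" title="Pythons">Pythons</a></li>
  <li><a href="/wiki/Python_(genus)" title="Python (genus)"><i>Python</i> (genus)</a></li>
  <li><a href="#Computing">Computing</a></li>
  <li><a>anchor without an href</a></li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

page_links = [
    urljoin(base_url, a["href"])                    # make relative URLs absolute
    for a in sample_soup.find_all("a", href=True)   # keep only anchors that have an href
    if not a["href"].startswith("#")                # drop in-page fragment links
]

print(page_links)
```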

##### Find the number of titles that have changed in the United States Code since its last release point: url = 'http://uscode.house.gov/download/download.shtml'

In [57]:
url_4 = "https://uscode.house.gov/download/download.shtml"
response_4 = requests.get(url_4)
response_4.status_code
soup_4 = BeautifulSoup(response_4.content, "html.parser")
soup_4

<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=8" http-equiv="X-UA-Compatible"/>
<meta content="no-cache" http-equiv="pragma"/><!-- HTTP 1.0 -->
<meta content="no-cache,must-revalidate" http-equiv="cache-control"/><!-- HTTP 1.1 -->
<meta content="0" http-equiv="expires"/>
<link href="/javax.faces.resource/favicon.ico.xhtml?ln=images" rel="shortcut icon"/><link href="/javax.faces.resource/cssLayout.css.xhtml?ln=css" rel="stylesheet" type="text/css"/><script src="/javax.faces.resource/jsf.js.xhtml?ln=javax.faces" type="text/javascript"></script><link href="/javax.faces.resource/static.css.xhtml?ln=css" rel="stylesheet" type="text/css"/></head><body><script src="/javax.faces.resource/browserPreferences.js.xhtml?ln=scripts" type="text/javasc

In [63]:
#content > div > div > div.uscitemlist > div:nth-child(11)
#us\/usc\/t11
#content > div > div > div.uscitemlist > div:nth-child(15)
#us\/usc\/t11
#<div class="usctitlechanged" id="us/usc/t11">


soup_4.select("div.usctitlechanged")

[<div class="usctitlechanged" id="us/usc/t7">
 
           Title 7 - Agriculture
 
         </div>,
 <div class="usctitlechanged" id="us/usc/t11">
 
           Title 11 - Bankruptcy <span class="footnote"><a class="fn" href="#fn">٭</a></span>
 </div>,
 <div class="usctitlechanged" id="us/usc/t13">
 
           Title 13 - Census <span class="footnote"><a class="fn" href="#fn">٭</a></span>
 </div>,
 <div class="usctitlechanged" id="us/usc/t14">
 
           Title 14 - Coast Guard <span class="footnote"><a class="fn" href="#fn">٭</a></span>
 </div>,
 <div class="usctitlechanged" id="us/usc/t15">
 
           Title 15 - Commerce and Trade
 
         </div>,
 <div class="usctitlechanged" id="us/usc/t16">
 
           Title 16 - Conservation
 
         </div>,
 <div class="usctitlechanged" id="us/usc/t21">
 
           Title 21 - Food and Drugs
 
         </div>,
 <div class="usctitlechanged" id="us/usc/t24">
 
           Title 24 - Hospitals and Asylums
 
         </div>,
 <div class="uscti

In [70]:
laws_c = []

num_iter_6 = len(soup_4.select("div.usctitlechanged"))
for i in range(num_iter_6): 
    laws_c.append(soup_4.select("div.usctitlechanged")[i].get_text())

In [71]:
laws_c

['\n\n          Title 7 - Agriculture\n\n        ',
 '\n\n          Title 11 - Bankruptcy ٭\n',
 '\n\n          Title 13 - Census ٭\n',
 '\n\n          Title 14 - Coast Guard ٭\n',
 '\n\n          Title 15 - Commerce and Trade\n\n        ',
 '\n\n          Title 16 - Conservation\n\n        ',
 '\n\n          Title 21 - Food and Drugs\n\n        ',
 '\n\n          Title 24 - Hospitals and Asylums\n\n        ',
 '\n\n          Title 27 - Intoxicating Liquors\n\n        ',
 '\n\n          Title 32 - National Guard ٭\n',
 '\n\n          Title 33 - Navigation and Navigable Waters\n\n        ',
 '\n\n          Title 34 - Crime Control and Law Enforcement\n\n        ',
 '\n\n          Title 36 - Patriotic and National Observances, Ceremonies, and Organizations ٭\n',
 "\n\n          Title 38 - Veterans' Benefits ٭\n",
 '\n\n          Title 42 - The Public Health and Welfare\n\n        ',
 '\n\n          Title 45 - Railroads\n\n        ',
 '\n\n          Title 49 - Transportation ٭\n',
 '\n\n 

In [72]:
laws_c = list(map(lambda x: x.replace("\n",""), laws_c))
laws_c

['          Title 7 - Agriculture        ',
 '          Title 11 - Bankruptcy ٭',
 '          Title 13 - Census ٭',
 '          Title 14 - Coast Guard ٭',
 '          Title 15 - Commerce and Trade        ',
 '          Title 16 - Conservation        ',
 '          Title 21 - Food and Drugs        ',
 '          Title 24 - Hospitals and Asylums        ',
 '          Title 27 - Intoxicating Liquors        ',
 '          Title 32 - National Guard ٭',
 '          Title 33 - Navigation and Navigable Waters        ',
 '          Title 34 - Crime Control and Law Enforcement        ',
 '          Title 36 - Patriotic and National Observances, Ceremonies, and Organizations ٭',
 "          Title 38 - Veterans' Benefits ٭",
 '          Title 42 - The Public Health and Welfare        ',
 '          Title 45 - Railroads        ',
 '          Title 49 - Transportation ٭',
 '          Title 54 - National Park Service and Related Programs ٭']

In [None]:
laws_c = list(map(lambda x: x.replace(" ٭",""), laws_c))
laws_c

In [None]:
laws_c = list(map(lambda x: x.replace("   ",""), laws_c))
laws_c

In [101]:
laws_c = list(map(lambda x: x.replace("           ",""), laws_c))
laws_c

[' Title 7 - Agriculture',
 ' Title 11 - Bankruptcy',
 ' Title 13 - Census',
 ' Title 14 - Coast Guard',
 ' Title 15 - Commerce and Trade',
 ' Title 16 - Conservation',
 ' Title 21 - Food and Drugs',
 ' Title 24 - Hospitals and Asylums',
 ' Title 27 - Intoxicating Liquors',
 ' Title 32 - National Guard',
 ' Title 33 - Navigation and Navigable Waters',
 ' Title 34 - Crime Control and Law Enforcement',
 ' Title 36 - Patriotic and National Observances, Ceremonies, and Organizations',
 ' Title 38 - Veterans Benefits',
 ' Title 42 - The Public Health and Welfare',
 ' Title 45 - Railroads',
 ' Title 49 - Transportation',
 ' Title 54 - National Park Service and Related Programs']
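
The chained `replace` calls above leave a leading space, and the exercise asks for the number of changed titles, which is never computed. A minimal sketch that removes the footnote marker, trims whitespace with `str.strip()`, and counts the result is shown below, using three of the raw strings printed earlier as sample input. `clean_title` is a hypothetical helper name.

```python
# Three of the raw strings from the output above, used as sample input.
raw_titles = [
    "\n\n          Title 7 - Agriculture\n\n        ",
    "\n\n          Title 11 - Bankruptcy ٭\n",
    "\n\n          Title 13 - Census ٭\n",
]


def clean_title(text):
    """Drop the footnote marker, then trim whitespace from both ends."""
    return text.replace("٭", "").strip()


changed_titles = [clean_title(title) for title in raw_titles]

print(changed_titles)
print(len(changed_titles))  # number of changed titles in this sample
```

Applied to the full `laws_c` list before any other clean-up, `len(...)` would give the count the exercise asks for.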

##### Create a Python list with the names on the FBI's Ten Most Wanted list: url = 'https://www.fbi.gov/wanted/topten'

In [111]:
url_5 = "https://www.fbi.gov/wanted/topten"
response_5 = requests.get(url_5)
response_5.status_code
soup_5 = BeautifulSoup(response_5.content, "html.parser")
soup_5

<!DOCTYPE html>

<html data-gridsystem="bs3" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="ie=edge" http-equiv="x-ua-compatible"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="https://www.fbi.gov/wanted/topten" rel="canonical"/>
<link href="https://www.fbi.gov/wanted/topten/RSS" rel="alternate" title="Ten Most Wanted Fugitives - RSS 1.0" type="application/rss+xml"/>
<link href="https://www.fbi.gov/wanted/topten/rss.xml" rel="alternate" title="Ten Most Wanted Fugitives - RSS 2.0" type="application/rss+xml"/>
<link href="https://www.fbi.gov/wanted/topten/atom.xml" rel="alternate" title="Ten Most Wanted Fugitives - Atom" type="application/rss+xml"/>
<title>Ten Most Wanted Fugitives — FBI</title>
<meta content="The FBI is offering rewards for information leading to the apprehension of the Ten Most Wanted Fugitives. Select the images of suspects to display more information." name="DC.description"/>
<meta content="text/plain" name="DC.format"

In [112]:
#query-results-0f737222c5054a81a120bce207b0446a > ul > li:nth-child(1) > h3 > a
#query-results-0f737222c5054a81a120bce207b0446a > ul > li:nth-child(1) > h3
#<h3 class="title">
#<a href="https://www.fbi.gov/wanted/topten/bhadreshkumar-chetanbhai-patel">BHADRESHKUMAR CHETANBHAI PATEL</a>
#</h3>

soup_5.select("h3 a")

[<a href="https://www.fbi.gov/wanted/topten/bhadreshkumar-chetanbhai-patel">BHADRESHKUMAR CHETANBHAI PATEL</a>,
 <a href="https://www.fbi.gov/wanted/topten/alejandro-castillo">ALEJANDRO ROSALES CASTILLO</a>,
 <a href="https://www.fbi.gov/wanted/topten/arnoldo-jimenez">ARNOLDO JIMENEZ</a>,
 <a href="https://www.fbi.gov/wanted/topten/jason-derek-brown">JASON DEREK BROWN</a>,
 <a href="https://www.fbi.gov/wanted/topten/alexis-flores">ALEXIS FLORES</a>,
 <a href="https://www.fbi.gov/wanted/topten/jose-rodolfo-villarreal-hernandez">JOSE RODOLFO VILLARREAL-HERNANDEZ</a>,
 <a href="https://www.fbi.gov/wanted/topten/eugene-palmer">EUGENE PALMER</a>,
 <a href="https://www.fbi.gov/wanted/topten/rafael-caro-quintero">RAFAEL CARO-QUINTERO</a>,
 <a href="https://www.fbi.gov/wanted/topten/robert-william-fisher">ROBERT WILLIAM FISHER</a>,
 <a href="https://www.fbi.gov/wanted/topten/yaser-abdel-said">YASER ABDEL SAID</a>]

In [122]:
most_w = []
ranking = []

num_iter_7 = len(soup_5.select("h3 a"))
for i in range(num_iter_7): 
    most_w.append(soup_5.select("h3 a")[i].get_text())
    ranking.append(i + 1)  # store the rank as a plain integer, not a one-element list

In [123]:
most_w

['BHADRESHKUMAR CHETANBHAI PATEL',
 'ALEJANDRO ROSALES CASTILLO',
 'ARNOLDO JIMENEZ',
 'JASON DEREK BROWN',
 'ALEXIS FLORES',
 'JOSE RODOLFO VILLARREAL-HERNANDEZ',
 'EUGENE PALMER',
 'RAFAEL CARO-QUINTERO',
 'ROBERT WILLIAM FISHER',
 'YASER ABDEL SAID']

In [124]:
# each list becomes a column
top_10 = pd.DataFrame({"FBI Most Wanted":most_w,
                       "Ranking": ranking,
                      })

top_10

Unnamed: 0,FBI Most Wanted,Ranking
0,BHADRESHKUMAR CHETANBHAI PATEL,1
1,ALEJANDRO ROSALES CASTILLO,2
2,ARNOLDO JIMENEZ,3
3,JASON DEREK BROWN,4
4,ALEXIS FLORES,5
5,JOSE RODOLFO VILLARREAL-HERNANDEZ,6
6,EUGENE PALMER,7
7,RAFAEL CARO-QUINTERO,8
8,ROBERT WILLIAM FISHER,9
9,YASER ABDEL SAID,10


##### List all language names and the number of related articles in the order they appear on wikipedia.org: url = 'https://www.wikipedia.org/'

In [125]:
url_6 = "https://www.wikipedia.org/"
response_6 = requests.get(url_6)
response_6.status_code
soup_6 = BeautifulSoup(response_6.content, "html.parser")
soup_6

<!DOCTYPE html>

<html class="no-js" lang="mul">
<head>
<meta charset="utf-8"/>
<title>Wikipedia</title>
<meta content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation." name="description"/>
<script>
document.documentElement.className = document.documentElement.className.replace( /(^|\s)no-js(\s|$)/, "$1js-enabled$2" );
</script>
<meta content="initial-scale=1,user-scalable=yes" name="viewport"/>
<link href="/static/apple-touch/wikipedia.png" rel="apple-touch-icon"/>
<link href="/static/favicon/wikipedia.ico" rel="shortcut icon"/>
<link href="//creativecommons.org/licenses/by-sa/3.0/" rel="license"/>
<style>
.sprite{background-image:url(portal/wikipedia.org/assets/img/sprite-46c49284.png);background-image:linear-gradient(transparent,transparent),url(portal/wikipedia.org/assets/img/sprite-46c49284.svg);background-repeat:no-repeat;display:inline-block;vertical-align:middle}.svg-Commons-logo_sister{backgroun

In [133]:
#js-link-box-pt > strong
#js-link-box-pt > small > bdi

soup_6.select("strong")

[<strong class="jsl10n localized-slogan" data-jsl10n="slogan">The Free Encyclopedia</strong>,
 <strong>English</strong>,
 <strong>Español</strong>,
 <strong>日本語</strong>,
 <strong>Deutsch</strong>,
 <strong>Русский</strong>,
 <strong>Français</strong>,
 <strong>Italiano</strong>,
 <strong>中文</strong>,
 <strong>Português</strong>,
 <strong>Polski</strong>,
 <strong class="jsl10n" data-jsl10n="app-links.title">
 <a class="jsl10n" data-jsl10n="app-links.url" href="https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications">
 Download Wikipedia for Android or iOS
 </a>
 </strong>]

In [157]:
soup_6.select("bdi")

[<bdi dir="ltr">6 195 000+</bdi>,
 <bdi dir="ltr">1 641 000+</bdi>,
 <bdi dir="ltr">1 239 000+</bdi>,
 <bdi dir="ltr">2 503 000+</bdi>,
 <bdi dir="ltr">1 678 000+</bdi>,
 <bdi dir="ltr">2 270 000+</bdi>,
 <bdi dir="ltr">1 652 000+</bdi>,
 <bdi dir="ltr">1 159 000+</bdi>,
 <bdi dir="ltr">1 047 000+</bdi>,
 <bdi dir="ltr">1 439 000+</bdi>,
 <bdi dir="ltr">
 1 000 000+
 </bdi>,
 <bdi dir="rtl">العربية</bdi>,
 <bdi dir="rtl">مصرى</bdi>,
 <bdi dir="ltr">
 100 000+
 </bdi>,
 <bdi dir="rtl">فارسی</bdi>,
 <bdi dir="rtl">עברית</bdi>,
 <bdi dir="rtl" lang="kk-Arab">قازاقشا</bdi>,
 <bdi dir="rtl">تۆرکجه</bdi>,
 <bdi dir="rtl">اردو</bdi>,
 <bdi dir="ltr">
 10 000+
 </bdi>,
 <bdi dir="rtl" lang="ku-Arab">كوردی</bdi>,
 <bdi dir="rtl">کوردیی ناوەندی</bdi>,
 <bdi dir="rtl">مازِرونی</bdi>,
 <bdi dir="rtl">پنجابی (شاہ مکھی)</bdi>,
 <bdi dir="rtl">پښتو</bdi>,
 <bdi dir="rtl">سنڌي</bdi>,
 <bdi dir="rtl">ייִדיש</bdi>,
 <bdi dir="ltr">
 1 000+
 </bdi>,
 <bdi dir="rtl">ܐܬܘܪܝܐ</bdi>,
 <bdi dir="rtl">گیلکی</bd

In [158]:
languages = []
a_numbers = []

num_iter_8 = len(soup_6.select("strong"))
for i in range(num_iter_8): 
    languages.append(soup_6.select("strong")[i].get_text())
    a_numbers.append(soup_6.select("bdi")[i].get_text())

In [159]:
languages = languages[1:-1]
languages

['English',
 'Español',
 '日本語',
 'Deutsch',
 'Русский',
 'Français',
 'Italiano',
 '中文',
 'Português',
 'Polski']

In [162]:
a_numbers = list(map(lambda x: x.replace("\n",""), a_numbers))
a_numbers

['6.195.000+',
 '1.641.000+',
 '1.239.000+',
 '2.503.000+',
 '1.678.000+',
 '2.270.000+',
 '1.652.000+',
 '1.159.000+',
 '1.047.000+',
 '1.439.000+',
 '1.000.000+']

In [163]:
a_numbers = list(map(lambda x: x.replace("\xa0","."), a_numbers))
a_numbers

['6.195.000+',
 '1.641.000+',
 '1.239.000+',
 '2.503.000+',
 '1.678.000+',
 '2.270.000+',
 '1.652.000+',
 '1.159.000+',
 '1.047.000+',
 '1.439.000+',
 '1.000.000+']

In [165]:
a_numbers = a_numbers[:-2]

In [166]:
# each list becomes a column
top_10_l = pd.DataFrame({"Language":languages,
                       "Articles": a_numbers,
                      })

top_10_l

Unnamed: 0,Language,Articles
0,English,6.195.000+
1,Español,1.641.000+
2,日本語,1.239.000+
3,Deutsch,2.503.000+
4,Русский,1.678.000+
5,Français,2.270.000+
6,Italiano,1.652.000+
7,中文,1.159.000+
8,Português,1.047.000+
9,Polski,1.439.000+
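
Pairing `strong` and `bdi` elements by position works here only after trimming both lists by hand, because each selector also matches elements outside the language boxes. A sketch of a more direct pairing reads both fields from each box whose id starts with `js-link-box-` (the pattern visible in the copied selectors above). The sample HTML, the tag used for the box, and the `js-link-box-en` id are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hand-written sample; the tag type and the "en" id are assumed, "pt" appears above.
sample_html = (
    '<a id="js-link-box-en"><strong>English</strong>'
    '<small><bdi dir="ltr">6\xa0195\xa0000+</bdi></small></a>'
    '<a id="js-link-box-pt"><strong>Português</strong>'
    '<small><bdi dir="ltr">1\xa0047\xa0000+</bdi></small></a>'
)

sample_soup = BeautifulSoup(sample_html, "html.parser")


def to_int(text):
    """Keep only the digits, dropping separators and the trailing '+'."""
    return int("".join(ch for ch in text if ch.isdigit()))


language_counts = []
for box in sample_soup.select('[id^="js-link-box-"]'):
    name = box.select_one("strong")
    count = box.select_one("small > bdi")
    if name is None or count is None:
        continue  # not a complete language box
    # The '+' on the page marks a lower bound, so the integer is approximate.
    language_counts.append((name.get_text(strip=True), to_int(count.get_text())))

print(language_counts)
```

Storing the counts as integers also makes the `Articles` column sortable, which the `'6.195.000+'` strings are not.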


##### Create a list of the different kinds of datasets available on data.gov.uk: url = 'https://data.gov.uk/'

In [167]:
url_7 = "https://data.gov.uk/"
response_7 = requests.get(url_7)
response_7.status_code
soup_7 = BeautifulSoup(response_7.content, "html.parser")
soup_7


<!DOCTYPE html>

<!--[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]-->
<!--[if gt IE 8]><!--><html lang="en"><!--<![endif]-->
<html class="govuk-template">
<head>
<meta charset="utf-8"/>
<title>Find open data - data.gov.uk</title>
<meta content="#0b0c0c" name="theme-color">
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="/find-assets/application-f5e9f2f4ce27e6411457ec8df8d360010daa54eb6f37f2d7be3704208b1973f3.css" media="screen" rel="stylesheet"/>
<meta content="authenticity_token" name="csrf-param">
<meta content="keWyvUkoSArny6DRtNtlWLd4PwJcNs301y4bCivyN0iL2nQpDpeIVCW2EIhIzd6sr+2ffh0i42n9pd7DNkvHgw==" name="csrf-token"/>
</meta></meta></head>
<body class="govuk-template__body">
<script>document.body.className = ((document.body.className) ? document.body.className + ' js-enabled' : 'js-enabled');</script>
<a class="gem-c-skip-link govuk-skip-link" href="#main-content">Skip to main content</a>
<div aria-label="cookie banner" class="gem-c-cooki

In [174]:
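# Full CSS selector for the first topic link:
# #main-content > div:nth-child(3) > div > ul > li:nth-child(1) > h3 > a
# The shorter selector "h3 a" matches all of the topic links.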

soup_7.select("h3 a")

[<a class="govuk-link" href="/search?filters%5Btopic%5D=Business+and+economy">Business and economy</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Crime+and+justice">Crime and justice</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Defence">Defence</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Education">Education</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Environment">Environment</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Government">Government</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Government+spending">Government spending</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Health">Health</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Mapping">Mapping</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Society">Society</a>,
 <a class="govuk-link" href="/search?filters%5Btopic%5D=Towns+and+cities">Towns and cities</a>,
 <a class="govuk-link" href="/search?f

In [175]:
# Collect the text of every topic link, running the selector only once.
d_types = [a.get_text() for a in soup_7.select("h3 a")]

In [176]:
d_types

['Business and economy',
 'Crime and justice',
 'Defence',
 'Education',
 'Environment',
 'Government',
 'Government spending',
 'Health',
 'Mapping',
 'Society',
 'Towns and cities',
 'Transport']
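
The links scraped above carry relative `href` values, so they cannot be requested as they are. A minimal sketch of resolving one of them against the page URL and decoding its topic filter, using only the standard library (`BASE_URL`, `href`, `full_url`, and `query` are illustrative names, not part of the notebook):

```python
from urllib.parse import parse_qs, urljoin, urlsplit

BASE_URL = "https://data.gov.uk/"

# One href copied from the output of soup_7.select("h3 a").
href = "/search?filters%5Btopic%5D=Business+and+economy"

# Resolve the relative link against the page it was scraped from.
full_url = urljoin(BASE_URL, href)
print(full_url)

# Decode the query string to recover the topic filter.
query = parse_qs(urlsplit(full_url).query)
print(query)  # {'filters[topic]': ['Business and economy']}
```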

##### Display the top 10 languages by number of native speakers stored in a pandas dataframe: url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'

In [177]:
url_8 = "https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers"
response_8 = requests.get(url_8)
response_8.raise_for_status()  # raise an error early if the request failed
soup_8 = BeautifulSoup(response_8.content, "html.parser")
soup_8

<!DOCTYPE html>

<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of languages by number of native speakers - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"0d2eed89-929c-46b1-a0b2-70e0f7c6f23a","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_languages_by_number_of_native_speakers","wgTitle":"List of languages by number of native speakers","wgCurRevisionId":985620308,"wgRevisionId":985620308,"wgArticleId":405385,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Wikipedia indefinitely semi-protected pages","Articles with short desc

In [185]:
# Full CSS selectors for the rank, language, and native-speaker cells of the first table row:
# #mw-content-text > div.mw-parser-output > table:nth-child(18) > tbody > tr:nth-child(1) > td:nth-child(1)
# #mw-content-text > div.mw-parser-output > table:nth-child(18) > tbody > tr:nth-child(1) > td:nth-child(2) > a
# #mw-content-text > div.mw-parser-output > table:nth-child(18) > tbody > tr:nth-child(1) > td:nth-child(3)

soup_8.select("td:nth-child(2) > a")

[<a href="/wiki/Mandarin_Chinese" title="Mandarin Chinese">Mandarin Chinese</a>,
 <a href="/wiki/Spanish_language" title="Spanish language">Spanish</a>,
 <a href="/wiki/English_language" title="English language">English</a>,
 <a href="/wiki/Hindi" title="Hindi">Hindi</a>,
 <a href="/wiki/Hindustani_language" title="Hindustani language">Hindustani</a>,
 <a href="/wiki/Bengali_language" title="Bengali language">Bengali</a>,
 <a href="/wiki/Portuguese_language" title="Portuguese language">Portuguese</a>,
 <a href="/wiki/Russian_language" title="Russian language">Russian</a>,
 <a href="/wiki/Japanese_language" title="Japanese language">Japanese</a>,
 <a href="/wiki/Punjabi_language" title="Punjabi language">Western Punjabi</a>,
 <a href="/wiki/Marathi_language" title="Marathi language">Marathi</a>,
 <a href="/wiki/Telugu_language" title="Telugu language">Telugu</a>,
 <a href="/wiki/Wu_Chinese" title="Wu Chinese">Wu Chinese</a>,
 <a href="/wiki/Turkish_language" title="Turkish language">Tur

In [190]:
# These selectors match cells in every table on the page, so the lists
# can have different lengths; only the first 10 entries are used later.
ranking_l = [td.get_text() for td in soup_8.select("td:nth-child(1)")]
language_n = [a.get_text() for a in soup_8.select("td:nth-child(2) > a")]
native_s = [td.get_text() for td in soup_8.select("td:nth-child(3)")]

In [197]:
ranking_l = list(map(lambda x: x.replace("\n",""), ranking_l))
ranking_l

['1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '18',
 '19',
 '20',
 '21',
 '22',
 '23',
 '24',
 '25',
 '26',
 '27',
 '28',
 '29',
 '30',
 '31',
 '32',
 '33',
 '34',
 '35',
 '36',
 '37',
 '38',
 '39',
 '40',
 '41',
 '42',
 '43',
 '44',
 '45',
 '46',
 '47',
 '48',
 '49',
 '50',
 '51',
 '52',
 '53',
 '54',
 '55',
 '56',
 '57',
 '58',
 '59',
 '60',
 '61',
 '62',
 '63',
 '64',
 '65',
 '66',
 '67',
 '68',
 '69',
 '70',
 '71',
 '72',
 '73',
 '74',
 '75',
 '76',
 '77',
 '78',
 '79',
 '80',
 '81',
 '82',
 '83',
 '84',
 '85',
 '86',
 '87',
 '88',
 '89',
 '90',
 '91',
 '',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '18',
 '19',
 '20',
 '21',
 '22',
 '23',
 '24',
 '25',
 '26',
 '27',
 '28',
 '29',
 '30',
 '31',
 '32',
 '33',
 '34',
 '35',
 '36',
 '37',
 '38',
 '39',
 '40',
 '41',
 '42',
 '43',
 '44',
 '45',
 '46',
 '47',
 '48',
 '49',
 '50',
 '51',
 '52',
 '53',
 '54'
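
The numbering in this output restarts after an empty string, which suggests that `td:nth-child(1)` also matched cells outside the first table. One way to keep only the first run of values is to cut the list at the first empty entry. A minimal sketch, using a shortened stand-in list (`sample_ranking`) and a hypothetical helper `first_run`:

```python
def first_run(values):
    """Return the leading items of `values`, stopping before the first empty string."""
    result = []
    for value in values:
        if value == "":
            break
        result.append(value)
    return result


# Shortened stand-in for ranking_l: two runs separated by an empty string.
sample_ranking = ["1", "2", "3", "", "1", "2"]
print(first_run(sample_ranking))  # ['1', '2', '3']
```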

In [192]:
language_n

['Mandarin Chinese',
 'Spanish',
 'English',
 'Hindi',
 'Hindustani',
 'Bengali',
 'Portuguese',
 'Russian',
 'Japanese',
 'Western Punjabi',
 'Marathi',
 'Telugu',
 'Wu Chinese',
 'Turkish',
 'Korean',
 'French',
 'German',
 'Vietnamese',
 'Tamil',
 'Yue Chinese',
 'Urdu',
 'Hindustani',
 'Javanese',
 'Italian',
 'Egyptian Arabic',
 'Gujarati',
 'Iranian Persian',
 'Bhojpuri',
 'Min Nan Chinese',
 'Hakka Chinese',
 'Jin Chinese',
 'Hausa',
 'Kannada',
 'Indonesian',
 'Polish',
 'Yoruba',
 'Xiang Chinese',
 'Malayalam',
 'Odia',
 'Maithili',
 'Burmese',
 'Eastern Punjabi',
 'Sunda',
 'Sudanese Arabic',
 'Algerian Arabic',
 'Moroccan Arabic',
 'Ukrainian',
 'Igbo',
 'Northern Uzbek',
 'Sindhi',
 'North Levantine Arabic',
 'Romanian',
 'Tagalog',
 'Dutch',
 'Saʽidi Arabic',
 'Gan Chinese',
 'Amharic',
 'Northern Pashto',
 'Magahi',
 'Thai',
 'Saraiki',
 'Khmer',
 'Chhattisgarhi',
 'Somali',
 'Malay',
 'Malay',
 'Cebuano',
 'Nepali',
 'Mesopotamian Arabic',
 'Assamese',
 'Sinhalese',
 'No

In [193]:
native_s

['918\n',
 '480\n',
 '379\n',
 '341\n',
 '228\n',
 '221\n',
 '154\n',
 '128\n',
 '92.7\n',
 '83.1\n',
 '82.0\n',
 '81.4\n',
 '79.4\n',
 '77.3\n',
 '77.2\n',
 '76.1\n',
 '76.0\n',
 '75.0\n',
 '73.1\n',
 '68.6\n',
 '68.3\n',
 '64.8\n',
 '64.6\n',
 '56.4\n',
 '52.8\n',
 '52.2\n',
 '50.1\n',
 '48.2\n',
 '46.9\n',
 '43.9\n',
 '43.6\n',
 '43.4\n',
 '39.7\n',
 '37.8\n',
 '37.3\n',
 '37.1\n',
 '34.5\n',
 '33.9\n',
 '32.9\n',
 '32.6\n',
 '32.4\n',
 '31.9\n',
 '29.4\n',
 '27.5\n',
 '27.3\n',
 '27.0\n',
 '25.1\n',
 '24.6\n',
 '24.6\n',
 '24.3\n',
 '23.6\n',
 '23.1\n',
 '22.4\n',
 '22.1\n',
 '21.9\n',
 '20.9\n',
 '20.7\n',
 '20.7\n',
 '20.0\n',
 '16.6\n',
 '16.3\n',
 '16.2\n',
 '16.1\n',
 '15.9\n',
 '15.8\n',
 '15.7\n',
 '15.3\n',
 '15.3\n',
 '14.6\n',
 '14.5\n',
 '14.5\n',
 '14.1\n',
 '13.8\n',
 '13.1\n',
 '13.0\n',
 '12.9\n',
 '12.8\n',
 '12.6\n',
 '12.1\n',
 '12.1\n',
 '11.6\n',
 '11.6\n',
 '11.4\n',
 '11.0\n',
 '10.9\n',
 '10.8\n',
 '10.7\n',
 '10.5\n',
 '10.4\n',
 '10.3\n',
 '10.3\n',
 '935 (

In [195]:
native_s = list(map(lambda x: x.replace("\n",""), native_s))
native_s

['918',
 '480',
 '379',
 '341',
 '228',
 '221',
 '154',
 '128',
 '92.7',
 '83.1',
 '82.0',
 '81.4',
 '79.4',
 '77.3',
 '77.2',
 '76.1',
 '76.0',
 '75.0',
 '73.1',
 '68.6',
 '68.3',
 '64.8',
 '64.6',
 '56.4',
 '52.8',
 '52.2',
 '50.1',
 '48.2',
 '46.9',
 '43.9',
 '43.6',
 '43.4',
 '39.7',
 '37.8',
 '37.3',
 '37.1',
 '34.5',
 '33.9',
 '32.9',
 '32.6',
 '32.4',
 '31.9',
 '29.4',
 '27.5',
 '27.3',
 '27.0',
 '25.1',
 '24.6',
 '24.6',
 '24.3',
 '23.6',
 '23.1',
 '22.4',
 '22.1',
 '21.9',
 '20.9',
 '20.7',
 '20.7',
 '20.0',
 '16.6',
 '16.3',
 '16.2',
 '16.1',
 '15.9',
 '15.8',
 '15.7',
 '15.3',
 '15.3',
 '14.6',
 '14.5',
 '14.5',
 '14.1',
 '13.8',
 '13.1',
 '13.0',
 '12.9',
 '12.8',
 '12.6',
 '12.1',
 '12.1',
 '11.6',
 '11.6',
 '11.4',
 '11.0',
 '10.9',
 '10.8',
 '10.7',
 '10.5',
 '10.4',
 '10.3',
 '10.3',
 '935 (955)',
 '390 (405)',
 '365 (360)',
 '295 (310)',
 '280 (295)',
 '205 (215)',
 '200 (205)',
 '160 (155)',
 '125 (125)',
 '95 (100)',
 '92 (95)',
 '82',
 '80',
 '77',
 '76',
 '76',
 '7
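
The cleaned `native_s` values are still strings, and some of the later entries carry a second figure in parentheses, such as `'935 (955)'`. A sketch of converting them to floats, assuming only the first number in each string is wanted; `parse_speakers` is a hypothetical helper, not part of the notebook:

```python
import re


def parse_speakers(text: str) -> float:
    """Return the first number found in a string such as '92.7' or '935 (955)'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# Values taken from the outputs above, including one with a trailing newline.
sample = ["918\n", "92.7\n", "935 (955)"]
print([parse_speakers(s) for s in sample])  # [918.0, 92.7, 935.0]
```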

In [200]:
# each list becomes a column
top_10_ln = pd.DataFrame({
    "Language": language_n[:10],
    "Native Speakers": native_s[:10],
    "Ranking": ranking_l[:10],
})

top_10_ln

| | Language | Native Speakers | Ranking |
|---|---|---|---|
| 0 | Mandarin Chinese | 918.0 | 1 |
| 1 | Spanish | 480.0 | 2 |
| 2 | English | 379.0 | 3 |
| 3 | Hindi | 341.0 | 4 |
| 4 | Hindustani | 228.0 | 5 |
| 5 | Bengali | 221.0 | 6 |
| 6 | Portuguese | 154.0 | 7 |
| 7 | Russian | 128.0 | 8 |
| 8 | Japanese | 92.7 | 9 |
| 9 | Western Punjabi | 83.1 | 10 |
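
Selecting each column separately works here because only the first 10 entries are kept, but the three lists come from every table on the page and are not guaranteed to line up. An alternative is to pick one table and read it row by row, so that each record is built from cells of the same `<tr>`. The sketch below runs on a small inline HTML snippet (`SAMPLE_HTML`, invented for illustration, with two rows borrowed from the table above) rather than the live page, and the column order is an assumption.

```python
from bs4 import BeautifulSoup

# Invented stand-in for the page: two tables, only the first one is wanted.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Language</th><th>Speakers</th></tr>
  <tr><td>1</td><td><a href="/wiki/Mandarin_Chinese">Mandarin Chinese</a></td><td>918</td></tr>
  <tr><td>2</td><td><a href="/wiki/Spanish_language">Spanish</a></td><td>480</td></tr>
</table>
<table>
  <tr><td>1</td><td><a href="#">Placeholder</a></td><td>0</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_table = soup.select_one("table")  # first matching table only

rows = []
for tr in first_table.select("tr"):
    cells = tr.select("td")
    if len(cells) < 3:  # skip header rows, which use <th> instead of <td>
        continue
    rows.append({
        "Ranking": cells[0].get_text(strip=True),
        "Language": cells[1].get_text(strip=True),
        "Native Speakers": cells[2].get_text(strip=True),
    })

print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`, and slicing it with `rows[:10]` would give the top 10 without relying on three separate lists staying aligned.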
