
# Lab | Web Scraping Multiple Pages
### based on Lab | Web Scraping Single Page

#### Business goal:

- Check the `case_study_gnod.md` file.
- Make sure you've understood the big picture of your project:

  - the goal of the company (`Gnod`),
  - their current product (`Gnoosic`),
  - their strategy, and
  - how your project fits into this context.

  Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

#### Instructions - Scraping popular songs

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will prefer a recommendation that is also currently popular.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: [https://www.billboard.com/charts/hot-100](https://www.billboard.com/charts/hot-100).

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.



In [1]:
from bs4 import BeautifulSoup
import requests
from tqdm.notebook import trange, tqdm
import pandas as pd

In [63]:
# Request the Billboard Hot 100 page and check the HTTP status code

url = "https://www.billboard.com/charts/hot-100"
response = requests.get(url)
response.status_code


200

In [64]:
response


<Response [200]>

In [65]:


# Create the soup with the html content stored in the response object

soup = BeautifulSoup(response.content, "html.parser")


In [66]:
# look at the soup

print (soup.prettify())

<!DOCTYPE html>
<html class="" lang="">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
  <title>
   The Hot 100 Chart | Billboard
  </title>
  <meta content="The Hot 100 Chart" name="title" property="title">
   <meta content="@billboard" name="twitter:site"/>
   <meta content="Billboard" property="og:site_name">
    <meta content="article" property="og:type">
     <link href="/manifest.json" rel="manifest"/>
     <style>
      .chart-pro-access {
            background-image: url('https://www.billboard.com/assets/1613666598/images/piano/chart-pro-access-mb.png?f7f898e8791e9146d9c8');
        }

        @media (min-width: 769px) {
            .chart-pro-access {
                background-image: url('https://www.billboard.com/assets/1613666598/images/piano/chart-pro-access-dk.png?f7f898e8791e9146d9c8');
            }
        }
     </style>
     <script a

### CSS selectors
* **Artist**
- `span.chart-element__information__artist.text--truncate.color--secondary`


* **Song**
- `span.chart-element__information__song.text--truncate.color--primary`
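A compound selector such as `span.a.b.c` matches only elements that carry all of the listed classes. The sketch below illustrates this on a small inline snippet; the `html` string and the name `soup_demo` are stand-ins introduced here for illustration, with class names copied from the chart markup in this notebook (the live page may have changed since).

```python
from bs4 import BeautifulSoup

# Inline stand-in for one chart entry (assumption: mirrors the markup shown in this notebook).
html = """
<span class="chart-element__information">
  <span class="chart-element__information__song text--truncate color--primary">Drivers License</span>
  <span class="chart-element__information__artist text--truncate color--secondary">Olivia Rodrigo</span>
</span>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Full selector as used in this notebook: the element must carry every listed class.
full = soup_demo.select(
    "span.chart-element__information__song.text--truncate.color--primary"
)

# In this snippet the most specific class alone selects the same element.
short = soup_demo.select("span.chart-element__information__song")

print([tag.get_text() for tag in full])
print([tag.get_text() for tag in short])
```

Whether the shorter selector is enough on the real page depends on its markup, so it is worth comparing the number of matches before relying on it.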

In [57]:
# select only the box with the top100

top_h = soup.select(".chart-list__elements")

In [59]:
print (top_h)

[<ol class="chart-list__elements">
<li class="chart-list__element display--flex">
<button class="chart-element__wrapper display--flex flex--grow sort--default">
<span class="chart-element__rank flex--column flex--xy-center flex--no-shrink">
<span class="chart-element__rank__number">1</span>
<span class="chart-element__trend chart-element__trend--steady color--secondary"><i class="fa fa-arrow-right"><span class="sr--only">Steady</span></i></span>
</span>
<span class="chart-element__information">
<span class="chart-element__information__song text--truncate color--primary">Drivers License</span>
<span class="chart-element__information__artist text--truncate color--secondary">Olivia Rodrigo</span>
<span class="chart-element__information__delta color--secondary">
<span class="chart-element__information__delta__text text--default">-</span>
<span class="chart-element__information__delta__text text--last">1 Last Week</span>
<span class="chart-element__information__delta__text text--peak">1 Pea

In [67]:
authors = soup.select("span.chart-element__information__artist.text--truncate.color--secondary")

In [77]:
authors

[<span class="chart-element__information__artist text--truncate color--secondary">Olivia Rodrigo</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Cardi B</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">The Weeknd</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">The Weeknd</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">24kGoldn Featuring iann dior</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Ariana Grande</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Chris Brown &amp; Young Thug</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Dua Lipa Featuring DaBaby</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Ariana Grande</span>,
 <span class="chart-element__info

In [68]:
songs = soup.select("span.chart-element__information__song.text--truncate.color--primary")

In [75]:
songs

[<span class="chart-element__information__song text--truncate color--primary">Drivers License</span>,
 <span class="chart-element__information__song text--truncate color--primary">Up</span>,
 <span class="chart-element__information__song text--truncate color--primary">Blinding Lights</span>,
 <span class="chart-element__information__song text--truncate color--primary">Save Your Tears</span>,
 <span class="chart-element__information__song text--truncate color--primary">Mood</span>,
 <span class="chart-element__information__song text--truncate color--primary">34+35</span>,
 <span class="chart-element__information__song text--truncate color--primary">Go Crazy</span>,
 <span class="chart-element__information__song text--truncate color--primary">Levitating</span>,
 <span class="chart-element__information__song text--truncate color--primary">Positions</span>,
 <span class="chart-element__information__song text--truncate color--primary">What You Know Bout Love</span>,
 <span class="chart-elem

In [69]:


# collect the text of every artist tag already selected in `authors`
artists = [tag.get_text() for tag in authors]



In [70]:
artists

['Olivia Rodrigo',
 'Cardi B',
 'The Weeknd',
 'The Weeknd',
 '24kGoldn Featuring iann dior',
 'Ariana Grande',
 'Chris Brown & Young Thug',
 'Dua Lipa Featuring DaBaby',
 'Ariana Grande',
 'Pop Smoke',
 'CJ',
 'Pop Smoke Featuring Lil Baby & DaBaby',
 'SZA',
 'Justin Bieber Featuring Chance The Rapper',
 'Billie Eilish',
 'Justin Bieber & benny blanco',
 'Pooh Shiesty Featuring Lil Durk',
 'Luke Combs',
 'Bad Bunny & Jhay Cortez',
 'Gabby Barrett Featuring Charlie Puth',
 'Megan Thee Stallion',
 'Doja Cat',
 'AJR',
 'Tate McRae',
 'Machine Gun Kelly X blackbear',
 'Yung Bleu Featuring Drake',
 'Internet Money & Gunna Featuring Don Toliver & NAV',
 'Megan Thee Stallion Featuring DaBaby',
 'Taylor Swift',
 'Drake Featuring Lil Durk',
 'Lil Baby',
 'Lewis Capaldi',
 'BRS Kash',
 'Justin Bieber',
 'Niko Moon',
 'Morgan Wallen',
 'Moneybagg Yo',
 'Morgan Wallen',
 'Kelsea Ballerini',
 'Chris Stapleton',
 'Saweetie Featuring Doja Cat',
 'Parmalee x Blanco Brown',
 'BTS',
 'The Kid LAROI',
 

In [71]:


# collect the text of every song tag already selected in `songs`
song_name = [tag.get_text() for tag in songs]



In [72]:
song_name

['Drivers License',
 'Up',
 'Blinding Lights',
 'Save Your Tears',
 'Mood',
 '34+35',
 'Go Crazy',
 'Levitating',
 'Positions',
 'What You Know Bout Love',
 'Whoopty',
 'For The Night',
 'Good Days',
 'Holy',
 'Therefore I Am',
 'Lonely',
 'Back In Blood',
 'Better Together',
 'Dakiti',
 'I Hope',
 'Body',
 'Streets',
 'Bang!',
 'You Broke Me First.',
 "My Ex's Best Friend",
 "You're Mines Still",
 'Lemonade',
 'Cry Baby',
 'Willow',
 'Laugh Now Cry Later',
 'On Me',
 'Before You Go',
 'Throat Baby (Go Baby)',
 'Anyone',
 'Good Time',
 'Wasted On You',
 'Time Today',
 'Sand In My Boots',
 'Hole In The Bottle',
 'Starting Over',
 'Best Friend',
 'Just The Way',
 'Dynamite',
 'Without You',
 'Put Your Records On',
 'Beers And Sunshine',
 'Damage',
 'Kings & Queens',
 'Rockstar',
 'Down To One',
 'Neighbors',
 "Somebody's Problem",
 'Beat Box',
 "What's Your Country Song",
 'GNF (OKOKOK)',
 'The Good Ones',
 'Afterglow',
 'Golden',
 'Goosebumps',
 'Buss It',
 'Long Live',
 'Monsters',
 'G

In [73]:
# build a dataframe with the name of the artists and the songs
top_100 = pd.DataFrame({"artist_name": artists,
                       "song_name": song_name})

In [74]:
top_100.head(50)

|   | artist_name | song_name |
|---|---|---|
| 0 | Olivia Rodrigo | Drivers License |
| 1 | Cardi B | Up |
| 2 | The Weeknd | Blinding Lights |
| 3 | The Weeknd | Save Your Tears |
| 4 | 24kGoldn Featuring iann dior | Mood |
| 5 | Ariana Grande | 34+35 |
| 6 | Chris Brown & Young Thug | Go Crazy |
| 7 | Dua Lipa Featuring DaBaby | Levitating |
| 8 | Ariana Grande | Positions |
| 9 | Pop Smoke | What You Know Bout Love |
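
Selecting artists and songs with two separate `select` calls works only as long as both lists have the same length and order. A possible alternative is to walk over each chart entry and read both fields from the same element, so a song can never be paired with the wrong artist. The sketch below runs on an inline snippet; the `html` string and the names `chart_soup` and `rows` are illustrative assumptions, and the class names are copied from the markup printed earlier.

```python
from bs4 import BeautifulSoup

# Inline stand-in for two chart entries (assumption: mirrors the markup shown in this notebook).
html = """
<ol class="chart-list__elements">
  <li class="chart-list__element display--flex">
    <span class="chart-element__information__song text--truncate color--primary">Drivers License</span>
    <span class="chart-element__information__artist text--truncate color--secondary">Olivia Rodrigo</span>
  </li>
  <li class="chart-list__element display--flex">
    <span class="chart-element__information__song text--truncate color--primary">Up</span>
    <span class="chart-element__information__artist text--truncate color--secondary">Cardi B</span>
  </li>
</ol>
"""
chart_soup = BeautifulSoup(html, "html.parser")

rows = []
for entry in chart_soup.select("li.chart-list__element"):
    # Look up both fields inside the same <li>, so they always belong together.
    song = entry.select_one("span.chart-element__information__song")
    artist = entry.select_one("span.chart-element__information__artist")
    if song is None or artist is None:
        continue  # skip entries that lack one of the two fields
    rows.append({
        "artist_name": artist.get_text(strip=True),
        "song_name": song.get_text(strip=True),
    })

print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` would give a dataframe with the same two columns, `artist_name` and `song_name`.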


# ----------- Next Lab



# Lab | Web Scraping Multiple Pages

#### Business goal:

- Check the `case_study_gnod.md` file.
- Make sure you've understood the big picture of your project:

  - the goal of the company (`Gnod`),
  - their current product (`Gnoosic`),
  - their strategy, and
  - how your project fits into this context.

  Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

#### Instructions 

#### Prioritize the MVP

In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to go back there.

#### Expand the project

If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

#### Practice web scraping

As you've seen, scraping the internet is a skill that can get you all sorts of information. Here are some little challenges that you can try to gain more experience in the field:

- Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page: `url ='https://en.wikipedia.org/wiki/Python'`
- Find the number of titles that have changed in the United States Code since its last release point: `url = 'http://uscode.house.gov/download/download.shtml'`
- Create a Python list with the top ten FBI's Most Wanted names: `url = 'https://www.fbi.gov/wanted/topten'`
- Display the 20 latest earthquakes info (date, time, latitude, longitude and region name) by the EMSC as a pandas dataframe: `url = 'https://www.emsc-csem.org/Earthquake/'`
- List all language names and number of related articles in the order they appear in [wikipedia.org](wikipedia.org): `url = 'https://www.wikipedia.org/'`
- Create a list of the different kinds of datasets available in [data.gov.uk](data.gov.uk): `url = 'https://data.gov.uk/'`
- Display the top 10 languages by number of native speakers stored in a pandas dataframe: `url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'`




## First part - extending the list to 200

100 songs from http://www.songlyrics.com/news/top-songs/all-time/

CSS selector of the main table: `table.tracklist > tbody:nth-child(1)`

In [39]:
url = "http://www.songlyrics.com/news/top-songs/all-time/"
response3 = requests.get(url)
response3.status_code

200

In [40]:
soup3 = BeautifulSoup(response3.content, "html.parser")

In [43]:
print(soup3.prettify())

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://gmpg.org/xfn/11"><link href="http://www.songlyrics.com/news/wp-content/cache/minify/bc125.css" media="all" rel="stylesheet" type="text/css"/>
<title>Top 100 Songs of All Time | SONGLYRICS</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/> <!-- viewport -->
<link href="http://www.songlyrics.com/news/xmlrpc.php" rel="pingback"/>
<link href="http://slmain.songlyricscom.netdna-cdn.com/favicon.ico" rel="icon" type="image/ico"/>
<link href="http://slmain.songlyricscom.netdna-cdn.com/css/global.min.9.css" rel="stylesheet" type="text/css"/>
<script src="http://sljs.songlyricscom.netdna-cdn.com/jquery.min.new.js" type="text/javascript"></script>
<!--
topsongs1
-->
<scrip

In [50]:
field = soup3.select("table.tracklist > tbody > tr > td > a")
field

[<a href="http://www.songlyrics.com/bob-dylan-lyrics/" title="Bob Dylan Lyrics ">Bob Dylan</a>,
 <a href="http://www.songlyrics.com/bob-dylan/like-a-rolling-stone-lyrics/" title="Like a Rolling Stone Lyrics Bob Dylan">Like a Rolling Stone</a>,
 <a href="http://www.songlyrics.com/aretha-franklin-lyrics/" title="Aretha Franklin Lyrics ">Aretha Franklin</a>,
 <a href="http://www.songlyrics.com/aretha-franklin/respect-lyrics/" title="Respect Lyrics Aretha Franklin">Respect</a>,
 <a href="http://www.songlyrics.com/the-who-lyrics/" title="The Who Lyrics ">The Who</a>,
 <a href="http://www.songlyrics.com/the-who/my-generation-lyrics/" title="My Generation Lyrics The Who">My Generation</a>,
 <a href="http://www.songlyrics.com/chuck-berry-lyrics/" title="Chuck Berry Lyrics ">Chuck Berry</a>,
 <a href="http://www.songlyrics.com/chuck-berry/johnny-b-goode-lyrics/" title="Johnny B. Goode Lyrics Chuck Berry">Johnny B. Goode</a>,
 <a href="http://www.songlyrics.com/ray-charles-lyrics/" title="Ray Ch

In [51]:
# collect the link text of every <a> tag already selected in `field`
names = [a.get_text() for a in field]

In [52]:
names  # artists and titles alternate, so the list still has to be split in two

['Bob Dylan',
 'Like a Rolling Stone',
 'Aretha Franklin',
 'Respect',
 'The Who',
 'My Generation',
 'Chuck Berry',
 'Johnny B. Goode',
 'Ray Charles',
 "What'd I Say",
 'The Beach Boys',
 'Good Vibrations',
 'John Lennon',
 'Imagine',
 'Nirvana',
 'Smells Like Teen Spirit',
 'Marvin Gaye',
 "What's Going On",
 'The Rolling Stones',
 "(I Can't Get No) Satisfaction",
 'The Beatles',
 'Hey Jude',
 'Bob Dylan',
 'Good Golly Miss Molly ',
 'The Jimi Hendrix Experience',
 'Purple Haze',
 'The Beatles',
 'Let It Be',
 'Bruce Springsteen',
 'Born To Run',
 'The Beatles',
 'Yesterday',
 'Sam Cooke',
 'A Change Is Gonna Come',
 'Elvis Presley',
 'Hound Dog',
 'The Beatles',
 'I Want to Hold Your Hand',
 'The Clash',
 'London Calling',
 'Chuck Berry',
 'Maybellene',
 'The Impressions',
 'People Get Ready  ',
 'Derek And The Dominos',
 'Layla  ',
 'Johnny Cash',
 'I Walk The Line  ',
 'The Beatles',
 'Help  ',
 'The Ronettes',
 'Be My Baby  ',
 'Beach Boys',
 'God Only Knows  ',
 'Otis Redding',

In [53]:
# Using list slicing to separate even-index and odd-index elements
test_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
res = test_list[::2]  # every second element, starting at index 0
res

[1, 3, 5, 7, 9]
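
The same slicing idea can pair each artist with its title in one step. The sketch below uses a short hand-written list, `names_demo`, as a stand-in for the scraped `names` list (an assumption for illustration), and also strips the trailing spaces that some scraped titles carry.

```python
# Stand-in for the scraped list: artists at even indices, titles at odd indices.
names_demo = [
    "Bob Dylan", "Like a Rolling Stone",
    "The Impressions", "People Get Ready  ",
]

# An odd length would mean at least one artist or title is missing.
if len(names_demo) % 2 != 0:
    raise ValueError("expected alternating artist/title entries")

# zip pairs element 0 with 1, 2 with 3, and so on; strip removes stray whitespace.
pairs = [
    (artist.strip(), song.strip())
    for artist, song in zip(names_demo[::2], names_demo[1::2])
]

print(pairs)
```

This relies on every table row contributing exactly one artist link and one title link; a row with a missing link would shift every later pair.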

In [58]:
artist_names = names[::2]
artist_names 

['Bob Dylan',
 'Aretha Franklin',
 'The Who',
 'Chuck Berry',
 'Ray Charles',
 'The Beach Boys',
 'John Lennon',
 'Nirvana',
 'Marvin Gaye',
 'The Rolling Stones',
 'The Beatles',
 'Bob Dylan',
 'The Jimi Hendrix Experience',
 'The Beatles',
 'Bruce Springsteen',
 'The Beatles',
 'Sam Cooke',
 'Elvis Presley',
 'The Beatles',
 'The Clash',
 'Chuck Berry',
 'The Impressions',
 'Derek And The Dominos',
 'Johnny Cash',
 'The Beatles',
 'The Ronettes',
 'Beach Boys',
 'Otis Redding',
 'The Beatles',
 'Led Zeppelin',
 'The Beatles',
 'U2',
 'Rolling Stones',
 'Buddy Holly',
 'Band',
 'Rolling Stones',
 'The Doors',
 'The Righteous Brothers',
 'Martha And The Vandellas',
 'Bob Marley',
 'Tina Turner',
 'Little Richard',
 'David Bowie',
 'Ray Charles',
 'Grandmaster Flash',
 'Jimi Hendrix',
 'Smokey Robinson',
 'Kinks',
 'The Eagles',
 'Elvis Presley',
 'Simon And Garfunkel',
 'Bob Dylan',
 'The Kingsmen',
 'Prince',
 'Michael Jackson',
 'Sex Pistols',
 'Jerry Lee Lewis',
 'Procol Harum',
 'P

In [59]:
song_names = names[1::2]
song_names

['Like a Rolling Stone',
 'Respect',
 'My Generation',
 'Johnny B. Goode',
 "What'd I Say",
 'Good Vibrations',
 'Imagine',
 'Smells Like Teen Spirit',
 "What's Going On",
 "(I Can't Get No) Satisfaction",
 'Hey Jude',
 'Good Golly Miss Molly ',
 'Purple Haze',
 'Let It Be',
 'Born To Run',
 'Yesterday',
 'A Change Is Gonna Come',
 'Hound Dog',
 'I Want to Hold Your Hand',
 'London Calling',
 'Maybellene',
 'People Get Ready  ',
 'Layla  ',
 'I Walk The Line  ',
 'Help  ',
 'Be My Baby  ',
 'God Only Knows  ',
 '(Sitting On) The Dock Of The Bay  ',
 'In My Life  ',
 'Stairway To Heaven  ',
 'A Day In The Life  ',
 'One',
 'Gimme Shelter',
 "That'll Be The Day",
 'The Weight',
 'Sympathy For The Devil',
 'Light My Fire',
 "You've Lost That Loving Feeling",
 'Dancing In The Streets',
 'No Woman No Cry',
 'River Deep, Mountain High',
 'Tutti Frutti',
 'Heroes',
 'Georgia On My Mind',
 'The Message',
 'All Along The Watchtower',
 'The Tracks Of My Tears',
 'Waterloo Sunset',
 'Hotel Califo

In [60]:
songlyrics = pd.DataFrame({"artist_name": artist_names,
                       "song_name": song_names})
songlyrics.head()

|   | artist_name | song_name |
|---|---|---|
| 0 | Bob Dylan | Like a Rolling Stone |
| 1 | Aretha Franklin | Respect |
| 2 | The Who | My Generation |
| 3 | Chuck Berry | Johnny B. Goode |
| 4 | Ray Charles | What'd I Say |


In [61]:
songlyrics.shape

(100, 2)

In [75]:

top_200 = pd.concat([top_100, songlyrics], ignore_index=True)

In [81]:
top_200.tail() 

|   | artist_name | song_name |
|---|---|---|
| 195 | Little Richard | Good Golly Miss Molly |
| 196 | Al Green | Love And Happiness |
| 197 | Rolling Stones | You Can't Always Get What You Want |
| 198 | Jerry Lee Lewis | Great Balls Of Fire |
| 199 | Creedence Clearwater Revival | Fortunate Son |


In [79]:
top_200.shape

(200, 2)
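
When several scraped lists are combined, the same song can appear more than once, and stray whitespace (such as the trailing spaces in `'People Get Ready  '`) can hide such duplicates. The sketch below shows one possible clean-up on two tiny hand-written dataframes; `chart_demo`, `classics_demo`, and `combined` are illustrative names, not part of the lab data.

```python
import pandas as pd

# Tiny stand-ins for the two scraped dataframes (assumption: same column names as above).
chart_demo = pd.DataFrame({
    "artist_name": ["The Weeknd", "Little Richard"],
    "song_name": ["Blinding Lights", "Tutti Frutti"],
})
classics_demo = pd.DataFrame({
    "artist_name": ["Little Richard ", "Al Green"],
    "song_name": ["Tutti Frutti  ", "Love And Happiness"],
})

# Stack the two frames and renumber the index from 0.
combined = pd.concat([chart_demo, classics_demo], ignore_index=True)

# Remove leading and trailing whitespace so equal songs compare as equal.
for col in ["artist_name", "song_name"]:
    combined[col] = combined[col].str.strip()

# Keep the first occurrence of each (artist, song) pair.
combined = combined.drop_duplicates(
    subset=["artist_name", "song_name"]
).reset_index(drop=True)

print(combined.shape)
```

Exact string matching will not catch spelling variants such as `Beach Boys` versus `The Beach Boys`; those would need a separate normalisation step.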

In [112]:
# save top_200 to a CSV file (to_csv returns None when given a path, so nothing is assigned)

top_200.to_csv("lab_top_200_random_songs.csv", index=False)

## Second part - Practice web scraping






### 1. Links from Wikipedia


In [89]:
# request the Wikipedia article "Python (programming language)"

url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
response = requests.get(url)
response.status_code

200

In [90]:
soup = BeautifulSoup(response.content, "html.parser")

In [91]:
soup

<!DOCTYPE html>

<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Python (programming language) - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"YDOMt0Kye@8hiQnWSQzj@QAAAA0","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Python_(programming_language)","wgTitle":"Python (programming language)","wgCurRevisionId":1007718549,"wgRevisionId":1007718549,"wgArticleId":23862,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Use dmy dates from August 2020","Articles contai

In [100]:
links = soup.select(".mw-parser-output > ul:nth-child(76) > li > a")
links

[<a href="/wiki/Automation" title="Automation">Automation</a>,
 <a class="mw-redirect" href="/wiki/Data_analytics" title="Data analytics">Data analytics</a>,
 <a class="mw-redirect" href="/wiki/Databases" title="Databases">Databases</a>,
 <a href="/wiki/Documentation" title="Documentation">Documentation</a>,
 <a class="mw-redirect" href="/wiki/Graphical_user_interfaces" title="Graphical user interfaces">Graphical user interfaces</a>,
 <a class="mw-redirect" href="/wiki/Image_processing" title="Image processing">Image processing</a>,
 <a href="/wiki/Machine_learning" title="Machine learning">Machine learning</a>,
 <a class="mw-redirect" href="/wiki/Mobile_App" title="Mobile App">Mobile App</a>,
 <a href="/wiki/Multimedia" title="Multimedia">Multimedia</a>,
 <a class="mw-redirect" href="/wiki/Computer_networking" title="Computer networking">Computer Networking</a>,
 <a class="mw-redirect" href="/wiki/Scientific_computing" title="Scientific computing">Scientific computing</a>,
 <a class="

In [106]:
# collect the href attribute of every link already selected in `links`
link_list = [a.get("href") for a in links]
  

In [107]:
# list of links
print(link_list)

['/wiki/Automation', '/wiki/Data_analytics', '/wiki/Databases', '/wiki/Documentation', '/wiki/Graphical_user_interfaces', '/wiki/Image_processing', '/wiki/Machine_learning', '/wiki/Mobile_App', '/wiki/Multimedia', '/wiki/Computer_networking', '/wiki/Scientific_computing', '/wiki/System_administration', '/wiki/Test_framework', '/wiki/Text_processing', '/wiki/Web_framework', '/wiki/Web_scraping']
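
The selector `ul:nth-child(76)` picks out one specific list and stops matching as soon as the article layout changes. If the goal is every link on the page, one possible approach is to collect all `<a>` tags that have an `href` and filter afterwards. The sketch below runs on an inline snippet; the `html` string and the names `page_demo`, `all_hrefs`, and `article_hrefs` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Inline stand-in for a fragment of a Wikipedia article (assumption, not the live page).
html = """
<div class="mw-parser-output">
  <p>Python is used for <a href="/wiki/Automation" title="Automation">automation</a>.</p>
  <ul>
    <li><a href="/wiki/Web_scraping" title="Web scraping">Web scraping</a></li>
    <li><a href="#cite_note-1">[1]</a></li>
    <li><a name="anchor-without-href">no target</a></li>
  </ul>
</div>
"""
page_demo = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute.
all_hrefs = [a["href"] for a in page_demo.find_all("a", href=True)]

# Keep only links to other articles, dropping in-page anchors such as "#cite_note-1".
article_hrefs = [href for href in all_hrefs if href.startswith("/wiki/")]

print(all_hrefs)
print(article_hrefs)
```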


In [110]:
# add the first part of the URL to the list of links
wiki = "https://en.wikipedia.org"

wiki_links = [wiki + i for i in link_list]
wiki_links

['https://en.wikipedia.org/wiki/Automation',
 'https://en.wikipedia.org/wiki/Data_analytics',
 'https://en.wikipedia.org/wiki/Databases',
 'https://en.wikipedia.org/wiki/Documentation',
 'https://en.wikipedia.org/wiki/Graphical_user_interfaces',
 'https://en.wikipedia.org/wiki/Image_processing',
 'https://en.wikipedia.org/wiki/Machine_learning',
 'https://en.wikipedia.org/wiki/Mobile_App',
 'https://en.wikipedia.org/wiki/Multimedia',
 'https://en.wikipedia.org/wiki/Computer_networking',
 'https://en.wikipedia.org/wiki/Scientific_computing',
 'https://en.wikipedia.org/wiki/System_administration',
 'https://en.wikipedia.org/wiki/Test_framework',
 'https://en.wikipedia.org/wiki/Text_processing',
 'https://en.wikipedia.org/wiki/Web_framework',
 'https://en.wikipedia.org/wiki/Web_scraping']
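
String concatenation works here because every scraped `href` starts with `/wiki/`. For pages that mix relative paths, in-page anchors, and full URLs, `urllib.parse.urljoin` from the standard library resolves each case against the page URL. A minimal sketch, where the `hrefs` list is a hand-written example rather than scraped data:

```python
from urllib.parse import urljoin

# URL of the page the links were scraped from.
base = "https://en.wikipedia.org/wiki/Python_(programming_language)"

# Hand-written examples of the kinds of href values a page can contain.
hrefs = [
    "/wiki/Automation",         # path relative to the site root
    "#History",                 # anchor within the same page
    "https://www.python.org/",  # already absolute
]

# urljoin resolves each href the way a browser would.
absolute = [urljoin(base, href) for href in hrefs]

for link in absolute:
    print(link)
```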