2. The list of the various MSc programmes offered by the School of EECS is provided at the following URL: [http://eecs.qmul.ac.uk/postgraduate/programmes/](http://eecs.qmul.ac.uk/postgraduate/programmes/). Perform web scraping on the table present in the above URL and convert it into a pandas dataframe that would include one row for each programme of study as shown in the webpage. The dataframe should include the following 5 columns: name of postgraduate degree programme (e.g. Advanced Electronic and Electrical Engineering), programme code for part-time study (e.g. H60C), programme code for full-time study (e.g. H60A), URL for part-time study programme details, URL for full-time study programme details. Perform data cleaning to remove unnecessary characters when needed. In the report include the code that was used to scrape, convert and clean the table and provide evidence that the table has been successfully scraped (e.g. by displaying the contents of the dataframe). [1 mark out of 5]

In [None]:
#Load Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from urllib.request import urlopen
from bs4 import BeautifulSoup

In [2]:
#Open url
url='http://eecs.qmul.ac.uk/postgraduate/programmes/'
html=urlopen(url)
print(html)

<http.client.HTTPResponse object at 0x7fc9e99e6a90>
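The response object printed above is a stream: it can be read only once and is not explicitly closed. As a minimal sketch of an alternative, the hypothetical helper `fetch_html` below (not part of the original notebook) reads the page into a string inside a `with` block. The UTF-8 fallback and the 10-second timeout are assumptions.

```python
from urllib.request import urlopen

def fetch_html(url, timeout=10):
    """Hypothetical helper: download a page and return its HTML as a string."""
    # The with-block closes the connection once the body has been read
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers, else fall back to UTF-8 (assumption)
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)

# Example call (requires network access):
# html_text = fetch_html('http://eecs.qmul.ac.uk/postgraduate/programmes/')
```

The returned string can be passed to `BeautifulSoup` in the same way as the response object.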


In [3]:
#Scraping web
soup=BeautifulSoup(html,'lxml')
print(soup)

<!DOCTYPE html>
<html data-js="no" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<script>document.documentElement.setAttribute('data-js', 'yes');</script>
<title>Postgraduate programmes - School of Electronic Engineering and Computer Science</title>
<link href="http://eecs.qmul.ac.uk/media/site-assets/qmul-site/css/lib.min.1.1.2.css" media="" rel="stylesheet" type="text/css"/><!-- Lib CSS -->
<link href="http://eecs.qmul.ac.uk/media/site-assets/qmul-site/css/vendors~lib.min.css" media="" rel="stylesheet" type="text/css"/><!-- Vendors Lib CSS -->
<link href="http://eecs.qmul.ac.uk/media/site-assets/qmul-site/css/qm-custom.1.0.3.css" media="" rel="stylesheet" type="text/css"/><!-- QM CUstom CSS -->
<link href="" media="all" rel="stylesheet" type="text/css"/>
<script async="async" src="//script.crazyegg.com/pages/scripts/0011/5388.js" type="text/javascript"></script>
<!--DataTable_Events -->
<link href="http://eecs.qmul.a

In [4]:
#Get header
header_list=[]
col_labels=soup.find_all('th')
col_str=str(col_labels)
cleantext_header=BeautifulSoup(col_str,'lxml').get_text()
header_list.append(cleantext_header) # Add the clean table header to the list
print(header_list)


['[Postgraduate degree programmes, Part-time(2 year), Full-time(1 year)]']


In [5]:
df_header=pd.DataFrame(header_list)
df_header2=df_header[0].str.split(',',expand=True)

# Remove the surrounding [] and the space left after each comma from every header cell
for col in df_header2.columns:
    df_header2[col]=df_header2[col].str.strip('[] \xa0')

df_header2

Unnamed: 0,0,1,2
0,Postgraduate degree programmes,Part-time(2 year),Full-time(1 year)
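The header cell above converts the list of `th` tags to one string and then splits it on commas again. A possible shortcut, sketched here on a hypothetical stand-in for the header row (the real page's markup may differ), is to take the text of each `th` element separately, so no brackets or commas need cleaning afterwards:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the header row of the programmes table
sample_html = """
<table>
  <tr>
    <th>Postgraduate degree programmes</th>
    <th>Part-time(2 year)</th>
    <th>Full-time(1 year)</th>
  </tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# One string per <th>; strip=True trims surrounding whitespace
headers = [th.get_text(strip=True) for th in sample_soup.find_all('th')]
print(headers)
```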


In [6]:
#Get table cells
# Create an empty list where the table will be stored
table_list = []
rows=soup.find_all('tr')
# For every row in the table, find each cell element and add it to the list
for row in rows:
    row_td = row.find_all('td')
    row_cells = str(row_td)
    row_cleantext = BeautifulSoup(row_cells, "lxml").get_text()  # extract the text without HTML tags
    table_list.append(row_cleantext)  # Add the clean table row to the list
    
print(table_list)

['[]', '[Advanced Electronic and Electrical Engineering, H60C, H60A]', '[Artificial Intelligence, I4U2\xa0, I4U1\xa0]', '[Big Data Science, H6J6, H6J7]', '[Computer Games, \xa0, I4U4]', '[Computer Science, G4U2, G4U1]', '[Computer Science by Research, G4Q2, G4Q1]', '[Computing and Information Systems, G5U6, G5U5]', '[Data Science and Artificial Intelligence by Conversion, \xa0, I4U5\xa0]', '[Electronic Engineering by Research, H6T6, H6T5]', '[Internet of Things (Data), I1T2, I1T0]', '[Machine Learning for Visual Data Analytics, H6JZ, H6JE]', '[Media and Arts Technology by Research, \xa0, G4Q3]', '[Sound and Music Computing\xa0, H6T4, H6T8]', '[Telecommunication and Wireless Systems, H6JD, H6JA]', '[Digital and Technology Solutions (Apprenticeship), I4DA, \xa0]']


In [7]:
df_table=pd.DataFrame(table_list)
df_table2=df_table[0].str.split(',',expand=True)

# Remove the surrounding [], non-breaking spaces (\xa0) and stray whitespace from every cell
for col in df_table2.columns:
    df_table2[col]=df_table2[col].str.strip('[] \xa0')

# Drop the first row: the header row has no <td> cells, so its split leaves missing values
df_table3=df_table2.dropna(axis=0,how='any')
df_table3


Unnamed: 0,0,1,2
1,Advanced Electronic and Electrical Engineering,H60C,H60A
2,Artificial Intelligence,I4U2,I4U1
3,Big Data Science,H6J6,H6J7
4,Computer Games,,I4U4
5,Computer Science,G4U2,G4U1
6,Computer Science by Research,G4Q2,G4Q1
7,Computing and Information Systems,G5U6,G5U5
8,Data Science and Artificial Intelligence by Co...,,I4U5
9,Electronic Engineering by Research,H6T6,H6T5
10,Internet of Things (Data),I1T2,I1T0
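Splitting the stringified row on commas would break if a programme name ever contained a comma. As a sketch under stated assumptions, the cells can instead be read one `td` at a time and cleaned with a small helper. The HTML excerpt, the helper `clean_cell` and the column names below are hypothetical and only illustrate the idea:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical excerpt; \xa0 is the non-breaking space seen in the scraped cells
sample_html = (
    "<table>"
    "<tr><th>Programme</th><th>Part-time</th><th>Full-time</th></tr>"
    "<tr><td>Artificial Intelligence</td><td>I4U2\xa0</td><td>I4U1\xa0</td></tr>"
    "<tr><td>Computer Games</td><td>\xa0</td><td>I4U4</td></tr>"
    "</table>"
)

def clean_cell(text):
    """Replace non-breaking spaces, trim whitespace, and map empty cells to None."""
    cleaned = text.replace('\xa0', ' ').strip()
    return cleaned or None

sample_soup = BeautifulSoup(sample_html, 'html.parser')
records = []
for tr in sample_soup.find_all('tr'):
    cells = [clean_cell(td.get_text()) for td in tr.find_all('td')]
    if cells:  # the header row has no <td> cells and is skipped
        records.append(cells)

codes = pd.DataFrame(records, columns=['Programme', 'Part-time code', 'Full-time code'])
print(codes)
```

Mapping empty cells to `None` makes missing programme codes show up as missing values rather than empty strings.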


In [17]:
#Concat dataframe
df=pd.concat([df_header2,df_table3])
df2 = df.rename(columns=df.iloc[0]) # We assign the first row to be the dataframe header
df3 = df2.drop(df2.index[0]) # We drop the replicated header from the first row of the dataframe

df3

Unnamed: 0,Postgraduate degree programmes,Part-time(2 year),Full-time(1 year)
1,Advanced Electronic and Electrical Engineering,H60C,H60A
2,Artificial Intelligence,I4U2,I4U1
3,Big Data Science,H6J6,H6J7
4,Computer Games,,I4U4
5,Computer Science,G4U2,G4U1
6,Computer Science by Research,G4Q2,G4Q1
7,Computing and Information Systems,G5U6,G5U5
8,Data Science and Artificial Intelligence by Conversion,,I4U5
9,Electronic Engineering by Research,H6T6,H6T5
10,Internet of Things (Data),I1T2,I1T0


In [50]:
# Collect the href of every table cell that contains a link
cells=soup.find_all('td')

table_url=[]

for cell in cells:
    link=cell.find('a')
    if link is not None:
        table_url.append(str(link.get('href')))

print(table_url)


['https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-games-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/', 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-by-research-msc/', 'https://www.qmul.ac.uk/p

In [51]:
# Remove duplicate URLs while keeping their original order (one URL per programme)
table_url=list(dict.fromkeys(table_url))
table_url

['https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-games-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-by-research-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computing-and-information-systems-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/data-science-and-artificial-intelligence-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/electronic-engineering-by-research-msc/',
 'https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/internet-of-things

In [48]:
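# Copy the list so that the edit below does not also change table_url
fullTime=list(table_url)
fullTime[14]=None  # Digital and Technology Solutions (Apprenticeship) has no full-time code
fullTime_url=pd.DataFrame(data=fullTime,columns=['Full-Time'])
fullTime_url.index+=1
fullTime_url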

Unnamed: 0,Full-Time
1,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/
2,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/
3,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/
4,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-games-msc/
5,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/
6,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-by-research-msc/
7,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computing-and-information-systems-msc/
8,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/data-science-and-artificial-intelligence-msc/
9,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/electronic-engineering-by-research-msc/
10,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/internet-of-things-data-msc/


In [52]:
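# Copy the list so that the edits below do not also change table_url
partTime=list(table_url)
partTime[3]=None   # Computer Games has no part-time code
partTime[7]=None   # Data Science and Artificial Intelligence by Conversion has no part-time code
partTime[11]=None  # Media and Arts Technology by Research has no part-time code
partTime_url=pd.DataFrame(data=partTime,columns=['Part-Time'])
partTime_url.index+=1
partTime_url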

Unnamed: 0,Part-Time
1,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/
2,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/
3,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/
4,
5,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/
6,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-by-research-msc/
7,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computing-and-information-systems-msc/
8,
9,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/electronic-engineering-by-research-msc/
10,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/internet-of-things-data-msc/


In [53]:
pd.set_option('display.max_colwidth', None)
pd.concat([df3,partTime_url,fullTime_url],axis=1)

Unnamed: 0,Postgraduate degree programmes,Part-time(2 year),Full-time(1 year),Part-Time,Full-Time
1,Advanced Electronic and Electrical Engineering,H60C,H60A,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/advanced-electronic-and-electrical-engineering-msc/
2,Artificial Intelligence,I4U2,I4U1,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/artificial-intelligence-msc/
3,Big Data Science,H6J6,H6J7,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/big-data-science-msc/
4,Computer Games,,I4U4,,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-games-msc/
5,Computer Science,G4U2,G4U1,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-msc/
6,Computer Science by Research,G4Q2,G4Q1,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-by-research-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computer-science-by-research-msc/
7,Computing and Information Systems,G5U6,G5U5,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computing-and-information-systems-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/computing-and-information-systems-msc/
8,Data Science and Artificial Intelligence by Conversion,,I4U5,,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/data-science-and-artificial-intelligence-msc/
9,Electronic Engineering by Research,H6T6,H6T5,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/electronic-engineering-by-research-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/electronic-engineering-by-research-msc/
10,Internet of Things (Data),I1T2,I1T0,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/internet-of-things-data-msc/,https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/internet-of-things-data-msc/
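
The URL columns above rely on hard-coded list positions (3, 7, 11 and 14), which would point at the wrong programmes if the page gained or lost a row. A sketch of an alternative is to read the link from each cell directly, so a cell without a link yields a missing value. It assumes that each row has three cells and that each code cell carries its own link; the excerpt, the `example.org` URLs and the helper `cell_href` are hypothetical, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt: the real page's markup may differ
sample_html = """
<table>
  <tr><td>Big Data Science</td>
      <td><a href="https://example.org/big-data-science/">H6J6</a></td>
      <td><a href="https://example.org/big-data-science/">H6J7</a></td></tr>
  <tr><td>Computer Games</td>
      <td>&nbsp;</td>
      <td><a href="https://example.org/computer-games/">I4U4</a></td></tr>
</table>
"""

def cell_href(td):
    """Return the href of the first link in a cell, or None if the cell has no link."""
    link = td.find('a')
    return link.get('href') if link is not None else None

sample_soup = BeautifulSoup(sample_html, 'html.parser')
part_time_urls, full_time_urls = [], []
for tr in sample_soup.find_all('tr'):
    tds = tr.find_all('td')
    if len(tds) == 3:  # name, part-time cell, full-time cell
        part_time_urls.append(cell_href(tds[1]))
        full_time_urls.append(cell_href(tds[2]))

print(part_time_urls)
print(full_time_urls)
```

Because both lists are built row by row, they stay aligned with the programme names without any de-duplication step.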
