# World University Rankings Data

This notebook walks you through a short script that extracts university ranking data from https://www.timeshighereducation.com/
________________________________________________________________________________________________________________

## Aim
This code extracts information from a university ranking website. Collecting data from websites in this way is called web scraping, and it is mainly used for websites that do not offer an API for retrieving their data directly. 
Many tutorials already explain the basics of web scraping, so this notebook is more of a case study. <br />
Let's start.  

## 1. Prerequisites
This code assumes that Python is installed on your machine and that you have basic knowledge of Python. Here is the full list of prerequisites: 
* Python 3.6 or above
* Jupyter Notebook, or any other environment that can run Python
* The following Python libraries: BeautifulSoup, Selenium, urllib (part of the standard library), objectpath and pandas
* A web browser. I am using Chrome 77 here, but you can use other browsers too
* A web driver for the browser you are using. For Chrome and Chrome-based browsers you can download it from https://chromedriver.chromium.org/downloads

## 2. What data are you trying to get? 
This is the first question you should ask yourself, before even touching a single key. In our case, we started with the idea of collecting the list of universities together with their rankings. To do so, you first need to visit the website and get a feel for how its pages are structured. <br/>

The page we are trying to scrape looked something like this:

![title](img/basic_page_01.PNG)
<br/> <br/> <br/> 


It is clear that the page contains some sort of table holding the information we are trying to collect. However, how we collect that information depends on the HTML code behind what we see in the browser window. In Chrome, press F12 to display the HTML code. The page should look something like this:

![title](img/page_code_02.PNG)
<br/> <br/> <br/> 


Using the small inspection cursor, you can point at elements of the page and find out which part of the HTML represents them. This is important because we will use only the HTML to collect the data, not the page as displayed in the browser. Once you have identified the parts of the HTML that correspond to the information we need, we can start scraping.

## 3. Let's write some python 

A standard way to request web pages from Python is the requests library. In our particular case this approach will not work, because the website uses AJAX to modify the HTML of the page after it loads. The HTML you receive through requests therefore contains only an empty template of the table, not the information we are trying to collect. To give the JavaScript code a chance to run and populate the table, we use Selenium. Selenium drives a real browser to request the page and lets us collect the HTML after the page has loaded, which gives us access to the information we need.
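
Reading the HTML immediately after requesting a page can still return the empty template if the JavaScript has not finished. One way to guard against this is to poll until the content appears. The following is a minimal sketch, assuming a helper called `wait_until` (a hypothetical name, not part of Selenium or of this notebook) that uses only the standard library:

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() until it returns a truthy value or the timeout expires.

    Returns the truthy value, or None if the timeout elapsed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        # pause briefly so we do not hammer the browser with checks
        time.sleep(interval)


# Hypothetical usage with a Selenium browser object (not executed here):
# the marker is any string that only appears once the table is populated.
# ready = wait_until(lambda: marker in browser.page_source)
```

Selenium also ships its own explicit waits (`WebDriverWait`), which may be a better fit when you want to wait for a specific element rather than a string.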

In [1]:
# import standard libraries
import json
import time

# import third party libraries
import objectpath
import pandas as pd

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen 
from selenium import webdriver

Selenium requires a web driver to control the web browser. The driver can be installed or used directly as an executable file. In this example we use the executable file directly, so please edit the following code to add the location of the web driver. It is recommended to place it alongside the code itself.
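
If the driver sits next to the notebook, its path can be built rather than typed by hand. This is only a sketch: `locate_webdriver` is a hypothetical helper, and the default file name `chromedriver.exe` assumes Chrome on Windows.

```python
from pathlib import Path


def locate_webdriver(directory, filename='chromedriver.exe'):
    """Return the path of the web driver in `directory` as a string.

    Raises FileNotFoundError early, so a wrong location is reported
    before Selenium tries to start the browser.
    """
    candidate = Path(directory) / filename
    if not candidate.is_file():
        raise FileNotFoundError(f'No web driver found at {candidate}')
    return str(candidate)


# Hypothetical usage: look in the folder the notebook is running from.
# webdriver_location = locate_webdriver(Path.cwd())
```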

In [2]:
webdriver_location = '***please insert here webdriver location *** /chromedriver.exe'

webdriver_location = 'C:/Users/razek/Desktop/git_pages/World-University-Rankings-Table/chromedriver.exe'
# initiate webdriver
driver = webdriver.Chrome(executable_path = webdriver_location)

Defining the web addresses to scrape is an essential part of the workflow. A quick inspection of the target address reveals that changing the length parameter from 25 to -1 returns all available universities instead of 25 per page. This lets us collect the information in one go rather than requesting several pages. 
Checking the tabs available on the page (ranking, scores) also reveals more data to collect. We will therefore use two web addresses, as follows: 

In [3]:
url_stats = 'https://www.timeshighereducation.com/world-university-rankings/2020/world-ranking#!/page/0/length/-1/sort_by/rank/sort_order/asc/cols/stats'
url_scores = 'https://www.timeshighereducation.com/world-university-rankings/2020/world-ranking#!/page/0/length/-1/sort_by/rank/sort_order/asc/cols/scores'
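
The two addresses above differ only in their last segment, so they can also be generated from their parts. A small sketch, assuming a hypothetical helper `build_ranking_url` and that the address pattern stays the same for other years or page lengths (which is not verified here):

```python
BASE_URL = 'https://www.timeshighereducation.com/world-university-rankings'


def build_ranking_url(year, cols, length=-1, page=0):
    """Build a ranking table address.

    cols   -- which tab to show, 'stats' or 'scores'
    length -- rows per page; -1 asks for all universities at once
    """
    return (f'{BASE_URL}/{year}/world-ranking'
            f'#!/page/{page}/length/{length}'
            f'/sort_by/rank/sort_order/asc/cols/{cols}')


# reproduce the two addresses used in this notebook
generated_stats = build_ranking_url(2020, 'stats')
generated_scores = build_ranking_url(2020, 'scores')
print(generated_stats)
print(generated_scores)
```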

In [4]:
# create two webdriver objects, one for each of the two addresses 
stats_browser = webdriver.Chrome()
scores_browser = webdriver.Chrome()

In the following two cells, we follow the same procedure for each of the two pages: 
* Request the webpage using the webdriver 
* Collect the HTML code of the webpage 
* Use BeautifulSoup to parse the HTML. This lets us pick out specific pieces of the code afterwards
* Use the "findAll" method to collect the objects we identified in the HTML code
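
To show what selecting cells by their class attribute means, here is a sketch that uses only the standard library's `html.parser` instead of BeautifulSoup. The `CellCollector` class and the sample HTML (including the `href` values) are illustrative assumptions; only the class names are taken from this notebook.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class attribute equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.cells = []
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # compare the full class string, e.g. "name namesearch"
        if tag == 'td' and dict(attrs).get('class') == self.target_class:
            self._in_cell = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._in_cell:
            self.cells.append(''.join(self._buffer).strip())
            self._in_cell = False


# made-up HTML that imitates two rows of the ranking table
sample_html = '''
<table><tbody>
<tr><td class="rank sorting_1 sorting_2">1</td>
    <td class="name namesearch"><a href="/example/oxford">University of Oxford</a></td></tr>
<tr><td class="rank sorting_1 sorting_2">2</td>
    <td class="name namesearch"><a href="/example/caltech">California Institute of Technology</a></td></tr>
</tbody></table>
'''

name_collector = CellCollector('name namesearch')
name_collector.feed(sample_html)
print(name_collector.cells)
```

BeautifulSoup does the same job with far less code and also keeps the nested tags available, which is why the notebook can later read the link from `names_obj[i].a`.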

In [5]:
# use the webdriver to request the ranking webpage
stats_browser.get(url_stats)

# collect the webpage HTML after it has loaded
stats_page_html = stats_browser.page_source

# parse the HTML using BeautifulSoup
stats_page_soup = soup(stats_page_html, 'html.parser')

# collect HTML objects 
rank_obj = stats_page_soup.findAll("td", {"class":"rank sorting_1 sorting_2"})
names_obj = stats_page_soup.findAll("td", {"class":"name namesearch"})
stats_number_students_obj = stats_page_soup.findAll("td", {"class":"stats stats_number_students"})
stats_student_staff_ratio_obj = stats_page_soup.findAll("td", {"class":"stats stats_student_staff_ratio"})
stats_pc_intl_students_obj = stats_page_soup.findAll("td", {"class":"stats stats_pc_intl_students"})
stats_female_male_ratio_obj = stats_page_soup.findAll("td", {"class":"stats stats_female_male_ratio"})

# close the browser
stats_browser.close() 

In [6]:
# use the webdriver to request the scores webpage
scores_browser.get(url_scores)

# collect the webpage HTML after it has loaded
scores_page_html = scores_browser.page_source

# parse the HTML using BeautifulSoup
scores_page_soup = soup(scores_page_html, 'html.parser')

# collect HTML objects
overall_score_obj = scores_page_soup.findAll("td", {"class":"scores overall-score"})
teaching_score_obj = scores_page_soup.findAll("td", {"class":"scores teaching-score"})
research_score_obj = scores_page_soup.findAll("td", {"class":"scores research-score"})
citations_score_obj = scores_page_soup.findAll("td", {"class":"scores citations-score"})
industry_income_score_obj = scores_page_soup.findAll("td", {"class":"scores industry_income-score"})
international_outlook_score_obj = scores_page_soup.findAll("td", {"class":"scores international_outlook-score"})

# close the browser
scores_browser.close() 

Once the HTML objects are collected, we can start extracting the data from them. The data will eventually be stored in a pandas DataFrame, which can be displayed as a table. A pandas DataFrame can be constructed from lists of equal length. In the following two cells we extract the data using two different methods: reading the text of the table cells directly, and visiting each university's page to read the address data embedded in it as JSON.
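
Because pandas raises a `ValueError` when the lists passed to the DataFrame constructor differ in length, it can help to check the lengths before building the table. A minimal sketch, with `assert_equal_lengths` as a hypothetical helper name:

```python
def column_lengths(columns):
    """Map each column name to the number of values collected for it."""
    return {name: len(values) for name, values in columns.items()}


def assert_equal_lengths(columns):
    """Raise ValueError with a readable summary if the lists differ in length."""
    lengths = column_lengths(columns)
    if len(set(lengths.values())) > 1:
        raise ValueError(f'Column lengths differ: {lengths}')


# illustrative values only
assert_equal_lengths({
    'rank': ['1', '2'],
    'name': ['University of Oxford', 'California Institute of Technology'],
})
```

A mismatch usually means that one of the class names did not match every row of the table, so the summary points directly at the column to inspect.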

#### Extracting data from HTML objects: 

In [7]:
rank, names, number_students, student_staff_ratio, intl_students, female_male_ratio, web_address =  [], [], [], [], [], [], []
overall_score, teaching_score, research_score, citations_score, industry_income_score, international_outlook_score = [], [], [], [], [], []
for i in range(len(names_obj)):
    web_address.append('https://www.timeshighereducation.com' + names_obj[i].a.get('href'))
    rank.append(rank_obj[i].text)
    
    names.append(names_obj[i].a.text)
    number_students.append(stats_number_students_obj[i].text)
    student_staff_ratio.append(stats_student_staff_ratio_obj[i].text)
    intl_students.append(stats_pc_intl_students_obj[i].text)
    female_male_ratio.append(stats_female_male_ratio_obj[i].text[:2])
    
    overall_score.append(overall_score_obj[i].text)
    teaching_score.append(teaching_score_obj[i].text)
    research_score.append(research_score_obj[i].text)
    citations_score.append(citations_score_obj[i].text)
    industry_income_score.append(industry_income_score_obj[i].text)
    international_outlook_score.append(international_outlook_score_obj[i].text)

In [8]:
full_address_list, streetAddress_list, addressLocality_list, addressRegion_list, postalCode_list, addressCountry_list  = [], [], [], [], [], []
for web in web_address:
    # request and parse the individual page of each university
    page = urlopen(web)
    page_html = soup(page, 'html.parser')

    # the structured address is embedded in the page as JSON-LD
    location = page_html.findAll('script', {'type':"application/ld+json"})
    jt = json.loads(location[0].text)
    jsonnn_tree = objectpath.Tree(jt)

    # '$..key' searches the whole JSON tree for the given key
    streetAddress_list.append(list(jsonnn_tree.execute('$..streetAddress'))[0])
    addressLocality_list.append(list(jsonnn_tree.execute('$..addressLocality'))[0])
    addressRegion_list.append(list(jsonnn_tree.execute('$..addressRegion'))[0])
    postalCode_list.append(list(jsonnn_tree.execute('$..postalCode'))[0])
    addressCountry_list.append(list(jsonnn_tree.execute('$..addressCountry'))[0])

    # the full address as displayed on the page
    full_address = page_html.findAll('div', {'class':"institution-info__contact-detail institution-info__contact-detail--address"})[0].text.strip()
    full_address_list.append(full_address)

    # print progress
    print ('{} out of {}'.format(len(full_address_list), len (web_address)), full_address)

1 out of 1396 University Offices, Wellington Square, Oxford, Oxfordshire, OX1 2JD, United Kingdom
2 out of 1396 1200 East California Boulevard, Pasadena, California, 91125, United States
3 out of 1396 The Old Schools, Trinity Lane, Cambridge, Cambridgeshire, CB2 1TN, United Kingdom
4 out of 1396 450 Serra Mall, Stanford, California, 94305–2004, United States
5 out of 1396 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139-4307, United States
6 out of 1396 Princeton, New Jersey, 08544, United States
7 out of 1396 Massachusetts Hall, Cambridge, Massachusetts, 02138, United States
8 out of 1396 New Haven, Connecticut, 06520, United States
9 out of 1396 Edward H. Levi Hall, 5801 South Ellis Avenue, Chicago, Illinois, 60637, United States
10 out of 1396 South Kensington Road, Kensington, London, SW7 2AZ, United Kingdom
11 out of 1396 3451 Walnut Street, Philadelphia, Philadelphia, Pennsylvania, 19104, United States
12 out of 1396 Baltimore, Maryland, 21218, United States
13 out of 139

118 out of 1396 Western Bank, Sheffield, South Yorkshire, S10 2TN, United Kingdom
119 out of 1396 10900 Euclid Ave, Cleveland, Ohio, 44106, United States
120 out of 1396 North Terrace, Adelaide, South Australia, SA 5005, Australia
121 out of 1396 No. 1, Sec 4, Roosevelt Road, Taipei, 10617, Taiwan
122 out of 1396 University Road, Southampton, Hampshire, SO17 1BJ, United Kingdom
123 out of 1396 Sint-Pietersnieuwstraat, B - 9000 Ghent, Belgium
124 out of 1396 Boulder, Colorado, CO 80309, United States
125 out of 1396 Wilhelmsplatz, 37073 Göttingen, Germany
126 out of 1396 Tat Chee Avenue, Kowloon, Hong Kong
127 out of 1396 Postbus 616, Maastricht, 6200 MD, Netherlands
128 out of 1396 Comeniuslaan 4, 6525 HP Nijmegen, Netherlands
129 out of 1396 Heslington, York, Yorkshire, YO10 5DD, United Kingdom
130 out of 1396 85 boulevard Saint-Germain, cedex 06, Paris, Ile-de-France, 75006, France
131 out of 1396 P.O. Box 1072 Blindern, 0316 Oslo, Norway
132 out of 1396 35 Stirling Highway, Crawley,

238 out of 1396 Warandelaan 2, AB Tilburg, 5037, Netherlands
239 out of 1396 PO Box 217, 7500 AE Enschede, Netherlands
240 out of 1396 50, UNIST-gil, Ulju-gun, Ulsan, 44919, South Korea
241 out of 1396 Belfield, Dublin 4, Ireland
242 out of 1396 201 Presidents Circle, Salt Lake City, Utah, 84112, United States
243 out of 1396 Blacksburg, Virginia, 24061, United States
244 out of 1396 Via Olgettina 58, Milano, 20123, Italy
245 out of 1396 Pleinlaan 2, 1050 Brussel, Belgium
246 out of 1396 1834 Wake Forest Road, Winston-Salem, North Carolina, 27106, United States
247 out of 1396 University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
248 out of 1396 1151 Richmond Street, London, Ontario, N6A 3K7, Canada
249 out of 1396 P.O. Box 8795, Williamsburg, Virginia, 23187-8795, United States
250 out of 1396 Northfields Ave, Wollongong, New South Wales, NSW 2522, Australia
251 out of 1396 P.O.Box 50927, Riyadh 11533, Saudi Arabia
252 out of 1396 Building WA - level 2

355 out of 1396 Universitätsstraße 30, 95440 Bayreuth, Germany
356 out of 1396 Malet Street, Bloomsbury, London, WC1E 7HX, United Kingdom
357 out of 1396 Bibliothekstraße 1, Bremen, 28359, Germany
358 out of 1396 Kingston Lane, Greater London, Uxbridge, Middlesex, UB8 3PH, United Kingdom
359 out of 1396 5200 North Lake Rd., Merced, California, 95343, United States
360 out of 1396 Palma de Cima, 1649-023 Lisbon, Portugal
361 out of 1396 950 Main Street, Worcester, Massachusetts, 01610, United States
362 out of 1396 2131 Hillside Road, Storrs, Connecticut, 06269, United States
363 out of 1396 Rethymnon, Crete, 74100, Greece
364 out of 1396 2199 South University Blvd, Denver, Colorado, 80208, United States
365 out of 1396 P.zza S.Marco 4, Firenze, Florence, 50121, Italy
366 out of 1396 New Cross, London, SE14 6NW, United Kingdom
367 out of 1396 222, Wangsimni-ro, Seongdong-gu, Seoul, 133-791, South Korea
368 out of 1396 Martelarenlaan 42, 3500 Hasselt, Belgium
369 out of 1396 105 Memorial

475 out of 1396 Orta Mahalle, Tuzla, Istanbul, 34956, Turkey
476 out of 1396 Dufourstrasse 50, St.Gallen, 9000, Switzerland
477 out of 1396 105 Administration Place, Saskatoon, Saskatchewan, S7N 5A2, Canada
478 out of 1396 27 Rue Saint Guillaume, 75337 Paris, France
479 out of 1396 209 Neungdong-ro, Gwangjin-gu, Seoul, South Korea
480 out of 1396 Ülloi út 26, H - 1085 Budapest, Hungary
481 out of 1396 via Banchi di Sotto 55, Siena, 53100, Italy
482 out of 1396 Thornhaugh Street, Russell Square, London, WC1H 0XG, United Kingdom
483 out of 1396 Columbia, South Carolina, 29208, United States
484 out of 1396 4 rue Blaise Pascal CS 90032, F-67081 Strasbourg cedex, France
485 out of 1396 16 Richmond Street, Glasgow, Strathclyde, G1 1XQ, United Kingdom
486 out of 1396 Technion City, Haifa, 32000, Israel
487 out of 1396 2-11-1 Kaga, Itabashi-ku, Tokyo, 173-8605, Japan
488 out of 1396 One UTSA Circle, San Antonio, Texas, TX 78249, United States
489 out of 1396 1- 5 -45 Yushima, Bunkyoku, Tokyo,

596 out of 1396 Paisley Campus, Paisley, PA1 2BE, United Kingdom
597 out of 1396 P.O. Box 413, Milwaukee, Wisconsin, 53201, United States
598 out of 1396 Gaußstraße 20, Wuppertal, 42119, Germany
599 out of 1396 422 Siming South Road, Xiamen, Fujian, 361005, China
600 out of 1396 No.28 Xianning West Road, Xi'an, Shaanxi, 710049, China
601 out of 1396 22-2 Seto, Yokohama, Kanagawa, 236-0027, Japan
602 out of 1396 Tuomiokirkontori 3, FI-20500 Turku, Finland
603 out of 1396 Tsuruga, Ikki-machi, Aizu-Wakamatsu City, Fukushima, 965-8580, Japan
604 out of 1396 Pza. San Diego, s/n, Alcalá de Henares, Madrid, 28801, Spain
605 out of 1396 Rettorato, via Duomo, 6, Vercelli, 13100, Italy
606 out of 1396 Amritanagar, Coimbatore, Tamilnadu, 641112, India
607 out of 1396 Carrera 1 N° 18A 12 Bogotá, Bogata, Distrito Capital, Colombia
608 out of 1396 54124 Thessaloniki, Greece
609 out of 1396 Fayetteville, Arkansas, 72701, United States
610 out of 1396 Auburn, Alabama, AL 36849, United States
611 out o

717 out of 1396 Avda. de la Universidad s/n, Elche, 03202, Spain
718 out of 1396 Largo do Paço, Braga, 4704-553, Portugal
719 out of 1396 100 Culbertson Hall, P.O. Box 172000, Bozeman, Montana, 59717-2000, United States
720 out of 1396 Av. Eugenio Garza Sada 2501, Sur Col. Tecnológico, Monterrey, Nuevo León, 64849, Mexico
721 out of 1396 No.219 Ningliu Road, Nanjing, Jiangsu, 210044, China
722 out of 1396 No. 140 Hanzhong Road, Nanjing, Jiangsu, 210029, China
723 out of 1396 Gulou, Nanjing, Jiangsu, China
724 out of 1396 1, quai de Tourville, BP 13522, Nantes, 44035, France
725 out of 1396 Ciudad Universitaria, Mexico City, 04510, Mexico
726 out of 1396 No.1, University Road, Tainan City, 701, Taiwan
727 out of 1396 20 avenue Albert Einstein, 69621 Villeurbanne, France
728 out of 1396 Leninskiy prospekt 4, Moscow, 119049, Russian Federation
729 out of 1396 162 HePing East Road, Section 1, Taipei, Taiwan
730 out of 1396 Heroon Polytechniou 9, 15780 Zografou, Greece
731 out of 1396 4505 

834 out of 1396 Western Avenue, Cardiff, CF5 2YB, United Kingdom
835 out of 1396 C/ Madrid, 126 - Getafe, Madrid, 28903, Spain
836 out of 1396 Calle Altagracia, 50 Ciudad Real, Ciudad Real, 13071, Spain
837 out of 1396 Adelphi Building, Preston, Lancashire, PR1 2HE, United Kingdom
838 out of 1396 259 Wen-Hwa 1st Road, Kwei-Shan Tao-Yuan, 33302, Taiwan
839 out of 1396 1-33, Yayoi-cho Inage-ku, Chiba-shi, Chiba, 263-8522, Japan
840 out of 1396 Av Libertador, Bernardo OHiggins 1058, Santiago, Chile
841 out of 1396 29 Xueyuan Road, Haidian, Beijing, 100083, China
842 out of 1396 No. 388 Lumo Road, Wuhan, Hubei, 430074, China
843 out of 1396 No1, Daxue Road, Xuzhou, Jiangsu, 221116, China
844 out of 1396 24 Tongjia Alley, Gulou, Nanjing, Jiangsu, 210009, China
845 out of 1396 567 Baekje-daero, Jeonju-si, Jeollabuk-do, 54896, South Korea
846 out of 1396 174 Shazhengjie, Shapingba, Chongqing, 400044, China
847 out of 1396 77 Yongbong-ro, Buk-gu, Gwangju, 61186, South Korea
848 out of 1396 254

954 out of 1396 Rue de Damas, BP 17-5208 - Mar Mikhaël, Beyrouth - 1104 2020, Lebanon
955 out of 1396 The Crescent, Salford, Greater Manchester, M5 4WT, United Kingdom
956 out of 1396 Rua Quirino de Andrade 215, Sao Paulo, 01049-010, Brazil
957 out of 1396 S1W17, Chuoh-ku Sapporo, Hokkaido, 060-8556, Japan
958 out of 1396 30 Xueyuan Road Haidian District, Beijing 100083, Beijing, 100083, China
959 out of 1396 163 Seoulsiripdaero 90 Jeonnong-dong, Dongdaemun-gu, Seoul, 130-743, South Korea
960 out of 1396 C/ S. Fernando, 4, Sevilla, 41004, Spain
961 out of 1396 Daneshjo Blv Shahid Shahriari Sq, Yemen St, Shahid Chamran, Tehran 1983963113, Iran
962 out of 1396 99 Shangda Road BaoShaun District, Shanghai, Shanghai, China
963 out of 1396 777 National Road, Shanghai, 200433, China
964 out of 1396 1550 Haigang Avenue, Shanghai, 201306, China
965 out of 1396 243 Daxue Road, Shantou, Guangdong, 515063, China
966 out of 1396 Howard Street, Sheffield, South Yorkshire, S1 1WB, United Kingdom
967 

1067 out of 1396 1200 Matsumoto-cho, Kasugai, Aichi, 487-8501, Japan
1068 out of 1396 99 Daehak-ro, Yuseong-gu, Daejeon, 34134, South Korea
1069 out of 1396 200 Chung Pei Rd, Chung Li District, Taoyuan City, Taiwan, 32023, Taiwan
1070 out of 1396 742-1 Higashinakano, Hachioji-shi, Tokyo, 192-0393, Japan
1071 out of 1396 P.O Cochin, Kerala, 682022, India
1072 out of 1396 94, University of Colombo, Cumaratunga Munidasa Mw, Colombo 03, Sri Lanka
1073 out of 1396 Šafárikovo námestie 6, P.O.BOX 440, Bratislava 1, 814 99, Slovakia
1074 out of 1396 Víctor Lamas 1290, Casilla 160-C, Concepción, Chile
1075 out of 1396 325 Ain El Bey Way, Constantine, 25017, Algeria
1076 out of 1396 Fovam ter 8, Budapest, Hungary
1077 out of 1396 Balcali, Saricam, Adana, 01330, Turkey
1078 out of 1396 Kamýcká 129, 165 21 Praha 6, Suchdol, Czech Republic
1079 out of 1396 P.O. Box 35091, Dar es Salaam, Tanzania
1080 out of 1396 2401 Taft Avenue, Manila, 0922, Philippines
1081 out of 1396 Main Bawana Road, Shahbad 

1178 out of 1396 2, Satpayev str., Astana, 010008, Kazakhstan
1179 out of 1396 Gagarin Avenue, Nizhny Novgorod, 603950, Russian Federation
1180 out of 1396 ul. Narutowicza 68, Lodz, 90-136, Poland
1181 out of 1396 90-924 Łódź, Żeromskiego 116, Poland
1182 out of 1396 103 Borough Road, London, SE1 0AA, United Kingdom
1183 out of 1396 Rodovia Celso Garcia Cid, Pr 445 Km 380, Londrina, 86057-970, Brazil
1184 out of 1396 Pratapgunj, Vadodara, Gujarat, India
1185 out of 1396 Khamriang Sub-District, Kantarawichai District, Maha Sarakham, 44150, Thailand
1186 out of 1396 Kota Samarahan, Sarawak, 94300, Malaysia
1187 out of 1396 Karnataka, Manipal, 576104, India
1188 out of 1396 University Campus of Manouba, Manouba, 2010, Tunisia
1189 out of 1396 Göztepe Kampüsü, Kadiköy, Istanbul, 34722, Turkey
1190 out of 1396 Av Abdelkrim Khattabi, 511 - 40000 Marrakech, Morocco
1191 out of 1396 1-1 Kanda-Surugadai, Chiyoda-ku, Tokyo, 101-8301, Japan
1192 out of 1396 1-501 Shiogamaguchi, Tempaku-ku, Nagoya

1291 out of 1396 P.O. Box 3619995161, Shahrood, Iran
1292 out of 1396 3-7-5 Toyosu, Koto-ku, Tokyo, 135-8548, Japan
1293 out of 1396 Seta Tsukinowa-cho, Otsu, Shiga, 520-2192, Japan
1294 out of 1396 1060 Nishikawatsu, Matsue, Shimane, 690-8504, Japan
1295 out of 1396 Shinshu Daigaku 3-1-1 Asahi, Matsumoto-shi, Nagano, 390-8621, Japan
1296 out of 1396 Fars Province, Shiraz, Zand, Iran
1297 out of 1396 836, Ohya, Suruga-ku, Shizuoka-Shi, Shizuoka, 422-8529, Japan
1298 out of 1396 52-1 Yada, Suruga Ward, Shizuoka, 422-8526, Japan
1299 out of 1396 1-5-8 Hatanodai, Shinagawa-ku, Tokyo, 142-8555, Japan
1300 out of 1396 79 Svobodny pr., Krasnoyarsk, 660041, Russian Federation
1301 out of 1396 Khandagiri Square, Near SUM Hospital, Bhubaneswar, Odisha, 751030, India
1302 out of 1396 ul. Bankowa 12, 40-007 Katowice, Poland
1303 out of 1396 ul. Akademicka 2A, Gliwice, Upper Silesia, 44-100, Poland
1304 out of 1396 22 Borommarachachonani Rd, Talingchan, Bangkok, 10170, Thailand
1305 out of 1396 Sa
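
The `$..key` queries in the cell above look for a key anywhere in the JSON tree. The same idea can be sketched with the standard library alone. Here `find_first` is a hypothetical helper, and the layout of the sample JSON is an assumption; only the address values are taken from the output shown in this notebook.

```python
import json

_MISSING = object()  # sentinel, so that stored None values are still returned


def _search(node, key):
    """Depth-first search of nested dicts and lists for `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return _MISSING
    for child in children:
        found = _search(child, key)
        if found is not _MISSING:
            return found
    return _MISSING


def find_first(node, key, default=''):
    """Return the first value stored under `key`, or `default` if it is absent."""
    found = _search(node, key)
    return default if found is _MISSING else found


# assumed layout of the embedded JSON, for illustration only
sample_json = '''
{"@type": "CollegeOrUniversity",
 "address": {"@type": "PostalAddress",
             "streetAddress": "University Offices Wellington Square",
             "addressLocality": "Oxford",
             "addressRegion": "Oxfordshire",
             "postalCode": "OX1 2JD",
             "addressCountry": "United Kingdom"}}
'''
data = json.loads(sample_json)
print(find_first(data, 'addressLocality'))
print(repr(find_first(data, 'telephone')))
```

Returning a default for a missing key avoids the `IndexError` that indexing an empty result list with `[0]` would raise, which matters when some pages lack part of the address.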

In [9]:
df = pd.DataFrame({
    'rank' : rank,
    'name' : names,
    'number_students' : number_students,
    'student_staff_ratio' : student_staff_ratio,
    'intl_students' : intl_students,
    'female_male_ratio' : female_male_ratio,
    'overall_score' : overall_score,
    'teaching_score' : teaching_score,
    'research_score' : research_score,
    'citations_score' : citations_score,
    'industry_income_score' : industry_income_score,
    'international_outlook_score' : international_outlook_score,
    'address' : full_address_list, 
    'street_address' : streetAddress_list,
    'locality_address' : addressLocality_list,
    'region_address' : addressRegion_list,
    'postcode_address' : postalCode_list,
    'country_address' : addressCountry_list
})
df

Unnamed: 0,rank,name,number_students,student_staff_ratio,intl_students,female_male_ratio,overall_score,teaching_score,research_score,citations_score,industry_income_score,international_outlook_score,address,street_address,locality_address,region_address,postcode_address,country_address
0,1,University of Oxford,20664,11.2,41%,46,95.4,90.5,99.6,98.4,65.5,96.4,"University Offices, Wellington Square, Oxford,...",University Offices Wellington Square,Oxford,Oxfordshire,OX1 2JD,United Kingdom
1,2,California Institute of Technology,2240,6.4,30%,34,94.5,92.1,97.2,97.9,88.0,82.5,"1200 East California Boulevard, Pasadena, Cali...",1200 East California Boulevard,Pasadena,CA,91125,United States
2,3,University of Cambridge,18978,10.9,37%,47,94.4,91.4,98.7,95.8,59.3,95.0,"The Old Schools, Trinity Lane, Cambridge, Camb...",The Old Schools Trinity Lane,Cambridge,Cambridgeshire,CB2 1TN,United Kingdom
3,4,Stanford University,16135,7.3,23%,43,94.3,92.8,96.4,99.9,66.2,79.5,"450 Serra Mall, Stanford, California, 94305–20...",450 Serra Mall,Stanford,CA,94305–2004,United States
4,5,Massachusetts Institute of Technology,11247,8.6,34%,39,93.6,90.5,92.4,99.5,86.9,89.0,"77 Massachusetts Avenue, Cambridge, Massachuse...",77 Massachusetts Avenue,Cambridge,MA,02139-4307,United States
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1391,1001+,Yuan Ze University,8356,19.5,8%,42,10.7–22.1,17.3,13.9,15.5,47.0,28.3,"No.135, Yuandong Rd, Taoyuan City, 320, Taiwan","No.135, Yuandong Rd",Taoyuan City,,320,Taiwan
1392,1001+,Zagazig University,156419,24.0,1%,53,10.7–22.1,13.6,7.7,29.6,34.4,38.8,"Zagazig 44519, Egypt",Zagazig 44519,,,,Egypt
1393,1001+,University of Zagreb,68216,18.9,3%,59,10.7–22.1,17.8,12.9,25.3,37.4,33.0,"Trg maršala Tita 14 HR-10000, Zagreb, Croatia",Trg maršala Tita 14 HR-10000 Zagreb,,,,Croatia
1394,1001+,University of Zanjan,9980,25.1,0%,54,10.7–22.1,17.0,12.3,28.5,43.8,18.7,"University Blvd, Zanjan, 45371-38791, Iran",University Blvd,Zanjan,45371-38791,,Iran


In [10]:
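# strip the percent sign from the share of international students
df['intl_students'] = df['intl_students'].str.replace(pat='%', repl='', regex=False)
# keep only the first number of a rank band and drop a trailing plus sign
df['rank'] = df['rank'].str.replace(pat=r'–\d*|\+', repl='', regex=True)
# keep only the upper bound of an overall score band
df['overall_score'] = df['overall_score'].str.replace(pat=r'.*–', repl='', regex=True)
# remove thousands separators from the student counts
df['number_students'] = df['number_students'].str.replace(pat=',', repl='', regex=False)
# mark missing values ('n/a') as NaN; pd.np is no longer available in recent pandas
df = df.replace('n/a*', float('nan'), regex=True)
df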

Unnamed: 0,rank,name,number_students,student_staff_ratio,intl_students,female_male_ratio,overall_score,teaching_score,research_score,citations_score,industry_income_score,international_outlook_score,address,street_address,locality_address,region_address,postcode_address,country_address
0,1,University of Oxford,20664,11.2,41,46,95.4,90.5,99.6,98.4,65.5,96.4,"University Offices, Wellington Square, Oxford,...",University Offices Wellington Square,Oxford,Oxfordshire,OX1 2JD,United Kingdom
1,2,California Institute of Technology,2240,6.4,30,34,94.5,92.1,97.2,97.9,88.0,82.5,"1200 East California Boulevard, Pasadena, Cali...",1200 East California Boulevard,Pasadena,CA,91125,United States
2,3,University of Cambridge,18978,10.9,37,47,94.4,91.4,98.7,95.8,59.3,95.0,"The Old Schools, Trinity Lane, Cambridge, Camb...",The Old Schools Trinity Lane,Cambridge,Cambridgeshire,CB2 1TN,United Kingdom
3,4,Stanford University,16135,7.3,23,43,94.3,92.8,96.4,99.9,66.2,79.5,"450 Serra Mall, Stanford, California, 94305–20...",450 Serra Mall,Stanford,CA,94305–2004,United States
4,5,Massachusetts Institute of Technology,11247,8.6,34,39,93.6,90.5,92.4,99.5,86.9,89.0,"77 Massachusetts Avenue, Cambridge, Massachuse...",77 Massachusetts Avenue,Cambridge,MA,02139-4307,United States
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1391,1001,Yuan Ze University,8356,19.5,8,42,22.1,17.3,13.9,15.5,47.0,28.3,"No.135, Yuandong Rd, Taoyuan City, 320, Taiwan","No.135, Yuandong Rd",Taoyuan City,,320,Taiwan
1392,1001,Zagazig University,156419,24.0,1,53,22.1,13.6,7.7,29.6,34.4,38.8,"Zagazig 44519, Egypt",Zagazig 44519,,,,Egypt
1393,1001,University of Zagreb,68216,18.9,3,59,22.1,17.8,12.9,25.3,37.4,33.0,"Trg maršala Tita 14 HR-10000, Zagreb, Croatia",Trg maršala Tita 14 HR-10000 Zagreb,,,,Croatia
1394,1001,University of Zanjan,9980,25.1,0,54,22.1,17.0,12.3,28.5,43.8,18.7,"University Blvd, Zanjan, 45371-38791, Iran",University Blvd,Zanjan,45371-38791,,Iran
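
The effect of each cleaning pattern is easier to see on single values. The sketch below applies the same patterns with the standard `re` module; the function names are hypothetical, and the inputs `'201–250'` and `'20,664'` are illustrative values rather than rows copied from the table.

```python
import re


def clean_rank(value):
    """Drop the end of a rank band ('–250') or a trailing plus sign."""
    return re.sub(r'–\d*|\+', '', value)


def clean_overall_score(value):
    """Remove everything up to the en dash, keeping the upper bound of a band."""
    return re.sub(r'.*–', '', value)


def clean_percentage(value):
    """Remove the percent sign."""
    return value.replace('%', '')


def clean_count(value):
    """Remove thousands separators."""
    return value.replace(',', '')


print(clean_rank('1001+'))               # 1001
print(clean_rank('201–250'))             # 201
print(clean_overall_score('10.7–22.1'))  # 22.1
print(clean_percentage('41%'))           # 41
print(clean_count('20,664'))             # 20664
```

Note that banded overall scores are reduced to their upper bound while banded ranks are reduced to their lower bound, which is worth keeping in mind when interpreting those columns.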


In [11]:
df.to_csv('uni_02.csv', encoding='utf-16', index=False)
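
Because the file is written as UTF-16, it has to be read back with the same encoding, for example with `pd.read_csv('uni_02.csv', encoding='utf-16')`. The round trip can be sketched with the standard `csv` module; the helper names and the temporary file are assumptions made for this example.

```python
import csv
import os
import tempfile


def write_rows(path, rows, encoding='utf-16'):
    """Write a list of dictionaries to a CSV file in the given encoding."""
    with open(path, 'w', newline='', encoding=encoding) as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path, encoding='utf-16'):
    """Read the CSV file back into a list of dictionaries."""
    with open(path, newline='', encoding=encoding) as handle:
        return [dict(row) for row in csv.DictReader(handle)]


# two rows with values taken from the table above
rows = [
    {'rank': '1', 'name': 'University of Oxford', 'street_address': 'University Offices Wellington Square'},
    {'rank': '1001', 'name': 'University of Zagreb', 'street_address': 'Trg maršala Tita 14 HR-10000 Zagreb'},
]

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'example.csv')
    write_rows(csv_path, rows)
    restored = read_rows(csv_path)

print(restored == rows)
```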