# Using "The World University Ranking" Dataset to Start a Data Science Learning Adventure

# Important Subject Before We Start:
# Python List Comprehension

![image.png](attachment:image.png)

Lists are one of the four built-in data structures in Python. The others are tuples, dictionaries, and sets. A list differs from, for example, int or bool in that it is a compound data type: you can group values in a list. These values don't need to be of the same type: they can be a combination of boolean, string, integer, and float values.

List literals are a collection of data surrounded by brackets, and the elements are separated by a comma. The list is capable of holding various data types inside it, unlike arrays.

![image.png](attachment:image.png)

A list comprehension in Python is also surrounded by brackets, but instead of a list of data, you enter an expression followed by a for clause and, optionally, if clauses.

In [None]:
S = [x**2 for x in range(10)]
V = [2**i for i in range(13)]
M = [x for x in S if x % 2 == 0]
print(S)
print(V)
print(M)

* The list S is built up with square brackets. Inside those brackets there is an element x, which is raised to the power of 2. You only need to know for how many values (and which values!) this is done. That is determined by range(10), which yields 0 through 9.
* The list V contains the base value 2, which is raised to a certain power. Just like before, you need to know which power i is used. Here i comes from range(13), which means that you start from 0 and go up to 12.
* Lastly, the list M contains the elements of S if, and only if, they can be divided by 2 without a remainder. The modulo needs to be 0. In other words, the list M is built up from the even values stored in list S.
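
The difference between a trailing if clause (as in M above) and an if-else inside the expression is easy to mix up. The following is a minimal sketch with made-up variable names (`evens`, `labels`) that are not part of the original notebook:

```python
numbers = range(10)

# Filter: the trailing `if` drops the elements that fail the test.
evens = [n for n in numbers if n % 2 == 0]

# Transform: a conditional expression (`a if cond else b`) keeps every
# element and only chooses which value is stored for it.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

print(evens)
print(labels[:4])
```

The filter form can shorten the list, while the if-else form always returns one result per input element.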

A list comprehension can replace a for loop that builds a list, as well as most uses of map() and filter() with a lambda function. As you might already know, you use for loops to repeat a block of code a fixed number of times. List comprehensions are a good alternative for this kind of loop because they are more compact. A call to reduce(), which collapses a sequence into a single value, is not something a comprehension replaces on its own.

In [None]:
numbers = range(30)
new_list = []
for n in numbers:
    if n%2==0: 
        new_list.append(n**2) #raise that element to the power of 2 and append to the list
print(new_list)

new__list = [n**2 for n in numbers if n%2==0] #expression followed by for loop followed by the conditional clause
print(new__list)

kilometer = [39.2, 36.5, 37.3, 37.8]

feet = map(lambda x: float(3280.8399)*x, kilometer)
print(list(feet))

feet_ = [float(3280.8399)*x for x in kilometer]
print(feet_)
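
The cell above compares a comprehension with map(). The same comparison for filter(), together with a look at reduce(), might be sketched as follows. The variable names are illustrative only and do not appear elsewhere in the notebook:

```python
from functools import reduce

numbers = range(10)

# filter() with a lambda keeps the elements for which the lambda is true.
odd_filter = list(filter(lambda n: n % 2 == 1, numbers))

# The equivalent list comprehension.
odd_comp = [n for n in numbers if n % 2 == 1]

# reduce() folds a sequence into one value. A comprehension builds a list,
# so it needs a helper such as the built-in sum() to do the same job.
total_reduce = reduce(lambda acc, n: acc + n, odd_comp, 0)
total_sum = sum(odd_comp)

print(odd_filter == odd_comp)
print(total_reduce, total_sum)
```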

# Python *args and **kwargs

In programming, we define a function to make reusable code that performs a similar operation each time it is called. To perform that operation, we call the function with specific values; each such value is called a function argument in Python.

In [None]:
def adder(x,y,z):
    print("sum:",x+y+z)

adder(10,12,13)

In the above program, the adder() function has three parameters: x, y and z. When we pass three values while calling adder(), we get the sum of the 3 numbers as the output. If we pass 5 arguments to adder() instead of 3, Python raises a TypeError.
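
The failing call can be reproduced safely by catching the exception. This is a small sketch; the `error_message` variable is only there for illustration:

```python
def adder(x, y, z):
    print("sum:", x + y + z)

try:
    adder(1, 2, 3, 4, 5)  # two arguments more than adder() accepts
    error_message = None
except TypeError as err:
    # Python refuses the call before the function body runs.
    error_message = str(err)
    print("TypeError:", error_message)
```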

* *args (Non Keyword Arguments)
* **kwargs (Keyword Arguments)

* We use *args and **kwargs as an argument when we are unsure about the number of arguments to pass in the functions.

Python has *args, which allows us to pass a variable number of non-keyword arguments to a function.
In the function definition, we put an asterisk * before the parameter name to accept variable-length arguments. The arguments are collected into a tuple, which is available inside the function under the parameter name without the asterisk.

In [None]:
def adder(*num):
    total = 0  # avoid the name `sum`, which would shadow the built-in sum()
    for n in num:
        total = total + n
    print("Sum:",total)
adder(3,5)
adder(4,5,6,7)
adder(1,2,3,5,6)

* In the above program, we used *num as a parameter, which allows us to pass a variable-length argument list to the adder() function. Inside the function, a loop adds up the passed arguments and prints the result. We called the function three times, each time with a different number of arguments; each call's arguments arrive inside the function as a tuple.

* Python passes variable-length non-keyword arguments to a function using *args, but we cannot use this to pass keyword arguments. For that, Python has **kwargs, which allows us to pass a variable number of keyword arguments to the function.

* In the function definition, we put a double asterisk ** before the parameter name to denote this type of argument. The arguments are collected into a dictionary, which is available inside the function under the parameter name without the double asterisk.

In [None]:
def intro(**data):
    print("\nData type of argument:",type(data))
    for key, value in data.items():
        print("{} is {}".format(key,value))
        
intro(Firstname="Sita", Lastname="Sharma", Age=22, Phone=1234567890)
intro(Firstname="John", Lastname="Wood", Email="johnwood@nomail.com", Country="Wakanda", Age=25, Phone=9876543210)
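
Both forms can be combined in one function, and the same symbols also unpack a sequence or a dictionary at the call site. A minimal sketch, where `describe` is a hypothetical helper written only for this illustration:

```python
def describe(*args, **kwargs):
    # args is a tuple of the positional arguments,
    # kwargs is a dict of the keyword arguments.
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)

result_direct = describe(1, 2, 3, name="Sita", age=22)
print(result_direct)

# At the call site, * unpacks a sequence and ** unpacks a dictionary.
values = (4, 5, 6)
options = {"name": "John", "age": 25}
result_unpacked = describe(*values, **options)
print(result_unpacked)
```

Positional parameters must come before *args, and **kwargs must come last in the parameter list.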

In [None]:
import numpy as np
import pandas as pd 
import os
import matplotlib.pyplot as plt
import seaborn as sns

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))
#UTF-8 is a variable-width character encoding standard 
#that uses between one and four eight-bit bytes to represent all valid Unicode code points.

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# Any results you write to the current directory are saved as output.       

# Information about Dataset

![image.png](attachment:image.png)

Ranking universities is a difficult, political, and controversial practice. There are hundreds of different national and international university ranking systems, many of which disagree with each other. This dataset contains three global university rankings from very different places.

The Times Higher Education World University Ranking is widely regarded as one of the most influential and widely observed university measures. Founded in the United Kingdom in 2010, it has been criticized for its commercialization and for undermining non-English-instructing institutions.

The Academic Ranking of World Universities, also known as the Shanghai Ranking, is an equally influential ranking. It was founded in China in 2003 and has been criticized for focusing on raw research power and for undermining humanities and quality of instruction.

To further extend your analyses, we've also included two sets of supplementary data.

The first of these is a set of data on educational attainment around the world. It comes from The World Data Bank and comprises information from the UNESCO Institute for Statistics and the Barro-Lee Dataset. How does national educational attainment relate to the quality of each nation's universities?

The second supplementary dataset contains information about public and private direct expenditure on education across nations. This data comes from the National Center for Education Statistics. It represents expenditure as a percentage of gross domestic product. Does spending more on education lead to better international university rankings?



In [None]:
# Each file gets its own name so that one read does not overwrite another.
school_country = pd.read_csv('/kaggle/input/world-university-rankings/school_and_country_table.csv')
times_data = pd.read_csv('/kaggle/input/world-university-rankings/timesData.csv')
# The rest of the notebook works with the CWUR table, stored in `data`.
data = pd.read_csv('/kaggle/input/world-university-rankings/cwurData.csv')
# Other files in the dataset (not loaded here):
#/kaggle/input/world-university-rankings/education_expenditure_supplementary_data.csv
#/kaggle/input/world-university-rankings/educational_attainment_supplementary_data.csv
#/kaggle/input/world-university-rankings/shanghaiData.csv
data.head(11)

In [None]:
data.info()

In [None]:
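# numeric_only=True (pandas 1.5 or later) skips text columns such as names,
# which cannot be correlated and raise an error in recent pandas versions.
data.corr(numeric_only=True)
# Correlation Map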
#The statistical relationship between two variables is referred to as their correlation. 
#A correlation could be positive, meaning both variables move in the same direction, or negative, meaning that 
#when one variable's value increases, the other variables' values decrease.
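
By default, `DataFrame.corr` computes the Pearson correlation coefficient for each pair of columns. The calculation for a single pair can be sketched in plain Python. The `pearson` helper and the toy values below are made up for illustration and are not taken from the dataset:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two variance terms (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy values: as rank goes up, score goes down.
rank = [1, 2, 3, 4, 5]
score = [100, 90, 80, 70, 60]
publications = [2, 4, 6, 8, 10]

r_negative = pearson(rank, score)         # variables move in opposite directions
r_positive = pearson(rank, publications)  # variables move in the same direction
print(r_negative, r_positive)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none.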

In [None]:
# Line plot: world rank and national rank against the row index
data.world_rank.plot(kind = 'line', color = 'g',label = 'world_rank',linewidth=1,alpha = 0.5,grid = True,linestyle = ':')
data.national_rank.plot(color = 'b',label = 'national_rank',linewidth=1, alpha = 0.5,grid = True,linestyle = '-.')
plt.legend(loc = 'upper right')  # display the labels set above
plt.show()

# Scatter plot: quality of education against quality of faculty
data.plot(kind='scatter', x='quality_of_education', y='quality_of_faculty',alpha = 0.5,color = 'red')
plt.xlabel('quality_of_education')
plt.ylabel('quality_of_faculty')
plt.title('quality_of_education - quality_of_faculty Scatter Plot')
plt.show()

In [None]:
# Histogram
# bins = number of bars in the figure
data.publications.plot(kind = 'hist',bins = 50,figsize = (5,5))
plt.show()