# INTRODUCTION :

* ***Objective :***
  1. Analyse the World University Rankings dataset and extract information from it through plots.
  2. Of all the universities in the world, which is the best?

* ***Description :***

    Ranking universities is a difficult, political, and controversial practice. There are hundreds of different national and international university ranking systems, many of which disagree with each other. This dataset contains three global university rankings from very different places.

   University Ranking Data

    The **Times Higher Education World University Ranking** is widely regarded as one of the most influential and widely observed university measures. Founded in the United Kingdom in 2010, it has been criticized for its commercialization and for undermining non-English-instructing institutions.

    The **Academic Ranking of World Universities**, also known as the Shanghai Ranking, is an equally influential ranking. It was founded in China in 2003 and has been criticized for focusing on raw research power and for undermining humanities and quality of instruction.

    The **Center for World University Rankings** is a less well-known listing that comes from Saudi Arabia. It was founded in 2012.
   
Here we will be using the Center for World University Rankings (CWUR) data.

* ***HYPOTHESIS :***

   A good university is one that provides good teaching and quality of education, does good-quality research, receives citations, and earns industrial income.
   
   Because quality of education, citations, publications, etc. are recorded as ranks, the lower the individual value, the better the university.


* ***ACKNOWLEDGMENTS :***

    My team member <a href='https://www.linkedin.com/in/yuvraj-kumar-68164117a/'> Yuvraj Kumar</a> and I, <a href='https://www.linkedin.com/in/kaustubh-mishra-54556917b'> Kaustubh Mishra </a>, created this notebook as part of the coursework for the ***“Pandas, bamboolib & Orange workshop”*** at Suven, under the mentorship of <a href='https://www.linkedin.com/in/rocky-jagtiani-3b390649/'> Rocky Jagtiani </a>.
    
    Learned from : https://datascience.suvenconsultants.com
    
    Mentored by : <a href='https://www.linkedin.com/in/rocky-jagtiani-3b390649/'> Rocky Jagtiani </a>
    
* ***DATASET :***

   The Center for World University Rankings provides information for top universities in the world. *They use an overall rank per university and individual values for the National Rank, Quality of Education, Alumni Employment, Quality of Faculty, Publications, Influence, Citations, Broad Impact, Patents, and a Total Score. All these values are for a certain year. For example, Harvard University has the first position for 2012, with a 100.00 overall score, and ranks 1 for most of the dimensions.*
   
   The above data set has : 
   Features : 1 Categorical , 12 Numerical .
   Meta Attributes : 1 Text.
   
   In all, there are 2200 observations (rows) available, from which we extract information through exploratory data analysis and visualisation.
       
   
   We have also used the supplementary data available with the above dataset, ***education_expenditure_supplementary_data***, which provides information about public and private sector expenditure on education by different countries for the period 1995-2011.
   Note: the information about Private Expenditure and Total Expenditure is not available for the period 1995-2010.

   The values differ from year to year. Here we analyse the year 2012 as a sample case and try to deduce various information for that year.
   
   Note: we have used the Orange data mining toolkit, and some of the plots and figures were obtained with it. This shows how simply and easily one can obtain useful information from Orange.
   

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

cwr = pd.read_csv("../input/world-university-rankings/cwurData.csv")
cwr


In [None]:
print(cwr.columns)
print(cwr["year"].value_counts())
print(cwr["country"].value_counts())

In [None]:
cwr[cwr.world_rank==1]

Here we can conclude that Harvard University continuously held rank 1 from 2012 to 2015.
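The same check can be made for every year at once. The sketch below is only an illustration, assuming a frame with the `year`, `institution` and `world_rank` columns of `cwurData.csv`; the helper name `top_ranked_per_year` and the toy rows are hypothetical placeholders, not values from the dataset.

```python
import pandas as pd

def top_ranked_per_year(df: pd.DataFrame) -> pd.Series:
    """Return the institution holding world_rank 1 for each year."""
    first = df[df["world_rank"] == 1]
    return first.set_index("year")["institution"].sort_index()

# Toy rows with placeholder names (not real CWUR values).
toy = pd.DataFrame({
    "year": [2012, 2012, 2013, 2013],
    "institution": ["Univ A", "Univ B", "Univ A", "Univ B"],
    "world_rank": [1, 2, 1, 2],
})
leaders = top_ranked_per_year(toy)
print(leaders)
print("Same leader every year:", leaders.nunique() == 1)
```

On the real data, `top_ranked_per_year(cwr)` should list one institution per year, which makes the "rank 1 in every year" claim easy to confirm.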

# *BARPLOT :*

In [None]:
top5 = cwr[cwr.world_rank<6]
ax = sns.barplot(data=top5, x="world_rank", y="institution", hue="year", palette=['blue', 'red', 'yellow', 'grey'], saturation=0.6)
ax.set_title('WORLD RANK UNIVERSITIES BY CWUR(2012-2015)')
ax.grid(color='#cccccc')
ax.set_ylabel('Institutions')
ax.set_xlabel('World rank')
# The x-axis holds rank values; the years are already shown by the hue legend,
# so the tick labels are not overwritten with year values.

The following information can be clearly seen from the above plot : 

* Harvard University ranks 1st according to CWUR from 2012 to 2015.
* MIT ranks 2nd in 2012, 4th in 2013, and 3rd in 2014-15.
* Similarly, Stanford ranks 3rd in 2012 and 2nd from 2013 to 2015.
* An interesting thing to notice is that Caltech ranks 5th in 2012 but is not among the top 5 in the following three years (2013-15).
* Also, Oxford was not among the top 5 in 2012 but has been present since 2013.
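Rank movements like these are easier to read from a table than from bars. A minimal sketch, assuming the `institution`, `year` and `world_rank` columns of the CWUR file; `rank_table` is a hypothetical helper and the toy rows are placeholders rather than real rankings.

```python
import pandas as pd

def rank_table(df: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Institutions as rows, years as columns, world_rank as values.

    A cell is NaN for a year in which the institution was outside the top_n.
    """
    top = df[df["world_rank"] <= top_n]
    return top.pivot(index="institution", columns="year", values="world_rank")

# Toy rows with placeholder names (not real CWUR values).
toy = pd.DataFrame({
    "year": [2012, 2012, 2012, 2013, 2013, 2013],
    "institution": ["Univ A", "Univ B", "Univ C", "Univ A", "Univ C", "Univ B"],
    "world_rank": [1, 2, 3, 1, 2, 3],
})
table = rank_table(toy, top_n=2)
print(table)
```

With the real frame, `rank_table(cwr, top_n=5)` would show a NaN wherever an institution dropped out of, or had not yet entered, the top 5.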


# *CORRELATIONS :*

In [None]:
from IPython.display import Image
Image("../input/correlation/correlation.png")

The image above shows the correlations between the different features of our dataset, computed with the Orange toolkit.
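A comparable correlation matrix can also be computed in pandas. This is a sketch under assumptions: it uses Pearson correlation (the default of `DataFrame.corr`), which may differ from the measure Orange used for the image, and both the helper `rank_correlations` and the toy numbers are hypothetical.

```python
import pandas as pd

def rank_correlations(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Pairwise Pearson correlation matrix for the selected numeric columns."""
    return df[columns].corr()

# Toy values for illustration only (not real CWUR values).
toy = pd.DataFrame({
    "world_rank":   [1, 2, 3, 4, 5],
    "publications": [2, 1, 4, 3, 5],
    "score":        [100.0, 90.0, 85.0, 80.0, 70.0],
})
corr = rank_correlations(toy, ["world_rank", "publications", "score"])
print(corr.round(2))
```

In the toy frame, `world_rank` correlates positively with the `publications` rank and negatively with `score`, the same pattern of signs this notebook reports for the real data.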

# *SCATTER PLOTS :*

In [None]:
from IPython.display import Image
Image("../input/orange1/WhatsApp Image 2020-09-02 at 9.24.56 PM.png")

The above graph displays the relationship between world_rank and publications for the whole period 2012-2015. The points close to the 0-200 mark can be considered the topmost universities of the world. There is a direct, positive relationship between world rank and publications: since both are ranks, a better publications rank goes with a better world rank.

In [None]:
from IPython.display import Image
Image("../input/orange2/scatterplot1.png")

* The above plot compares the public expenditure for the years 2011 and 1995.

* The different shapes of the points represent different countries, as depicted in the plot.

* The different colors represent the institution type.

In [None]:
Image("../input/folder/world_rank vs scoretop5.jpeg")

The above scatter plot displays score vs. world rank for the top 5 universities according to CWUR for the period 2012-2015.

Notice that the higher the score, the lower (better) the world rank assigned by CWUR.

* The different shapes of the points represent the different countries.
* The different colors of the points represent the different ranks.
* The different sizes of the points represent the different years (as the size decreases, the year increases).

# *Feature Statistics of Supplementary Data :*

In [None]:
print("Supplementary data education expenditure :")
from IPython.display import Image
Image("../input/folder/feature-1.jpg")

* The different colors represent the institution types.
* The data used is for the year 2011.
* We can clearly see that 13% of the whole data for the year 2011 is missing.
* The central tendency (the most frequent value, since these are categorical features) of direct expenditure type is Private, and that of country is Denmark.
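The missing share and the most frequent category can also be checked in pandas. A minimal sketch: the column names `direct_expenditure_type` and `2011` are assumptions about the supplementary CSV, the helpers `missing_share` and `most_frequent` are hypothetical, and the toy rows are placeholders.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame, column: str) -> float:
    """Fraction of missing values in one column (0.0 to 1.0)."""
    return float(df[column].isna().mean())

def most_frequent(df: pd.DataFrame, column: str):
    """Mode of a categorical column; ties resolve to the first sorted value."""
    return df[column].mode().iloc[0]

# Toy rows with assumed column names (not real expenditure figures).
toy = pd.DataFrame({
    "country": ["X", "X", "Y", "Z"],
    "direct_expenditure_type": ["Public", "Private", "Private", "Total"],
    "2011": [4.1, np.nan, 0.6, 5.0],
})
print(missing_share(toy, "2011"))
print(most_frequent(toy, "direct_expenditure_type"))
```

Applied to the real supplementary frame, these two calls would be a quick way to cross-check the figures read from the Orange Feature Statistics widget.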

# *HISTOGRAM :*

In [None]:
print("Supplementary data : education_expenditure : ")
from IPython.display import Image
Image("../input/orange2/histogram.png")

The above histogram displays the education expenditure for different levels of education, split into public, private, and total direct expenditure, for the year 2011. This year was chosen because it is the only one in which all three are present for most of the countries.

# *NORMAL DISTRIBUTION :*

Another question arises: how many institutions from a particular country are included in the list each year, and are more universities included each year?

The two figures below address these questions by showing how often each country occurs in each year.
We have used a normal distribution to represent the number of universities of a country (different countries are shown as curves of different colors). A normal distribution over the period 2012-15 is used because it provides the following information:

1. The number of universities of a particular country included in a particular year (the area under that country's curve between the previous year and the given year).

2. The year in which the maximum number of universities for a particular country is reached (within the period 2012-15).

3. Since the curves generally rise from 2012 to 2015 for every country, the number of universities from each country appears to increase each year.

In [None]:
print("Fig 1.")
from IPython.display import Image
Image("../input/orange2/country-peryear.png")

Fig 1 is plotted for 50% of the entire dataset.

In [None]:
print("Fig 2.")
Image("../input/orange2/countryentry_per year.png")

Fig 2 is plotted for just 50 values in the dataset.
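The per-country counts behind Fig 1 and Fig 2 can also be tabulated directly, without fitting a curve. A minimal sketch, assuming the `country` and `year` columns of the CWUR file; `universities_per_country` is a hypothetical helper and the toy rows are placeholders.

```python
import pandas as pd

def universities_per_country(df: pd.DataFrame) -> pd.DataFrame:
    """Count listed institutions for each country (rows) and year (columns)."""
    return pd.crosstab(df["country"], df["year"])

# Toy rows with placeholder country codes (not real CWUR values).
toy = pd.DataFrame({
    "country": ["X", "X", "Y", "X", "Y", "Y"],
    "year":    [2012, 2013, 2013, 2013, 2012, 2013],
})
counts = universities_per_country(toy)
print(counts)

# Countries whose count never decreases from one year to the next.
year_to_year = counts.diff(axis=1).iloc[:, 1:]
growing = counts.index[(year_to_year >= 0).all(axis=1)].tolist()
print(growing)
```

When reading such a table for the real data, it is worth comparing it with `cwr["year"].value_counts()` from earlier in the notebook, since growth per country may partly reflect the ranking list itself getting longer.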

# *SAMPLE CASE : 2012*

In [None]:
df1=cwr[cwr.year==2012]
df2=cwr[cwr.year==2013]
df3=cwr[cwr.year==2014]
df4=cwr[cwr.year==2015]
print("Top 10 universities in the year 2012 are:")
df1[df1.world_rank<11]


In [None]:
print("University with the best quality of education in the year 2012:")
# quality_of_education is a rank, so the smallest value is the best.
df1[df1.quality_of_education==df1.quality_of_education.min()]

## *Scatter Plots :*

In [None]:
# Top 10 universities of 2012 by world rank.
top10_2012 = df1[df1.world_rank<11]

# One scatter plot per ranking parameter, each against the institution name.
parameters = ['quality_of_education', 'alumni_employment', 'quality_of_faculty',
              'publications', 'influence', 'citations']

for parameter in parameters:
    top10_2012.plot(kind='scatter', x=parameter, y='institution')
    plt.minorticks_on()
    # The on/off flag is passed positionally: the `b=` keyword was renamed
    # to `visible=` in newer Matplotlib releases.
    plt.grid(True, which='minor', axis='both', color='#999999', linestyle='-', alpha=0.2)
    plt.show()

The above plots display the ranks held by the top 10 universities (by CWUR world rank) in different areas such as citations, quality of education, etc.

We have seen how the top 10 universities by world rank perform in each area. Now we also want to find the top 10 universities within each area, such as citations, quality of education, etc.

## *TOP 10 Universities on different Parameters :*

In [None]:
df1[df1.citations<11]

In [None]:
df1[df1.quality_of_education<11]

In [None]:
df1[df1.publications<11]
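The three filters above share one pattern, which can be wrapped in a small helper. This is a sketch, not the notebook's method: `top_n_by` is a hypothetical name, it swaps the `< 11` filter for `DataFrame.nsmallest`, and the toy rows are placeholders.

```python
import pandas as pd

def top_n_by(df: pd.DataFrame, column: str, n: int = 10) -> pd.DataFrame:
    """Return the n rows with the smallest (best) rank in `column`."""
    shown = ["institution", "world_rank"]
    if column not in shown:
        shown.append(column)
    return df.nsmallest(n, column)[shown]

# Toy rows with placeholder names (not real CWUR values).
toy = pd.DataFrame({
    "institution": ["Univ A", "Univ B", "Univ C", "Univ D"],
    "world_rank": [1, 2, 3, 4],
    "citations": [3, 1, 2, 4],
})
best = top_n_by(toy, "citations", n=2)
print(best)
```

One design difference: a `< 11` filter can return more or fewer than 10 rows when several universities share a rank, whereas `nsmallest` returns exactly `n` rows and keeps the first ones among ties. On the real data, a call such as `top_n_by(df1, "citations")` would mirror the first cell above.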

# *CONCLUSION :* 
We can now derive the following conclusions:
* Harvard University's world rank is 1 for the whole period 2012-15.
* Harvard University, Stanford University and MIT are among the top 5 universities for the period 2012-15.
* Having world rank 1 does not imply that the university is no. 1 on every parameter, such as quality of education, alumni_employment, etc.
* There is a direct, positive correlation between publications and world_rank, followed by a positive correlation between influence and world_rank.
* The total number of countries increases each year, and we also notice that the number of universities in different countries increases each year.
* The relation between score and world_rank is a negative correlation: a university with a higher score receives a numerically lower (better) world rank.


# *Vote of Thanks :*

I would like to humbly and sincerely thank my mentor <a href='https://www.linkedin.com/in/rocky-jagtiani-3b390649/'> Rocky Jagtiani </a>. He is more of a friend to me than a mentor. The data analytics he taught us, together with the various assignments we did and are still doing, is the best way to learn and improve in the data science field.

Recommended : https://datascience.suvenconsultants.com/