## What Influences a University's Reputation for Research Excellence?
#### STAT 301 Group Report 

---

<br><br>


## Introduction
In recent years, the assessment of higher education institutions has garnered significant attention because of its implications for students, educators, policymakers, and the academic community at large. University rankings serve as a vital tool for students seeking quality education, researchers exploring collaborative opportunities, and institutions striving for excellence in teaching and research. A university's research capability ranking in particular is of interest to both students (usually graduate or master's candidates) and faculty. Understanding the factors that influence these rankings is crucial for stakeholders inside and outside the university to make informed decisions and enhance educational outcomes.

For a student, the university's teaching effectiveness might be the most significant factor contributing to a good rating. For faculty and policymakers within the institution, teaching effectiveness often competes with research capabilities for resources, and the two are held in a delicate balance. Fairweather (2002) suggests that participating in both teaching and research is a mutually reinforcing process that allows someone to be a "complete faculty member". Gaikwad (2021) suggests that research production can be critical to feelings of job security for those primarily involved in teaching. Given that balancing the two activities requires decisions on the allotment of resources like money and time, we believe it is worthwhile to explore the relationship between a university's research capabilities and teaching effectiveness. However, much of the existing literature does not offer a conclusive perspective on this relationship (Khan, 2017).

We believe that a university's gender diversity may also produce insights about its research capabilities. The inclusion of women in research can produce a more diverse environment of ideas and perspectives, leading to innovation and quality research. The gender gap in university research teams has shrunk, but a lack of statistical information is a current shortcoming in academia that hinders the effective study of the relationship between gender and research quality (Restrepo et al., 2021).

While previous research has shed light on the multifaceted nature of university rankings, there remains a gap in understanding the relationship between teaching effectiveness, research capabilities, and gender diversity. This study aims to address this gap by investigating how these factors influence university rankings, with a focus on the World University Rankings 2023 dataset. By examining the interplay between teaching effectiveness, research capabilities, and gender diversity, we seek to provide insights that can inform institutional policies and practices, enhance educational outcomes, and promote diversity and inclusion in academia.

<br>

#### Research Question 
In this study, we aim to investigate how specific variables affect the research score of universities, as indicated in the World University Rankings 2023 dataset. Our primary research question is:

**"How do a university's research capabilities change with its rank, teaching effectiveness, and female-to-male student ratio?"**

<br>

#### Dataset Description

To address this question, we use the [World University Rankings 2023 dataset](https://www.kaggle.com/datasets/alitaqi000/world-university-rankings-2023), sourced from Kaggle. This dataset is a comprehensive collection of data from 1799 universities across 104 countries and regions. Compiled by Times Higher Education, the dataset incorporates survey responses from 40,000 scholars worldwide and analyzes 121 million citations from over 15.5 million research publications. With over 2,500 institutions contributing data, the dataset comprises over 680,000 data points, providing a rich source for exploration.

In all, we have 13 different variables and 2341 observations (including NAs). The university's research capabilities will be operationalized with Research Score, and teaching effectiveness with Teaching Score. These variables offer insights into various aspects of university performance and enable us to examine the relationships between teaching, research, and gender diversity. The variables are:
<br><br>

| | Variable | Variable Type | Description |
|---|---|---|---|
|1| University rank | chr | Global rank of the university |
|2| University name | chr | Name of the university |
|3| Location | chr | Physical place where the university is located |
|4| No. of students | chr | Number of students enrolled in the university as of 2023 |
|5| No. of students per staff | dbl | Number of students per staff member |
|6| International students | chr | Total number of international students |
|7| Female : male ratio | chr | Ratio of female to male students, in that order |
|8| Overall score | chr | The combined weighted scores of those given below. Out of 100. |
|9| Teaching score | chr | The perceived prestige of the institution based on the Academic Reputation Survey. Out of 100. |
|10| Research score | chr | Reputation for research excellence amongst peers based on the Academic Reputation Survey. Out of 100. |
|11| Citations score | chr | The number of citations received by a journal in one year to documents published in the three previous years, divided by the number of documents indexed in Scopus published in those same three years. Out of 100. |
|12| Industry income score | chr | How much money a university receives from industry in exchange for its academic expertise. Out of 100. |
|13| International outlook score | chr | The ability of a university to attract undergraduates, postgraduates, and faculty from all over the globe. |
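
Because most of the columns above are stored as character strings, they need to be converted before they can be used in a model. The following is a minimal sketch in base R, not the cleaning code used for this report: the toy data frame, its column names (`n_students`, `female_male_ratio`, `research_score`), the example string formats, and the helper functions `parse_number_chr()` and `parse_female_share()` are all assumptions made for illustration.

```r
# Toy rows with hypothetical column names and string formats (assumptions)
toy <- data.frame(
  n_students = c("20,965", "7,807", NA),
  female_male_ratio = c("48 : 52", "50 : 50", NA),
  research_score = c("99.7", "98.8", "n/a"),
  stringsAsFactors = FALSE
)

# Drop thousands separators, then convert; unparseable strings become NA
parse_number_chr <- function(x) {
  suppressWarnings(as.numeric(gsub(",", "", x, fixed = TRUE)))
}

# Turn a "female : male" string into the female share of students (0 to 1)
parse_female_share <- function(x) {
  female <- suppressWarnings(as.numeric(sub("^ *([0-9]+) *:.*$", "\\1", x)))
  male <- suppressWarnings(as.numeric(sub("^.*: *([0-9]+) *$", "\\1", x)))
  female / (female + male)
}

clean <- data.frame(
  n_students = parse_number_chr(toy$n_students),
  female_share = parse_female_share(toy$female_male_ratio),
  research_score = parse_number_chr(toy$research_score)
)
print(clean)
```

Expressing the ratio as a single female share is one possible design choice; it gives one numeric covariate that can enter a regression directly, and missing or malformed entries remain `NA` rather than being silently dropped.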

<br><br>

---
## Methods and Results

#### Exploratory Data Analysis (EDA)
- [x] Demonstrate that the dataset can be read from the web into R.
- [ ] Clean and wrangle your data into a tidy format.
- [ ] Plot the relevant raw data, tailoring your plot in a way that addresses your question.
    - [ ] CORR PLOT FROM SABRINA
    - [ ] GINA'S 3 EDA GRAPHS 
    - [ ] make sure to explore the association of the explanatory variables with the response.
    - [ ] your Exploratory Data Analysis (EDA) must be comprehensive with high quality plots.
- [ ] Any summary tables that are relevant to your analysis.
- [ ] Be sure to not print output that takes up a lot of screen space.

<br>

#### Methods: Plan
- [ ] Describe in written English the methods you used to perform your analysis from beginning to end that narrates the code that does the analysis.
- [ ] If included, describe the “Feature Selection” process, how and why you choose the covariates of your final model.
- [ ] Make sure to interpret/explain the results you obtain. It’s not enough to just say “I fitted a linear model with these covariates, and my R-square is 0.87”.
    - [ ] if inference is the aim of your project, detailed interpretation of your fitted model is required, as well as a discussion of relevant quantities (e.g., are the coefficients significant? how is the model fitting the data)?
    - [ ] a careful model assessment must be conducted.
    - [ ] if prediction is the aim of the project, describe the test data used or how it was created.
- [ ] Ensure your tables and/or figures are labeled with a figure/table number.

<br><br>

```r
# load required packages
library(readr)
library(dplyr)
library(ggplot2)
library(leaps)
library(olsrr)

# set the desired size for plots
options(repr.plot.width = 12, repr.plot.height = 20)

# adjust the number of significant digits to display for double type values
options(digits = 15)

# Read in the data from the web and display the first few rows
data <- read_csv("https://raw.githubusercontent.com/gna7/stat301report/main/world_university_rankings_2023.csv", show_col_types = FALSE)
# head(data)
```
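
The research question stated earlier can be framed as a multiple linear regression of research score on rank, teaching score, and the female share of students. The sketch below only illustrates that framing on simulated data. It is not the model fitted in this report: the data frame `toy_unis`, its column names, and the coefficients used to simulate the response are all assumptions.

```r
# Simulated universities; every value here is made up for illustration
set.seed(2023)
n <- 150
toy_unis <- data.frame(
  rank = seq_len(n),
  teaching_score = runif(n, 15, 95),
  female_share = runif(n, 0.3, 0.7)
)
toy_unis$research_score <- 10 + 0.8 * toy_unis$teaching_score -
  0.02 * toy_unis$rank + rnorm(n, sd = 5)

# Additive linear model: one slope per explanatory variable
fit <- lm(research_score ~ rank + teaching_score + female_share, data = toy_unis)

# Estimates, standard errors, t statistics, and p-values
coef_table <- summary(fit)$coefficients
print(round(coef_table, 3))
```

Each row of `coef_table` describes the estimated change in research score for a one-unit change in that variable, holding the others fixed, which is the kind of interpretation an inferential analysis would report.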

<br><br>

---
## Discussion

The best model for this research question is model 5, given its high adjusted R^2 (0.917) and its Cp value (10.720769). We chose it because, as further predictor variables are added, the Cp stays stable (e.g., it does not get closer to 9) and the R^2 does not vary either. Relative to our research question, these findings suggest that rank, international outlook score, number of students per staff, teaching score, and citations score are all influential in a university's research score. The recorded research scores had significant linear correlations with University Rank, International Outlook Score, No. of Students per Staff, Teaching Score, and Citations Score. These findings align with our expectations and were not surprising.
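
For readers unfamiliar with the two selection criteria, the sketch below shows how adjusted R^2 and Mallows' Cp can be computed by hand for a few nested candidate models. It uses base R and simulated data, so the data frame `sim`, the candidate formulas, and the resulting numbers are assumptions and do not reproduce the values reported above. The Cp formula used is the standard one, Cp = RSS_p / sigma^2_full - n + 2p, where p counts the fitted coefficients including the intercept.

```r
# Simulated data: x1 and x2 matter, x3 is pure noise (all values made up)
set.seed(301)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 + 1.5 * sim$x1 - 0.8 * sim$x2 + rnorm(n)

# The full model supplies the error-variance estimate used by Mallows' Cp
full <- lm(y ~ x1 + x2 + x3, data = sim)
sigma2_full <- summary(full)$sigma^2

# Hypothetical nested candidates, from smallest to full
candidates <- list(y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3)

selection_table <- do.call(rbind, lapply(candidates, function(f) {
  fit <- lm(f, data = sim)
  p <- length(coef(fit))          # number of coefficients, intercept included
  rss <- sum(residuals(fit)^2)    # residual sum of squares of this candidate
  data.frame(
    model = paste(deparse(f), collapse = ""),
    p = p,
    adj_r2 = summary(fit)$adj.r.squared,
    cp = rss / sigma2_full - n + 2 * p
  )
}))
print(selection_table)
```

By construction the full model's Cp equals its own p, so the comparison is informative only for the smaller candidates: a candidate whose Cp is close to its p, with an adjusted R^2 near that of the full model, loses little by dropping the remaining predictors.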

These findings can inform how universities around the globe work to improve their research excellence by indicating which factors they might focus on. For instance, since teaching scores were correlated with research scores, improvements in teaching scores may go hand in hand with improvements in research, although a correlation alone does not show that one causes the other. This can affect universities in various ways, especially those that are primarily research-based. Higher research scores can reflect more academic findings from these universities and more opportunities for students to conduct and publish their own meaningful research.

Future questions that can be discussed based on the findings from this project are:
- Can these results be replicated in data taken only from women's universities?
- Does the type of research done at each university influence their research scores?

<br><br>

---
## References

Fairweather, J. S. (2002). The mythologies of faculty productivity: Implications for institutional policy and decision making. The Journal of Higher Education, 73(1), 26–48. https://doi.org/10.1353/jhe.2002.0006

Gaikwad, P. (2021). Balancing research productivity and teaching by faculty in higher education: A case study in the Philippines. Journal of Higher Education Theory and Practice, 21(7). https://doi.org/10.33423/jhetp.v21i7.4495

Khan, M. A. (2017). Achieving an appropriate balance between teaching and research in institutions of higher education: An exploratory study. International Journal of Information and Education Technology, 7(5), 341–349. https://doi.org/10.18178/ijiet.2017.7.5.892

Restrepo, N., Unceta, A., & Barandiaran, X. (2021). Gender diversity in research and innovation projects: The proportion of women in the context of higher education. Sustainability, 13(9), 5111. https://doi.org/10.3390/su13095111

Taqi, S. A. (2023). World University Rankings 2023 [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/6394958