# NB 4: Grading the professor, Pt. 1

*Modified from Data Science in a Box*

```r
# Install the course packages (if needed) and load the R packages used in this notebook
install.packages(c("csucistats", "openintro"),
                 repos = c("https://inqs909.r-universe.dev", "https://cloud.r-project.org"))
library(csucistats)
library(tidyverse)
library(openintro)


# Uncomment and run for categorical plots
# csucistats::install_plots()
# library(ggtricks)
# library(ggmosaic)
# library(waffle)

# Uncomment and run for themes
# csucistats::install_themes()
# library(ThemePark)
# library(ggthemes)
```


## Introduction

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously.
However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching-related characteristics, such as the physical appearance of the instructor.
The article "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity" (Hamermesh and Parker, 2005) found that instructors who are perceived as better looking receive higher instructional ratings.
(Daniel S. Hamermesh and Amy Parker, "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity," *Economics of Education Review*, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, doi:10.1016/j.econedurev.2004.07.013. <http://www.sciencedirect.com/science/article/pii/S0272775704001165>.)





## Data

In this notebook you will analyze the data from this study to learn what goes into a positive professor evaluation.

The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin.
In addition, six students rated the professors' physical appearance.
The data set used here is a slightly modified version of the original, which was released as part of the replication data for *Data Analysis Using Regression and Multilevel/Hierarchical Models* (Gelman and Hill, 2007).
The result is a data frame in which each row represents a different course and the columns represent variables about the courses and professors.

The data can be found in the **openintro** package, and it's called `evals`.
Since the dataset is distributed with the package, we don't need to load it separately; it becomes available to us when we load the package.
You can find out more about the dataset by inspecting its documentation, which you can access by running `?evals` in the Console or using the Help menu in RStudio to search for `evals`.
You can also find this information [here](https://www.openintro.org/data/index.php?data=evals).
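Besides the help page, a quick way to see which variables `evals` contains, and what type each one is, is `dplyr::glimpse()`. This is a minimal sketch that assumes the packages from the first code cell are already installed and loaded.

```r
# Assumes openintro and the tidyverse are installed (see the first code cell)
library(openintro)
library(dplyr)

# One line per column: its name, its type, and its first few values
glimpse(evals)

# Open the documentation page for the data set (works in an interactive session)
# ?evals
```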



## 1.0 - Exploratory Data Analysis

1.1 - Visualize the distribution of `score`.
    Is the distribution skewed?
    What does that tell you about how students rate courses?
    Is this what you expected to see?
    Why, or why not?
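One way to visualize the distribution of a single numerical variable is a histogram. The sketch below uses R's built-in `mtcars` data as a stand-in, so you still need to swap in `evals` and `score`. The `binwidth = 2` value is an arbitrary choice that you should tune for your own variable.

```r
library(ggplot2)

# Histogram of one numerical variable (mtcars is a stand-in for evals)
mpg_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, color = "white") +
  labs(x = "Miles per gallon", y = "Number of cars",
       title = "Distribution of fuel efficiency")
mpg_hist
```

A long tail to the right indicates right skew, and a long tail to the left indicates left skew.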
    


1.2 - Visualize and describe the relationship between `score` and `bty_avg`.
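The relationship between two numerical variables is usually shown with a scatterplot. A sketch on the stand-in `mtcars` data (weight on the x-axis, fuel efficiency on the y-axis):

```r
library(ggplot2)

# Each point is one car (mtcars is a stand-in for evals)
wt_mpg_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
wt_mpg_plot
```

If many points sit on top of each other, `geom_jitter()` in place of `geom_point()` adds a small amount of random noise to each point so the overlapping ones become visible.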

1.3 - Create summary statistics for the two variables (`score` and `bty_avg`). Describe the distribution of each variable.

1.4 - Compute the correlation between the two variables and interpret it.
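Summary statistics and a correlation can be computed with `dplyr::summarise()` and base R's `cor()`. This sketch again uses `mtcars` as a stand-in; which statistics to report (here mean, median, and standard deviation) is a choice, not a requirement.

```r
library(dplyr)

# Center and spread of two numerical variables in one table
mtcars_summary <- mtcars %>%
  summarise(
    mean_mpg = mean(mpg), median_mpg = median(mpg), sd_mpg = sd(mpg),
    mean_wt  = mean(wt),  median_wt  = median(wt),  sd_wt  = sd(wt)
  )
mtcars_summary

# Pearson correlation: its sign gives the direction of the linear trend,
# and its size (between -1 and 1) gives the strength
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)
r_wt_mpg
```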


## 2.0 - Linear regression with a numerical predictor

2.1 - Let's see if the apparent trend in the plot is something more than natural variation.
    Fit a linear model called `score_bty_fit` to predict average professor evaluation `score` by average beauty rating (`bty_avg`).
    Based on the regression output, write the linear model.
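A linear model with one numerical predictor is fit with `lm(response ~ predictor, data = ...)`. The sketch below fits `mpg` on `wt` in the stand-in `mtcars` data; `mpg_wt_fit` is a hypothetical name, not the `score_bty_fit` the exercise asks for.

```r
# Fit mpg as a linear function of weight: predicted mpg = b0 + b1 * wt
mpg_wt_fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept (b0) and slope (b1), used to write the fitted equation
coef(mpg_wt_fit)

# Fuller output, including standard errors and R-squared
summary(mpg_wt_fit)
```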



2.2 - Recreate the scatterplot from Exercise 1.2 and add the regression line to it using `geom_smooth(method = "lm")`. Draw the line in any color listed [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf).
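Here is a sketch of layering a least-squares line on a scatterplot, using the stand-in `mtcars` data. The color `"darkorange"` is just one of R's named colors, and `se = FALSE` is an optional choice that hides the confidence band `geom_smooth()` draws by default.

```r
library(ggplot2)

wt_mpg_line <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" fits and draws the least-squares line; se = FALSE drops the band
  geom_smooth(method = "lm", se = FALSE, color = "darkorange") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
wt_mpg_line
```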



2.3 - Interpret the slope of the linear model in context of the data.
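The slope is the predicted change in the response for a one-unit increase in the predictor. The sketch below checks this numerically on the stand-in `mtcars` model: two hypothetical cars that differ by one unit of weight (1000 lbs) get predictions that differ by exactly the slope.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Two hypothetical cars, 1 unit (1000 lbs) apart in weight
two_cars <- data.frame(wt = c(3, 4))
preds <- predict(fit, newdata = two_cars)

# The difference between the two predictions equals the slope
diff(preds)
coef(fit)[["wt"]]
```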



2.4 - Interpret the intercept of the linear model in context of the data.
    Comment on whether or not the intercept makes sense in this context.
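The intercept is the predicted response when the predictor equals 0. Whether that prediction means anything depends on whether 0 lies within, or at least near, the observed range of the predictor. A sketch with the stand-in `mtcars` model:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# The intercept equals the prediction at wt = 0
predict(fit, newdata = data.frame(wt = 0))
coef(fit)[["(Intercept)"]]

# Check whether 0 is inside the observed range of the predictor
range(mtcars$wt)
```

No car in `mtcars` weighs close to 0, so its intercept is an extrapolation. Applying the same check with `range()` to `bty_avg` tells you whether the intercept of your model is interpretable.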




2.5 - Determine the $R^2$ of the model and interpret it in context of the data.
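$R^2$ is the proportion of the variability in the response that the model explains, and it is stored in the object returned by `summary()` of an `lm` fit. With a single numerical predictor (and an intercept), it also equals the squared correlation. A sketch on the stand-in `mtcars` model:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Proportion of the variability in mpg explained by the linear model
r_squared <- summary(fit)$r.squared
r_squared

# For simple linear regression, this matches the squared correlation
cor(mtcars$wt, mtcars$mpg)^2
```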


## 3.0 - Linear regression with a categorical predictor



3.1 - Create a plot that displays the distribution of the numerical variable `score` for each level of the categorical variable `gender`.
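Side-by-side box plots are one common way to compare a numerical variable across the levels of a categorical one. In the stand-in `mtcars` data, `am` is stored as 0/1, so the sketch first turns it into a labeled factor (`transmission` is a hypothetical name).

```r
library(dplyr)
library(ggplot2)

# Recode the 0/1 column as a labeled factor so it is treated as categorical
mtcars_am <- mtcars %>%
  mutate(transmission = factor(am, levels = c(0, 1), labels = c("automatic", "manual")))

# One box per group
am_box <- ggplot(mtcars_am, aes(x = transmission, y = mpg)) +
  geom_boxplot() +
  labs(x = "Transmission", y = "Miles per gallon")
am_box
```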

3.2 - Fit a new linear model called `score_gender_fit` to predict average professor evaluation `score` based on `gender` of the professor.
    Based on the regression output, write the linear model and interpret the slope and intercept in context of the data.
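With a two-level factor as the predictor, `lm()` creates an indicator for the non-reference level. The intercept is then the mean response of the reference group, and the slope is the difference between the two group means. The sketch below confirms this on the stand-in `mtcars` data, where `automatic` (the first factor level) is the reference.

```r
library(dplyr)

mtcars_am <- mtcars %>%
  mutate(transmission = factor(am, levels = c(0, 1), labels = c("automatic", "manual")))

# Coefficients: "(Intercept)" and an indicator "transmissionmanual"
mpg_am_fit <- lm(mpg ~ transmission, data = mtcars_am)
coef(mpg_am_fit)

# Compare with the group means
mtcars_am %>%
  group_by(transmission) %>%
  summarise(mean_mpg = mean(mpg))
```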



3.3 - What is the equation of the line corresponding to male professors?
    What is it for female professors?
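Writing the fitted model with an indicator variable makes the two group equations easy to read off. Here is a sketch in general form, assuming `female` is the reference level of `gender` (check with `levels(evals$gender)`). In it, $b_0$ and $b_1$ stand for the estimated intercept and slope from your output:

$$
\widehat{\text{score}} = b_0 + b_1 \cdot \text{male}, \qquad
\text{male} = \begin{cases} 1 & \text{if the professor is male} \\ 0 & \text{if the professor is female} \end{cases}
$$

$$
\text{Male: } \widehat{\text{score}} = b_0 + b_1 \cdot 1 = b_0 + b_1,
\qquad
\text{Female: } \widehat{\text{score}} = b_0 + b_1 \cdot 0 = b_0
$$

Because there is no numerical predictor, each "line" is a single predicted value for its group.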



## 4.0 - Releveling Categorical Variables

4.1 - Fit a new linear model called `score_rank_fit` to predict average professor evaluation `score` based on `rank` of the professor.
    Based on the regression output, write the linear model and interpret the slopes and intercept in context of the data.
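A factor with more than two levels gets one indicator (and one slope) per non-reference level. Each slope is that group's mean minus the reference group's mean. A sketch with the three cylinder counts in the stand-in `mtcars` data (`cyl_f` is a hypothetical name):

```r
library(dplyr)

# cyl takes the values 4, 6, and 8; factor() makes lm() treat it as categorical
mtcars_cyl <- mtcars %>%
  mutate(cyl_f = factor(cyl))

# "4" is the first level, so it is the reference; lm() adds indicators for "6" and "8"
mpg_cyl_fit <- lm(mpg ~ cyl_f, data = mtcars_cyl)
coef(mpg_cyl_fit)

# Group means: each slope equals a group's mean minus the 4-cylinder mean
mtcars_cyl %>%
  group_by(cyl_f) %>%
  summarise(mean_mpg = mean(mpg))
```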


4.2 - Create descriptive statistics for the variable `rank`.
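For a categorical variable, the usual descriptive statistics are counts and proportions for each level. A sketch with `dplyr::count()` on the stand-in `mtcars` data:

```r
library(dplyr)

# Frequency (n) and relative frequency (prop) of each level
cyl_counts <- mtcars %>%
  count(cyl) %>%
  mutate(prop = n / sum(n))
cyl_counts
```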

The `mutate` function creates new variables in a data frame from existing variables.
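A minimal sketch of `mutate()` on the stand-in `mtcars` data, where `wt` is recorded in thousands of pounds. The new column name `wt_lbs` is hypothetical.

```r
library(dplyr)

# Add a new column computed from an existing one; the original columns are kept
mtcars_lbs <- mtcars %>%
  mutate(wt_lbs = wt * 1000)
head(mtcars_lbs)
```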

The `relevel` function sets which level of a factor is the reference (baseline) category. With its default settings, `lm` compares every other level of the factor to this reference level.
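Here is a small sketch of what `relevel()` does to the level order, using a hypothetical `size` factor. By default, a factor built from text has its levels sorted alphabetically. `relevel()` moves the chosen level to the front and leaves the others in their original order.

```r
# Hypothetical factor; levels are sorted alphabetically: "large" "medium" "small"
size <- factor(c("small", "large", "medium", "small"))
levels(size)

# Move "medium" to the front so lm() would use it as the reference level
size_relevel <- relevel(size, ref = "medium")
levels(size_relevel)  # "medium" "large" "small"
```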

4.3 - Use the code below to create a new variable called `rank_relevel` where `"tenure track"` is the baseline level.



```r
# Make "tenure track" the reference (baseline) level of rank
evals <- mutate(evals, rank_relevel = relevel(rank, "tenure track"))
```

4.4 - Fit a new linear model called `score_rank_relevel_fit` to predict average professor evaluation `score` based on `rank_relevel` of the professor.
    This is the new (releveled) variable you created in Exercise 4.3.
    Based on the regression output, write the linear model and interpret the slopes and intercept in context of the data.
    Also determine and interpret the $R^2$ of the model.
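Changing the reference level changes the coefficients, because each slope is now measured relative to a different group. It does not change the fitted values, so $R^2$ stays the same. A sketch on the stand-in `mtcars` data, releveling the cylinder factor so that `"6"` is the reference (all names here are hypothetical):

```r
library(dplyr)

mtcars_cyl <- mtcars %>%
  mutate(cyl_f = factor(cyl),
         cyl_relevel = relevel(cyl_f, ref = "6"))

fit_default <- lm(mpg ~ cyl_f, data = mtcars_cyl)        # reference: 4 cylinders
fit_relevel <- lm(mpg ~ cyl_relevel, data = mtcars_cyl)  # reference: 6 cylinders

# Different coefficients, because they compare against different reference groups
coef(fit_default)
coef(fit_relevel)

# Same fitted values, and therefore the same R-squared
all.equal(fitted(fit_default), fitted(fit_relevel))
c(summary(fit_default)$r.squared, summary(fit_relevel)$r.squared)
```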



## 5.0 - Submit Notebook

Submit the notebook to Canvas.