# Closing the Gap in Higher Education: Factors in Student Success on CLEP Exams

This project was the culmination of my math and computer science studies at Fordham, in a field I was passionate about. Through a rigorous statistical lens, I examined attributes of students taking CLEP exams and analyzed whether these attributes were correlated with success on the exam, looking for trends that could help explain gaps in access to higher education. I also proposed targeted interventions, based on my findings, to work toward closing these gaps. For a more in-depth discussion, the full text of my paper can be found here.

*For privacy reasons, I will not be showing the data I used nor describing in great detail its structure. The organization I discuss (Online College Access Network) has been given a pseudonym meant to protect the privacy of the organization and its learners.*

## I. Understanding Barriers to Higher Education

In a world where higher education is becoming increasingly necessary, it's important to understand why people may not pursue this path. Research shows that the reason is not a lack of interest; rather, cost is one of the most significant factors in the decision to pursue (or not pursue) some form of post-secondary accreditation.

![Thesis Slides 2.png](<education-predictive-model/visuals/Thesis Slides 2.png>)

Enter the College Board's CLEP (College-Level Examination Program) exam, a low-cost (~$90 fee per exam) way to obtain college credit at thousands of U.S. institutions simply by passing a test. This is a great solution for those who have the time and are willing to put in the effort toward a higher education but are held back mainly by financial barriers. For some, however, even the examination fee is a barrier, so organizations like Online College Access Network help learners waive these fees once they have shown, with a sufficient level of certainty, that they will pass the test.

![Thesis Visuals .png](<education-predictive-model/visuals/Thesis Visuals .png>)

## II. Descriptive Statistics

As part of my exploratory analysis, I took note of the distribution of learner age and geographical location in the US. As shown below, learners range from age 13 to well into their 70s, with the highest concentration of learners at about high school age. Notice also the very high concentration of learners in Louisiana: this will be important later.

![Thesis Slides 5.png](<visuals/Thesis Slides 5.png>)

![Thesis Slides 6.png](<visuals/Thesis Slides 6.png>)

In the spirit of making higher education more equitable and accessible for everyone, OCAN hopes to serve demographics of learners traditionally underrepresented at the postsecondary level, such as low-income learners, people of color, non-native English speakers, and active-duty US military or veterans. Using the information OCAN has, I chose to check whether these demographics are indeed being reached.

**Claim: OCAN disproportionately serves learners from low-income families and people of color.**

The visuals below compare OCAN demographic data with US Census data, supporting my claim that OCAN serves these underrepresented communities.

![Thesis Slides 7.png](<visuals/Thesis Slides 7.png>)

![Thesis Slides 8.png](<visuals/Thesis Slides 8.png>)

## III. Research Questions

With three chosen predictor variables, I aimed to answer two research questions:
1. To what extent does each of the chosen components relate to student success on the exam?
2. Given a particular student and their information, what is their projected score? How reliable is this prediction?

![Thesis Slides 10.png](<projects/education-predictive-model/visuals/Thesis Slides 10.png>)


## IV. Methods/Results

For each of the predictor variables, a scatter plot against the response variable (CLEP Exam Score) does not immediately display a clear relationship. 

![Thesis Slides 11.png](<visuals/Thesis Slides 11.png>)
![Thesis Slides 12.png](<visuals/Thesis Slides 12.png>)
![Thesis Slides 13.png](<visuals/Thesis Slides 13.png>)

I decided to try a simple linear regression for each individual predictor, then a multiple linear regression using all three. Ultimately, I found that there was not sufficient evidence to say that any of these three variables had a linear relationship with the response variable.

![Thesis Slides14.png](<visuals/Thesis Slides14.png>)

These inconclusive results meant that I could neither accept nor reject my hypothesis that none of these variables was related to testing outcomes; I could only say that there was no evidence of a linear relationship. Limited by the scope of my undergraduate math studies, the available technology, and time, I decided to try one more approach to look for different results.
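The per-predictor regression step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function name `simple_linear_regression`, the generic `predictor` and `score` variables, and the data are all hypothetical. The data is randomly generated rather than OCAN's, and it covers only the single-predictor case, not the multiple regression.

```python
import math
import random


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns the slope, intercept, R^2, and the t-statistic for the
    null hypothesis that the slope is zero (n - 2 degrees of freedom).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy

    # Standard error of the slope; a perfect fit gives zero error.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "t_stat": t_stat,
        "df": n - 2,
    }


# Synthetic, made-up data (NOT the OCAN data): a predictor that is
# generated independently of the exam score, so no real trend exists.
random.seed(0)
predictor = [random.uniform(0, 100) for _ in range(200)]
score = [random.gauss(50, 10) for _ in range(200)]

fit = simple_linear_regression(predictor, score)
print(f"slope={fit['slope']:.4f}  R^2={fit['r_squared']:.4f}  "
      f"t={fit['t_stat']:.3f} on {fit['df']} df")
```

A slope t-statistic close to zero, together with a small R², is the kind of result that gives no evidence of a linear relationship. It does not rule out other kinds of relationships, which is why the results above are described as inconclusive.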

## V. An Alternate Approach

Remember the huge concentration of learners in Louisiana? There's another important piece to this puzzle: after familiarizing myself with the data, I suspected that learners from Louisiana tend to underperform on CLEP exams compared to learners from the rest of the US. This context, along with the fact that there was one other predictor (returning learners) that was a binary categorical variable, led me to try a series of two-sample t-tests to check whether these two factors do in fact differentiate student success.

![Thesis Slides 15.png](<visuals/Thesis Slides 15.png>)
![Thesis Slides16.png](<visuals/Thesis Slides16.png>)

Therefore, I concluded that being a learner from Louisiana and being a first-time exam taker (never having seen this specific exam before) are each associated with lower scores on average, and this is a good starting point for flagging students earlier for intervention.
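For readers unfamiliar with two-sample t-tests, the comparison in this section can be sketched as below. The write-up does not say which variant of the test was used, so this sketch assumes Welch's unequal-variance version. The function name `welch_t_test` and the two groups of scores are hypothetical, and the numbers are randomly generated, not OCAN's.

```python
import math
import random
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Welch's two-sample t-test (does not assume equal variances).

    Returns the t-statistic for mean(sample_a) - mean(sample_b) and the
    Welch-Satterthwaite approximation of the degrees of freedom.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    # Squared standard error contributed by each group
    # (statistics.variance is the sample variance, dividing by n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b

    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Synthetic, made-up scores (NOT the OCAN data). group_a stands in for
# one side of a binary split (e.g. learners from one state), group_b
# for the other; the means chosen here are arbitrary.
random.seed(1)
group_a = [random.gauss(48, 10) for _ in range(120)]
group_b = [random.gauss(52, 10) for _ in range(150)]

t_stat, df = welch_t_test(group_a, group_b)
print(f"t = {t_stat:.3f} on about {df:.1f} degrees of freedom")

# Turning t and df into a p-value needs the t distribution's CDF, which
# the standard library does not provide. With SciPy installed,
# scipy.stats.ttest_ind(group_a, group_b, equal_var=False) runs the
# same test and reports the p-value.
```

A large negative t-statistic here would indicate that the first group scores lower on average than the second. The same function applies unchanged to the first-time versus returning learner split.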

## VI. Looking Forward

The slide below discusses where and how OCAN can intervene for learners from Louisiana and first-time learners, whom I would flag as being at risk of scoring lower. Additionally, with more time, I would be interested in looking in more depth at other factors that may contribute to success on an exam.

![Thesis Slides 17.png](<visuals/Thesis Slides 17.png>)

I chose only four factors to look at as contributors to learner success. In reality, a myriad of other factors could contribute to a learner’s performance. Time spent engaging with the content outside of the online course with a teacher or tutor, time spent studying the course content before the exam, and even individual differences in learning styles and information retention are among the other factors that could influence learner performance.

## VII. Conclusions/Limitations

Revisiting my two original research questions, I notice that the path of my research deviated slightly from my original plan. The four components I chose were not enough to determine a student’s score, and I suspect that there are other factors at play that should be taken into consideration. However, I was able to say that being from Louisiana and never having seen the test material before are associated with lower scores on average, and this is a good starting place for OCAN to focus its intervention.

As OCAN steadily gains learners, and as a college education becomes more important and more people access higher education via this route, it’s important that OCAN can follow through on its claim that taking a course through them will reliably result in transferable college credit.

![Thesis Visuals  1.png](<visuals/Thesis Visuals  1.png>)

## VIII. References

![Thesis Slides 19.png](<visuals/Thesis Slides 19.png>)