# Title <- Insert Title Here

*Jack Yao, Jessie Lu, Kexin Feng, Vincent Luong*

# Research Question:

Is there a positive correlation between student performance and parents' educational attainment? 

Null hypothesis: There is no difference in mean student performance between students with well-educated parents and students with less-educated parents. [$H_0: \mu_1 - \mu_2 = 0$]

Alternative hypothesis: Students with well-educated parents perform better on average than students with less-educated parents. [$H_1: \mu_1 - \mu_2 > 0$]

$\mu_1$: mean student performance in the well-educated parents' group.

$\mu_2$: mean student performance in the less-educated parents' group.
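
One common way to test hypotheses of this form is Welch's two-sample t-test. The proposal does not commit to a specific test, so the statistic below is an assumption for illustration only. Here $\bar{x}_i$, $s_i^2$, and $n_i$ are the sample mean, sample variance, and sample size of group $i$, with group 1 the well-educated parents' group and group 2 the less-educated parents' group.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$

Because the alternative hypothesis is one-sided, only large positive values of $t$ would count as evidence against the null hypothesis.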

# Introduction:

Student performance is a general term for how well a student has mastered what is taught in school. In this project, we quantify student performance in one subject (mathematics) as the sum of three grades: the first-period grade, the second-period grade, and the final grade. Many factors contribute to student performance, such as family relationships, school, parents' level of education, and class attendance. According to Idris, Hussain, and Ahmad (2020), who studied tenth-grade students at government high schools in District Mardan, higher education levels of the father and mother contribute positively to their children's academic achievement.
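
The performance measure described above can be sketched in a few lines. This is a minimal illustration, not the project's final code: `toy_grades` and `total_score` are hypothetical names, and the three rows are copied from the dataset preview in this report. Each grade in the UCI dataset is on a 0 to 20 scale, so the sum ranges from 0 to 60.

```r
library(dplyr)

# Toy rows standing in for the real data (hypothetical object name)
toy_grades <- data.frame(
    first_period_grade  = c(5, 7, 15),
    second_period_grade = c(6, 8, 14),
    final_grade         = c(6, 10, 15)
)

# Student performance = first-period grade + second-period grade + final grade
toy_scores <- toy_grades |>
    mutate(total_score = first_period_grade + second_period_grade + final_grade)

toy_scores
```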

The dataset used in this project is the mathematics subset of the Student Performance dataset from the UCI Machine Learning Repository (Cortez, 2014). We classify parents' level of education into two groups based on the sum of both parents' education levels. Each parent's level is coded from 0 to 4: level 0 means no education, level 1 means primary education (up to 4th grade), level 2 means 5th to 9th grade, level 3 means secondary education, and level 4 means higher education. If the sum of both parents' levels is greater than 5, we classify the parents as "well-educated"; otherwise, we classify them as "less-educated".
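
The grouping rule can be sketched as follows. The object and column names (`toy_parents`, `parent_education_sum`, `parent_group`) are hypothetical, and the four rows are made up to show both sides of the cutoff, including the boundary case where the sum is exactly 5.

```r
library(dplyr)

# Made-up education levels (0-4) for four students
toy_parents <- data.frame(
    mother_education = c(4, 1, 4, 3),
    father_education = c(4, 1, 2, 2)
)

# If the columns are stored as factors, convert them first with
# as.integer(as.character(x)) so that the levels can be added.
toy_groups <- toy_parents |>
    mutate(
        parent_education_sum = mother_education + father_education,
        # A sum greater than 5 is "well-educated"; a sum of exactly 5 is not
        parent_group = if_else(parent_education_sum > 5,
                               "well-educated", "less-educated")
    )

toy_groups
```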

# Preliminary Results

This section will involve:
- Reading data from UCI database
- Cleaning and wrangling data
- Plotting relevant raw data
- Computing point estimates
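
As a rough sketch of the last step in the list, the point estimates could be computed as below. The data are made up and the names (`toy_students`, `group_summary`, `observed_diff`) are hypothetical; the real analysis would use the wrangled dataset instead.

```r
library(dplyr)

# Made-up data: total score (sum of three grades) and parents' education group
toy_students <- data.frame(
    parent_group = c("well-educated", "well-educated", "well-educated",
                     "less-educated", "less-educated", "less-educated"),
    total_score  = c(44, 45, 30, 17, 25, 36)
)

# Per-group sample size, sample mean, and sample standard deviation
group_summary <- toy_students |>
    group_by(parent_group) |>
    summarise(n = n(),
              mean_score = mean(total_score),
              sd_score = sd(total_score))

# Point estimate of the difference in means (well-educated minus less-educated)
mean_well <- group_summary$mean_score[group_summary$parent_group == "well-educated"]
mean_less <- group_summary$mean_score[group_summary$parent_group == "less-educated"]
observed_diff <- mean_well - mean_less

group_summary
observed_diff
```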

In [3]:
# load libraries
library(tidyverse)
library(tidymodels)
library(infer)


### Reading Data from the database

In [22]:
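# Download the Student Performance archive from the UCI Machine Learning Repository
url <- "https://archive.ics.uci.edu/static/public/320/student+performance.zip"
dir.create("./data", showWarnings = FALSE)  # download.file() fails if the folder is missing
download.file(url, "./data/studentperformance.zip", mode = "wb")  # "wb" keeps the zip intact on Windows

# The archive contains a nested zip (student.zip) that holds student-mat.csv
raw_data <- unzip("./data/studentperformance.zip", "student.zip", exdir = "./data") |>
    unzip("student-mat.csv", exdir = "./data") |>
    read_delim(delim = ";")

# Keep parents' education and the three grades, renamed to match the preview output
student_data <- raw_data |>
    select(mother_education = Medu, father_education = Fedu,
           first_period_grade = G1, second_period_grade = G2, final_grade = G3) |>
    # Education levels are categorical codes (0-4), shown as <fct> in the preview
    mutate(across(c(mother_education, father_education), as.factor))

head(student_data)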

Rows: 395 Columns: 33
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (17): school, sex, address, famsize, Pstatus, Mjob, Fjob, reason, guardi...
dbl (16): age, Medu, Fedu, traveltime, studytime, failures, famrel, freetime...


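| mother_education | father_education | first_period_grade | second_period_grade | final_grade |
|---|---|---|---|---|
| `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 4 | 4 | 5 | 6 | 6 |
| 1 | 1 | 5 | 5 | 6 |
| 1 | 1 | 7 | 8 | 10 |
| 4 | 2 | 15 | 14 | 15 |
| 3 | 3 | 6 | 10 | 10 |
| 4 | 3 | 15 | 15 | 15 |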


**Reading Data:** Reading and processing the Student Performance dataset from the web

# Methods: Plan

```
The previous sections will carry over to your final report (you’ll be allowed to improve them based on feedback you get). Begin this Methods section with a brief description of “the good things” about this report – specifically, in what ways is this report trustworthy?

Continue by explaining why the plot(s) and estimates that you produced are not enough to give to a stakeholder, and what you should provide in addition to address this gap. Make sure your plans include at least one hypothesis test and one confidence interval. If possible, compare both the bootstrapping and asymptotics methods.

Finish this section by reflecting on how your final report might play out:

- What do you expect to find?
- What impact could such findings have?
- What future questions could this lead to?
```
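
The instructions above ask for one hypothesis test and one confidence interval, compared across bootstrapping and asymptotic methods. One possible shape for that comparison is sketched here with the `infer` package and base R's `t.test()`. Everything in it is an assumption for illustration: the data are simulated (not the real dataset), the seed, group sizes, and number of resamples are arbitrary, and the object names are hypothetical.

```r
library(infer)

set.seed(2023)  # arbitrary seed so the resampling is reproducible

# Simulated stand-in data (NOT the real dataset): two groups of total scores
toy_students <- data.frame(
    parent_group = factor(rep(c("well-educated", "less-educated"), each = 40)),
    total_score  = c(rnorm(40, mean = 36, sd = 10), rnorm(40, mean = 30, sd = 10))
)
group_order <- c("well-educated", "less-educated")  # statistic = first minus second

# Observed difference in sample means
obs_diff <- toy_students |>
    specify(total_score ~ parent_group) |>
    calculate(stat = "diff in means", order = group_order)

# Simulation-based test: permutation null distribution, one-sided p-value
null_dist <- toy_students |>
    specify(total_score ~ parent_group) |>
    hypothesize(null = "independence") |>
    generate(reps = 1000, type = "permute") |>
    calculate(stat = "diff in means", order = group_order)
perm_p <- get_p_value(null_dist, obs_stat = obs_diff, direction = "greater")

# Bootstrap 95% percentile confidence interval for the difference in means
boot_ci <- toy_students |>
    specify(total_score ~ parent_group) |>
    generate(reps = 1000, type = "bootstrap") |>
    calculate(stat = "diff in means", order = group_order) |>
    get_confidence_interval(level = 0.95, type = "percentile")

# Asymptotic counterparts: one-sided Welch t-test and two-sided t-based interval
well <- toy_students$total_score[toy_students$parent_group == "well-educated"]
less <- toy_students$total_score[toy_students$parent_group == "less-educated"]
t_test_result <- t.test(well, less, alternative = "greater")
t_ci <- t.test(well, less, conf.level = 0.95)$conf.int
```

The `direction = "greater"` and `alternative = "greater"` arguments match the one-sided alternative hypothesis. Placing `boot_ci` next to `t_ci`, and `perm_p` next to `t_test_result$p.value`, gives a direct comparison of the two approaches.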

# write more stuff here

# References

Cortez, P. (2014). Student Performance. UCI Machine Learning Repository. https://doi.org/10.24432/C5TG7T

Idris, M., Hussain, S., & Ahmad, N. (2020). Relationship between Parents' Education and their Children's Academic Achievement. Journal of Arts & Social Sciences, 7(2), 82-92. https://doi.org/10.46662/jass-vol7-iss2-2020(82-92)

![image.png](attachment:a72aca51-6d35-4368-8450-cd5de0e79a5f.png)

![image.png](attachment:9607c531-799e-4627-8758-262b3cac7908.png)

![image.png](attachment:620d5646-6cb4-43a9-9fcf-e5c9de68fe8b.png)

![image.png](attachment:19e6f7c5-54f7-4c7c-b608-cf46bd1a73ac.png)