## Introduction

Decades of research have attempted to gauge gender differences in mathematics performance. This report analyzes links between student grades and student gender, supplemental educational support, and plans to enroll in higher education using a data set provided by Paulo Cortez at the University of Minho. The data, as well as a breakdown of its attributes, is available at https://archive.ics.uci.edu/ml/datasets/Student+Performance. 

## Goals

A 1990 study by Hyde et al., focusing on gender differences in mathematics performance, noted "a slight female superiority in performance in the elementary and middle school years" and a "moderate male superiority...in the high school years" and beyond (Hyde et al., 1990, 149). Moreover, Cornell et al. have reported that, "\[o\]n average, students who perceived higher levels of student support reported higher levels of engagement with their school (B = .42), grades (log odds = .08), and educational aspirations (log odds = .07)" (Cornell et al., 2016, 9). With these findings in mind, this paper first calculates the magnitude of the differences between males and females in average first period, second period, and final grades. It then assesses whether combinations of school, family, and paid support correspond with higher grades and whether this support is associated with a desire to pursue higher education.

## Analysis

The data set includes 33 variables and 395 observations. Compared with the study of Hyde et al., which sampled 1,968,846 males and 2,016,836 females (Hyde et al., 1990, 141), this is a far smaller data set, and its sample statistics may not accurately reflect the population of mathematics students in Portugal.

```sas
*Create ST513 library;
LIBNAME ST513 '/folders/myfolders/ST513/project1';

*Import data into the ST513 library (from file, rather than URL).
 G1 and G2 are imported as character variables, while G3 is imported as a numeric variable;
FILENAME REFFILE '/folders/myfolders/ST513/project1/StudentData.txt';

PROC IMPORT DATAFILE=REFFILE
	DBMS=DLM
	OUT=ST513.StudentData;
	DELIMITER=";";
	GETNAMES=YES;
RUN;

*Copy dataset with G1 and G2 recast as numeric values 
(numeric type expected from the description of the original data at https://archive.ics.uci.edu/ml/datasets/Student+Performance#);
DATA ST513.RecastStudentData;
  SET ST513.StudentData;
  numG1 = input(G1, 8.);
  numG2 = input(G2, 8.);
RUN;

*Print number of observations and variables within data set.
 ODS TRACE ON/ODS TRACE OFF was used in SAS Studio to
 exclude EngineHost/Position and output only the Attributes; 
PROC CONTENTS DATA = ST513.StudentData VARNUM;
  ODS SELECT ATTRIBUTES;
RUN;
```

| Attribute | Value | Attribute | Value |
| --- | --- | --- | --- |
| Data Set Name | ST513.STUDENTDATA | Observations | 395 |
| Member Type | DATA | Variables | 33 |
| Engine | V9 | Indexes | 0 |
| Created | 02/10/2021 00:17:55 | Observation Length | 224 |
| Last Modified | 02/10/2021 00:17:55 | Deleted Observations | 0 |
| Protection | | Compressed | NO |
| Data Set Type | | Sorted | NO |
| Label | | | |
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | | |
| Encoding | utf-8 Unicode (UTF-8) | | |


The gender distribution in this sample is fairly even, with 208 females and 187 males.

```sas
*Bar plot of gender with counts;
PROC SGPLOT DATA = ST513.RecastStudentData;
  VBAR sex / DATALABEL;
RUN;
```

Moreover, the mean first period, second period, and final grades in this sample (10.91, 10.71, and 10.42) are close to their respective medians (11.00 in each case), suggesting that the grade distributions are roughly symmetric.

```sas
*Select statistics for grades regardless of gender;
PROC MEANS DATA = ST513.RecastStudentData MEAN MEDIAN Q1 Q3 MIN MAX MAXDEC = 2;
  VAR numG1 numG2 G3;
RUN;
```

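| Variable | Mean | Median | Lower Quartile | Upper Quartile | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- |
| numG1 | 10.91 | 11.00 | 8.00 | 13.00 | 3.00 | 19.00 |
| numG2 | 10.71 | 11.00 | 9.00 | 13.00 | 0.00 | 19.00 |
| G3 | 10.42 | 11.00 | 8.00 | 14.00 | 0.00 | 20.00 |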
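
The symmetry suggested by the closeness of the means and medians could be checked more directly with a skewness statistic. The following is a minimal sketch, assuming the `ST513.RecastStudentData` data set created earlier; the `N` and `SKEWNESS` keywords are additions that are not part of the original analysis, and no output is shown because this step was not run for the report.

```sas
*Sketch only: add sample skewness to the summary statistics.
 Values near 0 would be consistent with a roughly symmetric distribution;
PROC MEANS DATA = ST513.RecastStudentData N MEAN MEDIAN SKEWNESS MAXDEC = 2;
  VAR numG1 numG2 G3;
RUN;
```

Because the minimum second period and final grades are 0, a skewness value noticeably different from 0 would not be surprising, and the histogram of `G3` remains a useful companion to this statistic.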


A comparison of average grades across genders shows males marginally outperforming females in all three grade categories on the 0 to 20 grading scale: by 0.61 points in the first period, 0.68 points in the second period, and 0.94 points in the final grade.

```sas
*Histogram with mean grade reference and a smoothed overlay - gender and final grades.
 The reference line at 10.42 is the overall mean of G3 reported by PROC MEANS above;
PROC SGPANEL DATA = ST513.RecastStudentData;
  PANELBY sex;
  HISTOGRAM G3;
  DENSITY G3 / TYPE = kernel;
  REFLINE 10.42 / AXIS = x 
            LINEATTRS = (Pattern = 4 
                         Thickness = 3);
RUN;

*Select statistics for grades, group by gender;
PROC MEANS DATA = ST513.RecastStudentData MEAN MEDIAN Q1 Q3 MIN MAX MAXDEC = 2;
  CLASS sex;
  VAR numG1 numG2 G3;
RUN;
```

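| sex | N Obs | Variable | Mean | Median | Lower Quartile | Upper Quartile | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F | 208 | numG1 | 10.62 | 10.00 | 8.00 | 13.00 | 4.00 | 19.00 |
| | | numG2 | 10.39 | 10.00 | 8.00 | 13.00 | 0.00 | 18.00 |
| | | G3 | 9.97 | 10.00 | 8.00 | 13.00 | 0.00 | 19.00 |
| M | 187 | numG1 | 11.23 | 11.00 | 9.00 | 14.00 | 3.00 | 19.00 |
| | | numG2 | 11.07 | 11.00 | 9.00 | 14.00 | 0.00 | 19.00 |
| | | G3 | 10.91 | 11.00 | 9.00 | 14.00 | 0.00 | 20.00 |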
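
The group means above give the size of the gap but not whether it is distinguishable from sampling noise in a sample of 395 students. One way to examine this is a two-sample t test, sketched below under the assumption that `ST513.RecastStudentData` exists as created earlier. This test is an illustrative addition rather than part of the original analysis, so no results are reported here.

```sas
*Sketch only: two-sample t tests comparing grades for females and males.
 PROC TTEST reports the difference in means with a confidence interval,
 using both pooled and Satterthwaite (unequal variance) methods;
PROC TTEST DATA = ST513.RecastStudentData;
  CLASS sex;
  VAR numG1 numG2 G3;
RUN;
```

The difference is reported as the first class level minus the second, so with `sex` coded F and M a male advantage would appear as a negative difference.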


Next, we can assess grade distributions across three binary (yes/no) support variables: extra educational support from the school (`schoolsup`), family educational support (`famsup`), and enrollment in extra paid classes (`paid`).

```sas
*One-way tables to summarize frequency of additional support (extra educational support, family educational support, extra paid classes);
PROC FREQ DATA = ST513.RecastStudentData;
  TABLES schoolsup famsup paid;
RUN;

*Summary statistics for grades within each combination of the three support variables;
PROC MEANS DATA = ST513.RecastStudentData;
  CLASS schoolsup famsup paid;
  VAR numG1 numG2 G3;
RUN;
```


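| schoolsup | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| --- | --- | --- | --- | --- |
| no | 344 | 87.09 | 344 | 87.09 |
| yes | 51 | 12.91 | 395 | 100.0 |

| famsup | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| --- | --- | --- | --- | --- |
| no | 153 | 38.73 | 153 | 38.73 |
| yes | 242 | 61.27 | 395 | 100.0 |

| paid | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| --- | --- | --- | --- | --- |
| no | 214 | 54.18 | 214 | 54.18 |
| yes | 181 | 45.82 | 395 | 100.0 |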

| schoolsup | famsup | paid | N Obs | Variable | N | Mean | Std Dev | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| no | no | no | 101 | numG1 | 101 | 11.4158416 | 3.6118342 | 5.0000000 | 19.0000000 |
| | | | | numG2 | 101 | 10.8712871 | 4.0488600 | 0 | 19.0000000 |
| | | | | G3 | 101 | 10.3960396 | 5.1460261 | 0 | 20.0000000 |
| no | no | yes | 39 | numG1 | 39 | 11.3846154 | 3.0574790 | 6.0000000 | 18.0000000 |
| | | | | numG2 | 39 | 11.5384615 | 3.1190936 | 5.0000000 | 18.0000000 |
| | | | | G3 | 39 | 11.3076923 | 3.6717545 | 0 | 18.0000000 |
| no | yes | no | 84 | numG1 | 84 | 10.6785714 | 3.6509141 | 3.0000000 | 19.0000000 |
| | | | | numG2 | 84 | 9.9880952 | 4.8854198 | 0 | 18.0000000 |
| | | | | G3 | 84 | 9.6666667 | 5.6276712 | 0 | 18.0000000 |
| no | yes | yes | 120 | numG1 | 120 | 11.2666667 | 3.0174098 | 6.0000000 | 18.0000000 |
| | | | | numG2 | 120 | 11.3083333 | 3.0891402 | 0 | 18.0000000 |
| | | | | G3 | 120 | 11.0833333 | 3.9843883 | 0 | 19.0000000 |
| yes | no | no | 10 | numG1 | 10 | 9.7000000 | 3.0930029 | 5.0000000 | 16.0000000 |
| | | | | numG2 | 10 | 10.8000000 | 2.6997942 | 6.0000000 | 16.0000000 |
| | | | | G3 | 10 | 10.8000000 | 2.8982753 | 6.0000000 | 17.0000000 |
| yes | no | yes | 3 | numG1 | 3 | 9.6666667 | 2.3094011 | 7.0000000 | 11.0000000 |
| | | | | numG2 | 3 | 8.6666667 | 1.1547005 | 8.0000000 | 10.0000000 |
| | | | | G3 | 3 | 9.6666667 | 1.5275252 | 8.0000000 | 11.0000000 |
| yes | yes | no | 19 | numG1 | 19 | 8.5263158 | 1.9823785 | 5.0000000 | 12.0000000 |
| | | | | numG2 | 19 | 8.9473684 | 2.2478059 | 5.0000000 | 15.0000000 |
| | | | | G3 | 19 | 8.7894737 | 3.2072651 | 0 | 15.0000000 |
| yes | yes | yes | 19 | numG1 | 19 | 9.2105263 | 2.2004784 | 7.0000000 | 14.0000000 |
| | | | | numG2 | 19 | 9.6842105 | 2.1357443 | 6.0000000 | 13.0000000 |
| | | | | G3 | 19 | 9.3157895 | 2.5615237 | 5.0000000 | 14.0000000 |
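
The three-way breakdown above splits the sample into groups as small as 3 students, which makes some of the averages unstable. A possible complement is to summarize grades by each support variable on its own. The sketch below assumes the same `ST513.RecastStudentData` data set and uses the `WAYS` statement of `PROC MEANS` to request only one-way groupings; it is an illustrative addition and was not run for this report.

```sas
*Sketch only: grade summaries for each support variable separately.
 WAYS 1 restricts the output to one-way class groupings
 (schoolsup alone, famsup alone, paid alone);
PROC MEANS DATA = ST513.RecastStudentData N MEAN STD MAXDEC = 2;
  CLASS schoolsup famsup paid;
  WAYS 1;
  VAR numG1 numG2 G3;
RUN;
```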


The overwhelming majority of students (87.09%) did not receive extra educational support from their school, and more than half (54.18%) did not pay for extra classes in mathematics. Moreover, while the differences are marginal, the highest average second period and final grades belong to students who enrolled in extra paid classes but did not receive extra educational support from their school; first period averages are highest, by a small margin, among students with no support of any kind. One possible explanation is that students who find their day-to-day coursework more challenging tend to have lower average grades and are therefore more likely to seek additional support from their school. We may also consider the desire to pursue higher education, which is associated with a shift of the median, the extremes, and the lower and upper quartiles toward higher final grades.

```sas
*Side-by-side boxplot;
PROC SGPLOT DATA = ST513.RecastStudentData;
  HBOX G3 / GROUP = higher;
RUN;
```
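
The Goals section also asks whether support is associated with a desire to pursue higher education. One way this could be examined is with two-way frequency tables of each support variable against `higher`, sketched below under the assumption that `higher` is the yes/no variable used in the boxplot above. The chi-square test is an illustrative addition, not a step from the original analysis, and no results are reported here.

```sas
*Sketch only: cross-tabulate each support variable with the desire
 to pursue higher education and request chi-square tests of association.
 NOCOL and NOPERCENT keep only counts and row percentages in each cell;
PROC FREQ DATA = ST513.RecastStudentData;
  TABLES (schoolsup famsup paid) * higher / CHISQ NOCOL NOPERCENT;
RUN;
```

If few students answer "no" to `higher`, some cells may have small expected counts, in which case the chi-square results should be read with caution.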

## Conclusion

## Future study

While Hyde et al. calculated a marginally higher average performance for males than for females in high school mathematics, the authors conclude that outcomes differ depending on whether the assessments gauge computation, problem-solving abilities, or an understanding of mathematical concepts. Two decades later, Lindberg et al. drew similar conclusions, finding that "previous research showed that gender differences in mathematics performance were very small and, depending on the sample and outcome measure, sometimes favored boys and sometimes favored girls" (Lindberg et al., 2010, 1124).

## References

Cornell, D., Shukla, K., & Konold, T. R. (2016). Authoritative School Climate and Student Academic Engagement, Grades, and Aspirations in Middle and High Schools. AERA Open. https://doi.org/10.1177/2332858416633184

Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107(2), 139–155. https://doi.org/10.1037/0033-2909.107.2.139

Lindberg, S. M., Hyde, J. S., Petersen, J. L., & Linn, M. C. (2010). New trends in gender and mathematics performance: A meta-analysis. Psychological Bulletin, 136(6), 1123–1135. https://doi.org/10.1037/a0021276