The School District Analysis uses Python, Anaconda, Jupyter Notebook, the Pandas library specifically DataFrames, and NumPy to perform analysis on school data for fifteen different high schools across four different grades by merging the data into several Pandas DataFrames. Using the Pandas loc method, Pandas groupby function and bins, the avera…

melindamalone/School_District_Analysis

School District Analysis

Overview of the School District Analysis

Purpose

The purpose of Module Four and the School District Analysis Challenge is to gain additional practice with Python and to introduce Anaconda, Jupyter Notebook, the Pandas library, and NumPy. One of the most important concepts covered is how to create and manipulate Pandas DataFrames. The challenge required a thorough analysis of two external CSV files of school data for fifteen high schools across four grade levels, merging the data into several Pandas DataFrames. Using the Pandas loc method, the Pandas groupby function, and bins, the average math, reading, and overall scores and passing rates were identified by per-student budget, by school size, and by school type. The Pandas loc method was also used to replace the math and reading scores of Thomas High School ninth graders with the NumPy NaN value ("not a number") because of possible academic dishonesty.
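The loc-based replacement described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame contents, the column names (`school_name`, `grade`, `math_score`, `reading_score`), and "Other High School" are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature student table; column names and values are assumptions.
students = pd.DataFrame({
    "school_name": ["Thomas High School", "Thomas High School",
                    "Other High School", "Thomas High School"],
    "grade": ["9th", "10th", "9th", "9th"],
    "math_score": [90.0, 85.0, 70.0, 60.0],
    "reading_score": [95.0, 80.0, 75.0, 65.0],
})

# Boolean mask selecting only Thomas High School ninth graders.
is_ths_ninth = (students["school_name"] == "Thomas High School") & (students["grade"] == "9th")

# loc takes a row selector and a column selector; assign NaN to both score columns.
students.loc[is_ths_ninth, ["math_score", "reading_score"]] = np.nan

print(students)
```

Only the rows matching the mask are changed; the tenth grader at Thomas High School and the ninth grader at the other school keep their scores.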

Results

  • How is the district summary affected? Setting the math and reading scores for Thomas High School ninth graders to NaN essentially removes these 461 entries from the district summary calculations, and the district summary is minimally affected. When Average Math Score, Average Reading Score, % Passing Math, % Passing Reading, and % Overall Passing are rounded to zero decimal places, no change is detected. A change of ±0.1 or ±0.01 is detected only when these averages and percentages are rounded to the nearest tenth or hundredth.
  • How is the school summary affected? As with the district summary, setting the math and reading scores for Thomas High School ninth graders to NaN essentially removes these 461 entries from the school summary calculations, and the school summary is also minimally affected. The Average Math Score, Average Reading Score, % Passing Math, % Passing Reading, and % Overall Passing for Thomas High School remain unchanged when rounded to zero decimal places. A change of ±0.1 or ±0.01 is detected only when these averages and percentages are rounded to the nearest tenth or hundredth.
  • How does replacing the ninth graders’ math and reading scores affect Thomas High School’s performance relative to the other schools? Replacing the Thomas High School ninth graders' math and reading scores does not affect Thomas High School's overall performance relative to the other schools. Thomas High School remains the second-best-performing school by % Overall Passing, which is unchanged after NaNs are entered for its ninth graders.
  • How does replacing the ninth-grade scores affect the following:
    • Math and reading scores by grade: The math and reading scores by grade remain unchanged for all grade levels (9th, 10th, 11th, and 12th) and all schools, with one exception: the math and reading averages for Thomas High School ninth graders are now NaN.
    • Scores by school spending: Thomas High School falls into the $630-644 spending range (per student) bin, and the math and reading averages and passing percentages for this bin remain unchanged when rounded to zero decimal places. A change of ±0.1 or ±0.01 is detected only when these values are rounded to the nearest tenth or hundredth.
    • Scores by school size: Thomas High School falls into the medium (1000-2000) school size bin, and the math and reading averages and passing percentages for this bin remain unchanged when rounded to zero decimal places. A change of ±0.1 or ±0.01 is detected only when these values are rounded to the nearest tenth or hundredth.
    • Scores by school type: Thomas High School falls into the Charter school type category, and the math and reading averages and passing percentages for this category remain unchanged when rounded to zero decimal places. A change of ±0.1 or ±0.01 is detected only when these values are rounded to the nearest tenth or hundredth.
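The binned comparisons in the list above rely on grouping schools into ranges. A minimal sketch of that pattern with `pd.cut` and `groupby` follows; the per-school numbers, the column names, the bin edges, and the "Small" and "Large" labels are assumptions for illustration (only the medium 1000-2000 range comes from the results above).

```python
import pandas as pd

# Hypothetical per-school summary; names and values are assumptions.
per_school = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "size": [800, 1500, 1800, 3000],
    "avg_math": [80.0, 82.0, 84.0, 76.0],
})

# Assumed bin edges and labels; pd.cut uses right-closed intervals by default,
# so a school with exactly 2000 students would land in the medium bin.
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

per_school["size_bin"] = pd.cut(per_school["size"], bins=size_bins, labels=size_labels)

# Average the per-school math score within each size bin.
by_size = per_school.groupby("size_bin", observed=True)["avg_math"].mean()

print(by_size)
```

The same approach would apply to per-student spending ranges by swapping in a spending column and its own bin edges.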

Summary

In summary, after the reading and math scores for Thomas High School ninth graders are replaced with NaNs, the changes in the outputs are almost undetectable. As noted in the results above, when the affected averages and percentages are rounded to zero decimal places, no change is detected. A change of ±0.1, ±0.01, or ±0.001 is detected only when these same values are rounded to the nearest tenth, hundredth, or thousandth. In my opinion, the most significant change is in the math and reading scores by grade DataFrames, where the entries for Thomas High School ninth graders now show NaN instead of an average.
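The rounding effect described here can be illustrated with a small sketch. The scores below are made up for the example and are not taken from the school data; the point is only that pandas `mean()` skips NaN by default, so a replaced score drops out of the average, and whether the shift is visible depends on how many decimal places are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the last one stands in for a replaced entry.
scores = pd.Series([70.0, 80.0, 90.0, 81.2])
before = scores.mean()

# Replace one score with NaN, as was done for the ninth graders.
scores_nan = scores.copy()
scores_nan.iloc[3] = np.nan

# mean() uses skipna=True by default, so the NaN entry is ignored.
after = scores_nan.mean()

# count() reports only non-NaN entries.
print("entries used:", scores_nan.count())

# Compare the two averages at different rounding precisions.
for decimals in (0, 1):
    print(decimals, round(before, decimals), round(after, decimals))
```

In this toy case the two averages agree at zero decimal places and differ at one decimal place, which mirrors the pattern reported above.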
