## Find overall counts of responses in each school

We have previously found the number of pupils with a given score (excluding pupils whose value for that score was NaN), as well as the number of responses to particular questions.

This notebook finds the overall number of pupils in the dataset for each school and pupil group, regardless of whether they answered a particular question.
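The difference between the two kinds of count can be seen on a small made-up dataframe (the columns `school_lab` and `score` and their values are assumptions for illustration). In pandas, `groupby(...).count()` skips missing values, so it matches the earlier "pupils with a score" counts, whereas `groupby(...).size()` counts every row, which is the overall count this notebook is after.

```python
import numpy as np
import pandas as pd

# Toy data: four pupils, one of whom has no score (hypothetical column 'score')
df = pd.DataFrame({
    'school_lab': ['School A', 'School A', 'School A', 'School B'],
    'score': [3.0, np.nan, 5.0, 2.0],
})

# count() excludes NaN, so it gives the number of pupils WITH a score
with_score = df.groupby('school_lab')['score'].count()

# size() counts rows, so it gives the number of pupils regardless of NaN
overall = df.groupby('school_lab').size()

print(with_score)
print(overall)
```

Here School A has three pupils in total but only two with a score.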

## Set-up

### Packages and file paths

In [1]:
# Import required packages
from dataclasses import dataclass
import numpy as np
import os
import pandas as pd

In [2]:
# File paths
@dataclass(frozen=True)
class Paths:
    '''Stores paths to data and files'''
    survey = '../data/survey_data'
    synthetic_data = 'synthetic_data_raw.csv'
    overall_counts = 'overall_counts.csv'


paths = Paths()

In [3]:
# Import functions defined elsewhere
import sys
sys.path.append('../')
from create_and_process_data.functions import results_by_school_and_group

### Import raw data

In [4]:
data = pd.read_csv(os.path.join(paths.survey, paths.synthetic_data))
data.head()

| | gender | transgender | sexual_orientation | neurodivergent | birth_parent1 | birth_parent2 | birth_you | birth_you_age | autonomy_pressure | autonomy_express | ... | peer_talk_listen_lab | peer_talk_helpful_lab | peer_talk_if_lab | accept_peer_lab | year_group_lab | fsm_lab | sen_lab | ethnicity_lab | english_additional_lab | school_lab |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | 2.0 | 6.0 | 3.0 | 2.0 | 1.0 | 2.0 | 1.0 | 2.0 | 5.0 | ... | NaN | Somewhat helpful | NaN | Not at all | Year 10 | Non-FSM | NaN | Ethnic minority | No | School E |
| 1 | 1.0 | 2.0 | 1.0 | 3.0 | 3.0 | NaN | 3.0 | 8.0 | 4.0 | 2.0 | ... | NaN | NaN | NaN | Slightly | Year 10 | NaN | Non-SEN | Ethnic minority | No | School D |
| 2 | 2.0 | 3.0 | 4.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 4.0 | ... | NaN | NaN | Very comfortable | Not at all | Year 10 | NaN | Non-SEN | White British | No | School E |
| 3 | 2.0 | NaN | 5.0 | 2.0 | 2.0 | 2.0 | 1.0 | 3.0 | 1.0 | 2.0 | ... | NaN | NaN | Uncomfortable | Mostly | Year 10 | Non-FSM | Non-SEN | White British | No | School G |
| 4 | 5.0 | 3.0 | 4.0 | 1.0 | 1.0 | 3.0 | 3.0 | 2.0 | 5.0 | 2.0 | ... | Slightly | Somewhat helpful | NaN | Not at all | Year 8 | Non-FSM | Non-SEN | White British | Yes | School B |


### Create dataframe

In [5]:
# Make new version of aggregate that just finds overall counts
def aggregate_counts(df):
    '''
    Aggregates the provided dataframe by counting the number of pupils (rows)
    in it, including pupils with missing answers.

    Parameters
    ----------
    df : pd.DataFrame
        Dataframe with a row for each pupil and columns that include the
        school and groups needed by results_by_school_and_group()

    Returns
    -------
    res : pd.DataFrame
        Single-row dataframe with the count of pupils in df (e.g. the pupils
        in one school and group)
    '''
    res = pd.DataFrame({
        'count': [len(df.index)]
    })
    return res

In [6]:
# Template result for a school/group with no pupils: same structure as the
# output of aggregate_counts(), but with a count of 0
no_pupils = aggregate_counts(data)
no_pupils['count'] = 0
no_pupils

| | count |
|---|---|
| 0 | 0 |


In [7]:
# Find counts by school and pupil group
size = results_by_school_and_group(
    data=data, agg_func=aggregate_counts, no_pupils=no_pupils)
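`results_by_school_and_group()` is imported from `create_and_process_data.functions` and is not shown in this notebook. Judging only from the shape of its outputs here (one row per school with every group set to `All`, then one row per category of a single group with the other groups set to `All`), a hypothetical re-implementation might look like the sketch below. The name `sketch_results_by_school_and_group`, its `groups` parameter and the looping logic are all assumptions for illustration, not the project's actual code. It also suggests why `no_pupils` is passed in: it stands in for subsets that contain no pupils.

```python
import pandas as pd


def sketch_results_by_school_and_group(data, agg_func, no_pupils,
                                       groups=('year_group_lab', 'gender_lab',
                                               'fsm_lab', 'sen_lab')):
    '''Hypothetical re-implementation, for illustration only.

    Applies agg_func to each school overall and to each category of each
    group column (one group at a time, others labelled 'All'). Uses a copy
    of no_pupils when a subset is empty.
    '''
    results = []
    for school in sorted(data['school_lab'].dropna().unique()):
        school_df = data[data['school_lab'] == school]
        # Row for the whole school: every group column set to 'All'
        res = agg_func(school_df)
        res['school_lab'] = school
        for g in groups:
            res[g] = 'All'
        results.append(res)
        # One row per category of each group, with other groups set to 'All'
        for g in groups:
            for category in data[g].dropna().unique():
                subset = school_df[school_df[g] == category]
                res = agg_func(subset) if len(subset) > 0 else no_pupils.copy()
                res['school_lab'] = school
                for other in groups:
                    res[other] = category if other == g else 'All'
                results.append(res)
    return pd.concat(results)


# Usage on a toy dataset (made-up values)
def aggregate_counts(df):
    return pd.DataFrame({'count': [len(df.index)]})


toy = pd.DataFrame({
    'school_lab': ['School A', 'School A', 'School B'],
    'fsm_lab': ['FSM', 'Non-FSM', 'FSM'],
})
no_pupils = pd.DataFrame({'count': [0]})
out = sketch_results_by_school_and_group(
    toy, aggregate_counts, no_pupils, groups=('fsm_lab',))
print(out)
```

In the toy example School B has no Non-FSM pupils, so that row comes from `no_pupils` and has a count of 0. Because every single-row result keeps index 0, the concatenated frame repeats index 0, as in the previews in this notebook.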

In [8]:
# Hide counts where n<10
size.loc[size['count'] < 10, 'count'] = np.nan
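An integer column cannot hold NaN, so suppressing small counts turns `count` into a float column, which is why the previews show `128.0` rather than `128`. The toy example below (made-up values) writes the same suppression with `Series.mask`, which returns a copy with NaN wherever the condition is true; the nullable `Int64` conversion at the end is an optional alternative, not something this notebook does.

```python
import pandas as pd

# Toy counts (made-up values) held in an ordinary int64 column
size = pd.DataFrame({
    'count': [128, 7, 55],
    'school_lab': ['School A', 'School B', 'School C'],
})

# Equivalent to the .loc assignment: mask() returns a copy with NaN
# wherever the condition is True (i.e. counts below 10)
size['count'] = size['count'].mask(size['count'] < 10)

# int64 cannot hold NaN, so the result is float64 (hence 128.0 in previews)
print(size['count'].dtype)

# Optional alternative: pandas' nullable integer dtype keeps whole numbers
# and marks suppressed cells as <NA>
nullable = size['count'].astype('Int64')
print(nullable)
```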

In [9]:
# Preview result (sorted so we can check that it matches the previous calculation)
size.sort_values(by=['sen_lab', 'year_group_lab', 'gender_lab',
                     'fsm_lab', 'school_lab'])

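| | count | school_lab | year_group_lab | gender_lab | fsm_lab | sen_lab |
|---|---|---|---|---|---|---|
| 0 | 128.0 | School A | All | All | All | All |
| 0 | 132.0 | School B | All | All | All | All |
| 0 | 111.0 | School C | All | All | All | All |
| 0 | 107.0 | School D | All | All | All | All |
| 0 | 106.0 | School E | All | All | All | All |
| ... | ... | ... | ... | ... | ... | ... |
| 0 | 48.0 | School C | All | All | All | SEN |
| 0 | 50.0 | School D | All | All | All | SEN |
| 0 | 54.0 | School E | All | All | All | SEN |
| 0 | 53.0 | School F | All | All | All | SEN |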


In [10]:
# Preview example of one of the schools
size[size['school_lab'] == 'School A']

| | count | school_lab | year_group_lab | gender_lab | fsm_lab | sen_lab |
|---|---|---|---|---|---|---|
| 0 | 128.0 | School A | All | All | All | All |
| 0 | 59.0 | School A | Year 8 | All | All | All |
| 0 | 55.0 | School A | Year 10 | All | All | All |
| 0 | 14.0 | School A | All | Girl | All | All |
| 0 | 28.0 | School A | All | Boy | All | All |
| 0 | 62.0 | School A | All | All | FSM | All |
| 0 | 57.0 | School A | All | All | Non-FSM | All |
| 0 | 67.0 | School A | All | All | All | SEN |
| 0 | 57.0 | School A | All | All | All | Non-SEN |


## Save results

In [11]:
size.to_csv(os.path.join(paths.survey, paths.overall_counts), index=False)
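Because suppressed counts are NaN, `to_csv` writes them as empty fields by default, and `read_csv` turns empty fields back into NaN. A quick round-trip check along these lines may be worth running on the saved file; the sketch below uses a small made-up frame and an in-memory `io.StringIO` buffer in place of the real path, so nothing is written to disk.

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for the results, including one suppressed count
size = pd.DataFrame({
    'count': [128.0, np.nan],
    'school_lab': ['School A', 'School A'],
    'gender_lab': ['All', 'Girl'],
})

# Write without the (all-zero) index, as the notebook does
buffer = io.StringIO()
size.to_csv(buffer, index=False)
text = buffer.getvalue()
print(text)  # the suppressed count appears as an empty field

# Reading back restores NaN for the empty field
reloaded = pd.read_csv(io.StringIO(text))
print(reloaded)
```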