# school_districts
This notebook uses the BC OpenData dataset [Student Headcount by Grade 2017/18 to 2020/21](https://catalogue.data.gov.bc.ca/dataset/bc-schools-student-headcount-by-grade/resource/bbf8b17c-a2ed-4f2c-b601-dced4ff1d0ac) to generate a list of public schools per school district in BC for the 2020/2021 school year, plus a count of schools per district.

To re-run, download the .csv from the linked URL and place it in `data/`.

To run this notebook you need Jupyter and pandas. The easiest way to set this up is probably to install [Anaconda](https://www.anaconda.com/distribution/) alongside your regular Python installations.

If you have a recent pip then you can try something like:


<code>$ pip install --user jupyterlab
$ pip install --user pandas</code>


In [8]:
import csv
import pandas as pd
import numpy as np

In [2]:
# load data, filter it to just select 2020/2021 public school data
# and drop unneeded columns

dat = pd.read_csv('data/student_headcount_by_grade_2017_18-to-20120_21.csv')

# get rid of extraneous columns and rows
dat = dat.drop(columns=['INDIGENOUS_STUDENTS', 'NON_INDIGENOUS_STUDENTS', 'ELL_STUDENTS',
                        'NON_ELL_STUDENTS', 'FRENCH_IMMERSION_STUDENTS', 'NON_FRENCH_IMMERSION_STUDENTS',
                        'SPECIAL_NEEDS_STUDENTS', 'NON_SPECIAL_NEEDS_STUDENTS', 'RESIDENT_STUDENTS',
                        'NON_RESIDENT_STUDENTS', 'ADULT_STUDENTS'])


dat = dat.loc[(dat['SCHOOL_YEAR'] == '2020/2021') & (dat['DATA_LEVEL'] == 'SCHOOL LEVEL')]

# restrict to just public schools for now as independent schools do not have a district number in this data set
dat = dat.loc[(dat['PUBLIC_OR_INDEPENDENT'] == 'BC PUBLIC SCHOOL')]


# more extraneous columns
dat = dat.drop(columns=['SCHOOL_YEAR', 'PUBLIC_OR_INDEPENDENT', 'DATA_LEVEL', 'FACILITY_TYPE',
                        'GRADE', 'TOTAL_STUDENTS'])

dat = dat.drop_duplicates()

dat = dat.astype({'DISTRICT_NUMBER': int})
dat.dtypes


DISTRICT_NUMBER     int64
DISTRICT_NAME      object
SCHOOL_NUMBER      object
SCHOOL_NAME        object
dtype: object
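The cell above appears to rely on the headcount file holding several rows per school (one per grade, per year), so that dropping the per-grade columns and then calling `drop_duplicates()` leaves one row per school. A minimal sketch of that idea follows; the rows, school numbers, and names are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Invented sample rows: several rows per school (one per grade and year).
# Values are illustrative only, not real BC data.
sample = pd.DataFrame({
    'SCHOOL_YEAR': ['2020/2021', '2020/2021', '2020/2021', '2020/2021', '2019/2020'],
    'DISTRICT_NUMBER': [5.0, 5.0, 5.0, 6.0, 5.0],
    'SCHOOL_NUMBER': ['1001', '1001', '1002', '2001', '1001'],
    'SCHOOL_NAME': ['School One', 'School One', 'School Two', 'School Three', 'School One'],
    'GRADE': ['01', '02', '01', '01', '01'],
    'TOTAL_STUDENTS': [20, 22, 18, 25, 19],
})

# Keep a single school year, as the notebook does.
one_year = sample.loc[sample['SCHOOL_YEAR'] == '2020/2021']

# Once the per-grade columns are gone, rows for the same school are identical,
# so drop_duplicates() collapses them to one row per school.
schools = (one_year
           .drop(columns=['SCHOOL_YEAR', 'GRADE', 'TOTAL_STUDENTS'])
           .drop_duplicates()
           .astype({'DISTRICT_NUMBER': int}))

print(schools)
```

If any column that differs between grades is left in place, the duplicates are not removed and the later school counts would be inflated.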

In [3]:
# all public schools within each district
dat

| | DISTRICT_NUMBER | DISTRICT_NAME | SCHOOL_NUMBER | SCHOOL_NAME |
|---|---|---|---|---|
| 72403 | 5 | Southeast Kootenay | 501007 | Jaffray Elem-Jr Secondary |
| 72418 | 5 | Southeast Kootenay | 501009 | Isabella Dicken Elementary |
| 72428 | 5 | Southeast Kootenay | 501010 | Frank J Mitchell Elementary |
| 72438 | 5 | Southeast Kootenay | 501017 | Rocky Mountain Elementary |
| 72448 | 5 | Southeast Kootenay | 502001 | Mount Baker Secondary |
| ... | ... | ... | ... | ... |
| 87370 | 93 | Conseil scolaire francophone | 9393007 | des Glaciers |
| 87381 | 93 | Conseil scolaire francophone | 9393008 | Sophie-Morigeau |
| 87392 | 93 | Conseil scolaire francophone | 9393012 | La Confluence |
| 87401 | 93 | Conseil scolaire francophone | 9393013 | La Grande-ourse |


In [4]:
# summary count of schools by district
summary = dat.groupby(['DISTRICT_NUMBER', 'DISTRICT_NAME']).count()
summary = summary.drop(columns=['SCHOOL_NUMBER'])
summary = summary.rename(columns={'SCHOOL_NAME': 'COUNT_SCHOOLS'})
summary

| DISTRICT_NUMBER | DISTRICT_NAME | COUNT_SCHOOLS |
|---|---|---|
| 5 | Southeast Kootenay | 20 |
| 6 | Rocky Mountain | 18 |
| 8 | Kootenay Lake | 27 |
| 10 | Arrow Lakes | 6 |
| 19 | Revelstoke | 4 |
| 20 | Kootenay-Columbia | 11 |
| 22 | Vernon | 24 |
| 23 | Central Okanagan | 47 |
| 27 | Cariboo-Chilcotin | 25 |
| 28 | Quesnel | 16 |
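One possible alternative to the count / drop / rename sequence is `groupby(...).size()`, which returns the number of rows per group directly. The sketch below uses invented rows (the district and school names are placeholders) and assumes, as in the notebook, that the frame already holds one row per school.

```python
import pandas as pd

# Invented one-row-per-school frame; names and numbers are placeholders.
schools = pd.DataFrame({
    'DISTRICT_NUMBER': [5, 5, 6],
    'DISTRICT_NAME': ['District A', 'District A', 'District B'],
    'SCHOOL_NUMBER': ['1001', '1002', '2001'],
    'SCHOOL_NAME': ['School One', 'School Two', 'School Three'],
})

# size() counts rows per group; rename() names the resulting Series and
# to_frame() turns it into a one-column DataFrame.
counts = (schools
          .groupby(['DISTRICT_NUMBER', 'DISTRICT_NAME'])
          .size()
          .rename('COUNT_SCHOOLS')
          .to_frame())

print(counts)
```

Note that `size()` counts every row, while `count()` skips missing values in each column, so the two can differ if `SCHOOL_NAME` is ever empty.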


In [10]:
# write out data as csv
dat.to_csv('data/districts_schools_names_data.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
summary.to_csv('data/districts_school_counts.csv', quoting=csv.QUOTE_NONNUMERIC)
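When reading the exported files back into pandas, it may be worth passing an explicit `dtype` for `SCHOOL_NUMBER`: `read_csv` infers column types by default, so identifiers written as quoted strings can come back as integers. The sketch below round-trips an invented frame through an in-memory buffer instead of `data/`; the school numbers with leading zeros are made up to show the effect.

```python
import csv
import io

import pandas as pd

# Invented rows; the leading zeros are there only to illustrate the issue.
schools = pd.DataFrame({
    'DISTRICT_NUMBER': [5, 6],
    'SCHOOL_NUMBER': ['00501', '00602'],
    'SCHOOL_NAME': ['School One', 'School Two'],
})

# Write with the same quoting option the notebook uses, but to a buffer.
buffer = io.StringIO()
schools.to_csv(buffer, index=False, quoting=csv.QUOTE_NONNUMERIC)
text = buffer.getvalue()

# Read back, forcing SCHOOL_NUMBER to stay a string.
restored = pd.read_csv(io.StringIO(text), dtype={'SCHOOL_NUMBER': str})

print(restored.dtypes)
```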