This code amalgamates the summary information from the 'mylocalschool.wales' website into one large dictionary for easy access. 

In [1]:
import requests
from bs4 import BeautifulSoup
import numpy as np

This is a list of the schools in the local area that we want to focus on.

In [2]:
local_schools = ['Severn Primary','St. Mary\'s R.C. Primary School','St.Paul\'s C/W Primary School','Grangetown Primary School','St Patrick\'s R C School','Ysgol Gynradd Gymraeg Hamadryad','Ninian Park Primary School','Kitchener Primary School','Radnor Primary School','Lansdowne Primary School','Ysgol Treganna','Ysgol Gymraeg Pwll Coch']

This function makes a dictionary which associates each school name to the URL of its summary page.

In [3]:
def make_school_URL_dict_from_list(schools):
    # The search page links to the summary page of every school.
    toppage = requests.get("http://mylocalschool.wales.gov.uk/Schools/SchoolSearch?lang=en")
    soup = BeautifulSoup(toppage.content, 'html.parser')
    # Quote the attribute value so that the selector is valid CSS; the first two matches are skipped.
    school_links = soup.select('a[href*="/School"]')[2:]
    schoolURL_dict = {}
    for link in school_links:
        name = str(list(link.children)[0])
        # Keep only the schools that were requested.
        if name in schools:
            schoolURL_dict[name] = "http://mylocalschool.wales.gov.uk" + link['href']
    return schoolURL_dict

In [4]:
schoolURL_dict = make_school_URL_dict_from_list(local_schools)
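
Looking up a requested name in the scraped dictionary raises a `KeyError` if the name does not match the site's spelling exactly. A minimal sketch of a lookup that reports mismatches instead, assuming a plain name-to-URL dictionary. The helper `split_found_missing` and the example URLs are hypothetical and not part of the original notebook:

```python
def split_found_missing(url_by_name, wanted):
    """Return (found, missing): URLs for the wanted names, and names with no match."""
    found = {name: url_by_name[name] for name in wanted if name in url_by_name}
    missing = [name for name in wanted if name not in url_by_name]
    return found, missing

# Hypothetical scraped mapping; the URLs are placeholders, not real school pages.
scraped = {
    'Severn Primary': 'http://example.invalid/School/1',
    'Radnor Primary School': 'http://example.invalid/School/2',
}

# 'Radnor Primary' is deliberately misspelled to show how a mismatch is reported.
found, missing = split_found_missing(scraped, ['Severn Primary', 'Radnor Primary'])
print(found)
print(missing)
```

Checking `missing` before scraping makes spelling differences (such as 'St.' versus 'St') visible early.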

This next function takes the URL of a school's summary page and makes a dictionary listing the stats for that school.

In [5]:
def tofloat(string):
    # Convert to a float where possible; otherwise keep the string (e.g. 'Yellow').
    try:
        output = float(string)
    except ValueError:
        output = string
    return output

def make_stat_dict(schoolURL):
    school_page = requests.get(schoolURL)
    school_soup = BeautifulSoup(school_page.content, 'html.parser')
    summary = school_soup.find_all('div', id=False, class_="summaryBox")
    stat_dict = {}
    for box in summary:
        children = list(box.children)
        raw_value = children[1].getText()
        # Keep only letters, digits and '.' so that numeric values can be converted to floats.
        stat = tofloat(''.join(char for char in raw_value if char.isalnum() or char == '.'))
        # Keep only letters, digits and spaces in the name of the statistic.
        stat_name = ''.join(char for char in children[3].getText() if char.isalnum() or char == ' ').strip()
        # Mark percentages in the name, since the '%' sign has been stripped from the value.
        if '%' in raw_value:
            stat_name += "(%)"
        stat_dict[stat_name] = stat
    return stat_dict
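
The cleaning step can be tried without any network access. The sketch below applies the same character filters to a few raw strings. The helper `clean_stat` and the raw inputs are assumptions made for illustration; the real page text may be formatted differently.

```python
def tofloat(string):
    # Convert to a float where possible; otherwise return the string unchanged.
    try:
        return float(string)
    except ValueError:
        return string

def clean_stat(raw_value, raw_name):
    """Hypothetical helper mirroring the filters used in make_stat_dict."""
    # Keep letters, digits and '.' in the value.
    value = ''.join(ch for ch in raw_value if ch.isalnum() or ch == '.')
    # Keep letters, digits and spaces in the name.
    name = ''.join(ch for ch in raw_name if ch.isalnum() or ch == ' ').strip()
    if '%' in raw_value:
        name += "(%)"
    return name, tofloat(value)

# Assumed raw strings, chosen to resemble what a summary box might contain.
print(clean_stat("\n 94.2% \n", "Attendance during the year (Primary only)"))
print(clean_stat("£3,623", "School budget per pupil"))
print(clean_stat("Yellow", "Support Category"))
print(clean_stat("27.2%", "Free school meals (FSM) - 3 year average (Primary only)"))
```

Dropping a character that sits between two spaces, such as the dash in the last label, leaves a double space in the cleaned name, which would explain column names such as 'Free school meals FSM  3 year average Primary only(%)'.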

This dictionary links each school to the summary dictionary for that school.

In [6]:
local_school_dict = {}
for school in local_schools:
    local_school_dict[school] = make_stat_dict(schoolURL_dict[school])

Now we use pandas to turn this dictionary into a data frame.

In [7]:
import pandas as pd

In [8]:
local_school_df = pd.DataFrame.from_dict(local_school_dict, orient='index')

In [9]:
local_school_df

Unnamed: 0,Number of Pupils 2017,Free school meals FSM 3 year average Primary only(%),Pupil Teacher Ratio PTR Primary only,Attendance during the year Primary only(%),School budget per pupil,Pupils who have reached the expected level Core subject indicator Key Stage 2(%),Support Category,Pupils achieving the expected outcome in the Foundation Phase areas of learning(%),Average number of minutes per week allocated for curricular PE Primary only,of pupils in the school who enjoy PE lessons a lot(%),of pupils in the school who are hooked on sport(%)
Grangetown Primary School,404.0,27.2,19.6,94.2,3623.0,80.0,Yellow,76.3,,,
Kitchener Primary School,486.0,29.9,20.5,94.5,3576.0,82.1,Green,80.8,105.0,74.0,26.0
Lansdowne Primary School,486.0,29.6,21.9,93.1,3442.0,85.1,Green,92.0,90.0,,
Ninian Park Primary School,557.0,23.7,24.3,94.1,3569.0,92.5,Green,84.0,,61.0,32.0
Radnor Primary School,323.0,19.1,18.9,95.6,3862.0,100.0,Green,92.5,120.0,,
Severn Primary,482.0,24.0,20.9,93.9,3947.0,87.5,Green,85.5,90.0,,
St Patrick's R C School,298.0,23.2,24.4,94.0,3548.0,91.7,Yellow,95.3,,,
St. Mary's R.C. Primary School,261.0,11.1,21.5,95.7,3388.0,92.3,Yellow,100.0,,,
St.Paul's C/W Primary School,209.0,26.5,23.1,94.7,3774.0,92.9,Yellow,89.7,120.0,74.0,50.0
Ysgol Gymraeg Pwll Coch,506.0,9.5,22.6,95.8,3213.0,98.3,Yellow,95.2,,,


Here we show a summary of the data.

In [10]:
local_school_df.describe()

Unnamed: 0,Number of Pupils 2017,Free school meals FSM 3 year average Primary only(%),Pupil Teacher Ratio PTR Primary only,Attendance during the year Primary only(%),School budget per pupil,Pupils who have reached the expected level Core subject indicator Key Stage 2(%),Pupils achieving the expected outcome in the Foundation Phase areas of learning(%),Average number of minutes per week allocated for curricular PE Primary only,of pupils in the school who enjoy PE lessons a lot(%),of pupils in the school who are hooked on sport(%)
count,12.0,11.0,12.0,11.0,12.0,11.0,11.0,5.0,3.0,3.0
mean,387.666667,20.754545,21.6,94.781818,4897.166667,90.354545,89.2,105.0,69.666667,36.0
std,171.661577,8.651285,2.355651,1.121444,4583.32868,6.20699,6.978682,15.0,7.505553,12.489996
min,17.0,4.5,17.0,93.1,3213.0,80.0,76.3,90.0,61.0,26.0
25%,288.75,15.1,20.275,94.05,3428.5,86.3,84.75,90.0,67.5,29.0
50%,443.0,23.7,21.7,94.5,3572.5,91.7,89.9,105.0,74.0,32.0
75%,491.0,26.85,23.4,95.65,3796.0,92.7,93.85,120.0,74.0,41.0
max,623.0,29.9,24.5,97.0,19436.0,100.0,100.0,120.0,74.0,50.0


In [11]:
local_school_df.keys()

Index(['Number of Pupils 2017',
       'Free school meals FSM  3 year average Primary only(%)',
       'Pupil Teacher Ratio PTR Primary only',
       'Attendance during the year Primary only(%)', 'School budget per pupil',
       'Pupils who have reached the expected level  Core subject indicator Key Stage 2(%)',
       'Support Category',
       'Pupils achieving the expected outcome in the Foundation Phase areas of learning(%)',
       'Average number of minutes per week allocated for curricular PE Primary only',
       'of pupils in the school who enjoy PE lessons a lot(%)',
       'of pupils in the school who are hooked on sport(%)'],
      dtype='object')

The total number of primary school students in the area is:

In [13]:
local_num_students = sum(local_school_df['Number of Pupils 2017'])
print(local_num_students)

4652.0


Here we use the number of primary school students and the percentage on free school meals to estimate the total number of children in the area who are on free school meals.

In [14]:
local_fsm_students = sum((pd.to_numeric(local_school_df['Free school meals FSM  3 year average Primary only(%)'])/100 \
                                                  * local_school_df['Number of Pupils 2017']).dropna())
print(local_fsm_students)

938.0369999999999


The percentage of primary school students in the area on free school meals is:

In [15]:
local_fsm_students/local_num_students * 100

20.164165950128975
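
The calculation above drops schools with a missing FSM figure from the numerator but still counts their pupils in the denominator, which tends to understate the percentage slightly. One alternative, sketched here in plain Python as an assumption rather than as the notebook's method, is to use only schools for which both figures are available. The function `weighted_percentage` is hypothetical.

```python
def weighted_percentage(rows):
    """rows: iterable of (pupils, fsm_percent); fsm_percent may be None if unpublished."""
    # Use only schools where both figures are present.
    complete = [(n, p) for n, p in rows if n is not None and p is not None]
    fsm_pupils = sum(n * p / 100 for n, p in complete)
    total_pupils = sum(n for n, _ in complete)
    return 100 * fsm_pupils / total_pupils

# Three rows from the table above: Grangetown, Kitchener and Radnor.
rows = [(404, 27.2), (486, 29.9), (323, 19.1)]
pct = weighted_percentage(rows)

# Hypothetical extra school with no published FSM figure: it is skipped entirely,
# so the result does not change.
pct_with_gap = weighted_percentage(rows + [(200, None)])
print(round(pct, 1), round(pct_with_gap, 1))
```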

We will now compare the local school statistics to those for the whole of Cardiff. This next function makes a dictionary of every school in a given district (in this case we are interested in Cardiff), together with their URLs.

In [16]:
def make_district_school_URLdict(district):
    toppage = requests.get("http://mylocalschool.wales.gov.uk/Schools/SchoolSearch?lang=en")
    soup = BeautifulSoup(toppage.content, 'html.parser')
    # Find the <option> whose label is the district name and take that option's value as the URL reference.
    url_ref = [option['value'] for option in soup.find_all("option")
               if list(option)[0].strip() == district][0]
    # Quote the attribute value so that the selector is valid CSS.
    school_url_string = 'a[href*="/School/' + url_ref + '"]'
    school_links = soup.select(school_url_string)
    schoolURL_dict = {}
    for link in school_links:
        schoolURL_dict[list(link.children)[0]] = "http://mylocalschool.wales.gov.uk" + link['href']
    return schoolURL_dict


In [17]:
district_school_URLdict = make_district_school_URLdict('Cardiff')

This next function searches for schools of a certain type (e.g. primary) in a given district and returns a dictionary of those schools with their URL.

In [18]:
def make_district_school_URLdict_by_type(district, schools=()):
    school_URLs = make_district_school_URLdict(district)
    schools_URLdict = {}
    for school, url in school_URLs.items():
        school_page = requests.get(url)
        school_soup = BeautifulSoup(school_page.content, 'html.parser')
        school_type = list(school_soup.select("div[class=schDetailsText]")[1].children)[0].strip()
        # Check each keyword separately; ('Infants' or 'Juniors') evaluates to just 'Infants'.
        is_primary = 'Infants' in school_type or 'Juniors' in school_type
        if 'Primary' in schools and is_primary:
            schools_URLdict[school] = url
        if 'Secondary' in schools and 'Secondary' in school_type:
            schools_URLdict[school] = url
    return schools_URLdict

In [19]:
primary_school_URLs = make_district_school_URLdict_by_type('Cardiff',schools=['Primary'])
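
Matching a school type against several keywords is easy to get subtly wrong, because `('Infants' or 'Juniors')` evaluates to `'Infants'` before the `in` test runs. A minimal sketch of an explicit check follows. The helper `matches_any` and the example type labels are hypothetical; the wording used on the real site may differ.

```python
def matches_any(school_type, keywords):
    """True if any of the keywords occurs in the school type string."""
    return any(keyword in school_type for keyword in keywords)

primary_keywords = ('Infants', 'Juniors')

# Hypothetical school type labels.
print(matches_any('Infants & Juniors', primary_keywords))
print(matches_any('Juniors', primary_keywords))
print(matches_any('Secondary', primary_keywords))

# By contrast, this expression only ever tests for 'Infants':
print(('Infants' or 'Juniors') in 'Juniors')
```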

We then make a dictionary linking each primary school in Cardiff to its summary information.

In [20]:
cardiff_primary_school_dict = {}
for school in primary_school_URLs.keys():
    cardiff_primary_school_dict[school] = make_stat_dict(primary_school_URLs[school])

We then turn this dictionary into a data frame.

In [21]:
cardiff_primary_school_df = pd.DataFrame.from_dict(cardiff_primary_school_dict, orient='index')

In [22]:
cardiff_primary_school_df

Unnamed: 0,Number of Pupils 2017,Free school meals FSM 3 year average Primary only(%),Pupil Teacher Ratio PTR Primary only,Attendance during the year Primary only(%),School budget per pupil,Pupils who have reached the expected level Core subject indicator Key Stage 2(%),Support Category,Pupils achieving the expected outcome in the Foundation Phase areas of learning(%),of pupils in the school who enjoy PE lessons a lot(%),of pupils in the school who are hooked on sport(%),Average number of minutes per week allocated for curricular PE Primary only
Adamsdown Primary,371.0,44.6,19.2,93.3,3757.0,78.3,Yellow,69.2,75.0,56.0,60.0
Albany Primary School,457.0,24.9,24.1,94.3,3422.0,77.8,Yellow,80.0,,,120.0
All Saints C/W Primary,194.0,19.2,22.9,94.6,3565.0,100.0,Yellow,93.3,,,
Allensbank Primary School,283.0,24.6,19.1,92.6,4541.0,76.2,Amber,86.2,,,
Baden Powell Primary School,417.0,35.2,20.8,93.8,3757.0,87.5,Amber,84.6,,,
Birchgrove Primary School,413.0,8.5,25.9,96.2,3192.0,98.2,Green,94.8,74.0,51.0,100.0
Bishop Childs C/W Primary,212.0,12.2,25.9,95.4,3490.0,96.8,Green,93.5,,,
Bryn Celyn Primary School,194.0,58.2,20.8,95.0,4357.0,85.7,Yellow,87.5,,,
Bryn Deri Primary,251.0,4.3,20.4,96.5,3935.0,100.0,Green,96.7,,,
Bryn Hafod Primary School,368.0,37.5,21.4,94.8,4270.0,95.5,Green,84.4,,,60.0


We can now look at the summary information for this data.

In [23]:
cardiff_primary_school_df.describe()

Unnamed: 0,Number of Pupils 2017,Free school meals FSM 3 year average Primary only(%),Pupil Teacher Ratio PTR Primary only,Attendance during the year Primary only(%),School budget per pupil,Pupils who have reached the expected level Core subject indicator Key Stage 2(%),Pupils achieving the expected outcome in the Foundation Phase areas of learning(%),of pupils in the school who enjoy PE lessons a lot(%),of pupils in the school who are hooked on sport(%),Average number of minutes per week allocated for curricular PE Primary only
count,98.0,97.0,98.0,97.0,98.0,95.0,95.0,26.0,26.0,51.0
mean,341.520408,23.885567,21.522449,94.945361,4070.428571,89.951579,88.174737,75.038462,47.384615,96.137255
std,144.794716,15.547923,2.94151,1.175622,1673.617133,7.333439,9.180587,8.224261,12.592305,30.408564
min,17.0,1.9,14.2,92.1,3095.0,69.2,44.4,59.0,26.0,2.0
25%,229.25,9.9,19.925,94.1,3491.75,85.4,83.65,71.0,36.25,90.0
50%,326.0,23.7,21.4,94.9,3812.0,90.5,90.5,74.0,49.5,105.0
75%,457.0,34.3,23.325,95.8,4200.75,96.4,94.85,81.0,56.0,120.0
max,708.0,60.3,28.6,97.1,19436.0,100.0,100.0,93.0,68.0,120.0


Calculate the number of primary school students in Cardiff.

In [25]:
cardiff_num_students = sum(cardiff_primary_school_df['Number of Pupils 2017'])
print(cardiff_num_students)

33469.0


Calculate the number of primary school students in Cardiff on FSM.

In [27]:
cardiff_fsm_students = sum((pd.to_numeric(cardiff_primary_school_df['Free school meals FSM  3 year average Primary only(%)'])/100 \
                                                  * cardiff_primary_school_df['Number of Pupils 2017']).dropna())
print(cardiff_fsm_students)

7513.486999999998


Calculate the percentage of primary school students in Cardiff who are on FSM.

In [28]:
cardiff_fsm_students/cardiff_num_students *100

22.44909319071379

From this we can see that the local schools actually have a lower proportion of primary school children on FSM (about 20.2%) than Cardiff as a whole (about 22.4%).
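
The size of the gap can be expressed in percentage points. This short sketch recomputes both percentages from the totals printed above, rounded to three decimal places; the variable names are chosen here for illustration.

```python
# Totals taken from the outputs above (FSM estimates rounded to three decimals).
local_fsm, local_total = 938.037, 4652
cardiff_fsm, cardiff_total = 7513.487, 33469

local_pct = 100 * local_fsm / local_total
cardiff_pct = 100 * cardiff_fsm / cardiff_total

# Difference between the two rates, in percentage points.
gap = cardiff_pct - local_pct
print(f"local {local_pct:.1f}%, Cardiff {cardiff_pct:.1f}%, gap {gap:.1f} percentage points")
```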