# Scraping an undocumented API: a BBC election results map

The BBC's election results page at https://www.bbc.co.uk/news/election/2023/england/results presents election results on a map. This notebook explains how to fetch that data and export it as a CSV.

It also explains how to navigate JSON files, how to flatten branches at different levels, which common errors you are likely to come across, and how to tackle them.


## Using the Inspector

As is often the case with maps, the data is fetched from an external file, and you can find that data file by [looking in the Inspector's Network tab](https://developer.chrome.com/docs/devtools/network).

Open the Inspector by making sure you are in Chrome or Firefox, right-clicking anywhere on the page and selecting **Inspect**. The first tab in the Inspector is 'Elements' (called 'Inspector' in Firefox) - switch to the 'Network' tab instead.

The Network tab shows all the files that a webpage loads - but it has to be open to be 'recording', so once you've opened the Inspector and gone to the Network tab, make sure you refresh the page so that it can record all the files.

By default the Network tab shows 'All' files, but you can also click on **XHR** (or **Fetch/XHR** in Chrome) to only show likely data requests. Sort by size to bring the largest to the top.

In that view there are a few files that might contain useful data: `councils` and `results` and `council_control_history` are all likely.

You can right-click on these and open them in a new tab to see what they contain. They are all JSON files so use Firefox to automatically 'prettify' them (make them easier to read) or install the Chrome extension [JSONView](https://jsonview.com/) to prettify JSON in that browser.

We are going to import the `councils` file, which the Inspector tells us is at https://static.files.bbci.co.uk/elections/archive/news/election/2023/england/councils

## Importing the `councils` JSON

First, we need to store the URL of the JSON we want to import.

In [4]:
#store the URL of the JSON file
jsonurl = "https://static.files.bbci.co.uk/elections/archive/news/election/2023/england/councils"

## Import libraries

We are going to try to import this using the `read_json()` function from the `pandas` library, so we need to import the `pandas` library first.

In [21]:
#import the pandas library so we can use its read_json() function
import pandas as pd

## Try to read the JSON into a `pandas` data frame

Now we try to read the JSON file at the URL we've stored in the variable `jsonurl` - but we get an error.

In [None]:
#read the JSON from the URL in 'jsonurl'
pd.read_json(jsonurl)

ValueError: All arrays must be of the same length

### Why do we get the error 'All arrays must be of the same length'?

The error says this:

`ValueError: All arrays must be of the same length`

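This is because the JSON file doesn't have a simple enough structure for `read_json()` to convert it directly to a data frame: its top-level branches hold a mix of single values, dictionaries and lists, which cannot be lined up as equal-length columns.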
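A minimal sketch of the likely cause, using a small made-up dictionary rather than the BBC file: when top-level keys hold lists of different lengths, `pandas` has no way to turn them into columns of equal length. The `sample` dictionary and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two top-level keys holding lists of different lengths
sample = {
    "key": ["A", "B", "C"],
    "groups": ["letter-a", "letter-b"],
}

try:
    pd.DataFrame(sample)
    error_message = None
except ValueError as err:
    # pandas cannot build equal-length columns from unequal lists
    error_message = str(err)

print(error_message)
```

The exact wording of the message may vary between `pandas` versions, but the cause is the same.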

Instead we are going to need another approach: the `requests` library.


## Using `requests` to fetch JSON

The `requests` library's `get()` function is only concerned with fetching the file, not converting it to a dataframe.

First, we need to import the library.

In [2]:
#import the requests library
import requests

When we use the `get()` function without storing the results we get a 'response' with a number: this is a status code that indicates whether the request was successful or not (you can [find a list here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status)). The number 200 indicates success.

In [5]:
#use the get() function to fetch the file at the URL we stored
requests.get(jsonurl)

<Response [200]>
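If you want to check what a status code means without leaving the notebook, Python's built-in `http` module includes the standard names. This is only a sketch: the helper `is_success()` is a hypothetical name introduced here, and it treats any code from 200 to 299 as a success.

```python
from http import HTTPStatus

# Look up the standard meaning of a few common status codes
for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)


def is_success(code):
    """Return True for any 2xx status code (hypothetical helper)."""
    return 200 <= code < 300
```

With `requests`, the number itself is available in a response's `status_code` attribute, so a check like `is_success(some_response.status_code)` would be one way to confirm the fetch worked before going further.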

### Working with the requests 'object'

We need to store the results of the `get()` function in a variable. What `get()` returns is a 'response object' (which is why it prints as `<Response [200]>`), and it has special properties and methods that we can access.

We call that variable 'response':

In [5]:
#use the get() function to fetch the file at the URL we stored
#store in a variable called 'response'
response = requests.get(jsonurl)
#show the variable - it'll show <Response [200]> again
print(response)

<Response [200]>


Requests [has a specific method for working with JSON files](https://www.geeksforgeeks.org/response-json-python-requests/): `.json()`

By adding `.json()` to the end of the variable the contents will be parsed as JSON.

In [7]:
#use the .json() method to extract the contents as JSON
#store in a variable called jsonresponse
jsonresponse = response.json()
#show that variable
print(jsonresponse)

{'metadata': {'title': 'Councils in local elections 2023 in England - A-Z', 'areaId': '', 'pageType': 'a_to_z', 'languageCode': 'en', 'description': 'Get the latest news and election results in the 2023 election from BBC News', 'area': {'name': '', 'kind': '', 'parentId': ''}, 'uri': '/news/election/2023/england/councils', 'parent': {'name': '', 'uri': ''}}, 'language': 'en', 'heading': 'England Councils A-Z', 'campaignMode': False, 'bannerUri': '/news/election/2023/england/banner', 'logoUrl': 'https://static.files.bbci.co.uk/elections/images/england2023locals/election-logo-en.svg', 'links': None, 'key': [{'id': 'letter-a', 'name': 'A', 'enabled': True}, {'id': 'letter-b', 'name': 'B', 'enabled': True}, {'id': 'letter-c', 'name': 'C', 'enabled': True}, {'id': 'letter-d', 'name': 'D', 'enabled': True}, {'id': 'letter-e', 'name': 'E', 'enabled': True}, {'id': 'letter-f', 'name': 'F', 'enabled': True}, {'id': 'letter-g', 'name': 'G', 'enabled': True}, {'id': 'letter-h', 'name': 'H', 'enab

More specifically, it will be turned into a [**dictionary** variable](https://www.w3schools.com/python/python_dictionaries.asp), which works in the same way as JSON.

In [9]:
#use the type() function to check what type of variable this is
type(jsonresponse)

dict
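As a rough illustration of what `.json()` does, the sketch below parses a short piece of JSON text with Python's built-in `json` module. The string `sample_text` is a made-up fragment loosely modelled on the top of the councils file, not the real data.

```python
import json

# Made-up JSON text, loosely modelled on the top of the councils file
sample_text = '{"heading": "England Councils A-Z", "campaignMode": false, "links": null, "groups": []}'

# json.loads() turns JSON text into Python objects
parsed = json.loads(sample_text)

print(type(parsed))            # the outer braces become a dictionary
print(parsed["campaignMode"])  # JSON false becomes Python False
print(parsed["links"])         # JSON null becomes Python None
```

This is why the printed dictionary above shows `False` and `None` rather than the `false` and `null` you see in the browser.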

## Navigating the JSON

Now that we have our JSON (or dictionary), we need to drill down to the part of the JSON that we can store as a data frame.

A good place to start is by listing the **keys** (branches) at the top of the JSON. We can do this by adding `.keys()` to the end of any dictionary variable.

In [8]:
jsonresponse.keys()

dict_keys(['metadata', 'language', 'heading', 'campaignMode', 'bannerUri', 'logoUrl', 'links', 'key', 'groups', 'labels'])

It's useful to have [the URL that the data came from](https://static.files.bbci.co.uk/elections/archive/news/election/2023/england/councils) open in a browser (Firefox or Chrome with JSONView) so you can see the structure and where you want to go in it.

In this case we want to go into the `groups` branch.

To do that, we name the variable, and then the branch in square brackets like so:

In [11]:
#show the groups branch
print(jsonresponse['groups'])

[{'id': 'letter-a', 'heading': 'A', 'cards': [{'title': 'Amber Valley', 'href': '/news/election/2023/england/councils/E07000032', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#E91D0E', 'prevColour': '#0575C9', 'flash': 'LABOUR GAIN FROM CONSERVATIVE', 'longFlash': '2023 Labour gain from Conservative', 'textColour': '#FFFFFF', 'partyName': 'Labour', 'flashBold': 'LABOUR GAIN', 'flashRegular': ' FROM CONSERVATIVE', 'winnerPartyCode': 'LAB', 'prevWinnerPartyCode': 'CON'}, 'context': None}, {'title': 'Arun', 'href': '/news/election/2023/england/councils/E07000224', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#646464', 'prevColour': '#646464', 'flash': 'NO PARTY MAJORITY', 'longFlash': '2023 no overall control, no change from previous year', 'textColour': '#FFFFFF', 'partyName': 'no overall control', 'flashBold': 'NO PARTY MAJORITY', 'flashRegular': 'NO CHANGE', 'winnerPartyCode': 'NOC', 'prevWinn

At this point, if we try to show the keys again, we get an error, because this branch is not a dictionary with keys: it is a list.

In [14]:
#try to show the keys of the 'groups' branch
print(jsonresponse['groups'].keys())

AttributeError: 'list' object has no attribute 'keys'

Instead, then, we can use an index like `[0]` to access the first item in that list.

In [15]:
#show the first item in the groups branch
print(jsonresponse['groups'][0])

{'id': 'letter-a', 'heading': 'A', 'cards': [{'title': 'Amber Valley', 'href': '/news/election/2023/england/councils/E07000032', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#E91D0E', 'prevColour': '#0575C9', 'flash': 'LABOUR GAIN FROM CONSERVATIVE', 'longFlash': '2023 Labour gain from Conservative', 'textColour': '#FFFFFF', 'partyName': 'Labour', 'flashBold': 'LABOUR GAIN', 'flashRegular': ' FROM CONSERVATIVE', 'winnerPartyCode': 'LAB', 'prevWinnerPartyCode': 'CON'}, 'context': None}, {'title': 'Arun', 'href': '/news/election/2023/england/councils/E07000224', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#646464', 'prevColour': '#646464', 'flash': 'NO PARTY MAJORITY', 'longFlash': '2023 no overall control, no change from previous year', 'textColour': '#FFFFFF', 'partyName': 'no overall control', 'flashBold': 'NO PARTY MAJORITY', 'flashRegular': 'NO CHANGE', 'winnerPartyCode': 'NOC', 'prevWinne

And we can then try to access the keys of that item:

In [17]:
#show the keys of the first item in the groups branch
print(jsonresponse['groups'][0].keys())

dict_keys(['id', 'heading', 'cards', 'backToTop'])


This trial-and-error process of trying to access keys and/or a list item is a good way of navigating a JSON object until you get to the branch you want. Here, that is the `cards` branch:

In [19]:
#show the 'cards' branch of the first item in the groups branch
print(jsonresponse['groups'][0]['cards'])

[{'title': 'Amber Valley', 'href': '/news/election/2023/england/councils/E07000032', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#E91D0E', 'prevColour': '#0575C9', 'flash': 'LABOUR GAIN FROM CONSERVATIVE', 'longFlash': '2023 Labour gain from Conservative', 'textColour': '#FFFFFF', 'partyName': 'Labour', 'flashBold': 'LABOUR GAIN', 'flashRegular': ' FROM CONSERVATIVE', 'winnerPartyCode': 'LAB', 'prevWinnerPartyCode': 'CON'}, 'context': None}, {'title': 'Arun', 'href': '/news/election/2023/england/councils/E07000224', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#646464', 'prevColour': '#646464', 'flash': 'NO PARTY MAJORITY', 'longFlash': '2023 no overall control, no change from previous year', 'textColour': '#FFFFFF', 'partyName': 'no overall control', 'flashBold': 'NO PARTY MAJORITY', 'flashRegular': 'NO CHANGE', 'winnerPartyCode': 'NOC', 'prevWinnerPartyCode': 'NOC'}, 'context': None}, {'tit
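The same trial-and-error can be partly automated. The sketch below is one possible helper, not part of the original notebook: the function name `outline()` and the `sample` data are hypothetical. It walks a parsed JSON object and reports whether each branch is a dictionary (listing its keys) or a list (giving its length), looking only inside the first item of each list on the assumption that the other items have the same shape.

```python
def outline(node, name="root", depth=0, max_depth=3):
    """Return lines describing the dictionaries and lists inside a parsed JSON object."""
    indent = "  " * depth
    lines = []
    if isinstance(node, dict):
        lines.append(f"{indent}{name}: dict with keys {list(node.keys())}")
        if depth < max_depth:
            for key, value in node.items():
                # Only descend into branches that can contain more branches
                if isinstance(value, (dict, list)):
                    lines.extend(outline(value, key, depth + 1, max_depth))
    elif isinstance(node, list):
        lines.append(f"{indent}{name}: list of length {len(node)}")
        # Only look inside the first item, assuming the rest share its shape
        if node and depth < max_depth:
            lines.extend(outline(node[0], f"{name}[0]", depth + 1, max_depth))
    else:
        lines.append(f"{indent}{name}: {type(node).__name__}")
    return lines


# Hypothetical sample mirroring the shape of the councils file
sample = {
    "heading": "England Councils A-Z",
    "groups": [
        {"id": "letter-a", "heading": "A",
         "cards": [{"title": "Amber Valley", "winnerFlash": {"partyName": "Labour"}}]},
    ],
}

print("\n".join(outline(sample)))
```

Calling `outline(jsonresponse)` on the real data should give a similar overview, with `max_depth` controlling how far down it goes.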

## Converting to a data frame

Once we have found the branch we want, we can convert it to a `pandas` data frame.

The most obvious function for creating a data frame from dictionaries is `pd.DataFrame.from_dict()`.

We can try that with the branch we've arrived at:

In [27]:
#convert the 'cards' branch of jsonresponse into a dataframe
pd.DataFrame.from_dict(jsonresponse['groups'][0]['cards'])

Unnamed: 0,title,href,winnerFlash,context
0,Amber Valley,/news/election/2023/england/councils/E07000032,"{'displayMode': 'party', 'year': 2023, 'prevYe...",
1,Arun,/news/election/2023/england/councils/E07000224,"{'displayMode': 'party', 'year': 2023, 'prevYe...",
2,Ashfield,/news/election/2023/england/councils/E07000170,"{'displayMode': 'party', 'year': 2023, 'prevYe...",
3,Ashford,/news/election/2023/england/councils/E07000105,"{'displayMode': 'party', 'year': 2023, 'prevYe...",


## Flattening the JSON

The problem is that there are further 'nested' branches, so that the `winnerFlash` column in the data frame above, for example, contains further JSON.

To deal with this, it's better to use the `json_normalize()` function from `pandas`: this flattens sub-branches into separate columns.

When we use that, we can see we get a column called `winnerFlash.displayMode` (the `displayMode` branch of the `winnerFlash` branch) as well as `winnerFlash.year` (the `year` branch of the `winnerFlash` branch), and so on.

In [28]:
#create a data frame using json_normalize
pd.json_normalize(jsonresponse['groups'][0]['cards'])

Unnamed: 0,title,href,context,winnerFlash.displayMode,winnerFlash.year,winnerFlash.prevYear,winnerFlash.newColour,winnerFlash.prevColour,winnerFlash.flash,winnerFlash.longFlash,winnerFlash.textColour,winnerFlash.partyName,winnerFlash.flashBold,winnerFlash.flashRegular,winnerFlash.winnerPartyCode,winnerFlash.prevWinnerPartyCode
0,Amber Valley,/news/election/2023/england/councils/E07000032,,party,2023,previous,#E91D0E,#0575C9,LABOUR GAIN FROM CONSERVATIVE,2023 Labour gain from Conservative,#FFFFFF,Labour,LABOUR GAIN,FROM CONSERVATIVE,LAB,CON
1,Arun,/news/election/2023/england/councils/E07000224,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC
2,Ashfield,/news/election/2023/england/councils/E07000170,,party,2023,previous,#BABABA,#BABABA,ASHFIELD INDEPENDENTS HOLD,"2023 Ashfield Independents hold, from previous...",#3F3F42,Ashfield Independents,ASHFIELD INDEPENDENTS HOLD,,ASH,ASH
3,Ashford,/news/election/2023/england/councils/E07000105,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC
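To see the flattening on a smaller scale, here is a sketch using two made-up records shaped like items in the `cards` branch (the variable names are hypothetical). It also shows the `sep` parameter of `json_normalize()`, which changes the character used to join a nested key to its parent; an underscore can be easier to work with than a dot in later code.

```python
import pandas as pd

# Hypothetical records shaped like items in the 'cards' branch
cards = [
    {"title": "Amber Valley", "winnerFlash": {"partyName": "Labour", "year": 2023}},
    {"title": "Arun", "winnerFlash": {"partyName": "no overall control", "year": 2023}},
]

# Default: nested keys are joined to their parent with a dot
dotted = pd.json_normalize(cards)
print(list(dotted.columns))

# sep changes the joining character
underscored = pd.json_normalize(cards, sep="_")
print(list(underscored.columns))
```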


Once we're happy with that, we can store it in a data frame variable.

In [30]:
#create a data frame using json_normalize - save in cardsdf
cardsdf = pd.json_normalize(jsonresponse['groups'][0]['cards'])
#show variable
cardsdf

Unnamed: 0,title,href,context,winnerFlash.displayMode,winnerFlash.year,winnerFlash.prevYear,winnerFlash.newColour,winnerFlash.prevColour,winnerFlash.flash,winnerFlash.longFlash,winnerFlash.textColour,winnerFlash.partyName,winnerFlash.flashBold,winnerFlash.flashRegular,winnerFlash.winnerPartyCode,winnerFlash.prevWinnerPartyCode
0,Amber Valley,/news/election/2023/england/councils/E07000032,,party,2023,previous,#E91D0E,#0575C9,LABOUR GAIN FROM CONSERVATIVE,2023 Labour gain from Conservative,#FFFFFF,Labour,LABOUR GAIN,FROM CONSERVATIVE,LAB,CON
1,Arun,/news/election/2023/england/councils/E07000224,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC
2,Ashfield,/news/election/2023/england/councils/E07000170,,party,2023,previous,#BABABA,#BABABA,ASHFIELD INDEPENDENTS HOLD,"2023 Ashfield Independents hold, from previous...",#3F3F42,Ashfield Independents,ASHFIELD INDEPENDENTS HOLD,,ASH,ASH
3,Ashford,/news/election/2023/england/councils/E07000105,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC


## Export the results (one group)

We can now export the results as a CSV, before doing some further work.

Pandas has a `.to_csv()` method for converting a data frame to a CSV file, which appears in the Files area on the left of the Colab notebook.

We can download it from there manually, but here we import the `files` module from the `google.colab` library, which allows us to trigger an automatic download.

In [32]:
#import a library for downloading files
from google.colab import files

In [33]:
#export to CSV
cardsdf.to_csv('electioncards.csv')
#start download
files.download('electioncards.csv')

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>
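One detail worth knowing: by default `.to_csv()` also writes the data frame's index (the row numbers) as an unnamed first column. If you don't want that column in the export, `index=False` leaves it out. The sketch below uses a made-up two-row data frame, `sample_df`, standing in for `cardsdf`.

```python
import pandas as pd

# Hypothetical two-row data frame standing in for cardsdf
sample_df = pd.DataFrame({
    "title": ["Amber Valley", "Arun"],
    "winnerFlash.partyName": ["Labour", "no overall control"],
})

# With no file path, to_csv() returns the CSV text instead of writing a file
with_index = sample_df.to_csv()
without_index = sample_df.to_csv(index=False)

print(with_index.splitlines()[0])     # header begins with an empty index column
print(without_index.splitlines()[0])  # header begins with 'title'
```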

## Getting multiple branches

Of course, so far we've only got the 'A' branch of the JSON data (the first item in `groups`). To get the other branches we need to use a `for` loop.

This loop goes through each branch (from A to Z - or at least Y, as there's no Z branch), flattens the JSON at that branch into a data frame, and adds it to a list.

By the end of the loop we have a list full of data frames, which we can convert into one big data frame.

In [41]:
#create an empty list to store the results of our loop
listofjson = []

#loop through the items in the 'groups' branch
for i in jsonresponse['groups']:
  #print the 'cards' sub-branch
  print(i['cards'])
  #flatten into a data frame, and store
  flattened_json = pd.json_normalize(i['cards'])
  #add that data frame to the list
  listofjson.append(flattened_json)

[{'title': 'Amber Valley', 'href': '/news/election/2023/england/councils/E07000032', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#E91D0E', 'prevColour': '#0575C9', 'flash': 'LABOUR GAIN FROM CONSERVATIVE', 'longFlash': '2023 Labour gain from Conservative', 'textColour': '#FFFFFF', 'partyName': 'Labour', 'flashBold': 'LABOUR GAIN', 'flashRegular': ' FROM CONSERVATIVE', 'winnerPartyCode': 'LAB', 'prevWinnerPartyCode': 'CON'}, 'context': None}, {'title': 'Arun', 'href': '/news/election/2023/england/councils/E07000224', 'winnerFlash': {'displayMode': 'party', 'year': 2023, 'prevYear': 'previous', 'newColour': '#646464', 'prevColour': '#646464', 'flash': 'NO PARTY MAJORITY', 'longFlash': '2023 no overall control, no change from previous year', 'textColour': '#FFFFFF', 'partyName': 'no overall control', 'flashBold': 'NO PARTY MAJORITY', 'flashRegular': 'NO CHANGE', 'winnerPartyCode': 'NOC', 'prevWinnerPartyCode': 'NOC'}, 'context': None}, {'tit

In [44]:
#how many items in that list now?
len(listofjson)

22
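As an alternative to the loop, `json_normalize()` can unpack a nested list directly through its `record_path` parameter, and `meta` can carry a field from the parent level into each row. The sketch below assumes data shaped like the `groups` branch; the `groups` list here is made up, and keeping the `heading` letter is just one possible choice.

```python
import pandas as pd

# Hypothetical data shaped like the 'groups' branch
groups = [
    {"heading": "A", "cards": [
        {"title": "Amber Valley", "winnerFlash": {"partyName": "Labour"}},
        {"title": "Arun", "winnerFlash": {"partyName": "no overall control"}},
    ]},
    {"heading": "B", "cards": [
        {"title": "Babergh", "winnerFlash": {"partyName": "no overall control"}},
    ]},
]

# record_path names the list to unpack; meta keeps a field from the parent group
all_cards = pd.json_normalize(groups, record_path="cards", meta=["heading"])
print(all_cards)
```

The loop has the advantage of making each step visible, which helps when something goes wrong; this version is shorter once you know the structure.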

### Join multiple data frames into one

We can join a list of multiple data frames by using pandas's `concat()` function. It takes a list of data frames - which we already have.

In [47]:
#join a list of data frames and store in a variable called 'allthegroups'
allthegroups = pd.concat(listofjson)
#show the results
allthegroups

Unnamed: 0,title,href,context,winnerFlash.displayMode,winnerFlash.year,winnerFlash.prevYear,winnerFlash.newColour,winnerFlash.prevColour,winnerFlash.flash,winnerFlash.longFlash,winnerFlash.textColour,winnerFlash.partyName,winnerFlash.flashBold,winnerFlash.flashRegular,winnerFlash.winnerPartyCode,winnerFlash.prevWinnerPartyCode
0,Amber Valley,/news/election/2023/england/councils/E07000032,,party,2023,previous,#E91D0E,#0575C9,LABOUR GAIN FROM CONSERVATIVE,2023 Labour gain from Conservative,#FFFFFF,Labour,LABOUR GAIN,FROM CONSERVATIVE,LAB,CON
1,Arun,/news/election/2023/england/councils/E07000224,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC
2,Ashfield,/news/election/2023/england/councils/E07000170,,party,2023,previous,#BABABA,#BABABA,ASHFIELD INDEPENDENTS HOLD,"2023 Ashfield Independents hold, from previous...",#3F3F42,Ashfield Independents,ASHFIELD INDEPENDENTS HOLD,,ASH,ASH
3,Ashford,/news/election/2023/england/councils/E07000105,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC
0,Babergh,/news/election/2023/england/councils/E07000200,,party,2023,previous,#646464,#646464,NO PARTY MAJORITY,"2023 no overall control, no change from previo...",#FFFFFF,no overall control,NO PARTY MAJORITY,NO CHANGE,NOC,NOC
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21,Worthing,/news/election/2023/england/councils/E07000229,,party,2023,previous,#E91D0E,#E91D0E,LABOUR HOLD,"2023 Labour hold, from previous year",#FFFFFF,Labour,LABOUR HOLD,,LAB,LAB
22,Wychavon,/news/election/2023/england/councils/E07000238,,party,2023,previous,#0575C9,#0575C9,CONSERVATIVE HOLD,2023 Conservative hold,#FFFFFF,Conservative,CONSERVATIVE HOLD,,CON,CON
23,Wyre,/news/election/2023/england/councils/E07000128,,party,2023,previous,#0575C9,#0575C9,CONSERVATIVE HOLD,"2023 Conservative hold, from previous year",#FFFFFF,Conservative,CONSERVATIVE HOLD,,CON,CON
24,Wyre Forest,/news/election/2023/england/councils/E07000239,,party,2023,previous,#0575C9,#646464,CONSERVATIVE GAIN FROM NO PARTY MAJORITY,2023 Conservative gain from no overall control...,#FFFFFF,Conservative,CONSERVATIVE GAIN,FROM NO PARTY MAJORITY,CON,NOC
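Notice that the row numbers in the first column restart at 0 for each letter, because every data frame in the list kept its own index. If you would rather have one continuous numbering, `concat()` has an `ignore_index` parameter. A sketch with two small made-up data frames:

```python
import pandas as pd

# Two hypothetical data frames standing in for the per-letter results
group_a = pd.DataFrame({"title": ["Amber Valley", "Arun"]})
group_b = pd.DataFrame({"title": ["Babergh"]})

# Default: each frame keeps its own row numbers, so 0 appears twice
kept = pd.concat([group_a, group_b])
print(list(kept.index))

# ignore_index=True renumbers the rows from 0 upwards
renumbered = pd.concat([group_a, group_b], ignore_index=True)
print(list(renumbered.index))
```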


## Export the results (all groups)

Now we can export the resulting data frame that contains all groups' data.

In [48]:
#convert data frame to CSV
allthegroups.to_csv('allthegroups.csv')
#download the file
files.download('allthegroups.csv')

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>