# Pulling data from the programme ratings endpoint using pybarb

This demo shows how to pull data from the programme ratings endpoint and then manipulate it using the pybarb library.

We illustrate this with the following use case:
The BBC would like to see how its regular daily news slots have performed over the last couple of years. In particular, they would like to pick out any important trends and events in a time series of audience figures.

Note the full API documentation can be found [here](https://barb-api.co.uk/api-docs). 

It might also be useful to consult the [Getting Started](https://barb-api.co.uk/api-docs#section/Getting-started) section for information about authentication and basic API usage.


## Querying the API with pybarb

First we connect to the API using the `pybarb` package as described in "Connecting to the Barb API using Python". 

In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("/Users/simon_business/Documents/code_repos/barb_api")

import json
import pybarb as pb

# Set the working directory
working_directory = '/Users/simon_business/Documents/disposable/clients/BARB/'

# Get the access token
with open(working_directory + "creds.json") as file:
    creds = json.load(file)

# Create a BarbAPI object and connect
barb_api = pb.BarbAPI(creds)
barb_api.connect()
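
The hard-coded paths in the cell above only work on one machine. A minimal sketch of a more portable loader is given below. It assumes the credentials file is a JSON object with `email` and `password` keys, which is an assumption about the file layout rather than something this demo confirms, and `load_credentials` is a hypothetical helper that is not part of pybarb.

```python
import json
from pathlib import Path

# Assumed key names; check them against your own creds.json.
REQUIRED_KEYS = ("email", "password")


def load_credentials(path):
    """Read a JSON credentials file and check that the expected keys exist."""
    creds = json.loads(Path(path).expanduser().read_text())
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"Missing credential keys: {missing}")
    return creds
```

The returned dictionary could then be passed to `pb.BarbAPI(creds)` in the same way as in the cell above.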


## Get data from the API

To pull the right data from the API we need to know the correct station name and panel name. 

### Getting the station name

The `list_stations` method can be used to search for all valid station names that contain 'BBC'.

In [4]:
barb_api.list_stations("bbc")

['BBC1',
 'BBC1 Network',
 'BBC2',
 'BBC2 Network',
 'BBC Scotland',
 'BBC HD',
 'BBC Three',
 'CBBC',
 'BBC4',
 'BBC Parliament',
 'BBC Knowledge',
 'BBC Choice England',
 'BBC News',
 'BBC RB HD',
 'BBC RB 2',
 'BBC RB 3',
 'BBC Winter Olympics Red Button',
 'BBC RB 0',
 'BBC RB 4',
 'BBC RB 5',
 'BBC RB 603',
 'BBC RB 7',
 'BBC RB 8',
 'BBC RB 602',
 'BBC FREEVIEW 301 HD',
 'BBC Olympics 1',
 'BBC Olympics 2',
 'BBC Olympics 3',
 'BBC Olympics 4',
 'BBC Olympics 5',
 'BBC Olympics 6',
 'BBC Olympics 7',
 'BBC Olympics 8',
 'BBC Olympics 9',
 'BBC Olympics 10',
 'BBC Olympics 11',
 'BBC Olympics 12',
 'BBC Olympics 13',
 'BBC Olympics 14',
 'BBC Olympics 15',
 'BBC Olympics 16',
 'BBC Olympics 17',
 'BBC Olympics 18',
 'BBC Olympics 19',
 'BBC Olympics 20',
 'BBC Olympics 21',
 'BBC Olympics 22',
 'BBC Olympics 23',
 'BBC Olympics 24',
 'BBC RB 6',
 'BBC RB 6781',
 'BBC RB 6785',
 'BBC RB 6786',
 'BBC RB 6787',
 'BBC RB 6788',
 'BBC RB 6789',
 'BBC RB 6790',
 'BBC RB 601',
 'BBC RB 1
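
The lowercase search term `"bbc"` returns names such as 'BBC1' and 'CBBC', which suggests a case-insensitive substring match. The sketch below shows that kind of matching on a plain list. It is not pybarb's implementation, `search_names` is a hypothetical helper, and 'ITV1' is included only as a non-matching example.

```python
def search_names(names, term):
    """Return the names containing `term`, ignoring case."""
    needle = term.casefold()
    return [name for name in names if needle in name.casefold()]


# A few station names from the output above, plus one non-BBC name
stations = ["BBC1", "BBC2", "CBBC", "BBC News", "ITV1"]
matches = search_names(stations, "bbc")
```

Here `matches` keeps the four BBC names and drops 'ITV1'. Because the match is on a substring, a short term can return many stations, so the exact name still needs to be picked from the results.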

### Getting the panel name

Similarly, the `list_panels` method can be used to search for all valid panel names that contain 'BBC'.

In [None]:
barb_api.list_panels("bbc")

['BBC Network',
 'BBC East Region',
 'BBC West Region',
 'BBC South West Region',
 'BBC South Region',
 'BBC Yorkshire & Lincolnshire',
 'BBC North East & Cumbria',
 'BBC North West Region',
 'BBC Scotland Region',
 'BBC Ulster Region',
 'BBC Wales Region',
 'BBC Midlands West',
 'BBC Midlands East',
 'BBC London',
 'BBC South East']

### Querying the programme ratings endpoint

Now that we know the station and panel names, we can query the programme ratings endpoint. This can be done very simply using pybarb's `programme_ratings` method.

In [2]:
programme_data = barb_api.programme_ratings(min_transmission_date="2022-01-01",
                                            max_transmission_date="2022-12-31",
                                            station="BBC1",
                                            panel="BBC Network")
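
If a year-long request proves slow or too large, one option is to split the date range into monthly windows and query each in turn. The sketch below only generates the date pairs. Whether the API benefits from smaller requests is an assumption, and `month_ranges` is a hypothetical helper, not part of pybarb.

```python
import calendar
from datetime import date


def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (first_day, last_day) ISO date strings for each month in the range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
        # Move to the next month, rolling over the year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


ranges = list(month_ranges(2022, 1, 2022, 12))
```

Each pair could then be passed as `min_transmission_date` and `max_transmission_date` in a separate `programme_ratings` call.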

## Accessing the data

The raw data is stored in the `api_response_data` attribute of the resulting object (in this case, the object named `programme_data`).

In [3]:
programme_data.api_response_data

{'endpoint': 'programme_ratings',
 'events': [{'panel': {'panel_code': 50,
    'panel_region': 'BBC Network',
    'is_macro_region': False},
   'station': {'station_code': 10, 'station_name': 'BBC1'},
   'transmission_log_programme_name': 'Holby City',
   'programme_type': 'programme',
   'programme_start_datetime': {'barb_reporting_datetime': '2022-01-25 20:20:54',
    'barb_polling_datetime': '2022-01-25 20:20:54',
    'standard_datetime': '2022-01-25 20:20:54'},
   'programme_duration': 40,
   'spans_normal_day': False,
   'sponsor': {'sponsor_code': None, 'bumpers': 'not_sponsored'},
   'broadcaster_transmission_code': '25276430244812',
   'live_status': 'unknown',
   'uk_premier': False,
   'broadcaster_premier': False,
   'repeat': None,
   'programme_content': {'content_name': 'Holby City: Series 23, Episode 41',
    'barb_content_id': 6163556,
    'broadcaster_content_id': 'm0013vwt',
    'metabroadcast_information': {'metabroadcast_content_id': 'pmsthg'},
    'episode': {'epis
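
Each item in `events` is a nested dictionary, so individual fields are reached by chaining keys. The sketch below does this on a trimmed event that contains only fields visible in the output above. `summarise_event` is a hypothetical helper written for illustration.

```python
# A trimmed event containing only fields that appear in the response above
event = {
    "panel": {"panel_code": 50, "panel_region": "BBC Network"},
    "station": {"station_code": 10, "station_name": "BBC1"},
    "transmission_log_programme_name": "Holby City",
    "programme_start_datetime": {"standard_datetime": "2022-01-25 20:20:54"},
    "programme_duration": 40,
}


def summarise_event(event):
    """Pull a few nested fields out of one event into a flat dict."""
    return {
        "panel_region": event["panel"]["panel_region"],
        "station_name": event["station"]["station_name"],
        "transmission_log_programme_name": event["transmission_log_programme_name"],
        "standard_datetime": event["programme_start_datetime"]["standard_datetime"],
        "programme_duration": event["programme_duration"],
    }


summary = summarise_event(event)
```

With real data, the same function could be applied to each item of `programme_data.api_response_data['events']`.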

However, it is easier to access the data as a dataframe. To do this, we can use the `to_dataframe()` method, which flattens the nested JSON structure.

In [4]:
programme_df = programme_data.to_dataframe()
programme_df


,panel_region,station_name,programme_name,programme_type,programme_start_datetime,programme_duration_minutes,spans_normal_day,uk_premiere,broadcaster_premiere,programme_repeat,episode_number,episode_name,genre,audience_size_hundreds,date_of_transmission,audience_name,audience_target_size_hundreds
0,BBC Network,BBC1,"Holby City: Series 23, Episode 41",programme,2022-01-25 20:20:54,40,False,False,False,,,,,1930,2022-01-25,All Homes,270570
1,BBC Network,BBC1,"Holby City: Series 23, Episode 41",programme,2022-01-25 20:20:54,40,False,False,False,,,,,2486,2022-01-25,All Adults,512010
2,BBC Network,BBC1,"Holby City: Series 23, Episode 41",programme,2022-01-25 20:20:54,40,False,False,False,,,,,878,2022-01-25,All Men,249750
3,BBC Network,BBC1,"Holby City: Series 23, Episode 41",programme,2022-01-25 20:20:54,40,False,False,False,,,,,1754,2022-01-25,All Houseperson,270570
4,BBC Network,BBC1,"Holby City: Series 23, Episode 41",programme,2022-01-25 20:20:54,40,False,False,False,,,,,0,2022-01-25,All Children aged 4-15,95390
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1507315,BBC Network,BBC1,FIFA World Cup 2022: Series 2022,programme,2022-11-29 18:00:46,208,False,True,True,False,,,,2133,2022-11-29,Boys 10-12,12690
1507316,BBC Network,BBC1,FIFA World Cup 2022: Series 2022,programme,2022-11-29 18:00:46,208,False,True,True,False,,,,15341,2022-11-29,"Adults, Lightest Third",171470
1507317,BBC Network,BBC1,FIFA World Cup 2022: Series 2022,programme,2022-11-29 18:00:46,208,False,True,True,False,,,,3728,2022-11-29,"Adults, Lightest Sixth",85740
1507318,BBC Network,BBC1,FIFA World Cup 2022: Series 2022,programme,2022-11-29 18:00:46,208,False,True,True,False,,,,9335,2022-11-29,"ABC1 Adults, Lightest Third",97040
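
The two numeric columns can be combined into a share of the target audience. The sketch below uses two rows copied from the output above. It assumes, from the column names alone, that the `_hundreds` columns hold counts divided by 100, so the Barb documentation should be checked for the exact definitions.

```python
import pandas as pd

# Two rows copied from the dataframe above (Holby City, 2022-01-25)
rows = pd.DataFrame({
    "audience_name": ["All Homes", "All Adults"],
    "audience_size_hundreds": [1930, 2486],
    "audience_target_size_hundreds": [270570, 512010],
})

# Assumption: "_hundreds" means the figure is a count divided by 100
rows["audience_size"] = rows["audience_size_hundreds"] * 100
# Share of the target audience that watched, as a percentage
rows["audience_share_pct"] = (
    100 * rows["audience_size_hundreds"] / rows["audience_target_size_hundreds"]
)
```

Under that assumption, the All Homes row corresponds to 193,000 homes, or roughly 0.7% of the target audience.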


## Manipulating the data

We can also get a pivot of the data which turns the audiences into columns.

In [5]:
programme_data.audience_pivot()

panel_region,station_name,date_of_transmission,programme_name,"ABC1 Adults, Lightest Third",Adults 16-24,Adults 16-34,"Adults 16-34, Lightest Third",Adults 18-20,Adults 21-24,Adults 35-44,Adults 45-49,Adults 45-54,Adults 55-64,...,Men AB,Men AB working full-time,Men ABC1,Men ABC1 16-24,Men ABC1 16-34,Men ABC1 16-44,Men ABC1 35-54,Men ABC1 working full-time,Men C2,Men working full-time
BBC Network,BBC1,2022-01-01,Archbishop of Canterbury's New Year Message: Series 2022,298.0,169.0,474.0,63.0,40.0,58.0,667.0,644.0,1181.0,2376.0,...,1038.0,198.0,2210.0,20.0,20.0,398.0,643.0,966.0,643.0,1599.0
BBC Network,BBC1,2022-01-01,Attenborough and the Mammoth Graveyard: Series 2021,208.0,114.0,521.0,41.0,11.0,75.0,1494.0,401.0,1079.0,2020.0,...,1272.0,557.0,2709.0,24.0,35.0,825.0,1162.0,1324.0,918.0,1985.0
BBC Network,BBC1,2022-01-01,BBC London,224.0,211.0,391.0,24.0,8.0,62.0,443.0,197.0,807.0,1131.0,...,700.0,222.0,1478.0,30.0,50.0,69.0,251.0,580.0,138.0,1253.0
BBC Network,BBC1,2022-01-01,"BBC Newsline: Series 2022, Episode 40",19.0,0.0,42.0,8.0,0.0,0.0,47.0,98.0,188.0,389.0,...,146.0,16.0,212.0,0.0,0.0,32.0,32.0,35.0,101.0,59.0
BBC Network,BBC1,2022-01-01,"BBC Wales Today: Series 2022, Episode 40",0.0,0.0,0.0,0.0,0.0,0.0,83.0,11.0,11.0,448.0,...,288.0,47.0,347.0,0.0,0.0,0.0,11.0,106.0,201.0,106.0
BBC Network,BBC1,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
BBC Network,BBC1,2023-01-01,Eurovision Song Contest: Series 2023,994.0,1210.0,2516.0,336.0,346.0,665.0,2204.0,509.0,1577.0,2018.0,...,1350.0,974.0,2833.0,175.0,499.0,1266.0,1223.0,2116.0,542.0,2840.0
BBC Network,BBC1,2023-01-01,Joins BBC News: Series 2022,29.0,142.0,313.0,19.0,86.0,27.0,209.0,82.0,158.0,455.0,...,87.0,52.0,169.0,17.0,77.0,78.0,10.0,118.0,147.0,248.0
BBC Network,BBC1,2023-01-01,Sam Ryder Rocks New Year's Eve: Series 2022,7068.0,6483.0,14379.0,2744.0,2214.0,3223.0,10508.0,5040.0,10551.0,9312.0,...,6867.0,4852.0,14763.0,1533.0,3100.0,5875.0,6161.0,9968.0,5920.0,16138.0
BBC Network,BBC1,2023-01-01,The Graham Norton Show: Series 30,67.0,313.0,394.0,23.0,9.0,303.0,555.0,518.0,852.0,661.0,...,210.0,153.0,705.0,2.0,21.0,130.0,387.0,395.0,295.0,487.0
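
The same reshaping can be sketched with plain pandas, which may help when a different set of index columns is needed. This is not necessarily how `audience_pivot()` is implemented. The rows are copied from the dataframe output earlier in this demo, and the index columns follow the pivot output above.

```python
import pandas as pd

# Long-format rows copied from the dataframe output earlier in this demo
long_df = pd.DataFrame({
    "panel_region": ["BBC Network"] * 4,
    "station_name": ["BBC1"] * 4,
    "date_of_transmission": ["2022-01-25"] * 3 + ["2022-11-29"],
    "programme_name": ["Holby City: Series 23, Episode 41"] * 3
                      + ["FIFA World Cup 2022: Series 2022"],
    "audience_name": ["All Homes", "All Adults", "All Men", "Boys 10-12"],
    "audience_size_hundreds": [1930, 2486, 878, 2133],
})

# One row per programme broadcast, one column per audience
wide_df = long_df.pivot_table(
    index=["panel_region", "station_name", "date_of_transmission", "programme_name"],
    columns="audience_name",
    values="audience_size_hundreds",
    aggfunc="sum",
)
```

Audiences that are missing for a programme in the input appear as `NaN` in the wide table.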


## Filtering for the news programmes

We can search the `programme_name` column to find the programmes we are looking for.

In [21]:
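# Keep only the title before the first colon (drops the series and episode suffix)
programme_df['programme_name'] = programme_df['programme_name'].str.split(':').str[0]
# List the distinct programme names that contain 'News'
programme_df.loc[programme_df['programme_name'].str.contains('News'), 'programme_name'].unique()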

array(['Joins BBC News', 'BBC News at Six', 'BBC Newsline',
       'BBC News at Ten', 'BBC News at One', 'BBC Weekend News',
       'Newscast', 'Have I Got News for You',
       'Have I Got a Bit More News for You', 'BBC News',
       'BBC News Special', 'Breaking the News'], dtype=object)

Now we filter for just the regular news programmes.

In [25]:
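# Keep only the regular daily news programmes
regular_news = ['BBC News at Six', 'BBC News at Ten', 'BBC News at One', 'BBC Weekend News']
bbc_news = programme_df[programme_df['programme_name'].isin(regular_news)]
# Use the "All Homes" audience and order each programme's broadcasts by start time
bbc_news_all_homes = bbc_news[bbc_news['audience_name'] == "All Homes"].sort_values(["programme_name", "programme_start_datetime"])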
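
The substring search for 'News' also returns programmes such as 'Have I Got News for You', which is why an explicit list of names is used for the filter. As an alternative, the regular slots could be described by a pattern. The sketch below is one possible pattern, tested against the names returned by the search. It is an illustration rather than part of the original workflow.

```python
import re

# Programme names returned by the 'News' search above
names = [
    "Joins BBC News", "BBC News at Six", "BBC Newsline", "BBC News at Ten",
    "BBC News at One", "BBC Weekend News", "Newscast", "Have I Got News for You",
    "Have I Got a Bit More News for You", "BBC News", "BBC News Special",
    "Breaking the News",
]

# Matches "BBC News at <slot>" and "BBC Weekend News" only
regular_news_pattern = re.compile(r"BBC (News at \w+|Weekend News)")

regular = [name for name in names if regular_news_pattern.fullmatch(name)]
```

A pattern like this would also pick up a new slot such as a hypothetical 'BBC News at Nine' without editing the list, at the cost of being less explicit.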


## Plotting the data

In [26]:
import plotly.express as px
# One line per programme, showing the All Homes audience over time
px.line(bbc_news_all_homes, x="programme_start_datetime", y="audience_size_hundreds",
        color="programme_name", width=1300, height=500)
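
Daily audience figures tend to be noisy, so a rolling mean can make longer-term trends easier to see. The sketch below computes one per programme on a small made-up dataframe. The figures are invented purely for illustration, and the three-broadcast window is an arbitrary choice.

```python
import pandas as pd

# Made-up daily figures for one programme, for illustration only
toy = pd.DataFrame({
    "programme_name": ["BBC News at Six"] * 6,
    "programme_start_datetime": pd.date_range("2022-01-01 18:00", periods=6, freq="D"),
    "audience_size_hundreds": [100, 120, 90, 110, 130, 80],
})

toy = toy.sort_values(["programme_name", "programme_start_datetime"])
# 3-broadcast rolling mean, computed separately for each programme
toy["audience_rolling_mean"] = (
    toy.groupby("programme_name")["audience_size_hundreds"]
       .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)
```

The same steps could be applied to `bbc_news_all_homes`, with `audience_rolling_mean` passed as the `y` argument of `px.line`.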