# Pulling data from the programme ratings endpoint using pybarb

In this demo we will show you how to pull data from the programme ratings endpoint and then manipulate it using the pybarb library. 

We illustrate this using the following use case: 
The BBC would like to see how its regular daily news slots have performed over the last couple of years. In particular, they would like to pick out important trends and events in a time series of audience figures.

The full API documentation can be found [here](https://barb-api.co.uk/api-docs).

It might also be useful to consult the [Getting Started](https://barb-api.co.uk/api-docs#section/Getting-started) section for information about authentication and basic API usage.


## Querying the API with pybarb

First we connect to the API using the `pybarb` package as described in "Connecting to the Barb API using Python". 

In [1]:
import json
import pybarb as pb

# Set the working directory
working_directory = '/Users/simon_business/Documents/disposable/clients/BARB/'

# Load the API credentials
with open(working_directory + "creds.json") as file:
    creds = json.load(file)

# Create a BarbAPI object and connect
barb_api = pb.BarbAPI(creds)
barb_api.connect()
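
The credentials step above can be made a little more defensive. The following is a minimal sketch, not part of pybarb: `load_creds` is a hypothetical helper, and the `email` and `password` keys are an assumption about what `creds.json` contains, so check the pybarb documentation for the structure it actually expects.

```python
import json
import tempfile
from pathlib import Path

# Assumption: the credentials file holds these two keys.
REQUIRED_KEYS = ("email", "password")


def load_creds(path):
    """Read a JSON credentials file and check that the assumed keys are present."""
    with Path(path).open() as file:
        creds = json.load(file)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"Missing credential fields: {missing}")
    return creds


# Demonstrate with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    creds_path = Path(tmp) / "creds.json"
    creds_path.write_text(
        json.dumps({"email": "user@example.com", "password": "secret"})
    )
    creds = load_creds(creds_path)

print(sorted(creds))
```

The resulting dictionary could then be passed to `pb.BarbAPI(creds)` as in the cell above.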


## Get data from the API

To pull the right data from the API we need to know the correct station name and panel name. 

### Getting the station name

The `list_stations` method can be used to search for all valid station names that contain 'BBC'.

In [2]:
barb_api.list_stations("bbc")

['BBC1',
 'BBC1 Network',
 'BBC2',
 'BBC2 Network',
 'BBC Scotland',
 'BBC HD',
 'BBC Three',
 'CBBC',
 'BBC4',
 'BBC Parliament',
 'BBC Knowledge',
 'BBC Choice England',
 'BBC News',
 'BBC RB HD',
 'BBC RB 2',
 'BBC RB 3',
 'BBC Winter Olympics Red Button',
 'BBC RB 0',
 'BBC RB 4',
 'BBC RB 5',
 'BBC RB 603',
 'BBC RB 7',
 'BBC RB 8',
 'BBC RB 602',
 'BBC FREEVIEW 301 HD',
 'BBC Olympics 1',
 'BBC Olympics 2',
 'BBC Olympics 3',
 'BBC Olympics 4',
 'BBC Olympics 5',
 'BBC Olympics 6',
 'BBC Olympics 7',
 'BBC Olympics 8',
 'BBC Olympics 9',
 'BBC Olympics 10',
 'BBC Olympics 11',
 'BBC Olympics 12',
 'BBC Olympics 13',
 'BBC Olympics 14',
 'BBC Olympics 15',
 'BBC Olympics 16',
 'BBC Olympics 17',
 'BBC Olympics 18',
 'BBC Olympics 19',
 'BBC Olympics 20',
 'BBC Olympics 21',
 'BBC Olympics 22',
 'BBC Olympics 23',
 'BBC Olympics 24',
 'BBC RB 6',
 'BBC RB 6781',
 'BBC RB 6785',
 'BBC RB 6786',
 'BBC RB 6787',
 'BBC RB 6788',
 'BBC RB 6789',
 'BBC RB 6790',
 'BBC RB 601',
 'BBC RB 1
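
The search above returned upper-case names for a lower-case search term, which suggests the match ignores case. A plain-Python sketch of that kind of search is shown here. It is an illustration only, not pybarb's implementation; `search_names` is a hypothetical helper and 'ITV1' is added purely as a non-matching example.

```python
def search_names(names, term):
    """Return the names that contain `term`, ignoring case."""
    needle = term.lower()
    return [name for name in names if needle in name.lower()]


# A few station names from the output above, plus one non-match for contrast.
stations = ["BBC1", "BBC News", "CBBC", "ITV1"]
matches = search_names(stations, "bbc")
print(matches)
```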

### Getting the panel name

Similarly, the `list_panels` method can be used to search for all valid panel names that contain 'BBC'.

In [3]:
barb_api.list_panels("bbc")

['BBC Network',
 'BBC East Region',
 'BBC West Region',
 'BBC South West Region',
 'BBC South Region',
 'BBC Yorkshire & Lincolnshire',
 'BBC North East & Cumbria',
 'BBC North West Region',
 'BBC Scotland Region',
 'BBC Ulster Region',
 'BBC Wales Region',
 'BBC Midlands West',
 'BBC Midlands East',
 'BBC London',
 'BBC South East']
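
Because the query needs exact station and panel names, it can help to check a name against the returned list before using it. The sketch below assumes nothing about pybarb itself; `check_name` is a hypothetical helper that uses the standard library's `difflib` to suggest near matches.

```python
from difflib import get_close_matches

# A few panel names from the output above.
panels = ["BBC Network", "BBC London", "BBC South East", "BBC Wales Region"]


def check_name(name, valid_names):
    """Return `name` if it is valid, otherwise raise with close suggestions."""
    if name in valid_names:
        return name
    suggestions = get_close_matches(name, valid_names, n=3)
    raise ValueError(f"Unknown name {name!r}. Did you mean one of {suggestions}?")


panel = check_name("BBC Network", panels)
print(panel)
```

Calling `check_name("BBC network", panels)` would raise a `ValueError` that suggests 'BBC Network'.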

### Querying the programme ratings endpoint

Now that we know the station and panel names, we can query the programme ratings endpoint using pybarb's `programme_ratings` method.

In [7]:
programme_data = barb_api.programme_ratings(
    min_transmission_date="2021-01-01",
    max_transmission_date="2022-01-02",
    station="BBC1",
    panel="BBC Network",
)
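
For longer periods it may be convenient to split the date range into smaller windows and query each one separately. Whether the API requires this is not stated here, so treat the following as an optional sketch. `date_windows` is a hypothetical helper, and it assumes both date bounds are inclusive.

```python
from datetime import date, timedelta


def date_windows(start, end, days=31):
    """Yield (min_date, max_date) ISO strings covering start..end inclusive."""
    current = start
    step = timedelta(days=days)
    while current <= end:
        # Each window spans `days` days, clipped to the overall end date.
        window_end = min(current + step - timedelta(days=1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


windows = list(date_windows(date(2021, 1, 1), date(2021, 3, 31), days=31))
for min_date, max_date in windows:
    print(min_date, max_date)
```

Each pair could be passed as `min_transmission_date` and `max_transmission_date` in a separate call.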

## Accessing the data

The raw data is stored in the `api_response_data` attribute of the resulting object (in this case, `programme_data`).

In [9]:
programme_data.api_response_data

{'endpoint': 'programme_ratings',
 'events': [{'panel': {'panel_code': 50,
    'panel_region': 'BBC Network',
    'is_macro_region': False},
   'station': {'station_code': 10, 'station_name': 'BBC1'},
   'transmission_log_programme_name': 'WEATHER',
   'programme_type': 'programme',
   'programme_start_datetime': {'barb_reporting_datetime': '2021-01-02 22:17:07',
    'barb_polling_datetime': '2021-01-02 22:17:07',
    'standard_datetime': '2021-01-02 22:17:07'},
   'programme_duration': 2,
   'spans_normal_day': False,
   'sponsor': {'sponsor_code': None, 'bumpers': 'not_sponsored'},
   'broadcaster_transmission_code': '9637678218812',
   'live_status': 'unknown',
   'uk_premier': True,
   'broadcaster_premier': True,
   'repeat': False,
   'programme_content': {'content_name': 'BBC Weather',
    'barb_content_id': 0,
    'broadcaster_content_id': 'DEFAULT_BROADCASTER_CONTENT_ID',
    'metabroadcast_information': {'metabroadcast_content_id': 'm44h5w'},
    'episode': {'episode_number':
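
The response is a nested dictionary, so individual fields can be read by walking the keys. The sketch below uses a hand-copied, abridged version of the first event shown above rather than a live response.

```python
# One event, abridged from the response shown above.
response = {
    "endpoint": "programme_ratings",
    "events": [
        {
            "panel": {"panel_code": 50, "panel_region": "BBC Network"},
            "station": {"station_code": 10, "station_name": "BBC1"},
            "transmission_log_programme_name": "WEATHER",
            "programme_start_datetime": {
                "standard_datetime": "2021-01-02 22:17:07"
            },
            "programme_duration": 2,
        }
    ],
}

# Pull out station, programme name and start time for every event.
summary = [
    (
        event["station"]["station_name"],
        event["transmission_log_programme_name"],
        event["programme_start_datetime"]["standard_datetime"],
    )
    for event in response["events"]
]
print(summary)
```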

However, it is easier to work with the data as a dataframe. To do this, we can use the `to_dataframe()` method, which flattens the nested JSON structure.

In [8]:
programme_df = programme_data.to_dataframe()
programme_df


AttributeError: 'NoneType' object has no attribute 'keys'
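
To illustrate what flattening a nested structure means, here is a minimal plain-Python sketch. It is not pybarb's implementation, and the column names pybarb produces may differ. `flatten` is a hypothetical helper that joins nested keys with an underscore and keeps `None` values as ordinary leaves.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single dict with joined keys."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            # Non-dict values, including None, are kept as leaves.
            flat[new_key] = value
    return flat


# Fields copied from the first event in the response shown earlier.
event = {
    "station": {"station_code": 10, "station_name": "BBC1"},
    "programme_duration": 2,
    "sponsor": {"sponsor_code": None, "bumpers": "not_sponsored"},
}
flat_event = flatten(event)
print(flat_event)
```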

## Manipulating the data

We can also get a pivot of the data which turns the audiences into columns.

In [7]:
programme_data.audience_pivot()

| panel_region | station_name | date_of_transmission | programme_name | ABC1 Adults, Lightest Third | Adults 16-24 | Adults 16-34 | Adults 16-34, Lightest Third | Adults 18-20 | Adults 21-24 | Adults 35-44 | Adults 45-49 | Adults 45-54 | Adults 55-64 | ... | Men AB | Men AB working full-time | Men ABC1 | Men ABC1 16-24 | Men ABC1 16-34 | Men ABC1 16-44 | Men ABC1 35-54 | Men ABC1 working full-time | Men C2 | Men working full-time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BBC Network | BBC1 | 2022-01-01 | Archbishop of Canterbury's New Year Message: Series 2022 | 298 | 169 | 474 | 63 | 40 | 58 | 667 | 644 | 1181 | 2376 | ... | 1038 | 198 | 2210 | 20 | 20 | 398 | 643 | 966 | 643 | 1599 |
| BBC Network | BBC1 | 2022-01-01 | Attenborough and the Mammoth Graveyard: Series 2021 | 208 | 114 | 521 | 41 | 11 | 75 | 1494 | 401 | 1079 | 2020 | ... | 1272 | 557 | 2709 | 24 | 35 | 825 | 1162 | 1324 | 918 | 1985 |
| BBC Network | BBC1 | 2022-01-01 | BBC London | 224 | 211 | 391 | 24 | 8 | 62 | 443 | 197 | 807 | 1131 | ... | 700 | 222 | 1478 | 30 | 50 | 69 | 251 | 580 | 138 | 1253 |
| BBC Network | BBC1 | 2022-01-01 | BBC Newsline: Series 2022, Episode 40 | 19 | 0 | 42 | 8 | 0 | 0 | 47 | 98 | 188 | 389 | ... | 146 | 16 | 212 | 0 | 0 | 32 | 32 | 35 | 101 | 59 |
| BBC Network | BBC1 | 2022-01-01 | BBC Wales Today: Series 2022, Episode 40 | 0 | 0 | 0 | 0 | 0 | 0 | 83 | 11 | 11 | 448 | ... | 288 | 47 | 347 | 0 | 0 | 0 | 11 | 106 | 201 | 106 |
| BBC Network | BBC1 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| BBC Network | BBC1 | 2022-01-02 | Spotlight: Series 2022, Episode 277 | 63 | 35 | 77 | 0 | 0 | 0 | 141 | 16 | 22 | 287 | ... | 192 | 105 | 282 | 0 | 0 | 0 | 0 | 105 | 140 | 124 |
| BBC Network | BBC1 | 2022-01-02 | The Tourist: Series 1, Episode 2 | 7104 | 1752 | 5839 | 861 | 474 | 1130 | 7745 | 5015 | 11479 | 15046 | ... | 10097 | 5966 | 20441 | 470 | 1843 | 4159 | 6525 | 10996 | 5278 | 15585 |
| BBC Network | BBC1 | 2022-01-02 | Weather for the Week Ahead: Series 2022 | 353 | 882 | 1399 | 58 | 233 | 493 | 1619 | 1516 | 3822 | 3255 | ... | 1866 | 781 | 4115 | 0 | 118 | 547 | 1679 | 2375 | 2201 | 3457 |
| BBC Network | BBC1 | 2022-01-03 | BBC News | 2 | 4 | 7 | 0 | 0 | 2 | 118 | 62 | 284 | 460 | ... | 106 | 36 | 277 | 2 | 5 | 6 | 68 | 171 | 182 | 271 |
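
The idea behind the pivot can be sketched in plain Python: long-format rows with one audience per row become one entry per programme with one key per audience. This is an illustration of the reshaping only, not how `audience_pivot()` is implemented. The values are taken from two rows of the table above.

```python
from collections import defaultdict

ARCHBISHOP = "Archbishop of Canterbury's New Year Message: Series 2022"

# Long-format rows: (programme_name, audience_name, audience size).
rows = [
    (ARCHBISHOP, "Adults 16-24", 169),
    (ARCHBISHOP, "Adults 16-34", 474),
    ("BBC London", "Adults 16-24", 211),
    ("BBC London", "Adults 16-34", 391),
]

# Pivot: one entry per programme, one key per audience.
pivot = defaultdict(dict)
for programme, audience, size in rows:
    pivot[programme][audience] = size

for programme, audiences in pivot.items():
    print(programme, audiences)
```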


## Filtering for the news programmes

We can search the `programme_name` column to find the programmes we are looking for.

In [8]:
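# Keep only the title before the first colon (drops series and episode details)
programme_df['programme_name'] = programme_df['programme_name'].str.split(':', expand=True)[0]
# List the distinct programme names that contain 'News'
programme_df.loc[programme_df['programme_name'].str.contains('News'), 'programme_name'].unique()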

array(['BBC Weekend News', 'BBC News', 'BBC Newsline'], dtype=object)
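
The same two steps, trimming each name at the first colon and then searching for 'News', can be sketched without pandas. The names below are taken from the pivot output shown earlier.

```python
names = [
    "BBC Newsline: Series 2022, Episode 40",
    "BBC News",
    "The Tourist: Series 1, Episode 2",
    "BBC Wales Today: Series 2022, Episode 40",
]

# Keep the part before the first colon, which drops series and episode details.
short_names = [name.split(":")[0] for name in names]

# Distinct names that contain 'News', sorted for a stable order.
news_names = sorted({name for name in short_names if "News" in name})
print(news_names)
```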

Now we filter for just the regular news programmes.

In [9]:
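# Keep only the regular news programmes
regular_news = ['BBC News at Six', 'BBC News at Ten', 'BBC News at One', 'BBC Weekend News']
bbc_news = programme_df[programme_df['programme_name'].isin(regular_news)]
# Restrict to the 'All Homes' audience and order each programme's rows by start time
bbc_news_all_homes = (bbc_news[bbc_news['audience_name'] == "All Homes"]
                      .sort_values(["programme_name", "programme_start_datetime"]))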
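
The filter and sort above can be illustrated on a handful of rows in plain Python. The rows below are made up for illustration, including the start times and audience sizes. Only the column names and programme names come from the cell above.

```python
# Hypothetical rows; the audience sizes and start times are invented.
rows = [
    {"programme_name": "BBC News at Ten", "audience_name": "All Homes",
     "programme_start_datetime": "2021-01-02 22:00:00", "audience_size_hundreds": 300},
    {"programme_name": "BBC News at Six", "audience_name": "All Homes",
     "programme_start_datetime": "2021-01-02 18:00:00", "audience_size_hundreds": 400},
    {"programme_name": "BBC News at Ten", "audience_name": "All Homes",
     "programme_start_datetime": "2021-01-01 22:00:00", "audience_size_hundreds": 310},
    {"programme_name": "BBC News at Six", "audience_name": "Adults 16-24",
     "programme_start_datetime": "2021-01-02 18:00:00", "audience_size_hundreds": 20},
    {"programme_name": "Weather", "audience_name": "All Homes",
     "programme_start_datetime": "2021-01-02 22:30:00", "audience_size_hundreds": 250},
]

wanted = {"BBC News at Six", "BBC News at Ten"}

# Keep the wanted programmes for one audience, then sort by name and start time.
selected = sorted(
    (row for row in rows
     if row["programme_name"] in wanted and row["audience_name"] == "All Homes"),
    key=lambda row: (row["programme_name"], row["programme_start_datetime"]),
)
sizes = [row["audience_size_hundreds"] for row in selected]
print(sizes)
```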


## Plotting the data

In [10]:
import plotly.express as px
# Draw one line per programme: audience size against programme start time
px.line(bbc_news_all_homes, x="programme_start_datetime", y="audience_size_hundreds", color="programme_name", width=1300, height=500)
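
Since the use case is about picking out trends, it may help to smooth a noisy series before plotting it. This step is not part of the original demo. The sketch below is a simple trailing moving average over made-up audience figures, and `moving_average` is a hypothetical helper.

```python
def moving_average(values, window):
    """Trailing moving average; the first window - 1 points are skipped."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up daily audience figures with one spike on the fourth day.
daily_audience = [300, 330, 300, 510, 330, 360]
smoothed = moving_average(daily_audience, 3)
print(smoothed)
```

With a window of 3, the spike is spread across neighbouring points, so sustained shifts stand out more than one-off events.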