# Pulling data from the viewing endpoint using pybarb

In this demo we will show you how to pull data from the asynchronous viewing endpoint and then manipulate it using the `pybarb` library. We will cover pulling data for all activities and then look at how you can limit the request to specific activities.

The full API documentation can be found [here](https://barb-api.co.uk/api-docs).

It might also be useful to consult the [Getting Started](https://barb-api.co.uk/api-docs#section/Getting-started) section for information about authentication and basic API usage.



## Pulling all viewing data for a particular station and channel

As usual we begin by connecting to the API and authenticating.

In [1]:
import json
import pybarb as pb

# Set the working directory
working_directory = '/path/to/your/dir/'

# Get the access token
with open(working_directory + "creds.json") as file:
    creds = json.load(file)

# Create a BarbAPI object and connect
barb_api = pb.BarbAPI(creds)
barb_api.connect()
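
The string concatenation in the cell above only works when `working_directory` ends with a slash. As a minimal sketch, assuming nothing about `pybarb` itself, the credentials file could instead be located with `pathlib`. The helper name `load_creds` is hypothetical, and the key in the demo file is a placeholder, not the schema `pybarb` expects.

```python
import json
import tempfile
from pathlib import Path


def load_creds(working_directory, filename="creds.json"):
    """Read a JSON credentials file from working_directory (hypothetical helper)."""
    # Path joining works with or without a trailing slash on the directory.
    creds_path = Path(working_directory) / filename
    with creds_path.open() as file:
        return json.load(file)


# Demonstration with a throwaway directory and a placeholder key
# (an assumption for illustration, not the real credentials schema).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "creds.json"
    demo_file.write_text(json.dumps({"placeholder_key": "placeholder_value"}))
    demo_creds = load_creds(tmp_dir)

print(demo_creds)
```

The returned dictionary could then be passed to `pb.BarbAPI(...)` in the same way as `creds` in the cell above.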

We would like one day's worth of viewing data (2023-07-06) for the BBC1 East viewing station and the BBC East Region panel.

First we need to look up the viewing station and panel names to check that we have the correct query parameters.

In [2]:
barb_api.list_panels("BBC")

['BBC Network',
 'BBC East Region',
 'BBC West Region',
 'BBC South West Region',
 'BBC South Region',
 'BBC Yorkshire & Lincolnshire',
 'BBC North East & Cumbria',
 'BBC North West Region',
 'BBC Scotland Region',
 'BBC Ulster Region',
 'BBC Wales Region',
 'BBC Midlands West',
 'BBC Midlands East',
 'BBC London',
 'BBC South East']

In [3]:
barb_api.list_viewing_stations("BBC1")

['BBC1 Midlands West',
 'BBC1 East',
 'BBC1 West',
 'BBC1 South West',
 'BBC1 South',
 'BBC1 Yorks/Lincs',
 'BBC1 North East/Cumbria',
 'BBC1 North West',
 'BBC1 Scotland',
 'BBC1 Wales',
 'BBC1 Northern Ireland',
 'BBC1 Midlands East',
 'BBC1 London',
 'BBC1 South East']

Next we need to request the data sets from the asynchronous API. We will use the `pybarb` package to do this.

In [4]:
barb_api.viewing(min_session_date="2023-07-06", max_session_date="2023-07-06", 
                 viewing_station="BBC1 East", panel="BBC East Region")

Job successfully started. The job id is 29995344-0056-4bf0-934e-56e33006c290


Once the job has started we can use the `ping_job_status` method to check the status of the job. It will check the status every 60 seconds until the job is complete.

In [5]:
barb_api.ping_job_status()

Job not ready yet. Sleeping for 60 seconds.
Job not ready yet. Sleeping for 60 seconds.
Job complete. 1 files are ready for download.
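
The poll-and-sleep behaviour seen in this output can be illustrated with a generic loop. This is only a sketch of the pattern, not the implementation inside `pybarb`: `wait_until_ready`, `check_status` and `max_attempts` are hypothetical names, and the status function here is a stand-in that reports success on its third call.

```python
import time


def wait_until_ready(check_status, interval_seconds=60, max_attempts=30, sleep=time.sleep):
    """Call check_status() until it returns True or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        if check_status():
            return attempt  # number of checks it took
        print(f"Job not ready yet. Sleeping for {interval_seconds} seconds.")
        sleep(interval_seconds)
    raise TimeoutError(f"Job not ready after {max_attempts} checks")


# Stand-in status function: not ready twice, then ready.
responses = iter([False, False, True])
attempts_needed = wait_until_ready(lambda: next(responses), interval_seconds=0)
print(f"Job complete after {attempts_needed} checks.")
```

Capping the number of attempts is a design choice in this sketch: it stops a notebook cell from blocking indefinitely if a job never completes.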


Now we can download the data using the `get_asynch_files` method.

In [6]:
viewing_results_set = barb_api.get_asynch_files()

In its raw form, we can see that many of the cells contain JSON. We will need to unpack this.

In [7]:
viewing_results_set.api_response_data.head(5)

Unnamed: 0,STANDARD_DATE_OF_ACTIVITY,SESSION_START,SESSION_END,HOUSEHOLD,DEVICE,PANEL_VIEWERS,GUEST_VIEWERS,PROGRAMMES_VIEWED,SPOTS_VIEWED,VIEWING_STATION,...,PLATFORM,ACTIVITY_TYPE,CONTENT_ASSET_ITEM_OFFSET,PLAYBACK_TYPE,SKY_ULTRA_HD,START_OF_RECORDING,TARGETED_PROMOTION,VOD_INDICATOR,VOD_PROVIDER,REPLICATE_ID
0,2023-07-06,{'barb_polling_datetime': '2023-07-06 18:27:00...,{'barb_polling_datetime': '2023-07-06 18:51:00...,{'bbc_itv_segment': 'bbc east / east of englan...,"{'date_valid_for': '2023-07-06', 'device_numbe...","[{'date_of_birth': '1957-02-01', 'dependency_o...",{},"[{'broadcaster_premier': True, 'broadcaster_tr...",[],"{'viewing_station_code': 32, 'viewing_station_...",...,digital terrestrial,live viewing (excl targeted advertising),0,unknown,False,{},False,not on-demand,"{'vod_provider': 'unknown', 'vod_service': 'un...",
1,2023-07-06,{'barb_polling_datetime': '2023-07-06 21:22:00...,{'barb_polling_datetime': '2023-07-06 21:23:00...,{'bbc_itv_segment': 'bbc east / east of englan...,"{'date_valid_for': '2023-07-06', 'device_numbe...","[{'date_of_birth': '1970-06-01', 'dependency_o...",{},"[{'broadcaster_premier': True, 'broadcaster_tr...",[],"{'viewing_station_code': 32, 'viewing_station_...",...,online via tv (& peripherals),vosdal (viewing on same day as live) (excl tar...,0,other device,False,{'barb_polling_datetime': '2023-07-06 21:21:00...,False,not on-demand,"{'vod_provider': 'unknown', 'vod_service': 'un...",
2,2023-07-06,{'barb_polling_datetime': '2023-07-06 21:25:00...,{'barb_polling_datetime': '2023-07-06 21:34:00...,{'bbc_itv_segment': 'bbc east / east of englan...,"{'date_valid_for': '2023-07-06', 'device_numbe...","[{'date_of_birth': '1970-06-01', 'dependency_o...",{},"[{'broadcaster_premier': True, 'broadcaster_tr...",[],"{'viewing_station_code': 32, 'viewing_station_...",...,online via tv (& peripherals),vosdal (viewing on same day as live) (excl tar...,0,other device,False,{'barb_polling_datetime': '2023-07-06 21:24:00...,False,not on-demand,"{'vod_provider': 'unknown', 'vod_service': 'un...",
3,2023-07-06,{'barb_polling_datetime': '2023-07-06 18:19:00...,{'barb_polling_datetime': '2023-07-06 18:21:00...,{'bbc_itv_segment': 'bbc east / east of englan...,"{'date_valid_for': '2023-07-06', 'device_numbe...","[{'date_of_birth': '1977-02-01', 'dependency_o...",{},"[{'broadcaster_premier': True, 'broadcaster_tr...",[],"{'viewing_station_code': 32, 'viewing_station_...",...,digital terrestrial,live viewing (excl targeted advertising),0,unknown,False,{},False,not on-demand,"{'vod_provider': 'unknown', 'vod_service': 'un...",
4,2023-07-06,{'barb_polling_datetime': '2023-07-06 21:02:00...,{'barb_polling_datetime': '2023-07-06 21:03:00...,{'bbc_itv_segment': 'bbc east / east of englan...,"{'date_valid_for': '2023-07-06', 'device_numbe...","[{'date_of_birth': '1941-01-01', 'dependency_o...",{},"[{'broadcaster_premier': True, 'broadcaster_tr...",[],"{'viewing_station_code': 32, 'viewing_station_...",...,online via tv (& peripherals),vosdal (viewing on same day as live) (excl tar...,0,other device,False,{'barb_polling_datetime': '2023-07-06 21:01:00...,False,not on-demand,"{'vod_provider': 'unknown', 'vod_service': 'un...",


We can either save it as JSON using the `to_json` method...

In [8]:
viewing_results_set.to_json("results.json")

Or we can use the `to_dataframe` method to reshape the data so that we have one row per viewer per programme.

In [9]:
df = viewing_results_set.to_dataframe(unpack=["viewers", "programmes"])
df.head()

Unnamed: 0,session_start_datetime,programme_start_datetime,programme_name,date_of_birth,dependency_of_children,disability,ethnic_origin,gaelic_language,household_status,life_stage,...,number_of_tv_sets,number_of_vcrs,panel_membership_status,presence_of_children,replication_factor,social_class,welsh_speaking_home,device_number,device_on_panel,device_type
0,2023-07-06 18:27:00.000,2023-07-06 18:31:09,"East Midlands Today: Series 2023, Episode 97",1957-02-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 55+,...,2,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
1,2023-07-06 18:27:00.000,2023-07-06 18:31:09,"Look East: Series 2023, Episode 97",1957-02-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 55+,...,2,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
2,2023-07-06 18:27:00.000,2023-07-06 18:00:03,BBC News at Six: Series 2023,1957-02-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 55+,...,2,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
6,2023-07-06 18:27:00.000,2023-07-06 18:31:09,"Look North (Yorkshire): Series 2023, Episode 97",1957-02-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 55+,...,2,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
7,2023-07-06 18:27:00.000,2023-07-06 18:31:10,"BBC London: Series 2023, Episode 97",1957-02-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 55+,...,2,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
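
To make "one row per viewer per programme" concrete, the sketch below expands a single toy record into every viewer and programme pair. It is a conceptual illustration only and may differ from what `to_dataframe` does internally. The top-level field names follow the raw output shown earlier, while the nested key `programme_name` and all values are assumptions made for the example.

```python
from itertools import product

# Toy record shaped like one raw row. Values are invented for illustration.
raw_row = {
    "SESSION_START": {"barb_polling_datetime": "2023-07-06 18:27:00"},
    "PANEL_VIEWERS": [
        {"date_of_birth": "1957-02-01"},
        {"date_of_birth": "1960-05-01"},
    ],
    "PROGRAMMES_VIEWED": [
        {"programme_name": "Programme A"},
        {"programme_name": "Programme B"},
    ],
}


def explode_row(row):
    """Return one flat dict per (viewer, programme) pair in the row."""
    flat_rows = []
    for viewer, programme in product(row["PANEL_VIEWERS"], row["PROGRAMMES_VIEWED"]):
        flat = {"session_start_datetime": row["SESSION_START"]["barb_polling_datetime"]}
        flat.update(viewer)     # viewer attributes become columns
        flat.update(programme)  # programme attributes become columns
        flat_rows.append(flat)
    return flat_rows


flat_rows = explode_row(raw_row)
for flat in flat_rows:
    print(flat)
```

With two viewers and two programmes in the session, the toy record expands to four rows, which is why the unpacked data frame is much longer than the raw one.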


## Pulling data for SVOD only

To restrict the returned data to SVOD only, we set the `activity_type` parameter to `tv_non_linear`. Here we bring back all data for Netflix viewing on the BBC East Region panel.

In [11]:
barb_api.viewing(min_session_date="2023-07-06", max_session_date="2023-07-06", 
                  viewing_station="Netflix", panel="BBC East Region", activity_type="tv_non_linear")
barb_api.ping_job_status()
viewing_results_set = barb_api.get_asynch_files()
df = viewing_results_set.to_dataframe(unpack=["viewers", "programmes"])
df.head()

Job successfully started. The job id is f89baa2f-90aa-4e7e-b9d3-23bab4ac2c69
Job not ready yet. Sleeping for 60 seconds.
Job not ready yet. Sleeping for 60 seconds.
Job complete. 1 files are ready for download.


Unnamed: 0,session_start_datetime,programme_name,date_of_birth,dependency_of_children,disability,ethnic_origin,gaelic_language,household_status,life_stage,marital_status,...,number_of_tv_sets,number_of_vcrs,panel_membership_status,presence_of_children,replication_factor,social_class,welsh_speaking_home,device_number,device_on_panel,device_type
0,2023-07-06 21:56:00.000,"Black Mirror: Series 6, Episode 1",1975-03-01,unclassified,no,white british,not gaelic speaking/not in scotland,neither houseperson nor head of household,couple no children aged 35-54,married / living as married,...,1,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
1,2023-07-06 21:56:00.000,"Black Mirror: Series 6, Episode 1",1958-09-01,unclassified,no,white british,not gaelic speaking/not in scotland,both houseperson and head of household,couple no children aged 55+,married / living as married,...,1,0,home on panel (valid reporter),no children,16,C1,non welsh speaking,1,True,tv
2,2023-07-06 19:10:00.000,FILM: WHAM (2023),1992-06-01,unclassified,no,other White,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 16-34,married / living as married,...,2,0,home on panel (valid reporter),no children,6,AB,non welsh speaking,1,True,tv
3,2023-07-06 22:58:00.000,"Brooklyn Nine-Nine: Series 4, Episode 21",1992-06-01,unclassified,no,other White,not gaelic speaking/not in scotland,houseperson and not Head of household,couple no children aged 16-34,married / living as married,...,2,0,home on panel (valid reporter),no children,6,AB,non welsh speaking,1,True,tv
4,2023-07-06 22:58:00.000,"Brooklyn Nine-Nine: Series 4, Episode 21",1986-09-01,unclassified,no,white british,not gaelic speaking/not in scotland,head of household and not houseperson,couple no children aged 35-54,married / living as married,...,2,0,home on panel (valid reporter),no children,6,AB,non welsh speaking,1,True,tv
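
The request, poll, download and reshape sequence is the same for every query in this demo, so it could be wrapped in a small helper. The sketch below uses only the method names that appear in the cells above. `fetch_viewing_df` is a hypothetical name, and `FakeAPI` and `FakeResults` are stand-ins that record calls so the example runs without network access.

```python
def fetch_viewing_df(api, **query):
    """Start a viewing job, wait for it, download the files and reshape them."""
    api.viewing(**query)
    api.ping_job_status()
    results = api.get_asynch_files()
    return results.to_dataframe(unpack=["viewers", "programmes"])


# Stand-in objects (not part of pybarb): they only record what was called.
class FakeResults:
    def to_dataframe(self, unpack):
        return {"unpacked": unpack}


class FakeAPI:
    def __init__(self):
        self.calls = []

    def viewing(self, **query):
        self.calls.append(("viewing", query))

    def ping_job_status(self):
        self.calls.append(("ping_job_status", {}))

    def get_asynch_files(self):
        self.calls.append(("get_asynch_files", {}))
        return FakeResults()


fake_api = FakeAPI()
result = fetch_viewing_df(
    fake_api,
    min_session_date="2023-07-06",
    max_session_date="2023-07-06",
    viewing_station="Netflix",
    panel="BBC East Region",
    activity_type="tv_non_linear",
)
print([name for name, _ in fake_api.calls])
```

In a real session the connected `barb_api` object would be passed in place of `fake_api`.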


## Pulling data only for viewing that took place on tablets, PCs and smartphones

This is done by setting the `activity_type` parameter to `online_devices`. Here we bring back all data for viewing on tablets, PCs and smartphones of the station "BBC1 London" by those panelists who live in the BBC London region.

In [12]:
barb_api.viewing(min_session_date="2023-07-06", max_session_date="2023-07-06", 
                  viewing_station="BBC1 London", panel="BBC London", activity_type="online_devices")
barb_api.ping_job_status()
viewing_results_set = barb_api.get_asynch_files()
df = viewing_results_set.to_dataframe(unpack=["viewers", "programmes"])
df.head()


Job successfully started. The job id is b625e74b-35f9-4ec0-a77b-c353a72fc318
Job not ready yet. Sleeping for 60 seconds.
Job not ready yet. Sleeping for 60 seconds.
Job complete. 1 files are ready for download.


Unnamed: 0,session_start_datetime,programme_start_datetime,programme_name,date_of_birth,dependency_of_children,disability,ethnic_origin,gaelic_language,household_status,life_stage,...,number_of_tv_sets,number_of_vcrs,panel_membership_status,presence_of_children,replication_factor,social_class,welsh_speaking_home,device_number,device_on_panel,device_type
0,2023-07-06 01:30:16.000,2023-07-06 01:33:57,Joins BBC News: Series 2023,1979-02-01,unclassified,no,other White,not gaelic speaking/not in scotland,neither houseperson nor head of household,couple no children aged 35-54,...,1,0,home on panel (valid reporter),no children,44,AB,non welsh speaking,11,True,computer
2,2023-07-06 01:30:16.000,2023-07-06 01:28:55,Weather for the Week Ahead: Series 2023,1979-02-01,unclassified,no,other White,not gaelic speaking/not in scotland,neither houseperson nor head of household,couple no children aged 35-54,...,1,0,home on panel (valid reporter),no children,44,AB,non welsh speaking,11,True,computer
8,2023-07-06 19:34:29.000,2023-07-06 18:59:42,Wimbledon: Series 2023,1982-11-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,either youngest children 0-4 includes single p...,...,2,0,home on panel (valid reporter),with children aged 4-9 years,16,AB,non welsh speaking,22,True,tablet
12,2023-07-06 21:01:06.000,2023-07-07 00:50:53,Weather for the Week Ahead: Series 2023,1982-11-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,either youngest children 0-4 includes single p...,...,2,0,home on panel (valid reporter),with children aged 4-9 years,16,AB,non welsh speaking,22,True,tablet
13,2023-07-06 21:01:06.000,2023-07-07 00:20:20,"Newscast: Series 4, Episode 10",1982-11-01,unclassified,no,white british,not gaelic speaking/not in scotland,houseperson and not Head of household,either youngest children 0-4 includes single p...,...,2,0,home on panel (valid reporter),with children aged 4-9 years,16,AB,non welsh speaking,22,True,tablet


We can check the number of rows for each device type.

In [13]:
df.device_type.value_counts()

device_type
tablet        1637
computer       749
smartphone     273
Name: count, dtype: int64
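
These are counts of rows (one per viewer per programme), so it can help to see them as shares of the total. The sketch below recomputes the shares in plain Python from the counts printed above. With pandas, `df.device_type.value_counts(normalize=True)` gives the equivalent result directly.

```python
# Row counts copied from the value_counts() output above.
device_counts = {"tablet": 1637, "computer": 749, "smartphone": 273}

total_rows = sum(device_counts.values())
device_shares = {device: count / total_rows for device, count in device_counts.items()}

for device, share in device_shares.items():
    print(f"{device:<12}{share:.1%}")
```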