# Extract Facebook Audience Estimates from Facebook Graph API

This file helps extract Facebook Audience Estimates for any geographic location, specified as coordinates plus a radius. It can be used, for instance, to map Facebook usage across a country, a continent, or the world.
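Mapping usage across a whole country means covering it with many coordinate-plus-radius queries. One possible layout (a sketch, not code from this repository) is a square grid of circle centres over a bounding box. The helper `grid_centres` and the constant `KM_PER_DEG_LAT` are hypothetical, and the degree-to-kilometre conversion is an approximation. Spacing centres 2 × radius apart makes neighbouring circles touch without overlapping, so no user is counted twice, at the cost of leaving small uncovered gaps between circles.

```python
import math

# Approximate length of one degree of latitude in kilometres (assumption)
KM_PER_DEG_LAT = 111.32


def grid_centres(lat_min, lat_max, lon_min, lon_max, radius_km):
    """Return (lat, lon) centres of circles of radius_km on a square grid
    with spacing 2 * radius_km, so neighbouring circles touch but do not
    overlap. Only centres that fall inside the bounding box are returned."""
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    step_km = 2 * radius_km
    lat_step = step_km / KM_PER_DEG_LAT
    centres = []
    lat = lat_min + lat_step / 2
    while lat <= lat_max:
        # A degree of longitude gets shorter with cos(latitude)
        lon_step = step_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
        lon = lon_min + lon_step / 2
        while lon <= lon_max:
            centres.append((round(lat, 6), round(lon, 6)))
            lon += lon_step
        lat += lat_step
    return centres
```

For example, `grid_centres(0.0, 1.0, 0.0, 1.0, radius_km=5)` covers a one-degree box on the equator with 121 circles of 5 km radius. Overlapping circles (spacing below 2 × radius) would close the gaps but double-count users in the overlaps.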

The great majority of the code in this repository is borrowed from [Matheus Araujo](https://github.com/maraujo) (and was later updated by [Joao Palotti](https://github.com/joaopalotti)). The main addition in this repository is precise geographic targeting for audience estimates: I built a simple way to inject coordinates and a radius in order to obtain audience estimates for that area.
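Concretely, each area is passed to the Marketing API as a `custom_locations` entry. The sketch below builds one such entry as a plain dictionary. The field names copy the structure printed by `jsonfy` further down in this notebook; the helper name `custom_location` and its range checks are hypothetical additions.

```python
def custom_location(lat, lon, radius_km, location_types=("home", "recent")):
    """Build one custom_locations entry: a circle of radius_km around (lat, lon)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"invalid coordinates: ({lat}, {lon})")
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    return {
        "name": "custom_locations",
        "values": [{
            "latitude": lat,
            "longitude": lon,
            "radius": radius_km,
            "distance_unit": "kilometer",
        }],
        # 'home' and 'recent' are the location types seen in this notebook's output
        "location_types": list(location_types),
    }
```

Calling `custom_location(13.1183798056, 79.8039165532, 5)` reproduces the first location in the printed India input.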

### Prepare and Test the Collection Module

In [4]:
import os
os.chdir("pySocialWatcher/")

from pysocialwatcher import watcherAPI

# Prepare module
watcher = watcherAPI(api_version="11.0", sleep_time=3, save_every_x=3000, outputname="../../Data Exports/Facebook/my_collection_output.csv.gz")
watcher.load_credentials_file("pysocialwatcher/credentials copy.csv")
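`sleep_time=3` presumably sets the pause, in seconds, between consecutive Marketing API requests, which helps avoid rate limiting. The general pattern is sketched below with a generic `fetch` callable. `throttled_map`, its retry loop and the exponential back-off are illustrative assumptions, not pySocialWatcher's actual implementation.

```python
import time


def throttled_map(fetch, items, sleep_time=3.0, max_retries=3):
    """Call fetch(item) for each item, pausing sleep_time seconds between
    calls and retrying a failed call with exponential back-off."""
    results = []
    for item in items:
        for attempt in range(max_retries + 1):
            try:
                results.append(fetch(item))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last retry
                time.sleep(sleep_time * 2 ** attempt)  # wait longer after each failure
        time.sleep(sleep_time)  # fixed pause before the next request
    return results
```

A larger `sleep_time` makes a long collection slower but less likely to hit API limits.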

In [5]:
# Run Test data collection
watcher.run_data_collection("input/test_hamburg_15k_radius.json", remove_tmp_files=True)

2021-08-03 19:15:16 Jonathans-MacBook-Pro-4.local root[4306] INFO Building Collection Dataframe
2021-08-03 19:15:16 Jonathans-MacBook-Pro-4.local root[4306] INFO Total API Requests:2
2021-08-03 19:15:16 Jonathans-MacBook-Pro-4.local root[4306] INFO Completed: 0.00
2021-08-03 19:15:16 Jonathans-MacBook-Pro-4.local root[4306] INFO Completed: 50.00
2021-08-03 19:15:16 Jonathans-MacBook-Pro-4.local root[4306] INFO Saving Skeleton file: dataframe_skeleton_1628014513.csv.gz
2021-08-03 19:15:16 Jonathans-MacBook-Pro-4.local root[4306] INFO Collecting... Completed: 0.00% , 0/2


Reading Json Done.
Input Json Valid.
Collection Dataframe built.


2021-08-03 19:15:17 Jonathans-MacBook-Pro-4.local root[4306] INFO Collecting... Completed: 50.00% , 1/2
2021-08-03 19:15:20 Jonathans-MacBook-Pro-4.local root[4306] INFO Data Collection Complete
2021-08-03 19:15:20 Jonathans-MacBook-Pro-4.local root[4306] INFO Saving temporary file: dataframe_collecting_1628014513.csv.gz
2021-08-03 19:15:20 Jonathans-MacBook-Pro-4.local root[4306] INFO Computing Audience and DAU column
2021-08-03 19:15:20 Jonathans-MacBook-Pro-4.local root[4306] INFO Saving after collecting file: ../../Data Exports/Facebook/my_collection_output.csv.gz


Collection Done.


Unnamed: 0,name,interests,ages_ranges,genders,behavior,scholarities,languages,family_statuses,relationship_statuses,geo_locations,household_composition,all_fields,targeting,response,dau_audience,mau_audience,timestamp,publisher_platforms,mock_response
0,Location_coordinate_test,,{'min': 13},0,,,,,,"{'name': 'custom_locations', 'values': [{'lati...",,"((ages_ranges, {'min': 13}), (genders, 0), (ge...",{'geo_locations': {'custom_locations': [{'lati...,"b'{""data"":[{""daily_outcomes_curve"":[{""spend"":0...",563619,890000,1628014513,"[""facebook""]",False
1,Location_coordinate_test,,{'min': 13},1,,,,,,"{'name': 'custom_locations', 'values': [{'lati...",,"((ages_ranges, {'min': 13}), (genders, 1), (ge...",{'geo_locations': {'custom_locations': [{'lati...,"b'{""data"":[{""daily_outcomes_curve"":[{""spend"":0...",284109,450000,1628014513,"[""facebook""]",False
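The collected estimates are written to the gzipped CSV given as `outputname`. A minimal way to reload the relevant columns without pySocialWatcher is sketched below. It assumes the file is a plain gzip-compressed CSV with the `name`, `genders`, `dau_audience` and `mau_audience` columns shown above; `load_audiences` and `gender_share` are hypothetical helpers.

```python
import csv
import gzip


def load_audiences(path):
    """Read a gzipped CSV with pySocialWatcher-style columns and keep the
    name, gender code and DAU/MAU estimates (as integers)."""
    with gzip.open(path, mode="rt", newline="") as fh:
        return [
            {
                "name": row["name"],
                "genders": int(row["genders"]),
                # int(float(...)) also copes with values written as "890000.0"
                "dau": int(float(row["dau_audience"])),
                "mau": int(float(row["mau_audience"])),
            }
            for row in csv.DictReader(fh)
        ]


def gender_share(rows, code):
    """MAU for one gender code divided by MAU for code 0 (all genders)."""
    mau = {r["genders"]: r["mau"] for r in rows}
    return mau[code] / mau[0]
```

With the two rows of the test run above (MAU of 890,000 for code 0 and 450,000 for code 1, which denotes men in the Marketing API), `gender_share(rows, 1)` is 450000 / 890000 ≈ 0.506.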


In [6]:
# Look up the IDs of candidate interests and behaviours

# Interest ID Search
#watcher.print_search_targeting_from_query_dataframe("Water")

# Behaviour ID Search
watcher.print_behaviors_list()

+-----+----+------+------+------+-------------+---------------+-------------------+
|     | id | name | type | path | description | audience_size | real_time_cluster |
|-----+----+------+-----
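`print_behaviors_list` prints a very wide table. Once the relevant IDs are known, keeping them in a small dictionary makes them easy to search offline. The sketch below is a hypothetical helper, `search_behaviours`, over a subset of the IDs listed in the comments of the data collection cell. Behaviour IDs are defined by Facebook and may change, so it is worth re-checking them with `print_behaviors_list` before a collection.

```python
# Subset of the behaviour IDs listed in the data collection cell of this
# notebook; the names are the notebook's own descriptions.
BEHAVIOURS = {
    6004386044572: "Android Device",
    6004385895772: "Windows Phone",
    6004384041172: "iOS",
    6075237200983: "Samsung Galaxy S8",
    6075237226583: "Samsung Galaxy S8+",
    6106223987983: "Samsung Galaxy S9",
    6106224431383: "Samsung Galaxy S9+",
    6015235495383: "Wi-Fi",
    6017253486583: "2G",
    6017253511583: "3G",
    6017253531383: "4G",
}


def search_behaviours(keyword, behaviours=BEHAVIOURS):
    """Return (id, name) pairs whose name contains keyword, case-insensitively."""
    kw = keyword.lower()
    return sorted((bid, name) for bid, name in behaviours.items() if kw in name.lower())
```

For instance, `search_behaviours("galaxy")` returns the four Samsung Galaxy IDs, sorted by ID.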

### Data Collection

The following cell builds one large JSON input file that drives the collection of all Facebook Audience Estimates listed in the Appendix of the paper. Collecting specific variables may require a different configuration of the code below; it is merely a starting point that shows future researchers which variables I used.
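For orientation, the input file is plain JSON. The hand-written dictionary below reproduces part of the structure that is printed after `jsonfy` at the end of the next cell: one location, the age range, the gender code and the operating-system behaviour group. It assumes the printed structure is what ends up in the file, and it omits the scholarity settings and the remaining behaviour groups because the printed output is truncated. `behaviour_list` is a hypothetical helper, not part of `pysocialwatcher.json_builder`.

```python
import json


def behaviour_list(name, ids):
    """One 'or' list: users matching any of the given behaviour IDs."""
    return {"name": name, "or": list(ids)}


spec = {
    "name": "India",
    "geo_locations": [{
        "name": "custom_locations",
        "values": [{"latitude": 13.1183798056, "longitude": 79.8039165532,
                    "radius": 5, "distance_unit": "kilometer"}],
        "location_types": ["home", "recent"],
    }],
    "ages_ranges": [{"min": 18}],
    "genders": [0],  # 0 = all genders combined, as in the printed output
    "behavior": {
        "operating_systems": [
            behaviour_list("ios", [6004384041172]),
            behaviour_list("android", [6004386044572]),
            behaviour_list("windows", [6004385895772]),
        ],
    },
}

text = json.dumps(spec, indent=2)
print(text)
```

Comparing such a hand-built dictionary with the builder's printed output is a quick way to check that the builder produces what you intended.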

In [7]:
import pandas as pd
from pysocialwatcher.utils import double_country_conversion
from pysocialwatcher.json_builder import JSONBuilder, BehaviorGroup, BehaviorList, Behavior, BehaviorGroups
from pysocialwatcher.json_builder import AgeList, Age, Genders, Location, LocationList, Scholarity, ScholarityList

# Locations
custom_location_df_DHS_IND = pd.read_csv("Input/custom_location_df_DHS_IND.csv")
custom_location_df_DHS_IND = custom_location_df_DHS_IND[1:3]  # keep only rows 1-2 (a small subset); remove this line to collect every location
loclist = LocationList()
loclist.get_location_list_from_df(custom_location_df_DHS_IND)

# Ages
ageList = AgeList()
ageList.add(Age(18, None))  # Baseline: ages 18+
#ageList.add(Age(13, 34))
#ageList.add(Age(35, 54))
#ageList.add(Age(55, None))

# Gender
genders = Genders(male=False, female=False, combined=True)

# Education
education = ScholarityList()
education.add(Scholarity.from_pre_defined_list("nodegree"))
education.add(Scholarity.from_pre_defined_list("highschool"))
education.add(Scholarity.from_pre_defined_list("graduated"))

# Digital behaviour (ID - Description)

# 6004386044572 - Android Device
# 6004385895772 - Windows Phone
# 6004384041172 - iOS

# 6004382299972 - All mobile devices
# 6004383049972 - Smartphones and Tablets
# 6016286626383 - Tablet
# 6004383149972 - Feature phones
# 6023460590583 - Cherry Mobile
# 6056265200983 - Oppo
# 6056265212183 - Vivo
# 6011390261383 - Huawei

# 6075237200983 - Samsung Galaxy S8
# 6075237226583 - Samsung Galaxy S8+
# 6106223987983 - Samsung Galaxy S9
# 6106224431383 - Samsung Galaxy S9+
# 6092512412783 - iPhone 8
# 6092512424583 - iPhone 8+
# 6092512462983 - iPhone X (newer models were too sparsely used)

# 6015235495383 - Wi-Fi
# 6017253486583 - 2G
# 6017253511583 - 3G
# 6017253531383 - 4G

# Set up behaviour group
bgrp_os = BehaviorGroup("operating_systems")

b_ios = BehaviorList(list_name="ios", operator="or")
b_ios.add(Behavior(6004384041172)) # iOS

b_android = BehaviorList(list_name="android", operator="or")
b_android.add(Behavior(6004386044572)) # Android

b_windows = BehaviorList(list_name="windows", operator="or")
b_windows.add(Behavior(6004385895772)) # Windows Phone

# Add to behaviour group
bgrp_os.add(b_ios)
bgrp_os.add(b_android)
bgrp_os.add(b_windows)

##########
# Set up behaviour group
bgrp_device = BehaviorGroup("devices")

b_devices = BehaviorList(list_name="moliledevices", operator="or")
b_devices.add(Behavior(6004382299972)) # All mobile devices

b_smartphones = BehaviorList(list_name="smartphonestables", operator="or")
b_smartphones.add(Behavior(6004383049972)) # Smartphones and Tablets

b_tablets = BehaviorList(list_name="table", operator="or")
b_tablets.add(Behavior(6016286626383)) # Tablet

b_feature = BehaviorList(list_name="feature", operator="or")
b_feature.add(Behavior(6004383149972)) # feature phones

b_cherry = BehaviorList(list_name="Cherry", operator="or")
b_cherry.add(Behavior(6023460590583)) # Cherry Mobile

b_oppo = BehaviorList(list_name="Oppo", operator="or")
b_oppo.add(Behavior(6056265200983)) # Oppo

b_vivo = BehaviorList(list_name="Vivo", operator="or")
b_vivo.add(Behavior(6056265212183)) # Vivo

b_oppovivocherry = BehaviorList(list_name="Oppovivocherry", operator="or")
b_oppovivocherry.add(Behavior(6056265212183)) # Vivo
b_oppovivocherry.add(Behavior(6056265200983)) # Oppo
b_oppovivocherry.add(Behavior(6023460590583)) # Cherry Mobile

b_Huawai = BehaviorList(list_name="Huawai", operator="or")
b_Huawai.add(Behavior(6011390261383)) # Huawei

# Add to behaviour group
bgrp_device.add(b_devices)
bgrp_device.add(b_smartphones)
bgrp_device.add(b_tablets)
bgrp_device.add(b_feature)
bgrp_device.add(b_cherry)
bgrp_device.add(b_oppo)
bgrp_device.add(b_vivo)
bgrp_device.add(b_oppovivocherry)
bgrp_device.add(b_Huawai)


##########
# Set up behaviour group
bgrp_hedevice = BehaviorGroup("highenddevices")

b_hedevices = BehaviorList(list_name="highenddevices", operator="or")
b_hedevices.add(Behavior(6075237200983)) # Samsung Galaxy S8
b_hedevices.add(Behavior(6075237226583)) # Samsung Galaxy S8+
b_hedevices.add(Behavior(6106223987983)) # Samsung Galaxy S9
b_hedevices.add(Behavior(6106224431383)) # Samsung Galaxy S9+
b_hedevices.add(Behavior(6092512412783)) # iPhone 8
b_hedevices.add(Behavior(6092512424583)) # iPhone 8+
b_hedevices.add(Behavior(6092512462983)) # iPhone X (newer models were too sparsely used)

b_Galaxy = BehaviorList(list_name="Galaxy", operator="or")
b_Galaxy.add(Behavior(6075237200983)) # Samsung Galaxy S8
b_Galaxy.add(Behavior(6075237226583)) # Samsung Galaxy S8+
b_Galaxy.add(Behavior(6106223987983)) # Samsung Galaxy S9
b_Galaxy.add(Behavior(6106224431383)) # Samsung Galaxy S9+
b_Apple = BehaviorList(list_name="Apple", operator="or")
b_Apple.add(Behavior(6092512412783)) # iPhone 8
b_Apple.add(Behavior(6092512424583)) # iPhone 8+
b_Apple.add(Behavior(6092512462983)) # iPhone X (newer models were too sparsely used)

# Add to behaviour group
bgrp_hedevice.add(b_hedevices)
bgrp_hedevice.add(b_Galaxy)
bgrp_hedevice.add(b_Apple)

##########
# Set up behaviour group
bgrp_connection = BehaviorGroup("connection")

b_Wifi = BehaviorList(list_name="Wifi", operator="or")
b_Wifi.add(Behavior(6015235495383)) # Wifi

b_2G = BehaviorList(list_name="2G", operator="or")
b_2G.add(Behavior(6017253486583)) # 2G

b_3G = BehaviorList(list_name="3G", operator="or")
b_3G.add(Behavior(6017253511583)) # 3G

b_4G = BehaviorList(list_name="4G", operator="or")
b_4G.add(Behavior(6017253531383)) # 4G

# Add to behaviour group
bgrp_connection.add(b_Wifi)
bgrp_connection.add(b_2G)
bgrp_connection.add(b_3G)
bgrp_connection.add(b_4G)


# Add all groups into one object
bgrps = BehaviorGroups()
bgrps.add(bgrp_os)
bgrps.add(bgrp_device)
bgrps.add(bgrp_hedevice)
bgrps.add(bgrp_connection)


jsonb = JSONBuilder(name="India", age_list=ageList, location_list=loclist, genders=genders,
                    scholarities=education, behavior_groups=bgrps)

jsonb.jsonfy("Input/India_Data_Collection.json")

Created file Input/India_Data_Collection.json.


{'name': 'India',
 'geo_locations': [{'name': 'custom_locations',
   'values': [{'latitude': 13.1183798056,
     'longitude': 79.8039165532,
     'radius': 5,
     'distance_unit': 'kilometer'}],
   'location_types': ['home', 'recent']},
  {'name': 'custom_locations',
   'values': [{'latitude': 13.1470015741,
     'longitude': 79.8047549042,
     'radius': 5,
     'distance_unit': 'kilometer'}],
   'location_types': ['home', 'recent']}],
 'ages_ranges': [{'min': 18}],
 'genders': [0],
 'behavior': {'operating_systems': [{'name': 'ios', 'or': [6004384041172]},
   {'name': 'android', 'or': [6004386044572]},
   {'name': 'windows', 'or': [6004385895772]}],
  'devices': [{'name': 'moliledevices', 'or': [6004382299972]},
   {'name': 'smartphonestables', 'or': [6004383049972]},
   {'name': 'table', 'or': [6016286626383]},
   {'name': 'feature', 'or': [6004383149972]},
   {'name': 'Cherry', 'or': [6023460590583]},
   {'name': 'Oppo', 'or': [6056265200983]},
   {'name': 'Vivo', 'or': [605626521
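Before launching a large collection it helps to estimate how many API requests it will need. The sketch below assumes one request per combination of targeting options, i.e. the Cartesian product of the option lists; this matches the test run above (one location × one age range × genders 0 and 1 = the logged "Total API Requests:2"), but how pySocialWatcher combines behaviour groups internally is an assumption. `enumerate_requests` and `count_requests` are hypothetical helpers.

```python
from itertools import product
from math import prod


def enumerate_requests(**dimensions):
    """Yield one targeting combination (a dict) per element of the
    Cartesian product of the option lists passed as keyword arguments."""
    keys = list(dimensions)
    for values in product(*(dimensions[k] for k in keys)):
        yield dict(zip(keys, values))


def count_requests(**dimensions):
    """Size of that Cartesian product, without materialising it."""
    return prod(len(options) for options in dimensions.values())
```

Under the same assumption, the India specification (2 locations, 1 age range, 1 gender code, 3 scholarities and 3 + 9 + 3 + 4 = 19 behaviour lists) would need `count_requests(loc=range(2), age=range(1), gender=range(1), school=range(3), behaviour=range(19))` = 114 requests; compare this with the "Total API Requests" line that pySocialWatcher logs.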

In [None]:
watcher = watcherAPI(api_version="11.0", sleep_time=3, outputname="../../Data Exports/Facebook/my_collection_output.csv.gz") 
watcher.load_credentials_file("pysocialwatcher/credentials copy.csv")
df = watcher.run_data_collection("Input/India_Data_Collection.json", remove_tmp_files=True)

In [None]:
df.to_csv("../../Data Exports/Facebook/India_Data_Collection.csv", index=False)  # index=False avoids an extra unnamed index column on reload
df
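The results contain both the combined `Oppovivocherry` list and the separate `Oppo`, `Vivo` and `Cherry` lists. Assuming an `or` list reaches everyone who matches at least one of its IDs (the union of the per-ID audiences), its estimate will generally be smaller than the sum of the individual estimates, because some users can match more than one ID. The toy sets below use made-up user IDs, not Facebook data, purely to illustrate the set logic; `or_audience` is a hypothetical helper.

```python
def or_audience(*audiences):
    """Users matched by an 'or' list: anyone in at least one per-ID audience."""
    matched = set()
    for audience in audiences:
        matched |= audience
    return matched


# Made-up user IDs for illustration only (not Facebook data)
oppo, vivo, cherry = {1, 2, 3}, {3, 4}, {5}
combined = or_audience(oppo, vivo, cherry)
naive_sum = len(oppo) + len(vivo) + len(cherry)
print(len(combined), naive_sum)  # 5 6
```

Here user 3 matches both Oppo and Vivo, so the union has five users while the naive sum counts six.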