# Introduction
This document describes the steps taken to prepare the Facebook / Instagram data set (containing information on gender and age) for analysis.

The data set comes from an online, health-related field experiment by the AFIP Foundation that ran from May 23 to June 23, 2023. Twelve different advertisements were distributed on Facebook and Instagram and directed people to the OHC of the AFIP Foundation, on which different levels of web behavior and engagement were measured. The data originate from Facebook and Instagram and contain information on the gender and age of users and their behavior towards the advertisements of the experiment. The code below cleans the data set and prepares it for analysis.


In [None]:
import pandas as pd
!pip install pyreadstat
import pyreadstat
import re

# Read the Excel file (replace 'YourFile' with the path to your own file)
df = pd.read_excel('YourFile')

In [None]:
# Work on a copy so that the raw dataframe stays untouched
final_data = df.copy()
# Check that the file was read into the dataframe correctly
final_data.head()

# Creating dummy variables

In this data set, multiple dummy variables are created. This section will explain the definition of the dummy variables, followed by the code.

**Communication Concept Dummies**

Dummies are assigned based on the Ad set name that was given to the advertisement. The experiment contains 12 different variants. The concepts (i.e., concepts 1 to 12) differ in emotion, topic, appeal and linguistic style.

*   Concept 1: contains 'Concept 1' in Ad set name
*   Concept 2: contains 'Concept 2' in Ad set name
*   Concept 3: contains 'Concept 3' in Ad set name
*   Concept 4: contains 'Concept 4' in Ad set name
*   Concept 5: contains 'Concept 5' in Ad set name
*   Concept 6: contains 'Concept 6' in Ad set name
*   Concept 7: contains 'Concept 7' in Ad set name
*   Concept 8: contains 'Concept 8' in Ad set name
*   Concept 9: contains 'Concept 9' in Ad set name
*   Concept 10: contains 'Concept 10' in Ad set name
*   Concept 11: contains 'Concept 11' in Ad set name
*   Concept 12: contains 'Concept 12' in Ad set name
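
One detail matters when matching on these labels: a plain substring test for 'Concept 1' is also true for names that contain 'Concept 10', 'Concept 11' or 'Concept 12'. The sketch below shows one way to read the number unambiguously. It assumes that every Ad set name embeds the label as `Concept <number>`; the helper `concept_number` and the example names are hypothetical.

```python
import re

def concept_number(ad_set_name):
    """Return the concept number found in an ad set name, or 0 if none is found."""
    match = re.search(r'Concept (\d+)', ad_set_name)
    return int(match.group(1)) if match else 0

# Hypothetical ad set names, for illustration only
print('Concept 1' in 'Concept 10 - Instagram')   # True: the substring test is ambiguous
print(concept_number('Concept 10 - Instagram'))  # 10
print(concept_number('Concept 1 - Facebook'))    # 1
print(concept_number('Other campaign'))          # 0
```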






In [None]:
def generate_content_dummies(ad_set_name):
    content_dummies = {
        'Emotion_Love':0,
        'Emotion_Fear': 0,
        'Topic_Sprotection':0,
        'Topic_Affiliation': 0,
        'Topic_Kincare': 0,
        'Appeal_Exp':0,
        'Appeal_Testi': 0,
        'Appeal_Infor':0,
        'Appeal_Pers': 0,
        'LStyle_Fperson':0,
        'LStyle_Tperson': 0,
    }


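    # (?!\d) prevents 'Concept 1' from also matching 'Concept 10', 'Concept 11' and 'Concept 12'
    if re.search(r'Concept 1(?!\d)', ad_set_name):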
        content_dummies['Emotion_Love'] = 1
        content_dummies['Topic_Kincare'] = 1
        content_dummies['Appeal_Testi'] = 1
        content_dummies['Appeal_Pers'] = 1
        content_dummies['LStyle_Fperson'] = 1
    elif 'Concept 2' in ad_set_name:
        content_dummies['Emotion_Fear'] = 1
        content_dummies['Topic_Sprotection'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Infor'] = 1
        content_dummies['LStyle_Tperson'] = 1
    elif 'Concept 3' in ad_set_name:
        content_dummies['Emotion_Fear'] = 1
        content_dummies['Topic_Affiliation'] = 1
        content_dummies['Appeal_Testi'] = 1
        content_dummies['Appeal_Pers'] = 1
        content_dummies['LStyle_Tperson'] = 1
    elif 'Concept 4' in ad_set_name:
        content_dummies['Emotion_Love'] = 1
        content_dummies['Topic_Sprotection'] = 1
        content_dummies['Appeal_Testi'] = 1
        content_dummies['Appeal_Infor'] = 1
        content_dummies['LStyle_Fperson'] = 1
    elif 'Concept 5' in ad_set_name:
        content_dummies['Emotion_Fear'] = 1
        content_dummies['Topic_Kincare'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Infor'] = 1
        content_dummies['LStyle_Fperson'] = 1
    elif 'Concept 6' in ad_set_name:
        content_dummies['Emotion_Love'] = 1
        content_dummies['Topic_Affiliation'] = 1
        content_dummies['Appeal_Testi'] = 1
        content_dummies['Appeal_Infor'] = 1
        content_dummies['LStyle_Tperson'] = 1
    elif 'Concept 7' in ad_set_name:
        content_dummies['Emotion_Fear'] = 1
        content_dummies['Topic_Kincare'] = 1
        content_dummies['Appeal_Testi'] = 1
        content_dummies['Appeal_Pers'] = 1
        content_dummies['LStyle_Tperson'] = 1
    elif 'Concept 8' in ad_set_name:
        content_dummies['Emotion_Love'] = 1
        content_dummies['Topic_Affiliation'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Infor'] = 1
        content_dummies['LStyle_Fperson'] = 1
    elif 'Concept 9' in ad_set_name:
        content_dummies['Emotion_Fear'] = 1
        content_dummies['Topic_Sprotection'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Testi'] = 1
        content_dummies['Appeal_Pers'] = 1
        content_dummies['LStyle_Fperson'] = 1
    elif 'Concept 10' in ad_set_name:
        content_dummies['Emotion_Love'] = 1
        content_dummies['Topic_Kincare'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Infor'] = 1
        content_dummies['LStyle_Tperson'] = 1
    elif 'Concept 11' in ad_set_name:
        content_dummies['Emotion_Love'] = 1
        content_dummies['Topic_Affiliation'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Pers'] = 1
        content_dummies['LStyle_Tperson'] = 1
    elif 'Concept 12' in ad_set_name:
        content_dummies['Emotion_Fear'] = 1
        content_dummies['Topic_Sprotection'] = 1
        content_dummies['Appeal_Exp'] = 1
        content_dummies['Appeal_Pers'] = 1
        content_dummies['LStyle_Fperson'] = 1

    return content_dummies
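
In the mapping above, every concept sets exactly one emotion, one topic and one linguistic style, while several appeals can be active at once. A quick consistency check along those lines could look like the sketch below. The helper `count_active` is hypothetical, and the dictionary simply repeats the values of the Concept 1 branch.

```python
def count_active(dummies, prefix):
    """Count how many dummies whose name starts with `prefix` are set to 1."""
    return sum(value for name, value in dummies.items() if name.startswith(prefix))

# Dummy values copied from the Concept 1 branch of the mapping
concept_1 = {
    'Emotion_Love': 1, 'Emotion_Fear': 0,
    'Topic_Sprotection': 0, 'Topic_Affiliation': 0, 'Topic_Kincare': 1,
    'Appeal_Exp': 0, 'Appeal_Testi': 1, 'Appeal_Infor': 0, 'Appeal_Pers': 1,
    'LStyle_Fperson': 1, 'LStyle_Tperson': 0,
}

# Exactly one emotion, one topic and one linguistic style should be active
for prefix in ('Emotion_', 'Topic_', 'LStyle_'):
    assert count_active(concept_1, prefix) == 1, prefix

print(count_active(concept_1, 'Appeal_'))  # 2
```

Running the same check on the output of `generate_content_dummies` for each of the 12 labels would flag a branch that sets too many or too few dummies.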

**Concept label dummies**

This function returns the concept number (1 to 12) found in the Ad set name, or 0 if no concept matches. The result is stored in a variable named 'concept'.

In [None]:
def generate_concept(ad_set_name):

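    # (?!\d) prevents 'Concept 1' from also matching 'Concept 10', 'Concept 11' and 'Concept 12'
    if re.search(r'Concept 1(?!\d)', ad_set_name):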
        return 1
    elif 'Concept 2' in ad_set_name:
        return 2
    elif 'Concept 3' in ad_set_name:
        return 3
    elif 'Concept 4' in ad_set_name:
        return 4
    elif 'Concept 5' in ad_set_name:
        return 5
    elif 'Concept 6' in ad_set_name:
        return 6
    elif 'Concept 7' in ad_set_name:
        return 7
    elif 'Concept 8' in ad_set_name:
        return 8
    elif 'Concept 9' in ad_set_name:
        return 9
    elif 'Concept 10' in ad_set_name:
        return 10
    elif 'Concept 11' in ad_set_name:
        return 11
    elif 'Concept 12' in ad_set_name:
        return 12
    else:
        return 0


The next step is to apply the results from the functions to the dataframe.

In [None]:
final_data['concept'] = final_data['Ad set name'].apply(generate_concept)

# Creating a new dataframe with the dummy variables
dummy_df = final_data['Ad set name'].apply(generate_content_dummies).apply(pd.Series)
final_data = pd.concat([final_data, dummy_df], axis=1)
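
To make the expansion step concrete, here is a small sketch on made-up data. The function `toy_dummies` and the column names `Has_Love` and `Has_Fear` are hypothetical stand-ins for `generate_content_dummies` and its output. Building the dummy dataframe from a list of dictionaries is an alternative to `.apply(pd.Series)` that tends to be faster on larger data sets.

```python
import pandas as pd

def toy_dummies(name):
    # Hypothetical stand-in for generate_content_dummies
    return {'Has_Love': int('love' in name), 'Has_Fear': int('fear' in name)}

toy = pd.DataFrame({'Ad set name': ['love ad', 'fear ad']})

# One dictionary per row, turned into one column per key
records = toy['Ad set name'].apply(toy_dummies).tolist()
dummy_df = pd.DataFrame(records, index=toy.index)

# Attach the dummy columns next to the original columns
result = pd.concat([toy, dummy_df], axis=1)
print(result)
```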

**Other dummies**

*Gender*

The data set contains male, female and unknown genders. These are coded with three dummies (Gender_Male, Gender_Female, Gender_Unknown) as follows:


*   100 = Male
*   010 = Female
*   001 = Unknown
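
As an illustration of this coding scheme, the sketch below builds the three dummies on made-up data with `pd.get_dummies`. This is an alternative to comparing each value by hand, not the method used in this notebook, and the resulting column names are lowercase (`Gender_male`), unlike the names used elsewhere in this document.

```python
import pandas as pd

toy = pd.DataFrame({'Gender': ['male', 'female', 'unknown', 'female']})

# Fix the category order so the columns always come out as male, female, unknown
levels = pd.CategoricalDtype(['male', 'female', 'unknown'])
dummies = pd.get_dummies(toy['Gender'].astype(levels), prefix='Gender', dtype=int)

print(dummies)
```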










In [None]:
#Defining the dummies for gender
final_data['Gender_Male'] = (final_data['Gender'] == 'male').astype(int)
final_data['Gender_Female'] = (final_data['Gender'] == 'female').astype(int)
final_data['Gender_Unknown'] = (final_data['Gender'] == 'unknown').astype(int)

In [None]:
#Defining dummies for age
final_data['Age_13-17'] = (final_data['Age'] == '13-17').astype(int)
final_data['Age_18-24'] = (final_data['Age'] == '18-24').astype(int)
final_data['Age_25-34'] = (final_data['Age'] == '25-34').astype(int)
final_data['Age_35-44'] = (final_data['Age'] == '35-44').astype(int)
final_data['Age_45-54'] = (final_data['Age'] == '45-54').astype(int)
final_data['Age_55-64'] = (final_data['Age'] == '55-64').astype(int)
final_data['Age_65+'] = (final_data['Age'] == '65+').astype(int)
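
Because each age dummy repeats the bracket label twice, a loop over the bracket labels is less prone to copy-and-paste slips. The sketch below uses made-up data and also checks that every row falls into exactly one bracket.

```python
import pandas as pd

age_brackets = ['13-17', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']
toy = pd.DataFrame({'Age': ['25-34', '45-54', '65+']})

# One dummy per bracket, named after the bracket itself
for bracket in age_brackets:
    toy[f'Age_{bracket}'] = (toy['Age'] == bracket).astype(int)

# Each row should have exactly one age dummy set to 1
row_sums = toy[[f'Age_{b}' for b in age_brackets]].sum(axis=1)
print(row_sums.tolist())  # [1, 1, 1]
```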


# Replacing missing values
Some variables contain blanks. Blanks in numeric variables are replaced with 0.

In [None]:
# Count variables: blanks and non-numeric entries become 0, stored as integers
count_columns = [
    'Reach', 'Link clicks', 'Follows or likes',
    'Post comments', 'Post reactions', 'Post shares',
]
# Cost and rate variables: kept as floats, because astype(int) would truncate the decimals
decimal_columns = [
    'Amount spent (EUR)', 'CPM (cost per 1,000 impressions)', 'Frequency',
    'CTR (all)', 'CPC (all)', 'CPC (cost per link click)',
    'CTR (link click-through rate)',
]

for column in count_columns + decimal_columns:
    final_data[column] = pd.to_numeric(final_data[column], errors='coerce').fillna(0)
final_data[count_columns] = final_data[count_columns].astype(int)
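
To see what each step of this conversion does, here is a small sketch on made-up values. `errors='coerce'` turns anything non-numeric into a missing value, `fillna(0)` replaces missing values with 0, and a cast to `int` drops the decimals, which matters for cost and rate variables.

```python
import pandas as pd

# Made-up raw values: a decimal, a blank, a non-numeric entry and a whole number
raw = pd.Series(['1.75', None, 'n/a', '3'])

as_float = pd.to_numeric(raw, errors='coerce').fillna(0)
as_int = as_float.astype(int)

print(as_float.tolist())  # [1.75, 0.0, 0.0, 3.0]
print(as_int.tolist())    # [1, 0, 0, 3]
```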

In [None]:
# Rename variables so that they are suitable for SPSS (no spaces or punctuation).
# Note: the column selection further down uses these cleaned names, so run this cell even if you only export to CSV.

def clean_variable_name(name):
    cleaned_name = re.sub(r'[^\w\s]', '', name).replace(' ', '_')
    return cleaned_name

columns_to_rename = [
    'Ad set name', 'concept',
    'Emotion_Love', 'Emotion_Fear', 'Topic_Sprotection', 'Topic_Affiliation', 'Topic_Kincare', 'Appeal_Exp',
    'Appeal_Testi', 'Appeal_Infor', 'Appeal_Pers', 'LStyle_Fperson', 'LStyle_Tperson',
    'Age', 'Age_13-17', 'Age_18-24', 'Age_25-34', 'Age_35-44', 'Age_45-54', 'Age_55-64', 'Age_65+',
    'Gender', 'Gender_Male', 'Gender_Female', 'Gender_Unknown',
    'Amount spent (EUR)',
    'Impressions',
    'CPM (cost per 1,000 impressions)',
    'Reach',
    'Frequency',
    'CTR (all)',
    'Link clicks',
    'CPC (all)',
    'CPC (cost per link click)',
    'Follows or likes',
    'Post comments',
    'Post reactions',
    'Post shares',
    'CTR (link click-through rate)'
]

renamed_columns = {column: clean_variable_name(column) for column in columns_to_rename}
final_data.rename(columns=renamed_columns, inplace=True)
print(renamed_columns)
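
The effect of the renaming rule is easiest to see on a few of the column names. The sketch below repeats the `clean_variable_name` function so that it runs on its own, and it also checks that no two cleaned names collide.

```python
import re

def clean_variable_name(name):
    # Same rule as in the cell above: drop punctuation, then replace spaces with underscores
    return re.sub(r'[^\w\s]', '', name).replace(' ', '_')

examples = [
    'CPM (cost per 1,000 impressions)',
    'CTR (link click-through rate)',
    'Age_13-17',
    'Age_65+',
]
cleaned = [clean_variable_name(name) for name in examples]
print(cleaned)
# ['CPM_cost_per_1000_impressions', 'CTR_link_clickthrough_rate', 'Age_1317', 'Age_65']

# Cleaning must not map two different columns to the same name
assert len(set(cleaned)) == len(cleaned)
```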


# Column selection
The following columns are kept in the final data set.

In [None]:


selected_columns = [
    'Day',
    'Ad_set_name', 'concept',
    'Emotion_Love', 'Emotion_Fear', 'Topic_Sprotection', 'Topic_Affiliation', 'Topic_Kincare', 'Appeal_Exp',
    'Appeal_Testi', 'Appeal_Infor', 'Appeal_Pers', 'LStyle_Fperson', 'LStyle_Tperson',
    'Age', 'Age_1317', 'Age_1824', 'Age_2534', 'Age_3544', 'Age_4554', 'Age_5564', 'Age_65',
    'Gender', 'Gender_Male', 'Gender_Female', 'Gender_Unknown',
    'Amount_spent_EUR',
    'Impressions',
    'CPM_cost_per_1000_impressions',
    'Reach',
    'Frequency',
    'CTR_all',
    'Link_clicks',
    'CPC_all',
    'CPC_cost_per_link_click',
    'Follows_or_likes',
    'Post_comments',
    'Post_reactions',
    'Post_shares',
    'CTR_link_clickthrough_rate',
]

In [None]:
# drop=True avoids adding the old index as an extra 'index' column
final_data = final_data[selected_columns].reset_index(drop=True)
final_data.head()
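
Selecting a column that does not exist raises a `KeyError`, for example when the renaming step was skipped or a column is absent from the export. A minimal sketch of a check that can be run before the selection, using made-up data and a hypothetical list `wanted`:

```python
import pandas as pd

toy = pd.DataFrame({'Day': ['2023-05-23'], 'Reach': [100]})
wanted = ['Day', 'Reach', 'Impressions']

# List the requested columns that the dataframe does not have
missing = [column for column in wanted if column not in toy.columns]
print(missing)  # ['Impressions']

# Select only the columns that are present
present = [column for column in wanted if column in toy.columns]
subset = toy[present].reset_index(drop=True)
print(list(subset.columns))  # ['Day', 'Reach']
```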

# Saving the file to .csv or SPSS



In [None]:
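# Saving the file to .csv (replace 'YourFile' with your own file name)
final_data.to_csv('YourFile.csv', index=False)

# Saving the file to SPSS format (.sav); a separate file name keeps it from overwriting the .csv
pyreadstat.write_sav(final_data, 'YourFile.sav')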
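
After exporting, it can be worth reading the file back to confirm that nothing changed along the way. The sketch below does this for the CSV format with made-up data and an in-memory buffer instead of a file on disk; the same idea applies to a real file path.

```python
import io
import pandas as pd

toy = pd.DataFrame({'concept': [1, 2], 'CTR_all': [1.5, 0.0]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Read the CSV text back and compare it with the original
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.equals(toy))  # True
```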