### Election Ad Spending Dataset

This dataset covers advertising expenditure by political parties and other advertisers on Facebook and Instagram during the 2024 elections in India. It consists of three files, each capturing a different aspect of ad spending and voter turnout:

1. **Advertisers Data**:
   - **Page ID**: A unique identifier for the advertiser's page.
   - **Page name**: The name of the advertiser's page.
   - **Disclaimer**: Information about the advertiser, typically who paid for the ads.
   - **Amount spent (INR)**: The total amount of money spent on ads in Indian Rupees.
   - **Number of ads in Library**: The number of ads associated with the advertiser.

2. **Locations Data**:
   - **Location name**: The name of the location (a state or union territory).
   - **Amount spent (INR)**: The total amount of money spent on ads in that location in Indian Rupees.

3. **Results Data**:
   - **_id**: A unique identifier for the entry.
   - **Sl No**: Serial number.
   - **State**: The name of the state.
   - **PC_Name**: The name of the parliamentary constituency.
   - **Total Electors**: The total number of registered voters.
   - **Polled (%)**: The percentage of votes polled.
   - **Total Votes**: The total number of votes cast.
   - **Phase**: The phase of the election.

By analyzing this dataset, we can examine the spending patterns of different advertisers, the geographical distribution of ad expenditure, and how state-level spending relates to voter turnout across parliamentary constituencies. Note that the results file records electors, turnout, and votes cast per constituency, not winners, so the analysis below concerns voter engagement rather than which party won.
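Before running the analysis, it can help to confirm that each file actually exposes the columns described above. The following is a minimal sketch, not part of the original notebook: the `EXPECTED_COLUMNS` mapping and the `missing_columns` helper are hypothetical names, the column spellings are taken from the printed outputs further down, and the toy frame stands in for `locations.csv`.

```python
import pandas as pd

# Expected columns per file, spelled as in the notebook's printed outputs.
EXPECTED_COLUMNS = {
    "advertisers": ["Page ID", "Page name", "Disclaimer",
                    "Amount spent (INR)", "Number of ads in Library"],
    "locations": ["Location name", "Amount spent (INR)"],
    "results": ["_id", "Sl No", "State", "PC_Name", "Total Electors",
                "Polled (%)", "Total Votes", "Phase"],
}


def missing_columns(df: pd.DataFrame, expected: list[str]) -> list[str]:
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Toy frame standing in for locations.csv (one row, for illustration only).
toy_locations = pd.DataFrame({
    "Location name": ["Andhra Pradesh"],
    "Amount spent (INR)": [100819732],
})

print(missing_columns(toy_locations, EXPECTED_COLUMNS["locations"]))
print(missing_columns(toy_locations, EXPECTED_COLUMNS["results"]))
```

With the real data, the same helper could be called on `advertisers_df`, `locations_df`, and `results_df` right after loading.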

# Data Gathering

In [2]:
import pandas as pd

# Load the advertisers data
advertisers_df = pd.read_csv('Datasets/advertisers.csv')
print("Advertisers Data:")
print(advertisers_df.head())

# Load the locations data
locations_df = pd.read_csv('Datasets/locations.csv')
print("\nLocations Data:")
print(locations_df.head())

# Load the results data
results_df = pd.read_csv('Datasets/results.csv')
print("\nResults Data:")
print(results_df.head())

Advertisers Data:
           Page ID                     Page name  \
0  121439954563203  Bharatiya Janata Party (BJP)   
1  351616078284404      Indian National Congress   
2  132715103269897      Ama Chinha Sankha Chinha   
3  192856493908290      Ama Chinha Sankha Chinha   
4  109470364774303              Ellorum Nammudan   

                                    Disclaimer Amount spent (INR)  \
0                 Bharatiya Janata Party (BJP)          193854342   
1                     Indian National Congress          108787100   
2                     Ama Chinha Sankha Chinha           73361399   
3                     Ama Chinha Sankha Chinha           32294327   
4  Populus Empowerment Network Private Limited           22399499   

   Number of ads in Library  
0                     43455  
1                       846  
2                      1799  
3                       680  
4                       879  

Locations Data:
                 Location name  Amount spent (INR)
0  And

# Data Cleaning and Processing

In [3]:
# Check for missing values and basic statistics
print("\nAdvertisers Data Info:")
print(advertisers_df.info())
print("\nLocations Data Info:")
print(locations_df.info())
print("\nResults Data Info:")
print(results_df.info())


Advertisers Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20832 entries, 0 to 20831
Data columns (total 5 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Page ID                   20832 non-null  int64 
 1   Page name                 20832 non-null  object
 2   Disclaimer                20832 non-null  object
 3   Amount spent (INR)        20832 non-null  object
 4   Number of ads in Library  20832 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 813.9+ KB
None

Locations Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Location name       36 non-null     object
 1   Amount spent (INR)  36 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 708.0+ bytes
None

Results Data Info:
<class 'pandas.core.frame.DataFrame'>

In [4]:
# Fill missing values with 0 (note: results_df.fillna(0) also sets any missing 'Phase' to 0)
advertisers_df.fillna({'Amount spent (INR)': 0, 'Number of ads in Library': 0}, inplace=True)
locations_df.fillna({'Amount spent (INR)': 0}, inplace=True)
results_df.fillna(0, inplace=True)

In [5]:
# Convert 'Amount spent (INR)' to numeric (in case of any data issues)
advertisers_df['Amount spent (INR)'] = pd.to_numeric(advertisers_df['Amount spent (INR)'], errors='coerce')
locations_df['Amount spent (INR)'] = pd.to_numeric(locations_df['Amount spent (INR)'], errors='coerce')
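Because `errors='coerce'` silently turns unparseable entries into `NaN`, it is worth checking what was lost in the conversion. The sketch below is illustrative only: the `raw` series is made-up sample data, and the `'≤100'` entry is an assumed example of a non-numeric value, not something confirmed from the file.

```python
import pandas as pd

# Hypothetical raw values. The real column has dtype object, so some entries
# are presumably not plain integers; '≤100' is an assumed example.
raw = pd.Series(["193854342", "108787100", "≤100", None],
                name="Amount spent (INR)")

# Same conversion as in the notebook: unparseable values become NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the raw data but could not be parsed.
unparsed = raw[numeric.isna() & raw.notna()]
print(f"Unparseable entries: {len(unparsed)}")
print(unparsed.value_counts())

# Decide explicitly how to treat NaN after conversion (here: treat as 0).
cleaned = numeric.fillna(0)
```

Filling after the conversion, rather than before it, ensures that values coerced to `NaN` are handled as well.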

In [7]:
# Merge locations data with results data on state
merged_df = locations_df.merge(results_df, left_on='Location name', right_on='State', how='inner')
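An inner merge drops any location whose name does not exactly match a state name in the results file, and it does so without warning. One way to see what would be dropped is an outer merge with `indicator=True`. This is a sketch on toy frames: the `"Unknown"` location, the `"Delhi"` state row, and the `"Placeholder PC"` constituency are invented for illustration and do not describe the real files.

```python
import pandas as pd

# Toy stand-ins for locations_df and results_df.
locations = pd.DataFrame({
    "Location name": ["Andhra Pradesh", "West Bengal", "Unknown"],
    "Amount spent (INR)": [100819732, 77244996, 1000],
})
results = pd.DataFrame({
    "State": ["Andhra Pradesh", "Andhra Pradesh", "West Bengal", "Delhi"],
    "PC_Name": ["Amalapuram (SC)", "Anakapalle", "Jadavpur", "Placeholder PC"],
})

# The outer merge keeps unmatched rows from both sides; _merge says where
# each row came from ('both', 'left_only', or 'right_only').
check = locations.merge(results, left_on="Location name", right_on="State",
                        how="outer", indicator=True)

unmatched_locations = check.loc[check["_merge"] == "left_only",
                                "Location name"].unique()
unmatched_states = check.loc[check["_merge"] == "right_only",
                             "State"].unique()

print("Locations with no matching state:", list(unmatched_locations))
print("States with no matching location:", list(unmatched_states))
```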


In [8]:
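# Total Ad Spend by State
# The merge repeats each state's spend on every constituency row,
# so take the first value per state instead of summing the repeats.
total_spend_by_state = merged_df.groupby('State')['Amount spent (INR)'].first().reset_index()

# Total Votes vs. Ad Spend (votes are per constituency, so they are summed)
total_votes_vs_spend = merged_df.groupby('State').agg({'Total Votes': 'sum', 'Amount spent (INR)': 'first'}).reset_index()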
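The merge repeats each state's ad spend on every constituency row (in the `merged_df` preview further down, 100819732 appears on each Andhra Pradesh row), so summing that column per state multiplies the figure by the number of constituencies. A small sketch with toy rows copied from that preview shows the difference between `sum` and `first`:

```python
import pandas as pd

# Toy rows shaped like merged_df: state-level spend repeated per constituency.
merged = pd.DataFrame({
    "State": ["Andhra Pradesh", "Andhra Pradesh", "West Bengal"],
    "Amount spent (INR)": [100819732, 100819732, 77244996],
    "Total Votes": [1284018, 1309977, 1559330],
})

# Summing the repeated value counts Andhra Pradesh's spend twice here.
summed = merged.groupby("State")["Amount spent (INR)"].sum()

# Taking the first value per state keeps the spend as reported,
# while votes (a per-constituency quantity) are still summed.
per_state = merged.groupby("State").agg(
    {"Amount spent (INR)": "first", "Total Votes": "sum"}
)

print(summed)
print(per_state)
```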


# Data Visualization

In [9]:
import plotly.express as px

# Visualization: Total Ad Spend by State
fig = px.bar(total_spend_by_state,x='State', y='Amount spent (INR)',
             labels={'State': 'State', 'Amount spent (INR)': 'Ad Spend (INR)'},
             title='Total Ad Spend by State')

fig.update_layout(xaxis={'categoryorder': 'total descending'},
                  xaxis_tickangle=-90,
                  width=800,
                  height=600)

fig.show()




In [10]:
# Visualization: Average Voter Turnout by State
state_voter_turnout = merged_df.groupby('State')['Polled (%)'].mean().reset_index()

fig = px.bar(state_voter_turnout, x='State', y='Polled (%)',
             labels={'State': 'State', 'Polled (%)': 'Voter Turnout (%)'},
             title='Average Voter Turnout by State')


fig.update_layout(xaxis={'categoryorder': 'total descending'},
                  xaxis_tickangle=-90,
                  width=800,
                  height=600)

fig.show()


In [11]:
# Visualization: Total Votes vs. Ad Spend
fig = px.scatter(total_votes_vs_spend, x='Amount spent (INR)', y='Total Votes', title='Total Votes vs Ad Spend', hover_name='State')
fig.show()


In [12]:
# Visualization: Number of Ads vs. Ad Spend
fig = px.scatter(advertisers_df, x='Number of ads in Library', y='Amount spent (INR)', title='Number of Ads vs Ad Spend', hover_name='Page name')
fig.show()

In [13]:
advertisers_df['Amount spent (INR)'] = pd.to_numeric(advertisers_df['Amount spent (INR)'], errors='coerce')

advertisers_df.dropna(subset=['Amount spent (INR)'], inplace=True)

party_ad_spend = advertisers_df.groupby('Page name')['Amount spent (INR)'].sum().sort_values(ascending=False)

top_5_parties = party_ad_spend.head(5).reset_index()

colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0']

fig = px.pie(top_5_parties, values='Amount spent (INR)', names='Page name',
             title='Top 5 Parties by Ad Spend', color_discrete_sequence=colors,
             labels={'Page name': 'Political Party', 'Amount spent (INR)': 'Ad Spend (INR)'})

fig.update_traces(textinfo='percent')

fig.update_layout(
    showlegend=True,
    legend=dict(
        orientation="v",
        yanchor="top",
        y=1,
        xanchor="left",
        x=-0.3
    ),
    title=dict(
        y=0.95,
        x=0.5,
        xanchor='center',
        yanchor='top'
    ),
    margin=dict(l=200, r=50, t=100, b=50) 
)

fig.show()

In [14]:
correlation = merged_df[['Amount spent (INR)', 'Polled (%)']].corr()
print(correlation)

                    Amount spent (INR)  Polled (%)
Amount spent (INR)            1.000000   -0.010688
Polled (%)                   -0.010688    1.000000
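The correlation above is computed on constituency rows, where every constituency in a state carries the same state-level spend. An alternative, sketched here under that assumption, is to aggregate to one row per state first and then correlate. The Andhra Pradesh and West Bengal rows are copied from the `merged_df` preview; the `"State C"` rows are invented so the toy example has more than two points, and the resulting coefficient says nothing about the real data.

```python
import pandas as pd

# Toy rows shaped like merged_df. "State C" is a hypothetical third state.
merged = pd.DataFrame({
    "State": ["Andhra Pradesh", "Andhra Pradesh",
              "West Bengal", "West Bengal",
              "State C", "State C"],
    "Amount spent (INR)": [100819732, 100819732,
                           77244996, 77244996,
                           50000000, 50000000],
    "Polled (%)": [83.85, 82.03, 76.68, 80.08, 70.0, 72.0],
})

# One row per state: spend as reported, turnout averaged over constituencies.
state_level = merged.groupby("State").agg(
    spend=("Amount spent (INR)", "first"),
    turnout=("Polled (%)", "mean"),
)

# Pearson correlation between state spend and mean state turnout.
r = state_level["spend"].corr(state_level["turnout"])
print(state_level)
print(f"State-level Pearson r: {r:.3f}")
```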


In [15]:
merged_df

| | Location name | Amount spent (INR) | _id | Sl No | State | PC_Name | Total Electors | Polled (%) | Total Votes | Phase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Andhra Pradesh | 100819732 | 287 | 1.0 | Andhra Pradesh | Amalapuram (SC) | 1531410 | 83.85 | 1284018 | 4.0 |
| 1 | Andhra Pradesh | 100819732 | 288 | 2.0 | Andhra Pradesh | Anakapalle | 1596916 | 82.03 | 1309977 | 4.0 |
| 2 | Andhra Pradesh | 100819732 | 289 | 3.0 | Andhra Pradesh | Ananthapur | 1767591 | 80.51 | 1423108 | 4.0 |
| 3 | Andhra Pradesh | 100819732 | 290 | 4.0 | Andhra Pradesh | Araku (ST) | 1554633 | 73.68 | 1145426 | 4.0 |
| 4 | Andhra Pradesh | 100819732 | 291 | 5.0 | Andhra Pradesh | Bapatla (SC) | 1506354 | 85.48 | 1287704 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 514 | West Bengal | 77244996 | 545 | 53.0 | West Bengal | Jadavpur | 2033525 | 76.68 | 1559330 | 0.0 |
| 515 | West Bengal | 77244996 | 546 | 54.0 | West Bengal | Joynagar | 1844780 | 80.08 | 1477298 | 0.0 |
| 516 | West Bengal | 77244996 | 547 | 55.0 | West Bengal | Kolkata Dakshin | 1849520 | 66.95 | 1238256 | 0.0 |
| 517 | West Bengal | 77244996 | 548 | 56.0 | West Bengal | Kolkata Uttar | 1505356 | 63.59 | 957319 | 0.0 |


In [18]:
import plotly.graph_objects as go

phase_analysis = merged_df.groupby('Phase').agg({
    'Amount spent (INR)': 'sum',
    'Polled (%)': 'mean'
}).reset_index()

fig = go.Figure()

fig.add_trace(go.Bar(
    x=phase_analysis['Phase'],
    y=phase_analysis['Amount spent (INR)'],
    name='Ad Spend (INR)',
    marker_color='indianred',  # match the color of the left (ad spend) axis
    yaxis='y1'
))

fig.add_trace(go.Scatter(
    x=phase_analysis['Phase'],
    y=phase_analysis['Polled (%)'],
    name='Voter Turnout (%)',
    marker_color='lightsalmon',
    yaxis='y2'
))

fig.update_layout(
    title='Ad Spend and Voter Turnout by Election Phase',
    xaxis=dict(title='Election Phase'),
    yaxis=dict(
        # title.font is used here because titlefont is deprecated in Plotly
        title=dict(text='Ad Spend (INR)', font=dict(color='indianred')),
        tickfont=dict(color='indianred')
    ),
    yaxis2=dict(
        title=dict(text='Voter Turnout (%)', font=dict(color='lightsalmon')),
        tickfont=dict(color='lightsalmon'),
        overlaying='y',
        side='right'
    ),
    legend=dict(x=0.1, y=1.1, orientation='h'),
    width=800,
    height=600
)

fig.show()
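Two details of the phase aggregation may be worth a second look: the state-level spend is summed once per constituency row, and rows with `Phase` equal to 0.0 (presumably produced by `results_df.fillna(0)`) form a phase of their own. The sketch below shows one possible alternative, not the notebook's method: it drops the placeholder phase and counts each state's spend once per phase. A state that votes in several phases would still have its full spend counted in each of them, so this remains an approximation. The toy rows are copied from the `merged_df` preview.

```python
import pandas as pd

# Toy rows shaped like merged_df; Phase 0.0 marks rows with no known phase.
merged = pd.DataFrame({
    "State": ["Andhra Pradesh", "Andhra Pradesh", "West Bengal", "West Bengal"],
    "Amount spent (INR)": [100819732, 100819732, 77244996, 77244996],
    "Polled (%)": [83.85, 82.03, 76.68, 80.08],
    "Phase": [4.0, 4.0, 0.0, 0.0],
})

# Keep only rows with a real phase number.
known = merged[merged["Phase"] > 0]

# Count each state's spend once per phase instead of once per constituency.
spend = (known.drop_duplicates(["Phase", "State"])
              .groupby("Phase")["Amount spent (INR)"].sum())

# Turnout is a per-constituency figure, so average over all rows in the phase.
turnout = known.groupby("Phase")["Polled (%)"].mean()

phase_table = pd.concat([spend, turnout], axis=1).reset_index()
print(phase_table)
```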

### Summary

The analysis shows no consistent relationship between ad spending and voter turnout: the correlation between state-level ad spend and constituency turnout is about -0.01. Election phases 1 and 4 have the highest ad expenditures. Phase 4, with the peak ad spend, also has the highest voter turnout at around 70%, whereas phase 1, despite significant ad spend, has a lower turnout of about 67%. Phases with moderate ad spend, such as 2 and 6, show lower turnout, and phase 5 has notably low turnout despite moderate spending. This irregular pattern suggests that factors beyond ad spending influence voter engagement and turnout.

