# Election Data Analysis - Notebook 1: Data Preparation and EDA


# Business Understanding

This notebook is part of an election data analysis project that follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. Its primary objectives are:

1. Understand voter participation trends by demographics (e.g., age, gender, education, employment status).
2. Analyze political interest and its variations across demographic groups (e.g., gender, marital status, education).
3. Investigate turnout predictions based on willingness to vote and historical trends.
4. Explore the impact of household income on voter turnout and priority issues.
5. Identify key local issues and their variation by demographics and geography (e.g., urban vs. rural).
6. Study the distribution of reasons for not voting.
7. Examine sources of political news and their usage trends across demographic groups.
8. Analyze engagement with political news and its variation by education level, gender and occupation.


In [1]:
# Import necessary libraries
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style="whitegrid")

# Suppress library warnings to keep notebook output readable
warnings.filterwarnings('ignore')

# Data Understanding

In this step, we will:
- Load the dataset.
- Inspect its structure and metadata.
- Summarize key statistics.
- Check for missing or inconsistent values.
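The last two steps can be sketched as a small helper that reports missing values per column and counts exact duplicate rows. This is a minimal sketch, not the notebook's own method: the helper name `missing_value_report` is hypothetical, and the toy DataFrame is illustrative only (it is not survey data).

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most-missing first.

    Hypothetical helper for the 'check for missing values' step.
    """
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(df) * 100).round(1),
    })
    # Keep only columns with at least one missing value, sorted descending
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame for illustration only (not survey data)
toy = pd.DataFrame({
    "Age": ["20-29", "50-59", None, "20-29"],
    "_notes": [np.nan, np.nan, np.nan, np.nan],
    "_status": ["submitted_via_web"] * 4,
})
print(missing_value_report(toy))
# duplicated() treats NaN values as equal, so the last row repeats the first
print("Duplicate rows:", toy.duplicated().sum())
```

On the real dataset the same calls would be `missing_value_report(data)` and `data.duplicated().sum()`.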

In [7]:
# Load the dataset
file_path = "Political_form_-_all_versions_-_labels_-_2023-07-01-17-04-33.xlsx"

data = pd.read_excel(file_path)

# Inspect the dataset
print("Dataset Overview:")
data.info()  # info() prints its summary directly and returns None

Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4537 entries, 0 to 4536
Data columns (total 52 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   start               4537 non-null   datetime64[ns]
 1   end                 4537 non-n
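Because `start` and `end` are already parsed as `datetime64[ns]`, one simple consistency check is the time each respondent spent on the form. The sketch below computes durations in seconds and flags rows where `end` precedes `start` or the duration exceeds a threshold. The helper name `flag_durations` and the one-hour cut-off are illustrative assumptions, and the two sample rows are copied from the `data.head()` preview in this notebook.

```python
import pandas as pd


def flag_durations(df: pd.DataFrame, max_seconds: float = 3600.0) -> pd.DataFrame:
    """Add a duration column and flag implausible survey durations.

    The one-hour default threshold is an assumption, not a project rule.
    """
    out = df.copy()
    out["duration_s"] = (out["end"] - out["start"]).dt.total_seconds()
    # Negative durations mean end precedes start; very long ones may be abandoned forms
    out["suspect_duration"] = (out["duration_s"] < 0) | (out["duration_s"] > max_seconds)
    return out


# Start/end times copied from the first two rows of the data.head() output
sample = pd.DataFrame({
    "start": pd.to_datetime(["2023-05-15 15:14:02.994", "2023-05-15 22:25:48.557"]),
    "end": pd.to_datetime(["2023-05-15 15:17:11.162", "2023-05-15 22:27:44.505"]),
})
print(flag_durations(sample)[["duration_s", "suspect_duration"]])
```

The function works on a copy, so calling `flag_durations(data)` leaves the loaded DataFrame unchanged.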

In [6]:
# First few rows of the data
data.head()

| | start | end | Enrollment Number | Location | _Location_latitude | _Location_longitude | _Location_altitude | _Location_precision | Date and Time | Age | ... | _id | _uuid | _submission_time | _validation_status | _notes | _status | _submitted_by | `__version__` | _tags | _index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-05-15 15:14:02.994 | 2023-05-15 15:17:11.162 | 4001 | -0.3552632 34.7552613 0.0 3000.0 | -0.355263 | 34.755261 | 0.0 | 3000.0 | 2023-05-15 15:14:00 | 50-59 | ... | 238327833 | ad968d31-2518-4884-83cf-82701c682c16 | 2023-05-15 12:20:00 | | | submitted_via_web | safra_data | vNX83DW9mzvDcv5eBqBF7x | | 1 |
| 1 | 2023-05-15 22:25:48.557 | 2023-05-15 22:27:44.505 | 800116 | 0.336606 37.5641233 1141.9 4.466 | 0.336606 | 37.564123 | 1141.9 | 4.466 | 2023-05-15 22:26:00 | 20-29 | ... | 238436622 | dad9bd3f-8916-4d1a-8d44-aba0a8b84dff | 2023-05-15 19:27:52 | | | submitted_via_web | safra_data | vNX83DW9mzvDcv5eBqBF7x | | 2 |
| 2 | 2023-05-15 22:32:38.130 | 2023-05-15 22:45:09.993 | 800111 | -1.153349 36.9162069 1537.5999755859375 20.0 | -1.153349 | 36.916207 | 1537.599976 | 20.0 | 2023-05-15 22:34:00 | 20-29 | ... | 238438952 | 53fa1ac1-4153-4d3d-a875-fe5db0a0e4c6 | 2023-05-15 19:46:33 | | | submitted_via_web | safra_data | vNX83DW9mzvDcv5eBqBF7x | | 3 |
| 3 | 2023-05-16 10:17:05.512 | 2023-05-16 10:22:31.477 | 800140 | -1.2547935 36.9001899 0.0 20.9 | -1.254793 | 36.90019 | 0.0 | 20.9 | 2023-05-16 10:17:00 | 20-29 | ... | 238519273 | 3d7ac288-cf9b-47e2-a658-ed741ae23895 | 2023-05-16 07:22:46 | | | submitted_via_web | safra_data | vNX83DW9mzvDcv5eBqBF7x | | 4 |
| 4 | 2023-05-18 08:23:59.311 | 2023-05-18 08:33:17.272 | 800146 | -2.6296274 38.1185042 750.603 3.9 | -2.629627 | 38.118504 | 750.603 | 3.9 | 2023-05-18 08:27:00 | 20-29 | ... | 239059150 | a1609008-d0b7-4f22-a170-45a9599a5d7e | 2023-05-18 05:33:30 | | | submitted_via_web | safra_data | vNX83DW9mzvDcv5eBqBF7x | | 5 |
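The preview above shows that `Location` stores latitude, longitude, altitude and precision as one space-separated string, and that the same values also appear in the `_Location_*` columns. A minimal consistency check splits the string and compares it with the numeric columns. This is an assumption about how these fields could be validated, not the project's documented method: `check_location_consistency` is a hypothetical helper, and it assumes every non-missing `Location` has exactly four parts. The sample rows are copied from the preview, where the `_Location_*` values are displayed rounded, so the comparison uses a small tolerance.

```python
import numpy as np
import pandas as pd

PARTS = ["latitude", "longitude", "altitude", "precision"]


def check_location_consistency(df: pd.DataFrame, atol: float = 1e-6) -> pd.Series:
    """Return a boolean Series: True where Location matches the _Location_* columns."""
    # Split "lat lon alt precision" on whitespace into four float columns
    parsed = df["Location"].str.split(expand=True).astype(float)
    parsed.columns = PARTS
    matches = pd.Series(True, index=df.index)
    for part in PARTS:
        # np.isclose tolerates the rounding seen in the displayed values
        matches &= np.isclose(parsed[part], df[f"_Location_{part}"], atol=atol)
    return matches


# Two rows copied from the data.head() preview (numeric columns shown rounded)
sample = pd.DataFrame({
    "Location": ["-0.3552632 34.7552613 0.0 3000.0",
                 "-1.153349 36.9162069 1537.5999755859375 20.0"],
    "_Location_latitude": [-0.355263, -1.153349],
    "_Location_longitude": [34.755261, 36.916207],
    "_Location_altitude": [0.0, 1537.599976],
    "_Location_precision": [3000.0, 20.0],
})
print(check_location_consistency(sample))
```

Rows that return `False` on the full dataset would be candidates for closer inspection rather than automatic removal.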
