## Theoretical Framework

This project develops a **Composite Happiness Index** for cities. The index measures the overall happiness and well-being of a city's population from seven factors:

- **Happiness Score**: A direct measure of overall happiness in the city.
- **Air Quality Index**: A measure of the city's air pollution; higher values indicate worse air quality.
- **Green Space Area**: The amount of green space in the city, which impacts health and well-being.
- **Cost of Living**: A measure of how expensive it is to live in the city.
- **Healthcare Index**: A measure of the quality of healthcare in the city.
- **Traffic Density**: How congested the city is, which affects both quality of life and health.
- **Noise Level (Decibel Level)**: The level of noise pollution in the city, measured in decibels.

Each of these factors will be weighted and combined into a single index to provide a comprehensive view of a city's overall happiness and quality of life.
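One possible way to do the weighting and combining is sketched below. It is a minimal illustration, not the project's final method. The weights, the `LOWER_IS_BETTER` set, the min-max scaling step, and the `composite_index` helper are all assumptions introduced here. The three sample rows reuse values from the dataset preview later in this notebook, with `Traffic_Density` already encoded as Low = 1 and Medium = 2.

```python
import pandas as pd

# Hypothetical weights (an assumption, not taken from the project); they sum to 1.
WEIGHTS = {
    "Happiness_Score": 0.30,
    "Healthcare_Index": 0.15,
    "Green_Space_Area": 0.15,
    "Air_Quality_Index": 0.10,
    "Cost_of_Living_Index": 0.10,
    "Traffic_Density": 0.10,
    "Decibel_Level": 0.10,
}

# Factors where a larger raw value is assumed to mean lower well-being.
LOWER_IS_BETTER = {
    "Air_Quality_Index",
    "Cost_of_Living_Index",
    "Traffic_Density",
    "Decibel_Level",
}


def composite_index(data: pd.DataFrame) -> pd.Series:
    """Weighted sum of min-max scaled factors, each oriented so that 1 is best."""
    total = pd.Series(0.0, index=data.index)
    for column, weight in WEIGHTS.items():
        values = data[column].astype(float)
        span = values.max() - values.min()
        if span > 0:
            scaled = (values - values.min()) / span
        else:
            # A constant column carries no information; score it as a neutral 0.5.
            scaled = pd.Series(0.5, index=data.index)
        if column in LOWER_IS_BETTER:
            scaled = 1.0 - scaled  # flip so that a higher score is always better
        total += weight * scaled
    return total


# Three rows from the dataset preview, indexed by city name
sample = pd.DataFrame(
    {
        "Decibel_Level": [55, 50, 60],
        "Traffic_Density": [1, 1, 2],
        "Green_Space_Area": [80, 60, 40],
        "Air_Quality_Index": [40, 45, 50],
        "Happiness_Score": [8.4, 7.9, 7.5],
        "Cost_of_Living_Index": [110, 80, 95],
        "Healthcare_Index": [97, 93, 89],
    },
    index=["Auckland", "Berlin", "Denver"],
)

scores = composite_index(sample)
print(scores.sort_values(ascending=False))
```

Because min-max scaling is relative to the rows supplied, scores computed on a three-city sample are not comparable to scores computed on the full dataset.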


In [3]:
!pip install pandas numpy matplotlib






In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [9]:
# Load the data
file_path = 'C:/Users/SEJAL/DAV/Project/test.csv'

df = pd.read_csv(file_path)

# Display the first few rows of the data
df.head()


Unnamed: 0,City,Month,Year,Decibel_Level,Traffic_Density,Green_Space_Area,Air_Quality_Index,Happiness_Score,Cost_of_Living_Index,Healthcare_Index
0,Auckland,January,2030,55,Low,80,40,8.4,110,97
1,Berlin,January,2030,50,Low,60,45,7.9,80,93
2,Cairo,January,2030,75,Very High,15,110,4.1,55,69
3,Denver,January,2030,60,Medium,40,50,7.5,95,89
4,Edinburgh,January,2030,55,Low,65,55,7.8,85,92


In [10]:
df.isnull().sum()


City                    0
Month                   0
Year                    0
Decibel_Level           0
Traffic_Density         0
Green_Space_Area        0
Air_Quality_Index       0
Happiness_Score         0
Cost_of_Living_Index    0
Healthcare_Index        0
dtype: int64

In [11]:
df.dtypes


City                     object
Month                    object
Year                      int64
Decibel_Level             int64
Traffic_Density          object
Green_Space_Area          int64
Air_Quality_Index         int64
Happiness_Score         float64
Cost_of_Living_Index      int64
Healthcare_Index          int64
dtype: object

In [12]:
df.describe()


Unnamed: 0,Year,Decibel_Level,Green_Space_Area,Air_Quality_Index,Happiness_Score,Cost_of_Living_Index,Healthcare_Index
count,51.0,51.0,51.0,51.0,51.0,51.0,51.0
mean,2030.0,63.333333,46.666667,66.666667,6.919608,83.823529,83.705882
std,0.0,8.103497,20.116328,41.432676,1.265546,24.074639,12.954218
min,2030.0,50.0,10.0,25.0,3.9,35.0,43.0
25%,2030.0,57.5,32.5,40.0,6.35,65.0,74.5
50%,2030.0,60.0,45.0,55.0,7.3,85.0,88.0
75%,2030.0,70.0,62.5,77.5,7.85,102.5,93.5
max,2030.0,85.0,80.0,220.0,8.7,130.0,104.0
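The summary shows an `Air_Quality_Index` maximum of 220 against a 75th percentile of 77.5, which suggests at least one extreme value. A quick way to check this is the 1.5 × IQR rule, sketched below. The choice of rule and the helper names `iqr_upper_fence` and `flag_high_outliers` are assumptions made for illustration; the quartiles are read directly from the `df.describe()` output.

```python
import pandas as pd


def iqr_upper_fence(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper Tukey fence: Q3 + k * (Q3 - Q1)."""
    return q3 + k * (q3 - q1)


# Quartiles and maximum for Air_Quality_Index, taken from the summary table
aqi_q1, aqi_q3, aqi_max = 40.0, 77.5, 220.0
fence = iqr_upper_fence(aqi_q1, aqi_q3)
aqi_has_high_outlier = aqi_max > fence
print(f"upper fence = {fence}, max above fence: {aqi_has_high_outlier}")


def flag_high_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values above the upper fence of the same Series."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return values > iqr_upper_fence(q1, q3, k)
```

In the notebook, `df[flag_high_outliers(df['Air_Quality_Index'])]` would list the affected rows. Whether to keep, cap, or drop them is a separate decision that this sketch does not make.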


In [14]:
# Fill missing values in numeric columns with the column mean
# (df.isnull().sum() reported no missing values, so this is only a safeguard)
numeric_columns = df.select_dtypes(include=['int64', 'float64']).columns
df[numeric_columns] = df[numeric_columns].fillna(df[numeric_columns].mean())

# Convert Traffic_Density to an ordinal scale (Low = 1, Medium = 2, High = 3, Very High = 4).
# 'Very High' appears in the data (e.g. Cairo); labels missing from the map become NaN.
traffic_map = {'Low': 1, 'Medium': 2, 'High': 3, 'Very High': 4}
df['Traffic_Density'] = df['Traffic_Density'].map(traffic_map)

# Check the cleaned data
df.head()


Unnamed: 0,City,Month,Year,Decibel_Level,Traffic_Density,Green_Space_Area,Air_Quality_Index,Happiness_Score,Cost_of_Living_Index,Healthcare_Index
0,Auckland,January,2030,55,1,80,40,8.4,110,97
1,Berlin,January,2030,50,1,60,45,7.9,80,93
2,Cairo,January,2030,75,4,15,110,4.1,55,69
3,Denver,January,2030,60,2,40,50,7.5,95,89
4,Edinburgh,January,2030,55,1,65,55,7.8,85,92
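`Series.map` returns `NaN` for any label that is not a key in the dictionary, so it can help to list the labels a mapping does not cover before applying it. The sketch below shows one way to do that. The helper name `unmapped_labels` is hypothetical, the sample labels are the five `Traffic_Density` values from the original preview, and the three-level dictionary is used only to show what the check reports.

```python
import pandas as pd


def unmapped_labels(series: pd.Series, mapping: dict) -> list:
    """Return the distinct non-null labels in `series` that `mapping` does not cover."""
    return sorted(set(series.dropna().unique()) - set(mapping))


# Traffic_Density labels from the first five rows of the dataset preview
traffic = pd.Series(['Low', 'Low', 'Very High', 'Medium', 'Low'])
three_level_map = {'Low': 1, 'Medium': 2, 'High': 3}

missing = unmapped_labels(traffic, three_level_map)
encoded = traffic.map(three_level_map)

print("Labels without a code:", missing)
print("NaN values after mapping:", int(encoded.isna().sum()))
```

Running the same check on `df['Traffic_Density']` before the conversion step would show whether the full dataset contains labels beyond those visible in the preview.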


In [None]:
df.dtypes
