<a href="https://colab.research.google.com/github/gndede/myproject/blob/master/Happiness_Analysis_282.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#import the data analysis libraries
import pandas as pd
import numpy as np

In [None]:
#load the world happiness dataset
world_happiness = pd.read_csv("/content/2019.csv")

In [None]:
# info() prints its summary directly and returns None, so it is not wrapped in print()
world_happiness.info()

In [None]:
print(world_happiness.describe())

In [None]:
# Rename column 2 ('Country or region') to 'country' and column 9 ('Perceptions of corruption') to 'corruption'
new_world_happiness = world_happiness.rename(columns={'Country or region': 'country', 'Perceptions of corruption': 'corruption'})

In [None]:
#list the first 5 countries
new_world_happiness.head(5)

In [None]:
#list the last 5 countries
new_world_happiness.tail(5)

In [None]:
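# Select the six East African countries studied here by row label (the 2019 file is sorted by rank):
# Tanzania, Rwanda, Uganda, Kenya, Burundi, South Sudan
countries = new_world_happiness.loc[[152, 151, 135, 120, 144, 155]]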

**Data Collection and Preparation**

**Gather Data:**
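Collect data from sources such as the World Bank, the World Health Organization, and the Gallup World Poll. The data should cover the six factors used in this analysis (GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption) for the countries or regions of interest.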
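Selecting rows with `.loc` and hard-coded row labels (as in the `countries` cell) silently picks the wrong countries if the CSV's row order ever changes. A minimal sketch, assuming the renamed `country` column produced earlier, that selects countries by name instead and fails loudly on a misspelling. `select_countries` is a hypothetical helper, and the small demo frame only stands in for `new_world_happiness`.

```python
import pandas as pd

def select_countries(frame: pd.DataFrame, names: list[str], column: str = 'country') -> pd.DataFrame:
    """Return the rows whose `column` value is in `names`, in the order given by `names`."""
    missing = set(names) - set(frame[column])
    if missing:
        # Raise instead of silently dropping a misspelled or absent country
        raise KeyError(f"not found in data: {sorted(missing)}")
    return frame.set_index(column).loc[names].reset_index()

# Demo rows standing in for new_world_happiness (GDP values copied from this notebook)
demo = pd.DataFrame({
    'country': ['Kenya', 'Rwanda', 'Burundi'],
    'GDP per capita': [0.512, 0.359, 0.046],
})
print(select_countries(demo, ['Rwanda', 'Kenya']))
```

In the notebook this would read `select_countries(new_world_happiness, ['Tanzania', 'Rwanda', 'Uganda', 'Kenya', 'Burundi', 'South Sudan'])`.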

In [None]:
countries

In [None]:
# Rename the raw 2019 column names to the snake_case names used in this study.
# The keys must match the CSV headers exactly; keys that do not match are silently ignored.
study_data = world_happiness.rename(columns={'Country or region': 'Country',
                          'GDP per capita': 'GDP_per_capita',
                          'Social support': 'Social_support',
                          'Healthy life expectancy': 'Healthy_life_expectancy',
                          'Freedom to make life choices': 'Freedom_to_make_life_choices',
                          'Perceptions of corruption': 'Perceptions_of_corruption'})

**Create a DataFrame:**
Use pandas to create a DataFrame with one column per factor and one row per population group (e.g., a country or region).
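Typing factor values by hand invites transcription errors. A sketch, assuming the column headers of the 2019 CSV loaded earlier (`Country or region`, `GDP per capita`, `Social support`, `Healthy life expectancy`, `Freedom to make life choices`, `Generosity`, `Perceptions of corruption`), that renames those columns to the names used here and keeps one row per chosen country. `build_factor_frame` is a hypothetical helper; the two demo rows reuse values from this notebook in place of the real file.

```python
import pandas as pd

# Map the 2019.csv headers to the snake_case names used in this notebook
COLUMN_MAP = {
    'Country or region': 'Country',
    'GDP per capita': 'GDP_per_capita',
    'Social support': 'Social_support',
    'Healthy life expectancy': 'Healthy_life_expectancy',
    'Freedom to make life choices': 'Freedom_to_make_life_choices',
    'Generosity': 'Generosity',
    'Perceptions of corruption': 'Perceptions_of_corruption',
}

def build_factor_frame(raw: pd.DataFrame, countries: list[str]) -> pd.DataFrame:
    """Rename the raw columns, keep only the factor columns, and select countries in order."""
    renamed = raw.rename(columns=COLUMN_MAP)[list(COLUMN_MAP.values())]
    # .loc with a list raises KeyError if any country name is missing
    return renamed.set_index('Country').loc[countries].reset_index()

# Demo rows standing in for pd.read_csv("/content/2019.csv")
raw = pd.DataFrame({
    'Country or region': ['Kenya', 'Burundi'],
    'GDP per capita': [0.512, 0.046],
    'Social support': [0.983, 0.447],
    'Healthy life expectancy': [0.581, 0.380],
    'Freedom to make life choices': [0.431, 0.220],
    'Generosity': [0.372, 0.176],
    'Perceptions of corruption': [0.053, 0.180],
})
df_demo = build_factor_frame(raw, ['Burundi', 'Kenya'])
print(df_demo)
```

In the notebook, `build_factor_frame(world_happiness, ['Tanzania', 'Rwanda', 'Uganda', 'Kenya', 'Burundi', 'South Sudan'])` would produce a frame with the same columns as the hand-typed `data` dictionary.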

In [None]:
# Hand-entered 2019 factor values for the six countries selected in `countries`
data = {
    'Country': ['Tanzania', 'Rwanda', 'Uganda', 'Kenya', 'Burundi','South Sudan'],
    'GDP_per_capita': [0.476, 0.359, 0.332, 0.512, 0.046, 0.306],
    'Social_support': [0.885, 0.711, 0.069, 0.983, 0.447, 0.575],
    'Healthy_life_expectancy': [0.499, 0.614, 0.443, 0.581, 0.380, 0.295],
    'Freedom_to_make_life_choices': [0.417, 0.555, 0.356, 0.431, 0.220, 0.010],
    'Generosity': [0.276, 0.217, 0.252, 0.372, 0.176, 0.202],
    'Perceptions_of_corruption': [0.147, 0.411, 0.060, 0.053, 0.180, 0.091]
}

df = pd.DataFrame(data)

**Calculate the Happiness Index**

**Weighting:**
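Assign a weight to each factor according to its relative importance. Note that the World Happiness Report does not compute its score as a weighted sum: the score is each country's average answer to the Cantril ladder question in the Gallup World Poll, and the six factor columns in the dataset are estimates of how much each factor contributes to that score. Summing the factors with equal weights of 1 therefore reproduces the part of the score that the six factors explain; the weights can be adjusted to suit specific research goals.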

**Normalize Data (Optional):**
If the factors are on different scales, rescale them to a common range (e.g., 0 to 1) so that no factor dominates the index merely because of its units.
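Min-max scaling, $x' = (x - \min x) / (\max x - \min x)$, is one common way to do this (z-scores are another). A sketch, assuming the factor column names used in this notebook; `min_max_normalize` is a hypothetical helper. Because the minimum and maximum come from the sample itself, the scaled values for six countries will shift if a country is added or removed.

```python
import pandas as pd

def min_max_normalize(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy with each listed column rescaled to [0, 1] via (x - min) / (max - min)."""
    out = frame.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        # A constant column has no spread; map it to 0 rather than divide by zero
        out[col] = 0.0 if hi == lo else (out[col] - lo) / (hi - lo)
    return out

# GDP values for three of the six countries, copied from `data`
demo = pd.DataFrame({'Country': ['Kenya', 'Rwanda', 'Burundi'],
                     'GDP_per_capita': [0.512, 0.359, 0.046]})
print(min_max_normalize(demo, ['GDP_per_capita']))
```

Kenya (the highest GDP value here) maps to 1.0 and Burundi (the lowest) to 0.0; the original frame is left unchanged.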

**Calculate the Index:**
Multiply each factor by its weight and sum the results for each population group.
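Written as a formula, the index for country $c$ is $H_c = \sum_{k=1}^{6} w_k x_{c,k}$, where $x_{c,k}$ is the value of factor $k$ for that country and $w_k$ is its weight. A sketch of the same weighted sum as a single matrix-vector product with `DataFrame.dot`, which avoids writing one term per factor; `happiness_index` is a hypothetical helper name.

```python
import pandas as pd

def happiness_index(frame: pd.DataFrame, weights: dict[str, float]) -> pd.Series:
    """Weighted sum H_c = sum_k w_k * x_{c,k} over the factor columns named in `weights`."""
    factors = list(weights)
    # Each row of the factor columns is dotted with the weight vector (aligned by column name)
    return frame[factors].dot(pd.Series(weights))

# Two factors for two countries, copied from `data`
demo = pd.DataFrame({
    'Country': ['Kenya', 'Burundi'],
    'GDP_per_capita': [0.512, 0.046],
    'Social_support': [0.983, 0.447],
})
print(happiness_index(demo, {'GDP_per_capita': 1, 'Social_support': 1}))
```

With the notebook's objects, `happiness_index(df, weights)` should match the explicit sum computed in the `df['Happiness_index']` cell, up to floating-point rounding.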

In [None]:
# Equal weight for each of the six factors
weights = {
    'GDP_per_capita': 1,
    'Social_support': 1,
    'Healthy_life_expectancy': 1,
    'Freedom_to_make_life_choices': 1,
    'Generosity': 1,
    'Perceptions_of_corruption': 1
}

In [None]:
df['Happiness_index'] = (
    df['GDP_per_capita'] * weights['GDP_per_capita'] +
    df['Social_support'] * weights['Social_support'] +
    df['Healthy_life_expectancy'] * weights['Healthy_life_expectancy'] +
    df['Freedom_to_make_life_choices'] * weights['Freedom_to_make_life_choices'] +
    df['Generosity'] * weights['Generosity'] +
    df['Perceptions_of_corruption'] * weights['Perceptions_of_corruption']
)

**Analyze:**
Examine the calculated happiness index for each population group. Compare the indices and identify trends or patterns.
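One way to make the comparison concrete is to sort the countries by index, rank them, and measure each country's distance from the group average. A sketch, assuming `df` has the `Country` and `Happiness_index` columns created above; `summarize_index` is a hypothetical helper.

```python
import pandas as pd

def summarize_index(frame: pd.DataFrame, column: str = 'Happiness_index') -> pd.DataFrame:
    """Sort by the index, add a rank (1 = highest) and each country's gap from the group mean."""
    out = frame.sort_values(column, ascending=False).reset_index(drop=True)
    out['Rank'] = out[column].rank(ascending=False, method='min').astype(int)
    out['Gap_from_mean'] = out[column] - out[column].mean()
    return out[['Country', column, 'Rank', 'Gap_from_mean']]

# Equal-weight sums of the six factor values in `data` for three of the countries
demo = pd.DataFrame({'Country': ['Rwanda', 'Burundi', 'Kenya'],
                     'Happiness_index': [2.867, 1.449, 2.932]})
print(summarize_index(demo))
```

In the notebook, `summarize_index(df)` would rank all six countries; ties share the lower rank because of `method='min'`.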

In [None]:
#Visualize:
#Use libraries like matplotlib or seaborn to create visualizations, such as bar charts or maps, to represent the happiness index.
import matplotlib.pyplot as plt
plt.bar(df['Country'], df['Happiness_index'])
plt.xlabel('Country')
plt.ylabel('Happiness Index')
plt.title('Happiness Index by Country')
plt.show()
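Because the index is a sum of factors, a stacked bar chart can show how much each factor contributes to each country's total, which a single bar per country cannot. A sketch using pandas' `DataFrame.plot` (which draws with matplotlib), assuming the column names used in `df`; `plot_contributions` is a hypothetical helper. With equal weights of 1 the bar lengths equal `Happiness_index`; with other weights, multiply each column by its weight first.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Factor columns that make up the index (names as used in this notebook)
FACTORS = ['GDP_per_capita', 'Social_support', 'Healthy_life_expectancy',
           'Freedom_to_make_life_choices', 'Generosity', 'Perceptions_of_corruption']

def plot_contributions(frame: pd.DataFrame):
    """Horizontal stacked bars: one bar per country, one segment per factor."""
    parts = frame.set_index('Country')[FACTORS]
    # barh draws the first row at the bottom, so ascending order puts the highest total on top
    parts = parts.loc[parts.sum(axis=1).sort_values().index]
    ax = parts.plot(kind='barh', stacked=True, figsize=(10, 5))
    ax.set_xlabel('Happiness Index (sum of factors)')
    ax.set_title('Factor contributions to the Happiness Index')
    ax.legend(fontsize='small')
    plt.tight_layout()
    return ax

# Two rows copied from `data` for demonstration; in the notebook, pass df instead
demo = pd.DataFrame({
    'Country': ['Kenya', 'Burundi'],
    'GDP_per_capita': [0.512, 0.046],
    'Social_support': [0.983, 0.447],
    'Healthy_life_expectancy': [0.581, 0.380],
    'Freedom_to_make_life_choices': [0.431, 0.220],
    'Generosity': [0.372, 0.176],
    'Perceptions_of_corruption': [0.053, 0.180],
})
ax = plot_contributions(demo)
```

In Colab the figure displays at the end of the cell; call `plt.show()` when running as a script.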

In [None]:
# Analyze factors influencing happiness
# This pair plot shows each selected factor (GDP per capita, generosity, healthy life expectancy)
# against the happiness index. Each point is one country; with only six countries,
# any pattern is suggestive rather than conclusive.
import seaborn as sns
sns.pairplot(df, x_vars=['GDP_per_capita', 'Generosity', 'Healthy_life_expectancy'],
             y_vars=['Happiness_index'], kind='scatter')
# Raise the title slightly so it does not overlap the subplots
plt.suptitle('Factors Influencing Happiness', y=1.02)
plt.show()

**Calculate the Correlation Matrix:**
Compute the correlation matrix with the DataFrame's `.corr()` method.
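By default `.corr()` computes Pearson's coefficient, $r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$. The sketch below computes it directly with NumPy for one pair of columns and checks it against pandas. It also shows `method='spearman'`, a rank-based alternative that is less sensitive to a single extreme country; with only six rows, either coefficient carries a lot of uncertainty. `pearson` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: co-deviation of x and y divided by the product of their spreads."""
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# GDP and healthy life expectancy for the six countries, copied from `data`
demo = pd.DataFrame({
    'GDP_per_capita': [0.476, 0.359, 0.332, 0.512, 0.046, 0.306],
    'Healthy_life_expectancy': [0.499, 0.614, 0.443, 0.581, 0.380, 0.295],
})
manual = pearson(demo['GDP_per_capita'].to_numpy(), demo['Healthy_life_expectancy'].to_numpy())
via_pandas = demo.corr().loc['GDP_per_capita', 'Healthy_life_expectancy']
print(manual, via_pandas)              # equal up to floating-point rounding
print(demo.corr(method='spearman'))    # rank-based alternative
```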

In [None]:
# Choose the variables for the correlation analysis: the happiness index and the six factors that make it up
df_corr = df[['Happiness_index', 'GDP_per_capita', 'Social_support', 'Healthy_life_expectancy',
              'Freedom_to_make_life_choices', 'Generosity', 'Perceptions_of_corruption']]
df_corr.corr()

The next cell draws a heatmap of the correlation coefficients between the selected variables. `annot=True` writes each coefficient on its cell, `cmap='coolwarm'` sets the color scheme, and `fmt=".2f"` formats the values to two decimal places.
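A correlation matrix is symmetric with 1s on the diagonal, so half of a full heatmap repeats the other half. A sketch using the masking approach from seaborn's example gallery, which hides the diagonal and upper triangle and pins the color scale to [-1, 1] so a given color means the same coefficient in every plot. `lower_triangle_heatmap` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def lower_triangle_heatmap(corr: pd.DataFrame):
    """Heatmap of the lower triangle only; the upper triangle mirrors it."""
    # True = hidden cell; np.triu marks the diagonal and everything above it
    mask = np.triu(np.ones_like(corr, dtype=bool))
    fig, ax = plt.subplots(figsize=(10, 8))
    # Fixing vmin/vmax/center keeps 0 at the neutral color and -1/+1 at the extremes
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap='coolwarm',
                vmin=-1, vmax=1, center=0, ax=ax)
    ax.set_title('Correlation Matrix (lower triangle)')
    return ax

# Three factors for the six countries, copied from `data`; in the notebook, pass df_corr.corr()
demo = pd.DataFrame({
    'GDP_per_capita': [0.476, 0.359, 0.332, 0.512, 0.046, 0.306],
    'Healthy_life_expectancy': [0.499, 0.614, 0.443, 0.581, 0.380, 0.295],
    'Generosity': [0.276, 0.217, 0.252, 0.372, 0.176, 0.202],
})
ax = lower_triangle_heatmap(demo.corr())
```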

In [None]:
correlation_matrix = df_corr.corr(numeric_only=True)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix')
plt.show()