# World Happiness Report 2021 Analysis

This notebook analyzes the World Happiness Report 2021 dataset using Python and NumPy. We will load the data, perform statistical analysis, and explore correlations between happiness scores and various factors.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Set print options for better readability
np.set_printoptions(precision=3, suppress=True)

## 1. Data Loading

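We use `np.genfromtxt` to load the dataset. Because it contains both strings (country names, regions) and numbers, we pass `dtype=None` so that each column's type is inferred, `names=True` to read the field names from the header row, and `encoding='utf-8-sig'` to strip a leading byte-order mark (BOM) if one is present. Spaces in header names are replaced with underscores, so the `Ladder score` column is accessed as `Ladder_score`. Note that `np.genfromtxt` splits on every comma and does not honor CSV quoting, so this approach works only as long as no field contains a comma.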

In [None]:
# Load the dataset
data = np.genfromtxt('world-happiness-report-2021.csv', delimiter=',', dtype=None, names=True, encoding='utf-8-sig')

# Display the column names
print("Column names:")
print(data.dtype.names)
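
To see how `np.genfromtxt` derives field names from a header row, here is a minimal sketch on an in-memory sample. The two data rows are made up for illustration and are not taken from the real dataset; only the header names mirror columns used later in this notebook.

```python
import io
import numpy as np

# Hypothetical two-row sample; the values are illustrative, not real data.
sample_csv = io.StringIO(
    "Country name,Regional indicator,Ladder score\n"
    "Aland,Region A,7.5\n"
    "Borduria,Region B,4.25\n"
)

sample = np.genfromtxt(sample_csv, delimiter=',', dtype=None, names=True, encoding='utf-8')

# Spaces in the header are replaced with underscores in the field names
print(sample.dtype.names)
print(sample['Ladder_score'])
```

The result is a structured array, so each column is selected by its sanitized field name rather than by position.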

## 2. Basic Statistics

Let's calculate the mean, median, and standard deviation of the `Ladder score` column (the happiness score). Note that `np.std` computes the population standard deviation (`ddof=0`) by default.

In [None]:
ladder_scores = data['Ladder_score']

mean_score = np.mean(ladder_scores)
median_score = np.median(ladder_scores)
std_score = np.std(ladder_scores)

print(f"Mean Ladder Score: {mean_score:.3f}")
print(f"Median Ladder Score: {median_score:.3f}")
print(f"Standard Deviation: {std_score:.3f}")
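
Whether to divide by `n` or `n - 1` depends on whether the countries are treated as the full population or as a sample. The sketch below compares the two on a small made-up array (the values are illustrative, not from the dataset).

```python
import numpy as np

# Illustrative values, not real ladder scores
scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(scores)       # ddof=0: divide by n
sample_std = np.std(scores, ddof=1)   # ddof=1: divide by n - 1

print(f"Population std: {population_std:.3f}")
print(f"Sample std:     {sample_std:.3f}")
```

With well over a hundred countries in the report, the two estimates should differ only slightly, so the choice mostly matters for small groups such as individual regions.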

## 3. Top and Bottom Countries

Identify the 10 happiest and 10 least happy countries.

In [None]:
# Sort the data by Ladder score in descending order
sorted_indices = np.argsort(data['Ladder_score'])[::-1]
sorted_data = data[sorted_indices]

top_10 = sorted_data[:10]
bottom_10 = sorted_data[-10:]

print("Top 10 Happiest Countries:")
for i, country in enumerate(top_10):
    print(f"{i+1}. {country['Country_name']} ({country['Ladder_score']:.3f})")

print("\nBottom 10 Least Happy Countries:")
for i, country in enumerate(bottom_10):
    print(f"{len(data) - 9 + i}. {country['Country_name']} ({country['Ladder_score']:.3f})")
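
One subtlety of the `[::-1]` reversal is that it also reverses the relative order of tied scores. If tie order matters, a stable sort on the negated scores is one possible alternative. A minimal sketch with made-up values:

```python
import numpy as np

# Illustrative scores; indices 0 and 2 are tied
scores = np.array([5.0, 7.0, 5.0, 6.0])

# Ascending stable sort, then reversed: tied entries end up in reverse input order
reversed_order = np.argsort(scores, kind='stable')[::-1]

# Stable sort on negated scores: descending, with ties kept in input order
stable_order = np.argsort(-scores, kind='stable')

print(reversed_order)
print(stable_order)
```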

## 4. Regional Analysis

Calculate the average happiness score for each `Regional indicator`.

In [None]:
regions = np.unique(data['Regional_indicator'])

print("Average Ladder Score by Region:")
region_scores = []
for region in regions:
    # Filter data for the current region
    region_mask = data['Regional_indicator'] == region
    region_data = data[region_mask]
    avg_score = np.mean(region_data['Ladder_score'])
    region_scores.append((region, avg_score))

# Sort regions by score
region_scores.sort(key=lambda x: x[1], reverse=True)

for region, score in region_scores:
    print(f"{region}: {score:.3f}")
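
The loop above builds one boolean mask per region. As an alternative, the same group means can be computed without a Python loop by combining `np.unique(..., return_inverse=True)` with `np.bincount`. This is a sketch on made-up labels and scores, not the notebook's actual data:

```python
import numpy as np

# Illustrative labels and scores, not real dataset values
region_labels = np.array(['North', 'South', 'North', 'East', 'South'])
scores = np.array([7.0, 5.0, 6.0, 4.0, 5.5])

# inverse[i] is the index into `regions` of the i-th label
regions, inverse = np.unique(region_labels, return_inverse=True)

sums = np.bincount(inverse, weights=scores)  # per-region sum of scores
counts = np.bincount(inverse)                # per-region number of rows
means = sums / counts

# Print regions from highest to lowest mean
order = np.argsort(means)[::-1]
for region, mean in zip(regions[order], means[order]):
    print(f"{region}: {mean:.3f}")
```

For roughly ten regions the explicit loop is perfectly adequate; the vectorized form mainly pays off when there are many groups.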

## 5. Correlations

Let's see which factors correlate most strongly with the Ladder score. We compute the Pearson correlation coefficient between the Ladder score and each of the following factors:

- Logged GDP per capita
- Social support
- Healthy life expectancy
- Freedom to make life choices
- Generosity
- Perceptions of corruption

In [None]:
factors = [
    'Logged_GDP_per_capita',
    'Social_support',
    'Healthy_life_expectancy',
    'Freedom_to_make_life_choices',
    'Generosity',
    'Perceptions_of_corruption'
]

print("Correlation with Ladder Score:")
for factor in factors:
    correlation = np.corrcoef(data['Ladder_score'], data[factor])[0, 1]
    print(f"{factor}: {correlation:.3f}")
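
`np.corrcoef` can also return the full correlation matrix in a single call when given a 2-D array with one variable per row, which additionally shows how the factors correlate with each other. The sketch below uses synthetic data generated with a fixed seed; the variable names and coefficients are assumptions chosen for illustration and say nothing about the real dataset.

```python
import numpy as np

# Synthetic stand-ins for three columns (assumed relationships, not real data)
rng = np.random.default_rng(0)
n = 100
gdp = rng.normal(size=n)
support = 0.5 * gdp + rng.normal(size=n)
ladder = gdp + 0.5 * support + rng.normal(scale=0.5, size=n)

# One variable per row; corr[i, j] is the Pearson correlation of rows i and j
stacked = np.vstack([ladder, gdp, support])
corr = np.corrcoef(stacked)

labels = ['ladder', 'gdp', 'support']
for label, row in zip(labels, corr):
    print(f"{label:>8}: {row}")
```

The first row of the matrix holds the same values as the pairwise calls in the loop above, and the off-diagonal entries among the factors hint at how much they overlap with one another.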