![life](life.jpg)

You are a data analyst for a United Nations initiative focused on understanding global health trends. Your latest assignment is to explore and visualize life expectancy data from around the world, focusing on gender differences. 

Life expectancy can vary significantly over time and across countries due to numerous factors, including advances in medicine, a country's level of development, and the impact of conflicts. Data consistently show that women tend to live longer than men, which raises the question of why. Could this be due to biological factors, or perhaps because women generally take better care of their health?

Your task is to explore these patterns and disparities. 

### The Data

The dataset contains life expectancy figures for various countries or areas, broken down by gender and time period. The data is sourced from the _United Nations Population Division, Gender Statistics, Life Expectancy at Birth_.

#### UNdata.csv

| Column            | Meaning                                                                                        |
| ----------------- | ---------------------------------------------------------------------------------------------- |
| `Country.or.Area` | The name of the country or region being described.                                              |
| `Subgroup`        | The specific subgroup within the country or area (e.g., Female, Male).                          |
| `Year`            | The time period for the data provided (e.g., 2000-2005).                                        |
| `Source`          | The source of the data, specifying the UN publication or report where the data originated.      |
| `Unit`            | The unit of measurement for life expectancy.                                                    |
| `Value`           | The measured value for the life expectancy in the specified country, subgroup, and time period. |
| `Value.Footnotes` | Additional notes or comments related to the value, if any.                                      |
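
The columns above describe a long layout, with one row per country, subgroup, and period. As a rough illustration, the sketch below builds a tiny data frame with the same key columns and reshapes it so that each country has one row with `Female` and `Male` side by side. The country names and values are invented for the example and are not taken from `UNdata.csv`.

```r
library(tidyr)

# Toy data in the same long layout as UNdata.csv (names and values are made up)
toy <- data.frame(
  Country.or.Area = c("Country A", "Country A", "Country B", "Country B"),
  Subgroup = c("Female", "Male", "Female", "Male"),
  Year = "2000-2005",
  Value = c(80, 75, 62, 60)
)

# One row per country and period, one column per subgroup
toy_wide <- pivot_wider(toy, names_from = Subgroup, values_from = Value)
print(toy_wide)
```

The wide layout makes row-wise comparisons such as `Female - Male` straightforward.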


Analyze and visualize global life expectancy data with a focus on gender disparities, using a United Nations dataset. As part of your analysis, answer the following key questions:

Does the `Value` column contain any missing data? Save your answer as a boolean variable (`TRUE` or `FALSE`) named `missing`.

How does life expectancy differ between men and women across countries overall in the 2000-2005 period? Save your answer as a variable named `subgroup` with the value `"Female"` if female life expectancy is higher, and `"Male"` if male life expectancy is higher.

Which countries exhibit the largest disparities in life expectancy between genders in the 2000-2005 period? Save the top 3 countries with the largest male-female disparities as a variable named `disparities`.
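
Each answer has a different shape: `missing` is a single logical, `subgroup` is a single string, and `disparities` is a character vector of three country names. A minimal sketch on invented data (the countries and values below are hypothetical) shows one way to produce each shape with base R and dplyr; other approaches are equally valid.

```r
library(dplyr)

# Toy wide data: one row per country (names and values are made up)
toy <- data.frame(
  Country.or.Area = c("A", "B", "C", "D"),
  Female = c(80, 62, 71, 68),
  Male = c(75, 60, 59, 67)
)

# Single logical: is any value missing?
missing <- anyNA(c(toy$Female, toy$Male))

# Single string: which subgroup has the higher mean?
subgroup <- ifelse(mean(toy$Female) > mean(toy$Male), "Female", "Male")

# Character vector: the three countries with the largest absolute gap
disparities <- toy %>%
  mutate(gap = abs(Female - Male)) %>%
  slice_max(gap, n = 3, with_ties = FALSE) %>%
  pull(Country.or.Area)
```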

In [None]:
library(dplyr)
library(tidyr)
library(ggplot2)
# Load the UN life expectancy data
life_expectancy <- read.csv("datasets/UNdata.csv")

In [None]:
# TRUE if any life expectancy value is missing, FALSE otherwise
missing <- anyNA(life_expectancy$Value)
print(missing)

In [None]:
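# Keep the 2000-2005 rows and reshape to one row per country,
# with Female and Male life expectancy in separate columns
subgroup_data <- life_expectancy %>%
  filter(Year == "2000-2005", Subgroup %in% c("Female", "Male")) %>%
  pivot_wider(names_from = Subgroup, values_from = Value) %>%
  # Label each country by the subgroup with the higher life expectancy
  mutate(subgroup = ifelse(Female > Male, "Female",
                           ifelse(Female < Male, "Male", "Equal")))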

In [None]:
# Scatter plot of male vs. female life expectancy, one point per country
ggplot(subgroup_data, aes(x = Female, y = Male, color = subgroup)) +
  geom_point() +
  labs(title = "Male vs. Female Life Expectancy by Country, 2000-2005",
       x = "Female life expectancy",
       y = "Male life expectancy",
       color = "Higher life expectancy") +
  theme_minimal()

In [None]:
# Compare average life expectancy across countries and store a single string
subgroup <- ifelse(mean(subgroup_data$Female, na.rm = TRUE) >
                     mean(subgroup_data$Male, na.rm = TRUE), "Female", "Male")

In [None]:
disparities_data <- life_expectancy %>%
  filter(Year == "2000-2005", Subgroup %in% c("Female", "Male")) %>%
  # Reshape to one row per country with Female and Male columns
  pivot_wider(names_from = Subgroup, values_from = Value) %>%
  # Absolute gap between female and male life expectancy
  mutate(disparities = abs(Female - Male)) %>%
  # Keep the three countries with the largest gap (already sorted descending)
  slice_max(disparities, n = 3, with_ties = FALSE)

In [None]:
# Names of the three countries with the largest gap
disparities <- disparities_data$Country.or.Area

In [None]:

# View the subgroup with the higher average life expectancy
print(subgroup)
