# Data analysis portfolio project in Excel. Decoding Health Patterns: Obesity and Demographic Factors in the USA
![pic](https://miro.medium.com/v2/resize:fit:1128/format:webp/0*z7EjXS-ks4utPZ7A.jpg)

## Introduction
In this project, I’m digging into obesity and overweight trends from **2009 to 2021**. The investigation extends to exploring the intricate correlations among **obesity, inadequate vegetable consumption, and insufficient physical activity**. Furthermore, the analysis aims to discern variations in these factors across diverse demographic groups, categorized by **gender, age, education level, income, and race/ethnicity**. I’m following the usual steps in data analysis: ask, prepare, process, analyze, share, and act.

## Ask
Here are the main things I’m curious about:
- How have overweight and obesity rates changed over the examined years?
- How do **obesity rates, low vegetable consumption, and low physical activity** vary across different demographic categories such as **age groups, genders, education levels, income brackets, and racial/ethnic groups**?
- Which 10 regions have the highest prevalence of **obesity**?

## Prepare
The dataset for this project comes from the **Centers for Disease Control and Prevention** (CDC). It is called **“Nutrition, Physical Activity, and Obesity - Behavioral Risk Factor Surveillance System.”** The CDC's platform made it easy to review the data and pull out what I needed for the project. I downloaded the dataset in CSV format and imported it into **Excel** to dig deeper.

## Process
Upon initial review, the dataset presented a **raw and unstructured form**:

![cleaning1](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*THGELd5p5AiVjOOq)

To enhance its usability, I transformed the data into a table, then sorted the columns to give it more structure.

![cleaning2](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*Y-SoCmOgkP7Ee0dh)

**Subsequently**, I renamed the columns for clarity and removed redundant information. I converted the **“Value”** column, which holds the **percentage of adults** in each category, into **numeric values after replacing commas with periods**. A thorough check for **duplicates found none**. I also shortened the entries in the **“Question”** column for **conciseness and clarity**, keeping only the most pertinent information.
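For readers who want to reproduce these cleaning steps outside Excel, here is a minimal Python sketch. It is not the workflow used in this project, only an equivalent under stated assumptions: the rows are made-up placeholders rather than CDC figures, the column names (`Year`, `Location`, `Question`, `Value`) follow the renamed headers described above, and treating blank cells as missing is an extra assumption of the sketch.

```python
# Placeholder rows; the values are made up, not taken from the CDC dataset.
raw_rows = [
    {"Year": "2020", "Location": "National", "Question": "Obesity", "Value": "27,4"},
    {"Year": "2020", "Location": "National", "Question": "Overweight", "Value": "35,8"},
    {"Year": "2021", "Location": "National", "Question": "Obesity", "Value": ""},
]


def to_number(text):
    """Convert a string such as '27,4' to 27.4; return None for blank cells."""
    text = text.strip()
    if not text:
        return None  # assumption: blank cells are treated as missing
    return float(text.replace(",", "."))


def clean(rows):
    """Return rows with a numeric Value, skipping exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({**row, "Value": to_number(row["Value"])})
    return cleaned


cleaned_rows = clean(raw_rows)
duplicates_found = len(raw_rows) - len(cleaned_rows)
print(cleaned_rows[0]["Value"], duplicates_found)
```

Comparing the row count before and after cleaning gives the number of duplicates, which mirrors the duplicate check described above.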

![cleaning3](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*LMeQ4N5GuA6nZ-uq)

Now, my data’s looking neat, and I’m ready to dive into the analysis.

![cleaning](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*C4ktSLoWD5bV5R-e)

To make it clearer, the dataset is divided into the following classes, each associated with a set of related questions:

**The Obesity/Weight Status class** presented two groups of adults: those who have **overweight** and those who have **obesity**;

**Physical Activity:** different intensities of physical activity were presented. For my analysis, I focused on the category of individuals who engage in **no leisure-time physical activity**.

In the **Fruits and Vegetables class**, I selected the category of individuals who report consuming vegetables **less than once daily**.

Additionally, the following stratification categories were accessible for analysis:

**Age (years):**
- 18–24;
- 25–34;
- 35–44;
- 45–54;
- 55–64;
- 65 or older;

**Education:**
- Less than high school;
- High school graduate;
- Some college or technical school;
- College graduate;

**Gender:**
- Female;
- Male;

**Income:**
- Less than $15,000;
- $15,000–$24,999;
- $25,000–$34,999;
- $35,000–$49,999;
- $50,000–$74,999;
- $75,000 or greater;

**Race/Ethnicity:**
- American Indian/Alaska Native;
- Asian;
- Hawaiian/Pacific Islander;
- Hispanic;
- Non-Hispanic Black;
- Non-Hispanic White;
- 2 or more races;
- Other;
- Total.

## Analyze and Share
### How have overweight and obesity rates changed over the examined years?

I made a pivot table with **“Year”** in rows and **“Question”** in columns. For filters, I selected **“National”** in Location to focus on the entire country, and under Category, I chose **“Total.”**
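The pivot-table logic can also be sketched in Python. This is only an illustration under assumptions: the rows are made-up placeholders, the column names match the ones mentioned above, the `Region A` and `Gender` entries are hypothetical labels, and averaging the values in each cell is an assumption because the aggregation used in Excel is not stated here.

```python
from collections import defaultdict

# Placeholder rows; the values are made up, not taken from the CDC dataset.
rows = [
    {"Year": 2020, "Location": "National", "Category": "Total", "Question": "Obesity", "Value": 31.0},
    {"Year": 2020, "Location": "National", "Category": "Total", "Question": "Overweight", "Value": 35.0},
    {"Year": 2021, "Location": "National", "Category": "Total", "Question": "Obesity", "Value": 32.0},
    {"Year": 2021, "Location": "National", "Category": "Total", "Question": "Overweight", "Value": 34.5},
    {"Year": 2021, "Location": "Region A", "Category": "Total", "Question": "Obesity", "Value": 40.0},
    {"Year": 2021, "Location": "National", "Category": "Gender", "Question": "Obesity", "Value": 33.0},
]


def pivot_mean(rows, location="National", category="Total"):
    """Build {year: {question: mean value}} from the rows that pass both filters."""
    cells = defaultdict(list)
    for row in rows:
        if row["Location"] != location or row["Category"] != category:
            continue  # same role as the pivot-table filters
        if row["Value"] is None:
            continue  # skip missing values
        cells[(row["Year"], row["Question"])].append(row["Value"])

    table = defaultdict(dict)
    for (year, question), values in sorted(cells.items()):
        table[year][question] = sum(values) / len(values)
    return dict(table)


table = pivot_mean(rows)
for year, by_question in table.items():
    print(year, by_question)
```

Each outer key plays the role of a pivot row (the year) and each inner key the role of a pivot column (the question).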

![Obesity Trends](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*FWy18yzLMfy0ymQH)

Based on the **line chart**, overweight rates remained relatively stable, with a slight decrease over the years. The **obesity** rate, however, increased gradually almost every year. Extending the trendline three years ahead suggests a potential growth of around **2.3% by 2024**.
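A trendline projection of this kind can be sketched in Python with an ordinary least-squares fit. This sketch assumes a linear trendline, which is an assumption about the chart settings, and the yearly rates are made-up placeholders rather than the values from the chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, future_xs):
    """Extend the fitted line to the given future x values."""
    slope, intercept = fit_line(xs, ys)
    return {x: slope * x + intercept for x in future_xs}


# Placeholder series; the rates are made up, not read from the chart.
years = [2017, 2018, 2019, 2020, 2021]
rates = [30.0, 30.5, 31.0, 31.5, 32.0]

projection = forecast(years, rates, [2022, 2023, 2024])
growth = projection[2024] - rates[-1]  # change from the last observed year
print(projection, growth)
```

The `growth` value is a difference in percentage points between the projected and last observed rates, which is one way to read a projected increase from a trendline.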

### How do obesity rates, low vegetable consumption, and low physical activity vary across different demographic categories such as age groups, genders, education levels, income brackets, and racial/ethnic groups?

For the subsequent analysis, I maintained the filters, examining data for the entire population but within distinct categories.

In the classification, I concentrated on three criteria: **obesity, absence of leisure-time physical activity, and vegetable consumption less than once daily**.

![gender distribution](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*gyv-5QhCixjcC_-X)

Obesity rates show **no difference between the genders**; the two are equal. A slight difference appears in the **absence of leisure-time physical activity**, where women tend to be less active. For **vegetable consumption**, men tend to eat fewer vegetables.

![age_groups](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*DJzEbuL1HvOp8m5r)

The **line chart** above shows that the prevalence of **obesity increases with age**, with the **45–64 age range showing the highest rates**. The rate then drops by approximately **7% among individuals aged 65 and older**. The lack of **leisure-time physical activity** gradually increases with age. Additionally, the **18–24 age group has the highest occurrence of low vegetable consumption**, while the other age groups differ less from one another in this respect.

![income related](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*WhviE8qK5Xiz_a8Y)

The graph clearly illustrates a trend of **decreasing obesity with increasing income**. A similar correlation is evident in other criteria: individuals with **higher income tend to engage in more leisure-time physical activities and consume vegetables more frequently** compared to groups with lower income.

![racial/ethnic groups](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*u0V2KHSoqAltLd0m)

The **bar chart** orders the **racial/ethnic groups by increasing obesity rate**, from left to right. The **Asian group has the lowest obesity rate, almost one-third of the rate observed in the next group (Non-Hispanic White)**. Conversely, the group with the highest rate is **Non-Hispanic Black**. Comparing only the two groups with the **lowest and highest obesity rates** (Asian and Non-Hispanic Black), the other two criteria follow the same pattern. Across all groups, however, **no evident correlation among obesity rates, low vegetable consumption, and low physical activity** is discernible from the graph.

![education level](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*HoSHnX-VgaSFquYX)

The **graph** demonstrates a clear correlation between the three criteria and **education level**: individuals with **higher education levels exhibit a lower propensity for obesity, engage in more physical activity during leisure, and consume more vegetables**.

### Top 10 regions with the highest prevalence of obesity
In the initial table, employing **conditional formatting** and sorting the data based on the percentage of adults with obesity in the “Total” category, I identified the top 10 regions with the highest prevalence of obesity:

1. West Virginia
2. Kentucky
3. Alabama
4. Oklahoma
5. Mississippi
6. Arkansas
7. Louisiana
8. South Dakota
9. Ohio
10. New Jersey
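The ranking step can be sketched in Python as a filter followed by a descending sort. This is an illustration under assumptions: the region names and values are made-up placeholders, averaging each region's values across years is an assumption about how the percentages were combined, and the sketch assumes the national row sits in the same `Location` column and must be excluded.

```python
def top_regions(rows, question="Obesity", category="Total", n=10):
    """Return the n (location, mean value) pairs with the highest mean Value."""
    values_by_location = {}
    for row in rows:
        if row["Question"] != question or row["Category"] != category:
            continue
        if row["Location"] == "National":
            continue  # assumption: the national aggregate is not a region
        values_by_location.setdefault(row["Location"], []).append(row["Value"])

    means = {loc: sum(vals) / len(vals) for loc, vals in values_by_location.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]


# Placeholder data; region names and values are made up.
samples = [
    ("Region A", 2020, 30.0), ("Region A", 2021, 32.0),
    ("Region B", 2020, 36.0), ("Region B", 2021, 38.0),
    ("Region C", 2021, 25.0),
    ("National", 2021, 31.0),
]
rows = [
    {"Location": loc, "Year": year, "Category": "Total", "Question": "Obesity", "Value": value}
    for loc, year, value in samples
]

for location, mean_rate in top_regions(rows, n=2):
    print(location, mean_rate)
```

With `n=10` and the full dataset, the same function would return a top 10 list under these assumptions.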

The **map chart** below shows the distribution of obesity levels:

![obesity prevalence](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*482vrpZ_tKsG3puAcNl1bQ.png)

## Conclusion
In this project, I conducted a thorough analysis of obesity and related factors spanning the years 2009–2021. By exploring trends in obesity rates, vegetable consumption, and physical activity levels, I aimed to uncover patterns across various demographic categories such as age groups, genders, education levels, income brackets, and racial/ethnic groups.

### Notable Observations:
- The trendline projection for the next three years suggested a potential growth of around **2.3% in obesity rates by 2024**.
- Obesity rates show **no difference between the genders**.
- There is a **positive correlation between age and obesity prevalence**, peaking in the 45–64 age group. Notably, there is a subsequent decrease among individuals aged 65 and older. Lack of leisure-time physical activity tends to rise gradually with age. In terms of low vegetable consumption, the age group of 18–24 stands out with the highest occurrence. The findings highlight age as a significant factor influencing obesity rates, physical activity levels, and dietary habits.
- Individuals with higher income levels demonstrate a notable trend of **lower obesity rates, higher engagement in physical activity during leisure, and more frequent consumption of vegetables**. This correlation suggests a potential link between socio-economic status and healthier lifestyle choices.
- Stratification by **race/ethnicity showcased significant variations**, with the Asian group exhibiting the lowest tendency for obesity and the Non-Hispanic Black group showing the highest.
- A compelling correlation was observed between **education level and the three criteria**, indicating that individuals with higher education levels tend to have **lower obesity rates, engage in more physical activity, and consume more vegetables**.

Understanding these patterns is crucial for tailoring effective public health interventions and targeted strategies to address the diverse health needs of different demographic groups.


