<a href="https://colab.research.google.com/github/fortune-uwha/eating-disorder-dataset/blob/main/Eating_disorder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction

There is a commonly held misconception that eating disorders are a lifestyle choice. Eating disorders are actually serious and often fatal illnesses that are associated with severe disturbances in people’s eating behaviors and related thoughts and emotions. Preoccupation with food, body weight, and shape may also signal an eating disorder. Common eating disorders include anorexia nervosa, bulimia nervosa, and binge-eating disorder. (Source: **National Institute of Mental Health (NIMH)**)

**Data Source:** The data was obtained from Our World in Data, which sources it from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease study (2019).

In [1]:
import numpy as np
import pandas as pd

In [2]:
eating_disorder_df = pd.read_csv('/content/prevalence-of-eating-disorders-by-age.csv')
eating_disorder_df.head()

| Unnamed: 0 | Entity | Code | Year | Prevalence - Eating disorders - Sex: Both - Age: 20 to 24 (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 10 to 14 (Percent) | Prevalence - Eating disorders - Sex: Both - Age: All Ages (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 30 to 34 (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 25 to 29 (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 5-14 years (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 50-69 years (Percent) | Prevalence - Eating disorders - Sex: Both - Age: Age-standardized (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 70+ years (Percent) | Prevalence - Eating disorders - Sex: Both - Age: 15 to 19 (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 0.31 | 0.1 | 0.12 | 0.3 | 0.31 | 0.05 | 0 | 0.13 | 0 | 0.28 |
| 1 | Afghanistan | AFG | 1991 | 0.3 | 0.1 | 0.12 | 0.29 | 0.29 | 0.04 | 0 | 0.13 | 0 | 0.27 |
| 2 | Afghanistan | AFG | 1992 | 0.29 | 0.09 | 0.12 | 0.28 | 0.28 | 0.04 | 0 | 0.12 | 0 | 0.26 |
| 3 | Afghanistan | AFG | 1993 | 0.28 | 0.09 | 0.12 | 0.27 | 0.27 | 0.04 | 0 | 0.12 | 0 | 0.25 |
| 4 | Afghanistan | AFG | 1994 | 0.27 | 0.09 | 0.12 | 0.26 | 0.26 | 0.04 | 0 | 0.11 | 0 | 0.25 |
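
The `Unnamed: 0` column is most likely a row index that was written into the CSV when the file was saved (an assumption; the file's export settings are not described here). A minimal sketch of how `pd.read_csv(..., index_col=0)` treats such a column, using a small in-memory CSV whose rows are copied from the output above (only two prevalence columns kept) so it runs without the original file:

```python
import io

import pandas as pd

# In-memory stand-in for the CSV: the first, unnamed column is a saved row index.
csv_text = """,Entity,Code,Year,Prevalence - Eating disorders - Sex: Both - Age: 20 to 24 (Percent),Prevalence - Eating disorders - Sex: Both - Age: 10 to 14 (Percent)
0,Afghanistan,AFG,1990,0.31,0.1
1,Afghanistan,AFG,1991,0.3,0.1
"""

# Default behaviour: the unnamed first column is kept as a data column called "Unnamed: 0"
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df.columns[0])  # Unnamed: 0

# index_col=0 uses that column as the DataFrame index instead of keeping it as data
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_df.columns[:3]))  # ['Entity', 'Code', 'Year']
```

With `index_col=0`, the later `melt` no longer has a stray index column to skip over.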


In [30]:
def clean_column_headers(df: pd.DataFrame) -> pd.DataFrame:
    """
    Cleans column headers by removing the prefix "Prevalence - Eating disorders - Sex: Both - Age: "
    and the suffix "(Percent)" from each header, then stripping leftover whitespace.

    The input DataFrame is not modified; a copy with cleaned headers is returned.

    Args:
        df (pandas.DataFrame): The DataFrame to clean.

    Returns:
        pandas.DataFrame: A copy of the DataFrame with cleaned column headers.
    """
    df = df.copy()
    # Remove the long, repeated prefix (literal match, not a regular expression)
    df.columns = df.columns.str.replace("Prevalence - Eating disorders - Sex: Both - Age: ", "", regex=False)
    # Remove the "(Percent)" unit label (literal match, so the parentheses need no escaping)
    df.columns = df.columns.str.replace("(Percent)", "", regex=False)

    # Strip whitespace from column names
    df.columns = df.columns.str.strip()
    return df
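
An alternative sketch, using a different technique from the `str.replace` calls above rather than the notebook's own method: rename each header with Python's `str.removeprefix` and `str.removesuffix` (Python 3.9+). These only strip text at the very start or end of a label, so headers such as `Entity` pass through unchanged. The helper name `strip_prevalence_labels` is hypothetical, and the one-row example reuses values from the table above.

```python
import pandas as pd

PREFIX = "Prevalence - Eating disorders - Sex: Both - Age: "
SUFFIX = "(Percent)"


def strip_prevalence_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with PREFIX and SUFFIX removed from its headers (hypothetical helper)."""

    def clean(name: str) -> str:
        # removeprefix/removesuffix act only at the ends and are no-ops when there is no match
        return name.removeprefix(PREFIX).removesuffix(SUFFIX).strip()

    # rename() with a callable applies it to every column label and returns a new DataFrame
    return df.rename(columns=clean)


example = pd.DataFrame(
    {
        "Entity": ["Afghanistan"],
        "Year": [1990],
        PREFIX + "20 to 24 (Percent)": [0.31],
        PREFIX + "Age-standardized (Percent)": [0.13],
    }
)
print(list(strip_prevalence_labels(example).columns))
# ['Entity', 'Year', '20 to 24', 'Age-standardized']
```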

In [35]:
cleaned_eating_disorder_df = clean_column_headers(eating_disorder_df)
cleaned_eating_disorder_df

| Unnamed: 0 | Entity | Code | Year | 20 to 24 | 10 to 14 | All Ages | 30 to 34 | 25 to 29 | 5-14 years | 50-69 years | Age-standardized | 70+ years | 15 to 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 0.31 | 0.10 | 0.12 | 0.30 | 0.31 | 0.05 | 0 | 0.13 | 0 | 0.28 |
| 1 | Afghanistan | AFG | 1991 | 0.30 | 0.10 | 0.12 | 0.29 | 0.29 | 0.04 | 0 | 0.13 | 0 | 0.27 |
| 2 | Afghanistan | AFG | 1992 | 0.29 | 0.09 | 0.12 | 0.28 | 0.28 | 0.04 | 0 | 0.12 | 0 | 0.26 |
| 3 | Afghanistan | AFG | 1993 | 0.28 | 0.09 | 0.12 | 0.27 | 0.27 | 0.04 | 0 | 0.12 | 0 | 0.25 |
| 4 | Afghanistan | AFG | 1994 | 0.27 | 0.09 | 0.12 | 0.26 | 0.26 | 0.04 | 0 | 0.11 | 0 | 0.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6835 | Zimbabwe | ZWE | 2015 | 0.25 | 0.07 | 0.11 | 0.23 | 0.24 | 0.04 | 0 | 0.10 | 0 | 0.20 |
| 6836 | Zimbabwe | ZWE | 2016 | 0.25 | 0.07 | 0.11 | 0.23 | 0.24 | 0.04 | 0 | 0.10 | 0 | 0.20 |
| 6837 | Zimbabwe | ZWE | 2017 | 0.25 | 0.08 | 0.11 | 0.23 | 0.24 | 0.04 | 0 | 0.10 | 0 | 0.20 |
| 6838 | Zimbabwe | ZWE | 2018 | 0.25 | 0.08 | 0.11 | 0.23 | 0.24 | 0.04 | 0 | 0.10 | 0 | 0.20 |


In [39]:
# Reshape from wide format (one column per age group) to long format:
# one row per Entity/Code/Year/age-group combination
age_group_columns = ["20 to 24", "10 to 14", "All Ages", "30 to 34", "25 to 29",
                     "5-14 years", "50-69 years", "Age-standardized", "70+ years", "15 to 19"]
transformed_eating_disorder_df = pd.melt(cleaned_eating_disorder_df, id_vars=["Entity", "Code", "Year"],
                                         value_vars=age_group_columns, value_name="Prevalence Range")

# Display the transformed DataFrame
print(transformed_eating_disorder_df)

            Entity Code  Year  variable  Prevalence Range
0      Afghanistan  AFG  1990  20 to 24              0.31
1      Afghanistan  AFG  1991  20 to 24              0.30
2      Afghanistan  AFG  1992  20 to 24              0.29
3      Afghanistan  AFG  1993  20 to 24              0.28
4      Afghanistan  AFG  1994  20 to 24              0.27
...            ...  ...   ...       ...               ...
68395     Zimbabwe  ZWE  2015  15 to 19              0.20
68396     Zimbabwe  ZWE  2016  15 to 19              0.20
68397     Zimbabwe  ZWE  2017  15 to 19              0.20
68398     Zimbabwe  ZWE  2018  15 to 19              0.20
68399     Zimbabwe  ZWE  2019  15 to 19              0.20

[68400 rows x 5 columns]
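
The melted `variable` column mixes genuine age bands with two whole-population measures ("All Ages" and "Age-standardized"), and some bands overlap ("5-14 years" contains "10 to 14"). A minimal sketch, assuming the goal is to compare age bands without double counting, that gives the columns clearer names and separates the aggregates from the bands. The names "Age Group" and "Prevalence (%)" are hypothetical choices (`pd.melt` also accepts `var_name=` to name that column at reshape time), and the rows are copied from the Afghanistan 1990 values shown earlier.

```python
import pandas as pd

# A few long-format rows in the same shape as the melt output above
long_df = pd.DataFrame(
    {
        "Entity": ["Afghanistan"] * 4,
        "Code": ["AFG"] * 4,
        "Year": [1990] * 4,
        "variable": ["20 to 24", "All Ages", "Age-standardized", "15 to 19"],
        "Prevalence Range": [0.31, 0.12, 0.13, 0.28],
    }
)

# Replace melt's default "variable" name and the value column name with descriptive ones
long_df = long_df.rename(columns={"variable": "Age Group", "Prevalence Range": "Prevalence (%)"})

# These two categories summarise the whole population rather than a single age band
AGGREGATE_GROUPS = ["All Ages", "Age-standardized"]
is_aggregate = long_df["Age Group"].isin(AGGREGATE_GROUPS)

aggregates_df = long_df[is_aggregate]
age_bands_df = long_df[~is_aggregate]
print(age_bands_df["Age Group"].tolist())  # ['20 to 24', '15 to 19']
```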


### Adding population data 

Population data is sourced from the **World Bank** (databank.worldbank.org). It will be used to convert the prevalence percentages into an estimated number of people with an eating disorder in each country and year.
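
A minimal sketch of that estimate: join prevalence and population on country code and year, then compute prevalence ÷ 100 × population, since the prevalence values are percentages. It assumes the population data has already been reshaped into long columns named `Code`, `Year` and `Population` (hypothetical names). The prevalence values are copied from the "All Ages" column above; the population numbers are placeholders, not real World Bank figures.

```python
import pandas as pd

# Prevalence in percent, long format (values from the "All Ages" column above)
prevalence_df = pd.DataFrame(
    {"Code": ["AFG", "ZWE"], "Year": [1990, 2015], "Prevalence (%)": [0.12, 0.11]}
)

# Population in long format. Placeholder numbers, NOT real World Bank figures.
population_long_df = pd.DataFrame(
    {"Code": ["AFG", "ZWE"], "Year": [1990, 2015], "Population": [1_000_000, 2_000_000]}
)

# Join on country code and year; validate="one_to_one" raises if either side has duplicate keys
merged_df = prevalence_df.merge(
    population_long_df, on=["Code", "Year"], how="inner", validate="one_to_one"
)

# Prevalence is a percentage, so divide by 100 before multiplying by the population
merged_df["Estimated cases"] = (merged_df["Prevalence (%)"] / 100 * merged_df["Population"]).round()
print(merged_df[["Code", "Year", "Estimated cases"]])
```

An inner join silently drops country-years missing from either table, so comparing row counts before and after the merge is a cheap sanity check.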



In [None]:
population_df = pd.read_excel(r'/content/population_data.xlsx')
population_df.tail(20)
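
If `population_data.xlsx` is a DataBank-style export with one row per country and one column per year, it needs reshaping to long format before it can be joined with the prevalence data. A minimal sketch under these assumptions: a `Country Code` column, year columns labelled like `1990 [YR1990]`, and `..` marking missing values. Verify them against the actual file. The rows below are hypothetical and the population values are placeholders.

```python
import pandas as pd

# Hypothetical DataBank-style rows: column names and ".." marker are assumptions; values are placeholders
wide_population_df = pd.DataFrame(
    {
        "Country Name": ["Afghanistan", "Zimbabwe"],
        "Country Code": ["AFG", "ZWE"],
        "1990 [YR1990]": ["1000000", "2000000"],
        "1991 [YR1991]": ["1100000", ".."],
    }
)

# Year columns are the ones ending in "[YR....]"
year_columns = [c for c in wide_population_df.columns if c.endswith("]")]

# Wide -> long: one row per country and year
population_long_df = wide_population_df.melt(
    id_vars=["Country Code"], value_vars=year_columns, var_name="Year", value_name="Population"
)

# "1990 [YR1990]" -> 1990, matching the integer Year column of the prevalence data
population_long_df["Year"] = population_long_df["Year"].str[:4].astype(int)
# ".." becomes NaN; everything else becomes a number
population_long_df["Population"] = pd.to_numeric(population_long_df["Population"], errors="coerce")
population_long_df = population_long_df.rename(columns={"Country Code": "Code"})
print(population_long_df)
```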
