```markdown
Assignment 3
Nazmul Hasan Rabbi
CISC 0672: Data Visualization
```


## Project Setup
Import the required packages, read in the data file, and preprocess it for this assignment.


In [1]:
# importing packages
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

In [2]:
# set the data directory
datadir = './data/'

# read the data
df = pd.read_csv(datadir + 'population-and-demography.csv')

In [3]:
# columns to keep
c = list(df.columns[:3]) + [df.columns[5]] + [df.columns[7]]

# select columns and rename them
df = df[c].rename(columns={
    'Country name': 'Country',
    'Population of children under the age of 15': 'Pop under 15',
    'Population aged 15 to 64 years': 'Pop between 15 and 64'
})

# calculate 'Pop over 64'
df['Pop over 64'] = df['Population'] - (df['Pop under 15'] + df['Pop between 15 and 64'])

In [4]:
# see the first few rows after the transformation
df.head()

|   | Country | Year | Population | Pop under 15 | Pop between 15 and 64 | Pop over 64 |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1950 | 7480464 | 3068855 | 4198587 | 213022 |
| 1 | Afghanistan | 1951 | 7571542 | 3105444 | 4250002 | 216096 |
| 2 | Afghanistan | 1952 | 7667534 | 3145070 | 4303436 | 219028 |
| 3 | Afghanistan | 1953 | 7764549 | 3186382 | 4356242 | 221925 |
| 4 | Afghanistan | 1954 | 7864289 | 3231060 | 4408474 | 224755 |
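The column selection in cell In [3] depends on column positions (`df.columns[5]`, `df.columns[7]`), which would silently pick the wrong columns if the CSV layout changed. As a minimal sketch, assuming the raw column names are exactly the ones that appear in the rename mapping, the same preprocessing could select columns by name instead. The toy frame below (`raw`, with a made-up country and made-up numbers) is purely illustrative and is not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the raw CSV: invented values, column names copied from the rename mapping
raw = pd.DataFrame({
    'Country name': ['Testland', 'Testland'],
    'Year': [1950, 1951],
    'Population': [1000, 1100],
    'Population of children under the age of 15': [300, 320],
    'Population aged 15 to 64 years': [600, 660],
    'Some other column': [1, 2],  # hypothetical extra column that should be dropped
})

rename_map = {
    'Country name': 'Country',
    'Population of children under the age of 15': 'Pop under 15',
    'Population aged 15 to 64 years': 'Pop between 15 and 64',
}

# Select by name rather than by position, then rename
keep = ['Country name', 'Year', 'Population',
        'Population of children under the age of 15',
        'Population aged 15 to 64 years']
tidy = raw[keep].rename(columns=rename_map)

# Derive the oldest age group as the remainder of the total population
tidy['Pop over 64'] = tidy['Population'] - (tidy['Pop under 15'] + tidy['Pop between 15 and 64'])
```

Selecting by name raises a `KeyError` if an expected column is missing, which tends to be easier to debug than a chart built from the wrong column.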


## Exercise 1

Define a function `plot_population(frame, country)` that takes the dataframe `df` and the name of one of its countries and returns a line chart of that country's total population (an annual time series). For example, this code produces the following figure:

    fig = plot_population(df, 'United States')
    show(fig)

In [5]:
# function to plot the population
def plot_population(frame, country):
    # Filter the dataframe once for the requested country
    country_frame = frame[frame['Country'] == country]

    # Create a line chart of total population by year
    fig = go.Figure(data=go.Scatter(x=country_frame['Year'],
                                    y=country_frame['Population'],
                                    mode='lines'))
    # Set the title and axis labels
    fig.update_layout(title={'text': country, 'x':0.5}, xaxis_title='Year', yaxis_title='Population')

    return fig  # Return the figure

In [6]:
# plot the population of the United States
plot_population(df, 'United States').show()

## Exercise 2

Define a function `plot_populations(frame, countries)` that takes the dataframe `df` and a *list* of countries and returns a time-series plot of total population for those countries. Note that the countries are ordered lexicographically in the chart's legend. For example:

    fig = plot_populations(df, ['United States', 'Canada', 'Europe (UN)'])
    show(fig)

*Hint:* For this, you might want to construct a *pivoted* dataframe in which each relevant country has its own column. Refer to Assignment 2 for an example.
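To illustrate the hint, here is a minimal sketch of pivoting a long-format frame so that each country gets its own column. The frame `long_df` and the country names `A-land` and `B-land` are invented for illustration and do not come from the assignment data.

```python
import pandas as pd

# Long format: one row per (country, year); values are made up
long_df = pd.DataFrame({
    'Country': ['B-land', 'B-land', 'A-land', 'A-land'],
    'Year': [2000, 2001, 2000, 2001],
    'Population': [10, 11, 20, 22],
})

# Wide format: one row per year, one column per country
wide_df = long_df.pivot(index='Year', columns='Country', values='Population')

print(wide_df)
```

Although `B-land` comes first in the input, `pivot` returns the country columns in sorted order, which is one way to get the lexicographic legend ordering the exercise asks for.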

In [7]:
# function to plot the population of multiple countries
def plot_populations(frame, countries):
    # Filter, pivot (which sorts the country columns), and melt back to long form for plotting
    plot_df = (
        frame[frame['Country'].isin(countries)]
        .pivot(index='Year', columns='Country', values='Population')
        .reset_index()
        .melt(id_vars='Year', var_name='Country', value_name='Population')
    )

    # Create the plot and center the title
    fig = px.line(
        plot_df, x='Year', y='Population', color='Country', title='Population Over Time'
    ).update_layout(title={'x': 0.5}, legend_title_text='Countries', xaxis_title='Year')

    return fig  # Return the figure


In [8]:
# plot the populations of the United States, Canada, and Europe over time
plot_populations(df, ['United States', 'Canada', 'Europe (UN)']).show()

## Exercise 3

Write a function `plot_by_age_group(frame, country)` that returns a figure for a stacked bar chart. For each year, the bar stacks the population values for under 15 years of age, between 15 and 64 years of age, and over 64 years of age. Note the ordering of the three age categories (from youngest to oldest category) in the legend and chart's title and labels.

For example:

    fig = plot_by_age_group(df, 'United States')
    show(fig)

In [9]:
# function to plot the population by age group
def plot_by_age_group(frame, country):
    # Filter the dataframe
    country_frame = frame[frame['Country'] == country]

    # Create a stacked bar chart
    fig = go.Figure(data=[
        go.Bar(name=age_group, x=country_frame['Year'], y=country_frame[age_group])
        for age_group in ['Pop under 15', 'Pop between 15 and 64', 'Pop over 64']
    ]).update_layout(
        barmode='stack',  # Change the bar mode
        title={'text': f'Population by Age Group in {country}', 'x':0.5},  # Set the title
        xaxis_title='Year',  # Set the x-axis label
        yaxis_title='Population',  # Set the y-axis label
        legend_title_text='Age Groups'  # Set the legend title
    )

    return fig  # Return the figure

In [10]:
# plot the population by age group in the United States
plot_by_age_group(df, 'United States').show()

## Exercise 4

Define the function `plot_age_group_by_percentage(frame, country)` that returns a 100% stacked bar chart, that is, a chart showing the percentage of each year's total population that falls into each of the three age groups. For example:

    fig = plot_age_group_by_percentage(df, 'Japan')
    show(fig)

Hint: Add calculated columns to the dataframe `df`. For example:

      df['Under 15'] = 100 * df['Pop under 15'] / df['Population']
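As a minimal sketch of the hint, the percentage columns can be computed for all three age groups and checked to confirm that they sum to 100 for every year. The frame `toy` uses invented numbers, and the names `shares` and `totals` are illustrative only.

```python
import pandas as pd

# Toy data with made-up values; column names follow the preprocessed dataframe
toy = pd.DataFrame({
    'Year': [2000, 2001],
    'Population': [200, 400],
    'Pop under 15': [50, 80],
    'Pop between 15 and 64': [120, 240],
    'Pop over 64': [30, 80],
})

# Map each percentage label to its source population column
age_groups = {
    'Under 15': 'Pop under 15',
    'Between 15 and 64': 'Pop between 15 and 64',
    'Over 64': 'Pop over 64',
}

# Compute each group's share of the total population, in percent
shares = pd.DataFrame({'Year': toy['Year']})
for label, column in age_groups.items():
    shares[label] = 100 * toy[column] / toy['Population']

# Each row's shares should add up to 100
totals = shares[list(age_groups)].sum(axis=1)
```

Building the shares in a separate frame leaves the original dataframe unchanged, so the other exercises keep working with raw population counts.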

In [11]:
# function to plot the population by age group by percentage
def plot_age_group_by_percentage(frame, country):
    # Filter the dataframe for the specified country
    df_country = frame[frame['Country'] == country]

    # Define age groups and corresponding population columns
    age_groups = {'Under 15': 'Pop under 15', 'Between 15 and 64': 'Pop between 15 and 64', 'Over 64': 'Pop over 64'}

    # Plot a stacked bar chart
    fig = go.Figure(data=[
        go.Bar(
            name=age_group,
            x=df_country['Year'],
            y=100 * df_country[pop] / df_country['Population']
        )
        for age_group, pop in age_groups.items()
    ])

    # Update layout of the plot
    fig.update_layout(
        barmode='stack',
        title={'text': f'Population by Age Group in {country}', 'x':0.5},
        xaxis_title='Year',
        yaxis_title='Percentage',
        legend_title_text='Age Groups'
    )

    return fig

In [12]:
# plot the population by age group in Japan as a percentage
plot_age_group_by_percentage(df, 'Japan').show()

## Exercise 5

Let's construct a geographically enhanced scatterplot that places a bubble over each country, with the bubble's size corresponding to the country's total population. By default the bubbles are quite small; we want them to be bigger. The bubbles are also all blue by default; we want each bubble's color, in addition to its size, to correspond to the country's total population. Choose a colorful colormap for your colors. We choose to eliminate the colorbar that maps colors to population values because we can hover the mouse over a country's bubble to see its population in a tooltip. Here's an example for the year 2000:

    fig = population_map(df, 2000)
    show(fig)

In [13]:
# function to plot population by country in a given year on a map
def population_map(df, year):
    # Filter data for the given year and exclude 'Less developed regions, excluding China'
    data = df[(df['Year'] == year) & (df['Country'] != 'Less developed regions, excluding China')]

    # Create a scatter geo plot
    fig = px.scatter_geo(data, locations='Country', locationmode='country names', hover_name="Country",
                         size="Population", color="Population", projection="natural earth",
                         color_continuous_scale=px.colors.sequential.Rainbow, size_max=100,
                         range_color=[0, df['Population'].quantile(0.95)])

    # Update plot layout: center the title and hide the colorbar (the tooltip already shows the population)
    fig.update_layout(title_text=f'Population by Country in {year}', title_x=0.5, coloraxis_showscale=False)

    return fig

In [14]:
# plot the population map of the world in 2000
population_map(df, 2000).show()