Project 1 Jupyter Notebook
========

### Group 4: John Graunt
Authors: Barbara Zeynep (Zeynep) Ganley, Edward Shen, Ivan Chan, Natalie Cornejo

In this notebook, we introduce a population function, population pyramid functions, and related tools to compare dependency ratios between rural and urban countries. The data come from the World Bank population indicators, accessed through the `wbdata` Python package (documented at https://wbdata.readthedocs.io). We primarily analyze the dependency ratios using population pyramids and other visualizations.

### Set up & Imports
Below are all necessary imports and pip installs for the rest of this project.

In [14]:
# Uncomment the code to install
#!pip install wbdata

import wbdata
import pandas as pd
import numpy as np

## Population Function, <span style="color: red;">*Deliverable 1[A] - Population Statistics*</span> <a name="population-f"></a>

The population function takes four arguments:
-   **year (int):** The year of interest, from 1960 to 2024.
-   **sex (str):** One of "People", "Male", or "Female". Matching is case-insensitive.
-   **age_range (array):** A pair of integers giving the lower and upper ages. Ages are grouped into five-year bands, and the data provide a single open-ended band for ages **80** and above, so any upper bound of 80 or more returns the same result.
-   **place (str):** The country or region of interest. Ideally this is a 3-letter country code, although the function also accepts a full name exactly as the World Bank lists it (for example "China").
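
As an illustration of the case-insensitive check on `sex` described above, here is a minimal sketch. The helper name `normalize_sex` and the constant `VALID_SEXES` are hypothetical and are not part of the notebook's code; the sketch only shows one way such a check could reject unknown values up front.

```python
# Hypothetical helper, not part of the notebook: validate the `sex` argument.
VALID_SEXES = ("people", "male", "female")

def normalize_sex(sex):
    """Return the lowercase form of `sex`, or raise if it is not recognized."""
    s = str(sex).strip().casefold()
    if s not in VALID_SEXES:
        raise ValueError(f"sex must be one of {VALID_SEXES}, got {sex!r}")
    return s

print(normalize_sex("Male"))      # male
print(normalize_sex(" PEOPLE "))  # people
```

Failing early like this avoids downloading data for an input that can never match.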

#### Some Helper Functions used in the population function

In [15]:
# Helper functions
# place_Finder: Match the input string against the available locations
# Output: 3-letter country code
def place_Finder(place):
    # From the list of available countries, search to see if the user input 'place' matches any of them.
    # If not, raise an error
    # Code inspired by previous projects
    countries = wbdata.get_countries()
    country_dict = {}

    # Create a location id with name dictionaries for later search.
    for location in countries:
        code = location['id']
        name = location['name']
    
        country_dict[name] = code

    # Accept either a full name or an existing code; anything else raises an error.
    if place in country_dict:
        return country_dict[place]
    elif place in country_dict.values():
        return place
    else:
        raise ValueError(f"The region '{place}' is not valid. Please try again")

# age_list: Organize input age_range array to create a list of five-year age group strings
# Output: List of five-year age group strings
def age_list(age_range):
    # Compute the correct start and end ages that the dataset accepts
    start = age_range[0] - (age_range[0] % 5)
    end = age_range[1] + (4 - age_range[1] % 5)

    # Generate the list
    results = []
    while start < min(end, 80):
        low = str(start).zfill(2)
        high = str(start + 4).zfill(2)
        results.append(f"{low}{high}")
        start += 5

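    # The dataset has a single open-ended group for ages 80 and above.
    # Check `end` rather than `start`, so a range ending at 79 does not pick it up.
    if end >= 80:
        results.append('80UP')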
    return results
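
To make the rounding in `age_list` concrete, the following is a self-contained sketch of the same arithmetic. The function name `five_year_labels` is hypothetical, and the sketch assumes the open-ended group should be included only when the rounded upper age reaches 80.

```python
# Hypothetical stand-alone version of the age-group arithmetic, for illustration.
def five_year_labels(low, high):
    """Return five-year age-group labels covering the ages low..high."""
    start = low - (low % 5)        # round down to a multiple of 5
    end = high + (4 - high % 5)    # round up to the last age of its group
    labels = []
    while start < min(end, 80):
        labels.append(f"{start:02d}{start + 4:02d}")
        start += 5
    if end >= 80:                  # single open-ended group for 80 and above
        labels.append("80UP")
    return labels

print(five_year_labels(12, 23))  # ['1014', '1519', '2024']
print(five_year_labels(75, 79))  # ['7579']
print(five_year_labels(78, 95))  # ['7579', '80UP']
```

Note that a request for ages 12 to 23 is widened to 10 to 24, because the data are only available in five-year groups.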

#### Deliverable: population function

In [17]:
# Deliverable: Population Function
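def population(year, sex, age_range, place):
    """Return a sentence giving the population of `place` in `year`.

    year: int, 1960 to 2024; sex: "People", "Male", or "Female";
    age_range: pair of ints (low, high); place: country name or code.
    """
    # Make sure place_code is a valid 3-letter code, even if the user input is already a country code.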
    place_code = place_Finder(place)
    age_labels = age_list(age_range)
    year = str(year)
    population = 0

    # Loop each age label to get each age range's population
    for i in age_labels:
        # indicators
        male_ind = f"SP.POP.{i}.MA"
        female_ind = f"SP.POP.{i}.FE"

        df = wbdata.get_dataframe({male_ind: "Male", female_ind: "Female"}, country=place_code)
        df = df.reset_index()
        df = df[df["date"] == year]

        # Skip this age group if it has no data for the requested year,
        # so undefined or stale values are never added below
        if df.empty:
            print(f"Skipping {i}: No data for {year}.")
            continue
        male_pop = int(df["Male"].fillna(0).iloc[0])
        female_pop = int(df["Female"].fillna(0).iloc[0])

        # Determine the sex input and increase the population accordingly
        s = sex.casefold()
        if s == "people":
            population += male_pop + female_pop
        elif s == "male":
            population += male_pop
        elif s == "female":
            population += female_pop
        else:
            raise ValueError(f"The sex '{sex}' is not valid. Use 'People', 'Male', or 'Female'.")

    #"In [year], how many [people/males/females] aged [low] to [high] were living in [the world/region/country]?"
    # Special case: add "the" before "world"
    prefix = "the " if "world" in place.casefold() else ""
    # Plural noun used in the sentence
    noun = {"male": "males", "female": "females"}.get(sex.casefold(), "people")

    return f"In {year}, {population:,} {noun} aged {age_range[0]} to {age_range[1]} were living in {prefix}{place}."

#### Example

In [18]:
population(2013, "Male", (40, 80), "China")

'In 2013, 296,803,697 males aged 40 to 80 were living in China.'
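
The `population` function issues one request per age group. One possible alternative, sketched below, is to build all indicator codes first so they could be requested together. The helper name `indicator_map` and the column labels are hypothetical; the `SP.POP.{label}.MA` and `SP.POP.{label}.FE` patterns are the ones used in the function above. Passing the resulting dictionary to a single `wbdata.get_dataframe` call is an assumption that has not been tested here.

```python
# Hypothetical helper: collect male and female indicator codes for all age groups.
def indicator_map(age_labels):
    """Map World Bank indicator codes to readable column names."""
    indicators = {}
    for label in age_labels:
        indicators[f"SP.POP.{label}.MA"] = f"Male {label}"
        indicators[f"SP.POP.{label}.FE"] = f"Female {label}"
    return indicators

codes = indicator_map(["4044", "4549"])
for code, name in codes.items():
    print(code, "->", name)
```

With one column per sex and age group, the totals could then be obtained by summing the relevant columns for the chosen year rather than accumulating inside a loop.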