# Canadian Birth Probability Analysis

By Nicole Bidwell 

## Introduction 

This analysis explores the chance of being born in Canada in a particular year. A description of how to reproduce the analysis by running the data pipeline can be found in the `README.md` file. The outline below serves as a summary, explanation, and interpretation of the analysis. 

## Data

This analysis uses birth rate (births per 1,000 people) and population data obtained through the World Bank's API. 

### The Chosen Years

This analysis explores the years 2010 to 2023. An example calculation of the probability of being born in Canada in the year 2012 is included, along with the changes in probability over time across all countries. 

### Retrieving and Loading the Data

The script `retrieve_load.py` in the `src` folder is used to obtain the required data from the World Bank API in JSON format. It pulls the birth rate and population data for 2010 to 2023 across multiple pages of results, then uses `sqlite3` to create a table, load the data into the database, and query for the required subset of the data. 
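The retrieval and loading steps described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `retrieve_load.py`: the function names, the table layout, and the choice of `urllib` are assumptions, as are the indicator codes mentioned in the comments (`SP.DYN.CBRT.IN` for crude birth rate and `SP.POP.TOTL` for total population). The sketch also assumes every response is the usual two-element `[metadata, rows]` list and does not handle API error payloads.

```python
import json
import sqlite3
from urllib.request import urlopen

# World Bank v2 indicator endpoint for all countries (and aggregates).
BASE_URL = "https://api.worldbank.org/v2/country/all/indicator/{indicator}"


def build_url(indicator, page, start=2010, end=2023, per_page=1000):
    """Build the URL for one page of results for one indicator."""
    return (
        f"{BASE_URL.format(indicator=indicator)}"
        f"?format=json&date={start}:{end}&per_page={per_page}&page={page}"
    )


def fetch_page(url):
    """Download and decode one page: a [metadata, rows] pair."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all(indicator, fetch=fetch_page):
    """Walk through every page and collect the data rows."""
    rows, page, pages = [], 1, 1
    while page <= pages:
        meta, data = fetch(build_url(indicator, page))
        pages = meta["pages"]      # total page count reported by the API
        rows.extend(data or [])    # the rows element can be null when empty
        page += 1
    return rows


def load_rows(conn, indicator, rows):
    """Create the (hypothetical) table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indicators ("
        "indicator TEXT, iso3 TEXT, country_id TEXT, "
        "country_name TEXT, year INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO indicators VALUES (?, ?, ?, ?, ?, ?)",
        [
            (
                indicator,
                r["countryiso3code"],
                r["country"]["id"],
                r["country"]["value"],
                int(r["date"]),
                r["value"],
            )
            for r in rows
        ],
    )
    conn.commit()


# Example usage (requires network access):
# conn = sqlite3.connect("births.db")
# load_rows(conn, "SP.DYN.CBRT.IN", fetch_all("SP.DYN.CBRT.IN"))
# load_rows(conn, "SP.POP.TOTL", fetch_all("SP.POP.TOTL"))
```

Passing the page fetcher in as a parameter keeps the pagination loop easy to exercise without network access.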

When querying for the required subset of data, it is important to filter for valid countries, since the original data also includes aggregate rows for groups of countries such as regions. Including these aggregates in the processed data would have resulted in overcounting in the later calculations of total births. 
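One way to express this filter in SQL is sketched below. The table and column names are hypothetical, the sample values are made up for illustration, and the sketch assumes a separate `countries` table in which grouped entries are marked with the region label `Aggregates` (the label the World Bank country metadata uses for them, to the best of my knowledge). The query in `retrieve_load.py` may be written differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (iso3 TEXT PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE indicators (iso3 TEXT, year INTEGER,
                             birth_rate REAL, population REAL);
    """
)

# Illustrative rows only: two real countries plus one aggregate entry.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("CAN", "Canada", "North America"),
        ("IND", "India", "South Asia"),
        ("WLD", "World", "Aggregates"),
    ],
)
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?, ?)",
    [
        ("CAN", 2012, 11.0, 35_000_000),
        ("IND", 2012, 21.0, 1_270_000_000),
        ("WLD", 2012, 19.0, 7_100_000_000),
    ],
)

# Keep only rows whose country is not an aggregate, so that summing
# births later does not count the same population twice.
valid_rows = conn.execute(
    """
    SELECT i.iso3, c.name, i.year, i.birth_rate, i.population
    FROM indicators AS i
    JOIN countries AS c ON c.iso3 = i.iso3
    WHERE c.region <> 'Aggregates'
    ORDER BY i.iso3
    """
).fetchall()

for row in valid_rows:
    print(row)
```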

### The Processed Data

After querying for the required data, a `pandas` dataframe is created containing the country information (ISO3 code, id, and name), year, birth rate, and population. The data is saved as a CSV file, `country_br_pop.csv`, in the `data` folder for later use. 


## Calculated Values


The script `calculate_probabilities.py` in the `src` folder is used to perform the probability calculations. 

### Number of Births

After loading the `country_br_pop.csv` data, a column `birth` is added to the data frame. This provides the number of births in each year for each country, using the formula:

$$\text{Number of Births} = \frac{\text{Birth Rate}}{1000}\times\text{Population}$$

These values are used in the following calculations.  
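As a small illustration of the formula, the sketch below adds the `birth` column to a toy dataframe. The column names `birth_rate` and `population` are assumptions about the layout of `country_br_pop.csv`, and the numbers are rounded, made-up values rather than the World Bank figures.

```python
import pandas as pd

# Toy stand-in for country_br_pop.csv (illustrative values only).
df = pd.DataFrame(
    {
        "name": ["Canada", "India"],
        "year": [2012, 2012],
        "birth_rate": [11.0, 21.0],            # births per 1,000 people
        "population": [35_000_000, 1_270_000_000],
    }
)

# Number of births = birth rate / 1000 * population
df["birth"] = df["birth_rate"] / 1000 * df["population"]

print(df[["name", "year", "birth"]])
```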

### Probability of Being Born in Canada for 2012

To calculate the probability of being born in Canada for a specified year I created the function `calc_probability_country`. This function calculates the percentage probability of being born in any specified country for any specified year within the dataset. The two formulas used are: 

$$\text{Total Worldwide Births in the Year} =  \text{sum of all countries' births in the year}$$

$$\text{Percentage Probability for a Country} = \frac{\text{Country's Number of Births in the Year}}{\text{Total Worldwide Births in the Year}}\times 100$$


To calculate the probability of being born in Canada in 2012, the function is called with `Canada` for the `country` parameter and `2012` for the `year` parameter. For a more tangible interpretation, I included the equivalent "1 in N" ratio using the formula: 

$$\text{Ratio Value} = \frac{1}{\text{Percentage Probability}}\times100$$ 
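The two probability formulas and the ratio formula can be sketched as below. The function name and its `country` and `year` parameters come from the description above, but the `df` argument, the column names, and the `ratio_value` helper are assumptions, so the actual code in `calculate_probabilities.py` may look different. The births are made-up numbers chosen so the arithmetic is easy to follow.

```python
import pandas as pd


def calc_probability_country(df, country, year):
    """Percentage probability of being born in `country` in `year`."""
    in_year = df[df["year"] == year]
    # Total worldwide births in the year: sum over all countries.
    total_births = in_year["birth"].sum()
    # Births for the requested country in that year.
    country_births = in_year.loc[in_year["name"] == country, "birth"].sum()
    return country_births / total_births * 100


def ratio_value(percentage):
    """Convert a percentage probability into the N of '1 in N'."""
    return 1 / percentage * 100


# Toy data (illustrative values only): 1,000 of 400,000 births in Canada.
df = pd.DataFrame(
    {
        "name": ["Canada", "Elsewhere"],
        "year": [2012, 2012],
        "birth": [1_000.0, 399_000.0],
    }
)

pct = calc_probability_country(df, "Canada", 2012)
print(f"{pct:.2f}% chance, or about 1 in {ratio_value(pct):.0f}")
```

With these toy numbers the percentage is roughly 0.25, which corresponds to about 1 in 400.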

These values are saved in the `output` folder. 

### Probability of Being Born in any Specified Country for any Specified Year

The `calc_probability_country` function was also used to calculate the probabilities of being born in every other country in the dataset for each year. These values are saved in the CSV file `countries_prob.csv` in the `data` folder. 
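The same per-country, per-year percentages can also be computed in one vectorized step, shown here as an alternative sketch rather than the approach the script takes. The `probability` column name and the toy values are assumptions.

```python
import pandas as pd

# Toy data (illustrative values only) covering two years.
df = pd.DataFrame(
    {
        "name": ["Canada", "Elsewhere", "Canada", "Elsewhere"],
        "year": [2012, 2012, 2013, 2013],
        "birth": [1_000.0, 399_000.0, 1_200.0, 398_800.0],
    }
)

# Total births per year, broadcast back onto every row of that year.
yearly_total = df.groupby("year")["birth"].transform("sum")

# Percentage probability for each country in each year.
df["probability"] = df["birth"] / yearly_total * 100

print(df)

# Saving would then be a single call, for example:
# df.to_csv("data/countries_prob.csv", index=False)
```

A quick sanity check on the result is that the probabilities within each year add up to 100.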

## Results and Interpretation

### Probability of Being Born in Canada in 2012

The probability of being born in Canada in 2012 is {yr_2012 percentage}, which is equivalent to 1 in {ratio}. 

### Data Visualization and Interpretation

The script `graphs.py` in the `src` folder is used to generate plots using Plotly Graph Objects and Plotly Express, which are later saved in the `output` folder. These plots allow for easier interpretation and deeper analysis into the birth probabilities. 

#### Canada Bar Chart for 2012

This plot displays the probability of being born in Canada in 2012. A bar chart is a simple but effective choice for comparing values across categories.  
{CHART}

Here we see the probability appears to be very low. When hovering over the bars we can confirm the exact values for being born in Canada or not. Although this value seems quite low we can dive deeper into more meaningful values. 

#### Canada Trendline Over Time

This plot displays the change in probability of being born in Canada from 2010 to 2023. 

Notably, we see a minimum in 2012 with a probability of {} and a maximum in 2021 with a probability of {}. While this difference of 0.017% may seem quite small, with total worldwide births averaging {}, it equates to {} more people being born in Canada in the maximum year than in 2012. 

#### Top 5, Bottom 5, and Canada Timeline

Similar to the Canada Trendline Over Time, this plot shows the probability trend lines of additional countries from 2010 to 2023. 

The included countries on the plot are the 5 countries with the highest average probability and the 5 countries with the lowest average probability, along with Canada for comparison. 


{INTERPRETATION}



### References 

data links

### Appendix

#### Additional Scripts
