<h3> Luis Garduno </h3>

Dataset: [__International Database (IDB)__](https://www.census.gov/data-tools/demo/idb/#/country?COUNTRY_YEAR=2022&COUNTRY_YR_ANIM=2022)

Question of interest: Predict the population of Earth in 2122.

# Data Understanding

## Data Description

```python
import numpy as np
import pandas as pd

# Load dataset into dataframe
df = pd.read_csv('https://raw.githubusercontent.com/luisegarduno/MachineLearning_Projects/master/data/idb5yr.all', delimiter='|', encoding='ISO-8859-1')

df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34237 entries, 0 to 34236
Data columns (total 99 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   #YR        34237 non-null  int64  
 1   TFR        26163 non-null  float64
 2   SRB        26163 non-null  float64
 3   RNI        26172 non-null  float64
 4   POP95_99   26171 non-null  float64
 5   POP90_94   26171 non-null  float64
 6   POP85_89   26171 non-null  float64
 7   POP80_84   26171 non-null  float64
 8   POP75_79   26171 non-null  float64
 9   POP70_74   26171 non-null  float64
 10  POP65_69   26171 non-null  float64
 11  POP60_64   26171 non-null  float64
 12  POP5_9     26171 non-null  float64
 13  POP55_59   26171 non-null  float64
 14  POP50_54   26171 non-null  float64
 15  POP45_49   26171 non-null  float64
 16  POP40_44   26171 non-null  float64
 17  POP35_39   26171 non-null  float64
 18  POP30_34   26171 non-null  float64
 19  POP25_29   26171 non-null  float64
 20  POP20_
```

---------------------------------

Printing the dataframe summary shows a total of 34,237 instances and 99 attributes (columns).
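
The same counts can be read programmatically rather than from the printed summary. The sketch below is only an illustration: `sample` is a hypothetical three-row stand-in with invented values, not the IDB data, and the same calls would apply to `df`.

```python
import pandas as pd

# Hypothetical three-row stand-in for the IDB frame; values are invented.
sample = pd.DataFrame({
    "#YR": [2020, 2021, 2022],
    "POP": [1000.0, None, 1040.0],
    "NAME": ["A", "B", "C"],
})

# shape is (rows, columns): instances first, attributes second.
n_rows, n_cols = sample.shape

# Missing values per column, i.e. total entries minus the non-null count.
missing_per_column = sample.isna().sum()

print(f"{n_rows} instances, {n_cols} attributes")
print(missing_per_column)
```

Comparing `missing_per_column` against the "Non-Null Count" column of `df.info()` is a quick way to confirm which attributes are incomplete.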

Several attributes also share similar names, most of which involve population (*POP*, *MPOP*, *FPOP*).
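
One way to see how the similarly named population attributes cluster is to group column names by prefix. This is a rough sketch: the column list is a hypothetical subset chosen for illustration, and `population_family` is a helper introduced here, not part of the notebook.

```python
import re
from collections import Counter

# Hypothetical subset of IDB column names, chosen for illustration only.
columns = ["#YR", "TFR", "POP", "POP5_9", "POP95_99", "POP_DENS",
           "MPOP", "MPOP0_4", "FPOP", "FPOP100_", "NAME"]

# Matches POP, MPOP, FPOP, optionally followed by an age band such as
# 5_9 or an open-ended band such as 100_ (POP_DENS is deliberately excluded).
POPULATION_PATTERN = re.compile(r"^(M|F)?POP(\d+_\d*)?$")


def population_family(column):
    """Return 'POP', 'MPOP', or 'FPOP' for population columns, else None."""
    match = POPULATION_PATTERN.match(column)
    if match is None:
        return None
    return f"{match.group(1) or ''}POP"


# Count how many columns fall into each population family.
families = (population_family(c) for c in columns)
family_counts = Counter(f for f in families if f is not None)
print(family_counts)
```

Applied to `df.columns`, the same counter would show how many of the attributes are age-banded population counts for each sex.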

Attributes such as year and population are always whole numbers, so they are naturally integers (int64). In the output above, the population columns
are stored as float64 because they contain missing values, which pandas represents as NaN in a floating-point column. Attributes involving rates, such as
fertility rate and mortality rate, should be double-precision floating-point (float64). Lastly, attributes that help identify instances, such as
a country/area's name or code number, should remain as type 'object' (string).

As a result, the data types presented for each attribute are suitable and should not be changed.
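
To check this reasoning about data types, the dtypes can be inspected directly. The sketch below uses a hypothetical frame with invented values. It also shows an optional step the notebook does not take: if integer-typed counts were ever required, pandas' nullable `Int64` dtype can hold whole numbers alongside missing values.

```python
import pandas as pd

# Hypothetical stand-in for a few IDB columns; values are invented.
sample = pd.DataFrame({
    "#YR": [2020, 2021, 2022],       # complete, so inferred as int64
    "POP": [1000.0, None, 1040.0],   # has a gap, so inferred as float64
    "TFR": [2.1, None, 2.0],         # a rate, genuinely floating-point
    "NAME": ["A", "B", "C"],         # identifier text
})

print(sample.dtypes)

# Optional: convert whole-number counts to the nullable integer dtype.
# The missing entry becomes <NA> instead of NaN.
pop_as_int = sample["POP"].astype("Int64")
print(pop_as_int)
```

Leaving the counts as float64 is usually harmless for modeling, which is consistent with keeping the inferred types as they are.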


Below is a brief description of some of the key attributes:     

| Name      | Label                                                                         | Type   |
| --------- | ----------------------------------------------------------------------------- | ------ |
| AREA_KM2  | Area in square kilometers                                                     | int    |
| ASFRXX_YY | Age-specific fertility rate for women age XX-YY (births per 1,000 population) | float  |
| CBR       | Crude birth rate (births per 1,000 population)                                | float  |
| CDR       | Crude death rate (deaths per 1,000 population)                                | float  |
| E0        | Both sexes life expectancy at birth                                           | float  |
| E0_F      | Female life expectancy at birth                                               | float  |
| E0_M      | Male life expectancy at birth                                                 | float  |
| FIPS      | FIPS country or area code                                                     | string |
| FMR0_4    | Mortality rates for females under 5 years of age                              | float  |
| FMR1_4    | Mortality rates for females aged 1-4 years                                    | float  |
| FPOP      | Female midyear population                                                     | int    |
| FPOPXX_YY | Female midyear population aged XX-YY years                                    | int    |
| FPOP100_  | Female midyear population aged 100+ years                                     | int    |
| GENC      | Geopolitical Entities, Names, & Codes (GENC) 2 char. country code standard    | string |
| GR        | Growth rate (percent)                                                         | float  |
| GRR       | Gross reproduction rate (lifetime births per woman)                           | float  |
| IMR       | Both sexes infant mortality rate (infant deaths per 1,000 population)         | float  |
| IMR_F     | Female infant mortality rate (infant deaths per 1,000 population)             | float  |
| IMR_M     | Male infant mortality rate (infant deaths per 1,000 population)               | float  |
| MMR0_4    | Mortality rates for males under 5 years of age                                | float  |
| MMR1_4    | Mortality rates for males aged 1-4 years                                      | float  |
| MPOP      | Male midyear population                                                       | int    |
| MPOPXX_YY | Male midyear population aged XX-YY years                                      | int    |
| MPOP100_  | Male midyear population aged 100+ years                                       | int    |
| MR0_4     | Total mortality rates under 5 years of age                                    | float  |
| MR1_4     | Total mortality rates aged 1-4 years                                          | float  |
| NAME      | Country or area name                                                          | string |
| NMR       | Net migration rate (net number of migrants per 1,000 population)              | float  |
| POP       | Total midyear population                                                      | int    |
| POP_DENS  | Population density (total population divided by area in square kilometers)    | float  |
| POPXX_YY  | Total midyear population aged XX-YY years                                     | int    |
| POP100_   | Total midyear population aged 100+ years                                      | int    |
| RNI       | Rate of natural increase (percent)                                            | float  |
| SRB       | Sex ratio at birth (males per female)                                         | float  |
| TFR       | Total fertility rate                                                          | float  |
| YR        | Year                                                                          | int    |
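
Some attributes in the table are derived from others, which allows a simple consistency check. The sketch below assumes `POP_DENS` is `POP / AREA_KM2`, as the table states, and that `RNI` follows the standard demographic definition: crude birth rate minus crude death rate, converted from per 1,000 to percent. The rows are hypothetical, and published values may differ slightly because of rounding.

```python
import pandas as pd

# Hypothetical rows; the numbers are invented purely for illustration.
sample = pd.DataFrame({
    "POP": [1_000_000, 250_000],
    "AREA_KM2": [50_000, 1_000],
    "CBR": [20.0, 10.0],   # births per 1,000 population
    "CDR": [8.0, 12.0],    # deaths per 1,000 population
})

# Population density: total population divided by area in square kilometers.
sample["POP_DENS"] = sample["POP"] / sample["AREA_KM2"]

# Rate of natural increase in percent: (CBR - CDR) is per 1,000,
# so dividing by 10 converts it to per 100.
sample["RNI"] = (sample["CBR"] - sample["CDR"]) / 10

print(sample[["POP_DENS", "RNI"]])
```

A negative `RNI`, as in the second row, means deaths outnumber births, so any growth there would have to come from net migration (`NMR`).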

<br>

The complete table with all variables can be found here: [Census Data API](https://api.census.gov/data/timeseries/idb/5year/variables.html)

Resources

- [IDB - Variable definitions](https://www.census.gov/data/developers/data-sets/international-database.html)
- [IDB - Release notes](https://www.census.gov/programs-surveys/international-programs/about/idb.html)
- [IDB - Methodology](https://www2.census.gov/programs-surveys/international-programs/technical-documentation/methodology/idb-methodology.pdf)
- [IDB - Census Data API Variables](https://api.census.gov/data/timeseries/idb/5year/variables.html)