In [1]:
import pandas as pd

# Challenge line

The Pacific Dataviz Challenge is a competition for all—of any age, background or skillset—that encourages storytelling, design, innovation and technical skill in the visualisation of Pacific data.

Participants, individually or as a team, are invited to create data visualisations that highlight important issues or opportunities facing the Pacific region.

Your submission could be an infographic, animation, dashboard, web app, poster, PDF report, drawing—almost anything you can imagine and create.

# Theme (taken directly from site)

This year’s theme ‘Blue Pacific 2050’ references the long-term approach to working together as a region, the 2050 Strategy for the Blue Pacific Continent.

A shared vision, it is shaped by and for Pacific peoples, cultures, and their deep connection to the land and ocean.

The Strategy focuses on seven key areas:

1. Political Leadership and Regionalism
2. People-Centred Development
3. Peace and Security
4. Resources and Economic Development
5. Climate Change and Disasters
6. Ocean and Environment
7. Technology and Connectivity

# Exploration

Note that all datasets have been renamed from their original downloaded file names for ease of use.

__Mapping__

- _Political Leadership and Regionalism_: `bp2050_pol_leadership.csv`
- _People-Centred Development_: `bp2050_ppl_centered_dev.csv`
- _Peace and Security_: `bp2050_peace_and_sec.csv`
- _Resources and Economic Development_: `bp2050_economic_dev.csv`
- _Climate Change and Disasters_: `bp2050_climate_change.csv`
- _Ocean and Environment_: `bp2050_ocean_and_env.csv`
- _Technology and Connectivity_: `bp2050_tech_and_conn.csv`
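The theme-to-file mapping above can also live in code as a dictionary, so each theme's DataFrame is loaded and looked up by name instead of through seven separate variables. This is a minimal sketch, assuming the renamed files all sit in one directory (the notebook uses `../data/`); `THEME_FILES` and `load_themes` are hypothetical names introduced here, not part of the challenge material.

```python
from pathlib import Path

import pandas as pd

# Theme -> renamed CSV file, mirroring the mapping list above
THEME_FILES = {
    "Political Leadership and Regionalism": "bp2050_pol_leadership.csv",
    "People-Centred Development": "bp2050_ppl_centered_dev.csv",
    "Peace and Security": "bp2050_peace_and_sec.csv",
    "Resources and Economic Development": "bp2050_economic_dev.csv",
    "Climate Change and Disasters": "bp2050_climate_change.csv",
    "Ocean and Environment": "bp2050_ocean_and_env.csv",
    "Technology and Connectivity": "bp2050_tech_and_conn.csv",
}


def load_themes(data_dir="../data"):
    """Hypothetical helper: read every theme CSV into a dict keyed by theme name."""
    data_dir = Path(data_dir)
    return {
        theme: pd.read_csv(data_dir / filename)
        for theme, filename in THEME_FILES.items()
    }
```

Usage would be `themes = load_themes()` followed by, for example, `themes["Peace and Security"].head()`. Keying by the full theme name keeps the analysis readable when the seven frames are later compared side by side.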

In [4]:
all_df = pd.read_csv("../data/bp2050_all.csv")
pol_df = pd.read_csv("../data/bp2050_pol_leadership.csv")
ppl_df = pd.read_csv("../data/bp2050_ppl_centered_dev.csv")
peace_df = pd.read_csv("../data/bp2050_peace_and_sec.csv")
eco_df = pd.read_csv("../data/bp2050_economic_dev.csv")
clim_df = pd.read_csv("../data/bp2050_climate_change.csv")
ocean_df = pd.read_csv("../data/bp2050_ocean_and_env.csv")
tech_df = pd.read_csv("../data/bp2050_tech_and_conn.csv")

dfs = [  # every theme dataset, excluding all_df
    pol_df,    # 2753 rows
    ppl_df,    # 12000 rows
    peace_df,  # 9 rows
    eco_df,    # 4533 rows
    clim_df,   # 2949 rows
    ocean_df,  # 978 rows
    tech_df,   # 1731 rows
]

## Investigative notes

- There's a code mapping that I can't fully find on the site, but the data is much easier to understand and navigate alongside the data hub's UI
- Columns like `DATAFLOW` and `FREQ` probably aren't relevant for this short analysis and challenge
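Following that note, one way to trim the frames before exploring is to drop the bookkeeping columns. This is a sketch: treating `DATAFLOW` and `FREQ` as droppable follows the note, while also dropping `Unnamed: 0` (the leftover saved index visible in the `head()` output further down) is an added assumption. `drop_admin_columns` is a hypothetical helper.

```python
import pandas as pd

# Columns assumed not to matter for this short analysis
ADMIN_COLUMNS = ["Unnamed: 0", "DATAFLOW", "FREQ"]


def drop_admin_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the bookkeeping columns; absent ones are skipped."""
    return df.drop(columns=ADMIN_COLUMNS, errors="ignore")
```

Passing `errors="ignore"` lets the same helper run over every theme frame even if one of them lacks a column. Alternatively, the `Unnamed: 0` column can be avoided at load time with `pd.read_csv(path, index_col=0)`.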

### Noted Mappings:

`_T` indicates "Total" and `_Z` indicates "Not applicable" (the standard SDMX codes for these)

- `GEO_PICT`: the individual Pacific Island Countries and Territories (ISO 3166-1 alpha-2 codes)
    - 'FM': Micronesia
    - 'TV': Tuvalu
    - 'NR': Nauru
    - 'VU': Vanuatu
    - 'PW': Palau
    - 'TO': Tonga
    - 'MH': Marshall Islands
    - 'CK': Cook Islands
    - 'PG': Papua New Guinea
    - 'WS': Samoa
    - 'KI': Kiribati
    - 'SB': Solomon Islands
    - 'FJ': Fiji
    - 'PF': French Polynesia
    - 'NU': Niue
    - 'NC': New Caledonia
- `SEX`:
    - '_T': Total
    - 'F': Female
    - 'M': Male
- `AGE`: Provided as ranges, where `Y` prefixes the start of the range and `T` the end
- `URBANIZATION`:
    - '_T': Total
    - 'U': Urban
    - 'R': Rural
- `INCOME`: Denotes quintiles, bottom 40%, and top 60%
- `EDUCATION`: Specific range based on entry
- `OCCUPATION`: Specific mapping based on entry
- `COMPOSITE_BREAKDOWN`: Special groups based on theme
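The lookups noted above can be turned into readable label columns. A minimal sketch: the dictionaries are transcribed from these notes, with the `PW`, `WS` and `SB` entries given their ISO 3166-1 names (Palau, Samoa, Solomon Islands). The `AGE` parser assumes codes follow a `Y<start>T<end>` pattern with whole-number years; anything else (such as `_T`, or open-ended codes) returns `None`. `GEO_NAMES`, `SEX_NAMES`, `parse_age` and `add_labels` are hypothetical names.

```python
import re

import pandas as pd

# Code -> label lookups from the mapping notes
GEO_NAMES = {
    "FM": "Micronesia", "TV": "Tuvalu", "NR": "Nauru", "VU": "Vanuatu",
    "PW": "Palau", "TO": "Tonga", "MH": "Marshall Islands", "CK": "Cook Islands",
    "PG": "Papua New Guinea", "WS": "Samoa", "KI": "Kiribati",
    "SB": "Solomon Islands", "FJ": "Fiji", "PF": "French Polynesia",
    "NU": "Niue", "NC": "New Caledonia",
}
SEX_NAMES = {"_T": "Total", "F": "Female", "M": "Male"}

# Assumed AGE code shape: "Y<start>T<end>"
AGE_RANGE = re.compile(r"^Y(\d+)T(\d+)$")


def parse_age(code):
    """Return (start, end) for a Y<start>T<end> code, or None for anything else."""
    match = AGE_RANGE.match(str(code))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def add_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with readable country and sex label columns."""
    out = df.copy()
    out["GEO_NAME"] = out["GEO_PICT"].map(GEO_NAMES)  # unknown codes become NaN
    out["SEX_NAME"] = out["SEX"].map(SEX_NAMES)
    return out
```

Using `.map` with a dict leaves unknown codes as `NaN` rather than raising, which makes any code missing from the notes easy to spot with `isna()`.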

In [20]:
for df in dfs:
    print(len(df))

2753
12000
9
4533
2949
978
1731
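The bare counts above are easier to compare with theme names attached, and the number of distinct indicators per theme gives a second sense of each dataset's breadth. A sketch; `dataset_summary` is a hypothetical helper that takes a dict of named frames.

```python
import pandas as pd


def dataset_summary(frames: dict) -> pd.DataFrame:
    """Rows and distinct INDICATOR codes per named DataFrame, largest first."""
    stats = {
        name: {"rows": len(df), "indicators": df["INDICATOR"].nunique()}
        for name, df in frames.items()
    }
    return pd.DataFrame.from_dict(stats, orient="index").sort_values(
        "rows", ascending=False
    )
```

In the notebook this could be called as `dataset_summary(dict(zip(theme_names, dfs)))`, where `theme_names` is a hypothetical list of the seven theme names in the same order as `dfs`.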


In [41]:
ppl_df.head(5)

Unnamed: 0,DATAFLOW,FREQ,INDICATOR,GEO_PICT,SEX,AGE,URBANIZATION,INCOME,EDUCATION,OCCUPATION,COMPOSITE_BREAKDOWN,DISABILITY,TIME_PERIOD,OBS_VALUE,UNIT_MEASURE,REPORTING_TYPE,NATURE,DATA_SOURCE,OBS_STATUS,OBS_COMMENT
0,SPC:DF_BP50_2(1.0),A,SH_IHR_CAPS,TO,_T,_T,_T,_T,_T,_T,IHR_08,_T,2010,39.0,INDEX,G,_X,,,
1,SPC:DF_BP50_2(1.0),A,SH_IHR_CAPS,TO,_T,_T,_T,_T,_T,_T,IHR_08,_T,2011,50.0,INDEX,G,_X,,,
2,SPC:DF_BP50_2(1.0),A,SH_IHR_CAPS,TO,_T,_T,_T,_T,_T,_T,IHR_08,_T,2012,25.0,INDEX,G,_X,,,
3,SPC:DF_BP50_2(1.0),A,SH_IHR_CAPS,TO,_T,_T,_T,_T,_T,_T,IHR_08,_T,2013,96.0,INDEX,G,_X,,,
4,SPC:DF_BP50_2(1.0),A,SH_STA_MALR,NC,_T,_T,_T,_T,_T,_T,_Z,_T,2000,0.0,PER_1000_POP,G,_X,,,
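In this preview the demographic breakdown columns (`SEX` through `OCCUPATION`, plus `DISABILITY`) are all `_T`, i.e. totals, while `COMPOSITE_BREAKDOWN` carries indicator-specific groups such as `IHR_08`. A sketch for keeping only those headline-total rows; leaving `COMPOSITE_BREAKDOWN` out of the filter is a judgement call, and `headline_totals` is a hypothetical helper.

```python
import pandas as pd

# Demographic breakdown columns seen in the header above. COMPOSITE_BREAKDOWN is
# deliberately excluded because it holds indicator-specific groups (e.g. IHR_08).
BREAKDOWN_COLUMNS = ["SEX", "AGE", "URBANIZATION", "INCOME",
                     "EDUCATION", "OCCUPATION", "DISABILITY"]


def headline_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where every breakdown column present in df equals '_T' (Total)."""
    cols = [c for c in BREAKDOWN_COLUMNS if c in df.columns]
    return df[(df[cols] == "_T").all(axis=1)]
```

Filtering to totals first avoids double counting when summing `OBS_VALUE`, since a disaggregated row (e.g. `SEX == "F"`) is already included in its `_T` counterpart.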


In [48]:
all_df["REPORTING_TYPE"].unique()

array(['R', 'N', 'G', '_Z'], dtype=object)
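The notes don't decode these four `REPORTING_TYPE` values (in SDG-style SDMX data `G` usually means global and `N` national, but that is an assumption to check against the data hub), so it is worth seeing how rows are spread across them before filtering on the column. A sketch; `reporting_type_breakdown` is a hypothetical helper.

```python
import pandas as pd


def reporting_type_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Count rows per REPORTING_TYPE code, with each code's share of all rows."""
    counts = df["REPORTING_TYPE"].value_counts(dropna=False)
    return pd.DataFrame({"rows": counts, "share": counts / counts.sum()})
```

Calling `reporting_type_breakdown(all_df)` would show whether one reporting type dominates; `dropna=False` keeps any missing codes visible instead of silently dropping them.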