# Data operations for prepping the WorldBank_Health Dataset 

### Summary: 
The dataset [`world_bank_health_population`](https://cloud.google.com/bigquery?sq=1057666841514:3bb229234d7f4b379098581f0101e923&_ga=2.81369303.-1379782407.1673021064&project=paulkamau&ws=!1m9!1m3!3m2!1sbigquery-public-data!2sworld_bank_health_population!1m4!4m3!1sbigquery-public-data!2sworld_bank_health_population!3scountry_series_definitions) combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.


The dataset is spread across multiple tables, which were flattened to create a single working set:

- Country series definitions 
- Country summary 
- Health nutrition population
- International debt
- Series summary 
- Series times

### Actions: 
1. Created a pivot table that takes the indicator codes stored as row values and transforms them into columns of a new table.
1. Created a merged table from `countries_data` and the new pivot table.
1. Created a definition table that maps each `indicator_code` to its description.

## SQL queries

### 1. Pivot Query. Convert the row values into column values 

Source table:
  `bigquery-public-data.world_bank_health_population.health_nutrition_population`

New pivot table, with one column per indicator code:
  `paulkamau.WorldBank_Health.international_health_nutrition_pivot`

```sql

-- Build the pivot column list as a string, e.g. ("SP_DYN_SMAM_FE", ...).
-- Dots are replaced with underscores so the codes are valid column names.
DECLARE IndicatorCodes STRING;

SET IndicatorCodes = (
  SELECT
    CONCAT('("', STRING_AGG(DISTINCT REPLACE(indicator_code, ".", "_"), '", "'), '")')
  FROM `bigquery-public-data.world_bank_health_population.health_nutrition_population`
  WHERE value IS NOT NULL
);

-- PIVOT needs a literal IN list, so the statement is assembled with FORMAT
-- and run with EXECUTE IMMEDIATE.
EXECUTE IMMEDIATE FORMAT("""
CREATE TABLE IF NOT EXISTS `paulkamau.WorldBank_Health.international_health_nutrition_pivot` AS
SELECT * FROM
(
  SELECT country_name, country_code, REPLACE(indicator_code, ".", "_") AS ic, value, year
  FROM `bigquery-public-data.world_bank_health_population.health_nutrition_population`
)
PIVOT
(
  MAX(value)
  FOR ic IN %s
)
ORDER BY country_name ASC
""",
IndicatorCodes);

```
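As a quick check on the pivot, an indicator can be read back as an ordinary column. The query below is a minimal sketch, assuming the pivot table was created as above and that `SP.DYN.SMAM.FE` (age at first marriage, female) has at least one non-null value, so that a `SP_DYN_SMAM_FE` column exists. The alias `age_female_marriage` is only illustrative.

```sql
-- Read one pivoted indicator back as a regular column.
-- Assumes the column SP_DYN_SMAM_FE exists in the pivot table.
SELECT
  country_name,
  country_code,
  year,
  SP_DYN_SMAM_FE AS age_female_marriage
FROM `paulkamau.WorldBank_Health.international_health_nutrition_pivot`
WHERE SP_DYN_SMAM_FE IS NOT NULL
ORDER BY country_name, year;
```

Each row of the pivot table is one country and year, so most indicator columns will be `NULL` for any given row; filtering on `IS NOT NULL` keeps the output readable.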

Country table: descriptive columns from `country_summary`, restricted to rows where `system_of_trade` is set.

```sql
CREATE TABLE IF NOT EXISTS `paulkamau.WorldBank_Health.countries_data` AS
SELECT
  country_code, short_name, currency_unit, income_group,
  national_accounts_base_year, sna_price_valuation,
  system_of_national_accounts, system_of_trade, government_accounting_concept
FROM `bigquery-public-data.world_bank_health_population.country_summary`
WHERE system_of_trade IS NOT NULL
LIMIT 1000;

```
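The merged table listed under Actions is not shown in these notes. One way it could be built is sketched below, assuming a join on `country_code` between `countries_data` and the pivot table. The target name `countries_health_merged` is hypothetical, and the choice of an inner join (which drops pivot rows for codes missing from `countries_data`, such as regional aggregates) is also an assumption.

```sql
-- Sketch of the merged table; the table name below is hypothetical.
CREATE TABLE IF NOT EXISTS `paulkamau.WorldBank_Health.countries_health_merged` AS
SELECT
  c.*,
  -- country_code already comes from c, so drop the duplicate from the pivot
  p.* EXCEPT (country_code)
FROM `paulkamau.WorldBank_Health.countries_data` AS c
INNER JOIN `paulkamau.WorldBank_Health.international_health_nutrition_pivot` AS p
  ON c.country_code = p.country_code;
```

Switching to `LEFT JOIN` from the pivot side would instead keep every pivot row and leave the country attributes `NULL` where no match exists.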



What’s the average age at first marriage for females in each country, averaged over all available years?

```sql
SELECT
  country_name,
  country_code,
  ROUND(AVG(value),2) AS avg_age_female_marriage
FROM
  `bigquery-public-data.world_bank_health_population.health_nutrition_population`
WHERE
  indicator_code = "SP.DYN.SMAM.FE" --Age at first marriage, female
GROUP BY
  country_name,country_code
ORDER BY
  avg_age_female_marriage;
```


List the distinct indicators used across the dataset:

```sql
SELECT * FROM `paulkamau.WorldBank_Health.Indicators_code_desc` LIMIT 1000
```
*results*


<!-- SQL to create a single table for analyzing the average age at first marriage for females -->

```sql
CREATE TABLE IF NOT EXISTS `paulkamau.WorldBank_Health.age_fe_marriage` as 
SELECT
  country_name,
  country_code,
  year,
  value AS age_female_marriage
FROM
  `bigquery-public-data.world_bank_health_population.health_nutrition_population`
WHERE
  indicator_code = "SP.DYN.SMAM.FE"; -- Age at first marriage, female
```
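Because `age_fe_marriage` keeps one row per country and year, a common follow-up is to take only the most recent observation for each country. The query below is a sketch of that, assuming the table was created as above; it uses BigQuery's `QUALIFY` clause with `ROW_NUMBER()`.

```sql
-- Latest non-null observation per country from the age_fe_marriage table.
SELECT
  country_name,
  country_code,
  year,
  age_female_marriage
FROM `paulkamau.WorldBank_Health.age_fe_marriage`
WHERE age_female_marriage IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY country_code ORDER BY year DESC) = 1
ORDER BY age_female_marriage;
```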


SQL to replace the dots in `indicator_code_new` with underscores, so the values match the pivot column names:

```sql

UPDATE `paulkamau.WorldBank_Health.Indicators_code_desc`
SET indicator_code_new = REPLACE(indicator_code_new, '.', '_')
WHERE indicator_code = indicator_code_new;

```
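The statement that creates `Indicators_code_desc` is not included in these notes. A possible reconstruction is sketched below. It assumes the source table exposes an `indicator_name` column holding the description, and that `indicator_code_new` started out as a copy of `indicator_code`, which is what the `WHERE` clause of the `UPDATE` above suggests. Both points are assumptions, not the recorded method.

```sql
-- Hypothetical reconstruction of the indicator definition table.
-- Assumes indicator_name holds the human-readable description.
CREATE TABLE IF NOT EXISTS `paulkamau.WorldBank_Health.Indicators_code_desc` AS
SELECT DISTINCT
  indicator_code,
  indicator_code AS indicator_code_new,  -- rewritten later by the UPDATE
  indicator_name
FROM `bigquery-public-data.world_bank_health_population.health_nutrition_population`;
```

With the table built this way, `indicator_code_new` can serve as a lookup key from a pivot column name back to its original code and description.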