## Unit 1 Capstone Analytics and Narrative
### Prepared by Robin Fladebo

### Introduction

In reading a report published in July 2018 by UNICEF, I learned that iodine deficiency can lead to health and developmental problems (called Iodine Deficiency Disorders, or IDDs). Iodine deficiency is especially problematic during pregnancy and early childhood, and is a major cause of preventable mental retardation and diminished cognitive ability.

UNICEF compiled data about access to and consumption of iodized salt from households in many countries around the globe. The data came from two global surveys: the Multiple Indicator Cluster Survey (MICS), conducted by UNICEF, and the Demographic and Health Survey (DHS), funded by USAID.

To date, most efforts to ensure adequate iodine consumption have centered on providing iodized salt.
This dataset reports the proportion of households consuming salt with any iodine, regardless of the level of iodine in the salt. An adequate level of iodine in salt would be high enough, but not too high, to meet the needs of the population. The UNICEF data is only a part of the picture of iodine deficiency disorders.

According to World Health Organization (WHO) guidelines, a complete study measures population iodine status as well as access to iodized salt. Population iodine status is measured primarily by urinary iodine, with the prevalence of goiter as a secondary measure. With both kinds of data, particularly for vulnerable populations, countries can design and manage a program to ensure that locally available salt contains an adequate level of iodine.

The most recent WHO data about median urinary iodine is from 2003. 

### Dataset exploration

Data Source: https://data.unicef.org/topic/nutrition/iodine-deficiency/

In [2]:
import chardet
import csv
import pandas as pd
pd.set_option('display.max_rows', 1000)
import numpy as np

In [3]:
def find_encoding(fname):
    # Read the raw bytes and let chardet guess the character encoding
    with open(fname, 'rb') as f:
        r_file = f.read()
    result = chardet.detect(r_file)
    return result['encoding']

In [4]:
thisfile_encoding = find_encoding('UNICEF_salt.csv')
print(thisfile_encoding)

UTF-8-SIG


In [5]:
# Use the detected encoding (UTF-8-SIG) so the byte order mark is handled on read
unicef_salt = pd.read_csv('UNICEF_salt.csv', encoding=thisfile_encoding)
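
The reported encoding, UTF-8-SIG, means the file starts with a byte order mark (BOM). A minimal sketch of how the two decodings differ, using a made-up two-line CSV rather than the UNICEF file. Some readers strip the mark on their own, but decoding as `utf-8-sig` removes it regardless.

```python
import codecs

# Hypothetical file contents: a BOM followed by a small CSV (not the real dataset)
raw = codecs.BOM_UTF8 + "ISO Ctry,Country\nDJI,Djibouti\n".encode("utf-8")

# Plain utf-8 keeps the BOM as a character; utf-8-sig drops it
first_utf8 = raw.decode("utf-8").splitlines()[0].split(",")[0]
first_sig = raw.decode("utf-8-sig").splitlines()[0].split(",")[0]

print(repr(first_utf8))  # '\ufeffISO Ctry'
print(repr(first_sig))   # 'ISO Ctry'
```

A stray `\ufeff` at the start of the first column name would make a lookup such as `df['ISO Ctry']` fail, which is why the detected encoding is worth passing through.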

In [6]:
type(unicef_salt)

pandas.core.frame.DataFrame

In [7]:
unicef_salt.shape

(290, 145)

In [8]:
unicef_salt.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)

In [9]:
list(unicef_salt.columns.values)

['ISO_Ctry',
 'Country',
 'Region',
 'Year_range',
 'Year',
 'Survey_type',
 'Source',
 'Source_2',
 'Status',
 'Natl_Point_Estimate',
 'Natl_Lower_Limit',
 'Natl_Upper_Limit',
 'Natl_Footnote',
 'MaleHH_Point_Estimate',
 'MaleHH_Lower_Limit',
 'MaleHH_Upper_Limit',
 'MaleHH_Footnote',
 'Female_HH_Point_Estimate',
 'Female_HH_Lower_Limit',
 'Female_HH_Upper_Limit',
 'Female_HH_Footnote',
 'Urban_Point_Estimate',
 'Urban_Lower_Limit',
 'Urban_Upper_Limit',
 'Urban_Footnote',
 'Rural_Point_Estimate',
 'Rural_Lower_Limit',
 'Rural_Upper_Limit',
 'Rural_Footnote',
 'WQ1_Point_Estimate',
 'WQ1_Lower_Limit',
 'WQ1_Upper_Limit',
 'WQ1_Footnote',
 'WQ2_Point_Estimate',
 'WQ2_Lower_Limit',
 'WQ2_Upper_Limit',
 'WQ2_Footnote',
 'WQ3_Point_Estimate',
 'WQ3_Lower_Limit',
 'WQ3_Upper_Limit',
 'WQ3_Footnote',
 'WQ4_Point_Estimate',
 'WQ4_Lower_Limit',
 'WQ4_Upper_Limit',
 'WQ4_Footnote',
 'WQ5_Point_Estimate',
 'WQ5_Lower_Limit',
 'WQ5_Upper_Limit',
 'WQ5_Footnote',
 'Bot60_Point_Estimate',
 'Bot60_

In [10]:
unicef_salt['Year'] = pd.to_numeric(unicef_salt['Year'], errors='coerce')

In [11]:
unicef_salt['Natl_Point_Estimate'] = pd.to_numeric(unicef_salt['Natl_Point_Estimate'], errors='coerce')
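
With `errors='coerce'`, any value that cannot be parsed becomes `NaN` rather than raising an error, so it is worth counting how many values were lost. A small sketch on made-up strings (not values from the UNICEF file):

```python
import pandas as pd

# Hypothetical raw column with one unparseable entry
raw = pd.Series(["2006", "2009", "n/a", "12.5"])

converted = pd.to_numeric(raw, errors="coerce")  # "n/a" becomes NaN
n_missing = int(converted.isna().sum())

print(converted.tolist())
print("values coerced to NaN:", n_missing)
```

Comparing `n_missing` before and after the conversion shows whether the coercion silently discarded survey values.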

The original dataset contains data from multiple years, ranging from 1994 to 2017.

In [12]:
unicef_salt['Year'].groupby(unicef_salt['Year']).min()

Year
1994    1994
1995    1995
1996    1996
1997    1997
1998    1998
1999    1999
2000    2000
2001    2001
2002    2002
2003    2003
2004    2004
2005    2005
2006    2006
2007    2007
2008    2008
2009    2009
2010    2010
2011    2011
2012    2012
2013    2013
2014    2014
2015    2015
2016    2016
2017    2017
Name: Year, dtype: int64
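
The same range can be read off more directly, along with the number of survey rows per year. A sketch on a made-up `Year` column (the counts are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical Year values for illustration
years = pd.Series([1994, 2017, 2006, 2006, 2012], name="Year")

year_min, year_max = int(years.min()), int(years.max())
counts = years.value_counts().sort_index()  # rows per year, in year order

print(year_min, year_max)
print(counts)
```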

In [14]:
# Intended to keep the most recent survey per country. Caution: .max() is taken
# column by column, so Year and Natl_Point_Estimate can come from different rows.
salt_recent = unicef_salt.groupby(unicef_salt['Country']).max().reset_index()
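
Because `groupby(...).max()` works on each column separately, the estimate it returns is a country's highest estimate, not necessarily the one from its latest survey. A minimal sketch of the difference on made-up data, with `idxmax` as one possible way to select the whole row for the most recent year (the countries and values are hypothetical):

```python
import pandas as pd

# Hypothetical surveys: country A's estimate fell between 2000 and 2010
surveys = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2000, 2010, 2005, 2015],
    "Natl_Point_Estimate": [90.0, 40.0, 30.0, 60.0],
})

# Column-wise maximum: pairs A's latest year (2010) with its highest estimate (90.0)
col_max = surveys.groupby("Country").max().reset_index()

# Row for the latest year per country: keeps Year and estimate together
latest = surveys.loc[surveys.groupby("Country")["Year"].idxmax()].reset_index(drop=True)

print(col_max)
print(latest)
```

For country A the two results disagree (90.0 versus 40.0), which is the case to watch for in the real data. `idxmax` needs at least one non-missing `Year` in each group, so rows with a missing year would have to be dropped first.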

In [15]:
# Sort ascending by national estimate; chain .head(n) after the column selection to limit the rows shown
cols = ['Country', 'Region', 'Year', 'Natl_Point_Estimate']
print(salt_recent.sort_values('Natl_Point_Estimate', ascending=True)[cols])

                              Country Region  Year  Natl_Point_Estimate
26                           Djibouti    ESA  2006                  4.4
79                            Somalia    ESA  2009                  7.6
41                              Haiti    LAC  2012                 23.1
58                         Mauritania    WCA  2011                 24.4
56                           Malaysia    EAP  2008                 28.2
27                 Dominican Republic    LAC  2000                 29.9
96                            Vanuatu    EAP  2007                 32.6
39                      Guinea-Bissau    WCA  2014                 33.9
83                              Sudan    ESA  2014                 34.4
93                            Ukraine   EECA  2012                 35.9
7                            Barbados    LAC  2012                 36.8
40                             Guyana    LAC  2014                 42.8
60                            Morocco   MENA  2007              

In [21]:

# Mean national estimate per region; select the numeric column before averaging
salt_region = salt_recent.groupby('Region')['Natl_Point_Estimate'].mean().reset_index()
print(salt_region.sort_values('Natl_Point_Estimate', ascending=True))

  Region  Natl_Point_Estimate
3    LAC            67.758333
4   MENA            74.890000
2    ESA            74.990909
0    EAP            76.028571
6    WCA            77.866667
5     SA            82.533333
1   EECA            87.961538
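
A regional mean can hide very low individual countries. The sketch below reports the minimum and the country count alongside the mean. It uses only a handful of the lowest rows printed earlier, so these means are not the regional means in the table above.

```python
import pandas as pd

# A few of the lowest rows from the sorted country listing (subset only)
rows = pd.DataFrame({
    "Country": ["Djibouti", "Somalia", "Sudan", "Haiti",
                "Dominican Republic", "Mauritania", "Guinea-Bissau"],
    "Region": ["ESA", "ESA", "ESA", "LAC", "LAC", "WCA", "WCA"],
    "Natl_Point_Estimate": [4.4, 7.6, 34.4, 23.1, 29.9, 24.4, 33.9],
})

# Mean, lowest country value, and number of countries per region
summary = rows.groupby("Region")["Natl_Point_Estimate"].agg(["mean", "min", "count"])
print(summary)
```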


### Further experimentation and analysis

#### Experimental hypothesis 
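Median iodine level is inadequate in children ages 6 to 12 in ... in West Africa, where countries such as Mauritania (24.4) and Guinea-Bissau (33.9) have some of the lowest national estimates of household access to iodized salt, although by regional mean LAC (67.8) ranks below WCA (77.9).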
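
To make the hypothesis testable, "inadequate" needs a numeric cutoff. The sketch below assumes the WHO criterion for school-age children as I understand it, a median urinary iodine concentration below 100 µg/L. That cutoff is not stated in the UNICEF dataset, and the sample values are hypothetical.

```python
from statistics import median

# Assumption: WHO cutoff for school-age children, not taken from the UNICEF data
ADEQUATE_MEDIAN_UIC = 100.0  # micrograms per litre


def median_is_inadequate(uic_values, cutoff=ADEQUATE_MEDIAN_UIC):
    """Return True when the sample median urinary iodine is below the cutoff."""
    if not uic_values:
        raise ValueError("need at least one urinary iodine measurement")
    return median(uic_values) < cutoff


# Hypothetical measurements in µg/L, for illustration only
sample = [45.0, 80.0, 95.0, 120.0, 60.0]
sample_median = median(sample)
inadequate = median_is_inadequate(sample)

print(sample_median, inadequate)
```

The median is used rather than the mean because the status measure described in the introduction is median urinary iodine.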

#### Rollout plan
Obtain current measures of population iodine and of households consuming adequately iodized salt from the same populations in the same period. The existing data do not line up by year: the most recent WHO median urinary iodine data is from 2003, while the UNICEF salt surveys range from 1994 to 2017.

For each household in the sample, record the number of adults and the number of children, and identify how many of the children are ages 6 to 12.

Measure the amount of iodine in the salt in use in each household with a quick test, so that the measure is consistent across households. In the current dataset, some countries count salt with any iodine as iodized, rather than applying the accepted range of 15 to 40 ppm (mg of iodine per kg of salt).
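
The gap between "any iodine" and "adequately iodized" can be made concrete by classifying each household reading against the 15 to 40 ppm range. A minimal sketch, in which the function name, category labels, and readings are all hypothetical:

```python
def classify_salt_iodine(ppm, low=15.0, high=40.0):
    """Label a household salt reading (ppm) against the accepted range."""
    if ppm < 0:
        raise ValueError("iodine content cannot be negative")
    if ppm == 0:
        return "no iodine"
    if ppm < low:
        return "below range"
    if ppm <= high:
        return "adequate"
    return "above range"


# Hypothetical household readings in ppm, for illustration only
readings = [0.0, 8.0, 15.0, 32.5, 40.0, 55.0]
labels = [classify_salt_iodine(r) for r in readings]

# Share of households with any iodine versus share within the accepted range
any_iodine_share = sum(r > 0 for r in readings) / len(readings)
adequate_share = sum(label == "adequate" for label in labels) / len(readings)

print(labels)
print(any_iodine_share, adequate_share)
```

In this made-up sample, five of six households have salt with some iodine but only three fall within the range, which is the kind of difference the rollout is meant to measure.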

Using the methods defined by WHO, collect current sample data for ... 

#### Evaluation plan
