
## Case Study: World Development Indicators

In this case study, we'll use Python functions, iterators, and list comprehensions to analyze the World Bank's World Development Indicators (WDI) dataset, which tracks global economic and social indicators over several decades.  

- Covers 217 countries from 1960 to 2015  
- Includes indicators like population, electricity use, and CO2 emissions  
- Tracks literacy rates, unemployment, and mortality  

## Objectives

- **Work with Dictionaries**  
  - Store and retrieve country-level data efficiently  
  - Use keys for quick lookups of indicators  

- **Use List Comprehensions**  
  - Extract specific indicators in a single line  
  - Improve readability and performance  

- **Apply Custom Functions**  
  - Automate data transformations  
  - Handle missing values and format data consistently  
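As a small illustration of these objectives, the sketch below works on a hypothetical record dictionary. It looks up a value by key, uses a dictionary comprehension inside a custom function to drop missing values, and uses a list comprehension to collect the years that still have data. The record contents and the helper name `drop_missing` are assumptions for illustration only, and the year filter assumes string keys such as `'1961'`.

```python
import math

# Hypothetical record shaped like one row of the dataset (values are made up)
record = {
    "Country Name": "Example Country",
    "Indicator Code": "EX.IND.CODE",
    "1960": float("nan"),
    "1961": 12.5,
    "1962": 13.0,
}

# Dictionary lookup: retrieve a value directly by its key
print(record["Country Name"])  # Example Country


def drop_missing(rec):
    """Return a copy of rec without float NaN values (hypothetical helper)."""
    return {
        key: value
        for key, value in rec.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


# List comprehension: keep only the year keys that still have data
clean = drop_missing(record)
years_with_data = [key for key in clean if key.isdigit()]
print(years_with_data)  # ['1961', '1962']
```

Because `np.float64` is a subclass of Python's `float`, the same `isinstance` check would also catch the `np.float64(nan)` values that appear in the pandas output later in this notebook.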

## Zipping Lists 

Load `WDICSV.csv` and do the following:

- Save the first row (the column names) to the `feature_names` list.
- Save the second row (the first record) to the `row_vals` list.

Then convert these two related lists into a dictionary:

- Use `zip()` to pair each name with its value  
- Pass the pairs to `dict()` to build the dictionary
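Before running this on the full file, the idea can be seen on short lists. The sketch below uses a few names and values copied from the first record of the dataset; the shortened lists themselves are an illustration, not the notebook's data.

```python
# Shortened stand-ins for feature_names and row_vals
names = ["Country Name", "Country Code", "Indicator Code"]
values = ["Africa Eastern and Southern", "AFE", "EG.CFT.ACCS.ZS"]

# zip pairs the i-th name with the i-th value
pairs = list(zip(names, values))
print(pairs)

# dict() turns each (key, value) pair into a dictionary entry
record = dict(zip(names, values))
print(record["Country Code"])  # AFE

# zip stops at the shorter input, so unmatched names are silently dropped
print(dict(zip(["a", "b", "c"], [1, 2])))  # {'a': 1, 'b': 2}
```

The last line matters for real data: if a row is shorter than the header, `zip` drops the extra names without raising an error.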

```python
import pandas as pd

url = 'https://raw.githubusercontent.com/joseeden/joeden/refs/heads/master/docs/021-Software-Engineering/021-Jupyter-Notebooks/002-Data-Engineering-with-Python/WDICSV.csv'

# header=None keeps the header line as row 0 instead of using it as column labels
df = pd.read_csv(url, header=None)

# Row 0 holds the column names; row 1 is the first record
feature_names = df.iloc[0].tolist()
row_vals = df.iloc[1].tolist()

print(type(feature_names))
print(feature_names)

print(type(row_vals))
print(row_vals)
```

```
<class 'list'>
['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', np.float64(1960.0), np.float64(1961.0), np.float64(1962.0), np.float64(1963.0), np.float64(1964.0), np.float64(1965.0), np.float64(1966.0), np.float64(1967.0), np.float64(1968.0), np.float64(1969.0), np.float64(1970.0), np.float64(1971.0), np.float64(1972.0), np.float64(1973.0), np.float64(1974.0), np.float64(1975.0), np.float64(1976.0), np.float64(1977.0), np.float64(1978.0), np.float64(1979.0), np.float64(1980.0), np.float64(1981.0), np.float64(1982.0), np.float64(1983.0), np.float64(1984.0), np.float64(1985.0), np.float64(1986.0), np.float64(1987.0), np.float64(1988.0), np.float64(1989.0), np.float64(1990.0), np.float64(1991.0), np.float64(1992.0), np.float64(1993.0), np.float64(1994.0), np.float64(1995.0), np.float64(1996.0), np.float64(1997.0), np.float64(1998.0), np.float64(1999.0), np.float64(2000.0), np.float64(2001.0), np.float64(2002.0), np.float64(2003.0), np.float64(2004.0), np.float64(2005.
```
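The year names show up as `np.float64(1960.0)` because, with `header=None`, pandas parses the header line as ordinary data, so each year column is inferred as numeric. If you prefer string keys such as `'1960'`, one alternative (not the approach this notebook uses) is the standard-library `csv` module. The sketch below reads an in-memory sample built from the first columns shown in the output, so it runs without downloading the file; for the real file you would open the downloaded CSV with `open(path, newline="")`, where `path` is assumed.

```python
import csv
import io

# In-memory sample mirroring the first columns of WDICSV.csv;
# this record's early year cells are empty (they load as nan in pandas)
sample = io.StringIO(
    "Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962\n"
    'Africa Eastern and Southern,AFE,"Access to clean fuels and technologies '
    'for cooking (% of population)",EG.CFT.ACCS.ZS,,,\n'
)

reader = csv.reader(sample)
feature_names = next(reader)  # header row, every name kept as a string
row_vals = next(reader)       # first data row; empty cells become ''

print(feature_names[4:])  # ['1960', '1961', '1962']
print(row_vals[:2])       # ['Africa Eastern and Southern', 'AFE']
```

The trade-off is that `csv` returns every cell as a string, so numeric values need explicit conversion and missing values arrive as `''` rather than `nan`.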

Next, zip the lists together and convert to a dictionary.

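```python
# zip() returns a lazy iterator of (name, value) pairs
zipped_lists = zip(feature_names, row_vals)
print(type(zipped_lists))
print(zipped_lists)

# dict() consumes the iterator and builds the dictionary
conv_lists = dict(zipped_lists)
print(type(conv_lists))
print(conv_lists)
```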

```
<class 'zip'>
<zip object at 0x000002B01204C940>
<class 'dict'>
{'Country Name': 'Africa Eastern and Southern', 'Country Code': 'AFE', 'Indicator Name': 'Access to clean fuels and technologies for cooking (% of population)', 'Indicator Code': 'EG.CFT.ACCS.ZS', np.float64(1960.0): np.float64(nan), np.float64(1961.0): np.float64(nan), np.float64(1962.0): np.float64(nan), np.float64(1963.0): np.float64(nan), np.float64(1964.0): np.float64(nan), np.float64(1965.0): np.float64(nan), np.float64(1966.0): np.float64(nan), np.float64(1967.0): np.float64(nan), np.float64(1968.0): np.float64(nan), np.float64(1969.0): np.float64(nan), np.float64(1970.0): np.float64(nan), np.float64(1971.0): np.float64(nan), np.float64(1972.0): np.float64(nan), np.float64(1973.0): np.float64(nan), np.float64(1974.0): np.float64(nan), np.float64(1975.0): np.float64(nan), np.float64(1976.0): np.float64(nan), np.float64(1977.0): np.float64(nan), np.float64(1978.0): np.float64(nan), np.float64(1979.0): np.float64(nan),
```
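Printing the zip object shows `<zip object ...>` instead of the pairs because `zip` returns a lazy iterator. A practical consequence is that it can be consumed only once, as this small sketch with shortened, illustrative lists shows:

```python
names = ["Country Name", "Country Code"]
values = ["Africa Eastern and Southern", "AFE"]

zipped = zip(names, values)
first = dict(zipped)   # consumes every pair in the iterator
second = dict(zipped)  # the iterator is exhausted, so nothing is left

print(first)   # {'Country Name': 'Africa Eastern and Southern', 'Country Code': 'AFE'}
print(second)  # {}

# To inspect the pairs and still build a dict, materialize them once
pairs = list(zip(names, values))
print(pairs)
print(dict(pairs))
```

This is why the code above passes `zipped_lists` to `dict()` only once; calling `list(zipped_lists)` first to inspect it would leave an empty dictionary.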

## Make it Reusable

Repeating the zip-and-convert steps by hand for every row is tedious and error-prone. Wrapping them in a function makes the logic reusable and keeps the code concise.

```python
def list2dict(list1, list2):
    """Return a dictionary with list1 as keys and list2 as values."""
    # Pair each key with its value, then build the dictionary
    zipped_lists = zip(list1, list2)
    rs_dict = dict(zipped_lists)
    return rs_dict

converted_list = list2dict(feature_names, row_vals)
print(converted_list)
```
```
{'Country Name': 'Africa Eastern and Southern', 'Country Code': 'AFE', 'Indicator Name': 'Access to clean fuels and technologies for cooking (% of population)', 'Indicator Code': 'EG.CFT.ACCS.ZS', np.float64(1960.0): np.float64(nan), np.float64(1961.0): np.float64(nan), np.float64(1962.0): np.float64(nan), np.float64(1963.0): np.float64(nan), np.float64(1964.0): np.float64(nan), np.float64(1965.0): np.float64(nan), np.float64(1966.0): np.float64(nan), np.float64(1967.0): np.float64(nan), np.float64(1968.0): np.float64(nan), np.float64(1969.0): np.float64(nan), np.float64(1970.0): np.float64(nan), np.float64(1971.0): np.float64(nan), np.float64(1972.0): np.float64(nan), np.float64(1973.0): np.float64(nan), np.float64(1974.0): np.float64(nan), np.float64(1975.0): np.float64(nan), np.float64(1976.0): np.float64(nan), np.float64(1977.0): np.float64(nan), np.float64(1978.0): np.float64(nan), np.float64(1979.0): np.float64(nan), np.float64(1980.0): np.float64(nan), np.float64(1981.0): np.flo
```
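The payoff of the function comes when it is applied to many rows. The sketch below pairs it with a list comprehension to build one dictionary per row. The function is restated in condensed form so the block runs on its own, and the header and rows are hypothetical stand-ins (the second row is made up) rather than data from `WDICSV.csv`.

```python
def list2dict(list1, list2):
    """Return a dictionary with list1 as keys and list2 as values."""
    return dict(zip(list1, list2))

# Hypothetical header and rows standing in for the real file
feature_names = ["Country Name", "Country Code", "Indicator Code"]
rows = [
    ["Africa Eastern and Southern", "AFE", "EG.CFT.ACCS.ZS"],
    ["Example Country", "EXC", "EX.IND.CODE"],  # made-up row
]

# One dictionary per row, built with a list comprehension
records = [list2dict(feature_names, row) for row in rows]
print(records[1]["Country Code"])  # EXC
```

With the DataFrame loaded earlier, the rows could come from a small slice such as `df.iloc[1:6].values.tolist()`. pandas also provides `DataFrame.to_dict(orient="records")`, which builds similar dictionaries when the column labels are already set.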