# Data Analysis without Pandas

### Background

    Our stakeholders have asked us to analyze a dataset of projected population growth. Specifically, they want to see the relative percent change in population for the five countries with the largest projected growth and the five countries with the smallest projected growth (or largest decline).

### Import Packages

In [None]:
## run this cell without changes

import csv 
import matplotlib.pyplot as plt

### Import the Data

In [None]:
# read the data with csv package into a list called population_data
population_data = []

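# newline='' is the setting the csv module documentation recommends for files passed to csv.reader
with open('5.1-base_py_lab_projected_pop_data.csv', 'r', newline='') as file: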
    csv_reader = csv.reader(file, delimiter=',')

    for row in csv_reader:
        population_data.append(row)

### Inspect the Data

In [None]:
# print the first 5 elements(rows) of the list 
print(population_data[:5])

    The table we imported is in wide format: each row is a country and each column is a year. The first inner list holds the column headers, and every other inner list holds a country name followed by all of the population values for that country.
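
    As a rough sketch of what this wide layout looks like, the small table below is made up for illustration. The names sample_data, Country A and Country B and all of the numbers are hypothetical and are not taken from the CSV file. The sketch assumes, as in our file, that the first inner list is the header row.

```python
# hypothetical miniature of a wide-format table (all values are invented)
sample_data = [
    ['country', '2020', '2025', '2030'],   # header row: one column per year
    ['Country A', '100', '110', '120'],    # csv.reader returns every value as a string
    ['Country B', '50', '45', '40'],
]

header = sample_data[0]
first_country = sample_data[1]

print(header[1:])          # the years
print(first_country[0])    # the country name
print(first_country[1:])   # that country's values, one per year
```

    Position 0 of every row is the country name, so the year columns always start at position 1.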

In [None]:
# find out the total number of elements (rows) in population data 
print(len(population_data))

### Create a Summary Table

    We will start analyzing the population dataset by creating a summary table containing the highest and lowest projected population for each country, as well as the relative change in population from 2020 to 2100. The table, called population_summary, will contain the following columns:
    
    - country
    - lowest_population
    - lowest_population_year
    - highest_population
    - highest_population_year
    - percent_change_2020_2100
 

#### Data Cleaning

    Before we create our table, we need to clean the data. 

In [None]:
# check the datatypes of values in the 'rows'

print(population_data[1])
print(population_data[1][1] , type(population_data[1][1]))

    The values were read in as strings. They need to be integers before we can compare them or do arithmetic with them.
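
    To see why the conversion matters, here is a minimal sketch with made-up values (as_text and as_numbers are hypothetical names, not part of the lab). Python compares strings character by character, so max() on number-like strings can return a value that is not the largest number.

```python
# invented values, stored as text the way csv.reader delivers them
as_text = ['900', '1000', '85']

# the same values converted to integers
as_numbers = [int(value) for value in as_text]

# strings are compared character by character: '9' sorts after '1' and '8'
print(max(as_text))      # prints 900, which is not the largest number
print(max(as_numbers))   # prints 1000
```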

In [None]:
# step 1: we will need to iterate through all of the rows (list elements) 

for row_index in range(1, len(population_data)):
    print(population_data[row_index])

In [None]:
# step 2: we need to iterate through the column values in each row

for row_index in range(1, len(population_data)):
    for column_value_index in range(1, len(population_data[row_index])):
        print(population_data[row_index][column_value_index])

In [None]:
# step 3: we need to update the element value and cast it to a new data type 
for row_index in range(1, len(population_data)):
    for column_value_index in range(1, len(population_data[row_index])):
        population_data[row_index][column_value_index] = int(population_data[row_index][column_value_index])


In [None]:
# step 4: CHECK YOUR WORK... What are the data types of population values in the rows?
print(population_data[1])
print(population_data[1][1] , type(population_data[1][1]))

#### Summary List

In [None]:
# create a new list (table) to store summary data
# add country names in single elements lists 
population_summary = []

#method 1 
population_summary = [[population_data[row][0]] for row in range(0,len(population_data))]

#method 2 

# # add countries to column 0
# for row in range(0, len(population_data)):

#     # at each new element add an empty list 
#     population_summary.append([])
    
#     # append the country name into the new list 
#     population_summary[row].append(population_data[row][0])

# #method 3
# for row in range(0, len(population_data)):
    
#     # append the [country name] into the new list 
#     population_summary.append([population_data[row][0]])

    To fill in the other columns, we use a for-loop over the rows of population_data. For each country we find the lowest and highest population values, look up the years in which they occur from the header row, and calculate the relative percent change. We then append all of these values to that country's row in population_summary.
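
    The lookup of a value and its year can be sketched on a single made-up row. The names sample_header and sample_row and the numbers are hypothetical. The sketch assumes the values have already been converted to integers.

```python
# hypothetical header row and one country row (numbers are invented)
sample_header = ['country', '2020', '2025', '2030']
sample_row = ['Country A', 100, 90, 120]

# skip position 0 (the country name) when looking for the extreme values
lowest = min(sample_row[1:])
highest = max(sample_row[1:])

# index() on the full row gives a position that lines up with the header row;
# if a value occurs more than once, index() returns its first position
lowest_year = sample_header[sample_row.index(lowest)]
highest_year = sample_header[sample_row.index(highest)]

print(lowest, lowest_year)
print(highest, highest_year)
```

    Searching the full row rather than the slice keeps the position aligned with the header row, so no offset of 1 has to be added back.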




In [None]:
# figure out lowest values

# lowest values code explained 

""" 
    lowest = min(population_data[row][1:])
    lowest_index = population_data[row].index(lowest)
    lowest_year = population_data[0][lowest_index]
"""

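lowest = min(population_data[1][1:])
# search the full row (not the slice) so the index lines up with the header row
lowest_index = population_data[1].index(lowest)

print("lowest: ", lowest)
print("index: ", lowest_index)
print("year: ", population_data[0][lowest_index])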

In [None]:
# figure out highest values 

# highest values code explained 
""" 
    highest = max(population_data[row][1:])
    highest_index = population_data[row].index(highest)
    highest_year = population_data[0][highest_index]
"""


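highest = max(population_data[1][1:])
# search the full row (not the slice) so the index lines up with the header row
highest_index = population_data[1].index(highest)

print("highest: ", highest)
print("index: ", highest_index)
print("year: ", population_data[0][highest_index])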

In [None]:
# figure out relative percent change from 2020-2100

# relative change in population code explained
# assumes the columns run from oldest (2020) to newest (2100)
# (V2) newest value minus the (V1) oldest value 
# divided by the (V1) oldest value 
# multiplied by 100 to get the percent value 

(population_data[1][-1] - population_data[1][1]) / population_data[1][1] * 100


![img](./image/prctchange.png)
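
    As a small worked example of the calculation above, the sketch below wraps it in a helper. The function name percent_change and the input numbers are hypothetical and chosen only to make the arithmetic easy to check.

```python
def percent_change(oldest, newest):
    """Relative change from oldest to newest, in percent."""
    return (newest - oldest) / oldest * 100

# invented values: a population that grows from 200 to 250,
# and one that shrinks from 200 to 150
growth = percent_change(200, 250)
decline = percent_change(200, 150)

print(growth)    # (250 - 200) / 200 * 100
print(decline)   # (150 - 200) / 200 * 100
```

    A positive result means growth and a negative result means decline. The helper would raise a ZeroDivisionError if the oldest value were 0.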

#### Summary table for loop 

In [None]:
# add highest/lowest projected population 
for row in range(1, len(population_data)):
    # see lowest example above
    lowest = min(population_data[row][1:])
    lowest_index = population_data[row].index(lowest)
    lowest_year = population_data[0][lowest_index]

    #see highest example above 
    highest = max(population_data[row][1:])
    highest_index = population_data[row].index(highest)
    highest_year = population_data[0][highest_index]

    # appending the values to the new list 
    population_summary[row].append(lowest)
    population_summary[row].append(lowest_year)
    population_summary[row].append(highest)
    population_summary[row].append(highest_year)

    # add relative change in population 2020-2100
    dev = round((population_data[row][-1] - population_data[row][1]) / population_data[row][1] * 100, 2)
    
    # append dev value
    population_summary[row].append(dev)
 

In [None]:
# add column names 
col_names = ['country', 'lowest_population', 'lowest_population_year', 'highest_population', 'highest_population_year', 'percent_change_2020_2100']

population_summary[0] = col_names

In [None]:
population_summary

### Optional: Export Summary Table

In [None]:
# export to csv file

# newline='' prevents blank lines between rows on some platforms
with open('summarized.csv', 'w', newline='') as file:
    csv_writer = csv.writer(file, delimiter=',')
    for row in population_summary:
        csv_writer.writerow(row)

#### Create a subset for visualization

In [None]:
# sort list by percent change, highest to lowest

population_summary_srt = sorted(population_summary[1:], reverse=True, key=lambda x: x[5])
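
    The sorting step, and the slicing that follows it, can be sketched on a few made-up rows. The names sample_rows, top_two and bottom_two and the numbers are hypothetical.

```python
# hypothetical summary rows: [name, percent change]
sample_rows = [['A', 12.5], ['B', -30.0], ['C', 48.1], ['D', 3.2]]

# the key function picks the value to sort on; reverse=True puts the largest first
sample_sorted = sorted(sample_rows, reverse=True, key=lambda row: row[1])

top_two = sample_sorted[:2]      # first two rows: largest growth
bottom_two = sample_sorted[-2:]  # last two rows: smallest growth or largest decline

print(top_two)
print(bottom_two)
```

    sorted() returns a new list and leaves the original untouched. A slice [:n] returns exactly n rows, and [-n:] returns the last n rows.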

In [None]:
# define new list, growth_percent, and append first and last 5 rows

growth_percent = [] 
# append 5 first and 5 last from sorted list
for row in population_summary_srt[:5]:
    growth_percent.append(row)
for row in population_summary_srt[-5:]:
    growth_percent.append(row)

In [None]:
# run this cell without changes (you have not learned this tool yet)

# make a horizontal bar plot
import matplotlib.pyplot as plt

# save values for x and y 
countries = []
rel_change = []

# growth_percent has no header row, so start at index 0
for row in range(len(growth_percent)):
    # appending the country name value in position 0 
    countries.append(growth_percent[row][0])
    # appending the last value in the row (relative change)
    rel_change.append(growth_percent[row][-1])

plt.grid()
plt.barh(countries, rel_change)
plt.title('World Population Projection 2019 - 2100.\n Countries with Largest Growth and Decline Respectively')
plt.xlabel('Change factor in percent')
plt.show()

    You brought this to your stakeholders and they were very pleased. Great work! However, there has been a last-minute addition to the analysis: they also want to see the same chart for European countries only.

#### Stakeholder Request: Europe Subset

In [None]:


# create a list with European countries
europe = ['Russia', 'Germany', 'United Kingdom', 'France', 'Italy', 'Spain', 'Ukraine', 'Poland', 'Romania', 'Netherlands', 
          'Belgium', 'Czech Republic', 'Greece', 'Portugal', 'Sweden', 'Hungary', 'Belarus', 'Austria', 'Serbia', 
          'Switzerland', 'Bulgaria', 'Denmark', 'Finland', 'Slovak Republic', 'Norway', 'Ireland', 'Croatia', 'Moldova', 
          'Bosnia and Herzegovina', 'Albania', 'Lithuania', 'Macedonia, FYR', 'Slovenia', 'Latvia', 'Kosovo', 'Estonia', 
          'Montenegro', 'Luxembourg', 'Malta', 'Iceland', 'Andorra', 'Monaco', 'Liechtenstein', 'San Marino', 'Holy See']


In [None]:
# Define a new list, europe_list, and append only European Countries

europe_list = []

# find all European countries in population_summary and add them to europe_list

for row in range(1, len(population_summary)):
    if population_summary[row][0] in europe:
        europe_list.append(population_summary[row])

In [None]:
# sort list by percent change, highest to lowest
europe_list_srt = sorted(europe_list, reverse=True, key=lambda x: x[5])

In [None]:
# define new list, europe_growth, and append first and last 5 rows 
europe_growth = []

# append 5 first and 5 last from sorted list
for row in europe_list_srt[:5]:
    europe_growth.append(row)
for row in europe_list_srt[-5:]:
    europe_growth.append(row)

In [None]:
# run this cell without changes (you have not learned this tool yet)

# make a horizontal bar plot
import matplotlib.pyplot as plt

# save values for x and y 
europe_countries = []
europe_percent_change = []

# europe_growth has no header row, so start at index 0
for row in range(len(europe_growth)):
    # appending the country name value in position 0 
    europe_countries.append(europe_growth[row][0])
    # appending the last value in the row (relative change)
    europe_percent_change.append(europe_growth[row][-1])

plt.grid()
plt.barh(europe_countries, europe_percent_change)
plt.title('World Population Projection 2019 - 2100.\n European Countries with Largest Growth and Decline Respectively')
plt.xlabel('Change factor in percent')
plt.show()

    Congratulations! You brought this to your stakeholders and they were very pleased. Great work! However, yet again, there was a last-minute addition to the analysis: they also want a comparison of projected population development between countries.

### Stakeholder Request: Projected population development between countries

In [None]:
# define new list, pop_norm, to add new values to

pop_norm = []



In [None]:
population_data

In [None]:
# iterate through population data and add country names in single element lists

# add country
for row in range(len(population_data)):
    pop_norm.append([population_data[row][0]])
pop_norm

### Normalizing Values Notes: 

    Some countries have large populations and others have small ones, which
    makes it difficult to compare their population development directly. We can
    solve this by normalizing the population values (a scaling technique). We
    set the year 2020 as the index year and present all other years in
    relation to it: the population of each year is divided by the
    population of the index year and multiplied by 100. The index year itself
    therefore always has the value 100.

    https://www.economicsdiscussion.net/price/index-number/index-numbers-characteristics-formula-examples-types-importance-and-limitations/31211

![img](./image/norm_index_equation.png)
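
    A minimal sketch of the index calculation, using made-up numbers, is shown below. The helper normalize_to_index and its index_position parameter are hypothetical names introduced for this illustration. The sketch assumes the index year is the first value in the list.

```python
def normalize_to_index(values, index_position=0):
    """Express each value relative to the value at index_position (= 100)."""
    base = values[index_position]
    return [round(value / base * 100, 2) for value in values]

# two invented countries with very different population sizes
large_country = normalize_to_index([200, 250, 150])
small_country = normalize_to_index([2, 3, 1])

print(large_country)
print(small_country)
```

    After normalizing, both series start at 100, so they can be drawn on the same axis even though the raw numbers differ by a factor of 100.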

In [None]:
# add normalized values for each year, use 2020 as index year

for row_index in range(1,len(population_data)):
    for column_value in range(1, len(population_data[row_index])):
        pop_norm[row_index].append(round(population_data[row_index][column_value]/population_data[row_index][1]*100, 2))


In [None]:
pop_norm[0:5]

In [None]:
# add column names 
cols = ['country', '2020', '2025', '2030', '2035', '2040', '2045', '2050', '2055', '2060', '2065', '2070', '2075', '2080', '2085', '2090', '2095', '2100']

pop_norm[0] = cols

In [None]:
pop_norm[0:5]

In [None]:
#print first five rows

for row in pop_norm[:5]:
    print(row)

In [None]:
# sort the list into a new list called pop_norm_sorted

pop_norm_sorted = sorted(pop_norm[1:], reverse=True, key=lambda x: x[-1])

In [None]:

# define new list, population_development, and append first and last 5 rows

population_development = []

# append 5 first and 5 last from sorted list
for row in pop_norm_sorted[:5]:
    population_development.append(row)
for row in pop_norm_sorted[-5:]:
    population_development.append(row)

In [None]:
# run this cell without changes 

fig,ax = plt.subplots()

# population_development has no header row, so plot every row
# and take the years for the x-axis from the header row of pop_norm
for i in range(len(population_development)):
    ax.plot(pop_norm[0][1:],
            population_development[i][1:],
            label = population_development[i][0])

fig.set_figwidth(12)
fig.set_figheight(8)

plt.grid()

plt.legend()
plt.title('World Population Projection 2019 - 2100.\n Population Development')
plt.xlabel('Population Years')
plt.ylabel('Relative Population Development, Index year 2020')
plt.ylim(0,600)
plt.yscale('linear')
# fix the tick positions so the gridline lookup below finds the line at 100
y_tick_labels = [0, 100, 200, 300, 400, 500]
ax.set_yticks(y_tick_labels)
ind = y_tick_labels.index(100)

gridlines = ax.yaxis.get_gridlines()
gridlines[ind].set_color("k")
gridlines[ind].set_linewidth(1.5)

plt.show()


    Congratulations... again! You brought this additional analysis to your stakeholders and they were very pleased. Great work! However, as predicted, there was a last-minute addition to the analysis: they also want a comparison of projected population development for a specific list of countries that they provided.

### Stakeholder Request: Population Development Subset

In [None]:
pop_dev_subset_list= ['Australia', 'Japan', 'Moldova', 'Sweden', 'United Kingdom', 'United States']

In [None]:
#define a new list pop_dev_subset and add values from pop_norm only if in pop_dev_subset_list


pop_dev_subset = []

for i in pop_norm[1:]:
    if i[0] in pop_dev_subset_list:
        pop_dev_subset.append(i)

In [None]:
# add column names as the first row (position 0)
cols = ['country', '2020', '2025', '2030', '2035', '2040', '2045', '2050', '2055', '2060', '2065', '2070', '2075', '2080', '2085', '2090', '2095', '2100']
pop_dev_subset.insert(0, cols)

    Congrats... for real this time! There were no more last-minute additions. You are finished!