<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Obtaining-&amp;-Cleaning-Data-from-CIA-Factbook" data-toc-modified-id="Obtaining-&amp;-Cleaning-Data-from-CIA-Factbook-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Obtaining &amp; Cleaning Data from CIA Factbook</a></span><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Imports</a></span><ul class="toc-item"><li><span><a href="#Keeping-the-rows-with-countries" data-toc-modified-id="Keeping-the-rows-with-countries-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Keeping the rows with countries</a></span></li></ul></li><li><span><a href="#Quick-check-for-nulls" data-toc-modified-id="Quick-check-for-nulls-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Quick check for nulls</a></span></li><li><span><a href="#Creating-a-dataframe-with-only-the-columns-that-might-be-relevant-to-the-protest-dataset" data-toc-modified-id="Creating-a-dataframe-with-only-the-columns-that-might-be-relevant-to-the-protest-dataset-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Creating a dataframe with only the columns that might be relevant to the protest dataset</a></span></li><li><span><a href="#Saving-an-unclean-version-of-the-csv" data-toc-modified-id="Saving-an-unclean-version-of-the-csv-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Saving an unclean version of the csv</a></span></li><li><span><a href="#Starting-to-dig-deeper-into-each-column-in-the-dataframe" data-toc-modified-id="Starting-to-dig-deeper-into-each-column-in-the-dataframe-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Starting to dig deeper into each column in the dataframe</a></span><ul class="toc-item"><li><span><a href="#Language-Column" data-toc-modified-id="Language-Column-1.5.1"><span class="toc-item-num">1.5.1&nbsp;&nbsp;</span>Language Column</a></span></li><li><span><a href="#Transnational-Disputes-Column" 
data-toc-modified-id="Transnational-Disputes-Column-1.5.2"><span class="toc-item-num">1.5.2&nbsp;&nbsp;</span>Transnational Disputes Column</a></span></li><li><span><a href="#Natural-Resources-Column" data-toc-modified-id="Natural-Resources-Column-1.5.3"><span class="toc-item-num">1.5.3&nbsp;&nbsp;</span>Natural Resources Column</a></span></li><li><span><a href="#Military-Spending-Column" data-toc-modified-id="Military-Spending-Column-1.5.4"><span class="toc-item-num">1.5.4&nbsp;&nbsp;</span>Military Spending Column</a></span></li><li><span><a href="#Filling-in-the-rest-of-the-null-values-with-0-for-future-modeling" data-toc-modified-id="Filling-in-the-rest-of-the-null-values-with-0-for-future-modeling-1.5.5"><span class="toc-item-num">1.5.5&nbsp;&nbsp;</span>Filling in the rest of the null values with 0 for future modeling</a></span></li></ul></li><li><span><a href="#Checking-the-data-types" data-toc-modified-id="Checking-the-data-types-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Checking the data types</a></span></li><li><span><a href="#Combining-the-CIA-data-with-the-protest-data" data-toc-modified-id="Combining-the-CIA-data-with-the-protest-data-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Combining the CIA data with the protest data</a></span><ul class="toc-item"><li><span><a href="#Importing-the-global-protest-data" data-toc-modified-id="Importing-the-global-protest-data-1.7.1"><span class="toc-item-num">1.7.1&nbsp;&nbsp;</span>Importing the global protest data</a></span></li><li><span><a href="#Merging-on-the-Protest-Data-with-Country-name-as-the-index" data-toc-modified-id="Merging-on-the-Protest-Data-with-Country-name-as-the-index-1.7.2"><span class="toc-item-num">1.7.2&nbsp;&nbsp;</span>Merging on the Protest Data with Country name as the index</a></span></li><li><span><a href="#Exporting-the-newly-merged-dataframe-as-a-csv" data-toc-modified-id="Exporting-the-newly-merged-dataframe-as-a-csv-1.7.3"><span 
class="toc-item-num">1.7.3&nbsp;&nbsp;</span>Exporting the newly merged dataframe as a csv</a></span></li></ul></li></ul></li></ul></div>

# Obtaining & Cleaning Data from CIA Factbook

## Imports

In [41]:
# pd.json_normalize is called through the pandas namespace below,
# so the deprecated pandas.io.json import is not needed
import pandas as pd

In [42]:
data = pd.read_json('C:/Users/kendr/Downloads/weekly_json/weekly_json/2020-09-21_factbook.json')

In [43]:
# Drop the 'date' metadata row; reassigning avoids an in-place drop on a column selection
data = data.drop(index='date')

### Keeping the rows with countries

The first row was a global summary, and the final three rows also did not contain information for individual countries. I kept only the relevant rows.

In [4]:
# Positional slice: skip the first row and the last three
data = data['countries'].iloc[1:-3]
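
The trimming step above can be checked on a small stand-in. This is only a sketch: the labels (`world`, `extra_1`, and so on) and the values are hypothetical and are not taken from the Factbook file. It shows a label-based drop followed by a positional slice.

```python
import pandas as pd

# Toy stand-in for data['countries']; all labels and values are hypothetical.
countries = pd.Series(
    {
        "world": {"data": {"name": "World"}},   # global summary row
        "aruba": {"data": {"name": "Aruba"}},
        "chad": {"data": {"name": "Chad"}},
        "date": None,                            # metadata row, not a country
        "extra_1": None,                         # trailing non-country rows
        "extra_2": None,
        "extra_3": None,
    }
)

# Remove the metadata row by label, then trim by position:
# skip the first row and the last three.
trimmed = countries.drop("date").iloc[1:-3]
print(list(trimmed.index))
```

Using `.iloc` makes it explicit that the slice is positional, which matters because the index holds string labels rather than integers.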

In [5]:
cia_df = pd.json_normalize(data, errors='ignore')
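
To illustrate what the flattening does, here is a minimal sketch on two made-up records. The nesting imitates the `data.*` column names seen in this notebook, but the records and their values are illustrative assumptions. Nested keys become dot-separated column names, and a key missing from one record becomes a null in that row.

```python
import pandas as pd

# Two hypothetical records; the second one has no "units" key.
records = [
    {"data": {"name": "Aruba",
              "geography": {"area": {"total": {"value": 180, "units": "sq km"}}}}},
    {"data": {"name": "Chad",
              "geography": {"area": {"total": {"value": 1284000}}}}},
]

# Each nested path is flattened into one column, joined with "."
flat = pd.json_normalize(records)

print(flat.columns.tolist())
print(flat.isna().sum())
```

This is why the null check that follows matters: any field that is absent for a country shows up as a null in the flattened frame.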

## Quick check for nulls

Several columns have more than 200 null values (some are null in 257 rows), so I will drop those columns.

In [6]:
cia_df.isna().sum()

data.name                                                   0
data.introduction.background                                0
data.geography.location                                     0
data.geography.geographic_coordinates.latitude.degrees      4
data.geography.geographic_coordinates.latitude.minutes      4
                                                         ... 
data.government.union_name.abbreviation                   257
data.government.political_structure                       257
data.government.capital.geographic_coordinates            257
data.government.member_states                             257
data.transportation.ports_and_terminals.major_ports       257
Length: 845, dtype: int64

In [7]:
# Keep only the columns with at most 200 null values
cia_df = cia_df.loc[:, cia_df.isna().sum() <= 200]
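
The threshold-based drop can be sketched on a toy frame. The column names and the threshold of 2 are assumptions chosen for illustration, whereas the notebook uses 200. The sketch also shows that `dropna` with `thresh` gives the same result, since `thresh` is the minimum number of non-null values a column needs to be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one column is null in three of four rows.
toy = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
        "complete": [1, 2, 3, 4],
    }
)

threshold = 2  # maximum number of nulls a column may have

# Option 1: boolean mask over the per-column null counts.
kept = toy.loc[:, toy.isna().sum() <= threshold]

# Option 2: dropna keeps columns with at least `thresh` non-null values.
kept_alt = toy.dropna(axis=1, thresh=len(toy) - threshold)

print(kept.columns.tolist())
```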

In [8]:
cia_df.isna().sum()

data.name                                                   0
data.introduction.background                                0
data.geography.location                                     0
data.geography.geographic_coordinates.latitude.degrees      4
data.geography.geographic_coordinates.latitude.minutes      4
                                                         ... 
data.geography.maritime_claims.contiguous_zone.value      171
data.geography.maritime_claims.contiguous_zone.units      171
data.people.hiv_aids.adult_prevalence_rate.global_rank    123
data.people.hiv_aids.deaths.global_rank                   192
data.government.administrative_divisions                  152
Length: 569, dtype: int64

In [9]:
pd.set_option('display.max_columns', None)

cia_df.head()

Unnamed: 0,data.name,data.introduction.background,data.geography.location,data.geography.geographic_coordinates.latitude.degrees,data.geography.geographic_coordinates.latitude.minutes,data.geography.geographic_coordinates.latitude.hemisphere,data.geography.geographic_coordinates.longitude.degrees,data.geography.geographic_coordinates.longitude.minutes,data.geography.geographic_coordinates.longitude.hemisphere,data.geography.map_references,data.geography.area.total.value,data.geography.area.total.units,data.geography.area.land.value,data.geography.area.land.units,data.geography.area.water.value,data.geography.area.water.units,data.geography.area.global_rank,data.geography.area.comparative,data.geography.land_boundaries.total.value,data.geography.land_boundaries.total.units,data.geography.land_boundaries.border_countries,data.geography.coastline.value,data.geography.coastline.units,data.geography.coastline.note,data.geography.climate,data.geography.terrain,data.geography.elevation.mean_elevation.value,data.geography.elevation.mean_elevation.units,data.geography.elevation.lowest_point.name,data.geography.elevation.lowest_point.elevation.value,data.geography.elevation.lowest_point.elevation.units,data.geography.elevation.highest_point.name,data.geography.elevation.highest_point.elevation.value,data.geography.elevation.highest_point.elevation.units,data.geography.natural_resources.resources,data.geography.land_use.by_sector.agricultural_land_total.value,data.geography.land_use.by_sector.agricultural_land_total.units,data.geography.land_use.by_sector.arable_land.value,data.geography.land_use.by_sector.arable_land.units,data.geography.land_use.by_sector.arable_land.note,data.geography.land_use.by_sector.permanent_crops.value,data.geography.land_use.by_sector.permanent_crops.units,data.geography.land_use.by_sector.permanent_crops.note,data.geography.land_use.by_sector.permanent_pasture.value,data.geography.land_use.by_sector.permanent_pasture.units,data.geography.land_use.b
y_sector.forest.value,data.geography.land_use.by_sector.forest.units,data.geography.land_use.by_sector.other.value,data.geography.land_use.by_sector.other.units,data.geography.land_use.date,data.geography.irrigated_land.value,data.geography.irrigated_land.units,data.geography.irrigated_land.date,data.geography.population_distribution,data.geography.natural_hazards,data.geography.environment.current_issues,data.geography.environment.international_agreements.party_to,data.geography.environment.international_agreements.signed_but_not_ratified,data.people.population.total,data.people.population.global_rank,data.people.population.date,data.people.nationality.noun,data.people.nationality.adjective,data.people.ethnic_groups.ethnicity,data.people.ethnic_groups.note,data.people.ethnic_groups.date,data.people.languages.language,data.people.languages.note,data.people.languages.date,data.people.religions.religion,data.people.religions.date,data.people.age_structure.0_to_14.percent,data.people.age_structure.0_to_14.males,data.people.age_structure.0_to_14.females,data.people.age_structure.15_to_24.percent,data.people.age_structure.15_to_24.males,data.people.age_structure.15_to_24.females,data.people.age_structure.25_to_54.percent,data.people.age_structure.25_to_54.males,data.people.age_structure.25_to_54.females,data.people.age_structure.55_to_64.percent,data.people.age_structure.55_to_64.males,data.people.age_structure.55_to_64.females,data.people.age_structure.65_and_over.percent,data.people.age_structure.65_and_over.males,data.people.age_structure.65_and_over.females,data.people.age_structure.date,data.people.dependency_ratios.ratios.total_dependency_ratio.value,data.people.dependency_ratios.ratios.total_dependency_ratio.units,data.people.dependency_ratios.ratios.youth_dependency_ratio.value,data.people.dependency_ratios.ratios.youth_dependency_ratio.units,data.people.dependency_ratios.ratios.elderly_dependency_ratio.value,data.people.dependency_ratios.ratios.elderly_dependenc
y_ratio.units,data.people.dependency_ratios.ratios.potential_support_ratio.value,data.people.dependency_ratios.ratios.potential_support_ratio.units,data.people.dependency_ratios.date,data.people.median_age.total.value,data.people.median_age.total.units,data.people.median_age.male.value,data.people.median_age.male.units,data.people.median_age.female.value,data.people.median_age.female.units,data.people.median_age.global_rank,data.people.median_age.date,data.people.population_growth_rate.growth_rate,data.people.population_growth_rate.global_rank,data.people.population_growth_rate.date,data.people.birth_rate.births_per_1000_population,data.people.birth_rate.global_rank,data.people.birth_rate.date,data.people.death_rate.deaths_per_1000_population,data.people.death_rate.global_rank,data.people.death_rate.date,data.people.net_migration_rate.migrants_per_1000_population,data.people.net_migration_rate.global_rank,data.people.net_migration_rate.date,data.people.population_distribution,data.people.urbanization.urban_population.value,data.people.urbanization.urban_population.units,data.people.urbanization.urban_population.date,data.people.urbanization.rate_of_urbanization.value,data.people.urbanization.rate_of_urbanization.units,data.people.major_urban_areas.places,data.people.major_urban_areas.date,data.people.sex_ratio.by_age.at_birth.value,data.people.sex_ratio.by_age.at_birth.units,data.people.sex_ratio.by_age.0_to_14_years.value,data.people.sex_ratio.by_age.0_to_14_years.units,data.people.sex_ratio.by_age.15_to_24_years.value,data.people.sex_ratio.by_age.15_to_24_years.units,data.people.sex_ratio.by_age.25_to_54_years.value,data.people.sex_ratio.by_age.25_to_54_years.units,data.people.sex_ratio.by_age.55_to_64_years.value,data.people.sex_ratio.by_age.55_to_64_years.units,data.people.sex_ratio.by_age.65_years_and_over.value,data.people.sex_ratio.by_age.65_years_and_over.units,data.people.sex_ratio.total_population.value,data.people.sex_ratio.total_population.units,data.peo
ple.sex_ratio.date,data.people.mothers_mean_age_at_first_birth.age,data.people.mothers_mean_age_at_first_birth.date,data.people.maternal_mortality_rate.deaths_per_100k_live_births,data.people.maternal_mortality_rate.global_rank,data.people.maternal_mortality_rate.date,data.people.infant_mortality_rate.total.value,data.people.infant_mortality_rate.total.units,data.people.infant_mortality_rate.male.value,data.people.infant_mortality_rate.male.units,data.people.infant_mortality_rate.female.value,data.people.infant_mortality_rate.female.units,data.people.infant_mortality_rate.global_rank,data.people.infant_mortality_rate.date,data.people.life_expectancy_at_birth.total_population.value,data.people.life_expectancy_at_birth.total_population.units,data.people.life_expectancy_at_birth.male.value,data.people.life_expectancy_at_birth.male.units,data.people.life_expectancy_at_birth.female.value,data.people.life_expectancy_at_birth.female.units,data.people.life_expectancy_at_birth.global_rank,data.people.life_expectancy_at_birth.date,data.people.total_fertility_rate.children_born_per_woman,data.people.total_fertility_rate.global_rank,data.people.total_fertility_rate.date,data.people.contraceptive_prevalence_rate.value,data.people.contraceptive_prevalence_rate.units,data.people.contraceptive_prevalence_rate.date,data.people.physicians_density.physicians_per_1000_population,data.people.physicians_density.date,data.people.hospital_bed_density.beds_per_1000_population,data.people.hospital_bed_density.date,data.people.drinking_water_source.improved.urban.value,data.people.drinking_water_source.improved.urban.units,data.people.drinking_water_source.improved.rural.value,data.people.drinking_water_source.improved.rural.units,data.people.drinking_water_source.improved.total.value,data.people.drinking_water_source.improved.total.units,data.people.drinking_water_source.unimproved.urban.value,data.people.drinking_water_source.unimproved.urban.units,data.people.drinking_water_source.unimprov
ed.rural.value,data.people.drinking_water_source.unimproved.rural.units,data.people.drinking_water_source.unimproved.total.value,data.people.drinking_water_source.unimproved.total.units,data.people.drinking_water_source.date,data.people.sanitation_facility_access.improved.urban.value,data.people.sanitation_facility_access.improved.urban.units,data.people.sanitation_facility_access.improved.rural.value,data.people.sanitation_facility_access.improved.rural.units,data.people.sanitation_facility_access.improved.total.value,data.people.sanitation_facility_access.improved.total.units,data.people.sanitation_facility_access.unimproved.urban.value,data.people.sanitation_facility_access.unimproved.urban.units,data.people.sanitation_facility_access.unimproved.rural.value,data.people.sanitation_facility_access.unimproved.rural.units,data.people.sanitation_facility_access.unimproved.total.value,data.people.sanitation_facility_access.unimproved.total.units,data.people.sanitation_facility_access.date,data.people.hiv_aids.adult_prevalence_rate.percent_of_adults,data.people.hiv_aids.adult_prevalence_rate.date,data.people.hiv_aids.people_living_with_hiv_aids.total,data.people.hiv_aids.people_living_with_hiv_aids.global_rank,data.people.hiv_aids.people_living_with_hiv_aids.date,data.people.hiv_aids.deaths.total,data.people.hiv_aids.deaths.date,data.people.major_infectious_diseases.degree_of_risk,data.people.major_infectious_diseases.food_or_waterborne_diseases,data.people.major_infectious_diseases.vectorborne_diseases,data.people.major_infectious_diseases.date,data.people.adult_obesity.percent_of_adults,data.people.adult_obesity.global_rank,data.people.adult_obesity.date,data.people.underweight_children.percent_of_children_under_the_age_of_five,data.people.underweight_children.global_rank,data.people.underweight_children.date,data.people.education_expenditures.percent_of_gdp,data.people.education_expenditures.global_rank,data.people.education_expenditures.date,data.people.literacy.def
inition,data.people.literacy.total_population.value,data.people.literacy.total_population.units,data.people.literacy.male.value,data.people.literacy.male.units,data.people.literacy.female.value,data.people.literacy.female.units,data.people.literacy.date,data.people.school_life_expectancy.total.value,data.people.school_life_expectancy.total.units,data.people.school_life_expectancy.male.value,data.people.school_life_expectancy.male.units,data.people.school_life_expectancy.female.value,data.people.school_life_expectancy.female.units,data.people.school_life_expectancy.date,data.people.youth_unemployment.total.value,data.people.youth_unemployment.total.units,data.people.youth_unemployment.male.value,data.people.youth_unemployment.male.units,data.people.youth_unemployment.female.value,data.people.youth_unemployment.female.units,data.people.youth_unemployment.global_rank,data.people.youth_unemployment.date,data.government.country_name.conventional_long_form,data.government.country_name.conventional_short_form,data.government.country_name.local_long_form,data.government.country_name.local_short_form,data.government.country_name.former,data.government.country_name.etymology,data.government.government_type,data.government.capital.name,data.government.capital.geographic_coordinates.latitude.degrees,data.government.capital.geographic_coordinates.latitude.minutes,data.government.capital.geographic_coordinates.latitude.hemisphere,data.government.capital.geographic_coordinates.longitude.degrees,data.government.capital.geographic_coordinates.longitude.minutes,data.government.capital.geographic_coordinates.longitude.hemisphere,data.government.capital.time_difference.timezone,data.government.capital.time_difference.note,data.government.capital.daylight_saving_time,data.government.capital.etymology,data.government.independence.date,data.government.independence.note,data.government.national_holidays,data.government.constitution.history,data.government.constitution.amendments,data.gover
nment.legal_system,data.government.international_law_organization_participation,data.government.citizenship.citizenship_by_birth,data.government.citizenship.citizenship_by_descent_only,data.government.citizenship.dual_citizenship_recognized,data.government.citizenship.residency_requirement_for_naturalization,data.government.suffrage.age,data.government.suffrage.universal,data.government.suffrage.compulsory,data.government.executive_branch.chief_of_state,data.government.executive_branch.head_of_government,data.government.executive_branch.cabinet,data.government.executive_branch.elections_appointments,data.government.executive_branch.election_results,data.government.legislative_branch.description,data.government.legislative_branch.elections,data.government.legislative_branch.election_results,data.government.judicial_branch.highest_courts,data.government.judicial_branch.judge_selection_and_term_of_office,data.government.judicial_branch.subordinate_courts,data.government.political_parties_and_leaders.note,data.government.international_organization_participation,data.government.diplomatic_representation.in_united_states.chancery,data.government.diplomatic_representation.in_united_states.telephone,data.government.diplomatic_representation.in_united_states.fax,data.government.diplomatic_representation.in_united_states.consulates_general,data.government.diplomatic_representation.from_united_states.chief_of_mission,data.government.diplomatic_representation.from_united_states.telephone,data.government.diplomatic_representation.from_united_states.embassy,data.government.diplomatic_representation.from_united_states.mailing_address,data.government.diplomatic_representation.from_united_states.fax,data.government.flag_description.description,data.government.flag_description.note,data.government.national_symbol.symbols,data.government.national_symbol.colors,data.government.national_anthem.name,data.government.national_anthem.lyrics_music,data.government.national_anthem.note,data.go
vernment.national_anthem.audio_url,data.economy.overview,data.economy.gdp.purchasing_power_parity.annual_values,data.economy.gdp.purchasing_power_parity.global_rank,data.economy.gdp.purchasing_power_parity.note,data.economy.gdp.official_exchange_rate.USD,data.economy.gdp.official_exchange_rate.date,data.economy.gdp.real_growth_rate.annual_values,data.economy.gdp.real_growth_rate.global_rank,data.economy.gdp.per_capita_purchasing_power_parity.annual_values,data.economy.gdp.per_capita_purchasing_power_parity.global_rank,data.economy.gdp.per_capita_purchasing_power_parity.note,data.economy.gdp.composition.by_end_use.end_uses.household_consumption.value,data.economy.gdp.composition.by_end_use.end_uses.household_consumption.units,data.economy.gdp.composition.by_end_use.end_uses.government_consumption.value,data.economy.gdp.composition.by_end_use.end_uses.government_consumption.units,data.economy.gdp.composition.by_end_use.end_uses.investment_in_fixed_capital.value,data.economy.gdp.composition.by_end_use.end_uses.investment_in_fixed_capital.units,data.economy.gdp.composition.by_end_use.end_uses.investment_in_inventories.value,data.economy.gdp.composition.by_end_use.end_uses.investment_in_inventories.units,data.economy.gdp.composition.by_end_use.end_uses.exports_of_goods_and_services.value,data.economy.gdp.composition.by_end_use.end_uses.exports_of_goods_and_services.units,data.economy.gdp.composition.by_end_use.end_uses.imports_of_goods_and_services.value,data.economy.gdp.composition.by_end_use.end_uses.imports_of_goods_and_services.units,data.economy.gdp.composition.by_end_use.date,data.economy.gdp.composition.by_sector_of_origin.sectors.agriculture.value,data.economy.gdp.composition.by_sector_of_origin.sectors.agriculture.units,data.economy.gdp.composition.by_sector_of_origin.sectors.industry.value,data.economy.gdp.composition.by_sector_of_origin.sectors.industry.units,data.economy.gdp.composition.by_sector_of_origin.sectors.services.value,data.economy.gdp.composition.b
y_sector_of_origin.sectors.services.units,data.economy.gdp.composition.by_sector_of_origin.date,data.economy.gross_national_saving.annual_values,data.economy.gross_national_saving.global_rank,data.economy.agriculture_products.products,data.economy.industries.industries,data.economy.industrial_production_growth_rate.annual_percentage_increase,data.economy.industrial_production_growth_rate.global_rank,data.economy.industrial_production_growth_rate.date,data.economy.labor_force.total_size.total_people,data.economy.labor_force.total_size.global_rank,data.economy.labor_force.total_size.date,data.economy.labor_force.by_occupation.occupation.agriculture.value,data.economy.labor_force.by_occupation.occupation.agriculture.units,data.economy.labor_force.by_occupation.occupation.industry.value,data.economy.labor_force.by_occupation.occupation.industry.units,data.economy.labor_force.by_occupation.occupation.services.value,data.economy.labor_force.by_occupation.occupation.services.units,data.economy.labor_force.by_occupation.date,data.economy.unemployment_rate.annual_values,data.economy.unemployment_rate.global_rank,data.economy.population_below_poverty_line.value,data.economy.population_below_poverty_line.units,data.economy.population_below_poverty_line.date,data.economy.household_income_by_percentage_share.lowest_ten_percent.value,data.economy.household_income_by_percentage_share.lowest_ten_percent.units,data.economy.household_income_by_percentage_share.highest_ten_percent.value,data.economy.household_income_by_percentage_share.highest_ten_percent.units,data.economy.household_income_by_percentage_share.date,data.economy.budget.revenues.value,data.economy.budget.revenues.units,data.economy.budget.expenditures.value,data.economy.budget.expenditures.units,data.economy.budget.date,data.economy.taxes_and_other_revenues.percent_of_gdp,data.economy.taxes_and_other_revenues.global_rank,data.economy.taxes_and_other_revenues.date,data.economy.budget_surplus_or_deficit.percent_of_gdp,dat
a.economy.budget_surplus_or_deficit.global_rank,data.economy.budget_surplus_or_deficit.date,data.economy.public_debt.annual_values,data.economy.public_debt.global_rank,data.economy.fiscal_year.start,data.economy.fiscal_year.end,data.economy.inflation_rate.annual_values,data.economy.inflation_rate.global_rank,data.economy.current_account_balance.annual_values,data.economy.current_account_balance.global_rank,data.economy.exports.total_value.annual_values,data.economy.exports.total_value.global_rank,data.economy.exports.commodities.by_commodity,data.economy.exports.partners.by_country,data.economy.exports.partners.date,data.economy.imports.total_value.annual_values,data.economy.imports.total_value.global_rank,data.economy.imports.commodities.by_commodity,data.economy.imports.partners.by_country,data.economy.imports.partners.date,data.economy.reserves_of_foreign_exchange_and_gold.annual_values,data.economy.reserves_of_foreign_exchange_and_gold.global_rank,data.economy.external_debt.annual_values,data.economy.external_debt.global_rank,data.economy.exchange_rates.annual_values,data.economy.exchange_rates.note,data.energy.electricity.access.population_without_electricity.value,data.energy.electricity.access.population_without_electricity.units,data.energy.electricity.access.total_electrification.value,data.energy.electricity.access.total_electrification.units,data.energy.electricity.access.urban_electrification.value,data.energy.electricity.access.urban_electrification.units,data.energy.electricity.access.rural_electrification.value,data.energy.electricity.access.rural_electrification.units,data.energy.electricity.access.date,data.energy.electricity.production.kWh,data.energy.electricity.production.global_rank,data.energy.electricity.production.date,data.energy.electricity.consumption.kWh,data.energy.electricity.consumption.global_rank,data.energy.electricity.consumption.date,data.energy.electricity.exports.kWh,data.energy.electricity.exports.global_rank,data.energy.electr
icity.exports.date,data.energy.electricity.imports.kWh,data.energy.electricity.imports.global_rank,data.energy.electricity.imports.date,data.energy.electricity.installed_generating_capacity.kW,data.energy.electricity.installed_generating_capacity.global_rank,data.energy.electricity.installed_generating_capacity.date,data.energy.electricity.by_source.fossil_fuels.percent,data.energy.electricity.by_source.fossil_fuels.global_rank,data.energy.electricity.by_source.fossil_fuels.date,data.energy.electricity.by_source.nuclear_fuels.percent,data.energy.electricity.by_source.nuclear_fuels.global_rank,data.energy.electricity.by_source.nuclear_fuels.date,data.energy.electricity.by_source.hydroelectric_plants.percent,data.energy.electricity.by_source.hydroelectric_plants.global_rank,data.energy.electricity.by_source.hydroelectric_plants.date,data.energy.electricity.by_source.other_renewable_sources.percent,data.energy.electricity.by_source.other_renewable_sources.global_rank,data.energy.electricity.by_source.other_renewable_sources.date,data.energy.crude_oil.production.bbl_per_day,data.energy.crude_oil.production.global_rank,data.energy.crude_oil.production.date,data.energy.crude_oil.exports.bbl_per_day,data.energy.crude_oil.exports.global_rank,data.energy.crude_oil.exports.date,data.energy.crude_oil.imports.bbl_per_day,data.energy.crude_oil.imports.global_rank,data.energy.crude_oil.imports.date,data.energy.crude_oil.proved_reserves.bbl,data.energy.crude_oil.proved_reserves.global_rank,data.energy.crude_oil.proved_reserves.date,data.energy.refined_petroleum_products.production.bbl_per_day,data.energy.refined_petroleum_products.production.global_rank,data.energy.refined_petroleum_products.production.date,data.energy.refined_petroleum_products.consumption.bbl_per_day,data.energy.refined_petroleum_products.consumption.global_rank,data.energy.refined_petroleum_products.consumption.date,data.energy.refined_petroleum_products.exports.bbl_per_day,data.energy.refined_petroleum_product
s.exports.global_rank,data.energy.refined_petroleum_products.exports.date,data.energy.refined_petroleum_products.imports.bbl_per_day,data.energy.refined_petroleum_products.imports.global_rank,data.energy.refined_petroleum_products.imports.date,data.energy.natural_gas.production.cubic_metres,data.energy.natural_gas.production.global_rank,data.energy.natural_gas.production.date,data.energy.natural_gas.consumption.cubic_metres,data.energy.natural_gas.consumption.global_rank,data.energy.natural_gas.consumption.date,data.energy.natural_gas.exports.cubic_metres,data.energy.natural_gas.exports.global_rank,data.energy.natural_gas.exports.date,data.energy.natural_gas.imports.cubic_metres,data.energy.natural_gas.imports.global_rank,data.energy.natural_gas.imports.date,data.energy.natural_gas.proved_reserves.cubic_metres,data.energy.natural_gas.proved_reserves.global_rank,data.energy.natural_gas.proved_reserves.date,data.energy.carbon_dioxide_emissions_from_consumption_of_energy.megatonnes,data.energy.carbon_dioxide_emissions_from_consumption_of_energy.global_rank,data.energy.carbon_dioxide_emissions_from_consumption_of_energy.date,data.communications.telephones.fixed_lines.total_subscriptions,data.communications.telephones.fixed_lines.subscriptions_per_one_hundred_inhabitants,data.communications.telephones.fixed_lines.global_rank,data.communications.telephones.fixed_lines.date,data.communications.telephones.mobile_cellular.total_subscriptions,data.communications.telephones.mobile_cellular.subscriptions_per_one_hundred_inhabitants,data.communications.telephones.mobile_cellular.global_rank,data.communications.telephones.mobile_cellular.date,data.communications.broadcast_media,data.communications.internet.country_code,data.communications.internet.users.total,data.communications.internet.users.percent_of_population,data.communications.internet.users.global_rank,data.communications.internet.users.date,data.transportation.air_transport.national_system.number_of_registered_air_carri
ers,data.transportation.air_transport.national_system.inventory_of_registered_aircraft_operated_by_air_carriers,data.transportation.air_transport.national_system.annual_passenger_traffic_on_registered_air_carriers,data.transportation.air_transport.national_system.annual_freight_traffic_on_registered_air_carriers,data.transportation.air_transport.national_system.date,data.transportation.air_transport.civil_aircraft_registration_country_code_prefix.prefix,data.transportation.air_transport.civil_aircraft_registration_country_code_prefix.date,data.transportation.air_transport.airports.total.airports,data.transportation.air_transport.airports.total.global_rank,data.transportation.air_transport.airports.total.date,data.transportation.air_transport.airports.paved.total,data.transportation.air_transport.airports.paved.over_3047_metres,data.transportation.air_transport.airports.paved.2438_to_3047_metres,data.transportation.air_transport.airports.paved.1524_to_2437_metres,data.transportation.air_transport.airports.paved.914_to_1523_metres,data.transportation.air_transport.airports.paved.under_914_metres,data.transportation.air_transport.airports.paved.date,data.transportation.air_transport.airports.unpaved.total,data.transportation.air_transport.airports.unpaved.1524_to_2437_metres,data.transportation.air_transport.airports.unpaved.914_to_1523_metres,data.transportation.air_transport.airports.unpaved.under_914_metres,data.transportation.air_transport.airports.unpaved.date,data.transportation.air_transport.heliports.total,data.transportation.air_transport.heliports.date,data.transportation.pipelines.by_type,data.transportation.pipelines.date,data.transportation.roadways.total.value,data.transportation.roadways.total.units,data.transportation.roadways.paved.value,data.transportation.roadways.paved.units,data.transportation.roadways.unpaved.value,data.transportation.roadways.unpaved.units,data.transportation.roadways.global_rank,data.transportation.roadways.date,data.transportat
ion.waterways.value,data.transportation.waterways.units,data.transportation.waterways.note,data.transportation.waterways.global_rank,data.transportation.waterways.date,data.military_and_security.expenditures.annual_values,data.military_and_security.expenditures.global_rank,data.military_and_security.service_age_and_obligation.years_of_age,data.military_and_security.service_age_and_obligation.note,data.military_and_security.service_age_and_obligation.date,data.military_and_security.note,data.transnational_issues.disputes,data.transnational_issues.refugees_and_iternally_displaced_persons.refugees.by_country,data.transnational_issues.refugees_and_iternally_displaced_persons.refugees.date,data.transnational_issues.illicit_drugs.note,metadata.date,metadata.source,metadata.nearby_dates,data.geography.maritime_claims.territorial_sea.value,data.geography.maritime_claims.territorial_sea.units,data.geography.maritime_claims.continental_shelf.value,data.geography.maritime_claims.continental_shelf.units,data.geography.maritime_claims.continental_shelf.note,data.government.political_parties_and_leaders.parties,data.transportation.railways.total.length,data.transportation.railways.total.units,data.transportation.railways.standard_gauge.length,data.transportation.railways.standard_gauge.units,data.transportation.railways.global_rank,data.transportation.railways.date,data.transportation.merchant_marine.total,data.transportation.merchant_marine.by_type,data.transportation.merchant_marine.global_rank,data.transportation.merchant_marine.date,data.transportation.ports_and_terminals.major_seaports,data.transnational_issues.refugees_and_iternally_displaced_persons.stateless_persons.people,data.transnational_issues.refugees_and_iternally_displaced_persons.stateless_persons.date,data.people.demographic_profile,data.transportation.railways.narrow_gauge.length,data.transportation.railways.narrow_gauge.units,data.transnational_issues.trafficking_in_persons.current_situation,data.transnational
_issues.trafficking_in_persons.tier_rating,data.geography.maritime_claims.exclusive_economic_zone.value,data.geography.maritime_claims.exclusive_economic_zone.units,data.geography.maritime_claims.contiguous_zone.value,data.geography.maritime_claims.contiguous_zone.units,data.people.hiv_aids.adult_prevalence_rate.global_rank,data.people.hiv_aids.deaths.global_rank,data.government.administrative_divisions
0,Afghanistan,Ahmad Shah DURRANI unified the Pashtun tribes ...,"Southern Asia, north and west of Pakistan, eas...",33.0,0.0,N,65.0,0.0,E,Asia,652230.0,sq km,652230.0,sq km,0.0,sq km,42.0,almost six times the size of Virginia; slightl...,5987.0,km,"[{'country': 'China', 'border_length': {'value...",0.0,km,landlocked,arid to semiarid; cold winters and hot summers,mostly rugged mountains; plains in north and s...,1884.0,m,Amu Darya,258.0,m,Noshak,7492.0,m,"[natural gas, petroleum, coal, copper, chromit...",58.1,%,11.8,%,/,0.3,%,/,46.0,%,2.07,%,39.0,%,2016.0,32080.0,sq km,2012.0,populations tend to cluster in the foothills a...,[{'description': 'damaging earthquakes occur i...,"[limited natural freshwater resources, inadequ...","[Biodiversity, Climate Change, Desertification...","[Hazardous Wastes, Law of the Sea, Marine Life...",36643815.0,39.0,2020-07-01,Afghan(s),Afghan,"[{'name': 'Pashtun'}, {'name': 'Tajik'}, {'nam...",current statistical data on the sensitive subj...,2015.0,"[{'name': 'Afghan Persian or Dari', 'percent':...",data represent most widely spoken languages; s...,2017.0,"[{'name': 'Muslim', 'percent': 99.7, 'breakdow...",2009.0,40.62,7562703.0,7321646.0,21.26,3960044.0,3828670.0,31.44,5858675.0,5661887.0,4.01,724597.0,744910.0,2.68,451852.0,528831.0,2020.0,88.8,%,75.3,%,4.8,%,21.0,%,2020.0,19.5,years,19.4,years,19.5,years,202.0,2020.0,2.38,28.0,2020.0,36.7,15.0,2020.0,12.7,12.0,2020.0,-0.1,99.0,2020.0,populations tend to cluster in the foothills a...,26.0,%,2020.0,3.37,%,"[{'place': 'Kabul', 'population': 4222000, 'is...",2020.0,1.05,males/female,1.03,males/female,1.03,males/female,1.03,males/female,0.97,males/female,0.85,males/female,1.03,males/female,2020.0,19.9,2015.0,638.0,11.0,2017.0,104.3,deaths_per_1000_live_births,111.3,deaths_per_1000_live_births,96.9,deaths_per_1000_live_births,1.0,2020.0,52.8,years,51.4,years,54.4,years,228.0,2020.0,4.82,16.0,2020.0,22.5,%,2015.0,0.28,2016.0,0.4,2017.0,95.9,percent of population,61.4,percent of 
population,70.2,percent of population,3.2,percent of population,38.6,percent of population,38.6,percent of population,2017.0,83.6,percent of population,43.0,percent of population,53.2,percent of population,16.4,percent of population,57.0,percent of population,46.8,percent of population,2017.0,0.1,2018.0,7200.0,113.0,2018.0,500.0,2018.0,intermediate,"[bacterial diarrhea, hepatitis A, typhoid fever]","[Crimea-Congo hemorrhagic fever, malaria]",2020.0,5.5,176.0,2016.0,25.0,17.0,2013.0,4.1,95.0,2017.0,age 15 and over can read and write,43.0,%,55.5,%,29.8,%,2018.0,10.0,years,13.0,years,8.0,years,2014.0,17.6,%,16.3,%,21.4,%,75.0,2017.0,Islamic Republic of Afghanistan,Afghanistan,Jamhuri-ye Islami-ye Afghanistan,Afghanistan,Republic of Afghanistan,"the name ""Afghan"" originally referred to the P...",presidential Islamic republic,Kabul,34.0,31.0,N,69.0,11.0,E,4.5,"9.5 hours ahead of Washington, DC, during Stan...",does not observe daylight savings time,"named for the Kabul River, but the river's nam...",1919-08-19,from UK control over Afghan foreign affairs,"[{'name': 'Independence Day', 'day': '19 Augus...",several previous; latest drafted 14 December 2...,proposed by a commission formed by presidentia...,"mixed legal system of civil, customary, and Is...",[has not submitted an ICJ jurisdiction declara...,no,at least one parent must have been born in - a...,no,5 years,18.0,True,False,President of the Islamic Republic of Afghanist...,President of the Islamic Republic of Afghanist...,Cabinet consists of 25 ministers appointed by ...,president directly elected by absolute majorit...,Ashraf GHANI declared winner by the Independen...,Wolesi Jirga or House of People (250 seats; me...,Meshrano Jirga - district councils - within 5 ...,Meshrano Jirga - percent of vote by party - NA...,Supreme Court or Stera Mahkama (consists of th...,court chief and justices appointed by the pres...,Appeals Courts; Primary Courts; Special Courts...,the Ministry of Justice licensed 72 political 
...,"[{'organization': 'ADB'}, {'organization': 'CI...","2341 Wyoming Avenue NW, Washington, DC 20008",[1] (202) 483-6410,[1] (202) 483-6488,"Los Angeles, New York, Washington, DC",Ambassador (vacant); Charge d'Affaires Ross WI...,[00 93] 0700 108 001,"Bibi Mahru, Kabul","U.S. Embassy Kabul, APO AE 09806",[00 93] 0700 108 564,three equal vertical bands of black (hoist sid...,Afghanistan had more changes to its national f...,[{'symbol': 'lion'}],"[{'color': 'red'}, {'color': 'green'}, {'color...","""Milli Surood"" (National Anthem)",Abdul Bari JAHANI/Babrak WASA,adopted 2006; the 2004 constitution of the pos...,https://www.cia.gov/library/publications/the-w...,"Despite improvements in life expectancy, incom...","[{'value': 69450000000, 'units': 'USD', 'date'...",101.0,data are in 2017 dollars,20240000000.0,2017.0,"[{'value': 2.7, 'units': '%', 'date': '2017'},...",124.0,"[{'value': 2000, 'units': 'USD', 'date': '2017...",209.0,data are in 2017 dollars,81.6,%,12.0,%,17.2,%,30.0,%,6.7,%,-47.6,%,2016.0,23.0,%,21.1,%,55.9,%,2016.0,"[{'value': 22.7, 'units': 'percent_of_gdp', 'd...",78.0,"[opium, wheat, fruits, nuts, wool, mutton, she...","[small-scale production of bricks, textiles, s...",-1.9,181.0,2016.0,8478000.0,61.0,2017.0,44.3,%,18.1,%,37.6,%,2017.0,"[{'value': 23.9, 'units': '%', 'date': '2017'}...",194.0,54.5,%,2017.0,3.8,%,24.0,%,2008.0,2276000000.0,USD,5328000000.0,USD,2017.0,11.2,210.0,2017.0,-15.1,217.0,2017.0,"[{'value': 7, 'units': 'percent_of_gdp', 'date...",202.0,21 December,20 December,"[{'value': 5, 'units': '%', 'date': '2017'}, {...",171.0,"[{'value': 1014000000, 'units': 'USD', 'date':...",48.0,"[{'value': 784000000, 'units': 'USD', 'date': ...",170.0,"[opium, fruits, nuts, handwoven carpets, wool,...","[{'name': 'India', 'percent': 56.5}, {'name': ...",2017.0,"[{'value': 7616000000, 'units': 'USD', 'date':...",113.0,"[machinery, other capital goods, food, textile...","[{'name': 'China', 'percent': 21}, {'name': 'I...",2017.0,"[{'value': 7187000000, 
'units': 'USD', 'date':...",85.0,"[{'value': 2840000000, 'units': 'USD'}]",144.0,"[{'value': 7.87, 'units': 'USD', 'date': '2017...",afghanis (AFA) per US dollar,18999254.0,people,84.1,%,98.0,%,79.0,%,2012.0,1211000000.0,146.0,2016.0,5526000000.0,119.0,2016.0,0.0,96.0,2016.0,4400000000.0,42.0,2016.0,634100.0,138.0,2016.0,45.0,159.0,2016.0,0.0,32.0,2017.0,52.0,34.0,2017.0,4.0,111.0,2017.0,0.0,101.0,2018.0,0.0,82.0,2015.0,0.0,84.0,2015.0,0.0,99.0,2018-01-01,0.0,110.0,2015.0,35000.0,117.0,2016.0,0.0,124.0,2015.0,34210.0,97.0,2015.0,164200000.0,79.0,2017.0,164200000.0,108.0,2017.0,0.0,57.0,2017.0,0.0,81.0,2017.0,49550000000.0,62.0,2018-01-01,9067000.0,111.0,2017.0,127794.0,1.0,135.0,2018,21976355.0,63.0,53.0,2018.0,"state-owned broadcaster, Radio Television Afgh...",.af,4717013.0,13.5,86.0,2018-07-01,4.0,20.0,1929907.0,33102038.0,2015.0,YA,2016.0,46.0,94.0,2020,29.0,4.0,8.0,12.0,2.0,3.0,2020,17.0,7.0,4.0,5.0,2020.0,1.0,2020.0,"[{'type': 'gas', 'length': 466, 'units': 'km'}]",2013.0,34903.0,km,17903.0,km,17000.0,km,93.0,2017.0,1200.0,km,"chiefly Amu Darya, which handles vessels up to...",58.0,2011.0,"[{'value': 1.2, 'units': 'percent_of_gdp', 'da...",100.0,18.0,18 is the legal minimum age for voluntary mili...,2017.0,"since early 2015, the NATO-led mission in Afgh...","[Afghan, Coalition, and Pakistan military meet...","[{'people': 72194, 'country_of_origin': 'Pakis...",2018.0,world's largest producer of opium; poppy culti...,2020-09-21,https://web.archive.org/web/20200921163729/htt...,https://web.archive.org/web/20200921000000*/ht...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Akrotiri,By terms of the 1960 Treaty of Establishment t...,"Eastern Mediterranean, peninsula on the southw...",34.0,37.0,N,32.0,58.0,E,Middle East,123.0,sq km,,,,,224.0,"about 0.7 times the size of Washington, DC",48.0,km,"[{'country': 'Cyprus', 'border_length': {'valu...",56.3,km,,"temperate; Mediterranean with hot, dry summers...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"[hunting around the salt lake, note - breeding...",,,15500.0,,2011,,,,,,"[{'name': 'English'}, {'name': 'Greek'}]",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,none,Akrotiri,,,,named for the village that lies within the Wes...,,Episkopi Cantonment (base administrative cente...,34.0,40.0,N,32.0,51.0,E,2.0,"7 hours ahead of Washington, DC, during Standa...","+1hr, begins last Sunday in March; ends last S...","""Episkopi"" means ""episcopal"" in Greek and stem...",,,,"presented 3 August 1960, effective 16 August 1...",amended 1966 (2016),"laws applicable to the Cypriot population are,...",,,,,,,,,Queen ELIZABETH II (since 6 February 1952),Administrator Major General Robert J. 
THOMSON ...,,the monarchy is hereditary; administrator appo...,,,,,Senior Judges' Court (consists of several visi...,see entry for United Kingdom,Resident Judges' Court; Courts Martial,,,,,,,,,,,,the flag of the UK is used,,,,,,"as a UK area of special sovereignty, ""God Save...",https://www.cia.gov/library/publications/the-w...,Economic activity is limited to providing serv...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,uses the euro,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,British Forces Broadcast Service (BFBS) provid...,,,,,,,,,,,,,1.0,211.0,2017,,,1.0,,,,2017,,,,,,,,,,,,,,,,,,,,,,,,,,,,defense is the responsibility of the UK; Akrot...,,,,,2020-04-23,https://web.archive.org/web/20200423115833/htt...,https://web.archive.org/web/20200423000000*/ht...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,Albania,Albania declared its independence from the Ott...,"Southeastern Europe, bordering the Adriatic Se...",41.0,0.0,N,20.0,0.0,E,Europe,28748.0,sq km,27398.0,sq km,1350.0,sq km,145.0,slightly smaller than Maryland,691.0,km,"[{'country': 'Greece', 'border_length': {'valu...",362.0,km,,"mild temperate; cool, cloudy, wet winters; hot...",mostly mountains and hills; small plains along...,708.0,m,Adriatic Sea,0.0,m,Maja e Korabit (Golem Korab),2764.0,m,"[petroleum, natural gas, coal, bauxite, chromi...",43.1,%,22.6,%,/,3.0,%,/,17.5,%,28.12,%,28.75,%,2016.0,3537.0,sq km,2014.0,"a fairly even distribution, with somewhat high...","[{'description': 'destructive earthquakes', 't...","[deforestation, soil erosion, water pollution ...","[Air Pollution, Biodiversity, Climate Change, ...",,3074579.0,136.0,2020-07-01,Albanian(s),Albanian,"[{'name': 'Albanian', 'percent': 82.6}, {'name...",data represent population by ethnic and cultur...,2011.0,"[{'name': 'Albanian', 'percent': 98.8, 'note':...",,2011.0,"[{'name': 'Muslim', 'percent': 56.7}, {'name':...",2011.0,17.6,284636.0,256474.0,15.39,246931.0,226318.0,42.04,622100.0,670307.0,11.94,178419.0,188783.0,13.03,186335.0,214276.0,2020.0,46.9,%,25.3,%,21.6,%,4.6,%,2020.0,34.3,years,32.9,years,35.7,years,91.0,2020.0,0.28,173.0,2020.0,13.0,143.0,2020.0,7.1,121.0,2020.0,-3.3,179.0,2020.0,"a fairly even distribution, with somewhat high...",62.1,%,2020.0,1.69,%,"[{'place': 'Tirana', 'population': 494000, 'is...",2020.0,1.08,males/female,1.11,males/female,1.09,males/female,0.93,males/female,0.95,males/female,0.87,males/female,0.98,males/female,2020.0,24.8,2017.0,15.0,136.0,2017.0,10.8,deaths_per_1000_live_births,12.1,deaths_per_1000_live_births,9.5,deaths_per_1000_live_births,125.0,2020.0,79.0,years,76.3,years,81.9,years,61.0,2020.0,1.53,197.0,2020.0,46.0,%,2017.0,1.22,2016.0,2.9,2013.0,96.8,percent of population,95.3,percent of population,96.2,percent of population,4.7,percent of population,4.7,percent of population,3.8,percent of 
population,2017.0,100.0,percent of population,99.5,percent of population,99.8,percent of population,0.0,percent of population,0.5,percent of population,0.2,percent of population,2017.0,0.1,2017.0,1400.0,137.0,2017.0,100.0,2017.0,,,,,21.7,85.0,2016.0,1.5,116.0,2017.0,4.0,101.0,2016.0,age 15 and over can read and write,98.1,%,98.5,%,97.8,%,2018.0,15.0,years,15.0,years,16.0,years,2017.0,31.9,%,34.2,%,27.7,%,27.0,2017.0,Republic of Albania,Albania,Republika e Shqiperise,Shqiperia,People's Socialist Republic of Albania,the English-language country name seems to be ...,parliamentary republic,Tirana (Tirane),41.0,19.0,N,19.0,49.0,E,1.0,"6 hours ahead of Washington, DC, during Standa...","+1hr, begins last Sunday in March; ends last S...",the name Tirana first appears in a 1418 Veneti...,1912-11-28,from the Ottoman Empire,"[{'name': 'Independence Day', 'day': '28 Novem...",several previous; latest approved by the Assem...,proposed by at least one-fifth of the Assembly...,civil law system except in the northern rural ...,[has not submitted an ICJ jurisdiction declara...,no,at least one parent must be a citizen of Albania,yes,5 years,18.0,True,False,President of the Republic Ilir META (since 24 ...,Prime Minister Edi RAMA (since 10 September 20...,Council of Ministers proposed by the prime min...,president indirectly elected by the Assembly f...,Ilir META elected president; Assembly vote - 8...,unicameral Assembly or Kuvendi (140 seats; mem...,last held on 25 June 2017 (next to be held in ...,"percent of vote by party - PS 48.3%, PD 28.9%,...","Supreme Court (consists of 19 judges, includin...",Supreme Court judges appointed by the High Jud...,Courts of Appeal; Courts of First Instance; sp...,,"[{'organization': 'BSEC'}, {'organization': 'C...","2100 S Street NW, Washington, DC 20008",[1] (202) 223-4942,[1] (202) 628-7342,New York,Ambassador (vacant); Charge d'Affaires Leyla M...,[355] (4) 2247-285,"Rruga e Elbasanit, 103, Tirana","US Department of State, 9510 Tirana Place, 
Dul...",[355] (4) 2232-222,red with a black two-headed eagle in the cente...,,[{'symbol': 'black double-headed eagle'}],"[{'color': 'red'}, {'color': 'black'}]","""Hymni i Flamurit"" (Hymn to the Flag)",Aleksander Stavre DRENOVA/Ciprian PORUMBESCU,adopted 1912,https://www.cia.gov/library/publications/the-w...,"Albania, a formerly closed, centrally planned ...","[{'value': 36010000000, 'units': 'USD', 'date'...",125.0,,13070000000.0,2017.0,"[{'value': 3.8, 'units': '%', 'date': '2017'},...",85.0,"[{'value': 12500, 'units': 'USD', 'date': '201...",125.0,data are in 2017 dollars,78.1,%,11.5,%,25.2,%,0.2,%,31.5,%,-46.6,%,2017.0,21.7,%,24.2,%,54.1,%,2017.0,"[{'value': 15.9, 'units': 'percent_of_gdp', 'd...",130.0,"[wheat, corn, potatoes, vegetables, fruits, ol...","[food, footwear, apparel, clothing, lumber, oi...",6.8,31.0,2017.0,1198000.0,140.0,2017.0,41.4,%,18.3,%,40.3,%,2017.0,"[{'value': 13.8, 'units': '%', 'date': '2017'}...",168.0,14.3,%,2012.0,4.1,%,19.6,%,2015.0,3614000000.0,USD,3874000000.0,USD,2017.0,27.6,99.0,2017.0,-2.0,103.0,2017.0,"[{'value': 71.8, 'units': 'percent_of_gdp', 'd...",45.0,1 January,31 December,"[{'value': 2, 'units': '%', 'date': '2017'}, {...",102.0,"[{'value': -908000000, 'units': 'USD', 'date':...",139.0,"[{'value': 900700000, 'units': 'USD', 'date': ...",164.0,"[apparel, clothing, footwear, asphalt, metals,...","[{'name': 'Italy', 'percent': 53.4}, {'name': ...",2017.0,"[{'value': 4102999999.9999995, 'units': 'USD',...",138.0,"[machinery, equipment, foodstuffs, textiles, c...","[{'name': 'Italy', 'percent': 28.5}, {'name': ...",2017.0,"[{'value': 3590000000, 'units': 'USD', 'date':...",103.0,"[{'value': 9505000000, 'units': 'USD', 'date':...",114.0,"[{'value': 121.9, 'units': 'USD', 'date': '201...",leke (ALL) per US 
dollar,,,100.0,%,,,,,2016.0,7138000000.0,111.0,2016.0,5110000000.0,122.0,2016.0,1869000000.0,46.0,2016.0,1827000000.0,58.0,2016.0,2109000.0,112.0,2016.0,5.0,202.0,2016.0,0.0,33.0,2017.0,95.0,5.0,2017.0,0.0,172.0,2017.0,14000.0,73.0,2018.0,17290.0,51.0,2015.0,0.0,85.0,2015.0,168300000.0,59.0,2018-01-01,5638.0,103.0,2015.0,29000.0,120.0,2016.0,3250.0,98.0,2015.0,26660.0,103.0,2015.0,50970000.0,86.0,2017.0,50970000.0,112.0,2017.0,0.0,58.0,2017.0,0.0,82.0,2017.0,821200000.0,101.0,2018-01-01,4500000.0,136.0,2017.0,248631.0,8.0,122.0,2018,2714878.0,89.0,145.0,2018.0,"Albania has more than 65 TV stations, includin...",.al,2196613.0,71.85,119.0,2018-07-01,1.0,1.0,151632.0,0.0,2015.0,ZA,2016.0,3.0,191.0,2020,3.0,,2.0,1.0,,,2020,,,,,,,,[{'type': 'gas (a majority of the network is i...,2015.0,3945.0,km,,,,,156.0,2018.0,41.0,km,on the Bojana River,103.0,2011.0,"[{'value': 1.3, 'units': 'percent_of_gdp', 'da...",91.0,19.0,19 is the legal minimum age for voluntary mili...,2012.0,,[none],,,active transshipment point for Southwest Asian...,2020-09-20,https://web.archive.org/web/20200920031801/htt...,https://web.archive.org/web/20200920000000*/ht...,12.0,nm,200.0,m,depth or to the depth of exploitation,"[{'name': 'Democratic Party', 'name_alternativ...",677.0,km,677.0,km\n1.435-m,103.0,2015.0,68.0,"[{'type': 'general cargo', 'count': 49}, {'typ...",104.0,2019.0,"[Durres, Sarande, Shengjin, Vlore]",4160.0,2018.0,,,,,,,,,,,,
3,Algeria,Algeria has known many empires and dynasties s...,"Northern Africa, bordering the Mediterranean S...",28.0,0.0,N,3.0,0.0,E,Africa,2381740.0,sq km,2381740.0,sq km,0.0,sq km,11.0,slightly less than 3.5 times the size of Texas,6734.0,km,"[{'country': 'Libya', 'border_length': {'value...",998.0,km,,"arid to semiarid; mild, wet winters with hot, ...",mostly high plateau and desert; Atlas Mountain...,800.0,m,Chott Melrhir,-40.0,m,Tahat,2908.0,m,"[petroleum, natural gas, iron ore, phosphates,...",17.4,%,3.1,%,/,0.4,%,/,13.8,%,0.8,%,81.8,%,2016.0,13600.0,sq km,2014.0,the vast majority of the populace is found in ...,[{'description': 'mountainous areas subject to...,"[air pollution in major cities, soil erosion f...","[Biodiversity, Climate Change, Climate Change-...",,42972878.0,35.0,2020-07-01,Algerian(s),Algerian,"[{'name': 'Arab-Berber', 'percent': 99}, {'nam...",although almost all Algerians are Berber in or...,,"[{'name': 'Arabic', 'note': 'official'}, {'nam...",,,"[{'name': 'Muslim', 'percent': 99, 'note': 'of...",2012.0,29.58,6509490.0,6201450.0,13.93,3063972.0,2922368.0,42.91,9345997.0,9091558.0,7.41,1599369.0,1585233.0,6.17,1252084.0,1401357.0,2020.0,60.1,%,49.3,%,10.8,%,9.3,%,2020.0,28.9,years,28.6,years,29.3,years,139.0,2020.0,1.52,67.0,2020.0,20.0,76.0,2020.0,4.4,208.0,2020.0,-0.9,137.0,2020.0,the vast majority of the populace is found in ...,73.7,%,2020.0,2.46,%,"[{'place': 'Algiers', 'population': 2768000, '...",2020.0,1.05,males/female,1.05,males/female,1.05,males/female,1.03,males/female,1.01,males/female,0.89,males/female,1.03,males/female,2020.0,,,112.0,68.0,2017.0,17.6,deaths_per_1000_live_births,19.1,deaths_per_1000_live_births,16.0,deaths_per_1000_live_births,85.0,2020.0,77.5,years,76.1,years,79.1,years,77.0,2020.0,2.59,67.0,2020.0,57.1,%,2012.0,1.79,2017.0,1.9,2015.0,99.2,percent of population,97.4,percent of population,98.7,percent of population,0.8,percent of population,2.1,percent of population,1.1,percent of 
population,2017.0,96.9,percent of population,93.4,percent of population,96.0,percent of population,3.1,percent of population,6.6,percent of population,4.0,percent of population,2017.0,0.1,2018.0,16000.0,89.0,2018.0,200.0,2018.0,,,,,27.4,38.0,2016.0,3.0,99.0,2012.0,,,,age 15 and over can read and write,81.4,%,87.4,%,75.3,%,2018.0,14.0,years,14.0,years,15.0,years,2011.0,39.3,%,33.1,%,82.0,%,13.0,2017.0,People's Democratic Republic of Algeria,Algeria,Al Jumhuriyah al Jaza'iriyah ad Dimuqratiyah a...,Al Jaza'ir,,the country name derives from the capital city...,presidential republic,Algiers,36.0,45.0,N,3.0,3.0,E,1.0,"6 hours ahead of Washington, DC, during Standa...",,"name derives from the Arabic ""al-Jazair"" meani...",1962-07-05,from France,"[{'name': 'Independence Day', 'day': '5 July',...",several previous; latest approved by referendu...,proposed by the president of the republic or t...,mixed legal system of French civil law and Isl...,[has not submitted an ICJ jurisdiction declara...,no,the mother must be a citizen of Algeria,no,7 years,18.0,True,False,President Abdelmadjid TEBBOUNE (since 12 Decem...,Abdelaziz DJERAD (since 28 December 2019),Cabinet of Ministers appointed by the president,president directly elected by absolute majorit...,"Abdelmadjid TEBBOUNE (NLF) 58.1%, Abdelkader B...",bicameral Parliament consists of: Council of t...,Council of the Nation - last held on 29 Decemb...,Council of the Nation - percent of vote by par...,"Supreme Court or Cour Suprême, (consists of 15...",Supreme Court judges appointed by the High Cou...,appellate or wilaya courts; first instance or ...,a law banning political parties based on relig...,"[{'organization': 'ABEDA'}, {'organization': '...","2118 Kalorama Road NW, Washington, DC 20008",[1] (202) 265-2800,[1] (202) 986-5906,New York,Ambassador John P. DESROCHER (since 5 Septembe...,[213] (0) 770-08-2000,"05 Chemin Cheikh Bachir, El Ibrahimi, El-Biar ...","B. P. 
408, Alger-Gare, 16030 Algiers",[213] (0) 770-08-2064,two equal vertical bands of green (hoist side)...,,[{'symbol': 'five-pointed star between the ext...,"[{'color': 'green'}, {'color': 'white'}, {'col...","""Kassaman"" (We Pledge)",Mufdi ZAKARIAH/Mohamed FAWZI,"adopted 1962; ZAKARIAH wrote ""Kassaman"" as a p...",https://www.cia.gov/library/publications/the-w...,Algeria's economy remains dominated by the sta...,"[{'value': 630000000000, 'units': 'USD', 'date...",36.0,data are in 2017 dollars,167600000000.0,2017.0,"[{'value': 1.4, 'units': '%', 'date': '2017'},...",174.0,"[{'value': 15200, 'units': 'USD', 'date': '201...",109.0,data are in 2017 dollars,42.7,%,20.2,%,38.1,%,11.2,%,23.6,%,-35.8,%,2017.0,13.3,%,39.3,%,47.4,%,2017.0,"[{'value': 37.8, 'units': 'percent_of_gdp', 'd...",13.0,"[wheat, barley, oats, grapes, olives, citrus, ...","[petroleum, natural gas, light industries, min...",0.6,164.0,2017.0,11820000.0,50.0,2017.0,10.8,%,30.9,%,58.4,%,2011.0,"[{'value': 11.7, 'units': '%', 'date': '2017'}...",155.0,23.0,%,2006.0,2.8,%,26.8,%,1995.0,54150000000.0,USD,70200000000.0,USD,2017.0,32.3,67.0,2017.0,-9.6,207.0,2017.0,"[{'value': 27.5, 'units': 'percent_of_gdp', 'd...",170.0,1 January,31 December,"[{'value': 5.6, 'units': '%', 'date': '2017'},...",179.0,"[{'value': -22100000000, 'units': 'USD', 'date...",199.0,"[{'value': 34370000000, 'units': 'USD', 'date'...",58.0,"[petroleum, natural gas, petroleum products]","[{'name': 'Italy', 'percent': 17.4}, {'name': ...",2017.0,"[{'value': 48540000000, 'units': 'USD', 'date'...",55.0,"[capital goods, foodstuffs, consumer goods]","[{'name': 'China', 'percent': 18.2}, {'name': ...",2017.0,"[{'value': 97890000000, 'units': 'USD', 'date'...",26.0,"[{'value': 6260000000, 'units': 'USD', 'date':...",128.0,"[{'value': 108.9, 'units': 'USD', 'date': '201...",Algerian dinars (DZD) per US 
dollar,400000.0,people,99.4,%,99.6,%,99.0,%,2016.0,66890000000.0,42.0,2016.0,55960000000.0,46.0,2016.0,641000000.0,65.0,2015.0,257000000.0,91.0,2016.0,19270000.0,45.0,2016.0,96.0,36.0,2016.0,0.0,34.0,2017.0,1.0,144.0,2017.0,2.0,130.0,2017.0,1259000.0,18.0,2018.0,756400.0,15.0,2015.0,5340.0,75.0,2015.0,12200000000.0,15.0,2018-01-01,627900.0,29.0,2015.0,405000.0,37.0,2016.0,578800.0,15.0,2015.0,82930.0,61.0,2015.0,93500000000.0,10.0,2017.0,41280000000.0,25.0,2017.0,53880000000.0,7.0,2017.0,0.0,83.0,2017.0,4504000000000.0,10.0,2018-01-01,135900000.0,34.0,2017.0,4200919.0,10.0,34.0,2018,47154264.0,113.0,32.0,2018.0,state-run Radio-Television Algerienne operates...,.dz,24819531.0,59.58,31.0,2018-07-01,4.0,74.0,5910835.0,24723377.0,2015.0,7T,2016.0,149.0,36.0,2020,67.0,14.0,27.0,18.0,6.0,2.0,2020,82.0,16.0,36.0,28.0,2020.0,3.0,2013.0,"[{'type': 'condensate', 'length': 2600, 'units...",2013.0,104000.0,km,71656.0,km,32344.0,km,46.0,2015.0,,,,,,"[{'value': 6, 'units': 'percent_of_gdp', 'date...",3.0,18.0,18 is the legal minimum age for voluntary mili...,2018.0,,[Algeria and many other states reject Moroccan...,"[{'people': 100000, 'country_of_origin': 'West...",2018.0,,2020-09-20,https://web.archive.org/web/20200920031803/htt...,https://web.archive.org/web/20200920000000*/ht...,12.0,nm,,,,"[{'name': 'Algerian National Front', 'name_alt...",3973.0,km,2888.0,km\n1.432-m,50.0,2014.0,114.0,"[{'type': 'bulk carrier', 'count': 2}, {'type'...",83.0,2019.0,"[Algiers, Annaba, Arzew, Bejaia, Djendjene, Ji...",,,"For the first two thirds of the 20th century, ...",1085.0,km\n1.055-m,"Algeria is a transit and, to a lesser extent, ...",Tier 3 – Algeria does not fully comply with th...,,,,,,,
4,American Samoa,"Settled as early as 1000 B.C., Samoa was not r...","Oceania, group of islands in the South Pacific...",14.0,20.0,S,170.0,0.0,W,Oceania,224.0,sq km,224.0,sq km,0.0,sq km,216.0,"slightly larger than Washington, DC",0.0,km,,116.0,km,,"tropical marine, moderated by southeast trade ...",five volcanic islands with rugged peaks and li...,,,Pacific Ocean,0.0,m,Lata Mountain,964.0,m,"[pumice, pumicite]",21.9,%,13.4,%,/,8.5,%,/,0.0,%,78.1,%,0.0,%,2016.0,0.0,sq km,2012.0,,[{'description': 'cyclones common from Decembe...,"[limited supply of drinking water, pollution, ...",,,49437.0,211.0,2020-07-01,American Samoan(s) (US nationals),American Samoan,"[{'name': 'Pacific Islander', 'percent': 92.6,...",data represent population by ethnic origin or ...,2010.0,"[{'name': 'Samoan', 'percent': 88.6, 'note': '...",most people are bilingual,2010.0,"[{'name': 'Christian', 'percent': 98.3}, {'nam...",2010.0,27.76,7063.0,6662.0,18.16,4521.0,4458.0,37.49,9164.0,9370.0,9.69,2341.0,2447.0,6.9,1580.0,1831.0,2020.0,,,,,,,,,,27.2,years,26.7,years,27.7,years,149.0,2020.0,-1.4,234.0,2020.0,17.8,93.0,2020.0,5.9,168.0,2020.0,-26.1,226.0,2020.0,,87.2,%,2020.0,0.07,%,"[{'place': 'Pago Pago', 'population': 49000, '...",2018.0,1.06,males/female,1.06,males/female,1.01,males/female,0.98,males/female,0.96,males/female,0.86,males/female,1.0,males/female,2020.0,,,,,,9.9,deaths_per_1000_live_births,11.7,deaths_per_1000_live_births,8.0,deaths_per_1000_live_births,131.0,2020.0,74.8,years,72.3,years,77.5,years,123.0,2020.0,2.35,80.0,2020.0,,,,,,,,,,,,100.0,percent of population,,,,,0.0,percent of population,2017.0,,,,,99.0,percent of population,,,,,1.0,percent of population,2017.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,American Samoa,American Samoa,,,,the meaning of Samoa is disputed; some modern ...,republican form of government with separate ex...,Pago Pago,14.0,16.0,S,170.0,42.0,W,-11.0,"6 hours behind Washington, DC, during Standard...",,,,territory of the US,"[{'name': 'Flag Day', 
'day': '17 April', 'orig...",adopted 17 October 1960; revised 1 July 1967,proposed by either house of the Legislative As...,mixed legal system of US common law and custom...,,,,,,18.0,True,False,President Donald J. TRUMP (since 20 January 20...,Governor Lolo Matalasi MOLIGA (since 3 January...,Cabinet consists of 12 department directors ap...,president and vice president indirectly electe...,Lolo Matalasi MOLIGA reelected governor in fir...,bicameral Legislature or Fono consists of: Sen...,Senate - last held on 8 November 2016 (next to...,Senate - percent of vote by party - NA; seats ...,High Court of American Samoa (consists of the ...,chief justice and associate chief justice appo...,district and village courts,,"[{'organization': 'AOSIS ', 'note': 'observer'...",,,,,,,,,,"blue, with a white triangle edged in red that ...",,"[{'symbol': 'a fue crossed with a to'oto'o ',...","[{'color': 'red'}, {'color': 'white'}, {'color...","""Amerika Samoa"" (American Samoa)",Mariota Tiumalu TUIASOSOPO/Napoleon Andrew TUI...,local anthem adopted 1950; as a territory of t...,,American Samoa s a traditional Polynesian econ...,"[{'value': 658000000, 'units': 'USD', 'date': ...",209.0,data are in 2016 US dollars,658000000.0,2016.0,"[{'value': -2.5, 'units': '%', 'date': '2016'}...",208.0,"[{'value': 11200, 'units': 'USD', 'date': '201...",134.0,,66.4,%,49.7,%,7.3,%,5.1,%,65.0,%,-93.5,%,2016.0,27.4,%,12.4,%,60.2,%,2012.0,,,"[bananas, coconuts, vegetables, taro, breadfru...","[tuna canneries, largely supplied by foreign f...",,,,17850.0,213.0,2015.0,,,15.5,%,46.4,%,2015.0,"[{'value': 29.8, 'units': '%', 'date': '2005'}]",206.0,,,,,,,,,249000000.0,USD,262500000.0,USD,2016.0,37.8,53.0,2016.0,-2.1,107.0,2016.0,"[{'value': 12.2, 'units': 'percent_of_gdp', 'd...",197.0,1 October,30 September,"[{'value': -0.5, 'units': '%', 'date': '2015'}...",5.0,,,"[{'value': 428000000, 'units': 'USD', 'date': ...",178.0,[canned tuna],"[{'name': 'Australia', 'percent': 25}, {'name'...",2017.0,"[{'value': 
615000000, 'units': 'USD', 'date': ...",194.0,"[raw materials for canneries, food, petroleum ...","[{'name': 'Fiji', 'percent': 10.7}, {'name': '...",2017.0,,,,,,the US dollar is used,22219.0,people,59.0,%,60.0,%,45.0,%,2012.0,169000000.0,195.0,2016.0,157200000.0,197.0,2016.0,0.0,97.0,2016.0,0.0,118.0,2016.0,43000.0,195.0,2016.0,98.0,27.0,2016.0,0.0,35.0,2017.0,0.0,152.0,2017.0,2.0,131.0,2017.0,0.0,102.0,2018.0,0.0,83.0,2015.0,0.0,86.0,2015.0,0.0,100.0,2018-01-01,0.0,111.0,2015.0,2375.0,192.0,2016.0,0.0,125.0,2015.0,2346.0,188.0,2015.0,0.0,97.0,2017.0,0.0,117.0,2017.0,0.0,59.0,2017.0,0.0,84.0,2017.0,0.0,103.0,2014-01-01,361100.0,189.0,2017.0,10000.0,18.0,192.0,2016-07-01,,,,,3 TV stations; multi-channel pay TV services a...,.as,17000.0,31.3,210.0,2016-07-01,,,,,,,,3.0,192.0,2020,3.0,1.0,,,1.0,1.0,2019,,,,,,,,,,241.0,km,,,,,205.0,2016.0,,,,,,,,,,,defense is the responsibility of the US,[Tokelau included American Samoa's Swains Isla...,,,,2020-09-20,https://web.archive.org/web/20200920050908/htt...,https://web.archive.org/web/20200920000000*/ht...,12.0,nm,,,,"[{'name': 'Democratic Party', 'leaders': ['Fag...",,,,,,,,,,,[Pago Pago],,,,,,,,200.0,nm,,,,,


In [10]:
cia_df.shape

(258, 569)
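
With 569 flattened columns, it can help to search the column names by keyword before choosing a subset. The following is a minimal sketch, not part of the original notebook: `demo_df` is a small hypothetical stand-in for `cia_df` (its values are abbreviated for illustration), and `find_columns` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical stand-in for cia_df. The column names follow the flattened
# "data.section.field" pattern of the real dataframe; the values are
# abbreviated for illustration only.
demo_df = pd.DataFrame({
    "data.name": ["Afghanistan", "Albania"],
    "data.geography.climate": ["arid to semiarid", "mild temperate"],
    "data.people.age_structure.0_to_14.percent": [40.62, 17.6],
    "data.people.age_structure.15_to_24.percent": [21.26, 15.39],
})


def find_columns(df, keyword):
    """Return the column names that contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [col for col in df.columns if keyword in col.lower()]


# List every column belonging to the age structure group.
age_cols = find_columns(demo_df, "age_structure")
print(age_cols)
```

Running the same helper against `cia_df` with keywords such as `"electricity"` or `"internet"` would list the candidate columns for each topic.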

## Creating a dataframe with only the columns that might be relevant to the protest dataset

Below I kept the country name, climate information, natural resources, population distribution (to see whether countries with more urban living have more protests), migration rate, age structure, language, government type, legal system, global GDP rankings (purchasing power parity and real growth), availability of electricity, availability of internet access, number of mobile phone subscriptions, military spending, and transnational disputes, as they might be relevant to citizens protesting against their government.

In [11]:
# Keep only the columns of interest; .copy() makes cia_df an independent frame
# so later assignments do not act on a slice of the original.
cia_df = cia_df[['data.name',
        'data.geography.climate',
        'data.geography.natural_resources.resources',
        'data.geography.population_distribution',
        'data.people.net_migration_rate.migrants_per_1000_population',
        'data.people.age_structure.0_to_14.percent',
        'data.people.age_structure.15_to_24.percent',
        'data.people.age_structure.25_to_54.percent',
        'data.people.age_structure.55_to_64.percent',
        'data.people.age_structure.65_and_over.percent',
        'data.people.languages.language',
        'data.government.government_type',
        'data.government.legal_system',
        'data.economy.gdp.purchasing_power_parity.global_rank',
        'data.economy.gdp.real_growth_rate.global_rank',
        'data.energy.electricity.access.total_electrification.value',
        'data.energy.electricity.installed_generating_capacity.global_rank',
        'data.communications.internet.users.percent_of_population',
        'data.communications.telephones.mobile_cellular.subscriptions_per_one_hundred_inhabitants',
        'data.military_and_security.expenditures.annual_values',
        'data.transnational_issues.disputes']].copy()
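With 569 columns and long dotted names, a single typo in the selection list raises a `KeyError`. As a minimal sketch, assuming a hypothetical helper `select_existing` and a toy frame standing in for `cia_df` (the column `data.geography.terrain` is only used here as an example of a name that is absent from the toy frame), the selection can report missing names instead of failing:

```python
import pandas as pd

def select_existing(df, wanted):
    """Return a copy of df limited to the wanted columns that exist, plus the missing names."""
    present = [col for col in wanted if col in df.columns]
    missing = [col for col in wanted if col not in df.columns]
    return df[present].copy(), missing

# Toy stand-in for the flattened Factbook frame (hypothetical values).
toy = pd.DataFrame({
    'data.name': ['Albania', 'Algeria'],
    'data.geography.climate': ['mild temperate', 'arid to semiarid'],
    'data.unrelated.field': [1, 2],
})

subset, missing = select_existing(
    toy, ['data.name', 'data.geography.climate', 'data.geography.terrain']
)
print(subset.columns.tolist())
print(missing)
```

Returning the missing names separately keeps the check explicit, so a misspelled column is noticed before any cleaning starts.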

In [12]:
cia_df.rename(columns={
    'data.name': 'country',
    'data.geography.climate': 'climate',
    'data.geography.natural_resources.resources': 'natural_resources',
    'data.geography.population_distribution': 'population_distribution',
    'data.people.net_migration_rate.migrants_per_1000_population': 'net_migration_per_1000_population',
    'data.people.age_structure.0_to_14.percent': 'age_0_14_percent',
    'data.people.age_structure.15_to_24.percent': 'age_15_24_percent',
    'data.people.age_structure.25_to_54.percent': 'age_25_54_percent',
    'data.people.age_structure.55_to_64.percent': 'age_55_64_percent',
    'data.people.age_structure.65_and_over.percent': 'age_65_over_percent',
    'data.people.languages.language': 'language',
    'data.government.government_type': 'govt_type',
    'data.government.legal_system': 'legal_system',
    'data.economy.gdp.purchasing_power_parity.global_rank': 'gdp_purchasing_power_global_rank',
    'data.economy.gdp.real_growth_rate.global_rank': 'gdp_growth_global_rank',
    'data.energy.electricity.access.total_electrification.value': 'electricity_access_percent',
    'data.energy.electricity.installed_generating_capacity.global_rank': 'electricity_generating_capacity_global_rank',
    'data.communications.internet.users.percent_of_population': 'internet_access_percent',
    'data.communications.telephones.mobile_cellular.subscriptions_per_one_hundred_inhabitants': 'cell_phone_per_100',
    'data.military_and_security.expenditures.annual_values': 'military_spending_annual_percent_gdp',
    'data.transnational_issues.disputes': 'transnational_disputes',
}, inplace=True)

## Saving an unclean version of the csv

In [13]:
cia_df.to_csv('../data/cia_df_unclean.csv', index=False)

## Starting to dig deeper into each column in the dataframe

In [14]:
cia_df.dtypes

country                                         object
climate                                         object
natural_resources                               object
population_distribution                         object
net_migration_per_1000_population              float64
age_0_14_percent                               float64
age_15_24_percent                              float64
age_25_54_percent                              float64
age_55_64_percent                              float64
age_65_over_percent                            float64
language                                        object
govt_type                                       object
legal_system                                    object
gdp_purchasing_power_global_rank               float64
gdp_growth_global_rank                         float64
electricity_access_percent                     float64
electricity_generating_capacity_global_rank    float64
internet_access_percent                        float64
cell_phone

In [15]:
cia_df.isna().sum()

country                                          0
climate                                          0
natural_resources                               10
population_distribution                         28
net_migration_per_1000_population               29
age_0_14_percent                                28
age_15_24_percent                               28
age_25_54_percent                               28
age_55_64_percent                               28
age_65_over_percent                             28
language                                        18
govt_type                                       22
legal_system                                    11
gdp_purchasing_power_global_rank                29
gdp_growth_global_rank                          34
electricity_access_percent                      42
electricity_generating_capacity_global_rank     43
internet_access_percent                         28
cell_phone_per_100                              39
military_spending_annual_percen

### Language Column

In [16]:
cia_df['language'].isna().sum()

18

In [17]:
# Assign the filled column back instead of calling fillna in place on a column selection
cia_df['language'] = cia_df['language'].fillna('None')

In [18]:
lang_list = []
for el in cia_df['language'][cia_df['language'] != 'None']:
    lang_list.append(el[0]['name'])

In [19]:
# .loc sets the values on cia_df directly, which avoids the SettingWithCopyWarning
# raised by chained indexing (cia_df['language'][mask] = ...)
cia_df.loc[cia_df['language'] != 'None', 'language'] = lang_list
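The loop-and-assign steps above can also be written as a single column assignment. This is a minimal sketch on a toy frame, assuming each non-null entry is a list of dicts with a `name` key as in the cells above; the helper `first_language` and the percent values are hypothetical:

```python
import pandas as pd

# Toy stand-in for cia_df; each non-null entry is a list of dicts.
toy = pd.DataFrame({
    'country': ['Albania', 'Akrotiri', 'Nowhere'],
    'language': [
        [{'name': 'Albanian', 'percent': 98.8}, {'name': 'Greek', 'percent': 0.5}],
        [{'name': 'English'}],
        None,
    ],
})

def first_language(entry):
    """Return the name of the first listed language, or 'None' when the entry is missing."""
    if isinstance(entry, list) and entry:
        return entry[0].get('name', 'None')
    return 'None'

# Replacing the whole column avoids chained indexing entirely.
toy['language'] = toy['language'].apply(first_language)
print(toy['language'].tolist())
```

Because the whole column is replaced in one step, there is no intermediate slice for pandas to warn about.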


### Transnational Disputes Column

In [21]:
cia_df['transnational_disputes'] = cia_df['transnational_disputes'].fillna('None')

# Keep only the first listed dispute for each country
dispute_list = []
for el in cia_df['transnational_disputes'][cia_df['transnational_disputes'] != 'None']:
    dispute_list.append(el[0])

# .loc sets the values on cia_df directly, avoiding chained indexing
cia_df.loc[cia_df['transnational_disputes'] != 'None', 'transnational_disputes'] = dispute_list


### Natural Resources Column

In [22]:
def clean_list(column):
    """Replace each list in `column` with its first element; missing values become 'None'."""
    cia_df[column] = cia_df[column].fillna('None')

    data_list = []
    for el in cia_df[column][cia_df[column] != 'None']:
        data_list.append(el[0])

    # .loc sets the values on cia_df directly, avoiding chained indexing
    cia_df.loc[cia_df[column] != 'None', column] = data_list

In [23]:
clean_list('natural_resources')


### Military Spending Column

In [24]:
def clean_dict(column):
    """Replace each list of dicts in `column` with the 'value' of its first dict; missing values become 'None'."""
    cia_df[column] = cia_df[column].fillna('None')

    data_list = []
    for el in cia_df[column][cia_df[column] != 'None']:
        data_list.append(el[0]['value'])

    # .loc sets the values on cia_df directly, avoiding chained indexing
    cia_df.loc[cia_df[column] != 'None', column] = data_list

In [27]:
clean_dict('military_spending_annual_percent_gdp')
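`clean_list` and `clean_dict` differ only in whether a key is read from the first element. One possible way to fold them together is a hypothetical helper, `first_element`, that takes a Series and returns a new one rather than modifying a global frame. The toy data below is made up for illustration:

```python
import pandas as pd

def first_element(series, key=None, fill='None'):
    """Return a new Series holding the first item of each list entry.

    If `key` is given, the first item is treated as a dict and its `key` is extracted.
    Missing or empty entries become `fill`.
    """
    def pick(entry):
        if not isinstance(entry, list) or not entry:
            return fill
        first = entry[0]
        return first[key] if key is not None else first
    return series.apply(pick)

# Toy stand-in for two of the nested columns (hypothetical values).
toy = pd.DataFrame({
    'natural_resources': [['natural gas', 'petroleum'], None],
    'military_spending': [[{'value': 1.2, 'date': '2019'}], None],
})
toy['natural_resources'] = first_element(toy['natural_resources'])
toy['military_spending'] = first_element(toy['military_spending'], key='value')
print(toy)
```

Passing the Series in and assigning the result back keeps the function free of side effects, which makes it easier to reuse on another frame.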


In [28]:
cia_df.head()

Unnamed: 0,country,climate,natural_resources,population_distribution,net_migration_per_1000_population,age_0_14_percent,age_15_24_percent,age_25_54_percent,age_55_64_percent,age_65_over_percent,language,govt_type,legal_system,gdp_purchasing_power_global_rank,gdp_growth_global_rank,electricity_access_percent,electricity_generating_capacity_global_rank,internet_access_percent,cell_phone_per_100,military_spending_annual_percent_gdp,transnational_disputes
0,Afghanistan,arid to semiarid; cold winters and hot summers,natural gas,populations tend to cluster in the foothills a...,-0.1,40.62,21.26,31.44,4.01,2.68,Afghan Persian or Dari,presidential Islamic republic,"mixed legal system of civil, customary, and Is...",101.0,124.0,84.1,138.0,13.5,63.0,1.2,"Afghan, Coalition, and Pakistan military meet ..."
1,Akrotiri,"temperate; Mediterranean with hot, dry summers...",,,,,,,,,English,,"laws applicable to the Cypriot population are,...",,,,,,,,
2,Albania,"mild temperate; cool, cloudy, wet winters; hot...",petroleum,"a fairly even distribution, with somewhat high...",-3.3,17.6,15.39,42.04,11.94,13.03,Albanian,parliamentary republic,civil law system except in the northern rural ...,125.0,85.0,100.0,112.0,71.85,89.0,1.3,none
3,Algeria,"arid to semiarid; mild, wet winters with hot, ...",petroleum,the vast majority of the populace is found in ...,-0.9,29.58,13.93,42.91,7.41,6.17,Arabic,presidential republic,mixed legal system of French civil law and Isl...,36.0,174.0,99.4,45.0,59.58,113.0,6.0,Algeria and many other states reject Moroccan ...
4,American Samoa,"tropical marine, moderated by southeast trade ...",pumice,,-26.1,27.76,18.16,37.49,9.69,6.9,Samoan,republican form of government with separate ex...,mixed legal system of US common law and custom...,209.0,208.0,59.0,195.0,31.3,,,Tokelau included American Samoa's Swains Islan...


### Filling in the rest of the null values with 0 for future modeling

In [29]:
cia_df.fillna(0, inplace=True)
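Note that `fillna(0)` on the whole frame also writes the number 0 into text columns such as `govt_type`. If that is not wanted, one option is to fill by column type. The sketch below uses a toy frame; the placeholder `'unknown'` is an assumption, not a value used elsewhere in this notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in for cia_df with one text column and one numeric column containing nulls.
toy = pd.DataFrame({
    'country': ['Akrotiri', 'Albania'],
    'govt_type': [np.nan, 'parliamentary republic'],
    'internet_access_percent': [np.nan, 71.85],
})

numeric_cols = toy.select_dtypes(include='number').columns
text_cols = toy.select_dtypes(exclude='number').columns

# Numeric gaps become 0; text gaps get an explicit placeholder label.
toy[numeric_cols] = toy[numeric_cols].fillna(0)
toy[text_cols] = toy[text_cols].fillna('unknown')
print(toy)
```

Whether 0 is a sensible stand-in for a missing rank or percentage is a modeling decision; the sketch only separates the two cases.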

## Checking the data types

There are still some features that will need to be one-hot encoded if they are used in a model. For now I will save the cleaned-up version to a csv.

In [30]:
cia_df.dtypes

country                                         object
climate                                         object
natural_resources                               object
population_distribution                         object
net_migration_per_1000_population              float64
age_0_14_percent                               float64
age_15_24_percent                              float64
age_25_54_percent                              float64
age_55_64_percent                              float64
age_65_over_percent                            float64
language                                        object
govt_type                                       object
legal_system                                    object
gdp_purchasing_power_global_rank               float64
gdp_growth_global_rank                         float64
electricity_access_percent                     float64
electricity_generating_capacity_global_rank    float64
internet_access_percent                        float64
cell_phone

In [31]:
cia_df.to_csv('../data/cia_clean.csv', index=False)
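The one-hot encoding mentioned above could later be done with `pd.get_dummies`. This is a minimal sketch on a toy frame with a few values taken from the preview above; choosing `govt_type` as the column to encode and `govt` as the prefix is an assumption for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['Albania', 'Algeria', 'Afghanistan'],
    'govt_type': ['parliamentary republic', 'presidential republic',
                  'presidential Islamic republic'],
    'internet_access_percent': [71.85, 59.58, 13.5],
})

# Each distinct govt_type becomes its own 0/1 column; the original column is dropped.
encoded = pd.get_dummies(toy, columns=['govt_type'], prefix='govt', dtype=int)
print(encoded.columns.tolist())
```

Free-text columns such as `climate` have many unique values, so they would likely need grouping into broader categories before encoding.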

## Combining the CIA data with the protest data

### Importing the global protest data

In [33]:
global_df = pd.read_csv('../data/global.csv')

In [34]:
global_df.head()

Unnamed: 0,country,year,region,protest,protestnumber,startday,startmonth,startyear,endday,endmonth,endyear,protesterviolence,location,participants,protesteridentity,sources,notes,final,ignore,crowd dispersal,arrests,accomodation,shootings,beatings,killings,"political behavior, process",labor wage dispute,"price increases, tax policy",removal of politician,police brutality,land farm issue,social restrictions
0,Canada,1990,North America,1,1,15.0,1.0,1990.0,15.0,1.0,1990.0,0.0,national,1000s,unspecified,1. great canadian train journeys into history;...,canada s railway passenger system was finally ...,ignore,1,0,0,0,0,0,0,1,1,0,0,0,0,0
1,Canada,1990,North America,1,2,25.0,6.0,1990.0,25.0,6.0,1990.0,0.0,"Montreal, Quebec",1000,unspecified,1. autonomy s cry revived in quebec the new yo...,protestors were only identified as young peopl...,ignore,1,0,0,0,0,0,0,1,0,0,0,0,0,0
2,Canada,1990,North America,1,3,1.0,7.0,1990.0,1.0,7.0,1990.0,0.0,"Montreal, Quebec",500,separatist parti quebecois,1. quebec protest after queen calls for unity ...,"the queen, after calling on canadians to remai...",ignore,1,0,0,0,0,0,0,1,0,0,0,0,0,0
3,Canada,1990,North America,1,4,12.0,7.0,1990.0,6.0,9.0,1990.0,1.0,"Montreal, Quebec",100s,mohawk indians,1. indians gather as siege intensifies; armed ...,canada s federal government has agreed to acqu...,accomodation,0,0,0,1,0,0,0,0,0,0,0,0,1,0
4,Canada,1990,North America,1,5,14.0,8.0,1990.0,15.0,8.0,1990.0,1.0,"Montreal, Quebec",950,local residents,1. dozens hurt in mohawk blockade protest the ...,protests were directed against the state due t...,crowd dispersal,0,1,1,1,0,0,0,1,0,0,0,0,0,0


### Merging on the Protest Data with Country name as the index

In [38]:
# Left join keeps every protest row; countries absent from the CIA data get nulls
global_cia = pd.merge(global_df, cia_df, on='country', how='left')
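A left merge silently leaves nulls wherever a country name in the protest data has no exact match in the Factbook data. As a hedged sketch on toy frames (the rows and values are made up), `indicator=True` exposes unmatched names and `validate='many_to_one'` checks that each country appears only once on the right side:

```python
import pandas as pd

# Toy stand-ins for global_df and cia_df (hypothetical rows and values).
protests = pd.DataFrame({
    'country': ['Canada', 'Canada', 'Atlantis'],
    'year': [1990, 1991, 1990],
})
factbook = pd.DataFrame({
    'country': ['Canada', 'Albania'],
    'internet_access_percent': [90.0, 71.85],
})

merged = protests.merge(
    factbook,
    on='country',
    how='left',
    validate='many_to_one',  # raises MergeError if a country repeats in factbook
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Country names present in the protest data but not found in the Factbook data.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'country'].unique().tolist()
print(unmatched)
```

Any names listed in `unmatched` are candidates for a spelling fix before the merged file is exported.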

### Exporting the newly merged dataframe as a csv

In [39]:
global_cia.to_csv('../data/global_cia_combined.csv', index = False)