# Share of education in government expenditure

## Plan of Action

- Explore the data 
- Add in any missing values from other data sources
- Clean the data
- Get the mean rate of education in government expenditure over a given time period (2000-2021)

## Data source

- The raw data is available in a CSV file titled 'share-of-education-in-government-expenditure.csv'
- This data source (https://ourworldindata.org/grapher/share-of-education-in-government-expenditure) was derived from World Bank data on government spending across the world
- The original dataset spans the period from 1980 to 2021
- The World Bank describes the data as "General government expenditure on education (current, capital, and transfers) is expressed as a percentage of total general government expenditure on all sectors (including health, education, social services, etc.)" (https://ourworldindata.org/government-spending)
- They also add "General government usually refers to local, regional and central governments."

In [2]:
import pandas as pd

In [3]:
df2 = pd.read_csv("share-of-education-in-government-expenditure.csv")
df2

Unnamed: 0,Entity,Code,Year,"Government expenditure on education, total (% of government expenditure)"
0,Afghanistan,AFG,2010,17.067560
1,Afghanistan,AFG,2011,16.048429
2,Afghanistan,AFG,2013,14.102800
3,Afghanistan,AFG,2014,14.465930
4,Afghanistan,AFG,2015,12.509000
...,...,...,...,...
3529,Zimbabwe,ZWE,2014,30.015150
3530,Zimbabwe,ZWE,2015,29.470831
3531,Zimbabwe,ZWE,2016,23.527081
3532,Zimbabwe,ZWE,2017,20.874201


## Let's see what this data looks like 

In [4]:
df2.shape

(3534, 4)

This dataframe has 3534 rows and 4 columns

In [5]:
df2['Year'].min()

1980

The earliest year in this dataset is 1980

In [6]:
df2.dtypes

Entity                                                                       object
Code                                                                         object
Year                                                                          int64
Government expenditure on education, total (% of government expenditure)    float64
dtype: object

In [7]:
df2.columns

Index(['Entity', 'Code', 'Year',
       'Government expenditure on education, total (% of government expenditure)'],
      dtype='object')

In [8]:
df2['Entity'].nunique()

206

## Renaming columns 
- Most column names are simple, but the government expenditure column has a long name that would be awkward to work with, so let's rename it
- The 'Entity' column refers to countries, so let's rename that column as well

In [9]:
df2.rename(columns={'Government expenditure on education, total (% of government expenditure)': 'Gvt_Exp_on_Education'}, inplace=True)
df2.rename(columns = {'Entity': 'Country'}, inplace=True)

Let's check the column names now

In [10]:
df2.columns

Index(['Country', 'Code', 'Year', 'Gvt_Exp_on_Education'], dtype='object')
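
As a possible alternative, both renames can go into a single `rename` call that returns a new frame instead of modifying it in place. The sketch below uses a small made-up frame named `demo` (an assumption for illustration, not the real data) with the same column names as the raw CSV.

```python
import pandas as pd

# Toy frame with the same column names as the raw CSV; the values are made up.
long_name = "Government expenditure on education, total (% of government expenditure)"
demo = pd.DataFrame(
    {
        "Entity": ["A", "B"],
        "Code": ["AAA", "BBB"],
        "Year": [2000, 2001],
        long_name: [10.0, 12.5],
    }
)

# One mapping handles both columns; rename returns a new DataFrame.
demo = demo.rename(columns={long_name: "Gvt_Exp_on_Education", "Entity": "Country"})

print(list(demo.columns))
```

Either style produces the same column names; the single call just keeps the mapping in one place.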

Let's create a copy of the table to filter, so that we can go back to the original if we need to

In [11]:
filtered_df = df2.copy()

Since we have already identified the countries with the 5 highest and 5 lowest literacy rates, we can filter for those countries

In [12]:
countries = ['Korea', 'Latvia', 'Estonia', 'Lithuania', 'Cuba', 'Chad', 'Afghanistan', 'Mali', 'Niger', 'Guinea']
filtered_df = df2[df2["Country"].isin(countries)]
filtered_df = filtered_df.reset_index(drop=True)
filtered_df

Unnamed: 0,Country,Code,Year,Gvt_Exp_on_Education
0,Afghanistan,AFG,2010,17.067560
1,Afghanistan,AFG,2011,16.048429
2,Afghanistan,AFG,2013,14.102800
3,Afghanistan,AFG,2014,14.465930
4,Afghanistan,AFG,2015,12.509000
...,...,...,...,...
157,Niger,NER,2017,13.215160
158,Niger,NER,2018,16.339970
159,Niger,NER,2019,13.012810
160,Niger,NER,2020,13.332540


Let's check to see if our data has all of the countries that we are looking for

In [13]:
filtered_df['Country'].nunique()

8

In [14]:
unique_countries = filtered_df['Country'].unique()
print("Unique countries:", unique_countries)

Unique countries: ['Afghanistan' 'Chad' 'Estonia' 'Guinea' 'Latvia' 'Lithuania' 'Mali'
 'Niger']


We can see that Cuba and Korea are missing from the dataset, so we'll have to add those in
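
Rather than comparing the two lists by eye, the missing names can be computed with a set difference. This is a minimal sketch: `wanted` repeats the list from the filter step, and `found` is typed in from the output above rather than read from the real dataframe.

```python
import pandas as pd

# The ten names we filtered on (copied from the filter step).
wanted = ['Korea', 'Latvia', 'Estonia', 'Lithuania', 'Cuba',
          'Chad', 'Afghanistan', 'Mali', 'Niger', 'Guinea']

# The names actually present after filtering (copied from the printed output).
found = pd.Series(['Afghanistan', 'Chad', 'Estonia', 'Guinea',
                   'Latvia', 'Lithuania', 'Mali', 'Niger'])

# Names requested but not found in the data.
missing = sorted(set(wanted) - set(found))
print(missing)  # ['Cuba', 'Korea']
```

In the notebook itself, `found` would be `filtered_df['Country'].unique()`.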

### Additional dataset

We found another dataset (https://data.worldbank.org/indicator/SE.XPD.TOTL.GD.ZS?end=2020&start=1980&view=chart) which has values for more regions and countries.

Let's check this dataset to see if we can find some data for Cuba and Korea

In [15]:
df3 = pd.read_csv("education-gvt-exp-incl-cuba.csv")
df3

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,Aruba,ABW,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,21.877630,19.617979,23.201380,21.853750,,,,,,
1,Africa Eastern and Southern,AFE,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,17.243259,18.097099,16.962910,17.198811,17.150761,17.306705,15.35272,14.564090,13.65829,
2,Afghanistan,AFG,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,14.102800,14.465930,12.509000,13.091000,12.033200,11.696060,11.34377,10.253860,10.88011,
3,Africa Western and Central,AFW,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,14.963605,12.939880,13.121995,12.854880,16.058439,16.114195,14.15939,14.339985,14.93463,13.813610
4,Angola,AGO,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,8.826970,6.162840,8.918780,6.550970,6.763780,5.410230,6.04536,6.467230,6.91961,6.640472
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,Kosovo,XKX,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,,,,,,,,,,
262,"Yemen, Rep.",YEM,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,,,,,,,,,,
263,South Africa,ZAF,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,18.696600,18.989161,18.699350,18.048740,18.719290,18.901590,19.59623,19.527281,18.41724,19.750040
264,Zambia,ZMB,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,15.400000,20.100000,16.335600,15.663620,14.934340,17.118719,15.29187,12.378020,11.51414,10.447814


Let's explore this dataset

In [16]:
df3.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022'],
      dtype='object')

In [17]:
df3['Country Name'].unique()

array(['Aruba', 'Africa Eastern and Southern', 'Afghanistan',
       'Africa Western and Central', 'Angola', 'Albania', 'Andorra',
       'Arab World', 'United Arab Emirates', 'Argentina', 'Armenia',
       'American Samoa', 'Antigua and Barbuda', 'Australia', 'Austria',
       'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso',
       'Bangladesh', 'Bulgaria', 'Bahrain', 'Bahamas, The',
       'Bosnia and Herzegovina', 'Belarus', 'Belize', 'Bermuda',
       'Bolivia', 'Brazil', 'Barbados', 'Brunei Darussalam', 'Bhutan',
       'Botswana', 'Central African Republic', 'Canada',
       'Central Europe and the Baltics', 'Switzerland', 'Channel Islands',
       'Chile', 'China', "Cote d'Ivoire", 'Cameroon', 'Congo, Dem. Rep.',
       'Congo, Rep.', 'Colombia', 'Comoros', 'Cabo Verde', 'Costa Rica',
       'Caribbean small states', 'Cuba', 'Curacao', 'Cayman Islands',
       'Cyprus', 'Czechia', 'Germany', 'Djibouti', 'Dominica', 'Denmark',
       'Dominican Republic', 'Algeria',
 

This dataset has regional figures for government expenditure on education alongside its country data. These will be useful if we cannot find figures for any of our top 5 and bottom 5 countries

### Let's see if we can find data for Cuba and Korea

In [18]:
korea_data = df3[df3['Country Name'].str.contains('Korea')]
korea_data

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
126,"Korea, Rep.",KOR,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,,,14.65296,14.30254,14.29845,14.31293,13.83082,,,
193,"Korea, Dem. People's Rep.",PRK,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,,,,,,,,,,


We are interested in Korea, Dem. People's Rep. (PRK), but this dataset has no figures for it, so we will use regional figures in its place
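
The truncated display only shows some of the year columns, so it can help to test programmatically whether a row has any values at all. The sketch below is an illustration on a two-column toy frame (`demo` is an assumption; the two Korea, Rep. values are copied from the output above), not a check of the full file.

```python
import numpy as np
import pandas as pd

# Toy version of the wide table with two year columns only.
demo = pd.DataFrame(
    {
        "Country Name": ["Korea, Rep.", "Korea, Dem. People's Rep."],
        "Country Code": ["KOR", "PRK"],
        "2015": [14.65296, np.nan],
        "2016": [14.30254, np.nan],
    }
)

# Year columns are the ones whose label is made of digits.
year_cols = [c for c in demo.columns if c.isdigit()]

# True where a country has at least one non-missing yearly value.
has_data = demo.set_index("Country Name")[year_cols].notna().any(axis=1)
print(has_data)
```

Running the same two lines on `korea_data` would confirm whether the PRK row is empty across every year, not just the visible ones.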

In [19]:
df3[df3['Country Name'].str.contains('East Asia')]

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
61,East Asia & Pacific (excluding high income),EAP,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,16.301397,16.045906,15.317964,15.115435,14.25683,16.32588,15.521901,14.40717,15.276836,
63,East Asia & Pacific,EAS,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,16.422539,16.155823,15.317964,15.115435,14.695935,16.086915,13.930195,14.418585,14.754606,
230,East Asia & Pacific (IDA & IBRD countries),TEA,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,16.301397,16.045906,15.317964,15.115435,14.25683,16.32588,15.521901,14.40717,15.276836,


We will use 'East Asia & Pacific', since we don't want to exclude high-income countries

In [20]:
korean_data = df3[df3['Country Name'] == 'East Asia & Pacific']
korean_data

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
63,East Asia & Pacific,EAS,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,16.422539,16.155823,15.317964,15.115435,14.695935,16.086915,13.930195,14.418585,14.754606,


Let's check for our Cuban data

In [21]:
cuba_data = df3[df3['Country Name'] == 'Cuba']
cuba_data

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
50,Cuba,CUB,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GB.ZS,,,,,,,...,16.519653,15.801738,15.263568,14.937566,13.564914,12.659974,14.225944,16.6611,,



## Let's add both our Cuban and Korean figures to our filtered_df dataframe

In [22]:
cuba_data = cuba_data.copy()

In [23]:
cuba_data.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022'],
      dtype='object')

### Cleaning the Cuban data 

First we will drop the columns that do not suit our given range

In [24]:
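# Column labels are strings, so this comparison is lexicographic: year labels below '2000' are dropped,
# while the id columns are kept because uppercase letters sort after digits
cuba_data.drop(cuba_data.columns[cuba_data.columns < '2000'], axis=1, inplace=True)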

Let's check that the columns for the years before 2000 have been dropped correctly

In [25]:
cuba_data.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008',
       '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017',
       '2018', '2019', '2020', '2021', '2022'],
      dtype='object')
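
The drop above relies on comparing string labels, which works here because names such as 'Country Name' happen to sort after '2000'. A more explicit variant, sketched below on an empty toy frame (`demo` and its shortened year list are assumptions for illustration), picks out the year columns first and compares them as integers.

```python
import pandas as pd

# Toy frame with the same kind of column layout as the World Bank file.
demo = pd.DataFrame(
    columns=["Country Name", "Country Code", "Indicator Name", "Indicator Code",
             "1998", "1999", "2000", "2001"]
)

# Only labels made of digits are treated as years.
year_cols = [c for c in demo.columns if c.isdigit()]

# Compare as numbers rather than as strings.
old_years = [c for c in year_cols if int(c) < 2000]

trimmed = demo.drop(columns=old_years)
print(list(trimmed.columns))
```

This version would keep working even if an id column were named in a way that sorts before '2000'.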

We need to melt the columns to mirror the format of our filtered_df dataframe

In [26]:
# Rename the columns to match filtered_df
cuba_data = cuba_data.rename(columns={'Country Name': 'Country', 'Country Code': 'Code', 'Indicator Name': 'Indicator'})

# Melt the year columns (2000-2022) into rows, one row per year
year_columns = [str(year) for year in range(2000, 2023)]
cuba_data = cuba_data.melt(id_vars=['Country', 'Code'], value_vars=year_columns,
                           var_name='Year', value_name='Gvt_Exp_on_Education')


In [27]:
cuba_data

Unnamed: 0,Country,Code,Year,Gvt_Exp_on_Education
0,Cuba,CUB,2000,
1,Cuba,CUB,2001,
2,Cuba,CUB,2002,
3,Cuba,CUB,2003,
4,Cuba,CUB,2004,
5,Cuba,CUB,2005,
6,Cuba,CUB,2006,
7,Cuba,CUB,2007,
8,Cuba,CUB,2008,
9,Cuba,CUB,2009,


### Cleaning the Korea data 

In [28]:
korean_data = korean_data.copy()

In [29]:
korean_data.drop(korean_data.columns[korean_data.columns < '2000'], axis=1, inplace=True)

In [30]:
korean_data.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008',
       '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017',
       '2018', '2019', '2020', '2021', '2022'],
      dtype='object')

In [31]:
# Rename the columns to match filtered_df
korean_data = korean_data.rename(columns={'Country Name': 'Country', 'Country Code': 'Code', 'Indicator Name': 'Indicator'})

# Melt the year columns (2000-2022) into rows, one row per year
year_columns = [str(year) for year in range(2000, 2023)]
korean_data = korean_data.melt(id_vars=['Country', 'Code'], value_vars=year_columns,
                               var_name='Year', value_name='Gvt_Exp_on_Education')


In [32]:
korean_data

Unnamed: 0,Country,Code,Year,Gvt_Exp_on_Education
0,East Asia & Pacific,EAS,2000,
1,East Asia & Pacific,EAS,2001,
2,East Asia & Pacific,EAS,2002,16.497231
3,East Asia & Pacific,EAS,2003,
4,East Asia & Pacific,EAS,2004,16.43948
5,East Asia & Pacific,EAS,2005,
6,East Asia & Pacific,EAS,2006,
7,East Asia & Pacific,EAS,2007,14.964565
8,East Asia & Pacific,EAS,2008,14.188845
9,East Asia & Pacific,EAS,2009,14.51142
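
The Cuba and East Asia & Pacific cleaning steps are identical, so they could be wrapped in one function. The sketch below is one way to do that; `wide_to_long` and its parameters are hypothetical names introduced here, and `demo` is a one-row toy frame (the 2019 and 2020 Cuba values are copied from the output above).

```python
import numpy as np
import pandas as pd


def wide_to_long(wide, first_year=2000, last_year=2021):
    """Hypothetical helper: reshape a World Bank-style wide frame to long format."""
    # Keep only year columns inside the requested range.
    year_cols = [c for c in wide.columns
                 if c.isdigit() and first_year <= int(c) <= last_year]
    long = wide.rename(
        columns={"Country Name": "Country", "Country Code": "Code"}
    ).melt(
        id_vars=["Country", "Code"],
        value_vars=year_cols,
        var_name="Year",
        value_name="Gvt_Exp_on_Education",
    )
    # melt leaves the year labels as strings; convert them to integers.
    long["Year"] = long["Year"].astype(int)
    return long


demo = pd.DataFrame(
    {
        "Country Name": ["Cuba"],
        "Country Code": ["CUB"],
        "Indicator Name": ["placeholder"],
        "Indicator Code": ["SE.XPD.TOTL.GB.ZS"],
        "1999": [np.nan],
        "2019": [14.225944],
        "2020": [16.6611],
        "2022": [np.nan],
    }
)

long_demo = wide_to_long(demo)
print(long_demo)
```

Because the year range is applied while selecting columns, the separate drop and later year filter would not be needed for frames reshaped this way.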


Now we need to add these figures onto our filtered_df dataframe using .concat()

In [33]:
# Concatenate the two DataFrames
combined_df = pd.concat([filtered_df, cuba_data, korean_data])

# Optional: Reset the index of the combined DataFrame
combined_df = combined_df.reset_index(drop=True)

combined_df

Unnamed: 0,Country,Code,Year,Gvt_Exp_on_Education
0,Afghanistan,AFG,2010,17.067560
1,Afghanistan,AFG,2011,16.048429
2,Afghanistan,AFG,2013,14.102800
3,Afghanistan,AFG,2014,14.465930
4,Afghanistan,AFG,2015,12.509000
...,...,...,...,...
203,East Asia & Pacific,EAS,2018,16.086915
204,East Asia & Pacific,EAS,2019,13.930195
205,East Asia & Pacific,EAS,2020,14.418585
206,East Asia & Pacific,EAS,2021,14.754606


In [34]:
unique_countries = combined_df['Country'].nunique()
print("Number of unique countries:", unique_countries)

Number of unique countries: 10


We now have all of our ten countries in one dataframe

### Looking at our filtered dataset

Let's change the 'East Asia & Pacific' value to 'Korea, Dem. People's Rep.'

In [35]:
combined_df['Country'] = combined_df['Country'].replace('East Asia & Pacific', "Korea, Dem. People's Rep.")

In [36]:
combined_df.dtypes

Country                  object
Code                     object
Year                     object
Gvt_Exp_on_Education    float64
dtype: object

In [37]:
combined_df['Year'] = combined_df['Year'].astype(int)
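
The 'Year' column needs converting because the rows from the first dataset hold integer years while the melted rows hold the year labels as strings. A small sketch of that situation, using two made-up one-row frames `a` and `b`:

```python
import pandas as pd

# Year read from the CSV arrives as an integer...
a = pd.DataFrame({"Country": ["Niger"], "Year": [2020]})
# ...while melt turns the column labels into strings.
b = pd.DataFrame({"Country": ["Cuba"], "Year": ["2020"]})

# ignore_index=True renumbers the rows, like reset_index(drop=True).
combined = pd.concat([a, b], ignore_index=True)
print(combined["Year"].dtype)

# Convert every entry to an integer so comparisons such as > 1999 work.
combined["Year"] = combined["Year"].astype(int)
print(combined["Year"].dtype)
```

Without the conversion, a numeric comparison on the mixed column would fail on the string entries.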

By looking at the .min() of the data we can see that the data starts from 1991

In [38]:
print("The minimum year is", combined_df['Year'].min())
print("The maximum year is",combined_df['Year'].max())

The minimum year is 1991
The maximum year is 2022


Since we are only looking at data from 2000 to 2021, we'll need to get rid of any rows from before 2000 and after 2021

In [39]:
combined_df = combined_df[combined_df['Year'] > 1999]
combined_df = combined_df.loc[combined_df['Year'] <= 2021]
combined_df = combined_df.reset_index(drop=True)

Let's check to see what year the dataset now starts from

In [40]:
print("The minimum year is", combined_df['Year'].min())
print("The maximum year is",combined_df['Year'].max())

The minimum year is 2000
The maximum year is 2021
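
The two-step year filter can also be written as a single condition with `Series.between`, which includes both endpoints by default. A minimal sketch on a made-up frame `demo`:

```python
import pandas as pd

# Toy rows on both sides of the 2000-2021 window.
demo = pd.DataFrame({"Country": ["Mali"] * 4, "Year": [1991, 2000, 2021, 2022]})

# Keep rows whose year falls in the window, endpoints included.
in_range = demo[demo["Year"].between(2000, 2021)].reset_index(drop=True)

print(in_range["Year"].tolist())  # [2000, 2021]
```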


Not all of the countries in our chosen 10 have the same year range, so we'll have to see what the time span is for each country 

In [41]:
country_count = combined_df.groupby('Country')['Year'].count()
country_count

Country
Afghanistan                  11
Chad                         16
Cuba                         22
Estonia                      18
Guinea                       20
Korea, Dem. People's Rep.    22
Latvia                       18
Lithuania                    18
Mali                         20
Niger                        20
Name: Year, dtype: int64

The highest number of years is 22 and the lowest is 11. This is not too wide a range, so we can proceed
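
One caveat: counting the 'Year' column counts rows, including rows where the expenditure value is missing (as is the case for the melted Cuba rows). Counting the value column instead gives the number of years with an actual figure. The sketch below shows the difference on a made-up frame `demo`; the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data: Cuba has three rows but only one real value.
demo = pd.DataFrame(
    {
        "Country": ["Cuba", "Cuba", "Cuba", "Chad", "Chad"],
        "Year": [2008, 2009, 2010, 2010, 2011],
        "Gvt_Exp_on_Education": [np.nan, np.nan, 15.0, 11.0, 12.0],
    }
)

# Number of rows per country (missing values still count).
rows_per_country = demo.groupby("Country")["Year"].count()

# Number of years with a non-missing figure per country.
values_per_country = demo.groupby("Country")["Gvt_Exp_on_Education"].count()

print(rows_per_country)
print(values_per_country)
```

Applied to `combined_df`, the second count would show how many years each mean is really based on.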

## Cleaning the data 

In [42]:
combined_df[['Country', 'Gvt_Exp_on_Education']][combined_df.Gvt_Exp_on_Education.isna()]

Unnamed: 0,Country,Gvt_Exp_on_Education
141,Cuba,
142,Cuba,
143,Cuba,
144,Cuba,
145,Cuba,
146,Cuba,
147,Cuba,
148,Cuba,
149,Cuba,
150,Cuba,


In [43]:
#removing the rows with null values
combined_df.dropna(inplace=True)

### Finding the mean of each country 

In [44]:
# Group the data by country, then calculate the mean government expenditure on education for each country
countries_mean = combined_df.groupby('Country')['Gvt_Exp_on_Education'].mean()
countries_mean

Country
Afghanistan                  13.044702
Chad                         11.933025
Cuba                         15.123699
Estonia                      13.668718
Guinea                       12.861944
Korea, Dem. People's Rep.    15.258061
Latvia                       14.311544
Lithuania                    13.705299
Mali                         16.659471
Niger                        16.770930
Name: Gvt_Exp_on_Education, dtype: float64

### Finding the median for each country

In [45]:
countries_median = combined_df.groupby('Country')['Gvt_Exp_on_Education'].median()
countries_median

Country
Afghanistan                  12.509000
Chad                         12.066485
Cuba                         15.263568
Estonia                      13.524495
Guinea                       12.421840
Korea, Dem. People's Rep.    14.964565
Latvia                       14.097230
Lithuania                    13.239080
Mali                         16.519120
Niger                        16.775310
Name: Gvt_Exp_on_Education, dtype: float64

For this dataset, we have decided to use the mean. It is close to the median for every country, which suggests there are no significant outliers in this data
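
To compare the two statistics side by side, both can be computed in one `agg` call and their difference added as a column. This is a sketch on made-up numbers (`demo` is an assumption); country B contains a deliberately large value to show how an outlier pulls the mean away from the median.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Country": ["A", "A", "A", "B", "B", "B"],
        "Gvt_Exp_on_Education": [10.0, 12.0, 14.0, 10.0, 11.0, 30.0],
    }
)

# Mean and median per country in a single table.
summary = demo.groupby("Country")["Gvt_Exp_on_Education"].agg(["mean", "median"])

# A large gap hints at outliers or a skewed distribution.
summary["gap"] = summary["mean"] - summary["median"]
print(summary)
```

On `combined_df` the same three lines would make the "not too far off" judgement easy to read for all ten countries at once.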

In [None]:
#scatter plot for the mean values just to look at the outliers 
# visualisations
# join the cuba and korea data to the original dataframe 
# add the mean value as another column onto merged_data.csv

Let's create a scatter plot to show the mean 

In [None]:
import matplotlib.pyplot as plt

# Plot scatter plot
plt.scatter(countries_mean.index, countries_mean.values)

# Set labels and title
plt.xlabel('Country')
plt.ylabel('Mean Gvt_Exp_on_Education')
plt.title('Mean Government Expenditure on Education by Country')

# Rotate x-axis labels if needed
plt.xticks(rotation=90)

# Display the plot
plt.show()


In [None]:
import matplotlib.pyplot as plt

# Group the data by country and take the mean of 'Gvt_Exp_on_Education'
# (filtered_df holds only the eight countries found in the first dataset)
country_data = filtered_df.groupby('Country')['Gvt_Exp_on_Education'].mean()

# Extract the countries and corresponding values
countries = country_data.index
values = country_data.values

# Plot the data
plt.scatter(countries, values)

# Set labels and title
plt.xlabel('Country')
plt.ylabel('Gvt_Exp_on_Education')
plt.title('Government Expenditure on Education by Country')

# Rotate x-axis tick labels if needed
plt.xticks(rotation=90)

# Display the plot
plt.show()



## Add the mean value as another column in merged_tables.csv

In [53]:
# Merge our mean government expenditure results into the merged_tables table

# Read the CSV file into a DataFrame
merged_tables_df = pd.read_csv('merged_tables.csv')

merged_tables_df = merged_tables_df.drop(columns = 'Gvt_Exp_on_Education')

merged_tables_df

Unnamed: 0,Country Name,Country Code,Status,Mean Adult Female Literacy Rate (%),Median Female Child Marriage Rate (%),Income Level
0,Chad,TCD,Lowest,15.379128,24.556,Low income
1,Afghanistan,AFG,Lowest,19.80931,16.3,Low income
2,Mali,MLI,Lowest,20.470424,42.1,Low income
3,Niger,NER,Lowest,20.530956,21.1555,Low income
4,Guinea,GIN,Lowest,22.271088,28.1,Low income
5,Cuba,CUB,Highest,99.769315,12.362,Upper middle income
6,Lithuania,LTU,Highest,99.777059,0.15,High income
7,Estonia,EST,Highest,99.849846,4.8805,High income
8,Latvia,LVA,Highest,99.858515,4.8805,High income
9,"Korea, Dem. People's Rep.",PRK,Highest,99.997612,0.05,Low income


In [59]:
# Merge the countries_mean with the 'merged_tables' DataFrame
merged_tables = merged_tables_df.merge(countries_mean, left_on='Country Name', right_index=True, how='left')

# Save the updated DataFrame back to the 'merged_tables.csv' CSV file
merged_tables.to_csv('merged_tables.csv', index=False)
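
A left merge silently leaves NaN wherever a country name does not match, so it can be worth listing unmatched names before saving. The sketch below uses a three-row toy table (`table` and `means` are assumptions; the three mean values are copied from the results above) and shows what would happen if the regional label had not been renamed to match.

```python
import pandas as pd

table = pd.DataFrame({"Country Name": ["Chad", "Cuba", "Korea, Dem. People's Rep."]})

# A named Series indexed by country, deliberately left with the regional label.
means = pd.Series(
    {"Chad": 11.933025, "Cuba": 15.123699, "East Asia & Pacific": 15.258061},
    name="Gvt_Exp_on_Education",
)

merged = table.merge(means, left_on="Country Name", right_index=True, how="left")

# Countries that did not receive a value from the merge.
unmatched = merged.loc[merged["Gvt_Exp_on_Education"].isna(), "Country Name"].tolist()
print(unmatched)
```

An empty list from this check on the real tables would confirm that every country picked up its mean.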


In [62]:
merged_tables

Unnamed: 0,Country Name,Country Code,Status,Mean Adult Female Literacy Rate (%),Median Female Child Marriage Rate (%),Income Level,Gvt_Exp_on_Education
0,Chad,TCD,Lowest,15.379128,24.556,Low income,11.933025
1,Afghanistan,AFG,Lowest,19.80931,16.3,Low income,13.044702
2,Mali,MLI,Lowest,20.470424,42.1,Low income,16.659471
3,Niger,NER,Lowest,20.530956,21.1555,Low income,16.77093
4,Guinea,GIN,Lowest,22.271088,28.1,Low income,12.861944
5,Cuba,CUB,Highest,99.769315,12.362,Upper middle income,15.123699
6,Lithuania,LTU,Highest,99.777059,0.15,High income,13.705299
7,Estonia,EST,Highest,99.849846,4.8805,High income,13.668718
8,Latvia,LVA,Highest,99.858515,4.8805,High income,14.311544
9,"Korea, Dem. People's Rep.",PRK,Highest,99.997612,0.05,Low income,15.258061


In [64]:
# Rename the column
merged_tables = merged_tables.rename(columns={'Gvt_Exp_on_Education': "Mean Government Education Expenditure Rate (%)"})

In [65]:
merged_tables

Unnamed: 0,Country Name,Country Code,Status,Mean Adult Female Literacy Rate (%),Median Female Child Marriage Rate (%),Income Level,Mean Government Education Expenditure Rate (%)
0,Chad,TCD,Lowest,15.379128,24.556,Low income,11.933025
1,Afghanistan,AFG,Lowest,19.80931,16.3,Low income,13.044702
2,Mali,MLI,Lowest,20.470424,42.1,Low income,16.659471
3,Niger,NER,Lowest,20.530956,21.1555,Low income,16.77093
4,Guinea,GIN,Lowest,22.271088,28.1,Low income,12.861944
5,Cuba,CUB,Highest,99.769315,12.362,Upper middle income,15.123699
6,Lithuania,LTU,Highest,99.777059,0.15,High income,13.705299
7,Estonia,EST,Highest,99.849846,4.8805,High income,13.668718
8,Latvia,LVA,Highest,99.858515,4.8805,High income,14.311544
9,"Korea, Dem. People's Rep.",PRK,Highest,99.997612,0.05,Low income,15.258061


In [66]:
# Save the updated DataFrame back to the 'merged_tables.csv' CSV file
merged_tables.to_csv('merged_tables.csv', index=False)