<h3>Theoretical Framework: Higher Education Affordability vs. Economic Strength Index (HEAEI)</h3>

<h5>Introduction and Justification of this Index</h5>
<p> In 2025 there are many countries around the world, including Ireland which are facing a cost of living crisis. The cost of most things has risen, including higher education, which is making it harder for students worldwide to access. Tution fees have increased by approx 460% over the past 50 years, which has even outpaced inflation, according to (WGU, 2022). Whil some nations do offer free or a lower-cost education, other countries charge extreme and I think this will be interesting to compare.</p>
<p> I want to look at this issue in a way that takes in realistic economic factors, aswell as information on universities in order to paint a more accurate picture to create this index.I will be comparing my higher education data to a country's economic strength (their GDP per capita). If I was to only consider the university data, I would not be able to properly analyse and visualise this data, and tell if a universities tution fees accuratly represents a countries econmic status.</p>

<h5>Purpose of this Composite Index</h5>
<p> The Higher Education Affordability vs. Economic Strength Index (HEAEI) will be aiming to rank countries based on the affordability of the higher education systems, relative to their economic capacity, which is defind as "The amount an economy can produce using current capital at full tilt" () . There will be multiple factors combined and compared in this Index, which will allow me to easily comapre various countries for the best outcome.
<h5>The main goals of the HEAEI Index</h5>
<ul>
<li> To find which countries offer the most affordable education based on their economic capacity</li>
<li>Identify countries where the tution fees are much higher when compared to GDP per capita.</li>
</ul>

This index will be created using two datasets:

1. Global Tuition Fees & Education Trends 2024
Link: https://www.kaggle.com/datasets/kathrinaroldan/global-tuition-fees-and-education-trends-2024
Key variables:
<ul>
<li>Public university tuition fees</li>
<li>Private university tuition fees</li>
<li>Number of universities</li>
<li>Enrollment counts</li>
</ul>
2. World Bank GDP per Capita
Link: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
Key variable:
<ul>
<li>GDP per capita (latest year available)</li>



<h3>References</h3>
<p>1. Ahmed-Haq, R. (2012) What is economic capacity? Economic News, 18 October. Available at: https://rates.ca/resources/what-is-economic-capacity (Accessed: 19 March 2025).
2. Western Governors University (2022) Affordability and value in higher education. Available at: https://www.wgu.edu/blog/affordability-value-higher-education-advocate-post2211.html (Accessed: 19 March 2025).


In [11]:
%pip install pandas openpyxl
import pandas as pd

tuition_df = pd.read_csv("tuition_fees.csv")
gdp_df = pd.read_csv("gdp_per_capita.csv", skiprows=4)

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.[0m
Note: you may need to restart the kernel to use updated packages.


In [22]:
# Clean gdp_df
gdp_df_cleaned = gdp_df.drop(columns=['Indicator Name', 'Indicator Code', 'Unnamed: 68'])  # Drop unnecessary columns
gdp_df_cleaned = gdp_df_cleaned.dropna(how='all', subset=gdp_df_cleaned.columns[4:])  # Drop rows with all NaN GDP values
gdp_df_cleaned = gdp_df_cleaned.rename(columns={'Country Name': 'Country', 'Country Code': 'Code'})  # Rename columns
# Keep only the latest GDP year
gdp_df_cleaned = gdp_df_cleaned[["Country", "2022"]]
gdp_df_cleaned = gdp_df_cleaned.rename(columns={"2022": "GDP per Capita (USD)"})

gdp_df_cleaned["Country"] = gdp_df_cleaned["Country"].str.strip().str.lower()
tuition_df_cleaned["Country"] = tuition_df_cleaned["Country"].str.strip().str.lower()

# Clean tuition_df
tuition_df_cleaned = tuition_df.copy()
tuition_df_cleaned = tuition_df_cleaned.rename(columns={
    'Average Tuition Fee (USD)': 'Avg Tuition Fee (USD)',
    'Min Tuition Fee (USD)': 'Min Tuition Fee (USD)',
    'Max Tuition Fee (USD)': 'Max Tuition Fee (USD)',
    'No. of Private Universities': 'Private Universities',
    'No. of Public Universities': 'Public Universities',
    'Percentage of Private Universities (%)': 'Private Univ (%)',
    'Total Students in Higher Education (millions)': 'Total Students (millions)',
    'Students in Private Universities (millions)': 'Private Students (millions)',
    'Students in Public Universities (millions)': 'Public Students (millions)',
    'Students in Vocational Courses (millions)': 'Vocational Students (millions)',
    'Students Not Studying (millions)': 'Not Studying (millions)',
    'Cost of Living Index': 'Cost of Living',
    'Scholarship Availability (%)': 'Scholarship (%)'
})  # Rename columns


In [20]:
#logic from chat gpt errors fixed with co-pilot and stack overflow
merged_df = pd.merge(tuition_df_cleaned, gdp_df_cleaned, on="Country", how="inner")

# Check for missing values
print(merged_df.isnull().sum())

# Fill missing values with column average
merged_df["Avg Tuition Fee (USD)"].fillna(merged_df["Avg Tuition Fee (USD)"].mean(), inplace=True)
# Ensure the correct column name is used
if "GDP per Capita (USD)" not in merged_df.columns:
	print("Column 'GDP per Capita (USD)' not found. Please verify the column names in gdp_df_cleaned.")
else:
	merged_df["GDP per Capita (USD)"].fillna(merged_df["GDP per Capita (USD)"].mean(), inplace=True)

Country                  0
Year                     0
Avg Tuition Fee (USD)    0
Min Tuition Fee (USD)    0
Max Tuition Fee (USD)    0
                        ..
2019                     0
2020                     0
2021                     0
2022                     0
2023                     0
Length: 81, dtype: int64
Column 'GDP per Capita (USD)' not found. Please verify the column names in gdp_df_cleaned.


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  merged_df["Avg Tuition Fee (USD)"].fillna(merged_df["Avg Tuition Fee (USD)"].mean(), inplace=True)
