🌍 Global GDP Per Capita Analysis & Forecast (2000–2030)

A data science project analyzing global economic performance using the Penn World Table dataset.
The project explores GDP per capita trends, country clustering, statistical testing, and future predictions using machine learning.

📊 Project Goals

This project aims to:

Analyze global GDP per capita trends (2000–2018)
Identify economic groupings of countries using clustering
Compare developing vs developed economies statistically
Predict future GDP per capita trends using regression
Visualize global economic patterns

📁 Dataset

Source: Penn World Table (PWT 10.01)

PWT 10.01 covers 183 countries from 1950 to 2019 and provides macroeconomic indicators such as real GDP and population.

Key variables used:

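| Variable | Description |
| --- | --- |
| `country` | Country name |
| `year` | Year of observation |
| `rgdpo` | Output-side real GDP at chained PPPs (millions of 2017 US$) |
| `pop` | Population (millions) |
| `gdp_per_capita` | Calculated as `rgdpo / pop` (2017 US$ per person) |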

The analysis focuses on the 2000–2018 time range.

🧹 Data Preparation

Data preprocessing steps included:

Filtering data to the 2000–2018 range
Removing missing values
Removing rows with population = 0
Creating a new feature:

gdp_per_capita = rgdpo / pop

This metric allows comparison of economic productivity per person across countries.
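The preparation steps above can be sketched in pandas as follows. This is a minimal illustration rather than the project's actual code: the function name `prepare_gdp_data` is hypothetical, and the small `sample` frame is made up to stand in for `pwt1001.csv`.

```python
import pandas as pd


def prepare_gdp_data(df: pd.DataFrame, start: int = 2000, end: int = 2018) -> pd.DataFrame:
    """Filter to the study window, drop unusable rows, and add gdp_per_capita."""
    cols = ["country", "year", "rgdpo", "pop"]
    out = df.loc[df["year"].between(start, end), cols]
    # Remove rows where either input to the ratio is missing.
    out = out.dropna(subset=["rgdpo", "pop"])
    # Remove zero-population rows to avoid division by zero.
    out = out[out["pop"] > 0].copy()
    out["gdp_per_capita"] = out["rgdpo"] / out["pop"]
    return out.reset_index(drop=True)


# Made-up sample (illustrative values only, not taken from PWT).
sample = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [1999, 2000, 2001, 2000, 2001],
    "rgdpo": [90.0, 100.0, 110.0, None, 50.0],
    "pop": [10.0, 10.0, 11.0, 5.0, 0.0],
})
clean = prepare_gdp_data(sample)
print(clean)
```

With the real file, the analogous call would presumably be `prepare_gdp_data(pd.read_csv("pwt1001.csv"))`, assuming the CSV uses the column names listed in the Dataset section.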

📈 Exploratory Data Analysis

Descriptive statistics were used to understand global economic patterns.

Metrics calculated:

Mean GDP per capita
Median GDP per capita
Mode GDP per capita
Quartiles (25%, 50%, 75%)

These provide insights into global standards of living and economic inequality.
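A short sketch of how these metrics could be computed with pandas is shown below. The values are invented for illustration. Rounding before taking the mode is an assumption on my part: exact repeats are rare in a continuous variable, so an unrounded mode is often not informative.

```python
import pandas as pd

# Invented GDP per capita values (illustrative only).
values = pd.Series([1200.0, 3500.0, 3500.0, 9800.0, 42000.0], name="gdp_per_capita")

summary = {
    "mean": values.mean(),
    "median": values.median(),
    # Round to whole dollars first so that near-identical values can coincide.
    "mode": values.round().mode().iloc[0],
    "q25": values.quantile(0.25),
    "q50": values.quantile(0.50),
    "q75": values.quantile(0.75),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")
```

A mean well above the median, as in this toy sample, is the usual sign of a right-skewed distribution.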

🤖 Machine Learning: Country Clustering

To identify economic groups, K-Means clustering was applied.

Steps:

Calculate average GDP per capita per country
Standardize values using StandardScaler
Apply K-Means clustering (k = 4)

Clusters represent different economic categories such as:

Low-income economies
Emerging economies
Upper-middle income countries
High-income economies

Visualization example:

Scatter plot of GDP per capita vs population
Countries grouped by cluster
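The clustering steps can be sketched with scikit-learn as below. The per-country averages are synthetic (four made-up income tiers with hypothetical country names), and `n_init` and `random_state` are choices made for this example, not settings taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-country averages (illustrative only, not PWT values):
# four loose income tiers with ten hypothetical countries each.
rng = np.random.default_rng(0)
tier_centers = [1_500, 8_000, 20_000, 50_000]
country_avg = pd.DataFrame({
    "country": [f"Country_{i:02d}" for i in range(40)],
    "gdp_per_capita": np.concatenate(
        [rng.normal(c, 0.03 * c, size=10) for c in tier_centers]
    ),
})

# Standardize, then fit K-Means with k = 4 as in the steps above.
scaled = StandardScaler().fit_transform(country_avg[["gdp_per_capita"]])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
country_avg["cluster"] = kmeans.fit_predict(scaled)

# K-Means labels are arbitrary, so rank clusters by mean GDP per capita
# (0 = lowest) before attaching names such as "low-income".
cluster_means = country_avg.groupby("cluster")["gdp_per_capita"].mean()
rank_of = {label: rank for rank, label in enumerate(cluster_means.sort_values().index)}
country_avg["income_rank"] = country_avg["cluster"].map(rank_of)

print(country_avg.groupby("income_rank")["gdp_per_capita"].agg(["count", "mean"]))
```

Ranking the clusters after fitting matters because the integer labels returned by K-Means carry no ordering of their own.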

📊 Statistical Testing

The project tests whether GDP per capita distributions differ between economic groups.

Normality Test

Shapiro–Wilk Test

Used to check whether the GDP per capita distribution is normal.

Result:

GDP per capita data is not normally distributed.
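As a rough illustration, the normality check could look like the following with SciPy. The data here is a synthetic right-skewed (lognormal) sample chosen to mimic the shape of income data; it is not the project's data, and the 0.05 significance level is an assumed convention.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample standing in for GDP per capita (assumption).
rng = np.random.default_rng(1)
gdp_per_capita = rng.lognormal(mean=9.0, sigma=1.2, size=200)

# Shapiro-Wilk: the null hypothesis is that the sample is drawn from a normal distribution.
w_stat, p_value = stats.shapiro(gdp_per_capita)

alpha = 0.05  # assumed significance level
reject_normality = p_value < alpha
print(f"W = {w_stat:.3f}, p = {p_value:.3g}, reject normality: {reject_normality}")
```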

Group Comparison

Mann–Whitney U Test

Used instead of a t-test because the data is non-normal.

This test compares GDP per capita distributions between the developing and developed country clusters.
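A minimal sketch of the group comparison with SciPy follows. The two samples are synthetic stand-ins for the developing and developed groups, and the two-sided alternative is an assumption, since the source does not state which alternative was tested.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two groups (illustrative only).
rng = np.random.default_rng(2)
developing = rng.lognormal(mean=8.0, sigma=0.6, size=60)
developed = rng.lognormal(mean=10.5, sigma=0.4, size=40)

# Mann-Whitney U is rank-based, so it does not assume normality.
u_stat, p_value = stats.mannwhitneyu(developing, developed, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3g}")
```

One caveat when reading the result: if the groups come from clusters that were built on GDP per capita itself, a significant difference between them is expected by construction.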

📉 Global GDP Growth Over Time

Year-over-year GDP growth was calculated using:

GDP Growth = (GDP_t − GDP_(t−1)) / GDP_(t−1)

The average global GDP growth was then visualized over time to observe long-term economic trends.
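The growth formula maps directly onto pandas' `pct_change`. The sketch below uses a tiny invented panel and makes two assumptions that the source does not confirm: growth is computed on `gdp_per_capita`, and the global figure is an unweighted mean across countries.

```python
import pandas as pd

# Tiny invented panel (illustrative values only).
panel = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "year": [2000, 2001, 2002] * 2,
    "gdp_per_capita": [100.0, 110.0, 121.0, 200.0, 190.0, 209.0],
})

# Sort so that each country's rows are in time order before differencing.
panel = panel.sort_values(["country", "year"])
# (GDP_t - GDP_(t-1)) / GDP_(t-1), computed within each country.
panel["growth"] = panel.groupby("country")["gdp_per_capita"].pct_change()

# Unweighted average across countries per year (assumption).
global_growth = panel.groupby("year")["growth"].mean()
print(global_growth)
```

The first year of each country has no previous value, so its growth is `NaN` and it drops out of the yearly average.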

🌍 Top Performing Countries

Countries were ranked by average GDP per capita.

The project identifies the top 10 performing economies based on average GDP per capita over 2000–2018.
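One way to produce such a ranking in pandas is a group-wise mean followed by `nlargest`. The three-country panel below is invented, so the sketch keeps the top 2 rather than the top 10.

```python
import pandas as pd

# Invented panel with hypothetical countries (illustrative only).
panel = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": [2000, 2001] * 3,
    "gdp_per_capita": [40_000.0, 42_000.0, 5_000.0, 5_500.0, 60_000.0, 61_000.0],
})

# Average over the period per country, then keep the highest values.
avg_by_country = panel.groupby("country")["gdp_per_capita"].mean()
top_n = avg_by_country.nlargest(2)  # use 10 with the full dataset
print(top_n)
```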

Example country comparison:

United States
Germany
Egypt
India

📈 GDP Forecasting (Machine Learning)

A Linear Regression model was used to predict GDP per capita trends.

Example prediction:

Finland GDP per Capita (2000–2030)

Model workflow:

Train model using historical data (2000–2018)
Use year as predictor
Predict GDP per capita through 2030
Compare actual vs predicted values

This demonstrates a basic economic forecasting approach.
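The workflow above can be sketched with scikit-learn as follows. The training series is synthetic (a linear trend plus noise) and does not represent Finland or any other country; the intercept, slope, and noise level are arbitrary values chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic history for 2000-2018: linear trend plus noise (illustrative only).
rng = np.random.default_rng(7)
train_years = np.arange(2000, 2019)
train_gdp = 30_000 + 500 * (train_years - 2000) + rng.normal(0, 300, train_years.size)

# Year is the only predictor, so the model is a straight line through time.
model = LinearRegression()
model.fit(train_years.reshape(-1, 1), train_gdp)

# Predict over the full 2000-2030 range to compare fitted and forecast values.
all_years = np.arange(2000, 2031)
predicted = model.predict(all_years.reshape(-1, 1))

# Mean absolute error on the training window (actual vs predicted).
in_sample_mae = np.abs(predicted[: train_years.size] - train_gdp).mean()
print(f"Estimated yearly change: {model.coef_[0]:.0f}")
print(f"In-sample MAE: {in_sample_mae:.0f}")
print(f"Predicted value for 2030: {predicted[-1]:.0f}")
```

Because the fitted line extends the historical slope unchanged, forecasts far beyond 2018 are best read as a trend extrapolation rather than a prediction of specific years.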

📊 Example Visualizations

The project includes:

GDP clustering scatter plots
Global GDP growth over time
Country GDP comparisons
GDP prediction graphs

Libraries used:

Matplotlib
Seaborn

🧰 Technologies Used

| Technology | Purpose |
| --- | --- |
| Python | Programming language |
| Pandas | Data manipulation |
| NumPy | Numerical computing |
| Matplotlib | Data visualization |
| Seaborn | Statistical visualization |
| Scikit-learn | Machine learning |
| SciPy | Statistical testing |

▶️ Running the Project

Clone the repository

git clone https://github.com/yourusername/gdp-analysis-project.git
cd gdp-analysis-project

Install dependencies

pip install pandas numpy matplotlib seaborn scikit-learn scipy

Run the analysis

python gdp_analysis.py

Make sure the dataset file pwt1001.csv is located in the project directory.

📌 Key Insights

GDP per capita varies widely between countries.
K-Means clustering (k = 4) separates countries into distinct income groups.
GDP distributions are non-normal, requiring non-parametric tests.
Linear regression can approximate long-term economic trends, though it cannot capture complex economic shocks.

🚀 Future Improvements

Potential extensions:

Add more economic indicators (inflation, unemployment, education)
Use time series models (ARIMA, Prophet, LSTM)
Build an interactive dashboard (Plotly / Streamlit)
Apply multi-feature clustering for deeper economic classification

👩‍💻 Author

Olga Chitembo

This project is part of my data science and analytics portfolio, demonstrating skills in:

Data analysis
Statistical testing
Machine learning
Economic data interpretation
Data visualization
