olgachitembo/Python-and-Data-Analytics-Project

🌍 Global GDP Per Capita Analysis & Forecast (2000–2030)

A data science project analyzing global economic performance using the Penn World Table dataset.
The project explores GDP per capita trends, country clustering, statistical testing, and future predictions using machine learning.


📊 Project Goals

This project aims to:

  • Analyze global GDP per capita trends (2000–2018)
  • Identify economic groupings of countries using clustering
  • Compare developing vs developed economies statistically
  • Predict future GDP per capita trends using regression
  • Visualize global economic patterns

📁 Dataset

Source: Penn World Table (PWT 10.01)

The dataset contains macroeconomic indicators for countries worldwide.

Key variables used:

| Variable | Description |
| --- | --- |
| country | Country name |
| year | Year of observation |
| rgdpo | Real GDP (output-side) |
| pop | Population |
| gdp_per_capita | Calculated as rgdpo / pop |

The analysis focuses on the 2000–2018 time range.


🧹 Data Preparation

Data preprocessing steps included:

  • Filtering data between 2000–2018
  • Removing missing values
  • Removing rows with population = 0
  • Creating a new feature: gdp_per_capita = rgdpo / pop

This metric expresses economic output per person, which makes countries of very different sizes comparable.
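The preparation steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function name `prepare_gdp_data` and the tiny sample frame are hypothetical, and the numbers are made up rather than taken from the Penn World Table.

```python
import pandas as pd


def prepare_gdp_data(df: pd.DataFrame, start: int = 2000, end: int = 2018) -> pd.DataFrame:
    """Filter to the year range, drop unusable rows, and add gdp_per_capita."""
    out = df[df["year"].between(start, end)]       # keep 2000-2018 (inclusive)
    out = out.dropna(subset=["rgdpo", "pop"])      # remove missing values
    out = out[out["pop"] != 0].copy()              # avoid division by zero
    out["gdp_per_capita"] = out["rgdpo"] / out["pop"]
    return out.reset_index(drop=True)


# Made-up sample rows (hypothetical, not PWT values)
raw = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [1999, 2005, 2005, 2010, 2010],
    "rgdpo": [100.0, 200.0, None, 300.0, 50.0],
    "pop": [10.0, 10.0, 5.0, 0.0, 2.0],
})
clean = prepare_gdp_data(raw)
print(clean)
```

In this sample only two rows survive: the 1999 row is outside the range, one row has a missing `rgdpo`, and one has a population of zero.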


📈 Exploratory Data Analysis

Descriptive statistics were used to understand global economic patterns.

Metrics calculated:

  • Mean GDP per capita
  • Median GDP per capita
  • Mode GDP per capita
  • Quartiles (25%, 50%, 75%)

These provide insights into global standards of living and economic inequality.
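As a rough sketch, these descriptive statistics could be computed with pandas as shown here. The series values are invented for illustration, and the dictionary keys are arbitrary names. Note that the mode of a continuous variable is only informative when values repeat or have been binned first.

```python
import pandas as pd

# Made-up GDP per capita values (hypothetical, for illustration only)
gdp = pd.Series([1000.0, 2000.0, 2000.0, 5000.0, 40000.0], name="gdp_per_capita")

summary = {
    "mean": gdp.mean(),
    "median": gdp.median(),
    "mode": gdp.mode().iloc[0],     # first mode if several values tie
    "q25": gdp.quantile(0.25),
    "q50": gdp.quantile(0.50),
    "q75": gdp.quantile(0.75),
}

for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

A mean far above the median, as in this sample, is the typical signature of a right-skewed distribution in which a few rich countries pull the average up.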


🤖 Machine Learning: Country Clustering

To identify economic groups, K-Means clustering was applied.

Steps:

  1. Calculate average GDP per capita per country
  2. Standardize values using StandardScaler
  3. Apply K-Means clustering (k = 4)
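The three steps above might look like the following with scikit-learn. This is a sketch under assumptions: the function name `cluster_countries`, the `seed` and `n_init` settings, and the synthetic countries are all hypothetical. The final relabelling step (cluster 0 = lowest average GDP per capita) is an addition for readability, since K-Means assigns cluster numbers arbitrarily.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_countries(df: pd.DataFrame, k: int = 4, seed: int = 0) -> pd.DataFrame:
    # 1. Average GDP per capita per country
    avg = df.groupby("country", as_index=False)["gdp_per_capita"].mean()
    # 2. Standardize to zero mean and unit variance
    scaled = StandardScaler().fit_transform(avg[["gdp_per_capita"]])
    # 3. K-Means with k clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    avg["cluster"] = km.fit_predict(scaled)
    # Relabel so that cluster 0 is the poorest group and k-1 the richest
    order = avg.groupby("cluster")["gdp_per_capita"].mean().sort_values().index
    avg["cluster"] = avg["cluster"].map({old: new for new, old in enumerate(order)})
    return avg


# Synthetic countries in four rough income tiers (made-up values)
levels = {"A": 1000, "B": 1200, "C": 8000, "D": 8500,
          "E": 20000, "F": 21000, "G": 50000, "H": 52000}
rows = [
    {"country": c, "year": y, "gdp_per_capita": v * (1 + 0.01 * (y - 2000))}
    for c, v in levels.items()
    for y in (2000, 2001)
]
result = cluster_countries(pd.DataFrame(rows))
print(result.sort_values("gdp_per_capita"))
```

Because only one feature is clustered, the groups are simply contiguous ranges of average GDP per capita.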

Clusters represent different economic categories such as:

  • Low-income economies
  • Emerging economies
  • Upper-middle income countries
  • High-income economies

Visualization example:

  • Scatter plot of GDP per capita vs population
  • Countries grouped by cluster

📊 Statistical Testing

The project tests whether GDP per capita distributions differ between economic groups.

Normality Test

Shapiro–Wilk Test

Used to check whether the GDP distribution is normal.

Result:

  • GDP per capita data is not normally distributed.

Group Comparison

Mann–Whitney U Test

Used instead of a t-test because the data is non-normal.

This test compares GDP distributions between developing and developed country clusters.
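Both tests are available in SciPy. The sketch below runs them on synthetic log-normal samples; the group names `developing` and `developed`, the sample sizes, and the distribution parameters are assumptions made for illustration, and how the project maps clusters to these two groups is not shown here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic right-skewed samples (hypothetical, not project data)
developing = rng.lognormal(mean=8.0, sigma=0.6, size=60)
developed = rng.lognormal(mean=10.5, sigma=0.3, size=40)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
all_values = np.concatenate([developing, developed])
w_stat, p_normal = stats.shapiro(all_values)
print(f"Shapiro-Wilk: W={w_stat:.3f}, p={p_normal:.3g}")

# Mann-Whitney U: rank-based comparison of two independent samples
u_stat, p_groups = stats.mannwhitneyu(developing, developed, alternative="two-sided")
print(f"Mann-Whitney U: U={u_stat:.1f}, p={p_groups:.3g}")
```

A small p-value in the first test rejects normality, which motivates the rank-based second test; a small p-value in the second indicates that the two groups differ.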


📉 Global GDP Growth Over Time

Year-over-year GDP growth was calculated using:

GDP Growth = (GDP_t − GDP_(t−1)) / GDP_(t−1)

The average global GDP growth was then visualized over time to observe long-term economic trends.
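The growth formula corresponds to a per-country `pct_change` in pandas. In this sketch the column name `gdp` and the sample values are hypothetical, and the global figure is assumed to be an unweighted mean across countries for each year.

```python
import pandas as pd

# Made-up series for two countries (hypothetical values)
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdp": [100.0, 110.0, 121.0, 200.0, 210.0, 189.0],
})

# Sort so that each row's predecessor is the previous year of the same country
df = df.sort_values(["country", "year"])

# (GDP_t - GDP_(t-1)) / GDP_(t-1), computed within each country
df["gdp_growth"] = df.groupby("country")["gdp"].pct_change()

# Unweighted average across countries per year
global_growth = df.groupby("year")["gdp_growth"].mean()
print(global_growth)
```

The first year of each country has no predecessor, so its growth is NaN and the global average for 2000 is NaN as well.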


🌍 Top Performing Countries

Countries were ranked by their average GDP per capita over 2000–2018, and the ten highest-ranked economies were identified.
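A ranking of this kind can be produced with a group-by followed by `nlargest`. The helper name `top_countries` and the sample data are hypothetical.

```python
import pandas as pd


def top_countries(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n countries with the highest average GDP per capita."""
    return df.groupby("country")["gdp_per_capita"].mean().nlargest(n)


# Made-up values for three countries over two years
sample = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": [2000, 2001, 2000, 2001, 2000, 2001],
    "gdp_per_capita": [10.0, 20.0, 50.0, 70.0, 30.0, 30.0],
})
ranking = top_countries(sample, n=2)
print(ranking)
```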

Example country comparison:

  • United States
  • Germany
  • Egypt
  • India

📈 GDP Forecasting (Machine Learning)

A Linear Regression model was used to predict GDP per capita trends.

Example prediction:

Finland GDP per Capita (2000–2030)

Model workflow:

  1. Train model using historical data (2000–2018)
  2. Use year as predictor
  3. Predict GDP per capita through 2030
  4. Compare actual vs predicted values

This demonstrates a basic economic forecasting approach.
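The workflow above can be sketched with scikit-learn's `LinearRegression`. The series below is synthetic and perfectly linear, chosen only so the result is easy to check; it is not Finland's data, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Historical data, 2000-2018 (synthetic: 30,000 plus 500 per year)
years = np.arange(2000, 2019)
gdp_pc = 30000.0 + 500.0 * (years - 2000)

# 2. Year is the only predictor; scikit-learn expects a 2-D feature array
model = LinearRegression().fit(years.reshape(-1, 1), gdp_pc)

# 3. Extrapolate through 2030
future_years = np.arange(2019, 2031)
forecast = model.predict(future_years.reshape(-1, 1))

# 4. Compare actual vs fitted values on the training period
fitted = model.predict(years.reshape(-1, 1))
max_abs_error = np.abs(gdp_pc - fitted).max()

print(f"slope per year: {model.coef_[0]:.1f}")
print(f"forecast for 2030: {forecast[-1]:.1f}")
print(f"largest in-sample error: {max_abs_error:.4f}")
```

On real data the in-sample error will not be zero, and a straight line extrapolated 12 years ahead should be read as a trend indication rather than a prediction.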


📊 Example Visualizations

The project includes:

  • GDP clustering scatter plots
  • Global GDP growth over time
  • Country GDP comparisons
  • GDP prediction graphs

Libraries used:

  • Matplotlib
  • Seaborn

🧰 Technologies Used

| Technology | Purpose |
| --- | --- |
| Python | Programming language |
| Pandas | Data manipulation |
| NumPy | Numerical computing |
| Matplotlib | Data visualization |
| Seaborn | Statistical visualization |
| Scikit-learn | Machine learning |
| SciPy | Statistical testing |

▶️ Running the Project

Clone the repository

git clone https://github.com/yourusername/gdp-analysis-project.git
cd gdp-analysis-project

Install dependencies

pip install pandas numpy matplotlib seaborn scikit-learn scipy

Run the analysis

python gdp_analysis.py

Make sure the dataset file pwt1001.csv is located in the project directory.


📌 Key Insights

  • GDP per capita varies widely between countries.
  • K-Means clustering (k = 4) groups countries into distinct economic categories, from low-income to high-income.
  • GDP distributions are non-normal, requiring non-parametric tests.
  • Linear regression can approximate long-term economic trends, though it cannot capture complex economic shocks.

🚀 Future Improvements

Potential extensions:

  • Add more economic indicators (inflation, unemployment, education)
  • Use time series models (ARIMA, Prophet, LSTM)
  • Build an interactive dashboard (Plotly / Streamlit)
  • Apply multi-feature clustering for deeper economic classification

👩‍💻 Author

Olga Chitembo

This project is part of my data science and analytics portfolio, demonstrating skills in:

  • Data analysis
  • Statistical testing
  • Machine learning
  • Economic data interpretation
  • Data visualization
