This is a mini-project for SC1015. We identified a real-life problem in measuring how happy people are: at a global scale, with billions of people, it is difficult for governments to make informed policy decisions. First, with so many candidate factors, it is hard to determine which one has the most influence on happiness scores and the Human Development Index (HDI). Because of this difficulty, it is also hard to decide how much time and how many resources should be invested in designing and improving these policies.
Hence, our project explores the dynamics of happiness. We use datasets obtained from Kaggle and the Human Development Reports (HDR) site to find a better way to predict happiness, working through the data science pipeline.
Ng Zhengbin Claven - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion
Lim Kang You - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion
Chin Shao Yang - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion
<3 we did everything together <3
To use machine learning models to determine which factors are most predictive of happiness scores
To determine whether happiness scores correlate with the Human Development Index
Random Forest
Gradient Boosting
K-Means
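As a rough illustration of how the two regressors listed above can be compared, here is a minimal sketch using scikit-learn. It runs on synthetic stand-in data, not on the project's datasets, and the feature count, coefficients, split ratio, and hyperparameters are all assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the project's dataset): three made-up features
# and a target that depends on the first two plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative assumptions, not the project's settings.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    print(f"{name}: R^2 = {results[name]['r2']:.3f}, MSE = {results[name]['mse']:.3f}")
```

Evaluating both models on the same held-out split keeps the R^2 and MSE figures directly comparable. The numbers printed here describe only the synthetic data.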
Key Correlations: Happiness Score is strongly correlated with GDP, Social Support, and Healthy Life Expectancy
Model Comparison: Gradient Boosting consistently outperformed Random Forest on both R^2 and MSE, consistent with boosting's approach of building each new tree to correct the errors of the weaker models before it
Key Feature: Social Support was identified as the most important feature by both Random Forest and Gradient Boosting
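The feature ranking described above can be read from the `feature_importances_` attribute that both scikit-learn regressors expose after fitting. The sketch below shows the mechanics only: the column names are hypothetical labels, the data is synthetic, and the coefficients are arbitrary, so the ranking it prints says nothing about the real happiness data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Hypothetical column labels; the project's actual column names may differ.
feature_names = ["gdp", "social_support", "healthy_life_expectancy"]

# Synthetic data with arbitrary coefficients (NOT the project's dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.2, size=200)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

importances = {}
for name, model in models.items():
    model.fit(X, y)
    # feature_importances_ holds impurity-based importances, normalized to sum to 1.
    importances[name] = dict(zip(feature_names, model.feature_importances_))
    ranked = sorted(importances[name].items(), key=lambda kv: kv[1], reverse=True)
    print(name)
    for feature, score in ranked:
        print(f"  {feature}: {score:.3f}")
```

Impurity-based importances can be biased toward features with many distinct values, so permutation importance on a held-out set is a common cross-check.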
ML Techniques: Random Forest, Gradient Boosting, K-Means
Used scikit-learn to implement models such as RandomForestRegressor and GradientBoostingRegressor, along with feature importances for model interpretation and StandardScaler for preprocessing
Collaborating using GitHub
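For readers unfamiliar with how StandardScaler and K-Means fit together, here is a minimal sketch. How the project itself applied K-Means is not detailed here, so the pipeline layout, the choice of three clusters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (NOT the project's dataset): two features on very different scales.
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(loc=50_000, scale=15_000, size=150),  # large-scale feature
    rng.normal(loc=0.7, scale=0.1, size=150),        # small-scale feature
])

# Scaling first stops the large-scale feature from dominating the
# Euclidean distances that K-Means relies on.
# n_clusters=3 is an arbitrary choice for this sketch.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

print("Cluster sizes:", np.bincount(labels, minlength=3))
```

Wrapping both steps in one pipeline means the same scaling is reused when assigning new points to clusters with `pipeline.predict`.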
Gradient Boosting was the best-performing model, outperforming Random Forest in predictive accuracy
https://www.kaggle.com/datasets/unsdsn/world-happiness/data
https://worldhappiness.report
https://hdr.undp.org/data-center/human-development-index#/indicies/HDI
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html