
World Happiness Data Science and Artificial Intelligence case study: a project that uses datasets from Kaggle and the Human Development Report (HDR) site to identify which factors best predict happiness, following the data science pipeline.


shaoyangchin/DataScience-AI-Project


DSAI-Project-Group1

Welcome to the Happiness Report analysis repository

About

This is a mini-project for SC1015. We identified a real-life problem in measuring how happy people are. At a global scale, with billions of people, it is difficult for governments to make informed policy decisions. First, out of so many factors, it is hard to determine which one most influences happiness scores and the Human Development Index (HDI). Because of this, it is also hard to decide how much time and how many resources to invest in designing and improving these policies.

Hence, our project examines the dynamics of happiness, using datasets from Kaggle and the HDR site to identify which factors best predict happiness. We follow the data science pipeline throughout.

Contributors

Ng Zhengbin Claven - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion

Lim Kang You - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion

Chin Shao Yang - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion

<3 we did everything together <3

Problem Definition

Use machine learning models to determine which factors are most predictive of happiness scores

Determine whether happiness scores are correlated with the Human Development Index (HDI)

Models Used

Random Forest

Gradient Boosting

K-Means
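K-Means is an unsupervised clustering method rather than a predictor. Because it relies on Euclidean distances, features are usually standardized first, and the README lists StandardScaler among the tools used. The following is a minimal sketch, assuming two country-level features. The feature choice, the values, and `n_clusters=2` are invented for illustration and are not the project's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical country features on very different scales: GDP per capita (USD) and HDI (0-1).
X = np.array([
    [55000, 0.93], [48000, 0.91], [52000, 0.94],
    [4000, 0.55], [3500, 0.52], [5000, 0.58],
])

# Standardizing first keeps the large GDP values from dominating K-Means' distances.
pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
labels = pipeline.fit_predict(X)
print(labels)
```

Without the scaler, the clustering would be driven almost entirely by GDP, because HDI differences are tiny in absolute terms.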

Conclusion

Key Correlations: Happiness Score strongly linked with GDP, Social Support, and Healthy Life Expectancy

Model Comparison: Gradient Boosting consistently outperformed Random Forest, achieving a higher R² and a lower MSE, demonstrating its effectiveness at correcting errors by learning sequentially from weaker models

Key Feature: Social Support identified as the most important feature by the feature importances of both Random Forest and Gradient Boosting
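The model comparison and the feature ranking can be sketched with the two scikit-learn estimators the README names. This is an illustration under assumptions, not the project's code:

- The data are synthetic.
- The column names are hypothetical.
- The target is built so that `social_support` carries the largest weight. The script therefore shows how `feature_importances_` is read; it does not reproduce the project's finding.
- `n_estimators`, the 80/20 split, and the seeds are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical column names; the data below are synthetic, not the Kaggle schema.
feature_names = ["gdp", "social_support", "life_expectancy", "unrelated"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
# By construction social_support has the largest weight and "unrelated" has none.
y = 0.5 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(scale=0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for model in (RandomForestRegressor(n_estimators=200, random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # feature_importances_ is impurity-based (mean decrease in impurity) and sums to 1.
    top_feature = feature_names[int(np.argmax(model.feature_importances_))]
    name = type(model).__name__
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "top_feature": top_feature,
    }
    print(f"{name}: R^2={results[name]['r2']:.3f}, "
          f"MSE={results[name]['mse']:.3f}, top feature={top_feature}")
```

Impurity-based importances can be biased toward features with many distinct values. Permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check, as discussed in the scikit-learn forest-importances example listed under References.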

What did we learn from this project?

ML Techniques: Random Forest, Gradient Boosting, K-Means

Employed scikit-learn to implement algorithms such as RandomForestRegressor and GradientBoostingRegressor, used feature importances to interpret the models, and used StandardScaler for data preprocessing

Collaborating using GitHub

Gradient boosting was the best model and outperformed random forest in predictive accuracy
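The accuracy comparison rests on R² and MSE, so it helps to see both written out. The following is a minimal standard-library sketch. The happiness scores and predictions for four hypothetical test countries are made up.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared residual (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (higher is better)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up happiness scores and predictions for four hypothetical test countries.
y_true = [7.0, 6.0, 5.0, 4.0]
predictions = {
    "model_a": [6.5, 6.2, 5.4, 4.5],
    "model_b": [6.9, 6.1, 4.9, 4.2],
}
for name, y_pred in predictions.items():
    print(f"{name}: MSE={mse(y_true, y_pred):.4f}, R^2={r2(y_true, y_pred):.4f}")
```

R² = 1 − n·MSE / SS_tot, and SS_tot depends only on the true test values. Two models scored on the same test split are therefore always ranked the same way by both metrics. Assuming both models were evaluated on one shared split, the R² and MSE results are expected to agree rather than being independent evidence.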

References

https://www.kaggle.com/datasets/unsdsn/world-happiness/data

https://worldhappiness.report

https://hdr.undp.org/data-center/human-development-index#/indicies/HDI

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
