DSAI-Project-Group1

Welcome to the Happiness Report analysis repository

About

This is a mini-project for SC1015. We identified a real-life problem in measuring how happy people are. At a global scale, with billions of people, it is difficult for governments to make informed policy decisions. First, it is hard to determine which of the many contributing factors most influences happiness scores and human development indices. Because of this difficulty, it is also hard to decide how much time and resources should be invested in designing and improving these policies.

Hence, our project is about the dynamics of happiness. We use datasets obtained from Kaggle and the UNDP Human Development Reports (HDR) site to find a better way to predict happiness, following the data science pipeline.

Contributors

Ng Zhengbin Claven - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion

Lim Kang You - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion

Chin Shao Yang - Sample collection, exploratory analysis, analytical visualization, machine learning, conclusion

<3 we did everything together <3

Problem Definition

To utilize machine learning models to determine which factors are most predictive of happiness scores

To determine whether there is any correlation between happiness scores and the Human Development Index
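As a rough illustration of the second question, the sketch below computes a Pearson correlation coefficient between two lists. The helper name `pearson_r` and the sample values are hypothetical and chosen only for demonstration; they are not taken from our datasets or notebooks.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative values only (NOT from the project's datasets)
happiness = [7.5, 6.9, 5.8, 4.7, 3.9]
hdi = [0.95, 0.92, 0.78, 0.65, 0.50]

r = pearson_r(happiness, hdi)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship.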

Models Used

Random Forest

Gradient Boosting

K-Means

Conclusion

Key Correlations: Happiness Score is strongly linked with GDP, Social Support, and Healthy Life Expectancy

Model Comparison: Gradient Boosting consistently outperformed Random Forest on both R^2 and MSE, demonstrating its efficacy in error correction and learning from weaker models

Key Feature: Social Support was identified as the most important feature by both Random Forest and Gradient Boosting
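The kind of comparison described above can be sketched as follows. This is a minimal example on synthetic data, not the code from our notebooks: the feature names, the data-generating formula, the train/test split, and the hyperparameters are all assumptions made for illustration, so the numbers it prints say nothing about the actual happiness data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): NOT the project's finalData.csv
rng = np.random.default_rng(0)
feature_names = ["gdp", "social_support", "life_expectancy"]  # hypothetical names
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0.0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),             # higher is better
        "mse": mean_squared_error(y_test, pred),  # lower is better
        # Impurity-based importances; they sum to 1 across features
        "importances": dict(zip(feature_names, model.feature_importances_)),
    }
    print(name, results[name])
```

Both models are evaluated on the same held-out test set so that their R^2 and MSE values are directly comparable.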

What did we learn from this project?

ML Techniques: Random Forest, Gradient Boosting, K-Means

Used scikit-learn to implement algorithms such as RandomForestRegressor and GradientBoostingRegressor, along with tools such as feature importances and StandardScaler for data processing

Collaborating using GitHub

Gradient boosting was the best model and outperformed random forest in predictive accuracy
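For the clustering side, the sketch below shows one common way to combine StandardScaler with K-Means. It is only an assumed setup on synthetic data: the two made-up feature columns, the choice of two clusters, and the parameters are illustrative and may differ from what our notebooks do.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two synthetic groups whose features live on very different scales (assumption)
rng = np.random.default_rng(0)
low = rng.normal(loc=[1.0, 50.0], scale=[0.1, 2.0], size=(50, 2))
high = rng.normal(loc=[2.0, 80.0], scale=[0.1, 2.0], size=(50, 2))
X = np.vstack([low, high])

# Standardize each column to zero mean and unit variance so that the
# larger-scale column does not dominate the Euclidean distances K-Means uses
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print("Cluster sizes:", np.bincount(labels))
```

Scaling first matters because K-Means is distance-based, so unscaled features with large numeric ranges would otherwise carry most of the weight.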

References

https://www.kaggle.com/datasets/unsdsn/world-happiness/data

https://worldhappiness.report

https://hdr.undp.org/data-center/human-development-index#/indicies/HDI

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html

