Link to project article: Pending...
In the course of business, databases and storage systems can easily fill up with documents and files, and processing them by hand is laborious and time-consuming. Machine learning techniques can automate this work. This project uses cluster analysis to group a sample of Wikipedia articles.
Prerequisites: Knowledge of clustering techniques and prior experience with the elbow method are not required to complete this project, but both are a great advantage.
We use the wikipedia Python package, which provides easy access to Wikipedia pages. We work with 14 articles, first applying the elbow method, a heuristic for choosing the number of clusters, and then the K-Means algorithm to do the clustering.
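To make the elbow method concrete, here is a minimal sketch using scikit-learn. It runs on synthetic stand-in data from make_blobs rather than the article vectors, and the helper name inertia_curve, the range of k values, and the final choice of k are assumptions made for illustration, not the project's actual settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in data: 14 synthetic points in 3 loose groups.
# These are NOT the article vectors, only something to cluster.
X, _ = make_blobs(n_samples=14, centers=3, random_state=0)


def inertia_curve(X, k_values, seed=0):
    """Fit K-Means once per k and return each fit's inertia
    (the within-cluster sum of squared distances)."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


k_values = list(range(1, 8))
inertias = inertia_curve(X, k_values)
for k, value in zip(k_values, inertias):
    print(f"k={k}: inertia={value:.2f}")

# The "elbow" is the k after which inertia stops dropping sharply.
# It is usually read off a plot; here it is hard-coded as an assumption.
chosen_k = 3
labels = KMeans(n_clusters=chosen_k, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

Plotting k_values against inertias (for example with matplotlib) makes the bend easier to spot than reading the printed numbers.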
The dataset contains 14 articles on the following topics:
- Analytics
- Lawsuit
- Military
- Economy
- Health
- Education
- Food
- Languages
- Africa
- Countries
- Finance
- Earth
- Agriculture
- Plants
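The topics above could be fetched roughly as follows. This is a sketch, not the project's code: it assumes each topic name can be used directly as a page title, and fetch_articles with its fetch parameter is a hypothetical helper added so that the text source can be swapped out. Only wikipedia.page(...).content comes from the wikipedia package itself.

```python
# Topic names from the list above, used as page titles (an assumption).
TOPICS = [
    "Analytics", "Lawsuit", "Military", "Economy", "Health", "Education",
    "Food", "Languages", "Africa", "Countries", "Finance", "Earth",
    "Agriculture", "Plants",
]


def fetch_articles(titles, fetch=None):
    """Return a dict mapping each title to its article text.

    `fetch` is a hypothetical hook: any callable taking a title and
    returning text. By default it reads from Wikipedia.
    """
    if fetch is None:
        import wikipedia  # pip install wikipedia

        def fetch(title):
            # auto_suggest=False keeps the package from silently
            # substituting a different, similarly named page.
            return wikipedia.page(title, auto_suggest=False).content

    articles = {}
    for title in titles:
        try:
            articles[title] = fetch(title)
        except Exception as err:  # e.g. missing or ambiguous page
            print(f"Skipping {title!r}: {err}")
    return articles


# Requires network access and the wikipedia package:
# articles = fetch_articles(TOPICS)
```

Titles that fail to resolve are skipped with a message rather than stopping the run, so it is worth checking that all 14 articles actually came back.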
All the articles are imported and then converted into numerical vectors so that they are suitable for clustering.
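One common way to turn text into vectors is TF-IDF, sketched below with scikit-learn's TfidfVectorizer. The source does not say which vectorizer the project uses, so TF-IDF is an assumption here, and the short texts in docs are made-up placeholders standing in for the full articles.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up placeholder snippets, NOT the real article text.
docs = {
    "Finance": "Finance studies money, banking, credit and investment.",
    "Economy": "An economy covers production, trade, money and consumption.",
    "Plants": "Plants are organisms that grow using photosynthesis.",
    "Agriculture": "Agriculture is the cultivation of plants and livestock.",
}

# TF-IDF is an assumed choice; stop_words="english" drops common words
# such as "the" and "and" before weighting the remaining terms.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(list(docs.values()))

# One row per document, one column per distinct term.
print(X.shape)
```

The resulting sparse matrix X can be passed directly to scikit-learn's KMeans, which accepts sparse input.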
Feel free to follow me and ask questions:
https://twitter.com/InuwaAbraham
https://www.linkedin.com/in/mobarak-inuwa/
https://www.analyticsvidhya.com/blog/author/inuwamobarak/
https://mobarak.mystrikingly.com/
References/Links:
Image Source: By Wikimedia Foundation — Wikimedia Foundation, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12611181
Sklearn K-Means Clustering: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#
Python Wikipedia Package: https://pypi.org/project/wikipedia/