orange-plus

About: This is my Bachelor's thesis project at the Hellenic Open University.

Thesis title: Development of Widgets for the Orange data mining platform.

Abstract

The scope of this thesis is the creation of three widgets for the Orange platform. The main goal is to study and become familiar with data mining techniques, knowledge discovery, the Python language and its development environment. The first widget implements the SMOTE algorithm, which balances the classes of a dataset by oversampling the minority class with synthetic samples, so that the dataset can be used more effectively to train a machine learning model. The second widget implements OPTICS, which clusters unlabeled data based on regions of varying density. The third widget, KDE-2D, visualizes data through a two-dimensional kernel-density estimate with Gaussian kernels. This visualization helps reveal features of the relationship between two variables in large datasets that are difficult to detect in other graphs, such as the scatter plot. It can also expose hidden clusters and indicate whether the data are normally distributed. VS Code, Python and Orange, together with the “imbalanced-learn”, “scikit-learn” and “SciPy” libraries, were used in the development process. This work demonstrated Orange's potential in data mining and provided a working understanding of Data Science concepts and techniques, a skillset that can be expanded and built upon.
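As a rough illustration of the KDE-2D idea described above, the following is a minimal sketch that evaluates a two-dimensional Gaussian kernel-density estimate on a grid with SciPy's `gaussian_kde`. It is not the widget's actual code: the helper name `kde_2d`, the `grid_size` parameter and the synthetic two-blob data are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_2d(x, y, grid_size=50):
    """Evaluate a 2-D Gaussian KDE on a regular grid (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # gaussian_kde expects data of shape (n_dimensions, n_samples).
    kde = gaussian_kde(np.vstack([x, y]))

    # Regular grid spanning the observed range of both variables.
    xs = np.linspace(x.min(), x.max(), grid_size)
    ys = np.linspace(y.min(), y.max(), grid_size)
    xx, yy = np.meshgrid(xs, ys)

    # Evaluate the estimated density at every grid point.
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    return xx, yy, density


# Synthetic data (an assumption for the example): two Gaussian blobs.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 300)])
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 300)])

xx, yy, density = kde_2d(x, y)
```

The resulting `density` array can be passed to a contour or heat-map routine. Areas of high density then show up as peaks, which is how a density view can reveal clusters that overlap in a plain scatter plot.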

References

Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M.,
Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J.,
Zitnik, M., & Zupan, B. (2013). Orange: Data Mining Toolbox in Python. Journal of Machine
Learning Research, 14(Aug), 2349–2353.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, D., Brucher, M., Perrot, M.,
& Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning
Research, 12, 2825–2830.

Lemaitre, G., Nogueira, F., & Aridas, C. K. (2017). Imbalanced-learn: A Python Toolbox to
Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning
Research, 18(17), 1–5.

Virtanen, P., Gommers, R., Oliphant, T., Haberland, M., Reddy, D., Burovski, E., Peterson, P.,
Weckesser, W., Bright, J., Walt, S., Brett, M., Wilson, K., Mayorov, N., Nelson, A., Jones, E.,
Kern, R., Larson, C., Polat, İ., Feng, Y., Moore, E., VanderPlas, J., Laxalde, J., Cimrman, R.,
Henriksen, E., Harris, C., Archibald, A., Ribeiro, A., Pedregosa, P., & Contributors, S. (2020).
SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17, 261–272.