A data mining and machine learning task (http://jomi.das.ufsc.br/ia/2017/tp-dm.pdf).
The task is based on HTML data, found in ./webkb/
or at http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
Each page has to be classified into one of 7 target classes (page counts in parentheses):
- student (1641)
- faculty (1124)
- staff (137)
- department (182)
- course (930)
- project (504)
- other (3764)
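
The class counts above are quite unbalanced, so it helps to know what accuracy a trivial classifier would reach. The following is a minimal sketch, not taken from the notebook: it only uses the counts listed above, and the names `CLASS_COUNTS`, `class_shares` and `majority_baseline` are made up for illustration.

```python
from __future__ import division, print_function

# Class counts copied from the list above.
CLASS_COUNTS = {
    "student": 1641,
    "faculty": 1124,
    "staff": 137,
    "department": 182,
    "course": 930,
    "project": 504,
    "other": 3764,
}


def class_shares(counts):
    """Return each class's share of the total number of pages."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Return (label, accuracy) for always predicting the largest class."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


if __name__ == "__main__":
    shares = class_shares(CLASS_COUNTS)
    # Print the classes from most to least frequent.
    for label in sorted(shares, key=shares.get, reverse=True):
        print("{:<10} {:.1%}".format(label, shares[label]))
    print("majority baseline:", majority_baseline(CLASS_COUNTS))
```

With these counts, "other" makes up 3764 of 8282 pages (about 45%), so any classifier should be compared against that figure rather than against chance over 7 classes.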
The data is also divided by university:
- Cornell (867)
- Texas (827)
- Washington (1205)
- Wisconsin (1263)
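
One way such a division can be used is to hold out one university at a time for testing and train on everything else, so that the classifier is evaluated on pages from a site it has never seen. The sketch below illustrates that idea under stated assumptions; it is not confirmed to be what the notebook does. The `university` key and the `"misc"` value are hypothetical names. Note that the four counts above sum to 4162, while the class counts sum to 8282, so the sketch assumes the remaining 4120 pages carry some other university value and stay in every training set.

```python
def leave_one_university_out(records, universities):
    """Yield (held_out, train, test), holding out one university per fold.

    `records` is a list of dicts with a "university" key (an assumed layout).
    Records whose university is not in `universities` are never used for
    testing, so they remain in the training set of every fold.
    """
    for held_out in universities:
        train = [r for r in records if r["university"] != held_out]
        test = [r for r in records if r["university"] == held_out]
        yield held_out, train, test


# Toy records, only to show the shape of the folds.
toy_records = [
    {"university": "cornell", "label": "course"},
    {"university": "texas", "label": "student"},
    {"university": "texas", "label": "faculty"},
    {"university": "misc", "label": "other"},
]

for held_out, train, test in leave_one_university_out(
        toy_records, ["cornell", "texas"]):
    print(held_out, len(train), len(test))
```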
--
To solve this problem, we made a Jupyter notebook called "final_project.ipynb", written in Python 2.7. A rendered version is available at ./final_project.html.
PS.1 To edit the notebook, please use Python 2.7 and install Jupyter: pip install jupyter
PS.2 To gather all the data into a single CSV file (./corpus.csv), we wrote a Python script (./script.py), which is also included in the Jupyter notebook.
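
For readers who want a rough idea of what gathering the pages into a CSV involves, here is a minimal sketch. It is not the contents of ./script.py. It assumes the pages are stored as `<root>/<class>/<university>/<file>`, and the column names (`label`, `university`, `filename`, `html`) and the function names are hypothetical. It also targets Python 3, so it would need adjusting to run under Python 2.7.

```python
import csv
import os


def collect_pages(root):
    """Yield (label, university, path) for files under root/<label>/<university>/."""
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue
        for university in sorted(os.listdir(label_dir)):
            uni_dir = os.path.join(label_dir, university)
            if not os.path.isdir(uni_dir):
                continue
            for name in sorted(os.listdir(uni_dir)):
                path = os.path.join(uni_dir, name)
                if os.path.isfile(path):
                    yield label, university, path


def write_corpus(root, out_path):
    """Write one CSV row per page and return the number of pages written."""
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        # Hypothetical column layout; the real corpus.csv may differ.
        writer.writerow(["label", "university", "filename", "html"])
        for label, university, path in collect_pages(root):
            # latin-1 maps every byte to a character, so decoding never fails.
            with open(path, encoding="latin-1") as page:
                html = page.read()
            writer.writerow([label, university, os.path.basename(path), html])
            rows += 1
    return rows


# Example call (paths taken from this README):
# write_corpus("./webkb", "./corpus.csv")
```

Keeping the class and the university as separate columns makes it easy to filter or split the corpus later without touching the directory tree again.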
This project was developed for a course at Universidade Federal de Santa Catarina - http://jomi.das.ufsc.br/ia/
The authors are: Luis Felipe Pelison, Alex Amadeu Cani and Iago Oliveira