VisARTM is intended to become a successor to tm_navigator, a tool for visualizing and assessing topic models primarily built using BigARTM, a fast and scalable library for topic modelling.
VisARTM uses Python 3. While VisARTM is likely to work with Python 2, it is not guaranteed.
Run `pip install -r requirements.txt` before using VisARTM. VisARTM requires fairly recent versions of `flask` and `flask_sqlalchemy`.
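
To see which versions are currently installed, a minimal sketch along these lines may help. It assumes Python 3.8 or newer (for `importlib.metadata`) and uses the PyPI distribution names `Flask` and `Flask-SQLAlchemy`; the helper `installed_version` is a hypothetical name and not part of VisARTM.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report what is installed; VisARTM itself does not ship this check.
for name in ("Flask", "Flask-SQLAlchemy"):
    print(name, installed_version(name) or "not installed")
```
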
All files required by VisARTM should be provided in `.csv` format. See the columns and sample values for each input file below.
id | abstract | content |
---|---|---|
0 | document-0 | abstract-0 |
1 | document-1 | abstract-1 |
id | text |
---|---|
0 | milk |
1 | Python |
document_l_id | document_r_id | similarity |
---|---|---|
0 | 1 | 0.5 |
0 | 2 | 0.2 |
term_l_id | term_r_id | similarity |
---|---|---|
0 | 1 | 0.5 |
0 | 2 | 0.6 |
document_id | term_id | count |
---|---|---|
0 | 0 | 100 |
0 | 1 | 0 |
id | title | probability | is_background |
---|---|---|---|
0 | Topic 0 | 0.95 | 1 |
1 | Topic 1 | 0.2 | 0 |
topic_l_id | topic_r_id | similarity |
---|---|---|
0 | 1 | 0.22 |
0 | 2 | 0.6 |
document_id | topic_id | prob_dt | prob_td |
---|---|---|---|
0 | 0 | 0.22 | 0.6 |
0 | 1 | 0.61 | 0.3 |
topic_id | term_id | prob_wt | prob_tw |
---|---|---|---|
0 | 0 | 0.22 | 0.6 |
0 | 1 | 0.4 | 0.2 |
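
As an illustration of producing one of these files, the sketch below writes the topic rows from the sample table above with Python's standard `csv` module. It assumes comma-separated values with a header row; the helper names (`TOPIC_COLUMNS`, `write_topics_csv`, `topics_to_csv_text`) are hypothetical and not part of VisARTM.

```python
import csv
import io

# Column layout copied from the topic table above.
TOPIC_COLUMNS = ["id", "title", "probability", "is_background"]


def write_topics_csv(stream, topics):
    """Write topic rows (dicts keyed by TOPIC_COLUMNS) to a text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=TOPIC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in topics:
        writer.writerow(row)


def topics_to_csv_text(topics):
    """Return the CSV text for the given topic rows."""
    buffer = io.StringIO()
    write_topics_csv(buffer, topics)
    return buffer.getvalue()


# Sample rows taken from the table above.
sample_topics = [
    {"id": 0, "title": "Topic 0", "probability": 0.95, "is_background": 1},
    {"id": 1, "title": "Topic 1", "probability": 0.2, "is_background": 0},
]
csv_text = topics_to_csv_text(sample_topics)
print(csv_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` to `write_topics_csv`. The other input files can be produced the same way with their own column lists.
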
To generate some random data and see its visualization, use `./setup_sample.py`. This script generates random data, writes everything to the `data` subfolder, and adds the generated data to the VisARTM database.
Generating VisARTM-compatible models from BigARTM models will be supported in the future.
To load your custom model into VisARTM, do the following:
- Put the data files, in the appropriate format, into a folder.
- Call `clear()` and `create()` to ensure that the project database is cleared of everything.
- Call the following Python functions from `manage.py`:
  - `add_dataset('Your Dataset Name', 'path_to_dataset')` - this creates dataset-related entries in the database and loads the data.
  - `add_topic_model('Your Topic Model name', 'data', created_dataset_id)`, where `created_dataset_id` is the id of the added dataset returned by the previous call.
- Good job! Now you're all set. Run `python3 serve.py` to see the loaded model and begin assessment.
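
The loading steps above can be sketched as one helper. This is a minimal sketch rather than part of VisARTM: `load_custom_model` is a hypothetical name, and it assumes that `manage.py` exposes `clear`, `create`, `add_dataset` and `add_topic_model` with the argument order shown above, and that the second argument of `add_topic_model` is the folder holding the model files. The `manage` object is passed in so that the helper does not depend on how the module is imported.

```python
def load_custom_model(manage, dataset_name, dataset_path, model_name, model_path):
    """Reset the project database, then register a dataset and a topic model.

    `manage` is any object exposing clear, create, add_dataset and
    add_topic_model, for example the imported manage.py module (assumption).
    """
    # Start from an empty project database.
    manage.clear()
    manage.create()
    # add_dataset is described above as returning the id of the new dataset.
    dataset_id = manage.add_dataset(dataset_name, dataset_path)
    # Attach the topic model to the dataset that was just created.
    manage.add_topic_model(model_name, model_path, dataset_id)
    return dataset_id


# Hypothetical usage from the project root:
#   import manage
#   load_custom_model(manage, 'Your Dataset Name', 'path_to_dataset',
#                     'Your Topic Model name', 'data')
```
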