IETM is an open-source Java package that implements the algorithm proposed in the paper "Transferring Knowledge from Large Language Models for Short Text Topic Modeling".
- Java (Version=1.8)
All corpus files (Tweet, SearchSnippets, and StackOverflow) and the corresponding label files are provided under ./datasets.
Taking Tweet as an example, the dataset files are laid out as follows.
datasets
    Tweet
        Tweet.txt
        Tweet_label.txt
        Tweet_GPT.txt
        Tweet_DREx.txt
        Tweet_LLaMa.txt
        Tweet_LLaMa2.txt
where 'Tweet.txt' contains the original short texts and 'Tweet_label.txt' is the label file corresponding to the Tweet dataset. The other four files contain pseudo long documents generated by different methods; for example, each document in 'Tweet_GPT.txt' is generated by GPT from the corresponding original short text.
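Since each pseudo long document is generated from one original short text, a corpus file and its pseudo-long counterpart should contain the same number of documents. A minimal sanity-check sketch (the toy texts and temp-directory paths here are illustrative stand-ins, not the real dataset contents):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CheckAlignment {
    public static void main(String[] args) throws IOException {
        // Toy stand-ins for Tweet.txt and Tweet_GPT.txt, written to a temp
        // directory (the real files live under ./datasets/Tweet).
        Path dir = Files.createTempDirectory("ietm");
        Path shortFile = dir.resolve("Tweet.txt");
        Path longFile = dir.resolve("Tweet_GPT.txt");
        Files.write(shortFile, Arrays.asList(
                "flu shots available at the clinic",
                "phone battery drains so fast"));
        Files.write(longFile, Arrays.asList(
                "Influenza vaccination clinics typically open in autumn ...",
                "Smartphone battery drain is often caused by background apps ..."));

        List<String> shortTexts = Files.readAllLines(shortFile);
        List<String> longDocs = Files.readAllLines(longFile);

        // Each pseudo long document expands one short text, so the two
        // corpus files must contain the same number of lines.
        if (shortTexts.size() != longDocs.size()) {
            throw new IllegalStateException("corpus files are not line-aligned");
        }
        System.out.println("aligned documents: " + shortTexts.size());
    }
}
```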
To train the model, run:

bash run.sh
-algorithm: Specify the algorithm to run; set it to IETM.
-dataname: Specify the name of the dataset (Tweet, SearchSnippets, or StackOverflow).
-alpha: Specify the value of the document-topic Dirichlet prior. The default value is 1.0.
-beta: Specify the value of the topic-word Dirichlet prior. The default value is 0.01.
-ntopics: Specify the number of topics. The default value is 50.
-corpus: Specify the path to the input short-text corpus file.
-generateCorpus: Specify the path to the input pseudo-long-document corpus file.
-output: Specify the path to the output directory.
-name: Specify the name of the output file.
-niters: Specify the number of iterations. The default value is 1000.
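run.sh wraps a Java invocation carrying the flags above. The jar path, entry point, and output names below are assumptions rather than the repository's actual script, so this sketch only assembles and echoes the command instead of executing it:

```shell
#!/bin/bash
# Hypothetical sketch of run.sh. The jar location (jar/IETM.jar) and the
# output name are assumptions; adjust them to match the actual repository.
ALGORITHM=IETM
DATANAME=Tweet        # or SearchSnippets, StackOverflow
ALPHA=1.0
BETA=0.01
NTOPICS=50
NITERS=1000

CMD="java -jar jar/IETM.jar \
  -algorithm $ALGORITHM \
  -dataname $DATANAME \
  -corpus datasets/$DATANAME/$DATANAME.txt \
  -generateCorpus datasets/$DATANAME/${DATANAME}_GPT.txt \
  -alpha $ALPHA -beta $BETA \
  -ntopics $NTOPICS -niters $NITERS \
  -output results -name ${DATANAME}_IETM"

# Echo the assembled command (dry run) rather than executing it here.
echo "$CMD"
```

Swapping `${DATANAME}_GPT.txt` for `${DATANAME}_DREx.txt`, `${DATANAME}_LLaMa.txt`, or `${DATANAME}_LLaMa2.txt` selects a different source of pseudo long documents.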