LLM_enhanced_ST_4ConstParsing

A cross-domain constituency parser with LLM-enhanced self-training, based on the paper "LLM-enhanced Self-training for Constituency Parsing" (EMNLP 2023).

install

Before training the parser, install the packages listed in requirements.txt. You also need to download and compile EVALB. For details about the Berkeley parser, refer to the original work: Constituency Parsing with a Self-Attentive Encoder.
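
A quick way to confirm that EVALB was compiled before starting a long training run is sketched below. It assumes the binary is named evalb and sits in a directory called EVALB under the working directory; both the location and the helper name find_evalb are assumptions for illustration, so adjust them to wherever you built EVALB.

```python
import os
from pathlib import Path


def find_evalb(evalb_dir: str = "EVALB") -> Path:
    """Return the path to a compiled evalb binary, or raise if it is missing.

    The default directory name "EVALB" and the binary name "evalb" are
    assumptions; they are not confirmed by this README.
    """
    binary = Path(evalb_dir) / "evalb"
    if not binary.is_file():
        raise FileNotFoundError(
            f"{binary} not found; download and compile EVALB in {evalb_dir} first"
        )
    if not os.access(binary, os.X_OK):
        raise PermissionError(f"{binary} exists but is not executable")
    return binary
```

Calling find_evalb() once at the top of a launch script fails early with a readable message instead of failing at the first evaluation step.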

training

In our method, LLM generation is performed in each iteration: generate the raw language data first, then run train_LLM.sh. The following parameters need to be modified for each iteration:

    --raw-path: the LLM-generated raw corpus (at most 10,000 sentences)
    --start-iter: the current iteration (e.g. 0)
    --iter-cnt: the iteration at which to stop (e.g. 1; for vanilla self-training, set this to 4 directly)
    --accord: the selection criterion (0: Token, 1: nonTerminal, 2: GRs, 3: confidence, 4: GRsConf)
    --topK: the number of pseudo-trees to select
    --select-record: path where the selected pseudo-trees are saved
    --pretrain-parser: the parser trained in the previous iteration; ignore this parameter at the starting iteration 0
    --tab-score-path: path where the per-domain test scores are saved; to use this, prepare the files listed in test_path_list of main.py
    --pretrained-model: path to the pre-trained model; we use bert-base-uncased and bert-large-uncased
    --model-path-base: path where the trained parser is saved
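
To illustrate how these parameters change from one iteration to the next, here is a minimal sketch that assembles the flag list for a given iteration. The flag names and the --accord codes come from the list above; everything else is an assumption for illustration: the helper name build_args, the output file naming, the default topK of 2000, and the choice of setting --iter-cnt to the current iteration plus one (mirroring the "start 0, stop 1" example).

```python
from typing import List, Optional

# Codes for --accord as listed in this README.
ACCORD = {"Token": 0, "nonTerminal": 1, "GRs": 2, "confidence": 3, "GRsConf": 4}


def build_args(
    iteration: int,
    raw_path: str,
    model_dir: str,
    accord: str = "GRsConf",
    top_k: int = 2000,  # hypothetical value, not taken from the README
    prev_parser: Optional[str] = None,
    pretrained_model: str = "bert-base-uncased",
) -> List[str]:
    """Build the per-iteration flag list (file naming scheme is hypothetical)."""
    args = [
        "--raw-path", raw_path,
        "--start-iter", str(iteration),
        # Assumption: each LLM-enhanced run covers exactly one iteration.
        "--iter-cnt", str(iteration + 1),
        "--accord", str(ACCORD[accord]),
        "--topK", str(top_k),
        "--select-record", f"{model_dir}/selected_iter{iteration}.txt",
        "--pretrained-model", pretrained_model,
        "--model-path-base", f"{model_dir}/parser_iter{iteration}",
    ]
    if iteration > 0:
        # From iteration 1 onward, training continues from the previous parser.
        if prev_parser is None:
            raise ValueError("prev_parser is required when iteration > 0")
        args += ["--pretrain-parser", prev_parser]
    return args


if __name__ == "__main__":
    print(" ".join(build_args(0, "data/raw/iter0.txt", "models")))
```

The resulting list can be pasted into train_LLM.sh or passed to subprocess.run together with whatever entry point the script invokes. Note that --pretrain-parser is omitted at iteration 0, as described above.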

testing

For convenience, training and testing are combined in a single run. You can also run testing separately.
