Gorov/personet_acl23

Personality Understanding of Fictional Characters during Book Reading

requirements

Data Format

English Data

personet_data

Chinese Data

Due to license issues, only the URLs of the Chinese books will be provided. The sentence-level alignment between the English and Chinese books will also be uploaded, so that you can reconstruct the Chinese data yourself.

Training Longformer and MultiRow BERT

pending...

Efficient Training of LLaMA with LoRA

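Train (N x A100 GPUs with 40 GB each; the command below uses N = 8, and the --output_dir value is a placeholder for your own path):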

python -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 finetune.py \
    --micro_batch_size 4 \
    --batch_size 128 \
    --output_dir 'personet_model_save/ddp_8gpus (your own path)' \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_target_modules '[q_proj,k_proj,v_proj,o_proj]' \
    --num_epochs 5 \
    --learning_rate 1e-4 \
    --warmup_steps 170 \
    --cutoff_len 1000 \
    --eval_steps 340
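
The flags above imply some arithmetic that is worth checking before launching a run. The sketch below assumes that finetune.py derives gradient accumulation the way alpaca-lora style scripts commonly do (global batch size divided by micro batch size, then by the number of processes); this is an assumption, so verify it against the script. The LoRA scaling factor lora_alpha / lora_r is the standard LoRA definition. The parameter count uses hidden size 4096 and 32 layers, which are the LLaMA-7B dimensions and are assumed here only for illustration, since the model size is not stated above. The function names are hypothetical.

```python
def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Accumulation steps per process so that the global batch equals batch_size.

    ASSUMPTION: mirrors the convention of alpaca-lora style scripts;
    check finetune.py before relying on it.
    """
    steps = batch_size // micro_batch_size
    if world_size > 1:  # under DDP the accumulation is split across processes
        steps //= world_size
    return max(steps, 1)


def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    """Factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


def lora_trainable_params(r: int, d_in: int, d_out: int,
                          n_modules: int, n_layers: int) -> int:
    """Each adapted linear layer adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out) * n_modules * n_layers


# Values taken from the training command: 8 processes, batch 128, micro batch 4.
steps = grad_accum_steps(batch_size=128, micro_batch_size=4, world_size=8)
print(steps)                # 4
print(4 * steps * 8)        # 128 samples per optimizer step

print(lora_scaling(lora_alpha=16, lora_r=8))  # 2.0

# ASSUMPTION: LLaMA-7B dimensions (hidden size 4096, 32 layers),
# with the four target modules q_proj, k_proj, v_proj, o_proj.
print(lora_trainable_params(r=8, d_in=4096, d_out=4096,
                            n_modules=4, n_layers=32))  # 8388608
```

If you train on fewer GPUs, keeping --batch_size fixed and lowering --nproc_per_node should, under the same assumption, raise the accumulation steps rather than change the global batch.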

Generate on the dev or test data (dev: full_dev_data.json; test: full_test_data.json; 1 x A100 GPU with 40 GB):

python generate_new.py \
    --load_8bit \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_target_modules '[q_proj,k_proj,v_proj,o_proj]' \
    --lora_weights '(your own path)' \
    --eval_data_path 'full_test_data.json'
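
To run generation on both splits, the command can be assembled programmatically. The following is a minimal sketch that only builds and prints the command lines; it uses exactly the flags shown above, and the helper name build_generate_command is hypothetical. The lora_weights argument stays a placeholder for your own path.

```python
import shlex
from typing import List

# File names as given in this README.
EVAL_FILES = {"dev": "full_dev_data.json", "test": "full_test_data.json"}


def build_generate_command(split: str, lora_weights: str) -> List[str]:
    """Return the argv list for generate_new.py on the given split."""
    return [
        "python", "generate_new.py",
        "--load_8bit",
        "--lora_r", "8",
        "--lora_alpha", "16",
        "--lora_target_modules", "[q_proj,k_proj,v_proj,o_proj]",
        "--lora_weights", lora_weights,
        "--eval_data_path", EVAL_FILES[split],
    ]


for split in EVAL_FILES:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(build_generate_command(split, "(your own path)")))
```

Each argv list can be passed to subprocess.run(cmd, check=True) if you prefer to launch the runs from Python.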

Final Words

To obtain a longer history for an instance, match the given context against the original book text from PG19 or Project Gutenberg directly.
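
The matching step can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes you have the book as a plain string and the instance context as a plain string, and that they differ only in whitespace (line wrapping, repeated spaces). The function names and the extra_chars and probe_chars parameters are hypothetical.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not block a match."""
    return re.sub(r"\s+", " ", text).strip()


def longer_history(book_text: str, context: str,
                   extra_chars: int = 2000,
                   probe_chars: int = 200) -> Optional[str]:
    """Return up to extra_chars of book text preceding the context.

    Only the first probe_chars characters of the context are searched for,
    which tolerates small differences later in the context.
    Returns None when the context cannot be located.
    """
    book = normalize(book_text)
    probe = normalize(context)[:probe_chars]
    if not probe:
        return None
    start = book.find(probe)
    if start == -1:
        return None
    return book[max(0, start - extra_chars):start]


book = "Chapter 1.\nIt was a dark night.\nJohn walked   home alone. He was tired."
context = "John walked home\nalone."
print(longer_history(book, context))
```

Exact substring search will fail when the two sources differ in punctuation or edition; in that case a fuzzy matcher such as difflib.SequenceMatcher from the standard library may be a reasonable fallback.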

For any questions, feel free to email us or create an issue, and we will get back to you as soon as possible. We hope this repo is useful for your research.
