Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning
Code for the following EACL2024 paper:
Jeongwoo Park, Enrico Liscio, and Pradeep K. Murukannaiah. 2024. Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning. In Findings of the Association for Computational Linguistics: EACL 2024, St. Julian's, Malta. Association for Computational Linguistics.
The code was tested with Python 3.9 (pip 21.1.1) in a Conda environment.
Please download the MFTC dataset to the `data` folder. The trained models used in the paper are available on TU.ResearchData, or they can be generated as follows.
- Generate the dataset via the `data` folder (see the README in the `data` folder).
- Create SimCSE embeddings (see the SimCSE README; we recommend pulling their repository and following their instructions).
- Check the output using `finetune/classify.py`. Make sure to input appropriate hyperparameters.
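As a quick sanity check on the learned space, sentences expressing related moral content should receive higher cosine similarity than unrelated ones. A minimal sketch with toy vectors standing in for real SimCSE sentence embeddings (the vectors below are illustrative, not model outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for real SimCSE sentence embeddings
care = np.array([0.9, 0.1, 0.0, 0.1])
harm = np.array([0.8, 0.2, 0.1, 0.0])
fairness = np.array([0.0, 0.1, 0.9, 0.2])

# Embeddings of morally related sentences should score higher
print(cosine_similarity(care, harm))
print(cosine_similarity(care, fairness))
```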
Due to environment issues on the HPC cluster, we recommend the following approach.
- Create a conda environment.
- Install pip 21.1.1.
- Follow the README of `transformers`.
- Install `sentencepiece`.
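The steps above can be sketched as follows (the environment name `moral-cse` is an arbitrary assumption):

```shell
# Create and activate a fresh conda environment
conda create -n moral-cse python=3.9 -y
conda activate moral-cse

# Pin pip to the tested version
pip install pip==21.1.1

# Install transformers (per its README) and sentencepiece
pip install transformers
pip install sentencepiece
```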
Details about SimCSE parameters and environment setup can be found at https://github.com/princeton-nlp/SimCSE.
Make sure to run `simcse_to_huggingface.py` after creating the embedding space. For example, train the embedding space as follows.
python train.py \
    --model_name_or_path princeton-nlp/sup-simcse-bert-large-uncased \
    --train_file data/MFTC_supervised.csv \
    --output_dir result/large-lr5e-05-ep2-seq64-batch32-temp0.1 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 32 \
    --learning_rate 5e-05 \
    --max_seq_length 64 \
    --pad_to_max_length \
    --metric_for_best_model stsb_spearman \
    --load_best_model_at_end \
    --pooler_type cls \
    --overwrite_output_dir \
    --temp 0.1 \
    --do_train
Then convert the SimCSE checkpoint to the Hugging Face format.
python simcse_to_huggingface.py --path result/large-lr5e-05-ep2-seq64-batch32-temp0.1
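After conversion, the checkpoint can be loaded like any Hugging Face model. A minimal sketch, assuming the output directory from the commands above and CLS pooling (matching `--pooler_type cls` and `--max_seq_length 64`); the `encode` helper below is hypothetical, not part of this repository:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Path produced by train.py and converted by simcse_to_huggingface.py
CKPT = "result/large-lr5e-05-ep2-seq64-batch32-temp0.1"

def encode(sentences, path=CKPT):
    """Embed a list of sentences with the converted checkpoint (CLS pooling)."""
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModel.from_pretrained(path)
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=64, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # CLS pooling: take the hidden state of the first token
    return out.last_hidden_state[:, 0]
```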