More details about running the code #2

IreneZihuiLi · 2021-09-27T23:46:53Z

Hi,

Thanks for sharing the implementation. I was not able to run finetune_sinkhorn_hepos.sh because I am not sure about the dataset format or some of the parameters (e.g., DATA_DIR and ARCH).
Is it possible to provide more details on running the code?

Thanks.


valgi0 · 2021-11-24T09:54:39Z

Hi there,
I am struggling with the dataset too.
Did you manage to reproduce the results?
Can you kindly share some hints?

Thank you

KMFODA · 2021-12-21T10:55:24Z

Same here. It seems that dict.source.txt is missing from the dataset on Google Drive.

pkuzengqi · 2022-01-25T05:12:18Z

I found that the data needs to be preprocessed first. I am posting my preprocessing shell script here. After preprocessing, the training scripts should be runnable.

#!/bin/sh

# bart tutorial: https://github.com/pytorch/fairseq/blob/v0.10.2/examples/bart/README.summarization.md
# raw data download from: https://drive.google.com/drive/folders/128KyqPTwZ0Si9RV_IX-md2dcHeRTUHkr

FAIRSEQDIR=/path/to/fairseq
GPT2DIR=/path/to/gpt2_bpe
RAWDATA=/path/to/gov_report_fairseq_format
BARTDIR=/path/to/bart.base

# -p: do not fail if the directory already exists
mkdir -p "$GPT2DIR"
cd "$GPT2DIR" || exit 1
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt'


for SPLIT in val test train
do
  for LANG in source target
  do
    python "$FAIRSEQDIR/examples/roberta/multiprocessing_bpe_encoder.py" \
    --encoder-json "$GPT2DIR/encoder.json" \
    --vocab-bpe "$GPT2DIR/vocab.bpe" \
    --inputs "$RAWDATA/$SPLIT.$LANG" \
    --outputs "$RAWDATA/$SPLIT.bpe.$LANG" \
    --workers 60 \
    --keep-empty;
  done
done


fairseq-preprocess \
  --source-lang "source" \
  --target-lang "target" \
  --trainpref "$RAWDATA/train.bpe" \
  --validpref "$RAWDATA/val.bpe" \
  --testpref "$RAWDATA/test.bpe" \
  --destdir "$RAWDATA/processed" \
  --workers 60 \
  --srcdict "$BARTDIR/dict.txt" \
  --tgtdict "$BARTDIR/dict.txt";
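
As a sanity check once the script above has finished, something like the following sketch could confirm that the output directory contains the dictionary files (including the dict.source.txt mentioned earlier in this thread) and the binarized splits. This is only a sketch: the helper name check_processed is hypothetical and not part of the repository, and the file names assume fairseq's usual output naming for --source-lang source and --target-lang target, where the validation split is written as "valid".

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): list files that
# fairseq-preprocess is expected to write but that are absent from a directory.
# Assumes the language pair "source"/"target" used in the script above.
check_processed() {
  dir=$1
  missing=0
  # Dictionaries copied into the destination directory
  for f in dict.source.txt dict.target.txt; do
    [ -f "$dir/$f" ] || { echo "missing: $dir/$f"; missing=1; }
  done
  # Binarized data and index files for each split and side
  for split in train valid test; do
    for lang in source target; do
      for ext in bin idx; do
        f="$split.source-target.$lang.$ext"
        [ -f "$dir/$f" ] || { echo "missing: $dir/$f"; missing=1; }
      done
    done
  done
  return $missing
}
```

It could be called as check_processed "$RAWDATA/processed" && echo "looks complete"; a non-zero exit status means at least one expected file was not found.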
