Adding Word-alignment module for the back-translation task #26

farinamhz · 2023-02-21T02:11:27Z

In this issue, we will provide a module that takes two datasets of the same length as input (each dataset is a list of texts) and returns a list of alignments as output, one alignment for each pair of texts at the same position in the two datasets.

The alignment for a given pair of texts is a list of tuples. Each tuple contains the index of a token in the first text and the index of the token in the second text that aligns with it.
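To make the input/output contract above concrete, here is a minimal sketch. It is not the module from this issue: the names `align_pair` and `align_datasets` are hypothetical, tokenization is plain whitespace splitting, and the aligner is a toy exact-match stand-in for whatever alignment model is actually used. Only the shape of the inputs and outputs, and the same-length assertion, follow the description.

```python
from typing import List, Tuple

# One alignment = list of (index in text 1, index in text 2) pairs.
Alignment = List[Tuple[int, int]]


def align_pair(text1: str, text2: str) -> Alignment:
    """Toy aligner (assumption): pair each token of text1 with the first
    unused identical token of text2, ignoring case."""
    tokens1 = text1.lower().split()
    tokens2 = text2.lower().split()
    used = set()  # indices of text2 tokens that are already aligned
    pairs: Alignment = []
    for i, tok in enumerate(tokens1):
        for j, other in enumerate(tokens2):
            if j not in used and tok == other:
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


def align_datasets(dataset1: List[str], dataset2: List[str]) -> List[Alignment]:
    """Return one alignment per pair of texts at the same position."""
    assert len(dataset1) == len(dataset2), "inputs must have the same length"
    return [align_pair(a, b) for a, b in zip(dataset1, dataset2)]


if __name__ == "__main__":
    originals = ["the food was great"]
    back_translated = ["the meal was great"]
    print(align_datasets(originals, back_translated))
```

In this toy version, "food" has no identical counterpart in the second text, so no tuple is produced for index 1. A real aligner based on embeddings would be expected to pair it with "meal".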

farinamhz · 2023-02-21T03:09:24Z

Hi @hosseinfani
Please take a look at the results of word alignment.

You can see early results in this commit: link

If this module looks suitable, I will start using the word-alignment and back-translation modules for the data augmentation task, and I will work on the aspect semantic comparison to see which of the augmented reviews are worth adding to the dataset.

hosseinfani · 2023-02-21T13:35:56Z

@farinamhz
I had a look at both the translation and alignment modules. They look good. Please proceed with the next step, which is the semantic check, right?

farinamhz · 2023-02-22T07:44:16Z

@hosseinfani
Yes, all the related work is tracked under this issue: #27

hosseinfani · 2023-05-21T20:24:00Z

@farinamhz
Thanks for the effort on this. I did some refactoring and integrated your code into the Review class in review.py:

LADy/src/cmn/review.py

Line 83 in eeeb48b

def semalign(self, other):

Also, right after translation and backtranslation, I do the alignment on aos:

LADy/src/cmn/review.py

Line 66 in eeeb48b

translated_obj.aos, _ = self.semalign(translated_obj)

LADy/src/cmn/review.py

Line 70 in eeeb48b

back_translated_obj.aos, _ = self.semalign(back_translated_obj)

Please check these lines and let me know your comments.
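For readers following along, the sketch below illustrates one way token-level alignments could be used to move (a,o,s) annotations from an original review onto its translated or back-translated version. It is not the `semalign` implementation, which is only referenced above by its signature and call sites. The function name `project_aos` and the assumed layout of `aos` (a list of triples holding aspect token indices, opinion token indices, and a sentiment label) are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: (aspect token indices, opinion token indices, sentiment label)
AOS = Tuple[List[int], List[int], str]


def project_aos(aos: List[AOS], alignment: List[Tuple[int, int]]) -> List[AOS]:
    """Map aspect/opinion token indices through (source, target) alignment pairs.

    Source indices without an aligned target are dropped; if a source index
    appears in several pairs, the first target is kept.
    """
    mapping: Dict[int, int] = {}
    for src, tgt in alignment:
        mapping.setdefault(src, tgt)

    projected: List[AOS] = []
    for aspect, opinion, sentiment in aos:
        new_aspect = [mapping[i] for i in aspect if i in mapping]
        new_opinion = [mapping[i] for i in opinion if i in mapping]
        projected.append((new_aspect, new_opinion, sentiment))
    return projected


if __name__ == "__main__":
    # Token 1 is the aspect and token 3 the opinion in the original review.
    aos = [([1], [3], "+1")]
    # The back-translated text swaps the positions of tokens 1 and 2.
    alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]
    print(project_aos(aos, alignment))
```

Dropping unaligned indices is a design choice made for this sketch; an actual pipeline might instead discard the whole augmented review when its aspect cannot be aligned.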

farinamhz added a commit that referenced this issue Feb 21, 2023

Word-alignment function added with the early results (#26)

e6b1ef0

farinamhz added a commit that referenced this issue Feb 21, 2023

Clean code on Word-alignment function (#26)

6c0ef07

farinamhz self-assigned this Feb 21, 2023

farinamhz added the enhancement (New feature or request) label Feb 21, 2023

farinamhz added a commit that referenced this issue Feb 21, 2023

renaming the directory of data augmentation modules (#24) (#26)

903ef22

farinamhz mentioned this issue Feb 21, 2023

LADy's Roadmap for Extension (Epic) #23

Open

32 tasks

farinamhz added a commit that referenced this issue Feb 22, 2023

assertion on the same length of inputs (#26) (#27)

17fbaaa

farinamhz added a commit that referenced this issue Mar 10, 2023

Similarity plots for different languages created and results added (#26) (#27)

17b1afa

farinamhz added a commit that referenced this issue Mar 11, 2023

Changed results of sample alignments due to changing the tokenization method (#26) (#29)

aa3e9cf

hosseinfani added a commit that referenced this issue May 21, 2023

word alignment after backtranslation to fix the (a,o,s) of the backtranslated version. Also, we do for translated version just in case (#26)

eeeb48b

hosseinfani added a commit that referenced this issue May 21, 2023

word alignment after backtranslation to fix the (a,o,s) of the backtranslated version. Also, we do for translated version just in case (#26)

fc5d49e

hosseinfani closed this as completed May 21, 2023
