Adding Word-alignment module for the back-translation task #26

Closed
farinamhz opened this issue Feb 21, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

@farinamhz
Member

In this issue, we will provide a module that takes two datasets of the same length (each a list of texts) as input and returns a list of alignments as output, one alignment for each pair of corresponding texts.

For a given pair of texts, the alignment is a list of tuples; each tuple contains the index of a token in the first text and the index of the token in the second text that aligns with it.
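
To make the expected input and output shapes concrete, here is a minimal sketch in Python. It is not the aligner used in this project: `align_tokens` and `align_datasets` are hypothetical names, and the matching rule (pair each token with the first unused identical token, ignoring case) is a toy stand-in for a real word-alignment model. Tokenization by whitespace is also an assumption.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]


def align_tokens(first: List[str], second: List[str]) -> Alignment:
    """Toy aligner (assumption, not the project's method): pair each token of
    `first` with the first not-yet-used identical token of `second`."""
    pairs: Alignment = []
    used = set()  # indices in `second` that are already aligned
    for i, tok_a in enumerate(first):
        for j, tok_b in enumerate(second):
            if j not in used and tok_a.lower() == tok_b.lower():
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


def align_datasets(first: List[str], second: List[str]) -> List[Alignment]:
    """Return one alignment (list of index tuples) per pair of texts."""
    if len(first) != len(second):
        raise ValueError("both datasets must have the same length")
    # Whitespace tokenization is assumed here for illustration only.
    return [align_tokens(a.split(), b.split()) for a, b in zip(first, second)]
```

For example, `align_datasets(["the food was great"], ["the meal was great"])` returns `[[(0, 0), (2, 2), (3, 3)]]`: "food" has no identical counterpart, so the toy aligner leaves it unaligned.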

@farinamhz
Member Author

farinamhz commented Feb 21, 2023

Hi @hosseinfani
Please take a look at the results of word alignment.

You can see early results in this commit: link

If this module is suitable, I will start using the word-alignment and back-translation modules for the data augmentation task, and then work on comparing aspect semantics to decide which of the augmented reviews are worth adding to the dataset.

@hosseinfani
Member

@farinamhz
I had a look at both the translation and alignment modules. They look good. Please proceed with the next step, which is the semantic check, right?

@farinamhz
Member Author

farinamhz commented Feb 22, 2023

@hosseinfani
Yes, all the related work is tracked under this issue: #27

hosseinfani added a commit that referenced this issue May 21, 2023
…anslated version. Also, we do for translated version just in case (#26)
hosseinfani added a commit that referenced this issue May 21, 2023
…anslated version. Also, we do for translated version just in case (#26)

@hosseinfani
Member

@farinamhz
Thanks for the effort on this. I did some refactoring and integrated your code into the Review class in review.py:

def semalign(self, other):

Also, right after translation and back-translation, I run the alignment on aos:

translated_obj.aos, _ = self.semalign(translated_obj)

back_translated_obj.aos, _ = self.semalign(back_translated_obj)

Please check these lines and let me know your comments.
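
For readers following along, the idea of carrying annotations across an alignment can be sketched as below. This is only an illustration, not the code in review.py: it assumes `aos` is a list of (aspect token indices, opinion token indices, sentiment) triples, and `project_indices` and `project_aos` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Alignment = List[Tuple[int, int]]
AOS = Tuple[List[int], List[int], str]  # assumed layout of one aos entry


def project_indices(indices: List[int], alignment: Alignment) -> List[int]:
    """Map token indices of the original text to the aligned text.
    Indices with no aligned counterpart are dropped."""
    mapping: Dict[int, int] = {}
    for i, j in alignment:
        mapping.setdefault(i, j)  # keep the first alignment for each source index
    return [mapping[i] for i in indices if i in mapping]


def project_aos(aos: List[AOS], alignment: Alignment) -> List[AOS]:
    """Re-index aspect and opinion positions; the sentiment label is unchanged."""
    return [
        (project_indices(aspect, alignment), project_indices(opinion, alignment), sentiment)
        for aspect, opinion, sentiment in aos
    ]
```

With the alignment `[(0, 0), (1, 2), (2, 1), (3, 3)]`, an entry `([1], [3], "+1")` becomes `([2], [3], "+1")`, since the aspect token moved from position 1 to position 2 in the translated text.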
