[CodeCamp2023-473] Add a text translation example #1283

Merged
merged 2 commits into from
Aug 7, 2023

Conversation

Desjajja
Contributor

@Desjajja Desjajja commented Aug 1, 2023

Motivation

This is one of the OpenMMLab CodeCamp tasks.

Modification

Added an example script that trains a t5-small model on opus books, a multi-language text translation dataset. The default task is translating English text to French; this is configurable at lines 67 to 69 and line 92 of the script.
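For readers unfamiliar with the setup, here is a minimal sketch of what changing the language pair typically involves. It assumes opus books records are shaped like `{"translation": {"en": ..., "fr": ...}}` and that the T5 input carries a task prefix such as `translate English to French: `. The names `SOURCE_LANG`, `TARGET_LANG`, `PREFIX` and `build_pairs` are hypothetical and are not taken from the script in this PR.

```python
# Hypothetical settings for illustration; the script in this PR defines its own.
SOURCE_LANG = "en"
TARGET_LANG = "fr"
PREFIX = "translate English to French: "  # assumed T5-style task prefix


def build_pairs(records, source_lang=SOURCE_LANG, target_lang=TARGET_LANG,
                prefix=PREFIX):
    """Turn opus-books-style records into (model inputs, target texts)."""
    # Each record is assumed to hold both languages under the "translation" key.
    inputs = [prefix + r["translation"][source_lang] for r in records]
    targets = [r["translation"][target_lang] for r in records]
    return inputs, targets


records = [
    {"translation": {"en": "The house is small.",
                     "fr": "La maison est petite."}},
]
inputs, targets = build_pairs(records)
print(inputs[0])
print(targets[0])
```

Switching to another pair would then mean changing the two language codes and the prefix together, so that the prompt still matches the data being fed to the model.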

Checklist

  • Pre-commit or other linting tools are used to fix the potential lint issues.

No other code is altered, hence I didn't go through unit tests or add docs.

@CLAassistant

CLAassistant commented Aug 1, 2023

CLA assistant check
All committers have signed the CLA.

@Desjajja Desjajja changed the title [Enhancement] Add a text translation example [Enhance] Add a text translation example Aug 1, 2023
@zhouzaida zhouzaida changed the title [Enhance] Add a text translation example [CodeCamp2023-473] Add a text translation example Aug 1, 2023
@OpenMMLab-Assistant-004

Hi @Desjajja,

We'd like to express our appreciation for your valuable contributions to the mmengine. Your efforts have significantly aided in enhancing the project's quality.
It is our pleasure to invite you to join our community through the Discord Special Interest Group (SIG) channel. This is a great place to share your experiences, discuss ideas, and connect with other like-minded people. To become a part of the SIG channel, send a message to the moderator, OpenMMLab, briefly introduce yourself, and mention your open-source contributions in the #introductions channel. Our team will gladly facilitate your entry. We eagerly await your presence. Please follow this link to join us: https://discord.gg/UjgXkPWNqA.

If you're on WeChat, we'd also love for you to join our community there. Just add our assistant using the WeChat ID: openmmlabwx. When sending the friend request, remember to include the remark "mmsig + Github ID".

Thanks again for your awesome contribution, and we're excited to have you as part of our community!

@zhouzaida
Member

Hi @Desjajja , thanks for your contribution. Could you provide the training logs?

@Desjajja
Contributor Author

Desjajja commented Aug 3, 2023

> Hi @Desjajja , thanks for your contribution. Could you provide the training logs?

Sure.

20230731_081824.log

@zhouzaida
Member

Hi, does the bleu_score: 0.0486 in the log match the expected result?

@Desjajja
Contributor Author

Desjajja commented Aug 4, 2023

> Hi, does the bleu_score: 0.0486 in the log match the expected result?

No, I expected it to be at least 0.4 or so. However, the BLEU score is intended as a corpus-level metric, and I computed it between sentence pairs with a single reference each, so a low score like this is not surprising.

decoded_preds = [pred.split() for pred in preds]  # each translation split into a sequence of tokens
decoded_labels = [[label.split()] for label in labels]  # reference corpus with only one reference sentence per prediction
score = bleu_score(decoded_preds, decoded_labels)  # hence the score can be quite low

The decrease in loss may be a more promising indicator. I am also planning to add sentence-level BLEU if available.
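To make the corpus-level versus sentence-level distinction concrete, here is a minimal pure-Python sketch of unsmoothed BLEU, assuming whitespace tokenization and a single reference per prediction as in the snippet above. The helpers `ngram_counts` and `bleu` are hypothetical and written only for illustration; they are not the `bleu_score` function used in the example script.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidates, references, max_n=4):
    """Unsmoothed BLEU with one reference per candidate.

    Clipped n-gram matches and lengths are pooled over all sentence
    pairs before the precisions are combined (corpus-level BLEU).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngram_counts(cand, n)
            ref_counts = ngram_counts(ref, n)
            # Counter intersection clips each count to the reference count.
            matches[n - 1] += sum((cand_counts & ref_counts).values())
            totals[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


preds = ["the cat sat on the mat", "he reads a book"]
labels = ["the cat sat on the mat", "he reads the book"]
decoded_preds = [p.split() for p in preds]
decoded_labels = [l.split() for l in labels]

# Corpus level: statistics are pooled across both sentence pairs.
corpus_score = bleu(decoded_preds, decoded_labels)

# Sentence level: each pair is scored alone, then the scores are averaged.
sentence_scores = [bleu([p], [l]) for p, l in zip(decoded_preds, decoded_labels)]
average_sentence_score = sum(sentence_scores) / len(sentence_scores)

print(f"corpus-level BLEU:        {corpus_score:.4f}")
print(f"mean sentence-level BLEU: {average_sentence_score:.4f}")
```

In this toy example the second pair shares no trigram with its reference, so its sentence-level score is exactly zero, while the pooled corpus-level score stays well above zero. A sentence-level variant would usually need some form of smoothing to avoid such zeros.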

@zhouzaida
Member

Got it.

@codecov

codecov bot commented Aug 7, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@d480df7).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1283   +/-   ##
=======================================
  Coverage        ?   71.38%           
=======================================
  Files           ?      152           
  Lines           ?    13673           
  Branches        ?     2842           
=======================================
  Hits            ?     9761           
  Misses          ?     3464           
  Partials        ?      448           
Flag Coverage Δ
unittests 71.38% <0.00%> (?)


@zhouzaida zhouzaida merged commit 398d229 into open-mmlab:main Aug 7, 2023
16 checks passed