[CodeCamp2023-473] Add a text translation example #1283

Desjajja · 2023-08-01T06:56:21Z

Motivation

This is one of the OpenMMLab CodeCamp tasks.

Modification

Added an example script that trains a t5-small model on opus books, a multi-language text translation dataset. The default task is English-to-French translation, which is configurable on lines 67 to 69 and line 92.

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.

No other code is altered, hence I didn't go through unit tests or add docs.

CLAassistant · 2023-08-01T06:56:26Z

All committers have signed the CLA.

OpenMMLab-Assistant-004 · 2023-08-02T07:02:27Z

Hi @Desjajja,

We'd like to express our appreciation for your valuable contributions to mmengine. Your efforts have significantly aided in enhancing the project's quality.
It is our pleasure to invite you to join our community through the Discord Special Interest Group (SIG) channel. This is a great place to share your experiences, discuss ideas, and connect with other like-minded people. To become a part of the SIG channel, send a message to the moderator, OpenMMLab, briefly introduce yourself and mention your open-source contributions in the #introductions channel. Our team will gladly facilitate your entry. We eagerly await your presence. Please follow this link to join us: https://discord.gg/UjgXkPWNqA.

If you're on WeChat, we'd also love for you to join our community there. Just add our assistant using the WeChat ID: openmmlabwx. When sending the friend request, remember to include the remark "mmsig + Github ID".

Thanks again for your awesome contribution, and we're excited to have you as part of our community!

zhouzaida · 2023-08-02T11:00:36Z

Hi @Desjajja, thanks for your contribution. Could you provide the training logs?

Desjajja · 2023-08-03T02:49:45Z

> Hi @Desjajja, thanks for your contribution. Could you provide the training logs?

Sure.

20230731_081824.log

zhouzaida · 2023-08-03T12:53:41Z

Hi, does the bleu_score: 0.0486 in the log match the expected result?

Desjajja · 2023-08-04T04:42:08Z

> Hi, does the bleu_score: 0.0486 in the log match the expected result?

No, I expected it to be at least 0.4 or so. However, BLEU is intended as a corpus-level metric, and I computed it between sentence pairs, so a low score like this is not surprising.

decoded_preds = [pred.split() for pred in preds]  # each translation split into a sequence of tokens
decoded_labels = [[label.split()] for label in labels]  # reference corpus with only one sentence per translation
score = bleu_score(decoded_preds, decoded_labels)  # hence the score can be quite low

The decrease in loss is perhaps a more promising indicator. I am also planning to add a sentence-level BLEU metric if one is available.
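
The effect described above can be illustrated with a minimal, self-contained sketch. It assumes a plain unsmoothed BLEU with one reference per translation. The `corpus_bleu` helper and the toy sentences are hypothetical: they are not taken from the example script or from the `bleu_score` function it calls, so the sketch does not reproduce the 0.0486 in the log. It only shows how scoring sentence by sentence can come out far lower than scoring the pooled corpus, because a single n-gram order with no match zeroes that sentence's score.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Unsmoothed BLEU, one reference per candidate, statistics pooled over all pairs.

    Hypothetical helper written for illustration only.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_ngrams = ngram_counts(cand, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clipped counts: an n-gram is credited at most as often as the reference contains it.
            matches[n - 1] += sum((cand_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(cand_ngrams.values())
    # Without smoothing, one n-gram order with zero matches makes the whole score zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Toy data (made up for this sketch), already split into tokens.
preds = [s.split() for s in ["the cat sat on the mat", "he reads a book", "thank you"]]
labels = [s.split() for s in ["the cat sat on the mat", "he is reading a book", "thank you"]]

# Sentence-level: score each pair on its own, then average.
sentence_scores = [corpus_bleu([p], [r]) for p, r in zip(preds, labels)]
mean_sentence_score = sum(sentence_scores) / len(sentence_scores)

# Corpus-level: pool n-gram statistics over all pairs before computing precisions.
corpus_score = corpus_bleu(preds, labels)

print("per-sentence scores:", sentence_scores)
print("mean of sentence scores:", mean_sentence_score)
print("corpus-level score:", corpus_score)
```

In this toy case the second pair has no matching trigram and the third is too short to contain any 4-gram, so both score zero on their own, while the pooled corpus score stays well above the sentence-level average. A smoothed sentence-level BLEU would avoid the hard zeros.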

zhouzaida · 2023-08-04T09:46:33Z

Got it.

codecov · 2023-08-07T07:29:20Z

Codecov Report

❗ No coverage uploaded for pull request base (main@d480df7).
Patch has no changes to coverable lines.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1283   +/-   ##
=======================================
  Coverage        ?   71.38%           
=======================================
  Files           ?      152           
  Lines           ?    13673           
  Branches        ?     2842           
=======================================
  Hits            ?     9761           
  Misses          ?     3464           
  Partials        ?      448

Flag	Coverage Δ
unittests	`71.38% <0.00%> (?)`


Desjajja added 2 commits August 1, 2023 14:39

Code formatted

440eb78

corrected the name of dataset

e88b724

Desjajja requested review from zhouzaida and HAOCHENYE as code owners August 1, 2023 06:56

Desjajja changed the title ~~[Enhancement] Add a text translation example~~ [Enhance] Add a text translation example Aug 1, 2023

zhouzaida changed the title ~~[Enhance] Add a text translation example~~ [CodeCamp2023-473] Add a text translation example Aug 1, 2023

zhouzaida approved these changes Aug 7, 2023


zhouzaida merged commit 398d229 into open-mmlab:main Aug 7, 2023
16 checks passed
