Fix marian tokenizer save pretrained #5043
Codecov Report
@@           Coverage Diff           @@
##           master    #5043   +/-   ##
=======================================
  Coverage   77.36%   77.37%
=======================================
  Files         130      130
  Lines       21989    21990    +1
=======================================
+ Hits        17012    17014    +2
+ Misses       4977     4976    -1
    def test_tokenizer_equivalence_en_de(self):
        en_de_tokenizer = MarianTokenizer.from_pretrained(f"{ORG_NAME}opus-mt-en-de")
        batch = en_de_tokenizer.prepare_translation_batch(["I am a small frog"], return_tensors=None)
        self.assertIsInstance(batch, BatchEncoding)
        expected = [38, 121, 14, 697, 38848, 0]
        self.assertListEqual(expected, batch.input_ids[0])

        save_dir = tempfile.mkdtemp()
Nit, I guess, but I like the context manager approach better:
with tempfile.TemporaryDirectory() as tmp_dir:
    ...
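For illustration, here is a minimal standalone sketch of the difference this suggestion makes, using only the standard library. The helper name `save_and_list` and the file name `vocab.txt` are hypothetical and are not taken from the PR; the real test would save the tokenizer into the directory instead.

```python
import os
import tempfile


def save_and_list(payload):
    """Write payload into a temporary directory; return (directory path, file names)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name, standing in for whatever the test saves.
        path = os.path.join(tmp_dir, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(payload)
        names = sorted(os.listdir(tmp_dir))
    # Leaving the with-block removes the directory and everything in it.
    return tmp_dir, names


tmp_dir, names = save_and_list("hello")
print(names, os.path.exists(tmp_dir))
```

With `tempfile.mkdtemp()`, by contrast, the directory stays on disk until the caller deletes it explicitly (for example with `shutil.rmtree`), which is easy to forget in a test.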
@@ -60,10 +60,15 @@ def get_input_output_texts(self, tokenizer):
            "This is a test",
        )

    @slow
    def test_tokenizer_equivalence_en_de(self):
        en_de_tokenizer = MarianTokenizer.from_pretrained(f"{ORG_NAME}opus-mt-en-de")
        batch = en_de_tokenizer.prepare_translation_batch(["I am a small frog"], return_tensors=None)
        self.assertIsInstance(batch, BatchEncoding)
        expected = [38, 121, 14, 697, 38848, 0]
It would be nice to write the expected result as a comment for better readability.