Open Source MLM Implementation in Fairseq #635

Closed


kartikayk
Contributor

Summary: Adding a task and relevant models, datasets and criteria needed for training Cross-lingual Language Models similar to Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291).

Differential Revision: D14943776

Summary:
Pull Request resolved: facebookresearch#635

Adding a task and relevant models, datasets and criteria needed for training Cross-lingual Language Models similar to Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291).

Reviewed By: liezl200

Differential Revision: D14943776

fbshipit-source-id: 9835d82e9741c2ff9091f24cdbe4bb4be654c5a5
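For readers unfamiliar with the objective, here is a minimal sketch of the masking scheme described in the XLM paper, which follows BERT: about 15% of tokens are selected, and of those 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged. This is not the code added by this PR. The names `mask_tokens`, `mask_id`, and `vocab_size` are illustrative assumptions, and the sketch ignores details such as excluding special tokens from random replacement.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, rng, mask_prob=0.15):
    """Return (inputs, targets) for one sequence under BERT/XLM-style masking.

    targets[i] holds the original token at positions selected for prediction
    and None everywhere else (no loss is computed at those positions).
    """
    inputs = list(token_ids)
    targets = [None] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    tokens = list(range(10, 30))
    print(mask_tokens(tokens, mask_id=3, vocab_size=1000, rng=rng))
```

Passing the random generator in explicitly keeps the function easy to test and makes runs reproducible.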
@facebook-github-bot
Contributor

This pull request has been merged in 8776928.

@hanyh

hanyh commented Apr 17, 2019

@kartikayk After this PR, I get the error below. It seems some files are missing from it?

from fairseq.data.masked_lm_dataset import MaskedLMDataset
fairseq/data/masked_lm_dataset.py", line 18, in
from fairseq.data.fb_block_pair_dataset import BlockPairDataset
ModuleNotFoundError: No module named 'fairseq.data.fb_block_pair_dataset'
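
For anyone hitting the same traceback, a quick way to confirm which modules are actually absent from an installed checkout is to ask the import system directly. The sketch below uses only the standard library's `importlib.util.find_spec`; the helper name `find_missing_modules` is hypothetical, and the module names in the demo are the ones from the traceback above.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the subset of module_names that the import system cannot locate."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # For dotted names, find_spec imports the parent package first.
            # If that import fails, the name is reported as missing too,
            # even though the real problem may be in the parent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(find_missing_modules([
        "fairseq.data.masked_lm_dataset",
        "fairseq.data.fb_block_pair_dataset",
    ]))
```

Note that this only reports whether a module can be located; it does not explain why a file was left out of the open-source export.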

@kartikayk
Contributor Author

@hanyh I'm working on fixing this right now. Will send out an update soon. Sorry for the inconvenience!

@stefan-it
Contributor

@kartikayk Thanks for that implementation :+1: I have one question: could you also provide an example that shows how to a) load a trained model and b) get the embeddings for each subtoken of a given sentence from that model? That would really help me :)

yzpang pushed a commit to yzpang/gold-off-policy-text-gen-iclr21 that referenced this pull request Feb 19, 2021
Summary:
Pull Request resolved: facebookresearch/fairseq#635

Adding a task and relevant models, datasets and criteria needed for training Cross-lingual Language Models similar to Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291).

Reviewed By: liezl200

Differential Revision: D14943776

fbshipit-source-id: 3e416a730303d1dd4f5b92550c78db989be27073
yfyeung pushed a commit to yfyeung/fairseq that referenced this pull request Dec 6, 2023
Add the missing step to add the arguments to the parser.