Raw source for fairseq2n code #370

Open
natgillin opened this issue Mar 6, 2024 · 4 comments
Labels
question Further information is requested

Comments

@natgillin

natgillin commented Mar 6, 2024

There are a few places in the codebase that import from fairseq2n. Is there a pointer to the raw source for those modules?

E.g.

  • from fairseq2n.bindings.data.text.sentence import ...
  • from fairseq2n import DOC_MODE
  • from fairseq2n.bindings.data.text.text_reader import ...
@natgillin natgillin added the question Further information is requested label Mar 6, 2024
@cbalioglu
Contributor

Hey @natgillin, you can find the fairseq2n (fairseq2 Native) source code under https://github.com/facebookresearch/fairseq2/tree/main/native.

@natgillin
Author

Thanks @cbalioglu for the pointer!

Regarding the fairseq2n.bindings, is there a way to avoid them? I see that they are mainly used in https://github.com/facebookresearch/fairseq2/blob/main/src/fairseq2/data/text/sentencepiece.py

Is there a way to load directly from https://github.com/google/sentencepiece?tab=readme-ov-file#overview instead of

load_basic_sentencepiece_tokenizer = StandardTextTokenizerLoader(
    default_asset_store,
    default_download_manager,
    lambda path, _: BasicSentencePieceTokenizer(path),
)

from https://github.com/facebookresearch/fairseq2/blob/main/src/fairseq2/data/text/sentencepiece.py#L225C1-L229C2?

@cbalioglu
Contributor

Hey @natgillin, I might be more helpful if you can tell me what you want to achieve. The sentencepiece implementation in fairseq2 is fully compatible with Google's sentencepiece. In fact, in fairseq2n we use the native API of sentencepiece.
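Since the model files are compatible, one way to experiment without fairseq2n is to load the same model file with Google's `sentencepiece` Python package. The following is only a minimal sketch under stated assumptions: it assumes the `sentencepiece` package from PyPI is installed, and `DirectSentencePieceTokenizer` and `load_processor` are hypothetical names that are not part of fairseq2. It returns plain Python lists rather than tensors and does not add BOS/EOS or other control symbols, so it is not a drop-in replacement for `BasicSentencePieceTokenizer`.

```python
from typing import List


def load_processor(model_path: str):
    """Load a SentencePiece model file with Google's Python package."""
    # Imported lazily so the wrapper below can be used (and tested)
    # even where the sentencepiece package is not installed.
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=model_path)


class DirectSentencePieceTokenizer:
    """Hypothetical thin wrapper; NOT a fairseq2 class."""

    def __init__(self, processor) -> None:
        # `processor` is expected to behave like
        # sentencepiece.SentencePieceProcessor (encode/decode methods).
        self._sp = processor

    def encode(self, text: str) -> List[int]:
        # out_type=int asks sentencepiece for token ids instead of pieces.
        return self._sp.encode(text, out_type=int)

    def decode(self, ids: List[int]) -> str:
        return self._sp.decode(ids)


# Usage (the path is a placeholder, not a real asset):
# tokenizer = DirectSentencePieceTokenizer(load_processor("/path/to/model_file"))
# ids = tokenizer.encode("Hello world")
# text = tokenizer.decode(ids)
```

The processor is passed in rather than created inside the class so the wrapper can be swapped or tested independently of the native library.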

@natgillin
Author

natgillin commented Mar 15, 2024

Thanks for the explanation!

We're trying to merge as much code as possible from fairseq2 into our own fairseq fork, since we're not sure whether that will eventually happen in the public repository. We have gotten some of the decoder-only models working in fairseq by copying code blocks from fairseq2, but we found that parts of fairseq2.data.text depend on fairseq2n.

Removing the fairseq2n dependency would essentially allow us to backport some fairseq2 features and model support to fairseq.
