This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Tokenization issue in to-En bilingual dictionaries #182

Open
kellymarchisio opened this issue Jan 20, 2021 · 0 comments
kellymarchisio commented Jan 20, 2021

Hi all, FYI: there appears to be a tokenization issue in the *-to-En bilingual dictionaries. We commonly see entries like "word," where the trailing comma wasn't tokenized away. I see this in de-en, fi-en, it-en, and ru-en, at least.
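
For anyone who wants to check or work around this locally, here is a minimal sketch. It assumes each dictionary line holds one source word and one English word separated by whitespace; the function name `strip_trailing_commas` and the sample entries are made up for illustration and are not part of the repository.

```python
def strip_trailing_commas(lines):
    """Return (cleaned_pairs, flagged_pairs) for whitespace-separated dictionary lines.

    Assumption: each line is "<source_word> <english_word>".
    """
    cleaned = {}   # dict keeps insertion order and drops duplicate pairs
    flagged = []   # original pairs whose English side ended in a comma
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        src, tgt = parts
        fixed = tgt.rstrip(",")
        if fixed != tgt:
            flagged.append((src, tgt))
        if fixed:  # drop entries that were nothing but commas
            cleaned[(src, fixed)] = None
    return list(cleaned), flagged


# Hypothetical sample entries, not taken from the real dictionaries.
sample = ["hund dog", "hund dog,", "katze cat,", "haus house"]
pairs, flagged = strip_trailing_commas(sample)
```

Deduplicating after the strip matters because a pair may appear both with and without the comma, and the cleaned pairs would otherwise be counted twice.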

kellymarchisio changed the title from "Small tokenization issue in to-En bilingual dictionaries" to "Tokenization issue in to-En bilingual dictionaries" on Jan 20, 2021