Punjabi Gurmukhi title marked as German #11

Open
bgo-eiu opened this issue Dec 25, 2022 · 3 comments
Labels: bug (Something isn't working), language (Issues related to language detection)

Comments

@bgo-eiu

bgo-eiu commented Dec 25, 2022

Article added at: https://www.wikidata.org/wiki/Q115863887

The language code automatically selected for the monolingual text title was German. Granted, this may be a problem with the way the source data is marked up.

rdmpage added the bug and language labels on Dec 26, 2022
@rdmpage
Owner

rdmpage commented Dec 26, 2022

@bgo-eiu Language detection is not handled well; I need to do some work on this to improve its accuracy.
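A minimal sketch of one cheap improvement for this particular case, assuming the title is written in Gurmukhi script: check the Unicode script before running statistical detection, since Gurmukhi (block U+0A00 to U+0A7F) is used almost exclusively for Punjabi. The function name, the `SCRIPT_RANGES` table, and the 0.5 threshold are hypothetical and not part of the tool.

```python
import unicodedata
from typing import Optional

# Hypothetical table of (first code point, last code point, language code).
# Only scripts used (almost) exclusively by one language belong here.
SCRIPT_RANGES = [
    (0x0A00, 0x0A7F, "pa"),  # Gurmukhi block -> Punjabi
]


def guess_language_from_script(text: str, threshold: float = 0.5) -> Optional[str]:
    """Return a language code if most letters fall in one listed script block."""
    # Count letters (L*) and combining marks (M*); Gurmukhi vowel signs are marks.
    letters = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not letters:
        return None
    for start, end, code in SCRIPT_RANGES:
        in_block = sum(1 for ch in letters if start <= ord(ch) <= end)
        if in_block / len(letters) >= threshold:
            return code
    # No decisive script: leave the decision to the statistical detector.
    return None
```

Only when this returns `None` would the existing detector be consulted. It would not help with the Brahui case raised later in this thread, because Brahui is written in a Perso-Arabic script shared with other languages.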

@bgo-eiu
Author

bgo-eiu commented Feb 8, 2023

@rdmpage Would it be possible for the tool to have a few hard-coded language-to-journal matches for journals where accurate detection is unlikely? For example, I am interested in creating items for quite a few articles published by this Brahui-language journal, such as https://doi.org/10.54781/abz.v7i1.155

At the moment, the title is detected as Arabic, and the label is placed in "en" rather than "brh" because brh is not in the list. At least for cases like this, where the number of Brahui research journals is likely quite small to begin with, it might be simpler to associate this journal's DOI prefix with the language code. If you wanted to implement this systematically, it could be done with a query for existing items that have both a DOI and a value for P407 "language of work or name" that is not among the most frequently published languages.
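A minimal sketch of the override idea, assuming the title language is chosen in one function that can consult a DOI-prefix table before falling back to detection. `JOURNAL_LANGUAGE_OVERRIDES`, `normalize_doi`, `pick_title_language`, and the `detect` callback are hypothetical; the prefix `10.54781/abz.` is inferred from the example DOI above and would need checking against the journal's other DOIs.

```python
from typing import Callable, Optional

# Hypothetical override table: DOI prefix -> language code for the title.
# "10.54781/abz." is inferred from 10.54781/abz.v7i1.155 and is an assumption.
JOURNAL_LANGUAGE_OVERRIDES = {
    "10.54781/abz.": "brh",  # Brahui-language journal
}


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI (DOIs are case-insensitive) and drop resolver prefixes."""
    doi = doi.strip().lower()
    for resolver in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(resolver):
            doi = doi[len(resolver):]
            break
    return doi


def pick_title_language(
    doi: str,
    title: str,
    detect: Callable[[str], Optional[str]],
    fallback: str = "en",
) -> str:
    """Prefer a journal override, then automatic detection, then a fallback."""
    key = normalize_doi(doi)
    # Longest matching prefix wins, so a journal-specific entry would beat
    # a publisher-wide one if both were present.
    matches = [p for p in JOURNAL_LANGUAGE_OVERRIDES if key.startswith(p)]
    if matches:
        return JOURNAL_LANGUAGE_OVERRIDES[max(matches, key=len)]
    detected = detect(title)
    return detected if detected else fallback
```

For example, `pick_title_language("https://doi.org/10.54781/abz.v7i1.155", title, detect=lambda t: "ar")` returns `"brh"` even though the stand-in detector says Arabic. The override only helps if "brh" is also accepted by whatever list currently forces the label into "en".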

@rdmpage
Owner

rdmpage commented Feb 10, 2023

@bgo-eiu Interesting idea; I'll need to check which languages my code can detect. The idea of being able to set the default language makes sense.
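A minimal sketch of the systematic route suggested earlier in the thread: query Wikidata for items with a DOI (P356) and a language of work or name (P407), then keep a per-journal default only where one uncommon language clearly dominates. The query text, the list of "common" languages, the use of P424 (Wikimedia language code) on the language items, the `doi_prefix` rule (registrant plus the first dot-separated part of the suffix), and the `min_items`/`min_share` thresholds are all assumptions; the query is untested and would probably need narrowing to finish within the query service's time limit.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Untested SPARQL sketch for the Wikidata Query Service: items with a DOI
# (P356) and a language of work or name (P407) whose Wikimedia language code
# (P424) is outside a hand-picked list of common languages.
QUERY = """
SELECT ?doi ?code WHERE {
  ?item wdt:P356 ?doi ;
        wdt:P407 ?lang .
  ?lang wdt:P424 ?code .
  FILTER(?code NOT IN ("en", "de", "fr", "es", "pt", "it", "ru", "zh", "ja"))
}
LIMIT 50000
"""


def doi_prefix(doi: str) -> str:
    """Assumed journal key: registrant plus first suffix part, e.g. '10.54781/abz'."""
    registrant, _, suffix = doi.strip().lower().partition("/")
    return registrant + "/" + suffix.split(".", 1)[0]


def build_overrides(
    rows: Iterable[Tuple[str, str]], min_items: int = 5, min_share: float = 0.9
) -> Dict[str, str]:
    """Map journal keys to a language code when one code clearly dominates."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doi, code in rows:
        counts[doi_prefix(doi)][code] += 1
    overrides = {}
    for key, counter in counts.items():
        code, n = counter.most_common(1)[0]
        total = sum(counter.values())
        # Skip journals with too few items or a mix of languages.
        if total >= min_items and n / total >= min_share:
            overrides[key] = code
    return overrides
```

The `(doi, code)` rows would come from the query's results; requiring a minimum item count and a dominant share is one way to avoid hard-coding a default from a single mislabelled item.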
