People to talk to for all questions #6

Closed
eklem opened this issue Dec 18, 2021 · 2 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@eklem
Owner

eklem commented Dec 18, 2021

  • How much overlap is there between the different Sami languages? What do they consist of?
  • Are the differences big enough for each language to need its own stopword list?
  • What are the three Sami languages at https://www.nrk.no/sapmi?
  • Which sources of text can the stopword lists be generated from?
  • Are there other people I should talk to?
@eklem eklem added the help wanted Extra attention is needed label Dec 18, 2021
@eklem eklem self-assigned this Dec 18, 2021
@eklem
Owner Author

eklem commented Dec 18, 2021

People to talk to: the closest Sami language center.

@eklem
Owner Author

eklem commented Dec 30, 2021

Differences:
Big enough for separate stopword lists: yes. North and Lule Sami have some overlap.
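
A rough way to put a number on that overlap, once two draft lists exist, is the Jaccard index: shared words divided by all distinct words. This is only a sketch. The function name `jaccard_overlap` and the tokens are placeholders (not real Sami words), and nothing here is measured data.

```python
def jaccard_overlap(list_a, list_b):
    """Share of words common to both lists, relative to all distinct words."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists: define the overlap as zero
    return len(set_a & set_b) / len(union)


# Placeholder tokens standing in for two draft stopword lists (NOT real Sami words).
north = ["w1", "w2", "w3", "w4"]
lule = ["w3", "w4", "w5"]

overlap = jaccard_overlap(north, lule)
shared = sorted(set(north) & set(lule))
print(f"overlap: {overlap:.2f}, shared words: {shared}")
```

A low score would support keeping one list per language; a high score would suggest the lists could share a common core.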

Three Sami languages at https://www.nrk.no/sapmi:

  • South Sami
  • Lule Sami
  • North Sami

Trying the news bulletins in these languages as sources for the stopword lists.
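
One possible way to get stopword candidates out of the bulletin text is a plain frequency count, followed by manual review. The sketch below assumes the bulletins are already available as plain-text strings, one per language. The function name `stopword_candidates`, the `top_n` cutoff and the sample strings are all hypothetical (the strings are placeholders, not real Sami).

```python
import re
from collections import Counter

# Letters only: word characters minus digits and underscore.
# Unicode-aware in Python 3, so characters like á, š and đ stay inside words.
WORD_RE = re.compile(r"[^\W\d_]+")


def stopword_candidates(texts, top_n=50):
    """Return the top_n most frequent lowercased words across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return [word for word, _ in counts.most_common(top_n)]


# Placeholder strings standing in for bulletin text (NOT real Sami).
bulletins = ["aa bb aa cc", "aa bb dd"]
candidates = stopword_candidates(bulletins, top_n=2)
print(candidates)
```

The frequency list is only a starting point: frequent content words (names, places in the news) still have to be weeded out by someone who knows the language.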

Traits
At least North Sami has a lot of prepositions and cases built into the words, so a single word can occur in many different forms. Since the stopword list won't have any logic like stemming or lemmatization, every form of a word has to be added to the list separately. This mostly affects non-stopwords, though, which means stopwords may be easier to identify if they don't have a lot of variants.
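
To illustrate why every form has to be listed, here is a minimal sketch of exact-match stopword removal, assuming the list holds lowercase words. `remove_stopwords` is a hypothetical helper, not the library's API, and the tokens are made up (not real Sami).

```python
def remove_stopwords(tokens, stopwords):
    """Drop tokens that exactly match a stopword (no stemming or lemmatization)."""
    stopword_set = set(stopwords)
    return [token for token in tokens if token.lower() not in stopword_set]


# Made-up tokens: "stopa" and "stopat" stand in for two forms of the same stopword.
tokens = ["stopa", "content", "stopat", "other"]

# Only one form listed: the second form slips through.
only_one_form = remove_stopwords(tokens, ["stopa"])

# Both forms listed: both are removed.
all_forms = remove_stopwords(tokens, ["stopa", "stopat"])

print(only_one_form)
print(all_forms)
```

With exact matching, a form that is missing from the list is treated as a normal word, so the list has to carry every variant that should be filtered.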

@eklem eklem closed this as completed Jun 10, 2022