-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NollySenti Bitext Mining (MINERS) #915
Add NollySenti Bitext Mining (MINERS) #915
Conversation
@KennethEnevoldsen I created a new PR for NollySenti. |
I have completed the checklist @KennethEnevoldsen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you add points and push models results as well?
} | ||
""", | ||
n_samples={"train": 1640}, | ||
avg_character_length={"train": 4.46}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit short?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the avg character length per sample (previously I put the avg char per word)
Thank you for the review. I addressed them. I updated the avg number of character per sample, added the results, and points. The new languages are |
NollySenti is Nollywood movie reviews for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba.
Checklist
make test
.make lint
.Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name}
command.sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
self.stratified_subsampling() under dataset_transform()
make test
.make lint
.