Update mOSCAR.md
MatthieuFP committed Jun 17, 2024
1 parent 31b8517 commit c89f28d
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion docs/versions/mOSCAR.md
@@ -6,7 +6,9 @@ mOSCAR is a large-scale multilingual and multimodal document corpus crawled from

Access to the mOSCAR is granted via the [Hugging Face Hub](https://huggingface.co/datasets/oscar-corpus/mOSCAR).

-All data is avaialble at [https://huggingface.co/datasets/oscar-corpus/mOSCAR](https://huggingface.co/datasets/oscar-corpus/mOSCAR).
+All data is available at [https://huggingface.co/datasets/oscar-corpus/mOSCAR](https://huggingface.co/datasets/oscar-corpus/mOSCAR).
+
+Paper link: [https://arxiv.org/abs/2406.08707](https://arxiv.org/abs/2406.08707)

## Layout

@@ -199,3 +201,13 @@ These data are released under this licensing scheme:
- We license the actual packaging of these data under the Creative Commons CC BY 4.0 license.
- To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR.
- This work is published from: France.

+## Citation
+```
+@article{futeral2024moscar,
+title={mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus},
+author={Futeral, Matthieu and Zebaze, Armel and Suarez, Pedro Ortiz and Abadji, Julien and Lacroix, R{\'e}mi and Schmid, Cordelia and Bawden, Rachel and Sagot, Beno{\^\i}t},
+journal={arXiv preprint arXiv:2406.08707},
+year={2024}
+}
+```
