Will the corpus be openly published? #2

Closed
avacaondata opened this issue Feb 21, 2022 · 4 comments
Comments

@avacaondata

Given how well this model works, I'd be interested in having access to the original corpus it was trained on. Will that be possible? That is, is there any plan to upload it to huggingface/datasets or publish it in some other form?

Thank you very much in advance :)

@finiteautomata
Collaborator

@alexvaca0 Thanks for your interest! We will be publishing the original tweets soon, hopefully in datasets. Leave this issue open so we can let you know when they are available.

@finiteautomata
Collaborator

finiteautomata commented Mar 6, 2022

Hi @alexvaca0. I'm having some problems with the original tweets, that is, the raw tweets prior to any preprocessing and filtering. The machine that holds this data is not turning on; let's hope the disk is OK.

In the meantime, I have access to the preprocessed and filtered tweets (as described in the paper). If that's useful for you, send me an email and I'll give you access to them.

I'll leave this issue open until we are able to publish the original data.

@avacaondata
Author

Oh that would be so great, if it is still possible to have access to the tweets... thank you very much :)

@finiteautomata
Collaborator

Well, this is quite late, but the tweets have finally been released. I could only upload half of them, but I suppose this might be enough (~300M tweets).

Check https://huggingface.co/datasets/pysentimiento/spanish-tweets

In the following days, I will be uploading the rest of them.
