Will the corpus be openly published? #2
@alexvaca0 Thanks for your interest! We will be publishing the original tweets soon, hopefully in …
Hi @alexvaca0. I'm having some problems with the original tweets, that is, the raw tweets before any preprocessing and filtering. The machine that held this data won't turn on; let's hope the disk is OK. In the meantime, I have access to the preprocessed and filtered tweets (as described in the paper). If that's useful to you, send me an email and I'll give you access. I'll leave this issue open until we are able to publish the original data.
Oh, that would be great, if it's still possible to get access to the tweets. Thank you very much :)
Well, this is quite late, but the tweets have finally been released. I could only upload half of them (~300M tweets), but I suppose that might be enough. Check https://huggingface.co/datasets/pysentimiento/spanish-tweets. In the coming days, I will upload the rest.
Given how well this model works, I'd be interested in accessing the original corpus it was trained on. Will that be possible? That is, is there any plan to upload it to huggingface/datasets or publish it in some other form?
Thank you very much in advance :)