Will the corpus be openly published? #2

avacaondata · 2022-02-21T11:08:12Z

Given how well this model works, I'd be interested in having access to the original corpus it was trained on. Will that be possible? I mean, is there any plan to upload it to huggingface/datasets or publish it in any other form?

Thank you very much in advance :)

finiteautomata · 2022-02-21T11:45:41Z

@alexvaca0 Thanks for your interest! We will be publishing the original tweets soon, hopefully in datasets. Let's leave this issue open so we can let you know when they are available.

finiteautomata · 2022-03-06T21:34:21Z

Hi @alexvaca0. I'm having some problems with the original tweets, that is, the raw tweets prior to any preprocessing and filtering. The machine that holds this data won't turn on; let's hope the disk is OK.

In the meantime, I have access to the preprocessed and filtered tweets (as described in the paper). If that's useful for you, send me an email and I'll give you access to them.

I'll leave this issue open until we are able to publish the original data.

avacaondata · 2022-07-01T10:59:15Z

Oh, that would be great, if it's still possible to get access to the tweets... thank you very much :)

finiteautomata · 2022-11-30T23:23:28Z

Well, this is quite late, but the tweets have finally been released. I could only upload half of them (~300M tweets), but I suppose that might be enough.

Check https://huggingface.co/datasets/pysentimiento/spanish-tweets

I will upload the rest of them in the coming days.

finiteautomata closed this as completed Nov 30, 2022
