Globo Dataset missing Assets (Labels) #9
Comments
Hi. We cannot re-create 'acr_label_encoders.pickle' without the full text of the articles in the Globo dataset. We followed the instructions in "Pre-processing data for the ACR module" and downloaded the files from Kaggle, but the full_text column doesn't exist in any of the files described there (clicks.zip, articles_metadata.csv, and articles_embeddings.pickle). So how can we get documents_g1.csv? Thanks
Any answer to this?
Hi. Sorry for the delayed response. The |
Hi,
With the provided Globo dataset, we cannot train the ACR module because the article contents could not be provided. Moving on to NAR training, it seems that some assets are missing.
In nar_trainer_gcom.py, the following method tries to deserialise the labels, metadata and article embeddings from a pickle file. However, this pickle file is not provided, or, more precisely, the "acr_label_encoders" are missing.
Similarly, in nar_utils.py, this method cannot be executed because the folder '/pickles/' does not contain 'nar_label_encoders'.
How can we get these labels? Or am I overlooking something?
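For anyone trying to reproduce this, here is a minimal sketch of how one might check which entries a pickle file actually contains before running the trainer. It assumes the pickle holds a dict keyed by names such as 'acr_label_encoders'; that layout and the helper name missing_pickle_keys are my assumptions for illustration, not something taken from the repository code.

```python
import pickle
from pathlib import Path


def missing_pickle_keys(path, expected_keys):
    """Return the expected keys that are absent from the pickled dict at `path`.

    Hypothetical helper: assumes the pickle stores a dict. If the file does
    not exist or does not contain a dict, every expected key is reported
    as missing.
    """
    path = Path(path)
    if not path.is_file():
        return list(expected_keys)
    # Only unpickle files from a source you trust; pickle can run arbitrary code.
    with path.open("rb") as f:
        content = pickle.load(f)
    if not isinstance(content, dict):
        return list(expected_keys)
    return [key for key in expected_keys if key not in content]
```

Calling it with the path of the downloaded pickle and a list such as ['acr_label_encoders', 'nar_label_encoders'] returns the names that are not present, which makes it easier to report exactly which assets are missing.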
Thanks