Unable to retrieve an unpublished data file. #115
Ultimately, the problem in
library(dataverse)
server <- Sys.getenv("DATAVERSE_SERVER")
key <- Sys.getenv("DATAVERSE_KEY")
dataverse_search(id = "datafile_12930", type = "file", server = server, key = key)
#> 0 of 0 results retrieved
#> list()
Created on 2022-02-04 by the reprex package (v2.0.1)
It is worth noting that I can find the file using some keywords on the web interface. Whereas |
Hmm, because the file is in draft, I bet
@famuvie do you want to see if you can find your draft file that way with curl? You'll have to pass your API token. Docs on this at https://guides.dataverse.org/en/5.9/api/search.html @kuriwaki this might also work:
An example: https://dataverse.harvard.edu/api/search?q=entityId:3371438 (I'm not sure why I suggested |
Not sure how to pass the API token with curl, but it works with
library(dataverse)
server <- Sys.getenv("DATAVERSE_SERVER")
key <- Sys.getenv("DATAVERSE_KEY")
dataverse_search(id = "datafile_12930_draft", type = "file", server = server, key = key)
#> 1 of 1 result retrieved
#> name type
#> 1 Bovine_2020_2021.tab file
#> url file_id
#> 1 https://dataverse.cirad.fr/api/access/datafile/12930 12930
#> description
#> 1 On this document, there is only the tick data collection between 2020 and 2021.\n\nSome information about variable :\n\n- Vache : Identifier of cow (10 digits)\n- Date : date of slaughterhouse visit\n- Commune : origin cow municipality\n- Eleveur : origin cow breeder\n- Troupeau : municipality and breeder\n- H_marginatus : number of *H_marginatus* collected\n- R_bursa : number of *R_bursa* collected\n- I_ricinus : number of *I_ricinus* collected\n- H_scupense : number of *H_scupense* collected\n- B_annulatus : number of *B_annulatus* collected\n- R_sanguineus : number of *R_sanguineus* collected\n- H_punctata : number of *H_punctata* collected.\n- D_marginatus : number of *D_marginatus* collected\n- Tiques ? : sum of ticks collected
#> file_type file_content_type size_in_bytes
#> 1 Tab-Delimited text/tab-separated-values 69648
#> md5 checksum.type
#> 1 688c6fc5f92e6526a3cd158854027e8b MD5
#> checksum.value unf dataset_name
#> 1 688c6fc5f92e6526a3cd158854027e8b UNF:6:lt7ZJ1diuShhMCd8UWq5zQ== Bovine
#> dataset_id dataset_persistent_id
#> 1 12928 doi:10.18167/DVN1/8Z1ZI9
#> dataset_citation
#> 1 Bartholomee, Colombine, 2022, "Bovine", https://doi.org/10.18167/DVN1/8Z1ZI9, CIRAD Dataverse, DRAFT VERSION, UNF:6:ov2odYXNktsIbiuwc2MDJQ== [fileUNF]
Created on 2022-02-04 by the reprex package (v2.0.1) |
You can pass it as a header or a query parameter. Please see https://guides.dataverse.org/en/5.9/api/auth.html |
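For reference, the two options mentioned here could look roughly like this from R. This is a minimal sketch that only assembles the request pieces and sends nothing. `build_search_request()` is a hypothetical helper, not part of the dataverse package; the `X-Dataverse-key` header and the `key` query parameter are the names described in the linked API guide, and the server is assumed to be a bare host name as in `DATAVERSE_SERVER`.

```r
# Hypothetical helper (not part of the dataverse package): assemble a Search API
# request with the API token passed either as a header or as a query parameter.
build_search_request <- function(server, query, key,
                                 token_in = c("header", "query")) {
  token_in <- match.arg(token_in)
  # Assumes `server` is a bare host name such as "dataverse.harvard.edu".
  url <- paste0("https://", server, "/api/search?q=",
                utils::URLencode(query, reserved = TRUE))
  headers <- character(0)
  if (token_in == "header") {
    # Token travels in the X-Dataverse-key HTTP header.
    headers <- c("X-Dataverse-key" = key)
  } else {
    # Token travels in the URL as the `key` query parameter.
    url <- paste0(url, "&key=", utils::URLencode(key, reserved = TRUE))
  }
  list(url = url, headers = headers)
}

# "xxxx" is a placeholder token, not a real key.
req_header <- build_search_request("dataverse.harvard.edu", "entityId:3371438", "xxxx")
req_query  <- build_search_request("dataverse.harvard.edu", "entityId:3371438", "xxxx",
                                   token_in = "query")
```

The header form is usually preferable, since a token placed in the URL can end up in logs and browser history.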
Sorry, I made a mistake in the previous example and have just corrected it. It actually works! |
Still, I can't find a hacky way for adding "_draft" to the file id. I guess that needs to be fixed in the package :) |
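Until that is fixed in the package, the search id itself can at least be built by hand. A minimal sketch, assuming the `datafile_<id>` and `datafile_<id>_draft` pattern seen in the two searches above holds in general; `datafile_search_id()` is a hypothetical helper, not a function of the dataverse package.

```r
# Hypothetical helper (not part of the dataverse package): build the search id
# for a file, appending "_draft" when the file belongs to an unpublished version.
datafile_search_id <- function(file_id, draft = FALSE) {
  # format() avoids scientific notation for large numeric ids (e.g. 1e+05).
  id <- paste0("datafile_", format(file_id, scientific = FALSE, trim = TRUE))
  if (draft) {
    id <- paste0(id, "_draft")
  }
  id
}

published_id <- datafile_search_id(12930)
draft_id     <- datafile_search_id(12930, draft = TRUE)

# The result can then be passed to the search call used earlier in this thread:
# dataverse_search(id = draft_id, type = "file", server = server, key = key)
```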
Thanks @famuvie for creating an issue. A partial fix is now on dev. I created a test dataset on demo dataverse that is intentionally unpublished. The get commands seem to go OK, except that my unpublished test file does not have a UNF. Proper UNF detection becomes necessary since that's how the package currently determines whether a file is ingested or not.
|
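To illustrate why a missing UNF matters, here is a minimal sketch of a UNF-based ingest check. It is an assumption about the general idea only, not the package's actual `is_ingested()` code; `has_unf()` and the two example data frames are hypothetical, with the UNF value copied from the search output above.

```r
# Hypothetical check (an assumption, not the package's actual code): treat a
# search result as "ingested" only if it carries a non-empty UNF.
has_unf <- function(result) {
  unf <- result[["unf"]]
  !is.null(unf) && length(unf) > 0 && all(!is.na(unf) & nzchar(unf))
}

# A result that includes a UNF, as in the draft search output above.
with_unf <- data.frame(
  name = "Bovine_2020_2021.tab",
  unf = "UNF:6:lt7ZJ1diuShhMCd8UWq5zQ==",
  stringsAsFactors = FALSE
)

# A result for a freshly uploaded draft file, where no UNF is returned.
without_unf <- data.frame(name = "Bovine_2020_2021.tab", stringsAsFactors = FALSE)

has_unf(with_unf)
has_unf(without_unf)
```

Under a check like this, a tabular file whose UNF has not yet reached the search index would look the same as a file that was never ingested.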
Huh. This is news to me but I see what you mean. No UNF from the Search API when I look at your unpublished file...
... but when I look at a published file (different server but shouldn't matter), I do see a UNF:
Perhaps we don't reindex the file after ingest is complete? I'm not sure. You could test this by making a change to your draft dataset metadata (add a keyword or something). This will reindex the dataset and its files. |
Yes! It was sufficient to add a data description to the draft dataset, and it somehow updated. Thank you. |
@kuriwaki hmm, I can replicate this on "develop" on my laptop (around 0d853b74e9). When I first upload a file to a draft, the UNF does not appear in search results...
... but if I edit the metadata of the draft dataset (forcing the file to be reindexed), the UNF appears:
Please feel free to open an issue about this at https://github.com/IQSS/dataverse/issues if you'd like. |
I will put a tip about this in the dataverse download vignette. I think this limitation is likely to affect anyone who tries to download draft datasets, but the current workaround of editing something in the draft does not seem too onerous. |
Addressed by 0.3.11. |
I'm having an issue while trying to access an unpublished data file, which requires an API token. Unfortunately, this makes the following code not reproducible. Let me know if there is a way of making a reproducible example in this case.
The problem is that I cannot use any of the get_dataframe_by_* functions, due to an issue with is_ingested(), which seems unable to find the target file. However, if I work around is_ingested(), I can retrieve the data, as shown in the example below.