Error of available_datasets() when calling get_datasets() #115

Closed
camillebelmin opened this issue Apr 22, 2021 · 10 comments · Fixed by #116

Comments

@camillebelmin

Hi, first of all, thank you so much for this wonderful package; it has been very useful in my research.
I have a question similar to this issue: malaria-atlas-project/malariaAtlas#30, but I could not solve my problem there.

When I call:

get_datasets("EGIR4ASV.rds")

I get the following error:

Logging into DHS website...
Error in names(filedatatypelist_DHS) <- paste0("filedatatypelist_", qdapRegex::rm_between(filedatatypelist_DHS_line,  : 
  'names' attribute [1] must be the same length as the vector [0]

The error apparently comes from the function available_datasets(). The issue I mentioned above has the same error, and @OJWatson provided some guidance that I followed (see the message copied below). In my case, I do have access to the file I am requesting, and I can see the download manager on the DHS website. I debugged as suggested and reached "y". In my case "y" is a very long string whose first lines look like:

   [1] "<!DOCTYPE html> <html lang=\"en\"> <!-- Content Copyright Macro International
   [2] "<!-- Page generated 2021-04-21 16:19:52 on server 1 by CommonSpot Build 10.6.0.30 (2019-10-04 12:35:29) -->"
   [3] "<!-- JavaScript & DHTML Code Copyright &copy; 1998-2019, PaperThin, Inc. All Rights Reserved. --> <head
   [4] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />"
   [5] "<meta name=\"Description\" id=\"Description\" content=\"Download Datasets
   [6] "<meta name=\"Generator\" id=\"Generator\" content=\"CommonSpot Build
   [7] "<title>The DHS Program - Download Datasets</title> <style id=\"cs_antiClickjack\">body{display:none !important;position:absolute !important;top:-5000px !important;}</style><script type=\"text/javascript\">(function(){var chk=0;try{if(self!==top){var ts=top.document.location.href.split('/');var ws=window.document.location.href.split('/');if(ts.length<3||ws.length<3)chk=1;else if(ts[2]!==ws[2])chk=2;else if(ts[0]!==ws[0])chk=3;}}catch(e){chk=4;}if(chk===0){var stb=document.getElementById(\"cs_antiClickjack\");stb.parentNode.removeChild(stb);}else{top.location = self.location}})();</script> <script>"
   [8] "var jsDlgLoader = '/data/dataset_admin/loader.cfm';"

But I am a bit clueless about what to do now. @OJWatson, does that help you understand what is going on? Do you need the whole string?

Many thanks

Answer from @OJWatson on Feb 15, 2019 on this issue: malaria-atlas-project/malariaAtlas#30

Hmm okay, so it seems to be erroring at the stage where rdhs goes to the Download Manager tab. A couple of things to try:

1. With the login account that you have could you try logging in to the DHS website and then click on the Download Manager tab. This should take you to a page that looks something like this. Do you get this page?:
   ![image](https://user-images.githubusercontent.com/15249565/52851968-4990af00-310f-11e9-9edc-768780e92a25.png).

2. If yes then you may need to give me a bit more information. Before running `get_datasets(dats)` could you debug the following `debug(rdhs:::available_datasets)`. Then as you step through you'll reach the following lines:
  # Grab the content from that and start creation for last post request
  writeBin(z$content, tf)
  # load the text
  y <- readLines(tf, warn = FALSE)

Could you dump and upload what y looks like here. This should be the Download Manager web page, from which I grab all the selectable download options before making another POST request to create the url with all the download links available for your account. In grabbing the selectable options the error is thrown due to not finding any selectable options. So if you can see them in step 1, then this should let me know what's going on.

Thanks again for trying it out and trying to get this to work,

All the best,

OJ
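
For anyone following the quoted steps, here is a minimal sketch of dumping `y` to a file before uploading it. The helper name `dump_page` and the redaction pattern are assumptions made for illustration and are not part of rdhs. The page can contain the login email, so anything that looks like an email address is masked first.

```r
# Hypothetical helper (not part of rdhs): write the page lines to a file
# after masking anything that looks like an email address.
dump_page <- function(y, path = tempfile(fileext = ".txt")) {
  email_pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
  redacted <- gsub(email_pattern, "[redacted]", y)
  writeLines(redacted, path)
  invisible(path)
}

# Stand-in for the `y` obtained inside debug(rdhs:::available_datasets)
y <- c("<title>The DHS Program - Download Datasets</title>",
       "<div>Logged in: someone@example.org</div>")
out <- dump_page(y)
readLines(out)
```

The resulting text file can then be attached to the issue; it is still worth skimming it for project numbers or other account details before sharing.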

@csq-dr

csq-dr commented Apr 26, 2021

I got the same error message when trying to access DHS datasets via rdhs. When I debug the function and run the following code from the available_datasets function, as in this issue (https://github.com/malaria-atlas-project/malariaAtlas/issues/30):

z <- httr::POST("https://dhsprogram.com/data/dataset_admin/index.cfm", 
    body = values)
writeBin(z$content, tf)
y <- brio::read_lines(tf)
ctrycodelist_lines <- grep("name=\"ctrycodelist\" value=", 
    y, value = TRUE)

I get an empty ctrycodelist_lines object, as follows:

ctrycodelist_lines 
#> character(0)
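
An empty match like this is enough to produce the error message from the original report. Below is a small base R sketch using made-up page lines and a stand-in variable `parsed`; both are assumptions for illustration, not the actual rdhs internals.

```r
# Made-up page content with no selectable download options
y <- c("<html>", "<strong>Error in custom script module</strong>", "</html>")

hits <- grep("name=\"ctrycodelist\" value=", y, value = TRUE)
length(hits)       # no matching lines

# paste0() with a zero-length argument still returns one string
new_names <- paste0("filedatatypelist_", hits)
length(new_names)

# Assigning one name to a zero-length vector raises an error
parsed <- character(0)
msg <- tryCatch({
  names(parsed) <- new_names
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

So the `'names' attribute [1] must be the same length as the vector [0]` message is probably a symptom of the page not containing the expected form fields, rather than the underlying cause.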

@rlglaubius

rlglaubius commented Apr 27, 2021

I have also encountered this error with a previously working script, though I've updated R and package installations since the last run. As csq-dr mentioned above, when tracing via debug(rdhs:::available_datasets) ctrycodelist_lines is empty after stepping through

  values <- list(Proj_ID = project_number, action = "downloadmanager")
  z <- httr::POST("https://dhsprogram.com/data/dataset_admin/index.cfm", 
    body = values)
  writeBin(z$content, tf)
  y <- brio::read_lines(tf)
  ctrycodelist_lines <- grep("name=\"ctrycodelist\" value=", 
    y, value = TRUE)

At this point, y has 681 lines; two seem notable:

[521] "<div align=\"left\" style=\"font-size:7pt;font-weight:bold\">Logged in: rglaubius@avenirhealth.org</div>"
[522] "<font face=\"Verdana,Arial\" size=\"2\" color=\"#ff0000\"><strong>Error in custom script module<br /></strong></font></div></div></div></div></div></div></div></div><div><div id=\"cs_control_19167\" class=\"cs_control CS_Element_Schedule\"><div  title=\"\" id=\"CS_Element_widgetsfooter\"></div></div>"

Line 521 indicates that my credentials were valid, but something went wrong with the form from there. I tried to replicate the error in the web browser. I logged in to dhsprogram.com, then entered "https://dhsprogram.com/data/dataset_admin/index.cfm?action=downloadmanager&Proj_ID=[redacted]" in the location bar. That took me to the download manager rather than producing an error.

I'm stymied at this point, but hopefully this will help pinpoint the issue. Please let me know if there is any other information I can provide.
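
A rough sketch of the checks described above, in case it helps others compare their own `y`. The function name `diagnose_page` is hypothetical, and the sample lines are shortened stand-ins for the real page.

```r
# Hypothetical helper (not part of rdhs): summarise what the returned page contains
diagnose_page <- function(y) {
  list(
    logged_in    = any(grepl("Logged in:", y, fixed = TRUE)),
    server_error = any(grepl("Error in custom script module", y, fixed = TRUE)),
    has_options  = any(grepl("name=\"ctrycodelist\" value=", y, fixed = TRUE))
  )
}

# Shortened stand-ins for lines 521 and 522 of the real page
y <- c("<div>Logged in: someone@example.org</div>",
       "<strong>Error in custom script module<br /></strong>")
str(diagnose_page(y))
```

A result with `logged_in` and `server_error` both TRUE and `has_options` FALSE would match the situation reported here: valid credentials, but a server-side failure when building the download manager form.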

@rlglaubius

I think I was able to work around the problem. In
values <- list(Proj_ID = project_number, action = "downloadmanager")
Proj_ID is a string (e.g., "123456"). It seems that index.cfm expects a number instead of a string. If I interrupt execution after values gets assigned, then set values$Proj_ID=123456 manually, the rest of the code proceeds correctly.

@bpatenaude

bpatenaude commented Apr 27, 2021

@rlglaubius I am having the exact same issue as you described above. Same errors, tried debugging in the exact same way and get the same character(0) message when I run ctrycodelist_lines. Can you show the code for exactly how you fixed this? I am having issues following where and how to assign my $Proj_ID manually to work around the issue. Thanks!

@rlglaubius

rlglaubius commented Apr 27, 2021

@bpatenaude I cloned the repository then changed line 66 of authentication.R to pass project_number to as.numeric:

  # Create post request for the download manager
  values <- list(
    Proj_ID = as.numeric(project_number),
    action = "downloadmanager"
  )

Caveats: this worked well enough for me, but I am not affiliated with the rdhs project and have not tested the fix extensively. This change might not address the root cause of the problem. This will not be appropriate if Proj_ID can start with "0".
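
To illustrate the leading-zero caveat, here is a short base R sketch. The project numbers are made up, and `safe_proj_id` is a hypothetical guard rather than anything in rdhs.

```r
# Made-up project numbers; real ones come from the DHS account
ids <- c("123456", "012345")

# as.numeric() drops the leading zero, so the second ID changes
as.character(as.numeric(ids))

# Hypothetical guard (not part of rdhs): convert only when the
# string survives a round trip through as.numeric() unchanged
safe_proj_id <- function(id) {
  num <- suppressWarnings(as.numeric(id))
  if (is.na(num) || !identical(as.character(num), id)) {
    return(id)  # keep the original string
  }
  num
}

safe_proj_id("123456")
safe_proj_id("012345")
```

For an ID that does not round-trip, the guard keeps the string, which according to this thread may still fail on the server. It only avoids silently sending a different number.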

@bpatenaude

Thanks @rlglaubius. @OJWatson, is the rdhs project team looking into a fix for this? I know that the DHS has been updating their website over the last week, and I assume the root of the issue has to do with a change resulting from that website update.

@OJWatson
Collaborator

Hi all,

Firstly thanks for the really helpful debugs and apologies for the delays (Github notifications get lost in a stream of emails from Github - just email me at o.watson15@imperial.ac.uk if I am taking long to reply).

I have a fix for this, which I am currently getting tested in #116. It should fix this issue, which came about with the new DHS website. It will be merged shortly and released as version 0.7.2. If these changes are needed more urgently, you can install the package from the patching branch:

devtools::install_github("ropensci/rdhs", ref = "issue33_path")

🤞 this fixes the issue.

Best, OJ

@camillebelmin
Author

Hi, thanks for fixing this @OJWatson, and thanks to the others for helping to debug. After installing the package from this new branch I no longer get this error, but I am told I do not have access to this dataset, although I do have access and can download it from the DHS website.

These requested datasets are not available from your DHS login credentials:

---
EGIR4ASV.rds
---
Please request permission for these datasets from the DHS website to be able to download them

Can this still be related to the new DHS website?

@OJWatson
Collaborator

Hey @camillebelmin

This error occurs because the requested file name (EGIR4ASV.rds) is not the name of that survey's dataset file. The filenames will usually be the zip files themselves (e.g. EGIR4ASV.ZIP). The best way to get the names of these files is either:

  1. Go through the API to find the datasets you want using dhs_datasets
  2. Or, to see all files you can download, use get_available_datasets()
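
As a rough illustration of the naming point above, the sketch below maps a name like EGIR4ASV.rds to the zip-style name. The helper `to_zip_name` is hypothetical and simply swaps the extension; since filenames are only "usually" the zip names, the result should still be checked against the output of dhs_datasets or get_available_datasets(), both of which need network access (and, for the latter, DHS credentials) and are therefore left commented out.

```r
# Hypothetical helper (not part of rdhs): swap any extension for .ZIP
to_zip_name <- function(filename) {
  paste0(toupper(tools::file_path_sans_ext(filename)), ".ZIP")
}

to_zip_name("EGIR4ASV.rds")

# Not run here: look up the real file names
# datasets  <- rdhs::dhs_datasets(countryIds = "EG")
# available <- rdhs::get_available_datasets()
```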

Hope it helps,

OJ

@camillebelmin
Author

Problem solved!
Thanks for the help :)

5 participants