Improve doc on how to read objects without object assignment #107

Open
kuriwaki opened this issue Sep 16, 2021 · 1 comment

Comments

@kuriwaki
Member

kuriwaki commented Sep 16, 2021

.RData files cannot be read in as a single returned object; load() instead restores the saved objects, under their original names, into the user's environment. I think we should all be switching to .rds (see IQSS/dataverse#7249), but some files on Dataverse are nonetheless uploaded as .RData.
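
To make the difference concrete, here is a minimal base-R sketch that needs no Dataverse access. The `votes` data frame and the temporary file paths are made up for illustration; only save(), load(), saveRDS(), and readRDS() are real functions.

```r
# Toy data frame standing in for a downloaded dataset (hypothetical)
votes <- data.frame(year = c(1868, 1872), dem = c(851, 669))

rdata_path <- tempfile(fileext = ".RData")
rds_path   <- tempfile(fileext = ".rds")
save(votes, file = rdata_path)   # stores the object together with its name
saveRDS(votes, file = rds_path)  # stores the object only

original <- votes
rm(votes)

# load() recreates `votes` in the current environment;
# its return value is only the character vector of restored names
loaded_names <- load(rdata_path)
print(loaded_names)
print(exists("votes"))

# readRDS() returns the object itself, so the caller chooses the name
from_rds <- readRDS(rds_path)
print(identical(from_rds, original))
```

With .rds the object can be assigned to any name in one step, which is why it fits a reader function more naturally than .RData does.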

It turns out there are two ways to load such a file. One is the old approach: download the raw binary, write it to disk, and read it back with load(). The other is to load the file into a temporary environment inside a function, as I found on Stack Overflow. The reprex below shows both; they give identical objects.

We should update the doc with an example.

h/t @jonrobinson2

library(dataverse)
library(fs)


# Algara dataset
# https://dataverse.harvard.edu/file.xhtml?fileId=5028532&version=1.0

# 1. writing and saving as binary works
as_binary <- get_file_by_id(file = 5028532, server = "dataverse.harvard.edu")

temp <- tempdir()
writeBin(as_binary, path(temp, "county.RData"))
load(path(temp, "county.RData"))

str(pres_elections_release)
#> 'data.frame':    113756 obs. of  20 variables:
#>  $ election_year                        : num  1868 1872 1876 1880 1884 ...
#>  $ fips                                 : chr  "01001" "01001" "01001" "01001" ...
#>  $ county_name                          : chr  "AUTAUGA" "AUTAUGA" "AUTAUGA" "AUTAUGA" ...
#>  $ state                                : chr  "AL" "AL" "AL" "AL" ...
#>  $ sfips                                : chr  "01" "01" "01" "01" ...
#>  $ office                               : chr  "PRES" "PRES" "PRES" "PRES" ...
#>  $ election_type                        : chr  "G" "G" "G" "G" ...
#>  $ seat_status                          : chr  "Open Seat" "Republican President Re-election" "Open Seat" "Open Seat" ...
#>  $ democratic_raw_votes                 : num  851 669 804 978 911 ...
#>  $ dem_nominee                          : chr  "Horatio Seymour" "Horace Greeley" "Samuel J. Tilden" "Winfield Scott Hancock" ...
#>  $ republican_raw_votes                 : num  1505 1593 1576 974 877 ...
#>  $ rep_nominee                          : chr  "Ulysses S. Grant" "Ulysses S. Grant" "Rutherford B. Hayes" "James A. Garfield" ...
#>  $ pres_raw_county_vote_totals_two_party: num  2356 2262 2380 1952 1788 ...
#>  $ raw_county_vote_totals               : num  2356 2262 2380 1967 1789 ...
#>  $ county_first_date                    : Date, format: "1818-11-21" "1818-11-21" ...
#>  $ county_end_date                      : Date, format: NA NA ...
#>  $ state_admission_date                 : chr  "1819-12-14" "1819-12-14" "1819-12-14" "1819-12-14" ...
#>  $ complete_county_cases                : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ original_county_name                 : chr  NA NA NA NA ...
#>  $ original_name_end_date               : Date, format: NA NA ...


# 2. How about reading directly into R? This is an .RData file, which we usually read with load().

# via: https://stackoverflow.com/questions/34925668/r-assign-content-from-rda-object-with-load
load_object <- function(file) {
  tmp <- new.env()
  load(file = file, envir = tmp)
  tmp[[ls(tmp)[1]]]
}


as_rda <- get_dataframe_by_id(file = 5028532, 
                              server = "dataverse.harvard.edu", 
                              .f = load_object, 
                              original = TRUE)

identical(as_rda, pres_elections_release)
#> [1] TRUE

Created on 2021-09-16 by the reprex package (v2.0.1)
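
One caveat with load_object() above: it returns only the first name that ls() reports, so an .RData file holding several objects would lose the rest. A possible variant, sketched here with a hypothetical helper name `load_objects` and a made-up local file rather than a real Dataverse download, returns everything as a named list:

```r
# Hypothetical helper: like load_object(), but keeps every object in the file
load_objects <- function(file) {
  tmp <- new.env()
  load(file = file, envir = tmp)
  mget(ls(tmp), envir = tmp)  # named list, one element per saved object
}

# Local stand-in for a downloaded .RData file holding two objects
county <- data.frame(fips = c("01001", "01003"), votes = c(2356, 2262))
notes  <- "toy data for illustration"
path <- tempfile(fileext = ".RData")
save(county, notes, file = path)

objs <- load_objects(path)
print(names(objs))
str(objs$county)
```

Assuming `.f` accepts any function of a file path, as it does in the reprex, `load_objects` could presumably be passed in place of `load_object`; I have not checked this against a multi-object file on Dataverse.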

@kuriwaki kuriwaki added this to the CRAN 0.3.10 milestone Dec 21, 2021
kuriwaki added a commit that referenced this issue Dec 24, 2021
@kuriwaki kuriwaki reopened this Sep 8, 2023
kuriwaki added a commit that referenced this issue Sep 8, 2023
@kuriwaki
Member Author

kuriwaki commented Sep 8, 2023

@Danny-dK's proposal is more concise:

get_dataframe_by_doi(
  filedoi = "10.70122/FK2/PPIAXE/X2FC5V",
  server = "demo.dataverse.org",
  original = TRUE,
  .f = function(x) load(x, envir = .GlobalEnv))

I have made this change in dev: f33e578
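
A note on what this concise form does, sketched offline with a made-up `survey` data frame and a local file standing in for the download. The sketch calls the reader function directly and uses a dedicated environment instead of .GlobalEnv so the example leaves the workspace untouched; both choices are assumptions for illustration, not part of the proposal.

```r
# Local stand-in for the downloaded file (hypothetical data)
survey <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))
path <- tempfile(fileext = ".RData")
save(survey, file = path)

# Same shape as the function above, but targeting a dedicated environment
target <- new.env()
reader <- function(x) load(x, envir = target)

result <- reader(path)
print(result)         # the names of the loaded objects, not the data
print(target$survey)  # the data itself lives in the target environment
```

So with this approach the value of the reader function is just the vector of object names; the data arrives as a side effect in whichever environment is passed to `envir`.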
