Using dataspice for multiple datasets #112

Open
robitalec opened this issue May 27, 2021 · 0 comments
robitalec commented May 27, 2021

Continuing our discussion from #110, I found two obvious hurdles when using dataspice for multiple datasets. In this example, I split the mtcars example data into two files with uneven, overlapping sets of columns and distinct sets of rows. I then use create_spice, prep_attributes and prep_access, followed by edit_*, to set up the metadata files.

Setup

library(dataspice)

dir.create('data')
write.csv(mtcars[1:10, 1:4], 'data/mtcars1.csv')
write.csv(mtcars[11:20, 2:6], 'data/mtcars2.csv')

# Create the metadata templates in data/metadata
create_spice()

prep_access()

# The following fileNames have been added to the access file: mtcars1.csv, mtcars2.csv

prep_attributes()

# The following variableNames have been added to the attributes file for mtcars1.csv: X1, mpg, cyl, disp, hp
# The following variableNames have been added to the attributes file for mtcars2.csv: X1, cyl, disp, hp, drat, wt
# Warning messages:
# 1: Missing column names filled in: 'X1' [1] 
# 2: Missing column names filled in: 'X1' [1] 

Then I added some filler information to the metadata. Here are those files zipped: metadata.zip

edit_access()
edit_attributes()
edit_biblio()
edit_creators()

In this example biblio, I added another row for "mtcars2" as suggested in the Shiny app with a right click. It looks like this:

read.csv('data/metadata/biblio.csv')

#     title description datePublished
# 1 mtcars 1          NA          1974
# 2 mtcars 2          NA          1974

#                                              citation
# 1 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
# 2 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

#   keywords license funder geographicDescription northBoundCoord
# 1       NA      NA     NA                    NA              47
# 2       NA      NA     NA                    NA              57

#   eastBoundCoord southBoundCoord westBoundCoord wktString  startDate
# 1            -98              32           -120        NA 1974-01-01
# 2            -88              42           -110        NA 1974-01-01
 
#     endDate
# 1 1975-01-01
# 2 1975-01-01

Challenges

In write_spice(), we get a warning from the is.na(biblio$keywords) check, which expects keywords from only one row of data.

https://github.com/ropensci/dataspice/blob/main/R/write_spice.R#L67

write_spice()

# Warning message:
# In if (is.na(biblio$keywords)) { :
#   the condition has length > 1 and only the first element will be used
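
A minimal sketch of why this check warns, and one way it could be made robust to several rows. The biblio data frame below is a stand-in for data/metadata/biblio.csv, and has_keywords() is a hypothetical helper for illustration, not part of dataspice.

```r
# Stand-in for a two-row biblio.csv (only the relevant columns are shown).
biblio <- data.frame(
  title = c("mtcars 1", "mtcars 2"),
  keywords = c(NA_character_, NA_character_)
)

# is.na() is vectorised, so the condition has one element per row.
# if() needs a single TRUE/FALSE: older R versions warn and use only the
# first element, while R >= 4.2.0 raises an error instead.
length(is.na(biblio$keywords))

# Hypothetical helper: TRUE when at least one row has a non-empty keyword.
has_keywords <- function(x) {
  any(!is.na(x) & nzchar(trimws(x)))
}

if (!has_keywords(biblio$keywords)) {
  message("No keywords supplied in any row of biblio.csv")
}
```

Collapsing the vector with any() (or all()) keeps the condition at length one regardless of how many rows biblio.csv has. Whether keywords should be combined across rows or kept per dataset is a separate design question.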

In build_site(), we get an error when parsing the boxes described in data/metadata/biblio.csv. I was expecting this to simply generate two boxes, rather than the single box generated for a single dataset.

build_site()

# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -12057 -88 42 -110'. 

If you remove the second row's north/east/south/west coordinates, the same error occurs:

build_site()

# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -120NA NA NA NA'. 
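
The "NA NA NA NA" in this message comes from paste() turning missing coordinates into the string "NA". A small sketch of that behaviour, with one possible way to drop incomplete rows before building the box strings; the data frame is a stand-in for biblio.csv and coord_cols is a name introduced here for illustration.

```r
# Stand-in for biblio.csv with the second row's coordinates removed.
biblio <- data.frame(
  northBoundCoord = c(47, NA),
  eastBoundCoord  = c(-98, NA),
  southBoundCoord = c(32, NA),
  westBoundCoord  = c(-120, NA)
)

coord_cols <- c("northBoundCoord", "eastBoundCoord",
                "southBoundCoord", "westBoundCoord")

# paste() converts NA to the string "NA", so the empty row still yields a box.
box_all <- do.call(paste, unname(biblio[coord_cols]))
box_all

# Keep only rows where all four coordinates are present.
complete <- stats::complete.cases(biblio[coord_cols])
box_complete <- do.call(paste, unname(biblio[complete, coord_cols]))
box_complete
```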

This error surfaces in build_site() but originates in write_spice() (L88), where the output spatialCoverage unexpectedly contains a list of length 2.

write_spice()

# In dataspice.json
# ...
#  "spatialCoverage": {
#     "type": "Place",
#     "name": [null, null],
#     "geo": {
#       "type": "GeoShape",
#       "box": ["47 -98 32 -120", "57 -88 42 -110"]
#    }
#  }

Within build_site(), the error is raised by the length(tokens) == 1 check in the function parse_GeoShape_box().

biblio <- read.csv('data/metadata/biblio.csv')

box <- paste(biblio$northBoundCoord, biblio$eastBoundCoord,
            biblio$southBoundCoord, biblio$westBoundCoord)
box

# [1] "47 -98 32 -120" "57 -88 42 -110"

tokens <- stringr::str_split(box, " ")

tokens

# [[1]]
# [1] "47"   "-98"  "32"   "-120"

# [[2]]
# [1] "57"   "-88"  "42"   "-110"

if (!length(tokens) == 1) {
  stop("Failed to parse box in spatialCoverage$geo$box of '", 
       box, "'.", call. = FALSE)
}
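
For discussion, here is a rough sketch of how several boxes could be parsed one at a time instead of failing on the length check. parse_boxes() is a hypothetical helper written in base R; it is not dataspice's implementation and only assumes the "north east south west" ordering used above.

```r
# Hypothetical helper: parse a character vector of
# "north east south west" strings, one element per dataset.
parse_boxes <- function(box) {
  lapply(box, function(b) {
    tokens <- strsplit(trimws(b), "[[:space:]]+")[[1]]
    coords <- suppressWarnings(as.numeric(tokens))
    # Each box must contain exactly four numeric tokens.
    if (length(tokens) != 4 || anyNA(coords)) {
      stop("Failed to parse box in spatialCoverage$geo$box of '",
           b, "'.", call. = FALSE)
    }
    stats::setNames(coords, c("north", "east", "south", "west"))
  })
}

boxes <- parse_boxes(c("47 -98 32 -120", "57 -88 42 -110"))
length(boxes)
boxes[[2]][["west"]]
```

Validating each element separately means a malformed or all-NA row is reported on its own, rather than being pasted together with the valid boxes in the error message.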