consistent event handling #75

Open
chrisdane opened this issue Oct 22, 2020 · 3 comments

Comments

@chrisdane

chrisdane commented Oct 22, 2020

Hi

I would like to compare different datasets from pangaea in an automatic way using such an input list:

pangdois <- list()
if (TRUE) {
    pangdois <- c(pangdois,
                  list("lorius_etal_1985"=
                       list(pdoi="10.1594/PANGAEA.860950",
                            vars=list("d18op"=list(inputname="δ18O H2O [‰ SMOW]",
                                                   dims=list("kyr_before_1950"="Age [ka BP]"))))))
}

if (TRUE) {
    pangdois <- c(pangdois,
                  list("masson-delmotte_etal_2011"=
                       list(pdoi="10.1594/PANGAEA.785228",
                            vars=list("d18op"=list(inputname="δ18O H2O [‰ SMOW]",
                                                   dims=list("kyr_before_1950"="Age [ka BP]"))))))
}

if (length(pangdois) > 0) {
    for (pangi in seq_along(pangdois)) {
        if (pangi == 1) library(pangaear)
        message("run `pangaear::pg_data(", pangdois[[pangi]]$pdoi, ")` ...")
        tmp <- pangaear::pg_data(pangdois[[pangi]]$pdoi)
        for (eventi in seq_along(tmp)) { # search wanted variables in every event of current doi
            event <- NA # default
            # <non-consistent event-handling; see below>
            for (vi in seq_along(pangdois[[pangi]]$vars)) { # check if any wanted variable exists in current event of current doi
                if (any(names(tmp[[eventi]]$data) == pangdois[[pangi]]$vars[[vi]]$inputname)) {
                    # do further stuff
                } # if current variable exists in current event of current doi
            } # for vi in wanted vars
        } # for eventi in seq_along(tmp)
    } # for pangi in pangdois
} # if length(pangdois) > 0

However, I noticed that the event information is not returned consistently across DOIs. So far I have found 3 different cases:

# case 1:
$ parent_doi: chr "10.1594/PANGAEA.785228"
$ metadata  :List of 7
..$ events    :List of 7
 .. ..$ Dome_Fuji (DF): chr NA
# --> if `metadata$events` is a list, use the name of its first entry (whose value is NA) to identify the data?

# case 2:
$ parent_doi: chr "10.1594/PANGAEA.860950"
$ metadata  :List of 7
 ..$ events    : chr "Vostok * LATITUDE: -78.464420 * LONGITUDE: 106.837320 * DATE/TIME: 1980-01-01T00:00:00 * ELEVATION: 3488.0 m * Recovery: 2755 m * LOCATION: Antarctica * CAMPAIGN: Ice_core_diverse * BASIS: Sampling/drilling ice * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: annual pressure 624 mbar; mean annual temperature -55.5°C; snow accumulation between 2.2 and; 22.5 g/cm**2/yr, about 250 ka"
# --> if `metadata$events` is not a list and `data$Event` is null, find a way to reduce the long event string to identify the data?

# case 3:
$ parent_doi: chr "10.1594/PANGAEA.863978"
$ metadata  :List of 9
 ..$ events    : chr "177-1089A * LATITUDE: -40.936400 * LONGITUDE: 9.894100 * DATE/TIME START: 1997-12-19T16:15:00 * DATE/TIME END: 1997-12-21T13:15:00 * ELEVATION: -4619.3 m * Penetration: 216.3 m * Recovery: 149.64 m * LOCATION: South Atlantic Ocean * CAMPAIGN: Leg177 (URI: https://doi.org/10.2973/odp.proc.ir.177.1999) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 23 cores; 216.3 m cored; 0 m drilled; 69.2 % recovery; 177-1089B * LATITUDE: -40.936400 * LONGITUDE: 9.894100 * DATE/TIME START: 1997-12-22T13:16:00 * DATE/TIME END: 1997-12-22T22:45:00 * ELEVATION: -4623.8 m * Penetration: 264.9 m * Recovery: 246.62 m * LOCATION: South Atlantic Ocean * CAMPAIGN: Leg177 (URI: https://doi.org/10.2973/odp.proc.ir.177.1999) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 29 cores; 264.9 m cored; 0 m drilled; 93.1 % recovery; 306-U1313B * LATITUDE: 41.000023 * LONGITUDE: -32.957300 * ELEVATION: -3413.5 m * Recovery: 306.54 m * CAMPAIGN: Exp306 (North Atlantic Climate 2) (URI: https://doi.org/10.2204/iodp.proc.303306.2006) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 32 cores; 300.4 m cored; 102 % recovered; 2 m drilled; 302.4 m penetrated; GeoB1515-1 * LATITUDE: 4.238333 * LONGITUDE: -43.666667 * DATE/TIME: 1991-05-15T00:00:00 * ELEVATION: -3129.0 m * Recovery: 6.58 m * LOCATION: Amazon Fan * CAMPAIGN: M16/2 (URI: https://doi.org/10.2312/cr_m16) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL); GeoB1523-1 * LATITUDE: 3.831667 * LONGITUDE: -41.621667 * DATE/TIME: 1991-05-17T00:00:00 * ELEVATION: -3292.0 m * Recovery: 6.65 m * LOCATION: Amazon Fan * CAMPAIGN: M16/2 (URI: https://doi.org/10.2312/cr_m16) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL); KNR140-12JPC (KNR140-2-12JPC) * LATITUDE: 29.080000 * LONGITUDE: -72.900000 * ELEVATION: -4250.0 m * LOCATION: North Atlantic Ocean * CAMPAIGN: KNR140 * BASIS: Knorr * METHOD/DEVICE: Piston corer (PC); M35003-4 * LATITUDE: 12.090000 * LONGITUDE: -61.243333 * DATE/TIME: 1996-04-19T00:00:00 * ELEVATION: -1299.0 m * Recovery: 9.63 m * CAMPAIGN: M35/1 (URI: https://doi.org/10.2312/cr_m35) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL)"
 $ data      : tibble [138 × 27] (S3: tbl_df/tbl/data.frame)
  ..$ Event                               : chr [1:138] "177-1089A" "177-1089A" "177-1089A" "177-1089A" ...
# --> if `metadata$events` is not a list and `data$Event` is not null, use maybe `unique(data$Event)` to identify the data?
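As a sketch of the three heuristics above, they could be folded into one helper. `event_ids()` is a hypothetical name (not part of pangaear); it only assumes the `metadata$events` / `data$Event` layout shown in the `str()` output above, and the mock records are trimmed stand-ins for real `pg_data()` results, not actual downloads.

```r
# Hypothetical helper, not part of pangaear: return short event labels for
# one element of the list returned by pangaear::pg_data(), assuming the
# metadata$events / data$Event layout shown in the three cases above.
event_ids <- function(rec) {
  ev <- rec$metadata$events
  if (is.list(ev)) {
    # case 1: events is a named list (values may be NA); use the names
    return(names(ev))
  }
  if (!is.null(rec$data[["Event"]])) {
    # case 3: the data table has an Event column; use its unique values
    return(unique(rec$data[["Event"]]))
  }
  # case 2: one long "LABEL * KEY: value * ..." string; keep the label
  sub(" \\* .*$", "", ev)
}

# Trimmed mock records mirroring the str() output above (not real downloads)
case1 <- list(metadata = list(events = list("Dome_Fuji (DF)" = NA)),
              data = data.frame(x = 1))
case2 <- list(metadata = list(events = "Vostok * LATITUDE: -78.464420 * LONGITUDE: 106.837320"),
              data = data.frame(x = 1))
case3 <- list(metadata = list(events = "177-1089A * LATITUDE: -40.936400"),
              data = data.frame(Event = c("177-1089A", "177-1089A", "177-1089B"),
                                stringsAsFactors = FALSE))

event_ids(case1)  # "Dome_Fuji (DF)"
event_ids(case2)  # "Vostok"
event_ids(case3)  # "177-1089A" "177-1089B"
```

Case 2 relies on the event label being everything before the first ` * `, which holds for the Vostok string above but is an assumption for other DOIs.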

Perhaps I am misunderstanding how the events are meant to be used. Is there a better way to automatically identify each individual dataset within a DOI?

Thanks a lot for any help,
Chris

Session Info
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.3 (2020-10-10)
 os       Arch Linux                  
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_US #de_DE                
 collate  C                           
 ctype    en_US.UTF-8                 
 tz       Europe/Berlin               
 date     2020-10-22

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source                            
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                    
 backports     1.1.10  2020-09-15 [1] CRAN (R 4.0.2)                    
 bookdown    * 0.21    2020-10-13 [1] CRAN (R 4.0.3)                    
 callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.3)                    
 cli           2.1.0   2020-10-12 [1] CRAN (R 4.0.3)                    
 colorout    * 1.2-2   2020-04-27 [1] Github (jalvesaq/colorout@726d681)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)                    
 crul          1.0.0   2020-07-30 [1] CRAN (R 4.0.2)                    
 curl          4.3     2019-12-02 [1] CRAN (R 4.0.0)                    
 desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)                    
 devtools    * 2.3.2   2020-09-18 [1] CRAN (R 4.0.2)                    
 digest        0.6.26  2020-10-17 [1] CRAN (R 4.0.3)                    
 dotCall64   * 1.0-0   2018-07-30 [1] CRAN (R 4.0.0)                    
 dplyr         1.0.2   2020-08-18 [1] CRAN (R 4.0.2)                    
 dtupdate    * 1.5     2020-04-27 [1] Github (hrbrmstr/dtupdate@58056ea)
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)                    
 extrafont   * 0.17    2014-12-08 [1] CRAN (R 4.0.0)                    
 extrafontdb   1.0     2012-06-11 [1] CRAN (R 4.0.0)                    
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                    
 fields      * 11.6    2020-10-09 [1] CRAN (R 4.0.3)                    
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)                    
 generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                    
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                    
 gsw         * 1.0-5   2017-08-09 [1] CRAN (R 4.0.0)                    
 hoardr        0.5.2   2018-12-02 [1] CRAN (R 4.0.0)                    
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.0.0)                    
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                    
 knitr         1.30    2020-09-22 [1] CRAN (R 4.0.2)                    
 lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)                    
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)                    
 maps          3.3.0   2018-04-03 [1] CRAN (R 4.0.0)                    
 memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)                    
 ncdf4       * 1.17    2019-10-23 [1] CRAN (R 4.0.0)                    
 oai           0.3.0   2019-09-07 [1] CRAN (R 4.0.0)                    
 oce         * 1.2-0   2020-02-21 [1] CRAN (R 4.0.0)                    
 pangaear    * 1.0.0   2020-01-22 [1] CRAN (R 4.0.0)                    
 pbapply       1.4-3   2020-08-18 [1] CRAN (R 4.0.2)                    
 pillar        1.4.6   2020-07-10 [1] CRAN (R 4.0.2)                    
 pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)                    
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)                    
 pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.2)                    
 plyr          1.8.6   2020-03-03 [1] CRAN (R 4.0.0)                    
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)                    
 processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.2)                    
 ps            1.4.0   2020-10-07 [1] CRAN (R 4.0.3)                    
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)                    
 R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                    
 rappdirs      0.3.1   2016-03-28 [1] CRAN (R 4.0.0)                    
 Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.2)                    
 remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)                    
 rlang         0.4.8   2020-10-08 [1] CRAN (R 4.0.3)                    
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)                    
 Rttf2pt1      1.3.8   2020-01-10 [1] CRAN (R 4.0.0)                    
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                    
 spam        * 2.5-1   2019-12-12 [1] CRAN (R 4.0.0)                    
 stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                    
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                    
 testthat    * 2.3.2   2020-03-02 [1] CRAN (R 4.0.0)                    
 tibble        3.0.4   2020-10-12 [1] CRAN (R 4.0.3)                    
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)                    
 usethis     * 1.6.3   2020-09-17 [1] CRAN (R 4.0.2)                    
 vctrs         0.3.4   2020-08-29 [1] CRAN (R 4.0.2)                    
 withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)                    
 xfun          0.18    2020-09-29 [1] CRAN (R 4.0.2)                    
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.0)  

@sckott
Contributor

sckott commented Oct 26, 2020

Thanks for the issue, @chrisdane! Having a look.

@sckott
Contributor

sckott commented Oct 26, 2020

Unfortunately, the files from Pangaea are semi-formatted text files that are quite hard to parse, and their structure varies a lot. I can try to make the parsed output more consistent.

notes to self:

  • DOIs that have varied events text to parse:

10.1594/PANGAEA.785228
10.1594/PANGAEA.860950
10.1594/PANGAEA.863978
10.1594/PANGAEA.881731
10.1594/PANGAEA.896852

  • working locally on parsing events data, come back to later.
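For a single event string such as the Vostok one in case 2 of the issue, one possible parsing approach (a sketch, not what pangaear currently does) is to split on ` * ` and then on the first `: ` of each field. `parse_event()` is a hypothetical name; it assumes one event per string, because multi-event strings like the case 3 example join events with `; `, which also appears inside COMMENT fields.

```r
# Hypothetical helper, not part of pangaear: split ONE event string of the
# form "LABEL * KEY: value * KEY: value" into a named character vector.
# Assumes fields are separated by " * " and each key ends at its first ":".
parse_event <- function(x) {
  fields <- strsplit(x, " * ", fixed = TRUE)[[1]]
  kv <- fields[-1]
  keys <- sub(":.*$", "", kv)      # text before the first colon
  vals <- sub("^[^:]*: ", "", kv)  # text after the first ": "
  c(LABEL = fields[1], setNames(vals, keys))
}

vostok <- parse_event(paste(
  "Vostok * LATITUDE: -78.464420 * LONGITUDE: 106.837320",
  "* ELEVATION: 3488.0 m * CAMPAIGN: Ice_core_diverse"))
vostok[["LABEL"]]                 # "Vostok"
as.numeric(vostok[["LATITUDE"]])  # -78.46442
vostok[["ELEVATION"]]             # "3488.0 m"
```

Splitting keys at the first colon keeps values such as `Leg177 (URI: https://...)` intact, since later colons stay in the value.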

@sckott
Contributor

sckott commented Jan 7, 2021

@chrisdane sorry for the delay on this. If you are willing to contribute, this will move along faster; I'll get to it at some point.
