archivist package Tags

Witold Chodor edited this page Dec 16, 2015 · 30 revisions

The archivist package is a set of tools for archiving datasets and plots in R.

Each artifact can be archived together with its own Tags, which are attributes of the artifact. They can be the artifact's name, class or archiving date. Additional Tags are available depending on the artifact's class.

A Tag is represented as a string and usually has the following structure "TagKey:TagValue", e.g., "name:iris".
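As a minimal sketch, such a string could be split back into its key and value in base R. The helper `split_tag` is hypothetical and not part of archivist; it splits on the first colon only, because values such as dates contain colons themselves.

```r
# Hypothetical helper (not part of archivist): split "TagKey:TagValue"
# on the first colon only, so values containing colons stay intact.
split_tag <- function(tag) {
  pos <- as.integer(regexpr(":", tag, fixed = TRUE))
  if (pos < 0) {
    # No colon found: treat the whole string as a value without a key
    return(c(key = NA_character_, value = tag))
  }
  c(key   = substr(tag, 1, pos - 1),
    value = substr(tag, pos + 1, nchar(tag)))
}

split_tag("name:iris")
split_tag("date:2015-12-16 09:25:51")
```

Splitting on every colon instead would break the `date` Tag into several pieces, which is why only the first colon is used here.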

Tags are stored in the Repository. If data is extracted from an artifact, a special Tag named `relationWith` is created. It specifies which artifact this data is related to.
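The sketch below illustrates how `relationWith` links two artifacts. It uses only base R and a small hand-built data frame whose hashes are copied from the `lm` example on this page; the data frame and the variable names are assumptions made for illustration, not output of an archivist function.

```r
# Tags copied from the lm example on this page (illustrative subset)
tags <- data.frame(
  artifact = c("cd6557c6163a6f9800f308f343e75e72",
               "cd6557c6163a6f9800f308f343e75e72",
               "7a761a2ae54f3d90060a9f6ca04b3506"),
  tag = c("name:model",
          "class:lm",
          "relationWith:cd6557c6163a6f9800f308f343e75e72"),
  stringsAsFactors = FALSE
)

# Keep only the rows that carry a relationWith Tag
rel <- tags[startsWith(tags$tag, "relationWith:"), ]

# The artifact column identifies the extracted data,
# the Tag value identifies the artifact it is related to
relations <- data.frame(
  data   = rel$artifact,
  source = sub("^relationWith:", "", rel$tag),
  stringsAsFactors = FALSE
)
relations
```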

The supported artifacts, grouped thematically, are listed below:

Regression Models

lm

- name
- class
- coefname
- rank
- df.residual
- date

````{Ruby}
                           artifact                                           tag
1  cd6557c6163a6f9800f308f343e75e72                                    name:model
2  cd6557c6163a6f9800f308f343e75e72                                      class:lm
3  cd6557c6163a6f9800f308f343e75e72                          coefname:(Intercept)
4  cd6557c6163a6f9800f308f343e75e72                          coefname:Sepal.Width
5  cd6557c6163a6f9800f308f343e75e72                         coefname:Petal.Length
6  cd6557c6163a6f9800f308f343e75e72                          coefname:Petal.Width
7  cd6557c6163a6f9800f308f343e75e72                                        rank:4
8  cd6557c6163a6f9800f308f343e75e72                               df.residual:146
9  cd6557c6163a6f9800f308f343e75e72                      date:2015-12-16 09:25:51
10 7a761a2ae54f3d90060a9f6ca04b3506 relationWith:cd6557c6163a6f9800f308f343e75e72
````

summary.lm

- name
- class
- sigma
- df
- R^2
- adjusted R^2
- fstatistic
- fstatistic.df
- date

````{Ruby}
                           artifact                      tag
1  159d608f0a3e80d623c8e6f90c5d0c0c       name:summary.model
2  159d608f0a3e80d623c8e6f90c5d0c0c         class:summary.lm
3  159d608f0a3e80d623c8e6f90c5d0c0c             sigma:0.3145
4  159d608f0a3e80d623c8e6f90c5d0c0c                     df:4
5  159d608f0a3e80d623c8e6f90c5d0c0c                   df:146
6  159d608f0a3e80d623c8e6f90c5d0c0c                     df:4
7  159d608f0a3e80d623c8e6f90c5d0c0c               R^2:0.8586
8  159d608f0a3e80d623c8e6f90c5d0c0c      adjusted R^2:0.8557
9  159d608f0a3e80d623c8e6f90c5d0c0c         fstatistic:295.5
10 159d608f0a3e80d623c8e6f90c5d0c0c          fstatistic.df:3
11 159d608f0a3e80d623c8e6f90c5d0c0c        fstatistic.df:146
12 159d608f0a3e80d623c8e6f90c5d0c0c date:2015-12-16 09:49:30
````

glmnet

- name
- class
- dim
- nulldev
- npasses
- offset
- nobs
- date

````{Ruby}
                           artifact                                           tag
1  515b361852c5f72eaa1eaf9da068ffa1                                  name:glmnet1
2  515b361852c5f72eaa1eaf9da068ffa1                                   class:elnet
3  515b361852c5f72eaa1eaf9da068ffa1                                  class:glmnet
4  515b361852c5f72eaa1eaf9da068ffa1                                        dim:20
5  515b361852c5f72eaa1eaf9da068ffa1                                        dim:64
6  515b361852c5f72eaa1eaf9da068ffa1                      nulldev:121.075158892729
7  515b361852c5f72eaa1eaf9da068ffa1                                   npasses:303
8  515b361852c5f72eaa1eaf9da068ffa1                                  offset:FALSE
9  515b361852c5f72eaa1eaf9da068ffa1                                      nobs:100
10 515b361852c5f72eaa1eaf9da068ffa1                      date:2015-12-16 09:52:30
11 bce5c8548cbc0b633606f1802873e768 relationWith:515b361852c5f72eaa1eaf9da068ffa1
````

survfit

- name
- class
- n
- type
- conf.type
- conf.int
- strata
- date

````{Ruby}
                          artifact                                           tag
1 7756e19f11f2e8b8ff22ecaf2e65d58a                                 name:survfit1
2 7756e19f11f2e8b8ff22ecaf2e65d58a                                 class:survfit
3 7756e19f11f2e8b8ff22ecaf2e65d58a                                          n:26
4 7756e19f11f2e8b8ff22ecaf2e65d58a                                    type:right
5 7756e19f11f2e8b8ff22ecaf2e65d58a                                 conf.type:log
6 7756e19f11f2e8b8ff22ecaf2e65d58a                                 conf.int:0.95
7 7756e19f11f2e8b8ff22ecaf2e65d58a                                   strata:NULL
8 7756e19f11f2e8b8ff22ecaf2e65d58a                      date:2015-12-16 09:58:38
9 787d1e43d1a52cb104f0833681777829 relationWith:7756e19f11f2e8b8ff22ecaf2e65d58a
````

Plots

ggplot

- name
- class
- date
- labelx
- labely

````{Ruby}
                          artifact                                           tag
1 4f84d3942c6cf910fc70d0031cb4e07d                                     labelx:gp
2 4f84d3942c6cf910fc70d0031cb4e07d                                      labely:y
3 4f84d3942c6cf910fc70d0031cb4e07d                                      class:gg
4 4f84d3942c6cf910fc70d0031cb4e07d                                  class:ggplot
5 4f84d3942c6cf910fc70d0031cb4e07d                                name:myplot123
6 4f84d3942c6cf910fc70d0031cb4e07d                      date:2014-09-08 20:34:33
7 05eaf6413ffa4d961c01a49ee5852822 relationWith:4f84d3942c6cf910fc70d0031cb4e07d
````

trellis

- date
- name
- class

````{Ruby}
                          artifact                                           tag
1 506effb5781140c71a388022d3c8b004                             name:trellis.plot
2 506effb5781140c71a388022d3c8b004                                 class:trellis
3 506effb5781140c71a388022d3c8b004                      date:2014-08-29 11:01:25
4 f49104cd336e6417f89141855c0fe4a7 relationWith:506effb5781140c71a388022d3c8b004
````

Results of Agglomeration Methods

twins

which is a result of agnes, diana or mona functions
- date
- name
- class
- ac

````{Ruby}
                          artifact                                           tag
1 db864b3dcc09888f5f7f2750b4ab553d                                     name:agn1
2 db864b3dcc09888f5f7f2750b4ab553d                                   class:agnes
3 db864b3dcc09888f5f7f2750b4ab553d                                   class:twins
4 db864b3dcc09888f5f7f2750b4ab553d                      date:2014-09-08 20:35:24
5 db864b3dcc09888f5f7f2750b4ab553d                          ac:0.797755535467609
6 c2380a7b8c090d6564ff12cdd60aaf7f relationWith:db864b3dcc09888f5f7f2750b4ab553d
````

partition

which is a result of pam, clara or fanny functions
- name
- class
- memb.exp
- dunn_coeff
- normalized dunn_coeff
- k.crisp
- objective
- tolerance
- iterations
- converged
- maxit
- clus.avg.widths
- avg.width
- date

                           artifact                                           tag         createdDate
1  1b4b9202e3d6cad6cdbc2ebbeda94631                                   name:fannyx 2015-12-16 10:02:26
2  1b4b9202e3d6cad6cdbc2ebbeda94631                                   class:fanny 2015-12-16 10:02:26
3  1b4b9202e3d6cad6cdbc2ebbeda94631                               class:partition 2015-12-16 10:02:26
4  1b4b9202e3d6cad6cdbc2ebbeda94631                                    memb.exp:2 2015-12-16 10:02:26
5  1b4b9202e3d6cad6cdbc2ebbeda94631                  dunn_coeff:0.857319149113722 2015-12-16 10:02:26
6  1b4b9202e3d6cad6cdbc2ebbeda94631       normalized dunn_coeff:0.714638298227444 2015-12-16 10:02:26
7  1b4b9202e3d6cad6cdbc2ebbeda94631                                     k.crisp:2 2015-12-16 10:02:26
8  1b4b9202e3d6cad6cdbc2ebbeda94631                    objective:13.3387071200155 2015-12-16 10:02:26
9  1b4b9202e3d6cad6cdbc2ebbeda94631                               tolerance:1e-15 2015-12-16 10:02:26
10 1b4b9202e3d6cad6cdbc2ebbeda94631                                 iterations:10 2015-12-16 10:02:26
11 1b4b9202e3d6cad6cdbc2ebbeda94631                                   converged:1 2015-12-16 10:02:26
12 1b4b9202e3d6cad6cdbc2ebbeda94631                                     maxit:500 2015-12-16 10:02:26
13 1b4b9202e3d6cad6cdbc2ebbeda94631             clus.avg.widths:0.851114703584252 2015-12-16 10:02:26
14 1b4b9202e3d6cad6cdbc2ebbeda94631             clus.avg.widths:0.780710623619941 2015-12-16 10:02:26
15 1b4b9202e3d6cad6cdbc2ebbeda94631                   avg.width:0.805854937892909 2015-12-16 10:02:27
16 1b4b9202e3d6cad6cdbc2ebbeda94631                      date:2015-12-16 10:02:26 2015-12-16 10:02:27
17 093f20ec76c271c449f7eadc32bfefaf relationWith:1b4b9202e3d6cad6cdbc2ebbeda94631 2015-12-16 10:02:27

lda

- name
- class
- N
- lev
- counts
- prior
- svd
- date

````{Ruby}
                           artifact                                           tag
1  2ae5f73c0b979532555fb3800458a4d4                                     name:lda1
2  2ae5f73c0b979532555fb3800458a4d4                                     class:lda
3  2ae5f73c0b979532555fb3800458a4d4                                          N:75
4  2ae5f73c0b979532555fb3800458a4d4                                         lev:c
5  2ae5f73c0b979532555fb3800458a4d4                                         lev:s
6  2ae5f73c0b979532555fb3800458a4d4                                         lev:v
7  2ae5f73c0b979532555fb3800458a4d4                                   counts_c:24
8  2ae5f73c0b979532555fb3800458a4d4                                   counts_s:26
9  2ae5f73c0b979532555fb3800458a4d4                                   counts_v:25
10 2ae5f73c0b979532555fb3800458a4d4                                 prior_c:0.333
11 2ae5f73c0b979532555fb3800458a4d4                                 prior_s:0.333
12 2ae5f73c0b979532555fb3800458a4d4                                 prior_v:0.333
13 2ae5f73c0b979532555fb3800458a4d4                                    svd:35.154
14 2ae5f73c0b979532555fb3800458a4d4                                      svd:3.24
15 2ae5f73c0b979532555fb3800458a4d4                      date:2015-12-16 10:24:22
16 53a9f3aa4235e111524dda17aad2ee3a relationWith:2ae5f73c0b979532555fb3800458a4d4
````

qda

- name
- class
- N
- lev
- counts
- prior
- ldet
- terms
- date

                           artifact                                           tag
1  fdf1922d88bb7369896886dfa7da16f6                                     name:qda1
2  fdf1922d88bb7369896886dfa7da16f6                                     class:qda
3  fdf1922d88bb7369896886dfa7da16f6                                          N:75
4  fdf1922d88bb7369896886dfa7da16f6                                         lev:c
5  fdf1922d88bb7369896886dfa7da16f6                                         lev:s
6  fdf1922d88bb7369896886dfa7da16f6                                         lev:v
7  fdf1922d88bb7369896886dfa7da16f6                                   counts_c:25
8  fdf1922d88bb7369896886dfa7da16f6                                   counts_s:25
9  fdf1922d88bb7369896886dfa7da16f6                                   counts_v:25
10 fdf1922d88bb7369896886dfa7da16f6                                 prior_c:0.333
11 fdf1922d88bb7369896886dfa7da16f6                                 prior_s:0.333
12 fdf1922d88bb7369896886dfa7da16f6                                 prior_v:0.333
13 fdf1922d88bb7369896886dfa7da16f6                        ldet:-10.7456046297995
14 fdf1922d88bb7369896886dfa7da16f6                        ldet:-14.1339270400961
15 fdf1922d88bb7369896886dfa7da16f6                        ldet:-8.73959597700551
16 fdf1922d88bb7369896886dfa7da16f6                                    terms:NULL
17 fdf1922d88bb7369896886dfa7da16f6                      date:2015-12-16 10:29:58
18 4d504b6c315c9be389f2e84186fa64c5 relationWith:fdf1922d88bb7369896886dfa7da16f6

Statistical tests

htest

- name
- class
- method
- data.name
- alternative
- statistic
- parameter
- p.value
- conf.int.
- estimate
- date

                           artifact                                           tag
1  614506b5d47b6f643f9fd818aa13c7a3                                 name:htestcor
2  614506b5d47b6f643f9fd818aa13c7a3                                   class:htest
3  614506b5d47b6f643f9fd818aa13c7a3    method:Pearsons product-moment correlation
4  614506b5d47b6f643f9fd818aa13c7a3           data.name:wine_year and wine_s.temp
5  614506b5d47b6f643f9fd818aa13c7a3                      null.value:correlation=0
6  614506b5d47b6f643f9fd818aa13c7a3                         alternative:two.sided
7  614506b5d47b6f643f9fd818aa13c7a3                    statistic:5.03111686859242
8  614506b5d47b6f643f9fd818aa13c7a3                               parameter:df=45
9  614506b5d47b6f643f9fd818aa13c7a3                  p.value:8.29309463679095e-06
10 614506b5d47b6f643f9fd818aa13c7a3     95 percent conf.int.:[0.377951, 0.756773]
11 614506b5d47b6f643f9fd818aa13c7a3                             estimate:0.599997
12 614506b5d47b6f643f9fd818aa13c7a3                      date:2015-12-16 10:38:19
13 956e053182d9bb7e391aeaac3f3020df relationWith:614506b5d47b6f643f9fd818aa13c7a3

When none of the above classes applies, the default Tags are assigned:

default

- name
- class
- date

````{Ruby}
                          artifact                      tag
1 e74a7838ca62b5998d6753a8458ef7b7           name:survModel
2 e74a7838ca62b5998d6753a8458ef7b7              class:coxph
3 e74a7838ca62b5998d6753a8458ef7b7 date:2014-08-29 11:13:11
````

data.frame

- name
- class
- date
- varname

````{Ruby}
                          artifact                      tag
1 ff575c261c949d073b2895b05d1097c3                name:iris
2 ff575c261c949d073b2895b05d1097c3     varname:Sepal.Length
3 ff575c261c949d073b2895b05d1097c3      varname:Sepal.Width
4 ff575c261c949d073b2895b05d1097c3     varname:Petal.Length
5 ff575c261c949d073b2895b05d1097c3      varname:Petal.Width
6 ff575c261c949d073b2895b05d1097c3          varname:Species
7 ff575c261c949d073b2895b05d1097c3         class:data.frame
8 ff575c261c949d073b2895b05d1097c3 date:2014-08-28 14:22:42
````

Storing origin code as a `name` Tag

This is possible when using chaining code, as in the example below:

````{Ruby}
> # origin of the artifacts stored as a name - chaining code
> library(archivist)
> library(dplyr)
> exampleRepoDir <- tempdir()
> createEmptyRepo( repoDir = exampleRepoDir )
> data("hflights", package = "hflights")
> hflights %>%
+   group_by(Year, Month, DayofMonth) %>%
+   select(Year:DayofMonth, ArrDelay, DepDelay) %>%
+   saveToRepo( exampleRepoDir, chain = TRUE ) %>%
+   # here the artifact is stored but chaining is not finished
+   summarise(
+     arr = mean(ArrDelay, na.rm = TRUE),
+     dep = mean(DepDelay, na.rm = TRUE)
+   ) %>%
+   filter(arr > 30 | dep > 30) %>%
+   saveToRepo( exampleRepoDir )
[1] "9013563d1069359f9b7d7a49c49b0a1f"
> # chaining code is finished and after last operation the
> # artifact is stored
> showLocalRepo( exampleRepoDir, "tags" )[,-c(1,3)]
 [1] "name:hflights %>% group_by(Year, Month, DayofMonth) %>% select(Year:DayofMonth, ArrDelay, DepDelay)"
 [2] "varname:Year"
 [3] "varname:Month"
 [4] "varname:DayofMonth"
 [5] "varname:ArrDelay"
 [6] "varname:DepDelay"
 [7] "class:grouped_df"
 [8] "class:tbl_df"
 [9] "class:tbl"
[10] "class:data.frame"
[11] "date:2014-09-08 20:42:20"
[12] "name:hflights %>% group_by(Year, Month, DayofMonth) %>% select(Year:DayofMonth, ArrDelay, DepDelay) %>% saveToRepo(exampleRepoDir, chain = TRUE) %>% summarise(arr = mean(ArrDelay, na.rm = TRUE), dep = mean(DepDelay, na.rm = TRUE)) %>% filter(arr > 30 | dep > 30)"
[13] "varname:Year"
[14] "varname:Month"
[15] "varname:DayofMonth"
[16] "varname:arr"
[17] "varname:dep"
[18] "class:grouped_df"
[19] "class:tbl_df"
[20] "class:tbl"
[21] "class:data.frame"
[22] "date:2014-09-08 20:42:20"
````

Functions using `Tags` are:

- `addTagsRepo`,
- `getTagsLocal`,
- `getTagsGithub`,
- `searchInLocalRepo`,
- `searchInGithubRepo`.

Note

You can specify your own `Tags` for an artifact by setting the artifact's `tags` attribute before calling the `saveToRepo` function: `attr(x, "tags" ) = c( "name1", "name2" )`, where `x` is the artifact and `name1`, `name2` are Tags specified by the user. The same can be done more simply with the `userTags` parameter: `saveToRepo(model, repoDir, userTags = c("my_model", "do not delete"))`.
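
The attribute-based approach can be sketched in base R as follows. The model and the Tag values are illustrative assumptions; the snippet only shows that the `tags` attribute is attached to the object and can be read back, and it does not call archivist itself.

```r
# Fit a small model to use as the artifact (illustrative choice)
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Attach user-defined Tags before archiving
attr(model, "tags") <- c("my_model", "do not delete")

# The Tags travel with the object as an ordinary attribute
user_tags <- attr(model, "tags")
user_tags

# Setting the attribute leaves the model's class untouched
class(model)
```

Because the Tags are a plain attribute, they stay attached to `model` until the object is passed to `saveToRepo`.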