spreadRepo/splitTags function #91

Closed
MarcinKosinski opened this issue Sep 30, 2015 · 7 comments


@MarcinKosinski
Collaborator

Do we need a function that splits tags from their current tagName:tagValue form into two separate columns, tagName and tagValue?

I've prepared such a function.

create repository

> # create repository
> library(archivist)
> deleteRepo("spread_test")
> createEmptyRepo("spread_test", default = TRUE)
> 
> saveToRepo(iris)
[1] "ff575c261c949d073b2895b05d1097c3"
> library(datasets)
> data(iris3)
> saveToRepo(iris3, format = "rdb")
[1] "40beeb27e7bb0b84d415d6bdc06f4a62"
> data(longley)
> aoptions("format", "rdb")
[1] "rdb"
> saveToRepo(longley)
[1] "a9c4ea5c0ad4a8493e726e1e4223aa18"
> 
> 

assign spreadRepo function

> # assign spreadRepo function
> library(dplyr)
> spreadRepo <- function(repoDir){
+   stopifnot( is.character(repoDir) )
+   repoDir <- archivist:::checkDirectory(repoDir)
+   showLocalRepo(repoDir, method = "tags") -> tags_df
+   
+   strsplit(tags_df$tag, ":") %>%
+   lapply( function(element){
+     if (length(element) > 2) {
+       element[2] <- paste0(element[-1], collapse = ":")
+       element <- element[1:2]
+     }
+     element
+   }) %>% 
+     simplify2array %>%
+     t %>%
+     cbind(tags_df) -> tags_df
+   tags_df <- tags_df[, c(3,1,2,5)]
+   names(tags_df)[2:3] <- c("tagName", "tagValue")
+   tags_df
+ }
> 
> 

examples

normal way

> # examples
> 
> 
> # normal way
> showLocalRepo(method = "tags")
                           artifact                      tag         createdDate
1  ff575c261c949d073b2895b05d1097c3                name:iris 2015-09-30 19:54:49
2  ff575c261c949d073b2895b05d1097c3     varname:Sepal.Length 2015-09-30 19:54:49
3  ff575c261c949d073b2895b05d1097c3      varname:Sepal.Width 2015-09-30 19:54:49
4  ff575c261c949d073b2895b05d1097c3     varname:Petal.Length 2015-09-30 19:54:49
5  ff575c261c949d073b2895b05d1097c3      varname:Petal.Width 2015-09-30 19:54:49
6  ff575c261c949d073b2895b05d1097c3          varname:Species 2015-09-30 19:54:49
7  ff575c261c949d073b2895b05d1097c3         class:data.frame 2015-09-30 19:54:49
8  ff575c261c949d073b2895b05d1097c3 date:2015-09-30 19:54:49 2015-09-30 19:54:49
9  40beeb27e7bb0b84d415d6bdc06f4a62               name:iris3 2015-09-30 19:54:49
10 40beeb27e7bb0b84d415d6bdc06f4a62              class:array 2015-09-30 19:54:49
11 40beeb27e7bb0b84d415d6bdc06f4a62 date:2015-09-30 19:54:49 2015-09-30 19:54:49
12 a9c4ea5c0ad4a8493e726e1e4223aa18             name:longley 2015-09-30 19:54:49
13 a9c4ea5c0ad4a8493e726e1e4223aa18     varname:GNP.deflator 2015-09-30 19:54:49
14 a9c4ea5c0ad4a8493e726e1e4223aa18              varname:GNP 2015-09-30 19:54:49
15 a9c4ea5c0ad4a8493e726e1e4223aa18       varname:Unemployed 2015-09-30 19:54:49
16 a9c4ea5c0ad4a8493e726e1e4223aa18     varname:Armed.Forces 2015-09-30 19:54:49
17 a9c4ea5c0ad4a8493e726e1e4223aa18       varname:Population 2015-09-30 19:54:49
18 a9c4ea5c0ad4a8493e726e1e4223aa18             varname:Year 2015-09-30 19:54:49
19 a9c4ea5c0ad4a8493e726e1e4223aa18         varname:Employed 2015-09-30 19:54:49
20 a9c4ea5c0ad4a8493e726e1e4223aa18         class:data.frame 2015-09-30 19:54:49
21 a9c4ea5c0ad4a8493e726e1e4223aa18 date:2015-09-30 19:54:49 2015-09-30 19:54:49
> 

new way

> # new way
> spreadRepo("format_test") 
                           artifact tagName            tagValue         createdDate
1  ff575c261c949d073b2895b05d1097c3    name                iris 2015-09-30 19:38:31
2  ff575c261c949d073b2895b05d1097c3 varname        Sepal.Length 2015-09-30 19:38:31
3  ff575c261c949d073b2895b05d1097c3 varname         Sepal.Width 2015-09-30 19:38:31
4  ff575c261c949d073b2895b05d1097c3 varname        Petal.Length 2015-09-30 19:38:31
5  ff575c261c949d073b2895b05d1097c3 varname         Petal.Width 2015-09-30 19:38:31
6  ff575c261c949d073b2895b05d1097c3 varname             Species 2015-09-30 19:38:31
7  ff575c261c949d073b2895b05d1097c3   class          data.frame 2015-09-30 19:38:31
8  ff575c261c949d073b2895b05d1097c3    date 2015-09-30 19:38:31 2015-09-30 19:38:31
9  40beeb27e7bb0b84d415d6bdc06f4a62    name               iris3 2015-09-30 19:38:31
10 40beeb27e7bb0b84d415d6bdc06f4a62   class               array 2015-09-30 19:38:31
11 40beeb27e7bb0b84d415d6bdc06f4a62    date 2015-09-30 19:38:31 2015-09-30 19:38:31
12 a9c4ea5c0ad4a8493e726e1e4223aa18    name             longley 2015-09-30 19:38:31
13 a9c4ea5c0ad4a8493e726e1e4223aa18 varname        GNP.deflator 2015-09-30 19:38:31
14 a9c4ea5c0ad4a8493e726e1e4223aa18 varname                 GNP 2015-09-30 19:38:31
15 a9c4ea5c0ad4a8493e726e1e4223aa18 varname          Unemployed 2015-09-30 19:38:31
16 a9c4ea5c0ad4a8493e726e1e4223aa18 varname        Armed.Forces 2015-09-30 19:38:31
17 a9c4ea5c0ad4a8493e726e1e4223aa18 varname          Population 2015-09-30 19:38:31
18 a9c4ea5c0ad4a8493e726e1e4223aa18 varname                Year 2015-09-30 19:38:31
19 a9c4ea5c0ad4a8493e726e1e4223aa18 varname            Employed 2015-09-30 19:38:31
20 a9c4ea5c0ad4a8493e726e1e4223aa18   class          data.frame 2015-09-30 19:38:31
21 a9c4ea5c0ad4a8493e726e1e4223aa18    date 2015-09-30 19:38:31 2015-09-30 19:38:31
> 
> 
> spreadRepo("format_test") %>%
+   group_by(tagName) %>%
+   summarise(count = n())
Source: local data frame [4 x 2]

  tagName count
   (fctr) (int)
1   class     3
2    date     3
3    name     3
4 varname    12
@eliotmcintire
Contributor

I would add:

else if(length(element)==1) {
    element <- c("tag", element)
}

resulting in:

if (length(element) > 2) {
    element[2] <- paste0(element[-1], collapse = ":")
    element <- element[1:2]
} else if(length(element)==1) {
    element <- c("tag", element)
}
element

for the case where there is no tag label (which I believe is possible to create).
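
The same splitting rule can be sketched as a standalone, vectorised helper in base R. This is only an illustration under stated assumptions: split_tag and its default_name argument are hypothetical names and are not part of archivist. It splits each tag at the first colon only, so values that themselves contain colons (such as dates) stay intact, and tags without a label fall back to the name "tag".

```r
# Hypothetical helper, not part of archivist: split "tagName:tagValue"
# strings at the FIRST colon only, so values such as timestamps keep
# their own colons.
split_tag <- function(tag, default_name = "tag") {
  stopifnot(is.character(tag))
  # Position of the first colon in each tag; -1 when there is none.
  # as.integer() drops the extra attributes that regexpr() attaches.
  pos <- as.integer(regexpr(":", tag, fixed = TRUE))
  has_name <- pos > 0
  # Unlabelled tags get the fallback name and keep the whole string as value.
  tag_name  <- ifelse(has_name, substr(tag, 1, pos - 1), default_name)
  tag_value <- ifelse(has_name, substr(tag, pos + 1, nchar(tag)), tag)
  data.frame(tagName = tag_name, tagValue = tag_value,
             stringsAsFactors = FALSE)
}

tags <- c("name:iris", "date:2015-09-30 19:54:49", "unlabelled")
result <- split_tag(tags)
print(result)
```

Splitting at the first colon avoids the split-then-paste step, and the fallback branch plays the role of the length(element)==1 case above.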

@MarcinKosinski
Collaborator Author

I think that once we have this, the summary*Repo functions will no longer be necessary, or could become wrappers around this functionality. Good that @wchodor is willing to finally add this function to archivist, with local and github versions.

@wchodor
Collaborator

wchodor commented Nov 16, 2015

splitTagsLocal and splitTagsGithub have been created. I decided to name them that way to keep cohesion with getTagsLocal and getTagsGithub. Here is the link to the commit:
44a2452

MarcinKosinski changed the title from "spreadRepo function" to "spreadRepo/splitTags function" on Nov 16, 2015
@MarcinKosinski
Collaborator Author

Well done @wchodor! I'll try to test this function in a few days.
By the way, @pbiecek and I were talking recently about changing the function naming convention, as the names are too long right now.

As we can see from @wchodor's code here: https://github.com/pbiecek/archivist/blob/master/R/splitTags.R#L120-L169
we could practically have only 1 function - splitTags - instead of separate Local and Github versions.
What do you think about that?

If that is fine, we could also add new short-named functions for the other sister functions, starting with copyRepo as it is the simplest. We can keep the long function names, as many users may still be using them, but with a note that they will be deprecated in the next archivist release (version 1.9).
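
A minimal sketch of how a long name could be kept as a deprecated wrapper, using base R's .Deprecated(). The function bodies and the single repoDir argument are hypothetical stand-ins for illustration only; they are not the archivist implementations or their real signatures.

```r
# Hypothetical stand-in for the short-named function (not archivist code).
splitTags <- function(repoDir) {
  paste("splitting tags in", repoDir)
}

# The old long name stays available as a thin wrapper: it warns that it is
# deprecated, then forwards the call to the new name.
splitTagsLocal <- function(repoDir) {
  .Deprecated("splitTags")
  splitTags(repoDir)
}

# Calling the old name still returns the result; the warning is caught
# here only to show its text without interrupting the call.
res <- withCallingHandlers(
  splitTagsLocal("my_repo"),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res)
```

Existing user code that calls the long name keeps working and sees a warning pointing at the replacement.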

@wchodor
Collaborator

wchodor commented Nov 17, 2015

As far as I'm concerned, long names don't bother me. The Local or Github suffix lets each function have its own specific parameters. If we had only one function, the user would have to work out which parameters apply to local repositories and which to GitHub ones.
But this is only my point of view.

@MarcinKosinski
Collaborator Author

Or we can have splitTagsLocal, splitTagsGithub and splitTags?
Most of the parameters can be set via aoptions, and in the future you'll probably only call the function without any parameter besides the artifact, so the shorter the name, probably the better.


@MarcinKosinski
Collaborator Author

So far it looks good with only 2 sister names/functions :)

MarcinKosinski added a commit that referenced this issue Jan 31, 2016