Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Reporting the intersection results #85

Open
skarunan opened this issue Jul 12, 2017 · 30 comments
Open

Feature Request: Reporting the intersection results #85

skarunan opened this issue Jul 12, 2017 · 30 comments

Comments

@skarunan
Copy link

I couldn't find a way to extract the intersecting values. For instance, reporting the names of movies that fall under more than one genre (action, thriller, drama). At least for me, extracting those names which fall in just one category or more than one specific categories will be a nice feature for the package.

@rotifergirl
Copy link

I would also appreciate this feature!

@janstrauss1
Copy link

Yes, I'd love this feature, too and would be very keen to get the lists of elements for intersections.

@wkarno
Copy link

wkarno commented Jul 28, 2017

I would also love to have this feature! Or if anyone @rotifergirl @sivarajankarunanithi @janstrauss1 has found a work around please let me know!

@rotifergirl
Copy link

The only work-around I've found is calculate.overlap() from the VennDiagram package, but it's limited in the number of sets it can deal with, so I had to break my sets into groups of three, and then compare the largest overlaps from each initial comparison to find the total overlap.

subset1<-list(set1,set2,set3,set4,set5)
overlap1<-calculate.overlap(subset1)
subset2<-list(set6,set7,set8,set9,set10)
overlap2<-calculate.overlap(subset2)
subset3<-list(set11,set12,set13,set14)
overlap3<-calculate.overlap(subset3)
bigoverlap1<-as.factor(overlap1[[1]])
bigoverlap2<-as.factor(overlap2[[1]])
bigoverlap3<-as.factor(overlap3[[1]])
alloverlaps<-list(bigoverlap1,bigoverlap2,bigoverlap3)
finaloverlap<-calculate.overlap(alloverlaps)

This way, finaloverlap[1] is a list of the elements in your largest overlap for all sets, and other groups would have to be worked out another way. It's not elegant, but it did the job for what I needed so it may be helpful!

@janstrauss1
Copy link

janstrauss1 commented Jul 31, 2017

I have found a work-around using a web tool developed by the group of Yves van de Peer at the University of Ghent, Belgium allowing to print lists of elements which are in each intersection or are unique to a certain list. The tool is accessible at http://bioinformatics.psb.ugent.be/webtools/Venn/.
I've noted, however, that sometimes there are slight discrepancies between the number of elements predicted by UpSetR and the web tool. Overall, it works well though!

@skarunan
Copy link
Author

skarunan commented Aug 2, 2017

@janstrauss1 Thanks for that solution!

@jananiravi
Copy link

@ngehlenborg @JakeConway @alexsb
I find UpSetR to be very helpful with many intuitive parameters.
As suggested on Twitter, I'd like to make a feature request for "Exporting intersection/set data generated behind the scenes by UpSet plots". This data could be used alongside ggplot/other packages to further manipulate and re-plot. Right now assigning the plot to a variable doesn't save anything (NULL). Thanks!

@docmanny
Copy link

docmanny commented Sep 7, 2017

While it would be best if upset() would return a container with the row numbers of the members of each set, in the meantime, I wrote some code using dplyr and tibble to get the members of the intersections from the binary table:

get_intersect_members <- function (x, ...){
  require(dplyr)
  require(tibble)
  x <- x[,sapply(x, is.numeric)][,0<=colMeans(x[,sapply(x, is.numeric)],na.rm=T) & colMeans(x[,sapply(x, is.numeric)],na.rm=T)<=1]
  n <- names(x)
  x %>% rownames_to_column() -> x
  l <- c(...)
  a <- intersect(names(x), l)
  ar <- vector('list',length(n)+1)
  ar[[1]] <- x
  i=2
  for (item in n) {
    if (item %in% a){
      if (class(x[[item]])=='integer'){
        ar[[i]] <- paste(item, '>= 1')
        i <- i + 1
      }
    } else {
      if (class(x[[item]])=='integer'){
        ar[[i]] <- paste(item, '== 0')
        i <- i + 1
      }
    }
  }
  do.call(filter_, ar) %>% column_to_rownames() -> x
  return(x)
}
# get_intersect_members() takes as arguments a dataframe that has been formatted as a binary 
#     table, such as movies from the UpSetR vignette; as well as a series of strings with the names of 
#     columns you wish to test membership for.

# Find all movies that are exclusively dramas:
get_intersect_members <- function (movies, 'Drama')

# Find all movies that are both dramas and comedies:
get_intersect_members <- function (movies, 'Drama', 'Comedy')

# You can also provide the arguments as a vector of column names:
get_intersect_members <- function (movies, c('Drama', 'Comedy'))

For convenience, if you only have your data in the form of a named list (like me) here is a modified version of the fromList function in the package that conserves the item names as row names:

fromList <- function (input) {
  # Same as original fromList()...
  elements <- unique(unlist(input))
  data <- unlist(lapply(input, function(x) {
      x <- as.vector(match(elements, x))
      }))
  data[is.na(data)] <- as.integer(0)
  data[data != 0] <- as.integer(1)
  data <- data.frame(matrix(data, ncol = length(input), byrow = F))
  data <- data[which(rowSums(data) != 0), ]
  names(data) <- names(input)
  # ... Except now it conserves your original value names!
  row.names(data) <- elements
  return(data)
  }

#Example list:
lst <- list(a=c("CARD11_0","EZH2_0","HOXD11_0","FGFR1_0","FGFR1_1"), b=c("EZH2_0","EZH2_0","HOXD11_0","FGFR1_0","FGFR1_0"))

# Binary table with colnames:
fromList(lst)

@bdecato
Copy link

bdecato commented Nov 2, 2017

Also interested in this feature. @docmanny's solution worked very well for me in the meantime though, thanks a lot!!

@cgoo4
Copy link

cgoo4 commented Nov 17, 2017

This feature would be very helpful.

@Tato14
Copy link

Tato14 commented Dec 13, 2017

Hi! First post here, hope I make myself clear :)

In order to find the list of intersected values I used "Reduce" and "combn". I reuse a previous post for this (https://stackoverflow.com/questions/24748170/finding-all-possible-combinations-of-vector-intersections). The code:

combos <- Reduce(c,lapply(2:length(v), 
                          function(x) combn(1:length(v),x,simplify=FALSE) ))
intersect <- lapply(combos, function(x) Reduce(intersect,v[x]) )

Where "v" is the list generated file without fromList transformation.

It's important to say that UpSetR only plots the unique values. This means that the values in the intersection between list 2, 3 and 4 would be only the values that appear in this condition and not the intersected values appeared also in list 1, 2, 3 and 4.
Cheers,

@sfd99
Copy link

sfd99 commented Apr 5, 2018

I tried using the great
get_intersect_members() function
by docmanny (his comment above),
to display rows with intersections in a data frame.

Tried first with the movies DF

get_intersect_members(movies, "Drama") # Fx by docmanny = THIS WORKS!

and later
with the Data Frame in this great UpsetR Tutorial by James Lloyd:
https://www.badgrammargoodsyntax.com/compbio/2018/3/25/compbio-021-upsets-to-replace-complex-venn-diagrams

I did this:

FantasticBeasts_df <- read.table(file = "Fantastic_beasts.txt", header = T, sep = "\t");
upset(fromList(FantasticBeasts_df), order.by = "freq", nsets = 6);

But... the docmanny FX
does not work with the FantasticBeasts_df...

get_intersect_members(FantasticBeasts_df, "'Goblin")
data frame with 0 columns and 109 rows

get_intersect_members(FantasticBeasts_df, "Goblin","Unicorn")
data frame with 0 columns and 109 rows

So,
the docmanny Fx works ok for the movies DF,
but NOT for the FantasticBeasts_df . :-(
Why not? They are both in dataframe format...
And both plot OK in the UpsetR pkg...

Am I missing something
in the call to the get_intersect_members() Fx?

Any suggestions welcome...
SFd99
San Francisco

  • latest Rstudio & R
  • Ubuntu Linux 14.04 LTS

@achamess
Copy link

achamess commented May 5, 2018

I also think this would be very useful.

@brgenzim
Copy link

+1 I'd love it too

@docmanny
Copy link

docmanny commented Aug 1, 2018

@sfd99 Hi! Sorry I just saw the @ mention, but I do have an answer for you if it still helps. Note that the movies dataset already has a binary membership table inside of it, which is what get_intersect_members() uses to pull out members.
As you noted, this doesn't work:

get_intersect_members(FantasticBeasts_df, "'Goblin")
data frame with 0 columns and 109 rows

However, this does:

# You can use either my variant of fromList or the original UpSetR fromList
get_intersect_members(fromList(FantasticBeasts_df), "Unicorn")
# Rownames are from my fromList variant
       Dragon Unicorn House_Elf Goblin Blast.Ended_Skrewt Manticore
g00106      0       1         0      0                  0         0
g00107      0       1         0      0                  0         0
g00108      0       1         0      0                  0         0
g00109      0       1         0      0                  0         0
g00110      0       1         0      0                  0         0
g00111      0       1         0      0                  0         0
g00112      0       1         0      0                  0         0
g00113      0       1         0      0                  0         0
g00114      0       1         0      0                  0         0
g00115      0       1         0      0                  0         0
g00116      0       1         0      0                  0         0
g00117      0       1         0      0                  0         0
g00118      0       1         0      0                  0         0
g00119      0       1         0      0                  0         0
g00120      0       1         0      0                  0         0
g00121      0       1         0      0                  0         0
g00122      0       1         0      0                  0         0
g00123      0       1         0      0                  0         0
g00124      0       1         0      0                  0         0
g00125      0       1         0      0                  0         0
g00126      0       1         0      0                  0         0
g00208      0       1         0      0                  0         0
g00209      0       1         0      0                  0         0
g00210      0       1         0      0                  0         0
g00211      0       1         0      0                  0         0
g00212      0       1         0      0                  0         0
g00213      0       1         0      0                  0         0
g00214      0       1         0      0                  0         0
g00215      0       1         0      0                  0         0
g00216      0       1         0      0                  0         0

You can also see in the UpSetR graph that Goblin and Unicorn only appear together in larger sets, so there are no members that are exclusive to that intersection. As a result, get_intersection_members would not return anything for get_intersection_members((fromList(FantasticBeasts_df), "Goblin", "Unicorn") because it only returns exclusive group members.

Hope that helps!

@seb-mueller
Copy link

Maybe I overlooked it, but having played around with the above solutions, I was still missing a all-in-one function producing a list with all occurring group combination.
Therefore I've put something together which might be useful to others (it certainly is for me).
I've put the code also in a Gist for further improvments etc.

Note that is best works with a named matrix created by the modified fromList function above which needs to be loaded first.

The below function takes is a bit bulky due to documentation which attempts to show intermediate results for understand whats going on:

overlapGroups <- function (listInput, sort = TRUE) {
  # listInput could look like this:
  # $one
  # [1] "a" "b" "c" "e" "g" "h" "k" "l" "m"
  # $two
  # [1] "a" "b" "d" "e" "j"
  # $three
  # [1] "a" "e" "f" "g" "h" "i" "j" "l" "m"
  listInputmat    <- fromList(listInput) == 1
  #     one   two three
  # a  TRUE  TRUE  TRUE
  # b  TRUE  TRUE FALSE
  #...
  # condensing matrix to unique combinations elements
  listInputunique <- unique(listInputmat)
  grouplist <- list()
  # going through all unique combinations and collect elements for each in a list
  for (i in 1:nrow(listInputunique)) {
    currentRow <- listInputunique[i,]
    myelements <- which(apply(listInputmat,1,function(x) all(x == currentRow)))
    attr(myelements, "groups") <- currentRow
    grouplist[[paste(colnames(listInputunique)[currentRow], collapse = ":")]] <- myelements
    myelements
    # attr(,"groups")
    #   one   two three 
    # FALSE FALSE  TRUE 
    #  f  i 
    # 12 13 
  }
  if (sort) {
    grouplist <- grouplist[order(sapply(grouplist, function(x) length(x)), decreasing = TRUE)]
  }
  attr(grouplist, "elements") <- unique(unlist(listInput))
  return(grouplist)
  # save element list to facilitate access using an index in case rownames are not named
}

How to use it (use case):

library(UpSetR)
# example of list input (list of named vectors)
listInput <- list(one = letters[ c(1, 2, 3, 5, 7, 8, 11, 12, 13) ], 
                  two = letters[ c(1, 2, 4, 5, 10) ], 
                  three = letters[ c(1, 5, 6, 7, 8, 9, 10, 12, 13) ])

### that's pretty much all that's needed..
li <- overlapGroups(listInput)
###

# list of all elements:
 attr(li, "elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

# which elements are in the biggest group?
 li[1]
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 

 names(li[[1]])
# [1] "g" "h" "l" "m"
 attr(li, "elements")[li[[1]]]
# [1] "g" "h" "l" "m"

# full list
li
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 
# 
# $`one:two:three`
# a e 
# 1 4 
# attr(,"groups")
#   one   two three 
#  TRUE  TRUE  TRUE 
# 
##### cut out a bit #####
# $`two:three`
#  j 
# 11 
# attr(,"groups")
#   one   two three 
# FALSE  TRUE  TRUE 
# 
# attr(,"elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

@kasadevall
Copy link

kasadevall commented Sep 17, 2018

Hey everyone, I was also struggling with this, so I also came up with a solution. I'm sure it could be more elegant, but I think it does the trick (improvements welcome, of course!). In my case, docmanny's solution did not work.

I work with a long list of genes (>800) across several disease types (mutated vs no mutated). The function I provide works with a dataframe as an input, each row a gene and each column a disease (i.e. set). It is not thought for working with more columns than these, so please exclude those that contain extra annotations (for example in the movies.csv, columns like AvgRating should be excluded).

The user can provide sets of interest to be included or excluded (or both) from the intersection of interest.

library(tidyverse)

upset.intersection = function(dataframe, sets_in = NULL, sets_out = NULL){
  
  #This step is used in order to operate with column names. The returning dataframe will have the same colnames as the input.
  data <- dataframe
  names(data)[1] <- "ID"
  
  #Provide names of sets of interest to exclude
  if(!is.null(sets_in) & is.null(sets_out)){
    data <- dplyr::select(data, ID, sets_in)
    data$colsum <- rowSums(data[,2:length(data)])
    intersect_members <- data %>% filter(colsum == length(data)-2)
    names(intersect_members)[1] <- names(dataframe)[1]
    return(intersect_members)
  
  #Provide names of sets to exclude (its equivalent as prividing sets_in without a given set or number of sets)
  }else if(!is.null(sets_out) & is.null(sets_in)){
    data <- dplyr::select(data, -sets_out)
    data$colsum <- rowSums(data[,2:length(data)])
    intersect_members <- data %>% filter(colsum == length(data)-2)
    names(intersect_members)[1] <- names(dataframe)[1]
    return(intersect_members)
  
  #This is to explore those intersections that involve items present in some intersections but are explicitly absent in others
  }else if(!is.null(sets_in) & !is.null(sets_out)){
    data_in <- dplyr::select(data, ID, sets_in)
    data_in$colsum <- rowSums(data_in[,2:length(data_in)])
    intersect_in <- data_in %>% filter(colsum == length(data_in)-2)
    
    data_out <- dplyr::select(data, ID, sets_out)
    data_out$colsum <- rowSums(data_out[,2:length(data_out)])
    intersect_out <- data_out %>% filter(colsum == 0)
    
    intersection <- intersect(intersect_in$ID, intersect_out$ID)
    
    intersect_members <- data %>% filter(ID %in% intersection)
    
    names(intersect_members)[1] <- names(dataframe)[1]
    
    return(intersect_members)
    
  }else{
    data$colsum <- rowSums(data[,2:length(data)])
    intersect_members <- data %>% filter(colsum == length(data)-2)
    names(intersect_members)[1] <- names(dataframe)[1]
    return(intersect_members)
  }
}

Hope it helps,

David

@sudhirthakurela
Copy link

sudhirthakurela commented Oct 24, 2018

This would be a valuable addition.

I have been using "Vennerable" package to get interactions for any number of groups.

library(Vennerable)
temp=Venn(list("a"=a,"b"=b,"c"=c,"d"=d,"e"=e,"f"=f,"g"=g))  #provide all your groups as list
temp@IntersectionSets$`1111111`  #all overlapping values

"temp" will contain all interaction values.
You can retrieve any interaction sets by specifying corresponding overlap.

Hope this is helpful.

ST

@ghost
Copy link

ghost commented Feb 21, 2019

@docmanny , your's was a great suggestion!
I tried to use your code and on the dataset movies, and it workes wonderfully.

@docmanny On my dataset doesn't work instead, even though I have similar data formatting.
I get the following error, which I cannot completely understand.

#These are the data
d <- data.frame(A = sample(c(0,1), size = 20, replace = T), C = sample(c(1,0), size = 20, replace = T), B = sample(c(1,0), size = 20, replace = T))

'data.frame':	20 obs. of  3 variables:
 $ A: num  1 1 0 0 1 1 0 0 1 0 ...
 $ C: num  1 0 0 0 1 1 1 1 0 1 ...
 $ B: num  1 0 1 1 1 1 1 1 1 1 ...

#Then I call get_intersect_members and I get the following error

get_intersect_members(d, "A")
Error: Can't convert a NULL to a quosure
Call `rlang::last_error()` to see a backtrace
Called from: abort(sprintf("Can't convert a %s to a quosure", typeof(lazy)))
Browse[1]> 

@docmanny do you have any idea, where the problem might be?
Thanks a lot!

@docmanny
Copy link

docmanny commented Feb 21, 2019

@efr3m It was actually a really subtle error born from an assumption on my part. I assumed people would give dataframes with integer values, but 1 and 0 can also be numeric. Because in get_intersect_members I made a check for integer values - and not numeric values - your search query failed with a really unhelpful error.

Use this variant of the function instead:

get_intersect_members <- function (x, ...){
    require(dplyr)
    require(tibble)
    # the following makes sure that we don't have any weird values in the dataframe
    x <- x[,sapply(x, is.numeric)][,0<=colMeans(x[,sapply(x, is.numeric)],na.rm=T) & colMeans(x[,sapply(x, is.numeric)],na.rm=T)<=1]
    n <- names(x)
    #convert rownames to a column to prevent mulching by tidyr
    x %>% rownames_to_column() -> x
    l <- c(...)
    a <- intersect(names(x), l)
    ar <- vector('list',length(n)+1)
    ar[[1]] <- x
    i=2
    for (item in n) {
        if (item %in% a){
            if (class(x[[item]])=='numeric'){   #Now uses numeric instead of integer
                ar[[i]] <- paste(item, '>= 1')
                i <- i + 1
            }
        } else {
            if (class(x[[item]])=='numeric'){
                ar[[i]] <- paste(item, '== 0')
                i <- i + 1
            }
        }
    }
    do.call(filter_, ar) %>% column_to_rownames() -> x
    return(x)
}

You should get:

> get_intersect_members(d,"A")
   A C B
5  1 0 0
10 1 0 0
12 1 0 0
18 1 0 0

@smoenga55
Copy link

This would be a valuable addition.

I have been using "Vennerable" package to get interactions for any number of groups.

library(Vennerable)
temp=Venn(list("a"=a,"b"=b,"c"=c,"d"=d,"e"=e,"f"=f,"g"=g))  #provide all your groups as list
temp@IntersectionSets$`1111111`  #all overlapping values

"temp" will contain all interaction values.
You can retrieve any interaction sets by specifying corresponding overlap.

Hope this is helpful.

ST

Awesome! Very helpful

@Jakob37
Copy link

Jakob37 commented May 14, 2020

Hi! Also struggling with this one. I found some useful hints here! I ended up doing a Tidyverse approach using the filter_at function.

After calling the UpSetR::fromList() I obtain a data frame with 0 and 1 values, similarly to the d data frame generated above.

> set.seed(37)
> d <- data.frame(A = sample(c(0,1), size = 20, replace = T), C = sample(c(1,0), size = 20, replace = T), B = sample(c(1,0), size = 20, replace = T))
> head(d)
  A C B
1 1 1 0
2 0 0 1
3 1 0 0
4 1 1 1
5 0 1 0
6 1 0 0

To get entries overlapping in all:

> d %>% filter_at(vars(c("A", "B", "C")), ~.==1)
  A C B
1 1 1 1
2 1 1 1
3 1 1 1

To get entries only present in A:

> d %>% filter_at(vars(c("A")), ~.==1) %>% filter_at(vars(c("B", "C")), ~.==0)
  A C B
1 1 0 0
2 1 0 0
3 1 0 0

If we want to preserve the row id it could be included in a separate column.

> d %>% rownames_to_column("rowid") %>% filter_at(vars(c("A", "B", "C")), ~.==1)
  rowid A C B
1     4 1 1 1
2    11 1 1 1
3    16 1 1 1

@sfd99
Copy link

sfd99 commented May 14, 2020

@Jakob37 Great.

But recently,
tidy syntax must have changed in dplyr 0.8.5 ?.
(they now require "where()" ?...).

In your 1st example, I get:

d %>% filter_at(vars(c("A", "B", "C"), ~.==1) + )

Error: Formula shorthand must be wrapped in where().
Bad data %>% select(. == 1)
Good data %>% select(where(
. == 1))
Run rlang::last_error() to see where the error occurred.

@Jakob37
Copy link

Jakob37 commented May 14, 2020

@sfd99 Thank you for your comment! I think I made a typo. It should be an additional end parenthesis i.e. '"C"))' instead of '"C")' .

This one runs for me (also using dplyr 0.8.5):

d %>% filter_at(vars(c("A", "B", "C")), ~.==1)

I corrected it in the example above. Thanks!

Edit: Regarding the where() warning, that is new to me. I do not get that warning after loading dplyr 0.8.5, but it might be that I have missed something there.

@sfd99
Copy link

sfd99 commented May 14, 2020

Works 100% now.
Thanks / Tak!

@sfd99
Copy link

sfd99 commented May 14, 2020

@Jakob37

last quick question,
(not an Issue, really)

d %>% rownames_to_column("rowid") %>% filter_at(vars(c("B": "C")), ~.==1)
rowid A C B
1 2 1 1 1
2 3 1 1 1
3 16 0 1 1
4 18 1 1 1

good!
Now, using your syntax,
how to exclude rows where cols: "B" and "C"
are exclusively value=1,
and any other cols in d (ie: "A") are =zero?.

A result like this:
rowid A C B
1 16 0 1 1 # only d row 16 is shown,
because it's the row where only "B" and "C" are ==1...
all OTHER cols are zero.

@Jakob37
Copy link

Jakob37 commented May 14, 2020

Hi! If I understand you correctly you want rows where B and C exclusively are 1. For this I did a separate filtering step.

d %>% rownames_to_column("rowid") %>% filter_at(vars(c("A", "B")), ~.==1) %>% filter_at(vars(c("C")), ~.==0)

Or more general (for any non-A/B column, even if there are more).

target_cols <- c("A", "B")
other_cols <- colnames(d)[!(colnames(d) %in% target_cols)]
d %>% rownames_to_column("rowid") %>% filter_at(vars(all_of(target_cols)), ~.==1) %>% filter_at(vars(all_of(other_cols)), ~.==0)

Hope this helps!

@sfd99
Copy link

sfd99 commented May 14, 2020

Very clear, works perfectly.
Thanks Jakob & stay safe!.

@cparsania
Copy link

Maybe I overlooked it, but having played around with the above solutions, I was still missing a all-in-one function producing a list with all occurring group combination. Therefore I've put something together which might be useful to others (it certainly is for me). I've put the code also in a Gist for further improvments etc.

Note that is best works with a named matrix created by the modified fromList function above which needs to be loaded first.

The below function takes is a bit bulky due to documentation which attempts to show intermediate results for understand whats going on:

overlapGroups <- function (listInput, sort = TRUE) {
  # listInput could look like this:
  # $one
  # [1] "a" "b" "c" "e" "g" "h" "k" "l" "m"
  # $two
  # [1] "a" "b" "d" "e" "j"
  # $three
  # [1] "a" "e" "f" "g" "h" "i" "j" "l" "m"
  listInputmat    <- fromList(listInput) == 1
  #     one   two three
  # a  TRUE  TRUE  TRUE
  # b  TRUE  TRUE FALSE
  #...
  # condensing matrix to unique combinations elements
  listInputunique <- unique(listInputmat)
  grouplist <- list()
  # going through all unique combinations and collect elements for each in a list
  for (i in 1:nrow(listInputunique)) {
    currentRow <- listInputunique[i,]
    myelements <- which(apply(listInputmat,1,function(x) all(x == currentRow)))
    attr(myelements, "groups") <- currentRow
    grouplist[[paste(colnames(listInputunique)[currentRow], collapse = ":")]] <- myelements
    myelements
    # attr(,"groups")
    #   one   two three 
    # FALSE FALSE  TRUE 
    #  f  i 
    # 12 13 
  }
  if (sort) {
    grouplist <- grouplist[order(sapply(grouplist, function(x) length(x)), decreasing = TRUE)]
  }
  attr(grouplist, "elements") <- unique(unlist(listInput))
  return(grouplist)
  # save element list to facilitate access using an index in case rownames are not named
}

How to use it (use case):

library(UpSetR)
# example of list input (list of named vectors)
listInput <- list(one = letters[ c(1, 2, 3, 5, 7, 8, 11, 12, 13) ], 
                  two = letters[ c(1, 2, 4, 5, 10) ], 
                  three = letters[ c(1, 5, 6, 7, 8, 9, 10, 12, 13) ])

### that's pretty much all that's needed..
li <- overlapGroups(listInput)
###

# list of all elements:
 attr(li, "elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

# which elements are in the biggest group?
 li[1]
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 

 names(li[[1]])
# [1] "g" "h" "l" "m"
 attr(li, "elements")[li[[1]]]
# [1] "g" "h" "l" "m"

# full list
li
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 
# 
# $`one:two:three`
# a e 
# 1 4 
# attr(,"groups")
#   one   two three 
#  TRUE  TRUE  TRUE 
# 
##### cut out a bit #####
# $`two:three`
#  j 
# 11 
# attr(,"groups")
#   one   two three 
# FALSE  TRUE  TRUE 
# 
# attr(,"elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

Further simplify output of overlapGroups

li2 <- purrr::map(li, ~ attr(li, "elements")[.x] )

@nilaycan
Copy link

nilaycan commented Aug 2, 2022

Javier Herrero nailed what was asked for brilliantly, check here: https://stackoverflow.com/questions/65027133/extract-intersection-list-from-upset-object

Cheers,
Nilay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests