
Memory error when trying to manually sort intersections but not when intersection order is specified with sort_intersections_by #149

Open
lrichter53 opened this issue Apr 25, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@lrichter53

I need the intersections sorted in a particular way. This worked with fewer intersections, but when I added more, I got a memory error. However, when the intersection order isn't specified, or is specified with sort_intersections_by, the code works.

intersections = list('RaCa2', 'P14K','t1_18','C24S','I4K','S15K', 'A19K', c('C18S','t1_18'),c('C18S','C24S'),c('I4K','A19K'), c('S15K','A19K','T21K'),c('I4K','S15K','t1_18'),c('I4K','A19K','T21K'), c('P3R','I4K','A19K'),c('P3R','S15K','t1_18'), c('P3R','I4K','S15K','t1_18'), c('P3R','I4K','S15K','C18S','t1_18'), c('A19K','t15_23'), c('A19Dab','K22Dab','K23Dab','t15_23','NH2','_CONH2'), c('A19Dap','t15_23','NH2','_CONH2'), c('A19Dap','K22Dap','K23Dap','t15_23','NH2','_CONH2'), c('A19K','d_23','t15_23','NH2','_CONH2'), c('A19K','d_22','t15_23','NH2','_CONH2'), c('A19K','d_19','t15_23','NH2','_CONH2'), c('A19K','d_19','d_22','d_23','t15_23','NH2','_CONH2'), c('A19K','t15_23','NH2','_CONH2'), c('A19K','K22Dap','t15_23','NH3','_CONH2'), c('A19K','K23Dap','t15_23','NH3','_CONH2'), c('A19Dab','t15_23','NH3','_CONH2'), c('A19K','K22Dab','t15_23','NH3','_CONH2'), c('A19K','K23Dab','t15_23','NH3','_CONH2'), c('C18S','A19K','d_19','d_22','d_23','t15_23','NH3','_CONH2'), c('C18S','A19Dap','K22Dap','K23Dap','t15_23','NH3','_CONH2'), c('C18S','A19Dab','K22Dab','K23Dab','t15_23','NH3','_CONH2') )

This produces the error:

Error: cannot allocate vector of size 524287.9 Gb

krassowski added the bug (Something isn't working) label on Apr 25, 2022
@krassowski
Owner

Looking at the codebase, yes, this is unfortunately the case: when user-provided intersections are given, we compute all possible intersections and then subset them, rather than computing only the intersections of interest directly:

complex-upset/R/data.R

Lines 514 to 515 in e3b51dc

# TODO: this is slow and memory hungry; ideally we would only get the relevant intersection straight away!
possible_intersections = all_intersections_matrix(intersect, NULL, 0, Inf)

Ideally, we would instead create an equivalent of observed_intersections_matrix and just multiply it, as in:

complex-upset/R/data.R

Lines 534 to 537 in e3b51dc

} else if (intersections == 'observed') {
intersections_matrix = observed_intersections_matrix
colnames(intersections_matrix) = intersect
product_matrix = intersections_matrix %*% unique_members_matrix

But if I recall correctly this was not trivial for some reason...
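
The idea of building only the requested rows and multiplying, as in the observed branch above, can be sketched in base R. This is a minimal illustration under stated assumptions, not ComplexUpset's implementation: the helper count_requested_intersections is hypothetical, set columns are assumed to be logical (or 0/1), and it counts only observations whose membership pattern matches a requested intersection exactly. Memory then scales with the number of requested intersections times the number of observations, instead of with all 2^n candidate intersections.

```r
# Hypothetical helper (not part of ComplexUpset): count observations whose
# membership pattern exactly matches each user-requested intersection,
# without enumerating all 2^n possible intersections.
count_requested_intersections <- function(data, sets, intersections) {
  stopifnot(all(unlist(intersections) %in% sets))
  # observations x sets: 1 = member, 0 = not a member
  membership <- as.matrix(data[, sets, drop = FALSE]) * 1
  # requested intersections x sets: 1 = set is part of the intersection
  requested <- do.call(rbind, lapply(intersections, function(x) as.numeric(sets %in% x)))
  # entry [i, j] = number of sets shared by intersection i and observation j
  common <- requested %*% t(membership)
  # exact match: the shared sets are all of intersection i AND all of observation j
  exact <- sweep(common, 1, rowSums(requested), "==") &
    sweep(common, 2, rowSums(membership), "==")
  counts <- rowSums(exact)
  names(counts) <- vapply(intersections, paste, character(1), collapse = "-")
  counts
}
```

The exact-match test avoids a second pass over the data: an observation belongs to intersection i only if it shares every set of i and has no sets beyond them.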

Without changing the logic too much, we could try changing the third and fourth arguments of the all_intersections_matrix(intersect, NULL, 0, Inf) call to min(sapply(intersections, length)) and max(sapply(intersections, length)), but I am not sure whether that would work for all cases right now.
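
To gauge how much that bound would help: with n sets, degrees 0 to Inf enumerate all 2^n subsets, while a degree range [lo, hi] leaves sum(choose(n, lo:hi)) candidates. A quick base-R check (the function name is illustrative only):

```r
# Count candidate intersections enumerated for n sets, without and with
# restricting the degree to the range spanned by the requested intersections.
candidate_counts <- function(n_sets, intersections) {
  degrees <- vapply(intersections, length, integer(1))
  lo <- min(degrees)
  hi <- max(degrees)
  c(all = 2^n_sets, restricted = sum(choose(n_sets, lo:hi)))
}

# 26 sets with requested intersections of degree 14 and 23
candidate_counts(26, list(seq_len(14), seq_len(23)))
```

For 26 sets with requested intersections of degree 14 and 23 (as in the movies reproducer later in this thread), the bound still leaves about 28.4 million of roughly 67.1 million candidates, so it helps only partly when the requested degrees are far apart.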

@a14578

a14578 commented Nov 6, 2022

Hi Michal,

I'm having the same issue as lrichter53 (Error: vector memory exhausted (limit reached?)), and I've tried applying the patch mentioned here: https://stackoverflow.com/questions/72820148/complexupset-how-can-i-plot-selected-intersections. But it seems to have made the issue worse as I receive the memory error a lot quicker. Is there any way to fix the error, please? I'm using ComplexUpset 1.3.3 and have a list of 92 intersections I need to apply, so it would be great if there were a fix.

Just adding my commands below (it's pretty long):

gene_list = c("unknown", "AmpC1", "BBB_1D_v1", "AAA_1_gene", "AAA_2_genes", "AAA_3_genes", "AAA_4_genes", "BBB_KKK_1_gene", "BBB_KKK_2_genes", "BBB_KKK_3_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_2_genes", "UUU_GGG_1_gene", "UUU_GGG_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "KKK_GGG_2_genes", "KPC_KKK_MMM_1_gene", "NDM_KKK_MMM_1_gene", "NDM_KKK_MMM_2_genes", "OXA_KKK_MMM_1_gene", "VIM_KKK_MMM_1_gene", "VIM_KKK_MMM_2_genes", "UUU_49_KKK_InhR")


upset(df, gene_list, annotations = list(
  'Infection_status'=(
    ggplot(mapping=aes(fill=Infection_status))
    + ggtitle("alpha") + theme(plot.title = element_text(size = 60, face = "bold"))
    + geom_bar(stat='count', position='fill')
    + scale_y_continuous(labels=scales::percent_format())
    + scale_fill_manual(values=c('Dead'='#ebb860', 'Alive'='#57109e', 'Unknown'='#468a37')) + ylab('Infection_status')),
  'Phenotype'=(
    ggplot(mapping=aes(fill=Phenotype))
    + geom_bar(stat='count', position='fill')
    + scale_y_continuous(labels=scales::percent_format())
    + scale_fill_manual(values=c('grey'='#36e345','intermediate'='#eb5278','sleep'='#1a47db')) + ylab('Phenotype')),
  'Phenotype (mm)'=ggplot(mapping=aes(x=intersection, y=Disk_measurement)) + ggtitle("Phenotype vs cell (DDD) mutations") + geom_hline(yintercept=18, color="pink", size=1, linetype = 'dashed') + annotate("text",x=50, y =17, label = "ECO_O = 18mm",color = "pink",size = 12) + geom_violin(width=1.1, alpha=1.5) + ggbeeswarm::geom_quasirandom(aes(color=DDD_mutations, size = 1)) + guides(color = guide_legend(override.aes = list(size=7))),
  'Phenotypes (mm)'=ggplot(mapping=aes(x=intersection, y=Disk_measurement)) + ggtitle("Phenotype vs Phenotype") + geom_hline(yintercept=18, color="pink", size=1, linetype = 'dashed') + annotate("text",x=50, y =17, label = "ECO_O = 18mm",color = "pink",size = 12) + geom_violin(width=1.1, alpha=1.5) + ggbeeswarm::geom_quasirandom(aes(color=Phenotype, size = 1)) + guides(color = guide_legend(override.aes = list(size=7)))),
  sort_intersections=FALSE, intersections=list(c("AAA_1_gene"), c("AAA_2_genes"), c("AAA_3_genes"), c("AAA_4_genes"), c("UUU_49_KKK_InhR"), c("UUU_GGG_1_gene"), c("UUU_GGG_2_genes"), c("BBB_1D_v1", "AAA_1_gene"),
                                               c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), c("AAA_1_gene", "KKK_GGG_1_gene"), c("AAA_1_gene", "UUU_GGG_1_gene"),
                                               c("AAA_1_gene", "UUU_GGG_2_genes"), c("BBB_1D_v1", "AAA_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"),
                                               c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), c("AAA_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("AAA_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("AAA_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("AmpC1", "BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("AAA_1_gene", "UUU_49_KKK_InhR"), c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene"), c("AAA_2_genes", "UUU_GGG_1_gene"), c("AAA_2_genes", "KKK_GGG_1_gene"), c("AAA_2_genes", "NDM_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene"),
                                               c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes"), c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("AAA_2_genes", "BBB_KKK_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "UUU_49_KKK_InhR"), c("AAA_3_genes", "UUU_GGG_1_gene"), c("AAA_3_genes", "UUU_49_KKK_InhR"), c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_KKK_3_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene"),
                                               c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), c("UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_KKK_1_gene", "UUU_GGG_2_genes", "KPC_KKK_MMM_1_gene"), c("BBB_1D_v1", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "UUU_GGG_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), c("BBB_1D_v1", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"),
                                               c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), c("UUU_GGG_1_gene", "UUU_49_KKK_InhR")),
  queries=list(
    upset_query(intersect=c("unknown"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_3_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_4_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_49_KKK_InhR"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_2_genes"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "KKK_GGG_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_2_genes"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "KKK_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "NDM_KKK_MMM_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "NDM_KKK_MMM_1_gene"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AmpC1", "BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_49_KKK_InhR"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "UUU_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "KKK_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "NDM_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KPC_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), color='darkgreen', fill='darkgreen', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='darkgreen', fill='darkgreen', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene"), color='darkgreen', fill='darkgreen', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), color='darkorchid1', fill='darkorchid1', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='darkorchid1', fill='darkorchid1', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes"), color='darkorchid1', fill='darkorchid1', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='coral', fill='coral', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='coral', fill='coral', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "UUU_49_KKK_InhR"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_3_genes", "UUU_GGG_1_gene"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_3_genes", "UUU_49_KKK_InhR"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_3_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene"), color='black', fill='black', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_1_gene", "UUU_GGG_2_genes", "KPC_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "UUU_GGG_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene", "UUU_49_KKK_InhR"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")))) + patchwork::plot_layout(heights=c(0.1, 1.0, 0.3, 0.5)) + labs(title = "Co-occurence of alpha genes", caption = "Data: alpha")

@a14578

a14578 commented Nov 14, 2022

Hi @krassowski, do you know if the memory error when manually sorting intersections will be fixed soon? It would be great to continue using ComplexUpset, as it's such a great package.

@krassowski
Owner

But it seems to have made the issue worse as I receive the memory error a lot quicker.

I don't believe that patch could ever make things worse performance-wise. Instead, I suspect that the problem you see is not related to this issue but to the number of upset_query calls in your example.

Each upset_query creates a new layer, which may cause ggplot2 to run out of memory in the plotting phase. This would be consistent with your observation that the patch from the faster-specific branch makes the memory error appear sooner, but only because the performance has improved (you reached the plotting phase sooner). If you just want to give each bar a different fill, you should use aesthetics (as in the 3.2 Fill the bars example), not queries.

Do you still run out of memory if you remove the queries? What is a minimal reproducible example of the problem you are facing (i.e. after removing every single line of code that does not make a difference for the problem at hand)?

do you know if the memory error when manually sorting intersections will be fixed soon?

I don't have bandwidth to work on it this month.

@krassowski
Owner

I don't have bandwidth to work on it this month.

But contributions are welcome if anyone has some time to spare!

@krassowski
Owner

krassowski commented Nov 14, 2022

If you just want to give each bar a different fill, you should use aesthetics (as in the 3.2 Fill the bars example), not queries.

This may not be straightforward with the current public API for the provided example code. I guess these are separate issues:

  • ComplexUpset should combine upset_query layers for the same colour if they are not conflicting
  • ComplexUpset should provide an easy way to fill bars based on intersection/intersection cardinality (in addition to mapping based on individual observations); this is already possible with encode_sets=FALSE + aes() mapping but not documented.
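
Until then, one possible workaround for the many-queries case is to precompute, for each observation, a colour group derived from its exact membership pattern, and map that single column to fill (as in the 3.2 Fill the bars example) instead of adding one upset_query per intersection. A minimal base-R sketch; the helper assign_fill_group, the shape of groups, and the '-'-joined keys are all hypothetical, and it assumes the bars show distinct (exact-match) intersections:

```r
# Hypothetical helper: label each observation with a colour group derived from
# its exact membership pattern, so one fill aesthetic can replace many queries.
# `groups` is a named list: colour group -> list of intersections (character vectors).
assign_fill_group <- function(data, sets, groups, default = "other") {
  membership <- as.matrix(data[, sets, drop = FALSE]) == 1  # logical or 0/1 columns
  # key per observation: names of its sets, in the order given by `sets`
  obs_key <- apply(membership, 1, function(row) paste(sets[row], collapse = "-"))
  lookup <- character(0)
  for (g in names(groups)) {
    for (x in groups[[g]]) {
      # reorder the intersection to `sets` order so keys are comparable
      key <- paste(sets[sets %in% x], collapse = "-")
      lookup[key] <- g
    }
  }
  out <- unname(lookup[obs_key])
  out[is.na(out)] <- default  # patterns not listed in any group
  out
}
```

The result could then be stored as a column (e.g. df$fill_group) and mapped with a single aes(fill=fill_group) plus one scale_fill_manual() call, so the number of layers should no longer grow with the number of intersections.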

@a14578

a14578 commented Dec 29, 2022

Hi @krassowski

I've removed the queries but still have the same memory issue:

gene_list = c("SIV_49_Cla_InhR", "TIM_Cla_Darb_2_genes", "TIM_Cla_Darb_1_gene", "LXA_Cla_Darb_1_gene", "VMM_Cla_Darb_2_genes", "VMM_Cla_Darb_1_gene", "KPC_Cla_Darb_1_gene", "JTX_M_Cla_TRMA_2_genes", "JTX_M_Cla_TRMA_1_gene", "CEM_Cla_TRMA_1_gene", "SIV_Cla_TRMA_2_genes", "SIV_Cla_TRMA_1_gene", "JIL_TMY_LAP_TAA_Cla_2_genes", "JIL_TMY_LAP_TAA_Cla_1_gene", "CEM_Cla_3_genes", "CEM_Cla_2_genes", "CEM_Cla_1_gene", "SIV_Cla_Chr_4_genes", "SIV_Cla_Chr_3_genes", "SIV_Cla_Chr_2_genes", "SIV_Cla_Chr_1_gene", "CEM_3D_v1", "TyoMA", "undefined")

upset(df, gene_list, annotations = list(
'Types (mm)'=ggplot(mapping=aes(x=intersection, y=Size)) + ggtitle("Type vs Phenotype") + geom_violin(width=0.8, alpha=1.5) + ggbeeswarm::geom_quasirandom(aes(color=Phenotype, alpha = I(1/2))) + guides(color = guide_legend(override.aes = list(size=5)))),
sort_sets=FALSE,
sort_intersections=FALSE, intersections=list(c("CEM_3D_v1", "SIV_Cla_Chr_1_gene", "CEM_Cla_1_gene", "SIV_Cla_TRMA_1_gene"), c("CEM_3D_v1", "SIV_Cla_Chr_1_gene", "CEM_Cla_1_gene", "JIL_TMY_LAP_TAA_Cla_1_gene", "JTX_M_Cla_TRMA_1_gene")))

Error: vector memory exhausted (limit reached?)

Any chance you've found a way to solve this memory problem please?

@krassowski
Owner

How big is your data frame? Could you possibly prepare a reproducer using the movies dataset (by duplicating rows as many times as needed and adding as many random group (TRUE/FALSE) columns as needed)? This would help me look into this locally.
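
Such a reproducer could be built with a small base-R helper like the one below (a sketch; make_reproducer and its defaults are hypothetical). It repeats rows with rep() and appends independent random TRUE/FALSE set columns; the built-in mtcars stands in for the movies data, which is available as ggplot2movies::movies.

```r
# Hypothetical helper: enlarge a data frame for a memory reproducer by
# repeating its rows and appending random TRUE/FALSE set-membership columns.
make_reproducer <- function(base, n_copies, n_random_sets, p = 0.5, seed = 1) {
  set.seed(seed)  # make the random memberships reproducible
  big <- base[rep(seq_len(nrow(base)), times = n_copies), , drop = FALSE]
  rownames(big) <- NULL
  for (i in seq_len(n_random_sets)) {
    big[[paste0("group_", i)]] <- runif(nrow(big)) < p
  }
  big
}

# 96 rows with 20 extra random sets, using mtcars as a stand-in;
# paste0("group_", 1:20) would then be passed to upset() as the set columns
demo <- make_reproducer(mtcars, n_copies = 3, n_random_sets = 20)
```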

@a14578

a14578 commented Feb 7, 2023

My personal dataset has 1896 rows with 25 columns, each column representing an intersection. I wasn't able to replicate this problem using the Movies dataset with the original 58789 rows and 7 genre columns. Even when I increased the number of rows to around 170,000, I still wasn't able to replicate the problem. I was, however, able to replicate it by changing the Movies dataset so that it only has 2000 rows but 26 columns (movie genres or intersections). I've attached the dataset below, with the minimal ComplexUpset commands required to replicate the problem (listing just two intersections).
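
This matches the enumeration of all intersections discussed earlier in the thread: the cost grows exponentially with the number of sets, not with the number of rows. A back-of-the-envelope check, assuming a dense double matrix with one row per possible intersection (2^n, including the empty one) and one column per set; the real internal layout may differ, and intermediate products add more on top:

```r
# Rough size in GiB of a dense double matrix of all 2^n intersections x n sets
# (8 bytes per entry). An assumption about layout, not ComplexUpset's exact internals.
dense_all_intersections_gib <- function(n_sets) {
  2^n_sets * n_sets * 8 / 1024^3
}

sapply(c(7, 24, 26), dense_all_intersections_gib)
```

Under that assumption, 7 sets need about 7 KiB, 24 sets (the length of the gene_list above) need 3 GiB, and the 26 genres of the movies4 reproducer need 13 GiB for that single matrix.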

Modified Movies dataset:

movies4.csv

Complex-Upset commands:

library(ggplot2)
library(ComplexUpset)
my_data4 <- read.csv("movies4.csv", header = TRUE)
df4_movies <- data.frame(my_data4)

genres4 = colnames(df4_movies)[18:43]

upset(df4_movies, genres4, annotations = list(
'Types (mm)'=ggplot(mapping=aes(x=intersection, y=length)) + ggtitle("Type vs Phenotype") + geom_violin(width=0.8, alpha=1.5)),
sort_sets=FALSE,
sort_intersections=FALSE, intersections=list(c("Action", "Animation", "Comedy", "Drama", "Long", "Musical", "Silent", "Fantasy", "Opera", "Historical", "Detective", "Emmy_winning", "Animals", "Sci_fi"), c("Action", "Animation", "Comedy", "Drama", "Thriller", "Horror", "Long", "Musical", "Silent", "Western", "Fantasy", "Adventure", "New", "Old", "Opera", "Historical", "Detective", "Science_fiction", "Emmy_winning", "Highly_rated", "Cooking", "Animals", "Sci_fi")))

@a14578

a14578 commented Mar 13, 2023

Hi @krassowski, do you know if there is any update on the memory error please? I'm really looking forward to using ComplexUpset with my data set

@kaplans1

I would also appreciate this functionality - thank you.

Projects
None yet
Development

No branches or pull requests

4 participants