tidyr::unnest memory leak causes java.lang.OutOfMemory error #85

@NRBPerdijk

When running R code from a JVM project with GraalVM, I ran into a java.lang.OutOfMemoryError: GC overhead limit exceeded.

Using the VisualVM program that comes with Graal, I noticed that the heap would quite quickly grow to its maximum size, after which the garbage collector would desperately try to reduce heap usage while the running processes slowed down more and more. Eventually the program would simply crash with the OutOfMemoryError.
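
For anyone who wants to confirm the same heap growth without attaching VisualVM, heap usage can also be logged from inside the JVM. The following is a minimal sketch using the standard `java.lang.management` API. The class and method names (`HeapLogger`, `describeHeap`) are hypothetical and are not part of the linked sample project.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of the sample project): reports JVM heap usage.
final class HeapLogger {
    private static final long MIB = 1024L * 1024L;

    // Returns a one-line summary such as "heap used=123 MiB, max=1024 MiB".
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long usedMiB = heap.getUsed() / MIB;
        // getMax() returns -1 when the maximum heap size is undefined.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / MIB) + " MiB";
        return "heap used=" + usedMiB + " MiB, max=" + max;
    }
}
```

Printing `HeapLogger.describeHeap()` once per iteration of whatever loop evaluates the R code should show the used value climbing towards the `-Xmx` limit if the leak is present.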

I've spent some time isolating the problem code and I found that the unnest function in tidyr appears to be the source. I have created a small sample Java+R project: https://github.com/NRBPerdijk/fastRBug/tree/memoryLeakUnnest

The problematic code can be found on the memoryLeakUnnest branch and can be run by executing this command from the project folder:

    mvn clean install && cd target && {PATH_TO_GRAALVM}/bin/java -Xmx1G -cp fastRBug-1.0-SNAPSHOT.jar:../lib/graal-sdk-19.1.0.jar Main

Alternatively, similar behaviour can be triggered using just FastR, by running the following R snippet:

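    library(dplyr)  # re-exports tibble() and %>%
    library(tidyr)  # provides unnest()

    runOutOfMemory <- function() {
        # Loop forever; per the report above, heap usage keeps growing until the JVM runs out of memory.
        while (TRUE) {
            df <- tibble(
                x = 1:3,
                y = c("a", "d,e,f", "g,h")
            )

            # tidyr 0.8.x syntax: split y on commas, then unnest to one row per element.
            df %>% unnest(y = strsplit(y, ","))

            print("Whoah!")
        }
    }
    runOutOfMemory()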

You'll need more patience with this method; the Java project runs out of memory a bit quicker.

Edit: versions tried:

  • GraalVM RC 16 and GraalVM 19.1.0
  • tidyr package 0.8.2 and 0.8.3
  • dplyr package 0.7.8

This memory leak may also contribute to the poor performance reported in issue #71.