Memory leak / infinite recursion #181

Open

code-tree opened this issue Sep 14, 2015 · 22 comments

@code-tree
Contributor

Since upgrading to 2.11 I'm getting a memory leak when trying to push files. After disabling the GC overhead limit check with -XX:-UseGCOverheadLimit (to get past the exception below), the process went over 1GB before I killed it. The files I'm sending are tiny, though.

Would appreciate any help, thanks

Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar 
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
    at s3.website.model.Files$.recursiveListFiles(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
    at s3.website.model.Files$.recursiveListFiles(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
    at s3.website.model.Files$.recursiveListFiles(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
    at s3.website.model.Files$.recursiveListFiles(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
    at s3.website.model.Files$$anonfun$recursiveListFiles$2.apply(push.scala:125)
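
The frames repeat in the same recursiveListFiles → flatMap → foreach cycle, which points at unbounded recursion rather than a plain leak. For illustration, a pattern like the following (a hypothetical reconstruction, not the actual push.scala code) can blow up in this way when it is started from a very large or cyclic directory tree:

  import java.io.File

  // Hypothetical reconstruction of the pattern implied by the stack trace:
  // list each directory's children and recurse into subdirectories via flatMap.
  // Started from a huge tree (e.g. "/") or a symlink cycle, the intermediate
  // collections keep growing until the GC overhead limit is exceeded.
  def recursiveListFiles(dir: File): Seq[File] = {
    val children = Option(dir.listFiles).map(_.toSeq).getOrElse(Nil)
    children ++ children.filter(_.isDirectory).flatMap(recursiveListFiles)
  }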
@code-tree
Contributor Author

It seems like it isn't due to 2.11 after all, as I just downgraded to 2.10 (which previously worked for me) and am still getting a memory leak. Any ideas? Thanks

@code-tree changed the title from "Memory leak" to "Memory leak / infinite recursion" on Sep 14, 2015
@laurilehmijoki
Owner

The recursive file listing routine seems to run away sometimes. Maybe we should replace it with [Apache Commons FileUtils](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFiles%28java.io.File, java.lang.String[], boolean%29) and see if the problem disappears. Would you like to try?
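
For reference, a minimal sketch of that replacement, assuming commons-io is on the classpath (FileUtils.listFiles(File, String[], boolean) returns a java.util.Collection of files, and a null extensions array means "all files"):

  import java.io.File
  import org.apache.commons.io.FileUtils
  import scala.collection.JavaConverters._

  // Sketch only: delegate the recursive walk to commons-io instead of the
  // hand-rolled flatMap recursion in push.scala.
  def recursiveListFiles(root: File): Seq[File] =
    FileUtils.listFiles(root, null: Array[String], true).asScala.toSeq

Note that this still walks the entire tree under root, so it would mainly rule out a bug in the hand-rolled recursion rather than make listing a huge directory cheap.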

@code-tree
Contributor Author

Sorry, I'm not very familiar with Scala or Java.

Any ideas what might be triggering it, or something I can do to work around it? I'm not able to push any projects at the moment.

Thanks for your work

@laurilehmijoki
Owner

@code-tree try another version of Java? See the official download page at http://www.oracle.com/technetwork/java/javase/downloads/index.html.

@laurilehmijoki
Owner

@code-tree will this comment help you?

@code-tree
Contributor Author

I was using openjdk7 but downgrading to openjdk6 solved this. So I assume it is a bug with openjdk then? Hope it'll be fixed sometime as v6 probably won't be available next time I upgrade OS.

Thanks for your help

@code-tree
Contributor Author

Sorry, that was actually a misdiagnosis. The Java version doesn't matter; it works under OpenJDK 6, 7 and 8. Instead, the problem occurs when there are too many files to list, and it depends on the working directory from which s3_website is executed.

When executed from dir:

  • / -- process over 1GB and fails
  • /home -- process over 1GB and fails
  • /home/user -- process over 1GB and fails
  • /home/user/folder -- process over 300MB but succeeds
  • etc.

So I believe s3_website is actually listing every file from the dir it is executed from (possibly in here), even though I have the --site option specified.

@laurilehmijoki
Owner

Thanks for reporting your valuable discovery!

@laurilehmijoki
Owner

@code-tree please try out the new version 2.11.1. It contains a fix for this problem.

@code-tree
Contributor Author

Thanks for the fix, though unfortunately the problem still remains. Process went to 500MB before I stopped it (using 2.11.1), running from root (/) with --site set to my project.

In addition to limiting the recursion, it would be nice to stop it from trying to list files from the working dir when --site is specified, as I think that might be the root cause of the issue. (I'm assuming this is the case, given that it works when run from a file tree with little depth.)

laurilehmijoki added a commit that referenced this issue Sep 18, 2015
@laurilehmijoki
Owner

@code-tree try 2.11.2, it contains a new fix.

@code-tree
Contributor Author

Thanks Lauri, almost there. It appears to work for site in the yaml config, but not --site from the command line?

That said, even when testing with site in the yaml config, the process still went up to 300MB, but succeeded, unlike when using --site. I wonder if there is another recursion happening somewhere else as well.

@laurilehmijoki
Owner

@code-tree thanks for the feedback. According to the implementation the --site setting does not recursively search for files. I wonder what could explain the behaviour you are experiencing. There are only two places from which the recursiveListFiles function is called:

  1. recursiveListFiles(workingDirectory).find { file =>
  2. recursiveListFiles(site.rootDirectory)

@code-tree
Contributor Author

Sorry, what I meant was: the fix works when site is specified in the yaml config (s3_website does not recurse through working dir) but does not seem to work when site is given as a CLI arg (s3_website still recurses through working dir).

It seems like config is only filled with the yaml options and not the CLI options? In that case the condition in the second line of the fix (config.site.isEmpty) would evaluate to true even when the site is specified on the CLI.

  def resolveSiteDir(implicit yamlConfig: S3_website_yml, config: Config, cliArgs: CliArgs, workingDirectory: File): Either[ErrorReport, File] = {
    val siteFromAutoDetect = if (config.site.isEmpty) { autodetectSiteDir(workingDirectory) } else { None }
    val errOrSiteFromCliArgs: Either[ErrorReport, Option[File]] = Option(cliArgs.site) match {
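
If that hypothesis is right, then perhaps something along these lines would fix it — the Config and CliArgs definitions below are illustrative stubs, not the project's actual types:

  import java.io.File

  // Illustrative stubs only, standing in for the real s3_website types.
  case class Config(site: Option[String])
  trait CliArgs { def site: String } // assumed to return null when --site is absent
  def autodetectSiteDir(workingDirectory: File): Option[File] = None // stub

  // Fall back to scanning the working directory only when the site dir is
  // missing from BOTH the YAML config and the CLI arguments.
  def siteFromAutoDetect(config: Config, cliArgs: CliArgs, workingDirectory: File): Option[File] =
    if (config.site.isEmpty && Option(cliArgs.site).isEmpty) autodetectSiteDir(workingDirectory)
    else None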

@laurilehmijoki
Owner

You are right in your reasoning.

However, the code seems not to recurse in the case where one defines the site via the CLI arg. Hence I wonder what could possibly cause the out-of-memory error in that situation.

@laurilehmijoki
Owner

@code-tree you can also try to work around the problem like this:

First, add the following line to s3_website.yml:

site: <%= ENV['SITE_DIR'] %>

Then invoke SITE_DIR=/path/to/your/site s3_website push.

If there is a bug in the way s3_website handles the --site CLI argument, then the above trick should circumvent that problem.

@code-tree
Contributor Author

Yes, using ENV in the yaml config has fixed it for the time being, thanks

@stroupaloop

Hey there, I'm experiencing the same issue, but setting the site: <%= ENV['SITE_DIR'] %> parameter in s3_website.yml and invoking SITE_DIR=_site s3_website push still returns the Java out-of-memory error noted above.

Any additional insight into the problem? I'm currently using v2.12.2.

@laurilehmijoki
Owner

As a workaround, you can try the pure Ruby implementation at https://github.com/laurilehmijoki/s3_website/tree/1.x

@fagiani

fagiani commented Oct 26, 2017

I'm experiencing the very same behavior even with the ENV parameter set. This all started on a brand new setup; it used to work fine on the previous computer. I guess that some dependency gem may actually be causing it. Is anyone else experiencing this currently? Any clues on how to fix it, as it has been quite a while?

I am using version 3.4.0.

Thanks!

@Nihahs

Nihahs commented Feb 28, 2018

Hi,
I had been facing the same issue. Not sure why, but clearing the tmp folder fixed it. Hope this helps.

@fagiani

fagiani commented Mar 1, 2018

@Nihahs do you mean the root /tmp/ of the filesystem or some other tmp folder? I was unable to reproduce that and still get the same error :(

@laurilehmijoki are there any new clues on what may be causing this? In my case, the only cause I can see is that one of my buckets has a big chunk of logs, and although I have ignore_on_server: logs set, it takes a long while, raises CPU usage and then throws an OutOfMemoryError exception.

Thanks all!
