Memory usage #100

Closed
donaldlee2008 opened this issue May 7, 2016 · 33 comments

@donaldlee2008

When I set quickRender(false), I can load only about 15 pages before I get an Out Of Memory error.
I set cache and a cache dir, but memory still grows very quickly and then the process crashes.
Is there a way to cache on disk and clear the history and other state from memory?

for (int x = 0; x < 100; x++) {
    driver.get("http://news.qq.com/");
    // System.gc(); // a little better, but does not fix it
    System.out.println(x);
}
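
To tell memory that is genuinely retained from memory that simply has not been collected yet, it can help to log the used heap after each iteration. The following is a minimal sketch using only the JDK; `HeapProbe` and the scratch allocation are hypothetical stand-ins, and the real work (for example `driver.get(...)`) would go where the placeholder is. It measures the heap of the JVM it runs in, so it would not see memory held by a separate child process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapProbe {
    /** Heap bytes currently in use by this JVM, as reported by the memory MXBean. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Requests a GC first, so the figure is closer to memory that is still reachable. */
    static long usedHeapAfterGc() {
        System.gc(); // only a hint; the JVM is free to ignore it
        return usedHeapBytes();
    }

    /** Formats one log line, reporting whole megabytes. */
    static String format(int iteration, long bytes) {
        return "iteration " + iteration + ": " + (bytes / (1024 * 1024)) + " MB used";
    }

    public static void main(String[] args) {
        for (int x = 0; x < 5; x++) {
            // Hypothetical placeholder for the real work, e.g. driver.get(url)
            byte[] scratch = new byte[1024 * 1024];
            scratch[0] = 1;
            System.out.println(format(x, usedHeapAfterGc()));
        }
    }
}
```

If the figure after GC keeps climbing across iterations, something is probably still holding references; if it returns to a stable baseline, the growth is more likely uncollected garbage.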

@hollingsworthd
Owner

This is a JRE bug, a memory leak. quickRender(true) is the only good workaround. Otherwise it should be fixed in Java 9. Unfortunately Java 9 is still in beta.

Also, there is an open enhancement, #83, to pass JRE args to the child processes. This would let you use a larger heap space.

@PatrickHuetter

@hollingsworthd Are you sure that quickRender(true) is a workaround? I'm using quickRender(true) and cache(true), and I also see the memory consumption of the jBrowserDriver child processes grow over time until the machine has to swap. I have to restart my crawler every hour because of this.

This is the code I'm using to create a driver:

webdriver = new JBrowserDriver(Settings.builder().connectTimeout(10000).socketTimeout(30000).connectionReqTimeout(30000)
                            .timezone(Timezone.EUROPE_BERLIN).cache(true).maxRouteConnections(100)
                            .userAgent(UserAgent.CHROME).headless(true).ssl('trustanything').hostnameVerification(false).quickRender(true)
                            .javascript(System.getenv("jsActivated") == "true")
                            .requestHeaders(getRequestHeaders()).proxy(proxyConfig).build())

All I do is make many sequential webdriver.get(url) calls on each jBrowserDriver instance. I have 5 instances running, and after some time they consume 31GB of RAM or more. I'm using java8-openjdk-jre (from https://hub.docker.com/_/java/ ).
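
Restarting the whole crawler is one way to reclaim leaked memory; a finer-grained variant is to recycle just the leaky object after a fixed number of uses. The sketch below is a generic illustration of that idea, not something from jBrowserDriver itself: `Recycler`, its parameters, and the demo values are all hypothetical names chosen here.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hands out one shared instance and replaces it after a fixed number of uses. */
final class Recycler<T> {
    private final Supplier<T> factory;   // creates a fresh instance
    private final Consumer<T> disposer;  // releases an old instance
    private final int maxUses;
    private T current;
    private int uses;

    Recycler(Supplier<T> factory, Consumer<T> disposer, int maxUses) {
        if (maxUses < 1) {
            throw new IllegalArgumentException("maxUses must be >= 1");
        }
        this.factory = factory;
        this.disposer = disposer;
        this.maxUses = maxUses;
    }

    /** Returns the current instance, disposing and recreating it once it is used up. */
    synchronized T get() {
        if (current != null && uses >= maxUses) {
            disposer.accept(current);
            current = null;
        }
        if (current == null) {
            current = factory.get();
            uses = 0;
        }
        uses++;
        return current;
    }

    public static void main(String[] args) {
        Recycler<StringBuilder> pool =
                new Recycler<>(StringBuilder::new, sb -> System.out.println("disposed"), 2);
        for (int i = 0; i < 5; i++) {
            System.out.println("request " + i + " -> instance "
                    + System.identityHashCode(pool.get()));
        }
    }
}
```

With a WebDriver, the factory would build the driver and the disposer would call its `quit()` method; whether quitting actually frees the leaked memory in this case is an assumption that would need to be checked.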

@hollingsworthd
Owner

There might be some other memory leaks. It appears they've been fixed in Java 9 and won't be backported to Java 8. This driver seems to work ok on Java 9 and memory consumption is much more reasonable.

@PatrickHuetter

PatrickHuetter commented Jul 8, 2016

What about openjfx / javafx? With Java 9 I couldn't find the openjfx package (only for Java 8).
Do these memory leaks also occur with Java 7? What kind of memory leaks are they? Are there any known issues in the Java issue tracker?

/UPDATE/ I found this: http://stackoverflow.com/questions/32666528/issues-with-garbage-collection-in-java-8 Now testing with java 7 and maybe with another garbage collector with java 8.

/UPDATE2/ It seems that it's not possible to run jBrowserDriver with Java 7 because JavaFX is missing. Now I'm running tests with the G1 garbage collector on Java 8.

/UPDATE3/ Possibly related to https://bugs.openjdk.java.net/browse/JDK-8046339

/UPDATE4/ Using cache(false) slows the growth of memory consumption. @hollingsworthd How do you run jBrowserDriver with Java 9? I couldn't get it running.

@hollingsworthd
Owner

I used Java 9 from here which has JavaFX: https://jdk9.java.net/download/

Adding .javaOptions("-XX:+UseG1GC", "-server", "-XX:+AggressiveOpts") to the Settings Builder does improve memory consumption in Java 8.
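
One way to confirm that flags such as `-XX:+UseG1GC` actually took effect is to ask the running JVM which collectors and arguments it has. This is a minimal JDK-only sketch; `GcCheck` is a hypothetical name. It reports on the JVM it runs in, so to check options passed to another process it would have to run inside that process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcCheck {
    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** JVM arguments this process was started with (not the program arguments). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Collectors:    " + collectorNames());
    }
}
```

On HotSpot with G1 enabled, the collector names usually start with "G1"; other collectors report different names.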

@hollingsworthd
Owner

Use v0.14.11 if you're testing. Some things in Java 9 changed recently and that includes fixes for it.

hollingsworthd changed the title from "java.lang.OutOfMemoryError: Java heap space" to "General discussion of Java 8 memory usage" Jul 9, 2016
hollingsworthd reopened this Jul 9, 2016

@PatrickHuetter

@hollingsworthd Do you think this is the bug that causes the memory growth? https://bugs.openjdk.java.net/browse/JDK-8046339

If yes, it may be possible to use JDK 8u102 or 8u111, since the bugfix is backported in these versions. Using Java 9 didn't work for me because of incompatibilities with other libraries in my project.

@PatrickHuetter

I have extreme problems using jBrowserDriver to visit many pages sequentially (with the same jBrowserDriver instance). As you can see here, there is definitely a memory problem. The heap grows from minute to minute and never gets garbage collected. If I manually start a garbage collection, it reclaims only a small part of the heap, so some memory can't be garbage collected. I also attached a heap dump here: http://encircle360share.s3-eu-central-1.amazonaws.com/6Y60MmXVte.hprof

And a thread dump: http://encircle360share.s3-eu-central-1.amazonaws.com/zWhWCZDUp7.tdump

[Screenshots: 2016-07-17 at 20:40:03, 20:35:35, and 20:54:03]

@hollingsworthd
Owner

hollingsworthd commented Jul 17, 2016

Yes, I was wrong that the fixes aren't being backported. These may help:

https://bugs.openjdk.java.net/browse/JDK-8153148
https://bugs.openjdk.java.net/browse/JDK-8153151
https://bugs.openjdk.java.net/browse/JDK-8159860
https://bugs.openjdk.java.net/browse/JDK-8089681
https://bugs.openjdk.java.net/browse/JDK-8046339

JRE 8u112 is here: https://jdk8.java.net/download.html

Again I recommend using .javaOptions("-XX:+UseG1GC", "-server", "-XX:+AggressiveOpts")

Also for analyzing heap dumps try http://www.eclipse.org/mat/

If you have a reproducible memory leak, it would help to isolate it as a self-contained test case. These leaks can't be reproduced easily on just any webpage, as they depend heavily on the specific subresources within the page and the JS that gets called.

@PatrickHuetter

PatrickHuetter commented Jul 18, 2016

Using Java 8u112 and the javaOptions you recommend didn't really help. I also didn't enable JavaScript. The heap space consumption grows more slowly, but it still grows without bound. I created a test program to reproduce this. You can find it here: https://github.com/PatrickHuetter/jbd-memory-test

Eclipse MAT finds 1 possible problem:
238 instances of "com.sun.javafx.webkit.prism.WCImageDecoderImpl", loaded by "sun.misc.Launcher$ExtClassLoader @ 0x780012ba0" occupy 57.906.768 (78,28%) bytes.

Maybe this is related to #27. I'll test the Java program parameter mentioned there, although I already enabled quickRender while building jBrowserDriver. I'll also test the reset() method. I hope that the cookies and localStorage don't get cleaned by that method.

/UPDATE/ Neither of the mentioned workarounds helped. Calling reset() after each request also makes heap usage grow faster (as you can see, 750MB of heap in ~3 minutes).
[Screenshot: 2016-07-18 22:30:41]

@hollingsworthd
Owner

Thanks for that. There may be something here that could be fixed or worked around. Using a plain Java WebView doesn't exhibit the same leak. It could be something related to RMI, listeners that JBD adds, or the headless Monocle code. It may also be possible to specify a Settings.javaBinary to run the external process in Java 9 while the main process remains in Java 8.

hollingsworthd changed the title from "General discussion of Java 8 memory usage" to "Memory usage" Jul 20, 2016

@hollingsworthd
Owner

Also, for some reason my memory graph looks very different from yours. The max memory usage keeps growing, but GC is able to reduce it back to the original heap size. Groovy may be making things worse? Of all the JRE languages, it seems to have taken the most liberties with modifying core Java functionality.

@hollingsworthd
Owner

hollingsworthd commented Jul 20, 2016

@PatrickHuetter The main trunk (not released) adds the ability to specify an arbitrary Java binary and turn on Java 9 compatibility. In the Settings Builder, use .javaBinary("/path/to/jre_or_jdk_9/bin/java").javaExportModules(true)... ... this should work regardless of what Java version the main process is running on. Also, I've never built on Java 9; just use the regular Java 8 project artifacts.

Java 9 has no apparent memory leaks: http://i.imgur.com/yQMQwcS.png

@hollingsworthd
Owner

For comparison, this is the test case on Java 8u112: http://i.imgur.com/XKazMvu.png ... which is why I think Groovy may have something to do with it. There's maybe a slight memory leak, but GC seems to clean up effectively. This screenshot and the one in the previous comment are from your test case ported to Java. If you'd like to submit a patch to address this in Groovy, I'm happy to accept it.

@eremyjay

I have been experiencing the same problems using Java 8u112 on multiple sites. See screenshot here of an example with www.mcgrath.com.au: http://imgur.com/a/vlfLd.

@eremyjay

Interestingly, this only happens when I reset the browser after each page visited. If I do not reset, the memory usage seems to stay reasonable with garbage collection: http://imgur.com/a/ytlA5

@PatrickHuetter

If you do not reset, the memory usage seems to stay reasonable, but it's still growing slowly. I couldn't really find a good and stable workaround, so I migrated to phantomjs with ghostdriver temporarily.

@hollingsworthd
Owner

I located one memory leak, not sure if this affects all these use cases though. I reported it to Oracle. https://gist.github.com/hollingsworthd/07348e7297059e47adb8d31c2902d86a

@hollingsworthd
Owner

@adcoelum / @PatrickHuetter: The upper bound memory usage in the second screenshot from @adcoelum (one with no resets) should not grow on Java 9.

@eremyjay

I'll try the same tests with Java 9 and observe performance

@PatrickHuetter

@adcoelum please let me know about your results.

hollingsworthd added a commit that referenced this issue Jul 25, 2016
… element exceptions, and iframe handling. See #100
@hollingsworthd
Owner

v0.16.0 might fix the memory issues (aside from JBrowserDriver.reset). Also quickRender defaults to false now, because with the latest changes and a recent JRE, it provides little advantage.

@eremyjay

I will give 0.16.0 a go first and then see if I can compare against java 9. Thanks!

@hollingsworthd
Owner

After some more testing, I think quickRender improves things on Java 8. Java 9 memory usage is the best all around. Regardless, I think v0.16.0 improves things in all cases.

@eremyjay

Test results for same site as before with 0.16.0 using Java 8u112 with no browser reset - looks like an improvement - no java heap errors - looks good!:

[Screenshot: test3]

With reset, not so great (resulted in java heap errors) - but not critical as the performance without reset seems to be good:

[Screenshot: test4]

Java 9 to test next...

@eremyjay

Tried Java 9 testing but so far no luck with early access - looks like will need to merge Home folders as not all Sources are available (unless I am missing something.)

@hollingsworthd
Owner

hollingsworthd commented Jul 26, 2016

Thanks for posting the results. I'm not sure exactly what's going on with Java 9, but the way I do it is to run on Java 8 and give the Settings Builder a Java 9 binary with .javaBinary("/path/to/jre9/bin/java") and .javaExportModules(true) ... that way the main application runs on Java 8 and just the child process runs on 9. Depending on your classpath, there could still be compatibility problems.

@eremyjay

I'll try it that way and give it a go.

@hollingsworthd
Owner

hollingsworthd commented Jul 26, 2016

You could also modify it at runtime--almost every setting can be overridden with System properties... java -Djbd.javabinary=/path/to/jre9/bin/java -Djbd.javaexportmodules=true -jar myapp.jar
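
Flags passed with `-D` on the command line show up in the JVM as system properties. The sketch below shows how the two properties named above could be read with a fallback; the property names come from the comment, while `JbdProps` and its methods are hypothetical, and how jBrowserDriver itself reads these values is an assumption not confirmed here.

```java
final class JbdProps {
    /** Path given via -Djbd.javabinary=..., or the fallback if it was not set. */
    static String javaBinary(String fallback) {
        return System.getProperty("jbd.javabinary", fallback);
    }

    /** True only if -Djbd.javaexportmodules=true was passed (case-insensitive). */
    static boolean exportModules() {
        return Boolean.parseBoolean(System.getProperty("jbd.javaexportmodules", "false"));
    }

    public static void main(String[] args) {
        System.out.println("java binary:    " + javaBinary("java"));
        System.out.println("export modules: " + exportModules());
    }
}
```

Running this with and without the `-D` flags makes it easy to check that the properties actually reach the process before debugging anything further.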

@PatrickHuetter

PatrickHuetter commented Jul 27, 2016

I tested version 0.16.0 with Java 8u112 and it seems to work now (without reset). No heap errors anymore. Using a Java binary older than 8u112 leads to heap errors.
[Screenshot: 2016-07-27 17:40:57]

@TonyRice

I tested using 8u112 and it actually made it worse than using 8u101

@Scisaga

Scisaga commented Sep 30, 2016

Using 8u102, the performance is not as expected.
After submitting 20*1000 real crawl tasks, I killed JBrowserDriverServer once it had consumed 4GB of memory.

It seems to create 3 RMI TCP connections and to keep creating new threads without end, and each thread is blocked!

"Thread-192" #239 daemon prio=5 os_prio=0 tid=0x0000000020a7f000 nid=0x5498 in Object.wait() [0x000000003931f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at com.machinepublishers.jbrowserdriver.AjaxListener.run(AjaxListener.java:56)
    - locked <0x00000006c7cf5778> (a com.machinepublishers.jbrowserdriver.StatusCode)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - None
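
To check whether such threads keep piling up without taking repeated thread dumps by hand, the live threads can be counted by state and stack content. This is a JDK-only sketch; `ThreadCensus` is a hypothetical name, and it only sees threads of the JVM it runs in.

```java
import java.util.Map;

final class ThreadCensus {
    /** Counts live threads in the given state whose stack has a frame from a matching class. */
    static int count(Thread.State state, String classNamePart) {
        int matches = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            if (entry.getKey().getState() != state) {
                continue;
            }
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().contains(classNamePart)) {
                    matches++;
                    break; // count each thread at most once
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "AjaxListener" is the class seen in the thread dump above
        System.out.println("waiting AjaxListener threads: "
                + count(Thread.State.WAITING, "AjaxListener"));
    }
}
```

Logging this count periodically would show whether the number of waiting threads grows with the number of pages visited.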

Maybe some core AJAX function is blocked by a certain website policy?

Test URL:

http://weixin.sogou.com/weixin?type=2&ie=utf8&query=大数据

I tried several combinations of settings:

.connectionReqTimeout(CrawlerManager.ReadTimeout)
.connectTimeout(CrawlerManager.ConnectTimeout)
.maxConnections(400)
.ajaxResourceTimeout(5000)
.ssl("compatible")
.requestHeaders(RequestHeaders.CHROME)
.processes(1)
.quickRender(true)
.userAgent(UserAgent.CHROME);

None of them made any difference, which is really upsetting.

@hollingsworthd
Owner

v0.17.0 probably fixes the concurrency issue mentioned above. A separate bug with a suggested fix has been filed for reset, #204. The primary cause of the memory leak has long since been fixed, so I'm closing this issue. Thanks again for the feedback.
