
GC overhead limit exceeded #351

Closed
desb42 opened this issue Feb 1, 2019 · 19 comments

@desb42
Collaborator

desb42 commented Feb 1, 2019

I am trying to build dewiki html pages; at around 380,000 pages done, with 2,286,446 to go, I start getting the above error.
According to Oracle, this means the JVM is spending an inordinate amount of time in the garbage collector,
indicating (at least to me) that 'stuff' is not being released.
Tracking this down is going to be a right pain.
Attached are the err.txt and session.txt.
20190201_064759.201.zip

Now I am going to build enwiki html to see how far I get

@gnosygnu
Owner

gnosygnu commented Feb 1, 2019

Yeah, that's going to be tough. enwiki will probably be even worse.

A few questions:

  • How much total memory do you have on your system? I'm at 32 GB now.
  • What is the page rate / ETA? For dewiki, I remember mine is usually something like 10 per second with an ETA of 24 hours.
  • Which wikis did you manage to build successfully (enwikisource?), and how long did those take (roughly, in hours)?

I'm just trying to get a sense of your machine's performance profile.

In terms of pointers, use this for reference: http://xowa.org/home/wiki/Dev/Command-line/Dumps

      num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-08-01 01:02:03';
      load_all_imglinks = 'y'; 

Each of the following will reduce memory load, but will also slow things down:

  • load_all_templates to 'n'. Biggest impact.
    • When 'y', all Templates and Modules are loaded into memory.
    • When 'n', Templates and Modules are loaded from the database on every request.
  • load_all_imglinks to 'n'. Big impact.
    • When 'y', the imglinks table is loaded into memory.
    • When 'n', imglinks is queried on each request.
  • cleanup_interval to 5 or lower. Big impact.
    • The number determines after how many articles XOWA cleans its caches (a rough sketch of the pattern follows below). Items held in the cache include:
      • Other loaded pages besides Templates and Modules
      • Intermediate objects like the in-memory DOM-tree representation of a template
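A minimal sketch of that cleanup pattern (assumed behavior; class and field names here are hypothetical, not XOWA's actual code):

    import java.util.HashMap;
    import java.util.Map;

    // Rough sketch: flush the per-build caches every `cleanupInterval` parsed articles.
    public class BuildCaches {
        private final int cleanupInterval;
        private int pagesSinceCleanup = 0;
        private final Map<String, String> pageCache = new HashMap<>();        // loaded pages
        private final Map<String, Object> templateDomCache = new HashMap<>(); // intermediate DOM trees

        public BuildCaches(int cleanupInterval) { this.cleanupInterval = cleanupInterval; }

        public void onArticleParsed() {
            if (++pagesSinceCleanup >= cleanupInterval) {
                pageCache.clear();          // a lower interval frees memory sooner...
                templateDomCache.clear();   // ...at the cost of re-loading / re-parsing more often
                pagesSinceCleanup = 0;
            }
        }
    }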

@desb42
Collaborator Author

desb42 commented Feb 2, 2019

I have an i7-4790K at 4GHz with 32GB memory
I have been successful in building a number of wikis

for 20190101 archive

commonswiki
download 5 files 42.4GB
expanded 269GB in 60mins
build 27h
final ~105GB (but extra .sqlite files)
      36 xowa files 96.4GB

wikidata
download 5 files 58.2GB
expanded 1.22TB 19hrs
build 8h 12m
final 1.37TB (61)
      53 xowa files 157GB

enwiki (no html dump)
download 6 files 23.5GB
expanded 130GB 40mins
build 24h 12m
final 216GB (127) - with images files from 201807
      121 xowa files 153GB

dewiki (no html dump)
download 6 files 6GB
expanded 26.3GB 15mins
build 3h 3m
final 18.8GB (23) - no image files
      18 xowa files 9.94GB

enwikibooks
download 6 files 161MB
expanded 745MB 1min
build 1h 1m
final 3.24GB (37)
      31 xowa files 2.73GB (with image and html)

enwikinews
download 6 files 226MB
expanded 2.2GB 1min
build 2h 52m
final 4.12GB (33)
      27 xowa files 763MB (with image and html)

enwikiquote
download 6 files 123MB
expanded 486MB 1min
build 2m 29s ???
final 4.12GB (33)
      27 xowa files 763MB (with image and html)

enwikisource
download 6 files 2.04GB
expanded 9.78GB 6min
build 3h 16m
final 13.6GB (38)
      32 xowa files 9.56GB (with image and html)

enwikiversity
download 6 files 123MB
expanded 722MB 1min
build 15m 17s
final 2.16GB (33)
      24 xowa files 1.30GB (with image and html)

enwikivoyage
download 6 files 117MB
expanded 722MB 1min
build 1h 57m
final 14.2GB (33)
      27 xowa files 13.8GB (with image and html)

enwiktionary
download 6 files 1.2GB
expanded 9.87GB 5min
build 4h 15m
final 14.1GB (42)
      35 xowa files 10.0GB (with image and html)

The last 7 were built with html so that I could download the imagery.

What I noticed with all of the last 7 builds was that the rate started low with an outrageous ETA (usually measured in days); the rate would then pick up, with the ETA dropping to a small number of hours. The rate would reach a peak and then slowly decline, with the ETA slowly getting longer again.

for dewiki I am using

      num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'n'; hdiff_enabled ='n'; 

for enwiki I am using

      num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='n'; 

As of right now the stats on the build for enwiki are

done=1112000 left=7464351 rate=14 time_past=21:27:47 h time_left=6:00:04:24 d

I am hoping this will improve (or at least finish)

@desb42
Collaborator Author

desb42 commented Feb 3, 2019

As of now (roughly 24 hours later) the stats for the enwiki html build are

done=1526000 left=7050351 rate=9 time_past=1:20:18:43 d time_left=8:12:43:43 d

As you can see, the rate is slowing

I suspect that if I were to start at, say, 1,000,000, I would get the behaviour I mentioned above (starts slow, speeds up, then slows and slows).
I think there is either a caching issue or a memory leak, or both
(finding it, or them, will be tricky)

@desb42
Collaborator Author

desb42 commented Feb 3, 2019

I have stopped my enwiki build and instead started a NetBeans profiling session on the dewiki html build (because it's smaller than enwiki).
Looking at the Objects profile, the first thing I noticed is that there are a lot of Xof_orig_itm objects. I think these are the image keys - all held in memory (so it would be considerably bigger for enwiki).

As it is running now, there are two object classes that just seem to increase in number (whereas many others are transient).
The two object classes are Json_itm_str and Json_kv; they have something to do with the json parsing.

It looks like the json parse results are not being discarded (if they should be)?

@desb42
Collaborator Author

desb42 commented Feb 3, 2019

I can't quite put my finger on it, but...
I think this is related to wikidata - with the way more and more data is coming from there

The mru cache seems to be doing what it is supposed to; however I think other entities (unknown) are maintaining a reference, so they are not garbage collected

This is my hypothesis (at the moment)
Still trying to think of ways to prove it
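As a minimal illustration of that hypothesis (simplified, assumed code - not XOWA's actual classes): an MRU cache can evict entries correctly while some other structure keeps strong references to the same objects, which is enough to stop the garbage collector from reclaiming anything:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Simple MRU/LRU cache: evicts the eldest entry once it exceeds maxEntries.
    class MruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;
        MruCache(int maxEntries) { super(16, 0.75f, true); this.maxEntries = maxEntries; }
        @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    public class LeakDemo {
        public static void main(String[] args) {
            MruCache<String, byte[]> cache = new MruCache<>(100);
            List<byte[]> sideTable = new ArrayList<>();   // some other component's references
            for (int i = 0; i < 100_000; i++) {           // scale this up to reproduce GC-overhead symptoms
                byte[] entity = new byte[1024];           // stand-in for a parsed wikidata entity
                cache.put("Q" + i, entity);               // the cache itself evicts correctly at 100 entries
                sideTable.add(entity);                    // ...but this reference pins every entity in memory
            }
            System.out.println("cache=" + cache.size() + " pinned=" + sideTable.size());
        }
    }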

@gnosygnu
Owner

gnosygnu commented Feb 4, 2019

The mru cache seems to be doing what it is supposed to; however I think other entities (unknown) are maintaining a reference, so they are not garbage collected

Yeah, these memory leaks are a pain to track down. Am starting a dewiki build on my side and will try to troubleshoot from there.

Also, just as a comparison, I built simplewiki and it came in at 35 min.

@desb42
Collaborator Author

desb42 commented Feb 5, 2019

Have just tried to rebuild dewiki html (again) and it failed with this error message.
Attached are the err.txt and session.txt for the run.
I have also included the console log and the gfs file I use
20190205_081524.764.zip

I cannot run the xowa version built by xowa_get_and_make.sh - it produces a lot of errors (which I believe is due to synchronisation issues)

I am at a loss

@gnosygnu
Owner

gnosygnu commented Feb 6, 2019

Have just tried to rebuild dewiki html (again) and it failed with this error message

Sorry for all the trouble. I started a dewiki run last night, and hit the same error. After analyzing the memory dump, I think I found the cause of the memory leak. I made the commit above, and am running another dewiki run tonight. Hopefully it works (it usually completes in less than 8 hours)

I cannot run the xowa version built by xowa_get_and_make.sh - it produces a lot of errors (which I believe is due to synchronisation issues)

Sorry, wasn't expecting that. Can you post the err.txt for that run? (or wherever you see a lot of errors). I'm running with the latest codebase here (which should be the same as the xowa_get_and_make.sh) and so far there haven't been issues.

Also, for what it's worth, I think I'm done with #320. Last night's and tonight's runs were done with the non-hzip mode:

num_wkrs = 8; load_all_templates = 'y'; load_all_imglinks = 'y'; indexer_enabled = 'y';
cleanup_interval = 50; hzip_enabled = 'n'; hdiff_enabled ='n'; 

Finally, just as a point of reference, I run the builds with this command:

java -Xmx15000m -XX:+HeapDumpOnOutOfMemoryError -jar /xowa/xowa_linux_64.jar --app_mode cmd --cmd_file /xowa/make_wiki.gfs --show_license n --show_args n

The -XX:+HeapDumpOnOutOfMemoryError arg generated a 15 GB dump file which I analyzed with Eclipse MemoryAnalyzer. Looking at the dominator tree showed me that 80% of the 15 GB file was locked up in Bry_bfrs, mostly related to templatestyles.
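As an aside, the same kind of dump can also be captured at an arbitrary point (not only on OutOfMemoryError) - either externally with jmap -dump:live,format=b,file=... against the running PID, or programmatically via HotSpot's diagnostic MXBean. A minimal sketch, with an example output path:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;

    public class HeapDumpNow {
        // Writes an .hprof snapshot of the current heap; "true" restricts it to live objects.
        public static void dump(String path) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true);
        }

        public static void main(String[] args) throws Exception {
            dump("/tmp/xowa_live.hprof"); // example path; open with Eclipse MemoryAnalyzer as above
        }
    }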

Anyway, hope this helps.

@desb42
Collaborator Author

desb42 commented Feb 6, 2019

This is the log from the straight xowa run
20190205_073854.781.zip

I think this is related to the synchronisation issues I have mentioned

@desb42
Collaborator Author

desb42 commented Feb 6, 2019

I have just started a build with the source changes.
I suspected you were using a non-default memory size; thanks for the confirmation.
The indexer_enabled command is a new one on me.

From some of my memory analysis, I think the image info is held in memory. So as the total number of images grows, this will need more and more memory - or have I misunderstood?

@gnosygnu
Owner

gnosygnu commented Feb 6, 2019

This is the log from the straight xowa run

Cool. Thanks. It looks like mostly Wikidata / Scribunto errors (=Module:Wikidata:647 attempt to call nil)

Which version of Wikidata / Dewiki are you building? I'm building 2019-01 (I haven't tried to download 2019-02 yet)

Incidentally, mine finished this morning:

cmd end: wiki.mass_parse.exec 6h 55m 24s 625f
bldr done: 6h 55m 30s 656f

My errors were far fewer, though I am getting some of the errors you reported earlier (namely the counts don't match)

20190205_225543.050.zip

I suspected you were using a non-default memory size; thanks for the confirmation

Yeah, that's my way of forcing the JVM to crash if it reaches a certain size. If I remember correctly, it will continue to grow and start paging to disk.

the indexer_enabled command is a new one on me

Yeah, let me update the docs for that. It generates full-text search indexes. See http://xowa.org/home/wiki/App/Full-text_search

From some of my memory analysis, I think the image info is held in memory

Yeah, I noticed it was 1 GB in my dump. I think this is a static size (it won't grow as the build progresses), but it's just huge b/c it's loading the entire commons image table into memory.
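To make that concrete, a minimal sketch of such a bulk load (table and column names are hypothetical, and the sqlite-jdbc driver is assumed to be on the classpath) - the point is one in-memory entry per image, so the footprint scales with the size of the commons image table, not with how many pages have been parsed:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical image-metadata record; fields are illustrative only.
    final class ImgMeta {
        final int width, height;
        ImgMeta(int width, int height) { this.width = width; this.height = height; }
    }

    public class ImgLinksLoader {
        // Loads every row of a (hypothetical) image-metadata table into RAM.
        public static Map<String, ImgMeta> loadAll(String dbPath) throws Exception {
            Map<String, ImgMeta> cache = new HashMap<>();
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT img_name, img_width, img_height FROM img_meta")) {
                while (rs.next()) {
                    cache.put(rs.getString("img_name"),
                              new ImgMeta(rs.getInt("img_width"), rs.getInt("img_height")));
                }
            }
            return cache;
        }
    }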

I'll look at it more over the next few days. Hope your run completes!

@desb42
Collaborator Author

desb42 commented Feb 6, 2019

When I run java on my machine with no memory switches, it seems to act as though it's -Xmx8000m.
That is, my process maxes out at ~8GB
Are you saying that on your machine, you do not use any memory switches and this was a test to limit it?
if so, do you have some JAVA* environment variables set? or something else?
Another thought, do you run from the cygwin environment or a command shell (or something else)?

Sorry if this sounds like an interrogation

@desb42
Collaborator Author

desb42 commented Feb 7, 2019

Success!
my timings are somewhat different

cmd end: wiki.mass_parse.exec 17h 56m 24s 675f
bldr done: 17h 56m 54s 775f

logs
20190206_083124.290.zip

@gnosygnu
Owner

gnosygnu commented Feb 7, 2019

Sorry for the late reply, but I've been traveling

Are you saying that on your machine, you do not use any memory switches and this was a test to limit it?

Nah, this was a mistake on my side. I forgot the default was 8GB; I thought it was unlimited.

I always run the builds with the command-line above (for at least the past 5 or more years)
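For reference (assuming a stock HotSpot JVM): the default max heap is usually a quarter of physical RAM, which on a 32 GB machine works out to roughly the ~8 GB ceiling observed above. The effective limit can be confirmed with a small check:

    public class MaxHeapCheck {
        public static void main(String[] args) {
            // Reports the effective -Xmx (or the JVM's default if no switch was given).
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %.1f GB%n", maxBytes / (1024.0 * 1024 * 1024));
        }
    }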

if so, do you have some JAVA* environment variables set? or something else?

Nothing else set.

Another thought, do you run from the cygwin environment or a command shell (or something else)?

I have a weird hybrid setup.

  • I do my development work on a Windows box.
  • I write the code in C#
  • I transpile to Java with some primitive tool I wrote almost a decade ago.
  • I debug and test with Eclipse (Neon).
  • I have Windows shell scripts to build the jars / application zips
  • When I build the image / html databases, I build them on an openSUSE Linux box

Ultimately, I want to unify on a pure Java / Linux setup, but for the time being, I still have a heavy dependency in the C# / Windows world. From an XOWA perspective though, I've made sure that users can use whatever OS they want (Windows, Linux, Mac OS X). The same also goes for devs.

Sorry if this sounds like an interrogation

No worries!

Success!

Cool! Your err.txt is much smaller too, and in line with my own.

my timings are somewhat different

Yeah, I would think that's machine-related? I also have an SSD (though it's a few years old). My hardware specs are on http://xowa.org/home/wiki/Dev/Command-line/Dumps#Hardware and I haven't upgraded it in the last two years (aside from memory - which isn't a factor here b/c of the 16 GB limit)

@desb42
Collaborator Author

desb42 commented Feb 7, 2019

Another thing: what is actually taking up all the space?
The wikidata should be constrained via the mru cache (to roughly 100MB by default).
It may need to keep track of image links.
I thought most of the working data is written to temporary sqlite databases.
So why would it need to grow to 15GB of memory?

@desb42
Collaborator Author

desb42 commented Feb 7, 2019

Looking at session.txt from 20190205_225543.050.zip
I have come across entries I cannot find in my version of dewiki (or for that matter in https://de.wikipedia.org)
For example the line:

20190206_104632.687 qid not found in wikidata; qid=Q30728129 page=Frankfurter_Buchmesse_2017/Frankophone_Autoren/1984

the page Frankfurter_Buchmesse_2017/Frankophone_Autoren/1984 does not exist
The dewiki dump I am working from is as of 2019-01-01

@gnosygnu
Owner

Another thing: what is actually taking up all the space?
The wikidata should be constrained via the mru cache (to roughly 100MB by default).
It may need to keep track of image links.

Image links is pretty big. It's about 1 GB in my memory dump. Unfortunately, it needs to be loaded all in memory.

I thought most of the working data is written to temporary sqlite databases
So why would it need to grow to 15GB of memory?

XOWA writes a lot of data to sqlite dbs, but it also caches a lot in RAM. From my memory, the worst offenders are (fuzzy numbers based on enwiki):

  • 5 to 6 GB: Template / Module
    • controlled by load_all_templates='y'
    • Caches template / Module wikitext and XOWA DOM trees (XOWA tries to create a DOM tree for all wikitext docs)
  • 1 GB : imglinks
    • controlled by load_all_imglinks='y'
    • Caches all image name, width, height and other meta for commonswiki and enwiki
  • < 1 GB: 8 worker wikis
    • controlled by num_wkrs = 8
    • There's basically some overhead of creating 8 full-fledged enwiki instances in memory (as well as some other cache-related stuff)

This means that at baseline, the XOWA enwiki build needs 7 to 8 GB.

While building, it can reach up to 15 GB due to the various caching XOWA does:

  • Template evaluations: XOWA caches stuff like {{some_template|arg1}} with a result of some_result. I remember this has a non-minor improvement on performance, though I can't say it's major
  • Page loads: Pages loaded from the wikitext database are cached in memory
  • Various other caches: Too many to list here

All these caches get cleaned up every time the cleanup_interval = 50 threshold is reached.
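For illustration only (not XOWA's actual classes), the template-evaluation caching described above can be pictured as a memo table keyed by the raw template call and dropped at each cleanup_interval flush - repeated calls are nearly free between flushes, but the map holds every distinct expansion seen since the last one:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    // Memoizes template expansions, e.g. "{{some_template|arg1}}" -> "some_result".
    public class TemplateEvalCache {
        private final Map<String, String> memo = new HashMap<>();

        public String expand(String templateCall, Function<String, String> expander) {
            return memo.computeIfAbsent(templateCall, expander);
        }

        public void flush() { memo.clear(); } // invoked when cleanup_interval is reached
    }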

Finally, I think the real offender in usage is Java GC. When profiling the app through VisualVM, I've regularly seen it drift upwards to 15 GB, and then tick back down to 8 GB. This "drifting to 15 GB and reset to 8 GB" occurs over a 30 or 40 second period which is easily the time it takes for XOWA to process 300 or so articles

Anyway, hope this helps

@gnosygnu
Owner

the page Frankfurter_Buchmesse_2017/Frankophone_Autoren/1984 does not exist

Oops. I was logging page title instead of page url. The page does exist here: https://de.wikipedia.org/wiki/Wikipedia:Frankfurter_Buchmesse_2017/Frankophone_Autoren/1984

I made a commit above to fix it. Thanks for the report!

@gnosygnu
Owner

Marking this issue closed. Let me know if I missed anything. Thanks!
