dnsjava NIO selector thread stuck at 100% after terminating job #425

Closed
jmvezic opened this issue Aug 4, 2021 · 3 comments · Fixed by #444

jmvezic commented Aug 4, 2021

With the latest version (20210803), and many versions before it, one CPU thread stays stuck at 100% after a job is terminated, apparently doing nothing. It never goes away until I restart Heritrix.

For reference, this doesn't happen with version 20200304, for example. I haven't tried all versions, so I don't know when this problem started. There's also nothing in the logs that would indicate something is wrong.

To reproduce: use the default crawler-beans with an operator URL set and any seed you like.

jmvezic commented Oct 30, 2021

Update: this bug was introduced in version 3.4.0-20210617 and is present in all versions after that.

@ato ato added the bug label Oct 30, 2021

ato commented Oct 30, 2021

Confirming I can reproduce this in 20210803. Hitting shift-H in top shows it's the dnsjava NIO selector thread. Here's the stack trace (from jstack <pid>):

"dnsjava NIO selector" #67 daemon prio=4 os_prio=0 cpu=1014221.96ms elapsed=1060.26s tid=0x00007fd000011800 nid=0x90e27 runnable  [0x00007fd0e07bf000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPoll.wait(java.base@11.0.12/Native Method)
	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.12/EPollSelectorImpl.java:120)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.12/SelectorImpl.java:124)
	- locked <0x00000000f41af7b8> (a sun.nio.ch.Util$2)
	- locked <0x00000000f41af558> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(java.base@11.0.12/SelectorImpl.java:136)
	at org.xbill.DNS.Client.runSelector(Client.java:67)
	at org.xbill.DNS.Client$$Lambda$308/0x00000001004ec840.run(Unknown Source)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

@ato ato changed the title from "One CPU thread stuck at 100% after terminating job" to "dnsjava NIO sel thread stuck at 100% after terminating job" Oct 30, 2021
@ato ato changed the title from "dnsjava NIO sel thread stuck at 100% after terminating job" to "dnsjava NIO selector thread stuck at 100% after terminating job" Oct 30, 2021

ato commented Oct 30, 2021

Poking at this with a debugger a bit, it appears select returns immediately because the thread was interrupted. dnsjava's runSelector() code never clears the interrupted flag, so it just busy-loops calling select. Looks like the dnsjava NIO selector thread ends up in ToePool.getToes(), which presumably means ToePool.shutdown() is interrupting it.
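
For anyone who wants to see the busy loop in isolation, here is a minimal sketch using only the JDK (no dnsjava or Heritrix code). It sets the current thread's interrupt flag and then calls select with a timeout in a loop, roughly as a selector loop would. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

class InterruptedSelectDemo {

    /**
     * Sets the current thread's interrupt flag, then calls select(timeoutMs)
     * in a loop for about durationMs and returns how often select returned.
     */
    static int countSelectReturns(long timeoutMs, long durationMs) {
        try (Selector selector = Selector.open()) {
            // Simulate what an external interrupt() does to the selector thread.
            Thread.currentThread().interrupt();
            int returns = 0;
            long deadline = System.nanoTime() + durationMs * 1_000_000L;
            while (System.nanoTime() < deadline) {
                // With the flag set, select() wakes up at once instead of
                // blocking for timeoutMs, and it leaves the flag set.
                selector.select(timeoutMs);
                returns++;
            }
            return returns;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Clear the flag so the calling thread is left in a clean state.
            Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        int n = countSelectReturns(1000, 200);
        System.out.println("select() returned " + n + " times in about 200 ms");
    }
}
```

Without the interrupt, a 1000 ms timeout would not return even once inside the 200 ms window. With the flag set, the loop spins, which matches the 100% CPU seen in top.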

One workaround might be to have ToePool check the thread name and exclude it from interrupting.
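
A rough sketch of what that name check could look like. This is not the actual ToePool code: the class, the helper methods and the way the group is enumerated are all hypothetical, and only the thread name is taken from the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;

class SelectorAwareInterrupt {

    // Thread name as it appears in the jstack output above.
    static final String DNSJAVA_SELECTOR_NAME = "dnsjava NIO selector";

    /** Returns false for the dnsjava selector thread, true for everything else. */
    static boolean shouldInterrupt(Thread t) {
        return !DNSJAVA_SELECTOR_NAME.equals(t.getName());
    }

    /**
     * Interrupts every live thread in the group except the dnsjava selector
     * and returns the names of the threads that were interrupted.
     */
    static List<String> interruptGroup(ThreadGroup group) {
        // activeCount() is only an estimate, so leave some headroom.
        Thread[] threads = new Thread[group.activeCount() + 8];
        int n = group.enumerate(threads);
        List<String> interrupted = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (shouldInterrupt(threads[i])) {
                threads[i].interrupt();
                interrupted.add(threads[i].getName());
            }
        }
        return interrupted;
    }
}
```

Matching on the name is brittle (it breaks if dnsjava ever renames the thread), which is one reason to treat this as a workaround only.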

As the dnsjava selector thread is global per process, it seems wrong that it ends up in the ToePool thread group at all. So perhaps it'd be better to prevent it from being assigned to the group in the first place. I guess one way to do this would be to do a dummy lookup on startup from a thread that's not in a group.
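
To illustrate why a warm-up call would help, here is a JDK-only sketch. It assumes the selector thread is started lazily by whichever thread performs the first lookup, so it inherits that thread's ThreadGroup. The lazily started worker below is a stand-in for the dnsjava selector, not dnsjava code, and all names are hypothetical.

```java
class LazyWorkerGroupDemo {

    // Stand-in for a process-wide, lazily started selector thread.
    private static Thread worker;

    /** Starts the shared worker on first use; it inherits the caller's ThreadGroup. */
    static synchronized Thread ensureWorker() {
        if (worker == null) {
            worker = new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // park until interrupted
                } catch (InterruptedException e) {
                    // exit quietly
                }
            }, "stand-in selector");
            worker.setDaemon(true);
            worker.start();
        }
        return worker;
    }

    /**
     * Triggers ensureWorker() from a thread that belongs to the given group,
     * then reports which group the shared worker ended up in.
     */
    static String warmUpFrom(ThreadGroup group) {
        Thread t = new Thread(group, LazyWorkerGroupDemo::ensureWorker, "warm-up");
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ensureWorker().getThreadGroup().getName();
    }

    public static void main(String[] args) {
        // First use happens at startup, outside the crawl's thread group.
        System.out.println(warmUpFrom(new ThreadGroup("startup")));
        // Later use from a "ToePool" group no longer decides the worker's group.
        System.out.println(warmUpFrom(new ThreadGroup("ToePool")));
    }
}
```

In the real application the warm-up would be an actual dummy DNS lookup done early in startup, so that the selector thread is created before any ToePool thread has a chance to trigger it.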
