Not an issue as such, a few questions.
Why in ArticleTextExtractor.getNodes() do you:
Regarding 1: Yes, you are right, but it wouldn't make a difference in terms of CPU or memory: HashSet uses even more memory than HashMap, and the hashCode would still be calculated under the hood by the HashSet. Thinking about it some more, though, this could be improved by using an IdentityHashMap. I'll see if I can get all tests passing.
Regarding 2: Thanks! Really not necessary.
The LinkedHashMap cannot be replaced by an IdentityHashMap, as the order of insertion is important.
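To illustrate the ordering point, here is a minimal sketch. The class and method names are hypothetical and not taken from ArticleTextExtractor; it only shows that a LinkedHashMap iterates its keys in insertion order, which IdentityHashMap does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the project under discussion.
class InsertionOrderSketch {

    // Collects the inputs as map keys (values left null) and returns
    // the keys in the order the map iterates them.
    static List<String> keysInOrder(List<String> inputs) {
        Map<String, Object> seen = new LinkedHashMap<>();
        for (String s : inputs) {
            // Re-inserting an existing key does not change its position.
            seen.put(s, null);
        }
        return new ArrayList<>(seen.keySet());
    }

    public static void main(String[] args) {
        // prints [c, a, b]
        System.out.println(keysInOrder(Arrays.asList("c", "a", "b", "a")));
    }
}
```

With an IdentityHashMap in place of the LinkedHashMap, the same loop would compile, but the iteration order of the keys would be unspecified.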
Is it easier to understand now?
Yes, definitely clearer but I'm still not convinced a Map is suitable here; you are filling the values with null and returning a Set... This may be a matter of preference though (especially if HashMap has better performance than HashSet).
Yeah, ok. I'll see if it would make a significant difference in performance or memory. BTW: HashSet is implemented via HashMap ...
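For comparison, a rough sketch of the two approaches being discussed, assuming the goal is a set of unique items in insertion order. The names are hypothetical and this is not the actual getNodes() code; it only shows that a LinkedHashMap filled with null values and a LinkedHashSet yield the same keys in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class, not part of the project under discussion.
class SetVersusMapSketch {

    // Map-based variant: values are unused and left null; the key set is returned.
    static Set<String> viaMap(List<String> items) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String item : items) {
            map.put(item, null);
        }
        return map.keySet();
    }

    // Set-based variant: states the intent directly and keeps insertion order too.
    static Set<String> viaSet(List<String> items) {
        return new LinkedHashSet<>(items);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("p", "div", "p", "td");
        // Both lines print [p, div, td]
        System.out.println(new ArrayList<>(viaMap(items)));
        System.out.println(new ArrayList<>(viaSet(items)));
    }
}
```

Whether one variant is measurably cheaper than the other would need profiling; the sketch makes no claim about that.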