ArticleTextExtractor.getNodes() questions #3

Closed
tlvince opened this Issue Mar 11, 2012 · 5 comments

2 participants

@tlvince

Not an issue as such, a few questions.

Why in ArticleTextExtractor.getNodes() do you:

  1. Use a Map, generate a hashCode and then only return the map values? Wouldn't a Set do the same job?
  2. Add the parent of each element?
@karussell
Owner

Regarding 1: yes, you are right. But it wouldn't be a difference in terms of CPU or memory. As HashSet uses even more memory than HashMap and calculating the hashCode would still be done under the hood from hashset ... but when I think about it then this could be improved using an IdentityHashMap. I'll see if I can get all tests passing

Regarding 2: Thanks! Really not necessary.

@karussell
Owner

The linked hashmap cannot be replaced by an identity hashmap as the order of insertion is important.

@karussell
Owner

Is this now better understandable?

38203f2#diff-0

@tlvince

Yes, definitely clearer but I'm still not convinced a Map is suitable here; you are filling the values with null and returning a Set... This may be a matter of preference though (especially if HashMap has better performance than HashSet).

@tlvince tlvince closed this Mar 12, 2012
@karussell
Owner

Yeah, ok. I'll see if it would have significant perf or memory differences. BTW: hashset is implemented via hashmap ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment