This repository has been archived by the owner on Mar 9, 2021. It is now read-only.

ArticleTextExtractor.getNodes() questions #3

Closed
tlvince opened this issue Mar 11, 2012 · 5 comments

Comments

@tlvince

tlvince commented Mar 11, 2012

Not an issue as such, a few questions.

Why in ArticleTextExtractor.getNodes() do you:

  1. Use a Map, generate a hashCode and then only return the map values? Wouldn't a Set do the same job?
  2. Add the parent of each element?
@karussell
Owner

Regarding 1: yes, you are right, but it wouldn't make a difference in terms of CPU or memory: HashSet uses even more memory than HashMap, and the hashCode would still be calculated under the hood by HashSet. Thinking about it more, though, this could be improved by using an IdentityHashMap. I'll see if I can get all tests passing.

Regarding 2: Thanks! Really not necessary.

@karussell
Owner

The LinkedHashMap cannot be replaced by an IdentityHashMap, as the order of insertion is important.
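
To illustrate the point about insertion order, here is a minimal sketch. It is not the project's actual getNodes() code: the class name InsertionOrderSketch, the helper linkedOrder, and the string keys standing in for DOM nodes are all hypothetical. It only shows that a LinkedHashMap iterates its keys in the order they were first inserted, which IdentityHashMap does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; strings stand in for the nodes collected by getNodes().
final class InsertionOrderSketch {

    /** Deduplicates the input and returns the keys in first-insertion order. */
    static List<String> linkedOrder(List<String> input) {
        Map<String, Object> seen = new LinkedHashMap<>();
        for (String item : input) {
            // The value is unused; re-inserting an existing key keeps its original position.
            seen.put(item, null);
        }
        return new ArrayList<>(seen.keySet());
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("div", "p", "span", "p");
        // Keys come back in the order they were first inserted.
        System.out.println(linkedOrder(nodes));
    }
}
```

An IdentityHashMap would still deduplicate (by reference rather than by equals), but its iteration order is unspecified, so callers relying on document order could not use it directly.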

@karussell
Owner

Is this easier to understand now?

38203f2#diff-0

@tlvince
Author

tlvince commented Mar 12, 2012

Yes, definitely clearer but I'm still not convinced a Map is suitable here; you are filling the values with null and returning a Set... This may be a matter of preference though (especially if HashMap has better performance than HashSet).

@tlvince tlvince closed this as completed Mar 12, 2012
@karussell
Owner

Yeah, ok. I'll see whether it would make a significant difference in performance or memory. BTW: HashSet is implemented via HashMap ...
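
For comparison, the two approaches discussed in this thread can be sketched side by side. This is an assumption-laden illustration rather than the library's code: SetVersusMapSketch, viaMap, and viaSet are hypothetical names, and strings stand in for the nodes. Both variants deduplicate and keep insertion order; the Set variant simply avoids the unused null values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not taken from ArticleTextExtractor.
final class SetVersusMapSketch {

    /** Map-based variant: keys carry the data, values are always null. */
    static <T> Set<T> viaMap(List<T> items) {
        Map<T, Object> seen = new LinkedHashMap<>();
        for (T item : items) {
            seen.put(item, null);
        }
        return seen.keySet();
    }

    /** Set-based variant: same deduplication and ordering, no dummy values. */
    static <T> Set<T> viaSet(List<T> items) {
        return new LinkedHashSet<>(items);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("div", "p", "span", "p");
        System.out.println(viaMap(nodes));
        System.out.println(viaSet(nodes));
    }
}
```

Since LinkedHashSet in the JDK is itself backed by a LinkedHashMap, any performance or memory difference between the two variants is likely to be small; the choice is mostly about readability.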

arunkumar9t2 pushed a commit to arunkumar9t2/crux that referenced this issue Sep 16, 2018
karussell#3)

Fixed ConcurrentModificationException in removeDisallowedAttributes.