Generalize interface types where possible and prudent #84

frizbog · 2015-09-27T20:25:26Z

(See Issue #79)

There are a number of places where we could use less-specific interface types than we use currently. Scenarios to evaluate on parameters and return types:

Collections instead of Sets and Lists
- Set and List should only be used when the type of collection matters
- Particular attention should be made to using the right underlying implementation:
  - HashSets when duplicates are forbidden but order is not significant
  - LinkedHashSets when duplicates are forbidden but order is significant
  - ArrayLists whenever duplicates are allowed
CharSequences instead of Strings

Of course, this should only be done when it makes sense.

frizbog · 2016-07-06T01:03:27Z

I've decided to continue return Strings as return types on the API instead of CharSequences. People generally expect to receive a String back, and expect it to be immutable and have defined behavior for equals() and hashCode(), none of which are guaranteed for CharSequences.

I may still accept CharSequences as parameters into the API from calling code to allow more flexibility for callers.

frizbog · 2016-07-06T02:21:06Z

Upon further experimentation and consideration, I will not be switching to HashSets in the object model to avoid duplicates.

The lightest Set implementation that ships with Java is a HashSet, and it has an overhead of 32 bytes per entry compared to ArrayList's 4 bytes. When I'm working this hard to reduce the memory footprint (especially for large files), I'm not inclined to add that much memory overhead just to keep users from adding an item to a collection more than once.

An argument could also be made that if someone is reading a GEDCOM file that has duplicated entries in it, that duplication should be preserved in the object model. There's nothing in the GEDCOM spec that specifically forbids it or considers it an error, although I don't know anyone who is arguing that support for duplication is actually desirable.

I am opening another issue, though, for a compromise: The validators could look for duplicates, and if autorepair is enabled in the validators, the duplicates could be automatically removed.

For release 3.0.0, though, Collections are staying Lists backed by ArrayLists, and Maps are remaining HashMaps.

frizbog · 2016-07-08T12:25:54Z

Decided against using CharSequences anywhere. In order to safely work with the data from a CharSequence, I'd need to make defensive String copies anyway, which further bloats things.

frizbog added the enhancement label Sep 27, 2015

frizbog added this to the Release 3 milestone Sep 27, 2015

frizbog mentioned this issue Jul 6, 2016

Validators should detect/autorepair duplicate items in Lists #111

Closed

frizbog closed this as completed Jul 8, 2016

frizbog added the wontfix label Jul 8, 2016

frizbog removed this from the Release 3 milestone Jul 9, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalize interface types where possible and prudent #84

Generalize interface types where possible and prudent #84

frizbog commented Sep 27, 2015 •

edited

frizbog commented Jul 6, 2016

frizbog commented Jul 6, 2016

frizbog commented Jul 8, 2016

Generalize interface types where possible and prudent #84

Generalize interface types where possible and prudent #84

Comments

frizbog commented Sep 27, 2015 • edited

frizbog commented Jul 6, 2016

frizbog commented Jul 6, 2016

frizbog commented Jul 8, 2016

frizbog commented Sep 27, 2015 •

edited