Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize interface types where possible and prudent #84

Closed
frizbog opened this issue Sep 27, 2015 · 3 comments
Closed

Generalize interface types where possible and prudent #84

frizbog opened this issue Sep 27, 2015 · 3 comments

Comments

@frizbog
Copy link
Owner

frizbog commented Sep 27, 2015

(See Issue #79)

There are a number of places where we could use less-specific interface types than we use currently. Scenarios to evaluate on parameters and return types:

  • Collections instead of Sets and Lists
    • Set and List should only be used when the type of collection matters
    • Particular attention should be made to using the right underlying implementation:
      • HashSets when duplicates are forbidden but order is not significant
      • LinkedHashSets when duplicates are forbidden but order is significant
      • ArrayLists whenever duplicates are allowed
  • CharSequences instead of Strings

Of course, this should only be done when it makes sense.

@frizbog frizbog added this to the Release 3 milestone Sep 27, 2015
@frizbog
Copy link
Owner Author

frizbog commented Jul 6, 2016

I've decided to continue return Strings as return types on the API instead of CharSequences. People generally expect to receive a String back, and expect it to be immutable and have defined behavior for equals() and hashCode(), none of which are guaranteed for CharSequences.

I may still accept CharSequences as parameters into the API from calling code to allow more flexibility for callers.

@frizbog
Copy link
Owner Author

frizbog commented Jul 6, 2016

Upon further experimentation and consideration, I will not be switching to HashSets in the object model to avoid duplicates.

The lightest Set implementation that ships with Java is a HashSet, and it has an overhead of 32 bytes per entry compared to ArrayList's 4 bytes. When I'm working this hard to reduce the memory footprint (especially for large files), I'm not inclined to add that much memory overhead just to keep users from adding an item to a collection more than once.

An argument could also be made that if someone is reading a GEDCOM file that has duplicated entries in it, that duplication should be preserved in the object model. There's nothing in the GEDCOM spec that specifically forbids it or considers it an error, although I don't know anyone who is arguing that support for duplication is actually desirable.

I am opening another issue, though, for a compromise: The validators could look for duplicates, and if autorepair is enabled in the validators, the duplicates could be automatically removed.

For release 3.0.0, though, Collections are staying Lists backed by ArrayLists, and Maps are remaining HashMaps.

@frizbog
Copy link
Owner Author

frizbog commented Jul 8, 2016

Decided against using CharSequences anywhere. In order to safely work with the data from a CharSequence, I'd need to make defensive String copies anyway, which further bloats things.

@frizbog frizbog closed this as completed Jul 8, 2016
@frizbog frizbog added the wontfix label Jul 8, 2016
@frizbog frizbog removed this from the Release 3 milestone Jul 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant