Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return mutable collection from sample() #38

Conversation

dssysolyatin
Copy link

@dssysolyatin dssysolyatin commented Jun 17, 2021

Hi!
I would like to use your library, but it has a very strong restriction that sample method returns an unmodifiable collection.

In my case, I want to sort sample results in order to build equi-depth histogram. But in order to do it, I should copy the unmodifiable collection and sort it. It requires extra space.

In this PR I removed unmodifiableSample. But maybe it would be better to allow a user to extend your library: remove final and replace private with protected in AbstractRandomSampling class. What do you think?

@gstamatelat
Copy link
Owner

Hello!

I'm not sure I like the first suggestion (making the sample() method return a mutable collection), mostly for two reasons:

  1. Allowing the caller to manipulate the internal reservoir would be dangerous as it could make the instance useless. The algorithm does not expect the reservoir to change between consecutive feeds.
  2. There are certain implementations that do not have a single concrete reservoir. See for example ChaoSampling where the reservoir is actually a concatenation of two data structures. In these cases it would be difficult to return a mutable collection without copying the underlying data structures. Making some implementations return read-only collections and others return mutable collections would be inconsistent.

I think one option would be to implement another method, for example RandomSampling.sampleMutable() or even RandomSampling.close(), that would return a mutable collection and nullify the instance, for example making the feed methods throw exception thereafter. In the case where this is not possible, a copy would be returned.

Regarding your second suggestion, can you explain more what you are proposing? Changing the AbstractRandomSampling.sample field to protected? Would that allow you to extend the implementation you are interested in like the following example?

package com.example;

import gr.james.sampling.LiLSampling;

import java.util.List;
import java.util.Random;

public class LiLSamplingUnsafe extends LiLSampling<Integer> {
    public LiLSamplingUnsafe(int sampleSize, Random random) {
        super(sampleSize, random);
    }

    public LiLSamplingUnsafe(int sampleSize) {
        super(sampleSize);
    }

    public List<Integer> sampleUnsafe() {
        return this.sample;
    }
}

@dssysolyatin
Copy link
Author

dssysolyatin commented Jun 18, 2021

My second proposal was about changing field modifiers to protected and remove final from methods.

For example:

public abstract class AbstractRandomSampling<T> implements RandomSampling<T> {
    protected final int sampleSize;
    protected final Random random;
    protected final List<T> sample;
    protected long streamSize;
    protected long skip;
    
    public Collection<T> sample() {
        return this.unmodifiableSample
    }
    

It allows to override sample method in subclass:

    public class VitterZSamplingUnsafe extends VitterZSampling<Integer> {
        public Collection<T> sample() {
           return this.sample
        }
    }

@gstamatelat
Copy link
Owner

Yes, I think your second proposal might be possible.

@gstamatelat
Copy link
Owner

@dssysolyatin Do you think that bea5422 addresses your issue?

@dssysolyatin
Copy link
Author

@gstamatelat Yes, Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants