
feat: .GetAllProviders func #493

Closed · wants to merge 1 commit into from
Conversation

daviddias
Member

@daviddias daviddias commented Mar 13, 2020

NOTE: Do not merge this PR. The implementation is not ideal. See below:

This PR adds a .GetAllProviders func and a respective test. The way the feature is implemented sits somewhere on the spectrum between "high school science project" and "academic code". I'm not proud of it, but I'm unsure what the most elegant way to do it is without requiring a huge refactor. Right now the constraints are:

  • There is no list of all providers available, other than the keys in the datastore.
  • Each datastore key is the concatenation of ProvidersPrefix + ProviderKey + PeerID providing it.
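The key layout in the second bullet can be sketched as follows. This is an illustrative stand-in, not the actual go-libp2p-kad-dht code: the prefix string, the helper names, and the use of "/" as a separator are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const providersPrefix = "/providers/" // assumed prefix, for illustration only

// providerRecordKey builds a record key as described above:
// ProvidersPrefix + ProviderKey + PeerID.
func providerRecordKey(providerKey, peerID string) string {
	return providersPrefix + providerKey + "/" + peerID
}

// splitProviderRecordKey recovers the provider key and peer ID from a
// record key. It assumes neither component contains the separator.
func splitProviderRecordKey(k string) (string, string) {
	rest := strings.TrimPrefix(k, providersPrefix)
	i := strings.LastIndex(rest, "/")
	return rest[:i], rest[i+1:]
}

func main() {
	k := providerRecordKey("somekey", "somepeer")
	fmt.Println(k)
	pk, pid := splitProviderRecordKey(k)
	fmt.Println(pk, pid)
}
```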

Why not store the PeerIDs as the values for the ProvidersPrefix + ProviderKey? I found it to be an odd use of a datastore.

Is there a better way? Please advise!

@Stebalien
Member

Stebalien commented Mar 13, 2020

Recording an out of band discussion:

This should skip over the provider reactor entirely, and just walk through all provider records in the datastore, skipping (but not deleting) expired records.

@daviddias daviddias removed the request for review from alanshaw Mar 13, 2020
@daviddias
Member Author

daviddias commented Mar 16, 2020

@Stebalien got the thing we discussed implemented. It passes the tests in isolation, but when running with the race flag, the race detector does find one: the query for all provider records can happen while writes are being made.

I believe I can do one of two things:

  • Upgrade the reactor to receive requests for all providers, move the logic there, and have the func issue that request.
  • Add a mutex to the datastore that locks on each cycle of the for loop, releasing the lock at each epoch.

I know you asked not to touch the reactor (understandably), but adding a mutex looks like an ugly hack. Any advice on how to do this better?
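The first option above can be sketched as a request sent into the manager's single event loop, which serializes the read with all writes and so removes the race without any mutex. All names and types here are hypothetical stand-ins (a map instead of the datastore, string instead of peer.ID):

```go
package main

import "fmt"

// addReq and getAllReq model write and "get all" requests into the loop.
type addReq struct{ key, peer string }

type getAllReq struct{ resp chan map[string][]string }

// run is the event loop: because it alone touches the providers map,
// reads and writes can never race.
func run(adds <-chan addReq, getAll <-chan getAllReq, done <-chan struct{}) {
	providers := make(map[string][]string)
	for {
		select {
		case a := <-adds:
			providers[a.key] = append(providers[a.key], a.peer)
		case g := <-getAll:
			// Copy so the caller never observes later mutations.
			snapshot := make(map[string][]string, len(providers))
			for k, v := range providers {
				snapshot[k] = append([]string(nil), v...)
			}
			g.resp <- snapshot
		case <-done:
			return
		}
	}
}

func main() {
	adds := make(chan addReq)
	getAll := make(chan getAllReq)
	done := make(chan struct{})
	go run(adds, getAll, done)

	adds <- addReq{"k1", "p1"}
	adds <- addReq{"k1", "p2"}

	req := getAllReq{resp: make(chan map[string][]string)}
	getAll <- req
	fmt.Println(len((<-req.resp)["k1"])) // prints 2
	close(done)
}
```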

@@ -250,6 +250,42 @@ func (pm *ProviderManager) getProvidersForKey(k []byte) ([]peer.ID, error) {
return pset.providers, nil
}

// GetAllProviders returns a map of all providers, where the key is the
// base32-encoded provider key and the value is a slice of peer IDs that we
// know to be providing the key
func (pm *ProviderManager) GetAllProviders() (map[string][]peer.ID, error) {
Member

@alanshaw alanshaw Mar 16, 2020


What if this was a GetAllProviderKeys that returned a chan string? You could keep track of seen keys, and the caller code could receive a provider key and call GetProviders for each.
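The suggestion above might look like this: stream only the distinct provider keys, deduplicating with a "seen" set, and leave the per-key lookup to the caller. The record-key layout (/providers/<key>/<peer>) and the function name are assumptions for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// allProviderKeys streams each distinct provider key exactly once.
// Record keys are assumed to look like "/providers/<key>/<peer>".
func allProviderKeys(recordKeys []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		seen := make(map[string]bool)
		for _, rk := range recordKeys {
			rest := strings.TrimPrefix(rk, "/providers/")
			key := rest[:strings.LastIndex(rest, "/")]
			if !seen[key] {
				seen[key] = true
				out <- key
			}
		}
	}()
	return out
}

func main() {
	ks := []string{"/providers/k1/p1", "/providers/k1/p2", "/providers/k2/p1"}
	for k := range allProviderKeys(ks) {
		fmt.Println(k) // the caller would call GetProviders(k) here
	}
}
```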

Member Author

@daviddias daviddias Mar 16, 2020


That would yield a database read for each provider key for a GetAllProviders vs. one single read for all of them (unless there is some LevelDB implementation detail that I'm missing).

Member

@Stebalien Stebalien Mar 16, 2020


This should stream results back. Either key/provider tuples (struct {key string, provider peer.ID}) or a struct with all providers for a given key struct { key string, providers []peer.ID }.
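The two shapes proposed above, written out as plain Go types with a minimal streaming sketch (string stands in for peer.ID here; everything is illustrative):

```go
package main

import "fmt"

// keyProvider is the tuple shape: one record per (key, provider) pair.
type keyProvider struct {
	key      string
	provider string
}

// keyProviders is the grouped shape: all providers for one key.
type keyProviders struct {
	key       string
	providers []string
}

// streamTuples streams one key/provider tuple per record instead of
// returning one big map, so the caller controls memory use.
func streamTuples(recs []keyProvider) <-chan keyProvider {
	out := make(chan keyProvider)
	go func() {
		defer close(out)
		for _, r := range recs {
			out <- r
		}
	}()
	return out
}

func main() {
	recs := []keyProvider{{"k1", "p1"}, {"k1", "p2"}, {"k2", "p3"}}
	for t := range streamTuples(recs) {
		fmt.Println(t.key, t.provider)
	}
}
```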

Member

@alanshaw alanshaw Mar 17, 2020


That would yield a database read for each provider for a GetAllProviders vs. one single read for all of them (unless some levelDB implementation detail that I'm missing)

True, that was short-sighted. I was trying to think of a solution that avoids buffering the entire datastore in memory. Unless pre-sorted, struct {key string, provider peer.ID} would just mean the caller ends up buffering the whole channel in order to group all the providers per key.

struct { key string, providers []peer.ID } is really what we want. Is it possible without buffering everything in memory for the sort?

Member

@Stebalien Stebalien Mar 17, 2020


You can ask for the datastore query to be sorted. On most datastores, that will be free.
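Given sorted input, the grouped shape needs to buffer only one group at a time: emit the accumulated group whenever the key changes. A sketch, with strings standing in for peer.ID and a slice standing in for the sorted datastore query results:

```go
package main

import "fmt"

// entry stands in for one sorted datastore query result.
type entry struct{ key, peer string }

// keyProviders is the grouped result shape.
type keyProviders struct {
	key       string
	providers []string
}

// groupSorted assumes its input arrives sorted by key, as a datastore
// query asked to sort would deliver it. It buffers only the current
// group, emitting it each time the key changes, so the whole datastore
// is never held in memory at once.
func groupSorted(sorted []entry) <-chan keyProviders {
	out := make(chan keyProviders)
	go func() {
		defer close(out)
		var cur keyProviders
		for _, e := range sorted {
			if cur.key != "" && e.key != cur.key {
				out <- cur
				cur = keyProviders{}
			}
			cur.key = e.key
			cur.providers = append(cur.providers, e.peer)
		}
		if cur.key != "" {
			out <- cur // flush the final group
		}
	}()
	return out
}

func main() {
	sorted := []entry{{"k1", "p1"}, {"k1", "p2"}, {"k2", "p3"}}
	for g := range groupSorted(sorted) {
		fmt.Println(g.key, g.providers)
	}
}
```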

@@ -250,6 +250,42 @@ func (pm *ProviderManager) getProvidersForKey(k []byte) ([]peer.ID, error) {
return pset.providers, nil
}

// GetAllProviders returns a map of all providers, where the key is the
// base32-encoded provider key and the value is a slice of peer IDs that we
// know to be providing the key
func (pm *ProviderManager) GetAllProviders() (map[string][]peer.ID, error) {
Member

@Stebalien Stebalien Mar 16, 2020


This should stream results back. Either key/provider tuples (struct {key string, provider peer.ID}) or a struct with all providers for a given key struct { key string, providers []peer.ID }.

@@ -250,6 +250,42 @@ func (pm *ProviderManager) getProvidersForKey(k []byte) ([]peer.ID, error) {
return pset.providers, nil
}

// GetAllProviders returns a map of all providers, where the key is the
// base32-encoded provider key and the value is a slice of peer IDs that we
// know to be providing the key
func (pm *ProviderManager) GetAllProviders() (map[string][]peer.ID, error) {
Member

@Stebalien Stebalien Mar 16, 2020


This should also take a context.Context for cancelation.

@Stebalien
Member

Stebalien commented Mar 16, 2020

it does detect one and that is because the query for all provider records can happen while writes are being made.

That shouldn't lead to a race condition. Can you post the output of the race detector?

@daviddias
Member Author

daviddias commented Mar 17, 2020

@Stebalien
Member

Stebalien commented Mar 17, 2020

Ah, it's because the provider manager auto-batches writes (and the autobatching datastore is not thread safe). I'd add a baseds field to the ProviderManager and store the non-batching datastore there (see NewProviderManager).
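The shape of that fix can be sketched as keeping two handles: the batching wrapper, touched only from the manager's single event loop (so it needs no locking), and the underlying thread-safe base store, which the query path reads. All types here are simplified stand-ins, not the real go-datastore interfaces:

```go
package main

import (
	"fmt"
	"sync"
)

// baseStore is a thread-safe stand-in for the underlying datastore.
type baseStore struct {
	mu sync.Mutex
	m  map[string]string
}

func (b *baseStore) put(k, v string) { b.mu.Lock(); defer b.mu.Unlock(); b.m[k] = v }

func (b *baseStore) keys() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ks := make([]string, 0, len(b.m))
	for k := range b.m {
		ks = append(ks, k)
	}
	return ks
}

// batchingStore buffers writes and is NOT safe for concurrent use; it
// must only ever be touched from the manager's event loop.
type batchingStore struct {
	base    *baseStore
	pending map[string]string
}

func (s *batchingStore) put(k, v string) {
	s.pending[k] = v
	if len(s.pending) >= 2 { // tiny batch size, just for the demo
		for pk, pv := range s.pending {
			s.base.put(pk, pv)
		}
		s.pending = map[string]string{}
	}
}

// providerManager keeps both handles, mirroring the suggested baseds field.
type providerManager struct {
	dstore *batchingStore // written from the event loop only
	baseds *baseStore     // safe to query concurrently
}

func main() {
	base := &baseStore{m: map[string]string{}}
	pm := providerManager{
		dstore: &batchingStore{base: base, pending: map[string]string{}},
		baseds: base,
	}
	pm.dstore.put("k1", "p1")
	pm.dstore.put("k2", "p2") // flushes the batch
	fmt.Println(len(pm.baseds.keys()))
}
```

Note that a query against baseds will miss writes still sitting in the batch; the real code would want to flush before querying.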

@jacobheun

jacobheun commented Nov 20, 2020

Closing due to inactivity.

@jacobheun jacobheun closed this Nov 20, 2020
Labels: none · Projects: none · Linked issues: none · 4 participants