feat: .GetAllProviders func #493
Conversation
Recording an out-of-band discussion: this should skip the provider reactor entirely and just walk all provider records in the datastore, skipping (but not deleting) expired records.
@Stebalien got the thing we discussed implemented. It passes the tests in isolation, but when running with the race flag the detector does find one, because the query for all provider records can happen while writes are being made. I believe I can do two things:
I know you asked not to touch the reactor (understandably), but adding a mutex looks like an ugly hack. Any advice on how to do this better?
```diff
@@ -250,6 +250,42 @@ func (pm *ProviderManager) getProvidersForKey(k []byte) ([]peer.ID, error) {
 	return pset.providers, nil
 }
 
+// GetAllProviders returns map of all providers where key is the provide key base32
+// encoded and the value is a slice of peerIDs that we know to be providing the key
+func (pm *ProviderManager) GetAllProviders() (map[string][]peer.ID, error) {
```
What if this was GetAllProviderKeys that returned a chan string? You could keep track of seen keys and the caller code could receive a provider key and call GetProviders for each?
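A rough sketch of that shape, with a plain `[]string` standing in for the datastore iteration and every name here illustrative rather than the real `ProviderManager` API:

```go
package main

import "fmt"

// getAllProviderKeys streams each distinct provider key over a channel,
// tracking seen keys so a key with many provider records is emitted once.
// The caller would then call GetProviders for each key it receives.
func getAllProviderKeys(keys []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		seen := make(map[string]bool)
		for _, k := range keys {
			if seen[k] {
				continue // duplicate record for a key we already sent
			}
			seen[k] = true
			out <- k
		}
	}()
	return out
}

func main() {
	for k := range getAllProviderKeys([]string{"k1", "k1", "k2"}) {
		fmt.Println(k) // caller would call GetProviders(k) here
	}
}
```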
That would yield a database read for each provider key for a GetAllProviders vs. one single read for all of them (unless there's some LevelDB implementation detail that I'm missing).
This should stream results back: either key/provider tuples (`struct { key string; provider peer.ID }`) or a struct with all providers for a given key (`struct { key string; providers []peer.ID }`).
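A sketch of the second shape, using a channel so the caller never holds the whole result set (plain strings in place of `peer.ID`; field and function names are illustrative):

```go
package main

import "fmt"

// KeyProviders pairs one provider key with every peer known to provide it.
type KeyProviders struct {
	Key       string
	Providers []string
}

// streamProviders sends one KeyProviders per key instead of returning a
// single large map, so results are consumed incrementally.
func streamProviders(byKey map[string][]string) <-chan KeyProviders {
	out := make(chan KeyProviders)
	go func() {
		defer close(out)
		for k, ps := range byKey {
			out <- KeyProviders{Key: k, Providers: ps}
		}
	}()
	return out
}

func main() {
	for kp := range streamProviders(map[string][]string{"k1": {"peerA", "peerB"}}) {
		fmt.Println(kp.Key, kp.Providers)
	}
}
```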
> That would yield a database read for each provider for a GetAllProviders vs. one single read for all of them (unless some levelDB implementation detail that I'm missing)

True, that was short-sighted. I was trying to think of a solution that avoids buffering the entire datastore in memory. Unless pre-sorted, `struct { key string; provider peer.ID }` would just mean the caller ends up buffering the whole channel in order to group all the providers per key.
`struct { key string; providers []peer.ID }` is really what we want. Is it possible without buffering everything in memory for the sort?
You can ask for the datastore query to be sorted. On most datastores, that will be free.
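Given key-sorted results, grouping needs to buffer only the current key's providers, never the full result set. A sketch with illustrative types (a channel of tuples stands in for the sorted datastore query iterator):

```go
package main

import "fmt"

// entry is one (key, provider) tuple, as a key-sorted datastore query
// would yield them: one row per provider record, same keys adjacent.
type entry struct{ key, provider string }

// group is one key with all of its providers.
type group struct {
	key       string
	providers []string
}

// groupSorted relies on the input being sorted by key: it buffers only the
// current group and emits it as soon as the key changes.
func groupSorted(in <-chan entry) <-chan group {
	out := make(chan group)
	go func() {
		defer close(out)
		var cur group
		for e := range in {
			if e.key != cur.key && len(cur.providers) > 0 {
				out <- cur // key changed: flush the finished group
				cur = group{}
			}
			cur.key = e.key
			cur.providers = append(cur.providers, e.provider)
		}
		if len(cur.providers) > 0 {
			out <- cur // flush the final group
		}
	}()
	return out
}

func main() {
	in := make(chan entry, 3)
	in <- entry{"k1", "peerA"}
	in <- entry{"k1", "peerB"}
	in <- entry{"k2", "peerC"}
	close(in)
	for g := range groupSorted(in) {
		fmt.Println(g.key, g.providers)
	}
}
```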
This should also take a `context.Context` for cancellation.
That shouldn't lead to a race condition. Can you post the output of the race detector?
Ah, it's because the provider manager auto-batches writes (and the autobatching datastore is not thread-safe). I'd add a …
Closing due to inactivity.
NOTE: Do not merge this PR. The implementation is not ideal. See below:
This PR adds a .GetAllProviders func and a respective test. The way the feature is implemented sits somewhere on the spectrum between "high school science project" and "academic code". I'm not proud of it, but I'm unsure what the most elegant way to do it is without requiring a huge refactor. Right now the constraint is that:
Why not store the peer IDs as the values for ProvidersPrefix + ProviderKey? I found the current use of the datastore odd.
Is there a better way? Please advise!