New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add list_distinct command for more effective library lookups. #1022

Merged
merged 4 commits into from Mar 2, 2015

Conversation

4 participants
@adamcik
Member

adamcik commented Mar 1, 2015

Should resolve part of the problem with #913 as this gives backends a way to support this other than doing our silly search for everything workaround.

@adamcik adamcik force-pushed the adamcik:fix/913 branch from 4499d8a to 5fd2afa Mar 1, 2015

@jodal

This comment has been minimized.

Member

jodal commented Mar 1, 2015

If you ignore the fact that this is used by list* commands in MPD, is there any other reason to name it list_distinct() instead of get_distinct()? We probably have several methods named get_*() that returns lists today.

@adamcik

This comment has been minimized.

Member

adamcik commented Mar 1, 2015

Note that I just rebased a minute or two after opening the PR, please be aware of this so you comment the right commits.

@jodal jodal added this to the v0.20 - Audio cleanup 1 milestone Mar 1, 2015

@jodal jodal added the 2 - Working label Mar 1, 2015

@jodal jodal self-assigned this Mar 1, 2015

@adamcik

This comment has been minimized.

Member

adamcik commented Mar 1, 2015

I was almost tempted to just call it distinct as we don't have get_* anywhere else in the module.

@adamcik

This comment has been minimized.

Owner

adamcik commented on mopidy/local/json.py in ba8fc51 Mar 1, 2015

I was also tempted to make the field names album.artist.name and do some getter magic. But having them fixed, and matching the query field names is probably more sane in the long run.

@tkem

This comment has been minimized.

Member

tkem commented Mar 2, 2015

Just an idea: Would it be possible for list_distinct to return an iterable instead of a set, for future improvements? I know, Mopidy core today uses mostly lists/tuples internally, so this would end up being converted to a list or set higher up in the call chain anyway for now. But list_distinct may return lots of results, so delaying set creation might still have some benefits.

@adamcik

This comment has been minimized.

Member

adamcik commented Mar 2, 2015

For the backend.list_distinct part sure that sounds like a reasonable idea I think. We would end up with a future per backend with a generator and then core would need to consume them all as it needs to lump them all together. We could postpone it further by making the core API repeat the exercise creating a new generator that filters out dupes, or for that matter just providing a result per backend an making it the frontends problem. Possibly even to the point of just giving the frontends the backend futures.

And as for frontends the MPD one would need to consume the generator at some point no matter what, and as of the current implementation the same goes for the WS as we sadly haven't added any streaming results support.

With this idea in mind we should probably also look at the proposed v1.0 Core API spec: http://goo.gl/TOzY8L as I suspect there are more places that could benefit from generators.

@jodal

This comment has been minimized.

Member

jodal commented Mar 2, 2015

I was almost tempted to just call it distinct as we don't have get_* anywhere else in the module.

But all the other words are verbs. "distinct" is a noun, thus a lousy method name.

@tkem

This comment has been minimized.

Member

tkem commented Mar 2, 2015

@adamcik: Sounds good - that's exactly what I had in mind ;-)

From a backend perspective, this would probably let Mopidy-Local-SQLite return the result cursor from a SELECT DISTINCT..., which already is a generator AFAIK, possibly wrapped using itertools.imap to return strings instead of Row objects. Might prove more efficient than slurping all results into memory up-front.

@tkem

This comment has been minimized.

Member

tkem commented Mar 2, 2015

Didn't think about combining distinct results from multiple backends, though; to filter out duplicates using a generator, I think you'd have to impose an ordering (i.e. results must be sorted alphabetically). which may or may not be a problem for backends. Providing one result per backend, on the other hand, would somewhat resemble search results, where clients already usually do some combining/sorting, so this might be okay. Note that search results in theory could also contain duplicates (i.e. multiple backends returning the same http stream), but I don't think clients handle that, yet.

@ZenithDK

This comment has been minimized.

Contributor

ZenithDK commented Mar 2, 2015

I think ordering the results would be a problem as I seem to recall having read somewhere that the results from Spotify are ordered by popularity?

@jodal

This comment has been minimized.

Member

jodal commented Mar 2, 2015

I've thought a bit more about Pykka vs generators:

  • Pykka doesn't care at all about generators, so it will work to pass them from one actor to another.
  • A generator produces results when needed.

Consider this scenario:

  1. Actor A returns a generator to actor B, closing over/using variables in actor A.
  2. Actor B consumes the generator, causing the generator code to run in actor B, using state from actor A.

This may be recipe for disaster.

@tkem

This comment has been minimized.

Member

tkem commented Mar 2, 2015

Well, as I said, this was just a spontaneous idea I had early this morning, and I didn't think it through. So I didn't mean to block this PR, since the return type can always be "relaxed" from a set to an interable later without backend changes, if this is indeed feasible.

@adamcik

This comment has been minimized.

Member

adamcik commented Mar 2, 2015

So I guess we just keep returning sets for now to postpone potential gotchas with pykka + generators. That means the two remaining questions are:

  1. What do we call the method?
  2. Should be return a single set, or one per backend like search does?
@jodal

This comment has been minimized.

Member

jodal commented Mar 2, 2015

  1. get_distinct()! library.get_distinct('artist') reads well.
  2. A single set makes sense, I think. Otherwise the frontends will need to merge the sets to get rid of duplicates. There's no URIs or anything mapping e.g. the artist to the backend anyway.
@adamcik

This comment has been minimized.

Member

adamcik commented Mar 2, 2015

Ok, renamed now.

jodal added a commit that referenced this pull request Mar 2, 2015

Merge pull request #1022 from adamcik/fix/913
Add list_distinct command for more effective library lookups.

@jodal jodal merged commit 2179bf0 into mopidy:develop Mar 2, 2015

2 checks passed

Scrutinizer 3 updated code elements
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jodal jodal added 3 - Done and removed 2 - Working labels Mar 2, 2015

@adamcik adamcik deleted the adamcik:fix/913 branch Mar 2, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment