Refactor SpotifyId#459
- perf:
  * base62 encoding is an order of magnitude faster (~20x);
  * base16/62 enc/dec and from_uri are several times faster (~2-20x);
  * Let FileId::to_base16() reuse the hex encoder (~20x);
- changes:
  * Add to_uri() method;
  * Make from_uri() error handling consistent;
  * Move audio type from string matching to a SpotifyAudioType factory (private);
  * Implement From/Into<&str> for SpotifyAudioType;
  * Add representation sizes as associated constants (private);
- cs/docs:
  * Add rudimentary docs for most public funcs;
  * Add trivial test cases for the codecs;
willstott101
left a comment
The actual changes LGTM. I think it's good to have someone working with librespot for non-audio tasks as it can only help make the whole thing more reliable, with fewer assumptions. Thanks for going into such depth.
There are just a couple of comments I've made inline.
W.R.T. your suggestions on where to go next: I think it all sounds reasonable, and inspired by comments made by @ashthespy which can only be good ;)
If you're happy to get going, go for it.
| } | ||
| } | ||
|
|
||
| #[inline] |
It's my understanding that #[inline] doesn't do anything unless the function is used across crates without LTO (which it can't be since it's not pub). Is there any point of it here?
| } | ||
| } | ||
|
|
||
| #[cfg(test)] |
I think the added complexity is fine since you've taken the time to add tests.
But could you add running the tests to the travis config?
And if it's not a huge PITA, it would be good to see a commit with the tests passing for the old implementation to prove correctness - at least for the methods that already existed.
@alcore Sorry I seem to have missed this! Thanks for the effort, and appreciate the benchmarking!
Just triggering a rebuild. The API changes look fine to me as well. @alcore are you planning on refactoring
Any reason why this doesn't get merged?
| let mut id = match SpotifyId::from_base62(&src[id_i..]) { | ||
| Ok(v) => v, | ||
| Err(e) => return Err(e), | ||
| }; |
This is exactly what the question mark operator does.
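To make the reviewer's point concrete, here is a minimal sketch comparing the explicit `match` with the `?` operator. `ParseError` and `parse_number` are hypothetical stand-ins invented for this illustration (they are not librespot's `SpotifyIdError` or `from_base62()`); only the control flow is the point.

```rust
// Hypothetical error type, standing in for something like SpotifyIdError.
#[derive(Debug, PartialEq)]
struct ParseError;

// Hypothetical fallible function, used only so there is something to call.
fn parse_number(src: &str) -> Result<u32, ParseError> {
    src.parse::<u32>().map_err(|_| ParseError)
}

// Explicit match that unwraps the value or returns the error unchanged.
fn double_with_match(src: &str) -> Result<u32, ParseError> {
    let v = match parse_number(src) {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(v * 2)
}

// The same control flow expressed with `?`.
fn double_with_question_mark(src: &str) -> Result<u32, ParseError> {
    let v = parse_number(src)?;
    Ok(v * 2)
}

fn main() {
    // Both variants agree on success and on failure.
    assert_eq!(double_with_match("21"), double_with_question_mark("21"));
    assert_eq!(double_with_match("x"), double_with_question_mark("x"));
}
```

Strictly speaking, `?` also passes the error through `From::from`, which is a no-op when the error types already match, as they do here.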
| if src.len() != SpotifyId::SIZE { | ||
| return Err(SpotifyIdError); | ||
| }; | ||
|
|
||
| let mut arr: [u8; 16] = Default::default(); | ||
| arr.copy_from_slice(&data[0..16]); | ||
| let mut dst = [0u8; SpotifyId::SIZE]; | ||
| dst.copy_from_slice(src); |
You could use try_into to convert a slice into a fixed-size array.
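A rough sketch of that suggestion, assuming a 16-byte representation: the standard library's slice-to-array `TryFrom` impl performs the length check and the copy in one step. `SIZE`, `SizeError` and `to_array` are names made up for this example, standing in for `SpotifyId::SIZE`, `SpotifyIdError` and the method under review.

```rust
// Needed on editions before 2021; redundant but harmless on later ones.
use std::convert::TryInto;

// Stand-in for SpotifyId::SIZE (assumed to be 16 here).
const SIZE: usize = 16;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct SizeError;

// Converts a slice into a fixed-size array, failing if the length differs.
// This replaces the manual length check followed by copy_from_slice.
fn to_array(src: &[u8]) -> Result<[u8; SIZE], SizeError> {
    src.try_into().map_err(|_| SizeError)
}

fn main() {
    let bytes = [7u8; 16];
    assert_eq!(to_array(&bytes), Ok([7u8; 16]));
    // A slice of the wrong length is rejected instead of panicking.
    assert_eq!(to_array(&bytes[..15]), Err(SizeError));
}
```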
Wasn't merged because no response received from OP, and likely requires a review to account for potential conflicts/side effects as described in the original comment. Also API change is breaking, though this is less of an issue.
I don't think the change is breaking, the function signatures are the same. It's a pity that this isn't merged, the code seems well-written and well-documented... If @alcore isn't interested anymore, I will perhaps create a new pr based on this one (if it's ok).
@Johannesd3 feel free. I'm happy to merge a new PR based off this one, but it needs to be updated to work with changes to dev since it was originally submitted I believe.
Closing in favour of #587 |
This PR includes a refactoring of `SpotifyId`.
In my particular use case I am dealing with a relatively large amount of track IDs (GBs) as playback history. Those need to be base62 encoded onto the wire in batches of up to 100, and at my QPS the current encoding algo was prohibitively expensive, as it performs 128-bit division + modulo, which compiles down to an unoptimized runtime call.
It now instead uses 64-bit arithmetic (which could relatively trivially be optimized for 32-bit arches behind a build condition) with an algo in use in the cryptocurrency ecosystem for their base58 encoding, further optimized for our case.
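To illustrate how 128-bit division can be avoided, here is a minimal sketch that is not the algorithm in this PR (the PR uses a base58-style approach) but plain schoolbook long division over four 32-bit limbs, so every division fits in a `u64`. The function names, the 22-character zero-padded output and the `0-9a-zA-Z` alphabet are assumptions made for the example.

```rust
// Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
const BASE62_DIGITS: &[u8; 62] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Reference version: one u128 division and modulo per output digit.
fn to_base62_u128(mut id: u128) -> String {
    let mut out = [b'0'; 22];
    for slot in out.iter_mut().rev() {
        *slot = BASE62_DIGITS[(id % 62) as usize];
        id /= 62;
    }
    String::from_utf8(out.to_vec()).expect("alphabet is ASCII")
}

// Sketch: the same result using only 64-bit arithmetic.
fn to_base62_limbs(id: u128) -> String {
    // Split the value into four 32-bit limbs, most significant first.
    let mut limbs = [
        (id >> 96) as u32,
        (id >> 64) as u32,
        (id >> 32) as u32,
        id as u32,
    ];
    let mut out = [b'0'; 22];
    // Each pass divides the whole number by 62 and yields one digit,
    // starting with the least significant one.
    for slot in out.iter_mut().rev() {
        let mut rem: u64 = 0;
        for limb in limbs.iter_mut() {
            // rem < 62, so acc < 62 * 2^32 and the quotient fits in u32.
            let acc = (rem << 32) | u64::from(*limb);
            *limb = (acc / 62) as u32;
            rem = acc % 62;
        }
        *slot = BASE62_DIGITS[rem as usize];
    }
    String::from_utf8(out.to_vec()).expect("alphabet is ASCII")
}

fn main() {
    for &id in &[0u128, 1, 61, 62, u128::from(u64::MAX), u128::MAX] {
        assert_eq!(to_base62_limbs(id), to_base62_u128(id));
    }
}
```

This variant still performs 88 small divisions per ID, so it only shows the principle of staying within native word sizes; it makes no claim about matching the speedups reported in this PR.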
On `rustc 1.44.0-nightly (b543afca9 2020-04-05)`, Windows 10, i7 4770k @ 4.4GHz (windows/x86-64) the result is as follows:
I then went ahead to pick some further low-hanging fruit in the other methods, without resorting to LUTs or SIMD intrinsics. I.e. the change to the base62 encoding algo is the only non-trivial piece of code contained.
`from_uri()` had an overhead of ~400ns/op on top of its `from_base62()` call, which is now reduced to 7ns, with some trivial inline parsing based on our knowledge of the inputs. The `FileId` gain was primarily due to it needlessly allocating 20 Strings - one for each byte.
I included simple tests and rudimentary docs, which felt less welcome than the code - given there are none of either in the entire codebase ;-)
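The per-byte allocation issue mentioned for `FileId` can be sketched as follows. The previous implementation is not shown in this thread, so `hex_per_byte` is only an assumed shape of "one String per byte", and `hex_single_alloc` is an illustrative alternative rather than the encoder added by this PR. Both function names are hypothetical.

```rust
// Assumed shape of the slow path: every `format!` call allocates its own
// String, so a 20-byte id produces 20 temporary Strings plus the result.
fn hex_per_byte(data: &[u8]) -> String {
    data.iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<String>>()
        .concat()
}

// Illustrative alternative: reserve the output once and push two hex
// digits per byte, computed arithmetically rather than via a lookup table.
fn hex_single_alloc(data: &[u8]) -> String {
    let mut out = String::with_capacity(data.len() * 2);
    for &b in data {
        let hi = u32::from(b >> 4);
        let lo = u32::from(b & 0x0f);
        // from_digit returns lowercase digits for radix 16.
        out.push(std::char::from_digit(hi, 16).expect("nibble is below 16"));
        out.push(std::char::from_digit(lo, 16).expect("nibble is below 16"));
    }
    out
}

fn main() {
    let id = [0x00u8, 0x1f, 0xab, 0xff];
    assert_eq!(hex_single_alloc(&id), hex_per_byte(&id));
    assert_eq!(hex_single_alloc(&id), "001fabff");
}
```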
Rationale
I realize librespot itself in a player scenario barely uses those codepaths and that the PR introduces a bit of additional complexity. I would still like to be able to use this as a library and reuse the types in my project, given they do what I want -- just not efficiently enough.
Additional
- [...] `to_uri()` method. However, it moves part of the logic to the `SpotifyAudioType` enum (and is 100ns faster, albeit less idiomatic) where I feel it should belong;
- [...] `SpotifyObject` struct to encapsulate the ID and object type instead. This would involve:
  - [...] `SpotifyAudioType` for a `SpotifyObjectType` enum and filling it with other known types (including non-playable ones), keeping it unitary (with a placeholder `Unknown` tag for unrecognized types);
  - [...] `SpotifyId` to `SpotifyObject`. For consistency I'd love to rename ID to `SpotifyObjectId` in that case as well;
  - [...] `SpotifyContext` to encapsulate a `SpotifyObjectType` in a given context (including the URI);
  - [...] `SpotifyId` should convert to/from a URI or the base62 encoding. That is, if bumping MSRV is a go, as those need to be TryInto/TryFrom;
A 'go' for this would be welcome as I'm eager to get coding.