Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for streaming #336

Merged
merged 15 commits into from
Dec 30, 2020
Merged

Add support for streaming #336

merged 15 commits into from
Dec 30, 2020

Conversation

kvanzuijlen
Copy link
Contributor

Changes proposed in this pull request:

  • add support for streaming from batches to reduce memory
  • some code improvements

I tested both implementations by retrieving my recent tracks since January 1st of this year. With stream=False the memory usage was 150mb and rising while with stream=True (which I made the default) the memory usage stayed between 28 and 35mb.

@kvanzuijlen
Copy link
Contributor Author

I see that the travis-ci tests are failing with the following error:
pylast.WSError: Invalid API key - You must be granted a valid key by last.fm

I don't think this is something I can fix myself, right?
If so, can one of the maintainers please fix this?
If not, how can I fix this?

Thanks in advance.

@hugovk
Copy link
Member

hugovk commented Jul 12, 2020

Yeah, the CI fails for PRs, see #171.

Do they pass for you locally?

@kvanzuijlen
Copy link
Contributor Author

kvanzuijlen commented Jul 12, 2020

Yeah, the CI fails for PRs, see #171.

Do they pass for you locally?

@hugovk Thanks for your quick reply, I should've looked further. Yes, they succeed locally (as you can see in the screenshot).
image

The force-push was because I used the wrong git profile by accident.

Copy link
Member

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the PR! I'll review it in more detail later.

I tested both implementations by retrieving my recent tracks since January 1st of this year.

Please could you share some example code for this?

stream=True (which I made the default)

I think I'd like to be on the cautious side and start off with stream=False as the default, so as not to have any surprising API changes (for example, user.get_loved_tracks(limit=20) returns a 20-item list now, and a generator with this PR).

Then we can release it with a minor bump, and announce it will become the default in the next major release, where we can flip the default to stream=True. That way nothing will break for people pinning by major version.

We have a major, breaking release coming up soon anyway, as Python 3.5 will be EOL in September and dropped, so it won't be too long to wait.

What do you think?

README.md Outdated Show resolved Hide resolved
@hugovk hugovk added changelog: Added For new features enhancement New feature or request labels Jul 13, 2020
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
@kvanzuijlen
Copy link
Contributor Author

kvanzuijlen commented Jul 14, 2020

Please could you share some example code for this?
Sure. I basically used the example code and added the call to get_user().get_recent_tracks(). The following is what I came up with.

import time
from datetime import datetime

import pylast

start = time.time()
print(start)

# You have to have your own unique two values for API_KEY and API_SECRET
# Obtain yours from https://www.last.fm/api/account/create for Last.fm
API_KEY = "******************************"
API_SECRET = "***************************"

# In order to perform a write operation you need to authenticate yourself
username = "*************"
password_hash = pylast.md5("**********")

network = pylast.LastFMNetwork(api_key=API_KEY, api_secret=API_SECRET,
                               username=username, password_hash=password_hash)

tracks = network.get_user(username=username).get_recent_tracks(
    time_from=int(datetime(year=2020, month=1, day=1).timestamp()),
    cacheable=False, stream=False  # or stream=True
)
for track in tracks:
    pass

end = time.time()
print(end)
print(end-start)

I took some shortcuts with this example since I only made some scratches in PyCharm. I ran both examples locally watched the memory usage of the python process via system monitor/taskmanager on Windows.

I think I'd like to be on the cautious side and start off with stream=False as the default, so as not to have any surprising API changes (for example, user.get_loved_tracks(limit=20) returns a 20-item list now, and a generator with this PR).

Then we can release it with a minor bump, and announce it will become the default in the next major release, where we can flip the default to stream=True. That way nothing will break for people pinning by major version.

We have a major, breaking release coming up soon anyway, as Python 3.5 will be EOL in September and dropped, so it won't be too long to wait.

What do you think?

Couldn't agree more, I didn't really think about it other than my own use case. Compatibility-wise it's a much better choice to make the change more gradually. I'll make the change and update the pull request.

To finish this comment, I would like to say thank you to you and the other contributers, since this wrapper made it really easy for me to implement my personal use case.

@kvanzuijlen
Copy link
Contributor Author

I made the changes as requested. Is there anything else I should do?

@hugovk
Copy link
Member

hugovk commented Dec 28, 2020

Please could you resolve the conflicts and I'll take another look? Thanks!

return _get_artist_tracks() if stream else list(_get_artist_tracks())

def get_friends(self, limit=50, cacheable=False, stream=False):
def get_friends(self, limit=50, cacheable=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lost its stream=False in the merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, fixing it now.

"""
Returns this user's loved track as a sequence of LovedTrack objects in
reverse order of their timestamp, all the way back to the first track.

If limit==None, it will try to pull all the available data.
If stream=True, it will yield tracks as soon as a page has been retrieved.
If stream=False, it will yield tracks as soon as a page has been retrieved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still say If stream=True, ...?

We can also remove the stream=False from test calls, it has no effect.

@hugovk
Copy link
Member

hugovk commented Dec 29, 2020

I made a copy of your branch and resolved conflicts and changed the things I just commented above:

https://github.com/pylast/pylast/tree/streaming

(sorry for accidentally pushing to your branch, I undid the commit)

And all the tests passed except for one:

________________________ TestPyLastNetwork.test_caching ________________________

self = <tests.test_network.TestPyLastNetwork object at 0x7f3d83272310>

    def test_caching(self):
        # Arrange
        user = self.network.get_user("RJ")
    
        # Act
>       self.network.enable_caching()

tests/test_network.py:236: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.tox/py/lib/python3.7/site-packages/pylast/__init__.py:428: in enable_caching
    self.cache_backend = _ShelfCacheBackend(file_path)
.tox/py/lib/python3.7/site-packages/pylast/__init__.py:788: in __init__
    self.shelf = shelve.open(file_path)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/shelve.py:243: in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/shelve.py:227: in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/dbm/__init__.py:78: in open
    result = whichdb(file) if 'n' not in flag else None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

filename = 5

    def whichdb(filename):
        """Guess which db package to use to open a db file.
    
        Return values:
    
        - None if the database file can't be read;
        - empty string if the file can be read but can't be recognized
        - the name of the dbm submodule (e.g. "ndbm" or "gnu") if recognized.
    
        Importing the given module may still fail, and opening the
        database using that module may still fail.
        """
    
        # Check for ndbm first -- this has a .pag and a .dir file
        try:
>           f = io.open(filename + ".pag", "rb")
E           TypeError: unsupported operand type(s) for +: 'int' and 'str'

/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/dbm/__init__.py:112: TypeError

https://github.com/pylast/pylast/runs/1623179772?check_suite_focus=true

@kvanzuijlen
Copy link
Contributor Author

And all the tests passed except for one:
...

I'll look into it right away

@kvanzuijlen
Copy link
Contributor Author

kvanzuijlen commented Dec 29, 2020

I originally replaced .mktemp() with .TemporaryFile(), because .mktemp() is marked as unsafe. However, I should've replaced it with '.mkstemp()` (https://docs.python.org/3/library/tempfile.html#tempfile.mktemp). Can you please recheck?

@hugovk
Copy link
Member

hugovk commented Dec 29, 2020

That fails with another TypeError: can only concatenate tuple (not "str") to tuple, because mkstemp returns a tuple like (14, '/var/folders/kt/j77sf4_n6fnbx6pg199rbx700000gn/T/pylast_tmp_3koykybr'):

mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.

Let's leave it with the original in this PR as there's enough going on here, and please could you open a separate PR to change it?

Thanks!

@kvanzuijlen
Copy link
Contributor Author

Sorry, should've read that, don't know why I didn't. Reverted it to the original .mktemp().

@hugovk
Copy link
Member

hugovk commented Dec 30, 2020

I tested locally and all tests pass. Let's merge and I'll get a release out in the next week or so.

Thank you!

@hugovk hugovk merged commit 34690f6 into pylast:master Dec 30, 2020
@hugovk hugovk changed the title Implemented streaming Add support for streaming Dec 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: Added For new features enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants