The goal of pylistfm is to create playlists in which the probability of any given song or artist being added is weighted by the data last.fm has on that song or artist. These playlists are meant as an alternative to hitting the 'randomize' button on your music player of choice. Music libraries today rarely consist solely of best-of songs; more commonly they are built around albums, or even entire discographies. The randomize button loses its value as a playlist generation tool when, for any given artist, a user is as likely to hear a song that is subjectively worse than average as one that is subjectively better than average. The issue worsens when, for the majority of artists in a user's library, the user would not even want to listen to an average song by that artist. Pylistfm offers an alternative in which the chance of a song by a particular artist being chosen is weighted by its relative popularity according to global listening data pulled from the last.fm web services API.
In addition, because it is open source, pylistfm offers even those with minimal programming experience the ability to modify the song-picking algorithm in any way they see fit. Users can also export data about their music library, or any subset of it, to visualize or apply however they choose. Two playlist-selection algorithms come bundled with the pylistfm package: one based purely on track listening counts, and one based on the ratio of listens to listeners.
The first algorithm, based on track listening counts, is best explained through example. John, a pylistfm user, has 10,000 tracks in his library, 100 of which are by The Beatles. The probability that the first song added to the playlist is by The Beatles is 100 / 10,000, or 1%. Let's say the random number generator hits that 1% chance, deciding that a Beatles song will be the first on the playlist. In a traditional pure randomization playlist, each of those 100 songs would have a 1/100 or 1% chance of being chosen. However, pylistfm weights the probability of each song being chosen by the total number of listens to that song from all last.fm users across the globe. If the song Come Together has 1,000,000 listens, that song has a 1,000,000 / n chance of being chosen, where n is the sum of the listen counts of all the Beatles songs John has in his library. If that sum is 40,000,000 listens, he now has a 1/40 (2.5%) chance to hear Come Together. Once the first song is chosen, it is removed from the potential tracks and the process is repeated, this time with The Beatles having a 99 / 9,999 chance of being picked for the second song, or a probability of about 0.99%.
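The selection steps in the example above can be sketched roughly as follows. This is an illustration only, not pylistfm's actual code: the nested dict layout and the names pick_track and make_playlist_sketch are assumptions made for the example, and it assumes every play count is greater than zero.

```python
import random


def pick_track(library, rng=random):
    """Pick and remove one track.

    `library` is a hypothetical layout for this sketch:
    {artist name: {track title: global last.fm play count}}.
    """
    # Step 1: pick an artist with probability proportional to the number
    # of tracks that artist still has in the library (100 / 10,000 above).
    artists = list(library)
    track_counts = [len(library[a]) for a in artists]
    artist = rng.choices(artists, weights=track_counts)[0]

    # Step 2: pick one of that artist's tracks, weighted by its global
    # play count (1,000,000 / 40,000,000 for Come Together above).
    tracks = library[artist]
    titles = list(tracks)
    title = rng.choices(titles, weights=[tracks[t] for t in titles])[0]

    # Step 3: remove the chosen track so it cannot be picked again.
    del tracks[title]
    if not tracks:
        del library[artist]
    return artist, title


def make_playlist_sketch(library, songcount, rng=random):
    """Repeat the pick until songcount tracks are chosen or none remain."""
    playlist = []
    while library and len(playlist) < songcount:
        playlist.append(pick_track(library, rng))
    return playlist
```

Note that the sketch mutates the library it is given, which mirrors the removal step in the example.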
The second algorithm, based on the ratio of listens to listeners, effectively weights track choice by the average number of times each last.fm listener of a song has played it. The idea behind this selection process is that the better songs in your library will have been listened to, by each last.fm listener of the song, more times on average than songs which are not as enjoyable. Artist choice is weighted by the average of the ratio values of the track objects in the artist object's .tracks field.
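To see how the two weightings can disagree, here is a small worked comparison. The track names and numbers are made up for illustration, and selection_probabilities is a hypothetical helper, not part of pylistfm.

```python
# Made-up figures: (total plays, number of listeners) for two tracks
# by the same artist.
tracks = {
    "Track A": (1_000_000, 100_000),  # each listener played it 10 times on average
    "Track B": (2_000_000, 500_000),  # each listener played it 4 times on average
}


def selection_probabilities(weights):
    """Normalize a {name: weight} dict so the values sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Weighting by raw play count favors Track B (2/3 versus 1/3).
by_playcount = selection_probabilities(
    {name: plays for name, (plays, listeners) in tracks.items()}
)

# Weighting by plays per listener favors Track A (10/14 versus 4/14).
by_ratio = selection_probabilities(
    {name: plays / listeners for name, (plays, listeners) in tracks.items()}
)
```

Track B has more total plays, but Track A is replayed more by the people who listen to it, so the ratio weighting prefers it.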
Primary definitions file. Run this in an interactive console to run pylistfm.
Class Hybrid_Track definition. Hybrid_Track objects represent processed songs, storing data in the following fields: track (track name), artist_name, album_name, location, itunes_id, track_number, track_count, file_duration, bit_rate, sample_rate, playcount, artist (stores a pylast.Artist object reference), album (stores a pylast.Album object reference), listener_count (int), lfm_playcount (int). Excluding the exceptions noted above, each of these fields stores a unicode string, assuming that data exists for that field. If data does not exist for a field, the field contains None.
get_lfm_info(itunes_library) accepts a parameter containing a pylistxml.Itunes_Library object. Converts each track and artist object in the Itunes_Library object into Hybrid_Track and pylast.Artist objects, respectively. Saves a file containing a list of pylast.Artist objects, with each object containing all Hybrid_Track objects belonging to that Artist object in the object's .tracks field. This file is saved as 'incompleteartists.db'. Tracks which failed the conversion process are saved as 'failedtracks.db'.
process_info(fname='incompleteartists.db',v=False) accepts the filename of the saved list of Artist objects generated by get_lfm_info(). v can be set to True if you want to see a line of text for each artist and track processed. It fills in the Artist.sum_playcount field with the sum of the playcounts of all Hybrid_Track objects in its tracks field, and it fills in the Artist.playcount field with the total number of times a particular artist has been played on last.fm by all users. In addition, each Hybrid_Track object has its get_data() method called, filling in the object's listener_count and lfm_playcount fields with integer values representing the total number of listeners and the total number of times listened, respectively. Once all of these fields have been filled in for all track and artist objects, calculate_ratios() is called, filling in a ratio field for every track and artist object. The ratio for a track is a float equal to lfm_playcount / listener_count. The ratio for an artist object is the average ratio value for the track objects in its tracks field. Artist objects also have their .trackcount field filled in with the integer length of their tracks field. Tracks and artists without valid ratio data are then removed. To avoid a track having invalid data, make sure that the track's ID3 tags are correct.
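The ratio step described above might look something like the sketch below. The classes TrackSketch and ArtistSketch are simplified stand-ins invented for this example (the real objects are Hybrid_Track and pylast.Artist), and treating a missing or zero count as "invalid" is an assumption about what the cleanup does.

```python
class TrackSketch:
    """Stand-in for Hybrid_Track, keeping only the fields used here."""

    def __init__(self, name, lfm_playcount=None, listener_count=None):
        self.name = name
        self.lfm_playcount = lfm_playcount
        self.listener_count = listener_count
        self.ratio = None


class ArtistSketch:
    """Stand-in for an artist object with a .tracks list."""

    def __init__(self, name, tracks):
        self.name = name
        self.tracks = tracks
        self.ratio = None
        self.trackcount = 0


def calculate_ratios_sketch(artists):
    """Fill in ratio fields and return only artists with valid data."""
    kept = []
    for artist in artists:
        valid = []
        for track in artist.tracks:
            # Assumption: a missing (None) or zero count means invalid data.
            if not track.lfm_playcount or not track.listener_count:
                continue
            track.ratio = track.lfm_playcount / track.listener_count
            valid.append(track)
        if not valid:
            continue  # drop artists left with no usable tracks
        artist.tracks = valid
        artist.trackcount = len(valid)
        # Artist ratio is the mean of its tracks' ratios.
        artist.ratio = sum(t.ratio for t in valid) / len(valid)
        kept.append(artist)
    return kept
```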
make_progress(): The function which advances the stage of data parsing and processing. This function takes you from having just downloaded the application to having a ready dataset of artist and track objects to apply playlist generation algorithms to. It checks which files exist in the current directory to gauge your progress towards a ready-to-use dataset. The first stage requires you to perform an iTunes library export. If you have not done this, the function will prompt you to and provide instructions. The next stage is to call pylistxml.parse_itunes('Library.xml'), which saves an Itunes_Library object as 'itunes.db'. The third stage calls get_lfm_info(datamgmt.load('itunes.db')), which performs the processes described above, then saves a file 'incompleteartists.db'. The final stage is to call process_info(), which is also described above, resulting in an 'artists.db' file being generated.
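One simple way to gauge progress from the files present is sketched below. It only illustrates the idea of checking for stage files; the function name next_missing_file and its return convention are assumptions, not the project's implementation.

```python
import os

# Files produced by each stage, in order, as described above.
STAGE_FILES = ["Library.xml", "itunes.db", "incompleteartists.db", "artists.db"]


def next_missing_file(directory="."):
    """Return the next file that still has to be produced, or None if done.

    The most advanced stage file present decides the answer, so an
    'artists.db' on its own already counts as a finished dataset.
    """
    for i in range(len(STAGE_FILES) - 1, -1, -1):
        if os.path.exists(os.path.join(directory, STAGE_FILES[i])):
            if i + 1 < len(STAGE_FILES):
                return STAGE_FILES[i + 1]
            return None
    # Nothing found yet: the iTunes export is the first thing needed.
    return STAGE_FILES[0]
```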
save(data,filename) performs a pickle dump of the data into a file of name filename using protocol 2.
load(filename) returns the unpickled contents of a file.
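A minimal sketch of helpers like these, written for Python 3 with the standard pickle module, could look as follows. The names save_sketch and load_sketch are made up to keep them distinct from the real functions, whose code may differ.

```python
import pickle


def save_sketch(data, filename):
    """Pickle data into filename using protocol 2 (a binary protocol)."""
    # Binary mode matters: protocols 1 and above write binary pickles.
    with open(filename, "wb") as f:
        pickle.dump(data, f, protocol=2)


def load_sketch(filename):
    """Return the unpickled contents of filename."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```

As with any pickle file, only load files you created yourself, since unpickling can execute arbitrary code.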
make_m3u(songs) accepts a list of Hybrid_Track objects and creates an m3u8 playlist (an m3u file encoded in UTF-8) in the current directory titled 'playlistnew.m3u'. Works for Windows.
make_m3u_osx(songs) does the same thing as make_m3u but formats the file locations in a way that works with OSX.
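In its simplest form, an m3u8 playlist is a UTF-8 text file with one file location per line. The sketch below writes such a file from a list of location strings. The function write_m3u8_sketch is hypothetical and skips the platform-specific path formatting that make_m3u and make_m3u_osx handle.

```python
def write_m3u8_sketch(locations, path="playlistnew.m3u"):
    """Write one location per line, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for location in locations:
            f.write(location + "\n")
```

Usage might look like write_m3u8_sketch([track.location for track in songs]), assuming each track's location field already holds a path your player understands.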
- Contains functions used to parse an iTunes XML library export and store that data in an Itunes_Library object.
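For readers curious about the export format: an iTunes library export is an XML property list, so one way to read it in Python 3 is the standard plistlib module. The sketch below is not how pylistxml works internally (that is not documented here); the sample XML is a made-up, heavily trimmed export, and tracks_from_export is a hypothetical helper.

```python
import plistlib

# Made-up, trimmed-down export containing a single track.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Tracks</key>
    <dict>
        <key>101</key>
        <dict>
            <key>Track ID</key><integer>101</integer>
            <key>Name</key><string>Come Together</string>
            <key>Artist</key><string>The Beatles</string>
            <key>Album</key><string>Abbey Road</string>
        </dict>
    </dict>
</dict>
</plist>
"""


def tracks_from_export(xml_bytes):
    """Return a list of {name, artist, album} dicts from export bytes."""
    library = plistlib.loads(xml_bytes)
    return [
        {
            "name": entry.get("Name"),
            "artist": entry.get("Artist"),
            "album": entry.get("Album"),
        }
        # The "Tracks" dict maps track IDs to per-track dicts.
        for entry in library.get("Tracks", {}).values()
    ]
```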
make_playlist(artists,songcount) accepts a list of artists and the number of songs you want in your playlist. Makes choices in the way described above as Algorithm 1. Returns a list of songs of songcount length.
make_playlist2(artists,songcount) Input and output identical to make_playlist. Uses the selection algorithm described above as Algorithm 2.
I would like to rebuild the user interface for the application, allowing all actions currently possible from the command line to be possible through an intuitive GUI. I would also like to look into using a SQL database to store all data used by the application. Currently it only uses a SQL database to store cached results of API calls to last.fm. I've found cPickle to be much faster at saving and loading large numbers of entries than the sqlite3 module. I would like to create a Pandora/Genius-style playlist generation algorithm which will factor similarity of tracks to the previous track into the song choice decision-making. I would also like to find a more efficient way to determine when new tracks and artists are added to iTunes and then process those objects, adding them to the 'artists.db' database. It would be nice to find a way to integrate directly with iTunes, without the need for the intermediate step of a library xml export. I would also like to take song bit rates into account when parsing the iTunes xml export. Currently, it accepts the first item encountered of a particular song name and particular track name, ignoring any duplicate tracks, potentially ones with higher audio quality.
The difference between unicode strings and ascii strings was a huge problem area throughout development. However, I'm fairly certain that at this point all bugs relating to unicode strings with characters that do not exist in ascii have been resolved. I ran into issues with pickle where, after performing a dump using protocol 0 or 1, files became unloadable. This has something to do with unicode, but I'm really unsure of exactly what the issue was. I also ran into issues with pickling objects which stored references to the cached results of their function calls using a Connection object referencing the sqlite database. By disabling caching immediately before saving, the Network field containing the Connection object is assigned None as a value, allowing pickle to save the objects.
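The Connection problem is easy to reproduce in isolation. The sketch below uses a plain dict as a stand-in for an object with a connection field (the "network" key and the picklable_copy helper are invented for this example) and shows that pickling works once that field is None.

```python
import pickle
import sqlite3


def picklable_copy(record):
    """Return a shallow copy of record with its connection set to None."""
    clean = dict(record)
    clean["network"] = None
    return clean


# Stand-in for an object that keeps a reference to the cache database.
record = {"name": "The Beatles", "network": sqlite3.connect(":memory:")}

# sqlite3 connections cannot be pickled, so dumping the record as-is fails.
try:
    pickle.dumps(record, protocol=2)
    failed = False
except (TypeError, pickle.PicklingError):
    failed = True

# With the connection field cleared, the dump and reload succeed.
restored = pickle.loads(pickle.dumps(picklable_copy(record), protocol=2))
```

Working on a copy leaves the original record and its open connection untouched, which is a design choice of this sketch rather than what pylistfm does.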
Directions for use:
Perform an iTunes library export (File -> Library -> Export Library from within iTunes) and put the Library.xml file in the same directory as lfmgather.py.
Run lfmgather.py interactively in your python console of choice.
Once you've loaded lfmgather.py, execute the following command: makeprogress().
You can then watch as pylistfm performs the data gathering necessary and advances from Library.xml to itunes.db to incompleteartists.db, which is finally converted into artists.db.
This whole process may take several hours, depending on the size of your library. Once lfmgather has created an artists.db file, you have all the necessary data stored locally to perform playlist generation. At that point, any time you run lfmgather.py, you can execute the commands:
- artists = datamgmt.load('artists.db')
- songs = filters.make_playlist(artists, n) or songs = filters.make_playlist2(artists, n) where n is equal to the number of songs you want on the generated playlist.
- datamgmt.make_m3u_osx(songs) or datamgmt.make_m3u(songs) (osx or windows, respectively)
which will result in an m3u8 playlist of those songs being saved as playlistnew.m3u in the current working directory. Once you've called a make_playlist function, the tracks which are chosen are removed from artists, so you'll have to reload artists.db afterwards if you want songs from the first playlist to potentially end up in any future playlists.
Please let me know if you run into any issues or bugs, or have any suggestions, when using this program. It's still very much a work in progress, and there are quite a few things I'd like to add or fix. However, the primary documented features should be fully functional at this point. Most of the errors I have encountered so far are wrapped in try/except statements that attempt to work around them while I work toward figuring out what specifically causes them and how to prevent them.