Better genre detection and track recommendation #ML #data #83

adrienjoly · 2017-08-11T18:20:29Z

For music lovers, discovering new music is essential.

Spotify is well known for the quality of their "Discover Weekly" playlist, containing a personalised selection of tracks based on your listening history.

On Openwhyd, current ways to discover music are:

listening to your stream, after having followed users with similar musical taste;
or listening to hot tracks, which are classified by genres.

The first way is purely relying on humans and luck.

The second way relies on a list of 16 genres (a quite limited and vague selection of genres), in which popular tracks are classified, based on the names of the playlists that hold them. This kinda works but it's far from perfect. For example, we had to create a hard-coded rule to prevent Daft Punk songs from being recognised as Punk Rock music!
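To make the problem concrete, here is a minimal sketch of keyword-based genre detection from playlist names, including the kind of exception needed for Daft Punk. It is not the actual plTags.js logic: the keyword table, the `IGNORED_PHRASES` list and both function names are assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword table; the real list (16 genres) lives in plTags.js.
GENRE_KEYWORDS = {
    "punk": "Punk Rock",
    "electro": "Electro",
    "jazz": "Jazz",
}

# Hypothetical exception list: phrases stripped out before keyword matching.
IGNORED_PHRASES = ["daft punk"]


def genres_from_playlist_name(name):
    """Return the set of genres whose keyword appears in a playlist name."""
    text = name.lower()
    for phrase in IGNORED_PHRASES:
        text = text.replace(phrase, " ")
    words = set(re.findall(r"[a-z]+", text))
    return {genre for keyword, genre in GENRE_KEYWORDS.items() if keyword in words}


def guess_track_genre(playlist_names):
    """Let every playlist holding the track vote; return the top genre or None."""
    votes = Counter()
    for name in playlist_names:
        votes.update(genres_from_playlist_name(name))
    return votes.most_common(1)[0][0] if votes else None
```

Every new false positive needs another entry in the exception list, which is why this approach does not scale well.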

To help users discover new music by finding relevant people to follow, we also experimented with showing a measure of profile similarity, but it was based only on the number of artists that both users had added.
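For illustration, the shared-artist count can be sketched as below, next to a normalised variant (Jaccard similarity) that is not something Openwhyd used, just one possible refinement. Function names and the input format (a list of artist names per user) are assumptions.

```python
def shared_artist_count(artists_a, artists_b):
    """Number of artists that both users added (the measure described above)."""
    return len(set(artists_a) & set(artists_b))


def jaccard_similarity(artists_a, artists_b):
    """Shared artists divided by all artists of both users, between 0.0 and 1.0.

    Unlike the raw count, this does not favour users who simply added
    a lot of artists.
    """
    a, b = set(artists_a), set(artists_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```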

=> Anyone interested in exploring new ways to discover music on Openwhyd?

adrienjoly · 2017-08-11T18:25:11Z

Examples that could be applied to Openwhyd:

#deeplearning

Marinlemaignan · 2017-08-18T18:16:36Z

This is a rather exciting feature to add!
I was thinking that maybe we could replace plTags.js with a DB-backed service that would first ask the Discogs API for a track/album/artist's metadata, then store it for later use, instead of having it hard-coded as it is now. That way we could build something able to evolve bit by bit. Also, MongoDB is perfect for this kind of job.

adrienjoly · 2017-08-18T19:03:02Z

Hi @Marinlemaignan !

I'd be happy to replace plTags.js when we have a fully-functional solution that is better than the current one, while maintaining:

user satisfaction on the hot tracks page (which is also openwhyd's landing page)
and relevance of the users that are recommended based on selected genres, during user onboarding (also fed by plTags)

One way we could transition gently to a new system:

the new system (e.g. based on discogs) is developed outside of openwhyd's repo,
inspired by the use cases provided above, automated tests are written to ensure that the new system works as expected (or better),
when hosted online, the new system's database could be populated at the same time as plTags's system is, by plugging in a webhook (or something like that).

What do you think?
Are you interested in working on this?

adrienjoly · 2017-11-02T11:21:09Z

/cc @florentpietot

SkinyMonkey · 2017-11-05T16:04:25Z

I have experimented extensively with the Discogs API.

It's very complete and extremely promising, but ... it is rate-limited to 60 requests per minute.

https://www.discogs.com/developers/#page:home,header:home-rate-limiting
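To give an idea of what staying under such a limit involves on the client side, here is a minimal sliding-window rate limiter sketch. It is generic and does not call the Discogs API; the class name and parameters are assumptions, and `clock`/`sleep` are injectable only to make it testable.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of the calls still inside the window

    def wait(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.wait()` before each HTTP request keeps a single process under the limit, but at 60 requests per minute, looking up 100,000 tracks would still take more than a day.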

There is no way to go around this. A partnership would be the only solution and I doubt that they would be attracted by a partnership that does not bring them anything.

What we could do is identify albums and point to their product listings. They probably wouldn't turn down a showcase as big as Openwhyd.

A solution is to host their database ourselves. There are Docker images that download their monthly dump and index it in MongoDB.
A mongo-connector to an Elasticsearch database then gives us the extra performance needed for efficient lookups.
I tried, it was working well.

But even then, a few other problems arise:

YouTube/SoundCloud track names are not always accurate (misspellings, etc.);
identifying the right album on Discogs is sometimes difficult; I spent a lot of time on this;
the album images are not available in the dump, although it would be cool to display them instead of the YouTube artwork, for example;
the links to YouTube videos associated with each album are not available in the dump either. Yes, they have that, and it would be an amazing feature: imagine, you post a track and bam! You get other tracks from the same album.

One solution I looked into is scraping ... but they wouldn't like it, and it's a dirty solution anyway.

I'm not saying it's impossible, just that it's not a bulletproof approach.

adrienjoly · 2017-11-05T17:51:37Z

Thanks for sharing your ideas and notes with us, @SkinyMonkey !

adrienjoly · 2017-11-10T13:21:23Z

WIP:

Analytics / BI: know our users (and openwhyd usage) better · Issue #107 · openwhyd/openwhyd
Analytics / BI: count track playback errors, study the causes, increase success rate · Issue #88 · openwhyd/openwhyd

Florent Piétot is currently analysing Openwhyd's data set, and thinking of ways to leverage it (e.g. use clustering and/or machine learning techniques for better genre detection and music recommendation).

adrienjoly · 2018-11-03T10:46:59Z

During a "Hackergarten" meetup in Paris, Mihangy, Damien and I wrote a Python script that turns playlog.json.log into an anonymised CSV file in which each line associates a user (identified by a number) with the YouTube track id that the user listened to.

👉 2c095d2

The goal was to provide a starting point for the development of a music recommendation algorithm based on Openwhyd's playback logs, while preserving the privacy of its users. (i.e. data anonymisation)
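For readers who want the general idea without opening the commit, the transformation could look roughly like the sketch below. This is not the script from 2c095d2: the input format (one JSON object per line), the field names `uId` and `eId`, and the `/yt/` prefix for YouTube tracks are all assumptions.

```python
import csv
import json


def anonymise_playlog(lines, out):
    """Write "user_number,youtube_id" rows to `out`; return the number of users.

    Assumes each line is a JSON object with a user id in "uId" and a track
    id in "eId" such as "/yt/<videoId>" (hypothetical field names).
    """
    user_numbers = {}  # real user id -> arbitrary sequential number
    writer = csv.writer(out)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        user_id = entry.get("uId")
        track_id = entry.get("eId") or ""
        # Keep only entries that have a user and point to a YouTube track.
        if user_id is None or not track_id.startswith("/yt/"):
            continue
        number = user_numbers.setdefault(user_id, len(user_numbers))
        writer.writerow([number, track_id[len("/yt/"):]])
    return len(user_numbers)
```

Replacing ids with sequential numbers hides who the users are, but listening habits themselves can still be identifying, so this should be seen as a first step only.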

Next steps:

cluster similar songs, e.g. groups of tracks that were listened to by more than one user
build a mini website that would recommend tracks, based on a user-given youtube video
build a mini website that would recommend tracks and openwhyd users to subscribe to, based on the user's openwhyd profile
integrate it as a discovery feature on openwhyd.org
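As a possible starting point for the first two steps, here is a small sketch that counts how many users listened to each pair of tracks, then lists the tracks most often played alongside a given one. It assumes rows of `(user_number, track_id)` as in the anonymised CSV; the function names and the `min_users` threshold are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_listen_counts(rows):
    """Map each pair of tracks (a, b), a < b, to the number of users who played both."""
    tracks_by_user = defaultdict(set)
    for user, track in rows:
        tracks_by_user[user].add(track)
    counts = defaultdict(int)
    for tracks in tracks_by_user.values():
        for a, b in combinations(sorted(tracks), 2):
            counts[(a, b)] += 1
    return counts


def related_tracks(counts, track, min_users=2):
    """Tracks co-listened with `track` by at least `min_users` users, best first."""
    related = []
    for (a, b), n in counts.items():
        if n >= min_users and track in (a, b):
            related.append((n, b if a == track else a))
    # Sort by descending user count, then by track id for a stable order.
    return [t for n, t in sorted(related, key=lambda x: (-x[0], x[1]))]
```

Note that the number of pairs grows quadratically with the number of tracks per user, so heavy listeners may need to be capped or sampled on the full playlog.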

adrienjoly · 2018-12-27T05:27:16Z

The data science cheatsheets provided on this repo may help :-) https://github.com/FavioVazquez/ds-cheatsheets

adrienjoly · 2018-12-27T10:21:30Z

This also may help: https://github.com/trekhleb/homemade-machine-learning (examples of machine learning techniques in Python, based on Andrew Ng's MOOC)

adrienjoly · 2019-02-26T19:13:36Z

During a Hackergarten meetup, Sébastien Treguer suggested the following next step:

As a first easy step, implement collaborative filtering for recommendation, using a matrix with users in rows and content (e.g. videos) in columns.

http://surpriselib.com/
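To illustrate the idea without any dependency, here is a minimal user-based collaborative filtering sketch in plain Python. It does not use SurpriseLib, and it is only one of several ways to do this: the user/video matrix is binary (listened or not) and stored as one set of tracks per user, users are compared with cosine similarity, and the function name and input format are assumptions.

```python
import math
from collections import defaultdict


def recommend(rows, user, top_n=5):
    """Recommend tracks that users similar to `user` listened to.

    rows: iterable of (user_number, track_id) pairs, i.e. the non-zero
    cells of a binary user x track matrix.
    """
    listened = defaultdict(set)
    for u, track in rows:
        listened[u].add(track)
    mine = listened.get(user, set())
    scores = defaultdict(float)
    for other, theirs in listened.items():
        if other == user or not mine:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue
        # Cosine similarity between two binary rows of the matrix.
        similarity = overlap / math.sqrt(len(mine) * len(theirs))
        # Each unseen track is scored by the similarity of the users who played it.
        for track in theirs - mine:
            scores[track] += similarity
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:top_n]]
```

A user with no listening history gets an empty list, which is the usual cold-start limitation of this family of techniques.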

adrienjoly · 2019-03-02T19:22:20Z

For reference: Mihangy is experimenting with Jupyter Notebooks and SurpriseLib. He opened a google group to discuss data analysis tasks on openwhyd's data using those tools.

Aidan O'Donnell and Patrick Allain also showed interest in these initiatives, during this week's Hackergarten.

(for #83)

# [1.5.0](v1.4.9...v1.5.0) (2019-04-03)

### Features

* add timestamp to anonymised playlog entries ([45caf45](45caf45)), closes [#83](#83)
* can anonymise playlog with timestamps or ObjectIDs ([461623b](461623b)), closes [#83](#83)

adrienjoly · 2019-04-03T21:41:19Z

For reference, I published a 700MB history/playlog file in https://github.com/openwhyd/openwhyd-data

At some point, it may be worth picking a license and publishing the data on open data listings like awesomedata/awesome-public-datasets. Suggestions are welcome!

adrienjoly · 2019-07-19T06:28:50Z

This list of best practices could help: https://github.com/microsoft/recommenders

adrienjoly · 2020-11-21T17:25:59Z

Music genre detection and genre-based streams were removed in #399. => Closing.

adrienjoly added enhancement help wanted labels Aug 11, 2017

adrienjoly removed the Hacktoberfest label Nov 3, 2017

adrienjoly added the experiment / study label Nov 3, 2018

adrienjoly mentioned this issue Mar 3, 2019

Create a /data page to download an anonymised playback history #191

Closed

adrienjoly added a commit that referenced this issue Apr 3, 2019

feat: add timestamp to anonymised playlog entries

45caf45

(for #83)

adrienjoly added a commit that referenced this issue Apr 3, 2019

feat: can anonymise playlog with timestamps or ObjectIDs

461623b

(for #83)

adrienjoly closed this as completed Nov 21, 2020
