Better genre detection and track recommendation #ML #data #83

Open
adrienjoly opened this Issue Aug 11, 2017 · 7 comments

Comments

3 participants
@adrienjoly
Member

adrienjoly commented Aug 11, 2017

For music lovers, discovering new music is essential.

Spotify is well known for the quality of their "Discover Weekly" playlist, containing a personalised selection of tracks based on your listening history.

On Openwhyd, current ways to discover music are:

  1. listening to your stream, after having followed users with similar musical taste;
  2. or listening to hot tracks, which are classified by genres.

The first way relies purely on humans and luck.

The second way relies on a list of 16 genres (a limited and rather vague selection) into which popular tracks are classified, based on the names of the playlists that contain them. This kind of works, but it's far from perfect. For example, we had to create a hard-coded rule to prevent Daft Punk songs from being recognised as Punk Rock music!
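
To illustrate the kind of rule described above, here is a minimal sketch of keyword-based genre detection from playlist names. It is not the actual plTags.js logic: the keyword table, the exception list and the function name `detect_genre` are all hypothetical, chosen only to show why an artist name like "Daft Punk" needs special handling.

```python
import re
from collections import Counter

# Hypothetical keyword table; the real genre list and rules in plTags.js may differ.
GENRE_KEYWORDS = {
    "Punk Rock": ["punk"],
    "Electro": ["electro", "house", "techno"],
    "Hip hop": ["hip hop", "rap"],
}

# Hypothetical exception list: artist names that contain a genre keyword.
ARTIST_EXCEPTIONS = ["daft punk"]


def detect_genre(playlist_names):
    """Return the genre matched by the most playlist names, or None if nothing matches."""
    votes = Counter()
    for name in playlist_names:
        text = name.lower()
        # Strip known artist names so "Daft Punk" does not vote for "Punk Rock".
        for artist in ARTIST_EXCEPTIONS:
            text = text.replace(artist, " ")
        for genre, keywords in GENRE_KEYWORDS.items():
            # Whole-word matching, so "trap" does not count as "rap".
            if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
                votes[genre] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Each new collision needs another entry in the exception list, which is part of why this approach does not scale well.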

To help users discover new music by finding relevant people to follow, we also experimented with showing a measure of profile similarity, but it was based only on the number of artists that both users had added.
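
As a sketch of why a raw count of shared artists is a weak signal, the snippet below compares it with Jaccard similarity, which divides the shared count by the size of the combined set. Jaccard is only one possible alternative, not something Openwhyd is known to use, and both function names are hypothetical.

```python
def common_artist_count(artists_a, artists_b):
    """The measure described above: number of artists added by both users."""
    return len(set(artists_a) & set(artists_b))


def jaccard_similarity(artists_a, artists_b):
    """Shared artists divided by all distinct artists of both users (0.0 to 1.0)."""
    a, b = set(artists_a), set(artists_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


alice = ["Daft Punk", "Justice", "Air"]
bob = ["Daft Punk", "Justice", "Air", "Moby"]
# A prolific user who happens to share the same three artists among many others.
carol = ["Daft Punk", "Justice", "Air"] + [f"Artist {i}" for i in range(97)]

# The raw count cannot tell bob and carol apart; the normalised score can.
print(common_artist_count(alice, bob), common_artist_count(alice, carol))
print(jaccard_similarity(alice, bob), jaccard_similarity(alice, carol))
```

With the raw count, users with very large libraries look similar to almost everyone, which a normalised score avoids.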

=> Anyone interested in exploring new ways to discover music on Openwhyd?


@adrienjoly adrienjoly created this issue from a note in Openwhyd's Roadmap (📈 Roadmap) Aug 11, 2017

Contributor

Marinlemaignan commented Aug 18, 2017

This is a rather exciting feature to add!
I was thinking that maybe we could replace plTags.js with a database-backed service that would first ask Discogs' API for a track/album/artist's metadata, then store it for later use, instead of everything being hard-coded as it is now. That way we could build something able to evolve bit by bit. Also, MongoDB is perfect for this kind of job.
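
The "ask once, then store for later use" idea could be sketched as a read-through cache. This is only an illustration under assumptions: the class name `MetadataCache` is hypothetical, an in-memory dict stands in for the MongoDB collection, and `fetch_remote` is a placeholder for whatever function would actually query Discogs.

```python
from typing import Callable, Dict, Optional

Metadata = Dict[str, object]


class MetadataCache:
    """Read-through cache: look in the local store first, call the remote API on a miss."""

    def __init__(self, fetch_remote: Callable[[str], Optional[Metadata]]):
        self._fetch_remote = fetch_remote      # hypothetical remote lookup function
        self._store: Dict[str, Metadata] = {}  # stand-in for a database collection
        self.remote_calls = 0                  # counter, useful against a rate limit

    def get(self, query: str) -> Optional[Metadata]:
        key = query.strip().lower()
        if key in self._store:
            return self._store[key]
        self.remote_calls += 1
        metadata = self._fetch_remote(key)
        if metadata is not None:
            # Only successful lookups are stored, so misses are retried later.
            self._store[key] = metadata
        return metadata
```

Repeated lookups of the same track then cost a single remote request, which matters if the remote API is rate-limited.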

Member

adrienjoly commented Aug 18, 2017

Hi @Marinlemaignan !

I'd be happy to replace plTags.js when we have a fully-functional solution that is better than the current one, while maintaining:

  • user satisfaction on the hot tracks page (which is also openwhyd's landing page)
  • and relevance of the users that are recommended based on selected genres, during user onboarding (also fed by plTags)

One way we could transition gently to a new system:

  • the new system (e.g. based on discogs) is developed outside of openwhyd's repo,
  • inspired by the use cases provided above, automated tests are written to ensure that the new system works as expected (or better),
  • when hosted online, the new system's database could be populated at the same time as plTags's system is, by plugging in a web-hook (or something like that).
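
One way to picture the transition steps above is a "shadow" hook: every posted track is tagged by both systems, the legacy result is still the one served, and disagreements are recorded for later review. This is a sketch under assumptions, not Openwhyd code: `on_track_posted`, `legacy_tagger`, `new_tagger` and the report format are all hypothetical.

```python
def on_track_posted(track, legacy_tagger, new_tagger, report):
    """Tag a track with both systems, serve the legacy genre, record any disagreement."""
    legacy_genre = legacy_tagger(track)
    try:
        new_genre = new_tagger(track)
    except Exception as error:
        # A failure of the new system must never break the existing behaviour.
        report.append({"track": track, "error": str(error)})
        return legacy_genre
    if new_genre != legacy_genre:
        report.append({"track": track, "legacy": legacy_genre, "new": new_genre})
    return legacy_genre
```

The accumulated report can feed the automated tests mentioned above, since each disagreement is a concrete case to judge one way or the other.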

What do you think?
Are you interested in working on this?

@adrienjoly
Member

adrienjoly commented Nov 2, 2017

@adrienjoly adrienjoly moved this from 📈 Roadmap to 🎈 ToDo / up next in Openwhyd's Roadmap Nov 2, 2017

Contributor

SkinyMonkey commented Nov 5, 2017

I have experimented extensively with Discogs' API.

It's very complete and extremely promising, but ... it is rate-limited to 60 requests per minute.

https://www.discogs.com/developers/#page:home,header:home-rate-limiting
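
To give a feel for what 60 requests per minute means in practice, here is a small sketch of a client-side throttle plus a back-of-the-envelope estimate. The class and function names are hypothetical, and spacing requests evenly (one per second) is a conservative assumption, not something the Discogs documentation is quoted as requiring.

```python
import time


class MinIntervalThrottle:
    """Spaces calls evenly so that at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self._interval = 60.0 / max_per_minute
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


def hours_to_fetch(request_count, max_per_minute=60):
    """Lower bound on the time needed to issue `request_count` requests, in hours."""
    return request_count / max_per_minute / 60
```

Calling `throttle.wait()` before each request keeps a batch job under the limit. At this rate, `hours_to_fetch(3600)` is one hour, so a back-catalogue of a million tracks would take more than eleven days even at one request per track.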

There is no way around this. A partnership would be the only solution, and I doubt that they would be attracted by a partnership that does not bring them anything.

What we could do is identify albums and point to their product/sale pages. They would probably not turn down a showcase as big as Openwhyd.

A solution is to host their database ourselves. There are Docker images that download their monthly dump and index it in MongoDB.
A mongo-connector to an Elasticsearch database then gives us the extra performance needed for efficient lookups.
I tried it, and it was working well.

But even then a few other problems arise:

  • YouTube/SoundCloud track names are not always the right ones (misspelled, etc.)
  • identifying the right album on Discogs is sometimes difficult; I spent a lot of time on this.
  • the album images are not available in the dump; it would be cool to display them instead of the YouTube artwork, for example
  • the links to the YouTube videos associated with an album:
    yes .. yes, they have that and it would be an amazing feature!
    imagine, you post a track and bam! You get other tracks from the same album
    but these links are not available in the dump
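
For the first two problems in the list above (noisy or misspelled titles, and picking the right release), a common starting point is to normalise titles and then score candidates with a fuzzy string ratio. The sketch below uses Python's standard `difflib`. The noise-word list, the 0.8 threshold and the function names are assumptions for illustration, not the approach actually tried in this thread.

```python
import re
from difflib import SequenceMatcher

# Hypothetical noise patterns often found in video titles.
NOISE = re.compile(r"\(.*?\)|\[.*?\]|\b(official|video|audio|hd|lyrics)\b")


def normalize(title):
    """Lowercase, drop bracketed parts and noise words, collapse punctuation and spaces."""
    text = NOISE.sub(" ", title.lower())
    text = re.sub(r"[^\w ]+", " ", text)
    return " ".join(text.split())


def best_match(video_title, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or (None, score) below threshold."""
    query = normalize(video_title)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, query, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)
```

A scan like this only makes sense over a short list of candidates, such as the results of a search against a local index. Ambiguous cases (live versions, remixes, reissues) would still need extra signals such as duration or release year.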

A solution that I studied would be scraping ... but they wouldn't like it, and what a dirty solution.

I'm not saying it's impossible, just that it's not a bulletproof approach.

Member

adrienjoly commented Nov 5, 2017

Thanks for sharing your ideas and notes with us, @SkinyMonkey!

Member

adrienjoly commented Nov 10, 2017

WIP:

Florent Piétot is currently analysing Openwhyd's data set, and thinking of ways to leverage it (e.g. use clustering and/or machine learning techniques for better genre detection and music recommendation).
