Structure in Artist-Listener Data

This repository contains code to train a model that returns listener features for artists. It can be used as a base model for creating more complex models to generate artist insights (e.g., geographic or demographic recommendations).

Install

Ensure you're using Python 3.6. Then simply pull down the code and run

python -m venv env
source env/bin/activate
pip install .

Run

This software assumes you are using Snowflake as your data warehouse. It can always be reworked to use another data source, as long as you have the tables described below.

Copy .env.shadow to .env and fill in with your Snowflake and neo4j connection parameters. The three source tables referenced in config.py need to exist in the Snowflake schema you specified. These are:

USER_ARTIST_STREAMS_FILTERED_WITH_IDS: A table with columns artistname, artistnum, userid, usernum, and total_streams. Each row represents the total times a given listener streamed a given artist.
ARTIST_IDS_SINGLE_GENRE: A table with columns artistname, countrylist, and genreid. This represents the countries an artist has been played in, as well as their genre. countrylist is a comma-separated list of country names.
COUNTRY_ARTIST_STREAMS_BY_STORE: A table with columns artistnum, artistname, countryname, storename, and total_streams. Each row is the total number of streams an artist received in a given country and "store" (e.g., Spotify, iTunes).

Train

To train the model, simply run

artist_listener_train

This will save the features and artist clusters in features_clusters.p by default.

Inference

You can use the model to infer similar artists or generate geographic recommendations.

artist_listener_insights similarity "<artist_name_1,artist_name_2,...>"

Example output looks like

{
  "artist_name_1": [
    {
      "name": "similar_artist_name_1",
      "similarity": 0.8724746619981572
    },
    {
      "name": "similar_artist_name_2",
      "similarity": 0.8723650858404651
    },
    ...
  ],
  "artist_name_2": [
    {
      "name": "similar_artist_name_1",
      "similarity": 0.9528950042946587
    },
    {
      "name": "similar_artist_name_2",
      "similarity": 0.9339451879861346
    },
    ...
  ]
}

To get geographic recommendations, run

artist_listener_insights geographic "<artist_name_1,artist_name_2,...>"

which returns

{
  "aritst_name_1": {
    "top_markets": [
      {
        "territory": "USA",
        "actual_streams": {
          "itunes/apple": 65034529,
          "pandora": 11254721,
          "spotify": 108156780,
          "google play": 5427456,
          "amazon unlimited": 24978311,
          "napster": 928452,
          "deezer": 99110,
          "total": 215879359
        },
        "modeled_streams": {
          "amazon unlimited": 1058597,
          "deezer": 6532,
          "google play": 364509,
          "itunes/apple": 3434108,
          "napster": 39947,
          "pandora": 710965,
          "spotify": 8630254,
          "total": 14244912
        }
      },
      {
        "territory": "Canada",
        "actual_streams": {
          "google play": 1093796,
          "deezer": 39841,
          "spotify": 9839118,
          "itunes/apple": 7106304,
          "napster": 2568,
          "total": 18081627
        },
        "modeled_streams": {
          "deezer": 1843,
          "google play": 98414,
          "itunes/apple": 604035,
          "napster": 122,
          "spotify": 1132072,
          "total": 1836486
        }
      },
      ...
    ],
    "recommendations": [
      {
        "territory": "Australia",
        "actual_streams": {
          "itunes/apple": 27266,
          "deezer": 476,
          "spotify": 65721,
          "amazon unlimited": 10,
          "google play": 2794,
          "total": 96267
        },
        "modeled_streams": {
          "amazon unlimited": 227,
          "deezer": 118,
          "google play": 3219,
          "itunes/apple": 34395,
          "spotify": 241024,
          "total": 278983
        }
      },
      {
        "territory": "United Kingdom",
        "actual_streams": {
          "napster": 68,
          "google play": 344,
          "deezer": 342,
          "amazon unlimited": 108,
          "itunes/apple": 917,
          "spotify": 14662,
          "total": 16441
        },
        "modeled_streams": {
          "amazon unlimited": 1849,
          "deezer": 845,
          "google play": 2272,
          "itunes/apple": 28737,
          "napster": 116,
          "spotify": 169733,
          "total": 203552
        }
      },
      ...
    ]
  },
  ...
}

Upload to Databases

To load similarity and geographic insights data for all artists in the training set into Snowflake, run

artist_listener_upload snowflake

To upload a graph of artist nodes, clusters, and similarities to neo4j, run

artist_listener_upload neo4j

This requires being connected to a graph through the neo4j browser, which can be used to visualize the model.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
artist_listener_models		artist_listener_models
.env.shadow		.env.shadow
.flake8		.flake8
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

artist_listener_models

artist_listener_models

.env.shadow

.env.shadow

.flake8

.flake8

.gitignore

.gitignore

README.md

README.md

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Structure in Artist-Listener Data

Install

Run

Train

Inference

Upload to Databases

About

Releases

Packages

Languages

jchryssanthacopoulos/artist_clustering

Folders and files

Latest commit

History

Repository files navigation

Structure in Artist-Listener Data

Install

Run

Train

Inference

Upload to Databases

About

Resources

Stars

Watchers

Forks

Languages