# Displaying MusicBrainz data in a timeline

One thing I thought about adding to the MusicBrainz web interface is a timeline on band pages, in order to show when members joined or left the band.

Before suggesting this change to the MusicBrainz project, we can try to display this kind of timeline in a Jupyter notebook and see what it would look like. I decided to test the timesheet-advanced.js library for this purpose.

Again, I first need to initialize a connection to my local copy of the MusicBrainz database:

In [1]:
from pprint import pprint

import pandas
import sqlalchemy
# define DB
PGHOST = "192.168.11.3"
PGDATABASE = "musicbrainz_db"
PGUSER = "musicbrainz"
PGPASSWORD = "musicbrainz"

engine = sqlalchemy.create_engine(
    'postgresql+psycopg2://{PGUSER}:{PGPASSWORD}@{PGHOST}/{PGDATABASE}'.format(**locals()),
    isolation_level='READ UNCOMMITTED')

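# helper function: run a query, binding named parameters such as
# %(band_name)s from the notebook's global variables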
def sql(query, **kwargs):
    return pandas.read_sql(query, engine, params=globals(), **kwargs)

Now I can extract the information for the band I want. The SQL query will look for:

* band name
* artists linked to this band through the "member of" relationship
* instrument/vocal role of this relationship

Let's start with some band you probably already know:

In [2]:
band_name = 'The Beatles'

The SQL query is a bit complicated because it joins a lot of different tables.
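
To get a feel for how these tables chain together before running the real query, here is a minimal sketch on a toy in-memory SQLite database. It is only an illustration under assumptions: the schema keeps just the columns used in the query, the rows are made up (not MusicBrainz data), and SQLite's `:band_name` placeholder stands in for the `%(band_name)s` style used with psycopg2.

```python
import sqlite3

# Toy schema: only the columns the query needs; the real tables have more.
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER,
                   begin_date_year INTEGER, end_date_year INTEGER);
CREATE TABLE l_artist_artist (id INTEGER PRIMARY KEY, link INTEGER,
                              entity0 INTEGER, entity1 INTEGER);
CREATE TABLE link_attribute_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link_attribute (link INTEGER, attribute_type INTEGER);

-- Made-up rows for illustration only
INSERT INTO artist VALUES (1, 'Toy Band'), (2, 'Alice'), (3, 'Bob');
INSERT INTO link_type VALUES (10, 'member of band');
INSERT INTO link_attribute_type VALUES (20, 'guitar'), (21, 'drums');
INSERT INTO link VALUES (100, 10, 2001, 2005), (101, 10, 2003, NULL);
INSERT INTO l_artist_artist VALUES (1000, 100, 2, 1), (1001, 101, 3, 1);
INSERT INTO link_attribute VALUES (100, 20), (101, 21);
""")

# Same join chain as the notebook query: band -> relationship -> member,
# then link -> attribute (the role) and link -> link type.
QUERY = """
SELECT b.name, m.name, lat.name, l.begin_date_year, l.end_date_year
FROM artist              AS b
JOIN l_artist_artist     AS laa ON laa.entity1 = b.id
JOIN artist              AS m   ON laa.entity0 = m.id
JOIN link                AS l   ON l.id = laa.link
JOIN link_attribute      AS la  ON la.link = l.id
JOIN link_attribute_type AS lat ON la.attribute_type = lat.id
JOIN link_type           AS lt  ON l.link_type = lt.id
WHERE lt.name = 'member of band'
  AND b.name = :band_name
ORDER BY m.name;
"""

rows = conn.execute(QUERY, {'band_name': 'Toy Band'}).fetchall()
for row in rows:
    print(row)
```

Each member appears once per role attribute attached to the relationship, which is why a musician with two roles shows up on two rows in the real result.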

In [3]:
df = sql("""
SELECT b.name AS band,
       m.name AS member,
       lat.name AS role,
       l.begin_date_year AS start,
       l.end_date_year AS end
FROM artist              AS b
JOIN l_artist_artist     AS laa ON laa.entity1 = b.id
JOIN artist              AS m   ON laa.entity0 = m.id
JOIN link                AS l   ON l.id = laa.link
JOIN link_attribute      AS la  ON la.link = l.id
JOIN link_attribute_type AS lat ON la.attribute_type = lat.id
JOIN link_type           AS lt  ON l.link_type = lt.id
WHERE lt.name = 'member of band'
  AND b.name = %(band_name)s
  AND lat.name != 'original';
""")
df

Unnamed: 0,band,member,role,start,end
0,The Beatles,George Harrison,lead vocals,1958.0,1970
1,The Beatles,George Harrison,guitar,1958.0,1970
2,The Beatles,Pete Best,drums,1960.0,1962
3,The Beatles,Paul McCartney,lead vocals,1957.0,1970
4,The Beatles,Paul McCartney,bass guitar,1957.0,1970
5,The Beatles,Ringo Starr,drums,1962.0,1970
6,The Beatles,Stuart Sutcliffe,bass guitar,1960.0,1962
7,The Beatles,John Lennon,lead vocals,,1970
8,The Beatles,John Lennon,guitar,,1970


Looks good. Apart from some details:

* John Lennon's role has no start date; we can set it to 1957
* these missing values made pandas treat the years as floating-point numbers; we should fix this
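
The second point can be reproduced in isolation. The sketch below uses a small made-up series of years (not the query result) to show the float conversion, the fill-then-cast fix, and, as an alternative that this notebook does not use, pandas' nullable `Int64` dtype, which keeps the gap instead of inventing a year.

```python
import pandas as pd

# A missing value forces an integer column to float64.
years = pd.Series([1958, None, 1960])
print(years.dtype)

# Fix used in this notebook: fill the gap, then cast back to integers.
filled = years.fillna(1957).astype(int)
print(filled.tolist())

# Alternative (assumption, not used here): the nullable integer dtype
# keeps the value missing while still displaying whole years.
nullable = years.astype('Int64')
print(nullable.isna().tolist())
```

Filling with a fixed year is fine for a quick visual test, but the nullable dtype may be preferable when the missing date should stay visible as missing.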

In [4]:
df['start'] = df['start'].fillna(1957).astype(int)
df['start'] = df['start'].astype(str)
df['end'] = df['end'].astype(str)
df

Unnamed: 0,band,member,role,start,end
0,The Beatles,George Harrison,lead vocals,1958,1970
1,The Beatles,George Harrison,guitar,1958,1970
2,The Beatles,Pete Best,drums,1960,1962
3,The Beatles,Paul McCartney,lead vocals,1957,1970
4,The Beatles,Paul McCartney,bass guitar,1957,1970
5,The Beatles,Ringo Starr,drums,1962,1970
6,The Beatles,Stuart Sutcliffe,bass guitar,1960,1962
7,The Beatles,John Lennon,lead vocals,1957,1970
8,The Beatles,John Lennon,guitar,1957,1970


To display the timeline inside this notebook, we need to load the JS/CSS sources of the timesheet-advanced package:

In [5]:
from IPython.display import HTML
HTML("""
<link rel="stylesheet" type="text/css" href="./timesheet/timesheet.min.css" />
<script type="text/javascript" src="./timesheet/timesheet-advanced.min.js"></script>
""")

The timesheet-advanced package expects the timeline input data in a slightly different shape from what we have in our dataframe df. We need a 'label' field (we'll use the band member's name plus the role) and a 'type' field, which is a color. We choose one color for each possible role (vocals, guitar, drums...).

In [6]:
df['label'] = df['member'] + ' (' + df['role'] + ')'
df

Unnamed: 0,band,member,role,start,end,label
0,The Beatles,George Harrison,lead vocals,1958,1970,George Harrison (lead vocals)
1,The Beatles,George Harrison,guitar,1958,1970,George Harrison (guitar)
2,The Beatles,Pete Best,drums,1960,1962,Pete Best (drums)
3,The Beatles,Paul McCartney,lead vocals,1957,1970,Paul McCartney (lead vocals)
4,The Beatles,Paul McCartney,bass guitar,1957,1970,Paul McCartney (bass guitar)
5,The Beatles,Ringo Starr,drums,1962,1970,Ringo Starr (drums)
6,The Beatles,Stuart Sutcliffe,bass guitar,1960,1962,Stuart Sutcliffe (bass guitar)
7,The Beatles,John Lennon,lead vocals,1957,1970,John Lennon (lead vocals)
8,The Beatles,John Lennon,guitar,1957,1970,John Lennon (guitar)


In [7]:
colors = dict(zip(list(set(df['role'])), ['red', 'blue', 'yellow', 'green']))
print('Correspondence between colors and roles: {}'.format(colors))
df['type'] = df['role'].apply(lambda role: colors[role])
df

Correspondence between colors and roles: {'guitar': 'red', 'bass guitar': 'blue', 'lead vocals': 'yellow', 'drums': 'green'}


Unnamed: 0,band,member,role,start,end,label,type
0,The Beatles,George Harrison,lead vocals,1958,1970,George Harrison (lead vocals),yellow
1,The Beatles,George Harrison,guitar,1958,1970,George Harrison (guitar),red
2,The Beatles,Pete Best,drums,1960,1962,Pete Best (drums),green
3,The Beatles,Paul McCartney,lead vocals,1957,1970,Paul McCartney (lead vocals),yellow
4,The Beatles,Paul McCartney,bass guitar,1957,1970,Paul McCartney (bass guitar),blue
5,The Beatles,Ringo Starr,drums,1962,1970,Ringo Starr (drums),green
6,The Beatles,Stuart Sutcliffe,bass guitar,1960,1962,Stuart Sutcliffe (bass guitar),blue
7,The Beatles,John Lennon,lead vocals,1957,1970,John Lennon (lead vocals),yellow
8,The Beatles,John Lennon,guitar,1957,1970,John Lennon (guitar),red
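
One caveat with the color assignment above: `zip` stops at the shorter input, so a band with more than four roles would leave some roles without a color, and iterating over a `set` does not guarantee the same role-to-color mapping between runs. A minimal sketch of a more defensive variant follows; `assign_colors` and `PALETTE` are hypothetical names, and reusing colors once the palette is exhausted is a design choice of this sketch, not something the timesheet library requires.

```python
from itertools import cycle

PALETTE = ['red', 'blue', 'yellow', 'green']

def assign_colors(roles, palette=PALETTE):
    """Map each distinct role to a color, reusing colors if roles outnumber them."""
    unique_roles = sorted(set(roles))  # sorted for a stable mapping
    # cycle() repeats the palette, so zip stops only when roles run out.
    return dict(zip(unique_roles, cycle(palette)))

print(assign_colors(['guitar', 'drums', 'guitar']))
```

With this helper, the dataframe column could be filled with `df['role'].map(assign_colors(df['role']))`.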


Good. The last preparation step is to transform this Python data structure into a JavaScript one that the timesheet library can read. We're going to use the fact that the printed form of a Python list of dictionaries is very close to a JavaScript array of objects.

In [8]:
# one dictionary per dataframe row
bubbles = df.to_dict(orient='records')
pprint(bubbles)

[{'band': 'The Beatles',
  'end': '1970',
  'label': 'George Harrison (lead vocals)',
  'member': 'George Harrison',
  'role': 'lead vocals',
  'start': '1958',
  'type': 'yellow'},
 {'band': 'The Beatles',
  'end': '1970',
  'label': 'George Harrison (guitar)',
  'member': 'George Harrison',
  'role': 'guitar',
  'start': '1958',
  'type': 'red'},
 {'band': 'The Beatles',
  'end': '1962',
  'label': 'Pete Best (drums)',
  'member': 'Pete Best',
  'role': 'drums',
  'start': '1960',
  'type': 'green'},
 {'band': 'The Beatles',
  'end': '1970',
  'label': 'Paul McCartney (lead vocals)',
  'member': 'Paul McCartney',
  'role': 'lead vocals',
  'start': '1957',
  'type': 'yellow'},
 {'band': 'The Beatles',
  'end': '1970',
  'label': 'Paul McCartney (bass guitar)',
  'member': 'Paul McCartney',
  'role': 'bass guitar',
  'start': '1957',
  'type': 'blue'},
 {'band': 'The Beatles',
  'end': '1970',
  'label': 'Ringo Starr (drums)',
  'member': 'Ringo Starr',
  'role': 'drums',
  'start': '

Perfect. Time to write some JavaScript. The Jupyter notebook can display JavaScript plots in an output cell through the *element.append* call, available in *%%javascript* cells. I show that cell (no. 9) further down so that the JavaScript code (no. 10) stays above its output, but the *element.append* code must be executed **before** the "new Timesheet" code (so 9 before 10).

Last step: we call the Timesheet JavaScript constructor using the CSS/JS libraries loaded above, our input data (referred to below as 'bubbles'), the cell where we want our graph, and the timeline limits (min and max year). Executing the next cell automatically fills the output of cell 9, which holds the 'timesheet-container' element.

In [10]:
from IPython.display import Javascript
Javascript("""
var bubbles = %s;
new Timesheet(bubbles, {
    container: 'timesheet-container',
    type: 'parallel',
    //type: 'serial',
    timesheetYearMin: %s,
    timesheetYearMax: %s,
    theme: 'light'
});
""" % (bubbles, df['start'].min(), df['end'].max()))

<IPython.core.display.Javascript object>
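
The `%s` substitution above works because the printed Python list happens to be valid JavaScript for plain strings. Values such as `None`, `True`, or NaN would not survive that conversion. A minimal sketch of a safer variant, assuming the same Timesheet options as in the cell above, serializes the data with `json.dumps`; `build_timesheet_js` and `JS_TEMPLATE` are hypothetical names introduced here.

```python
import json

JS_TEMPLATE = """
var bubbles = {bubbles};
new Timesheet(bubbles, {{
    container: {container},
    type: 'parallel',
    timesheetYearMin: {year_min},
    timesheetYearMax: {year_max},
    theme: 'light'
}});
"""

def build_timesheet_js(bubbles, container, year_min, year_max):
    """Return the JavaScript source, with data serialized as JSON."""
    return JS_TEMPLATE.format(
        bubbles=json.dumps(bubbles),      # valid JavaScript literal
        container=json.dumps(container),  # quoted and escaped string
        year_min=int(year_min),           # years are emitted as numbers
        year_max=int(year_max),
    )

# Made-up record for illustration; the apostrophe is escaped safely.
example = [{'label': "O'Brien (guitar)", 'start': '1960',
            'end': '1962', 'type': 'red'}]
print(build_timesheet_js(example, 'timesheet-container', '1960', '1962'))
```

The returned string can be passed to `IPython.display.Javascript` in the same way as the hand-formatted one.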

In [9]:
%%javascript
// this must be executed before the "from IPython.display import Javascript" block
element.append('<div id="timesheet-container" style="width: 100%;height: 100%;"></div>');

<IPython.core.display.Javascript object>

We have our timeline now! As you can see, each role consistently uses the same color.

## Second example

Let's repeat our code on a second band, this time a classical string quartet.

In [11]:
band_name = 'Beethoven String Quartet'

In [12]:
def prepare_data(band_name):
    # Bind band_name explicitly: the sql() helper reads its parameters from
    # globals() and would ignore this function's argument.
    df = pandas.read_sql("""
SELECT b.name AS band,
       m.name AS member,
       lat.name AS role,
       l.begin_date_year AS start,
       l.end_date_year AS end
FROM artist              AS b
JOIN l_artist_artist     AS laa ON laa.entity1 = b.id
JOIN artist              AS m   ON laa.entity0 = m.id
JOIN link                AS l   ON l.id = laa.link
JOIN link_attribute      AS la  ON la.link = l.id
JOIN link_attribute_type AS lat ON la.attribute_type = lat.id
JOIN link_type           AS lt  ON l.link_type = lt.id
WHERE lt.name = 'member of band'
  AND b.name = %(band_name)s
  AND lat.name != 'original';
""", engine, params={'band_name': band_name})
    # 1957 is the default start year chosen for The Beatles above
    df['start'] = df['start'].fillna(1957).astype(int)
    df['start'] = df['start'].astype(str)
    df['end'] = df['end'].astype(str)
    df['label'] = df['member'] + ' (' + df['role'] + ')'
    colors = dict(zip(list(set(df['role'])), ['red', 'blue', 'yellow', 'green']))
    df['type'] = df['role'].apply(lambda role: colors[role])
    # one dictionary per row, plus the timeline limits
    return df.to_dict(orient='records'), df['start'].min(), df['end'].max()

In [13]:
prepare_data(band_name)

([{'band': 'Beethoven String Quartet',
   'end': '1965',
   'label': 'Василий Ширинский (violin)',
   'member': 'Василий Ширинский',
   'role': 'violin',
   'start': '1923',
   'type': 'blue'},
  {'band': 'Beethoven String Quartet',
   'end': '1977',
   'label': 'Дмитрий Цыганов (violin)',
   'member': 'Дмитрий Цыганов',
   'role': 'violin',
   'start': '1923',
   'type': 'blue'},
  {'band': 'Beethoven String Quartet',
   'end': '1990',
   'label': 'Nikolai Zabavnikov (violin)',
   'member': 'Nikolai Zabavnikov',
   'role': 'violin',
   'start': '1965',
   'type': 'blue'},
  {'band': 'Beethoven String Quartet',
   'end': '1990',
   'label': 'Олег Крыса (violin)',
   'member': 'Олег Крыса',
   'role': 'violin',
   'start': '1977',
   'type': 'blue'},
  {'band': 'Beethoven String Quartet',
   'end': '1974',
   'label': 'Сергей Петрович Ширинский (cello)',
   'member': 'Сергей Петрович Ширинский',
   'role': 'cello',
   'start': '1923',
   'type': 'yellow'},
  {'band': 'Beethoven String Q

In [15]:
from IPython.display import Javascript
Javascript("""
var bubbles = %s;
new Timesheet(bubbles, {
    container: 'timesheet-container2',
    type: 'parallel',
    timesheetYearMin: %s,
    timesheetYearMax: %s,
    theme: 'light'
});
""" % prepare_data(band_name))

<IPython.core.display.Javascript object>

In [14]:
%%javascript
// this must be executed before the "from IPython.display import Javascript" block
element.append('<div id="timesheet-container2" style="width: 100%;height: 100%;"></div>');

<IPython.core.display.Javascript object>

Use the slider to see when members changed in the string quartet.

## Conclusion

With this notebook, I hope I have shown that testing a new plotting library to display MusicBrainz data is not very complicated.