# Visualizing a piano teaching "family tree"

In this notebook I want to build the data structure needed for a "teaching family tree", i.e. a graph of "X taught Y" relations.

As usual, my boilerplate code:

In [1]:
%matplotlib inline
import pandas
import sqlalchemy
# define DB
PGHOST = "192.168.11.2"
PGDATABASE = "musicbrainz_db"
PGUSER = "musicbrainz"
PGPASSWORD = "musicbrainz"

engine = sqlalchemy.create_engine(
    'postgresql+psycopg2://{PGUSER}:{PGPASSWORD}@{PGHOST}/{PGDATABASE}'.format(**locals()),
    isolation_level='READ UNCOMMITTED')

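# helper function: run a query and return the result as a DataFrame.
# The notebook globals are passed as query parameters, so a placeholder
# such as %(artist_name)s refers to the Python variable artist_name.
def sql(query, **kwargs):
    return pandas.read_sql(query, engine, params=globals(), **kwargs)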

Let's start simply by finding the teachers and students of [Heinrich Neuhaus](https://en.wikipedia.org/wiki/Heinrich_Neuhaus).

In [2]:
artist_name = 'Heinrich Neuhaus'

In [3]:
artist = sql("""
SELECT a.id,
       a.gid AS mbid,
       a.name,
       aa.name,
       to_date(to_char(a.begin_date_year, '9999') || 
               to_char(a.begin_date_month, '99') || 
               to_char(a.begin_date_day, '99'), 'YYYY MM DD') AS start,
       to_date(to_char(a.end_date_year, '9999') || 
               to_char(a.end_date_month, '99') || 
               to_char(a.end_date_day, '99'), 'YYYY MM DD') AS end
  FROM artist       AS a
  JOIN artist_alias AS aa ON aa.artist=a.id
 WHERE aa.name = %(artist_name)s;
""")
artist_id = artist['id'][0]
artist_mbid = artist['mbid'][0]
artist

|   | id | mbid | name | name.1 | start | end |
|---|----|------|------|--------|-------|-----|
| 0 | 605778 | 2b075237-6e90-4e78-a4f8-a66170c682fe | Генрих Густавович Нейгауз | Heinrich Neuhaus | 1888-04-12 | 1964-10-10 |
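
A side note on the `start` and `end` columns: the query glues the year, month and day columns together before parsing them. As far as I can tell, in PostgreSQL a NULL part makes the whole concatenation NULL, so an artist with only a birth year recorded would get no date at all. Here is a small Python sketch of the same all-or-nothing rule; `partial_date` is a hypothetical helper of mine, not something from the database or from pandas.

```python
from datetime import date
from typing import Optional


def partial_date(year: Optional[int],
                 month: Optional[int],
                 day: Optional[int]) -> Optional[date]:
    """Return a date only when year, month and day are all known."""
    if year is None or month is None or day is None:
        # Same behaviour as the SQL expression: one missing part, no date
        return None
    return date(year, month, day)


print(partial_date(1888, 4, 12))        # prints 1888-04-12
print(partial_date(1888, None, None))   # prints None
```

If partial dates mattered for the graph, keeping the three parts as separate columns would probably be the simpler choice.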


We'll keep the MBID for later.

Let's explore MusicBrainz relations to find the proper "X taught Y":

In [4]:
relation = sql("""
SELECT id, name, description
  FROM link_type
 WHERE description ILIKE '%%teach%%';
""")
relation_id = relation['id'][0]
relation

|   | id | name | description |
|---|----|------|-------------|
| 0 | 847 | teacher | This relationship indicates that a person was ... |
| 1 | 893 | teacher | This relationship indicates that a person was ... |


So the one we need is the relation with id=847 (the other one is for events).

Let's now find the direct students:

In [5]:
df_students = sql("""
SELECT student.name,
       student.sort_name,
       student.gid AS mbid
  FROM artist          AS student
  JOIN l_artist_artist AS laa     ON laa.entity1 = student.id
  JOIN artist          AS teacher ON laa.entity0 = teacher.id
  JOIN link            AS l       ON l.id = laa.link
 WHERE l.link_type = 847  -- keep only the artist-artist "teacher" relation
   AND teacher.gid = %(artist_mbid)s;
""")
df_students

|   | name | sort_name | mbid |
|---|------|-----------|------|
| 0 | Святослав Теофилович Рихтер | Richter, Sviatoslav Teofilovich | 2014bfbb-c65d-45dc-9973-35bead3833fa |
| 1 | Эмиль Григорьевич Гилельс | Gilels, Emil Grigoryevich | 88b4ad33-63ba-4923-947e-26a720631156 |
| 2 | Radu Lupu | Lupu, Radu | bec02ec4-34be-43db-9be1-006606174fd2 |
| 3 | Алексей Любимов | Lubimov, Alexei | 385cb39b-f06c-4635-84fd-ecc8fff56c97 |
| 4 | Анатолий Иванович Ведерников | Vedernikov, Anatoly Ivanovich | fcfc3731-3b67-4b1d-8096-6a1537b61daf |
| 5 | Valentina Kameníková | Kameníková, Valentina | ab095a51-18c1-43ab-ba96-2b1c8cdb7019 |
| 6 | Ryszard Bakst | Bakst, Ryszard | d1cea2e7-0ef6-4434-8890-61f22f85827c |
| 7 | Heljo Sepp | Sepp, Heljo | a460e14c-a004-4ead-be61-8d3956c3de17 |
| 8 | Yuliy Meitus | Meitus, Yuliy | ff5ce524-7794-4e84-ad42-a586c29c4f90 |


So we have a few of them in MusicBrainz.

And teachers:

In [6]:
df_teachers = sql("""
SELECT teacher.name,
       teacher.sort_name,
       teacher.gid AS mbid
  FROM artist          AS student
  JOIN l_artist_artist AS laa     ON laa.entity1 = student.id
  JOIN artist          AS teacher ON laa.entity0 = teacher.id
  JOIN link            AS l       ON l.id = laa.link
 WHERE l.link_type = 847  -- keep only the artist-artist "teacher" relation
   AND student.gid = %(artist_mbid)s;
""")
df_teachers

|   | name | sort_name | mbid |
|---|------|-----------|------|
| 0 | Leopold Godowsky | Godowsky, Leopold | f5b8d14c-3adb-4cd3-aa97-140a557c7302 |
| 1 | Karl Heinrich Barth | Barth, Karl Heinrich | 695de63f-d20a-4e8f-bba9-2ae645f109a9 |
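
Both queries above go exactly one generation away from Neuhaus. To grow this into a tree with several generations, the student lookup could be repeated for each newly found student. Here is a rough sketch of that idea as a breadth-first walk. `descendants`, `students_of` and `max_depth` are names I made up for illustration, and the toy dictionary stands in for the database query (the letters are placeholders, not real artists).

```python
from collections import deque


def descendants(students_of, root, max_depth=2):
    """Collect 'X taught Y' edges up to max_depth generations below root."""
    edges = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        teacher, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look up students beyond the depth limit
        for student in students_of(teacher):
            edges.append({'from': teacher, 'to': student})
            if student not in seen:
                # visit each person once, so loops in the data cannot hang us
                seen.add(student)
                queue.append((student, depth + 1))
    return edges


# Hypothetical data: A taught B and C, B taught D, D taught E
toy = {'A': ['B', 'C'], 'B': ['D'], 'D': ['E']}
tree = descendants(lambda mbid: toy.get(mbid, []), 'A', max_depth=2)
```

In the notebook, `students_of` would wrap the student query and return the MBIDs from the resulting DataFrame; the depth limit keeps the number of database round trips under control.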


Now I want to use the [vis.js](http://visjs.org) library to display a graph. My graph requires two JavaScript arrays:

* 'nodes', to store the node labels and identifiers (I chose to use MBIDs)
* 'edges', to store the links between two MBIDs corresponding to one 'X taught Y' relation

In [8]:
from pprint import pprint

# one node per artist, identified by its MBID
nodes = []
nodes.append({'id': str(artist_mbid), 'label': artist_name})
nodes.extend([{'id': str(student.mbid), 'label': student.sort_name}
              for student in df_students.itertuples()])
nodes.extend([{'id': str(teacher.mbid), 'label': teacher.sort_name}
              for teacher in df_teachers.itertuples()])
pprint(nodes)

# one edge per relation, pointing from the teacher to the student
edges = []
edges.extend([{'from': str(artist_mbid), 'to': str(student.mbid)}
              for student in df_students.itertuples()])
edges.extend([{'from': str(teacher.mbid), 'to': str(artist_mbid)}
              for teacher in df_teachers.itertuples()])
pprint(edges)

[{'id': '2b075237-6e90-4e78-a4f8-a66170c682fe', 'label': 'Heinrich Neuhaus'},
 {'id': '2014bfbb-c65d-45dc-9973-35bead3833fa',
  'label': 'Richter, Sviatoslav Teofilovich'},
 {'id': '88b4ad33-63ba-4923-947e-26a720631156',
  'label': 'Gilels, Emil Grigoryevich'},
 {'id': 'bec02ec4-34be-43db-9be1-006606174fd2', 'label': 'Lupu, Radu'},
 {'id': '385cb39b-f06c-4635-84fd-ecc8fff56c97', 'label': 'Lubimov, Alexei'},
 {'id': 'fcfc3731-3b67-4b1d-8096-6a1537b61daf',
  'label': 'Vedernikov, Anatoly Ivanovich'},
 {'id': 'ab095a51-18c1-43ab-ba96-2b1c8cdb7019',
  'label': 'Kameníková, Valentina'},
 {'id': 'd1cea2e7-0ef6-4434-8890-61f22f85827c', 'label': 'Bakst, Ryszard'},
 {'id': 'a460e14c-a004-4ead-be61-8d3956c3de17', 'label': 'Sepp, Heljo'},
 {'id': 'ff5ce524-7794-4e84-ad42-a586c29c4f90', 'label': 'Meitus, Yuliy'},
 {'id': 'f5b8d14c-3adb-4cd3-aa97-140a557c7302', 'label': 'Godowsky, Leopold'},
 {'id': '695de63f-d20a-4e8f-bba9-2ae645f109a9',
  'label': 'Barth, Karl Heinrich'}]
[{'from': '2b075237-6e90

The output graph is shown on [github.io](https://loujine.github.io/musicbrainz-dataviz/).
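
To go from the Python lists to something vis.js can draw, the two lists have to end up as JavaScript arrays. Here is a minimal sketch of one way to do that with `json.dumps`; it is not necessarily how the published page was built. `to_visjs_script` and the `graph` container id are assumptions of mine, and the sample lists only repeat two nodes and one edge from the cell above. As far as I know `vis.DataSet` rejects duplicate ids, so the sketch keeps only the first node per MBID.

```python
import json

# Two nodes and one edge taken from the lists built above
nodes = [
    {'id': '2b075237-6e90-4e78-a4f8-a66170c682fe', 'label': 'Heinrich Neuhaus'},
    {'id': 'f5b8d14c-3adb-4cd3-aa97-140a557c7302', 'label': 'Godowsky, Leopold'},
]
edges = [
    {'from': 'f5b8d14c-3adb-4cd3-aa97-140a557c7302',
     'to': '2b075237-6e90-4e78-a4f8-a66170c682fe'},
]


def to_visjs_script(nodes, edges, container_id='graph'):
    """Return JavaScript source that draws the graph with vis.js."""
    # keep the first node seen for each id
    unique = {}
    for node in nodes:
        unique.setdefault(node['id'], node)
    return '\n'.join([
        'var nodes = new vis.DataSet(%s);'
        % json.dumps(list(unique.values()), ensure_ascii=False),
        'var edges = new vis.DataSet(%s);' % json.dumps(edges),
        'var container = document.getElementById(%s);' % json.dumps(container_id),
        "var network = new vis.Network(container, "
        "{nodes: nodes, edges: edges}, {edges: {arrows: 'to'}});",
    ])


script = to_visjs_script(nodes, edges)
```

The resulting string assumes a page that already loads vis.js and contains an element with the chosen id; the `arrows: 'to'` option draws each edge from teacher to student.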