In [1]:
import pandas as pd
from plotly import express
from warnings import filterwarnings
filterwarnings(action='ignore', category=FutureWarning)

GUYS = '/kaggle/input/top-classical-composers/classical_composers.csv'
# We used charset_normalizer's detect() to find the encoding; now that we
# know it, we no longer need that code
df = pd.read_csv(filepath_or_buffer=GUYS, encoding='hp_roman8')
# the file has a spurious row at the top (index label 0), so drop it
df = df[df.index != 0]
# strip stray whitespace from the column names
df.columns = df.columns.str.strip()
# midpoint of each composer's lifespan, used later to place them in time
df['mid'] = 0.5 * (df['Born'] + df['Died'])
df.head()

| Unnamed: 0 | Composer | Nationality | Born | Died | Biggest Piece | Duration of Biggest Piece(mins) | mid |
|---|---|---|---|---|---|---|---|
| 1 | Ludwig van Beethoven | German | 1770.0 | 1791.0 | Symphony No. 9 | 65.0 | 1780.5 |
| 2 | Wolfgang Amadeus Mozart | Austrian | 1756.0 | 1791.0 | Symphony No.41 | 33.0 | 1773.5 |
| 3 | Johann Sebastian Bach | German | 1685.0 | 1750.0 | Mass in B minor | 125.0 | 1717.5 |
| 4 | Richard Wagner | German | 1813.0 | 1883.0 | Der Ring des Nibelungen | NaN | 1848.0 |
| 5 | Joseph Haydn | Austrian | 1732.0 | 1809.0 | Symphony No. 45 | 25.0 | 1770.5 |
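The cleanup steps in the first cell can be tried out on a small toy frame. This is a minimal sketch, assuming a hypothetical raw frame (called raw here) whose first row is junk and whose column names carry stray spaces, mimicking the problems described in the comments; the composer years are copied from the df.head() output. It also shows drop(index=0) as an equivalent to the boolean mask, and columns.str.strip() as the vectorized way to strip every column name.

```python
import pandas as pd

# Hypothetical raw frame: a junk first row and padded column names,
# standing in for the real CSV (the years come from the df.head() output).
raw = pd.DataFrame(
    {
        " Composer ": ["???", "Johann Sebastian Bach", "Joseph Haydn"],
        "Born ": [0.0, 1685.0, 1732.0],
        " Died": [0.0, 1750.0, 1809.0],
    }
)

# Dropping the row by its index label gives the same result as the boolean mask.
clean = raw.drop(index=0)

# Vectorized string method: strips whitespace from every column name.
clean.columns = clean.columns.str.strip()

# Midpoint of each composer's lifespan.
clean["mid"] = 0.5 * (clean["Born"] + clean["Died"])

print(clean)
```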


We would like to see all of our composers on a timeline, but our years are stored as plain numbers rather than dates, and even as dates some of them could fall outside the range that pandas' default nanosecond-resolution timestamps can represent (roughly 1677 to 2262). So we need to be a little creative: let's use a scatter plot of the birth and death years and make it tall enough to give every composer a row.

In [2]:
express.scatter(data_frame=df.sort_values(by='Born'), y='Composer', x=['Born', 'Died'], height=1800, color_discrete_map={'Born': 'green', 'Died': 'red'})
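Two details behind this cell can be checked directly in pandas. The sketch below prints the bounds of pandas' default nanosecond-resolution Timestamp, then reshapes a toy two-composer frame (years from the df.head() output) into the long form that plotly's wide-form x=['Born', 'Died'] builds for us behind the scenes. The names long_df, event and year are illustrative choices, not anything from the original notebook.

```python
import pandas as pd

# Bounds of pandas' default datetime64[ns] Timestamp (roughly 1677 to 2262).
print(pd.Timestamp.min, pd.Timestamp.max)

# Toy rows; the years come from the df.head() output above.
df = pd.DataFrame(
    {
        "Composer": ["Johann Sebastian Bach", "Joseph Haydn"],
        "Born": [1685.0, 1732.0],
        "Died": [1750.0, 1809.0],
    }
)

# Long form: one row per (composer, event), with the year as a plain number.
long_df = df.melt(
    id_vars="Composer",
    value_vars=["Born", "Died"],
    var_name="event",
    value_name="year",
)
print(long_df)
```

Passing that frame as express.scatter(data_frame=long_df, x='year', y='Composer', color='event') should give essentially the same chart as the wide-form call, with the legend titled event instead of variable.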

We can also color the points by nationality, which is kind of interesting.

In [3]:
express.scatter(data_frame=df.sort_values(by='Born'), y='Composer', x=['Born', 'Died'], height=1800, color='Nationality')
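The colored chart invites a numeric summary per nationality. Here is a minimal sketch using pandas named aggregation; the toy rows are four of the composers from the df.head() output, and the output column names (composers, earliest_born, latest_died) are illustrative. On the real data, the same chain would run on df directly.

```python
import pandas as pd

# Toy rows taken from the df.head() output above.
df = pd.DataFrame(
    {
        "Composer": [
            "Wolfgang Amadeus Mozart",
            "Johann Sebastian Bach",
            "Richard Wagner",
            "Joseph Haydn",
        ],
        "Nationality": ["Austrian", "German", "German", "Austrian"],
        "Born": [1756.0, 1685.0, 1813.0, 1732.0],
        "Died": [1791.0, 1750.0, 1883.0, 1809.0],
    }
)

# One row per nationality: how many composers and the span of years covered.
summary = (
    df.groupby("Nationality")
    .agg(
        composers=("Composer", "count"),
        earliest_born=("Born", "min"),
        latest_died=("Died", "max"),
    )
    .sort_values("earliest_born")
)
print(summary)
```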

Another thing we can try is to get a sense of whether compositions have gotten longer or shorter over time. We don't have the date of composition or first performance of each composer's biggest piece, but we can place each composer in time using the midpoint of their birth and death years (the mid column computed above).

In [4]:
express.scatter(data_frame=df, x='mid', y='Duration of Biggest Piece(mins)', color='Nationality', hover_name='Composer', hover_data=['Biggest Piece'])
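To put a rough number on that trend, one option is a rank (Spearman) correlation between mid and duration. This is an assumption on our part, not something the notebook computes: the sketch ranks both columns and takes the Pearson correlation of the ranks, which is what Spearman's coefficient is. The toy values are rows from the df.head() output, with Wagner's missing duration dropped; three usable points are far too few to conclude anything, so this only illustrates the calculation. The helper name spearman is hypothetical.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: Pearson correlation of the (average) ranks."""
    # Keep only rows where both values are present.
    pair = pd.concat([x, y], axis=1).dropna()
    # Rank each column separately; ties get their average rank.
    ranks = pair.rank()
    return ranks.iloc[:, 0].corr(ranks.iloc[:, 1])


# Toy values from the df.head() output (Wagner's duration is missing).
mid = pd.Series([1773.5, 1717.5, 1848.0, 1770.5])
duration = pd.Series([33.0, 125.0, None, 25.0])

rho = spearman(mid, duration)
print(round(rho, 3))  # -0.5 on these three complete points
```

On the full data the call would be spearman(df['mid'], df['Duration of Biggest Piece(mins)']); a value near 0 would suggest no monotonic trend.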

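If we bucket the midpoints into roughly half-century bins, we see that our composers mostly flourished between 1800 and 1950. (Plotly treats nbins as a maximum number of bins and picks round bin widths itself.)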

In [5]:
express.histogram(data_frame=df, x='mid', nbins=20, color='Nationality')
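Because plotly chooses the bin width itself, the buckets in this chart are not guaranteed to be exactly 50 years. To count composers in exact half-century buckets, one option (a sketch, not the notebook's method) is to floor each midpoint to a multiple of 50. The toy midpoints are mid values from the df.head() output.

```python
import pandas as pd

# Toy midpoints taken from the mid column of the df.head() output.
mid = pd.Series([1773.5, 1717.5, 1848.0, 1770.5])

# Floor each midpoint to the start of its half-century, e.g. 1773.5 -> 1750.
bucket = (mid // 50 * 50).astype(int)

# Number of composers per half-century, in chronological order.
counts = bucket.value_counts().sort_index()
print(counts)
```

On the full data, call df['mid'].dropna() first, since astype(int) fails on missing values. For the chart itself, we believe plotly histogram traces also accept an explicit width via fig.update_traces(xbins=dict(size=50)).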

In [6]:
express.histogram(data_frame=df, x='Duration of Biggest Piece(mins)', nbins=28, marginal='box')

It's interesting to see that our durations start to look roughly bell-shaped, something like a Gaussian distribution; perhaps the shape would look more Gaussian if we had thousands of composers.
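One rough way to quantify how Gaussian the durations look is sample skewness and excess kurtosis, both of which are near 0 for a normal distribution. This sketch uses pandas' Series.skew() and Series.kurt() (Fisher's definition, so a normal distribution gives 0), which skip missing values by default. The toy durations come from the df.head() output, and the helper name shape_summary is hypothetical. With so few values the numbers only illustrate the calculation; on the full data one would pass df['Duration of Biggest Piece(mins)'] instead.

```python
import pandas as pd


def shape_summary(values: pd.Series) -> dict:
    """Hypothetical helper: sample skewness and excess kurtosis."""
    return {
        "n": int(values.count()),  # non-missing values only
        "skew": float(values.skew()),  # > 0 means a longer right tail
        "excess_kurtosis": float(values.kurt()),  # Fisher: normal -> 0
    }


# Toy durations in minutes from the df.head() output (Wagner's is missing).
durations = pd.Series([65.0, 33.0, 125.0, None, 25.0])
print(shape_summary(durations))
```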