# Analysis of songs and their lyrics

## Installation and loading of libraries


In [24]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


# Data and sources


## Prints for Koppermaandag
"Koppermaandagprenten" dating from the 18th century onwards are held in cultural heritage institutions. These prints served as proof of the printing quality that a print shop could deliver. The workers of a print shop made a print and took it to the customers, expecting them to buy it or give a tip. This extra income was used to buy drinks on "Koppermaandag", a celebration on the first Monday after Epiphany.

In 1991, an important publication on this type of print appeared, with an overview of the surviving Koppermaandagprenten. The counts per year used here are taken from that book.

In [25]:
prentenDF = pd.read_csv("../data/koppermaandagprentenCount.csv", index_col="year")

print("Number of prints: " + str(prentenDF['count'].sum()))


Number of prints: 554


## Data about songs
Between 1848 and ca. 1914, typographical associations produced booklets with the lyrics of the songs they sang at the feasts they organized. The dataset contains a table (CSV) listing all songs in the booklets from 1848 to 1870, including title, year and writer.

In [26]:
liedjesDF = pd.read_csv("../data/liedjes.csv", dtype={'jaartal': 'Int32'})
liedjesDF = liedjesDF.sort_values(by=['songID'])

print("Number of songs:    " + str(len(liedjesDF)))
print("Number of booklets:  " + str(len(liedjesDF['sourceID'].unique())))

Number of songs:    771
Number of booklets:  64
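
The two totals above can be broken down per booklet or per year. The following is a minimal sketch, assuming the column names `sourceID` and `jaartal` used in this notebook. The helper `songs_per_group` is hypothetical, and the example rows are placeholders, not data from `liedjes.csv`.

```python
import pandas as pd

# Illustrative rows only: these IDs and years are placeholders,
# not values taken from liedjes.csv.
example = pd.DataFrame({
    "sourceID": ["bookletA", "bookletA", "bookletB",
                 "bookletC", "bookletC", "bookletC"],
    "songID": ["bookletA-01", "bookletA-02", "bookletB-01",
               "bookletC-01", "bookletC-02", "bookletC-03"],
    "jaartal": [1850, 1850, 1855, 1860, 1860, 1860],
})


def songs_per_group(df, column):
    """Return the number of rows (songs) for each distinct value in `column`."""
    return df.groupby(column).size().sort_index()


per_booklet = songs_per_group(example, "sourceID")
per_year = songs_per_group(example, "jaartal")
print(per_booklet)
print(per_year)
```

In the notebook itself, `songs_per_group(liedjesDF, 'sourceID')` would give the number of songs per booklet. By default `groupby` leaves out rows whose key is missing, so songs without a year would not appear in a per-year count.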


## Song lyrics from files
Besides the overview of the songs in a CSV file, there is a machine-readable version of the lyrics of every song. We use the following functions to process them.

In [27]:
def getlistOfFilenames(rootdir):
    # input: rootdir: directory with (subdirectories with) TXT-files to be handled
    # output: list of TXT-files(+path) lexicographically ordered on path-name

    files_all = []
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if not file.endswith('.txt'):
                continue
            fn = os.path.join(subdir, file)
            files_all.append(fn)

    files_all = sorted(files_all)
    return files_all

def getlistOfTexts(listOfFilenames):
    # input: listOfFilenames: list of TXT-files(+path) lexicographically ordered on path-name
    # output: list of texts

    texts = []
    for file in listOfFilenames:
        with open(file) as stream:
            text = stream.read()
        texts.append(text)

    return texts



We use the above functions to read the lyrics.

In [28]:
liedjesFilenames    = getlistOfFilenames('../data/lyrics')
liedjes             = getlistOfTexts(liedjesFilenames)

To check that the code worked, we look at the data for the song at position `n`.

In [29]:
n = 100

print("-- data: --")
print(liedjesDF.iloc[n])
print("-- path: --")
print(liedjesFilenames[n])
print("-- song: --")
print(liedjes[n])


-- data: --
typoID                                                      amsterdam1849
sourceID                                  amsterdam1849-feestliederen1862
songID                                 amsterdam1849-feestliederen1862-06
titel                   De boekdrukkunst beschouwd als het licht der v...
wijze                                   Makkers brengt met stem en snaren
jaartal                                                              1862
schrijver                                                        H. Stühr
vereniging_schrijver                                        amsterdam1849
Name: 164, dtype: object
-- path: --
data/lyrics/amsterdam1849/amsterdam1849-feestliederen1862/amsterdam1849-feestliederen1862-06.txt
-- song: --
De boekdrukkunst

Is de zaal thans weêr ontsloten,
Voor het jaarlijks Kopperfeest,
Laat ons dan dit feest vergrooten,
Met een regt verheugden geest;
Voegt u hier dan, Typographen !
Door geen bange zorg gedrukt,
Dat gij Costers roem blijft staven,
Nu 
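
Looking at a single position only checks one song. The table is sorted by `songID` and the files are sorted by full path, so the two lists are ordered independently and could drift apart. The following is a minimal sketch of a check over all positions. It assumes that every lyrics file is named after its `songID` plus `.txt`, as the example above suggests; the helper `find_mismatches` is hypothetical and not part of the original notebook.

```python
import os


def find_mismatches(song_ids, filenames):
    """Return (position, song_id, filename) for every pair that does not line up.

    Assumption: a lyrics file is named '<songID>.txt'.
    """
    if len(song_ids) != len(filenames):
        raise ValueError(
            f"{len(song_ids)} song IDs but {len(filenames)} files"
        )
    mismatches = []
    for pos, (song_id, fn) in enumerate(zip(song_ids, filenames)):
        # Strip the directory and the extension to get the bare file name.
        stem = os.path.splitext(os.path.basename(fn))[0]
        if stem != song_id:
            mismatches.append((pos, song_id, fn))
    return mismatches


# Placeholder IDs and paths, for illustration only.
ids = ["a-01", "a-02", "b-01"]
files = ["lyrics/a/a-01.txt", "lyrics/a/a-02.txt", "lyrics/b/b-02.txt"]
print(find_mismatches(ids, files))
```

In the notebook, the call would be `find_mismatches(list(liedjesDF['songID']), liedjesFilenames)`; an empty list means every row lines up with its file.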

# Stanza size 
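Stanza size is taken here to mean the number of lines in a run of non-empty lines, with blank lines separating stanzas. The following is a minimal sketch of computing the sizes for a single text. The helper `stanza_sizes` is hypothetical, and it differs from the cell below in two ways: whitespace-only lines count as blank, and a final stanza without a trailing blank line is also counted.

```python
def stanza_sizes(text):
    """Return the number of lines in each stanza of `text`.

    A stanza is a run of consecutive non-empty lines;
    blank lines separate stanzas.
    """
    sizes = []
    current = 0  # number of lines in the stanza being read
    for line in text.split('\n'):
        if line.strip() == '':
            if current > 0:
                sizes.append(current)
            current = 0
        else:
            current += 1
    if current > 0:  # last stanza, when the text does not end with a blank line
        sizes.append(current)
    return sizes


# Placeholder text, for illustration only.
example_text = "Title\n\nline 1\nline 2\nline 3\n\nline 4\nline 5"
print(stanza_sizes(example_text))  # [1, 3, 2]
```

A title line followed by a blank line shows up as a stanza of size 1, so it has to be filtered out when only real stanzas are of interest.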




In [36]:
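# Print every song that contains at least one stanza of 2 to 6 lines.
# A stanza is a run of non-empty lines closed by an empty line, so a final
# stanza that is not followed by an empty line is not checked.
for n, liedje in enumerate(liedjes):
    vondst = False
    i = 0  # number of lines in the current stanza
    for line in liedje.split('\n'):
        if line == '':
            if 1 < i < 7:
                vondst = True
            i = 0
        else:
            i += 1
    if vondst:
        print("-- data: --")
        print(liedjesDF.iloc[n])
        print("-- song: --")
        print(liedje)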


-- data: --
typoID                                   amsterdam1847
sourceID                   amsterdam1847-reglement1848
songID                  amsterdam1847-reglement1848-02
titel                                              NaN
wijze                                              NaN
jaartal                                           1848
schrijver                                          NaN
vereniging_schrijver                     amsterdam1847
Name: 65, dtype: object
-- song: --
- II
Komt, vriende in het rond,
Zingt met een blijden mond
Ter eer van Laurens Koster.
Wij zijn hier bij elkaar,
Een regte vriendenschaar
Om zijnen roem te melden.

Ja Koster, kunstgenoot
Wij zullen tot den dood
U vond aan elk veronden:
Gij gaf ons het bestaan
Geen Duitscher heeft ‘t gedaan
Wij zijn aan u verbonden.

Daarom, o vriendenkring
Laat ons dan onderling
Met vreugde hem gedenken
En blijven hem gehecht
Het pleit is lang beslecht
Dat Koster vond de Drukkunst


-- data: --
typoID                      