hcorp: Music corpora for harmonic analysis

This R package provides several datasets of chord sequences. These datasets are intended for research purposes only.

  • bach_chorales_1: 370 chorales by J. S. Bach from KernScores, represented as salami slices.
  • bach_chorales_1b: same as bach_chorales_1, but converted to chord sequences using the algorithm of Pardo & Birmingham (2002).
  • classical_1: 1,022 classical pieces compiled from KernScores, converted to chord sequences using the algorithm of Pardo & Birmingham (2002).
  • classical_1b: same as classical_1, but represented as salami slices.
  • popular_1: 739 pieces from the McGill Billboard corpus (Burgoyne, 2011), converted from chord symbols to pitch-class sets by Harrison & Pearce (2018).
  • jazz_1: 1,186 pieces from the iRB corpus (Broze & Shanahan, 2013), converted from chord symbols to pitch-class sets by Harrison & Pearce (2018).
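
The term "salami slices" in the list above refers to cutting a score at every point where any note starts or stops, so that each slice holds the set of notes sounding during that span. The following is a minimal base-R sketch of the idea on a made-up three-note example. The `notes` data frame and the `salami_slice` helper are illustrative assumptions and are not part of hcorp, which ships the corpora already sliced.

```r
# Hypothetical toy input: one row per note, times in beats, pitches as MIDI numbers.
notes <- data.frame(
  onset  = c(0, 0, 1),
  offset = c(2, 1, 2),
  pitch  = c(60, 64, 67)
)

# Illustrative helper (not an hcorp function): cut at every onset/offset and
# collect the pitch classes of the notes that sound throughout each time span.
# A span with no sounding notes yields an empty vector.
salami_slice <- function(notes) {
  cuts <- sort(unique(c(notes$onset, notes$offset)))
  starts <- head(cuts, -1)
  ends <- tail(cuts, -1)
  lapply(seq_along(starts), function(i) {
    sounding <- notes$onset <= starts[i] & notes$offset >= ends[i]
    sort(unique(notes$pitch[sounding] %% 12))
  })
}

slices <- salami_slice(notes)
```

Here the first slice contains pitch classes 0 and 4, and the second contains 0 and 7, because the note on pitch 64 ends at the moment the note on pitch 67 begins.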

For more details, see the package’s documentation (e.g. ?classical_1).

Installation

You can install the current version of hcorp from GitHub by entering the following commands into R:

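# Install devtools if it is not already available
if (!require(devtools)) install.packages("devtools")
# install_github expects the "user/repository" form
devtools::install_github("pmcharrison/hcorp")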

Example usage

The hcorp package is best used in tandem with the hrep package. The hrep package provides the underlying representations for the corpora in hcorp, as well as methods for manipulating and visualising them.

You can load these packages into the global namespace as follows:

library(hcorp)
library(hrep)
library(magrittr) # Provides the pipe operator, %>%

The hcorp package provides several corpora, including the following three:

classical_1
#> 
#> A corpus of 1022 sequences 
#>   total size = 199254 symbols 
#>   symbol type = 'pc_chord'
#>   coded = true 
#>  (Metadata available)

popular_1
#> 
#> A corpus of 739 sequences 
#>   total size = 74093 symbols 
#>   symbol type = 'pc_chord'
#>   coded = true 
#>  (Metadata available)

jazz_1
#> 
#> A corpus of 1186 sequences 
#>   total size = 42822 symbols 
#>   symbol type = 'pc_chord'
#>   coded = true 
#>  (Metadata available)

Internally, a corpus is a list of encoded vectors.

classical_1[1:3] %>% as.list
#> [[1]]
#> Encoded vector of type 'pc_chord', length = 53 (metadata available)
#> 
#> [[2]]
#> Encoded vector of type 'pc_chord', length = 47 (metadata available)
#> 
#> [[3]]
#> Encoded vector of type 'pc_chord', length = 39 (metadata available)

Encoded vectors are objects of class coded_vec.

classical_1[[1]] %>% class
#> [1] "coded_vec_pc_chord" "coded_vec"          "integer"

Internally, encoded vectors are sequences of integers. This is good for memory efficiency, and useful for certain modelling approaches.

classical_1[[1]] %>% as.integer %>% head
#> [1] 14481  8473 12553 14481  4245  8465
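
To illustrate why integer coding is compact, here is a generic base-R sketch that replaces each distinct chord with an index into a lookup table. This is not hrep's actual coding scheme (the integers shown above come from hrep's own mapping); the names `chords`, `alphabet`, `codes`, and `decoded` are made up for illustration.

```r
# Toy sequence of pitch-class sets written as strings (illustrative data).
chords <- c("0 4 7", "2 7 11", "0 4 7", "0 5 9", "2 7 11")

# Lookup table of distinct symbols, in order of first appearance.
alphabet <- unique(chords)

# Encode: replace each chord with its position in the lookup table.
codes <- match(chords, alphabet)

# Decode: index back into the lookup table to recover the original symbols.
decoded <- alphabet[codes]
```

Each chord is stored once in the lookup table, and the sequence itself needs only one integer per symbol.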

These vectors can be decoded with the function decode.

classical_1[[1]][1:3] %>% decode
#> Vector of type 'pc_chord', length = 3 (metadata available)

classical_1[[1]][1:3] %>% decode %>% as.list
#> [[1]]
#> Pitch-class chord: [7] 2 11
#> 
#> [[2]]
#> Pitch-class chord: [4] 0 7 11
#> 
#> [[3]]
#> Pitch-class chord: [6] 2 9

Corpora and sequences can optionally store metadata.

metadata(classical_1)
#> $description
#> [1] "A selection of common-practice Western tonal music"

metadata(classical_1[[1]])
#> $description
#> [1] "bach-chor001"
#> 
#> $keysig
#> [1] 1
#> 
#> $mode
#> [1] 0

Corpora and sequences can be subsetted and combined like lists.

classical_1[1:3]
#> 
#> A corpus of 3 sequences 
#>   total size = 139 symbols 
#>   symbol type = 'pc_chord'
#>   coded = true 
#>  (Metadata available)

classical_1[[1]]
#> Encoded vector of type 'pc_chord', length = 53 (metadata available)

classical_1[[1]][1:3]
#> Encoded vector of type 'pc_chord', length = 3 (metadata available)

c(classical_1[1:3],
  popular_1[1:3])
#> 
#> A corpus of 6 sequences 
#>   total size = 313 symbols 
#>   symbol type = 'pc_chord'
#>   coded = true

Pardo & Birmingham templates

Several of these corpora were converted into chord sequences using Pardo & Birmingham’s (2002) algorithm with an extended template dictionary. This extended dictionary is provided here:

Pitch classes   Name      Weight
0 4 7           maj       0.436
0 4 7 10        dom7      0.219
0 3 7           min       0.194
0 3 6 9         dim7      0.044
0 3 6 10        hdim7     0.037
0 3 6           dim       0.018
0 4 7 11        maj7      0.2
0 3 7 10        min7      0.2
0 4 8           aug       0.02
0 7             no3       0.05
0 7 10          min7no3   0.05

Note: only the first six templates (maj to dim) are present in Pardo & Birmingham’s original paper; the rest were added for this work.
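
For readers who want to work with these templates programmatically, the table can be written as a base-R data frame. The sketch below also transposes a template to another root, since the pitch classes in the table are written relative to a root of 0. The names `templates` and `transpose_template` are illustrative assumptions, not objects exported by hcorp.

```r
# The template dictionary from the table above, one row per chord type.
templates <- data.frame(
  name = c("maj", "dom7", "min", "dim7", "hdim7", "dim",
           "maj7", "min7", "aug", "no3", "min7no3"),
  weight = c(0.436, 0.219, 0.194, 0.044, 0.037, 0.018,
             0.2, 0.2, 0.02, 0.05, 0.05),
  stringsAsFactors = FALSE
)

# Pitch classes are stored as a list column because templates differ in size.
templates$pitch_classes <- list(
  c(0, 4, 7), c(0, 4, 7, 10), c(0, 3, 7), c(0, 3, 6, 9), c(0, 3, 6, 10),
  c(0, 3, 6), c(0, 4, 7, 11), c(0, 3, 7, 10), c(0, 4, 8), c(0, 7), c(0, 7, 10)
)

# Illustrative helper: shift a template to a new root, wrapping at the octave.
transpose_template <- function(pitch_classes, root) {
  sort((pitch_classes + root) %% 12)
}

# Example: the dom7 template rooted on pitch class 7 (G).
g_dom7 <- transpose_template(templates$pitch_classes[[2]], root = 7)
```

With a root of 7, the dom7 template becomes the pitch classes 2, 5, 7, and 11.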

References

Broze, Y., & Shanahan, D. (2013). Diachronic changes in jazz harmony: A cognitive perspective. Music Perception, 31(1), 32–45. https://doi.org/10.1525/mp.2013.31.1.32

Harrison, P. M. C., & Pearce, M. T. (2018). An energy-based generative sequence model for testing sensory theories of Western harmony. In Proceedings of the 19th International Society for Music Information Retrieval Conference (pp. 160–167). Paris, France.

Pardo, B., & Birmingham, W. P. (2002). Algorithms for chordal analysis. Computer Music Journal, 26(2), 27–49. https://doi.org/10.1162/014892602760137167
