## Demo notebook for FoNN pattern extraction (accessed via FoNN.pattern_extraction)

In [1]:
# imports
from FoNN.pattern_extraction import NgramPatternCorpus

Step 1: Set up an NgramPatternCorpus class instance to extract and store patterns from a music corpus in feature sequence format.

In [2]:
# define paths
inpath = '../mtc_ann_corpus/feature_sequence_data/feat_seq_duration_weighted'
outpath = '../mtc_ann_corpus/pattern_corpus/duration_weighted'
# set up an NgramPatternCorpus instance to represent the MTC-ANN corpus
# paths are defined above; 'feature' is the target musical feature for which patterns will be extracted.
# 15 features are available; the names of all features are accessible by reading NgramPatternCorpus.FEATURES.
# In this example we investigate diatonic scale degree patterns extracted from the feature sequence data
# output by the feature_extraction_demo.ipynb notebook.
mtc_ann_pattern_corpus = NgramPatternCorpus(in_dir=inpath, out_dir=outpath, feature='diatonic_scale_degree')

Reading input data: 100%|██████████| 360/360 [00:00<00:00, 2056.43it/s]
Formatting data: 100%|██████████| 360/360 [00:00<00:00, 483338.49it/s]

Process completed.





Extract all tune titles from the corpus, store them as NgramPatternCorpus.titles, and write them to file.

In [3]:
# save tune titles to 'titles' attr and write to file
mtc_ann_pattern_corpus.save_tune_titles_to_file()

One-step call to perform two related tasks:
1. Extract all n-gram patterns of 3 to 12 elements in length that occur at least once in the corpus. Store them as NgramPatternCorpus.patterns.
2. Count occurrences of these patterns in every tune in the corpus and store the counts in a sparse matrix as NgramPatternCorpus.pattern_freq_matrix.

In [4]:
# extract all n-gram patterns; populate the pattern frequency matrix; save both to file
mtc_ann_pattern_corpus.create_pattern_frequency_matrix(write_output=True)
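
To make the two tasks above concrete, here is a minimal pure-Python sketch of n-gram extraction and counting on a toy corpus. It is not FoNN's implementation: the helper `extract_ngrams`, the toy tune titles, the shortened n range (3 to 4), and the dict-based sparse layout with patterns as rows are all assumptions made for illustration.

```python
from collections import Counter

def extract_ngrams(seq, min_n=3, max_n=12):
    """Count all contiguous n-grams (as tuples) with min_n <= n <= max_n."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

# Hypothetical toy corpus: tune title -> diatonic scale degree sequence.
toy_corpus = {
    "tune_a": [1, 2, 3, 1, 2, 3],
    "tune_b": [1, 2, 3, 4, 5],
}
titles = list(toy_corpus)

# Task 1: every pattern that occurs at least once anywhere in the corpus.
per_tune_counts = {t: extract_ngrams(seq, 3, 4) for t, seq in toy_corpus.items()}
patterns = sorted(set().union(*per_tune_counts.values()))

# Task 2: sparse pattern-by-tune counts; only non-zero cells are stored,
# keyed by (pattern index, tune index). The orientation is an assumption.
freq_matrix = {
    (p_idx, t_idx): per_tune_counts[title][pattern]
    for p_idx, pattern in enumerate(patterns)
    for t_idx, title in enumerate(titles)
    if per_tune_counts[title][pattern] > 0
}

print(len(patterns), "patterns;", len(freq_matrix), "non-zero cells")
```

Storing only the non-zero cells matters at corpus scale, because most patterns occur in only a few tunes.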

Print class info via custom __repr__

In [5]:
# print mtc_ann_pattern_corpus info
print(mtc_ann_pattern_corpus)


Corpus name: mtc_ann_corpus
Level: note-level (duration-weighted)
Input directory: ../mtc_ann_corpus/feature_sequence_data/feat_seq_duration_weighted
Corpus contains 360 tunes.
Number of patterns extracted: 5400



Transform pattern occurrence counts in NgramPatternCorpus.pattern_freq_matrix into TFIDF values. Store the output matrix as NgramPatternCorpus.pattern_freq_matrix and write to file.

In [6]:
mtc_ann_pattern_corpus.calculate_tfidf_vals(write_output=True)
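
For readers unfamiliar with TFIDF weighting, the sketch below applies one common textbook variant (relative term frequency multiplied by log(N / document frequency)) to toy pattern counts. The exact weighting and smoothing used by `calculate_tfidf_vals` is not shown in this notebook, so the formula, the `tfidf` helper, and the toy counts are assumptions rather than FoNN's actual behaviour.

```python
import math

# Hypothetical toy counts: tune title -> {pattern: occurrences in that tune}.
counts = {
    "tune_a": {(1, 2, 3): 2, (2, 3, 1): 1},
    "tune_b": {(1, 2, 3): 1, (3, 4, 5): 1},
}

def tfidf(counts):
    """Weight each count by tf * idf, with tf = count / tune total
    and idf = log(n_tunes / number of tunes containing the pattern)."""
    n_tunes = len(counts)
    doc_freq = {}
    for tune_counts in counts.values():
        for pattern in tune_counts:
            doc_freq[pattern] = doc_freq.get(pattern, 0) + 1
    weights = {}
    for title, tune_counts in counts.items():
        total = sum(tune_counts.values())
        weights[title] = {
            pattern: (c / total) * math.log(n_tunes / doc_freq[pattern])
            for pattern, c in tune_counts.items()
        }
    return weights

tfidf_vals = tfidf(counts)
for title, vals in tfidf_vals.items():
    print(title, vals)
```

Under this variant a pattern that appears in every tune, such as (1, 2, 3) here, receives weight zero, while patterns distinctive to a single tune are weighted up.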

Final step: precalculate the "TFIDF" similarity metric. Create a pairwise cosine similarity matrix between the pattern TFIDF vectors of all tunes in the corpus, and write the output to file.

In [7]:
mtc_ann_pattern_corpus.calculate_tfidf_vector_cos_similarity()
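
As an illustration of what a pairwise cosine similarity matrix contains, here is a small pure-Python sketch over toy vectors. The vectors, tune titles, and the `cosine` helper are hypothetical; FoNN's own method operates on the full corpus matrix and may be implemented differently.

```python
import math

# Hypothetical TFIDF vectors, one per tune, over the same three patterns.
vectors = {
    "tune_a": [1.0, 0.0, 2.0],
    "tune_b": [2.0, 0.0, 4.0],
    "tune_c": [0.0, 3.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

titles = list(vectors)
# Square matrix: similarity[i][j] compares tune i with tune j.
similarity = [[cosine(vectors[a], vectors[b]) for b in titles] for a in titles]

for title, row in zip(titles, similarity):
    print(title, [round(x, 3) for x in row])
```

Because cosine similarity depends only on direction, tune_a and tune_b (one a scaled copy of the other) score approximately 1, while tune_c, which shares no patterns with them, scores 0.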

NOTE: All output files are temporary / private files, stored in an automatically generated ./pattern_corpus directory under the corpus root directory.