## Working notebook for FoNN similarity search tools
# Currently testing similarity experiment 3, exploring variations on the 'motif' similarity method for n=6, 8 and 10.

## Testing on single query tune

In [1]:
# imports

import os
# os.chdir('../')
# print(os.getcwd())
# print(os.listdir('../'))
from FoNN.similarity_search import PatternSimilarityDev




Initialize PatternSimilarityDev class object to conduct similarity searches for n=6 and Hamming distance

In [2]:
# set input corpus path
test_corpus_path = '/Users/dannydiamond/NUIG/Polifonia/thesession/exp3_input_data_annotated_subset/motif'

# set up PatternSimilarityDev class instance:

# Args:
# corpus_path -- root dir of the input corpus.
# level -- level of input data granularity ('duration_weighted', 'note', or 'accent').
# n -- length of the representative search term pattern(s) extracted from the query tune in the 'motif' similarity method. Integer between 3 and 12.
# query_tune -- name of the query tune for the similarity search. Must be one of the filenames in the original corpus, in this case the '../mtc_ann_corpus/krn' dir.
# feature -- the musical feature for which pattern data has been extracted. For a list of the 16 features extracted by FoNN's ingest pipeline, see NgramPatternCorpus.FEATURES or ./README.md.

similarity_search = PatternSimilarityDev(
    corpus_path=test_corpus_path,
    level='accent',
    n=6,
    query_tune='LordMcDonalds507',
    feature='diatonic_scale_degree'
)
# global settings
similarity_search.include_query_tune_in_results = False
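
The heading mentions n=6, 8 and 10, while the cell above fixes n=6. A minimal sketch of how the other settings could be generated follows. `build_search_kwargs` is a hypothetical helper, not part of FoNN, and the corpus path is a placeholder; only the keyword names are taken from the constructor call above.

```python
def build_search_kwargs(corpus_path, query_tune, n_values=(6, 8, 10),
                        level="accent", feature="diatonic_scale_degree"):
    """Return one constructor-kwargs dict per pattern length n.

    Hypothetical helper for this notebook, not part of FoNN.
    """
    configs = []
    for n in n_values:
        # The notes above state that n must lie between 3 and 12.
        if not 3 <= n <= 12:
            raise ValueError(f"n must be between 3 and 12, got {n}")
        configs.append({
            "corpus_path": corpus_path,
            "level": level,
            "n": n,
            "query_tune": query_tune,
            "feature": feature,
        })
    return configs


# Placeholder path: substitute the real corpus root.
configs = build_search_kwargs("path/to/corpus", "LordMcDonalds507")
for cfg in configs:
    print(cfg["n"], cfg["level"], cfg["feature"])
```

Each dict could then be unpacked into the constructor, for example `PatternSimilarityDev(**cfg)`, assuming the constructor accepts exactly these keywords as it does in the cell above.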

In [3]:
similarity_search.run_similarity_search(method='tfidf')

Query tune: LordMcDonalds507.
Similarity search method: TFIDF
                    Cosine similarity
LordMcDonalds507             1.000000
LordMcDonalds13430           0.750977
LordMcDonalds13432           0.713867
LordMcDonalds23658           0.629395
LordMcDonalds30618           0.629395
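
For orientation, the general idea behind a TF-IDF cosine ranking can be sketched as follows. This is an illustration on toy data and not FoNN's implementation: the smoothed IDF formula, the function names and the toy patterns are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tune name to a {pattern: tf-idf weight} dict.

    docs: dict of tune name -> list of patterns (tuples).
    Uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1 (an assumption).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for patterns in docs.values():
        doc_freq.update(set(patterns))  # count each pattern once per tune
    vectors = {}
    for name, patterns in docs.items():
        term_freq = Counter(patterns)
        vectors[name] = {
            p: count * (math.log((1 + n_docs) / (1 + doc_freq[p])) + 1)
            for p, count in term_freq.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {pattern: weight} vectors."""
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus: three "tunes", each a list of 3-element patterns.
toy_docs = {
    "query": [(1, 2, 3), (2, 3, 4), (1, 2, 3)],
    "close": [(1, 2, 3), (2, 3, 4)],
    "far": [(5, 6, 7)],
}
vecs = tfidf_vectors(toy_docs)
sims = {name: cosine(vecs["query"], v) for name, v in vecs.items()}
for name, sim in sorted(sims.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:6s} {sim:.6f}")
```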


Run accent-level similarity search using 'motif' method in 'edit distance' mode, using custom-weighted Hamming distance as metric.
Distance threshold is 0.5, allowing a single musically-consonant element to be different between the patterns under comparison.
Query tune is excluded from results and no results normalization is applied.
Target 6-element representative patterns.

In [4]:
similarity_search.motif_edit_distance_filter_range = (0, 0.5)
similarity_search.run_similarity_search(method='motif', motif_mode='edit_distance', motif_norm=False, edit_dist_metric='custom_weighted_hamming')

Query tune: LordMcDonalds507.
Search method: motif
Edit distance metric: custom_weighted_hamming
                title  count
0  LordMcDonalds13432    4.0
1  LordMcDonalds13430    4.0
2  LordMcDonalds30618    4.0
3  LordMcDonalds23658    4.0
4  LordMcDonalds13431    2.0
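
To make the 0.5 threshold concrete, here is a minimal sketch of a weighted Hamming distance in which a "consonant" substitution costs 0.5 and any other substitution costs 1. The consonance rule used here (scale degrees a third or a sixth apart) and both function names are assumptions for illustration; the actual weights used by `custom_weighted_hamming` are not shown in this notebook.

```python
def is_consonant(x, y):
    """Hypothetical rule: diatonic scale degrees a third or sixth apart."""
    return (x - y) % 7 in (2, 5)


def weighted_hamming(a, b, consonant_cost=0.5):
    """Position-wise distance between two equal-length patterns.

    Identical elements cost 0, consonant substitutions cost
    `consonant_cost`, all other substitutions cost 1.
    """
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length patterns")
    total = 0.0
    for x, y in zip(a, b):
        if x == y:
            continue
        total += consonant_cost if is_consonant(x, y) else 1.0
    return total


query = (1, 2, 3, 4, 5, 1)
print(weighted_hamming(query, (1, 2, 5, 4, 5, 1)))  # 3 -> 5 is a third: 0.5
print(weighted_hamming(query, (1, 2, 4, 4, 5, 1)))  # 3 -> 4 is a step: 1.0
```

Under these assumed weights, a threshold of 0.5 admits the first candidate and rejects the second, which is the behaviour described above.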


Initialize PatternSimilarityDev class object to conduct similarity searches for n=6 and Levenshtein distance

Run accent-level similarity search using 'motif' method in 'edit distance' mode, using Hamming distance as metric.
Distance threshold is 1, allowing a single element to be different between the patterns under comparison.
Query tune is excluded from results and no results normalization is applied.
Target 6-element representative patterns.

In [5]:
similarity_search.motif_edit_distance_filter_range = (0, 1)
similarity_search.run_similarity_search(method='motif', motif_mode='edit_distance', motif_norm=False, edit_dist_metric='hamming')

Query tune: LordMcDonalds507.
Search method: motif
Edit distance metric: hamming
                title  count
0  LordMcDonalds13432    9.0
1  LordMcDonalds13430    9.0
2  LordMcDonalds30618    9.0
3  LordMcDonalds23658    9.0
4  LordMcDonalds13431    5.0
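
The interaction between the metric and the filter range can be sketched as a simple counting loop. This is an illustration only: `count_matches` is a hypothetical name, and treating both bounds of the range as inclusive is an assumption, not something confirmed from the FoNN source.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length patterns")
    return sum(x != y for x, y in zip(a, b))


def count_matches(query_patterns, tune_patterns, dist_range=(0, 1)):
    """Count (query, tune) pattern pairs whose distance is within range.

    Both bounds are treated as inclusive (an assumption).
    """
    lo, hi = dist_range
    return sum(
        1
        for q in query_patterns
        for t in tune_patterns
        if lo <= hamming(q, t) <= hi
    )


query_patterns = [(1, 2, 3, 4, 5, 6)]
tune_patterns = [
    (1, 2, 3, 4, 5, 6),  # distance 0 (exact match)
    (1, 2, 3, 4, 5, 7),  # distance 1
    (1, 2, 3, 4, 6, 7),  # distance 2
]
print(count_matches(query_patterns, tune_patterns, (0, 1)))    # exact + near
print(count_matches(query_patterns, tune_patterns, (0.1, 1)))  # near only
```

Raising the lower bound above zero drops exact matches from the count, which is how a range such as (0.1, 1) can avoid counting them twice.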


In [6]:
# TODO: WH and H composite mode

In [7]:
# set input corpus path
test_corpus_path = '/Users/dannydiamond/NUIG/Polifonia/thesession/exp3_input_data_annotated_subset/motif'

# set up PatternSimilarityDev class instance:

# Args:
# corpus_path -- root dir of the input corpus.
# level -- level of input data granularity ('duration_weighted', 'note', or 'accent').
# n -- length of the representative search term pattern(s) extracted from the query tune in the 'motif' similarity method. Integer between 3 and 12.
# query_tune -- name of the query tune for the similarity search. Must be one of the filenames in the original corpus, in this case the '../mtc_ann_corpus/krn' dir.
# feature -- the musical feature for which pattern data has been extracted. For a list of the 16 features extracted by FoNN's ingest pipeline, see NgramPatternCorpus.FEATURES or ./README.md.

similarity_search2 = PatternSimilarityDev(
    corpus_path=test_corpus_path,
    level='accent',
    n=6,
    query_tune='LordMcDonalds507',
    feature='diatonic_scale_degree'
)


Global settings for this test run: exclude query tune from results

In [8]:
similarity_search2.include_query_tune_in_results = False

Run accent-level similarity search using 'motif' method in 'edit distance' mode, using weighted Levenshtein distance as metric.
Distance threshold is 0.5 (the upper bound of the filter range set in the cell below).
Query tune is excluded from results and no results normalization is applied.
Target 6-element representative patterns.

In [9]:
similarity_search2.motif_edit_distance_filter_range = (0, .5)
similarity_search2.run_similarity_search(method='motif', motif_mode='edit_distance', motif_norm=False, edit_dist_metric='custom_weighted_levenshtein')

Query tune: LordMcDonalds507.
Search method: motif
Edit distance metric: custom_weighted_levenshtein
                title  count
0  LordMcDonalds30618   21.0
1  LordMcDonalds23658   21.0
2  LordMcDonalds13432   20.0
3  LordMcDonalds13430   20.0
4  LordMcDonalds13431   14.0
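
Unlike Hamming distance, Levenshtein distance also allows insertions and deletions, so patterns of different lengths can be compared. The sketch below is the standard dynamic-programming algorithm with an optional substitution-cost function to illustrate how a weighted variant could work. The `sub_cost` and `indel_cost` parameters are assumptions for illustration and do not describe the weights used by `custom_weighted_levenshtein`.

```python
def levenshtein(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance between two sequences.

    sub_cost: optional callable (x, y) -> cost of replacing x with y.
              Defaults to 1.0 for any substitution (plain Levenshtein).
    indel_cost: cost of a single insertion or deletion.
    """
    if sub_cost is None:
        def sub_cost(x, y):
            return 1.0
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if x == y else sub_cost(x, y))
            delete = prev[j] + indel_cost
            insert = curr[j - 1] + indel_cost
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


print(levenshtein((1, 2, 3, 4), (1, 2, 4)))   # one deletion
print(levenshtein((1, 2, 3), (1, 4, 3)))      # one substitution
# Weighted variant: every substitution costs 0.5 (illustrative value only).
print(levenshtein((1, 2, 3), (1, 4, 3), sub_cost=lambda x, y: 0.5))
```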


Run accent-level similarity search using 'motif' method in 'edit distance' mode, using Levenshtein distance as metric.
Levenshtein threshold is 1, allowing a single element to be different between the patterns under comparison.
Query tune is excluded from results and no results normalization is applied.
Target 6-element representative patterns.

In [10]:
similarity_search2.motif_edit_distance_filter_range = (0, 1)
similarity_search2.run_similarity_search(method='motif', motif_mode='edit_distance', motif_norm=False, edit_dist_metric='levenshtein')

Query tune: LordMcDonalds507.
Search method: motif
Edit distance metric: levenshtein
                title  count
0  LordMcDonalds30618   27.0
1  LordMcDonalds13432   27.0
2  LordMcDonalds13430   27.0
3  LordMcDonalds23658   27.0
4  LordMcDonalds13431   15.0


Run accent-level similarity search using 'motif' method in 'exact' mode with query tune excluded from results and no results normalization.


In [11]:
similarity_search2.run_similarity_search(method='motif', motif_mode='exact', motif_norm=False)
# TODO: Results formatting: 'levenshtein' -> 'exact'

Query tune: LordMcDonalds507.
Search method: motif
Edit distance metric: levenshtein
                title  count
0  LordMcDonalds13432    4.0
1  LordMcDonalds13430    4.0
2  LordMcDonalds30618    4.0
3  LordMcDonalds23658    4.0
4  LordMcDonalds13431    2.0


Run accent-level similarity search using 'motif' method in 'composite' mode with query tune excluded from results and no results normalization.
Edit distance metric is Levenshtein; the filter range is adjusted to (0.1, 1) to avoid duplicating exact matches.
A weighting factor of 1.5 is applied to the exact-match count, boosting the importance of exact matches.

In [12]:
similarity_search2.motif_count_weighting_factor = 1.5
similarity_search2.motif_edit_distance_filter_range = (0.1, 1)
similarity_search2.run_similarity_search(method='motif', motif_mode='composite', motif_norm=False, edit_dist_metric='levenshtein')

Query tune: LordMcDonalds507.
Search method: motif
Edit distance metric: levenshtein
                title  count
0  LordMcDonalds13432   29.0
1  LordMcDonalds23658   29.0
2  LordMcDonalds13430   29.0
3  LordMcDonalds30618   29.0
4  LordMcDonalds13431   16.0
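
The composite counts above are consistent with a simple weighted sum: the exact-match count multiplied by the weighting factor, plus the number of near matches in the adjusted filter range. The sketch below checks this against the printed outputs. The formula is inferred from those numbers rather than taken from the FoNN source, and it assumes that the near-match count for range (0.1, 1) equals the Levenshtein (0, 1) count minus the exact-mode count. `composite_score` is a hypothetical name.

```python
def composite_score(exact_count, near_count, weighting_factor=1.5):
    """Weighted exact matches plus unweighted near matches (inferred formula)."""
    return weighting_factor * exact_count + near_count


# Counts copied from the 'exact' and 'levenshtein' outputs above.
exact_counts = {"LordMcDonalds13432": 4, "LordMcDonalds13431": 2}
levenshtein_0_to_1 = {"LordMcDonalds13432": 27, "LordMcDonalds13431": 15}

scores = {}
for title, exact in exact_counts.items():
    # Assumed: range (0.1, 1) removes exactly the exact matches.
    near = levenshtein_0_to_1[title] - exact
    scores[title] = composite_score(exact, near)
    print(title, scores[title])
# LordMcDonalds13432 29.0
# LordMcDonalds13431 16.0
```

Both values agree with the composite output above (29.0 and 16.0).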




In [13]:
# TODO: WL composite mode

Run similarity search using 'incipit and cadence' method, with default Levenshtein distance metric.

In [14]:
# similarity_search.run_similarity_search(mode='incipit_and_cadence', edit_dist_metric='levenshtein')
#
# # alternate edit distance metrics can be selected as follows:
# # Hamming distance
# similarity_search.run_similarity_search(mode='incipit_and_cadence', edit_dist_metric='hamming')
# # custom-weighted Hamming distance
# similarity_search.run_similarity_search(mode='incipit_and_cadence', edit_dist_metric='custom_weighted_hamming')