# Introduction

* SongSim is a method created by **Colin Morris** for visualizing the repetitive parts of songs and finding patterns in them.
* Github link for SongSim: https://github.com/colinmorris/SongSim
* If you do not know JavaScript and CSS, you can check out my Python version:
* Github link for Python version: https://github.com/bayhippo/SongSim-in-Python

## How It Works

* SongSim uses self-similarity matrices to visualize patterns of repetition in text. The cell at position (x, y) is filled in if the xth and yth words of the song are the same.

<img src = "https://colinmorris.github.io/SongSim/img/about/barbie.png" width = "450" height = "450" /> 

* You can find more information about the patterns here: https://colinmorris.github.io/SongSim/#/about/advanced
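
The rule above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the original SongSim code; the function name `self_similarity` and the placeholder word list are assumptions made for the example.

```python
import numpy as np

def self_similarity(words):
    """Return a 0/1 matrix whose (x, y) cell is 1 when words[x] == words[y]."""
    arr = np.array(words, dtype=object)
    # Broadcasting compares every word with every other word in one step.
    return (arr[:, None] == arr[None, :]).astype(int)

# Placeholder tokens, not real lyrics.
example = ["a", "b", "a", "b", "c"]
matrix = self_similarity(example)
print(matrix)
```

The main diagonal is always filled because every word equals itself, and the matrix is symmetric because the comparison does not depend on the order of the two words.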

# Imports

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
from wordcloud import WordCloud

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Preprocessing

In [None]:
data = pd.read_csv("/kaggle/input/pink-floyd-lyrics/pink_floyd_lyrics.csv")

In [None]:
data.head()

In [None]:
data = data.dropna()

In [None]:
data.info()

In [None]:
data.album.unique()

In [None]:
darkside = data[data["album"] == "The Dark Side of the Moon"]
wish = data[data["album"] == "Wish You Were Here"]
wall = data[data["album"] == "The Wall"]
animals = data[data["album"] == "Animals"]
meddle = data[data["album"] == "Meddle"]

In [None]:
darkside = darkside.reset_index()
wish = wish.reset_index()
wall = wall.reset_index()
animals = animals.reset_index()
meddle = meddle.reset_index()

# Functions

In [None]:
def SongSim(lyr, show_table = False, fig_size = (8, 8), ticks = False, title = "SongSim Table"):
    
    """
    SongSim builds a self-similarity matrix for a text of n words.
    I was inspired by Colin Morris. When I looked at his GitHub page for the
    source code, I found that it is written in CSS and JavaScript,
    so I created a Python version of SongSim.
    GitHub link for the original SongSim: https://github.com/colinmorris/SongSim
    """
    
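    if isinstance(lyr, str):
        
        # Strip punctuation, then lowercase the text and split it into words.
        # The backslash is escaped so the string has no invalid escape sequence.
        punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''
        lyrics = "".join(char for char in lyr if char not in punctuations)
        lyrics = lyrics.lower().split()
        
    else:
        
        # Already a sequence of words; copy it so the input is not modified.
        lyrics = list(lyr)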
        
    # Cell (i, j) is 1 when the i-th and j-th words are the same, else 0.
    # Comparing the words directly avoids the correlation step, which returned
    # NaN for constant columns and relied on chained assignment to clean up.
    words = np.array(lyrics, dtype = object)
    corrs = (words[:, None] == words[None, :]).astype(float)

    # Label repeats as word, word1, word2, ... so rows and columns stay distinguishable.
    seen = {}
    labels = []
    for word in lyrics:
        repeat = seen.get(word, 0)
        labels.append(word if repeat == 0 else word + str(repeat))
        seen[word] = repeat + 1

    songsim = pd.DataFrame(corrs, index = labels, columns = labels)
                
    if show_table == True:
        f,ax = plt.subplots(figsize = fig_size)
        res = sns.heatmap(songsim,linecolor = "none", xticklabels = ticks, yticklabels = ticks, ax=ax,cmap = "Greys", cbar = False)
        plt.title(title)

        for _, spine in res.spines.items(): 
            spine.set_visible(True) 
            spine.set_linewidth(2) 

        plt.show()
    
    return songsim

In [None]:
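def clear_lyrics(lyr):
    
    """Strip punctuation from the lyrics, lowercase them, and return a list of words."""
    
    if isinstance(lyr, str):
        
        # The backslash is escaped so the string has no invalid escape sequence.
        punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''
        lyrics = "".join(char for char in lyr if char not in punctuations)
        clean = lyrics.lower().split()
        
    else:
        
        # Assume the input is already a sequence of words.
        clean = list(lyr)
    
    return clean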

In [None]:
def plot_gallery(data):

    # Songs are drawn two per row; an odd leftover song gets its own figure at the end.
    n_songs = len(data["lyrics"])
    n_rows = n_songs // 2
    count = 0

    if n_rows > 0:
        # squeeze = False keeps ax two-dimensional even when there is only one row.
        fig, ax = plt.subplots(n_rows, 2, figsize = (12, (n_songs - 1) * 3), squeeze = False)

        for r in range(n_rows):
            for c in range(2):

                # .iloc selects by position, so this does not depend on the index labels.
                table = SongSim(data["lyrics"].iloc[count])
                res = sns.heatmap(table, xticklabels = False, yticklabels = False, ax=ax[r,c],cmap = "Greys", cbar = False)
                ax[r,c].set_title(data["song_title"].iloc[count])

                for _, spine in res.spines.items(): 
                    spine.set_visible(True) 
                    spine.set_linewidth(2) 

                count += 1

        plt.show()
    
    if n_songs % 2 == 1:
        fig, ax = plt.subplots(figsize = (5.5,5.5))
        
        table = SongSim(data["lyrics"].iloc[count])
        
        res = sns.heatmap(table, xticklabels = False, yticklabels = False, ax=ax,cmap = "Greys", cbar = False)
        ax.set_title(data["song_title"].iloc[count])

        for _, spine in res.spines.items(): 
            spine.set_visible(True) 
            spine.set_linewidth(2) 
            
        plt.show()

# Gallery

## The Dark Side of the Moon

<img src = "https://i.imgur.com/qY6hW7n.jpg" width = "400" height = "400" />

In [None]:
plot_gallery(darkside)

### The Interesting one: Eclipse

* Eclipse is an interesting one because it has repeating diagonals. Diagonals are a very common pattern in pop music, where they are normally long, but in Eclipse they are very short and represent the "(and) all that you..." part.
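
To see why a repeated phrase shows up as a diagonal, here is a small sketch. It assumes a made-up word list in which "all that you" occurs twice; `w1` and `w2` are placeholders, not real lyrics, and the helper `diagonal_run` is a hypothetical name used only for this illustration.

```python
import numpy as np

def diagonal_run(matrix, row, col):
    """Count consecutive filled cells going down and to the right from (row, col)."""
    n = matrix.shape[0]
    length = 0
    while row < n and col < n and matrix[row, col]:
        length += 1
        row += 1
        col += 1
    return length

# Placeholder sequence: "all that you" appears at positions 0-2 and again at 5-7.
words = np.array(["all", "that", "you", "w1", "and", "all", "that", "you", "w2"], dtype=object)
matrix = (words[:, None] == words[None, :]).astype(int)

# The second occurrence starts 5 words after the first, so the run that
# begins at row 0, column 5 is as long as the repeated phrase.
print(diagonal_run(matrix, 0, 5))  # 3
```

A short repeated phrase gives a short diagonal, and a long repeated passage such as a full chorus gives a long one.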

In [None]:
cleared = clear_lyrics(darkside["lyrics"][8])
song_as_txt = ' '.join([text for text in cleared])

fig = plt.figure(figsize = (12,12))
fig.suptitle('The Interesting one: Eclipse',fontsize = 15)

wordcloud = WordCloud(width=800, height=300, random_state=42, max_font_size=100, relative_scaling=0.5, background_color='white').generate(song_as_txt)

ax1 = plt.subplot(212)
ax1.imshow(wordcloud)
ax1.axis('off')
ax1.set_title("WordCloud")


ax2 = plt.subplot(221)
res = sns.heatmap(SongSim(darkside["lyrics"][8]),linecolor = "none", xticklabels = False, yticklabels = False, ax=ax2,cmap = "Greys", cbar = False)
ax2.set_title("Eclipse")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

    
ax3 = plt.subplot(222)
res = sns.heatmap(SongSim(cleared[0:20]),linecolor = "none", xticklabels = True, yticklabels = True, ax=ax3,cmap = "Greys", cbar = False)
ax3.set_title("Repeating Diagonals")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

plt.show()

## Wish You Were Here

<img src = "https://i.imgur.com/akjvRCQ.jpg" width = "400" height = "400" />

In [None]:
plot_gallery(wish)

### The Interesting one: Shine on You Crazy Diamond

* Just like Eclipse, this song has diagonals as well, but this time they are standalone verses. The verse is: "Shine on you crazy diamond".

In [None]:
cleared = clear_lyrics(wish["lyrics"][0])
song_as_txt = ' '.join([text for text in cleared])

fig = plt.figure(figsize = (12,12))
fig.suptitle('The Interesting one: Shine on You Crazy Diamond',fontsize = 15)

wordcloud = WordCloud(width=800, height=300, random_state=42, max_font_size=100, relative_scaling=0.5, background_color='white').generate(song_as_txt)

ax1 = plt.subplot(212)
ax1.imshow(wordcloud)
ax1.axis('off')
ax1.set_title("WordCloud")


ax2 = plt.subplot(221)
res = sns.heatmap(SongSim(wish["lyrics"][0]),linecolor = "none", xticklabels = False, yticklabels = False, ax=ax2,cmap = "Greys", cbar = False)
ax2.set_title("Shine on You Crazy Diamond")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

    
ax3 = plt.subplot(222)
res = sns.heatmap(SongSim(cleared[10:15]),linecolor = "none", xticklabels = True, yticklabels = True, ax=ax3,cmap = "Greys", cbar = False)
ax3.set_title("Repeating Diagonal")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

plt.show()

## The Wall

<img src = "https://i.imgur.com/c6nKWBt.jpg" width = "400" height = "400" />

In [None]:
plot_gallery(wall)

### The Interesting one: Stop

* This song uses another common pattern: checkerboards. Checkerboards represent repeating verses in a song. The spacing between blocks is one less than the length of the verse ([length of the verse]-1). In Stop the verse is: "have to know".
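
The spacing rule can be checked with a small sketch. It assumes the three-word verse is repeated back to back four times, which is an assumption made for the illustration rather than a count taken from the song.

```python
import numpy as np

verse = ["have", "to", "know"]
# Assumption: four back-to-back repetitions, giving 12 words in total.
words = np.array(verse * 4, dtype=object)
matrix = (words[:, None] == words[None, :]).astype(int)

# Columns that are filled in the first row, and the empty gap between them.
filled_columns = np.flatnonzero(matrix[0])
gaps = np.diff(filled_columns) - 1
print(filled_columns)  # [0 3 6 9]
print(gaps)            # [2 2 2]
```

Each gap is 2, which is the verse length of 3 minus one, and the same spacing in every row and column is what produces the checkerboard look.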

In [None]:
cleared = clear_lyrics(wall["lyrics"][23])
song_as_txt = ' '.join([text for text in cleared])

fig = plt.figure(figsize = (12,12))
fig.suptitle('The Interesting one: Stop',fontsize = 15)

wordcloud = WordCloud(width=800, height=300, random_state=42, max_font_size=100, relative_scaling=0.5, background_color='white').generate(song_as_txt)

ax1 = plt.subplot(212)
ax1.imshow(wordcloud)
ax1.axis('off')
ax1.set_title("WordCloud")


ax2 = plt.subplot(221)
res = sns.heatmap(SongSim(wall["lyrics"][23]),linecolor = "none", xticklabels = False, yticklabels = False, ax=ax2,cmap = "Greys", cbar = False)
ax2.set_title("Stop")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

    
ax3 = plt.subplot(222)
res = sns.heatmap(SongSim(cleared[36:48]),linecolor = "none", xticklabels = True, yticklabels = True, ax=ax3,cmap = "Greys", cbar = False)
ax3.set_title("Repeating Checkerboard")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

plt.show()

## Animals

<img src = "https://i.imgur.com/kxnsv9x.jpg" width = "400" height = "400" />

In [None]:
plot_gallery(animals)

### The Interesting one: Pigs (Three Different Ones)

* This one uses short diagonals as well. Verse: "Haha, charade you are".

In [None]:
cleared = clear_lyrics(animals["lyrics"][2])
song_as_txt = ' '.join([text for text in cleared])

fig = plt.figure(figsize = (12,12))
fig.suptitle('The Interesting one: Pigs (Three Different Ones)',fontsize = 15)

wordcloud = WordCloud(width=800, height=300, random_state=42, max_font_size=100, relative_scaling=0.5, background_color='white').generate(song_as_txt)

ax1 = plt.subplot(212)
ax1.imshow(wordcloud)
ax1.axis('off')
ax1.set_title("WordCloud")


ax2 = plt.subplot(221)
res = sns.heatmap(SongSim(animals["lyrics"][2]),linecolor = "none", xticklabels = False, yticklabels = False, ax=ax2,cmap = "Greys", cbar = False)
ax2.set_title("Pigs (Three Different Ones)")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

    
ax3 = plt.subplot(222)
res = sns.heatmap(SongSim(cleared[6:10]),linecolor = "none", xticklabels = True, yticklabels = True, ax=ax3,cmap = "Greys", cbar = False)
ax3.set_title("Repeating Diagonal")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

plt.show()

## Meddle

<img src = "https://i.imgur.com/KpVFnSB.jpg" width = "400" height = "400" />

In [None]:
plot_gallery(meddle)

### The Interesting one: A Pillow of Winds

* This one has long diagonals. Verse: "Sleepy time, and I lie with my love by my side and she's breathing low".

In [None]:
cleared = clear_lyrics(meddle["lyrics"][1])
song_as_txt = ' '.join([text for text in cleared])

fig = plt.figure(figsize = (12,12))
fig.suptitle('The Interesting one: A Pillow of Winds',fontsize = 15)

wordcloud = WordCloud(width=800, height=300, random_state=42, max_font_size=100, relative_scaling=0.5, background_color='white').generate(song_as_txt)

ax1 = plt.subplot(212)
ax1.imshow(wordcloud)
ax1.axis('off')
ax1.set_title("WordCloud")


ax2 = plt.subplot(221)
res = sns.heatmap(SongSim(meddle["lyrics"][1]),linecolor = "none", xticklabels = False, yticklabels = False, ax=ax2,cmap = "Greys", cbar = False)
ax2.set_title("A Pillow of Winds")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

    
ax3 = plt.subplot(222)
res = sns.heatmap(SongSim(cleared[10:25]),linecolor = "none", xticklabels = True, yticklabels = True, ax=ax3,cmap = "Greys", cbar = False)
ax3.set_title("Repeating Diagonal")

for _, spine in res.spines.items(): 
    spine.set_visible(True) 
    spine.set_linewidth(2) 

plt.show()

# Conclusion
* **We analyzed the songs and created a gallery.**
* **We have seen that Pink Floyd is not that repetitive and does not follow a certain pattern in their songs. That is an expected result because they are a Progressive Rock band.**
* **If there is something wrong with this kernel please let me know in the comments.**

### **My other kernels: https://www.kaggle.com/mrhippo/notebooks**