# Analysing Deltron 3030's *Deltron 3030*

<tr>
    <td> <img src="https://logjampresents.com/wp-content/uploads/2017/06/Social-2-Deltron-3030.170907.jpg" width="600" height ="350"/> </td>

</tr>


 


Del tha Funkee Homosapien, often regarded as the "King of Oddball Rappers", debuted alongside Dan the Automator and Kid Koala as the supergroup Deltron 3030 with their [self-titled album](https://open.spotify.com/album/04uhhcjGVCHodMgZjXOlye?si=iiTGJm_fRwa8EM03rfVtLw). *Deltron 3030* is a rap opera concept album set in the year 3030, where corporate oligarchs suppress human rights and hip-hop. It stars the superhero Deltron-Z (or Deltron Zero), who traverses the galaxy "supporting his secretive Earthling existence by participating in weird rap battles where one's rhymes summon psychic powers that physically damage the opponent" ([Pitchfork](https://pitchfork.com/reviews/albums/2272-deltron-3030/)).

In this assignment, we will explore the lyrics of this space opera.


Alternatively, if you would like to analyse a collection of texts of your own choosing (another album, a set of news articles, film reviews, etc.), you are most welcome to! Please modify the sub-questions to fit your data, if necessary.

## Data

The data can be found under the folder "<span style="color:darkblue">deltron</span>".

This folder contains one file per song, each holding the lyrics of a song from Deltron 3030's self-titled album.

We suggest having a look at this folder before proceeding, and perhaps giving the album a listen during the analysis :)

FYI: This data was acquired from [Genius](https://genius.com) (Genius Lyrics) using their API.

## Exercises


We would like to look through all the lyrics of the album *Deltron 3030* and be able to understand their characteristics. 

Write the outputs of your findings to the file "results.txt", where specified.

Make your results as pretty as possible, and feel free to use tabs and enumeration.

1. Write a function that reads through the files in a directory. 

    *   Read through the deltron directory and all its contents.




In [None]:
import os
import codecs

def ReadFile(directory='deltron'):
  """Read every file in the directory and return all lyrics as a list of lines."""
  allText = []
  # Sort the filenames so the result does not depend on the OS listing order
  for filename in sorted(os.listdir(directory)):
    with codecs.open(os.path.join(directory, filename), 'r', "utf_8_sig") as f:
      allText.extend(f.read().split('\n'))
  return allText

ReadFile()

2. Perform text normalisation on the lyrics. Text normalisation should include:

    *   removing unnecessary punctuation
    *   lowercasing the lyrics
    *   removing [function words](https://en.wikipedia.org/wiki/Function_word) (give it a try)
    *   removing meta-information (remember assignment 2)
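
The normalisation steps listed above can be sketched on a single line as follows. This is only a minimal illustration: the sample line is made up (it is not taken from the album files), and `normalize_line` and the tiny `FUNCTION_WORDS` set are hypothetical names chosen for this example.

```python
import re

# Hypothetical, deliberately tiny function-word list (illustration only)
FUNCTION_WORDS = {"the", "a", "an", "and", "in", "to", "of", "is", "it"}

def normalize_line(line):
    """Normalise one lyric line: drop [meta] tags, case, punctuation and function words."""
    line = re.sub(r"\[.*?\]", "", line)   # remove meta-information such as [Verse 1]
    line = line.lower()                   # lowercase everything
    line = re.sub(r"[^\w\s]", "", line)   # strip punctuation characters
    words = [w for w in line.split() if w not in FUNCTION_WORDS]
    return " ".join(words)

# Made-up sample line, not from the data set
sample = "[Verse 1] It is the year 3030, and the Future is Now!"
print(normalize_line(sample))  # prints: year 3030 future now
```

Removing the bracketed meta-information before stripping punctuation matters here: once the brackets are gone, the tag text can no longer be told apart from the lyrics.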

In [None]:
def remove_punctuations(text):
    """Remove punctuations"""
    remove_punctuations = [line.replace(',', '').replace('"', '').replace('.', '').replace("'", '') for line in text]
    return remove_punctuations

def convert_to_lowercase(text):
    """Convert to lowercase"""
    lower_casing_text = [line.lower() for line in text]
    return lower_casing_text

def remove_function_words(text):
    """Remove function words from each line, keeping the remaining words"""
    function_words = {'the', 'a', 'an', 'and', 'but', 'for', 'so', 'since', 'as', 'in', 'to', 'be', 'or', 'at', 'of', 'is', 'you', 'with', 'your', 'my', 'its', 'it'}
    # Filter word by word; dropping whole lines would discard most of the lyrics
    return [' '.join(word for word in line.split() if word not in function_words) for line in text]

def remove_meta_information(text):
    """Remove meta-information in square brackets, e.g. [Verse 1]"""
    import re
    # Delete the brackets together with their content
    return [re.sub(r'\[.*?\]', '', str(line)) for line in text]

def remove_braces(text):
    """Remove annotations in curly braces"""
    import re
    return [re.sub(r'\{.*?\}', '', str(line)) for line in text]

def Normalize_lyrics(text):
    text = remove_punctuations(text)
    text = convert_to_lowercase(text)
    text = remove_function_words(text)
    text = remove_meta_information(text)
    text = remove_braces(text)
    return text

text = ReadFile()
normalized_lyrics = Normalize_lyrics(text)
print('\n'.join(normalized_lyrics))

3. Write a function that returns some statistics about the album Deltron 3030's song lyrics:

    * Most frequent words
    * Type to token ratio (unique words/words)
    * Longest and shortest songs (by lyrics)
    * Which songs have the largest and the smallest vocabulary?
    
   Write these statistics for **every song** and for the **entire album** to a file "results.txt". 
   
    * Write down your interpretation of these results in this notebook.  
    * This is a particularly verbose album. Do you agree? Back it with some evidence.
    * Which songs are the most verbose and is there a pattern? 
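
As a small worked example of the type to token ratio mentioned above, here is a minimal sketch on a made-up sentence (not album data). The helper name `type_token_ratio` is an assumption introduced for illustration.

```python
from collections import Counter

def type_token_ratio(text):
    """Return unique words (types) divided by all words (tokens)."""
    tokens = text.split()
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)

# Made-up sample: 8 tokens, 6 distinct types
sample = "the cat sat on the mat the end"
counts = Counter(sample.split())

print(counts.most_common(1))     # prints: [('the', 3)]
print(type_token_ratio(sample))  # prints: 0.75
```

A ratio close to 1 means almost every word is new, while a low ratio points to heavy repetition, for example in a chorus. Because the ratio tends to fall as a text gets longer, comparisons between songs of very different lengths should be read with some care.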

In [None]:
# Write as UTF-8 so non-ASCII characters in the lyrics do not break the output
file = open('results.txt', 'w', encoding='utf-8')

def write_to_file(text):
    print(text)
    file.write(text + '\n')

def write_section_title(title):
    write_to_file(title)
    write_to_file('=' * len(title))

def most_frequent_words(text):
    from collections import Counter
    split_it = text.split()
    counter = Counter(split_it)
    most_occur = counter.most_common(200)
    write_section_title("Most frequent words ('Word', Number of repetitions)")
    write_to_file(str(most_occur))

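def unique_words(text):
    """Report the type to token ratio and the words that occur exactly once."""
    from collections import Counter
    tokens = text.split()
    counter = Counter(tokens)
    # Type to token ratio: unique words (types) divided by all words (tokens)
    ratio = len(counter) / len(tokens) if tokens else 0.0
    words_used_once = sorted(word for word, count in counter.items() if count == 1)
    write_section_title("Type to token ratio (unique words/words)")
    write_to_file(f"{ratio:.3f}")
    write_section_title("Unique words (used exactly once)")
    write_to_file(str(words_used_once))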

def shortest_song():
    """Find the song whose lyrics file has the fewest characters."""
    shortest_song = float('inf')
    shortest_song_name = ""
    for filename in os.listdir('deltron'):
        with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
            song_length = len(f.read())
            if song_length < shortest_song:
                shortest_song = song_length
                shortest_song_name = filename

    write_section_title("Shortest song (characters, file)")
    write_to_file(f"{shortest_song} {shortest_song_name}")

def longest_song():
    """Find the song whose lyrics file has the most characters."""
    longest_song = 0
    longest_song_name = ""
    for filename in os.listdir('deltron'):
        with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
            song_length = len(f.read())
            if song_length > longest_song:
                longest_song = song_length
                longest_song_name = filename

    write_section_title("Longest song (characters, file)")
    write_to_file(f"{longest_song} {longest_song_name}")

def song_with_largest_vocabulary():
    """Find the song with the most distinct words."""
    longest_voc = 0
    longest_song_name = ""
    for filename in os.listdir('deltron'):
        with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
            # Vocabulary size = number of distinct words, not the total word count
            voc_length = len(set(f.read().lower().split()))
        if voc_length > longest_voc:
            longest_voc = voc_length
            longest_song_name = filename

    write_section_title("Song with largest vocabulary")
    write_to_file(f"{longest_voc} {longest_song_name}")

def song_with_shortest_vocabulary():
    """Find the song with the fewest distinct words."""
    shortest_voc = float('inf')
    shortest_song_name = ""
    for filename in os.listdir('deltron'):
        with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
            # Vocabulary size = number of distinct words, not the total word count
            voc_length = len(set(f.read().lower().split()))
        if voc_length < shortest_voc:
            shortest_voc = voc_length
            shortest_song_name = filename

    write_section_title("Song with smallest vocabulary")
    write_to_file(f"{shortest_voc} {shortest_song_name}")

def most_frequent_words_for_one_song(text):
    from collections import Counter
    split_it = text.split()
    counter = Counter(split_it)
    most_occur = counter.most_common(5)
    write_section_title("Most frequent words for one song ('Word', Number of repetitions)")
    write_to_file(str(most_occur))

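def unique_words_for_one_song(text):
    """Report the type to token ratio and the five rarest words of one song."""
    from collections import Counter
    tokens = text.split()
    counter = Counter(tokens)
    # most_common() sorts by descending count, so the last five entries are the rarest
    rarest_words = counter.most_common()[:-6:-1]
    ratio = len(counter) / len(tokens) if tokens else 0.0
    write_section_title("Type to token ratio for one song (unique words/words)")
    write_to_file(f"{ratio:.3f}")
    write_section_title("Unique words for one song ('Word', Number of repetitions)")
    write_to_file(str(rarest_words))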

def song_length(filename):
    with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
        song_length = len(f.read())
        write_section_title("Song length")
        write_to_file(f"{song_length} {filename}")

def song_vocabulary(filename):
    with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
        # Vocabulary size = number of distinct words, not the total word count
        voc_length = len(set(f.read().lower().split()))
    write_section_title("Song vocabulary")
    write_to_file(f"{voc_length} {filename}")

def read_one_file(filename):
    all_text = []
    with codecs.open(os.path.join("deltron", filename), 'r', "utf_8_sig") as f:
        for lyrics in f.read().split('\n'):
            all_text.append(lyrics)
    return '\n'.join(all_text)

def statistics_for_every_song():
    write_section_title("Statistics for every song")
    for filename in os.listdir('deltron'):
        write_to_file("\n")
        write_to_file("#####" + filename + "#####")
        write_to_file("\n")
        text = read_one_file(filename)
        most_frequent_words_for_one_song(text)
        unique_words_for_one_song(text)
        song_length(filename)
        song_vocabulary(filename)

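write_section_title("Statistics for the entire album")
# ReadFile() returns a list of lines; join them so the helpers can call text.split()
text = '\n'.join(ReadFile())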
most_frequent_words(text)
unique_words(text)
shortest_song()
longest_song()
song_with_shortest_vocabulary()
song_with_largest_vocabulary()
statistics_for_every_song()
file.close()

4. For all the lyrics in the album, construct a word cloud.

    *   You just need to reassign the variable 'frequencies' in the code below to a dictionary of word frequencies (or Counter) in Deltron 3030/your selected corpus. Try to use an output from one of the functions above.
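
One possible way to build such a dictionary is sketched below on made-up words (not album data). The helper name `top_frequencies` is hypothetical; the only assumption about the `wordcloud` library is that `generate_from_frequencies` accepts a mapping from word to frequency, which is why the result is converted to a plain `dict`.

```python
from collections import Counter

def top_frequencies(words, n=200):
    """Return a plain dict holding the n most frequent words and their counts."""
    # most_common(n) yields (word, count) pairs; dict() turns them into a mapping
    return dict(Counter(words).most_common(n))

# Made-up sample words for illustration
sample_words = "rap battle rap future rap battle".split()
frequencies = top_frequencies(sample_words, n=2)
print(frequencies)  # prints: {'rap': 3, 'battle': 2}
```

Limiting the dictionary to the top `n` words keeps the cloud readable; running the normalisation from exercise 2 first stops function words from dominating it.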

In [None]:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from collections import Counter

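# ReadFile() returns a list of lines; join them before splitting into words
text = ' '.join(ReadFile())
# Use a new name so the Counter class itself is not overwritten
word_counts = Counter(text.split())
# generate_from_frequencies expects a mapping of word -> frequency, not a list of tuples
frequencies = dict(word_counts.most_common(200))

wordcloud = WordCloud()
wordcloud.generate_from_frequencies(frequencies)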
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()