# The Beatles Lyrics Analysis

The Beatles are regarded by many as the greatest band of all time. Acclaimed by both the public and the critics, they are immortalized by songs that have stood the test of time. In this report, we analyse The Beatles' lyrics, aiming to answer the following questions:

* What are the most and least common words in the album titles?
* What are the most and least common words in the song titles?
* What are the most and least common words in the song lyrics?
* Are there any words never used in a Beatles song?

## Getting the data

We start our analysis by loading the data with The Beatles' lyrics. To do so, we will use the azlyrics website.

In [1]:
import requests
from bs4 import BeautifulSoup as bs
import re
import pandas as pd
import string
import os

url="https://www.azlyrics.com/b/beatles.html"
r=requests.get(url)
soup=bs(r.content)
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
  <meta content='The Beatles lyrics - 427 song lyrics sorted by album, including "Yesterday", "Ob-La-Di, Ob-La-Da", "Hey Jude".' name="description"/>
  <meta content="The Beatles, The Beatles lyrics, discography, albums, songs" name="keywords"/>
  <meta content="noarchive" name="robots"/>
  <title>
   The Beatles Lyrics
  </title>
  <link href="https://www.azlyrics.com/b/beatles.html" rel="canonical"/>
  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css" rel="stylesheet"/>
  <link href="/local/az.css" rel="stylesheet"/>
  <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
  <!--[if lt IE 9]>
<script src="https:/

## The Album Titles

All the album titles are inside the b tags that are inside div tags with an "album" class. Let's try to get them.

In [2]:
b_tags = soup.select("div.album b")
b_tags

[<b>"Please Please Me"</b>,
 <b>"With The Beatles"</b>,
 <b>"A Hard Day's Night"</b>,
 <b>"Beatles For Sale"</b>,
 <b>"Help!"</b>,
 <b>"Rubber Soul"</b>,
 <b>"Revolver"</b>,
 <b>"Sgt. Pepper's Lonely Hearts Club Band"</b>,
 <b>"Magical Mystery Tour"</b>,
 <b>"The Beatles (The White Album)"</b>,
 <b>"Yellow Submarine"</b>,
 <b>"Abbey Road"</b>,
 <b>"Let It Be"</b>,
 <b>"Past Masters. Volume One"</b>,
 <b>"Past Masters. Volume Two"</b>,
 <b>"Live At The BBC. Disk 1"</b>,
 <b>"Live At The BBC. Disk 2"</b>,
 <b>"Anthology 1"</b>,
 <b>"Anthology 2"</b>,
 <b>"Anthology 3"</b>,
 <b>other songs:</b>]

We successfully got all the albums. In this analysis, we are going to consider only the original studio albums. Let us get the text from the b tags and drop the last eight entries, which are the compilations, the live albums and the "other songs" label.

In [3]:
titles = [i.get_text().replace('"','') for i in b_tags]
titles = titles[:-8]
titles

['Please Please Me',
 'With The Beatles',
 "A Hard Day's Night",
 'Beatles For Sale',
 'Help!',
 'Rubber Soul',
 'Revolver',
 "Sgt. Pepper's Lonely Hearts Club Band",
 'Magical Mystery Tour',
 'The Beatles (The White Album)',
 'Yellow Submarine',
 'Abbey Road',
 'Let It Be']
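
The slice `titles[:-8]` depends on how many entries the page lists after the studio albums. A minimal sketch of an alternative that does not depend on that number, assuming "Let It Be" is always the last studio album on the page (the shortened `titles` list and the name `last_studio_album` are illustrative, not part of the original notebook):

```python
# Illustrative, shortened list in the same order as the page
titles = ["Please Please Me", "Abbey Road", "Let It Be",
          "Past Masters. Volume One", "Anthology 1", "other songs:"]

# Assumption: this is the last studio album listed on the page
last_studio_album = "Let It Be"

# Keep everything up to and including the last studio album
studio_titles = titles[:titles.index(last_studio_album) + 1]
print(studio_titles)
```

If the site ever added or removed a compilation, this version would still return the same studio albums, whereas the fixed slice would need a new number.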

In order to count the words, we first need to remove the special characters.

In [4]:
remove_cases = ["'s", "(", ")", "!"]
treated_titles = []
for title in titles:
    for case in remove_cases:
        title=title.replace(case, "")
    treated_titles.append(title)
treated_titles

['Please Please Me',
 'With The Beatles',
 'A Hard Day Night',
 'Beatles For Sale',
 'Help',
 'Rubber Soul',
 'Revolver',
 'Sgt. Pepper Lonely Hearts Club Band',
 'Magical Mystery Tour',
 'The Beatles The White Album',
 'Yellow Submarine',
 'Abbey Road',
 'Let It Be']
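
The same cleaning can be sketched with a single regular expression instead of a nested loop. The helper name `clean_title` is hypothetical; the pattern removes the same four cases as `remove_cases` (the possessive `'s`, both parentheses and the exclamation mark):

```python
import re

def clean_title(title):
    # Drop possessive 's, parentheses and exclamation marks
    return re.sub(r"'s|[()!]", "", title)

print(clean_title("A Hard Day's Night"))             # A Hard Day Night
print(clean_title("The Beatles (The White Album)"))  # The Beatles The White Album
print(clean_title("Help!"))                          # Help
```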

Now let us count the words.

In [5]:
counter = {}
for title in treated_titles:
    for word in title.split():
        # Start at 0 for unseen words, then add one occurrence
        counter[word] = counter.get(word, 0) + 1
sorted(counter.items(), key=lambda item: item[1], reverse=True)

[('The', 3),
 ('Beatles', 3),
 ('Please', 2),
 ('Me', 1),
 ('With', 1),
 ('A', 1),
 ('Hard', 1),
 ('Day', 1),
 ('Night', 1),
 ('For', 1),
 ('Sale', 1),
 ('Help', 1),
 ('Rubber', 1),
 ('Soul', 1),
 ('Revolver', 1),
 ('Sgt.', 1),
 ('Pepper', 1),
 ('Lonely', 1),
 ('Hearts', 1),
 ('Club', 1),
 ('Band', 1),
 ('Magical', 1),
 ('Mystery', 1),
 ('Tour', 1),
 ('White', 1),
 ('Album', 1),
 ('Yellow', 1),
 ('Submarine', 1),
 ('Abbey', 1),
 ('Road', 1),
 ('Let', 1),
 ('It', 1),
 ('Be', 1)]

The words "The" and "Beatles" are the most common in the album titles, with three occurrences each, followed by "Please" with two. All the other words appear only once. One of the occurrences of "The" comes from "The White Album", which is officially called just "The Beatles". Therefore, the word "The" really appears only twice in the titles, and it is always followed by the word "Beatles". Both occurrences of "Please" are in the name of the first album. We can therefore say that, with the exception of "(The) Beatles", no word appears in more than one album title.
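
As a side note, the word count above can also be sketched with `collections.Counter` from the standard library, which handles the "first time seen" case on its own. The three titles below are a small illustrative subset of the cleaned titles:

```python
from collections import Counter

# Three of the cleaned album titles, for illustration only
treated_titles = ["Please Please Me", "With The Beatles", "The Beatles The White Album"]

# Split every title into words and count them in one pass
counter = Counter(word for title in treated_titles for word in title.split())
print(counter.most_common(1))  # [('The', 3)]
```

`most_common()` returns the words already sorted by count, so the separate `sorted(...)` call is not needed.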

## The Song Titles

All the song titles are inside the a tags that are inside div tags with a "listalbum-item" class. Let's try to get them.

In [6]:
a_tags = soup.select('div.listalbum-item a')
a_tags

[<a href="/lyrics/beatles/isawherstandingthere.html" target="_blank">I Saw Her Standing There</a>,
 <a href="/lyrics/beatles/misery.html" target="_blank">Misery</a>,
 <a href="/lyrics/beatles/annagotohim.html" target="_blank">Anna (Go To Him)</a>,
 <a href="/lyrics/beatles/chains.html" target="_blank">Chains</a>,
 <a href="/lyrics/beatles/boys.html" target="_blank">Boys</a>,
 <a href="/lyrics/beatles/askmewhy.html" target="_blank">Ask Me Why</a>,
 <a href="/lyrics/beatles/pleasepleaseme.html" target="_blank">Please Please Me</a>,
 <a href="/lyrics/beatles/lovemedo.html" target="_blank">Love Me Do</a>,
 <a href="/lyrics/beatles/psiloveyou.html" target="_blank">P.S. I Love You</a>,
 <a href="/lyrics/beatles/babyitsyou.html" target="_blank">Baby It's You</a>,
 <a href="/lyrics/beatles/doyouwanttoknowasecret.html" target="_blank">Do You Want To Know A Secret</a>,
 <a href="/lyrics/beatles/atasteofhoney.html" target="_blank">A Taste Of Honey</a>,
 <a href="/lyrics/beatles/theresaplace.html"

Now let us get the song titles and get rid of the duplicates.

In [7]:
songs_titles = [i.get_text() for i in a_tags]
songs_titles = list(set(songs_titles))
songs_titles

["It's Only Love",
 'The Word',
 'Dear Prudence',
 "Money (That's What I Want)",
 "Don't Bother Me",
 "I Want You (She's So Heavy)",
 "I've Got A Feeling",
 'Little Child',
 "Maxwell's Silver Hammer",
 'Sexy Sadie',
 "I Don't Want To Spoil The Party",
 'Come Together',
 "Ain't She Sweet",
 'Hello, Goodbye',
 "You're Going To Lose That Girl",
 "Baby, You're A Rich Man",
 'Tell Me What You See',
 'All Things Must Pass',
 'Tell Me Why',
 'Clarabella',
 "That's Alright (Mama)",
 "All I've Got To Do",
 "A Hard Day's Night",
 'Something',
 'Love Me Do',
 'Please Please Me',
 'Set Fire To That Lot!',
 'I Saw Her Standing There',
 'Matchbox',
 'Maggie Mae',
 'Ask Me Why',
 "When I'm Sixty Four",
 "I'm Gonna Sit Right Down And Cry (Over You)",
 'My Bonnie',
 'Free As A Bird',
 "Don't Pass Me By",
 'Strawberry Fields Forever',
 'Hallelujah, I Love Her So',
 'I Got To Find My Baby',
 'Sun King',
 'Riding On A Bus',
 "I'm Happy Just To Dance With You",
 'Dig A Pony',
 'Yer Blues',
 'She Came In Th
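
Note that `set` does not keep the original order of the titles, and for strings the resulting order can change from one Python session to another. A minimal sketch of an order-preserving alternative, using illustrative titles:

```python
# Illustrative titles; the repeated "Love Me Do" stands for a song listed on two albums
songs_titles = ["Love Me Do", "Misery", "Love Me Do", "Chains"]

# dict keys are unique and keep insertion order (Python 3.7+)
unique_titles = list(dict.fromkeys(songs_titles))
print(unique_titles)  # ['Love Me Do', 'Misery', 'Chains']
```

A stable order matters mostly when list positions are reused later, for example when a download is resumed from a given index.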

Since words such as pronouns, prepositions, conjunctions and adverbs are too common in a sentence, and verbs can have many conjugated forms, we will only consider nouns and adjectives in the count. To do so, we will use the word list from the Britannica website. Let us get the list.

In [8]:
list_url = "https://www.britannica.com"
list_sub_url = "/dictionary/eb/3000-words/alpha/"
letters = string.ascii_lowercase
word_list = []
class_list = []
for letter in letters:
    page_no = 1
    while True:
        page = bs(requests.get(list_url+list_sub_url+letter+"/"+str(page_no)).content)
        a_tags = page.select("ul.a_words li a")
        for word in a_tags:
            word_list.append(word.get_text().strip().split()[0])
            word_page = bs(requests.get(list_url+word['href']).content)
            word_classes = ", ".join([word_class.get_text() for word_class in word_page.select("div.fl")])
            class_list.append(word_classes)
        if not(page.find("a", attrs={'class':'button next'})):
            break
        page_no+=1
print("Size of the words list: {}".format(len(word_list)))
print("Size of the classes list: {}".format(len(class_list)))

Size of the words list: 3962
Size of the classes list: 3962


As expected, the word and class lists have the same size. Let us now turn those lists into a dataframe object.

In [9]:
words_df = pd.DataFrame({'word':word_list, 'classes':class_list})
words_df.head()

Unnamed: 0,word,classes
0,a,"noun, indefinite article"
1,abandon,"verb, noun"
2,ability,noun
3,able,adjective
4,about,"adverb, preposition, adjective"


Now let us create a column for each grammatical class. We are going to use the classes pronoun, noun, adjective, verb, adverb, article, preposition, conjunction, interjection and other (for classes which do not fit in any of the other categories).

In [10]:
# "noun" is a substring of "pronoun", so words tagged as pronouns are left out of the noun flag
words_df['pronoun'] = words_df['classes'].str.contains('pronoun')
words_df['noun'] = words_df['classes'].str.contains('noun')&(~words_df['pronoun'])
words_df['adjective'] = words_df['classes'].str.contains('adjective')
# "verb" is a substring of "adverb", so words tagged as adverbs are left out of the verb flag
words_df['adverb'] = words_df['classes'].str.contains('adverb')
words_df['verb'] = words_df['classes'].str.contains('verb')&(~words_df['adverb'])
words_df['article'] = words_df['classes'].str.contains('article')
words_df['preposition'] = words_df['classes'].str.contains('preposition')
words_df['conjunction'] = words_df['classes'].str.contains('conjunction')
words_df['interjection'] = words_df['classes'].str.contains('interjection')
# "other" marks the words that fit in none of the categories above
words_df['other'] = ~(words_df['pronoun']|words_df['noun']|words_df['adjective']|words_df['verb']|
                      words_df['adverb']|words_df['article']|words_df['preposition']|
                      words_df['conjunction']|words_df['interjection'])
words_df.head()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other
0,a,"noun, indefinite article",False,True,False,False,False,True,False,False,False,False
1,abandon,"verb, noun",False,True,False,False,True,False,False,False,False,False
2,ability,noun,False,True,False,False,False,False,False,False,False,False
3,able,adjective,False,False,True,False,False,False,False,False,False,False
4,about,"adverb, preposition, adjective",False,False,True,True,False,False,True,False,False,False
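
The `&(~...)` terms in the cell above are needed because a plain substring search finds "noun" inside "pronoun" and "verb" inside "adverb". The short sketch below illustrates the pitfall with the standard `re` module and shows a whole-word pattern as one possible alternative:

```python
import re

classes = "adverb, preposition, adjective"

# Plain substring search: "verb" is found inside "adverb"
print("verb" in classes)                        # True
# Whole-word search: there is no separate "verb" entry
print(bool(re.search(r"\bverb\b", classes)))    # False
# "noun" inside "pronoun" behaves the same way
print(bool(re.search(r"\bnoun\b", "pronoun")))  # False
```

Since `str.contains` in pandas treats its pattern as a regular expression by default, a pattern such as `r'\bverb\b'` could be used there as well. That would flag words tagged as both verb and adverb (or both noun and pronoun) differently from the cell above, so the counts reported in this notebook could change slightly.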


Now let us remove the duplicates and then save the dataframe as a csv file.

In [11]:
words_df = words_df.drop_duplicates().reset_index(drop=True)
words_df.to_csv("list_of_english_words.csv", sep=';', index=False)
words_df.tail()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other
3311,zenith,noun,False,True,False,False,False,False,False,False,False,False
3312,zero,"noun, adjective, verb",False,True,True,False,True,False,False,False,False,False
3313,zipper,noun,False,True,False,False,False,False,False,False,False,False
3314,zone,"noun, verb",False,True,False,False,True,False,False,False,False,False
3315,zoo,noun,False,True,False,False,False,False,False,False,False,False


Now let us remove all the words that are not nouns or adjectives.

In [12]:
words_df = words_df.query("noun | adjective").reset_index(drop=True)
words_df.tail()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other
2838,zenith,noun,False,True,False,False,False,False,False,False,False,False
2839,zero,"noun, adjective, verb",False,True,True,False,True,False,False,False,False,False
2840,zipper,noun,False,True,False,False,False,False,False,False,False,False
2841,zone,"noun, verb",False,True,False,False,True,False,False,False,False,False
2842,zoo,noun,False,True,False,False,False,False,False,False,False,False


We end up with a list of 2843 words. Let us count them in the song titles.

In [13]:
words_df['title_counter']=0
for title in songs_titles:
    counter=[]
    for word in words_df['word']:
        counter.append(len(re.findall(r'\b' + word.capitalize() + r'\b', title)))
    words_df['title_counter'] += counter
words_df.sort_values("title_counter", ascending=False).head()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter
0,a,"noun, indefinite article",False,True,False,False,False,True,False,False,False,False,26
1642,my,adjective,False,False,True,False,False,False,False,False,False,False,18
1489,love,"noun, verb",False,True,False,False,True,False,False,False,False,False,16
1273,in,"preposition, adverb, adjective, noun",False,True,True,True,False,False,True,False,False,False,11
2835,your,adjective,False,False,True,False,False,False,False,False,False,False,10


There are two main problems with this ranking. The first is that words like "a" and "in" are also nouns according to the Britannica dictionary, but this is obviously not the most usual way people use these words. The second is that words like "my" and "your" are considered adjectives in English, whereas in most languages (including mine!) this type of word is considered a pronoun. Therefore, we will remove every word whose classes are not exclusively within the following group: noun, adjective and verb. We will also remove words like "my" and "your".

In [14]:
words_df = words_df.query("(noun|adjective)&(~pronoun)&(~adverb)&(~article)&(~preposition)&(~conjunction)&(~interjection)&(~other)")

words_df = words_df[~words_df['word'].isin(['my', 'your', 'our', 'their'])].reset_index(drop=True)
words_df.tail()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter
2632,zenith,noun,False,True,False,False,False,False,False,False,False,False,0
2633,zero,"noun, adjective, verb",False,True,True,False,True,False,False,False,False,False,0
2634,zipper,noun,False,True,False,False,False,False,False,False,False,False,0
2635,zone,"noun, verb",False,True,False,False,True,False,False,False,False,False,0
2636,zoo,noun,False,True,False,False,False,False,False,False,False,False,0


We ended up with 2637 words (the last index shown above is 2636). Now let us see the final ranking.

In [15]:
words_df.sort_values("title_counter", ascending=False).head()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter
1393,love,"noun, verb",False,True,False,False,True,False,False,False,False,False,16
745,do,"verb, verb, noun, noun",False,True,False,False,True,False,False,False,False,False,9
164,baby,"noun, adjective, verb",False,True,True,False,True,False,False,False,False,False,7
2536,want,"verb, noun",False,True,False,False,True,False,False,False,False,False,6
639,day,noun,False,True,False,False,False,False,False,False,False,False,5


The word "love" is by far the most used in the titles of The Beatles' songs, with 16 occurrences. Here we notice the presence of words like "do", "want" and "know". They are here because they can also be used as nouns, although they are most commonly used as verbs. For the sake of simplicity, we will still count them, since a large number of nouns can also be used as verbs and, in most cases, we have no way to tell which was the case. Now let us check the least used words.

In [16]:
words_df.query("title_counter==1")

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter
20,act,"noun, verb",False,True,False,False,True,False,False,False,False,False,1
198,bathroom,noun,False,True,False,False,False,False,False,False,False,False,1
223,benefit,"noun, verb",False,True,False,False,True,False,False,False,False,False,1
229,bill,"noun, verb, noun, verb",False,True,False,False,True,False,False,False,False,False,1
241,black,"adjective, noun, verb",False,True,True,False,True,False,False,False,False,False,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2605,word,"noun, verb",False,True,False,False,True,False,False,False,False,False,1
2606,work,"verb, noun, adjective",False,True,True,False,True,False,False,False,False,False,1
2621,writer,noun,False,True,False,False,False,False,False,False,False,False,1
2628,yellow,"adjective, noun, verb",False,True,True,False,True,False,False,False,False,False,1


115 words are used only once in the titles of The Beatles' songs, and there is not much more we can learn from this.
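
The title counts rely on the `\b` word-boundary anchor, so that a word is not counted when it is only part of a longer word. The sketch below is a variation of that idea, not the exact code used in this notebook: the hypothetical helper `count_word` also escapes the word with `re.escape` and ignores case, instead of searching for the capitalized form only.

```python
import re

def count_word(word, text):
    # re.escape guards against regex metacharacters in the word;
    # \b keeps "day" from matching inside "Yesterday"
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

print(count_word("day", "A Hard Day's Night"))  # 1
print(count_word("day", "Yesterday"))           # 0
print(count_word("love", "Love Me Do"))         # 1
```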

## The Song Lyrics

All the links to the song lyrics are inside the a tags that are inside div tags with a "listalbum-item" class. Let's try to get them.

In [17]:
a_tags = soup.select('div.listalbum-item a')
links = [link['href'] for link in a_tags]
links = list(set(links))
links

['/lyrics/beatles/wordsoflove.html',
 '/lyrics/beatles/theend.html',
 '/lyrics/beatles/imaloser.html',
 '/lyrics/beatles/pleasepleaseme.html',
 '/lyrics/beatles/thisboy.html',
 '/lyrics/beatles/yesterday.html',
 '/lyrics/beatles/taxman.html',
 '/lyrics/beatles/actnaturally.html',
 '/lyrics/beatles/memphistennessee.html',
 '/lyrics/beatles/lonesometearsinmyeyes.html',
 '/lyrics/beatles/sohowcomenoonelovesme.html',
 '/lyrics/beatles/dontletmedown.html',
 '/lyrics/beatles/watchingrainbows.html',
 '/lyrics/beatles/inspiteofallthedanger.html',
 '/lyrics/beatles/junk.html',
 '/lyrics/beatles/dearprudence.html',
 '/lyrics/beatles/pleasemisterpostman.html',
 '/lyrics/beatles/come-together.html',
 '/lyrics/beatles/mailmanbringmenomoreblues.html',
 '/lyrics/beatles/youknowwhattodo.html',
 '/lyrics/beatles/likedreamersdo.html',
 '/lyrics/beatles/allthingsmustpass.html',
 '/lyrics/beatles/imemine.html',
 '/lyrics/beatles/thingswesaidtoday.html',
 '/lyrics/beatles/someotherguy.html',
 '/lyrics/beat

Let us take a look at the structure of one of the lyrics pages.

In [18]:
url="https://www.azlyrics.com"
lyrics_site = bs(requests.get(url+links[0]).content)
print(lyrics_site.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content='The Beatles "Words Of Love": Hold me close and tell me how you feel Tell me love is real Words of love you whisper soft and true...' name="description"/>
  <meta content="Words Of Love lyrics, The Beatles Words Of Love lyrics, The Beatles lyrics" name="keywords"/>
  <meta content="noarchive" name="robots"/>
  <meta content="//www.azlyrics.com/az_logo_tr.png" property="og:image"/>
  <title>
   The Beatles - Words Of Love Lyrics | AZLyrics.com
  </title>
  <link href="https://www.azlyrics.com/lyrics/beatles/wordsoflove.html" rel="canonical"/>
  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css" rel="stylesheet"/>
  <link href="/local/az.css" rel="stylesheet"/>
  <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->

The lyrics are inside a div tag that is a sibling of another div tag with the class "ringtone". First we are going to save the lyrics in txt files. This way, we avoid making multiple requests to the azlyrics website in the future. Since we are going to make more than 300 requests, we also set a delay of 45 seconds between them to avoid being blocked by the website. Since we can still get blocked even this way, we also keep track of the lyrics we have already saved: when a page cannot be read, the loop prints the index where it stopped, and we restart the range from that index.

In [19]:
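import time

url = "https://www.azlyrics.com"
os.makedirs("lyrics", exist_ok=True)
# Start at the index printed by the last interrupted run (0 on a first run)
for i in range(305, len(links)):
    lyrics_site = bs(requests.get(url+links[i]).content)
    try:
        lyrics = lyrics_site.select("div.ringtone~div")[0].get_text()
    except IndexError:
        # No lyrics div on the page (probably blocked): print where to resume and stop
        print(i)
        break
    # '/lyrics/beatles/wordsoflove.html' -> 'lyrics/wordsoflove.txt'
    with open("lyrics/"+links[i][16:-4]+'txt', 'w') as f:
        f.write(lyrics)
    time.sleep(45)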
    

305
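
Resuming by index only works while `links` keeps the same order. A minimal sketch of an alternative is to skip the songs whose txt file already exists. The helper names `lyrics_filename` and `pending_links` are hypothetical, and the demonstration uses a temporary folder instead of the real `lyrics` folder, with no network access:

```python
import os
import tempfile

def lyrics_filename(link):
    # '/lyrics/beatles/misery.html' -> 'misery.txt'
    return os.path.splitext(os.path.basename(link))[0] + ".txt"

def pending_links(links, folder):
    # Keep only the links whose lyrics file has not been written yet
    return [link for link in links
            if not os.path.exists(os.path.join(folder, lyrics_filename(link)))]

# Demonstration in a temporary folder, with one song already saved
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "misery.txt"), "w").close()
    links = ["/lyrics/beatles/misery.html", "/lyrics/beatles/chains.html"]
    remaining = pending_links(links, folder)
    print(remaining)  # ['/lyrics/beatles/chains.html']
```

With this approach the download loop could simply iterate over the pending links after every interruption, without editing the start index by hand.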


Now that we have all the lyrics stored in txt files, we are able to count the occurrences of each word.

In [20]:
words_df['lyrics_counter'] = 0
# os.path.join keeps the path valid on any operating system
lyrics_path = os.path.join(os.getcwd(), 'lyrics')
lyrics_list = os.listdir(lyrics_path)
for lyrics in lyrics_list:
    # Read each lyrics file once, instead of once per word
    with open(os.path.join(lyrics_path, lyrics), 'r') as f:
        lyrics_text = f.read()
    counter = []
    for word in words_df['word']:
        # Count the capitalized and the lowercase forms as whole words
        capital_count = len(re.findall(r'\b' + word.capitalize() + r'\b', lyrics_text))
        count = len(re.findall(r'\b' + word + r'\b', lyrics_text))
        counter.append(capital_count+count)
    words_df['lyrics_counter']+=counter
words_df.sort_values("lyrics_counter", ascending=False).head()

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter,lyrics_counter
1393,love,"noun, verb",False,True,False,False,True,False,False,False,False,False,16,685
1310,know,"verb, noun",False,True,False,False,True,False,False,False,False,False,5,501
745,do,"verb, verb, noun, noun",False,True,False,False,True,False,False,False,False,False,9,467
347,can,"verb, noun, verb",False,True,False,False,True,False,False,False,False,False,5,434
164,baby,"noun, adjective, verb",False,True,True,False,True,False,False,False,False,False,7,294


The word "love" is by far the most used in The Beatles' songs. Since words like "know", "do" and "can" are mostly used as verbs, we can assume the second most used word is "baby". It is interesting to point out that the word "love" is used, on average, more than twice per song. Now let us see the least used words.

In [21]:
words_df.query("lyrics_counter==1")

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter,lyrics_counter
2,able,adjective,False,False,True,False,False,False,False,False,False,False,0,1
43,advice,noun,False,True,False,False,False,False,False,False,False,False,0,1
48,afternoon,noun,False,True,False,False,False,False,False,False,False,False,0,1
49,age,"noun, verb",False,True,False,False,True,False,False,False,False,False,0,1
85,angry,adjective,False,False,True,False,False,False,False,False,False,False,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2563,weird,"adjective, verb",False,False,True,False,True,False,False,False,False,False,0,1
2572,whistle,"noun, verb",False,True,False,False,True,False,False,False,False,False,0,1
2589,winter,"noun, verb",False,True,False,False,True,False,False,False,False,False,0,1
2591,wire,"noun, verb",False,True,False,False,True,False,False,False,False,False,0,1


201 words are used only once in their lyrics, including common words like "afternoon", "age", "angry" and "weird". Now let us see the words that were never used in the song lyrics.

In [22]:
words_df.query("lyrics_counter==0")

Unnamed: 0,word,classes,pronoun,noun,adjective,adverb,verb,article,preposition,conjunction,interjection,other,title_counter,lyrics_counter
0,abandon,"verb, noun",False,True,False,False,True,False,False,False,False,False,0,0
1,ability,noun,False,True,False,False,False,False,False,False,False,False,0,0
3,absence,noun,False,True,False,False,False,False,False,False,False,False,0,0
4,absolute,adjective,False,False,True,False,False,False,False,False,False,False,0,0
5,abstract,"adjective, noun, verb",False,True,True,False,True,False,False,False,False,False,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2631,zealot,noun,False,True,False,False,False,False,False,False,False,False,0,0
2632,zenith,noun,False,True,False,False,False,False,False,False,False,False,0,0
2633,zero,"noun, adjective, verb",False,True,True,False,True,False,False,False,False,False,0,0
2634,zipper,noun,False,True,False,False,False,False,False,False,False,False,0,0


1862 words were never used. Now let us save the dataframe into a csv file and finish our analysis.

In [23]:
words_df.to_csv("word_counter.csv", sep=';', index=False)
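
Because the file is written with `sep=';'`, it has to be read back with the same separator. A minimal sketch using the standard `csv` module on two illustrative rows (only three of the columns are shown, and the text is held in memory instead of being read from `word_counter.csv`):

```python
import csv
import io

# Two illustrative rows in the same semicolon-separated layout
data = "word;classes;lyrics_counter\nlove;noun, verb;685\nbaby;noun, adjective, verb;294\n"

# The delimiter must match the separator used when saving
rows = list(csv.DictReader(io.StringIO(data), delimiter=";"))
print(rows[0]["word"], rows[0]["lyrics_counter"])  # love 685
```

The semicolon is a convenient choice here because the `classes` column itself contains commas.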

## Conclusion

In this project we analysed the lyrics of The Beatles' songs and showed that the word "love" is by far the most used by them, both in the lyrics and in the song titles. The most common word in the album titles is "(The) Beatles". Of the nouns and adjectives in our word list, 201 were used only once in their songs and 1862 were never used.

## Off Topic

In this project we used a list of common English words, and we noticed that "s" is the most common initial letter among them.
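
A check of this kind can be sketched with `collections.Counter`. The short `word_list` below is an illustrative sample, not the scraped list:

```python
from collections import Counter

# Illustrative sample; in the notebook this would be the scraped word list
word_list = ["sun", "song", "submarine", "love", "lonely", "day"]

# Count how many words start with each letter
first_letters = Counter(word[0] for word in word_list)
print(first_letters.most_common(1))  # [('s', 3)]
```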