# The Beatles Lyrics Analysis

The Beatles are regarded by many as the greatest band of all time. Acclaimed by both the public and the critics, they left behind songs that have stood the test of time. In this report, we analyse The Beatles' lyrics, aiming to answer the following questions:

* What are the most common and the most unusual words in the album titles?
* What are the most common and the most unusual words in the song titles?
* What are the most common and the most unusual words in the song lyrics?
* Are there any words that never appear in a Beatles song?

## Getting the Data

We start our analysis by loading The Beatles' lyrics. To do so, we scrape the AZLyrics website, starting from the band's page, which lists the songs sorted by album.

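```python
import requests
from bs4 import BeautifulSoup as bs
import re
import pandas as pd
import string

# Download The Beatles' page on AZLyrics and parse it
url = "https://www.azlyrics.com/b/beatles.html"
r = requests.get(url)
r.raise_for_status()  # stop early if the request was rejected
soup = bs(r.content, "html.parser")  # an explicit parser avoids bs4's "no parser specified" warning
print(soup.prettify())
```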

```
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
  <meta content='The Beatles lyrics - 427 song lyrics sorted by album, including "Yesterday", "Ob-La-Di, Ob-La-Da", "Hey Jude".' name="description"/>
  <meta content="The Beatles, The Beatles lyrics, discography, albums, songs" name="keywords"/>
  <meta content="noarchive" name="robots"/>
  <title>
   The Beatles Lyrics
  </title>
  <link href="https://www.azlyrics.com/b/beatles.html" rel="canonical"/>
  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css" rel="stylesheet"/>
  <link href="/local/az.css" rel="stylesheet"/>
  <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
  <!--[if lt IE 9]>
<script src="https:/
```

## The Album Titles

All the album titles are inside `<b>` tags nested in `div` elements with the `album` class. Let's extract them with a CSS selector.

```python
# Select every <b> tag inside a <div class="album">
b_tags = soup.select("div.album b")
b_tags
```

```
[<b>"Please Please Me"</b>,
 <b>"With The Beatles"</b>,
 <b>"A Hard Day's Night"</b>,
 <b>"Beatles For Sale"</b>,
 <b>"Help!"</b>,
 <b>"Rubber Soul"</b>,
 <b>"Revolver"</b>,
 <b>"Sgt. Pepper's Lonely Hearts Club Band"</b>,
 <b>"Magical Mystery Tour"</b>,
 <b>"The Beatles (The White Album)"</b>,
 <b>"Yellow Submarine"</b>,
 <b>"Abbey Road"</b>,
 <b>"Let It Be"</b>,
 <b>"Past Masters. Volume One"</b>,
 <b>"Past Masters. Volume Two"</b>,
 <b>"Live At The BBC. Disk 1"</b>,
 <b>"Live At The BBC. Disk 2"</b>,
 <b>"Anthology 1"</b>,
 <b>"Anthology 2"</b>,
 <b>"Anthology 3"</b>,
 <b>other songs:</b>]
```

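We successfully retrieved all the albums. In this analysis, we consider only the original studio albums, so let us extract the text from the `<b>` tags and drop the last eight entries: the two *Past Masters* volumes, the two *Live At The BBC* discs, the three *Anthology* volumes and the "other songs:" heading.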

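```python
# Strip the surrounding double quotes from each album title
titles = [i.get_text().replace('"', '') for i in b_tags]
# Keep only the studio albums: the last 8 entries are compilations, live albums and "other songs:"
titles = titles[:-8]
titles
```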

```
['Please Please Me',
 'With The Beatles',
 "A Hard Day's Night",
 'Beatles For Sale',
 'Help!',
 'Rubber Soul',
 'Revolver',
 "Sgt. Pepper's Lonely Hearts Club Band",
 'Magical Mystery Tour',
 'The Beatles (The White Album)',
 'Yellow Submarine',
 'Abbey Road',
 'Let It Be']
```
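Slicing off the last eight entries only works as long as the page keeps its current order. A sketch of a more explicit alternative, assuming the non-studio entries can always be recognised by the prefixes seen in the output above (`NON_STUDIO_PREFIXES` and `studio_albums` are hypothetical names introduced here):

```python
# Hypothetical alternative to titles[:-8]: exclude non-studio entries by name.
# Assumes these prefixes (taken from the scraped list) identify every entry to drop.
NON_STUDIO_PREFIXES = ("Past Masters", "Live At The BBC", "Anthology", "other songs")

def studio_albums(all_titles):
    """Return the titles that do not start with any of the non-studio prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches
    return [t for t in all_titles if not t.startswith(NON_STUDIO_PREFIXES)]

raw = ["Please Please Me", "Let It Be", "Past Masters. Volume One",
       "Live At The BBC. Disk 1", "Anthology 3", "other songs:"]
print(studio_albums(raw))  # ['Please Please Me', 'Let It Be']
```

The trade-off is that a new kind of compilation would slip through until its prefix is added to the tuple.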

Before counting the words, we first need to remove the special characters: the possessive "'s", the parentheses and the exclamation mark.

```python
# Substrings to strip before splitting the titles into words
remove_cases = ["'s", "(", ")", "!"]
treated_titles = []
for i in titles:
    aux = i
    for j in remove_cases:
        aux = aux.replace(j, "")
    treated_titles.append(aux)
treated_titles
```

```
['Please Please Me',
 'With The Beatles',
 'A Hard Day Night',
 'Beatles For Sale',
 'Help',
 'Rubber Soul',
 'Revolver',
 'Sgt. Pepper Lonely Hearts Club Band',
 'Magical Mystery Tour',
 'The Beatles The White Album',
 'Yellow Submarine',
 'Abbey Road',
 'Let It Be']
```
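The same cleaning can be expressed as a single regular expression, using the `re` module already imported at the top of the notebook. This is a sketch rather than the method used above, and `clean_title` is a hypothetical helper. One small difference: the pattern only removes "'s" at the end of a word, whereas `str.replace` would also remove it in the middle of a word.

```python
import re

# Either "'s" at the end of a word, or any one of the characters ( ) !
SPECIAL = re.compile(r"'s\b|[()!]")

def clean_title(title):
    """Remove the possessive 's, parentheses and exclamation marks from a title."""
    return SPECIAL.sub("", title)

print(clean_title("Sgt. Pepper's Lonely Hearts Club Band"))  # Sgt. Pepper Lonely Hearts Club Band
print(clean_title("The Beatles (The White Album)"))          # The Beatles The White Album
```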

Now let us count the words.

```python
# Count how many times each word appears across all album titles
counter = {}
for i in treated_titles:
    for j in i.split():
        counter[j] = counter.get(j, 0) + 1
# Sort the (word, count) pairs from most to least frequent
sorted(counter.items(), key=lambda item: item[1], reverse=True)
```

```
[('The', 3),
 ('Beatles', 3),
 ('Please', 2),
 ('Me', 1),
 ('With', 1),
 ('A', 1),
 ('Hard', 1),
 ('Day', 1),
 ('Night', 1),
 ('For', 1),
 ('Sale', 1),
 ('Help', 1),
 ('Rubber', 1),
 ('Soul', 1),
 ('Revolver', 1),
 ('Sgt.', 1),
 ('Pepper', 1),
 ('Lonely', 1),
 ('Hearts', 1),
 ('Club', 1),
 ('Band', 1),
 ('Magical', 1),
 ('Mystery', 1),
 ('Tour', 1),
 ('White', 1),
 ('Album', 1),
 ('Yellow', 1),
 ('Submarine', 1),
 ('Abbey', 1),
 ('Road', 1),
 ('Let', 1),
 ('It', 1),
 ('Be', 1)]
```
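The standard library's `collections.Counter` performs the same counting and sorting. A sketch on a few of the cleaned titles: `most_common()` returns pairs from most to least frequent, and words with equal counts keep the order in which they were first seen, which matches the stable `sorted` call above.

```python
from collections import Counter

treated_titles = ["Please Please Me", "With The Beatles", "Beatles For Sale",
                  "The Beatles The White Album"]

# Count every word of every title in a single pass
word_counts = Counter(word for title in treated_titles for word in title.split())
print(word_counts.most_common(3))  # [('The', 3), ('Beatles', 3), ('Please', 2)]
```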

The words "The" and "Beatles" are the most common in the album titles (three occurrences each), followed by "Please" (two); every other word appears only once. One occurrence of "The" comes from the White Album, whose official title is simply "The Beatles". Discounting that nickname, "The" appears only twice, and it is always followed by "Beatles". Both occurrences of "Please" come from the title of the first album, *Please Please Me*. Therefore, with the exception of "(The) Beatles", each word appears in only one album title.
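The argument above separates how often a word occurs from how many titles it occurs in. Counting each word at most once per title (one set per title) makes that distinction explicit. A minimal sketch over a subset of the cleaned titles:

```python
from collections import Counter

treated_titles = ["Please Please Me", "With The Beatles", "Beatles For Sale",
                  "The Beatles The White Album", "Let It Be"]

# set() keeps each word once per title, so "Please Please Me" contributes "Please" only once
titles_per_word = Counter(word for title in treated_titles for word in set(title.split()))

print(titles_per_word["Please"])   # 1
print(titles_per_word["The"])      # 2
print(titles_per_word["Beatles"])  # 3
```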

## The Song Titles

All the song titles are inside `<a>` tags nested in `div` elements with the `listalbum-item` class. Let's extract them.

```python
# Select every link inside a <div class="listalbum-item">
a_tags = soup.select('div.listalbum-item a')
a_tags
```

```
[<a href="/lyrics/beatles/isawherstandingthere.html" target="_blank">I Saw Her Standing There</a>,
 <a href="/lyrics/beatles/misery.html" target="_blank">Misery</a>,
 <a href="/lyrics/beatles/annagotohim.html" target="_blank">Anna (Go To Him)</a>,
 <a href="/lyrics/beatles/chains.html" target="_blank">Chains</a>,
 <a href="/lyrics/beatles/boys.html" target="_blank">Boys</a>,
 <a href="/lyrics/beatles/askmewhy.html" target="_blank">Ask Me Why</a>,
 <a href="/lyrics/beatles/pleasepleaseme.html" target="_blank">Please Please Me</a>,
 <a href="/lyrics/beatles/lovemedo.html" target="_blank">Love Me Do</a>,
 <a href="/lyrics/beatles/psiloveyou.html" target="_blank">P.S. I Love You</a>,
 <a href="/lyrics/beatles/babyitsyou.html" target="_blank">Baby It's You</a>,
 <a href="/lyrics/beatles/doyouwanttoknowasecret.html" target="_blank">Do You Want To Know A Secret</a>,
 <a href="/lyrics/beatles/atasteofhoney.html" target="_blank">A Taste Of Honey</a>,
 <a href="/lyrics/beatles/theresaplace.html"
```
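Each link's `href` is a path relative to the site, such as `/lyrics/beatles/misery.html`. To download the lyrics themselves later on, those paths have to become absolute URLs. A sketch using `urllib.parse.urljoin`, with two hrefs hard-coded from the output above; in the notebook they could come from `[i["href"] for i in a_tags]`:

```python
from urllib.parse import urljoin

page_url = "https://www.azlyrics.com/b/beatles.html"
# hrefs copied from the scraped output; in the notebook: [i["href"] for i in a_tags]
hrefs = ["/lyrics/beatles/isawherstandingthere.html", "/lyrics/beatles/misery.html"]

# Resolve each site-relative path against the page it was found on
lyrics_urls = [urljoin(page_url, h) for h in hrefs]
print(lyrics_urls[1])  # https://www.azlyrics.com/lyrics/beatles/misery.html
```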

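Now let us extract the song titles and remove the duplicates, since the list above covers the compilations and live albums as well, so the same song can appear more than once.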

```python
songs_titles = [i.get_text() for i in a_tags]
# Converting to a set removes duplicates (note: it does not preserve the page order)
songs_titles = list(set(songs_titles))
songs_titles
```

```
['Love Me Do',
 'Till There Was You',
 'Cry Baby Cry',
 'You Know My Name (Look Up The Number)',
 'Come And Get It',
 'Yellow Submarine',
 'Back In The U.S.S.R.',
 'The Inner Light',
 "I Want You (She's So Heavy)",
 'And I Love Her',
 'Because',
 "A Hard Day's Night",
 'Help!',
 'Polythene Pam',
 'She Came In Through The Bathroom Window',
 "Don't Bother Me",
 'Thank You Girl',
 'Lend Me Your Comb',
 'P.S. I Love You',
 "What's The New Mary Jane",
 'Drive My Car',
 'Piggies',
 'Have A Banana!',
 "Maxwell's Silver Hammer",
 "You've Got To Hide Your Love Away",
 'Good Morning, Good Morning',
 'Blue Jay Way',
 'Hallelujah, I Love Her So',
 "It Won't Be Long",
 'The Sheik Of Araby',
 'Set Fire To That Lot!',
 'You Really Got A Hold On Me',
 "She's A Woman",
 'I Got To Find My Baby',
 'Doctor Robert',
 'A Shot Of Rhythm And Blues',
 'Free As A Bird',
 'Junk',
 'A Day In The Life',
 'Honey Pie',
 "I'll Get You",
 'The Word',
 'Here, There And Everywhere',
 "Sgt. Pepper's Lonely Hearts Club Ba
```

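`set` discards the order in which songs appear on the page, which is why the list above comes out in an arbitrary order. If album order matters, `dict.fromkeys` removes duplicates while keeping the first occurrence of each title, because dictionaries preserve insertion order (guaranteed since Python 3.7). A small sketch on sample titles:

```python
titles_on_page = ["Help!", "Yesterday", "Help!", "Let It Be", "Yesterday"]

# dict keys are unique and keep insertion order, so later repeats are dropped
unique_titles = list(dict.fromkeys(titles_on_page))
print(unique_titles)  # ['Help!', 'Yesterday', 'Let It Be']
```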
Since pronouns, prepositions, conjunctions and adverbs are very common in any sentence, and verbs can take many conjugated forms, we will only count nouns and adjectives. To identify them, we will use Britannica's 3000-word list. Let us get the list.
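Before fetching the list, here is a sketch of how it would be applied: filtering becomes a case-insensitive set-membership test on each word. `allowed_words` is a hypothetical stand-in for the nouns and adjectives extracted from the Britannica list, and `count_listed_words` is a name introduced here for illustration.

```python
import re
from collections import Counter

def count_listed_words(titles, allowed_words):
    """Count the words of `titles` that appear in `allowed_words` (case-insensitive)."""
    allowed = {w.lower() for w in allowed_words}
    counts = Counter()
    for title in titles:
        # Runs of letters, optionally with one straight-apostrophe part ("don't", "maxwell's")
        for word in re.findall(r"[a-z]+(?:'[a-z]+)?", title.lower()):
            if word.endswith("'s"):
                word = word[:-2]  # drop the possessive, so "maxwell's" becomes "maxwell"
            if word in allowed:
                counts[word] += 1
    return counts

# Hypothetical stand-in for the nouns and adjectives taken from the Britannica list
allowed_words = {"yellow", "submarine", "silver", "hammer", "love"}
songs = ["Yellow Submarine", "Maxwell's Silver Hammer", "Love Me Do", "P.S. I Love You"]
print(count_listed_words(songs, allowed_words).most_common(1))  # [('love', 2)]
```

A plain word list cannot tell parts of speech apart in context: "love" is both a noun and a verb, so some verb uses will still be counted.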

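```python
list_url = "https://www.britannica.com/dictionary/eb/3000-words/alpha"
# One section of the list per letter of the alphabet
letters = string.ascii_lowercase
# Fetch the first page of words starting with "a"
teste = bs(requests.get(list_url + "/a/1").content, "html.parser")
# The list is paginated: look for the navigation button linking to the next page
aux = teste.find("a", attrs={"class": "button"})
print(aux)
```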

```
<a class="button next" href="/dictionary/eb/3000-words/alpha/a/2">Next »</a>
```
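The "Next »" link shows that each letter's list is split across numbered pages (`/a/1`, `/a/2`, ...). A sketch of the crawl loop, under the assumption that the last page of each letter has no "next" button. `find_next_href` is a hypothetical callable that downloads a page and returns the next button's `href`, or `None`; passing it in keeps the loop testable without network access.

```python
from string import ascii_lowercase
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

BASE = "https://www.britannica.com/dictionary/eb/3000-words/alpha"

def page_urls(find_next_href: Callable[[str], Optional[str]],
              letters: str = ascii_lowercase) -> Iterator[str]:
    """Yield the URL of every list page, letter by letter.

    find_next_href(url) is hypothetical: it should fetch `url` and return the href of
    its "next" button, or None when there is none (assumed to mark a letter's last page).
    """
    for letter in letters:
        url: Optional[str] = f"{BASE}/{letter}/1"
        seen = set()  # guard against a "next" link that loops back
        while url is not None and url not in seen:
            seen.add(url)
            yield url
            href = find_next_href(url)
            url = urljoin(url, href) if href else None

# Offline example: letter "a" has two pages, letter "b" only one
fake_next = {f"{BASE}/a/1": "/dictionary/eb/3000-words/alpha/a/2"}
for u in page_urls(fake_next.get, letters="ab"):
    print(u)
```

When wrapping the real page in `find_next_href`, matching both classes of the button (for example `soup.select_one("a.button.next")`) is safer than matching `button` alone, in case later pages also carry a "previous" button with the `button` class.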
