# Introduction
This case study works with a Beyoncé lyrics dataset sourced from Kaggle. The main goal is to generate new verses from the existing lyrics: a dictionary of word-to-next-word transitions is built from the corpus and sampled to produce text in the style of the dataset. Alongside generation, the project trains a classifier that predicts a song title from a lyric fragment, and it applies Part-of-Speech tagging and Named Entity Recognition with spaCy to user-supplied text.

In [1]:
cd /Users/macbook/Downloads

/Users/macbook/Downloads


# Section 1: Import Libraries
I began by importing the libraries used for data handling (pandas), natural language processing (spaCy and NLTK), and machine learning (scikit-learn), and by loading spaCy's small English model.

In [2]:
import pandas as pd
import spacy
import nltk
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
nlp = spacy.load('en_core_web_sm')

# Section 2: Data Loading and Preprocessing
In this part, the project loads the dataset containing Beyoncé's lyrics using pandas. It then preprocesses the lyrics by replacing missing values with empty strings and converting the text to lowercase, so that the same word is counted consistently regardless of capitalization.

In [3]:
df=pd.read_csv('Beyonce.csv')
df['Lyric'] = df['Lyric'].fillna('')
df['Lyric'] = df['Lyric'].str.lower()
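
As a small illustration of what the two preprocessing calls do, the sketch below applies them to a toy frame. The column name `Lyric` comes from the dataset; the rows are made-up placeholders, not real data.

```python
import pandas as pd

# Toy stand-in for Beyonce.csv (placeholder rows, not real lyrics)
toy = pd.DataFrame({"Lyric": ["Hello World", None, "MIXED Case Line"]})

# Missing lyrics become empty strings so later string operations never see NaN
toy["Lyric"] = toy["Lyric"].fillna("")
# Lowercasing makes "Love" and "love" count as the same token later on
toy["Lyric"] = toy["Lyric"].str.lower()

cleaned = toy["Lyric"].tolist()
print(cleaned)
```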

In [4]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Artist | Title | Album | Year | Date | Lyric |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Beyoncé | Drunk in Love | BEYONCÉ | 2013.0 | 2013-12-17 | beyoncé i've been drinkin' i've been drinkin' ... |
| 1 | 1 | Beyoncé | Formation | Lemonade | 2016.0 | 2016-02-06 | messy mya what happened at the new wil'ins bit... |
| 2 | 2 | Beyoncé | Partition | BEYONCÉ | 2013.0 | 2013-12-13 | part yoncé  let me hear you say hey ms carte... |
| 3 | 3 | Beyoncé | Mine | BEYONCÉ | 2013.0 | 2013-12-13 | beyoncé i've been watching for the signs took ... |
| 4 | 4 | Beyoncé | Hold Up | Lemonade | 2016.0 | 2016-04-23 | hold up they don't love you like i love you sl... |


# Section 3: TF-IDF Vectorization and Model Training
This section builds a TF-IDF (Term Frequency-Inverse Document Frequency) representation of the lyrics with TfidfVectorizer from sklearn, dropping English stop words. The resulting sparse matrix gives each song a numerical feature vector. A LogisticRegression model is then trained on these vectors to predict the song title, treating each title as a class.

In [5]:
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['Lyric'])
y = df['Title']
model = LogisticRegression()
model.fit(X, y)
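
To make the vectorize-then-classify step easier to follow, here is a minimal sketch on toy data. The documents and the labels `title_a`, `title_b`, `title_c` are invented for illustration, and `max_iter=1000` is an assumption added to give the solver room to converge; it is not a setting taken from the cell above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up placeholder documents with disjoint vocabularies (not real lyrics)
toy_lyrics = [
    "river water ocean wave",
    "mountain rock stone cliff",
    "desert sand dune heat",
]
toy_titles = ["title_a", "title_b", "title_c"]

# Each document becomes one row of TF-IDF weights over the learned vocabulary
toy_vectorizer = TfidfVectorizer(stop_words="english")
X_toy = toy_vectorizer.fit_transform(toy_lyrics)

# One class per title, as in the notebook
toy_model = LogisticRegression(max_iter=1000)
toy_model.fit(X_toy, toy_titles)

# A fragment must be transformed with the SAME fitted vectorizer before predicting
fragment = "ocean wave"
fragment_vector = toy_vectorizer.transform([fragment])
predicted = toy_model.predict(fragment_vector)[0]
probabilities = toy_model.predict_proba(fragment_vector)[0]
print(predicted, dict(zip(toy_model.classes_, probabilities.round(2))))
```

Inspecting `predict_proba` alongside the predicted label gives a rough sense of how confident the classifier is, which can help when a fragment shares vocabulary with several songs.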

# Section 4: Lyrics Tokenization and Dictionary Creation
In this section, the lyrics are joined into one string and tokenized into words using nltk.word_tokenize(). A dictionary is then created in which each word is mapped to the list of every word that follows it somewhere in the corpus, with repeats kept. This dictionary is used later for generating lyrics.

In [6]:
word_tokens = nltk.word_tokenize(' '.join(df['Lyric'].dropna()).lower())
lyrics_dict = {}
for i in range(len(word_tokens)-1):
    word = word_tokens[i]
    next_word = word_tokens[i+1]
    if word in lyrics_dict:
        lyrics_dict[word].append(next_word)
    else:
        lyrics_dict[word] = [next_word]
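
The same word-transition structure can be sketched on a toy sentence. This version uses `collections.defaultdict` and `str.split` as stand-ins for the explicit `if`/`else` and for `nltk.word_tokenize`; the helper name `build_transitions` and the sample sentence are hypothetical.

```python
from collections import defaultdict

def build_transitions(tokens):
    """Map each token to the list of tokens that follow it, keeping repeats."""
    transitions = defaultdict(list)
    # Pair every token with its successor: (t0, t1), (t1, t2), ...
    for current, following in zip(tokens, tokens[1:]):
        transitions[current].append(following)
    return dict(transitions)

toy_tokens = "i love you and i love me".split()
toy_dict = build_transitions(toy_tokens)
print(toy_dict)
```

Because repeats are kept in each list, choosing uniformly from a list later on samples a successor in proportion to how often it followed that word, which makes the dictionary a simple first-order Markov chain over words. The final token has no successor, so it only gets a key if it also appears earlier in the text.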

# Section 5: Functions for Predictions and Analysis
This part defines several functions that enable various types of analysis and predictions:

- `predict_song_title(lyrics)`: Predicts the song title for the input lyrics using the trained Logistic Regression model.
- `perform_named_entity_recognition(lyrics)`: Runs spaCy's Named Entity Recognition (NER) on the input text and returns (entity text, label) pairs.
- `perform_pos_tagging(lyrics)`: Runs spaCy's Part-of-Speech (POS) tagger on the input text and returns (token, tag) pairs.
- `generate_lyrics(start_word, num_lines=20)`: Generates new lyrics from a starting word by repeatedly sampling a next word from the previously created dictionary. Despite its name, `num_lines` is the maximum number of words appended, not the number of lines.


In [13]:
# Function to predict the song title based on lyrics
def predict_song_title(lyrics):
    lyrics = lyrics.lower()
    X_test = vectorizer.transform([lyrics])
    prediction = model.predict(X_test)
    return prediction[0]

# Function to perform named entity recognition on lyrics
def perform_named_entity_recognition(lyrics):
    doc = nlp(lyrics)
    entities = [(entity.text, entity.label_) for entity in doc.ents]
    return entities

# Function to perform POS tagging on lyrics
def perform_pos_tagging(lyrics):
    doc = nlp(lyrics)
    pos_tags = [(token.text, token.pos_) for token in doc]
    return pos_tags

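# Function to generate lyrics given a starting word
# (num_lines is the maximum number of words appended, not lines)
def generate_lyrics(start_word, num_lines=20):
    # lyrics_dict only holds lowercase tokens, so normalize the starting word
    generated_lyrics = [start_word.strip().lower()]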
    for _ in range(num_lines):
        current_word = generated_lyrics[-1]
        if current_word in lyrics_dict:
            next_word_options = lyrics_dict[current_word]
            next_word = random.choice(next_word_options)
            generated_lyrics.append(next_word)
        else:
            break
    return ' '.join(generated_lyrics)
# Start the conversation
print("Welcome! I'm a multipurpose Bot.")
print("You can choose to perform Named Entity Recognition, POS tagging, predict a song title, or generate lyrics.")
print("Enter 'bye' to exit.")

# Conversation loop
while True:
    # Ask for the user's choice
    choice = input("Enter 'ner' for Named Entity Recognition, 'pos' for POS tagging, 'lyrics' to predict a song title, or 'generate' to User input: ")

    # Check if the user wants to exit
    if choice.lower() == 'bye':
        print("Goodbye! Have a great day!")
        break

    # User wants to perform Named Entity Recognition
    if choice.lower() == 'ner':
        # Ask for lyrics input
        user_lyrics = input("Please Enter the Sentence: ")

        # Perform Named Entity Recognition on the lyrics
        entities = perform_named_entity_recognition(user_lyrics)

        print("Named Entities:")
        for entity, label in entities:
            print(f"{entity}: {label}")
        print()

    # User wants to perform POS tagging
    elif choice.lower() == 'pos':
        # Ask for lyrics input
        user_lyrics = input("Please Enter the Sentence: ")

        # Perform POS tagging on the lyrics
        pos_tags = perform_pos_tagging(user_lyrics)

        print("POS Tags:")
        for token, pos in pos_tags:
            print(f"{token}: {pos}")
        print()
    # User wants to predict a song title
    elif choice.lower() == 'lyrics':
        # Ask for lyrics input
        user_lyrics = input("Enter the lyrics: ")

        # Predict the song title based on the lyrics
        predicted_song_title = predict_song_title(user_lyrics)

        print("Predicted Song Title:", predicted_song_title)
        print()

    # User wants to generate lyrics
    elif choice.lower() == 'generate':
        # Ask for word input
        user_word = input("Enter a word: ")

        # Generate lyrics based on the user's input
        generated_lyrics = generate_lyrics(user_word)
        print(generated_lyrics)
        print()

    else:
        print("Invalid choice. Please try again.")

Welcome! I'm a multipurpose Bot.
You can choose to perform Named Entity Recognition, POS tagging, predict a song title, or generate lyrics.
Enter 'bye' to exit.
Enter 'ner' for Named Entity Recognition, 'pos' for POS tagging, 'lyrics' to predict a song title, or 'generate' to User input: ner
Please Enter the Sentence: James and John eats at KFC every working day
Named Entities:
James: PERSON
John: PERSON
KFC: ORG
every working day: DATE

Enter 'ner' for Named Entity Recognition, 'pos' for POS tagging, 'lyrics' to predict a song title, or 'generate' to User input: ner
Please Enter the Sentence: Julieth and Romeo could leave for Las Vegas tomorrow
Named Entities:
Julieth: PERSON
Romeo: ORG
Las Vegas: GPE

Enter 'ner' for Named Entity Recognition, 'pos' for POS tagging, 'lyrics' to predict a song title, or 'generate' to User input: pos
Please Enter the Sentence: Julieth and Romeo could leave for Las Vegas tomorrow
POS Tags:
Julieth: PROPN
and: CCONJ
Romeo: PROPN
could: AUX
leave: VERB
for
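
The generation step is a random walk over the transition dictionary, so its output changes from run to run. A minimal sketch of a reproducible variant is shown below; the function name `generate_words`, the `seed` parameter, and the toy dictionary are assumptions for illustration and are not part of the notebook's code.

```python
import random

def generate_words(transitions, start_word, max_words=20, seed=None):
    """Random walk over a word-transition dictionary; stops at a dead end."""
    rng = random.Random(seed)      # local generator, leaves global random state alone
    words = [start_word.lower()]   # dictionary keys are assumed to be lowercase
    for _ in range(max_words):
        options = transitions.get(words[-1])
        if not options:            # no recorded successor for the current word
            break
        words.append(rng.choice(options))
    return " ".join(words)

# Toy transitions (placeholder data, not built from the real lyrics)
toy_transitions = {
    "i": ["love", "love"],
    "love": ["you", "me"],
    "you": ["and"],
    "and": ["i"],
}
first = generate_words(toy_transitions, "I", max_words=8, seed=0)
second = generate_words(toy_transitions, "I", max_words=8, seed=0)
print(first)
print(first == second)
```

Passing the same seed yields the same text, which makes generated samples easier to compare when the dictionary or the starting word changes.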

# Limitations
Dependency on Available Lyrics Dataset: The project relies on the availability and quality of the Beyoncé lyrics dataset. If the dataset is incomplete, outdated, or contains errors, the generated results and analyses will be less accurate.

Limited Scope of Predictive Model: The song title prediction model is trained specifically on Beyoncé's lyrics. It can only output titles that appear in the dataset, so it will not generalize to lyrics from other artists or genres.

Lack of Contextual Understanding: The project uses statistical and linguistic analysis without understanding the broader context or emotions behind the lyrics. It may misinterpret certain nuances or poetic expressions, leading to inaccurate results in tasks such as Named Entity Recognition and POS tagging.

Limited Vocabulary for Lyrics Generation: The lyrics generation function relies on a dictionary of word transitions from the input lyrics dataset. If the dataset's vocabulary is limited, generated lyrics might also have a constrained vocabulary and repetitiveness.

No Emotion or Creativity: The generated lyrics lack human creativity, emotions, and deeper meaning that human songwriters bring to their work. The generated lyrics may seem mechanical and lack the emotional depth found in human-composed lyrics.

Handling of Rare or Uncommon Words: The project may struggle with handling rare or uncommon words that are not well-represented in the lyrics dataset. This could affect the quality of generated lyrics.

Inability to Comprehend Complex Context: The project's analysis and generation methods do not comprehend complex context or hidden meanings that might be present in song lyrics. It focuses on statistical patterns rather than deep semantic understanding.

Overfitting of Models: The title classifier is fit on the full dataset with no held-out split, so its accuracy on new, unseen lyrics is not measured. If the model overfits the training data, it may predict titles poorly for lyric fragments it has not seen.

Ethical Considerations: Using the generated lyrics in a public or commercial context might raise ethical concerns, as they could resemble copyrighted content or infringe on artistic rights.

User Dependency for Interpretation: The results of NER, POS tagging, and song title predictions might need user interpretation. Users might need to analyze the outputs within the broader context of the song and artist.