## Creating a DataFrame from a dictionary

In [4]:
import pandas as pd

In [5]:
data = {
    "Customer Name": [
        "Pralov", "Abhishek", "Swoyesh", "Aakash", "Swoyesh", "Anika", "Steve", "Fatima", "Nikhil", "Jasmine",
        "Omar", "Laxmi", "Ben", "Mina", "Raj", "Sonia", "Alex", "Kabir", "Priya", "Zara"
    ],
    "Feedback": [
        "The course on AI was 🔥🔥. I learned a ton!!",
        "why is the quiz not loading??? pls fix it asap!!",
        "Love the UI, but some buttons <i>don't respond</i> on mobile.",
        "Very detailed explanations! Props to the instructors 🙏",
        "video #4 keeps buffering... even tho my net is fine :(",
        "Where’s the certificate download option?? couldn’t find it!",
        "Great platform overall, but the search feature sux...",
        "I’ve emailed support twice about my payment issue. No reply.",
        "some typos in the Python basics module. pls correct 🙃",
        "I like the badges system. Makes it feel like a game 🎮",
        "Please add subtitles for non-English speakers.",
        "Lesson 7 has a broken video link: <a href='http://vidhost.com/404'>Watch</a>",
        "Too many ads before each video 😤",
        "App crashed 3 times while taking the test!!!",
        "The daily streak system is motivating 💪",
        "Would love if you could add dark mode 🌙",
        "Forum discussions are so helpful. Thanks, team!",
        "Quiz timer is too short! 😓 barely finished in time",
        "Voice of instructor in module 2 is hard to understand.",
        "I finally finished my first course. Yay!! 🎉"
    ]
}


In [6]:
df = pd.DataFrame(data)

In [7]:
df.head(5)

|   | Customer Name | Feedback |
|---|---|---|
| 0 | Pralov | The course on AI was 🔥🔥. I learned a ton!! |
| 1 | Abhishek | why is the quiz not loading??? pls fix it asap!! |
| 2 | Swoyesh | Love the UI, but some buttons \<i>don't respond... |
| 3 | Aakash | Very detailed explanations! Props to the instr... |
| 4 | Swoyesh | video #4 keeps buffering... even tho my net is... |
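
`pd.DataFrame` turns each dictionary key into a column and each list into that column's values, so every list must have the same length. The sketch below uses a small made-up dictionary (`toy`, with hypothetical column names) to show both the normal case and what happens when the lengths differ.

```python
import pandas as pd

# Hypothetical toy data: two columns, three rows.
toy = {"name": ["A", "B", "C"], "note": ["ok", "slow", "great"]}
toy_df = pd.DataFrame(toy)

shape = toy_df.shape            # (rows, columns)
columns = list(toy_df.columns)  # keys become column names, in insertion order

# Lists of unequal length are rejected with a ValueError.
try:
    pd.DataFrame({"name": ["A", "B"], "note": ["ok"]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```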


In [8]:
feedback_df = df[['Feedback']]

In [9]:
feedback_df.head(5)

|   | Feedback |
|---|---|
| 0 | The course on AI was 🔥🔥. I learned a ton!! |
| 1 | why is the quiz not loading??? pls fix it asap!! |
| 2 | Love the UI, but some buttons \<i>don't respond... |
| 3 | Very detailed explanations! Props to the instr... |
| 4 | video #4 keeps buffering... even tho my net is... |


In [10]:
full_forms = {
    "ASAP": "As Soon As Possible",
    "UI": "User Interface",
    "UX": "User Experience",
    "AI": "Artificial Intelligence",
    "IT": "Information Technology",
    "HR": "Human Resources",
    "CEO": "Chief Executive Officer",
    "CFO": "Chief Financial Officer",
    "CTO": "Chief Technology Officer",
    "FAQ": "Frequently Asked Questions",
    "ETA": "Estimated Time of Arrival",
    "API": "Application Programming Interface",
    "HTTP": "HyperText Transfer Protocol",
    "HTML": "HyperText Markup Language",
    "SQL": "Structured Query Language",
    "DB": "Database",
    "VPN": "Virtual Private Network",
    "IP": "Internet Protocol",
    "R&D": "Research and Development",
    "KPI": "Key Performance Indicator",
    "TBD": "To Be Decided",
    "FYI": "For Your Information",
    "BTW": "By The Way",
    "IDK": "I Don't Know",
    "BRB": "Be Right Back",
    "IMO": "In My Opinion",
    "LOL": "Laughing Out Loud",
    "OMG": "Oh My God",
    "LMAO": "Laughing My Ass Off",
    "TBH": "To Be Honest",
    "TY": "Thank You",
    "THX": "Thanks",
    "NP": "No Problem",
    "PM": "Project Manager"
}

### 1. Expanding Acronyms

In [11]:
import re

In [12]:
# First, expand acronyms to their full forms

In [13]:
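def expand_acronyms(text):
    # Replace each run of two or more capital letters that has an entry in full_forms.
    # Lowercase forms such as "asap" and keys containing symbols such as "R&D" never match.
    def replace(match):
        word = match.group(0)  # already uppercase, so it can be looked up directly
        return full_forms.get(word, word)
    return re.sub(r'\b[A-Z]{2,}\b', replace, text)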
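
Because the pattern only matches capital letters, informal lowercase forms such as "asap" are left alone. The sketch below contrasts that behavior with a case-insensitive variant. The names `demo_forms`, `expand_upper_only`, and `expand_any_case` are hypothetical and the table is a trimmed copy of `full_forms`. The variant catches "asap" but also rewrites the ordinary word "it", which is one reason to keep the uppercase-only rule or to restrict the case-insensitive lookup to unambiguous keys.

```python
import re

# Hypothetical trimmed acronym table, for illustration only.
demo_forms = {
    "ASAP": "As Soon As Possible",
    "UI": "User Interface",
    "IT": "Information Technology",
}

def expand_upper_only(text):
    # Only runs of two or more capital letters are candidates.
    return re.sub(
        r"\b[A-Z]{2,}\b",
        lambda m: demo_forms.get(m.group(0), m.group(0)),
        text,
    )

def expand_any_case(text):
    # Case-insensitive lookup: every word of 2+ letters is a candidate.
    return re.sub(
        r"\b[A-Za-z]{2,}\b",
        lambda m: demo_forms.get(m.group(0).upper(), m.group(0)),
        text,
    )

sample = "Love the UI, pls fix it asap!!"
upper_only = expand_upper_only(sample)
any_case = expand_any_case(sample)
# upper_only -> "Love the User Interface, pls fix it asap!!"
# any_case   -> "Love the User Interface, pls fix Information Technology As Soon As Possible!!"
```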

In [14]:
feedback_df['Feedback'] = feedback_df['Feedback'].apply(expand_acronyms)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  feedback_df['Feedback'] = feedback_df['Feedback'].apply(expand_acronyms)
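
This message is pandas' `SettingWithCopyWarning`. It appears because `feedback_df` was produced by selecting columns from `df`, so pandas cannot tell whether the assignment is meant to affect `df` as well. One common way to avoid it, sketched below with made-up data (`source` and `subset` are hypothetical names), is to take an explicit `.copy()` when creating the subset.

```python
import pandas as pd

# Hypothetical stand-in for the original DataFrame.
source = pd.DataFrame(
    {"Customer Name": ["A", "B"], "Feedback": ["Great UI", "pls fix ASAP"]}
)

# .copy() makes the subset an independent DataFrame,
# so assigning to its columns is unambiguous.
subset = source[["Feedback"]].copy()
subset["Feedback"] = subset["Feedback"].str.lower()

original_first = source.loc[0, "Feedback"]  # untouched
cleaned_first = subset.loc[0, "Feedback"]   # lowercased
```

Applied to this notebook, the equivalent change would be `feedback_df = df[['Feedback']].copy()` in the cell that creates `feedback_df`.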


In [15]:
feedback_df.sample(5)

|   | Feedback |
|---|---|
| 1 | why is the quiz not loading??? pls fix it asap!! |
| 3 | Very detailed explanations! Props to the instr... |
| 15 | Would love if you could add dark mode 🌙 |
| 11 | Lesson 7 has a broken video link: \<a href='htt... |
| 9 | I like the badges system. Makes it feel like a... |


### 2. Lowercasing all feedback

In [16]:
feedback_df['Feedback'] = feedback_df['Feedback'].str.lower()


### 3. Removing emojis

In [17]:
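# Compile the pattern once instead of on every call.
# The u"" prefix and re.UNICODE are redundant for str patterns in Python 3.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "\U00002702-\U000027B0"  # dingbats
    "\U000024C2-\U0001F251"  # very broad range: also spans CJK and other non-emoji characters
    "]+"
)

def remove_emojis(text):
    return EMOJI_PATTERN.sub("", text)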
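
The last range in the pattern, U+24C2 to U+1F251, covers far more than emojis: it includes CJK ideographs and many other scripts, while newer emojis in U+1F900 to U+1F9FF fall outside every listed range. The sketch below is one possible narrower pattern, not a complete emoji definition. `NARROW_EMOJI`, `BROAD_RANGE`, and `strip_emojis` are hypothetical names, and the chosen ranges are an assumption about which blocks matter for this data.

```python
import re

# A narrower, still approximate, set of emoji ranges.
NARROW_EMOJI = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\uFE0F"                 # emoji variation selector
    "]+"
)

# The broad range from the original pattern, isolated for comparison.
BROAD_RANGE = re.compile("[\U000024C2-\U0001F251]+")

def strip_emojis(text):
    return NARROW_EMOJI.sub("", text)

sample = "学习 AI 🔥🔥 🤔"
narrow_result = strip_emojis(sample)       # keeps the Chinese text, drops both emojis
broad_result = BROAD_RANGE.sub("", sample)  # drops the Chinese text
```

For production use, a maintained emoji list is usually more reliable than hand-written ranges.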

In [18]:
feedback_df["Feedback"] = feedback_df["Feedback"].apply(remove_emojis)


### 4. Removing HTML tags

In [19]:
def remove_html_tags(text):
    html_pattern = re.compile(r'<.*?>')
    return html_pattern.sub('', text)
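
The non-greedy pattern `<.*?>` removes tags but leaves HTML entities such as `&amp;` in place. If the feedback may contain entities, one option is to decode them with the standard library's `html.unescape` after stripping tags. The sketch below assumes that is desirable. `strip_tags_and_entities` is a hypothetical helper, and the trailing `&amp; retry` in the sample is invented for illustration.

```python
import html
import re

TAG_PATTERN = re.compile(r"<.*?>")

def strip_tags_and_entities(text):
    # Remove tags first, then decode entities such as &amp; into plain characters.
    without_tags = TAG_PATTERN.sub("", text)
    return html.unescape(without_tags)

sample = "broken link: <a href='http://vidhost.com/404'>Watch</a> &amp; retry"
result = strip_tags_and_entities(sample)
# result -> "broken link: Watch & retry"
```

Note that the URL inside the `href` attribute disappears together with the tag, so the later URL-removal step has nothing left to do for this row.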

In [21]:
feedback_df["Feedback"] = feedback_df["Feedback"].apply(remove_html_tags)


In [22]:
feedback_df

|   | Feedback |
|---|---|
| 0 | the course on artificial intelligence was . i ... |
| 1 | why is the quiz not loading??? pls fix it asap!! |
| 2 | love the user interface, but some buttons don'... |
| 3 | very detailed explanations! props to the instr... |
| 4 | video #4 keeps buffering... even tho my net is... |
| 5 | where’s the certificate download option?? coul... |
| 6 | great platform overall, but the search feature... |
| 7 | i’ve emailed support twice about my payment is... |
| 8 | some typos in the python basics module. pls co... |
| 9 | i like the badges system. makes it feel like a... |


### 5. Removing URLs

In [23]:
def remove_urls(text):
    url_pattern = re.compile(r'http\S+|www\.\S+')
    return url_pattern.sub('', text)
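
The pattern `http\S+` removes any token that merely starts with "http", not only real URLs. The sketch below compares it with a slightly stricter alternative that requires `http://` or `https://`. `LOOSE_URL` and `STRICT_URL` are hypothetical names and the sample sentence is made up.

```python
import re

LOOSE_URL = re.compile(r"http\S+|www\.\S+")          # pattern used in remove_urls
STRICT_URL = re.compile(r"https?://\S+|www\.\S+")    # requires the scheme separator

sample = "restart httpd then open https://example.com/help or www.example.org"

loose = LOOSE_URL.sub("", sample)    # also deletes the word "httpd"
strict = STRICT_URL.sub("", sample)  # keeps "httpd", removes both URLs

# Removal leaves doubled spaces behind; split/join collapses them.
strict_tidy = " ".join(strict.split())
# strict_tidy -> "restart httpd then open or"
```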

In [24]:
feedback_df["Feedback"] = feedback_df["Feedback"].apply(remove_urls)


In [25]:
feedback_df

|   | Feedback |
|---|---|
| 0 | the course on artificial intelligence was . i ... |
| 1 | why is the quiz not loading??? pls fix it asap!! |
| 2 | love the user interface, but some buttons don'... |
| 3 | very detailed explanations! props to the instr... |
| 4 | video #4 keeps buffering... even tho my net is... |
| 5 | where’s the certificate download option?? coul... |
| 6 | great platform overall, but the search feature... |
| 7 | i’ve emailed support twice about my payment is... |
| 8 | some typos in the python basics module. pls co... |
| 9 | i like the badges system. makes it feel like a... |


### 6. Removing punctuation

In [27]:
import string

In [28]:
def remove_punctuation(text):
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text)
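
`string.punctuation` contains only ASCII characters, so typographic marks such as the curly apostrophe in "where’s" survive this step. A possible Unicode-aware alternative, sketched below, drops every character whose Unicode category starts with "P". `remove_punctuation_unicode` is a hypothetical helper. Unlike `string.punctuation`, it keeps symbol characters such as `$`, `+`, and `=`, which belong to category "S".

```python
import string
import unicodedata

def remove_punctuation_unicode(text):
    # Drop characters in any Unicode punctuation category (Pc, Pd, Pe, Pf, Pi, Po, Ps).
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )

sample = "where’s the option?? couldn’t find it!"

# ASCII-only removal, equivalent in effect to the regex built from string.punctuation.
ascii_only = sample.translate(str.maketrans("", "", string.punctuation))
unicode_aware = remove_punctuation_unicode(sample)
# ascii_only    -> "where’s the option couldn’t find it"
# unicode_aware -> "wheres the option couldnt find it"
```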

In [29]:
feedback_df["Feedback"] = feedback_df["Feedback"].apply(remove_punctuation)


In [37]:
feedback_df.sample(5)

|   | Feedback |
|---|---|
| 19 | i finally finished my first course yay |
| 13 | app crashed 3 times while taking the test |
| 3 | very detailed explanations props to the instru... |
| 2 | love the user interface but some buttons dont ... |
| 16 | forum discussions are so helpful thanks team |


### 7. Removing stop words

In [43]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Pralo\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Pralo\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [45]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


In [None]:
stop_words = set(stopwords.words('english'))
feedback_df['Feedback'] = feedback_df['Feedback'].apply(
    lambda x: ' '.join([word for word in x.split() if word not in stop_words])
)


'course artificial intelligence learned ton'
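
The filtering logic itself is a whitespace split followed by a set-membership test. The sketch below reproduces it with a tiny hand-written stop-word set so that it runs without downloading the NLTK corpus. `demo_stop_words` and `drop_stop_words` are hypothetical, and the set is not NLTK's actual list. It also shows a side effect of the step order: once punctuation has been stripped, a contraction such as "don't" becomes "dont", which may no longer match a stop-word entry spelled with the apostrophe.

```python
# Hypothetical, tiny stop-word set used only so the example runs offline.
demo_stop_words = {"the", "on", "was", "i", "a", "is", "my", "don't"}

def drop_stop_words(text, stop_words):
    # Keep only the whitespace-separated tokens that are not stop words.
    return " ".join(word for word in text.split() if word not in stop_words)

before = "the course on artificial intelligence was i learned a ton"
after = drop_stop_words(before, demo_stop_words)
# after -> "course artificial intelligence learned ton"

# "dont" (apostrophe already removed) is not in the set, so it is kept.
contraction = drop_stop_words("some buttons dont respond", demo_stop_words)
```

If contractions matter for the analysis, one option is to remove stop words before stripping punctuation.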