## Practice Lab

# **Scenario: Text Analysis**

## Objectives

After completing this lab, you will be able to combine strings, lists, dictionaries, and classes to analyze text.

## Setup

For this lab, we will be using the following data types:

- Lists
- Strings
- Classes and objects


Consider a real-life scenario in which you are analyzing customer feedback for a product. You have a large dataset of customer reviews in the form of strings, and you want to extract useful information from them through the following three tasks:

**1. String in lower case:** You want to pre-process the customer feedback by converting all the text to lowercase. This step standardizes the text, so that you can focus on the content rather than the specific letter casing.

**2. Frequency of all words in a given string:** After converting the text to lowercase, you want to determine the frequency of each word in the customer feedback. This information will help you identify which words are used more frequently, indicating the key aspects or topics that customers are mentioning in their reviews. By analyzing the word frequencies, you can gain insights into the most common issues raised by customers.

**3. Frequency of a specific word:** In addition to analyzing the overall word frequencies, you want to specifically track the frequency of a particular word that is relevant to your analysis. For example, you might be interested in monitoring how often the word "reliable" appears in the customer reviews to gauge customer sentiment about the product's reliability. By focusing on the frequency of a specific word, you can gain a deeper understanding of customer opinions or preferences related to that particular aspect.

By performing these tasks on the customer feedback dataset, you can gain valuable insights into customer sentiment.
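
The three tasks can be sketched on a single made-up review before building the class. This is only an illustration: the `review` string is hypothetical, and `collections.Counter` from the standard library is used here as a shortcut rather than the approach the lab asks for.

```python
from collections import Counter

# Hypothetical review text, used only for illustration
review = "Reliable battery and a RELIABLE build but the charger is not reliable"

# 1. Standardize the casing
lowered = review.lower()

# 2. Count every word (Counter is a dict subclass from the standard library)
word_counts = Counter(lowered.split())

# 3. Look up one word of interest; a Counter returns 0 for missing words
reliable_count = word_counts["reliable"]

print(word_counts.most_common(1))  # [('reliable', 3)]
print(reliable_count)              # 3
```

Without the lowercasing step, "Reliable", "RELIABLE", and "reliable" would be counted as three different words.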

## Below is the given string

In [13]:
given_string = "Lorem ipsum dolor! diam amet, consetetur Lorem magna. sed diam nonumy eirmod tempor. diam et labore? et diam magna. et diam amet."

## Task-1 Define the class and its attributes:

1. Create a class named `TextAnalyzer`.
2. Define the constructor method `__init__` that takes a `text` argument.

In [14]:
class TextAnalyzer(object):
    
    def __init__(self, text):
        self.text = text
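
As a quick check of the constructor, the sketch below creates an instance and reads the attribute back. The sample string is a hypothetical input chosen for illustration.

```python
class TextAnalyzer(object):

    def __init__(self, text):
        # Store the raw text on the instance
        self.text = text


# Hypothetical sample input
sample = TextAnalyzer("Hello, World!")
print(sample.text)  # Hello, World!
```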

## Task-2 Formatting the text:

1. Inside the constructor, convert the text argument to lowercase using the lower() method.
2. Remove punctuation marks (periods, exclamation marks, commas, and question marks) from the text using the replace() method.
3. Assign the formatted text to a new attribute called fmtText.

**Update the above `TextAnalyzer` class with points mentioned above.**

In [15]:
class TextAnalyzer(object):
    
    def __init__(self, text):
        # text is formatted to be lowercase
        f_text = text.lower()
        # punctuation is removed from the lowercased text (not from the original)
        f_text = f_text.replace(",", "").replace(".", "").replace("!", "").replace("?", "")
        # assign formatted text to self object
        self.fmtText = f_text
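
The lab asks for chained `replace()` calls, which cover only the four listed punctuation marks. As an alternative sketch (not part of the lab's required solution), `str.translate` with `string.punctuation` removes every ASCII punctuation character in one pass. The helper name `strip_punctuation` is hypothetical.

```python
import string


def strip_punctuation(text):
    # Build a translation table that deletes every ASCII punctuation character
    table = str.maketrans("", "", string.punctuation)
    # Lowercase first, then apply the table to the lowercased text
    return text.lower().translate(table)


cleaned = strip_punctuation("Lorem ipsum dolor! diam amet, consetetur")
print(cleaned)  # lorem ipsum dolor diam amet consetetur
```

Note that this also removes apostrophes and hyphens, so "Don't" becomes "dont". Whether that is acceptable depends on the analysis.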

## Task-3 Frequency of all unique words:

- Implement the freqAll() method:
    1. Split the fmtText attribute into individual words using the split() method.
    2. Create an empty dictionary to store the word frequency.
    3. Iterate over the list of words and update the frequency dictionary accordingly.
    4. Use the count() method to count the occurrences of each word.
    5. Return the frequency dictionary.

**Update the above `TextAnalyzer` class with points mentioned above.**

In [16]:
class TextAnalyzer(object):
    
    def __init__(self, text):
        # text is formatted to be lowercase
        f_text = text.lower()
        # punctuation is removed from the lowercased text (not from the original)
        f_text = f_text.replace(",", "").replace(".", "").replace("!", "").replace("?", "")
        # assign formatted text to self object
        self.fmtText = f_text
        
    def freqAll(self):
        # split text into word list by spaces
        word_list = self.fmtText.split()
        # return a dictionary of words by their frequency in the word list
        return {word: word_list.count(word) for word in word_list}

Testing the freqAll() method so far:

In [17]:
txt = TextAnalyzer(given_string)
txt.freqAll()

{'lorem': 2,
 'ipsum': 1,
 'dolor': 1,
 'diam': 5,
 'amet': 2,
 'consetetur': 1,
 'magna': 2,
 'sed': 1,
 'nonumy': 1,
 'eirmod': 1,
 'tempor': 1,
 'et': 3,
 'labore': 1}
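
The five steps listed for this task can also be written as an explicit loop instead of a dictionary comprehension. The sketch below is a standalone function (the name `word_frequencies` is hypothetical) that assumes its input is already lowercased and free of punctuation.

```python
def word_frequencies(formatted_text):
    # Step 1: split on whitespace
    word_list = formatted_text.split()
    # Step 2: start with an empty dictionary
    frequencies = {}
    # Steps 3-4: call count() once per distinct word
    for word in word_list:
        if word not in frequencies:
            frequencies[word] = word_list.count(word)
    # Step 5: return the frequency dictionary
    return frequencies


result = word_frequencies("diam et labore et diam magna et diam amet")
print(result)  # {'diam': 3, 'et': 3, 'labore': 1, 'magna': 1, 'amet': 1}
```

The `if word not in frequencies` check avoids recounting a word that has already been seen, which the comprehension form does for every repeated word.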

## Task-4 Frequency of a specific word:

- Implement the freqOf(word) method that takes a word argument:
    1. Create the method and pass in the word to look up.
    2. Call the freqAll() method to get the frequency dictionary, and check whether the word is in it.
    3. Return the count, or 0 if the word is not found.

**Update the above `TextAnalyzer` class with points mentioned above.**

In [22]:
class TextAnalyzer(object):
    
    def __init__(self, text):
        # text is formatted to be lowercase
        f_text = text.lower()
        # punctuation is removed from the lowercased text (not from the original)
        f_text = f_text.replace(",", "").replace(".", "").replace("!", "").replace("?", "")
        # assign formatted text to self object
        self.fmtText = f_text
        
    def freqAll(self):
        # split text into word list by spaces
        word_list = self.fmtText.split()
        # return a dictionary of words by their frequency in the word list
        return {word: word_list.count(word) for word in word_list}
    
    def freqOf(self, word):
        # acquire frequency map
        freq_dict = self.freqAll()
        # return the frequency of the word in the word list if it exists, otherwise return 0
        return freq_dict[word] if word in freq_dict else 0

Testing the freqOf() method so far:

In [23]:
txt = TextAnalyzer(given_string)
txt.freqOf("et")

3
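
The "return the count if present, otherwise 0" lookup can also be expressed with `dict.get`, which takes a default value. The dictionary below is a hypothetical stand-in for the result of `freqAll()`.

```python
# Hypothetical frequency map, standing in for the result of freqAll()
freq_dict = {"diam": 5, "et": 3, "amet": 2}

# dict.get returns the second argument when the key is missing
print(freq_dict.get("et", 0))        # 3
print(freq_dict.get("reliable", 0))  # 0
```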

## We have now successfully created a class with three methods

## Task-5 Create an instance of TextAnalyzer Class.

In [26]:
txt = TextAnalyzer(given_string)

## Task-6 Print the formatted text (lower case string)

In [28]:
print(txt.fmtText)

lorem ipsum dolor diam amet consetetur lorem magna sed diam nonumy eirmod tempor diam et labore et diam magna et diam amet


## Task-7 Test freqAll() method

In [29]:
txt.freqAll()

{'lorem': 2,
 'ipsum': 1,
 'dolor': 1,
 'diam': 5,
 'amet': 2,
 'consetetur': 1,
 'magna': 2,
 'sed': 1,
 'nonumy': 1,
 'eirmod': 1,
 'tempor': 1,
 'et': 3,
 'labore': 1}

## Task-8 Test freqOf() method

Find the frequency of each of the following words:

1. "lorem"
2. "diam"
3. "et"

Print the output using string **formatting**.

In [41]:
word = "lorem"
print(f"'{word}' appears {txt.freqOf(word)} times")

'lorem' appears 2 times


In [42]:
word = "diam"
print(f"'{word}' appears {txt.freqOf(word)} times")

'diam' appears 5 times


In [43]:
word = "et"
print(f"'{word}' appears {txt.freqOf(word)} times")

'et' appears 3 times
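
The three lookups can also be done in a single loop. The standalone sketch below does not use the class: it lowercases the given string before removing punctuation, counts each word directly with `list.count`, and builds the messages with `str.format` as an alternative to f-strings. The variable names are illustrative.

```python
given_string = ("Lorem ipsum dolor! diam amet, consetetur Lorem magna. "
                "sed diam nonumy eirmod tempor. diam et labore? "
                "et diam magna. et diam amet.")

# Lowercase first, then strip the four punctuation marks used in this lab
cleaned = given_string.lower()
for mark in ",.!?":
    cleaned = cleaned.replace(mark, "")
words = cleaned.split()

# Build one formatted message per word of interest
lines = []
for word in ["lorem", "diam", "et"]:
    count = words.count(word)
    lines.append("'{}' appears {} times".format(word, count))

print("\n".join(lines))
```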
