# LLMs for Text Transformation

In this notebook, we will explore how to use Large Language Models for text transformation tasks such as language translation, spelling and grammar checking, tone adjustment, and format conversion.

## Setup

In [1]:
! pip install python-dotenv
! pip install openai

In [2]:
from openai import OpenAI
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file



In [3]:
client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0): 
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature, 
    )
    return response.choices[0].message.content
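`get_completion` sends a single user message. The Chat Completions API accepts a list of role/content dictionaries, so a system message can be placed first to set the assistant's behavior for every prompt. The sketch below builds that list. `build_messages` is a hypothetical helper and is not part of the `openai` package. Its result could be passed as `messages=` to `client.chat.completions.create`.

```python
def build_messages(prompt, system=None):
    """Build the message list for a chat completion request.

    Hypothetical helper: an optional system message comes first,
    followed by the user prompt.
    """
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


# Usage (no API call is made here):
msgs = build_messages("Translate 'thank you' to Spanish.",
                      system="You are a concise translator.")
print(msgs[0]["role"], len(msgs))  # → system 2
```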

## Translation

ChatGPT is trained on sources in many languages, which gives the model the ability to translate. Here are some examples of how to use this capability.

In [4]:
prompt = f"""
Translate the following English text to Spanish: \
```Hi, I would like to order a blender```
"""
response = get_completion(prompt)
print(response)

Hola, me gustaría ordenar una licuadora.


In [5]:
prompt = f"""
Tell me which language this is: 
```Combien coûte le lampadaire?```
"""
response = get_completion(prompt)
print(response)

This is French.


In [6]:
prompt = f"""
Translate the following text to French and Spanish
and English pirate: \
```I want to order a basketball```
"""
response = get_completion(prompt)
print(response)

French: Je veux commander un ballon de basket
Spanish: Quiero ordenar un balón de baloncesto
English: I want to order a basketball


In [7]:
prompt = f"""
Translate the following text to Spanish in both the \
formal and informal forms: 
'Would you like to order a pillow?'
"""
response = get_completion(prompt)
print(response)

Formal: ¿Le gustaría ordenar una almohada?
Informal: ¿Te gustaría ordenar una almohada?


### Universal Translator
Imagine you are in charge of IT at a large multinational e-commerce company. Users message you about IT issues in their native languages, and your staff, who come from all over the world, speak only their own native languages. You need a universal translator!

In [8]:
user_messages = [
  "La performance du système est plus lente que d'habitude.",  # System performance is slower than normal         
  "Mi monitor tiene píxeles que no se iluminan.",              # My monitor has pixels that do not light up
  "Il mio mouse non funziona",                                 # My mouse is not working
  "Mój klawisz Ctrl jest zepsuty",                             # My keyboard has a broken control key
  "我的屏幕在闪烁"                                               # My screen is flashing
] 

In [10]:
for issue in user_messages:
    prompt = f"Tell me what language this is: ```{issue}```"
    lang = get_completion(prompt)
    print(f"Original message ({lang}): {issue}")

    prompt = f"""
    Translate the following text to English \
    and Korean: ```{issue}```
    """
    response = get_completion(prompt)
    print(response, "\n")

Original message (This is French.): La performance du système est plus lente que d'habitude.
English: "The system performance is slower than usual."

Korean: "시스템 성능이 평소보다 느립니다." 

Original message (This is Spanish.): Mi monitor tiene píxeles que no se iluminan.
English: "My monitor has pixels that do not light up."
Korean: "내 모니터에는 빛나지 않는 픽셀이 있습니다." 

Original message (Italian): Il mio mouse non funziona
English: My mouse is not working
Korean: 내 마우스가 작동하지 않습니다 

Original message (This is Polish.): Mój klawisz Ctrl jest zepsuty
English: My Ctrl key is broken
Korean: 제 Ctrl 키가 고장 났어요 

Original message (This is Chinese.): 我的屏幕在闪烁
English: My screen is flickering
Korean: 내 화면이 깜박거립니다 
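The language-detection call returns free text, so the labels above are inconsistent ("This is French." versus "Italian"). One fix is to ask for the language name only. Another is to normalize the reply before using it. The sketch below takes the second approach and assumes replies follow the two shapes seen in this run. `normalize_language` is a hypothetical helper.

```python
import re


def normalize_language(reply):
    """Reduce a free-text language reply to a bare language name.

    Hypothetical helper: handles replies shaped like "This is French."
    or "Italian"; anything else is returned with surrounding whitespace
    and trailing periods removed.
    """
    text = reply.strip().rstrip(".")
    # Drop a leading "This is" / "The language is" phrase if present.
    match = re.match(r"^(?:this is|the language is)\s+(.*)$", text, re.IGNORECASE)
    if match:
        text = match.group(1)
    return text.strip()


for reply in ["This is French.", "Italian", "This is Chinese."]:
    print(normalize_language(reply))
```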



## Tone Transformation
Writing can vary based on the intended audience. ChatGPT can produce different tones.


In [11]:
prompt = f"""
Translate the following from slang to a business letter: 
'Dude, This is Joe, check out this spec on this standing lamp.'
"""
response = get_completion(prompt)
print(response)

Dear Sir/Madam,

I am writing to bring to your attention the specifications of the standing lamp. 

Sincerely,
Joe


## Format Conversion
ChatGPT can translate between formats. The prompt should describe the input and output formats.

In [12]:
data_json = { "restaurant employees" :[ 
    {"name":"Shyam", "email":"shyamjaiswal@gmail.com"},
    {"name":"Bob", "email":"bob32@gmail.com"},
    {"name":"Jai", "email":"jai87@gmail.com"}
]}

prompt = f"""
Translate the following Python dictionary from JSON to an HTML \
table with column headers and title: {data_json}
"""
response = get_completion(prompt)
print(response)

<html>
<head>
  <title>Restaurant Employees</title>
</head>
<body>
  <table>
    <tr>
      <th>Name</th>
      <th>Email</th>
    </tr>
    <tr>
      <td>Shyam</td>
      <td>shyamjaiswal@gmail.com</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>bob32@gmail.com</td>
    </tr>
    <tr>
      <td>Jai</td>
      <td>jai87@gmail.com</td>
    </tr>
  </table>
</body>
</html>


In [13]:
from IPython.display import display, Markdown, Latex, HTML, JSON
display(HTML(response))

| Name | Email |
|---|---|
| Shyam | shyamjaiswal@gmail.com |
| Bob | bob32@gmail.com |
| Jai | jai87@gmail.com |
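Structured input like this dictionary can also be converted deterministically. A deterministic version gives a reference for checking the model's HTML. The sketch below uses only the standard library. `dict_to_html_table` is a hypothetical helper that assumes every row has the same keys, and it puts the title in a `<caption>` rather than a page `<title>`.

Note that interpolating a dict into an f-string inserts Python's `repr` (single quotes), not strict JSON. `json.dumps` gives the model real JSON.

```python
import html
import json


def dict_to_html_table(title, rows):
    """Render a non-empty list of flat dicts as an HTML table with a caption.

    Hypothetical helper: column headers come from the first row's keys
    (title-cased), and every cell is HTML-escaped.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    headers = list(rows[0])
    lines = ["<table>", f"  <caption>{html.escape(title)}</caption>", "  <tr>"]
    lines += [f"    <th>{html.escape(h.title())}</th>" for h in headers]
    lines.append("  </tr>")
    for row in rows:
        lines.append("  <tr>")
        lines += [f"    <td>{html.escape(str(row[h]))}</td>" for h in headers]
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Same shape as the notebook's dictionary: one key mapping to a list of rows.
employees = {"restaurant employees": [
    {"name": "Shyam", "email": "shyamjaiswal@gmail.com"},
    {"name": "Bob", "email": "bob32@gmail.com"},
    {"name": "Jai", "email": "jai87@gmail.com"},
]}
title, rows = next(iter(employees.items()))
print(dict_to_html_table(title.title(), rows))

# json.dumps produces strict JSON (double quotes) to embed in a prompt,
# whereas f"{employees}" embeds Python's repr (single quotes).
print(json.dumps(employees))
```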


## Spellcheck/Grammar Check

Here are some examples of common grammar and spelling problems and the LLM's responses.

To signal to the LLM that you want it to proofread your text, instruct the model to 'proofread' or 'proofread and correct'.

In [15]:
text = [ 
  "The girl with the black and white puppies have a ball.",  # The girl has a ball.
  "Yolanda has her notebook.", # ok
  "Its going to be a long day. Does the car need it’s oil changed?",  # Homophones
  "Their goes my freedom. There going to bring they’re suitcases.",  # Homophones
  "Your going to need you’re notebook.",  # Homophones
  "That medicine effects my ability to sleep. Have you heard of the butterfly affect?", # Homophones
  "This phrase is to cherck chatGPT for speling abilitty"  # spelling
]
for t in text:
    prompt = f"""Proofread and correct the following text
    and rewrite the corrected version. If you don't find
    any errors, just say "No errors found". Don't use
    any punctuation around the text:
    ```{t}```"""
    response = get_completion(prompt)
    print(response)

The girl with the black and white puppies has a ball.
No errors found
No errors found.
Their goes my freedom. There going to bring they’re suitcases.

No errors found.

Rewritten:
Their goes my freedom. There going to bring their suitcases.
You're going to need your notebook.
No errors found.
No errors found
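The responses above mix formats. Some are the corrected sentence, some are "No errors found" with or without a period, and one is a multi-line reply with a "Rewritten:" label. Code that consumes these responses has to map the sentinel phrase back to the original text.

The sentinel is also not trustworthy on its own. In this run it appears to have been returned for the third, sixth, and seventh sentences, all of which contain errors.

The sketch below handles only the mapping step. `apply_proofread` is a hypothetical helper. Multi-line replies like the fourth one are passed through unchanged and still need manual review.

```python
SENTINEL = "no errors found"


def apply_proofread(original, response):
    """Return the corrected text implied by a proofreading response.

    Hypothetical helper: if the response is only the sentinel phrase
    (any case, optional trailing period), the original text is kept;
    otherwise the stripped response is treated as the corrected text.
    """
    cleaned = response.strip()
    if cleaned.rstrip(".").strip().lower() == SENTINEL:
        return original
    return cleaned


print(apply_proofread("Yolanda has her notebook.", "No errors found"))
print(apply_proofread("Your going to need you’re notebook.",
                      "You're going to need your notebook."))
```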


In [16]:
text = f"""
Got this for my daughter for her birthday cuz she keeps taking \
mine from my room.  Yes, adults also like pandas too.  She takes \
it everywhere with her, and it's super soft and cute.  One of the \
ears is a bit lower than the other, and I don't think that was \
designed to be asymmetrical. It's a bit small for what I paid for it \
though. I think there might be other options that are bigger for \
the same price.  It arrived a day earlier than expected, so I got \
to play with it myself before I gave it to my daughter.
"""
prompt = f"proofread and correct this review: ```{text}```"
response = get_completion(prompt)
print(response)

I got this for my daughter for her birthday because she keeps taking mine from my room. Yes, adults also like pandas too. She takes it everywhere with her, and it's super soft and cute. One of the ears is a bit lower than the other, and I don't think that was designed to be asymmetrical. It's a bit small for what I paid for it though. I think there might be other options that are bigger for the same price. It arrived a day earlier than expected, so I got to play with it myself before I gave it to my daughter.


In [17]:
 ! pip3 install redlines

Collecting redlines
  Downloading redlines-0.4.2-py3-none-any.whl.metadata (6.0 kB)
Collecting rich<14.0.0,>=13.3.5 (from redlines)
  Downloading rich-13.9.2-py3-none-any.whl.metadata (18 kB)
Collecting rich-click<2.0.0,>=1.6.1 (from redlines)
  Downloading rich_click-1.8.3-py3-none-any.whl.metadata (7.9 kB)
Collecting markdown-it-py>=2.2.0 (from rich<14.0.0,>=13.3.5->redlines)
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich<14.0.0,>=13.3.5->redlines)
  Downloading mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Downloading redlines-0.4.2-py3-none-any.whl (8.0 kB)
Downloading rich-13.9.2-py3-none-any.whl (242 kB)
Downloading rich_click-1.8.3-py3-none-any.whl (35 kB)
Downloading markdown_it_py-3.0.0-py3-none-any.whl (87 kB)

In [18]:
from redlines import Redlines

diff = Redlines(text, response)
display(Markdown(diff.output_markdown))

<span style='color:red;font-weight:700;text-decoration:line-through;'>Got </span><span style='color:green;font-weight:700;'>I got </span>this for my daughter for her birthday <span style='color:red;font-weight:700;text-decoration:line-through;'>cuz </span><span style='color:green;font-weight:700;'>because </span>she keeps taking mine from my <span style='color:red;font-weight:700;text-decoration:line-through;'>room.  </span><span style='color:green;font-weight:700;'>room. </span>Yes, adults also like pandas <span style='color:red;font-weight:700;text-decoration:line-through;'>too.  </span><span style='color:green;font-weight:700;'>too. </span>She takes it everywhere with her, and it's super soft and <span style='color:red;font-weight:700;text-decoration:line-through;'>cute.  </span><span style='color:green;font-weight:700;'>cute. </span>One of the ears is a bit lower than the other, and I don't think that was designed to be asymmetrical. It's a bit small for what I paid for it though. I think there might be other options that are bigger for the same <span style='color:red;font-weight:700;text-decoration:line-through;'>price.  </span><span style='color:green;font-weight:700;'>price. </span>It arrived a day earlier than expected, so I got to play with it myself before I gave it to my daughter.
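`Redlines` renders a word-level diff as styled Markdown. If that package is unavailable, the standard library's `difflib` can produce a comparable, plainer word-level diff. The sketch below does that. `word_diff` is a hypothetical helper, and the `[-removed-]` / `{+added+}` markup is an arbitrary choice made here, not Redlines' output format.

```python
import difflib


def word_diff(before, after):
    """Mark word-level changes between two strings.

    Removed words are wrapped in [-...-] and inserted words in {+...+};
    runs of whitespace are collapsed because the texts are split on it.
    """
    a, b = before.split(), after.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("Got this for my daughter cuz she keeps taking mine",
                "I got this for my daughter because she keeps taking mine"))
```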

In [19]:
prompt = f"""
proofread and correct this review. Make it more compelling. 
Ensure it follows APA style guide and targets an advanced reader. 
Output in markdown format.
Text: ```{text}```
"""
response = get_completion(prompt)
display(Markdown(response))

I purchased this adorable panda plush for my daughter's birthday as she kept borrowing mine from my room. It's not just for kids - even adults can appreciate the charm of pandas. The plush is incredibly soft and cute, making it the perfect companion for my daughter wherever she goes. However, I did notice that one of the ears is slightly lower than the other, which seems like a design flaw rather than intentional asymmetry. Additionally, considering the price I paid, I found the plush to be a bit smaller than expected. I believe there are larger options available for the same price point. Despite this, I was pleasantly surprised when the plush arrived a day earlier than anticipated, allowing me to enjoy it myself before gifting it to my daughter. Overall, while there are some minor flaws, the quality and cuteness of this panda plush make it a worthwhile purchase for any panda lover, young or old.

# Exercise
 - Complete prompts similar to the ones we did in class.
     - Try at least 3 versions.
     - Be creative.
 - Write a one-page report summarizing your findings.
     - Were there variations that didn't work well, i.e., cases where GPT hallucinated or gave wrong answers?
 - What did you learn?

In [20]:
# Version 1: Multi-task Inference from Product Review
# This version focuses on extracting multiple pieces of information in one go, including the sentiment,
# whether anger is expressed, and identifying the product and the brand.
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast. The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together. I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

prompt = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)


{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
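Version 1 asks for a JSON object with fixed keys, so the reply can be parsed and checked before it is used downstream. The sketch below does this with the standard `json` module, using the response recorded above. `parse_review_info` is a hypothetical helper. Raising `ValueError` on a malformed reply is one design choice; retrying the request would be another.

```python
import json

EXPECTED_KEYS = {"Sentiment", "Anger", "Item", "Brand"}


def parse_review_info(reply):
    """Parse the model's JSON reply and validate its shape.

    Hypothetical helper: raises ValueError (json.JSONDecodeError is a
    subclass) if the reply is not a JSON object with exactly the expected
    keys, or if Anger is not a boolean.
    """
    info = json.loads(reply)
    if not isinstance(info, dict):
        raise ValueError("reply is not a JSON object")
    if set(info) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(info)}")
    if not isinstance(info["Anger"], bool):
        raise ValueError("Anger must be a boolean")
    return info


# The reply recorded in the output above.
reply = """
{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
"""
info = parse_review_info(reply)
print(info["Brand"], info["Anger"])  # → Lumina False
```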


In [21]:
# Version 2: Summarizing Product Reviews with a Focus on Customer Service
# This version asks the model to summarize the review, focusing specifically on the customer service aspect.
prompt = f"""
Summarize the following product review with a focus on customer service experience.
The review is delimited with triple backticks.

Review: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)


The customer had a positive experience with Lumina's customer service. They received a replacement part quickly and found the company to be caring and responsive to their needs.


In [22]:
# Version 3: Inferring Topics from a News Article
# This version focuses on inferring topics discussed in a news article, as well as determining which predefined topics are present in the text.
news_story = """
In a recent survey conducted by the government, public sector employees were asked to rate their level of satisfaction with the department they work at. The results revealed that NASA was the most popular department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, stating, "I'm not surprised that NASA came out on top. It's a great place to work with amazing people and incredible opportunities. I'm proud to be a part of such an innovative organization."

The results were also welcomed by NASA's management team, with Director Tom Johnson stating, "We are thrilled to hear that our employees are satisfied with their work at NASA. We have a talented and dedicated team who work tirelessly to achieve our goals, and it's fantastic to see that their hard work is paying off."

The survey also revealed that the Social Security Administration had the lowest satisfaction rating, with only 45% of employees indicating they were satisfied with their job. The government has pledged to address the concerns raised by employees in the survey and work towards improving job satisfaction across all departments.
"""

topic_list = ["nasa", "local government", "engineering", "employee satisfaction", "federal government"]

prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answer as a dictionary where each key is a topic and the value is 1 if the topic appears in the text and 0 otherwise.

List of topics: {", ".join(topic_list)}

Text: ```{news_story}```
"""
response = get_completion(prompt)
print(response)


{
    "nasa": 1,
    "local government": 0,
    "engineering": 0,
    "employee satisfaction": 1,
    "federal government": 1
}
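The topic reply is also JSON, so the detected topics can be pulled out directly. The sketch below works on the recorded response. It assumes the reply is a flat object of 0/1 values as requested. It also compares the reply's keys against `topic_list` to catch topics the model dropped or invented.

```python
import json

topic_list = ["nasa", "local government", "engineering",
              "employee satisfaction", "federal government"]

# The reply recorded in the output above.
reply = """
{
    "nasa": 1,
    "local government": 0,
    "engineering": 0,
    "employee satisfaction": 1,
    "federal government": 1
}
"""

flags = json.loads(reply)
# Keys the model returned that were not requested, or requested ones it omitted.
unexpected = set(flags) - set(topic_list)
missing = set(topic_list) - set(flags)
# Topics flagged as present, in the order of topic_list.
present = [topic for topic in topic_list if flags.get(topic) == 1]

print(present)  # → ['nasa', 'employee satisfaction', 'federal government']
print(unexpected, missing)  # → set() set()
```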


## Step 2: One-Page Report Summarizing Findings

### Overview
In this exercise, I explored different versions of prompts for inferring information from product reviews and news articles using GPT-3.5. The tasks were:

- multi-task inference (sentiment, anger, product, and brand);
- summarization with a specific focus (customer service);
- topic detection in a news article.

Varying the prompts let me compare how the model handles different types of queries.

### Version 1: Multi-task Inference
This prompt extracted several pieces of information (sentiment, anger, product name, and brand) from a single product review. The model returned a clean, concise JSON object. It correctly identified the sentiment as positive, detected no anger, and accurately extracted the product and brand.

Strengths:

- Efficient for multi-tasking: the model handles several inferences in one request, reducing the number of API calls.
- The JSON format is easy to parse programmatically, which suits further automated processing.

Limitations:

- Product and brand extraction depends on explicit mentions in the review. The prompt tells the model to use "unknown" for missing information, so a review that never names the product or brand leaves gaps in the extracted details.

### Version 2: Summarization with a Focus on Customer Service
In this version, the model summarized the product review with an emphasis on the customer service experience. It delivered a focused, clear summary that highlighted the customer service details, such as the quick replacement of the broken part and the responsive support.

Strengths:

- The summary accurately reflected the review's key point: the positive customer service experience.
- It was concise and stayed on the requested focus.

Limitations:

- When a review covers several aspects (product quality, shipping, price), the model may give those aspects weight at the expense of the requested focus, especially when the review emphasizes them.

### Version 3: Topic Detection in News Articles
For this version, the model was asked to detect specific topics (e.g., NASA, employee satisfaction) from a news article about a survey. The results were accurate, identifying the topics that were clearly present in the article.

Strengths:

- The model correctly identified key topics like "NASA" and "employee satisfaction" while ignoring irrelevant ones like "engineering."
- The use of a predefined list of topics made it easy to focus the model’s attention and eliminate extraneous topics.

Limitations:

- The model works well with clearly defined topics but might struggle with ambiguous or closely related topics that aren't explicitly mentioned in the text.
- If a topic is mentioned only in passing, the model might miss it, especially when competing with more prominent themes in the text.

### Key Learnings

- Prompt Structure Matters: Well-structured prompts, especially those specifying output formats like JSON, yield clear and easily usable responses.
- Multi-task Efficiency: Combining multiple tasks in one prompt (e.g., extracting sentiment, product name, and anger) saves API calls and time.
- Hallucinations Are Rare but Possible: The model performed well here, but when the input is incomplete or ambiguous there is a slight risk of hallucination or inaccurate inference.
- Targeted Summarization: The model can follow instructions to focus on one aspect (e.g., customer service), but it may still drift into other areas unless the prompt is worded carefully.

In conclusion, GPT-3.5 is a powerful tool for inference tasks across multiple domains, but careful prompt design is crucial for accurate and relevant outputs.






