# Setup

In [1]:
from pathlib import Path
import os

In [2]:
review_paths = list(Path('raw_reviews').glob('*.txt'))
review_paths[0]

PosixPath('raw_reviews/review_0.txt')

In [3]:
with Path('openai.key').open() as f:
    os.environ['OPENAI_API_KEY'] = f.read().strip()

# Extraction

In [4]:
import openai

openai.api_key = os.environ['OPENAI_API_KEY']

In [5]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

In [42]:
template = """The text in tripple backticks is a product review written in german. It is given in the format

title: title of the review
text: main text of the review
rating: rating out of 5, 5 being best

From those three information, extract problems with the product that can be used by the manufacturer of the product to improve it. Translate all parts of your reply to english. Return a list of short problem descriptions, separated by a new line.
Put that list into a  code block.
Example output format: 

```
problem1
problem2
```
Only include problems. For example do not include "the product is great" or "the product is very good".
Make the problem descriptions as short as possible, but still understandable.
Don't use word for word parts from the reviews, but try to generalize.
It should not be obvious that the text is from a customer review. For example instead of this: "missing part (circle attachment) can be a problem for some customers.", write this: "missing part (circle attachment)". Or instead of "need for own screw" write "screw missing"
Only include problems if they are clearly mentioned in the review. Don't add problems that are only implied. I.e. if the review says "seems stable at first" but does not mention any problems, don't add "unstable" as a problem.

Here is the review:
"""

In [43]:
import json
import re

def extract_json(response):
    try:
        return json.loads(response)

    except:
        match = re.search("```(.*?)```", response, re.DOTALL)

        if match:
            json_data = match.group(1)
            problems = json.loads(json_data)
            return problems
        else:
            print("Failed to extract json from response:", response)

In [44]:
def load_review(idx: int):
    return review_paths[idx].read_text()

In [45]:
def build_prompt(template, review):
    prompt = template + f"```{review}```"
    prompt = prompt.replace('\n', ' ')
    prompt = prompt.replace("'", '"')
    return prompt

In [46]:
build_prompt(template, load_review(0))

'The text in tripple backticks is a product review written in german. It is given in the format  title: title of the review text: main text of the review rating: rating out of 5, 5 being best  From those three information, extract problems with the product that can be used by the manufacturer of the product to improve it. Translate all parts of your reply to english. Return a list of short problem descriptions, separated by a new line. Put that list into a  code block. Example output format:   ``` problem1 problem2 ``` Only include problems. For example do not include "the product is great" or "the product is very good". Make the problem descriptions as short as possible, but still understandable. Don"t use word for word parts from the reviews, but try to generalize. It should not be obvious that the text is from a customer review. For example instead of this: "missing part (circle attachment) can be a problem for some customers.", write this: "missing part (circle attachment)". Or ins

In [47]:
problems = []
for i in range(10):
    print(f"Review {i}")
    prompt = build_prompt(template, load_review(i))
    response = get_completion(prompt)
    problems.append(response)
    

Review 0
Review 1
Review 2
Review 3
Review 4
Review 5
Review 6
Review 7
Review 8
Review 9


In [48]:
problem_list = []
for problem in problems:
    sub_problems = problem.split('\n')
    for sub_problem in sub_problems:
        if '```' in sub_problem:
            continue
        if sub_problem.startswith('-'):
            sub_problem = sub_problem[1:].strip()
        problem_list.append(sub_problem)

In [49]:
problem_list

['own screw necessary ',
 'top attachment not fitting ',
 'missing circle attachment ',
 'many parts to assemble ',
 'No problems mentioned in the review. ',
 'None of the problems are clearly mentioned in the review. ',
 'easy assembly ',
 'fulfills its purpose ',
 'looks good ',
 'uncertainty about durability after winter season ',
 'rust after one year ',
 'assembly problems with other products from the same brand ',
 'None of the problems are clearly mentioned in the review. ',
 'many individual parts ',
 'difficult to assemble ',
 'necessary stability ',
 'None mentioned ']

Putting that into gpt4 with the following prompt gave results:

**prompt**

From this list of information extracted from a product review, extract all points that can be used by the manufacturer of the product to improve the product. If similar points are included in the list, generalize them into one single item.

**result**

Based on the feedback provided, here are the points that the manufacturer can use to improve the product:

- Include all necessary screws with the product.
- Improve the fit of the top attachment.
- Ensure all attachments (like the circle attachment) are included in the package.
- Reduce the number of parts to assemble, or provide a clearer assembly guide.
- Improve the durability of the product, particularly for withstanding winter seasons.
- Address the issue of rust after one year.
- Address general assembly problems, potentially identified across other products from the same brand.
- Ensure necessary stability of the product.