# Unified customer feedback POC

The idea is to use a single map-reduce pattern across several sources of customer feedback:

- Reddit
- App stores
- (Maybe) formal reviews

We'll use the extract-organize-abstract pattern, with map-reduce handling the extract and organize steps.
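The map-reduce part can be sketched without any library. This is a minimal sketch, assuming `extract` and `organize` are plain functions standing in for the two LLM calls (both hypothetical here), with size measured in characters rather than tokens. The map step extracts from each chunk independently. The reduce step organizes the extracts, first collapsing them in batches whenever they don't fit in one call, which is roughly what LangChain's `map_reduce` summarize chain does with `token_max`.

```python
from typing import Callable, List


def map_reduce(
    chunks: List[str],
    extract: Callable[[str], str],
    organize: Callable[[str], str],
    max_chars: int,
) -> str:
    """Extract from every chunk (map), then organize the extracts (reduce)."""
    # Map: each chunk is processed independently, so this step parallelizes.
    extracts = [extract(chunk) for chunk in chunks]
    # Drop chunks with nothing relevant (the map prompt returns "" for those).
    extracts = [e for e in extracts if e.strip()]

    # Collapse: while the joined extracts are too big for one call,
    # organize them in batches that each fit under max_chars.
    while len("\n".join(extracts)) > max_chars and len(extracts) > 1:
        batches: List[List[str]] = [[]]
        for e in extracts:
            if batches[-1] and len("\n".join(batches[-1] + [e])) > max_chars:
                batches.append([])
            batches[-1].append(e)
        if len(batches) == len(extracts):
            break  # nothing could be merged; stop instead of looping forever
        extracts = [organize("\n".join(batch)) for batch in batches]

    # Reduce: one final call over everything that is left.
    return organize("\n".join(extracts))
```

Keeping the map step stateless is what lets it cope with inputs of very different lengths: only the reduce side has to care about the total size.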

## Challenges

- The length of the input will vary drastically depending on the data source

## Steps

1. Init
1. Define the target
1. Fetch Reddit raw data
1. Fetch app review raw data
1. Format all into markdown
1. Pack
1. LLM pipeline

In [1]:
from core import Seed, init

init()

target = Seed("Singularity 6", "Palia", "singularity6.com")

# Steam

In [2]:
import app_stores.steam as steam

url = steam.find_steam_page(target)
steam_id = steam.extract_steam_id(url)
steam_reviews = steam.get_reviews(steam_id, num_reviews=500)
steam_review_markdowns = [steam.review_to_markdown(review) for review in steam_reviews]

2024-09-01 10:39:21.354 | DEBUG    | google_search:search:58 - Google search results: {'kind': 'customsearch#search', 'url': {'type': 'application/json', 'template': 'https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json'}, 'queries': {'request': [{'title': 'Google Custom Search - site:store.steampowered.com/app/ "Singularity 6" "Palia"'
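`steam.extract_steam_id` and `steam.get_reviews` aren't shown. A minimal sketch of what they might build on, assuming the store URL has the usual `store.steampowered.com/app/<id>/...` shape (as the `site:store.steampowered.com/app/` search above suggests), plus a request URL for Steam's public `appreviews` endpoint. Both function names are hypothetical, and the query parameters are the documented ones to my knowledge, but worth checking against the Steamworks docs.

```python
import re
from urllib.parse import urlencode

# Matches the numeric app id in URLs like https://store.steampowered.com/app/<id>/<slug>/
_APP_ID = re.compile(r"store\.steampowered\.com/app/(\d+)")


def extract_steam_id(url: str) -> str:
    """Return the numeric Steam app id from a store page URL (hypothetical helper)."""
    match = _APP_ID.search(url)
    if match is None:
        raise ValueError(f"Not a Steam store app URL: {url}")
    return match.group(1)


def reviews_url(app_id: str, cursor: str = "*", per_page: int = 100) -> str:
    """Build one page request for Steam's appreviews endpoint (hypothetical helper).

    Pages are chained by passing back the `cursor` from the previous response;
    "*" requests the first page. urlencode escapes the cursor, which can
    contain characters such as '+' and '/'.
    """
    params = {
        "json": 1,
        "filter": "recent",
        "language": "english",  # assumption: English-only reviews
        "num_per_page": min(per_page, 100),  # the API caps a page at 100 reviews
        "cursor": cursor,
    }
    return f"https://store.steampowered.com/appreviews/{app_id}?{urlencode(params)}"
```

With `num_reviews=500`, that would mean five such pages, each using the cursor returned by the page before it.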

In [3]:
steam_review_markdowns[1]

"# Thumbs Up [(Anonymous, Steam, 2024-07-23)](https://steam/76561199201971173)\nI have been playing since Open Beta started last August, longer than I have been playing on Steam. There are things that I learned recently that I wish I had known from the beginning. \r\n1. Female Frame Avatars have a built-in disadvantage while hunting, fishing, and other activities. This is cruel and unfair, I hope it changes.\r\n2. Most rare fish/bugs are locked until you reach a higher level in those skills, at least level 25! Some won't unlock til level 50!!!\r\n3. Fishing is setup like real fishing. When you hear the chime, start to pull the fish in, if it starts to fight, stop. Then pull it in a lil more, stop again. Do this until you can pull it in all the way.\r\n4. Sushi, Fisherman's Brew, Fishstew offer some fishing boosts, so will hook boosters; but they won't be as effective until you reach higher levels in fishing.\r\n5 If you see dark green vines growing on a rock or wooden structure, it's m
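Judging by the output above, each Steam review becomes a level-1 heading with the verdict and an `(Author, Source, Date)` citation link, followed by the review text. A sketch of a `review_to_markdown` that reproduces that shape from one review object in the `appreviews` JSON, assuming the fields `voted_up`, `timestamp_created`, `review` and `author.steamid`. The `Thumbs Down` label is an assumption (only `Thumbs Up` appears above), and the `https://steam/<steamid>` link mirrors the placeholder URL in the output rather than a real Steam page.

```python
from datetime import datetime, timezone


def review_to_markdown(review: dict) -> str:
    """Render one Steam review as markdown with an (Author, Source, Date) citation."""
    verdict = "Thumbs Up" if review["voted_up"] else "Thumbs Down"  # "Thumbs Down" is assumed
    # The reviews endpoint exposes a Steam id, not a display name.
    author = "Anonymous"
    date = datetime.fromtimestamp(review["timestamp_created"], tz=timezone.utc)
    link = f"https://steam/{review['author']['steamid']}"
    citation = f"[({author}, Steam, {date:%Y-%m-%d})]({link})"
    return f"# {verdict} {citation}\n{review['review']}"
```

The verdict sits outside the citation parentheses. One Run 1 citation below has `Thumbs Up` in the author slot, which suggests the model can confuse the two when they are adjacent.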

# Reddit

In [4]:
import reddit.fetch
import reddit.search

reddit_client = reddit.fetch.init()

# Search for URLs
search_results = reddit.search.find_submissions(target, num_results=20)

# Fetch the Submissions from Reddit
post_submissions = [
    reddit_client.submission(url=result.link) for result in search_results
]

# Filter Submissions to only those with enough comments
post_submissions = [
    submission
    for submission in post_submissions
    if submission.num_comments >= 2
]

reddit_markdowns = [reddit.fetch.submission_to_markdown(thread) for thread in post_submissions]

2024-09-01 10:39:24.147 | DEBUG    | google_search:search:58 - Google search results: {'kind': 'customsearch#search', 'url': {'type': 'application/json', 'template': 'https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json'}, 'queries': {'request': [{'title': 'Google Custom Search - site:reddit.com "Singularity 6"" "Palia"', 'totalResults'
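The Reddit query logged here is `site:reddit.com "Singularity 6"" "Palia"`, with a doubled quote after the company name, while the Steam query earlier is clean. So the string building inside `reddit.search.find_submissions` (not shown) probably has a stray `"`. A sketch of building the query from the seed's names; `exact_phrase` and `build_query` are hypothetical helpers.

```python
def exact_phrase(text: str) -> str:
    """Wrap text in double quotes, dropping any quotes it already contains."""
    return '"' + text.replace('"', "") + '"'


def build_query(site: str, *phrases: str) -> str:
    """Build a site-restricted search query where every phrase must match exactly."""
    return f"site:{site} " + " ".join(exact_phrase(p) for p in phrases)
```

For example, `build_query("reddit.com", target.company, target.product)` would give `site:reddit.com "Singularity 6" "Palia"`. The same helper reproduces the Steam query when called with `"store.steampowered.com/app/"`.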

In [5]:
print(reddit_markdowns[1])

# Post ID 1bwiuin: Cozy MMO Palia Developer Singularity 6 Has Suffered Layoffs with +220 score by [(quinn50, Reddit, 2024-04-05)](https://www.reddit.com/r/pcgaming/comments/1bwiuin/cozy_mmo_palia_developer_singularity_6_has/)


## Comment ID ky6tkrn with +193 score by [(Cavissi, Reddit, 2024-04-05)](https://www.reddit.com/r/pcgaming/comments/1bwiuin/cozy_mmo_palia_developer_singularity_6_has/ky6tkrn/) (in reply to ID 1bwiuin):
The game is just not very good. For a life Sim it lacks basic features, the decorating is nice but you can hardly interact with anything. Especially in town. Benches you can't sit in at all, a tavern where you can't sit eat, drink, nothing. 

And for an mmo it's like 20 people in a town map, and your alone in your home instance. They could have done a neighborhood like xiv so you can at least see some other houses without having to ask to be invited. 

There is a clothing shop but it only sells clothes for premium currency, all "basic" clothes are just unlocked f
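The Reddit markdown nests comments under their post with one heading level per reply depth. Each heading carries the ID, score, a citation, and an `(in reply to ID ...)` pointer, so the map step can follow the thread. A sketch of that rendering over plain dataclasses rather than PRAW objects. The `Post`/`Comment` types, their fields and `post_to_markdown` are hypothetical stand-ins; the real `submission_to_markdown` presumably walks PRAW's comment tree instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    id: str
    author: str
    date: str  # already formatted as YYYY-MM-DD
    url: str
    score: int
    body: str
    replies: List["Comment"] = field(default_factory=list)


@dataclass
class Post:
    id: str
    title: str
    author: str
    date: str
    url: str
    score: int
    comments: List[Comment] = field(default_factory=list)


def cite(author: str, date: str, url: str) -> str:
    return f"[({author}, Reddit, {date})]({url})"


def post_to_markdown(post: Post) -> str:
    lines = [
        f"# Post ID {post.id}: {post.title} with {post.score:+d} score by "
        f"{cite(post.author, post.date, post.url)}",
        "",
    ]

    def walk(comment: Comment, parent_id: str, depth: int) -> None:
        # Top-level comments are "##", replies "###", and so on (markdown stops at 6).
        heading = "#" * min(depth + 1, 6)
        lines.append(
            f"{heading} Comment ID {comment.id} with {comment.score:+d} score by "
            f"{cite(comment.author, comment.date, comment.url)} (in reply to ID {parent_id}):"
        )
        lines.append(comment.body)
        lines.append("")
        for reply in comment.replies:
            walk(reply, comment.id, depth + 1)

    for comment in post.comments:
        walk(comment, post.id, 1)
    return "\n".join(lines)
```

The `:+d` format spec gives the `+193` style seen above and renders negative scores as `-3`.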

# Summarize

In [8]:
from llm_utils import pack_documents

review_markdowns = steam_review_markdowns + reddit_markdowns

print("Total reviews:", len(review_markdowns))

packed_reviews = pack_documents(review_markdowns, max_chars=70000)

print("Packed reviews:", len(packed_reviews))
print("Lengths:", [len(doc) for doc in packed_reviews])

# Trim long documents
packed_reviews = [doc[:100000] for doc in packed_reviews]
print("Lengths, trimmed:", [len(doc) for doc in packed_reviews])


Total reviews: 519
Packed reviews: 15
Lengths: [36642, 44665, 39192, 52802, 59955, 48105, 41808, 91258, 69014, 260999, 85467, 54526, 55214, 46086, 44778]
Lengths, trimmed: [36642, 44665, 39192, 52802, 59955, 48105, 41808, 91258, 69014, 100000, 85467, 54526, 55214, 46086, 44778]
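`pack_documents` comes from `llm_utils` and isn't shown. The lengths above are consistent with a greedy packer that adds whole documents to the current pack until the next one would push it past `max_chars`, and gives any longer document a pack of its own. That would account for the 85,467-, 91,258- and 260,999-char packs despite `max_chars=70000`. Below is a sketch under that assumption, named `greedy_pack` so it doesn't shadow the real helper. Its `sep` and `hard_limit` parameters are hypothetical, and the latter folds in the 100k trim. The trim discards about 161k characters of the largest pack, so splitting oversized documents across packs might be worth trying instead.

```python
from typing import List, Optional


def greedy_pack(
    docs: List[str],
    max_chars: int,
    sep: str = "\n\n",
    hard_limit: Optional[int] = None,
) -> List[str]:
    """Greedily pack whole documents into chunks of at most max_chars.

    A single document longer than max_chars is never split; it becomes its
    own chunk. If hard_limit is given, every chunk is then truncated to it.
    """
    packs: List[str] = []
    current = ""
    for doc in docs:
        candidate = f"{current}{sep}{doc}" if current else doc
        if current and len(candidate) > max_chars:
            # Adding this doc would overflow: close the current pack.
            packs.append(current)
            current = doc
        else:
            current = candidate
    if current:
        packs.append(current)
    if hard_limit is not None:
        packs = [p[:hard_limit] for p in packs]
    return packs
```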


In [15]:
from langchain_core.documents import Document
from langchain.chains.summarize import load_summarize_chain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

from core import URLShortener, log_summary_metrics

map_prompt = """
Please read the following customer comments and extract all opinions and facts relating to the user experience of the PRODUCT {product} by the COMPANY {company} from the perspective of current users.
Only include information about {product}. 
If the text does not contain any relevant information, return an empty string.

Format the results as a Markdown list of quotes, each with a permalink to the source of the quote like so:
- "quote" [(Author, Source, Date)](cache://source/NUM)

If the author is not available, use "Anonymous".

----

Each quote should be a brief statement that captures the essence of the sentiment or information.
Be sure to extract a comprehensive sample of both positive and negative opinions, as well as any factual statements about the product.

REVIEWS: 
{text}

MARKDOWN LIST OF QUOTES ABOUT THE COMPANY {company} AND PRODUCT {product} (markdown only, don't wrap in backticks):
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text", "company", "product"])

combine_prompt = """
Please organize all of the quotes below into topics about the COMPANY {company} and PRODUCT {product}.
Organize into headings based on the sentiment and/or type of information in the quote.

EXTRACTS FROM REVIEWS: 
{text}

ORGANIZED QUOTES IN MARKDOWN FORMAT (markdown only, don't wrap in backticks):
"""
combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text", "company", "product"]
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

shortener = URLShortener()

documents = [
    Document(page_content=shortener.shorten_markdown(markdown))
    for markdown in packed_reviews
]



summary_chain = load_summarize_chain(
    llm=llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    token_max=30000,
    verbose=False,
    return_intermediate_steps=True,
)

result = summary_chain.invoke(
    {
        "company": target.company,
        "product": target.product,
        "input_documents": documents,
    }
)

result["output_text"] = shortener.unshorten_markdown(result["output_text"])
log_summary_metrics(result["output_text"], "\n".join(packed_reviews))

print(result["output_text"])

2024-09-01 11:11:06.546 | INFO     | core:shorten_markdown:230 - 36,642 -> 29,793 chars (81% of original)
2024-09-01 11:11:06.548 | INFO     | core:shorten_markdown:230 - 44,665 -> 34,563 chars (77% of original)
2024-09-01 11:11:06.550 | INFO     | core:shorten_markdown:230 - 39,192 -> 34,232 chars (87% of original)
2024-09-01 11:11:06.552 | INFO     | core:shorten_markdown:230 - 52,802 -> 49,762 chars (94% of original)
2024-09-01 11:11:06.553 | INFO     | core:shorten_markdown:230 - 59,955 -> 57,921 chars (97% of original)
2024-09-01 11:11:06.555 | INFO     | core:shorten_markdown:230 - 48,105 -> 41,381 chars (86% of original)
2024-09-01 11:11:06.556 | INFO     

# Sentiment and Feedback on Palia

## Positive Sentiment
- "Honestly? I enjoy the characters and setting." [(NoWordCount, Reddit, 2024-04-09)](https://www.reddit.com/r/MMORPG/comments/1bz2e0z/palia_developers_singularity_6_axes_35_of_staff/kyppu7o/)
- "The story and characters are great, if you like those kind of stories." [(SvenWollinger, Reddit, 2024-04-09)](https://www.reddit.com/r/MMORPG/comments/1bz2e0z/palia_developers_singularity_6_axes_35_of_staff/kypy0pe/)
- "Game is based around community and generally has a very positive and supportive player base." [(Anonymous, Steam, 2024-08-10)](https://steam/76561198251408750)
- "Game is relatively easy to get the hang of while remaining challenging." [(Anonymous, Steam, 2024-08-10)](https://steam/76561198251408750)
- "So far, there are consistent updates and devs seem quite transparent about their intentions." [(Anonymous, Steam, 2024-08-10)](https://steam/76561198251408750)
- "Characters are relatively diverse and different enough from
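`URLShortener` lives in `core` and isn't shown. Based on the map prompt's `cache://source/NUM` links and the log lines above, a plausible sketch: swap every markdown link target for a short numbered placeholder before the LLM sees the text, then swap them back afterwards. The class name, regex, placeholder numbering and log message are assumptions modeled on the notebook, not the actual implementation. Unknown placeholders are left alone, so a hallucinated link such as `cache://...` survives unshortening and can be detected later.

```python
import re
from typing import Dict

# A markdown link target: the "(...)" right after "]". Assumes URLs contain no ")".
_LINK = re.compile(r"\]\(([^)\s]+)\)")
_PLACEHOLDER = re.compile(r"cache://source/(\d+)")


class PlaceholderURLShortener:
    """Replace long URLs with numbered cache://source/NUM placeholders and back."""

    def __init__(self) -> None:
        self._url_to_id: Dict[str, int] = {}
        self._id_to_url: Dict[int, str] = {}

    def _id_for(self, url: str) -> int:
        # The same URL always maps to the same number, across documents.
        if url not in self._url_to_id:
            new_id = len(self._url_to_id)
            self._url_to_id[url] = new_id
            self._id_to_url[new_id] = url
        return self._url_to_id[url]

    def shorten_markdown(self, markdown: str) -> str:
        short = _LINK.sub(
            lambda m: f"](cache://source/{self._id_for(m.group(1))})", markdown
        )
        pct = round(100 * len(short) / len(markdown)) if markdown else 100
        print(f"{len(markdown):,} -> {len(short):,} chars ({pct}% of original)")
        return short

    def unshorten_markdown(self, markdown: str) -> str:
        # Unknown numbers and malformed placeholders are left untouched.
        return _PLACEHOLDER.sub(
            lambda m: self._id_to_url.get(int(m.group(1)), m.group(0)), markdown
        )
```

The savings depend on how long the URLs are relative to the text. That would fit the spread in the log above, from 77% to 97% of the original length.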

In [13]:
def debug_around(substring: str):
    """Print 200 chars of context around `substring`, first in the packed inputs, then in the map-step outputs."""
    for docs in (packed_reviews, result["intermediate_steps"]):
        for doc in docs:
            if substring in doc:
                # Only the first occurrence in each document is shown
                i = doc.index(substring)
                print("----")
                print(doc[max(0, i - 200):i + 200])
                print("----")

debug_around("Puffelpuff, Reddit")

----
ments/1dtp97n/daybreak_acquires_singularity_6_palia_developer/lbc5r1s/) (in reply to ID lbc3xwx):
They've always owned EQ2, but yes, it's monetized to hell.

### Comment ID lbeo38c with +4 score by [(Puffelpuff, Reddit, 2024-07-03)](https://www.reddit.com/r/MMORPG/comments/1dtp97n/daybreak_acquires_singularity_6_palia_developer/lbeo38c/) (in reply to ID lbb7j1o):
The game has shit monetization any
----
----
game, is much more exciting and replayable than this 'MMO'." [(Sand3rok, Reddit, 2024-07-03)](cache://...)
- "The game has shit monetization anyway with cosmetics being 99% locked behind the shop." [(Puffelpuff, Reddit, 2024-07-03)](cache://...)
- "Palia was dying pretty steadily, all the devs getting fired, etc." [(tsukaimeLoL, Reddit, 2024-07-03)](cache://...)
----


# Run 1: S6 / Palia

Overall, the results were great.

To improve:

- Citations are VERY inconsistent: (starry101, Source, 2024-04-09), (Mousekiwiiks, Source, 2023-08-17), (Anonymous, Steam, 2024-08-01), (Thumbs Up, Steam, 2024-08-21). Some use the literal word "Source" in place of the platform, and some put the review verdict where the author should be. I think this could be solved by making the citations more consistent upstream
- It's heavily weighted towards Reddit, even though Steam is probably a better data source. I'm guessing that's simply because there's more Reddit text
- It took 3 minutes (!) to run
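Both citation problems can be measured rather than eyeballed. A small audit sketch (hypothetical, and not the notebook's `log_summary_metrics`) that parses `[(Author, Source, Date)](link)` citations from the final markdown and counts them per source. It also flags citations whose source isn't a known platform or whose author is a review verdict. The set of known sources is an assumption.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches citations like [(Author, Source, 2024-01-31)](link)
_CITATION = re.compile(
    r"\[\(([^,()]+), ([^,()]+), (\d{4}-\d{2}-\d{2})\)\]\(([^)\s]*)\)"
)
KNOWN_SOURCES = {"Reddit", "Steam"}  # assumption: the platforms this notebook fetches


def audit_citations(markdown: str) -> Tuple[Counter, List[str]]:
    """Count citations per source and list the ones that look malformed."""
    per_source: Counter = Counter()
    suspicious: List[str] = []
    for match in _CITATION.finditer(markdown):
        author, source, _date, _link = match.groups()
        per_source[source] += 1
        # A non-platform source, or a verdict like "Thumbs Up" in the author slot.
        if source not in KNOWN_SOURCES or author.startswith("Thumbs "):
            suspicious.append(match.group(0))
    return per_source, suspicious
```

Comparing `per_source` against the number of input documents per source would also put a number on the Reddit skew.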

# Run 2: S6 / Palia

Changes:

1. Updated the citation format in both Reddit and Steam pipelines
2. Fetched 500 Steam reviews, up from 100
3. Trimmed a long document to 100k characters (it was 260k chars)

Results:

- Worked very nicely!
- One issue was multiple extracts from the same source, which made that source's point look more serious than it was
- 4 minutes :(
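One possible fix for the repeated-source issue, sketched as a post-processing step (not something the notebook does): keep at most `max_per_link` quote lines per permalink. The function name and default cap are assumptions. It relies on each quote being a single `- "quote" [(...)](link)` list item, as the map prompt requests.

```python
import re
from collections import defaultdict
from typing import Dict, List

# The link target at the end of a quote line: "... [(Author, Source, Date)](link)"
_TRAILING_LINK = re.compile(r"\]\(([^)\s]+)\)\s*$")


def cap_quotes_per_link(markdown: str, max_per_link: int = 1) -> str:
    """Drop quote lines beyond the first max_per_link for each permalink."""
    seen: Dict[str, int] = defaultdict(int)
    kept: List[str] = []
    for line in markdown.splitlines():
        match = _TRAILING_LINK.search(line)
        if match and line.lstrip().startswith("- "):
            link = match.group(1)
            seen[link] += 1
            if seen[link] > max_per_link:
                continue  # this source has already been quoted enough times
        kept.append(line)
    return "\n".join(kept)
```

A cap of 1 can drop distinct points from one long review or comment, and the organize step may legitimately place one source under several headings. A cap of 2, or a cap per heading, may be the better trade-off.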

# Run 3: S6 / Palia

Changes:

1. Added a URL shortener
2. Added the evaluation logger

It's a LOT faster, though it generated some malformed links like `cache://...`. I'm also noticing that it tends to include more Steam content than the first run.
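The malformed links are easy to detect after unshortening, because any link target that still starts with `cache://` never resolved to a real URL. A sketch of a check that could sit next to the evaluation logger (`unresolved_links` is a hypothetical name):

```python
import re
from typing import List

# Any markdown link whose target still starts with cache:// after unshortening.
_UNRESOLVED = re.compile(r"\[[^\]]*\]\((cache://[^)\s]*)\)")


def unresolved_links(markdown: str) -> List[str]:
    """Return the full markdown links that still point at a cache:// placeholder."""
    return [m.group(0) for m in _UNRESOLVED.finditer(markdown)]
```

The `debug_around` output above already shows `(cache://...)` inside a map-step extract. So running the same check on `result["intermediate_steps"]` would show whether the bad links originate in the map step or the reduce step.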