# Unified customer feedback POC

The idea is to run a single map-reduce pipeline over several sources of customer feedback:

- Reddit
- App stores
- (Maybe) formal reviews

We'll use the extract-organize-abstract pattern, with map-reduce handling the extract and organize stages.
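As a rough sketch of how map-reduce could cover extract and organize: the map step pulls extracts out of each document, and the reduce step groups them by theme. Everything here is an assumption for illustration. `map_reduce`, `keyword_extract`, and the `(theme, quote)` shape are hypothetical names, and the keyword matcher is only a stand-in for the LLM call.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical shape: each extract is a (theme, quote) pair.
Extract = tuple[str, str]


def map_reduce(
    documents: Iterable[str],
    extract: Callable[[str], list[Extract]],
) -> dict[str, list[str]]:
    """Map: pull extracts out of each document. Reduce: organize them by theme."""
    organized: dict[str, list[str]] = defaultdict(list)
    for document in documents:
        for theme, quote in extract(document):
            organized[theme].append(quote)
    return dict(organized)


def keyword_extract(document: str) -> list[Extract]:
    """Toy stand-in for the LLM extraction call: at most one extract per line."""
    extracts = []
    for line in document.splitlines():
        if "love" in line.lower():
            extracts.append(("Positive", line.strip()))
        elif "crash" in line.lower():
            extracts.append(("Negative", line.strip()))
    return extracts


reddit_doc = "I love the characters.\nThe client crashes on login."
steam_doc = "Love the cozy vibe."
organized = map_reduce([reddit_doc, steam_doc], keyword_extract)
```

Because the reduce step only sees extracts, not raw text, the same code works whether a document came from Reddit or an app store.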

## Challenges

- The length of the input will vary drastically depending on the data source
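One possible way to cope with the varying lengths is to pack documents into batches under a character budget, so that many short reviews share one LLM call while a long thread gets its own. This is a minimal sketch, not the notebook's actual packing code; `pack` and `max_chars` are hypothetical names.

```python
def pack(documents: list[str], max_chars: int) -> list[list[str]]:
    """Greedily group documents into batches of at most max_chars characters.

    A document longer than max_chars is not split; it ends up alone in its
    own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_len = 0
    for document in documents:
        # Start a new batch when adding this document would exceed the budget
        if current and current_len + len(document) > max_chars:
            batches.append(current)
            current, current_len = [], 0
        current.append(document)
        current_len += len(document)
    if current:
        batches.append(current)
    return batches


batches = pack(["a" * 5, "b" * 5, "c" * 12, "d" * 3], max_chars=10)
```

Here the two short documents share a batch, and the oversized one is isolated rather than dropped.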

## Steps

1. Init
1. Define the target
1. Fetch Reddit raw data
1. Fetch app review raw data
1. Format all into markdown
1. Pack
1. LLM pipeline

In [4]:
from core import Seed, init

init()

target = Seed.init("Singularity 6", "Palia", domain="singularity6.com")

In [5]:
# The unified version that's been drastically refactored
import customer_experience
import general_search
import reddit.search

general_search_results = general_search.search_web(target)

app_store_urls = customer_experience.extract_app_store_urls(general_search_results)
reddit_urls = [
    result.link
    for result in reddit.search.find_submissions(
        target, num_results=10
    )
]
customer_experience_result = customer_experience.run(
    target, reddit_urls=reddit_urls, **app_store_urls
)

print(customer_experience_result.output_text)


# Positive Sentiment

## Enjoyment of Characters and Setting
- "Honestly? I enjoy the characters and setting." [(NoWordCount, Reddit, 2024-04-09)](https://www.reddit.com/r/MMORPG/comments/1bz2e0z/palia_developers_singularity_6_axes_35_of_staff/kyppu7o/)
- "The story and characters are great, if you like those kind of stories." [(SvenWollinger, Reddit, 2024-04-09)](https://www.reddit.com/r/MMORPG/comments/1bz2e0z/palia_developers_singularity_6_axes_35_of_staff/kypy0pe/)
- "The people who made the story and characters are talented." [(MyStationIsAbandoned, Reddit, 2024-04-08)](https://www.reddit.com/r/MMORPG/comments/1bz2e0z/palia_developers_singularity_6_axes_35_of_staff/kyn75n0/)
- "I love this game, it combines elements from so many of my favorite cozy games :)" [(Anonymous, Steam, 2024-08-27)](https://steam/76561199442723712)
- "The characters are also really nice, I really love Jel, Reth, and Tish's storylines and personalities." [(Anonymous, Steam, 2024-08-97)](https://steam/765611

In [6]:
from core import make_experiment_dir

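# Named experiment_dir to avoid shadowing the built-in dir()
experiment_dir = make_experiment_dir(target)

# Save the generated report with the other experiment artifacts
with open(f"{experiment_dir}/customer_experience_version1.md", "w", encoding="utf-8") as f:
    f.write(customer_experience_result.output_text)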

# Run 1: S6 / Palia

Overall, the results were great.

To improve:

- Citations are VERY inconsistent: (starry101, Source, 2024-04-09), (Mousekiwiiks, Source, 2023-08-17), (Anonymous, Steam, 2024-08-01), (Thumbs Up, Steam, 2024-08-21). I think this could be solved by generating consistent citations upstream.
- It's heavily weighted towards Reddit even though Steam is probably a better data source. My guess is that there's simply more Reddit text.
- It took 3 minutes (!) to run
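One way to make citations consistent upstream would be to build them from structured fields instead of letting the LLM format them. The sketch below assumes a hypothetical `Citation` record with `author`, `source`, `published`, and `url` fields; none of these names come from the actual pipelines.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record filled in by each source pipeline."""

    author: Optional[str]
    source: str  # e.g. "Reddit" or "Steam", never a placeholder like "Source"
    published: date
    url: str

    def to_markdown(self) -> str:
        # Fall back to "Anonymous" so every citation has the same three parts
        author = self.author or "Anonymous"
        return f"[({author}, {self.source}, {self.published.isoformat()})]({self.url})"


citation = Citation(None, "Steam", date(2024, 8, 27), "https://example.com/review/1")
rendered = citation.to_markdown()
```

Using `datetime.date` for the date has a side benefit: an impossible date raises `ValueError` when the record is built, instead of reaching the report.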

# Run 2: S6 / Palia

Changes:

1. Updated the citation format in both Reddit and Steam pipelines
2. Fetched 500 Steam reviews, up from 100
3. Trimmed a long document to 100k characters (it was 260k chars)
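For reference, the trimming in change 3 could look something like this. It's a sketch under assumptions: `trim_document` is a hypothetical helper, and cutting at the last newline before the limit (so a comment isn't split mid-line) is my choice here, not necessarily what the pipeline does.

```python
def trim_document(text: str, max_chars: int = 100_000) -> str:
    """Trim text to at most max_chars characters, preferring a line boundary."""
    if len(text) <= max_chars:
        return text
    # Look for the last newline inside the budget so we don't cut mid-line
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        # No usable line break: fall back to a hard cut at the limit
        cut = max_chars
    return text[:cut]


trimmed = trim_document("line one\nline two\nline three", max_chars=20)
```

A hard cut is simpler, but it can leave a half-quoted comment at the end of the document, which the extract step might then cite.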

Results:

- Worked very nicely!
- One issue: multiple extracts came from the same source, which makes a single complaint look more serious than it is
- It took 4 minutes to run :(
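A possible fix for the repeated-source issue is to cap how many extracts a single source can contribute to a theme. This is only a sketch: `limit_per_source` and the `theme` / `url` keys are hypothetical, and I'm assuming the source URL is a good enough identity for "same source".

```python
from collections import Counter


def limit_per_source(extracts: list[dict], max_per_source: int = 1) -> list[dict]:
    """Keep at most max_per_source extracts per (theme, source URL) pair."""
    counts: Counter = Counter()
    kept = []
    for extract in extracts:
        key = (extract["theme"], extract["url"])
        if counts[key] < max_per_source:
            kept.append(extract)
            counts[key] += 1
    return kept


extracts = [
    {"theme": "Bugs", "quote": "It crashes.", "url": "https://example.com/a"},
    {"theme": "Bugs", "quote": "Crashed again.", "url": "https://example.com/a"},
    {"theme": "Bugs", "quote": "Login fails.", "url": "https://example.com/b"},
    {"theme": "Story", "quote": "Great story.", "url": "https://example.com/a"},
]
deduplicated = limit_per_source(extracts)
```

The first extract per source wins, so order matters. The same source can still appear under different themes.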

# Run 3: S6 / Palia

Changes:

1. Added a URL shortener
2. Added the evaluation logger

It's a LOT faster, though it generated some malformed links like cache://... I'm also noticing that it tends to include more Steam content than the first run did.
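To catch the malformed links, one option is to check every link target when expanding short URLs back to the originals. The sketch below assumes the shortener swaps long URLs for short keys before the LLM call and restores them afterwards, which may not match the real one. `UrlShortener`, the `https://s.example/` prefix, and the `cache://3` sample link are all made up for illustration.

```python
import re

# Matches the target of a markdown link: "](target)"
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


class UrlShortener:
    """Hypothetical shortener: long URL -> short key, and back again."""

    def __init__(self, prefix: str = "https://s.example/") -> None:
        self.prefix = prefix
        self._long_to_short: dict[str, str] = {}
        self._short_to_long: dict[str, str] = {}

    def shorten(self, url: str) -> str:
        if url not in self._long_to_short:
            short = f"{self.prefix}{len(self._long_to_short)}"
            self._long_to_short[url] = short
            self._short_to_long[short] = url
        return self._long_to_short[url]

    def expand(self, text: str) -> tuple[str, list[str]]:
        """Restore known short links; return the text and any unknown targets."""
        unknown: list[str] = []

        def restore(match: re.Match) -> str:
            target = match.group(1)
            if target in self._short_to_long:
                return f"]({self._short_to_long[target]})"
            unknown.append(target)  # e.g. a link the LLM made up
            return match.group(0)

        return LINK_TARGET.sub(restore, text), unknown


shortener = UrlShortener()
long_url = "https://www.reddit.com/r/MMORPG/comments/abc/"
short_url = shortener.shorten(long_url)
report = (
    f'- "Great" [(a, Reddit, 2024-04-09)]({short_url})\n'
    '- "Bad" [(b, Steam, 2024-08-01)](cache://3)'
)
expanded, unknown_links = shortener.expand(report)
```

Anything in `unknown_links` could be logged or stripped before the report is saved.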