# load_summarize_chain experiment

If I can get load_summarize_chain to work, it may simplify my pipeline considerably. Reddit is the main source with too much content.

In [2]:
# Step 1: Load a bunch of Reddit data
import reddit.summarizer
import reddit.search
import reddit.fetch

from core import CompanyProduct
from search import SearchResult

from dataclasses import dataclass
from typing import List, Mapping, Optional

@dataclass
class RedditSummary:
    sources: List[SearchResult]

    overall_summary: reddit.summarizer.AggregatedSummaryResult
    summaries: List[reddit.summarizer.ThreadSummaryResult]
    permalinks: Mapping[str, str]

def load_submissions(target: CompanyProduct, num_threads=2, min_comments=2) -> Optional[list]:
    """Return up to num_threads Submissions with at least min_comments comments, or None if none qualify."""
    reddit_client = reddit.fetch.init()

    # Search for URLs
    search_results = reddit.search.find_submissions(target, num_results=num_threads)

    # Fetch the Submissions from Reddit
    post_submissions = [reddit_client.submission(url=result.link) for result in search_results]

    # Filter Submissions to only those with enough comments
    post_submissions = [submission for submission in post_submissions if submission.num_comments >= min_comments]

    if len(post_submissions) == 0:
        print(f"No posts with enough comments found for {target}")
        return None

    # Limit the number of threads
    return post_submissions[:num_threads]

target = CompanyProduct.same("98point6")
threads = load_submissions(target, num_threads=20)
threads

[Submission(id='bg7ip2'),
 Submission(id='rgxxbw'),
 Submission(id='l5bbt9'),
 Submission(id='nqxfli'),
 Submission(id='ipmklh'),
 Submission(id='14n48uy'),
 Submission(id='tz9sws'),
 Submission(id='lx0zjb'),
 Submission(id='u7mkpr'),
 Submission(id='u7mkpr'),
 Submission(id='1dr3lsq'),
 Submission(id='y7elk6'),
 Submission(id='rlij9r'),
 Submission(id='18t2txn'),
 Submission(id='ru2atc'),
 Submission(id='elryxp'),
 Submission(id='bfj4ni'),
 Submission(id='g7vc1q'),
 Submission(id='17qqo37')]
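
The listing above contains `u7mkpr` twice, so the same thread would be fetched and summarized twice downstream. A minimal sketch of de-duplicating by ID while preserving order follows. `dedupe_by_id` is a hypothetical helper, not part of the `reddit` modules, and it assumes each item exposes an `id` attribute, as the Submission reprs suggest.

```python
from types import SimpleNamespace
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def dedupe_by_id(items: Iterable[T]) -> List[T]:
    """Keep the first occurrence of each item, keyed on its `id` attribute."""
    seen = set()
    unique = []
    for item in items:
        if item.id not in seen:
            seen.add(item.id)
            unique.append(item)
    return unique

# Stand-in objects for illustration; real input would be the `threads` list.
sample = [SimpleNamespace(id=i) for i in ["lx0zjb", "u7mkpr", "u7mkpr", "1dr3lsq"]]
deduped_ids = [item.id for item in dedupe_by_id(sample)]
```

In the notebook this would presumably be applied as `threads = dedupe_by_id(threads)` before converting to markdown.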

In [3]:
thread_markdowns = [reddit.fetch.submission_to_markdown(thread) for thread in threads]
thread_markdowns

['\n# Post ID bg7ip2:  Internet medicine is awesome, 98point6 was so so helpful for me by FrugalChef13 on 2019-04-22 [+57 votes]\n**TL;DR- $20 got me an awesome appointment with a nice doctor and a prescription for a medication I could afford that solved my issue.**\n\n*Disclaimer: This particular thing worked well for me so I\'m going to tell you about it. Everyone is different, so it might not work as well (or at all) for you.  Take what you find useful from this post and ignore the rest.  I\'m not compensated or connected to the website I\'m discussing.*\n\nSo like a lot of people on here I\'m usually either uninsured or underinsured.  Right now it\'s underinsured with a high deductible, so when I messed my back up badly enough that I could barely move I freaked.  I\'ve got scoliosis, a fucked up spine, bad knees, and muscles that love to spasm uncontrollably for days on end.  I\'d run out of my prescription muscle relaxants last fall and hadn\'t been able to afford another appointm

In [7]:
# Count characters in total and per thread
combined_markdown = "\n\n".join(thread_markdowns)

char_count = sum(len(thread_markdown) for thread_markdown in thread_markdowns)
char_count, [len(thread_markdown) for thread_markdown in thread_markdowns]

(336461,
 [8672,
  630,
  1125,
  1443,
  1439,
  744,
  1249,
  3846,
  72309,
  72309,
  4263,
  55563,
  7072,
  15992,
  6079,
  67570,
  1624,
  7469,
  7063])

In [15]:
# Character and token counts per thread
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

for i, document in enumerate(thread_markdowns):
    print(f"Document {i}: {len(document)} chars, {llm.get_num_tokens(document)} tokens")

Document 0: 8672 chars, 2071 tokens
Document 1: 630 chars, 179 tokens
Document 2: 1125 chars, 311 tokens
Document 3: 1443 chars, 474 tokens
Document 4: 1439 chars, 371 tokens
Document 5: 744 chars, 231 tokens
Document 6: 1249 chars, 353 tokens
Document 7: 3846 chars, 1173 tokens
Document 8: 72309 chars, 21352 tokens
Document 9: 72309 chars, 21352 tokens
Document 10: 4263 chars, 1131 tokens
Document 11: 55563 chars, 14470 tokens
Document 12: 7072 chars, 1862 tokens
Document 13: 15992 chars, 4757 tokens
Document 14: 6079 chars, 1520 tokens
Document 15: 67570 chars, 17506 tokens
Document 16: 1624 chars, 407 tokens
Document 17: 7469 chars, 2104 tokens
Document 18: 7063 chars, 1967 tokens
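
To see where the tokens go, the per-document counts above can be totalled. This is a small sketch that only reuses the numbers copied from the output; it does not call the tokenizer again, and the cutoff of four "largest" documents is an arbitrary choice for illustration.

```python
# (chars, tokens) pairs copied from the per-document output above.
counts = [
    (8672, 2071), (630, 179), (1125, 311), (1443, 474), (1439, 371),
    (744, 231), (1249, 353), (3846, 1173), (72309, 21352), (72309, 21352),
    (4263, 1131), (55563, 14470), (7072, 1862), (15992, 4757), (6079, 1520),
    (67570, 17506), (1624, 407), (7469, 2104), (7063, 1967),
]

total_chars = sum(chars for chars, _ in counts)
total_tokens = sum(tokens for _, tokens in counts)
chars_per_token = total_chars / total_tokens

# Indices of the four documents with the most tokens (arbitrary cutoff).
largest = sorted(range(len(counts)), key=lambda i: counts[i][1], reverse=True)[:4]
largest_share = sum(counts[i][1] for i in largest) / total_tokens

print(f"{total_chars} chars, {total_tokens} tokens, {chars_per_token:.2f} chars/token")
print(f"largest documents: {largest}, share of tokens: {largest_share:.0%}")
```

A handful of long threads account for most of the input, and two of them (documents 8 and 9) are the same thread counted twice.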


In [10]:
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

from dotenv import load_dotenv
load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# llm.get_num_tokens(combined_markdown)

summary_chain = load_summarize_chain(
    llm=llm, 
    chain_type='map_reduce',
    verbose=True
)

documents = [Document(page_content=thread_markdown) for thread_markdown in thread_markdowns]
output = summary_chain.run(documents)
output



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"
# Post ID bg7ip2:  Internet medicine is awesome, 98point6 was so so helpful for me by FrugalChef13 on 2019-04-22 [+57 votes]
**TL;DR- $20 got me an awesome appointment with a nice doctor and a prescription for a medication I could afford that solved my issue.**

*Disclaimer: This particular thing worked well for me so I'm going to tell you about it. Everyone is different, so it might not work as well (or at all) for you.  Take what you find useful from this post and ignore the rest.  I'm not compensated or connected to the website I'm discussing.*

So like a lot of people on here I'm usually either uninsured or underinsured.  Right now it's underinsured with a high deductible, so when I messed my back up badly enough that I could barely move I freaked.  I've got scoliosis, a fucked up spine, bad knees, a

"The Reddit discussion covers various experiences and recommendations related to online medical services, particularly 98point6, which offers affordable consultations for $20. Users share positive experiences with the app for managing conditions like back pain and PCOS, highlighting its convenience and low cost, while also noting its limitations for serious health issues. The conversation includes inquiries about obtaining doctor's notes, mental health treatment, and alternative telehealth options, with users recommending various services and sharing personal experiences. Additionally, discussions touch on frustrations with filling prescriptions for controlled substances, accessing affordable dermatological care, and navigating health insurance options, emphasizing the importance of seeking professional medical advice and exploring low-cost resources. Overall, the thread showcases the growing interest in telehealth solutions and the community's efforts to share valuable information on 
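
For reference, the `map_reduce` chain type used above can be pictured as two steps: summarize each document on its own, then summarize the joined summaries. The following is a plain-Python sketch of that shape with stand-in functions in place of LLM calls. `map_reduce_summarize` is a hypothetical name, and the real chain may do more (for example, collapsing intermediate results that are too long), which is omitted here.

```python
from typing import Callable, List

def map_reduce_summarize(
    documents: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[str], str],
    separator: str = "\n\n",
) -> str:
    """Apply map_fn to each document, then reduce_fn to the joined results."""
    partials = [map_fn(doc) for doc in documents]
    return reduce_fn(separator.join(partials))

def first_line(text: str) -> str:
    """Stand-in for the per-document LLM call: keep only the first line."""
    return text.strip().splitlines()[0]

def shout(text: str) -> str:
    """Stand-in for the combining LLM call: upper-case the joined partials."""
    return text.upper()

combined = map_reduce_summarize(
    ["# Thread A\nbody text", "# Thread B\nmore body text"],
    map_fn=first_line,
    reduce_fn=shout,
)
```

The two prompts passed to `load_summarize_chain` correspond to `map_fn` and `reduce_fn` in this picture.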

In [13]:
from langchain_core.prompts import PromptTemplate

map_prompt = """
Review the following Reddit thread and extract quotes that express positive or negative experiences with 98point6.

REDDIT THREAD:
{text}

KEY QUOTES:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """
Write a summary of the user experience of 98point6 based on the following extracts from Reddit threads.
Separately list out the strengths and weaknesses of 98point6.

SUMMARISED THREADS:
{text}

SUMMARY IN MARKDOWN FORMAT:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])


summary_chain = load_summarize_chain(
    llm=llm, 
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=True
)

documents = [Document(page_content=thread_markdown) for thread_markdown in thread_markdowns]
output = summary_chain.run(documents)
output



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

Review the following Reddit thread and extract quotes that express positive or negative experiences with 98point6.

REDDIT THREAD:

# Post ID bg7ip2:  Internet medicine is awesome, 98point6 was so so helpful for me by FrugalChef13 on 2019-04-22 [+57 votes]
**TL;DR- $20 got me an awesome appointment with a nice doctor and a prescription for a medication I could afford that solved my issue.**

*Disclaimer: This particular thing worked well for me so I'm going to tell you about it. Everyone is different, so it might not work as well (or at all) for you.  Take what you find useful from this post and ignore the rest.  I'm not compensated or connected to the website I'm discussing.*

So like a lot of people on here I'm usually either uninsured or underinsured.  Right now it's underinsured with a high deductible, so when I messed my back up badly enough t

"```markdown\n## Summary of User Experience with 98point6\n\nThe user experience with 98point6 is predominantly positive, with many users expressing satisfaction with the app's affordability, efficiency, and convenience. Users appreciate the low cost of the service, with some mentioning a yearly fee of $20 and minimal charges for consultations. The app is praised for its quick response times, allowing users to receive prescriptions without the hassle of traditional in-person visits. Many users highlight the relief and support they feel from being able to access healthcare services easily, especially for ongoing medication needs.\n\nWhile there are no explicit negative experiences reported in the threads, some users do express concerns about the overall accessibility of mental health services and the rising costs associated with certain treatments. However, these concerns are more about the broader healthcare landscape rather than specific issues with 98point6 itself.\n\n### Strengths o
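
The model wrapped its answer in a markdown code fence, which gets in the way if the summary is rendered or stored as markdown. A minimal sketch of removing one enclosing fence follows. `strip_markdown_fence` is a hypothetical helper and only handles the simple case seen here: a fence line at the start and, optionally, one at the end.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

def strip_markdown_fence(text: str) -> str:
    """Remove one enclosing code fence (with optional language tag), if present."""
    stripped = text.strip()
    if not stripped.startswith(FENCE):
        return text  # nothing to do
    lines = stripped.splitlines()
    body = lines[1:]  # drop the opening fence line, including any language tag
    if body and body[-1].strip() == FENCE:
        body = body[:-1]  # drop the closing fence line
    return "\n".join(body)

# Illustrative input shaped like the start of the output above.
wrapped = FENCE + "markdown\n## Summary\n\nBody text.\n" + FENCE
cleaned = strip_markdown_fence(wrapped)
```

An alternative would be to ask for no code fence in the combine prompt, though the model may not always comply.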

In [14]:
print(output)

```markdown
## Summary of User Experience with 98point6

The user experience with 98point6 is predominantly positive, with many users expressing satisfaction with the app's affordability, efficiency, and convenience. Users appreciate the low cost of the service, with some mentioning a yearly fee of $20 and minimal charges for consultations. The app is praised for its quick response times, allowing users to receive prescriptions without the hassle of traditional in-person visits. Many users highlight the relief and support they feel from being able to access healthcare services easily, especially for ongoing medication needs.

While there are no explicit negative experiences reported in the threads, some users do express concerns about the overall accessibility of mental health services and the rising costs associated with certain treatments. However, these concerns are more about the broader healthcare landscape rather than specific issues with 98point6 itself.

### Strengths of 98poin