In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
from langchain_groq import ChatGroq

# Read the key loaded from .env above and create the Groq-hosted chat model.
api_key = os.getenv("GROQ_API_KEY")
llm = ChatGroq(api_key=api_key, model="gemma2-9b-it")

In [3]:
from langchain.schema import AIMessage, HumanMessage, SystemMessage

In [4]:
speech = '''
    Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.
    Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived, and so dedicated, can long endure. We are met on a great battle field of that war. We come to dedicate a portion of it, as a final resting place for those who died here, that the nation might live. This we may, in all propriety do.
    But, in a larger sense, we can not dedicate we can not consecrate we can not hallow, this ground The brave men, living and dead, who struggled here, have hallowed it, far above our poor power to add or detract. The world will little note, nor long remember what we say here; while it can never forget what they did here.
    It is rather for us, the living, we here be dedicated to the great task remaining before us that, from these honored dead we take increased devotion to that cause for which they here, gave the last full measure of devotion that we here highly resolve these dead shall not have died in vain; that the nation, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.
    '''

In [5]:
chat_messages = [
    SystemMessage(content='You are an AI assistant specializing in summarizing speeches.'),
    # "\n" (not "/n") puts the speech text on its own line in the prompt.
    HumanMessage(content=f'Please provide a short and concise summary of the given speech:\nText: {speech}'),
]

In [6]:
llm.get_num_tokens(speech)

301

In [7]:
# Calling the model object directly is deprecated; invoke() is the supported entry point.
llm.invoke(chat_messages).content


'This is Abraham Lincoln\'s Gettysburg Address. \n\nDelivered at the dedication of a national cemetery, Lincoln framed the Civil War as a test of whether a nation "conceived in liberty, and dedicated to the proposition that all men are created equal" could survive.  \n\nHe emphasized that the battlefield itself was already hallowed by the sacrifices of the soldiers, and urged the living to dedicate themselves to the unfinished work of preserving the Union and ensuring that "government of the people, by the people, for the people, shall not perish from this earth." \n\n'

In [8]:
llm.invoke(chat_messages)

AIMessage(content='This is the Gettysburg Address by President Abraham Lincoln. \n\nLincoln commemorates the fallen soldiers at Gettysburg, emphasizing that their sacrifices have already consecrated the battlefield far beyond any words he could speak. He calls upon the living to dedicate themselves to the unfinished work of preserving the Union and ensuring that "government of the people, by the people, for the people, shall not perish from this earth." \n\n\nThe speech is a powerful reminder of the cost of freedom and a call to action for the nation to continue striving for a more perfect union. \n\n', response_metadata={'token_usage': {'completion_tokens': 113, 'prompt_tokens': 321, 'total_tokens': 434, 'completion_time': 0.205454545, 'prompt_time': 0.017338029, 'queue_time': 0.026391546999999998, 'total_time': 0.222792574}, 'model_name': 'gemma2-9b-it', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-b600b21e-2e77-4f9c-8a99-8ba3af3b7d90-0',

Prompt Template Text Summarization

In [9]:
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

template = """
    Write a summary of the following speech:
    Speech: {speech}
    Translate the summary into {language}.
    """

prompt = PromptTemplate(input_variables=["speech","language"], template=template)
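
What the formatting step in the next cell does can be sketched in plain Python. This is only an illustration, assuming the default f-string style template, where each {name} placeholder is substituted by a keyword argument. The helper name render_prompt and the shortened speech text are hypothetical and not part of LangChain.

```python
# Plain-Python sketch of placeholder substitution (not LangChain's implementation).
summary_template = """
    Write a summary of the following speech:
    Speech: {speech}
    Translate the summary into {language}.
    """


def render_prompt(template: str, **variables: str) -> str:
    """Fill every {name} placeholder; a missing variable raises KeyError."""
    return template.format(**variables)


rendered = render_prompt(
    summary_template,
    speech="Four score and seven years ago",  # shortened stand-in text
    language="Chinese",
)
print(rendered)
```

Leaving out one of the variables raises a KeyError, which is roughly why the template declares its input variables up front.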

In [10]:
complete_prompt = prompt.format(speech=speech, language="Chinese")
complete_prompt

'\n    Write a summary of the following speech:\n    Speech: \n    Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.\n    Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived, and so dedicated, can long endure. We are met on a great battle field of that war. We come to dedicate a portion of it, as a final resting place for those who died here, that the nation might live. This we may, in all propriety do.\n    But, in a larger sense, we can not dedicate we can not consecrate we can not hallow, this ground The brave men, living and dead, who struggled here, have hallowed it, far above our poor power to add or detract. The world will little note, nor long remember what we say here; while it can never forget what they did here.\n    It is rather for us, the living, we here be dedicated to the great task remaining befor

In [11]:
llm.get_num_tokens(complete_prompt)

336

In [12]:
llm_chain = LLMChain(llm = llm, prompt = prompt)
summary = llm_chain.run({"speech":speech, "language":"Chinese"})
summary


'## Summary of the Gettysburg Address:\n\nThis speech, delivered by President Abraham Lincoln at the dedication of a national cemetery at Gettysburg, Pennsylvania, reflects on the Civil War and the meaning of freedom. \n\nLincoln reminds the audience that the nation was founded on the principles of liberty and equality for all. He acknowledges the immense sacrifice made by the soldiers who fought and died at Gettysburg, stating that their actions have already consecrated the ground, far beyond the power of words. \n\nThe speech then shifts its focus to the living, calling upon them to dedicate themselves to the unfinished work of preserving the Union and ensuring that the sacrifices made at Gettysburg were not in vain. Lincoln emphasizes the need to rededicate themselves to the cause of freedom and to ensure that "government of the people, by the people, for the people, shall not perish from the earth."\n\n\n##  中文翻译：\n\n**Gettysburg演说概要：**\n\n这番演说是由亚伯拉罕·林肯总统在宾夕法尼亚州盖茨堡国立墓地的落成典礼上发表的，它反思

Stuff Summarization

In [13]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("55_JoLT_Jointly_Learned_Repres.pdf")
pages = loader.load_and_split()
pages

[Document(metadata={'source': '55_JoLT_Jointly_Learned_Repres.pdf', 'page': 0}, page_content='JoLT: Jointly Learned Representations of\nLanguage and Time-Series\nYifu Cai, Mononito Goswami, Arjun Choudhry, Arvind Srinivasan, Artur Dubrawski\nAuton Lab, School of Computer Science, Carnegie Mellon University\nPittsburgh, PA 15213\narvind.srini.8@gmail.com ,{yifuc, mgoswami, arjuncho, awd}@andrew.cmu.edu\nAbstract\nTime-series and text data is prevalent in healthcare and frequently exist in tandem,\nfor e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities\nare typically modeled independently. Even studies that jointly model time-series\nand text do so by converting time-series to images or graphs. We hypothesize\nthat explicitly modeling time-series jointly with text can improve tasks such as\nsummarization and question answering for time-series data, which have received\nlittle attention so far. To address this gap, we introduce JoLT to jointly learn desired\nre

In [14]:
prompt_template = """Please provide a summary of the following text:
    Text: {text}
    """
prompt = PromptTemplate(input_variables=["text"], template=prompt_template)

In [15]:
from langchain.chains.summarize import load_summarize_chain
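
Before running the chain, the idea behind the "stuff" strategy can be sketched without LangChain: all document texts are joined into one string, placed into the prompt, and sent to the model in a single call. The function stuff_summarize, the fake_llm stand-in, and the "\n\n" separator are assumptions made for illustration, not the library's code.

```python
from typing import Callable, List

# Hypothetical helper that mimics the "stuff" strategy: one prompt, one model call.
def stuff_summarize(
    texts: List[str],
    llm: Callable[[str], str],
    template: str,
    separator: str = "\n\n",
) -> str:
    combined = separator.join(texts)  # every page goes into the same prompt
    return llm(template.format(text=combined))


# Stand-in for a real model so the sketch runs offline; it records what it receives.
prompts_seen: List[str] = []

def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"summary of {len(prompt)} characters"


page_texts = ["First page text.", "Second page text."]
stuff_template = "Please provide a summary of the following text:\nText: {text}"
stuff_result = stuff_summarize(page_texts, fake_llm, stuff_template)
print(len(prompts_seen), stuff_result)
```

Because everything travels in one prompt, this approach only works while the combined text fits in the model's context window.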

In [16]:
chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt, verbose=True)
summary = chain.run(pages)
summary



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Please provide a summary of the following text:
    Text: JoLT: Jointly Learned Representations of
Language and Time-Series
Yifu Cai, Mononito Goswami, Arjun Choudhry, Arvind Srinivasan, Artur Dubrawski
Auton Lab, School of Computer Science, Carnegie Mellon University
Pittsburgh, PA 15213
arvind.srini.8@gmail.com ,{yifuc, mgoswami, arjuncho, awd}@andrew.cmu.edu
Abstract
Time-series and text data is prevalent in healthcare and frequently exist in tandem,
for e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities
are typically modeled independently. Even studies that jointly model time-series
and text do so by converting time-series to images or graphs. We hypothesize
that explicitly modeling time-series jointly with text can improve tasks such as
summarization and question answering for time-series data, which have received
little

"The following is a summary of the provided text.\n\n**JoLT: Jointly Learned Representations of Language and Time-Series**\n\nThis paper introduces JoLT, a novel model for jointly learning representations of language and time-series data, with a specific focus on ECG interpretation.\n\n**Key aspects of JoLT:\n\n* **Time-series Encoder:** Trained using a masked time-series reconstruction objective.\n* **Text Decoder:** Utilizes a pre-trained language model (e.g., OPT) to generate text summaries.\n* **Querying Transformer (Q-Former) aligns time-series and text representations.\n* **Fine-tuning: Both encoder and decoder are fine-tuned together.\n\n**Evaluations:**\n\n* **ECG interpretation task on the PTB-XL dataset. Compared to BLIP-2 (image captioning model).\n* **Results:** JoLT outperforms BLIP-2 in ECG interpretation tasks.\n* **Other language models tested: GPT-2, BioGPT, BioMedLM.\n* **Findings: JoLT generally outperforms other models for ECG interpretation.\n* **Time-series condit

Map Reduce Summarization

In [17]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [18]:
loader = PyPDFLoader("55_JoLT_Jointly_Learned_Repres.pdf")
docs = loader.load_and_split()
docs

[Document(metadata={'source': '55_JoLT_Jointly_Learned_Repres.pdf', 'page': 0}, page_content='JoLT: Jointly Learned Representations of\nLanguage and Time-Series\nYifu Cai, Mononito Goswami, Arjun Choudhry, Arvind Srinivasan, Artur Dubrawski\nAuton Lab, School of Computer Science, Carnegie Mellon University\nPittsburgh, PA 15213\narvind.srini.8@gmail.com ,{yifuc, mgoswami, arjuncho, awd}@andrew.cmu.edu\nAbstract\nTime-series and text data is prevalent in healthcare and frequently exist in tandem,\nfor e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities\nare typically modeled independently. Even studies that jointly model time-series\nand text do so by converting time-series to images or graphs. We hypothesize\nthat explicitly modeling time-series jointly with text can improve tasks such as\nsummarization and question answering for time-series data, which have received\nlittle attention so far. To address this gap, we introduce JoLT to jointly learn desired\nre

In [19]:
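# Split pages into chunks of at most 2048 characters, with 128 characters of overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=128)
splitted_docs = splitter.split_documents(docs)
splitted_docs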

[Document(metadata={'source': '55_JoLT_Jointly_Learned_Repres.pdf', 'page': 0}, page_content='JoLT: Jointly Learned Representations of\nLanguage and Time-Series\nYifu Cai, Mononito Goswami, Arjun Choudhry, Arvind Srinivasan, Artur Dubrawski\nAuton Lab, School of Computer Science, Carnegie Mellon University\nPittsburgh, PA 15213\narvind.srini.8@gmail.com ,{yifuc, mgoswami, arjuncho, awd}@andrew.cmu.edu\nAbstract\nTime-series and text data is prevalent in healthcare and frequently exist in tandem,\nfor e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities\nare typically modeled independently. Even studies that jointly model time-series\nand text do so by converting time-series to images or graphs. We hypothesize\nthat explicitly modeling time-series jointly with text can improve tasks such as\nsummarization and question answering for time-series data, which have received\nlittle attention so far. To address this gap, we introduce JoLT to jointly learn desired\nre
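
The effect of chunk_size and chunk_overlap can be illustrated with a much simpler splitter. The sketch below uses fixed-size character windows, which is not how RecursiveCharacterTextSplitter works (that class prefers to break on separators such as paragraphs and lines); sliding_chunks is a hypothetical helper meant only to show what overlap between neighbouring chunks looks like.

```python
from typing import List

# Hypothetical fixed-window splitter; a simplification, not LangChain's algorithm.
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a sentence cut at a boundary still appears with some context in the next chunk.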

In [20]:
chunk_prompt = """Please provide a summary of the following article:
    Article: {text}
    """
chunk_prompt = PromptTemplate(input_variables=["text"], template=chunk_prompt)

In [21]:
final_prompt = """
    Provide the final summary of the entire article with the important points.
    Add a Motivation Title,Start the precise summary with an introduction and provide the summary in number 
    points for the article.
    Article:{text}"""
final_prompt = PromptTemplate(input_variables=["text"], template=final_prompt)
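
How the two prompts cooperate can be sketched in plain Python: the map prompt is applied to every chunk independently, and the combine prompt is applied once to the joined partial summaries. This is a simplified illustration under stated assumptions; map_reduce_summarize and fake_map_llm are hypothetical names, and the real chain may additionally collapse intermediate summaries when they grow too long.

```python
from typing import Callable, List

# Hypothetical helper that mimics the map-reduce strategy.
def map_reduce_summarize(
    chunks: List[str],
    llm: Callable[[str], str],
    map_template: str,
    combine_template: str,
) -> str:
    # Map step: one model call per chunk, each independent of the others.
    partial_summaries = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce step: one final call over the concatenated partial summaries.
    return llm(combine_template.format(text="\n\n".join(partial_summaries)))


# Offline stand-in for a model; it numbers its answers so the call order is visible.
map_calls: List[str] = []

def fake_map_llm(prompt: str) -> str:
    map_calls.append(prompt)
    return f"[summary {len(map_calls)}]"


final_text = map_reduce_summarize(
    ["chunk A", "chunk B", "chunk C"],
    fake_map_llm,
    map_template="Summarize: {text}",
    combine_template="Combine: {text}",
)
print(len(map_calls), final_text)  # 4 [summary 4]
```

Three chunks cost four model calls here. The final answer sees only the partial summaries, never the original text, so anything lost in the map step cannot be recovered in the reduce step.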

In [22]:
summary_chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=chunk_prompt,        # applied to each chunk
    combine_prompt=final_prompt,    # applied once to the chunk summaries
    verbose=True,
)

In [23]:
summary = summary_chain.run(splitted_docs)
summary



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Please provide a summary of the following article:
    Article: JoLT: Jointly Learned Representations of
Language and Time-Series
Yifu Cai, Mononito Goswami, Arjun Choudhry, Arvind Srinivasan, Artur Dubrawski
Auton Lab, School of Computer Science, Carnegie Mellon University
Pittsburgh, PA 15213
arvind.srini.8@gmail.com ,{yifuc, mgoswami, arjuncho, awd}@andrew.cmu.edu
Abstract
Time-series and text data is prevalent in healthcare and frequently exist in tandem,
for e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities
are typically modeled independently. Even studies that jointly model time-series
and text do so by converting time-series to images or graphs. We hypothesize
that explicitly modeling time-series jointly with text can improve tasks such as
summarization and question answering for time-series data, which have recei

Token indices sequence length is longer than the specified maximum sequence length for this model (3713 > 1024). Running this sequence through the model will result in indexing errors



> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

    Provide the final summary of the entire article with the important points.
    Add a Motivation Title,Start the precise summary with an introduction and provide the summary in number 
    points for the article.
    Article:This article introduces **JoLT**, a novel method for jointly learning representations from time-series and text data. 

**The Problem:**

* Time-series and text data are abundant in healthcare (e.g., ECG reports) but are often treated separately.
* Existing joint modeling methods convert time-series into images or graphs, which can be inefficient and lose inherent temporal information.
* Tasks like summarizing and answering questions about time-series data in healthcare are under-explored.

**JoLT's Solution:**

* **Joint Representation Learning:** JoLT leverages pre-trained models for both time-series

'##  Navigating the Dual Nature of AI in Healthcare: Promises and Perils\n\nThis summary explores the potential benefits and inherent risks associated with applying large language models (LLMs) and foundation models in healthcare, focusing on their impact on biomedical vision-language processing (VLP) and electronic health records (EHRs).\n\n**Key Points:**\n\n1. **Enhanced Biomedical VLP:**  Research suggests that pretraining LLMs on large biomedical datasets can significantly improve their performance in tasks like image captioning and question answering within the healthcare domain. This specialized training addresses the limitations of using general-purpose LLMs for biomedical VLP.\n\n2. **Addressing Healthcare-Specific Challenges:** LLMs can be valuable tools for analyzing and interpreting EHRs, potentially aiding in diagnoses, treatment recommendations, and patient monitoring. However, their application in this sensitive domain requires careful consideration due to ethical and sa

Refine Chain Summarization
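
The refine strategy can be sketched as a loop: the first chunk produces an initial summary, and every later chunk is sent to the model together with the summary so far, which the model is asked to update. The helper refine_summarize, the fake_refine_llm stand-in, and the two template strings are illustrative assumptions rather than the chain's actual prompts.

```python
from typing import Callable, List

# Hypothetical helper that mimics the refine strategy: one sequential call per chunk.
def refine_summarize(
    chunks: List[str],
    llm: Callable[[str], str],
    initial_template: str,
    refine_template: str,
) -> str:
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = llm(initial_template.format(text=chunks[0]))
    for chunk in chunks[1:]:
        # Each step sees the running summary plus exactly one new chunk.
        summary = llm(refine_template.format(existing_answer=summary, text=chunk))
    return summary


# Offline stand-in for a model; it returns a version tag per call.
refine_calls: List[str] = []

def fake_refine_llm(prompt: str) -> str:
    refine_calls.append(prompt)
    return f"v{len(refine_calls)}"


refined = refine_summarize(
    ["chunk A", "chunk B", "chunk C"],
    fake_refine_llm,
    initial_template="Write a concise summary of: {text}",
    refine_template="Existing summary: {existing_answer}\nNew context: {text}\nRefine the summary.",
)
print(len(refine_calls), refined)  # 3 v3
```

Unlike map-reduce, the calls cannot run in parallel, because each one depends on the previous summary.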

In [24]:
chain = load_summarize_chain(llm, chain_type="refine", verbose=True)
summary = chain.run(splitted_docs)



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"JoLT: Jointly Learned Representations of
Language and Time-Series
Yifu Cai, Mononito Goswami, Arjun Choudhry, Arvind Srinivasan, Artur Dubrawski
Auton Lab, School of Computer Science, Carnegie Mellon University
Pittsburgh, PA 15213
arvind.srini.8@gmail.com ,{yifuc, mgoswami, arjuncho, awd}@andrew.cmu.edu
Abstract
Time-series and text data is prevalent in healthcare and frequently exist in tandem,
for e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities
are typically modeled independently. Even studies that jointly model time-series
and text do so by converting time-series to images or graphs. We hypothesize
that explicitly modeling time-series jointly with text can improve tasks such as
summarization and question answering for time-series data, which have received
little attention s

In [25]:
summary

"This paper introduces JoLT, a novel neural method for jointly learning representations from time-series and text data. Existing approaches often struggle to effectively integrate these modalities, either treating them independently or relying on rule-based systems that require significant domain expertise and may not generalize well.  JoLT directly aligns time-series and text representations using a Querying Transformer (Q-Former), addressing the limitations of previous methods that convert time-series into images or graphs.\n\nKey innovations include:\n\n* **Direct Alignment:** JoLT directly works with time-series and text data, enabling a more natural and potentially more effective integration of the two modalities.\n* **Querying Transformer:** The Q-Former mechanism allows for selective extraction of time-series features based on textual prompts, enhancing the model's ability to focus on relevant information.\n* **Pre-training:** JoLT leverages pre-trained language models and a tim