In [13]:
import yaml, os, openai, textwrap
from langchain.chat_models import ChatOpenAI
from langchain.docstore.document import Document
from langchain.chains.mapreduce import MapReduceChain
from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain

In [14]:
with open('cadentials.yaml') as f:
    # safe_load is enough for a flat key/value credentials file
    credentials = yaml.safe_load(f)

os.environ['OPENAI_API_KEY'] = credentials['OPENAI_API_KEY']
os.environ['HUGGINGFACEHUB_API_TOKEN'] = credentials['HUGGINGFACEHUB_API_TOKEN']
os.environ['ENGINE'] = credentials['ENGINE']

openai.api_key = credentials['OPENAI_API_KEY']
openai.api_base = credentials['OPENAI_API_BASE']
openai.api_type = credentials['OPENAI_API_TYPE']
openai.api_version = credentials['OPENAI_API_VERSION']
openai.engine = credentials['ENGINE']

In [15]:
llm = ChatOpenAI(
                openai_api_key=os.environ["OPENAI_API_KEY"],
                engine = os.environ["ENGINE"],
                model='gpt-3.5-turbo',
                temperature=0.9, 
                max_tokens = 256
                )

                    engine was transferred to model_kwargs.
                    Please confirm that engine is what you intended.


In [21]:
text_splitter = CharacterTextSplitter()

with open('data/7-summarization.txt') as f:
    ml_text = f.read()
texts = text_splitter.split_text(ml_text)

In [10]:
len(texts)

4
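The chunk count above depends on how `CharacterTextSplitter` packs the text. As a rough illustration, assuming the default separator is a blank line (`"\n\n"`) and the default chunk size is 4000 characters (which is my understanding of LangChain releases from this period), the hypothetical helper `split_on_separator` below packs paragraphs greedily into chunks. It ignores chunk overlap, so treat it as a simplified sketch and not LangChain's implementation.

```python
def split_on_separator(text, separator="\n\n", chunk_size=4000):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified sketch: no chunk overlap, and a single piece longer than
    chunk_size is kept whole instead of being cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits (or nothing collected yet): keep growing the current chunk.
            current = candidate
        else:
            # Adding this piece would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
print(split_on_separator(sample, chunk_size=10))
```

With a small `chunk_size` the first two paragraphs share a chunk and the third starts a new one, which is the same reason an 11,767-character file ends up in a handful of chunks.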

In [11]:
docs = [Document(page_content=t) for t in texts]
docs

[Document(page_content='Machine learning is a rapidly evolving field within the broader domain of artificial intelligence (AI) that has garnered significant attention and acclaim in recent years. It encompasses a diverse array of techniques and algorithms designed to enable computers to learn from data and make predictions or decisions without being explicitly programmed. This transformative technology has applications across various industries and is at the heart of many cutting-edge innovations, promising to reshape the way we interact with technology and solve complex problems.\n\nAt its core, machine learning relies on the idea that computers can analyze and interpret large datasets to discover patterns, relationships, and insights that might be too complex for humans to discern through traditional programming. This paradigm shift has empowered machines to excel in tasks that were once considered exclusively human, such as image and speech recognition, natural language processing, 

##  3 types of CombineDocuments Chains

### 1. Map Reduce [run individually and then combine]
This method first runs **an initial prompt on each chunk of data**:
- for summarization tasks, this could be a summary of that chunk
- for question-answering tasks, it could be an answer based solely on that chunk

**Then a different prompt is run to combine all the initial outputs.** This is implemented in LangChain as the MapReduceDocumentsChain.

**Pros:** Can scale to larger documents (and more documents) than StuffDocumentsChain. The calls to the LLM on individual documents are independent and can therefore be parallelized.

**Cons:** Requires many more calls to the LLM than StuffDocumentsChain. Loses some information during the final combining call.
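The map and combine steps can be sketched without any API calls. In the sketch below, `fake_llm` is a hypothetical stand-in for the model (it just keeps the first sentence of each paragraph it is given), and `map_reduce_summarize` is an illustrative name, not a LangChain function. The prompt text is the default summarization template this notebook prints for the chain. The real chain may also collapse intermediate summaries when they are too long, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor

PROMPT = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def fake_llm(prompt):
    """Hypothetical model stub: keep the first sentence of every paragraph in the quoted text."""
    body = prompt.split('"', 1)[1].rsplit('"', 1)[0]
    paragraphs = [p for p in body.split("\n\n") if p.strip()]
    return " ".join(p.split(".")[0].strip() + "." for p in paragraphs)


def map_reduce_summarize(chunks, llm):
    # Map: each chunk is summarized independently, so the calls can run in parallel.
    with ThreadPoolExecutor() as pool:
        partial_summaries = list(pool.map(lambda c: llm(PROMPT.format(text=c)), chunks))
    # Reduce: one more call combines the partial summaries into the final answer.
    return llm(PROMPT.format(text="\n\n".join(partial_summaries)))


chunks = ["Cats purr. They sleep a lot.", "Dogs bark. They fetch."]
print(map_reduce_summarize(chunks, fake_llm))
```

Note the cost: `len(chunks) + 1` model calls, and the final call only sees the partial summaries, not the original text.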

In [17]:
chain = load_summarize_chain(
                            llm, 
                            chain_type="map_reduce"
                            )


output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

Machine learning is a rapidly advancing field of artificial intelligence that enables computers to
learn from data and make predictions or decisions. It has various types, including supervised,
unsupervised, and reinforcement learning. Deep learning, a subfield of machine learning, has been
instrumental in recent AI breakthroughs. However, the success of machine learning relies on high-
quality datasets and ethical considerations. Deep learning has progressed rapidly, particularly in
natural language processing, with the development of Large Language Models (LLMs) that can
understand and generate human language. LLMs have impressive abilities but also present challenges
such as computational requirements, data biases, and ethical concerns. The objective is to find the
right balance between utilizing their potential and addressing these issues.


In [18]:
# for summarizing each part
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [19]:
# for combining the parts
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [20]:
chain = load_summarize_chain(llm, 
                             chain_type="map_reduce",
                             verbose=True
                             )


output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Machine learning is a rapidly evolving field within the broader domain of artificial intelligence (AI) that has garnered significant attention and acclaim in recent years. It encompasses a diverse array of techniques and algorithms designed to enable computers to learn from data and make predictions or decisions without being explicitly programmed. This transformative technology has applications across various industries and is at the heart of many cutting-edge innovations, promising to reshape the way we interact with technology and solve complex problems.

At its core, machine learning relies on the idea that computers can analyze and interpret large datasets to discover patterns, relationships, and insights that might be too complex for humans to discern through traditional programming. This paradigm s

![Alt text](assets/image333.png)



### 2. Stuffing [run all at once, the normal way]
Stuffing is the simplest method, whereby you simply stuff all the related data into the prompt as context to pass to the language model. This is implemented in LangChain as the StuffDocumentsChain.

**Pros:** Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.

**Cons:** Most LLMs have a limited context length, so for large documents (or many documents) this will not work: the resulting prompt would be larger than the context length.

The main downside of this method is that **it only works on smaller pieces of data.** Once you are working with many pieces of data, this approach is no longer feasible. The map-reduce and refine approaches are designed to help deal with that.
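To make the context-length constraint concrete, here is a minimal sketch of stuffing with a budget check. Everything in it is an assumption for illustration: `stuff_prompt` is a hypothetical helper, the 4-characters-per-token ratio is only a rough rule of thumb for English text, and the default limits are example values. A real tokenizer would give exact counts.

```python
def stuff_prompt(chunks, template, context_tokens=4096, reserved_for_output=256, chars_per_token=4):
    """Join all chunks into one prompt and fail early if it probably will not fit.

    The token estimate is a crude character-based heuristic (an assumption),
    not a real tokenizer count.
    """
    prompt = template.format(text="\n\n".join(chunks))
    estimated_tokens = len(prompt) // chars_per_token + 1
    if estimated_tokens + reserved_for_output > context_tokens:
        raise ValueError(
            f"about {estimated_tokens} prompt tokens plus {reserved_for_output} "
            f"reserved for output exceeds the {context_tokens}-token context"
        )
    return prompt


print(stuff_prompt(["first chunk", "second chunk"], "Summarize:\n{text}"))
```

When the check fails, that is the signal to switch to one of the chunk-by-chunk chain types.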



In [22]:
len(ml_text)

11767

In [23]:
chain = load_summarize_chain(llm, chain_type="stuff")

prompt_template = """Write a concise bullet point summary of the following:


{text}


CONCISE SUMMARY IN BULLET POINTS:"""

BULLET_POINT_PROMPT = PromptTemplate(
                                    template=prompt_template, 
                                    input_variables=["text"]
                                    )

In [24]:
chain = load_summarize_chain(llm, 
                             chain_type="stuff", 
                             prompt=BULLET_POINT_PROMPT)

output_summary = chain.run(docs)

wrapped_text = textwrap.fill(output_summary, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Machine learning is an evolving field within AI that allows computers to learn from data without
explicit programming.
- It has diverse applications in industries like image and speech recognition,
recommendation systems, and medical diagnostics.
- Fundamental types of machine learning include
supervised, unsupervised, and reinforcement learning.
- Deep learning, a subfield of machine
learning, uses neural networks with multiple layers to solve complex tasks.
- Deep learning has
found applications in computer vision, natural language processing, robotics, and healthcare.
-
Large language models are characterized by their immense size and ability to understand and generate
human language.
- They are built on the transformer architecture, pre-trained on large datasets, and
fine-tuned for specific tasks.
- Large language models excel in text generation, language
understanding, and have applications in chatbots, content generation, and more.
- Ethical
considerations and responsible AI pr
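The bullets above break in odd places (a lone `-` at the end of a line, for example). With `replace_whitespace=False`, `textwrap.fill` keeps the newlines already in the text but still counts line width as if they were ordinary spaces, so existing line breaks land mid-line. The `textwrap` documentation suggests splitting the text into lines and wrapping each one separately. A small sketch, where `wrap_lines` is a hypothetical helper name:

```python
import textwrap


def wrap_lines(text, width=100):
    """Wrap every existing line on its own so bullets always start at a line start."""
    wrapped = []
    for line in text.splitlines():
        if not line.strip():
            # Preserve blank lines as paragraph separators.
            wrapped.append("")
            continue
        wrapped.append(
            textwrap.fill(
                line,
                width=width,
                break_long_words=False,
                subsequent_indent="  ",  # indent continuation lines under the bullet
            )
        )
    return "\n".join(wrapped)


print(wrap_lines("- one two three four\n- five", width=14))
```

Calling `print(wrap_lines(output_summary))` in place of the `textwrap.fill(...)` call would keep each bullet on its own line.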

### Since we are using the 16K-context turbo model, this runs. With the 4K-context model, a large enough document would push the prompt past the 4K-token limit, so information would be lost (or the call would fail).

# How to use map reduce with a custom prompt

In [26]:
chain = load_summarize_chain(
                            llm, 
                            chain_type="map_reduce",
                            map_prompt=BULLET_POINT_PROMPT,    # define the prompt for individual docs 
                            combine_prompt=BULLET_POINT_PROMPT # define the prompt for combining docs
                            )

# chain.llm_chain.prompt= BULLET_POINT_PROMPT
# chain.combine_document_chain.llm_chain.prompt= BULLET_POINT_PROMPT

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Machine learning is a rapidly evolving field within AI that enables computers to learn from data
without explicit programming.
- It has applications across industries and can reshape technology and
problem-solving.
- There are three fundamental types of machine learning: supervised, unsupervised,
and reinforcement learning.
- Deep learning, a subfield of machine learning, has been a driving
force behind recent breakthroughs in AI.
- Deep learning models use neural networks with multiple
layers to learn complex patterns in high-dimensional data.
- Ethical considerations, such as bias
and fairness, are important in machine learning systems.
- Deep learning has made significant
progress thanks to advancements in hardware and availability of large datasets.
- Large Language
Models (LLMs) are breakthrough models in natural language processing (NLP) and AI.
- LLMs are pre-
trained on massive text datasets and fine-tuned for specific NLP tasks.
- LLMs have impressive
capabilities but requir

### You can also inspect and use the outputs of the intermediate steps

In [27]:
chain = load_summarize_chain(
                            llm, 
                            chain_type="map_reduce",
                            return_intermediate_steps=True,
                            map_prompt=BULLET_POINT_PROMPT,    # define the prompt for individual docs 
                            combine_prompt=BULLET_POINT_PROMPT # define the prompt for combining docs
                            )

output_summary = chain({"input_documents": docs}, return_only_outputs=True)
wrapped_text = textwrap.fill(output_summary['output_text'], 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Machine learning is a rapidly evolving field within AI that allows computers to learn from data
and make predictions without explicit programming.
- It has diverse applications across industries
and relies on analyzing large datasets to discover patterns.
- There are three types of machine
learning: supervised, unsupervised, and reinforcement learning.
- Deep learning, a subfield of
machine learning, can handle vast amounts of data and is inspired by the human brain.
- Ethical
considerations, data quality, and quantity are important factors in the success of machine learning.
- Deep learning has had a significant impact on AI, with neural networks at its core.
-
Convolutional neural networks are effective for image processing, while recurrent neural networks
are suited for time series and text data.
- Transfer learning, fine-tuning pre-trained models, and
generative models have been successful in deep learning.
- Ethical considerations, bias in data, and
algorithmic fairness must be 

## 4 intermediate steps for 4 individual documents

In [28]:
output_summary

{'intermediate_steps': ['- Machine learning is a rapidly evolving field within artificial intelligence (AI) that allows computers to learn from data and make predictions or decisions without explicit programming.\n- It has diverse applications across various industries and is at the heart of cutting-edge innovations.\n- Machine learning relies on analyzing large datasets to discover complex patterns and insights.\n- There are three fundamental types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.\n- Machine learning models can range from simple to complex, with deep learning being a subfield that can handle vast amounts of data.\n- Ensuring models generalize well to new data and avoiding overfitting is a critical challenge in machine learning.\n- The success of machine learning depends on data quality and quantity, as well as ethical considerations to prevent bias and discrimination.',
  '- Machine learning is advancing rapidly, promising to
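Because the result is a plain dictionary with an `intermediate_steps` list and an `output_text` string, the per-chunk summaries are easy to inspect. The sketch below uses a small made-up dictionary of the same shape (the contents are placeholders, not real model output), and `describe_steps` is a hypothetical helper:

```python
# Made-up data with the same keys as the chain result shown above.
sample_output = {
    "intermediate_steps": ["- point A\n- point B", "- point C"],
    "output_text": "- point A\n- point C",
}


def describe_steps(result):
    """Return one line per intermediate summary: bullet count and character length."""
    lines = []
    for i, step in enumerate(result["intermediate_steps"], start=1):
        bullets = [b for b in step.splitlines() if b.strip()]
        lines.append(f"chunk {i}: {len(bullets)} bullet(s), {len(step)} characters")
    return lines


for line in describe_steps(sample_output):
    print(line)
```

Passing the real `output_summary` in place of `sample_output` gives a quick view of how much each chunk contributed before the combine step.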

### 3. Refine
This method involves **an initial prompt on the first chunk of data, generating some output. For the remaining documents, that output is passed in, along with the next document**, asking the LLM to refine the output based on the new document.

**Pros:** Can pull in more relevant context, and may be less lossy than MapReduceDocumentsChain.

**Cons:** Requires many more calls to the LLM than StuffDocumentsChain. The calls are also NOT independent, meaning they cannot be parallelized like those in MapReduceDocumentsChain. The result can also depend on the ordering of the documents.
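The sequential nature of refine is easiest to see as a loop. This is a minimal sketch under stated assumptions: `refine_summarize` is an illustrative name, and `summarize_first` and `refine_with` are hypothetical stand-ins for the initial prompt and the refine prompt (here they only keep first sentences, no model is called).

```python
def refine_summarize(chunks, summarize_first, refine_with):
    """Summarize the first chunk, then refine that summary with each later chunk in order."""
    if not chunks:
        return ""
    summary = summarize_first(chunks[0])
    for chunk in chunks[1:]:
        # Each call needs the previous summary, so the calls cannot run in parallel.
        summary = refine_with(summary, chunk)
    return summary


def summarize_first(text):
    # Hypothetical stub for the initial prompt: keep the first sentence.
    return text.split(".")[0].strip() + "."


def refine_with(summary, text):
    # Hypothetical stub for the refine prompt: append the new chunk's first sentence.
    return summary + " " + text.split(".")[0].strip() + "."


chunks = ["A is first. More detail.", "B is second. More detail."]
print(refine_summarize(chunks, summarize_first, refine_with))
print(refine_summarize(list(reversed(chunks)), summarize_first, refine_with))
```

Reversing the chunks changes the result, which is the ordering dependence listed in the cons.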

![Alt text](image.png)

In [29]:
chain = load_summarize_chain(llm, chain_type="refine")

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

The field of machine learning continues to advance rapidly, enabling computers to learn from data
and make predictions without explicit programming. It has applications in various industries and is
at the forefront of cutting-edge innovations. Machine learning relies on large datasets to discover
complex patterns and insights. There are three fundamental types of machine learning: supervised
learning, unsupervised learning, and reinforcement learning. Models can range from simple linear
regression to complex neural networks. Ensuring models generalize well and addressing ethical
considerations are critical challenges in machine learning. Additionally, deep learning, a subfield
of machine learning, has gained popularity and is driving breakthroughs in artificial intelligence.
It uses neural networks with multiple layers to solve complex tasks and has found applications in
various domains, including autonomous vehicles, healthcare, finance, and more. Deep learning faces
challenges relate

# Now we are going to verify the facts generated by the LLM

In [30]:
article = '''Coinbase, the second-largest crypto exchange by trading volume, released its Q4 2022 earnings on Tuesday, giving shareholders and market players alike an updated look into its financials. In response to the report, the company's shares are down modestly in early after-hours trading.In the fourth quarter of 2022, Coinbase generated $605 million in total revenue, down sharply from $2.49 billion in the year-ago quarter. Coinbase's top line was not enough to cover its expenses: The company lost $557 million in the three-month period on a GAAP basis (net income) worth -$2.46 per share, and an adjusted EBITDA deficit of $124 million.Wall Street expected Coinbase to report $581.2 million in revenue and earnings per share of -$2.44 with adjusted EBITDA of -$201.8 million driven by 8.4 million monthly transaction users (MTUs), according to data provided by Yahoo Finance.Before its Q4 earnings were released, Coinbase's stock had risen 86% year-to-date. Even with that rally, the value of Coinbase when measured on a per-share basis is still down significantly from its 52-week high of $206.79.That Coinbase beat revenue expectations is notable in that it came with declines in trading volume; Coinbase historically generated the bulk of its revenues from trading fees, making Q4 2022 notable. Consumer trading volumes fell from $26 billion in the third quarter of last year to $20 billion in Q4, while institutional volumes across the same timeframe fell from $133 billion to $125 billion.The overall crypto market capitalization fell about 64%, or $1.5 trillion during 2022, which resulted in Coinbase's total trading volumes and transaction revenues to fall 50% and 66% year-over-year, respectively, the company reported.As you would expect with declines in trading volume, trading revenue at Coinbase fell in Q4 compared to the third quarter of last year, dipping from $365.9 million to $322.1 million. 
(TechCrunch is comparing Coinbase's Q4 2022 results to Q3 2022 instead of Q4 2021, as the latter comparison would be less useful given how much the crypto market has changed in the last year; we're all aware that overall crypto activity has fallen from the final months of 2021.)There were bits of good news in the Coinbase report. While Coinbase's trading revenues were less than exuberant, the company's other revenues posted gains. What Coinbase calls its "subscription and services revenue" rose from $210.5 million in Q3 2022 to $282.8 million in Q4 of the same year, a gain of just over 34% in a single quarter.And even as the crypto industry faced a number of catastrophic events, including the Terra/LUNA and FTX collapses to name a few, there was still growth in other areas. The monthly active developers in crypto have more than doubled since 2020 to over 20,000, while major brands like Starbucks, Nike and Adidas have dived into the space alongside social media platforms like Instagram and Reddit.With big players getting into crypto, industry players are hoping this move results in greater adoption both for product use cases and trading volumes. Although there was a lot of movement from traditional retail markets and Web 2.0 businesses, trading volume for both consumer and institutional users fell quarter-over-quarter for Coinbase.Looking forward, it'll be interesting to see if these pieces pick back up and trading interest reemerges in 2023, or if platforms like Coinbase will have to keep looking elsewhere for revenue (like its subscription service) if users continue to shy away from the market.
'''

wrapped_text = textwrap.fill(article, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

Coinbase, the second-largest crypto exchange by trading volume, released its Q4 2022 earnings on
Tuesday, giving shareholders and market players alike an updated look into its financials. In
response to the report, the company's shares are down modestly in early after-hours trading.In the
fourth quarter of 2022, Coinbase generated $605 million in total revenue, down sharply from $2.49
billion in the year-ago quarter. Coinbase's top line was not enough to cover its expenses: The
company lost $557 million in the three-month period on a GAAP basis (net income) worth -$2.46 per
share, and an adjusted EBITDA deficit of $124 million.Wall Street expected Coinbase to report $581.2
million in revenue and earnings per share of -$2.44 with adjusted EBITDA of -$201.8 million driven
by 8.4 million monthly transaction users (MTUs), according to data provided by Yahoo Finance.Before
its Q4 earnings were released, Coinbase's stock had risen 86% year-to-date. Even with that rally,
the value of Coinbase

In [31]:
len(article)

3533

In [32]:
fact_extraction_prompt = PromptTemplate(
                                        input_variables=["text_input"],
                                        # adjacent string literals avoid leaking the code indentation into the prompt
                                        template=(
                                            "Extract the key facts out of this text. Don't include opinions. "
                                            "Give each fact a number and keep each one to a short sentence:\n\n{text_input}"
                                        )
                                        )

In [33]:
fact_extraction_chain = LLMChain(llm=llm, prompt=fact_extraction_prompt)

facts = fact_extraction_chain.run(article)

wrapped_text = textwrap.fill(facts, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

1. Coinbase released its Q4 2022 earnings, revealing $605 million in total revenue.
2. The company's
revenue declined sharply from $2.49 billion in the year-ago quarter.
3. Coinbase experienced a net
loss of $557 million in the fourth quarter.
4. Wall Street expected Coinbase to report $581.2
million in revenue.
5. Coinbase's stock had risen 86% year-to-date before the earnings release.
6.
Coinbase's trading volumes and transaction revenues fell 50% and 66% year-over-year.
7. Trading
revenue at Coinbase fell from $365.9 million to $322.1 million in Q4.
8. Coinbase's "subscription
and services revenue" increased by just over 34% in Q4.
9. The number of monthly active developers
in crypto has more than doubled since 2020.
10. Major brands like Starbucks, Nike, and Adidas have
entered the crypto space.
11. Trading volume for both consumer and institutional users fell quarter-
over-quarter for Coinbase.
12. The industry hopes for greater adoption due to big players entering
crypto.
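Since the prompt asks for numbered facts, the raw `facts` string can be turned into a structure that is easier to check one item at a time. A small sketch, assuming the model puts each fact on its own line as `N. text` (the model is not guaranteed to follow that format); `parse_numbered_facts` is a hypothetical helper, and the sample lines are copied from the output above:

```python
import re


def parse_numbered_facts(text):
    """Map fact number -> fact text for lines of the form 'N. some fact'."""
    facts = {}
    for match in re.finditer(r"^\s*(\d+)\.\s+(.+)$", text, flags=re.MULTILINE):
        facts[int(match.group(1))] = match.group(2).strip()
    return facts


raw = (
    "1. Coinbase released its Q4 2022 earnings, revealing $605 million in total revenue.\n"
    "4. Wall Street expected Coinbase to report $581.2 million in revenue."
)
for number, fact in parse_numbered_facts(raw).items():
    print(number, "->", fact)
```

Run this on the unwrapped `facts` string, not on `wrapped_text`, because wrapping moves the numbers away from the start of the line.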


In [34]:
from langchain.chains import LLMSummarizationCheckerChain

llm_fact = ChatOpenAI(
                openai_api_key=os.environ["OPENAI_API_KEY"],
                engine = os.environ["ENGINE"],
                model='gpt-3.5-turbo',
                temperature=0.0 # make sure temperature is 0.0
                )

checker_chain = LLMSummarizationCheckerChain(
                                            llm=llm_fact, 
                                            verbose=True, 
                                            max_checks=2
                                            )

final_summary = checker_chain.run(article)
final_summary

                    engine was transferred to model_kwargs.
                    Please confirm that engine is what you intended.




> Entering new LLMSummarizationCheckerChain chain...


> Entering new SequentialChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given some text, extract a list of facts from the text.

Format your output as a bulleted list.

Text:
"""
Coinbase, the second-largest crypto exchange by trading volume, released its Q4 2022 earnings on Tuesday, giving shareholders and market players alike an updated look into its financials. In response to the report, the company's shares are down modestly in early after-hours trading.In the fourth quarter of 2022, Coinbase generated $605 million in total revenue, down sharply from $2.49 billion in the year-ago quarter. Coinbase's top line was not enough to cover its expenses: The company lost $557 million in the three-month period on a GAAP basis (net income) worth -$2.46 per share, and an adjusted EBITDA deficit of $124 million.Wall Street expected Coinbase to report $581.2 million in revenu

'Coinbase, the second-largest crypto exchange by trading volume, released its Q4 2022 earnings on Tuesday, giving shareholders and market players alike an updated look into its financials. In the fourth quarter of 2022, Coinbase generated $605 million in total revenue, down sharply from $2.49 billion in the year-ago quarter. Coinbase\'s top line was not enough to cover its expenses: The company lost $557 million in the three-month period on a GAAP basis (net income) worth -$2.46 per share, and an adjusted EBITDA deficit of $124 million. Wall Street expected Coinbase to report $581.2 million in revenue and earnings per share of -$2.44 with adjusted EBITDA of -$201.8 million driven by 8.4 million monthly transaction users (MTUs), according to data provided by Yahoo Finance. Before its Q4 earnings were released, Coinbase\'s stock had risen 86% year-to-date. Even with that rally, the value of Coinbase when measured on a per-share basis is still down significantly from its 52-week high of $
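The verbose trace shows the checker starting by extracting facts, and `max_checks=2` bounds how many passes it makes. The general extract, check, rewrite loop can be sketched as below. This is an assumption-laden illustration of the idea and not LangChain's code: `check_summary` and the three stub functions are hypothetical, and a real checker would use model calls where the stubs use string operations.

```python
KNOWN_FALSE = {"Sharks are mammals."}  # made-up "ground truth" for the stub checker


def extract_facts(summary):
    # Hypothetical stub: treat every sentence as one fact.
    return [s.strip() + "." for s in summary.split(".") if s.strip()]


def check_fact(fact):
    # Hypothetical stub: a fact is true unless it is in the known-false set.
    return fact not in KNOWN_FALSE


def rewrite(summary, verdicts):
    # Hypothetical stub: keep only the facts that passed the check.
    return " ".join(fact for fact, ok in verdicts.items() if ok)


def check_summary(text, max_checks=2):
    """Repeat extract -> check -> rewrite until every fact passes or max_checks is reached."""
    summary = text
    for _ in range(max_checks):
        verdicts = {fact: check_fact(fact) for fact in extract_facts(summary)}
        if all(verdicts.values()):
            break
        summary = rewrite(summary, verdicts)
    return summary


print(check_summary("The sky is blue. Sharks are mammals."))
```

Using a temperature of 0.0 for the checking model, as the cell above does, keeps the verdicts as repeatable as possible between passes.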