<a href="https://colab.research.google.com/github/prabhanjan-jadhav/open-source-research-paper-summarizer/blob/main/Research_paper_summarizer_Langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installations and Imports

In [None]:
!pip install -qqq langchain openai huggingface_hub arxiv pypdf faiss-cpu tiktoken cohere

In [None]:
import getpass
import os
from IPython.display import Markdown
from langchain.llms import OpenAI
from langchain import HuggingFaceHub, LLMChain
from langchain.prompts import PromptTemplate
from langchain.document_loaders import PyPDFLoader
# from langchain.embeddings.openai import OpenAIEmbeddings    # credits required
# from langchain.embeddings import AlephAlphaAsymmetricSemanticEmbedding   # credits required
from langchain.embeddings import CohereEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
import arxiv

In [None]:
# openai_api_key = getpass.getpass("OpenAI API Key:")

In [None]:
# os.environ['OPENAI_API_KEY'] = openai_api_key

In [None]:
# llm = OpenAI(model_name="text-davinci-003", temperature=0)

In [None]:
# llm("hello this is a test")

I don't have any OpenAI credits, so let's find some open-source alternatives and free APIs.

# Cohere

In [None]:
cohere_api_key = getpass.getpass("Cohere API Key:")

Cohere API Key:··········


# HuggingFaceHub

In [None]:
huggingface_api_key = getpass.getpass("HuggingFace API Key:")

HuggingFace API Key:··········


In [None]:
os.environ['HUGGINGFACEHUB_API_TOKEN'] = huggingface_api_key

In [None]:
# Open-source chat model served through the Hugging Face Hub inference API
hub_llm = HuggingFaceHub(
    repo_id="OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
    model_kwargs={
        "temperature": 0.75,
        "max_new_tokens": 200,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
        "top_k": 50,
    },
)

# The question is wrapped in the special tokens this OpenAssistant model expects
prompt = PromptTemplate(
    input_variables=["question"],
    template="<|prompter|>{question}<|endoftext|><|assistant|>",
)

hub_chain = LLMChain(prompt=prompt, llm=hub_llm, verbose=True)
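
To see what the model actually receives, the template can be rendered by hand. This is a minimal sketch that uses plain `str.format` instead of LangChain so it runs without an API key; `render_prompt` is a hypothetical helper name, not something the notebook defines.

```python
# Same template string that is passed to PromptTemplate in the notebook.
TEMPLATE = "<|prompter|>{question}<|endoftext|><|assistant|>"


def render_prompt(question: str) -> str:
    """Hypothetical helper: fill the single {question} slot of the template."""
    return TEMPLATE.format(question=question)


rendered = render_prompt("hello this is a test")
print(rendered)
```

Everything passed to `hub_chain.run(...)` ends up between `<|prompter|>` and `<|endoftext|>`, including long inputs such as a traceback or a paper abstract.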

In [None]:
text = """# Transform the image to the form expected by the model
     56 input_image = self.transform.apply_image(image)
---> 57 input_image_torch = torch.as_tensor(input_image, device=self.device)
     58 input_image_torch = input_image_torch.permute(2, 0, 1).contiguous()[None, :, :, :]
     60 self.set_torch_image(input_image_torch, image.shape[:2])

RuntimeError: Could not infer dtype of numpy.uint8"""

Markdown(hub_chain.run(text))

# Let's bring in some context.

In [None]:
paper = next(arxiv.Search(id_list=['2205.11916']).results())

Markdown(paper.summary)

In [None]:
Markdown(hub_chain.run(paper.summary))

In [None]:
hub_chain.run("give a summary of that")

In [None]:
hub_chain.run("what is the name of the paper?")

Note that this answer is wrong. Each call to the API is independent, so the model never sees the previous prompt.
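
One way around the missing memory is to resend the earlier turns with every request. The following is a minimal sketch, assuming the OpenAssistant convention of closing each turn with `<|endoftext|>`; `build_prompt_with_history` is a hypothetical helper, and the sample turn is a placeholder, not real model output.

```python
from typing import List, Tuple


def build_prompt_with_history(history: List[Tuple[str, str]], question: str) -> str:
    """Concatenate earlier (question, answer) turns ahead of the new question."""
    parts = []
    for past_question, past_answer in history:
        parts.append(f"<|prompter|>{past_question}<|endoftext|>")
        parts.append(f"<|assistant|>{past_answer}<|endoftext|>")
    # The final turn is left open so the model writes the next assistant reply.
    parts.append(f"<|prompter|>{question}<|endoftext|><|assistant|>")
    return "".join(parts)


# Placeholder turn for illustration only.
history = [("Here is an abstract: ...", "Thanks, I have read it.")]
prompt_text = build_prompt_with_history(history, "what is the name of the paper?")
print(prompt_text)
```

A string built this way would go straight to `hub_llm(...)` rather than through `hub_chain`, because the chain's template would wrap it in a second pair of tokens.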

# Make the paper searchable

In [None]:
paper_path = paper.download_pdf()

In [None]:
loader = PyPDFLoader(paper_path)
pages = loader.load_and_split()

In [None]:
len(pages)

49

In [None]:
content = "\n\n".join([page.page_content for page in pages[:2]])

In [None]:
response = hub_chain.run(f"""Pls go through this paper :
{content}.


Now, based on the content tell me  what is zero shot chain of thought prompting?
""")

In [None]:
Markdown(response)

It's an AI strategy that relies on human teachers to help kids learn how to write their stories, poems, essays and other creative writing skills. The AI prompts them to make sure they focus on creating stories that best match what your preconceived instructions, and are true/imagined in the story you ask, and it is about.

# Find relevant content using embedding search

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap=0)
docs = text_splitter.split_documents(pages)
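
Here `chunk_size=1000` caps each chunk at about 1000 characters, and `chunk_overlap=0` means neighbouring chunks share no text. The sketch below is a simplified fixed-window chunker for illustration only: the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, which this sketch does not attempt. `fixed_window_chunks` is a hypothetical name.

```python
from typing import List


def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Slice text into windows of chunk_size characters, stepping by size minus overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"  # toy input, 10 characters
no_overlap = fixed_window_chunks(sample, chunk_size=4)
with_overlap = fixed_window_chunks(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With zero overlap a sentence that straddles a boundary is cut in two, so a small overlap can help retrieval at the cost of storing some text twice.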

In [None]:
embeddings = CohereEmbeddings(cohere_api_key=cohere_api_key)
db = FAISS.from_documents(docs, embeddings)

In [None]:
docs = db.similarity_search("What is zero-shot chain-of-thought prompting?")

In [None]:
len(docs)

4
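
The four results reflect the default `k=4` of `similarity_search`. Conceptually, the store embeds the query and returns the chunks whose vectors lie closest to it. Below is a minimal pure-Python sketch of that idea, assuming Euclidean distance as the metric; the 3-dimensional vectors are hand-made stand-ins, not real Cohere embeddings, and `top_k` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 4) -> List[str]:
    """Return the k keys whose vectors are nearest to the query (Euclidean distance)."""
    ranked = sorted(vectors, key=lambda name: math.dist(query, vectors[name]))
    return ranked[:k]


# Toy vectors for illustration only.
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
    "chunk_e": [0.5, 0.5, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(nearest)  # ['chunk_a', 'chunk_b']
```

A real index avoids sorting every vector on each query, but the ranking it returns follows the same principle.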

In [None]:
relevant_content = "\n\n".join([doc.page_content for doc in docs[:1]])
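
Only the top hit (`docs[:1]`) goes into the prompt here. If more context is wanted, several hits could be joined while keeping the prompt short. The following is a sketch under the assumption that a simple character budget is good enough; `join_within_budget` and the toy strings are hypothetical.

```python
from typing import List


def join_within_budget(chunks: List[str], max_chars: int, sep: str = "\n\n") -> str:
    """Join chunks in rank order, stopping before the total exceeds max_chars."""
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        # Count the separator only when a chunk has already been selected.
        extra = len(chunk) + (len(sep) if selected else 0)
        if used + extra > max_chars:
            break
        selected.append(chunk)
        used += extra
    return sep.join(selected)


ranked_chunks = ["first hit", "second hit", "third hit"]  # stand-ins for doc.page_content
context = join_within_budget(ranked_chunks, max_chars=22)
print(repr(context))
```

In the notebook the same helper would take `[doc.page_content for doc in docs]` as its input.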

In [None]:
Markdown(docs[0].page_content)

language models like PaLM [Chowdhery et al., 2022]. The top row of Figure 1 shows standard
few-shot prompting against (few-shot) CoT prompting. Notably, few-shot learning was taken as a
given for tackling such difficult tasks, and the zero-shot baseline performances were not even reported
in the original work [Wei et al., 2022]. To differentiate it from our method, we call Wei et al. [2022]
as Few-shot-CoT in this work.
3 Zero-shot Chain of Thought
We propose Zero-shot-CoT, a zero-shot template-based prompting for chain of thought reasoning.
It differs from the original chain of thought prompting [Wei et al., 2022] as it does not require
step-by-step few-shot examples, and it differs from most of the prior template prompting [Liu et al.,
2021b] as it is inherently task-agnostic and elicits multi-hop reasoning across a wide range of tasks
with a single template. The core idea of our method is simple, as described in Figure 1: add Let’s

In [None]:
hub_chain.run(f"""Acknowledge the below excerpt:
{relevant_content}.

Based on the above excerpt, what is zero-shot chain-of-thought?""")



> Entering new LLMChain chain...
Prompt after formatting:
<|prompter|>Acknowledge the below excerpt:
language models like PaLM [Chowdhery et al., 2022]. The top row of Figure 1 shows standard
few-shot prompting against (few-shot) CoT prompting. Notably, few-shot learning was taken as a
given for tackling such difficult tasks, and the zero-shot baseline performances were not even reported
in the original work [Wei et al., 2022]. To differentiate it from our method, we call Wei et al. [2022]
as Few-shot-CoT in this work.
3 Zero-shot Chain of Thought
We propose Zero-shot-CoT, a zero-shot template-based prompting for chain of thought reasoning.
It differs from the original chain of thought prompting [Wei et al., 2022] as it does not require
step-by-step few-shot examples, and it differs from most of the prior template prompting [Liu et al.,
2021b] as it is inherently task-agnostic and elicits multi-hop reasoning across a wide range of tasks
with a single template. The 

'Zero-Shot COOT stands for "Chain Of Thought with Out-Of-The-Domain Examples". In other words, while training the model to solve specific problems, they will learn how to understand text in general terms so that they can apply to any problem given, without needing to go through many steps. It\'s main benefit is that once trained, you do not have to train it again for every new type of problem you want to try and answer. \nI hope I answered your question! Is there anything else I should know or explain about COOT?'