#### Databricks' Free Dolly with LangChain

- To use the pipeline with LangChain, you must set `return_full_text=True`. LangChain expects the output to include the prompt, but by default the pipeline returns only the newly generated text.
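Why the flag matters: as far as we can tell from LangChain's `HuggingFacePipeline` wrapper (around version 0.0.139), for a `text-generation` pipeline it takes `generated_text` and cuts off the first `len(prompt)` characters, assuming the prompt was echoed back. The sketch below reproduces that slicing in plain Python. `langchain_style_postprocess` is a hypothetical name, and the prompt and completion strings are made up.

```python
def langchain_style_postprocess(prompt: str, generated_text: str) -> str:
    # Hypothetical stand-in for the wrapper's post-processing: it assumes the
    # pipeline echoed the prompt and drops that many leading characters.
    return generated_text[len(prompt):]


prompt = "Explain LangChain in one sentence."
completion = " LangChain is a framework for building LLM applications."

# return_full_text=True: the pipeline output starts with the prompt.
full_output = prompt + completion
# Default (return_full_text=False): only the newly generated text.
new_text_only = completion

kept = langchain_style_postprocess(prompt, full_output)
mangled = langchain_style_postprocess(prompt, new_text_only)

print(repr(kept))     # the completion, intact
print(repr(mangled))  # the completion with its first len(prompt) characters cut off
```

Dolly's pipeline appears to join the echoed prompt and the answer with a newline, which would explain the `.lstrip()` calls on the results later in the notebook.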

##### Main Use Cases of LangChain

- Summarization - Express the most important facts about a body of text or chat interaction

- Question Answering over Documents - Use information held within documents to answer questions or queries

- Extraction - Pull structured data from a body of text or a user query

- Evaluation - Understand the quality of output from your application

- Querying Tabular Data - Pull data from databases or other tabular sources

- Code Understanding - Reason about and digest code

- Interacting with APIs - Query APIs and interact with the outside world

- Chatbots - Hold a back-and-forth conversation with a user in a chat interface, combined with memory

- Agents - Use LLMs to make decisions about what to do next. Enable these decisions with tools.



In [2]:
!pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"


In [3]:
import torch
from transformers import pipeline

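# bfloat16 halves memory vs. float32; device_map="auto" spreads the model over available devices.
# trust_remote_code=True loads Dolly's custom instruction pipeline; return_full_text=True is for LangChain.
generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto", return_full_text=True)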
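The `!pip` cells that follow run after the tokenizer has been used, so the Hugging Face `tokenizers` library warns that the process was forked and disables its parallelism. The warning itself names the fix: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch; it needs to run before the `pipeline(...)` cell above (for example at the top of the notebook), and choosing `"false"` is our assumption, since the warning accepts either value.

```python
import os

# Read by the Hugging Face `tokenizers` library; setting it explicitly before the
# tokenizer is first used suppresses the fork warning printed by later `!pip` cells.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```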


In [4]:
!pip install "langchain>=0.0.139"

In [18]:
!pip install "unstructured[pdf]"


In [5]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
import unstructured  # parser behind S3FileLoader; imported here to fail early if it is missing
from langchain.document_loaders import S3FileLoader

In [6]:
# template for an instruction with no input
prompt = PromptTemplate(
    input_variables=["instruction"],
    template="{instruction}")

# template for an instruction with input
prompt_with_context = PromptTemplate(
    input_variables=["instruction", "context"],
    template="{instruction}\n\nInput:\n{context}")

hf_pipeline = HuggingFacePipeline(pipeline=generate_text)

llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
llm_context_chain = LLMChain(llm=hf_pipeline, prompt=prompt_with_context)
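Both templates use LangChain's default f-string format, so they render like Python's `str.format`. The sketch below imitates how `LLMChain` builds its prompt: it keeps only the keys listed in the template's `input_variables` and fills those in. `render` is a hypothetical helper, not a LangChain function, and the question and document strings are placeholders. It shows that passing `context=` to `llm_chain` has no effect, because that template has no `{context}` slot; only `llm_context_chain` puts the document into the prompt.

```python
INSTRUCTION_ONLY = "{instruction}"                                # same string as `prompt`
INSTRUCTION_WITH_CONTEXT = "{instruction}\n\nInput:\n{context}"  # same string as `prompt_with_context`


def render(template: str, input_variables: list[str], **inputs: str) -> str:
    # Hypothetical helper: keep only the declared variables (as LLMChain does)
    # and fill them in with str.format, which the f-string template format mirrors.
    selected = {name: inputs[name] for name in input_variables}
    return template.format(**selected)


question = "What is Christopher Morgan's experience?"
document_text = "Web Developer - Luna Web Design, New York"

without_context = render(INSTRUCTION_ONLY, ["instruction"],
                         instruction=question, context=document_text)
with_context = render(INSTRUCTION_WITH_CONTEXT, ["instruction", "context"],
                      instruction=question, context=document_text)

print(without_context)  # the document text never reaches the model
print(with_context)
```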

In [7]:
# Load a PDF file from an S3 bucket (bucket name, object key) into LangChain
loader1 = S3FileLoader("sagemaker-studio-njiztjducek", "genai/Private-Data/CV1.pdf")
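`S3FileLoader` takes the bucket name and the object key as two separate arguments. If you have an `s3://` URI instead (for example, one copied from the S3 console), a small helper can split it. `split_s3_uri` is hypothetical and uses only the standard library; it is not part of LangChain.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    # Hypothetical helper: "s3://bucket/path/to/key" -> ("bucket", "path/to/key").
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://sagemaker-studio-njiztjducek/genai/Private-Data/CV1.pdf")
print(bucket, key)
# The pair can then be passed on as S3FileLoader(bucket, key).
```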

In [9]:
context=loader1.load_and_split()
text=context[0].page_content
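# Note: llm_chain's prompt only contains {instruction}, so the `context` argument is dropped
# before the prompt is built; llm_context_chain is the chain whose template uses it.
print(llm_chain.predict(instruction="What is  experience of Christopher Morgan",context=text).lstrip())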

He turned into ice cream after spending too much time in the sun


In [None]:
print(llm_chain.predict(instruction="What is  Education of Christopher Morgan",context=text).lstrip())

In [39]:
text

'Senior Web Developer specializing in front end development. Experienced with all stages of the development cycle for dynamic web projects. Well- versed in numerous programming languages including HTML5, PHP OOP, JavaScript, CSS, MySQL. Strong background in project management and customer relations.\n\nCHRISTOPHER MORGAN\n\nPhone: +49 800 600 600\n\nExperience\n\n09/2015 to 05/2019\n\nE-Mail: christoper.morgan@gmail.co m\n\nWeb Developer - Luna Web Design, New York \uf0b7 Cooperate with designers to create clean\n\ninterfaces and simple, intuitive interactions and experiences.\n\nLinkedin: linkedin.com/ christopher.morgan\n\nDevelop project concepts and maintain optimal\n\nworkflow.\n\nWork with senior developer to manage large, complex design projects for corporate clients.\n\nComplete detailed programming and\n\nSkill Highlights\n\nSkill Highlights \uf0b7 Project management \uf0b7 Strong decision maker \uf0b7 Complex problem solver \uf0b7 Creative design \uf0b7 \uf0b7 Service-focused
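The extracted CV text is noisy: list bullets come through as `\uf0b7`, a private-use code point that PDF symbol fonts typically use for a bullet, and hard-wrapped lines are separated by blank lines. A hedged sketch of light cleanup before the text goes into a prompt follows. `clean_pdf_text` is hypothetical, and collapsing all whitespace is a trade-off that also erases section breaks. Splits inside tokens, such as the broken e-mail address above, cannot be repaired this way.

```python
import re


def clean_pdf_text(raw: str) -> str:
    # Map the private-use bullet glyph to a plain hyphen.
    text = raw.replace("\uf0b7", "-")
    # Collapse runs of whitespace (including blank lines from hard-wrapped PDF text)
    # into single spaces.
    return re.sub(r"\s+", " ", text).strip()


sample = "Skill Highlights \uf0b7 Project management \uf0b7 Strong decision maker\n\nworkflow."
cleaned = clean_pdf_text(sample)
print(cleaned)  # Skill Highlights - Project management - Strong decision maker workflow.
```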

In [10]:
loader = S3FileLoader("sagemaker-studio-njiztjducek", "genai/Private-Data/CV2.pdf")

In [12]:
context1=loader.load()
text=context1[0].page_content
print(llm_chain.predict(instruction="What is Susan Williams's Experience?",context=text).lstrip())

After suffering from the same buggy software in the late 90s, Early this decade, Susan Williams was demoted as CTO of Databricks. The hackers made her comeback with a kickstarter and today she is the CEO of Databricks.


In [28]:
context1

[Document(page_content='Susan Williams\n\n+1 (970) 333-3833\n\nsusan.williams@coolfreecv.com\n\nStore Manager equipped with extensive experience in automotive sales management. Employs excellent leadership skills and multi-tasking strengths. Demonstrated ability to improve store operations, increase top line sales, and reduce costs. Demonstrated ability to improve store operations, increase top line sales, and reduce costs.\n\nExperience\n\nHighlights\n\n09/2015 to 05/2019\n\nStore Manager LUXURY CAR CENTER, New York\n\nMotivate and coach employees to\n\nmeet service, sales, and repair goals.\n\nCreate and modify employee\n\nschedules with service levels in mind.\n\nResults-oriented \uf0b7 Revenue generation \uf0b7 Business development \uf0b7 Effective marketing \uf0b7 Organisational capacity \uf0b7 Operability and commitment\n\nRecruit and hire top mechanics,\n\nservice advisors, and sales people.\n\nRecruit and hire top mechanics,\n\nAbility to motivate staff and maintain good relati

In [20]:
context
print(llm_chain.predict(instruction="What is CHRISTOPHER MORGAN's Education",context=context).lstrip())

[Document(page_content='Senior Web Developer specializing in front end development. Experienced with all stages of the development cycle for dynamic web projects. Well- versed in numerous programming languages including HTML5, PHP OOP, JavaScript, CSS, MySQL. Strong background in project management and customer relations.\n\nCHRISTOPHER MORGAN\n\nPhone: +49 800 600 600\n\nExperience\n\n09/2015 to 05/2019\n\nE-Mail: christoper.morgan@gmail.co m\n\nWeb Developer - Luna Web Design, New York \uf0b7 Cooperate with designers to create clean\n\ninterfaces and simple, intuitive interactions and experiences.\n\nLinkedin: linkedin.com/ christopher.morgan\n\nDevelop project concepts and maintain optimal\n\nworkflow.\n\nWork with senior developer to manage large, complex design projects for corporate clients.\n\nComplete detailed programming and\n\nSkill Highlights\n\nSkill Highlights \uf0b7 Project management \uf0b7 Strong decision maker \uf0b7 Complex problem solver \uf0b7 Creative design \uf0b7
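Two details of this cell affect what the model sees. `load_and_split()` returns a list of chunked `Document` objects, so `context[0].page_content` (used earlier) holds only the first chunk. Passing the list itself as `context` to a template with a `{context}` slot would insert the list's string form, metadata and all, rather than plain text. Joining each chunk's `page_content` gives a single string. The sketch uses a minimal dataclass standing in for LangChain's `Document` (only `page_content` and `metadata`) so it runs without LangChain; `documents_to_context` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Minimal stand-in for langchain's Document, for illustration only.
    page_content: str
    metadata: dict = field(default_factory=dict)


def documents_to_context(docs: list[Document], separator: str = "\n\n") -> str:
    # Hypothetical helper: concatenate the text of every chunk, in order.
    return separator.join(doc.page_content for doc in docs)


chunks = [Document("first chunk of the CV"), Document("second chunk of the CV")]
context_text = documents_to_context(chunks)
print(context_text)
```

In the notebook, the same join over the real `Document` objects returned by the loader (which also expose `page_content`) would produce a string suitable for `llm_context_chain.predict(instruction=..., context=...)`.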

In [14]:
print(llm_chain.predict(instruction="Give the summary of R Morgan's carrier",context=context).lstrip())

The R in R Cartier stands for Richard. R Cartier is a family business that has been creating heart gift tags, logbooks and confirmation plaques for the last 100 years. R Morgan joined the company as a Summer Analyst in 2018 and quickly earned a permanent position. R-0 brings you a wide variety of stories about the until then mythical spring rush, coding woes with Go, the diverse Swayze family, cross-cultural relationship problems and many more. AccentuateTextField, the elite solution for rich people to be able to enter accents and special characters in their browsers, was a challenge to develop and launched with a hilarious Twitter feedback campaign. The feedback was deleted immediately after being posted and nobody knew what happened (it was for the good of the internet, I suppose).


In [16]:
print(llm_chain.predict(instruction="What is R Morgan's Education",context=context).lstrip())

R Morgan was born in 800AD and went to school which was known as Hogwarts when it was run by the Wizard's themselves. R Morgan's Hogwarts is known to produce some of the best knights of the round table, such as Sir Seymore de Seebobs.
