In [4]:

import os

import tiktoken
import unstructured
from langchain import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import PyPDFLoader



### Load model

In [None]:

OPENAI_API_KEY = '...'
llm = OpenAI(openai_api_key=OPENAI_API_KEY)

### Load Documents

In [115]:
from langchain.document_loaders import PyPDFLoader
from langchain.docstore.document import Document

# Load the PDF
job_loader = PyPDFLoader("../data/example.pdf")
pages = job_loader.load_and_split()

# Concatenate the text from all pages
document_text = "".join([page.page_content for page in pages])

# Wrap the combined text in a single-element list of Document objects
job_doc = [Document(page_content=document_text)]

In [116]:
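def doc_summary(docs):
    """Print the document count, an approximate word count, and a short preview."""
    print(f'You have {len(docs)} document(s)')

    num_words = sum(len(doc.page_content.split(' ')) for doc in docs)

    print(f'You have roughly {num_words} words in your docs')
    print()
    # Preview everything up to the first sentence break
    print(f'Preview: \n{docs[0].page_content.split(". ")[0]}')


doc_summary(job_doc)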

You have 1 document(s)
You have roughly 10285 words in your docs

Preview: 
WORKING PAPER
GPTs are GPTs: An Early Look at the Labor Market Impact Potential
of Large Language Models
Tyna Eloundou1, Sam Manning1,2, Pamela Mishkin 1, and Daniel Rock3
1OpenAI
2OpenResearch
3University of Pennsylvania
August 22, 2023
Abstract
We investigate the potential implications of large language models (LLMs), such as Generative Pre-
trained Transformers (GPTs), on the U.S


### Summarize: Stuff

In [101]:
chain = load_summarize_chain(llm, chain_type="stuff", verbose=True)

In [102]:
chain.run(job_doc)



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"WORKING PAPER
GPTs are GPTs: An Early Look at the Labor Market Impact Potential
of Large Language Models
Tyna Eloundou1, Sam Manning1,2, Pamela Mishkin 1, and Daniel Rock3
1OpenAI
2OpenResearch
3University of Pennsylvania
August 22, 2023
Abstract
We investigate the potential implications of large language models (LLMs), such as Generative Pre-
trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from
LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based
on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classiﬁcations.
Our ﬁndings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks
aﬀected by the introduction of LLMs, while approximately 19% of w

InvalidRequestError: This model's maximum context length is 8192 tokens. However, your messages resulted in 18821 tokens. Please reduce the length of the messages.
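
The error above arises because the `stuff` chain places every document into a single prompt. A minimal sketch of a pre-flight check follows. The names `fits_in_context` and `rough_token_count` and the `reserve` budget are hypothetical, not part of LangChain, and the four-characters-per-token heuristic is only a crude assumption.

```python
from typing import Callable, Iterable


def rough_token_count(text: str) -> int:
    """Crude estimate, assuming about 4 characters per token."""
    return len(text) // 4


def fits_in_context(
    texts: Iterable[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
    reserve: int = 256,
) -> bool:
    """Return True if all texts plus a reserve for the prompt and reply fit."""
    total = sum(count_tokens(text) for text in texts)
    return total + reserve <= max_tokens


# 8192 is the limit quoted in the error message above.
short_docs = ["word " * 100]     # 500 characters, about 125 estimated tokens
long_docs = ["word " * 40_000]   # 200,000 characters, about 50,000 estimated tokens
print(fits_in_context(short_docs, max_tokens=8192))
print(fits_in_context(long_docs, max_tokens=8192))
```

For an exact count in the notebook, `count_tokens` could be swapped for something like `lambda t: len(tiktoken.get_encoding("cl100k_base").encode(t))`. The choice of encoding is an assumption and should match the model actually in use.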

### Summarize: Map Reduce

In [152]:
from langchain.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=100)
job_doc = text_splitter.split_documents(job_doc)
job_doc

[Document(page_content='WORKING PAPER\nGPTs are GPTs: An Early Look at the Labor Market Impact Potential\nof Large Language Models\nTyna Eloundou1, Sam Manning1,2, Pamela Mishkin\x001, and Daniel Rock3\n1OpenAI\n2OpenResearch\n3University of Pennsylvania\nAugust 22, 2023\nAbstract\nWe investigate the potential implications of large language models (LLMs), such as Generative Pre-\ntrained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from\nLLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based\non their alignment with LLM capabilities, integrating both human expertise and GPT-4 classiﬁcations.\nOur ﬁndings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks\naﬀected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their\ntasks impacted. We do not make predictions about the development or adoption timeline of such LLMs.\n
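
`TokenTextSplitter(chunk_size=1024, chunk_overlap=100)` cuts the token sequence into windows of at most 1024 tokens, with consecutive windows sharing 100 tokens. The sketch below illustrates that sliding-window idea on a plain list. `split_with_overlap` is a hypothetical helper and a simplification, not LangChain's actual implementation.

```python
from typing import List, Sequence


def split_with_overlap(
    tokens: Sequence[int], chunk_size: int, chunk_overlap: int
) -> List[List[int]]:
    """Slide a window of chunk_size over tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
        start += step
    return chunks


# Toy example: 10 "tokens", windows of 4 with 1 token shared between neighbours
toy_chunks = split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1)
for chunk in toy_chunks:
    print(chunk)
```

With the notebook's settings, each new chunk would start 924 tokens after the previous one, so a sentence cut at a boundary still appears intact in at least one chunk most of the time.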

In [104]:
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(job_doc[:3])



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"WORKING PAPER
GPTs are GPTs: An Early Look at the Labor Market Impact Potential
of Large Language Models
Tyna Eloundou1, Sam Manning1,2, Pamela Mishkin 1, and Daniel Rock3
1OpenAI
2OpenResearch
3University of Pennsylvania
August 22, 2023
Abstract
We investigate the potential implications of large language models (LLMs), such as Generative Pre-
trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from
LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based
on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classiﬁcations.
Our ﬁndings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks
aﬀected by the introduction of LLMs, while approximately 19% 

'A study on the potential impact of large language models (LLMs) such as Generative Pre-trained Transformers (GPTs) on the US labor market found that around 80% of the US workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. The study proposes a rubric to assess LLM capabilities and their potential impact on jobs and discusses the potential economic, social, and policy implications of LLMs. The paper also analyzes the challenges for policymakers to predict and regulate the eventual trajectory of LLM development and application.'
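
The `map_reduce` pattern used above can be sketched without any LLM: summarize each chunk on its own, then summarize the concatenated partial summaries. `map_reduce_summarize` and `fake_llm_summary` are hypothetical stand-ins for illustration. The real chain may add further steps, such as collapsing intermediate summaries that are still too long.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk independently (map), then summarize the summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))


calls: List[str] = []  # records every simulated LLM call


def fake_llm_summary(text: str) -> str:
    """Stand-in for an LLM call: keep only the first sentence."""
    calls.append(text)
    return text.split(".")[0].strip() + "."


chunks = [
    "LLMs affect many tasks. Details follow.",
    "Higher-wage jobs are more exposed. More detail.",
    "Science skills are less exposed. Final notes.",
]
summary = map_reduce_summarize(chunks, fake_llm_summary)
print(summary)
print(len(calls))  # one call per chunk plus one for the reduce step
```

The map calls are independent of one another, which is why this approach scales to documents that exceed the context window, at the cost of one extra call per chunk.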

### Summarize: Refine

In [119]:
from langchain.prompts import PromptTemplate

prompt_template = (
    "{question}"
    "\n"
    "\n"
    "{text}"
    "\n"
    "\n"
    "CONCISE SUMMARY:"
)

refine_prompt_template = (
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary (only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, and the original summary, please answer {question}."
)

question_prompt = PromptTemplate.from_template(prompt_template)
refine_prompt = PromptTemplate.from_template(refine_prompt_template)

In [120]:
chain = load_summarize_chain(
    llm,
    chain_type="refine",
    verbose=True,
    question_prompt=question_prompt,
    refine_prompt=refine_prompt,
)
query = "summarize the document in 50 words"
summary_result = chain({"input_documents": job_doc[:3], "question": query})
summary_result



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
summarize the document in 50 words

WORKING PAPER
GPTs are GPTs: An Early Look at the Labor Market Impact Potential
of Large Language Models
Tyna Eloundou1, Sam Manning1,2, Pamela Mishkin 1, and Daniel Rock3
1OpenAI
2OpenResearch
3University of Pennsylvania
August 22, 2023
Abstract
We investigate the potential implications of large language models (LLMs), such as Generative Pre-
trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from
LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based
on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classiﬁcations.
Our ﬁndings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks
aﬀected by the introduction of LLMs, while approximately 19% of workers m

In [121]:
print(summary_result['output_text'])

A study by OpenAI and the University of Pennsylvania suggests that up to 49% of US workers could have half or more of their tasks exposed to large language models (LLMs), with higher-wage occupations presenting higher exposure. LLMs are thought to have considerable economic, social, and policy implications, and exhibit traits of general-purpose technologies. The study also finds that information processing industries have high exposure, while manufacturing, agriculture, and mining have lower exposure. Overall, the impact of LLMs is expected to persist and increase, even if new capabilities are not developed.
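
The `refine` strategy processes chunks sequentially: the first chunk yields an initial summary (the role of `question_prompt` above), and every later chunk updates it (the role of `refine_prompt`). A minimal sketch, with `refine_summarize`, `fake_initial`, and `fake_refine` as hypothetical stand-ins for the chain and its LLM calls:

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    initial: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Build a summary from the first chunk, then update it once per remaining chunk."""
    if not chunks:
        return ""
    summary = initial(chunks[0])
    for chunk in chunks[1:]:
        summary = refine(summary, chunk)
    return summary


def fake_initial(text: str) -> str:
    """Stand-in for the first LLM call: keep the first sentence."""
    return text.split(".")[0].strip()


def fake_refine(existing: str, text: str) -> str:
    """Stand-in for a refine call: append the new chunk's first sentence."""
    return existing + "; " + text.split(".")[0].strip()


refine_chunks = [
    "LLMs affect many tasks. Details.",
    "Higher-wage jobs are more exposed. More.",
]
refined = refine_summarize(refine_chunks, fake_initial, fake_refine)
print(refined)
```

Because each step depends on the previous summary, the calls cannot run in parallel, but later chunks are read with the context of everything seen so far.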


### Question-Answering: Map Reduce

In [129]:
chain = load_qa_chain(
    llm, chain_type="map_reduce", verbose=True, return_intermediate_steps=True
)
query = "what are the skills that are not likely to be replaced by LLMs?"

result = chain(
    {"input_documents": job_doc, "question": query}, return_only_outputs=True
)
result['output_text']



> Entering new MapRerankDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

In addition to giving an answer, also return a score of how fully it answered the user's question. This should be in the following format:

Question: [question here]
Helpful Answer: [answer here]
Score: [score between 0 and 100]

How to determine the score:
- Higher is a better answer
- Better responds fully to the asked question, with sufficient level of detail
- If you do not know the answer based on the context, that should be a score of 0
- Don't be overconfident!

Example #1

Context:
---------
Apples are red
---------
Question: what color are apples?
Helpful Answer: red
Score: 100

Example #2

Context:
---------
it was night and the witness forgot his glasses. he was not sure if it 




> Finished chain.

> Finished chain.


In [130]:
result['output_text']

'Science and critical thinking skills are strongly negatively associated with exposure, suggesting that occupations requiring these skills are less likely to be impacted by current LLMs.'

### Question-Answering: Map Rerank

In [156]:
chain = load_qa_chain(
    llm, chain_type="map_rerank", verbose=True, return_intermediate_steps=True
)
query = "what are the skills that are not likely to be replaced by LLMs?"

result = chain(
    {"input_documents": job_doc, "question": query}, return_only_outputs=True
)
result["output_text"]



> Entering new MapRerankDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

In addition to giving an answer, also return a score of how fully it answered the user's question. This should be in the following format:

Question: [question here]
Helpful Answer: [answer here]
Score: [score between 0 and 100]

How to determine the score:
- Higher is a better answer
- Better responds fully to the asked question, with sufficient level of detail
- If you do not know the answer based on the context, that should be a score of 0
- Don't be overconfident!

Example #1

Context:
---------
Apples are red
---------
Question: what color are apples?
Helpful Answer: red
Score: 100

Example #2

Context:
---------
it was night and the witness forgot his glasses. he was not sure if it 




> Finished chain.

> Finished chain.


'Roles heavily reliant on science and critical thinking skills show a negative correlation with exposure to LLMs, indicating that these skills are less likely to be replaced by LLMs. '
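
The prompt shown above asks the model to end each per-chunk answer with a `Score:` line, and the rerank step keeps the answer with the highest score. A rough sketch of that selection logic follows. `parse_scored_answer` and `best_answer` are hypothetical helpers, the sample outputs are invented for illustration, and the regular expression is an assumption rather than the chain's actual parser.

```python
import re
from typing import List, Tuple


def parse_scored_answer(raw: str) -> Tuple[str, int]:
    """Split a raw model reply into (answer, score); score is 0 if none is found."""
    match = re.search(r"(.*?)\s*Score:\s*(\d+)", raw, re.DOTALL)
    if match is None:
        return raw.strip(), 0
    return match.group(1).strip(), int(match.group(2))


def best_answer(raw_outputs: List[str]) -> str:
    """Return the answer text with the highest self-reported score."""
    parsed = [parse_scored_answer(raw) for raw in raw_outputs]
    return max(parsed, key=lambda pair: pair[1])[0]


# Invented per-chunk replies in the format requested by the prompt
raw_outputs = [
    "I don't know.\nScore: 0",
    "Science and critical thinking skills.\nScore: 90",
    "Programming.\nScore: 40",
]
print(best_answer(raw_outputs))
```

Since the scores are self-reported by the model, they are best treated as a rough ranking signal rather than a calibrated confidence.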

### Question-Answering: Document embedding with RetrievalQA and Vector store

In [170]:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Qdrant

embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

qdrant = Qdrant.from_documents(
    job_doc,
    embedding,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="my_documents",
)

query = "what are the important skills that are not likely to be replaced by LLMs?"
found_docs = qdrant.similarity_search_with_score(query)

for document, score in found_docs:
    print(document.page_content)
    print(f"\nScore: {score}")
    print("\n---\n") 


, potentially o ﬀering a lower payo ﬀ(in
terms of median income) once competency is achieved. Conversely, jobs with no on-the-job training required
or only internship/residency required appear to yield higher income but are more exposed to LLMs.
8For this set of results, all tasks have equal weight within an occupation. Results do not change meaningfully with the
core/supplemental weighting scheme.WORKING PAPER
Figure 5: Vexposure ratings of occupations in the ﬁve Job Zones, which are groups of similar occupations
that are classiﬁed according to the level of education, experience, and on-the-job training needed to perform
them. All tasks are weighted equally.WORKING PAPER
Group Occupations with highest exposure % Exposure
Human UUU Interpreters and Translators 76.5
Survey Researchers 75.0
Poets, Lyricists and Creative Writers 68.8
Animal Scientists 66.7
Public Relations Specialists 66.7
Human VVV Survey Researchers 84.4
Writers and Authors 82.5
Interpreters and Translators 82.4
Public 

In [145]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=qdrant.as_retriever(search_type="similarity", search_kwargs={"k": 2})
)
# Pass question to the qa_chain
question = "what are the skills that are not likely to be impacted by LLMs?"
result = qa_chain({"query": question})
result["result"]

'According to the study, roles heavily reliant on science and critical thinking skills show a negative correlation with exposure to LLMs, which means they are less likely to be impacted. On the other hand, programming and writing skills are positively associated with LLM exposure, so they are more likely to be impacted. However, it is important to note that this study only measures exposure to LLMs and does not necessarily predict labor-augmenting or labor-displacing effects. Additionally, social, economic, regulatory, and other determinants also play a role in determining the impact on labor productivity or automation outcomes.'
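
The retriever above returns the `k` chunks whose embeddings are closest to the query embedding, and only those chunks reach the LLM. The sketch below illustrates the idea with hand-made two-dimensional vectors and cosine similarity. The vectors, the chunk names, and the helpers `cosine` and `top_k` are all hypothetical, and cosine is assumed as the distance measure.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy "embeddings" standing in for real embedding vectors
doc_vecs = {
    "skills": [1.0, 0.0],
    "wages": [0.0, 1.0],
    "training": [0.7, 0.7],
}
nearest = top_k([1.0, 0.1], doc_vecs, k=2)
print(nearest)
```

Real embeddings have far more dimensions, but the ranking step is the same: a larger `k` gives the LLM more context at the cost of a longer prompt.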