# Text Generation Exercise 

# GOAL: Using LLMs, create a program that can read in a PDF and create a simplified explanation.
### COMPLETE THE TASKS BELOW. NOTE THAT SOME CELLS ARE ALREADY FILLED OUT FOR YOU.

----

## TASK: Install the PyPDF2 library if you do not already have it. Hint: https://pypdf2.readthedocs.io/en/3.0.0/user/installation.html

In [1]:
# Install here with !pip install library_name, or install at the command line

In [2]:
!pip install PyPDF2

Defaulting to user installation because normal site-packages is not writeable


## TASK: Write a function that uses the PyPDF2 library to read the text of the Corporate_Travel_Policy.pdf file into a string.

Hint: This PDF has only one page, so index 0 is the only page you need to read.

Hint: Text extracted from PDFs usually contains stray line breaks. See if you can fix this by replacing \n with a single space, using the .replace() method. Remember that for the LLM, content matters more than exact formatting.

Hint: https://pypdf2.readthedocs.io/en/3.0.0/user/extract-text.html

In [3]:
# CODE HERE

In [4]:
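from PyPDF2 import PdfReader

def read_pdf_text(pdf_filepath):
    """Return the text of the PDF's first page as a single-line string."""
    reader = PdfReader(pdf_filepath)
    page = reader.pages[0]  # the policy PDF has only one page
    # Newlines from extraction carry no meaning for the LLM, so swap them for spaces.
    return page.extract_text().replace('\n', ' ')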

In [5]:
result = read_pdf_text("Corporate_Travel_Policy.pdf")

In [6]:
print(result)

Corporate Travel and Time Off Policy Introduction This policy establishes clear guidelines and procedures for time off and corporate travel for employees. It aims to ensure fair and consistent application throughout the organization while supporting operational needs. Annual Paid Time Off (PTO) Entitlement ● PTO Allocation: All employees receive five weeks (25 working days) of PTO per calendar year . ● Accrual of PTO: PTO accrues monthly based on the annual entitlement. ● Carryover: Unused PTO cannot be carried over to the next year . Employees are encouraged to utilize their PTO within the accrual year . Time Off Beyond PTO ● Managerial Approval: Additional time off beyond the allocated five weeks requires prior approval from the employee's direct manager . ● Request Procedure: Submit time off requests at least four weeks in advance for any period exceeding annual PTO. ● Considerations for Approval: Managers will assess the operational impact, employee performance and attendance, and 
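
The printed text still shows stray spaces before punctuation (for example `year .`), which `.replace('\n', ' ')` alone does not remove. A minimal sketch of an extra cleanup step follows. The helper name `clean_extracted_text` and the sample string are illustrative assumptions, not part of the exercise, and the regular expressions only cover the patterns visible in this output.

```python
import re

def clean_extracted_text(raw_text):
    """Collapse whitespace runs and remove stray spaces before punctuation."""
    # Replace any run of whitespace (newlines, tabs, repeated spaces) with one space.
    text = re.sub(r"\s+", " ", raw_text)
    # Drop the single space that extraction sometimes leaves before . , ; : ! ?
    text = re.sub(r" ([.,;:!?])", r"\1", text)
    return text.strip()

# Hypothetical sample that mimics the artifacts in the output above.
sample = "PTO per calendar year .\nCarryover:  Unused PTO cannot be carried over ."
print(clean_extracted_text(sample))
# prints: PTO per calendar year. Carryover: Unused PTO cannot be carried over.
```

This step could be applied to the string returned by `read_pdf_text` before it is placed in a prompt.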

----

## TASK: Establish a connection to Amazon Bedrock Runtime

In [7]:
# CODE HERE

In [8]:
import boto3

bedrock_runtime = boto3.client(region_name='us-east-1', service_name='bedrock-runtime')

## TASK: Engineer a Prompt that takes in a user question about the PDF text and then inserts the text as context.

**For an example user question, try: "How many working days of PTO do employees get?"**

In [9]:
#CODE HERE

In [10]:
pdf_text = read_pdf_text('Corporate_Travel_Policy.pdf')
user_question = "How many working days of PTO do employees get?"

In [11]:
prompt = f"Answer the following question: {user_question}. Here is the reference text:\n{pdf_text}"
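
One possible variation on the prompt, sketched under the assumption that a clearer layout may help the model: it states the instruction first, then the reference text, and ends with the question. The function name `build_prompt` and the exact wording are illustrative choices, and nothing here guarantees a better answer from any particular model.

```python
def build_prompt(user_question, context_text):
    """Return a prompt that separates the instruction, reference text, and question."""
    return (
        "Answer the question using only the reference text below. "
        "If the answer is not in the text, say so.\n\n"
        "Reference text:\n"
        f"{context_text}\n\n"
        # Placing the question last keeps it next to where the answer is generated.
        f"Question: {user_question.strip()}\n"
        "Answer:"
    )

# Hypothetical short context, used here only to show the resulting layout.
example = build_prompt(
    "How many working days of PTO do employees get?",
    "All employees receive five weeks (25 working days) of PTO per calendar year.",
)
print(example)
```

In the notebook, `pdf_text` would be passed as `context_text`.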

## TASK: Test your prompt by sending it to an Amazon Bedrock model. Choose any model you prefer.

In [12]:
# CODE HERE

In [13]:
import json

body = json.dumps({'inputText': prompt, 'textGenerationConfig': {'temperature':0, 'maxTokenCount':4096}})

In [14]:
response = bedrock_runtime.invoke_model(body=body, modelId='amazon.titan-text-express-v1')
response_body = json.loads(response.get('body').read())
print(response_body['results'][0]['outputText'])

 It is important to note that these policies are subject to change and may be modified by the company based on business needs and legal requirements.
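
The request and response handling above can be wrapped in two small helpers. This is a sketch assuming the same Titan Text fields the cells above use (`inputText`, `textGenerationConfig`, `results[0].outputText`). The helper names and the `sample_response` dictionary are hypothetical stand-ins, not real model output.

```python
import json

def build_titan_body(prompt, temperature=0, max_token_count=4096):
    """Serialize a request body using the same fields as the cell above."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "temperature": temperature,
            "maxTokenCount": max_token_count,
        },
    })

def extract_output_text(response_body):
    """Return the first result's text, failing loudly if no result is present."""
    results = response_body.get("results", [])
    if not results:
        raise ValueError("response contained no results")
    # Model output often starts with a space or newline, so trim it.
    return results[0]["outputText"].strip()

# Hand-written stand-in for a parsed response body, for illustration only.
sample_response = {"results": [{"outputText": " (model output would appear here) "}]}
print(extract_output_text(sample_response))
```

With a live client, `build_titan_body(prompt)` would be passed as `body` to `invoke_model`, and the parsed JSON response would be passed to `extract_output_text`.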


## OPTIONAL TASK: Create a single function that takes a PDF and a user question, then returns the answer from the LLM call. In other words, combine everything you did above into one function.

In [15]:
# COMBINE ALL YOUR CODE HERE TO ONE NICE FUNCTION

In [17]:
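import json
from PyPDF2 import PdfReader

def answer_with_context(user_question, pdf_filepath):
    """Answer user_question using the first page of the PDF as context.

    Relies on the bedrock_runtime client created earlier in the notebook.
    """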
    reader = PdfReader(pdf_filepath)
    page = reader.pages[0]
    pdf_text = page.extract_text().replace('\n',' ')
    
    prompt = f"Answer the following question: {user_question}. Here is the reference text:\n{pdf_text}"
    
    body = json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "temperature": 0,
            "maxTokenCount": 4096
        }
    })
    
    response = bedrock_runtime.invoke_model(body=body, modelId="amazon.titan-text-express-v1")
    response_body = json.loads(response.get('body').read())
    return response_body["results"][0]["outputText"]

In [18]:
result = answer_with_context("How many working days of PTO do employees get per year?", "Corporate_Travel_Policy.pdf")
print(result)

 By adhering to these policies, we can ensure that time off and corporate travel are managed effectively, promoting work-life balance, employee satisfaction, and efficient operations within the organization.
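
The combined function above depends on a global `bedrock_runtime` client. One alternative, sketched here as an assumption rather than as the exercise's intended solution, is to pass the client in as a parameter so the model call can be exercised without AWS access. `ask_model` and `FakeBedrockClient` are hypothetical names. The fake only mimics the parts of the `invoke_model` response that the notebook reads.

```python
import io
import json

def ask_model(client, prompt, model_id="amazon.titan-text-express-v1"):
    """Send the prompt through any client exposing invoke_model(body=..., modelId=...)."""
    body = json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"temperature": 0, "maxTokenCount": 4096},
    })
    response = client.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response["body"].read())
    return response_body["results"][0]["outputText"]

class FakeBedrockClient:
    """Hypothetical stand-in that returns a canned answer instead of calling AWS."""

    def __init__(self):
        self.last_body = None

    def invoke_model(self, body, modelId):
        # Record the request so it can be inspected afterwards.
        self.last_body = json.loads(body)
        payload = {"results": [{"outputText": "stub answer"}]}
        # Wrap the payload in a readable stream, since the caller uses .read().
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}

fake = FakeBedrockClient()
print(ask_model(fake, "Any prompt"))  # prints: stub answer
```

In the notebook, the real `bedrock_runtime` client could be passed in place of the fake one.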
