# ChatGPT-based PDF Document Summarization and Text Generation

### Importing relevant libraries

In [1]:
import pandas as pd
import numpy as np
import requests
import os

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
## Import or install PDF-to-text library
try:
    import PyPDF2
except ImportError:
    !pip install PyPDF2
    import PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/232.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━[0m [32m112.6/232.6 kB[0m [31m3.1 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m232.6/232.6 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


### Convert PDF file to text

#### Download PDF file to local folder

In [3]:
file_name = 'New_ISEDC_19-044_INNOVATION-SKILLS_E_web.pdf'

In [4]:
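## Download data file
url = 'https://ised-isde.canada.ca/site/innovation-better-canada/sites/default/files/attachments/' + file_name
r = requests.get(url, allow_redirects=True, timeout=60)
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
with open(file_name, 'wb') as f:
    n_bytes = f.write(r.content)
n_bytes  # number of bytes written to disk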

12972062
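`requests.get` returns whatever the server sends, so if the URL moves, the cell above could save an HTML error page under a `.pdf` name. A minimal sanity check, assuming the server returns the raw file bytes, is to look for the `%PDF-` header that PDF files begin with. Some readers tolerate the header anywhere in the first 1024 bytes, so this sketch searches that window. The helper name `is_pdf_bytes` is hypothetical.

```python
def is_pdf_bytes(content: bytes, window: int = 1024) -> bool:
    """Return True if the PDF header marker appears near the start of `content`."""
    # PDF files start with a header such as b"%PDF-1.7"; search the first `window` bytes.
    return b"%PDF-" in content[:window]


# Usage with the notebook's response object `r` (not executed here):
# if not is_pdf_bytes(r.content):
#     raise ValueError("Downloaded file does not look like a PDF")
print(is_pdf_bytes(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # True
print(is_pdf_bytes(b"<!DOCTYPE html><html>"))        # False
```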

#### Convert PDF file to text page-by-page

In [5]:
pdf_file_path = file_name
# Passing the path lets PdfReader open (and close) the file itself
pdf_reader = PyPDF2.PdfReader(pdf_file_path)
paper_text = ""
for page in pdf_reader.pages:
    # Extract each page's text, lower-case it, and separate pages with a blank line
    page_text = page.extract_text().lower()
    paper_text += page_text + "\n\n"
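The extracted text keeps PDF layout artifacts such as runs of spaces and trailing blanks at line ends (visible in the output of `print(paper_text)`). Tidying them is optional, but it shortens the text sent to the API. A sketch using only the standard library; `normalize_page_text` and `join_pages` are hypothetical helper names, not part of PyPDF2.

```python
import re
from typing import Iterable


def normalize_page_text(text: str) -> str:
    """Lower-case one page of extracted text and tidy its whitespace."""
    text = text.lower()
    text = re.sub(r"[ \t]+", " ", text)   # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)  # drop spaces around line breaks
    return text.strip()


def join_pages(pages: Iterable[str]) -> str:
    """Normalize each page and join the pages with a blank line between them."""
    return "\n\n".join(normalize_page_text(p) for p in pages)


# Usage with the notebook's reader (not executed here):
# paper_text = join_pages(page.extract_text() for page in pdf_reader.pages)
print(repr(join_pages(["Building   \nA Nation of \nInnovators", "Page  two"])))
# 'building\na nation of\ninnovators\n\npage two'
```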

In [6]:
print("Number of pages in PDF document:",len(pdf_reader.pages))

Number of pages in PDF document: 100


In [7]:
print(paper_text)

building   
a nation of 
innovators

this publication is available online at www.canada.ca/innovation .
to obtain a copy of this publication, or to receive it in an alternate format (braille, large print, etc.),  
please fill out the publication request form at www.ic.gc.ca/publication-request  or contact:
web services centre
innovation, science and economic development canada
c.d. howe building
235 queen street
ottawa, on k1a 0h5
canada
telephone (toll-free in canada): 1-800-328-6189
telephone (international): 613-954-5031
tty (for hearing impaired): 1-866-694-8389
business hours: 8:30 a.m. to 5:00 p.m. (eastern time) 
email: ised@canada.ca
permission to reproduce
except as otherwise specifically noted, the information in this publication may be reproduced, in part 
or in whole and by any means, without charge or further permission from the department of industry 
(innovation, science and economic development canada), provided that due diligence is exercised 
in ensuring the accuracy 

## ChatGPT API for paper summarization and text generation (the ChatGPT web interface can be used instead)

#### Import the libraries needed for the GPT API and set parameters

In [8]:
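try:
    import openai
except ImportError:
    # This notebook uses the pre-1.0 interface (openai.ChatCompletion), which openai>=1.0 removed
    !pip install "openai<1.0"
    import openai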

Collecting openai
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/77.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m77.0/77.0 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: openai
Successfully installed openai-0.28.1


In [9]:
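## API key
# Supply your own key via the OPENAI_API_KEY environment variable;
# never hard-code a secret key in a notebook you share.
from getpass import getpass

if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
openai.api_key = os.getenv("OPENAI_API_KEY")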

In [10]:
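## OpenAI API parameters
# model = "gpt-3.5-turbo" # 4K-token context window
model = "gpt-3.5-turbo-16k" # 16K-token context window (prompt + reply)
max_tokens = 2048   # upper limit on the length of the reply
n = 1               # number of replies to generate
stop = None         # no custom stop sequence
temperature = 0.5   # lower values give more deterministic output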
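The "4K"/"16K" comments refer to each model's context window, which has to hold both the prompt and up to `max_tokens` tokens of the reply. A rough feasibility check, assuming OpenAI's rule of thumb of about four characters per token for English text (an approximation; the `tiktoken` library gives exact counts), shows why a 40,000-character prompt needs the 16K model. Reading "16K" as 16 × 1024 tokens is also an assumption.

```python
import math

CHARS_PER_TOKEN = 4.0  # rule-of-thumb assumption for English text, not a tokenizer


def estimate_tokens(n_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate for a text of n_chars characters."""
    return math.ceil(n_chars / chars_per_token)


def fits_context(n_prompt_chars: int, max_tokens: int, context_window: int) -> bool:
    """True if the estimated prompt plus the reply budget fits in the window.

    Ignores the few extra tokens the chat format adds per message.
    """
    return estimate_tokens(n_prompt_chars) + max_tokens <= context_window


print(estimate_tokens(40_000))                # 10000
print(fits_context(40_000, 2048, 16 * 1024))  # True  (about 12,048 <= 16,384)
print(fits_context(40_000, 2048, 4 * 1024))   # False (about 12,048 > 4,096)
```

In the notebook, `estimate_tokens(len(paper_text))` gives a rough idea of how much of the full report a single request can carry.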

### Try different ChatGPT prompts

#### ChatGPT prompt #1

In [11]:
prompt1 = 'I want you to act as a report summarizer. I will provide you with a report on a specific topic, and you will create a summary of the main points and findings of the paper. Your summary should be concise and should accurately and objectively communicate the key points of the report. You should not include any personal opinions or interpretations in your summary, but rather focus on objectively presenting the information from the report. Your summary should be written in your own words and should not include any direct quotes from the paper. Please ensure that your summary is clear, concise, and accurately reflects the content of the original report: "{input}".'.format(input=paper_text)

In [12]:
prompt1[:40000]

'I want you to act as a report summarizer. I will provide you with a report on a specific topic, and you will create a summary of the main points and findings of the paper. Your summary should be concise and should accurately and objectively communicate the key points of the report. You should not include any personal opinions or interpretations in your summary, but rather focus on objectively presenting the information from the report. Your summary should be written in your own words and should not include any direct quotes from the paper. Please ensure that your summary is clear, concise, and accurately reflects the content of the original report: "building   \na nation of \ninnovators\n\nthis publication is available online at www.canada.ca/innovation .\nto obtain a copy of this publication, or to receive it in an alternate format (braille, large print, etc.),  \nplease fill out the publication request form at www.ic.gc.ca/publication-request  or contact:\nweb services centre\ninnov
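Slicing the finished prompt to 40,000 characters counts the instructions against the limit and, for a long document, drops the closing `".` of the template. One alternative, sketched here with a hypothetical `build_prompt` helper, is to truncate only the document before filling the `{input}` placeholder.

```python
def build_prompt(template: str, document: str, max_doc_chars: int = 40_000) -> str:
    """Fill the template's {input} field with at most max_doc_chars of the document."""
    return template.format(input=document[:max_doc_chars])


demo_template = 'Summarize the following report: "{input}".'
print(build_prompt(demo_template, "building a nation of innovators", max_doc_chars=10))
# Summarize the following report: "building a".
```

Using it would mean keeping each prompt's text as a separate template string (for example a hypothetical `template1`) and calling `build_prompt(template1, paper_text)`, so the instructions and closing quote are always sent intact.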

In [13]:
# Call OpenAI API for the first prompt (only the first 40,000 characters are sent)
response1 = openai.ChatCompletion.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt1[:40000]},
    ],
    max_tokens=max_tokens,
    n=n,
    stop=stop,
    temperature=temperature,
)
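Because the request carries at most the first 40,000 characters of the prompt, most of a 100-page report never reaches the model. A common workaround, not used in this notebook, is map-reduce summarization: split the text into chunks that fit the context window, summarize each chunk, then summarize the joined partial summaries. The sketch keeps the API call abstract: `summarize_fn` is a hypothetical callable that maps a text to its summary, so the chunking logic runs without network access. The helper names and the single reduce step are assumptions of this sketch.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int, overlap: int = 0) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    start = 0
    while True:
        end = start + max_chars
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap
    return chunks


def summarize_long_text(text: str, summarize_fn: Callable[[str], str],
                        max_chars: int = 40_000, overlap: int = 0) -> str:
    """Map: summarize each chunk. Reduce: summarize the joined partial summaries."""
    partial = [summarize_fn(c) for c in chunk_text(text, max_chars, overlap)]
    if len(partial) == 1:
        return partial[0]
    combined = "\n\n".join(partial)
    # Single reduce step; the joined summaries are truncated if still too long.
    return summarize_fn(combined[:max_chars])


# A summarize_fn for this notebook might wrap the 0.28 API (not executed here):
# def summarize_fn(chunk):
#     reply = openai.ChatCompletion.create(
#         model=model, max_tokens=512, temperature=temperature,
#         messages=[{"role": "user", "content": "Summarize this part of a report:\n\n" + chunk}])
#     return reply["choices"][0]["message"]["content"]
print(chunk_text("abcdefghij", 4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```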

In [14]:
coutput1 = response1['choices'][0]['message']['content']
print(coutput1)

The report titled "Building a Nation of Innovators" emphasizes the need for Canada to build a culture of innovation in order to compete in the global economy. The report highlights several global trends and challenges that necessitate a new approach to innovation policy. These trends include the weakening of past drivers of growth, such as slow economic recovery, declining productivity growth, and fluctuating resource commodity prices. Technological disruption, climate change and clean growth, demographic shifts, and globalization are also identified as key factors shaping the global economy.

To address these challenges, the report outlines the multi-year Innovation and Skills Plan, which aims to redefine the innovation ecosystem in Canada. The plan focuses on four interconnected pillars: people and skills, building ecosystems through science, technology, and superclusters, investment, scale-up, and growing companies, and program simplification and reorganization. The report emphasize
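With `max_tokens = 2048`, a long answer can be cut off mid-sentence. Each choice in a Chat Completions response carries a `finish_reason` field, which is `"stop"` when the model ended on its own and `"length"` when it hit the token limit. The check below is written against a plain dictionary shaped like the response so it runs offline; `response1` from the 0.28 `openai` package is a dict subclass and can be indexed the same way. The helper name `reply_text` is hypothetical.

```python
def reply_text(response: dict, index: int = 0) -> str:
    """Return a choice's message content, warning if it was truncated."""
    choice = response["choices"][index]
    if choice.get("finish_reason") == "length":
        print("Warning: reply hit max_tokens and is incomplete; consider raising max_tokens.")
    return choice["message"]["content"]


fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Example summary text."},
         "finish_reason": "length"}
    ]
}
print(reply_text(fake_response))
```

In the notebook, `coutput1 = reply_text(response1)` would replace the direct indexing.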

#### ChatGPT prompt #2

In [15]:
prompt2 = 'Innovations in different countries are based on a number of factors. Here is the relevant transcript of a report. Please write five steps Canada can take to innovate: "{input}".'.format(input=paper_text)

In [16]:
prompt2[:40000]

'Innovations in different countries are based on a number of factors. Here is the relevant transcript of a report. Please write five steps Canada can take to innovate: "building   \na nation of \ninnovators\n\nthis publication is available online at www.canada.ca/innovation .\nto obtain a copy of this publication, or to receive it in an alternate format (braille, large print, etc.),  \nplease fill out the publication request form at www.ic.gc.ca/publication-request  or contact:\nweb services centre\ninnovation, science and economic development canada\nc.d. howe building\n235 queen street\nottawa, on k1a 0h5\ncanada\ntelephone (toll-free in canada): 1-800-328-6189\ntelephone (international): 613-954-5031\ntty (for hearing impaired): 1-866-694-8389\nbusiness hours: 8:30 a.m. to 5:00 p.m. (eastern time) \nemail: ised@canada.ca\npermission to reproduce\nexcept as otherwise specifically noted, the information in this publication may be reproduced, in part \nor in whole and by any means, withou

In [17]:
# Call OpenAI API for the second prompt (only the first 40,000 characters are sent)
response2 = openai.ChatCompletion.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt2[:40000]},
    ],
    max_tokens=max_tokens,
    n=n,
    stop=stop,
    temperature=temperature,
)
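The two API cells differ only in the prompt they send. A small helper keeps the system message and the truncation rule in one place, so both prompts are compared under identical settings. `build_messages` and `SYSTEM_PROMPT` are hypothetical names, not part of the `openai` package.

```python
from typing import Dict, List

SYSTEM_PROMPT = "You are a helpful assistant."


def build_messages(user_prompt: str, max_chars: int = 40_000,
                   system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """Build the Chat Completions message list shared by both prompt cells."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt[:max_chars]},
    ]


# Usage in the notebook (not executed here):
# response2 = openai.ChatCompletion.create(
#     model=model, messages=build_messages(prompt2), max_tokens=max_tokens,
#     n=n, stop=stop, temperature=temperature)
print(build_messages("abcdef", max_chars=3))
# [{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'abc'}]
```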

In [18]:
coutput2 = response2['choices'][0]['message']['content']
print(coutput2)

Based on the transcript, here are five steps that Canada can take to innovate:

1. Invest in People and Skills: Canada should focus on ensuring that businesses have access to a skilled workforce and that Canadians have the necessary skills to succeed in the changing economy. This can be done through programs that provide digital skills training, work-integrated learning opportunities, and access to global talent.

2. Build Innovation Ecosystems: Canada should create partnerships and collaborations between industry, academia, government, and other innovation actors to build strong innovation ecosystems. This can be achieved by investing in fundamental research, commercialization, and technology adoption. The government can also support the growth of large-scale business-led innovation through initiatives like the Innovation Superclusters.

3. Support Investment, Scale-up, and Growing Companies: Canada should attract investment, support the growth of leading Canadian companies and start-