# RAG with Unstructured & Astra DB

This example shows how to parse a PDF document with Unstructured.io, load the result into an Astra DB Serverless vector store, and then query the index with LangChain.

### Requirements

In [1]:
! pip install --quiet ragstack-ai

### Configuration

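To use Unstructured.io, you need an API key. Sign up for one here: https://unstructured.io/api-key-hosted. The cell below also asks for your Astra DB API endpoint and application token, and for an OpenAI API key.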

In [2]:
import os
from getpass import getpass

os.environ["UNSTRUCTURED_API_KEY"] = getpass("Enter your Unstructured API Key:")
os.environ["UNSTRUCTURED_API_URL"] = getpass("Enter your Unstructured API URL:")
os.environ["ASTRA_DB_API_ENDPOINT"] = input("Enter your Astra DB API Endpoint: ")
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("Enter your Astra DB Token: ")
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key: ")


Enter your Unstructured API Key:··········
Enter your Unstructured API URL:··········
Enter your Astra DB API Endpoint: https://756b996f-a063-4881-9757-1f7299209ae5-us-east-1.apps.astra.datastax.com
Enter your Astra DB Token: ··········
Enter your OpenAI API Key: ··········
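
Before moving on, it can help to confirm that none of the prompts was skipped. The following is a minimal sketch that uses only the standard library; the helper name `missing_settings` is our own and is not part of any of the libraries used here.

```python
import os

# Variable names taken from the configuration cell above.
REQUIRED_VARS = [
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_APPLICATION_TOKEN",
    "OPENAI_API_KEY",
]


def missing_settings(env=os.environ):
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Running this right after the configuration cell reports any empty value before the first API call fails with a less obvious error.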


### Using the Unstructured API to parse a PDF

#### Advanced Parsing

The unstructured library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. When we partition a document, the output is a list of document Element objects, each representing one component of the source document. Every element carries the following fields:

* type, which is currently one of:
    * FigureCaption
    * NarrativeText
    * ListItem
    * Title
    * Address
    * Table
    * PageBreak
    * Header
    * Footer
    * UncategorizedText
    * Image
    * Formula
* element_id
* metadata (described on the Metadata page of the Unstructured documentation)
* text
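
To get a feel for a parsed document, we can count how many elements of each type it contains. The sketch below uses a hypothetical `StubElement` class as a stand-in for the library's element objects, so it runs without an API call; the only assumption is that each element exposes its type as a `category` attribute, as the code later in this notebook does.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StubElement:
    """Hypothetical stand-in for a parsed element (not the library's class)."""
    category: str
    text: str


def count_by_category(elements):
    """Count elements per category, e.g. how many tables were detected."""
    return Counter(el.category for el in elements)


sample = [
    StubElement("Title", "1.0 SCOPE"),
    StubElement("NarrativeText", "Any conflicting requirements shall be referred to PTT."),
    StubElement("Table", "ES-20.01,Vessel Standard"),
    StubElement("NarrativeText", "All drawings shall comply with the standard."),
]

for category, count in count_by_category(sample).most_common():
    print(f"{category}: {count}")
```

The same function should work on the real `elements` list once it has been loaded, provided the `category` attribute is present.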



In [33]:
from langchain_community.document_loaders import unstructured

elements = unstructured.get_elements_from_api(
    file_path="/content/ES-20.02-D2_Quality_Requirement_for_Pressure_Vessel.pdf",
    api_key=os.getenv("UNSTRUCTURED_API_KEY"),
    api_url=os.getenv("UNSTRUCTURED_API_URL"),
    strategy="hi_res",  # default "auto"
    pdf_infer_table_structure=True,
)

len(elements)

395

In [34]:
from IPython.display import display, HTML

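# Map each element id to its text so that children can show their parent.
parents = {}

for el in elements:
    parents[el.id] = el.text

for el in elements:
    if el.category == "Table":
        # Tables carry an HTML rendering in their metadata.
        display(HTML(el.metadata.text_as_html))
    elif el.metadata.parent_id:
        print(f"parent: '{parents[el.metadata.parent_id]}' content: {el.text}")
    else:
        # Top-level elements have no parent.
        print(el)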

QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
        
PTT PUBLIC COMPANY LIMITED
parent: 'PTT PUBLIC COMPANY LIMITED' content: GAS SEPARATION PLANT RAYONG
parent: 'PTT PUBLIC COMPANY LIMITED' content: QUALITY REQUIREMENT FOR
parent: 'PTT PUBLIC COMPANY LIMITED' content: PRESSURE VESSEL
parent: 'PTT PUBLIC COMPANY LIMITED' content: ES-20.02
ES-20.02 PAGE 1 OF 28
parent: 'ES-20.02 PAGE 1 OF 28' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 2 OF 28
parent: 'ES-20.02 PAGE 2 OF 28' content: CONTENTS


SECTION,SUBJECT
1.0,SCOPE
2.0,APPLICABLE CODES
3.0,PTT SPECIFICATION& STANDARD
4.0,DRAWINGS AND RELATED DOCUMENT!
5.0,MATERIALS
6.0,FABRICATION
7.0,TESTING AND INSPECTION
8.0,MARKING


parent: 'CONTENTS' content: APPENDIX 1
parent: 'CONTENTS' content: ES-20.02-D2.docx
parent: 'ES-20.02 PAGE 2 OF 28' content: QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 3 OF 28
parent: 'ES-20.02 PAGE 3 OF 28' content: 1.0 SCOPE
parent: '1.0 SCOPE' content: The instructions specified in this standard are the basic, minimum require, and general quality requirements for the design, fabrication, testing and inspection of pressure vessels columns.
parent: '1.0 SCOPE' content: Any conflicting requirements shall be referred to PTT for clarification before proceeding with fabrication of the affected parts.
parent: 'ES-20.02 PAGE 3 OF 28' content: 2.0 APPLICABLE CODES AND STANDARDS.
parent: '2.0 APPLICABLE CODES AND STANDARDS.' content: The American Society of Mechanical Engineers (ASME) Boiler and Pressure Vessel Code (ASME Code)
parent: 'ES-20.02 PAGE 3 OF 28' content: ASME Section II Part A, ferrous Materials


0,1
ASME Section V,Non-Destructive Examination
ASME Section VIII Div. 1,Unfired Pressure Vessel
ASME Section VIII Div .2,Alternative rule
ASME Section IX,Welding and Brazing Qualifications


parent: 'ASME Section II Part A, ferrous Materials' content: 2.2 American Society of Mechanical Engineers (ASME)/ American National Standard Institute (ANSI)


ASME B16.5,Pipe Flanges and Flange Fitting,Unnamed: 2
ASME B16.9,Factory made Wrought Steel Butt-welding Fittin,
ASME B16.47,Large Diameter Steel Flange,
ASME B36.10,Welded and Seamless Wrought Steel Pipe,
ASME RTP-1 Resistant,Reinforced Thermoset Plastic (RTP) Corrosion,
ASME RTP-1 Resistant,,Equipment


parent: 'ASME Section II Part A, ferrous Materials' content: 2.3 American Society for Testing Materials (ASTM)
parent: 'ASME Section II Part A, ferrous Materials' content: 2.4.1 Stresses in Large Horizontal Pressure Vessel on Two Saddled Support- by LP Zick
parent: 'ASME Section II Part A, ferrous Materials' content: Part C. Welding Rods, Electrode and Filler Metal
parent: 'ASME Section II Part A, ferrous Materials' content: 2.4 Research Report
parent: 'ASME Section II Part A, ferrous Materials' content: ES-20.02-D2.docx
parent: 'ES-20.02 PAGE 3 OF 28' content: QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 4 OF 28
parent: 'ES-20.02 PAGE 4 OF 28' content: 2.4.2 Welding Research Council (WRC) Bulletin No 107
parent: 'ES-20.02 PAGE 4 OF 28' content: 2.5 National Association of Corrosion Engineers
parent: 'ES-20.02 PAGE 4 OF 28' content: NACE MR 0175 Sulphide Stress Cracking Resistant-Metallic Materials for oilfield Equipment
parent: 'ES

0,1
ES-20.01,Vessel Standard
ES-20.03,Marking for Vessel and Heat Exchanger.
ES-20.04,Column Tray
ES-92.01,Hot Insulation
ES-92.02,Cold Insulation
ES-92.05,External Fireproofing of equipment Supports and Structures
ES-92.06,Painting
ES-92.07,Cathodic Protection of Onshore and Buried Pipe work
ES.99.01,Numbering System
ES-99.04,Final Documentation


parent: '3.0 PTT SPECIFICATIONS, STANDARD DRAWINGS' content: The term of “Drawings and Related Documents” shall mean to include, and not be limited to, workshop drawings and design calculations as well as all other relevant documents in detail related to fabrication, testing and inspection.
parent: '3.0 PTT SPECIFICATIONS, STANDARD DRAWINGS' content: All drawings and related documents shall comply with engineering standard ES-99.001 Numbering system and ES-99.04 Final Document.
parent: '3.0 PTT SPECIFICATIONS, STANDARD DRAWINGS' content: All Drawings and related documents are subject to review and comment by PTT /CONSULTANT. However, such review and comment by PTT /CONSULTANT does not in any way relieve CONTRACTOR of his responsibility to meet all requirements of the CONTRACT. Fabrication shall commence only after the drawings and related documents have been approved by the Third Party Inspector and comment received from PTT /CONSULTANT. In case of subcontracted is required before subm

0,1,2
The chemical,"composition,",product analysis shall be limited as follows
% Manganese,= -,1.30 maximum
% Phosphorus,,0.025 maximum
% Sulphur,non,0.003 maximum
% V+Nb,1},0.03 maximum


parent: 'ES-20.02 PAGE 5 OF 28' content: 5.3.3 Hardness of weld metal, parent metal and heat-affected zone shall be 225 (average) max and 240 (single location) max.
parent: 'ES-20.02 PAGE 5 OF 28' content: Above-mentioned hardness values shall be based on HV 10 measurement on a machined and ground cross-section of procedure qualification test plates and shall be BHN hardness values in case of production welds or production materials.
parent: 'ES-20.02 PAGE 5 OF 28' content: 5.0 MATERIALS
parent: '5.0 MATERIALS' content: ES-20.02-D2.docx
parent: 'ES-20.02 PAGE 5 OF 28' content: QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 6 OF 28
parent: 'ES-20.02 PAGE 6 OF 28' content: 5.3.4 Pressure parts, welded attachments, internal (including bolting) shall fully comply with the requirement of NACE MR 0175.
parent: 'ES-20.02 PAGE 6 OF 28' content: 5.3.5 Plate furnished by the supplier shall meet the requirement to SA 516. In addition, HIC resist

Carbon steel,600+/-20,60
1.25Cr-0.5Mo,660+/-20,120
2.25Cr-0.5Mo,715+/-25,120
5Cr-0.5Mo or 9Cr-1Mo,730+/-30,120
3.5Ni,615+/-15,60


parent: 'ES-20.02 PAGE 12 OF 28' content: 6.3.6 When clad steel or dissimilar welded parts are heat-treated, the heat- treating procedure shall be submitted for PTT /CONSULTANT approval.
parent: 'ES-20.02 PAGE 12 OF 28' content: Austenitic stainless steel shall not be subject to PWHT or stabilization Heat treatment without approval from PTT / CONSULTANT.
parent: 'ES-20.02 PAGE 12 OF 28' content: 6.3.8 Thermocouples shall be attached every 4.6 meters both longitudinally and circumferentially at the top, bottom and centre of the vessel and at each head.
parent: 'ES-20.02 PAGE 12 OF 28' content: 6.3.9 Plates, seamless heads, parts of built-up heads, and similar pressure-holding parts subject to cold or hot bending or forming or forging shall be heat- treated as required by table 1. Annealing, normalizing, and tempering required by table 1 shall be performed in accordance with table 2.
parent: 'ES-20.02 PAGE 12 OF 28' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PT

Unnamed: 0,t s e T  t n e m e r i u q e R,No extra test i y p r a h C  t c a p m  test i y p r a h C  t c a p m  test,1 per each plate *2,1 per each plate per heat treatm: Lot 3 *  5 t o  l,1 per each dished end,1 por cach plato por heat treatment lot *3 l,1 per each dished end.1,1 per each plate per heat treatment lot *3 l,Unnamed: 9
Heat Treatment,Due to Service,"Required for - alkaline *1 p m e t  >80°C . SG, BWN 5 4 2 >  e n  i l a k l a",- min i B i,0 3 2 >,,g n i z i n o i t u o s,*4,d e r i u q e r  tempering,
,Dueto s e n k c i h T,As per *1 6 5 - S C U  s,as per *1 6 5 - S C U,g n i z i n o i t u o S  When BHN After forming r e t f  A,Not d e r i u q e r,g n i z i n o i t u o s,,Not d e r i u q e r,"ing or, i ing and"
,o t e u D  g n m r o F,As per #1 9 7 - S C U,s per *1 9 7 - S C U,,,g n i z i n o i t u o s,,,Normal Normali
,Mater,cs CS Temp <-10°C*e 3% Ni i,"CMo ""8 -",ss,ss,ss,"S, CS p m e t  <-10°C ½ 3  Ni o M C  ,  i  N","S C  CS p m e t  <1 3%Ni o M C  -  ,  i",
,g n m r o F  *6,d o C  #7 l,,,"Hot Forming temp 900- 1,050 °C",Forming tomp <800 °C,"Forming temp 850 1,050 °C",g n m r o F  p m e t  <850 °C,


parent: 'ES-20.02 PAGE 13 OF 28' content: ES-20.02-D2.docx
parent: 'ES-20.02 PAGE 13 OF 28' content: QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 14 OF 28
parent: 'ES-20.02 PAGE 14 OF 28' content: Remarks
parent: 'ES-20.02 PAGE 14 OF 28' content: *1: PWHT can be performed on the dished end after vessel assembly.
parent: 'ES-20.02 PAGE 14 OF 28' content: *2: Simulation by mill can be representative of test.
parent: 'ES-20.02 PAGE 14 OF 28' content: *3: When the dished ends from a plate are heat-treated in different lots and the records (charts) show good conformity with each other with regard to cooling rate (±15 °C) and to soaking temperature (±15 °C), such heat treatments can be deemed as one lot.
parent: 'ES-20.02 PAGE 14 OF 28' content: *4: When normalizing and tempering are required to the base material, tempering shall be performed.
parent: 'ES-20.02 PAGE 14 OF 28' content: *5: No test is required when dished ends are not heat-

Heat Treatmen,Type of Material,Soaking Temp (c,Holding Time (hour),Method of Cooling,Unnamed: 5,Unnamed: 6,Unnamed: 7
Anneal,"AISI Types 304 316, 321, and 347","1,040 - 1,100",1 per 25 mm of thickness but not less than 1/2 1 per 25 mm of thickness but not less than 1,Water Quench or Air blast,,,
Anneal,,Incoloy,1 per 25 mm of thickness but not less than 1/2 1 per 25 mm of thickness but not less than 1,Water Quench or Air blast,"1,150 (1)",,
Anneal,,AIS| Type 310,1 per 25 mm of thickness but not less than 1/2 1 per 25 mm of thickness but not less than 1,Water Quench or Air blast,,,
Normaliz e,C steel C% % Mo steel %2 t0 9 % CrMo steel,900 - 950,1 per 25 mm of thickness but not less than 2,Still air,,,
Normaliz e,,2to 6 % Ni steel,1 per 25 mm of thickness but not less than 2,Still air,820 - 845,,
Temper,%2 t0 9 % CrMo steel,700 - 760,1 per 25 mm of thickness but not less than 1,In furnace (2),,,
Temper,,2to 6 % Ni steel,1 per 25 mm of thickness but not less than 1,593 - 650,,Still air,


parent: 'Table 2 Heat-Treating Requirements' content: Notes :
parent: 'Table 2 Heat-Treating Requirements' content: (1) The temperature range during heat treatments shall be + 0, + 30 oC.
parent: 'Table 2 Heat-Treating Requirements' content: (2) After attaining the soaking temperature and maintaining the required holding time, the cooling time to 430oC shall not be less than 1 h.
parent: 'Table 2 Heat-Treating Requirements' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 16 OF 28
parent: 'ES-20.02 PAGE 16 OF 28' content: 7.0 TESTING AND INSPECTION
parent: '7.0 TESTING AND INSPECTION' content: 7.1.1 PTT/ CONSULTANT reserves the right to inspect, to approve or reject the facilities, materials or CONTRACTOR or SUB-CONTRACTOR’s workmanship at any time.
parent: '7.0 TESTING AND INSPECTION' content: 7.1.2 CONTRACTOR shall take full responsibility for examination in accordance with the requirement of ASME Code sectio

Material,Marking Symbol,Material.1,Marking Symbol.1
SS41,,A193 GrB5,B5
S25C,25¢,A193 GrB6,B6
S35C,35C,A193 GrB7,B7
S45C,45C,A193 GrB16,B16
SCM3,M3,A193 GrB8,B8
A320 L7,L7,A193 GrB8C,B8C
A320 B8,B8L,A193 GrB8m,B8M
TYPE 304 S.S.,S304,A193 GrB8T,B8T
TYPE 304L S.S,S304L,"A194 Gr2, 2H","G2, 2H"
TYPE 321 S.S.,S321,A194 Gr3,G3


parent: 'ES-20.02 PAGE 19 OF 28' content: 9.0 Baseline Thickness measurement
parent: 'ES-20.02 PAGE 19 OF 28' content: 8.0 MARKING
parent: '8.0 MARKING' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 20 OF 28
parent: 'ES-20.02 PAGE 20 OF 28' content: After vessel installation at site, contractor shall carried out thickness measurement of the vessel by using Ultrasonic method (UT).
parent: 'ES-20.02 PAGE 20 OF 28' content: Measurement shall be :
parent: 'ES-20.02 PAGE 20 OF 28' content: done at the internal surface of shell, nozzle, bottom, top, head etc. • basically, at 4 direction 0, 90,180, 270 degree • compare with the design thickness, corrosion allowance, selected thickness
parent: 'ES-20.02 PAGE 20 OF 28' content: The location of measurement and format of report shall be submitted to PTT for prior approval. The report shall be in both hard copy and computer database. Marking of the measurement location 

Material and Services Carbon steel for high and intermediate temperature service (Design temp.>- 10°C,Killed Steel,Class
Carbon steel for low temperature service (Design temp. <-10°C),Fine Grained steel,C
Low-alloy steel for low temperature service,2.5Ni 3.5 Ni Killed steel,D
Austenitic stainless steel,,


parent: '1. CLASSIFICATION OF THE VESSEL' content: Remarks:
parent: '1. CLASSIFICATION OF THE VESSEL' content: 1.0 A vessel may be classified using any combination of the classes provided vessel parts having different materials and/or design condition.
parent: '1. CLASSIFICATION OF THE VESSEL' content: 2.0 For common elements, the more severe classification shall apply.
parent: '1. CLASSIFICATION OF THE VESSEL' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 22 OF 28
parent: 'ES-20.02 PAGE 22 OF 28' content: ES-20.02-D2.docx
  
QUALITY REQUIREMENTS FOR PRESSURE VESSEL PTT PUBLIC CO., LTD ENGINEERING STANDARD
parent: 'QUALITY REQUIREMENTS FOR PRESSURE VESSEL PTT PUBLIC CO., LTD ENGINEERING STANDARD' content: 2. INSPECTION AND TESTING REQUIREMENTS
ES-20.02 PAGE 23 OF 28
parent: 'ES-20.02 PAGE 23 OF 28' content: Inspection and testing requirements for the individual classification of vessel (Table A) are listed, 

Unnamed: 0,Inspection and Testing Requirements,Refer to ASME Paragraph,Notes,Unnamed: 4
(1) UT is required for all carbon steel plates with a nominal thickness of 75 mm and above (*).,(1) UT is required for all carbon steel plates with a nominal thickness of 75 mm and above (*).,(1) UT is required for all carbon steel plates with a nominal thickness of 75 mm and above (*).,(1) UT is required for all carbon steel plates with a nominal thickness of 75 mm and above (*).,
(2),UT is required for all forging (except for standard nozzle flanges) with a nominal thickness of 100 mm and above (*).,,,
(3),Charpy impact tests of materials. Production test plates. Welders performance qualification. With shell thickness 100 mm and above *),UG-84 UCS-66 ucCs-67 Qw-140 Qw-170 QW-401.3,(iii),
(4),MT or PT is required for welding edges of base materials with shell thickness 50 mm and above (*).,UG-84 UCS-66 ucCs-67 Qw-140 Qw-170 QW-401.3,,(iv)
(5 -,MT or PT is required for backside of double welded joints after being prepared for welding with shell thickness 50 mm and above (*).,UG-84 UCS-66 ucCs-67 Qw-140 Qw-170 QW-401.3,,(iv)
(6 =,MT is required for weld surface of: (a) Category A & B with shell thickness 38 mm and above (*) (b) Category C & D when full RT required (*). (c) Category E with shell thickness 50 mm and above (*),UG-84 UCS-66 ucCs-67 Qw-140 Qw-170 QW-401.3,,(vi)
(7),"UT is required for Category A,B & D with shell thickness 75 mm and above (*)",UG-84 UCS-66 ucCs-67 Qw-140 Qw-170 QW-401.3,,(iv)/(vii)


parent: 'Vessel Class B:' content: Note: 1. (*) PTT’s requirements
parent: 'Vessel Class B:' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 24 OF 28
parent: 'ES-20.02 PAGE 24 OF 28' content: Vessel Class C:


Unnamed: 0,Inspection and Testing Requirements,Refer to ASME Paragraph,Notes
(1),UT is required for all plates with a nominal thickness of 38 mm and above (*).,,
(2),UT is required for all forging (except for standard nozzle flanges) with a nominal thickness of 100 mm and above (*).,,
(3),Charpy impact tests of materials. Production test plates. Welders performance qualification.,UCS-66 UG-84 uUCs-67 QWwW-140 Qw-170 QW-401.3,(iii)
(4),MT or PT is required for welding edges of base materials with shell thickness 38 mm and above (*).,,(iv)
(5),MT or PT is required for backside of double welded joints after being prepared for welding with shell thickness 25 mm and above (*).,,(iv)
(6),"MT is required for weld surface of: (a) Category A & B with shell thickness 25 mm and above (*) (b) Category C, D & E (*).",,(v)
(7),"UT is required for Category A,B & D with shell thickness 50 mm and above (*)",,(iv)/(vii )
(8),Vessel production impact test (*).,,(viii)
(9),Tensile test for tube sheet hubs is required.,UW-13 (f),


parent: 'ES-20.02 PAGE 24 OF 28' content: Notes
parent: 'ES-20.02 PAGE 24 OF 28' content: (iv)/(vii )
parent: '(iv)/(vii )' content: Note: 1. (*) PTT’s requirements
parent: '(iv)/(vii )' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 25 OF 28
parent: 'ES-20.02 PAGE 25 OF 28' content: Vessel Class D:


Unnamed: 0,Inspection and Testing Requirements,Refer to ASME Paragraph,Notes,Unnamed: 4
(1),UT is required for all plates with a nominal thickness of 38 mm and above (*).,,,
(2),UT is required for all forgings (except for standard nozzle flanges) with a nominal thickness of 100 mm and above (*).,,,
(3),MT or PT is required for welding edges of base materials with shell thickness 38 mm and above *),,(iv),
(4),MT or PT is required for backside of double welded joints after being prepared for welding *),,(iv),
(5),"MT of weld surfaces for category A, B, C, D & E *)",,(iv) (vii),
(6),"UT is required for category A,B & D with shell thickness 50 mm and above (*).",,(iv) (vii),
(7),Charpy impact test is required for all pressure retaining parts.,UCS-66,(iv) (vii),


parent: 'Vessel Class D:' content: Note: 1. (*) PTT’s requirements
parent: 'Vessel Class D:' content: ES-20.02-D2.docx
QUALITY REQUIREMENTS FOR PRESSURE VESSEL
  
PTT PUBLIC CO., LTD ENGINEERING STANDARD
ES-20.02 PAGE 26 OF 28
parent: 'ES-20.02 PAGE 26 OF 28' content: Vessel Class S:


Unnamed: 0,Inspection and Testing Requirements,Refer to ASME Paragraph,Notes
(1),PT is required for backside of double welded joints after being prepared for welding with shell thickness 25 m and above (*),,(iv)
(2),"PT is required for weld surfaces of: (a) Category A, B, C, D & E in contact with fluids (*). (b) Category A, B, C, D & E when shell thickness 19 mm and above (*). (c) Category C & D when full RT required (*).",,(ix)


parent: 'ES-20.02 PAGE 26 OF 28' content: Note: 1. (*) PTT’s requirements
parent: 'ES-20.02 PAGE 26 OF 28' content: Notes:
parent: 'ES-20.02 PAGE 26 OF 28' content: Main components shall cover such parts as shell, heads, girth, flanges, forged nozzles, tube-sheets, channels and channel covers.
parent: 'ES-20.02 PAGE 26 OF 28' content: The test temperature shall be the minimum design temperature or 0°C, whichever is lower.
parent: 'ES-20.02 PAGE 26 OF 28' content: Charpy impact test shall be according to applicable Codes, Standards and Specifications.
parent: 'ES-20.02 PAGE 26 OF 28' content: This requirement shall apply to the pressure retaining welds (Category A, B & D) in the shell and heads.
parent: 'ES-20.02 PAGE 26 OF 28' content: Category A & B in nozzles that are not subject to RT shall additionally be examined by UT.
parent: 'ES-20.02 PAGE 26 OF 28' content: vi) When full RT required, category A & B in nozzles that are not suitable to RT shall be examined by UT.
parent: 'ES-20.

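The output shows that Unstructured captures both table content, which the notebook renders from each table's HTML export, and document structure, through the `parent_id` that links an element to its parent.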
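
The parent lookup in the cell above indexes a dictionary directly, which raises a `KeyError` if a `parent_id` refers to an element that is not in the list. A more defensive variant is sketched below. `StubElement` is a hypothetical stand-in that flattens `metadata.parent_id` into a plain `parent_id` field so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubElement:
    """Hypothetical stand-in; the real objects keep parent_id under metadata."""
    id: str
    text: str
    parent_id: Optional[str] = None


def describe(elements):
    """Return one line per element, naming the parent text when it is known."""
    text_by_id = {el.id: el.text for el in elements}
    lines = []
    for el in elements:
        if el.parent_id is None:
            lines.append(el.text)
        else:
            # Fall back to a placeholder instead of raising KeyError.
            parent = text_by_id.get(el.parent_id, "<unknown parent>")
            lines.append(f"parent: '{parent}' content: {el.text}")
    return lines


sample = [
    StubElement("a", "1.0 SCOPE"),
    StubElement("b", "Any conflicting requirements shall be referred to PTT.", "a"),
    StubElement("c", "Orphaned paragraph", "missing-id"),
]

for line in describe(sample):
    print(line)
```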

### Storing into Astra DB

Next, we continue with the RAG process by creating embeddings for the PDF content and storing them in Astra DB.

In [35]:
from langchain_astradb import AstraDBVectorStore
from langchain_openai import OpenAIEmbeddings

astra_db_store = AstraDBVectorStore(
    collection_name="PTT_unstructured",
    embedding=OpenAIEmbeddings(),
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT")
)

We will create LangChain Documents by splitting the text after `Table` elements and before `Title` elements, skipping `Header` and `Footer` elements. For table data, we use the HTML output format.

In [36]:
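from langchain_core.documents import Document

documents = []
current_doc = None

for el in elements:
    if el.category in ["Header", "Footer"]:
        continue  # skip page headers and footers
    # A Title closes the document in progress and starts a new one.
    if el.category == "Title" and current_doc is not None:
        documents.append(current_doc)
        current_doc = None
    if current_doc is None:
        current_doc = Document(page_content="", metadata=el.metadata.to_dict())
    # Tables are stored as HTML; all other elements as plain text.
    current_doc.page_content += el.metadata.text_as_html if el.category == "Table" else el.text
    # A Table closes the document in progress.
    if el.category == "Table":
        documents.append(current_doc)
        current_doc = None

# Flush the last document, which is not followed by a Title or a Table.
if current_doc is not None:
    documents.append(current_doc)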



In [37]:
# Sanitise the data by filtering out None values from the documents list
documents = [doc for doc in documents if doc is not None]
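
To see what the splitting rule produces, it helps to run it on a toy input. The sketch below restates the rule as a pure function over hypothetical `StubElement` objects. Unlike the notebook cell, it joins the pieces of a chunk with newlines for readability and returns plain strings instead of LangChain Documents; both choices are ours.

```python
from dataclasses import dataclass


@dataclass
class StubElement:
    """Hypothetical stand-in for a parsed element."""
    category: str
    text: str


def split_into_chunks(elements):
    """Start a chunk before each Title and end one after each Table."""
    chunks, parts = [], []
    for el in elements:
        if el.category in ("Header", "Footer"):
            continue  # page furniture is ignored
        if el.category == "Title" and parts:
            chunks.append("\n".join(parts))
            parts = []
        parts.append(el.text)
        if el.category == "Table":
            chunks.append("\n".join(parts))
            parts = []
    if parts:  # keep the trailing chunk
        chunks.append("\n".join(parts))
    return chunks


sample = [
    StubElement("Title", "5.0 MATERIALS"),
    StubElement("NarrativeText", "Plates shall meet the requirements."),
    StubElement("Table", "<table><tr><td>Carbon steel</td></tr></table>"),
    StubElement("NarrativeText", "Note"),
    StubElement("Footer", "ES-20.02-D2.docx"),
    StubElement("Title", "6.0 FABRICATION"),
    StubElement("NarrativeText", "Welding shall be qualified."),
]

for i, chunk in enumerate(split_into_chunks(sample), start=1):
    print(f"--- chunk {i} ---\n{chunk}")
```

The table stays in the same chunk as the heading and text that precede it, while text that follows a table starts a chunk of its own.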


In [38]:
astra_db_store.add_documents(documents)

['3dd3456e271747ebad9ad986ae8a6982',
 '8caf451b19474519b13e0fc0660668de',
 'e33730ebd3bf4ba7aa3ad2807c225f4c',
 '192062b51cf64965a113e98a442f32f8',
 '09d4864357744352a3ac3de4151de443',
 '1f83710dbb104c5b917f491b2cf2b087',
 'f44214f3b8844974a43d31d75106ad6d',
 '1c20aa1592ac4f9c971faf1097290ebd',
 'f60f8b27d90e4aa5bea8aa951f2cd6b1',
 '99d6376b057b4ce79623302788b04e77',
 '7d4bb9d049874dfca20d78839218f3bd',
 'e1f2734a0c7a4065acf2322e5cd6551d',
 'a76865a4cb7b412d84e2d5c83423e73b',
 'd93602adfe2c47a493305dd3b8b18e50',
 'c3cee894dc494a6b9d70e153e118f1fc',
 '8a3222714e9640019e64b190082d6e7d',
 '7ad2c73727ce4821b4198c5e3bf2d55b',
 'bfcfba34e7d74bffb4b6d0790203c317',
 '6141c30088534d35aa6e8a4eda6b35ee',
 '29dc355c202642f6ba713cd8ac49f13c',
 'd31854c3536146948542e267ca8c314f',
 'ab488d7751f44941b779eb1c3e20391b',
 'f1b78250733d41048631418ff4b503fd',
 'd45b8f7429a5454fb484bf1ab53cf39d',
 'b5e047a7e4a34fdd8305495f16b73a1f',
 '170cbcbf653b4d07ab2c461afa02789a',
 '912da3df14f342e49e0cabb96876af16',
 

### Querying

Now that we have populated our vector store, we will build a RAG pipeline and execute some queries.

In [39]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = """
Answer the question based only on the supplied context. If you don't know the answer, say "I don't know".
Context: {context}
Question: {question}
Your answer:
"""

llm = ChatOpenAI(model="gpt-3.5-turbo", streaming=False, temperature=0)

chain = (
    {"context": astra_db_store.as_retriever(), "question": RunnablePassthrough()}
    | PromptTemplate.from_template(prompt)
    | llm
    | StrOutputParser()
)
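
In the chain above, the retriever's output (a list of Document objects) is passed straight into `{context}`, so the prompt most likely receives the string form of that list, metadata included. A common alternative is to join only the page contents first. The sketch below shows such a helper on a hypothetical `StubDoc` class; the name `format_docs` and the separator are our own choices.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Hypothetical stand-in exposing the page_content attribute."""
    page_content: str


def format_docs(docs, separator="\n\n"):
    """Join the text of the retrieved documents into one context string."""
    return separator.join(doc.page_content for doc in docs)


retrieved = [
    StubDoc("8.0 MARKING"),
    StubDoc("<table><tr><td>TYPE 304L S.S</td><td>S304L</td></tr></table>"),
]

print(format_docs(retrieved))
```

In the chain, the `context` entry could then become `astra_db_store.as_retriever() | format_docs`, assuming the installed LangChain version accepts a plain function in a pipe, as current releases do.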

First, we ask a question about plain text in the document:

In [40]:
chain.invoke("What kind of markings should be used?")

'Markings in ink, oily paint, or waterproof color should be used.'

Next, we try to retrieve a value from one of the tables:

In [None]:
chain.invoke("What should be the marking symbol for TYPE 304L S.S?")

'S304L'

In [42]:
chain.invoke("What is the recommended heat treatment for cold forming SS material")

'Annealing, normalizing, and tempering as required by table 1 should be performed in accordance with table 2 for cold forming stainless steel material.'

In [44]:
chain.invoke("What is the holding temperature for post weld heat treatment for Carbon steel material type?")

"I don't know."

In [49]:
# Debug: inspect which chunks a short keyword query retrieves, with scores
astra_db_store.similarity_search_with_score("Carbon steel")

[(Document(page_content='1. CLASSIFICATION OF THE VESSELVessels and their parts are classified as shown in table A:Table A. Classification of Vessel<table><thead><th>Material and Services Carbon steel for high and intermediate temperature service (Design temp.&gt;- 10°C</th><th>Killed Steel</th><th>Class</th></thead><tr><td>Carbon steel for low temperature service (Design temp. &lt;-10°C)</td><td>Fine Grained steel</td><td>C</td></tr><tr><td>Low-alloy steel for low temperature service</td><td>2.5Ni 3.5 Ni Killed steel</td><td>D</td></tr><tr><td>Austenitic stainless steel</td><td></td><td></td></tr></table>', metadata={'filetype': 'application/pdf', 'languages': ['eng'], 'page_number': 21, 'parent_id': '674ecd15c17e6673137a36e51247bf34', 'filename': 'ES-20.02-D2_Quality_Requirement_for_Pressure_Vessel.pdf'}),
  0.91858506),
 (Document(page_content='PTT PUBLIC CO., LTD ENGINEERING STANDARD<table><thead><th>Carbon steel</th><th>600+/-20</th><th>60</th></thead><tr><td>1.25Cr-0.5Mo</td><td>

In [None]:
# Debug: inspect which chunks the unanswered question retrieves, with scores
astra_db_store.similarity_search_with_score("What is the holding temperature for post weld heat treatment for Carbon steel material type?")

[(Document(page_content='6.3 Post weld Heat Treatment( PWHT)6.3.1 PWHT shall not be commenced unless and until all non-destructive testing has been successfully completed. PWHT shall be performed prior final examination.6.3.2 No welding is permissible on the equipment after completion of post weld heat treatment.6.3.3 All carbon steel and C- Mn steels shall be subject to PHWT where “Sour service” or “Amine service” is indicated.6.3.4 A continuous record of temperature shall be made on recorder charts.6.3.5 The holding temperature for post weld heat treatment or stress relief temperature shall be :', metadata={'filetype': 'application/pdf', 'languages': ['eng'], 'page_number': 11, 'parent_id': '917e2a0ba25081f5662f26efa4039865', 'filename': 'ES-20.02-D2_Quality_Requirement_for_Pressure_Vessel.pdf'}),
  0.942477),
 (Document(page_content='6.3.6 When clad steel or dissimilar welded parts are heat-treated, the heat- treating procedure shall be submitted for PTT /CONSULTANT approval.Austeni
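
The two debug outputs hint at why the chain answered "I don't know": the paragraph ending in "6.3.5 The holding temperature ... shall be :" is stored without its table, while the chunk that holds the table starts with 'PTT PUBLIC CO., LTD ENGINEERING STANDARD'. This suggests that the running page header was categorised as a `Title` and triggered a split. One possible mitigation, sketched below on hypothetical `StubElement` objects, is to drop titles that repeat across several pages before building the documents. The threshold `min_pages` is an assumption to tune per document.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StubElement:
    """Hypothetical stand-in; real elements keep page_number under metadata."""
    category: str
    text: str
    page_number: int


def drop_running_headers(elements, min_pages=3):
    """Remove Title elements whose text appears on at least min_pages pages."""
    pages_by_text = defaultdict(set)
    for el in elements:
        if el.category == "Title":
            pages_by_text[el.text.strip()].add(el.page_number)
    repeated = {text for text, pages in pages_by_text.items() if len(pages) >= min_pages}
    return [
        el for el in elements
        if not (el.category == "Title" and el.text.strip() in repeated)
    ]


header = "PTT PUBLIC CO., LTD ENGINEERING STANDARD"
sample = [
    StubElement("Title", header, 10),
    StubElement("Title", "6.3 Post weld Heat Treatment", 11),
    StubElement("NarrativeText", "The holding temperature shall be :", 11),
    StubElement("Title", header, 11),
    StubElement("Table", "Carbon steel,600+/-20,60", 12),
    StubElement("Title", header, 12),
]

for el in drop_running_headers(sample):
    print(el.category, "|", el.text)
```

With the repeated header removed, the lead-in paragraph and the table are no longer separated by a `Title`, so the splitting rule would keep them in one document. Whether that is enough for the question to be answered would need to be checked by re-running the pipeline.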

Finally, we can ask a question whose answer is not in our content, to confirm that the LLM falls back to "I don't know" as instructed.