# IMDG Genie

## Project Name: 
IMDG Genie

## Service Summary: 
Support users who have questions about the IMDG Code by answering in natural language, drawing only on the IMDG Code book.

## Problem Definition:
The IMDG Code is large and highly specialized. Understanding its details requires long-term experience, and because the code book is written in English, it is hard to grasp intuitively.

## Example Prompts:
"Tell me about IMDG class 3."

"I need to understand about limited quantity."

"Provide requirements for handling marine pollutants."


In [None]:
import os
import openai
import gradio as gr
from dotenv import load_dotenv
from langchain_openai import OpenAI

load_dotenv('api.env') 
api_key = os.getenv('OPENAI_API_KEY')
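As far as I know, `load_dotenv('api.env')` does not raise an error when the file is missing, and `api_key` is never checked afterwards. A typo in the file name would therefore only surface later as an authentication failure. Below is a minimal fail-fast check that uses only the standard library. The helper name `require_env` is hypothetical and not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable, or raise a clear error.

    Hypothetical helper, meant to run right after load_dotenv('api.env').
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check that api.env exists and was loaded")
    return value


# Usage in the notebook, after load_dotenv('api.env'):
# api_key = require_env("OPENAI_API_KEY")
```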

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [None]:
# # Library for reading PDF files in Python.
# !pip install -q pypdf

# # Vector database support.
# !pip install -q chromadb

# # Library for counting tokens.
# !pip install -q tiktoken

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader

In [None]:
# Read the PDF file.
loader = PyPDFLoader("/Users/kenny_jung/Documents/GitHub/IMDG.pdf") 
documents = loader.load()

In [None]:
# Split the PDF text into chunks of roughly 3,000 characters, with 500 characters of overlap between neighbouring chunks.
text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=500)
texts = text_splitter.split_documents(documents)
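The idea behind `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window: each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so text near a boundary appears in both neighbours. This is a simplified sketch, not LangChain's algorithm. `CharacterTextSplitter` first splits on a separator (`"\n\n"` by default) and then merges pieces, so its real boundaries differ, and a single long piece can exceed `chunk_size`.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the notebook's settings (3000/500), each new window starts 2,500 characters after the previous one.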

In [None]:
len(texts)

1080

In [None]:
# What does the result look like?
texts

In [None]:
# Embed the PDF contents and store them in a vector database.
# Embedding: converting text into a vector (an array of numbers) that a model can work with.

embeddings = OpenAIEmbeddings()
vector_store = Chroma.from_documents(texts, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 10})
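Conceptually, `as_retriever(search_kwargs={"k": 10})` embeds the question and returns the 10 stored chunks whose vectors lie closest to it. Here is a brute-force sketch of that idea using cosine similarity on made-up 3-dimensional vectors. Real OpenAI embeddings have many more dimensions. Chroma uses an approximate nearest-neighbour index, and its default distance metric may differ from cosine, so treat this as an illustration only. The chunk ids are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Toy "embeddings" with made-up numbers, not real OpenAI vectors.
toy_chunks = {
    "class_3_flammable_liquids": [1.0, 0.0, 0.0],
    "marine_pollutants": [0.0, 1.0, 0.0],
    "marine_pollutant_mark": [0.0, 0.7, 0.7],
}
print(top_k([0.1, 1.0, 0.2], toy_chunks, k=2))
# ['marine_pollutants', 'marine_pollutant_mark']
```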

In [None]:
# Load LangChain's chat model.
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

# If you have access to GPT-4, you can use gpt-4 instead of gpt-3.5-turbo.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, max_tokens=500)

# Define the question-answering chain.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever = retriever,
    return_source_documents=True)
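With `chain_type="stuff"`, all retrieved chunks are concatenated into a single prompt. With `k=10` and chunks of up to 3,000 characters, that prompt can be large. Below is a simplified sketch of the stuffing step and a rough size estimate. The real chain applies its own per-document and question templates. The 4-characters-per-token figure is only a rule of thumb for English text, and `tiktoken` (installed above) gives exact counts.

```python
import math


def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Concatenate every retrieved chunk into one context block, then append the question.

    Simplified sketch; LangChain's stuff chain formats documents with its own templates.
    """
    context = "\n\n".join(chunks)
    return f"{context}\n\nQuestion: {question}"


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token count based on the ~4 characters per token heuristic."""
    return math.ceil(len(text) / chars_per_token)


# Worst case for this notebook: k=10 chunks of 3,000 characters each.
worst_case_chunks = ["x" * 3000] * 10
prompt_text = build_stuff_prompt(worst_case_chunks, "Tell me about marine pollutants")
print(rough_token_estimate(prompt_text))
```

The result is roughly 7,500 tokens of context before the system prompt, plus up to 500 tokens reserved for the answer by `max_tokens`. Compare this against the context window of the model you select. If it does not fit, lower `k` or `chunk_size`, or try another `chain_type` such as `"map_reduce"`.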

In [None]:
query = "Tell me about marine pollutants"
result = chain.invoke({"question": query})
print(result)



{'question': 'Tell me about marine pollutants', 'answer': 'Marine pollutants are substances subject to the provisions of Annex III of MARPOL, as amended. They are classified and transported under the appropriate entries according to their properties. Marine pollutants are identified by the symbol "MP" in the column headed MP in the Index. They are classified in accordance with Chapter 2.9.3 of the IMDG Code. Marine pollutants shall be marked with the marine pollutant mark, which is a square set at an angle of 45 degrees with a fish and tree symbol in black on a white background. The minimum dimensions for the mark are 100 mm x 100 mm with a minimum line width of 2 mm. Marine pollutants are transported under specific consignment procedures outlined in Part 5 of the IMDG Code. The classification criteria for marine pollutants are based on their acute aquatic toxicity, chronic aquatic toxicity, potential for bioaccumulation, and degradation properties.\n\n', 'sources': '/Users/kenny_jung/

In [None]:
print(result['answer'])

Marine pollutants are substances subject to the provisions of Annex III of MARPOL, as amended. They are classified and transported under the appropriate entries according to their properties. Marine pollutants are identified by the symbol "MP" in the column headed MP in the Index. They are classified in accordance with Chapter 2.9.3 of the IMDG Code. Marine pollutants shall be marked with the marine pollutant mark, which is a square set at an angle of 45 degrees with a fish and tree symbol in black on a white background. The minimum dimensions for the mark are 100 mm x 100 mm with a minimum line width of 2 mm. Marine pollutants are transported under specific consignment procedures outlined in Part 5 of the IMDG Code. The classification criteria for marine pollutants are based on their acute aquatic toxicity, chronic aquatic toxicity, potential for bioaccumulation, and degradation properties.




In [None]:
print(result['sources'])

/Users/kenny_jung/Documents/GitHub/IMDG.pdf
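As I understand it, `result['sources']` is a single string that the chain extracts from the model's "SOURCES" line. Every chunk comes from the same PDF, so this field only ever names `IMDG.pdf`, and page-level detail has to come from `result['source_documents']`. If several files are indexed later, a small helper can split and de-duplicate the string. The helper and the example path are hypothetical, and a file name that contains a comma would break this simple split.

```python
def parse_sources(sources: str) -> list[str]:
    """Split a comma-separated SOURCES string into unique entries, keeping first-seen order."""
    unique: list[str] = []
    for part in sources.split(","):
        name = part.strip()
        if name and name not in unique:
            unique.append(name)
    return unique


print(parse_sources("/path/to/IMDG.pdf, /path/to/IMDG.pdf"))
# ['/path/to/IMDG.pdf']
```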


In [None]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

system_template="""
You are an IMDG Code specialist with a deep understanding of the International Maritime Dangerous Goods (IMDG) Code. Provide detailed information in response to the user's questions.

Use the following pieces of context to answer the user's question in detail.

Given the following summaries of a long document and a question, create a final answer with references ("SOURCES"). Use "SOURCES" in capital letters regardless of the number of sources.

Provide the information clearly and concisely, in a way that is easy to understand.

Use bullet points to present the information in an organized manner.

If the question is not clear, ask the user to clarify it.

If the question asks about a four-digit UN Number (UN No.), answer with reference to pages 572 to 913 of the PDF file.

If the question is asked in Korean, answer in Korean.

If you don't know the answer, just say "I don't know"; don't try to make up an answer.
----------------
{summaries}
"""


messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template("{question}")
]

prompt = ChatPromptTemplate.from_messages(messages)
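As I understand it, the with-sources "stuff" chain fills `{summaries}` with the retrieved chunks and `{question}` with the user's query. The custom prompt passed through `chain_type_kwargs` therefore has to expose exactly those variables. `ChatPromptTemplate.from_template` treats text as an f-string template by default, so any literal braces added to the instructions later would need to be doubled. The quick check below uses Python's `string.Formatter`, which approximates but is not identical to LangChain's own parser. The helper name and the sample strings are hypothetical.

```python
from string import Formatter


def template_variables(template: str) -> set[str]:
    """Return the {placeholder} names in an f-string-style template; doubled braces are literals."""
    return {field_name for _, field_name, _, _ in Formatter().parse(template) if field_name}


# Shortened stand-ins for the system and human templates above.
system_part = 'Answer with "SOURCES".\n----------------\n{summaries}\n'
human_part = "{question}"
print(sorted(template_variables(system_part) | template_variables(human_part)))
# ['question', 'summaries']
```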

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

chain_type_kwargs = {"prompt": prompt}

# If you have access to GPT-4, you can use gpt-4 instead of gpt-3.5-turbo.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, max_tokens=500)

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever = retriever,
    return_source_documents=True,
    chain_type_kwargs=chain_type_kwargs
)



In [None]:
query = "Tell me about marine pollutants"
result = chain.invoke({"question": query})
print(result)

{'question': 'Tell me about marine pollutants', 'answer': 'Marine pollutants are substances that are subject to the provisions of Annex III of MARPOL, as amended. These substances are regulated under the International Maritime Dangerous Goods (IMDG) Code to prevent harm to the marine environment during transportation by sea. Here is detailed information about marine pollutants:\n\n- **Definition**: Marine pollutants are substances identified in Annex III of MARPOL, which includes substances harmful to the marine environment.\n- **Transportation**: Marine pollutants must be transported according to the provisions of Annex III of MARPOL, as amended.\n- **Identification**: The IMDG Code uses the symbol "MP" to indicate substances, materials, and articles that are marine pollutants.\n- **Classification**: Marine pollutants are classified according to their properties within classes 1 to 8. If not falling within these classes, they are transported as "ENVIRONMENTALLY HAZARDOUS SUBSTANCE, SO

In [None]:
print(result['answer'])

Marine pollutants are substances that are subject to the provisions of Annex III of MARPOL, as amended. These substances are regulated under the International Maritime Dangerous Goods (IMDG) Code to prevent harm to the marine environment during transportation by sea. Here is detailed information about marine pollutants:

- **Definition**: Marine pollutants are substances identified in Annex III of MARPOL, which includes substances harmful to the marine environment.
- **Transportation**: Marine pollutants must be transported according to the provisions of Annex III of MARPOL, as amended.
- **Identification**: The IMDG Code uses the symbol "MP" to indicate substances, materials, and articles that are marine pollutants.
- **Classification**: Marine pollutants are classified according to their properties within classes 1 to 8. If not falling within these classes, they are transported as "ENVIRONMENTALLY HAZARDOUS SUBSTANCE, SOLID, N.O.S., UN 3077" or "ENVIRONMENTALLY HAZARDOUS SUBSTANCE, L

In [None]:
print(result['source_documents'])

[Document(page_content='Chapter 2.10 \nMarine pollutants \n2.10.1 Definition \n2.10.2 \n2.10.2.1 \n2.10.2.2 \n2.10.2.3 \n2.10.2.4 \n2.10.2.5 \n2.10.2.6 \n2.10.2.7 \n2.10.3 \n2.10.3.1 \n2.10.3.2 \n144 Maríne pollutants means substances which are subject to the provisions of Annex 111 of MARPOL, as amended. \nGeneral provisions \nMarine pollutants shall be transported under the provisions of Annex 111 of MARPOL , as amended. \nThe Index indicates by the symb이 p in the column headed MP those substances , materials and articles that \nare identified as marine p이lutants. \nMarine pollutants shall be transported under the appr야xiate entry according to their properties if they fall \nwithin the criteria of any of the classes 1 to 8. If they do not fall within the criteria of any of these classes, \nthey shall be transported under the entry: ENVIRONMENTALLY HAZARDOUS SUBSTANCE , SOLlD, N.O.S., \nUN 3077 or ENVIRONMENTALLY HAZARDOUS SUBSTANCE , LlQUID, N.O.S., UN 3082, as appropriate , \nunless

In [None]:
# Print the retrieved source chunks in a tidy format.
for doc in result['source_documents']:
    print('Content : ' + doc.page_content[0:100].replace('\n', ' '))
    print('File : ' + doc.metadata['source'])
    print('Page : ' + str(doc.metadata['page']))

Content : Chapter 2.10  Marine pollutants  2.10.1 Definition  2.10.2  2.10.2.1  2.10.2.2  2.10.2.3  2.10.2.4  
File : /Users/kenny_jung/Documents/GitHub/IMDG.pdf
Page : 156
Content : Part 5 -Consignment procedures  .5 Marine pollutants: Except as provided in 2.10.2.7, if the goods t
File : /Users/kenny_jung/Documents/GitHub/IMDG.pdf
Page : 300
Content : Part 5 -Consignment procedures  The marine pollutant mark shall be as shown in the figure below. 5.2
File : /Users/kenny_jung/Documents/GitHub/IMDG.pdf
Page : 282
Content : Chapter 1.1 -General provísions  Regulation 2  Application  The carriage of harmful substances is pr
File : /Users/kenny_jung/Documents/GitHub/IMDG.pdf
Page : 19
Content : Part 2 -C/assification  For the purposes of this section,  Substance means chemical elements and the
File : /Users/kenny_jung/Documents/GitHub/IMDG.pdf
Page : 146
Content : Preamble  Carriage of dangerous goods by sea is regulated in order to reasonably prevent injury to p
File : /Users/kenny_jung/Documents/GitHub/IMDG.pdf
Page : 11
Content : Note: F
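The `page` values above come from `PyPDFLoader` metadata, which, to my understanding, counts pages from 0. That would make "156" the 157th page of the PDF. The value also differs from the page number printed in the book itself: the first chunk's text contains "144". Below is a sketch that groups retrieved chunks by file and lists 1-based PDF page numbers. It assumes 0-based `page` metadata and uses plain dicts in place of LangChain `Document` objects. The helper name is hypothetical.

```python
def display_pages(docs_meta: list[dict]) -> list[str]:
    """Group chunks by source file and list sorted, de-duplicated 1-based PDF page numbers.

    Assumes each dict has 'source' and a 0-based 'page', as in the output above.
    """
    by_source: dict[str, set[int]] = {}
    for meta in docs_meta:
        by_source.setdefault(meta["source"], set()).add(meta["page"] + 1)
    return [
        f"{source} (PDF pp. {', '.join(str(p) for p in sorted(pages))})"
        for source, pages in by_source.items()
    ]


# Page values taken from the output above; the path is shortened to "IMDG.pdf".
retrieved = [{"source": "IMDG.pdf", "page": p} for p in (156, 300, 282, 19, 146, 11)]
print(display_pages(retrieved))
# ['IMDG.pdf (PDF pp. 12, 20, 147, 157, 283, 301)']
```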

In [None]:
import gradio as gr

# Define the function that handles the chatbot's responses.
def respond(user_input_message, chatbot_ui):

    # Run the user's message through the chain.
    result = chain.invoke({"question": user_input_message})
    ai_respond_message = result['answer'] + '\n\n'

    # Append a numbered citation (file and page) for each retrieved chunk.
    for i, doc in enumerate(result['source_documents']):
        ai_respond_message += '[' + str(i+1) + '] ' + doc.metadata['source'] + ' (' + str(doc.metadata['page']) + ') '

    # Add the user's message and the bot's reply to the chat history.
    chatbot_ui.append((user_input_message, ai_respond_message))

    # Clear the input box and return the updated chat history.
    return "", chatbot_ui


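# Build the interface with gr.Blocks().
with gr.Blocks() as demo:

    # Chatbot component labelled 'Chat'.
    chatbot_ui = gr.Chatbot(label="Chat")

    # Textbox labelled 'Input'.
    user_input_message = gr.Textbox(label="Input")

    # Button labelled 'Reset' for clearing the chat history.
    clear_button = gr.Button("Reset")

    # Submitting a message in the textbox calls respond().
    user_input_message.submit(respond, [user_input_message, chatbot_ui], [user_input_message, chatbot_ui])

    # Clicking 'Reset' replaces the chat history with an empty list.
    clear_button.click(lambda: [], None, chatbot_ui, queue=False)


# Launch the interface.
# Users can write and submit messages in the 'Input' textbox,
# and clear the chat history with the 'Reset' button.
demo.launch(share=True, debug=True)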

Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://160714e5a8931d4b81.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://160714e5a8931d4b81.gradio.live
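`respond()` calls the OpenAI-backed `chain` directly, which makes the UI logic hard to try without an API key. One possible design, offered as a sketch and not the notebook's code, is to inject the chain as a plain callable so that a stub can stand in during testing. `FakeDoc`, `make_respond`, and `fake_chain` are all hypothetical. This variant also returns a new history list instead of mutating the one Gradio passes in.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document: only the fields respond() reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def make_respond(run_chain):
    """Build a Gradio-style respond(message, history) around any callable that returns the chain's result dict."""
    def respond(user_input_message, history):
        result = run_chain(user_input_message)
        # Numbered citations, one per retrieved chunk: "[1] file (page)".
        citations = " ".join(
            f"[{i}] {doc.metadata['source']} ({doc.metadata['page']})"
            for i, doc in enumerate(result["source_documents"], start=1)
        )
        reply = f"{result['answer']}\n\n{citations}" if citations else result["answer"]
        # Empty string clears the textbox; the new history includes this exchange.
        return "", history + [(user_input_message, reply)]
    return respond


def fake_chain(question):
    """Stub that mimics the keys respond() uses from the real chain's output."""
    return {
        "answer": f"Echo: {question}",
        "source_documents": [FakeDoc("...", {"source": "IMDG.pdf", "page": 156})],
    }


demo_respond = make_respond(fake_chain)
print(demo_respond("Tell me about class 3", []))
```

In the notebook, the real chain could be plugged in with `make_respond(lambda q: chain.invoke({"question": q}))`.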


