*Copyright - L. Siddharth, Singapore University of Technology and Design* (siddharthl.iitrpr.sutd@gmail.com)

__Short guide to extracting design knowledge from patent text__
based on the following research:
Siddharth, L., Luo, J., 2024. Retrieval-Augmented Generation using Engineering Design Knowledge. (cs.CL) https://arxiv.org/abs/2307.06985

__Package Installation__
Please install the following packages in the desired Python environment.

In [1]:
from IPython.display import clear_output

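# Quote the extras specifier so that shells such as zsh do not expand the brackets.
!pip install "spacy[transformers]"
# The PyPI distribution "patool" provides the "patoolib" module.
!pip install spacy torch patool bs4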
!python -m spacy download en_core_web_sm
!python -m spacy download en_core_web_trf

clear_output()

__Module Import__
The package modules are imported as follows. The underlying trained transformer models are downloaded on first import.
Please ensure that the package folder "design_kgex" is placed in the current working directory.
The package can be downloaded from GitHub - *https://github.com/siddharthl93/engineering-design-knowledge*

In [2]:
from design_kgex import patent_text, design_knowledge

Downloading entity_relation_tagger.rar...


  0%|          | 0/434907286 [00:00<?, ?it/s]

INFO patool: Extracting entity_relation_tagger.rar ...
INFO patool: running "C:\Program Files\WinRAR\rar.EXE" x -- C:\Users\jonny\Desktop\edk\entity_relation_tagger.rar
INFO patool:     with cwd=design_kgex, input=
INFO patool: ... entity_relation_tagger.rar extracted to `design_kgex'.


Downloading relation_identifier.rar...


  0%|          | 0/435547086 [00:00<?, ?it/s]

INFO patool: Extracting relation_identifier.rar ...
INFO patool: running "C:\Program Files\WinRAR\rar.EXE" x -- C:\Users\jonny\Desktop\edk\relation_identifier.rar
INFO patool:     with cwd=design_kgex, input=
INFO patool: ... relation_identifier.rar extracted to `design_kgex'.


__Getting patent sentences__
First, let us use the module "patent_text" to get the list of formatted sentences from a patent, given its patent number. The patent number can be read from the Google Patents page of the patent, for example:
Card for textile fibers with carding cylinders cooperating in series - *https://patents.google.com/patent/US5974628A/*

The cell below inputs the patent number 7520356, whose text describes a wall climbing robot.

In [3]:
import random

patent_number = "7520356"
sentences = patent_text.getPatentText(patent_number)

print(f"The patent {patent_number} includes {len(sentences)} sentences. Some sample sentences are as follows.\n\n")
random.shuffle(sentences)

for sent in sentences[:5]:
    print(" - " + sent + "\n")

The patent 7520356 includes 209 sentences. Some sample sentences are as follows.


 - a hinge element fixed to said support frame, said hinge element adapted to couple with a hinge element of a second suction module, whereby said suction modules can be angularly oriented with respect to each other.

 - The impeller has an axis of rotation and is adapted to draw air from the vacuum chamber into the impeller in a direction generally parallel to the impeller axis of rotation.

 - Referring first to FIGS 1 and 2, the wall climbing robot 10 of the present invention generally includes at least two suction modules 11 and 12 pivotably connected together by a hinge assembly consisting of a bracket 13 and hinge 14 arrangement.

 - The flexible joint 52 also allows the robot to maneuver over uneven surfaces with obstacles.

 - The drive wheels 18 rotate about a single common axis, while the castor wheel 20 is permitted to rotate about multiple axes.
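
Note that `random.shuffle` reorders `sentences` in place, so a later slice such as `sentences[:3]` picks different sentences on each run. If reproducible samples or the original sentence order matter, a seeded sampler is one option. The following is a minimal sketch; the helper name `sample_sentences` and the seed value are illustrative assumptions and not part of the package.

```python
import random


def sample_sentences(sentences, k=5, seed=0):
    """Return up to k sentences chosen reproducibly, without reordering the input list."""
    rng = random.Random(seed)        # private generator; the global random state is untouched
    k = min(k, len(sentences))       # avoid ValueError when fewer than k sentences exist
    return rng.sample(sentences, k)  # sample() builds a new list; `sentences` is unchanged


# Stand-in list; in the notebook this would be the output of patent_text.getPatentText(...)
demo_sentences = [f"sentence {i}" for i in range(10)]
picked = sample_sentences(demo_sentences, k=3, seed=42)

for sent in picked:
    print(" - " + sent)
```

Calling the helper again with the same seed returns the same sample, which makes the printed examples repeatable across runs.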



__Extracting design knowledge__
Next, let us use the "design_knowledge" module to extract engineering design knowledge from the sentences thus obtained. The following code returns the knowledge as a list, in which each item is a dictionary in the following format.
{
    __"sentence"__: "...", 
    __"entities"__: 
    ["entity #1", 
     "entity #2"...
    ], 
    __"facts"__: 
    [
        ["head entity", "relationship", "tail entity"], 
        ["head entity", "relationship", "tail entity"]...
    ]
}
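
As a small illustration of this format, the sketch below walks over one such item and checks that every fact refers to entities listed in the same item. The sample item is an abridged version of the notebook output shown further below; the helper name `check_item` is hypothetical and not part of the package.

```python
def check_item(item):
    """Return the facts whose head or tail entity is missing from the item's entity list."""
    entities = set(item["entities"])
    return [
        fact
        for fact in item["facts"]
        if fact[0] not in entities or fact[2] not in entities
    ]


# Abridged sample item in the format described above.
item = {
    "sentence": "a hinge element fixed to said support frame, ...",
    "entities": ["a hinge element", "said support frame", "a second suction module"],
    "facts": [
        ["a hinge element", "fixed to", "said support frame"],
        ["a hinge element", "of", "a second suction module"],
    ],
}

# Each fact unpacks into head entity, relationship, and tail entity.
for head, relationship, tail in item["facts"]:
    print(f"{head} --[{relationship}]--> {tail}")

dangling = check_item(item)
print(f"{len(dangling)} fact(s) refer to unlisted entities")
```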

__Note__: It is preferable that the following code is executed in a GPU environment if several sentences are input.
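
When only a CPU is available and the sentence list is long, one option is to process the sentences in chunks and save each chunk's result, so that an interrupted run does not lose completed work. The sketch below assumes only that the extractor takes a list of sentences and returns a list of JSON-serialisable dictionaries in the format described above. The names `extract_in_chunks`, `chunk_size`, and the stand-in `fake_extract` are hypothetical; in the notebook, `design_knowledge.extractDesignKnowledge` would be passed as `extract_fn`.

```python
import json
import os
import tempfile


def extract_in_chunks(sentences, extract_fn, out_path, chunk_size=50):
    """Run extract_fn chunk by chunk, appending each result item to a JSON Lines file."""
    results = []
    with open(out_path, "a", encoding="utf-8") as handle:
        for start in range(0, len(sentences), chunk_size):
            chunk = sentences[start:start + chunk_size]
            for item in extract_fn(chunk):
                handle.write(json.dumps(item) + "\n")  # one knowledge item per line
                results.append(item)
            handle.flush()  # push the finished chunk to disk before starting the next
    return results


def fake_extract(chunk):
    """Stand-in extractor (assumption) that mimics the output format with empty results."""
    return [{"sentence": s, "entities": [], "facts": []} for s in chunk]


out_path = os.path.join(tempfile.mkdtemp(), "knowledge.jsonl")
results = extract_in_chunks(["s1", "s2", "s3"], fake_extract, out_path, chunk_size=2)
print(len(results), "items written to", out_path)
```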

In [4]:
example_sentences = sentences[:3]
extracted_knowledge = design_knowledge.extractDesignKnowledge(example_sentences)

for item in extracted_knowledge:
    for key in item:
        print(key, "-\n")
        print(item[key], "\n")
    print("\n---------------------\n")

Sorry, no GPU is available! Processing will be performed in normal time.


  0%|          | 0/3 [00:00<?, ?it/s]

sentence -

a hinge element fixed to said support frame, said hinge element adapted to couple with a hinge element of a second suction module, whereby said suction modules can be angularly oriented with respect to each other. 

entities -

['said suction modules', 'a second suction module', 'a hinge element', 'said hinge element', 'respect', 'said support frame'] 

facts -

[['a hinge element', 'fixed to', 'said support frame'], ['a hinge element', 'fixed', 'said hinge element'], ['said hinge element', 'adapted to couple with', 'a hinge element'], ['a hinge element', 'of', 'a second suction module'], ['a second suction module', 'whereby', 'said suction modules'], ['said suction modules', 'angularly oriented with', 'respect']] 


---------------------

sentence -

The impeller has an axis of rotation and is adapted to draw air from the vacuum chamber into the impeller in a direction generally parallel to the impeller axis of rotation. 

entities -

['rotation', 'the vacuum chamber', 'th

In the above output,

Entities are a subset of the noun phrases in the sentence. The models that we have trained, which are included in the package, identify the ones that communicate design knowledge.

Facts associate a pair of the listed entities using a relationship that is communicated in the sentence. Each fact is given in the form of a triple "head entity, relationship, tail entity".
The extracted facts constitute a graph that represents the design knowledge extracted from a list of sentences. To visualise the graph, various libraries such as networkx or vis.js can be used.
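
As a minimal sketch of this step, the code below merges the facts from several items into a list of labelled, directed edges using only the standard library. The function names `facts_to_edges` and `adjacency` and the `label` attribute are illustrative assumptions, not part of the package. The `(head, tail, {"label": relationship})` tuples follow the form accepted by networkx's `MultiDiGraph.add_edges_from`, should that library be used for drawing.

```python
from collections import defaultdict


def facts_to_edges(knowledge):
    """Collect (head, tail, {"label": relationship}) edges from a list of knowledge items."""
    edges = []
    for item in knowledge:
        for head, relationship, tail in item["facts"]:
            edges.append((head, tail, {"label": relationship}))
    return edges


def adjacency(edges):
    """Group outgoing (relationship, tail) pairs by head entity."""
    graph = defaultdict(list)
    for head, tail, data in edges:
        graph[head].append((data["label"], tail))
    return dict(graph)


# Sample facts taken from the notebook output above (abridged).
knowledge = [
    {
        "sentence": "a hinge element fixed to said support frame, ...",
        "entities": ["a hinge element", "said support frame", "a second suction module"],
        "facts": [
            ["a hinge element", "fixed to", "said support frame"],
            ["a hinge element", "of", "a second suction module"],
        ],
    }
]

edges = facts_to_edges(knowledge)
for head, neighbours in adjacency(edges).items():
    for relationship, tail in neighbours:
        print(f"{head} --[{relationship}]--> {tail}")

# Optional, if networkx is installed:
#   import networkx as nx
#   G = nx.MultiDiGraph()
#   G.add_edges_from(edges)
```

A multigraph is suggested here because the same pair of entities can be linked by more than one relationship across sentences.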

For any queries, please write to *siddharthl.iitrpr.sutd@gmail.com*.