# LLM for named entity recognition and relationship extraction

In [2]:
from utils import *
from json import JSONDecodeError
import cohere
from cohere import CohereError

# Add your Cohere API key here (a plain string, not a list)
CO_KEY = 'YOUR_COHERE_KEY_HERE'
co = cohere.Client(CO_KEY)
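A hard-coded key is easy to leak when the notebook is shared. A minimal alternative is to read the key from an environment variable. The helper `load_cohere_key` and the variable name `COHERE_API_KEY` below are hypothetical choices for illustration, not something the notebook or the Cohere SDK requires.

```python
import os


def load_cohere_key(var_name: str = "COHERE_API_KEY") -> str:
    """Read the Cohere API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Cohere API key")
    return key


# Usage, instead of the hard-coded key above:
# co = cohere.Client(load_cohere_key())
```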

In [3]:
# A list of sentence strings extracted in the notebook 1_sentence_extraction.ipynb
sentences = ["LOAD_HERE_THE_EXTRACTED_SENTENCES"]

Here we build the initial prompt from a few hand-labelled examples. Each new sentence is appended to the end of this prompt, and the result is the final prompt sent to the LLM.

In [1]:
init_prompt = """You are a machine extracting the phase, property, and relationship between the two from sentences extracted from the materials science literature. The relationship can be either positive or negative. Your response must contain all three in the following format:
Phase:
Property:
Relationship:
Below are some examples:

Sentence: The significant improvement in tensile properties is attributed to the dissolution of network-like T phases and solid solution strengthening.
Phase: T phases
Property: tensile properties
Relationship: positive
--
Sentence: In addition, the length of the β-particles is also detrimental to the ultimate tensile strength, as it allows for easy crack propagation along the particle.
Phase: β-particles
Property: ultimate tensile strength
Relationship: negative
--
Sentence: Mudgal et al. [15] studied the corrosion properties of D gun sprayed Cr3C2-NiCr coatings and found that chromium oxide and NiCr2O3 phases were formed on the coating structure and that exhibited better corrosion resistance.
Phase: NiCr2O3 phases
Property: corrosion resistance
Relationship: positive
--
Sentence: Increases in wear resistance (less mass loss) are due to the protective nature of reinforcement particles as hard and high strength load bearing components of the composite and hard abrasion resistance nature of surface oxides.
Phase: reinforcement particles
Property: wear resistance
Relationship: positive
--
Sentence: After solution treatment at 470 degC for 24 h, these T-phases were mostly dissolved into the α-Al matrix, resulting in a remarkable increase in elongation and strength.
Phase: T-phases
Property: elongation && strength
Relationship: positive
--
Sentence: The high-resolution transmission electron microscopy (TEM) image reveals the wrapping of microstructural silicon by rGO, which inhibits the growth of primary silicon and improves the interfacial strengthening.
Phase: rGO
Property: interfacial strength
Relationship: positive
--
Sentence: A clear decrease in hardness is observed in the HAZ due to the partial dissolution of hardening phase (β'' and η' respectively) in the two grades with growth of non-hardening phase (β' and η respectively) during the cooling leading to solute depletion in aluminium matrix.
Phase: β'' && η'
Property: hardness
Relationship: negative
--
Sentence: Such an increase has also been seen in FSW joints, where it was coupled to increasing dissolution of non-hardening precipitates leaving increasing amounts of solutes in solid solution, which in turn gave increasing hardness during natural ageing after cooling [41,44].
Phase: non-hardening precipitates
Property: hardness
Relationship: positive
--
Sentence: [87] show that an increase in volume fraction of ceramic reinforcements reduces ductility and fracture toughness.
Phase: ceramic reinforcements
Property: fracture toughness
Relationship: negative
--
Sentence: At the same time, in the multi-pass tube fabrication process, Laves phase particles could suppress grain coarsening rates during inter-pass annealing, helping obtain desirable microstructures for balanced deformability and the final properties of tube products.
Phase: Laves phase particles
Property: deformability && final properties
Relationship: positive
--
Sentence: It was observed that the formation of the heusler phase Ni2AlHf in hafnium reinforced eutectic alloys NiAl-Cr(Mo)-0.1Hf improves strength at elevated temperatures; however, this strength is enhanced to the detriment of the room temperature ductility of the intermetallic [44].
Phase: Ni2AlHf
Property: strength
Relationship: positive
--
Sentence: Li XZ, Hansen V, Gjonnes J et al (1999) HREM study and structure modeling of the η' phase, the hardening precipitates in commercial Al-Zn-Mg alloys.
Phase: η' phase
Property: hardness
Relationship: positive
--
Sentence: The formation of slightly large-sized Al12(Fe, V)3Si (30-110 nm) and coarser AlmFe (100-400 nm) phases in EBM built sample could cause a reduction in tensile strength in contrast to PFC products.
Phase: Al12(Fe, V)3Si && AlmFe
Property: tensile strength
Relationship: negative
--
Sentence: Although the strength of materials decreased slightly, the impact toughness of materials significantly increased due to the positive effect of the intragranular equilibrium η phase.
Phase: η phase
Property: impact toughness
Relationship: positive
--
Sentence: Spray forming of the hypereutectic aluminium-silicon alloy allows introduction of the strengthening Fe and Si components into the material.
Phase: Fe && Si
Property: strength
Relationship: positive
--
Sentence: However, silicon carbide reinforced aluminium matrix composites are reported to be more susceptible to localized corrosion attack than their monolithic counterpart 3.
Phase: silicon carbide reinforced aluminium matrix composites
Property: corrosion
Relationship: positive
--
Sentence: However, conventional techniques of processing of these materials lead to coarse and segregated microstructures with long plates of intermetallic transition metal compounds that give rise to inferior properties.
Phase: transition metal compounds
Property: inferior properties
Relationship: positive
--
Sentence: The X-ray diffraction (XRD) pattern in Fig.1(b) reveals the principal strengthening precipitate δ' (Al3Li) together with other precipitates such as T1 (Al2CuLi), T2 (Al6Li3Cu), S' (Al2CuMg), θ' (Al2Cu) and β' (Al3Zr) in the as-received parent material.
Phase: δ' (Al3Li)
Property: strength
Relationship: positive
--
Sentence: The addition of clay to silica-based sol-gel films protects aluminium from corrosion as reported by Dalmoro et al. The integration of zirconia nanoparticles into a hybrid matrix enhances several material functionalities because of its high mechanical strength, temperature resistance and chemical stability.
Phase: zirconia nanoparticles
Property: mechanical strength && temperature resistance && chemical stability
Relationship: positive
--
Sentence: Primary Al13Fe4 grows into coarse flakes, needles and laths along a favorable orientation, severely cracking the matrix and negatively affecting its properties.
Phase: Al13Fe4
Property: properties
Relationship: negative
--
Sentence: The attractive properties of this alloy are due to the presence of several metastable strengthening precipitates such as GP zones, δ′ (Al3Li), T1 (Al2CuLi), θ′ (Al2Cu), Ω (Al2Cu) and S′ (Al2CuMg).
Phase: GP zones && δ′ (Al3Li) && T1 (Al2CuLi) && θ′ (Al2Cu) && Ω (Al2Cu) && S′ (Al2CuMg)
Property: strength
Relationship: positive
--
Sentence: The resistance of aluminium and its alloys against corrosion in aqueous media can be attributed to a rapidly formed surface oxide film which is composed of Al2O3, Al(OH)3 and AlO(OH) phases.
Phase: Al2O3 && Al(OH)3 && AlO(OH)
Property: corrosion resistance
Relationship: positive
--
Sentence: Li et al. expressed that Mn, Fe and Si can form fine particles of AlMn, AlMnSi and Al(Mn,Fe)Si type precipitates along with aluminum solid solution which aid to work hardening and dislocations accumulation during deformations under high and ultra-high strains.
Phase: AlMn && AlMnSi && Al(Mn,Fe)Si
Property: work hardening
Relationship: positive"""
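Because the few-shot prompt is one long string, it can help to check its structure before spending API calls. The sketch below uses two hypothetical helpers (not part of the notebook or `utils`): `build_prompt` reproduces the prompt assembly used in the generation loop, and `labelled_examples` pulls the (phase, property, relationship) triples back out of the `--`-separated examples.

```python
import re

# One "Label: value" line inside an example block
LABEL_RE = re.compile(r"^(Phase|Property|Relationship): *(.*)$", re.M)


def build_prompt(init_prompt: str, sentence: str) -> str:
    # Same "--" separator and field order as the hand-labelled examples;
    # ending on "Phase:" makes the model's completion start with the phase.
    return init_prompt + "\n--\nSentence: " + sentence + "\nPhase:"


def labelled_examples(init_prompt: str) -> list[tuple[str, str, str]]:
    """Return (phase, property, relationship) for every complete example."""
    examples = []
    for block in init_prompt.split("\n--\n"):
        # Later matches overwrite earlier ones, so the empty template lines
        # at the top of the first block are replaced by its example values.
        fields = dict(LABEL_RE.findall(block))
        if len(fields) == 3 and all(fields.values()):
            examples.append((fields["Phase"], fields["Property"], fields["Relationship"]))
    return examples


toy_prompt = """Extract the phase, property and relationship.
Phase:
Property:
Relationship:
Below are some examples:

Sentence: The dissolution of T phases improves tensile properties.
Phase: T phases
Property: tensile properties
Relationship: positive
--
Sentence: Long beta-particles reduce ultimate tensile strength.
Phase: beta-particles
Property: ultimate tensile strength
Relationship: negative"""

print(labelled_examples(toy_prompt))
print(build_prompt(toy_prompt, "Coarse Al13Fe4 flakes reduce ductility.").splitlines()[-2:])
```

Running `labelled_examples(init_prompt)` on the real prompt lists every hand-labelled triple, which makes it easy to review the labels or count positive versus negative examples.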

In [None]:
sentence_data = []

for sent in sentences:
    sent_data = {'sentence': sent}

    # Build the final prompt by appending the sentence to the few-shot prompt;
    # ending on "Phase:" makes the completion start with the phase
    prompt = init_prompt + "\n--\nSentence: " + sent + "\nPhase:"

    # Call the Cohere model
    try:
        response = co.generate(
            model='command-xlarge-beta',  # this model may no longer be available; swap in a current one
            prompt=prompt,
            max_tokens=100,
            temperature=0.5,
            stop_sequences=["--"])
    except CohereError as err:
        print(f"Skipping sentence after Cohere error: {err}")
        continue

    # Parse the completion, which looks like
    # " <phase>\nProperty: <property>\nRelationship: <relationship>\n--"
    answers = [s.split('\n')[0].strip() for s in response.generations[0].text.split(':')]
    answers += [''] * (3 - len(answers))  # pad if the model skipped a field
    sent_data['phase'] = answers[0]
    sent_data['property'] = answers[1]
    sent_data['relationship'] = answers[2]

    sentence_data.append(sent_data)
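The loop above splits the completion on every `:`, so a value that itself contains a colon, or a completion that skips a field, shifts the remaining fields. A label-based parser avoids this. The sketch below is a hypothetical `parse_completion` (not used by the notebook): it re-attaches the `Phase:` label that ended the prompt, drops anything from the `--` stop sequence onwards, and matches each field by name.

```python
import re

# One "Label: value" line; [ \t]* (not \s*) so an empty field cannot swallow the next line
FIELD_RE = re.compile(r"^[ \t]*(Phase|Property|Relationship)[ \t]*:[ \t]*(.*?)[ \t]*$", re.M)


def parse_completion(completion: str) -> dict:
    """Parse the completion of a prompt that ended with 'Phase:'."""
    # Put back the "Phase:" label from the prompt and cut at the "--" separator
    text = "Phase:" + completion.split("--")[0]
    parsed = {"phase": "", "property": "", "relationship": ""}
    for label, value in FIELD_RE.findall(text):
        key = label.lower()
        if not parsed[key]:  # keep the first non-empty value for each field
            parsed[key] = value
    return parsed


example = " T phases\nProperty: tensile properties\nRelationship: positive\n--"
print(parse_completion(example))
```

Inside the loop, `sent_data.update(parse_completion(response.generations[0].text))` could then replace the three positional assignments.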

## Splitting the resulting entities
Some sentences mention multiple phases and/or properties at the same time. We split those using a mix of our convention (separating entities with &&) and the common way of writing lists (A, B, and C).

In [None]:
sentence_data_split = split_entities(sentence_data)
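`split_entities` appears to come from `utils`, and its code is not shown here. As a rough illustration of the splitting rule described above, the hypothetical `split_entity` below splits a field on `&&` and on top-level commas and "and", while leaving commas inside parentheses alone so formulas such as Al12(Fe, V)3Si (from the prompt examples) stay whole. The hypothetical `expand_record` then assumes one output row per (phase, property) pair. Neither is the notebook's implementation.

```python
import re
from itertools import product


def _split_commas_outside_parens(text: str) -> list[str]:
    """Split on commas that are not inside (...) or [...]."""
    pieces, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def split_entity(text: str) -> list[str]:
    """Split one extracted field into individual entities (hypothetical helper)."""
    entities = []
    for chunk in text.split("&&"):                       # our explicit separator
        for piece in _split_commas_outside_parens(chunk):  # "A, B, C"
            for part in re.split(r"\s+and\s+", piece):   # "A and B", ", and C"
                part = re.sub(r"^and\s+", "", part.strip())
                if part:
                    entities.append(part)
    return entities


def expand_record(record: dict) -> list[dict]:
    # Assumption: one row per (phase, property) combination
    return [
        {**record, "phase": ph, "property": pr}
        for ph, pr in product(split_entity(record["phase"]), split_entity(record["property"]))
    ]


print(split_entity("Al12(Fe, V)3Si && AlmFe"))
print(split_entity("ductility, strength, and hardness"))
```

Splitting on every " and " is crude: a phase written as "Cr and Mo carbides" would be broken in two, so real data may need extra rules.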

# Evaluating LLM extraction
To evaluate the named entity recognition (NER) and relationship extraction (RE) performed by the LLM, a manually labelled set of entities is provided in "data/manually_labelled_entities.npy".
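The scoring used by the notebook is not shown in this section. One simple option, sketched below under the assumption that each record stores the LLM output under `phase`/`property`/`relationship` and the manual label under `true_<field>`, is per-field exact-match accuracy after normalising case and whitespace. The helpers `normalise` and `field_accuracy` are hypothetical, and the two toy records are adapted from entries in the labelled file.

```python
def normalise(text) -> str:
    # Assumed matching rule: ignore case and repeated whitespace
    return " ".join(str(text).lower().split())


def field_accuracy(records: list, field: str) -> float:
    """Fraction of records whose LLM value for `field` exactly matches the
    manual label stored under 'true_<field>' (after normalisation)."""
    labelled = [r for r in records if f"true_{field}" in r]
    if not labelled:
        return float("nan")
    hits = sum(normalise(r.get(field, "")) == normalise(r[f"true_{field}"]) for r in labelled)
    return hits / len(labelled)


records = [
    {"phase": "silicon plates", "true_phase": "Si",
     "relationship": "bad", "true_relationship": "bad"},
    {"phase": "fine precipitates", "true_phase": "fine precipitates",
     "relationship": "good", "true_relationship": "good"},
]
for field in ("phase", "relationship"):
    print(field, field_accuracy(records, field))
```

Exact match is strict: "silicon plates" versus "Si" counts as a miss even though the two are closely related. Note also that the labelled file uses `good`/`bad` for relationships while the prompt asks for `positive`/`negative`, so fresh LLM output would need mapping before comparison, e.g. `field_accuracy(list(np.load("data/manually_labelled_entities.npy", allow_pickle=True)), "phase")`.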

In [5]:
import numpy as np

print(np.load("data/manually_labelled_entities.npy", allow_pickle=True)[:10])

[{'sentence': 'Alloys exhibiting this microstructure show poor ductility due to the large and brittle silicon plates.', 'phase': 'silicon plates', 'property': 'ductility', 'relationship': 'bad', 'true_phase': 'Si', 'true_property': 'ductility', 'true_relationship': 'bad', 'ture_phase': 'Si phase'}
 {'sentence': '1 - 3 However, at high temperatures, matrix/SiCp interfaces favor MgO formation, promoting premature failure and severe brittleness.', 'phase': 'matrix/SiCp interfaces', 'property': 'brittleness', 'relationship': 'bad', 'true_phase': 'MgO', 'true_property': 'brittleness', 'true_relationship': 'bad'}
 {'sentence': 'Homogeneously distributed fine precipitates appear to improve elongation, both to ultimate tensile strength and to failure in most cases.', 'phase': 'fine precipitates', 'property': 'elongation', 'relationship': 'good', 'true_phase': 'fine precipitates', 'true_property': 'elongation', 'true_relationship': 'good'}
 {'sentence': 'For fine Al-Cu-Mg matrix powders (FNP gr