# TERMO: TERM extractiOn from scientific literature

In [3]:
from termo import Termo
import pandas as pd  # for visualization purposes

In [4]:
text = '''
The Calvin cycle, light-independent reactions, bio synthetic phase, dark reactions, or photosynthetic carbon reduction (PCR) cycle of photosynthesis 
is a series of chemical reactions that convert carbon dioxide and hydrogen-carrier compounds into glucose. The Calvin cycle is present in all 
photosynthetic eukaryotes and also many photosynthetic bacteria. In plants, these reactions occur in the stroma, the fluid-filled region of a 
chloroplast outside the thylakoid membranes. These reactions take the products (ATP and NADPH) of light-dependent reactions and perform further 
chemical processes on them. The Calvin cycle uses the chemical energy of ATP and reducing power of NADPH from the light dependent reactions to 
produce sugars for the plant to use. These substrates are used in a series of reduction-oxidation (redox) reactions to produce sugars in a step-wise 
process; there is no direct reaction that converts several molecules of CO2 to a sugar. There are three phases to the light-independent reactions, 
collectively called the Calvin cycle: carboxylation, reduction reactions, and ribulose 1,5-bisphosphate (RuBP) regeneration.
'''

model = 'llama3.1:70b'
model_params = {
    'temperature': 0.0,
    'num_ctx': 512,
}

# 1. Extract terms

This step is required by all subsequent extractions: the acronym, definition, and relationship steps each rely on the terms extracted here.

In [9]:
termo = Termo(text, backend='ollama')
terms = termo.extract_terms(model=model, options=model_params)
df = pd.DataFrame(terms, columns=['Term', 'Start Index', 'End Index', 'Sentence Index'])
df

,Term,Start Index,End Index,Sentence Index
0,Calvin cycle,6,18,1
1,Light-independent reactions,20,47,1
2,Bio synthetic phase,49,68,1
3,Dark reactions,70,84,1
4,Photosynthetic carbon reduction (PCR) cycle,89,132,1
5,Photosynthesis,136,150,1
6,Carbon dioxide,199,213,1
7,Hydrogen-carrier compounds,218,244,1
8,Glucose,250,257,1
9,Calvin cycle,263,275,2
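
The start and end offsets above can be sanity-checked independently of the library. The following is a minimal sketch, not part of Termo: `find_term_spans` is a hypothetical helper that locates each term in a text with a case-insensitive search and returns end-exclusive character offsets. Termo's own offset convention is not documented in this notebook, so the numbers it reports may differ from a plain search (for example, because of leading whitespace in the input).

```python
import re

def find_term_spans(text, terms):
    """Return (term, start, end) for every case-insensitive occurrence of each term.

    Hypothetical helper for checking extracted spans; not Termo's implementation.
    Offsets are 0-based and end-exclusive, so text[start:end] is the matched string.
    """
    spans = []
    for term in terms:
        # re.escape keeps characters such as "(" or "," in a term literal
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            spans.append((term, match.start(), match.end()))
    # Order the spans by where they occur in the text
    return sorted(spans, key=lambda span: span[1])

sample = "The Calvin cycle is present in all photosynthetic eukaryotes. The Calvin cycle uses ATP."
spans = find_term_spans(sample, ["Calvin cycle", "ATP"])
for term, start, end in spans:
    print(term, start, end, repr(sample[start:end]))
```

Slicing the text with the returned offsets gives back the matched string, which makes mismatches between a reported span and the underlying text easy to spot.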


# 2. Acronym extraction

In [14]:
# Attach the extracted terms to the Termo object
# so that the extracted acronyms can be matched against them
termo['terms'] = terms

acronyms = termo.extract_acronyms(model=model, max_length_split=2000, options=model_params)
df = pd.DataFrame(list(acronyms.items()), columns=['Acronym', 'Term'])
df

Removing acronym 'NADPH':'Not explicitly defined in the vocabulary but commonly known as Nicotinamide adenine dinucleotide phosphate' because it is not in the text
Removing acronym 'ATP':'Adenosine triphosphate' because it is not in the text


,Acronym,Term
0,PCR,Photosynthetic carbon reduction
1,RuBP,"Ribulose 1,5-bisphosphate"
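
The log messages above indicate that candidate acronyms are dropped when their expansion does not occur in the text. How Termo implements this check is not shown here; the sketch below only illustrates the general idea with a hypothetical helper, `keep_acronyms_in_text`, using a simple case-insensitive substring test.

```python
def keep_acronyms_in_text(acronyms, text):
    """Keep acronym -> expansion pairs whose expansion occurs in the text.

    Hypothetical illustration of the filtering suggested by the log output;
    Termo's actual matching rules may differ.
    """
    lowered = text.lower()
    kept = {}
    for acronym, expansion in acronyms.items():
        if expansion.lower() in lowered:
            kept[acronym] = expansion
        else:
            print(f"Dropping {acronym!r}: expansion not found in text")
    return kept

sample_text = (
    "The photosynthetic carbon reduction (PCR) cycle uses ATP and NADPH "
    "and ends with ribulose 1,5-bisphosphate (RuBP) regeneration."
)
candidates = {
    "PCR": "Photosynthetic carbon reduction",
    "RuBP": "Ribulose 1,5-bisphosphate",
    "ATP": "Adenosine triphosphate",  # expansion is never spelled out in the text
}
kept = keep_acronyms_in_text(candidates, sample_text)
print(kept)
```

A filter of this kind explains why ATP and NADPH are absent from the table: the source text uses them only in abbreviated form.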


# 3. Definitions extraction

In [16]:
# Attach the extracted terms to the Termo object
# so that a definition is extracted for each term
termo['terms'] = terms

definitions = termo.extract_definitions(model=model, max_length_split=2000, options=model_params)
df = pd.DataFrame(list(definitions.items()), columns=['Term', 'Definition'])
df

Removing definitions for 'ATP (Adenosine triphosphate)' because unknown term
Removing definitions for 'NADPH (Nicotinamide adenine dinucleotide phosphate)' because unknown term


,Term,Definition
0,Light-independent reactions,Reactions that convert carbon dioxide and hydr...
1,Glucose,A product of light-independent reactions.
2,Carboxylation,Part of the Calvin cycle process.
3,Stroma,Part of a chloroplast where light-independent ...
4,Sugars,Products of photosynthesis.
5,Reduction-oxidation (redox) reactions,Type of reaction involved in converting carbon...
6,Eukaryotes,Organisms that have chloroplasts.
7,Reduction reactions,Part of the process of converting carbon dioxi...
8,Chloroplast,Organelle where photosynthesis takes place.
9,Thylakoid membranes,Part of a chloroplast where light-dependent re...
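
Not every extracted term necessarily receives a definition, and the same term can appear several times in the term list. As a rough sketch, assuming the definitions are a plain `dict` mapping term to definition (as the `definitions.items()` call suggests), the hypothetical helper `attach_definitions` below pairs each distinct term with its definition, or `None` when there is none.

```python
def attach_definitions(term_names, definitions):
    """Pair each distinct term with its definition, or None if it has none.

    Hypothetical helper; matching is case-insensitive and the first-seen
    order of the terms is preserved.
    """
    by_key = {name.casefold(): text for name, text in definitions.items()}
    # dict.fromkeys removes duplicate terms while keeping their order
    unique_terms = dict.fromkeys(term_names)
    return [(name, by_key.get(name.casefold())) for name in unique_terms]

term_names = ["Calvin cycle", "Glucose", "Calvin cycle", "Carboxylation"]
sample_definitions = {
    "Glucose": "A product of light-independent reactions.",
    "Carboxylation": "Part of the Calvin cycle process.",
}
pairs = attach_definitions(term_names, sample_definitions)
for name, definition in pairs:
    print(f"{name}: {definition or '(no definition extracted)'}")
```

Listing the terms that come back with `None` is a quick way to see which ones the model did not define.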


# 4. Relationships extraction

In [18]:
# Attach the extracted terms to the Termo object
# so that relationships are extracted between the given terms
termo['terms'] = terms

relationships = termo.extract_relationships(model=model, max_length_split=2000, options=model_params)
df = pd.DataFrame(relationships, columns=['Term1 ->', 'Relationship ->', 'Term2'])
df

,Term1 ->,Relationship ->,Term2
0,Calvin cycle,uses,ATP
1,Calvin cycle,uses,NADPH
2,Light-dependent reactions,produce,ATP
3,Light-dependent reactions,produce,NADPH
4,Photosynthesis,involves,Light-independent reactions
5,Chloroplast,contains,Thylakoid membranes
6,Stroma,is part of,Chloroplast
7,Eukaryotes,include,Bacteria
8,RuBP,is involved in,Carboxylation
9,Reduction-oxidation (redox) reactions,occur during,Calvin cycle
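
The relationship triples lend themselves to a graph view. The sketch below assumes the relationships are an iterable of `(term1, relationship, term2)` triples, which is consistent with the three-column DataFrame built above but is not confirmed by the library's documentation here. `group_by_subject` is a hypothetical helper that groups the outgoing edges of each term.

```python
from collections import defaultdict

def group_by_subject(triples):
    """Group (subject, relation, object) triples into subject -> [(relation, object), ...].

    Hypothetical helper for exploring extracted relationships as a small graph.
    """
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)

# A few rows copied from the table above
triples = [
    ("Calvin cycle", "uses", "ATP"),
    ("Calvin cycle", "uses", "NADPH"),
    ("Stroma", "is part of", "Chloroplast"),
]
graph = group_by_subject(triples)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} --{relation}--> {obj}")
```

Grouping by subject makes it easier to review all claims about one term together, which helps when checking model output for errors.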
