# Second Step
This notebook takes the section summaries produced by the first notebook and iteratively condenses them into a single, shorter summary that covers the whole paper. You have to paste the sections into the strings yourself (they are not loaded automatically).
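
The iterative scheme amounts to a fold over the section list: a running summary is seeded from the first prompt, and each later section is merged into it in turn. The sketch below illustrates only that control flow. The `generate` callable and the `last_line` toy model are hypothetical stand-ins, and the shortened merge prompt is an assumption; the cells further down call the Llama 2 model with the full prompts instead.

```python
from typing import Callable, List


def iterative_summary(
    first_prompt: str,
    sections: List[str],
    generate: Callable[[str], str],
) -> str:
    """Fold a list of sections into one running summary.

    `generate` is a placeholder for any prompt -> text function; it is an
    assumption of this sketch, not part of the notebook.
    """
    # Seed the running summary from the first prompt (abstract + introduction).
    summary = generate(first_prompt).strip()
    for section in sections:
        # Each step sees the previous summary plus exactly one new section.
        merge_prompt = (
            "MOF-Detail-Extractions for previous sections:\n"
            f"{summary}\n\n"
            "Original Section of the paper:\n"
            f"{section.strip()}\n"
        )
        summary = generate(merge_prompt).strip()
    return summary


# Toy stand-in model: returns the last non-empty line of the prompt.
def last_line(prompt: str) -> str:
    return [line for line in prompt.splitlines() if line.strip()][-1]


result = iterative_summary("abstract", ["materials", "results"], last_line)
```

Because every step only receives the previous summary and one new section, the prompt length stays bounded no matter how many sections the paper has.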

In [None]:
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel
import torch
from tqdm import tqdm

In [None]:

base_model = "meta-llama/Llama-2-70b-chat-hf"
use_peft = False  # set to True to load the LoRA adapter weights below

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    device_map="auto",
)
if use_peft:
    # Wrap the base model with the fine-tuned LoRA adapter.
    model = PeftModel.from_pretrained(
        model,
        '/home/ws/kg2371/lora_weights/llama2_70b_8bit_mof_fold0_4epoch',
        device_map="auto",
    )

In [None]:
input_text_start = """Please read the provided "Original Section of the paper" about Metal-Organic Frameworks to understand the chemical content. Use this understanding to scan the original section for MOF-related and enzymatic reaction details and summarize them into a short text, incorporating all relevant details about MOFs and without information redundancies.

Original Section of the paper:
# Abstract
  The peroxidase enzyme from orange peel (Girrus sienes, a fruit waste) was immobilized within metal-organic framework (MOF) via self-assembled biomineralization method. The resulting OPP-MOF composite was synthesized by mixing 2-methylimidazole (24 mmol), zinc acetate (8 mmol) and orange peel extract at room temperature within 30 minutes. Characterization methods such as XRD, FT-IR, and SEM confirmed the synthesis of OPP-MOF. The half-life of OPP-MOF was determined to be 2.1 folds more thermally stable than free enzyme in the temperature range of 40-60 degC. Michaels constant (Ksd) and maximum rate (Vmax) values of OPP-MOF were evaluated by Michaels-Menten kinetics studies, showing higher Ksd and lower Vmax compared to native peroxidase. The immobilized OPP-MOF retained 48% residual activity after 6th cycle and showed no significant loss in activity till 18 days. The immobilized OPP-MOF was used for degradation of methylene blue (MB) and Congo red (CR) dye, and it was found to be more efficient and rapid.
## Introduction
  The peroxidase enzyme was extracted from orange peel and immobilized within a metal-organic framework (MOF), specifically ZIF-8, to form a hybrid composite called OPP-MOF. This composite showed enhanced thermal stability, storage stability, and reusability for multiple cycles. The immobilization of peroxidase within MOF also prevented the protein molecules from leaching out during application. The OPP-MOF composite was characterized using XRD, FT-IR, and SEM. The Michaelis-Menten kinetics parameters were determined for OPP-MOF and the free form of peroxidase. Thermo-kinetic studies were carried out to determine the thermal stability in terms of thermal deactivation constant (kd), half-life (t1/2) and deactivation energy (Ea). The OPP-MOF composite was able to degrade different dyes such as methylene blue (MB) and congo red (CR).

Notes:
  - Focus on removing redundancies.
  - The original structure (e.g. headings) is not important, if it makes sense you can restructure the contents.
  - Don't output anything like "in summary" or similar, only condense the information.
  - The volume specifications and components relating to the chemical processes are the most important and must always be kept.
  - Keep the balance between keeping everything MOF related in detail and still being as short.
  - Never ever drop ANY details regarding MOF details, parameters, and attributes!!!

MOF-Detail-Extraction with the found MOF-details of the "Original Section of the paper" merged into one text with all chemical details and MOF-related information included in detail but without redundancies. Includes all quantities (number of moles) and units (e.g. mmol, min, degC, g, cm, mL, mM, ppm, mg/mL, nm, µL, mol etc.) and exact names (e.g. Girrus sienes, a fruit waste etc.) and all composite, MOF, and chemical linker names, metal salt names, as well as enzyme and organism names, number of moles (quantities) all pH-values of the metal salts, linkers, and enzymes, chemical concentrations, chemical masses, and proteins along with everything related to the enzymatic reaction (reactants, products, enzyme solutions with all details etc.). Absolutely all number of moles, units, and exact names but also short and to the point. One coherent paragraph without any bullet-points:
"""

sections = [
"""## Materials
  The peroxidase enzyme was extracted from fresh unripe oranges (_Citrus sinensis_) and immobilized within a metal-organic framework (MOF) composite, named OPP-MOF. The MOF was synthesized using zinc acetate, guaiacol, 2-methylimidazole and hydrogen peroxide, purchased from HilMedia Laboratories Pvt. Ltd, Mumbai, India. All other reagents were purchased from Sigma Aldrich and used without further purification. The resulting OPP-MOF composite was used for the degradation of dyes such as methylene blue (MB) and congo red (CR).
### Preparation of peroxidase extract
  The peroxidase extract was prepared by homogenizing washed orange peels (10 g) in phosphate buffer (0.1 M, pH 7.0) and filtering the mixture through Whatman's filter paper No. 1. The filtrate was centrifuged (10,000 x g at 5 degC) for 20 min, and the supernatant was partially purified using 80% ammonium sulphate. The resulting pellet was re-dissolved and used as the peroxidase source for immobilization.
### Peroxidase activity assay
  The peroxidase activity was assessed by measuring the oxidation of guaiacol in the presence of hydrogen peroxidase, resulting in the formation of tetraquaicol. The colored complex was analyzed calorimetrically at 470 nm using a UV-vis spectrophotometer. The enzyme activity was defined as the amount of enzyme catalyzing one mmol of guaiacol in unit time. The protein content was estimated using the Bradford protein assay with bovine serum albumin as the standard.""",

r"""### OPP-MOF synthesis
  The OPP-MOF composite was synthesized by mixing zinc acetate (8 mmol, 14 mL) with an enzyme extract solution (containing protein content of 1.2 mg/mL) of 2-methylimidazole water solution (24 mmol, 14 mL) and stirring at 200 rpm at 28 ± 2 degC. The resulting white precipitate was separated by cold centrifugation (8000 x g, 5 degC) for 10 min and washed thrice with DI water. Pure ZIF-8 crystals were synthesized without enzyme for comparison purposes.
### Characterization of OPP-MOF
  The OPP-MOF was characterized by FT-IR, XRD, SEM, and TGA. FT-IR spectra showed the presence of Zn-N stretching band at 421.75 cm\({}^{-1}\), and the in-plane bending absorption bands were observed at 953.16 and 1308.79 cm\({}^{-1}\). Additionally, two peaks at 2927.43 and 3133.14 cm\({}^{-1}\), respectively, attributed to the aliphatic and aromatic C\(-\)H stretching of the 2-methylimidazole, confirmed the successful synthesis of ZIF-8. The XRD patterns of the ZIF-8 and OPP-MOF showed similar diffraction patterns, indicating that the immobilization of OPP did not change the chemical/physical characteristics and properties of the original ZIF-8 MOF. SEM analysis showed that the OPP-MOF had a rhombic dodecahedral shape with a size of approximately 875-975 nm. TGA analysis showed a gradual weight loss up to \({}^{-}\)23.5% till 200 degC, corresponding to evaporation and removal of H\({}_{2}\)O and other organic small molecules from the cavities, and a 9% weight loss occurred in the 2nd decomposition stage (250-350 degC), attributed to the decomposition of protein/enzyme molecules within ZIF-8. These results confirmed the successful encapsulation and immobilization of OPP within ZIF-8 MOF.""",

"""### Properties of OPP-MOF
  The OPP-MOF composite showed improved thermo-stability in the temperature range of 40-60 degC, with a thermal deactivation constant (Kd) of 0.05 min^-1, half-life (t1/2) of 30 min, and deactivation energy (Ed) of 120 kJ/mol. In comparison, the free form of peroxidase (OPP) had a lower thermal stability, with a Kd of 0.1 min^-1, t1/2 of 10 min, and Ed of 80 kJ/mol. The Michaelis-Menten kinetics revealed that the OPP-MOF had a higher Kd value (0.2 mM) and lower Vmax value (120 mM/min) than the free OPP (Kd = 0.1 mM, Vmax = 180 mM/min). The immobilized OPP-MOF showed good recyclability, retaining 48% of its initial activity after six repetitive cycles. Additionally, the OPP-MOF retained its activity for 18 days at room temperature, while the free OPP lost its activity within 3 days.
### Dye degradation studies
  The immobilized peroxidase enzyme (OPP-MOF) was tested for its ability to degrade Congo red and methylene blue dyes. The dye solutions were prepared at a concentration of 100 ppm in phosphate buffer and were incubated with free OPP and OPP-MOF for one hour at room temperature. The decolourization was measured by spectrophotometry at 495 nm for Congo red and 665 nm for methylene blue. The results showed that OPP-MOF was able to degrade both dyes, with a higher decolourization percentage compared to free OPP. The experiments were done in triplicate and the error bar represents the percentage error in each set of readings. Additionally, the reusability of OPP-MOF was tested up to 6th cycle, with the enzyme retaining 100% residual activity in the first cycle and showing a gradual decrease in activity with increasing cycles. The SEM images of OPP-MOF after 6th cycle showed no significant changes in the morphology of the MOF.""",


r"""## 3 Results and discussion
  The immobilization of peroxidase within MOF resulted in the formation of OPP-MOF composite, which showed enhanced thermal stability and reusability compared to free peroxidase. The Michaelis-Menten kinetics parameters were determined for OPP-MOF and the free form of peroxidase, and the thermal stability was studied using thermal deactivation constant (k\({}_{\text{d}}\)), half-life (t\({}_{1/2}\)) and deactivation energy (Ea). The OPP-MOF was able to degrade dyes such as methylene blue (MB) and congo red (CR). The immobilization of peroxidase within MOF also eased the enzyme recovery and re-use, resulting in its cost-effective use focusing its industrial applications.
### Preparation of OPP–MOF
  The preparation of OPP-MOF involved the extraction of orange peel peroxidase (OPP) and its immobilization within a metal-organic framework (MOF). The extraction process used a phosphate buffer at 4 °C for 120 min, and the enzyme stability was found to be maximum at pH 7.0. The obtained OPP enzyme solution had an activity of 15 U/mL. The SDS PAGE analysis showed few bands after partial purification of OPP with 80% saturation of ammonium sulphate. The enzyme solution was then mixed with zinc acetate solution (8 mmol) and 2-methylimidazole (24 mmol) and stirred for 30 min, resulting in the formation of a cloudy solution. The mixture was then centrifuged and washed with phosphate buffer (pH 7.4, 10 mM). The resulting OPP-MOF exhibited an activity recovery of 9.2 U/mg.
### Thermal kinetics and stability studies
  The immobilization of peroxidase within MOF (OPP-MOF) resulted in enhanced thermal stability, with a half-life 2.8, 2.1 and 1.4 folds higher than free peroxidase at 40, 50 and 60 degC, respectively. The deactivation constant of OPP-MOF was found to be much lower than the free peroxidase in the given temperature range. The energy required to deactivate native OPP and OPP-MOF was calculated to be 27.3 and 57.4 kJ mol\({}^{-1}\), respectively. Confocal scanning laser microscopy (CSLM) images confirmed that the enzyme molecules were encapsulated within ZIF-8 MOF, which contributed to the structure and functional stability.""",

r"""### Michaelis-Menten kinetic parameters
  The Michaelis-Menten parameters (K\({}_{\text{M}}\) and V\({}_{\text{max}}\)) of free and immobilized orange peel peroxidase (OPP) were determined by analyzing the initial rates of reaction corresponding to different substrate concentrations. The results showed that the value of K\({}_{\text{M}}\) increased after immobilization of OPP within MOF, indicating a slight decrease in enzyme affinity towards substrate. Additionally, the rate of reaction (V\({}_{\text{max}}\)) of OPP-MOF was lower than that of the free enzyme mixture, possibly due to conformational changes during immobilization and mass transfer limitations.
### Dye degradation by OPP
  The decolourization potential of free OPP and immobilized OPP-MOF was evaluated using two different dyes, methylene blue (MB) and congo red (CR). The percentage decolourization for MB and CR was found to be 88.3% and 93.2% using free enzymes, respectively. OPP-MOF brought about 85.6% and 89.9% decolourization of MB and CR, respectively. The immobilization of peroxidase within MOF has been shown to enhance its stability and activity, making it a promising candidate for industrial applications. The decolourization efficiency of OPP-MOF was found to be comparable to that of other peroxidase-based systems, such as _Azadirachta indica_ peroxidase immobilized onto chitosan and manganese peroxidase immobilized within alginate bead.
### Reusability and storage stability studies
  The OPP-MOF composite showed excellent reusability and storage stability. In the reusability studies, the residual activity of OPP-MOF was found to be 48% after six successive cycles, indicating that the enzyme did not leach out of the MOF platform. The structure of OPP-MOF remained intact even after six cycles, as observed from SEM images. In storage stability studies, OPP-MOF retained 88% residual activity till 18th day of storage, while free OPP showed a gradual reduction in residual activity. The high storage stability of OPP-MOF was due to the shielding effect of the framework around the enzyme molecules. These results suggest that OPP-MOF has excellent chemical and conformational stability, making it a promising biocatalyst for industrial applications.
## 4 Conclusion
  The study demonstrated the successful preparation of MOF induced by peroxidase from orange peels, with the enzyme being successfully extracted and immobilized within the MOF via a self-assembled biomineralization method. The resulting OPP-MOF showed significant improvement in thermal stability, with a half-life (t1/2) and degradation energy (E4) that were estimated to be higher than those of the free enzyme. Additionally, the OPP-MOF exhibited remarkable reusability and storage stability, and was able to rapidly degrade commercial dyes, demonstrating its potential in bioremediation."""
]
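
Because the sections have to be pasted in by hand, a small helper could instead split a Markdown export of the first notebook's output into chunks at heading lines. This is one possible approach and not part of the original workflow: the name `split_by_headings`, the `max_chars` budget (a crude proxy for token count), and the greedy grouping are all illustrative assumptions.

```python
import re
from typing import List


def split_by_headings(text: str, max_chars: int = 4000) -> List[str]:
    """Greedily group heading-delimited blocks into chunks of <= max_chars.

    A block starts at each line beginning with one or more '#' followed by
    a space. A single block longer than max_chars is kept whole rather than
    cut mid-sentence.
    """
    # Split before every Markdown heading line, keeping the heading with its body.
    parts = re.split(r"^(?=#+ )", text, flags=re.MULTILINE)
    blocks = [part.strip() for part in parts if part.strip()]
    chunks: List[str] = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n{block}" if current else block
        if current and len(candidate) > max_chars:
            # Adding this block would exceed the budget: close the chunk.
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo = "## Materials\nzinc acetate\n### Assay\nguaiacol\n## Results\n48% activity"
demo_chunks = split_by_headings(demo, max_chars=45)
```

With the 45-character budget, the first two blocks fit into one chunk and the third starts a new one. The resulting list could be assigned to `sections` in place of the hand-pasted strings.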

In [None]:
# first split
messages = [
    {"role": "system", "content": "You are an expert chemist. You are always truthful and never make facts up. Your main objective is to extract information about Metal Organic Frameworks (MOFs) from scientific papers. You ALWAYS include all number of moles, units, and exact names but you are also short and to the point."},
    {"role": "user", "content": input_text_start}
]
# Tokenize the chat prompt once and move it to the model's device.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(
    input_ids,
    attention_mask=torch.ones_like(input_ids),
    do_sample=False,
    max_new_tokens=2096,
)
# Decode only the newly generated tokens, so neither the prompt nor special tokens have to be stripped by hand.
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print('------')
print(response)
print('------')

for section in tqdm(sections):

  input_text = f"""Please read the provided "Original Section of the paper" about Metal-Organic Frameworks to understand the chemical content. Use this understanding to scan the original section for MOF-related and enzymatic reaction details and summarize them into a short text, incorporating all relevant details about MOFs and without information redundancies. Then, merge your extracted MOF-details with the "MOF-Detail-Extractions for previous sections" to generate a coherent text detailing everything about MOFs.

  MOF-Detail-Extractions for previous sections:
{response.strip()}

  Original Section of the paper:
{section.strip()}


Notes:
  - Focus on removing redundancies.
  - The "MOF-Detail-Extractions for previous sections" was created from previous sections of the paper directly preceding the current one.
  - Merge the MOF and chemical related details you extract from the original section and the "MOF-Detail-Extractions for previous sections" into one text.
  - The original structure (e.g. headings) is not important, if it makes sense you can restructure the contents of "MOF-Detail-Extractions for previous sections".
  - Don't output anything like "in summary" or similar, only condense the information!
  - The volume specifications and components relating to the chemical processes are the most important and must always be kept!
  - Keep the balance between keeping everything MOF related in detail and still being short and concise.
  - Never ever drop ANY details regarding MOF details, parameters, and attributes!!!

 MOF-Detail-Extraction with "MOF-Detail-Extractions for previous sections" and the found MOF-details of the "Original Section of the paper" merged into one text with all chemical details and MOF-related information included in detail but without redundancies. Includes all quantities (number of moles) and units (e.g. mmol, min, degC, g, cm, mL, mM, ppm, mg/mL, nm, µL, mol etc.) and exact names (e.g. Girrus sienes, a fruit waste etc.) and all composite, MOF, and chemical linker names, metal salt names, as well as enzyme and organism names, number of moles (quantities) all pH-values of the metal salts, linkers, and enzymes, chemical concentrations, chemical masses, and proteins along with everything related to the enzymatic reaction (reactants, products, enzyme solutions with all details etc.). Absolutely all number of moles, units, and exact names but also short and to the point. One coherent paragraph without any bullet-points:
"""

  messages = [
    {"role": "system", "content": "You are an expert chemist. You are always truthful and never make facts up. Your main objective is to extract information about Metal Organic Frameworks (MOFs) from scientific papers. You ALWAYS include all number of moles, units, and exact names but you are also short and to the point."},
    {"role": "user", "content": input_text}
  ]
  # Tokenize the chat prompt once and move it to the model's device.
  input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
  output_ids = model.generate(
    input_ids,
    attention_mask=torch.ones_like(input_ids),
    do_sample=False,
    max_new_tokens=2096,
  )
  # Decode only the newly generated tokens, so neither the prompt nor special tokens have to be stripped by hand.
  response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
  print('------')
  print(response)
  print('------')
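
Since the prompt insists that no quantities be dropped, it may help to check each merged summary against its source section. The following is a rough sketch and not part of the original notebook: `find_quantities` and `missing_quantities` are hypothetical helpers, and `UNITS` is a small illustrative subset of the units named in the prompt.

```python
import re
from typing import List, Set

# Illustrative subset of the units named in the prompt; extend as needed.
UNITS = ["mmol", "mol", "mg/mL", "mL", "mM", "min", "degC", "ppm", "nm", "rpm", "g"]
# Longest units first so that "mmol" is tried before "mol".
_UNIT_PATTERN = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
_QUANTITY = re.compile(rf"(\d+(?:\.\d+)?)\s*({_UNIT_PATTERN})\b")


def find_quantities(text: str) -> Set[str]:
    """Return normalized 'value unit' strings found in text."""
    return {f"{value} {unit}" for value, unit in _QUANTITY.findall(text)}


def missing_quantities(section: str, summary: str) -> List[str]:
    """List quantities that occur in section but not in summary."""
    return sorted(find_quantities(section) - find_quantities(summary))


section_demo = "zinc acetate (8 mmol, 14 mL) stirred at 200 rpm for 30 min"
summary_demo = "Zinc acetate (8 mmol) was stirred for 30 min."
dropped = missing_quantities(section_demo, summary_demo)
print(dropped)
```

Inside the loop, `missing_quantities(section, response)` could be printed after each step. The check is purely textual, so a summary that rewrites "30 min" as "30 minutes" would be flagged even though nothing was lost; treat the output as a hint for manual review.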
