<a href="https://colab.research.google.com/github/erenarkangil/RAG_for_scientific_Articles/blob/main/RAG_for_bioeconomy_articles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install pymupdf
!pip install faiss-cpu
!pip install faiss-gpu # Python 3.6-3.10 (legacy, no longer available after version 1.7.3)


Collecting pymupdf
  Downloading PyMuPDF-1.24.13-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Downloading PyMuPDF-1.24.13-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (19.8 MB)
Installing collected packages: pymupdf
Successfully installed pymupdf-1.24.13
Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.5 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.9.0
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Downloading faiss_gp

# **Information retrieval from multiple PDF Files:**
We need to process information from five separate PDF files. The text of each file is split into **chunks**, and our goal is to retrieve the relevant details from these chunks. Since each document has its own layout (e.g., single-column or two-column), the **chunking strategy** has to be chosen carefully for each file.

The documents are divided into sections such as **Introduction**, **Literature Review**, **Methodology**, etc. We use common patterns as chunk boundaries, such as:

- **Layout separators** (e.g., the non-breaking space `\xa0`)
- **Page numbers** (e.g., 'Page 2 of 20')

In [6]:
import faiss
import numpy as np
import fitz  # PyMuPDF
from sklearn.feature_extraction.text import TfidfVectorizer  # Or any embedding model
import re
import nltk  # For NLP tasks

# List of PDF filenames
pdf_files = ["a1.pdf", "a2.pdf", "a3.pdf", "a4.pdf", "a5.pdf"]

# Dictionary to store text of each PDF
pdf_texts = {}

# Loop through each PDF file
for pdf_file in pdf_files:
    # Open the PDF file
    document = fitz.open(pdf_file)
    text = ""

    # Extract text from each page
    for page_num in range(document.page_count):
        page = document[page_num]
        text += page.get_text()

    document.close()

    # Store the text in the dictionary with the filename as the key
    pdf_texts[pdf_file] = text

In [7]:
nltk.download('punkt')


def chunk_text(text, cstyle):
    """Split text into chunks; cstyle (1, 2, or 3) selects which boundary pattern set is used."""
    # Preprocess text (e.g., normalize whitespace)
    text = re.sub(r'\s+', ' ', text).strip()

    # Use NLTK to tokenize sentences
    sentences = nltk.tokenize.sent_tokenize(text)

    # Initialize list to hold chunks
    chunks = []
    current_chunk = []

    # Define patterns to match for chunking
    patterns = [
        r"\b[A-Z][a-z]+, [A-Z]\.;?",                            # Matches author initials, e.g., "Smith, J."
        r"https?://[^\s]+",                                     # Matches URLs (e.g., DOIs, journal links)
        r"Received:\s?\d{1,2}\s?\w+\s?\d{4}",                   # Matches "Received: 21 October 2022"
        r"Accepted:\s?\d{1,2}\s?\w+\s?\d{4}",                   # Matches "Accepted: 19 November 2022"
        r"Published:\s?\d{1,2}\s?\w+\s?\d{4}",                  # Matches "Published: 22 November 2022"
        r"Creative\s?Commons\s?Attribution",                    # Matches copyright and licensing statements
        r"Publisher’s Note:",                                   # Matches publisher notes
        r"Abstract:",                                           # Matches "Abstract:" section header
        r"vol\.\s?\d+|vol\s?\d+",                               # Matches "vol. 26" or "vol 9"
        r"Page\s?\d+\s?(of\s?\d+)?",                            # Matches "Page 1" or "Page 1 of 10"
        r"\b[0-9]{3,4}\b",                                      # Matches standalone page numbers (e.g., "3554")
        r"[A-Z][a-z]+ et al.;?\s+J\.\s+[A-Z][a-z]+\.\s+[A-Z][a-z]+\.,?\s+vol\.\s?\d+,?\s+no\.\s?\d+",  # Matches "Delphine et al.; J. Adv. Biol. Biotechnol., vol. 26, no. 9"
        r"\bFoods\s+\d{4},\s+\d+,\s+\d{4}\b",                   # Matches citation with "Foods" and year
        r"\d{1,2}\s+of\s+\d+",                                  # Matches "2 of 20" type page numbers
        r"\xa0",                                                  # Matches non-breaking space character
        r"\n{2,}",
    ]

    patterns2 = [
        r"\b(?:Introduction|Methods|Results|Discussion|Conclusion)\b",      # Section headers
        r"\b(?:vol\.?\s?\d+|page\s?\d+(?:\s?of\s?\d+)?|\d{3,4})\b",         # Volume, page, standalone numbers
        r"\b(?:Creative\s?Commons\s?Attribution|Received|Accepted|Published|Publisher's Note)\b",  # Metadata
        r"\n{2,}",                                                          # Paragraph breaks (two or more newlines)
    ]

    patterns3 = [
        r"\b(?:Introduction|Methods|Results|Discussion|Conclusion)\b",  # Main section headers
        r"\n{2,}",                                                      # Paragraph breaks (two or more newlines)
    ]
    # Compile the pattern set selected by cstyle; reject unknown styles explicitly
    pattern_sets = {1: patterns, 2: patterns2, 3: patterns3}
    if cstyle not in pattern_sets:
        raise ValueError(f"cstyle must be 1, 2, or 3, got {cstyle!r}")
    compiled_patterns = [re.compile(pattern) for pattern in pattern_sets[cstyle]]


    for sentence in sentences:
        current_chunk.append(sentence)

        # Close the chunk when it exceeds 1200 characters, or when the sentence
        # contains a section keyword or matches one of the boundary patterns
        if (len(' '.join(current_chunk)) > 1200) or \
           re.search(r'\b(?:Introduction|Methods|Results|Discussion|Conclusion)\b', sentence, re.I) or \
           any(pattern.search(sentence) for pattern in compiled_patterns):
            # Create a chunk and reset current chunk
            chunks.append(' '.join(current_chunk))
            current_chunk = []

    # Add any remaining sentences as the last chunk
    if current_chunk:
        chunks.append(' '.join(current_chunk))

    return chunks




# List of PDF filenames
pdf_files = ["a1.pdf", "a2.pdf", "a3.pdf", "a4.pdf", "a5.pdf"]
pdf_texts_c1 = {}
pdf_texts_c2 = {}
pdf_texts_c3 = {}

# Loop through each PDF file
for pdf_file in pdf_files:
    document = fitz.open(pdf_file)
    text = ""

    for page_num in range(document.page_count):
        page = document[page_num]
        text += page.get_text()

    document.close()

    # Chunk the extracted text
    pdf_texts_c1[pdf_file] = chunk_text(text,1)
    pdf_texts_c2[pdf_file] = chunk_text(text,2)
    pdf_texts_c3[pdf_file] = chunk_text(text,3)



# pdf_texts_c1, pdf_texts_c2, and pdf_texts_c3 now map each filename to its list of chunks
for filename, chunks in pdf_texts_c3.items():
    print(f"Chunks from {filename}:")
    #for i, chunk in enumerate(chunks):
    #    print(f"Chunk {i + 1}:\n{chunk[:500]}...\n")  # Display first 500 characters of each chunk


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Chunks from a1.pdf:
Chunks from a2.pdf:
Chunks from a3.pdf:
Chunks from a4.pdf:
Chunks from a5.pdf:
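
One detail of `chunk_text` is worth checking: the text is whitespace-normalized before the boundary patterns are applied. In Python 3, `\s` also matches the non-breaking space, so after `re.sub(r'\s+', ' ', text)` the `\xa0` and `\n{2,}` patterns can no longer match anything. The sketch below illustrates this on a made-up string (`sample` is hypothetical, not taken from the PDFs) and shows one possible alternative, splitting on the layout markers before normalizing.

```python
import re

# Hypothetical sample text with a paragraph break and a non-breaking space.
sample = "First paragraph.\n\nSecond paragraph.\xa0Third part."

# Same normalization as in chunk_text: every whitespace run becomes one space.
normalized = re.sub(r"\s+", " ", sample).strip()

# After normalization neither layout pattern can match any more.
print(re.search(r"\n{2,}", normalized))  # None
print(re.search(r"\xa0", normalized))    # None

# One possible alternative: split on the layout markers first, then normalize each piece.
pieces = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n{2,}|\xa0", sample)]
pieces = [p for p in pieces if p]
print(pieces)
```

Whether these markers are reliable boundaries for the five articles is something to verify on the real text; the sketch only shows the ordering issue.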


Before creating the embeddings, there are two strategies we can apply:

1. Combine all the chunks and generate a single set of vector embeddings.
2. Alternatively, create embeddings for each PDF file separately.

Afterward, I calculated the similarity between the embeddings using cosine similarity. The difference between the two approaches is negligible in this case, as we are working with only five files.
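
With the first strategy it can help to remember which file each chunk came from, so that a hit in the combined index can still be traced back to its PDF. A minimal sketch, assuming a dictionary shaped like `pdf_texts_c3` (filename mapped to a list of chunks); `flatten_chunks`, `toy_chunks`, and `chunk_sources` are illustrative names that do not appear in the notebook.

```python
def flatten_chunks(chunks_by_file):
    """Flatten {filename: [chunk, ...]} into parallel lists of chunks and source filenames."""
    flat_chunks, chunk_sources = [], []
    for filename, chunks in chunks_by_file.items():
        flat_chunks.extend(chunks)
        chunk_sources.extend([filename] * len(chunks))
    return flat_chunks, chunk_sources

# Toy stand-in for pdf_texts_c3 (made-up content).
toy_chunks = {
    "a1.pdf": ["chunk A", "chunk B"],
    "a2.pdf": ["chunk C"],
}
flat_chunks, chunk_sources = flatten_chunks(toy_chunks)

# Position i in the combined list (and in an index built from it) maps back to a file.
for i, (chunk, source) in enumerate(zip(flat_chunks, chunk_sources)):
    print(i, source, chunk)
```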


In [202]:
all_chunks = []

for pdf_dict in [pdf_texts_c3]:
    for chunks in pdf_dict.values():
        print(len(chunks))
        all_chunks.extend(chunks)

60
96
45
65
84


In [143]:
len(all_chunks)

350

In [81]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # Choose a suitable model
m_embeddings = model.encode(all_chunks)  # Generate embeddings for each chunk




In [82]:
dimension_m= m_embeddings.shape[1]
index_m = faiss.IndexFlatL2(dimension_m)
index_m.add(np.array(m_embeddings))

Once the chunks and the query are embedded using a transformer model, similarity search can be performed with different metrics. The **FAISS** library typically uses **Euclidean distance** or **inner product**.

- **Euclidean distance** measures the straight-line distance between vectors.
- **Inner product** (dot product) measures similarity based on the alignment of vectors.

When using **unit vectors**, the inner product becomes equivalent to **cosine similarity**, which compares vector directions.

In our case, regardless of the metric (Euclidean, inner product, or cosine), the search returned the same indices.
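
Identical results are what one would expect if the embeddings are already unit-length: for unit vectors the squared Euclidean distance equals 2 minus twice the inner product, so all three metrics rank neighbors in the same order. The sketch below checks this identity on made-up 2-D vectors using only the standard library; it does not use FAISS or the real embeddings, and the helper names are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up vectors standing in for chunk embeddings and a query embedding.
docs = [normalize(v) for v in ([1.0, 0.2], [0.3, 1.0], [-1.0, 0.5])]
query = normalize([0.9, 0.4])

# Rank by ascending L2 distance and by descending inner product.
rank_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))
rank_ip = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))

for d in docs:
    # Identity for unit vectors: ||q - d||^2 == 2 - 2 * (q . d)
    assert math.isclose(squared_l2(query, d), 2 - 2 * dot(query, d))

print(rank_l2 == rank_ip)  # True
```

If the embeddings were not normalized, the raw inner product could rank chunks differently from the other two metrics, which is why the explicit normalization step matters for a cosine index.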


In [83]:
index_l2 = faiss.IndexFlatL2(dimension_m)
index_l2.add(np.array(m_embeddings))

# Inner product (no normalization)
index_ip = faiss.IndexFlatIP(dimension_m)
index_ip.add(np.array(m_embeddings))

# Cosine similarity: normalize the embeddings to unit vectors, then use the inner product
normalized_embeddings = m_embeddings / np.linalg.norm(m_embeddings, axis=1, keepdims=True)
index_cosine = faiss.IndexFlatIP(dimension_m)
index_cosine.add(np.array(normalized_embeddings))

In [108]:
query = "What are the High-Value Products, Materials, and Biofuels that can be obtained from Spent brewer’s yeast (SBY)"
query_embedding = model.encode([query])

# Retrieve the 2 most similar chunks from each index
distances_l2, indices_l2 = index_l2.search(np.array(query_embedding), k=2)
distances_ip, indices_ip = index_ip.search(np.array(query_embedding), k=2)
distances_cosine, indices_cosine = index_cosine.search(np.array(query_embedding), k=2)

In [109]:
print(indices_cosine)
print(indices_l2)
print(indices_ip)



[[267 270]]
[[267 270]]
[[267 270]]


I initially tried to retrieve the top 5 chunks, but the base GPT-2 model (`gpt2`) has a maximum context length of 1024 tokens, and adding more context to the prompt caused problems. The approach here needs more careful and detailed processing, as the texts and chunks are not yet clean or optimal. The tokenization step should also be examined at the end.
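
One way to use more than two chunks without overflowing the 1024-token window is to add retrieved chunks in rank order until a token budget is used up. A minimal sketch under stated assumptions: `fit_chunks_to_budget` and the budget value are hypothetical, and the default counter simply splits on whitespace as a rough stand-in. Word counts underestimate GPT-2 token counts (in this notebook a 334-word prompt becomes 527 tokens), so with the real model one would pass a tokenizer-based counter such as `lambda s: len(tokenizer.encode(s))`.

```python
def fit_chunks_to_budget(ranked_chunks, budget, count_tokens=lambda s: len(s.split())):
    """Keep chunks in rank order until adding the next one would exceed the budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected, used

# Made-up chunks, best match first.
ranked = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
context_chunks, used_tokens = fit_chunks_to_budget(ranked, budget=6)
print(context_chunks, used_tokens)
```

The budget should leave room for the instruction text, the question, and the tokens to be generated.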

In [110]:
retrieved_text = " ".join([all_chunks[i] for i in indices_cosine[0]])
prompt = f"Answer the question based on the context:\nContext: {retrieved_text}\nQuestion: {query}"
#print(prompt)
len(prompt.split())

334

In [1]:
#len(prompt.split()) 334 words

In [184]:
#all_chunks[270]

In [112]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
llm_model = GPT2LMHeadModel.from_pretrained('gpt2')




In [118]:
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = llm_model.generate(inputs, max_length=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
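
The warning above can usually be avoided by calling the tokenizer directly, which also returns an attention mask, and by passing `pad_token_id` explicitly. The sketch below additionally decodes only the newly generated tokens, since slicing the decoded string with `len(prompt)` can be off by a few characters if decoding does not reproduce the prompt exactly. It assumes the `tokenizer` and `llm_model` objects loaded earlier; `generate_answer` and the default `max_new_tokens=200` are illustrative choices, not part of the original notebook.

```python
def generate_answer(prompt, tokenizer, llm_model, max_new_tokens=200):
    """Generate a continuation and return only the newly generated text."""
    # Calling the tokenizer (instead of .encode) returns input_ids and attention_mask.
    enc = tokenizer(prompt, return_tensors="pt")
    prompt_len = len(enc["input_ids"][0])

    outputs = llm_model.generate(
        **enc,                                # input_ids and attention_mask
        max_new_tokens=max_new_tokens,        # budget for the answer only
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    # Slice by token count rather than by len(prompt) on the decoded string.
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage would be `answer = generate_answer(prompt, tokenizer, llm_model)`. The prompt length plus `max_new_tokens` still has to stay within GPT-2's 1024-token context.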


In [129]:
response[len(prompt):]

' for the production of high-value products and environmental protection?\nThere are a number of different types of SBY. Some SBY in particular contain β-glucans, which are proteins that are able to increase the rate of oxidation of a natural substance. Some SBY in particular contain β-glucans, which are proteins that are able to increase the rate of oxidation of a natural substance. Some SBY in particular contain β-glucans, which are proteins that are able to increase the rate of oxidation of a natural substance. There are two main types of SBY. The first type involves the use of a variety of compounds (such as alcohols and sugars) as a feed additive. The second type involves the use of a variety of compounds (such as alcohols and sugars) to increase the rate of oxidation of a natural substance. SBY are commonly referred to as "high-value" and "low-value" products [14, 15]. The term "high-value" refers to the higher value of an SBY product compared with a standard high-value product. 

In [157]:
token_ids = inputs[0].tolist()

# Convert each token ID back into its word or subword form
tokens = tokenizer.convert_ids_to_tokens(token_ids)

# Print the list of tokens as words
print("Tokenized words:", tokens)

Tokenized words: ['Answer', 'Ġthe', 'Ġquestion', 'Ġbased', 'Ġon', 'Ġthe', 'Ġcontext', ':', 'Ċ', 'Context', ':', 'ĠAbstract', ':', 'ĠSp', 'ent', 'Ġbrewer', 'âĢ', 'Ļ', 's', 'Ġyeast', 'Ġ(', 'SB', 'Y', ')', 'Ġis', 'Ġa', 'Ġby', 'product', 'Ġof', 'Ġthe', 'Ġbrewing', 'Ġindustry', 'Ġtraditionally', 'Ġused', 'Ġas', 'Ġa', 'Ġfeed', 'Ġadditive', ',', 'Ġalthough', 'Ġit', 'Ġcould', 'Ġhave', 'Ġmuch', 'Ġbroader', 'Ġapplications', '.', 'ĠIn', 'Ġthis', 'Ġpaper', ',', 'Ġa', 'Ġcomprehensive', 'Ġreview', 'Ġof', 'Ġval', 'or', 'ization', 'Ġof', 'ĠS', 'BY', 'Ġfor', 'Ġthe', 'Ġproduction', 'Ġof', 'Ġhigh', '-', 'value', 'Ġproducts', ',', 'Ġnew', 'Ġmaterials', ',', 'Ġand', 'Ġbio', 'fu', 'els', ',', 'Ġas', 'Ġwell', 'Ġas', 'Ġenvironmental', 'Ġapplication', ',', 'Ġis', 'Ġpresented', '.', 'ĠAn', 'Ġeconomic', 'Ġperspective', 'Ġis', 'Ġgiven', 'Ġby', 'Ġmirror', 'ing', 'Ġmarketing', 'Ġof', 'Ġconventional', 'ĠS', 'BY', 'Ġwith', 'Ġinnovative', 'Ġhigh', '-', 'value', 'Ġproducts', '.', 'ĠC', 'asc', 'ading', 'Ġutilization', '

In [162]:
len(tokens)

527

In [194]:
query = "What are the High-Value Products, Materials, and Biofuels that can be obtained from Spent brewer’s yeast (SBY)?"
query_embedding = model.encode([query])

# Retrieve the most similar chunks from each index (k=3 for L2 and inner product, k=2 for cosine)
distances_l2, indices_l2 = index_l2.search(np.array(query_embedding), k=3)
distances_ip, indices_ip = index_ip.search(np.array(query_embedding), k=3)
distances_cosine, indices_cosine = index_cosine.search(np.array(query_embedding), k=2)

retrieved_text = " ".join([all_chunks[i] for i in indices_cosine[0]])
prompt = f"Answer the question based on the context:\nContext: {retrieved_text}\nQuestion: {query}"
#print(prompt)
len(prompt.split())

334

In [198]:
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = llm_model.generate(inputs, max_length=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Some examples:
# Question: What are the High-Value Products, Materials, and Biofuels that can be obtained from Spent brewer’s yeast (SBY)?


for the production of high-value products and environmental protection?\nThere are a number of different types of SBY. Some SBY in particular contain β-glucans, which are proteins that are able to increase the rate of oxidation of a natural substance. Some SBY in particular contain β-glucans, which are proteins that are able to increase the rate of oxidation of a natural substance. Some SBY in particular contain β-glucans, which are proteins that are able to increase the rate of oxidation of a natural substance. There are two main types of SBY. The first type involves the use of a variety of compounds (such as alcohols and sugars) as a feed additive. The second type involves the use of a variety of compounds (such as alcohols and sugars) to increase the rate of oxidation of a natural substance. SBY are commonly referred to as "high-value" and "low-value" products [14, 15]. The term "high-value" refers to the higher value of an SBY product compared with a standard high-value product. SBY have a high-value value, as do other SBY products, but are not the same. A high-value SBY product can exceed the standard SBY feed additive by over 70%. If a high-value SBY product is consumed in a large quantity, the food additive may be required to be added to the product. If the product is used as a feed additive, the ingredients and the product must be added at the same time. These two requirements have differing levels of success, as will be discussed in the next section. Protein Synthesis As a source of SBY, proteins are made using the reaction of sugar molecules with hydrogen peroxide, which is the primary gas in the brewing process. SBY are made by adding sugars and water together. Many SBY are made using the reaction of sugars with hydrogen peroxide and hydrogen peroxide, which is the primary gas in the brewing process. SBY are made by adding sugars and water together. 
Many SBY are made using the reaction of sugars with hydrogen peroxide and hydrogen peroxide, which is the primary gas in the brewing process. SBY are made by adding sugars and water together. Most SBY are made using the reaction of sugars with hydrogen peroxide and hydrogen peroxide, which is the primary gas in the brewing process. In addition, most SBY can be mad



# Question: What is the CAGR of spent yeast?

This question has been addressed at the Annual Meeting of the Society of Microorganisms of the International Society of Microbiology.\n[1] Zeko-Pivaˇc. The concept of the CAGR of spent yeast is a reference to the theory of the growth of the yeast, which is a basic theory of the field. It is understood to be based on the concept of the rate of growth of yeast and their ability to ferment a given amount of substrates. The term "spent" is a derivation of the word "spore". In the literature, a yeast is often defined as the capacity for growth of the material that is produced by the active part of the structure (e.g. a cell wall, a water-soluble substrate, etc.).

# Question: What is the cost to produce one gram of Mannose? What is the highest yield of Mannose that could be obtained from spent yeast?

The yield of the malt of Th 16 24 32 51, which is 1.5 ± 0.4% of the yield of Th 16 24 32 51. This is the highest yield of mannose in the malt industry worldwide and the world's highest yield of mannose; its higher yield is due to the higher level of extract and the more complex chemical reactions that occur in the process. It is important to note that TH is a volatile organic compound, and the highest TH yield of a malt product is based on the fact that the product is the only source of high yield mannose. The highest yield of mannose in mannose is derived from the extraction of TH. There are many possible reasons to follow this reasoning. First of all, the process is very expensive and the extraction temperature is higher than the boiling temperature of mannose, and therefore the higher extraction temperature of mannose also provides a great amount of potential for the synthesis of other mannose compounds. For example, we can obtain the highest yield of Mannose in a single malt at a boiling temperature of 1.5°C, thus obtaining the high yield of mannose at a boiling temperature of 1.5°C. But the highest yield of mannose could not be obtained by the boiling temperature of mannose, as it is highly difficult to extract the highest yield of Mannose from the malt. Secondly, the process requires more energy than the other methods of extraction, and the process is not as efficient as the other methods. Thirdly, the process is very expensive and the extraction temperature is higher than the boiling temperature of mannose. This means that mannose is not a viable source of mannose, and the higher TH yield of mannose is therefore the product of the process. Finally, the process is more complex and requires more energy than the other methods. Because the process is more complex, the time required to produce the highest yield of mannose is reduced. The cost to produce the highest yield from Th 36 24 32 51, which is 1.5 ± 0.4% of the yield of Th 16 24 32 51 is the very highest. 
The process costs about 1.25-1.5 years of raw mannose, and the higher TH yield of Th 36 24 32 51, this is due to the higher extraction temperature of


Overall, the responses are not of high quality, but considering how limited GPT-2 is, the retrieval step and the RAG pipeline seem to work and answer the questions correctly to some degree. Note that performance could be improved by filtering and re-ranking the retrieved chunks, and by more careful data processing and chunking.
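
As a rough illustration of the filtering and re-ranking idea, the sketch below drops hits whose similarity score falls below a threshold and then re-orders the remaining ones by how many query words they share. Everything here is hypothetical: the `rerank` helper, the threshold value, and the word-overlap score are simple stand-ins (a dedicated re-ranking model would be a more typical choice), and the hits are made-up data rather than chunks from the articles.

```python
def rerank(query, hits, min_score=0.3):
    """Filter (chunk, score) hits by score, then sort by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(chunk):
        return len(query_words & set(chunk.lower().split()))

    kept = [(chunk, score) for chunk, score in hits if score >= min_score]
    # Sort by overlap first, then by the original similarity score as a tie-breaker.
    return sorted(kept, key=lambda h: (overlap(h[0]), h[1]), reverse=True)

# Made-up hits: (chunk text, similarity score).
hits = [
    ("spent yeast is used as a feed additive", 0.60),
    ("the CAGR of the yeast market is reported", 0.55),
    ("unrelated text about packaging", 0.10),
]
for chunk, score in rerank("What is the CAGR of spent yeast", hits):
    print(f"{score:.2f}  {chunk}")
```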

In [175]:
query = "What is the CAGR of spent yeast?"
query_embedding = model.encode([query])

# Retrieve the most similar chunks from each index (k=2 for L2, k=3 for inner product and cosine)
distances_l2, indices_l2 = index_l2.search(np.array(query_embedding), k=2)
distances_ip, indices_ip = index_ip.search(np.array(query_embedding), k=3)
distances_cosine, indices_cosine = index_cosine.search(np.array(query_embedding), k=3)

retrieved_text = " ".join([all_chunks[i] for i in indices_l2[0]])
prompt = f"Answer the question based on the context:\nContext: {retrieved_text}\nQuestion: {query}"
#print(prompt)
len(prompt.split())

401

In [176]:
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = llm_model.generate(inputs, max_length=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [177]:
response[len(prompt):]

'his question has been addressed at the Annual Meeting of the Society of Microorganisms of the International Society of Microbiology.\n[1] Zeko-Pivaˇc. The concept of the CAGR of spent yeast is a reference to the theory of the growth of the yeast, which is a basic theory of the field. It is understood to be based on the concept of the rate of growth of yeast and their ability to ferment a given amount of substrates. The term "spent" is a derivation of the word "spore". In the literature, a yeast is often defined as the capacity for growth of the material that is produced by the active part of the structure (e.g. a cell wall, a water-soluble substrate, etc.).'

In [178]:
query = "What is the cost to produce one gram of Mannose? What is the highest yield of Mannose that could be obtained from spent yeast?"
query_embedding = model.encode([query])

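# Retrieve the 2 most similar chunks from the inner-product and cosine indexes
distances_ip, indices_ip = index_ip.search(np.array(query_embedding), k=2)
distances_cosine, indices_cosine = index_cosine.search(np.array(query_embedding), k=2)

# Build the context from this query's results (indices_l2 would still hold the previous query's hits)
retrieved_text = " ".join([all_chunks[i] for i in indices_cosine[0]])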
prompt = f"Answer the question based on the context:\nContext: {retrieved_text}\nQuestion: {query}"
#print(prompt)
len(prompt.split())

370

In [199]:
response[len(prompt):]

'\nStructure of the synthesis of high-value products: The synthesis of SBY for the production of high-value products is a process that uses a complex of chemicals and materials. The synthesis of the SBY in yeast is achieved by the following three processes: (a) the synthesis of the SBY into the base of the complex; (b) the synthesis of the SBY into ethanol and ethanol-derived compounds; and (c) the synthesis into the base of the complex. The synthesis of the SBY is a process of the synthesis of a mixture of SBY and ethanol, and then the synthesis of ethanol and ethanol-derived compounds. The synthesis of SBY is a process of the synthesis of a mixture of the two molecules that are present in the complex. The synthesis of the SBY is a process of the synthesis of a mixture of the two molecules that are present in the complex. The synthesis of SBY is a process of the synthesis of a mixture of the two molecules that are present in the complex. Synthesis of the SBY is a process of the synthe

In [179]:
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = llm_model.generate(inputs, max_length=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [180]:
response[len(prompt):]

" Answer: The yield of the malt of Th 16 24 32 51, which is 1.5 ± 0.4% of the yield of Th 16 24 32 51. This is the highest yield of mannose in the malt industry worldwide and the world's highest yield of mannose; its higher yield is due to the higher level of extract and the more complex chemical reactions that occur in the process. It is important to note that TH is a volatile organic compound, and the highest TH yield of a malt product is based on the fact that the product is the only source of high yield mannose. The highest yield of mannose in mannose is derived from the extraction of TH. There are many possible reasons to follow this reasoning. First of all, the process is very expensive and the extraction temperature is higher than the boiling temperature of mannose, and therefore the higher extraction temperature of mannose also provides a great amount of potential for the synthesis of other mannose compounds. For example, we can obtain the highest yield of Mannose in a single m