# PDF Parsing with LlamaParse

### Method 1: LlamaParse (direct call)

https://www.youtube.com/watch?v=7DJzHncUlpI

https://colab.research.google.com/drive/18KB9yXxDUeQGrEZEP1eCrXQ0dNB-Oazm?usp=sharing

https://docs.llamaindex.ai/en/latest/examples/cookbooks/llama3_cookbook_ollama_replicate/


- Create an account (the API key used below comes from it):

https://cloud.llamaindex.ai/

- Install the LlamaParse package:

%pip install llama-parse




In [1]:
import sys, os
sys.path.append(os.path.abspath(os.path.join('..', 'secret')))
from secret_info import llama_parser
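
The cell above imports the key from a local `secret_info` module. As an alternative, here is a minimal sketch that reads the key from an environment variable instead. The helper name `read_api_key` is made up for this note, and the variable name `LLAMA_CLOUD_API_KEY` is an assumption to check against the LlamaParse documentation.

```python
import os


def read_api_key(var_name="LLAMA_CLOUD_API_KEY", env=None):
    """Return the API key stored in an environment variable, or raise a clear error.

    `var_name` is an assumed variable name; `env` can be any mapping (useful for testing).
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (commented out so the sketch runs even when no key is set):
# llama_parser = read_api_key()
```

This keeps the key out of the notebook and out of version control, at the cost of having to export the variable before starting Jupyter.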

In [3]:
# The video says this is not needed outside Colab, but without it I got this error:
#       RuntimeError: The event loop is already running. Add `import nest_asyncio; nest_asyncio.apply()` to your code to fix this issue.
import nest_asyncio
nest_asyncio.apply()
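
The error appears because Jupyter already runs an asyncio event loop, and the parser tries to start another one. A small sketch of how one might check for that and patch only when needed; `loop_is_running` and `patch_loop_if_needed` are hypothetical helper names, not part of any library.

```python
import asyncio


def loop_is_running():
    """Return True when called from inside an already running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises when no loop is running in this thread
        return False
    return True


def patch_loop_if_needed():
    """Apply nest_asyncio only when a loop is already running (as in Jupyter)."""
    if not loop_is_running():
        return False
    import nest_asyncio  # imported lazily; only needed inside notebooks

    nest_asyncio.apply()
    return True
```

In a plain Python script `patch_loop_if_needed()` does nothing, which matches the video's remark that the patch is only needed in notebook-like environments.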

In [4]:
from llama_parse import LlamaParse

document = LlamaParse(api_key=llama_parser, result_type="markdown").load_data("my_latest_paper.pdf")
print(document[0].text[:1000])
file_name = "paper.md"
with open(file_name, 'w', encoding='utf-8') as file:  # the paper contains non-ASCII characters
    file.write(document[0].text)

Started parsing the file under job_id 40588c92-6727-4987-a84b-bc45588aba48
........Downloaded from https://royalsocietypublishing.org/ on 19 July 2023

# High physiological function for corals with thermally tolerant, host-adapted symbionts

royalsocietypublishing.org/journal/rspb

Kira E. Turnham1 , Matthew D. Aschaffenburg 2, D. Tye Pettay 3 , David A. Paz-García4 , Héctor Reyes-Bonilla5 , Jorge Pinzón 1 , Ellie Timmins 1 , Robin T. Smith6, Michael P. McGinley 2 , Mark E. Warner 2 and Todd C. LaJeunesse1

Research

Cite this article: Turnham KE et al. 2023

High physiological function for corals with thermally tolerant, host-adapted symbionts. Proc. R. Soc. B 290: 20231021.

https://doi.org/10.1098/rspb.2023.1021

Received: 09 May 2023

Accepted: 23 June 2023

KET, 0000-0001-9236-7237; DTP, 0000-0002-2060-3226; DAP-G, 0000-0002-1228-5221; HR-B, 0000-0003-2593-9631; JP, 0000-0002-3330-8226; MEW, 0000-0003-1015-9413; TCL, 0000-0001-7607-9358

Subject Category: Ecology

Subject Areas: e
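
The cell above writes only `document[0].text`. In case `load_data` returns more than one document (an assumption, I have not checked when that happens), a sketch like the following would save all of them. `save_documents` is a hypothetical helper; it only relies on each item having a `.text` attribute.

```python
from pathlib import Path


def save_documents(documents, path, separator="\n\n---\n\n"):
    """Join the .text of every parsed document and write the result as UTF-8.

    Returns the number of characters written.
    """
    combined = separator.join(d.text for d in documents)
    # Explicit UTF-8 so accented names and symbols survive on any platform
    Path(path).write_text(combined, encoding="utf-8")
    return len(combined)


# Usage in the notebook (assumes `document` from the cell above):
# save_documents(document, "paper.md")
```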

In [5]:
documents_with_instruction = LlamaParse(api_key=llama_parser,
    result_type="markdown",
    parsing_instruction="""
    What are the 5 most important points of the article? Present them as bullet points.
    """
    ).load_data("my_latest_paper.pdf")

Started parsing the file under job_id 2875a26a-dc1d-4eb2-a4a4-0f8fec3c48e3


In [6]:
print(documents_with_instruction[0].text[:1000])

#

# Article Summary

# Article Summary

# High physiological function for corals with thermally tolerant, host-adapted symbionts

- The flexibility to associate with more than one symbiont may considerably expand a host’s niche breadth.
- Symbiont identity greatly affects a coral’s ability to cope with extremes in temperature and light.
- Pocillopora grandis depends on mutualisms with the dinoflagellates Durusdinium glynnii and Cladocopium latusorum.
- Hosting the more thermally tolerant D. glynnii shows negligible physiological differences in colonies.
- Lessons from the Eastern Pacific indicate that co-evolved thermally tolerant host–symbiont combinations are likely to dominate reef ecosystems in the future.
---
#

# Article Summary

# Article Summary

# 5 Most Important Points:

- Corals hosting symbionts that maintain physiological function under stress conditions have reduced coral bleaching, mortality, and faster recoveries.
- Partner specificity between coral and symbiont is in

### Method 2: LlamaParse via SimpleDirectoryReader

This is essentially the same as Method 1: the same LlamaParse parser is used, but here it is passed to SimpleDirectoryReader as the extractor for `.pdf` files.

https://docs.cloud.llamaindex.ai/llamaparse/getting_started/python


In [7]:
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# set up parser
parser = LlamaParse(api_key=llama_parser,
    result_type="markdown"  # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
document2 = SimpleDirectoryReader(input_files=['my_latest_paper.pdf'], file_extractor=file_extractor).load_data()

file_name = "paper2.md"
with open(file_name, 'w', encoding='utf-8') as file:  # the paper contains non-ASCII characters
    file.write(document2[0].text)

Started parsing the file under job_id cac11eca-f0b4-4a47-9835-c765331005e4


In [13]:
print(document2[0].text)

Downloaded from https://royalsocietypublishing.org/ on 19 July 2023

# High physiological function for corals with thermally tolerant, host-adapted symbionts

royalsocietypublishing.org/journal/rspb

Kira E. Turnham1 , Matthew D. Aschaffenburg 2, D. Tye Pettay 3 , David A. Paz-García4 , Héctor Reyes-Bonilla5 , Jorge Pinzón 1 , Ellie Timmins 1 , Robin T. Smith6, Michael P. McGinley 2 , Mark E. Warner 2 and Todd C. LaJeunesse1

Research

Cite this article: Turnham KE et al. 2023

High physiological function for corals with thermally tolerant, host-adapted symbionts. Proc. R. Soc. B 290: 20231021.

https://doi.org/10.1098/rspb.2023.1021

Received: 09 May 2023

Accepted: 23 June 2023

KET, 0000-0001-9236-7237; DTP, 0000-0002-2060-3226; DAP-G, 0000-0002-1228-5221; HR-B, 0000-0003-2593-9631; JP, 0000-0002-3330-8226; MEW, 0000-0003-1015-9413; TCL, 0000-0001-7607-9358

Subject Category: Ecology

Subject Areas: ecology, ecosystems, evolution

Keywords: functional ecology, mutualism, Pocillopora

### Method 3: llmsherpa LayoutPDFReader

https://www.llamaindex.ai/blog/mastering-pdfs-extracting-sections-headings-paragraphs-and-tables-with-cutting-edge-parser-faea18870125

https://github.com/nlmatics/llmsherpa

- Install llmsherpa:

!pip install llmsherpa



In [14]:
from llmsherpa.readers import LayoutPDFReader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_path = "my_latest_paper.pdf" 
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_path)
file_name = "paper3.md"
with open(file_name, 'w', encoding='utf-8') as file:  # the paper contains non-ASCII characters
    file.write(doc.to_text())

In [15]:
print(doc.to_text()[:1000])

High physiological function for corals with thermally tolerant, host-adapted symbionts
Research
Cite this article: Turnham KE et al. 2023
High physiological function for Kira E. Turnham1, Matthew D. Aschaffenburg2, D. Tye Pettay3, David A. Paz-García4, Héctor Reyes-Bonilla5, Jorge Pinzón1, Ellie Timmins1, Robin T. Smith6, Michael P. McGinley2, Mark E. Warner2 and Todd C. LaJeunesse1
corals with thermally tolerant, host-adapted
1Department of Biology, The Pennsylvania State University, University Park, PA, USA
symbionts.
Proc.
R. Soc. B 290: 20231021.
2School of Marine Science and Policy, University of Delaware, Lewes, DE, USA 3Department of Natural Sciences, University of South Carolina Beaufort, 801 Carteret Street, Beaufort, SC 29902,USA
https://doi.org/10.1098/rspb.2023.1021 4Centro de Investigaciones Biológicas del Noroeste (CIBNOR), Av.
IPN 195, La Paz, Baja California Sur 23096, México
Received: 09 May 2023
5Universidad Autónoma de Baja California Sur, Carretera al Sur 5.5, La Pa

In [16]:
from IPython.display import HTML  # IPython.core.display is the deprecated import path
HTML(doc.tables()[6].to_html())

0,1,2,3,4,5,6
treatment,–,–,–,0.005,<0.001,0.7
species × date,0.92,<0.001,–,0.093,0.24,0.22
species × treatment,–,–,0.11,0.007,<0.001,0.021
treatment × date,–,–,–,0.038,0.064,0.065
species ×,–,–,–,0.092,0.70,0.88
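
`HTML(...)` only renders the table in the notebook. To work with the cell values in code, one option is to parse the HTML string into rows. The sketch below uses only the standard library and assumes `to_html()` emits ordinary `<tr>`, `<th>` and `<td>` markup; `TableRows` and `html_table_to_rows` are names invented for this note.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of an HTML table as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate cells that appear without a <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def html_table_to_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Usage in the notebook (assumes `doc` from the cell above):
# rows = html_table_to_rows(doc.tables()[6].to_html())
```

The resulting rows can then be handed to `pandas.DataFrame(rows)` or written out as CSV.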


In [17]:
# Show a single section with its children (index 20 is "4. Discussion" in this paper)
HTML(doc.sections()[20].to_html(include_children=True, recurse=True))


0,1,2,3,4,5,6,7
symbiont volume per host area,,,7714.3 ± 2029,,12039.7 ± 3957,,0.005
symbiont summer MI maximum,,,1.42 ± 0.28,,,7.82 ± 1.33,<0.001
symbiont winter MI maximum,,,5.4 ± 1.47,,,3.17 ± 1.05,0.066
symbiont MI during summer heated treatment,,,0.45 ± 0.22,,,11.58 ± 2.59,0.004
symbiont MI during summer control treatment,,,1.18 ± 0.45,,,4.49 ± 0.56,0.26
Fv/Fm in thermal treatment (experiment day 7),,,0.35 ± 0.081,,,0.46 ± 0.026,<0.001
Fv/Fm in control treatment (experiment day 7),,,0.46 ± 0.029,,,0.47 ± 0.044,0.68
σPSII in thermal treatment (experiment day 7) (Å 2),,,320 ± 33.23,,,255 ± 19.24,<0.001
σPSII in control treatment (experiment day 7) (Å 2),,,248 ± 32.98,,,242 ± 34.71,0.62
τ in thermal treatment (experiment day 7) (µseconds),,,600 ± 78.91,,,425 ± 99.63,0.002

0,1,2
26ºC 32ºC,,
ramp treatment 32º,,
biometrics,,

0,1,2
0,7,symbiont cell division rates

0,1,2,3,4,5
Fv/Fm,,,,,
0.98,0.29,0.005,<0.001,<0.001,0.001

0,1,2,3,4,5,6
treatment,–,–,–,0.005,<0.001,0.7
species × date,0.92,<0.001,–,0.093,0.24,0.22
species × treatment,–,–,0.11,0.007,<0.001,0.021
treatment × date,–,–,–,0.038,0.064,0.065
species ×,–,–,–,0.092,0.70,0.88


In [18]:
from langchain_community.llms import Ollama

# Create a LangChain wrapper around the local llama3 model served by Ollama
llm3 = Ollama(model="llama3")

In [19]:
import ollama
ollama.chat(model='llama3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])


{'model': 'llama3',
 'created_at': '2024-06-25T03:01:02.901122Z',
 'message': {'role': 'assistant',
  'content': "A question that has puzzled humans for centuries!\n\nThe sky appears blue because of a phenomenon called scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen (N2) and oxygen (O2). These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.\n\nThis is known as Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described the phenomenon in the late 19th century. The shorter wavelengths of light, such as blue and violet, are scattered more than the longer wavelengths, like red and orange, because they have a higher frequency and interact more strongly with the small molecules in the atmosphere.\n\nAs a result, the blue light is dispersed throughout the atmosphere, giving the sky its blue appearance. The color of the sky can

In [20]:
from langchain_community.llms import Ollama

# Create the model wrapper, then ask it about one section of the parsed PDF
llm3 = Ollama(model="llama3")
sect = doc.sections()[20].to_text(include_children=True, recurse=True)
llm3.invoke(f"given {sect} what is the main conclusion of this document ")

'The main conclusion of this document appears to be that:\n\n1. The symbiotic relationship between certain coral species (such as Pocillopora) and their symbionts (specifically Durusdinium spp.) is crucial for the health and productivity of these corals.\n2. This mutualism allows for optimized physiological performance, including high rates of photosynthesis per cell and carbon translocation under higher temperatures.\n3. In contrast, some coral-symbiont pairings may not have a co-evolutionary history, leading to reduced growth, lower energy reserves, and smaller oocyte sizes.\n4. The presence of stress-tolerant symbionts, such as D. glynnii, can help corals adapt to climate change by increasing thermal tolerance.\n5. These findings suggest that the persistence of reef corals in a changing climate may rely on the continued presence and diversity of these symbiotic relationships.\n\nOverall, the study emphasizes the importance of understanding the complex interactions between coral host
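
The prompt above pastes the whole section into the middle of a sentence. A possible alternative is to put the question first and fence off the section text, truncating it if it is very long. This is only a sketch: `build_prompt`, the marker strings, and the 8000-character limit are arbitrary choices for illustration, not anything required by Ollama or LangChain.

```python
def build_prompt(section_text, question, max_chars=8000):
    """Put the question first and the (possibly truncated) section in a delimited block."""
    text = section_text.strip()
    if len(text) > max_chars:
        # Cut overly long sections and say so, instead of silently dropping text
        text = text[:max_chars].rstrip() + " [truncated]"
    return (
        f"{question.strip()}\n\n"
        "Answer using only the text between the markers.\n"
        "<<<\n"
        f"{text}\n"
        ">>>"
    )


# Usage in the notebook (assumes `llm3` and `sect` from the cell above):
# llm3.invoke(build_prompt(sect, "What is the main conclusion of this document?"))
```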

In [21]:
print('The main conclusions of this document are:\n\n1. The functional convergence of different host-symbiont combinations on the Great Barrier Reef indicates that certain symbionts (Durusdinium) can have a negative impact on coral growth, leading to reduced calcification rates and lower tissue biomass.\n2. These negative impacts seem to be related to rare or introduced species of Durusdinium, rather than co-evolved mutualisms between corals and their symbionts.\n3. Co-evolved mutualisms, such as those between Pocillopora corals and D. glynnii symbionts, can support high rates of photosynthesis per cell and maintain carbon translocation under higher temperatures.\n4. The persistence of reef corals in a changing climate depends on their ability to harbor stress-tolerant symbionts that raise the thermal tolerance of host corals by 1-2°C.\n5. The physiological integration between co-evolved mutualisms can create fast-growing, highly fecund colonies with few observable trade-offs.\n\nOverall, the study suggests that understanding the natural history of reef corals and their symbionts is crucial for explaining physiological and ecological patterns and processes, and that co-evolved mutualisms may be more resilient to climate change than rare or introduced species.')

The main conclusions of this document are:

1. The functional convergence of different host-symbiont combinations on the Great Barrier Reef indicates that certain symbionts (Durusdinium) can have a negative impact on coral growth, leading to reduced calcification rates and lower tissue biomass.
2. These negative impacts seem to be related to rare or introduced species of Durusdinium, rather than co-evolved mutualisms between corals and their symbionts.
3. Co-evolved mutualisms, such as those between Pocillopora corals and D. glynnii symbionts, can support high rates of photosynthesis per cell and maintain carbon translocation under higher temperatures.
4. The persistence of reef corals in a changing climate depends on their ability to harbor stress-tolerant symbionts that raise the thermal tolerance of host corals by 1-2°C.
5. The physiological integration between co-evolved mutualisms can create fast-growing, highly fecund colonies with few observable trade-offs.

Overall, the study s

In [22]:
# doc.sections()[20].to_text()

doc.sections()[20].to_text(include_children=True, recurse=True)

'4. Discussion\n(a) The functional convergence of different host– symbiont combinations\n(i) Mutualisms converged on a functionally stable and productive unit\nThe similarities in growth as well as gamete production, seen here, indicate that colonies derive similar metabolic benefits from hosting evolutionarily divergent symbionts (figure 2).\nThe steady-state condition, or phenotype, of P. grandis with each symbiont species is noticeably distinct.\nYet the emergent effect of each combination produces functionally similar mutualisms under normal environmental conditions.\nWhile colonies with C. latusorum have considerably fewer symbiont cells per surface area, this difference is compensated by the greater C. latusorum cell size relative to D. glynnii (figure 2b).\nTherefore, the total standing biomass of each resident symbiont population is nearly equivalent and partially explains the similarities in attributes related to colony growth and reproduction.\nWhile estimates of nutrient tra
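
The index 20 is specific to this PDF, so it is easy to pick the wrong section. A sketch for looking sections up by title instead, assuming each section object exposes a `.title` attribute as in the llmsherpa examples; `find_sections` is a hypothetical helper.

```python
def find_sections(sections, keyword):
    """Return (index, title) pairs whose title contains keyword, ignoring case."""
    needle = keyword.lower()
    return [
        (i, s.title)
        for i, s in enumerate(sections)
        # Sections without a usable title are skipped rather than raising
        if needle in (getattr(s, "title", "") or "").lower()
    ]


# Usage in the notebook (assumes `doc` from the cells above):
# find_sections(doc.sections(), "discussion")
```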

In [23]:
doc.json

[{'block_class': 'cls_0',
  'block_idx': 0,
  'level': 0,
  'page_idx': 0,
  'sentences': ['royalsocietypublishing.org/journal/rspb'],
  'tag': 'para'},
 {'block_class': 'cls_6',
  'block_idx': 1,
  'level': 0,
  'page_idx': 0,
  'sentences': ['High physiological function for corals with thermally tolerant, host-adapted symbionts'],
  'tag': 'header'},
 {'block_class': 'cls_1',
  'block_idx': 2,
  'level': 1,
  'page_idx': 0,
  'sentences': ['Research'],
  'tag': 'header'},
 {'block_class': 'cls_2',
  'block_idx': 3,
  'level': 2,
  'page_idx': 0,
  'sentences': ['Cite this article: Turnham KE et al. 2023'],
  'tag': 'header'},
 {'block_class': 'cls_3',
  'block_idx': 4,
  'level': 3,
  'page_idx': 0,
  'sentences': ['High physiological function for Kira E. Turnham1, Matthew D. Aschaffenburg2, D. Tye Pettay3, David A. Paz-García4, Héctor Reyes-Bonilla5, Jorge Pinzón1, Ellie Timmins1, Robin T. Smith6, Michael P. McGinley2, Mark E. Warner2 and Todd C. LaJeunesse1'],
  'tag': 'para'},
 {'
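
The block list above carries enough information to rebuild an outline of the paper. A minimal sketch, assuming `doc.json` is a list of dicts with the `tag`, `level` and `sentences` keys shown in the output; `outline` is a name made up for this note.

```python
def outline(blocks, indent="  "):
    """Return one indented line per header block, using its 'level' for depth."""
    lines = []
    for block in blocks:
        if block.get("tag") == "header":
            title = " ".join(block.get("sentences", []))
            lines.append(indent * block.get("level", 0) + title)
    return lines


# Usage in the notebook (assumes `doc` from the cells above):
# print("\n".join(outline(doc.json)))
```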