[Bug]: Smart PDF Loader - Failed to establish a new connection #14902

@RGalkin

Bug Description

I'm running the example from the website:

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"  # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
documents = pdf_loader.load_data(pdf_url)

I'm getting the following error:
`An exception occurred: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

The above exception was the direct cause of the following exception:
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))`
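
Since `[WinError 10060]` indicates that the TCP connection itself timed out, it may help to check whether the host is reachable at all from this machine, independently of `SmartPDFLoader`. The following is only a minimal diagnostic sketch using the standard library; the helper name `can_connect` and the 5-second timeout are my own assumptions and not part of llama-index or llmsherpa.

```python
import socket
from urllib.parse import urlparse


def can_connect(url: str, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to the URL's host and port succeeds.

    Hypothetical helper for diagnosis only; it does not perform a TLS handshake
    or send an HTTP request.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


if __name__ == "__main__":
    llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
    print("reachable:", can_connect(llmsherpa_api_url))
```

If this reports that the host is not reachable, the problem is likely at the network level (firewall, proxy, or the service being down) rather than in the loader code.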

Version

llama-index 0.10.54
llama-index-readers-smart-pdf-loader 0.1.4

Steps to Reproduce

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"  # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
documents = pdf_loader.load_data(pdf_url)

Relevant Logs/Tracebacks

No response

Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)
