Page numbers returned in response #57

Closed
stevelizcano opened this issue Mar 2, 2024 · 10 comments

@stevelizcano

When indexing into our VectorDB, it's very helpful to know the page numbers where possible.

Is this possible now, or could it be added?

Either way, love the library. Great work!

@anoopshrma
Collaborator

Currently the page number is not returned in the metadata, but page content is separated using "-----------". You can split the text on that string and assign page numbers to the resulting chunks yourself.

PS: Support for more information in the metadata is being implemented and will be available soon.
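
The split-and-number workaround described above could look roughly like the following. This is a minimal sketch, not library code: `splitIntoPages` and `PageChunk` are hypothetical names, and the separator string is copied from the comment above, so it may differ depending on the version or configuration in use.

```typescript
// Hypothetical helper type: one chunk of text with its 1-based page number.
interface PageChunk {
  page: number;
  text: string;
}

// Split parsed text on the page separator and number the chunks from 1.
// ASSUMPTION: the separator is the literal string quoted in this thread.
function splitIntoPages(text: string, separator: string = "-----------"): PageChunk[] {
  return text
    .split(separator)
    // Number every chunk first, so an empty page does not shift later numbers.
    .map((chunk, index) => ({ page: index + 1, text: chunk.trim() }))
    // Then drop chunks with no content (blank pages, trailing separator).
    .filter((chunk) => chunk.text.length > 0);
}
```

Numbering before filtering is deliberate: if blank pages were removed first, every page after them would get the wrong number.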

@hexapode hexapode added the enhancement New feature or request label Mar 4, 2024
@hexapode
Member

hexapode commented Mar 4, 2024

We will provide a more reliable way to get page numbers (JSON output) in a future release.

@hexapode hexapode self-assigned this Mar 4, 2024
@Laktus

Laktus commented Apr 25, 2024

@hexapode When will that release come? Until then, is @anoopshrma's suggestion a valid solution? Can we just replace each occurrence of "-----------" with its respective page number?

It would be nice if you could confirm this, @hexapode.

@anoopshrma
Collaborator

Hey @Laktus, JSON mode has been out for quite some time now.
You can give it a try: https://github.com/run-llama/llama_parse/blob/main/examples/demo_json_parsing.ipynb

@Laktus

Laktus commented Apr 26, 2024

@anoopshrma Hi, is there also LlamaIndexTS support? Is there a list of all supported modes somewhere (Markdown, JSON, any others)? Thanks.

@salujav4

Hey @Laktus, I am facing the same issue. I am using LlamaParseReader in TypeScript and I don't see the JSON format there, only text and markdown.

@BinaryBrain
Member

You can now retrieve page numbers and other information using the JSON output.

Using the API:

GET https://api.cloud.llamaindex.ai/api/parsing/job/<jobId>/result/json

Using the TS lib:

const reader = new LlamaParseReader({ resultType: "json" });
const json = await reader.loadJson(path);
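
Once the JSON result is in hand, pulling out per-page text might look like the sketch below. The result shape is an assumption, not something confirmed in this thread: it assumes an array of documents, each with a `pages` array whose entries carry a `page` number plus `text` and/or `md`. `toPageRecords` and the interfaces are hypothetical names, so check them against the actual output before relying on this.

```typescript
// ASSUMED shape of the JSON output; only the fields used here are declared.
interface ParsedPage {
  page: number;
  text?: string;
  md?: string;
}
interface ParsedDocument {
  pages: ParsedPage[];
}

// Hypothetical record to store in a vector DB alongside the embedding.
interface PageRecord {
  page: number;
  text: string;
}

// Flatten all documents into one list of { page, text } records.
function toPageRecords(results: ParsedDocument[]): PageRecord[] {
  const records: PageRecord[] = [];
  for (const doc of results) {
    for (const p of doc.pages) {
      // Prefer markdown when present, fall back to plain text.
      records.push({ page: p.page, text: p.md ?? p.text ?? "" });
    }
  }
  return records;
}
```

With the snippet above, this would be called as `toPageRecords(json)`, assuming `loadJson` resolves to an array of that shape.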

@BinaryBrain BinaryBrain self-assigned this Jul 8, 2024
@thisadev

Hi, can we get the page number using Python?

@aceci0127

> You can now retrieve page numbers and other information using the JSON output.
>
> Using the API:
>
> GET https://api.cloud.llamaindex.ai/api/parsing/job/<jobId>/result/json
>
> Using the TS lib:
>
> const reader = new LlamaParseReader({ resultType: "json" });
> const json = await reader.loadJson(path);

What is the jobId?

@BinaryBrain
Member

> Hi, can we get the page number using Python?

Yes, if you request the JSON results.

You can also use the {pageNumber} placeholder in the page separator, the page prefix, and the page suffix.
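
If a page prefix containing {pageNumber} is configured, the page numbers can be read back out of the plain text output. The sketch below assumes the prefix was set to the made-up marker `[[page {pageNumber}]]`, so each page starts with something like `[[page 3]]`. The marker format, `extractNumberedPages`, and `NumberedPage` are all hypothetical, and how the prefix is configured is not shown here.

```typescript
// Hypothetical result type: page number plus the text that follows its marker.
interface NumberedPage {
  page: number;
  text: string;
}

// ASSUMPTION: the page prefix was configured as "[[page {pageNumber}]]".
const PAGE_MARKER = /\[\[page (\d+)\]\]/;

function extractNumberedPages(text: string): NumberedPage[] {
  // Splitting on a regex with a capture group keeps the captured numbers:
  // [textBeforeFirstMarker, "1", body1, "2", body2, ...]
  const parts = text.split(PAGE_MARKER);
  const pages: NumberedPage[] = [];
  for (let i = 1; i + 1 < parts.length; i += 2) {
    pages.push({ page: parseInt(parts[i], 10), text: parts[i + 1].trim() });
  }
  return pages;
}
```

Any page separator that is still emitted between pages would end up at the end of each page's text, so it may need stripping as well.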

> What is the jobId?

See https://docs.cloud.llamaindex.ai/llamaparse/getting_started/api
Using the API, you first upload your file to the https://api.cloud.llamaindex.ai/api/parsing/upload endpoint.
It returns a job ID, which you then use to retrieve the results.
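
The two-step flow (upload, then fetch the result by job ID) could be sketched as follows. The two URLs come from this thread; everything else is an assumption to check against the linked docs: bearer-token authentication, an `id` field in the upload response, and the hypothetical helper names. The HTTP client is passed in as a parameter so the sketch stays independent of any particular runtime.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
type HttpResponse = { ok: boolean; status: number; json(): Promise<any> };
type HttpClient = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: any }
) => Promise<HttpResponse>;

const BASE_URL = "https://api.cloud.llamaindex.ai/api/parsing";

// URL of the JSON result for a given job, as quoted in this thread.
function resultUrl(jobId: string): string {
  return `${BASE_URL}/job/${encodeURIComponent(jobId)}/result/json`;
}

// Step 1: upload the file. `form` is a multipart body built by the caller.
// ASSUMPTION: the response JSON carries the job ID in an `id` field.
async function uploadFile(http: HttpClient, apiKey: string, form: any): Promise<string> {
  const res = await http(`${BASE_URL}/upload`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // ASSUMPTION: bearer-token auth
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  const data = await res.json();
  return data.id;
}

// Step 2: fetch the JSON result (including page numbers) for that job.
async function fetchJsonResult(http: HttpClient, apiKey: string, jobId: string): Promise<any> {
  const res = await http(resultUrl(jobId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Result not available (HTTP ${res.status})`);
  return res.json();
}
```

In a runtime with a global `fetch`, that function can likely be passed as the `http` argument. Parsing is not instantaneous, so the result request may fail until the job has finished; retrying after a short delay is one way to handle that.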
