
Use it as translation bot #10

Closed
sdugoten opened this issue Mar 20, 2023 · 5 comments

Comments

@sdugoten

First of all, thanks for writing the code that lets people feed in PDFs.

I tried changing the prompt to something like:

"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question."

However, the program doesn't seem to have a concept of page numbers. When I tell the bot to translate page 1 into English, it returns some random page and translates that instead. Can this bot work as a translator from a foreign language into English?

My ultimate goal is to feed in a foreign-language PDF and get back an English PDF that I can download.

Thanks.

@mayooear
Owner

Hi, thanks for the feedback.

Based on what you're saying, the translation works well, just not for the page you want? What language is the document in?

I will add page-number support in a PR soon.

@sdugoten
Author

sdugoten commented Mar 22, 2023

It's a Japanese light novel. You can try that here https://ufile.io/10rqqw7j

Basically, I feed the PDF into the chatbot and set up the prompt as: "You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question."

Then I ask, "Can you translate page 1 of the PDF into English?" The bot translates some random page of the PDF instead. Asking the chatbot to translate the whole PDF into English doesn't work either.

@mayooear
Owner

Generally, OpenAI's embeddings aren't great for multilingual text.

If you ask it to translate from English to Japanese how is the performance?

@sdugoten
Author

> Generally openai's embeddings aren't great for multilingual.
>
> If you ask it to translate from English to Japanese how is the performance?

You can test my prompt with the court case PDF you provided:

"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question"

Using your PDF as an example, even if you specifically ask GPT to translate page 1, it still picks a random page and translates that into whatever language you asked for. That's why I said it doesn't seem to have a concept of pages. I don't think the problem is multilingual support; it's that GPT needs to understand page numbers so it can pinpoint exactly the page we refer to and use that page as the input for translation.
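One way to act on this, sketched under stated assumptions: if every chunk carried its page number, a question that names a page could skip similarity search and use that page directly. The `PageDoc` shape, `extractPageNumber`, `selectContext`, and the injected `similaritySearch` callback are hypothetical names for illustration, not part of this project, and similarity search is stubbed out.

```typescript
// Hypothetical document shape: one entry per PDF page, with the 1-based
// page number kept in metadata (an assumption, not how chunks are stored today).
interface PageDoc {
  pageContent: string;
  metadata: { page: number };
}

// Pull an explicit reference such as "page 1" or "Page 12" out of the question.
// Returns null when the question does not name a page.
function extractPageNumber(question: string): number | null {
  const match = /\bpage\s+(\d+)\b/i.exec(question);
  return match ? Number(match[1]) : null;
}

// If the question names a page that exists, return that page directly instead of
// relying on embedding similarity, which knows nothing about page order.
// Otherwise fall back to the (stubbed) similarity search.
function selectContext(
  question: string,
  pages: PageDoc[],
  similaritySearch: (q: string) => PageDoc[],
): PageDoc[] {
  const page = extractPageNumber(question);
  if (page !== null) {
    const exact = pages.filter((d) => d.metadata.page === page);
    if (exact.length > 0) return exact;
  }
  return similaritySearch(question);
}

// Usage with placeholder text and a fake search that always returns page 2.
const pages: PageDoc[] = [
  { pageContent: "(text of page 1)", metadata: { page: 1 } },
  { pageContent: "(text of page 2)", metadata: { page: 2 } },
];
const fallback = (_q: string): PageDoc[] => [pages[1]];
const ctx = selectContext("Can you translate page 1 of the PDF into English", pages, fallback);
console.log(ctx.map((d) => d.metadata.page));
```

The regex only recognizes the literal form "page N"; a question such as "translate the first page" would still fall through to similarity search.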

Perhaps you could add some debug output to the result, so that we can see which page GPT is looking at when we ask a question.
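A minimal sketch of that debug output, assuming the chain can hand back the chunks it retrieved as objects with `pageContent` and `metadata` (the shape LangChain's `Document` uses). `describeSources` and the `page` metadata key are hypothetical; with character-count chunking there may be no page number to report, in which case the sketch prints `unknown`.

```typescript
// Assumed shape of a retrieved chunk; `page` is only present if the loader recorded it.
interface RetrievedDoc {
  pageContent: string;
  metadata: { source?: string; page?: number };
}

// Build one summary line per chunk that was passed to the model, so you can
// check which page(s) an answer was based on. Whitespace in the preview is collapsed.
function describeSources(docs: RetrievedDoc[], previewChars = 40): string[] {
  return docs.map((doc, i) => {
    const page = doc.metadata.page ?? "unknown";
    const preview = doc.pageContent.slice(0, previewChars).replace(/\s+/g, " ");
    return `#${i + 1} page=${page} "${preview}"`;
  });
}

// Example with placeholder chunks; the second has no page number.
const retrieved: RetrievedDoc[] = [
  { pageContent: "Some text\nfrom a chunk", metadata: { source: "novel.pdf", page: 7 } },
  { pageContent: "Another chunk", metadata: { source: "novel.pdf" } },
];
for (const line of describeSources(retrieved)) console.log(line);
```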

@mayooear
Owner

There is no concept of a page because the chunks are currently split by character count. I will open a PR later to split the PDF documents by page number.
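To illustrate the difference, here is a simplified sketch in plain TypeScript (no LangChain, no chunk overlap). `splitByCharacters` and `splitByPage` are hypothetical helpers, not the project's code: the first cuts the concatenated text every `size` characters, so a chunk can straddle two pages and carries no page number; the second keeps one document per page with a 1-based `page` field in its metadata.

```typescript
interface Doc {
  pageContent: string;
  metadata: { page?: number };
}

// Character-count splitting (simplified): all pages are concatenated and cut
// every `size` characters, so page boundaries and page numbers are lost.
function splitByCharacters(pages: string[], size: number): Doc[] {
  const text = pages.join("\n");
  const chunks: Doc[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push({ pageContent: text.slice(i, i + size), metadata: {} });
  }
  return chunks;
}

// Page-based splitting: one document per page, tagged with its 1-based page number.
function splitByPage(pages: string[]): Doc[] {
  return pages.map((content, idx) => ({
    pageContent: content,
    metadata: { page: idx + 1 },
  }));
}

// Three tiny placeholder "pages".
const pages = ["aaaa", "bbbb", "cccc"];
console.log(splitByCharacters(pages, 6).map((d) => d.pageContent));
console.log(splitByPage(pages).map((d) => d.metadata.page));
```

With a chunk size of 6, the first character-based chunk is `"aaaa\nb"`, which mixes pages 1 and 2. A long page may still need sub-splitting to fit embedding limits; under this approach each sub-chunk would keep its page number in metadata.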
