Multilingual support #470

Merged
dayland merged 8 commits into vNext-Dev from aparmar/5784-multilingual-support on Jan 24, 2024

Conversation

ArpitaisAn0maly (Contributor) commented Jan 18, 2024

Multilingual Support.

  • Added code that detects the language of the user's question and translates the response into that same language, which makes the output language more deterministic.

  • Translation through prompt engineering was not giving consistent results. We also already have a query language setting, which is the default index/prompt language, so we don't want to complicate the prompt with more language parameters.

  • Used the deep_translator Python library for translation and langdetect to detect the language of the user question, which then drives the response translation. deep_translator can also perform language detection, but not out of the box: you have to register and get an API key, which is why I used langdetect.

  • With this approach translation is still occasionally missed, but it is a definite improvement over the existing implementation. Translation is not 100% reliable with the LLM, so this is acceptable.

  • Screenshot below: Albanian and French questions and responses, while the prompt and index in the background are in English.

image
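
For readers following along, the detect-then-translate flow described above might look roughly like the sketch below. This is not the code from the PR: `answer_in_user_language`, `make_library_backends`, and the `generate_answer` callable are hypothetical names, and using `GoogleTranslator` as the deep_translator backend is an assumption, since the description does not say which backend was used.

```python
from typing import Callable, Tuple


def answer_in_user_language(
    question: str,
    generate_answer: Callable[[str], str],
    detect: Callable[[str], str],
    translate: Callable[[str, str], str],
    index_language: str = "en",
) -> str:
    """Answer the question, then translate the answer into the question's language."""
    try:
        user_language = detect(question)
    except Exception:
        # Detection can fail on empty or very short input;
        # fall back to the index/prompt language.
        user_language = index_language

    answer = generate_answer(question)  # hypothetical call into the existing pipeline
    if user_language == index_language:
        return answer  # nothing to translate
    return translate(answer, user_language)


def make_library_backends() -> Tuple[Callable[[str], str], Callable[[str, str], str]]:
    """Build detect/translate callables from langdetect and deep_translator."""
    # Imported lazily so the function above can be exercised without the libraries.
    from deep_translator import GoogleTranslator  # assumed backend
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded

    def translate(text: str, target: str) -> str:
        return GoogleTranslator(source="auto", target=target).translate(text)

    return detect, translate
```

Passing the detector and translator in as callables keeps the flow independent of the libraries behind it, so a different detection or translation service can be substituted without touching the answer path.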

ArpitaisAn0maly (Contributor, Author) commented Jan 24, 2024

PR Updated.

  • Language detection and translation with Azure Translator API
  • Search query translation using the Azure Translator API. Removed the query term language from the prompt where it was used for translation.
  • Removal of Query Language and speller from semantic reranking.
  • Kept some language in prompt for translation of default responses.
  • Kept query term language in prompt to help with robust retrieval.
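
As a rough illustration of the first two points, a single call to the Azure Translator v3 REST `translate` endpoint can both detect the source language and return the translated text. The sketch below is not the PR's code: the helper names are hypothetical, the global endpoint is assumed, and the key and region come from your own Translator resource.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the global Translator endpoint; a resource may use a custom one instead.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(text, to_language, key, region):
    """Build a POST request for the v3 translate endpoint (hypothetical helper)."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": to_language})
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{TRANSLATOR_ENDPOINT}/translate?{query}",
        data=body,
        headers=headers,
        method="POST",
    )


def parse_translate_response(payload):
    """Return (detected_language, translated_text) from the decoded JSON response."""
    first = payload[0]
    # detectedLanguage is present when no source language was specified.
    detected = first.get("detectedLanguage", {}).get("language")
    translated = first["translations"][0]["text"]
    return detected, translated


def translate(text, to_language, key, region):
    """Send the request; needs network access and a valid key."""
    request = build_translate_request(text, to_language, key, region)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_translate_response(json.load(response))
```

Under these assumptions, the search query would be translated to the index language (for example `translate(question, "en", key, region)`), the detected language kept, and the final response translated back with a second call that targets that language.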

@dayland dayland merged commit 4494d75 into vNext-Dev Jan 24, 2024
6 checks passed
@dayland dayland deleted the aparmar/5784-multilingual-support branch January 24, 2024 22:07