Helper for calling a translation API #692

Open
pudo opened this issue Mar 26, 2024 · 2 comments
@pudo
Member

pudo commented Mar 26, 2024

To be used for description text, but also for people's names:

```python
from zavod import helpers as h

eng_name = h.translate_text(context, name, lang="eng")
```

(Passing in context so that we can use context.cache)
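A minimal sketch of how such a cached helper might behave, assuming a simple key/value cache. `MemoryCache` is a hypothetical stand-in for `context.cache`, and `backend` is a hypothetical callable that would wrap the real translation API call; neither reflects the actual zavod implementation, and the signature differs from the proposal above so the example can run on its own.

```python
import hashlib
from typing import Callable, Dict, Optional


class MemoryCache:
    """Hypothetical stand-in for context.cache: a plain key/value store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def translate_text(
    cache: MemoryCache,
    backend: Callable[[str, str], str],
    text: str,
    lang: str = "eng",
) -> Optional[str]:
    """Translate `text` into `lang`, calling `backend` only on a cache miss."""
    text = text.strip()
    if not text:
        return None
    # Key on target language plus a hash of the text, so long descriptions
    # do not produce unwieldy cache keys.
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    key = f"translate:{lang}:{digest}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    translated = backend(text, lang)
    cache.set(key, translated)
    return translated
```

Because the cache key includes the target language, the same name can be translated into several languages without the results overwriting each other, and repeated crawler runs would only pay for strings they have not seen before.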

See:
https://cloud.google.com/translate/docs/basic/translate-text-basic

It would be cleaner if we made this an enricher that puts the translated statements into a bespoke dataset.

@pudo
Member Author

pudo commented Aug 7, 2024

@jbothma Do we need to productise this any further? Would it be worth taking a quick look at existing crawlers that may benefit from it?

@jbothma
Contributor

jbothma commented Aug 7, 2024

It seems to have done a nice job on the Georgian declarations. It could be worth asking Frederik to sanity-check some of the Arabic before going all out on Arabic-relevant countries. We don't have a lot of merges to compare, but the three that have been merged match other transliterations in the entity.

Should we promote it from zavod.shed to zavod.helpers.names? Or perhaps an explicit LLM namespace in helpers is important.

To expand to other crawlers, should we just pick any crawlers with names consistently in a non-Latin script, prioritising countries where business is done in multiple scripts?

If you still reckon an enricher is ideal, maybe now is the time to try it with a couple of datasets, e.g.:

```yaml
config:
  translit:
    ge_declarations:
      from: kat
      to:
        - code: eng
          script: Latin
          lang: English
        - code: rus
          script: Cyrillic
          lang: Russian
    cn_peoples_congress_wikipedia:
      from: chi
      to:
        - code: eng
          script: Latin
          lang: English
```
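As a rough illustration of how an enricher might consume a config of this shape, the sketch below expands it into one job per dataset and target language. The `CONFIG` dict mirrors the content under the `config:` key as a Python literal (no YAML parsing); `TranslitJob` and `iter_jobs` are hypothetical names for this example, not part of zavod.

```python
from typing import Any, Dict, Iterator, NamedTuple


class TranslitJob(NamedTuple):
    """One (dataset, source language, target language) combination."""

    dataset: str
    source: str
    target: str
    script: str


# Python equivalent of the content under the `config:` key above.
CONFIG: Dict[str, Any] = {
    "translit": {
        "ge_declarations": {
            "from": "kat",
            "to": [
                {"code": "eng", "script": "Latin", "lang": "English"},
                {"code": "rus", "script": "Cyrillic", "lang": "Russian"},
            ],
        },
        "cn_peoples_congress_wikipedia": {
            "from": "chi",
            "to": [{"code": "eng", "script": "Latin", "lang": "English"}],
        },
    }
}


def iter_jobs(config: Dict[str, Any]) -> Iterator[TranslitJob]:
    """Yield one job per dataset and target language listed in the config."""
    for dataset, spec in config.get("translit", {}).items():
        for target in spec["to"]:
            yield TranslitJob(dataset, spec["from"], target["code"], target["script"])


for job in iter_jobs(CONFIG):
    print(f"{job.dataset}: {job.source} -> {job.target} ({job.script})")
```

Keeping the per-dataset source language and target list in config would let the enricher stay generic: adding a dataset becomes a config change instead of a code change.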
