Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] fetch translations online #30759

Open
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
9 participants
@mart-e
Copy link
Contributor

mart-e commented Feb 1, 2019

This is a PoC to test the feasibility of such approach, nothing decided yet, creating a PR to make it easier to discuss

Fetch the latest translations from an external provider (e.g. nightly/CDN server) instead of relying on local files. This would allow to:

  • fetch translations more regularly than once per week
  • be able to update translations without having to wait for the source code to be updated
  • reduce the size of the repository/install files by removing .po files from this repository

Questions:

  • how to manage SaaS installations (you do no want the action on one db to impact the files used by another)
    --> solution used, two extraction methods : save to fs or save to ir.attachment
  • how to manage depreciation of versions and nightly server (we won't serve v13 translation files in 2042)?
  • triple check security related stuff as executing files downloaded

I am also taking this opportunity to refactor the translations methods that are quite old and doing strange things (tools.trans_load_data should not create a language, reduce logic done in wizards and tools, make more methods private)

@mart-e mart-e requested review from odony and rco-odoo Feb 1, 2019

@C3POdoo C3POdoo added the RD label Feb 1, 2019

@robodoo robodoo added the seen 🙂 label Feb 1, 2019

@@ -28,12 +28,16 @@ def _get_languages(self):
help="If you check this box, your customized translations will be overwritten and replaced by the official ones.")
state = fields.Selection([('init', 'init'), ('done', 'done')],
string='Status', readonly=True, default='init')
remote_translations = fields.Boolean("Fetch Online Translations", default=True,

This comment has been minimized.

@Yenthe666

Yenthe666 Feb 1, 2019

Contributor

@mart-e wouldn't it be better to set this to False by default? So that you have to turn it on yourself, by purpose. I can imagine people going nuts about changing translations if they don't know about this 😉

This comment has been minimized.

@mart-e

mart-e Feb 1, 2019

Author Contributor

@Yenthe666 please, see my comment on top of the PR, this is still at the very early stage, not even sure it will be accepted ;-)

@Yenthe666

This comment has been minimized.

Copy link
Contributor

Yenthe666 commented Feb 1, 2019

🍰 oh man, this one is awesome.
@mart-e is this what we talked about at Odoo Experience 2 years ago where you said you heavily wanted to reduce the amount of source files for translations?

@@ -47,6 +61,7 @@ class Lang(models.Model):
"Provided ',' as the thousand separator in each case.")
decimal_point = fields.Char(string='Decimal Separator', required=True, default='.', trim=False)
thousands_sep = fields.Char(string='Thousands Separator', default=',', trim=False)
last_fetch_date = fields.Datetime(string='Last Translations Fetch')

This comment has been minimized.

@rco-odoo

rco-odoo Feb 1, 2019

Member

should be by fetch location and lang...

odoo/addons/base/models/res_lang.py Outdated
@@ -88,15 +103,16 @@ def _register_hook(self):
if not self.search_count([]):
_logger.error("No language is active.")

def _load_lang(self):

This comment has been minimized.

@rco-odoo

rco-odoo Feb 1, 2019

Member
Suggested change
def _load_lang(self):
def _activate(self):
odoo/addons/base/models/res_lang.py Outdated
def install_lang(self):
def _get_i18n_url(self, base, lang):
""" Generate the URL to fetch the translation resource
e.g. https://nightly.odoo.com/i18n/13.0/fr.po

This comment has been minimized.

@rco-odoo

rco-odoo Feb 1, 2019

Member
Suggested change
e.g. https://nightly.odoo.com/i18n/13.0/fr.po
e.g. https://nightly.odoo.com/i18n/13.0/fr.tar.xz
@Yenthe666

This comment has been minimized.

Copy link
Contributor

Yenthe666 commented Feb 1, 2019

@mart-e I don't have the time to go through the whole PR but I have a question.
What happens in the following case?
Transifex term in Dutch on 01/02/2019:
Documents
Term in Dutch on 01/02/2019 on local instance:
Documents

On 02/02/2019 somebody locally edits the source term as they want to give it another name, Files instead of Documents
On 03/02/2019 this PR goes off and syncs translations. Will the local translation be kept when it was locally modified (and thus differs from the Transifex terms)? Will this work like a noupdate functionality then or?

odoo/addons/base/models/res_lang.py Outdated
@@ -182,7 +383,8 @@ def install_lang(self):

@tools.ormcache('code')
def _lang_get_id(self, code):
return (self.search([('code', '=', code)]) or

return (self.with_context(active_test=False).search([('code', '=', code)]) or

This comment has been minimized.

@rco-odoo

rco-odoo Feb 1, 2019

Member

maybe remove those weird fallbacks?

@rim-odoo

This comment has been minimized.

Copy link
Contributor

rim-odoo commented Feb 1, 2019

👍 for the purpose of this PR !

@sbidoul

This comment has been minimized.

Copy link
Contributor

sbidoul commented Feb 1, 2019

@mart-e it do you consider supporting custom addons too?

@mart-e

This comment has been minimized.

Copy link
Contributor Author

mart-e commented Feb 1, 2019

@Yenthe666 ideally, it will be invisible for the user, if the terms are fetched online or on the fs, it should not change the way you interact with translations and Transifex.

@sbidoul of course, this is why there is a field i18n_location on the ir.module.module. The implementation may still change but the goal is to support multiple translation servers.

from operator import itemgetter
from io import BytesIO
from urllib.parse import urljoin

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

Use werkzeug.urls, not urllib.parse

This comment has been minimized.

@mart-e

mart-e Feb 1, 2019

Author Contributor

what's the difference?

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

Mostly that we use werkzeug.urls everywhere. Partially for compat' during the python 3 transition, partially because it often deals better with the bytes/text dichotomy.

odoo/addons/base/models/res_lang.py Outdated
return urljoin(urljoin(urljoin(base,
"i18n/"),
version),
lang + ".tar.xz")

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

Seems unnecessary to use urljoin for the path we're providing, why not a regular format string or str.join? It's not like the separator can change or anything.

This comment has been minimized.

This comment has been minimized.

@mart-e

mart-e Feb 1, 2019

Author Contributor

@rim-odoo no, it's an URL

bio = BytesIO()
bio.write(stream.content)
bio.seek(0)
self._extract_i18n_file_content(bio, lang, urls[url], extraction_method)

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

With stream.raw the tarfile can be decompressed on the fly (streaming). Requires opening with r|* to ensure it never seeks.

"""
with tarfile.open(mode='r:xz', fileobj=fileobj) as tar_content:
with tempfile.TemporaryDirectory() as tmp:
for filename in tar_content.getnames():

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

Might be worth iterating on the TarFile directly to get TarEntry objects so we can check the entry's size, & keep a running tally or somesuch to avoid decompression bombs? (using xz, 6.5GB of zeroes compress to <1MB).

Might also need to check if the entry isfile(), extract/extractfile apparently blows up with non-file entries.

from odoo.tools.safe_eval import safe_eval
from odoo.exceptions import UserError, ValidationError

_logger = logging.getLogger(__name__)

DEFAULT_DATE_FORMAT = '%m/%d/%Y'
DEFAULT_TIME_FORMAT = '%H:%M:%S'
MAX_FILE_SIZE = 15 * 1024 * 1024 # in megabytes
REFRESH_THRESHOLD = timedelta(seconds=60 * 60 * 12) # 12 hours

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

FWIW timedelta(hours=12) is valid.

continue

_logger.debug("Extracting translation file %s to %s", filename, tmp)
tar_content.extract(filename, path=tmp)

This comment has been minimized.

@xmo-odoo

xmo-odoo Feb 1, 2019

Collaborator

Might be able to skip the tempdir with extractfile (which returns a pseudo-file object backed by the tarchive)

This comment has been minimized.

@mart-e

mart-e Feb 1, 2019

Author Contributor

the thing is that my file is something like account/i18n/fr.po, having one dir made it quite easier

@naglis
Copy link
Contributor

naglis left a comment

I find it an awesome feature. Great effort! 👍

for url in urls:
full_url = self._get_i18n_url(url, lang)
try:
stream = requests.get(full_url, stream=True)

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

Set a timeout to avoid indefinite hanging? I know that this would be handled by worker timeout, but maybe we don't need to wait that long.

for url in urls:
full_url = self._get_i18n_url(url, lang)
try:
stream = requests.get(full_url, stream=True)

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

Since it is likely there will be multiple requests to the same host, I'd suggest to utilize requests.Session, to reuse the underlying TCP connection (potential performance increase).

for url in urls:
full_url = self._get_i18n_url(url, lang)
try:
stream = requests.get(full_url, stream=True)

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

From requests docs:

If you set stream to True when making a request, Requests cannot release the connection back to the pool unless you consume all the data or call Response.close. This can lead to inefficiency with connections. If you find yourself partially reading request bodies (or not reading them at all) while using stream=True, you should make the request within a with statement to ensure it’s always closed:

with requests.get('https://httpbin.org/get', stream=True) as r:
    # Do things with the response here.

So I think it would be safer to use the response as a context manager.

full_url = self._get_i18n_url(url, lang)
try:
stream = requests.get(full_url, stream=True)
if stream.status_code != 200:

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

What if we used if not stream.ok to be more lenient?

continue

if int(stream.headers['content-length']) > MAX_FILE_SIZE:
raise UserError("Content too long (got %.2fMB, max %.2fMB)" % (

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

Since this is a user-facing error message, would be nice to have it translatable.


from odoo import api, fields, models, http, tools, release, _
from odoo.modules.module import ad_paths
from odoo.modules import get_module_path, get_module_resource

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

I think get_resource_path is the more correct name to import and use, get_module_resource is marked for backwards compatibility.

This comment has been minimized.

@Yenthe666

Yenthe666 Feb 9, 2019

Contributor

@naglis Odoo already has a function named get_resource_path though to find and load images etc. You can find it at from odoo.modules.module import get_resource_path so would be unwise to introduce an identical name with another use probably.

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

@Yenthe666, I think my comment was not clear enough. get_resource_path and get_module_resource are the same function, where get_module_resource is the old name, kept for backwards compatibility. I think any new code should use the newer name get_resource_path.

return urljoin(base, "i18n/{}/{}.tar.xz".format(version, lang))

@api.model
def _read_po_from_i18n_folder(self, addons, lang):

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

I find the variable addons here and below to be misleading. It suggests it is a list of strings of addon names, where it actually is a string with addon name.


now = fields.Datetime.now()
for lang in self - inactive_languages:
if not lang.last_fetch_date or \

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

I might be dull, but I could not find where last_fetch_date is updated.

This comment has been minimized.

@mart-e

mart-e Feb 11, 2019

Author Contributor

@naglis currently nowhere, I am not even sure this field will stay

Show resolved Hide resolved odoo/addons/base/models/ir_module.py

src_file = os.path.join(tmp, extracted_file)
dst_file = os.path.join(path, extracted_file)
shutil.move(src_file, dst_file)

This comment has been minimized.

@naglis

naglis Feb 9, 2019

Contributor

Is the addons_path/addon/i18n directory guaranteed to exist? If not, I believe the move will fail if it does not exist.

@mart-e

This comment has been minimized.

Copy link
Contributor Author

mart-e commented Feb 11, 2019

@naglis Thank you for the review but keep in mind this is very early work. It does not work yet and it will probably still change a lot so no need to spend too much time on details of implementation at the moment.

@mart-e mart-e force-pushed the odoo-dev:master-download-languages-mat branch 5 times, most recently Feb 19, 2019

mart-e added some commits Sep 20, 2018

[WIP] base: fetch translations online
TODO:
- double check not possible to change value of i18n_location for standard modules
- take care of warning https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractall
- document how to generate translation files
- .tar.xz ?
[IMP] base: rename install_lang method
Was confusing as did not installed any language

@mart-e mart-e force-pushed the odoo-dev:master-download-languages-mat branch to db30163 Feb 19, 2019

@xmo-odoo
Copy link
Collaborator

xmo-odoo left a comment

I am cow, hear me moo

@mart-e mart-e requested a review from xmo-odoo Mar 5, 2019

@xmo-odoo
Copy link
Collaborator

xmo-odoo left a comment

This is not an approval.

testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.