Refactor `Catalogs.extract()` to implement more consistent behaviour #231

stucox · 2016-06-16T23:15:32Z

Pretty much by definition this is a breaking change — see #204 and the aim & the included tests for details of cases I’ve covered. Not sure the best way to communicate this to users.

Note that in this version /data/ isn’t treated as a special directory anymore.

I’ve tested this on a very large project I work on and it does what I’d expect:

Removes unnecessary translations (because of Grow extract extracts strings for non-specified locales #157)
No longer extracts tagged terms in ‘global’ locations (i.e. where no document part, document or collection locale can be determined) for all locales defined in the pod: just for those locales in podspec.yaml

This last point caused some work for us: we had a “globals.yaml” file which assumed everything was relevant to every locale (which was wrong: not every locale used those) and hence didn’t declare any locales… which worked until now. Solved by either listing all relevant locales in that file, or listing them all in podspec.yaml (the list there only contained primary locales, which might be better defined at collection level).

Note: Would appreciate if people could test on some other projects.

I’ll try a few others of ours while @jeremydw is checking out the code.

…zable strings & update helpers to allow these pods to be used easily

True by default, as that's the current & most common behaviour

… the correct locale catalogs

…talog.save()` with `include_obsolete` arg for `Catalog.update()` and `Catalog.update_using_catalog()` This is more in line with `babel.Catalog` behaviour, but better suits Grow needs of wanting to include obsolete terms _without marking them obsolete_ (which is what Babel would do). `Catalogs.extract()` can then lean on this behaviour and hence be simpler.

…ed tests

jeremydw · 2016-06-17T19:12:39Z

Awesome, thank you for putting all the time into the change and also for the excellent test coverage. I'll start the review with some style-related stuff just to keep consistency with the rest of the project and then I'll dig in after to see if I have any comments on the overall design. I gave a cursory look at it yesterday and it looks pretty great.

jeremydw · 2016-06-17T19:13:12Z

grow/pods/catalog_holder.py

+
+        # Extract from root of /content/:
+        for path in self.pod.list_dir('/content/', recursive=False):
+            if path.endswith('.yaml') or path.endswith('.yml'):


endswith accepts tuples, so you can do path.endswith(('.yaml', '.yml'))

This I didn’t know! Will update.

jeremydw · 2016-06-17T19:19:01Z

In response to this comment, "No longer extracts tagged terms in ‘global’ locations", does that behavior only kick in when --localized is used with grow extract? In other words, are there no changes if --localized is not specified?

stucox · 2016-06-17T20:37:57Z

In response to this comment, "No longer extracts tagged terms in ‘global’ locations", does that behavior only kick in when --localized is used with grow extract? In other words, are there no changes if --localized is not specified?

Well, sorta N/A: grow extract (without --localized) always ignored the ‘locale context’* of tagged strings and just stuck anything in /content/, /views/ and /data/ marked for translation in messages.pot — so the only change there is /data/ is now ignored. Maybe this could do with some more test coverage.

* I like this term. I might use it more.

With --localized, terms in the ‘global locale context’ (i.e. somewhere in /content/ or /views/ which isn’t in a doc or collection with a locales declaration) are still extracted, but only for locales defined in the podspec, not for locales which might be declared for some other document/collection (which they currently are).

Asid: maybe with --localized we should spit out an error if no locale declarations are found anywhere?

jeremydw · 2016-06-17T20:44:42Z

Makes sense, thanks for explaining. The term "locale context" sounds good. ;)

I'm OK with the breaking change of ignoring /data/ as we're aligned in moving forward with simply putting all content (partials, data files, etc.) in the /content/ directory anyway. /data/ was special and undocumented so I'm OK breaking any behavior there.

ALSO – This should definitely be a separate feature/PR but I'd also like to add a new configuration option to localization in podspec.yaml where the user can specify the defaults for the extract command.

localization:
  locales: [...]
  default_locale: [...]
  extract:
    localized: true  # Defaults `--localized` when `grow extract` is run.

That would simplify translation management across multiple developers and help with any continuous translation initiative we work on.

stucox · 2016-06-17T20:46:32Z

You can still put stuff in /data/ — but if you want it extracted for localization, you need to !g.yaml() / !g.csv() it into a document in /content/. That could probably do with test coverage too.

stucox · 2016-06-17T20:47:48Z

Agree that config option would be nice — but low priority IMO. Kinda like teams to know what the commands they’re running are doing.

jeremydw · 2016-06-17T20:51:27Z

Ha, well, if we provide some built-in optional commit hooks (such as a pre-commit grow extract followed by a diff/report and a post-commit grow stage) it'll be important for the commands to be self-configured.

stucox · 2016-06-17T20:58:21Z

👍 We’ve played with an “feature finished” script too, which runs linting, grow extract --localized (and complains if your translations files are out of date), grow stage, pushes to our code review tool & sticks changes a ticket status in our bug tracker. So would be helpful for that too.

(we decided pre/post-commit was a bit much)

jeremydw · 2016-06-18T23:56:08Z

Just a note: I ran this against against a project I work on and everything works as intended! I noticed this exposes the reason behind that use_podspec_locales identifier we chatted about on email.

As now intended, in absence of that identifier, and in absence of any $localization config, grow extract --localized no longer extracts these partials. Here's an example file that I'm working with. Note that it's a partial/includable content file and therefore doesn't have a URL mapped to it.

---
foo: bar
---
$locale: de
foo: baz

Also, there is no $localization in the document or in its blueprint.

I think the change in behavior introduced by this PR is correct -- to not extract this content at all. To get the previous behavior, you'd want the equivalent of use_podspec_locales but that's wordy and just weird, so...

After merging this PR in, I'm thinking about making a change to the default behavior of these partial files that do not have a $localization key and want your thoughts, here are my proposals:

Accept $localization: ~ (null character) as an indicator that the file should be localized. This would be the equivalent of use_podspec_locales, it just leverages consistent syntax. Since this file is a partial and does not have a path mapped to it, it wouldn't need $localization: path: ... and therefore can just accept $localization: ~ as the "I am localized too" flag.
In lieu of a $localization key, if the user has specified a sub-document in the file with $locale: foo, then that should automatically imply $localization: ~. Note that this would not change the current behavior of inheriting the podspec locales, it would just automatically indicate that the doc is localized.

Therefore, in order to get the previous behavior back and have this file extracted too, I would have to do nothing (since $locale: de would imply $localization) -- or, if the partial didn't have $locale: de, I would simply have to add $localization: ~ to indicate that I want the file to be localized and extracted to all locales.

Thoughts on that?

stucox · 2016-06-19T06:48:55Z

I'm pretty sure with the new implementation strings in that file will still be extracted — but it looks like they're not tagged for translation?

If it were this:

---
foo@: bar
---
$locale: de
foo@: baz

I'd expect bar to be extracted for podspec locales and baz to be extracted for de only.

stucox · 2016-06-19T07:12:29Z

grow/pods/catalogs.py

        super(Catalog, self).update(
            catalog_to_merge, no_fuzzy_matching=(not use_fuzzy_matching))
+        # Don't use gettext's obsolete functionality as it polutes files: merge


@jeremydw Meant to ask — do you agree with this? Looks like the current implementation doesn't use gettext's functionality for obsolete strings, so I avoided it too: if you include_obsolete it'll just include obsolete terms as if they weren't obsolete. Seemed safest, in case obsolete syntax isn't widely supported by translation tools (dunno if this is the case, just a guess it might not be!)

Oh, maybe. I see what you've done, but I don't know how the default functionality pollutes files other than leaving the terms in. What does it do exactly that's bad? I think obsolete terms in PO files are just comments preceded with #~. IMO if we could support that default obsolete behavior/syntax (when --include-obsolete is used) that'd be good? Feel free to make that outside the scope of this PR though.

Raised #232 to discuss. Will leave as-is on this PR.

jeremydw · 2016-06-19T07:16:04Z

Bad explanation on my part, the strings are indeed tagged, e.g. foo@: bar.

You can take a peek but the reason they're not extracted is because without the $localization key Grow thinks the document isn't localized (believing it to just be of the default locale) and thus doc_locales (L208 on catalog_holder.py) contains just en_US.

This logging statement shows up when running grow extract --localized: Extracting: /content/includes/footer.yaml (0 locale)

As I mentioned I think this is correct behavior as-is (aside from the logging statement). The presence of $localization: ~ or just a change to make $localization inferred if a document has $locale: [...] specified inside a part would fix the behavior.

jeremydw · 2016-06-19T07:24:58Z

grow/pods/catalogs.py

+        self.save(include_header=include_header)
+
+    def merge_obsolete(self):
+        """ Copy obsolete terms into the main catalog """


Just a tiny style nit, but docstrings should end in punctuation and not be padded with space characters. ;) This isn't a blocker but I just wanted to raise this for consistency for future code you write!

stucox · 2016-06-19T08:18:55Z

Yep that's a bug then — my intention was for $localization to be inferred if omitted. Will add a test for this case and fix it.

…rts do

This wasn't a problem before adding validation for doc parts in c8bb8ab.

jeremydw · 2016-06-20T20:44:52Z

Thanks for adding those extra tests, now it looks like they're (correctly) failing and that matches up with the behavior that I reported when using grow extract --localized against my project.

Other than that I think we're good with all these changes. If you're happy with this I think we can merge it in as soon as the fix to infer $localization is added.

I think the change for that should be made here (https://github.com/stucox/pygrow/blob/c8bb8ab51de401459816db141bbb40ae5da17a8f/grow/pods/documents.py#L260).

With some pseudo code like:

If $localization is not in fields, check whether document is split.
To check whether a document is split, we can probably update _handle_pairs_of_parts_and_bodies in https://github.com/stucox/pygrow/blob/c8bb8ab51de401459816db141bbb40ae5da17a8f/grow/pods/formats.py#L124 to set some property on the Format instance (such as has_localized_parts).
I think that if $localization: locales: [...] is not set on the base part, the list of locales should still be inherited from the collection's locales.

That'd just be one way to do it. Am open to another way or a simpler way of inferring whether a document is localized. LMK if you have any questions or want to pass this over to me for merging.

stucox · 2016-06-20T22:02:34Z

Yep I'll fix those cases tomorrow. Thanks for the pseudocode.

jeremydw · 2016-08-01T17:37:26Z

Merged by #239

stucox added 9 commits June 16, 2016 23:52

Disallow document parts without locales declared. Fixes grow#206

c8bb8ab

Remove unused load_yaml_all util

678d336

Add test pods dedicated to testing many different contexts for locali…

b4ce725

…zable strings & update helpers to allow these pods to be used easily

Add a recursive option to storage.listdir() and pod.list_dir()

d2d7caa

True by default, as that's the current & most common behaviour

Add extensive test class to check --localized extracts strings into…

8ad9302

… the correct locale catalogs

Refactor Catalogs.extract() to match behaviour specified by localiz…

ad91e83

…ed tests

Improve logging

347372e

Correct syntax for getting collection's locales

1c6d227

jeremydw reviewed Jun 17, 2016
View reviewed changes

stucox added 2 commits June 17, 2016 21:06

Simplify .endswith() syntax

b45bcd1

Rename pod_dir to root

10541e4

stucox reviewed Jun 19, 2016
View reviewed changes

jeremydw reviewed Jun 19, 2016
View reviewed changes

stucox mentioned this pull request Jun 19, 2016

Use gettext built-in syntax for obsolete translations? #232

Open

Correct docstring style to match PEP 257

9adc42a

stucox added 2 commits June 20, 2016 11:46

Add tests for case where doc base part doesn't declare locales but pa…

861e629

…rts do

Don't treat trailing dashes as a new part

f13e71c

This wasn't a problem before adding validation for doc parts in c8bb8ab.

stucox force-pushed the feature/204-grow-extract branch from d3b4b44 to f13e71c Compare June 20, 2016 10:46

jeremydw added a commit that referenced this pull request Jul 4, 2016

Add recursive arg to listdir ported from #231.

26565e4

jeremydw closed this Aug 1, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor `Catalogs.extract()` to implement more consistent behaviour #231

Refactor `Catalogs.extract()` to implement more consistent behaviour #231

stucox commented Jun 16, 2016

jeremydw commented Jun 17, 2016

jeremydw Jun 17, 2016

stucox Jun 17, 2016

jeremydw commented Jun 17, 2016

stucox commented Jun 17, 2016 •

edited

jeremydw commented Jun 17, 2016

stucox commented Jun 17, 2016 •

edited

stucox commented Jun 17, 2016

jeremydw commented Jun 17, 2016

stucox commented Jun 17, 2016

jeremydw commented Jun 18, 2016

stucox commented Jun 19, 2016

stucox Jun 19, 2016

jeremydw Jun 19, 2016

stucox Jun 19, 2016

jeremydw commented Jun 19, 2016

jeremydw Jun 19, 2016

stucox Jun 19, 2016

stucox commented Jun 19, 2016 •

edited

jeremydw commented Jun 20, 2016

stucox commented Jun 20, 2016

jeremydw commented Aug 1, 2016

Refactor Catalogs.extract() to implement more consistent behaviour #231

Refactor Catalogs.extract() to implement more consistent behaviour #231

Conversation

stucox commented Jun 16, 2016

jeremydw commented Jun 17, 2016

jeremydw Jun 17, 2016

Choose a reason for hiding this comment

stucox Jun 17, 2016

Choose a reason for hiding this comment

jeremydw commented Jun 17, 2016

stucox commented Jun 17, 2016 • edited

jeremydw commented Jun 17, 2016

stucox commented Jun 17, 2016 • edited

stucox commented Jun 17, 2016

jeremydw commented Jun 17, 2016

stucox commented Jun 17, 2016

jeremydw commented Jun 18, 2016

stucox commented Jun 19, 2016

stucox Jun 19, 2016

Choose a reason for hiding this comment

jeremydw Jun 19, 2016

Choose a reason for hiding this comment

stucox Jun 19, 2016

Choose a reason for hiding this comment

jeremydw commented Jun 19, 2016

jeremydw Jun 19, 2016

Choose a reason for hiding this comment

stucox Jun 19, 2016

Choose a reason for hiding this comment

stucox commented Jun 19, 2016 • edited

jeremydw commented Jun 20, 2016

stucox commented Jun 20, 2016

jeremydw commented Aug 1, 2016

Refactor `Catalogs.extract()` to implement more consistent behaviour #231

Refactor `Catalogs.extract()` to implement more consistent behaviour #231

stucox commented Jun 17, 2016 •

edited

stucox commented Jun 17, 2016 •

edited

stucox commented Jun 19, 2016 •

edited