Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Catalogs.extract() to implement more consistent behaviour #231

Closed
wants to merge 14 commits into from

Conversation

stucox
Copy link
Collaborator

@stucox stucox commented Jun 16, 2016

Fixes #204, #157 & #206.

Pretty much by definition this is a breaking change — see #204 and the aim & the included tests for details of cases I’ve covered. Not sure the best way to communicate this to users.

Note that in this version /data/ isn’t treated as a special directory anymore.

I’ve tested this on a very large project I work on and it does what I’d expect:

  • Removes unnecessary translations (because of Grow extract extracts strings for non-specified locales #157)
  • No longer extracts tagged terms in ‘global’ locations (i.e. where no document part, document or collection locale can be determined) for all locales defined in the pod: just for those locales in podspec.yaml

This last point caused some work for us: we had a “globals.yaml” file which assumed everything was relevant to every locale (which was wrong: not every locale used those) and hence didn’t declare any locales… which worked until now. Solved by either listing all relevant locales in that file, or listing them all in podspec.yaml (the list there only contained primary locales, which might be better defined at collection level).

Note: Would appreciate if people could test on some other projects.

I’ll try a few others of ours while @jeremydw is checking out the code.

…zable strings & update helpers to allow these pods to be used easily
True by default, as that's the current & most common behaviour
…talog.save()` with `include_obsolete` arg for `Catalog.update()` and `Catalog.update_using_catalog()`

This is more in line with `babel.Catalog` behaviour, but better suits Grow needs of wanting to include obsolete terms _without marking them obsolete_ (which is what Babel would do).

`Catalogs.extract()` can then lean on this behaviour and hence be simpler.
@jeremydw
Copy link
Member

Awesome, thank you for putting all the time into the change and also for the excellent test coverage. I'll start the review with some style-related stuff just to keep consistency with the rest of the project and then I'll dig in after to see if I have any comments on the overall design. I gave a cursory look at it yesterday and it looks pretty great.


# Extract from root of /content/:
for path in self.pod.list_dir('/content/', recursive=False):
if path.endswith('.yaml') or path.endswith('.yml'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endswith accepts tuples, so you can do path.endswith(('.yaml', '.yml'))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I didn’t know! Will update.

@jeremydw
Copy link
Member

In response to this comment, "No longer extracts tagged terms in ‘global’ locations", does that behavior only kick in when --localized is used with grow extract? In other words, are there no changes if --localized is not specified?

@stucox
Copy link
Collaborator Author

stucox commented Jun 17, 2016

In response to this comment, "No longer extracts tagged terms in ‘global’ locations", does that behavior only kick in when --localized is used with grow extract? In other words, are there no changes if --localized is not specified?

Well, sorta N/A: grow extract (without --localized) always ignored the ‘locale context’* of tagged strings and just stuck anything in /content/, /views/ and /data/ marked for translation in messages.pot — so the only change there is /data/ is now ignored. Maybe this could do with some more test coverage.

* I like this term. I might use it more.

With --localized, terms in the ‘global locale context’ (i.e. somewhere in /content/ or /views/ which isn’t in a doc or collection with a locales declaration) are still extracted, but only for locales defined in the podspec, not for locales which might be declared for some other document/collection (which they currently are).

Asid: maybe with --localized we should spit out an error if no locale declarations are found anywhere?

@jeremydw
Copy link
Member

Makes sense, thanks for explaining. The term "locale context" sounds good. ;)

I'm OK with the breaking change of ignoring /data/ as we're aligned in moving forward with simply putting all content (partials, data files, etc.) in the /content/ directory anyway. /data/ was special and undocumented so I'm OK breaking any behavior there.

ALSO – This should definitely be a separate feature/PR but I'd also like to add a new configuration option to localization in podspec.yaml where the user can specify the defaults for the extract command.

localization:
  locales: [...]
  default_locale: [...]
  extract:
    localized: true  # Defaults `--localized` when `grow extract` is run.

That would simplify translation management across multiple developers and help with any continuous translation initiative we work on.

@stucox
Copy link
Collaborator Author

stucox commented Jun 17, 2016

You can still put stuff in /data/ — but if you want it extracted for localization, you need to !g.yaml() / !g.csv() it into a document in /content/. That could probably do with test coverage too.

@stucox
Copy link
Collaborator Author

stucox commented Jun 17, 2016

Agree that config option would be nice — but low priority IMO. Kinda like teams to know what the commands they’re running are doing.

@jeremydw
Copy link
Member

Ha, well, if we provide some built-in optional commit hooks (such as a pre-commit grow extract followed by a diff/report and a post-commit grow stage) it'll be important for the commands to be self-configured.

@stucox
Copy link
Collaborator Author

stucox commented Jun 17, 2016

👍 We’ve played with an “feature finished” script too, which runs linting, grow extract --localized (and complains if your translations files are out of date), grow stage, pushes to our code review tool & sticks changes a ticket status in our bug tracker. So would be helpful for that too.

(we decided pre/post-commit was a bit much)

@jeremydw
Copy link
Member

Just a note: I ran this against against a project I work on and everything works as intended! I noticed this exposes the reason behind that use_podspec_locales identifier we chatted about on email.

As now intended, in absence of that identifier, and in absence of any $localization config, grow extract --localized no longer extracts these partials. Here's an example file that I'm working with. Note that it's a partial/includable content file and therefore doesn't have a URL mapped to it.

---
foo: bar
---
$locale: de
foo: baz

Also, there is no $localization in the document or in its blueprint.

I think the change in behavior introduced by this PR is correct -- to not extract this content at all. To get the previous behavior, you'd want the equivalent of use_podspec_locales but that's wordy and just weird, so...

After merging this PR in, I'm thinking about making a change to the default behavior of these partial files that do not have a $localization key and want your thoughts, here are my proposals:

  1. Accept $localization: ~ (null character) as an indicator that the file should be localized. This would be the equivalent of use_podspec_locales, it just leverages consistent syntax. Since this file is a partial and does not have a path mapped to it, it wouldn't need $localization: path: ... and therefore can just accept $localization: ~ as the "I am localized too" flag.
  2. In lieu of a $localization key, if the user has specified a sub-document in the file with $locale: foo, then that should automatically imply $localization: ~. Note that this would not change the current behavior of inheriting the podspec locales, it would just automatically indicate that the doc is localized.

Therefore, in order to get the previous behavior back and have this file extracted too, I would have to do nothing (since $locale: de would imply $localization) -- or, if the partial didn't have $locale: de, I would simply have to add $localization: ~ to indicate that I want the file to be localized and extracted to all locales.

Thoughts on that?

@stucox
Copy link
Collaborator Author

stucox commented Jun 19, 2016

I'm pretty sure with the new implementation strings in that file will still be extracted — but it looks like they're not tagged for translation?

If it were this:

---
foo@: bar
---
$locale: de
foo@: baz

I'd expect bar to be extracted for podspec locales and baz to be extracted for de only.

super(Catalog, self).update(
catalog_to_merge, no_fuzzy_matching=(not use_fuzzy_matching))
# Don't use gettext's obsolete functionality as it polutes files: merge
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremydw Meant to ask — do you agree with this? Looks like the current implementation doesn't use gettext's functionality for obsolete strings, so I avoided it too: if you include_obsolete it'll just include obsolete terms as if they weren't obsolete. Seemed safest, in case obsolete syntax isn't widely supported by translation tools (dunno if this is the case, just a guess it might not be!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe. I see what you've done, but I don't know how the default functionality pollutes files other than leaving the terms in. What does it do exactly that's bad? I think obsolete terms in PO files are just comments preceded with #~. IMO if we could support that default obsolete behavior/syntax (when --include-obsolete is used) that'd be good? Feel free to make that outside the scope of this PR though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #232 to discuss. Will leave as-is on this PR.

@jeremydw
Copy link
Member

Bad explanation on my part, the strings are indeed tagged, e.g. foo@: bar.

You can take a peek but the reason they're not extracted is because without the $localization key Grow thinks the document isn't localized (believing it to just be of the default locale) and thus doc_locales (L208 on catalog_holder.py) contains just en_US.

This logging statement shows up when running grow extract --localized: Extracting: /content/includes/footer.yaml (0 locale)

As I mentioned I think this is correct behavior as-is (aside from the logging statement). The presence of $localization: ~ or just a change to make $localization inferred if a document has $locale: [...] specified inside a part would fix the behavior.

self.save(include_header=include_header)

def merge_obsolete(self):
""" Copy obsolete terms into the main catalog """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tiny style nit, but docstrings should end in punctuation and not be padded with space characters. ;) This isn't a blocker but I just wanted to raise this for consistency for future code you write!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@stucox
Copy link
Collaborator Author

stucox commented Jun 19, 2016

Yep that's a bug then — my intention was for $localization to be inferred if omitted. Will add a test for this case and fix it.

@jeremydw
Copy link
Member

Thanks for adding those extra tests, now it looks like they're (correctly) failing and that matches up with the behavior that I reported when using grow extract --localized against my project.

Other than that I think we're good with all these changes. If you're happy with this I think we can merge it in as soon as the fix to infer $localization is added.

I think the change for that should be made here (https://github.com/stucox/pygrow/blob/c8bb8ab51de401459816db141bbb40ae5da17a8f/grow/pods/documents.py#L260).

With some pseudo code like:

That'd just be one way to do it. Am open to another way or a simpler way of inferring whether a document is localized. LMK if you have any questions or want to pass this over to me for merging.

@stucox
Copy link
Collaborator Author

stucox commented Jun 20, 2016

Yep I'll fix those cases tomorrow. Thanks for the pseudocode.

jeremydw added a commit that referenced this pull request Jul 4, 2016
@jeremydw
Copy link
Member

jeremydw commented Aug 1, 2016

Merged by #239

@jeremydw jeremydw closed this Aug 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants