Scrooge codegen improvements #4177

Merged
merged 2 commits into from Jan 14, 2017

Conversation

Projects
None yet
2 participants
@kevinoliver
Contributor

kevinoliver commented Jan 12, 2017

Problem

Scrooge codegen does not do fingerprinting for enough facets and is
missing support for include_path and default-java-namespace.

Solution

Add support for include-path and default-java-namespace.

Fix the fingerprinting to include namespace_map, include_path and
default_java_namespace.

Result

Scrooge support is now good enough for to be used for development
of Scrooge itself.

+ help='The default Java namespace to generate for java_thrift_library targets.')
+ register('--namespace-map', type=dict, advanced=True, default={},
+ help='The default namespace map to generate for java_thrift_library targets, {old: new}.')
+ register('--include-paths', type=list, advanced=True, default=[],

This comment has been minimized.

@stuhood

stuhood Jan 12, 2017

Member

Can you add a bit more information about how this would be used? We would generally prefer that people explicitly declare the dependencies of a target, which provides enough information to infer the include paths automatically.

So on the surface, I don't think this option makes sense: it would bypass change detection and allow a target to depend on an entire other subdirectory without declaring it as a dependency. In essence, it would exacerbate the internal ticket we have about this: DPB-5639, which states:

Currently, scrooge gen adds the source root of targets to the import paths of the global scrooge gen. This means that a target that depends on a thrift file [under a particular source root] can include any thrift [under that source root].

When there is an include in a thrift file whose target doesn't have matching dependency listings, incompatible changes to the included file can pass SQ, but will cause subsequent builds to fail.

@stuhood

stuhood Jan 12, 2017

Member

Can you add a bit more information about how this would be used? We would generally prefer that people explicitly declare the dependencies of a target, which provides enough information to infer the include paths automatically.

So on the surface, I don't think this option makes sense: it would bypass change detection and allow a target to depend on an entire other subdirectory without declaring it as a dependency. In essence, it would exacerbate the internal ticket we have about this: DPB-5639, which states:

Currently, scrooge gen adds the source root of targets to the import paths of the global scrooge gen. This means that a target that depends on a thrift file [under a particular source root] can include any thrift [under that source root].

When there is an include in a thrift file whose target doesn't have matching dependency listings, incompatible changes to the included file can pass SQ, but will cause subsequent builds to fail.

This comment has been minimized.

@kevinoliver

kevinoliver Jan 12, 2017

Contributor

@stuhood what do you think about omitting these from defaults entirely? i just added these as cargo-cult.

@kevinoliver

kevinoliver Jan 12, 2017

Contributor

@stuhood what do you think about omitting these from defaults entirely? i just added these as cargo-cult.

@kevinoliver

This comment has been minimized.

Show comment
Hide comment
@kevinoliver

kevinoliver Jan 12, 2017

Contributor

reviewers: apologies in advance for my lack of python and pants knowledge.

also note that i tested my changes on a earlier commit and it looks like some of the codegen code got moved around since then. that means my confidence level is not super high given the levels of test coverage i experienced.

Contributor

kevinoliver commented Jan 12, 2017

reviewers: apologies in advance for my lack of python and pants knowledge.

also note that i tested my changes on a earlier commit and it looks like some of the codegen code got moved around since then. that means my confidence level is not super high given the levels of test coverage i experienced.

@stuhood

This comment has been minimized.

Show comment
Hide comment
@stuhood

stuhood Jan 12, 2017

Member

@kevinoliver : It should be relatively easy to add an integration test for this by cloning https://github.com/pantsbuild/pants/blob/master/contrib/scrooge/tests/python/pants_test/contrib/scrooge/tasks/test_thrift_linter_integration.py into a new file test_scrooge_integration.py, and then invoking the compile or gen task there.

There is also an @ensure_cached annotation that you can add to your test (example) to validate that some number of distinct artifacts are written to the cache.

Member

stuhood commented Jan 12, 2017

@kevinoliver : It should be relatively easy to add an integration test for this by cloning https://github.com/pantsbuild/pants/blob/master/contrib/scrooge/tests/python/pants_test/contrib/scrooge/tasks/test_thrift_linter_integration.py into a new file test_scrooge_integration.py, and then invoking the compile or gen task there.

There is also an @ensure_cached annotation that you can add to your test (example) to validate that some number of distinct artifacts are written to the cache.

Problem
Scrooge codegen does not do fingerprinting for enough facets and is
missing support for include_path and default-java-namespace.

Solution

Add support for include-path and default-java-namespace.

Fix the fingerprinting to include namespace_map, include_path and
default_java_namespace.

Result

Scrooge support is now good enough for to be used for development
of Scrooge itself.
@kevinoliver

This comment has been minimized.

Show comment
Hide comment
@kevinoliver

kevinoliver Jan 12, 2017

Contributor

Thanks for the review. I've pushed a new version:

  • Removed the defaults for include_paths
  • Added integration tests for scrooge_gen

I tried, unsuccessfully, to add the caching test. Hopefully that isn't a blocker.

Let me know if there is anything else.

Contributor

kevinoliver commented Jan 12, 2017

Thanks for the review. I've pushed a new version:

  • Removed the defaults for include_paths
  • Added integration tests for scrooge_gen

I tried, unsuccessfully, to add the caching test. Hopefully that isn't a blocker.

Let me know if there is anything else.

@stuhood

Thanks Kevin!

+ for include_path in partial_cmd.include_paths:
+ args.extend(['--include-path', include_path])
+
+ #raise TaskError('AHA! args are now: {}'.format(str(args)))

This comment has been minimized.

@stuhood

stuhood Jan 13, 2017

Member

xx

+ namespace is not explicitly specified in the IDL. The default is defined in the global
+ options under ``--thrift-default-default-java-namespace``.
+ :param include_paths: The include_path(s) used to search for included IDL files. The default
+ is defined in the global options under ``--thrift-default-include-paths``.

This comment has been minimized.

@stuhood

stuhood Jan 13, 2017

Member

IMO, let's not document this one. And if I can convince you to remove this parameter entirely, that would be preferable.

I understand that you want to use it for testing, but AFAICT, it should be possible to define dependencies that result in any interesting set of include-paths. As you saw above, include-paths are already passed to scrooge, and I imagine that if you stare at how they are generated for a few minutes, it will be clear how to cause other paths to be searched.

@stuhood

stuhood Jan 13, 2017

Member

IMO, let's not document this one. And if I can convince you to remove this parameter entirely, that would be preferable.

I understand that you want to use it for testing, but AFAICT, it should be possible to define dependencies that result in any interesting set of include-paths. As you saw above, include-paths are already passed to scrooge, and I imagine that if you stare at how they are generated for a few minutes, it will be clear how to cause other paths to be searched.

This comment has been minimized.

@kevinoliver

kevinoliver Jan 13, 2017

Contributor

gotcha. let me take a crack and see if it can be removed.

@kevinoliver

kevinoliver Jan 13, 2017

Contributor

gotcha. let me take a crack and see if it can be removed.

@kevinoliver

This comment has been minimized.

Show comment
Hide comment
@kevinoliver

kevinoliver Jan 13, 2017

Contributor

@stuhood thanks for the feedback. I've pushed a change but was unable to successfully remove the include_paths option as it would require moving thrift files to new source roots in scrooge.

Contributor

kevinoliver commented Jan 13, 2017

@stuhood thanks for the feedback. I've pushed a change but was unable to successfully remove the include_paths option as it would require moving thrift files to new source roots in scrooge.

@stuhood stuhood merged commit 5f424ce into pantsbuild:master Jan 14, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

peiyuwang pushed a commit to twitter/pants that referenced this pull request Feb 16, 2017

Peiyu Wang
Options fingerprinting as provided by `TaskIdentityFingerprintStrateg…
…y` has proven to be extremely useful. But since passing a `fingerprint_strategy` to `Task.invalidated` acts as an _override_ of the default strategy (rather than being additive), it is currently easy to accidentally disable options fingerprinting (as happened in #4177 when a `Task` began passing a strategy).

When options fingerprinting was initially introduced, we were less certain about its efficacy and risks, which likely led to the decision to make it very easy to disable fingerprinting Task options. But in hindsight, options fingerprinting is the right choice in all cases we can think of.

Make options fingerprinting very difficult to disable by pushing it down to the `CacheKeyGenerator`, which is initialized once per `Task`. Deprecate `TaskIdentityFingerprintStrategy`, as it is now redundant.

Options that are marked `fingerprint=True` will always be included in a `Task`'s fingerprint.

Squashed commit of the following:

commit 7abf3d1
Author: Stu Hood <stuhood@gmail.com>
Date:   Thu Feb 16 13:16:45 2017 -0800

    Make cache_key_generator creation lazy.

commit 1d0464d
Author: Stu Hood <stuhood@gmail.com>
Date:   Thu Feb 16 12:52:51 2017 -0800

    Fix ivy task tests.

commit 43c7eef
Author: Stu Hood <stuhood@gmail.com>
Date:   Wed Feb 15 18:16:14 2017 -0800

    Oops

commit 02ec73c
Author: Stu Hood <stuhood@gmail.com>
Date:   Wed Feb 15 17:57:34 2017 -0800

    Remove internal usages of TaskIdentityFingerprintStrategy.

commit ce27d46
Author: Stu Hood <stuhood@gmail.com>
Date:   Wed Feb 15 13:24:39 2017 -0800

    Make options fingerprinting very difficult to disable by pushing it down to the CacheKeyGenerator

peiyuwang pushed a commit to twitter/pants that referenced this pull request Feb 17, 2017

Peiyu Wang
Options fingerprinting as provided by `TaskIdentityFingerprintStrateg…
…y` has proven to be extremely useful. But since passing a `fingerprint_strategy` to `Task.invalidated` acts as an _override_ of the default strategy (rather than being additive), it is currently easy to accidentally disable options fingerprinting (as happened in #4177 when a `Task` began passing a strategy).

When options fingerprinting was initially introduced, we were less certain about its efficacy and risks, which likely led to the decision to make it very easy to disable fingerprinting Task options. But in hindsight, options fingerprinting is the right choice in all cases we can think of.

Make options fingerprinting very difficult to disable by pushing it down to the `CacheKeyGenerator`, which is initialized once per `Task`. Deprecate `TaskIdentityFingerprintStrategy`, as it is now redundant.

Options that are marked `fingerprint=True` will always be included in a `Task`'s fingerprint.

Squashed commit of the following:

commit 7abf3d1
Author: Stu Hood <stuhood@gmail.com>
Date:   Thu Feb 16 13:16:45 2017 -0800

    Make cache_key_generator creation lazy.

commit 1d0464d
Author: Stu Hood <stuhood@gmail.com>
Date:   Thu Feb 16 12:52:51 2017 -0800

    Fix ivy task tests.

commit 43c7eef
Author: Stu Hood <stuhood@gmail.com>
Date:   Wed Feb 15 18:16:14 2017 -0800

    Oops

commit 02ec73c
Author: Stu Hood <stuhood@gmail.com>
Date:   Wed Feb 15 17:57:34 2017 -0800

    Remove internal usages of TaskIdentityFingerprintStrategy.

commit ce27d46
Author: Stu Hood <stuhood@gmail.com>
Date:   Wed Feb 15 13:24:39 2017 -0800

    Make options fingerprinting very difficult to disable by pushing it down to the CacheKeyGenerator

stuhood added a commit that referenced this pull request Feb 21, 2017

Make options fingerprinting very difficult to disable (#4262)
### Problem

Options fingerprinting as provided by `TaskIdentityFingerprintStrategy` has proven to be extremely useful. But since passing a `fingerprint_strategy` to `Task.invalidated` acts as an _override_ of the default strategy (rather than being additive), it is currently easy to accidentally disable options fingerprinting (as happened in #4177 when a `Task` began passing a strategy).

When options fingerprinting was initially introduced, we were less certain about its efficacy and risks, which likely led to the decision to make it very easy to disable fingerprinting Task options. But in hindsight, options fingerprinting is the right choice in all cases we can think of.

### Solution

Make options fingerprinting very difficult to disable by pushing it down to the `CacheKeyGenerator`, which is initialized once per `Task`. Deprecate `TaskIdentityFingerprintStrategy`, as it is now redundant.

### Result

Options that are marked `fingerprint=True` will always be included in a `Task`'s fingerprint.

lenucksi added a commit to lenucksi/pants that referenced this pull request Apr 25, 2017

Scrooge codegen improvements (#4177)
### Problem

Scrooge codegen does not do fingerprinting for enough facets and is
missing support for include_path and default-java-namespace.

### Solution

Add support for include-path and default-java-namespace.

Fix the fingerprinting to include namespace_map, include_path and
default_java_namespace.

### Result

Scrooge support is now good enough for to be used for development
of Scrooge itself.

lenucksi added a commit to lenucksi/pants that referenced this pull request Apr 25, 2017

Make options fingerprinting very difficult to disable (#4262)
### Problem

Options fingerprinting as provided by `TaskIdentityFingerprintStrategy` has proven to be extremely useful. But since passing a `fingerprint_strategy` to `Task.invalidated` acts as an _override_ of the default strategy (rather than being additive), it is currently easy to accidentally disable options fingerprinting (as happened in #4177 when a `Task` began passing a strategy).

When options fingerprinting was initially introduced, we were less certain about its efficacy and risks, which likely led to the decision to make it very easy to disable fingerprinting Task options. But in hindsight, options fingerprinting is the right choice in all cases we can think of.

### Solution

Make options fingerprinting very difficult to disable by pushing it down to the `CacheKeyGenerator`, which is initialized once per `Task`. Deprecate `TaskIdentityFingerprintStrategy`, as it is now redundant.

### Result

Options that are marked `fingerprint=True` will always be included in a `Task`'s fingerprint.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment