-
-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add S3 Cache to pants #4589
Add S3 Cache to pants #4589
Conversation
Thanks Todd! Not sure why Travis isn't kicking off... perhaps because this is your first PR. Will review tomorrow! |
# Delegate storage and extraction to local cache | ||
try: | ||
return self._localcache.store_and_use_artifact( | ||
cache_key, iter_content(get_result['Body']), results_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to close get_result['Body']
, otherwise python holds onto the file contents in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ha, that's your fabled memory leak. Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify the help string for the --resolver
option in the context of this new capability? For example, indicating that the resolver is only relevant to http(s)
URLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I'd like a second opinion on the risk of adding boto as a dependency, but... seems low?
Changed resolver flag, waiting on a response from travis support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am super excited for this. I spoke with AWS about Pants support last week and we discussed how we would probably want to write this interface someday.
Code looks good too. I have some suggestions and some nits. The biggest effort ask is the switch from pytest to unittest. If Stu doesn't mind pytest creeping back in, I will not block on it.
Thanks Todd!
@@ -1,5 +1,6 @@ | |||
ansicolors==1.0.2 | |||
beautifulsoup4>=4.3.2,<4.4 | |||
boto3==1.4.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boto is for sure the way to go - not a concern for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problem with this dep.
@@ -84,6 +85,12 @@ def register_options(cls, register): | |||
help='number of times pinger tries a cache') | |||
register('--write-permissions', advanced=True, type=str, default=None, | |||
help='Permissions to use when writing artifacts to a local cache, in octal.') | |||
register('--s3-credential-file', advanced=True, type=str, | |||
default=os.path.expanduser('~/.pants/.s3credentials'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a pants_configdir option that this should use instead.
(Pdb) self.get_options().pants_configdir
'/Users/mateo/.config/pants'
@@ -0,0 +1,178 @@ | |||
# coding=utf-8 | |||
# Copyright 2014 Pants project contributors (see CONTRIBUTORS.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: copyright year
exceptions.EndpointConnectionError, exceptions.ChecksumError | ||
] | ||
|
||
def _connect_to_s3(config_file, profile_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote something similar, android_config_util.py
, as an intern - at some point there should be a general interface for non-Pants config reads for secrets like these.
Not a problem for this review, though
s3_object = self._get_object(cache_key) | ||
try: | ||
get_result = s3_object.get() | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be nice to catch a tighter exception here, but I think staying in sync with what the local_artifact_cache does was probably the right choice.
def _get_object(self, cache_key): | ||
return self._s3.Object(self._bucket, self._path_for_key(cache_key)) | ||
|
||
def _path_for_key(self, cache_key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move _cache_file_for_key
up to the base class instead please?
@@ -0,0 +1,198 @@ | |||
# coding=utf-8 | |||
# Copyright 2014 Pants project contributors (see CONTRIBUTORS.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: year again
from contextlib import contextmanager | ||
|
||
import boto3 | ||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be best to not import pytest here and use unittest like the rest of the codebase.
There was some effort put into ripping pytest out and I don't see any remaining usage in the Pants tests tree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, didn't realize this was an intentional choice in pants. I'm fine with rewriting it (and doing the rest of these changes); will take a stab after work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am excited about this contribution, glad you (re)joined the Pants community :)
Feel free to ping me if you need a committer/reviewer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an intentional choice, although not necessarily the right one. Pytest certainly encourages you not to, and there are some features unavailable to class-based tests. Nonetheless, fixtures are confusing, and xtest-style tests are well understood by most, so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awesome change, that will increase Pants's appeal (assuming latency to S3 is acceptable in a build environment). Thanks for this! Just a few comments inline.
@@ -1,5 +1,6 @@ | |||
ansicolors==1.0.2 | |||
beautifulsoup4>=4.3.2,<4.4 | |||
boto3==1.4.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problem with this dep.
logger.debug('Reading access key from {0}'.format(config_file)) | ||
boto_kwargs['aws_secret_access_key'] = secret_key | ||
except IOError: | ||
logger.debug('Could not load {0}, using ENV vars'.format(config_file)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add the IOError's message to the log message, so the user can figure out what happened.
Also, should this be a warn
? Or is this expected behavior in some cases? If the latter, it seems like asking for trouble to be unable to distinguish between expected behavior and an error... Perhaps we should expect that the config_file, if not None, must be readable?
logger.debug('Could not load {0}, using ENV vars'.format(config_file)) | ||
|
||
session = boto3.Session(**boto_kwargs) | ||
config = Config(connect_timeout=4, read_timeout=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be options?
return session.resource('s3', config=config) | ||
|
||
|
||
_READ_SIZE_BYTES = 4 * 1024 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an option? Or is it just super-standard to read 4MB chunks from S3?
|
||
def try_insert(self, cache_key, paths): | ||
logger.debug('Insert {0}'.format(cache_key)) | ||
# Delegate creation of artifacts to the local cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: End sentence with a period.
from contextlib import contextmanager | ||
|
||
import boto3 | ||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an intentional choice, although not necessarily the right one. Pytest certainly encourages you not to, and there are some features unavailable to class-based tests. Nonetheless, fixtures are confusing, and xtest-style tests are well understood by most, so...
_TEST_BUCKET = 'verst-test-bucket' | ||
|
||
|
||
@pytest.yield_fixture(scope="function") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: single quotes, here and all other fixtures below. Although I guess it doesn't matter since you're going to rewrite this anyway.
@toddgardner : Do you have time to wrap this one up? It would be really appreciated! |
I am going to experiment with this. If the latencies are workable, I can possibly carry the work forward if Todd doesn't have the bandwidth |
experiment was positive, will move forward unless Todd objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I just came back from vacation. This looks great, thanks for picking it up.
Okay, will do. We just converted a cache to s3 and will be experimenting. Once that looks reasonable, I will push a PR. |
Closing because caching has changed dramatically with the V2 implementation of Pants to now require gRPC. Thank you for originally putting up this change, though! |
Problem
This adds an s3 implementation of the pants cache. S3 is used as it is simpler for smaller companies to maintain than running and securing another server and potential caching layer, and configuring development laptops, etc.
Solution
This is an implementation written for my previous startup by @JoeEnnever that we're pushing upstream. It's got several practical tweaks like abandoning on errors, enforcing timeouts, etc.
The advantage of not having this as a plugin is it avoids having to monkey patch cache_setup, or build a new plugin system just for cache backends. Additionally there are some problems installing boto in a plugin, because it isn't zipsafe (#4428).
The downsides is that it adds two dependencies to pants core, in terms of boto3 and pyjavaproperties.
It has several ways to pass credentials, including standard boto chain, with an optionally specified profile.
It also includes a way to have the properties come from a java-properties file. This is useful if your company uses s3 as an ivy artifact storage as well; you can share the credentials between pants and ivy xml. It's unclear if this is a wide enough use-case to actually push upstream.
I also haven't done any performance testing on this versus actually just rebuilding targets. It doesn't seem to feel slow, at least.