REF: Use head_object for .info #24
This should be more efficient for `.info` calls. The response dict is almost the same: it uses `ContentLength` instead of `Size` and doesn't include the key.

One question I had was on exception handling. To get the test suite to pass, I had to catch botocore's `ParamValidationError`. Otherwise things like `s3.open('x', 'rb')` failed with a `ParamValidationError`, since `'x'` isn't a valid bucket name, while the tests were expecting an `IOError` or `OSError`. I'd be curious to hear what you think about this.

With this, the pandas test suite pretty much passes. My changes are [here](pandas-dev/pandas@e89e54a). Just a couple of tests to fix up that were expecting different error messages.

http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.head_object
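For illustration, the exception handling described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `head_or_ioerror` and its `client_errors` parameter are hypothetical names, and the client is assumed to be any object exposing boto3's `head_object(Bucket=..., Key=...)`.

```python
def head_or_ioerror(client, bucket, key, client_errors):
    """Call head_object and re-raise selected client errors as IOError.

    `client_errors` is a tuple of exception classes to translate
    (hypothetical parameter, chosen here to keep the sketch free of imports).
    """
    try:
        return client.head_object(Bucket=bucket, Key=key)
    except client_errors as exc:
        # Keep the original error as the cause so no detail is lost.
        raise IOError("cannot stat s3://%s/%s" % (bucket, key)) from exc
```

With a real boto3 client, one would presumably pass `(botocore.exceptions.ParamValidationError, botocore.exceptions.ClientError)` as `client_errors`, so that callers such as pandas only ever see an `IOError`/`OSError`.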
Does head_object return other info like access times? |
Just last modified:

In [5]: fs.s3.head_object(Bucket='pandas-test', Key='tips.csv')
Out[5]:
{'AcceptRanges': 'bytes',
 'ContentLength': 7943,
 'ContentType': 'application/octet-stream',
 'ETag': '"b8e189917e1e12e5e34247f90db7834f"',
 'LastModified': datetime.datetime(2015, 9, 12, 20, 19, 12, tzinfo=tzutc()),
 'Metadata': {},
 'ResponseMetadata': {'HTTPStatusCode': 200,
  'HostId': 'tR9J0P0Si+OHpRAFrENSDBA5Cos8pFC+bjJFQHkz4ESKqEc2x0chU7zNw93Rvo62cWBAfCYtLUs=',
  'RequestId': 'B12B2C8015E611AC'}}
|
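A response like the one above could be reshaped into a listing-style entry along these lines. This is only a sketch: `head_to_info` is a hypothetical helper, and the `bucket/key` form of the `Key` field is an assumption about what the listing entries look like.

```python
def head_to_info(response, bucket, key):
    """Turn a head_object response into a listing-style dict.

    Renames ContentLength to Size, adds the Key (which head_object does
    not return), and drops the transport-level ResponseMetadata.
    """
    # Copy so the caller's response dict is left untouched.
    info = {k: v for k, v in response.items() if k != "ResponseMetadata"}
    info["Size"] = info.pop("ContentLength")
    info["Key"] = bucket + "/" + key  # assumed "bucket/key" convention
    return info
```

Applied to the output above with `bucket='pandas-test'` and `key='tips.csv'`, the result would carry `Size` equal to 7943 alongside `ETag` and `LastModified`.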
On the ParamValidationError, actually we should have caught that before passing to boto3, because the path you give in that |
On head_object, you are probably right that it is faster when fetching details for a single key (i.e., info), but at the moment we get the info for many keys at once with _ls and cache, and info picks from the cache. |
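One way to reconcile the two approaches, sketched under assumptions rather than taken from the project: look in the cached listing first and fall back to a single-key lookup only on a miss. Here `listing_cache` (a dict mapping bucket name to a list of entries) and the `head` callable are hypothetical stand-ins for the `_ls` cache and a `head_object` wrapper, and paths are assumed to have the form `bucket/key`.

```python
def info(path, listing_cache, head):
    """Return the entry for `path`, preferring the cached listing.

    listing_cache: dict of bucket -> list of dicts, each with a 'Key'
                   of the form 'bucket/key' (assumed layout).
    head:          callable (bucket, key) -> dict, used only on a cache miss.
    """
    bucket, _, key = path.partition("/")
    for entry in listing_cache.get(bucket, []):
        if entry["Key"] == path:
            return entry
    # Not listed (or not listable): ask for this one key directly.
    return head(bucket, key)
```

The fallback matters for buckets where the user may read a key but cannot list it, since such a key never shows up in the cached listing.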
This is the issue I was working around for pandas, this test specifically. It failed before when I see your concern about performance. Does adding a parameter to For the param validation error, are you saying that fs = S3FileSystem()
fs.info('foo') should raise an |
Yes, I suppose How did you set up I agree that we can cause |
@martindurant not sure who owns that bucket. I don't seem to have access :< |
I'll look tonight or this weekend to see if there's a better way around this. FWIW, here's the original issue in pandas: pandas-dev/pandas#10604 |
I haven't found anything in boto3 to control the exception handling in Could we break these two use-cases into two methods?
|
Yes, OK, that sounds totally reasonable. |
Would this do? |
That's what I was going to suggest. To be honest, I thought I was working on I can repurpose this PR to use it, or you can. Whatever you think is easier (sorry I've been slow to respond this week). There's just a couple test failures with that patch,
Those raise |
I appreciate any and all time you can spend on this! I have reformed this into PR #27 - OK? |
This change was necessary to make the pandas test suite pass. We had issues in the past where a user's bucket allowed them to list some, but not all, of the items in a bucket. The old implementation of `.info` did an `ls` on the bucket, and then filtered down to the single file matching the key.