Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 identifier with @ in key raises RuntimeError #94

Closed
jayantj opened this issue Nov 2, 2016 · 7 comments
Closed

S3 identifier with @ in key raises RuntimeError #94

jayantj opened this issue Nov 2, 2016 · 7 comments

Comments

@jayantj
Copy link

jayantj commented Nov 2, 2016

Trying to open a file on S3 with a @ in the filename or prefix raises a RuntimeError

uri = 's3://bucketname/docs/x@t.ai/test.pdf'
f = smart_open(uri, 'rb')
RuntimeError                              Traceback (most recent call last)
<ipython-input-60-6156dc49e12c> in <module>()
----> 1 f = smart_open(uri, 'rb')

/data/environs/nlp/local/lib/python2.7/site-packages/smart_open/smart_open_lib.pyc in smart_open(uri, mode, **kw)
    120         # this method just routes the request to classes handling the specific storage
    121         # schemes, depending on the URI protocol in `uri`
--> 122         parsed_uri = ParseUri(uri)
    123 
    124         if parsed_uri.scheme in ("file", ):

/data/environs/nlp/local/lib/python2.7/site-packages/smart_open/smart_open_lib.pyc in __init__(self, uri, default_scheme)
    251                 # Bucket names can contain lowercase letters, numbers, and hyphens.
    252                 # Each label must start and end with a lowercase letter or a number.
--> 253                 raise RuntimeError("invalid S3 URI: %s" % uri)
    254         elif self.scheme == 'file':
    255             self.uri_path = parsed_uri.netloc + parsed_uri.path

RuntimeError: invalid S3 URI: s3://bucketname/docs/x@t.ai/test.pdf
@tmylk
Copy link
Contributor

tmylk commented Nov 3, 2016

The name doesn't follow AWS best practices. What is the reason for using it?

The "@" symbol is not considered a safe character in AWS guide. It is referenced as "character that requires special handling".

@tmylk
Copy link
Contributor

tmylk commented Nov 3, 2016

Need to add logic to distinguish s3://key:secret@bucket/object vs s3://bucket/obj@ect and s3://bucket/ob:j@ect

@piskvorky
Copy link
Owner

piskvorky commented Nov 3, 2016

Is it possible to do so cleanly & unambiguously?

Is the S3 URI syntax described somewhere formally, so we know for sure?

@piskvorky
Copy link
Owner

piskvorky commented Nov 3, 2016

Maybe if slashes / are not allowed in credentials (I'm not sure -- maybe they are), a simple check would be "credentials before @ contain a / => they're not really credentials, treat @ as part of filename".

@jayantj
Copy link
Author

jayantj commented Nov 3, 2016

/ is allowed in the credentials - secret keys usually have a slash in them

I think these are the two additional tests we need -

        # correct uri, key contains @
        parsed_uri = smart_open.ParseUri("s3://mybucket/mykey@mydir")
        self.assertEqual(parsed_uri.scheme, "s3")
        self.assertEqual(parsed_uri.bucket_id, "mybucket")
        self.assertEqual(parsed_uri.key_id, "mykey@mydir")
        self.assertEqual(parsed_uri.access_id, None)
        self.assertEqual(parsed_uri.access_secret, None)

        # correct uri with credentials, key contains @
        parsed_uri = smart_open.ParseUri("s3://ACCESSID456:acces/sse_cr-et@mybucket/mykey@mydir")
        self.assertEqual(parsed_uri.scheme, "s3")
        self.assertEqual(parsed_uri.bucket_id, "mybucket")
        self.assertEqual(parsed_uri.key_id, "mykey@mydir")
        self.assertEqual(parsed_uri.access_id, "ACCESSID456")
        self.assertEqual(parsed_uri.access_secret, "acces/sse_cr-et")

Seems slightly difficult to do cleanly, lots of edge cases.

@tmylk
Copy link
Contributor

tmylk commented Nov 3, 2016

The Access Key ID is 20 alpha-numeric characters, so a regex can locate the s3://key:secret@bucket/object. Will send a fix later today.

@menshikh-iv
Copy link
Contributor

The problem still exists

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants