Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

You cannot do a pull directly from AWS ECR (namespace problem) #54

Closed
christianoguedes opened this issue Nov 9, 2022 · 64 comments
Closed

Comments

@christianoguedes
Copy link

I try to perform the following pull by oras-py

oras-py pull ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/registry-name:latest

And I get the following error:

You are minimally required to include a <namespace>/<repository>

But when I try it directly from the oras

oras pull ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/registry-name:latest
Works!

Downloading 827a4b2f6d63 .
Downloaded  827a4b2f6d63 .

It is not mandatory to use namespace in ECR

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

We can fix that up! I’ll prepare a PR for you to test.

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

Okay please give #55 a shot! You should be able to clone the respository, create an environment of your choosing and pip install -e . for a development install and then try the command as before, and do a sanity check of which oras-py to make sure it is in the new environment.

I've basically tweaked the regex that parses the name to not require a namespace, and then the URI to also accommodate. Let me know if it works!

@christianoguedes
Copy link
Author

christianoguedes commented Nov 9, 2022

Thanks for the prompt help.
Now I had another error, in the download of the manifest, because it formatted the url wrong follows output:

oras-py pull ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/stacks:latest

Issue with https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/v2/None/stacks/manifests/latest:
Unauthorized

For me the right url should be
https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/v2/stacks/manifests/latest

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

oups looks like I must have missed that one reference of the namespace -- let me take a look!

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

Yep I missed just the sections above that, here cc644ca. Please give it another shot!

@christianoguedes
Copy link
Author

christianoguedes commented Nov 9, 2022

Tks, but still didn't work

Issue with https://ACCOUNT_NUMBER.dkr.ecr.us-east-1.amazonaws.com/v2/stacks/manifests/latest:
Unauthorized

I think it is the : at the end.
I can see the manifest json if I access this url without the :.

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

That colon is just in the message:

logger.exit(f"Issue with {response.request.url}:\n{response.reason}")

How are you providing authentication? If you can do an IPython.embed() in there (oras/container.py) and tell me exactly the headers / error that you are seeing (since I can't reproduce) that's the best way for me to help.

The message is "Unauthorized" so either we are providing auth (and we should not be) or we aren't and we should be. The server is not happy with what we are doing, and it's different than what you are doing in the web UI.

@christianoguedes
Copy link
Author

christianoguedes commented Nov 9, 2022

I use this command to login

aws ecr get-login-password --region $AWS_REGION --profile $PROFILE | oras-py login \
    --password-stdin \
    --username AWS \
    "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

It works with oras or docker, but dont print any message (success or error).
Do you have a way to validate that the login worked?

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

That's a good question - we use docker-py under the hood to login, and I'm not sure what it returns! I suspect it would error if it didn't work, but if it's just writing your credentials to the file it might not be doing anything. I just pushed a print for what is returned if you want to run it again, show me that and we can add some meaningful output if there is any.

But regardless, can you try the push with the username and token provided to that command as a sanity check?

@christianoguedes
Copy link
Author

I tried to pass the authentication in pull

oras-py pull ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/stacks:latest --username AWS --password MY_TOKEN

Returns

Issue with https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/v2/stacks/manifests/latest:
Unauthorized

I tested with oras

oras pull ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/stacks:latest --username AWS --password MY_TOKEN

and works! My token (password) is valid.

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

Okay right - so I need you to insert some IPython.embed() or otherwise give me an example to reproduce your error - seeing that one works and one does not is not helpful. Are you familiar with Python?

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

Also I removed the : in the message which suggests you aren't using the updated branch.

@vsoch
Copy link
Contributor

vsoch commented Nov 9, 2022

@shizhMSFT @qweeah can you help me find someone with access to ECR that could reproduce this error, and then look into the specific underlying interactions (headers, responses, etc). I am a bit helpless without being able to reproduce his error.

@qweeah
Copy link

qweeah commented Nov 10, 2022

@shizhMSFT and I don't have ECR for testing. You can talk with Nima Talebi(github account: nima) to see if he can provide you some resource for testing or help repo the issue in ECR.

@vsoch
Copy link
Contributor

vsoch commented Nov 10, 2022

hey @nima! 👋 See out discussion above. I need to help debug this issue, but I cannot reproduce, and I don't think @christianoguedes is able to dig in to provide more detail. Can you help?

@nima
Copy link

nima commented Nov 10, 2022

Sure i'll take a look at this, staying with a repro. 😃

@christianoguedes
Copy link
Author

Hey guys, sorry by my late. I believe we're in different time zones

I ran it again with the adjustments made, but it still didn't work. Although it did succeed in the login.

❯ aws ecr get-login-password --region $AWS_REGION --profile $PROFILE | oras-py login \
    --password-stdin \
    --username AWS \
    "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
{'Status': 'Login Succeeded'}

❯ oras-py pull ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/stacks:latest
Issue with https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/v2/stacks/manifests/latest
Unauthorized

I was able to import IPython.embed() into oras/container.py, but I couldn't use it. Could you create a branch with these outputs that would help detect this and I would run it here? In the future I could even have a verbose mode in the CLI.

Tks for help

@vsoch
Copy link
Contributor

vsoch commented Nov 10, 2022

okay so what if you do the auth with the command?

@christianoguedes
Copy link
Author

christianoguedes commented Nov 10, 2022

The same Unauthorized error, but don't print login successfull.

oras-py pull ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/stacks:latest  --username AWS  --password MY_ECR_TOKEN

Issue with https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/v2/stacks/manifests/latest 
Unauthorized

@vsoch
Copy link
Contributor

vsoch commented Nov 10, 2022

okay gotcha, thanks for testing that! So what we need to do next is to install IPython

pip install IPython

Then what I would do is:

print('IN FUNCTION PULL')
import IPython
IPython.embed()

at the top of the functions for pull

allowed_media_type = kwargs.get("allowed_media_type")
and download_blob. The prints tell you where you are. That will open an interactive terminal at those points and you can walk through each command (yes copy paste) until you hit the request for the error. Then I'm interested to see the details / headers / whatever you can give me (this is how I would debug it if I could reproduce your error).

@christianoguedes
Copy link
Author

I can debug (with pyCharm) and verify why it doesn't work for me. In my case I don't have a helm in my ECR registry. And that seems to be a problem.

In this line

authResponse = self.session.get(h.realm, headers=headers, params=params) # type: ignore

It makes a request with the correct headers (basic auth), but the URL does not exist and return a 404.
The URL that makes the request is: https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com
As the request does not return 200 the basic auth is replaced in the header by a token None

@vsoch
Copy link
Contributor

vsoch commented Nov 10, 2022

What URL is the request supposed to go to? Normally it's assembled from the result of h.realm - is there an h.service on there (or other attributes) we are missing? Can you look at all the attributes on that object?

@christianoguedes
Copy link
Author

christianoguedes commented Nov 10, 2022

I don't know why but when trying to download a private registry manifest whitout auth amazon returns the following header

'Basic realm="https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/",service="ecr.amazonaws.com"'

However respond 404 if I make the request to this helm url.

But once the user is logged in or has informed the credentials why not already try to make the first request with basic auth?
I made this small change here and it worked for me the manifest donwload.

But I needed to make another change, because my image is not standard, it's a custom one, but it's .tar.

From:

if layer["mediaType"] == oras.defaults.default_blob_dir_media_type:

To:

if layer["mediaType"].endswith(('+tar', '.tar+gzip', '.tar')):

@vsoch
Copy link
Contributor

vsoch commented Nov 10, 2022

Why is your image not standard?

Can you please send me the full content type? I don't think we want to blindly match based on the suffix, but we can add the default to be one from a list.

Can you please show me the change you made so it works? Generally the flow is to do the request (without auth) and parse the Www-Authenticate header to know where to ask to get a token. If ECR does something different I'd be interested to know what that is!

@christianoguedes
Copy link
Author

christianoguedes commented Nov 17, 2022

We are testing OCI as a way to provide our own packages and not docker style containers, basically is source codes and as it is something specific to our scenario we are calling it application/vnd.stackspot.stack+tar OCI can provide a lot of benefits like download validation, layers, cache, etc. Instead of simply putting it in a storage like S3.

I also did not understand why ECR does not use realm correctly in the www-Authenticate header, because when trying to auth to realm it returns 404.

In my scenario to work what I did was validate that the user is logged in and if you use already uses the basic auth always without performing the request without authentication. Currently in the manifest download tests if you can perform the donwload without auth and if receive the header www-Authenticate then try to perform the auth in realm informed.

@vsoch
Copy link
Contributor

vsoch commented Nov 17, 2022

Interesting! Is this something you could write a custom client for (that would handle this logic) or are there fixes you think would be useful to add upstream here?

@christianoguedes
Copy link
Author

I believe you could use the default client, ie, the ORAS. I'm studying a little more OCI to be able to contribute respecting the standards. But so far we are using the CLI Oras (golang) same, because it's worked for all our scenarios so far.
I don't know what they do, but it doesn't work exactly the same in oras-py

@vsoch
Copy link
Contributor

vsoch commented Nov 23, 2022

The point isn't to use the Go client, we open the issue here so we can work on it and fix it. Waiting to hear back from @nima.

@vsoch
Copy link
Contributor

vsoch commented Nov 25, 2022

hey folks, I opened a PR here #55 for this issue can anyone test please?

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

I don't think so - perhaps we are both on a shared slack? Are you in any cloud communities, spack, research software engineering?

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Or maybe a DM on Twitter or Gitter would work?

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Actually it looks like we aren't setting the username and password - give me a few minutes to tweak something! The above really should work.

@christianoguedes
Copy link
Author

I sent the temp credential to your twitter

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

We should double check the right person - https://twitter.com/vsoch

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

I don't see it - but give the pull one more try, I fixed a bug a3bfa59

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Ah got it!

@christianoguedes
Copy link
Author

christianoguedes commented Dec 7, 2022

Now make the request to 'https://765614139936.dkr.ecr.us-east-1.amazonaws.com/' with basic auth, but AWS returns 404 for this url. Apparently don't need to authenticate, just call all requests with basic auth, in this case make the request with basic auth to get manifest https://765614139936.dkr.ecr.us-east-1.amazonaws.com/v2/stacks-services/manifests/registry-api.

same output

Auth response parameters: {'service': 'ecr.amazonaws.com'}
Auth response was not successful: {response.text}
Issue with https://765614139936.dkr.ecr.us-east-1.amazonaws.com/v2/stacks-services/manifests/registry-api: Unauthorized

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Yeah I can reproduce that. Did you send me a basic auth or token?

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

When I look at the thing you sent me, when I decode it, it has a DATA_KEY type and is json. I don't think this is basic auth?

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

okay well I need to get more sleep - been going at this since 6am and I still have a full workday. I can try again later. I'm kind of confused about what credential example you gave me - it's not base64 encoded username password (which is basic auth).

@christianoguedes
Copy link
Author

christianoguedes commented Dec 7, 2022

It's really weird, but are really the values to mount the basic auth
You can test by accessing the url via browser https://765614139936.dkr.ecr.us-east-1.amazonaws.com/v2/stacks-services/manifests/registry-api using the user "AWS" and the password is that of the command.

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Indeed that works!

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

I can also look at the headers for that page and verify the Basic header (the encoded thing with AWS) matches the one I'm generating.

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Oh I just got it working.

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Huh I didn't know "subject" was valid for a manifest?

     'mediaType': 'application/vnd.oci.image.manifest.v1+json',
     'schemaVersion': 2,
     'subject': None}

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

okay it's finally pulling - ECR seems to just roll it's own rules about manifests I guess. It broke the manifest schema in three places.

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

@christianoguedes could you maybe next time not send me an example that is going to explode all over my local directory, or give me a command to extract to a single place? It exploded a small filesystem into my local repo which could have been disaster!

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

okay now ECR has broken the other registry auth. https://github.com/oras-project/oras-py/actions/runs/3640378787/jobs/6144970943

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

okay this was kind of a nightmare but tests are passing - I'm really kind of disappointed ECR has to be different than other registries, but it is what it is. #55

@christianoguedes
Copy link
Author

christianoguedes commented Dec 7, 2022

Thanks a lot really now it worked🥇 ,I'm also disappointed with ECR, mainly for not having how to put namespaces.

Now I tried to execute a push, but got the following error: :(

Layers with digests '[sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855]' required for pushing image into repository with name 'stacks-services' in the registry with id '765614139936' do not exist
Issue with https://765614139936.dkr.ecr.us-east-1.amazonaws.com/v2/stacks-services/manifests/test_oras_py: Not Found

I'll check if I'm doing something wrong, but the same command with oras worked.

oras push --username 'AWS' --password<SAME_AWS_TEMP_PASS> 765614139936.dkr.ecr.us-east-1.amazonaws.com/stacks-services:test_oras_py2 .:application/vnd.stackspot.stack+tar

If you want to test you can put any tag, it is a temporary repository.

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

I'm on it! And it's OK, I am empowered to debug this with your token so I should be able to figure this out. Thanks again and will report back (hopefully with good news).

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

I'm able to push and get 200/201 responses for the layer and config but then when I try to push the manifest:

{'errors': [{'code': 'BLOB_UPLOAD_UNKNOWN',
   'message': "Layers with digests '[sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855]' required for pushing image into repository with name 'stacks-services' in the registry with id '765614139936' do not exist"}]}

The URL is:

https://765614139936.dkr.ecr.us-east-1.amazonaws.com/v2/stacks-services/manifests/test_oras_py2

json data

{'schemaVersion': 2,
 'mediaType': 'application/vnd.oci.image.manifest.v1+json',
 'config': {'mediaType': 'application/vnd.unknown.config.v1+json',
  'size': 0,
  'digest': 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'},
 'layers': [{'mediaType': 'application/vnd.stackspot.stack+tar',
   'size': 44992,
   'digest': 'sha256:6e7eb85f6bfaea91c6c43fd3c83a55fc96a3ac65e8b30ce49ac6ed7883387efa',
   'annotations': {'org.opencontainers.image.title': 'dist'}}],
 'annotations': {}}

headers (not including the auth token)

{'Content-Type': 'application/vnd.oci.image.manifest.v1+json',
 'Content-Length': '5'}

and it's a PUT.
I can push what I have (I added support for media_type) but I'm not sure why I am able to push the config and then it tells me the layer isn't there. Maybe you can take a look?

@christianoguedes
Copy link
Author

christianoguedes commented Dec 7, 2022

I check and there is no layer type setting in the ECR portal. I also got this same error, but if you just using oras pull or docker pull with the same parameters worked.

@christianoguedes
Copy link
Author

I sent a new temp password to your gitter, the other has already exipirated

@vsoch
Copy link
Contributor

vsoch commented Dec 7, 2022

Just tried again - same error as above. The first two seem to work (after handling the auth to ultimately just use the basic token) but the last request to PUT the manifest is denied because it says the layer doesn't exist.

@vsoch
Copy link
Contributor

vsoch commented Dec 10, 2022

Hey folks! I've been thinking this over and want to share some thoughts I have about this, larger picture. This is taken from the oras slack just now:

I've been thinking about oras-py and I'm looking for some feedback. Specifically I think we should remove the oras-py client commands (and refactor them to be examples). The reason is because it's redundant to maintain two clients, and in practice I've found basic custom implementations (that are done to do exactly what some snowflake registry or interaction needs) are better than trying to provide one command to fit them all. I think the original intention of oras-py was to have a python API to interact with oras, and not to provide yet-another-command-line-tool (I just didn't think about this at the time of creation). As a quick example, I have a custom cache for a tool that can assemble metadata https://github.com/converged-computing/cloud-select/blob/main/cloud_select/main/cache.py#L81-L107 and then use by subclass of the oras provider to push! https://github.com/converged-computing/cloud-select/blob/db02c4378f06bfbbe9df6b8a83c885cc9238a04e/cloud_select/main/cache.py#L81 and I'm wrapping the set_basic_auth that I get from the environment and making sure the user has provided exactly what I need. And then I get my data with (the Go variant) of oras if I need a command line client!

For the context here, the ideal solution I think would be to remove these client commands - if you need a command line client you should be using oras in Go and we shouldn't be trying to maintain two versions of the same thing. Then in place of this I can implement custom example clients - and specifically optimized for what EC2 needs. Or GitHub packages. Or whatever. The pattern I'm starting to see is that registries have subtle difference and there is no "one client to rule them all."

TLDR The python SDK should be for Python developers, and not provide a second client. Let me know your thoughts. If you agree I'll start work on this tomorrow.

@vsoch vsoch mentioned this issue Dec 10, 2022
@vsoch
Copy link
Contributor

vsoch commented Dec 10, 2022

@christianoguedes @nima - thank you for your help (and the fun!) here. I've decided for command line interactions to point you to https://github.com/oras-project/oras - we should not be trying to maintain two equivalent clients in different languages. The SDK here is intended for custom Python applications. However, we did fix the original bug reported for push above - if you just need the client, then probably no further work is needed. If you want to look more into the EC2 differences with the pull, then please open another issue and we can discuss there. What we would do is create a custom example in the newly updated examples folder that creates a client that does exactly what (whatever ECR) needs. Thanks again!

@vsoch vsoch closed this as completed Dec 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants