JSON from private host addresses #4193

Closed
yzorg opened this issue Sep 27, 2019 · 8 comments

Comments

@yzorg

yzorg commented Sep 27, 2019

Issue Summary

New JSON data source does not allow use with private host addresses. I get the error: "Can't query private addresses."

If databases can be hosted on internal DNS names why couldn't JSON data sources? Could this be made configurable?

Steps to Reproduce

  1. Install Redash via Docker, which is now recommended.
  2. Use JSON Data Source (new in v8 beta 2)
  3. Point to JSON file hosted on local computer.

On Docker, the localhost URL will look like: http://host.docker.internal:5001/mydata/my_data.json

Expected: As a developer I can view local JSON for testing new data sources or new application URLs.
Actual: Error on the query screen, "Can't query private addresses."

Technical details:

  • Redash Version: 8.0 beta 2
  • Browser/OS: Chrome/Windows
  • How did you install Redash: Docker, updated docker-compose.yaml to redash/redash:8.0.0-beta.2.b29352

The error seems to come from:

if is_private_address(query['url']):

Comments

I've tested a local copy of the container with this raise commented out, and everything works fine. I understand that in a PaaS setting, or when Redash is externally visible, this check is necessary to protect internal data sources. But I'm evaluating Redash to run inside my production cluster, accessible only to internal users. A core use case is to surface data internal to the cluster (PostgreSQL, MongoDB, JSON, and CSV) and control it via dashboard groups and permissions. If databases can be hosted on internal DNS names, why couldn't JSON data sources? Could this be made configurable?
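For readers unfamiliar with the check referenced in the issue, here is a minimal sketch of what a private-address test of this kind might look like. It is not copied from the Redash source. The `resolve` parameter is an assumption added here so the function can be tried without real DNS lookups, and the resolved address `192.168.65.2` in the usage lines is made up for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_private_address(url, resolve=socket.gethostbyname):
    """Return True if the URL's host resolves to a private or reserved IP.

    `resolve` maps a hostname to an IPv4 string; it is injectable
    (an assumption of this sketch) so the check can run without DNS.
    """
    hostname = urlparse(url).hostname
    if hostname is None:
        raise ValueError("URL has no hostname: %r" % (url,))
    ip = ipaddress.ip_address(resolve(hostname))
    return ip.is_private


def fake_resolver(hostname):
    # Illustrative mapping only; a Docker host name typically resolves
    # to some private address, but the exact value varies by setup.
    table = {"host.docker.internal": "192.168.65.2"}
    return table.get(hostname, hostname)


print(is_private_address("http://host.docker.internal:5001/mydata/my_data.json", fake_resolver))
print(is_private_address("http://8.8.8.8/data.json", fake_resolver))
```

With a resolver like this, the Docker host URL from the reproduction steps is classified as private, which would explain the "Can't query private addresses." error.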

@arikfr
Member

arikfr commented Sep 27, 2019

We recently had this discussion already: https://discuss.redash.io/t/error-running-query-cant-query-private-addresses/4568/.

Copying my reply over here for simplicity:

> If databases can be hosted on internal DNS names why couldn't JSON data sources?

This is to avoid people using the JSON data source to access information they are not supposed to, like AWS metadata API.

> Could this be made configurable?

Happy to accept a PR that makes this behavior configurable with an environment variable. Just note that if you disable this check, you need to trust whoever you allow running queries in your system.
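A rough sketch of the environment-variable approach suggested here, assuming a hypothetical variable name `EXAMPLE_ENFORCE_PRIVATE_ADDRESS_BLOCK` (not a documented Redash setting) and a caller-supplied `is_private` predicate. The check stays on unless it is explicitly disabled, so the safe behavior remains the default.

```python
import os


def parse_boolean(value):
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in ("1", "true", "yes", "on")


def enforce_private_address_block(environ=os.environ):
    # Hypothetical variable name, used for illustration only.
    # A missing variable means the check is enforced.
    return parse_boolean(environ.get("EXAMPLE_ENFORCE_PRIVATE_ADDRESS_BLOCK", "true"))


def check_url(url, is_private, environ=os.environ):
    """Raise if the URL is private and the block is enforced; else return it."""
    if enforce_private_address_block(environ) and is_private(url):
        raise ValueError("Can't query private addresses.")
    return url
```

As noted in the comment above, disabling the check this way only makes sense when everyone allowed to run queries is trusted.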

@yzorg
Author

yzorg commented Sep 30, 2019

The env var change would be very simple, but I also worry about turning off security features with obscure settings.

I think it might be a lot clearer to have a new data source, "Unsafe Internal JSON Data Source" and the env var would enable that data source. I've only dabbled in Python, so I'm not sure how yet, but I would hope 90% or more of the implementation can be shared between the two, only disabling the private host check in the 2nd data source.

Update 2019-11-21: environment variable is simpler and easier to understand

@kneufeld

kneufeld commented Nov 11, 2019

This makes zero sense to me. A product designed to monitor private infrastructure can't monitor private infrastructure? I don't think it's up to Redash to arbitrarily decide what's allowed to be monitored or not.

To address the security concerns:

  • don't let users create queries unless they're logged in
  • add an "allow internal queries" flag to the json datasource
  • remote api should have authentication

If a user can access and create queries on Redash, then surely they can also just make random curl requests to whatever it is that you're worried about.

Please rethink this. "Security" should not trump usability and, as OP said, if you can query postgres et al., then why not json as well?

@arikfr
Member

arikfr commented Nov 13, 2019

@yzorg

> but I also worry about turning off security features with obscure settings.

We can have proper documentation around it.

But the other option you suggested is fine as well. No need for env var, just have it in a separate file and we won't enable it by default. The implementation can definitely be shared between the two -- just add the needed configuration in the JSON one, and subclass it for the second.
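The "add the configuration in the JSON one, and subclass it for the second" idea could be sketched roughly as follows. All class and attribute names are hypothetical stand-ins rather than Redash's actual classes, the `(data, error)` return shape is an assumption, and the fetch itself is replaced by a placeholder.

```python
class JSONRunnerSketch:
    """Hypothetical stand-in for the regular JSON query runner."""

    # The single piece of configuration a subclass needs to override.
    allow_private_addresses = False

    def __init__(self, is_private):
        # is_private: callable taking a URL and returning True for private hosts.
        self._is_private = is_private

    def run_query(self, url):
        if not self.allow_private_addresses and self._is_private(url):
            return None, "Can't query private addresses."
        # Placeholder for the real HTTP request and JSON parsing.
        return {"url": url}, None


class UnsafeInternalJSONRunnerSketch(JSONRunnerSketch):
    """Same implementation; only the private-host check is switched off."""

    allow_private_addresses = True
```

Keeping the unsafe variant in its own class means it can ship in a separate file and stay out of the default list of enabled data sources.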

@arikfr
Member

arikfr commented Nov 13, 2019

@kneufeld, I'm not sure we see the definition of Redash the same way. Also the solutions you suggested don't address the security issue I mentioned. Let me elaborate:

  • Your Redash instance runs on AWS infrastructure.
  • Every EC2 instance has access to AWS' metadata API. This API is not in your control nor authenticated. This API provides access to various pieces of information, including access keys to APIs you allowed this EC2 instance to have access to.
  • Your Redash users might not have the same level of access as the EC2 instance. For example, you might be using AWS SSM to get the Redash configuration, which includes the COOKIE SECRET value (the one used to encrypt cookies).

If the JSON data source wasn't preventing access to internal APIs, any Redash user (with access to the JSON data source) could query the metadata API, get the instance's API keys and access the COOKIE SECRET. Using this, they could impersonate other users in your system.
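To make the metadata API scenario concrete: the EC2 instance metadata service is reachable at the link-local address 169.254.169.254, and a check based on Python's `ipaddress` module classifies that address as private. The sketch below only inspects address properties and makes no network request; the helper name `classify` is illustrative.

```python
import ipaddress

# Well-known address of the EC2 instance metadata service.
METADATA_IP = "169.254.169.254"


def classify(ip_string):
    """Report how the standard library classifies an IP address."""
    addr = ipaddress.ip_address(ip_string)
    return {"link_local": addr.is_link_local, "private": addr.is_private}


print(classify(METADATA_IP))
print(classify("8.8.8.8"))
```

This is why a private-address check on the JSON data source stops a query aimed at the metadata endpoint before any request is sent.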

While you might think that:

> add an "allow internal queries" flag to the json datasource

will solve this case, it's only applicable when you can trust the admins (who can edit this configuration). It's not always the case.

> if you can query postgres et al then why not json as well?

Because with Postgres you're given explicit access to explicit resources (defined by the user role in postgres). With JSON you're given an open cheque.

@kneufeld

I get your concerns but that's a big hammer for those of us that don't run in AWS and trust their coworkers.

@arikfr
Member

arikfr commented Nov 13, 2019

Maybe, but it's a really easy fix/change. A PR addressing this (in the way outlined above) is welcome.

@jrm

jrm commented Jun 19, 2020

A change to the get_response method in the BaseHTTPQueryRunner class is also needed, because it performs the same "is_private_address" check during the actual HTTP request. Is there any value in doing the same test twice?

@loganprice - is this already spotted?

arikfr closed this as completed Jul 1, 2020