Added deprecation warning for anonymous access#530
Conversation
nateprewitt
left a comment
There was a problem hiding this comment.
A few minor comments and a question. Please let me know if I'm missing anything.
| * Changed default value of `anon` to `False`. Explicitly pass in `True` to use anonymous | ||
| credentials. |
There was a problem hiding this comment.
I don't think we're changing the default just yet. We may want to rephrase this to be forward looking. Something like:
**Breaking:** Warning added for default anonymous use. Default adlfs use will require credentials starting in a future release. Anyone requiring anonymous access should explicitly set `anon=True` when creating their AzureFileSystem.I'd also suggest adding a small example snippet in this PR description for anyone looking. That will make it clear what changes they need to make.
| "0", | ||
| "f", | ||
| ] | ||
| if self.anon: |
There was a problem hiding this comment.
This is a little subtle but I think we may incorrectly flag usage. If I set AZURE_STORAGE_ANON to "true", I would expect that to continue working. We shouldn't be breaking that with the default change.
The case we're concerned about here is closer to os.getenv("AZURE_STORAGE_ANON", None) is None. We want the case where both the env var isn't set and the variable isn't declared explicitly.
| if ( | ||
| self.sas_token is None | ||
| and self.account_key is None | ||
| and credential is None | ||
| ): |
There was a problem hiding this comment.
I can go take a look if you don't know off hand. If anon is True but we have credentials available, are they used or do we ignore them?
There was a problem hiding this comment.
Building on what @nateprewitt said, I think it would be worth adding tests for the different credential permutations on when and when we don't warn about switching the anon default. I think if we are able to build up a good set of cases, we will have a much better understanding on when the default is reached and if there are any cases we did not account for. Furthermore, when we go to switch the default, we should hopefully be able to just update the assertion, reusing all of the cases that we've enumerated, and make that change easier/safer.
There was a problem hiding this comment.
The credentials are checked first so it will use them before anon.
d8814fa to
0e9bebc
Compare
nateprewitt
left a comment
There was a problem hiding this comment.
A few minor notes on the tests but otherwise looks good! I noticed though that the test suite now has a lot of these warnings being raised:
adlfs/tests/test_spec.py: 67 warnings
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/fsspec/spec.py:84: DeprecationWarning: The default for anonymous access will be changing to False. To continue using anonymous credentials, please set anon=True.
obj = super().__call__(*args, **kwargs, **strip_tokenize_options)
I think that's probably fine since this is a short time window, but I'm wondering if we should test changing the default and see what (if anything) breaks in the test suite with the cutover. That will help inform how much work needs to be done in advance.
| "0", | ||
| "f", | ||
| ] | ||
| if self.anon and os.getenv("AZURE_STORAGE_ANON") is None: |
There was a problem hiding this comment.
I think we can ignore the value of self.anon for this check, and just check the value of the environment variable. If it's not set (i.e. None), we know the end user hasn't specified a preference through the anon parameter or environment. That means we're using defaults which is the case where we want the warning.
| if self.anon and os.getenv("AZURE_STORAGE_ANON") is None: | |
| if os.getenv("AZURE_STORAGE_ANON") is None: |
| ("true", None, None, None, None, False), | ||
| ("false", None, None, None, None, False), | ||
| ("true", None, "credential", None, None, False), |
There was a problem hiding this comment.
anon is defined as an optional bool so we should try to keep tests reflecting the public API.
| ("true", None, None, None, None, False), | |
| ("false", None, None, None, None, False), | |
| ("true", None, "credential", None, None, False), | |
| (True, None, None, None, None, False), | |
| (False, None, None, None, None, False), | |
| (False, None, "credential", None, None, False), |
| ): | ||
| env_var = {} if env_value is None else {"AZURE_STORAGE_ANON": env_value} | ||
| with mock.patch.dict(os.environ, env_var): | ||
| if warning: |
There was a problem hiding this comment.
This is more high-level testing advice for Python, so you can do with it what you want. Generally, it's good to avoid conditionals inside tests. Once you introduce branching logic, it's not uncommon to silently break things while the tests continue to report success.
pytest.mark.parameterize is great for simplifying code, but there's also cases where you may want to make distinctions between what you're testing (positive cases, negative cases, env vs not, etc).
You might consider two tests here. We can also get a little clever with the testing inputs if you want, but not required at all.
test_anon_warning (positive test for case 1, ensure the warning is raised where we expect)
test_no_anon_warning (negative tests for cases 2-10)
It looks like it's because the connection string is set for most tests. I'm assuming we don't want the warning when it's set, so I included it in the check as well. |
|
Yeah we definitely do not want to warn if connection string is provided; connection strings count as one of the ways to provide credentials and avoid defaulting to anonymous credentials. |
nateprewitt
left a comment
There was a problem hiding this comment.
Looks good to me! I think one test is still raising the warning, but it's testing anonymous access so we should be alright leaving it to be updated with the default change.
kyleknap
left a comment
There was a problem hiding this comment.
Looks good! I just had some smaller comments.
| and self.connection_string is None | ||
| ): | ||
| warnings.warn( | ||
| "The default for anonymous access will be changing to False. To continue " |
There was a problem hiding this comment.
For the first sentence, maybe let's change it to:
AzureBlobFileSystem will no longer be defaulting to anonymous authentication in an upcoming release.
Mainly it took me several passes to understand what changing anonymous to false at a high level meant. I imagine this might be a little hard to follow for others that are not necessarily following this change.
|
|
||
|
|
||
| @pytest.mark.parametrize( | ||
| "anon,env_value,credential,sas_token,account_key", |
There was a problem hiding this comment.
It would be interesting if we could trim the parameterization down to just generalized env_vars and storage_options (which would be just the filesystem keyword arguments). Mainly it's a little difficult to follow the order of parameter right now and if we want to add any more configuration options (e.g., connection string, client id), it will bloat the parameters even more.
kyleknap
left a comment
There was a problem hiding this comment.
Looks good. Just had a couple more suggestions and I think we should be good here.
| mocker.patch.object( | ||
| AzureBlobFileSystem, | ||
| "_get_credential_from_service_principal", | ||
| return_value=(None, None), | ||
| ) |
There was a problem hiding this comment.
I think to avoid the patch, we should be also be able to set a fake secret and tenant id to get the pass validation. I was setting the configuration value like this (there should be env vars that we can use too) and I was not running into any errors:
AzureBlobFileSystem(
account_name="<account-name>",
client_id="<client-id>",
client_secret="<client-secret>",
tenant_id="00000000-0000-0000-0000-000000000000",
)I'm thinking we should just update the test cases to include fake values like that instead of adding another patch.
| (None, {"account_key": "account_key"}), | ||
| ({"AZURE_STORAGE_ANON": "true"}, {}), | ||
| ({"AZURE_STORAGE_ANON": "false"}, {}), | ||
| ({"AZURE_STORAGE_ANON": "true", "credential": "credential"}, {}), |
There was a problem hiding this comment.
Is the credential for this case supposed to be in the storage_options and not the env_vars?
kyleknap
left a comment
There was a problem hiding this comment.
Looks good! Thanks for digging into all of the different credential options here. 🚢
Added a deprecation warning for changing the anonymous access default to False. The message will only show up if users do not provide a credential or explicitly set
anon.To continue using anonymous access, users should pass in
anon=True. e.g.:AzureBlobFileSystem(account_name=<your-account-name>,...,anon=True,)