[Azure DataStore] Handle upload strings vs bytes and filepath formation when using adlfs #1159
hayesgb commented Jul 28, 2021
- Uploading strings as model artifact attributes to abfs through the put method with adlfs was failing. Added the ability to choose the write method based on the type of the incoming data (string vs. bytes)
- Updated filepath handling in the get, listdir, and stat methods
- Validated performance with private integration testing against adlfs
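A minimal sketch of the string-vs-bytes handling described in the first bullet, not the code from this PR. The helper names `write_mode_for` and `put` and the `opener` parameter are hypothetical; `opener` stands in for any filesystem call that accepts a path and a mode the way the builtin `open` does.

```python
def write_mode_for(data):
    """Pick a write mode that matches the payload type (hypothetical helper)."""
    if isinstance(data, str):
        return "w"   # text mode for str payloads
    if isinstance(data, (bytes, bytearray)):
        return "wb"  # binary mode for bytes-like payloads
    raise TypeError(f"unsupported payload type: {type(data).__name__}")


def put(path, data, opener=open):
    """Write `data` to `path`, choosing text or binary mode from its type.

    `opener` is an assumption: any callable taking (path, mode) and
    returning a writable file-like context manager.
    """
    with opener(path, write_mode_for(data)) as f:
        f.write(data)
```

Dispatching on the payload type keeps a single `put` entry point while avoiding the error that arises when a `str` is written to a handle opened in binary mode.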
…and more robust filepath creation for listdir and get methods if key has leading or trailing delimiters
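One way to make path formation tolerant of leading or trailing delimiters is to strip them before joining. The sketch below is illustrative only; `join_container_key` is a hypothetical name and the `/` separator is an assumption.

```python
def join_container_key(container, key, sep="/"):
    """Join a container name and a key without doubled or dangling separators."""
    # Strip the separator from both ends of each part, then drop empty parts
    parts = [part.strip(sep) for part in (container, key)]
    return sep.join(part for part in parts if part)


print(join_container_key("mycontainer", "/dir/file.csv"))
print(join_container_key("mycontainer/", "dir/"))
```

With this approach `"/dir/file.csv"`, `"dir/file.csv"`, and `"dir/file.csv/"` all resolve to the same path under the container.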
@hayesgb one minor comment
Also, if it were failing, maybe worth adding a test so that future changes won't mistakenly break it again
@Hedingber -- I've been investigating why the tests did not catch this. It turns out there doesn't appear to be an obvious way to close a DataStore once it's been created. As a result, the initial DataStore created during unit testing persists and is reused for all of the authentication methods. I'll make additional updates to this PR to see if I can resolve that issue.
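The effect described here can be reproduced with a toy cache. This is not mlrun's implementation; `ToyStoreCache` and its methods are hypothetical and only model a cache keyed by container with no way to evict an entry.

```python
class ToyStoreCache:
    """Toy stand-in for a store manager that caches one store per container."""

    def __init__(self):
        self._stores = {}

    def get(self, container, credentials):
        # The first call wins: later credentials for the same container are ignored
        if container not in self._stores:
            self._stores[container] = dict(credentials)
        return self._stores[container]

    def reset(self):
        # Explicit cleanup; a test fixture could call this between tests
        self._stores.clear()


cache = ToyStoreCache()
first = cache.get("shared-container", {"auth": "sas_token"})
second = cache.get("shared-container", {"auth": "connection_string"})
print(second["auth"])  # still the credentials from the first call
```

In this model, a test for a second authentication method silently exercises the first method's store unless the cache is reset or each method uses its own container.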
…private class method
… avoids the potential for dangling open connections
@hayesgb
…ed to allow separation of tests with different auth methods. Also added separate test containers based on auth method. Refactored azure_blob.py to pass tests
Thanks @Hedingber. From what I can tell, once a data_item is created, the StoreManager creates a DataStore for that container and caches the credentials for future use. Even if the data_item is deleted, the StoreManager does not remove them. My proposed (short-term) solution is to create a specific Azure container for each authentication method and delete the environment variables between tests. Additionally, I've parametrized the unit test so that each authentication method is tested separately rather than all of them running together as a single test. This allowed me to find some bugs in the listdir() dataitem method. While investigating this, I realized there's no close() method for either a DataItem or a DataStore, which creates the potential for memory leaks. I'll file a separate issue on this.
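The per-method isolation proposed above could look roughly like the following. It is a sketch under stated assumptions: the container names, the `EXAMPLE_*` variable names, and the helpers `isolated_env` and `run_case` are all hypothetical and are not taken from the PR's test suite. In pytest, the same idea would typically be expressed with `pytest.mark.parametrize` and the `monkeypatch` fixture.

```python
import os
from contextlib import contextmanager

# Hypothetical mapping: auth method -> (dedicated container, env vars it needs)
AUTH_CASES = {
    "connection_string": (
        "test-container-conn-str",
        {"EXAMPLE_CONNECTION_STRING": "dummy"},
    ),
    "sas_token": (
        "test-container-sas",
        {"EXAMPLE_ACCOUNT_NAME": "acct", "EXAMPLE_SAS_TOKEN": "dummy"},
    ),
}


@contextmanager
def isolated_env(variables):
    """Set the given environment variables, then restore the previous state."""
    saved = {name: os.environ.get(name) for name in variables}
    os.environ.update(variables)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


def run_case(auth_method):
    """Run one auth-method case with only its own variables set."""
    container, variables = AUTH_CASES[auth_method]
    with isolated_env(variables):
        # A real test would create the data item for `container` here
        return container, sorted(name for name in variables if name in os.environ)


for method in AUTH_CASES:
    print(method, run_case(method))
```

Giving each method its own container sidesteps a cache keyed by container, and restoring the environment afterwards keeps one method's credentials from leaking into the next case.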
@hayesgb part of @theSaarco's PR also parametrizes the tests. I'd strongly suggest waiting for his PR to be merged before proceeding here.
@hayesgb: as @Hedingber mentioned, I noticed the same issue you're seeing. I fixed it for tests in #1149 by modifying the basic test fixture to always clean up between tests. In #1149 I'm clearing all the datastores from the …
Thanks @theSaarco. Regarding the question of whether it can actually happen, two scenarios come to mind.
…ng in put method for append=True from mlrun/datastore/azure_blob.py
@Hedingber -- Updated this PR to make use of the revised unit test from #1149.
LGTM
@theSaarco WDYT?
@hayesgb - Looks good. There's one small fix I'd like implemented, but I'm not sure whether adlfs actually supports it. Please take a look.
Co-authored-by: Saar Cohen <66667568+theSaarco@users.noreply.github.com>
Looks good to me. Approved.
@hayesgb PR was merged, thanks a lot!