Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: 🎸 add concept of Resource #784

Merged
merged 29 commits into from
Feb 10, 2023
Merged

feat: 🎸 add concept of Resource #784

merged 29 commits into from
Feb 10, 2023

Conversation

severo
Copy link
Collaborator

@severo severo commented Feb 7, 2023

This long PR creates a new concept, Resource. The resource is aimed at being allocated and then released after use: connection to a database, modification of the datasets library config, or creation of storage directories...
Before this PR, it was done in the Config step. Now the Config step should never raise and should be immutable, and the Resources are created based on that config.

It makes it "easier" to understand what is failing in the tests and in the app.

GOOD LUCK to the reviewers, and sorry for the length.

@HuggingFaceDocBuilder
Copy link
Collaborator

HuggingFaceDocBuilder commented Feb 7, 2023

The documentation is not available anymore as the PR was closed or merged.

(I had an error from inside ffspec, seems like there is a name conflict
- weird!)
This way we can alsp ensure that the storage directory is not None
they were released just after the app has been returned. But we want the
resources to be released when the app is shut down instead.

Note that as we need the resources (mainly the log, and the assets
directory for the prometheus endpoint), we need to allocate them at the
start of the function, instead of using the "on_startup" parameter of
the Starlette constructor.
they are only available in __init__ and __post_init__. We don't want
that in the Resource inherited objects
@severo severo marked this pull request as ready for review February 8, 2023 15:43
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome ! This will help a lot for the tests :)

just a comment about resources that don't have a release:

workers/datasets_based/tests/workers/test_first_rows.py Outdated Show resolved Hide resolved
to make it clear that the resource is the access to the storage, not the
assets directory in itself (which must be persisted and will not be
deleted when the ressource is released). Thanks @lhoestq for the idea.
It fixes the issue raised by @lhoestq at
#784 (comment).

Note that it's a bit radical: all the workers now require write access
to the assets storage directory, even if they will not use it.
I think it's OK since I will be working very soon on
#737, which will
merge all the workers in only one worker.
@codecov-commenter
Copy link

codecov-commenter commented Feb 9, 2023

Codecov Report

Base: 92.16% // Head: 88.95% // Decreases project coverage by -3.21% ⚠️

Coverage data is based on head (235d3cf) compared to base (afd2e49).
Patch coverage: 96.32% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #784      +/-   ##
==========================================
- Coverage   92.16%   88.95%   -3.21%     
==========================================
  Files          33       70      +37     
  Lines        2271     2861     +590     
==========================================
+ Hits         2093     2545     +452     
- Misses        178      316     +138     
Flag Coverage Δ
jobs_mongodb_migration 80.24% <92.30%> (?)
libs_libcommon 92.80% <95.00%> (?)
services_admin 87.13% <100.00%> (?)
services_api 89.50% <100.00%> (?)
workers_datasets_based ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
services/admin/tests/test_app.py 100.00% <ø> (ø)
services/api/tests/test_app.py 100.00% <ø> (ø)
services/api/tests/test_app_real.py 80.00% <ø> (ø)
libs/libcommon/tests/conftest.py 76.47% <73.33%> (ø)
jobs/mongodb_migration/tests/conftest.py 81.81% <80.00%> (ø)
jobs/mongodb_migration/tests/test_plan.py 95.45% <88.88%> (ø)
libs/libcommon/tests/test_queue.py 99.37% <90.00%> (ø)
libs/libcommon/tests/test_simple_cache.py 99.63% <90.00%> (ø)
libs/libcommon/src/libcommon/resources.py 94.44% <94.44%> (ø)
...ngodb_migration/src/mongodb_migration/constants.py 100.00% <100.00%> (ø)
... and 117 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

MONGODB_MIGRATION_MONGO_DATABASE = "datasets_server_maintenance"
MONGO_DATABASE_MONGO_URL = "mongodb://localhost:27017"
MONGODB_MIGRATION_MONGO_URL = "mongodb://localhost:27017"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the defined class at line 14, It might be DATABASE_MIGRATION_MONGO_URL value
And also attribute name at line 9 could be DATABASE_MIGRATION_MONGO_DATABASE
Just to keep the consistency between NAMESPACE_ -> Class Config as it is done in other configurations classes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @severo.

There are however some things I don't understand.

  • In principle, I understand a resource as something you allocate and then you release, all that within a context manager to be sure the resource is properly released even if some exception is raised.
  • But some of the resources you define do not follow this design pattern
    • For example, the LogResource: you just set a logging level but the release does nothing, the original log level is never reset. What is the point of calling this within a context manager? You could just call the init_logging function and would have exactly the same behavior.

Do you agree or is there something I am missing?

@severo
Copy link
Collaborator Author

severo commented Feb 10, 2023

some of the resources you define do not follow this design pattern

If you use a context manager, we expect the resource to be deleted on exit, which is not the case here. Therefore I'm not sure this one should be a Resource.

Yes, you're both right. I tried to manage the log, storage and access to the databases from the same concept of Resource, as context managers. But it does not work so well. I'll change this, thanks.

@severo
Copy link
Collaborator Author

severo commented Feb 10, 2023

Should be good for review now @lhoestq @albertvillanova @AndreaFrancis 🤞

@severo
Copy link
Collaborator Author

severo commented Feb 10, 2023

Thanks for the reviews. Merging

@severo severo merged commit 761d16a into main Feb 10, 2023
@severo severo deleted the create_resources_concept branch February 10, 2023 15:08
mattstern31 added a commit to mattstern31/datasets-server-storage-admin that referenced this pull request Nov 11, 2023
* feat: 🎸 add concept of Resource

* ci: 🎡 fix unit tests for the job

* refactor: 💡 remove dead code

* test: 💍 give more time to avoid timeout to mongodb

* refactor: 💡 rename resource into resources

(I had an error from inside ffspec, seems like there is a name conflict
- weird!)

* refactor: 💡 use InitVar to avoid mutating fields

This way we can alsp ensure that the storage directory is not None

* fix: 🐛 release the resources on shutdown

they were released just after the app has been returned. But we want the
resources to be released when the app is shut down instead.

Note that as we need the resources (mainly the log, and the assets
directory for the prometheus endpoint), we need to allocate them at the
start of the function, instead of using the "on_startup" parameter of
the Starlette constructor.

* fix: 🐛 avoid using InitVar since they are hard to use

they are only available in __init__ and __post_init__. We don't want
that in the Resource inherited objects

* test: 💍 fix parameter name

* fix: 🐛 no need to call .allocate, it's done in __post_init__

* feat: 🎸 use primitive parameters, add release, add tests

* style: 💄 fix style

* refactor: 💡 rename AssetsDirectory to AssetsStorageAccess

to make it clear that the resource is the access to the storage, not the
assets directory in itself (which must be persisted and will not be
deleted when the ressource is released). Thanks @lhoestq for the idea.

* fix: 🐛 allocate assets storage for all the workers

It fixes the issue raised by @lhoestq at
huggingface/dataset-viewer#784 (comment).

Note that it's a bit radical: all the workers now require write access
to the assets storage directory, even if they will not use it.
I think it's OK since I will be working very soon on
huggingface/dataset-viewer#737, which will
merge all the workers in only one worker.

* refactor: 💡 change env var names for coherence

fixes
https://github.com/huggingface/datasets-server/pull/784/files/2301eb9e528afd4e46414f05f65e032831de6ea5#r1101583351

* fix: 🐛 missing env var

* ci: 🎡 fix env var

* refactor: 💡 replace LogResource with init_logging

As we don't allocate/release anythin, it does not fit inside the concept
of Resource

* feat: 🎸 add init_assets_directory method

* refactor: 💡 remove StorageResource + remove mongo.py

Storage does not need to be a resource. Also: remove the unnecessary
MongoConnection class

* fix: 🐛 fix details to make the CI pass

* test: 💍 fix tests

* refactor: 💡 remove dead code

* test: 💍 remove dead code

* style: 💄 fix style
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants