Fix harvest big catalog #2980

Merged
ThibaudDauce merged 12 commits into master from harvest_big_catalog on Mar 6, 2024

Conversation

ThibaudDauce
Contributor

@ThibaudDauce ThibaudDauce commented Feb 28, 2024

Fix datagouv/data.gouv.fr#1046

Requires Python 3.9 for tests:

  1. The tests require moto to mock S3
  2. moto requires a recent cryptography
  3. To update cryptography we need to update authlib (see [WIP] Bump cryptography #2981)
  4. To update authlib we need to update flask-security (not certain, but the CI on the PR above seems to show errors in flask-security with the new version of authlib)
  5. To update flask-security we need to update Python

@ThibaudDauce ThibaudDauce force-pushed the harvest_big_catalog branch 2 times, most recently from 25ece18 to 461b95f Compare February 28, 2024 13:53
@ThibaudDauce ThibaudDauce marked this pull request as ready for review March 5, 2024 09:40
@ThibaudDauce ThibaudDauce changed the title [WIP] Fix harvest big catalog Fix harvest big catalog Mar 5, 2024
@quaxsze
Collaborator

quaxsze commented Mar 5, 2024

Very interesting PR. I know that @maudetes has been trying to upgrade the Python version to 3.10 or 3.11 for some time, so maybe you could check with her about merging? Or maybe this PR will make the upgrade from 3.9 to 3.11 easier.
WDYT @maudetes?

@ThibaudDauce
Contributor Author

This PR doesn't change anything about the bump to Python 3.10, but as soon as we are on Python 3.10 in CI, we can mock S3 to unit test this feature.

@quaxsze
Collaborator

quaxsze commented Mar 5, 2024

Indeed, you are right. One question: in this PR you use boto3 to interface with S3. We maintain a library called Flask-storage that udata already uses to handle resource storage on the filesystem, and it can handle S3 as well. Could we use this library to achieve what you did?

@ThibaudDauce
Contributor Author

Yes, we could use Flask-storage, but I'm not sure it would be simpler. The documentation link in the README (http://flask-storage.readthedocs.io/en/latest/) is broken, and the README states that "Flask-Storage requires 3.9+ and Flask 1.1.4." (we are on 3.7 right now). Is this lib maintained?

@geoffreyaldebert
Contributor

boto3 is very standard and widely used, right? For connecting to S3-like systems, I do not think it is a problem to use this library. I trust it more than Flask-storage. But yes, it is another library to add to our dependencies.

Contributor

@geoffreyaldebert geoffreyaldebert left a comment

👏 thanks
udata users can choose whether or not to use this feature, which is cool. If they don't, they will have problems with huge catalogs, but to me, telling them that they need to use S3 for that is acceptable.

Resolved review thread on udata/harvest/backends/dcat.py
Resolved review thread on udata/settings.py
@ThibaudDauce ThibaudDauce merged commit 55efc3b into master Mar 6, 2024
1 check passed
@quaxsze quaxsze deleted the harvest_big_catalog branch March 6, 2024 16:20
@ThibaudDauce
Contributor Author

Linked to #2859

Successfully merging this pull request may close these issues.

Handle harvesting of large catalogs via DCAT (CSW-DCAT or XSLT)
3 participants