Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zimfarm readyness #14

Merged
merged 29 commits into from
Aug 29, 2023
Merged

Zimfarm readyness #14

merged 29 commits into from
Aug 29, 2023

Conversation

benoit74
Copy link
Collaborator

@benoit74 benoit74 commented Aug 22, 2023

Fix #11
Fix #13

Changes : MANY (sorry)

  • rework CLI args to align with what is typically needed / available from other scrapers
  • rework Makefile target names
  • rework build and Docker image folder structures
  • migrate to hatch instead of setuptools
  • align with Python bootstrap conventions
  • removed devcontainer
  • fixed nodejs Docker image
  • optimize Dockerfile structure to better benefit from build cache

Some details:

  • I remove devcontainer because it did not worked on my machine and is hence really frustrating for a new contributor (+ I'm afraid we won't have resources to maintain it in the long run, at least for now)
  • I fixed the nodejs Docker image because devcontainer image is not meant for our build usage

I suggest that we release this as a 1.0.0 since it has not yet been released anyway.

NOTA: we have to enable Trusted Publishing for this repo.

@benoit74 benoit74 self-assigned this Aug 22, 2023
In order to benefit from Docker build cache, we install Python dependencies
separately from the scraper itself.

We do it early in the build pipe so that this layer is modified only if dependencies
are updated. Otherwise, only layers linked to scraper source code or zimui are updated,
leading to a much faster build + using less resources since we do not update a big layer
again and again for only a small scraper source code modification
@benoit74
Copy link
Collaborator Author

Is there any way to tell CodeFactor to ignore a check on a specific line (like I did for Ruff on the specific line which raised an issue)?

@benoit74 benoit74 marked this pull request as ready for review August 22, 2023 15:10
@benoit74 benoit74 requested a review from rgaudin August 22, 2023 15:10
@rgaudin
Copy link
Member

rgaudin commented Aug 22, 2023

Is there any way to tell CodeFactor to ignore a check on a specific line (like I did for Ruff on the specific line which raised an issue)?

Codefactor itself no but codefactor is just an integrated set of open tools (with private config). In this case it's a Bandit rule raising so silence it as such

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple files like build.py appear as new ones. Without comment in PR it's painful to know whether you changed any behavior or just intoduce the ZIM metadata things

➡️ added the Trusted Publisher

.github/workflows/Tests.yaml Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
scraper/pypi-readme.rst Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
scraper/src/fcc2zim/build.py Outdated Show resolved Hide resolved
scraper/src/fcc2zim/build.py Outdated Show resolved Hide resolved
scraper/src/fcc2zim/build.py Outdated Show resolved Hide resolved
scraper/src/fcc2zim/build.py Outdated Show resolved Hide resolved
@kelson42
Copy link
Contributor

Is there any way to tell CodeFactor to ignore a check on a specific line (like I did for Ruff on the specific line which raised an issue)?

Yes AFAIK with codefactor web ui

@benoit74
Copy link
Collaborator Author

Is there any way to tell CodeFactor to ignore a check on a specific line (like I did for Ruff on the specific line which raised an issue)?

Yes AFAIK with codefactor web ui

Looks like I do not have proper access rights ... anyway, done through inline comment ok

@rgaudin
Copy link
Member

rgaudin commented Aug 23, 2023

Is there any way to tell CodeFactor to ignore a check on a specific line (like I did for Ruff on the specific line which raised an issue)?

Yes AFAIK with codefactor web ui

We obviously don't want to set exceptions on Codefactor UI!

@rgaudin
Copy link
Member

rgaudin commented Aug 23, 2023

Looks like I do not have proper access rights ... anyway, done through inline comment ok

You're admin on this repo.

- Makefile is not in the suite of file maintained at openzim
- pypi-readme.rst provides less value than README.md and is more difficult / error prone to maintain
@benoit74
Copy link
Collaborator Author

Sorry about the files that are considered by git as "deleted/new" instead of been "moved", unfortunately there is little I can do on this.

Should I list in the first comment all files that are considered as "deleted/new" which have indeed only been "moved", so you can git diff them manually?

I agree this is a real pain.

For instance here:

  • openzim/fcctozim/build.py moved to scraper/src/fcc2zim/build.py
  • openzim/fcctozim/fetch.py moved to scraper/src/fcc2zim/fetch.py
  • openzim/fcctozim/__init__.py moved to scraper/src/fcc2zim/__init__.py (and emptied ...)
  • client/.eslintignore moved to zimui/.eslintignore

@benoit74 benoit74 requested a review from rgaudin August 23, 2023 13:24
@rgaudin
Copy link
Member

rgaudin commented Aug 23, 2023

Should I list in the first comment all files that are considered as "deleted/new" which have indeed only been "moved", so you can git diff them manually?

I guess. We can try next time

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the scraper name in entrypoint? INFO:Starting scraper 1.0.0-dev0

scraper/src/fcc2zim/entrypoint.py Outdated Show resolved Hide resolved
scraper/src/fcc2zim/entrypoint.py Outdated Show resolved Hide resolved
scraper/src/fcc2zim/scraper.py Outdated Show resolved Hide resolved
scraper/src/fcc2zim/scraper.py Show resolved Hide resolved
- Docker can set its specific values based on environment variables
- Default values are now ok for developer/Python usage
@benoit74 benoit74 requested a review from rgaudin August 28, 2023 14:28
Dockerfile Outdated Show resolved Hide resolved
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ; Couple of questions that may or may not lead to changes. Very minor stuff anyway, you can merge once you've settled

@benoit74 benoit74 merged commit be6a34b into main Aug 29, 2023
5 checks passed
@benoit74 benoit74 deleted the zimfarm_readyness branch August 29, 2023 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants