Added logic for page dump and commented out test line #9127

Open
merwhite11 wants to merge 3 commits into master

merwhite11
Contributor

Closes #8401

This refactor sorts all dump types that are NOT one of

        "/type/edition",
        "/type/author",
        "/type/work",
        "/type/redirect",
        "/type/list"

into a misc category. That category catches all /type/page records, along with every other type not in the list above.

The misc file should help provide a comprehensive inventory of the pages in the dump that is used to generate the sitemap.
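For reviewers, the routing rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the bucket names, `bucket_for_type`, and `split_records` are hypothetical, and the sketch assumes each dump line is tab-separated with the type in the first column.

```python
from collections import defaultdict

# Types that keep their own dump file (the list from this PR description).
# The bucket names on the right are hypothetical, chosen for illustration.
KNOWN_TYPES = {
    "/type/edition": "editions",
    "/type/author": "authors",
    "/type/work": "works",
    "/type/redirect": "redirects",
    "/type/list": "lists",
}


def bucket_for_type(type_key: str) -> str:
    """Return the bucket for a record type; anything unlisted goes to misc."""
    return KNOWN_TYPES.get(type_key, "misc")


def split_records(lines):
    """Group dump lines by bucket.

    Assumes each line is tab-separated and starts with the record type.
    """
    buckets = defaultdict(list)
    for line in lines:
        type_key = line.split("\t", 1)[0]
        buckets[bucket_for_type(type_key)].append(line)
    return dict(buckets)
```

With this shape, a /type/page line and any other unlisted type both land in the misc bucket, while the five listed types are routed as before.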

Technical

I only tested these changes with a subset of the full data (by uncommenting line 38 of /scripts/oldump.sh).
With line 38 uncommented, I also had to change the -z on line 133 of /scripts/oldump.sh to -n to avoid an error in /data/dump.py.

Testing

Screenshot

Ran docker compose run --rm home make test
[Screenshot: 2024-04-19 at 4:13:46 PM]

Stakeholders

@jimchamp @RayBB

@RayBB
Collaborator

RayBB commented Apr 20, 2024

@merwhite11 Very excited for this and pleasantly surprised how simple the solution is :) Hope Jim can review it soon!

You may want to update the doc string here:

"""Split dump into authors, editions and works."""

@merwhite11 force-pushed the 8401/Fix/Make-Change-to-oldump branch from dda2d57 to 91fb9a7 April 22, 2024 18:51
@mekarpeles self-assigned this Apr 22, 2024
@mekarpeles added the Priority: 2, Needs: Staff / Internal, and Needs: Staff Decision labels Apr 22, 2024
@mekarpeles assigned cdrini and unassigned mekarpeles Apr 29, 2024
@mekarpeles
Member

mekarpeles commented Apr 29, 2024

@cdrini, blocking, please see #8401 (comment)

@mekarpeles removed the Needs: Staff Decision label Apr 29, 2024
Labels
Needs: Staff / Internal, Priority: 2
Development

Successfully merging this pull request may close these issues.

Make data dumps for /type/page
4 participants