Add support for revisionID #568
This change adds initial support for revisionID as passed in through options. This is useful for checking for revision changes between two Wikipedia dumps, such as when using dumpster-dip on a monthly basis to keep a search database up to date (for RAG, for example). Mostly I just missed having this, and I plan to submit a follow-up PR to dumpster-dip to have it parse the revisionID and pass it in so I can use it. This commit does not include updating the README and types; I wanted to wait for feedback before adding the final commit.
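To illustrate the dump-comparison use case, here is a minimal sketch. The helper name `changedTitles` and the title-to-revisionID maps are hypothetical and not part of wtf_wikipedia or dumpster-dip; the revision values are made-up samples.

```javascript
// Hypothetical helper: compare two snapshots keyed by page title.
// Each snapshot maps title -> revisionID as parsed from a dump.
function changedTitles(previous, current) {
  const changed = []
  for (const [title, revisionID] of Object.entries(current)) {
    // A page needs re-indexing if it is new or its revision ID differs.
    if (previous[title] !== revisionID) {
      changed.push(title)
    }
  }
  return changed
}

// Made-up sample data for two monthly dumps.
const lastMonth = { 'Toronto Raptors': '1001', Fubar: '372618' }
const thisMonth = { 'Toronto Raptors': '1002', Fubar: '372618', 'New page': '5' }

console.log(changedTitles(lastMonth, thisMonth))
```

Only the pages returned by this helper would need to be re-parsed and re-indexed; pages deleted between dumps would need a separate pass over `previous`.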
This change adds a feature to extract the revision id from the wikidump markup. This is useful for checking for changes between two different wikidumps if you're so inclined. Depends on spencermountain/wtf_wikipedia#568
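As a rough idea of what extracting the revision id from dump markup can look like, here is a sketch. It is not the regex from the linked change: the helper `extractRevisionID` is hypothetical, and the sample assumes the usual MediaWiki XML export layout in which `<revision>` is immediately followed by its own `<id>`.

```javascript
// Assumed layout of one <page> element in a MediaWiki XML dump (sample values).
const pageXml = `<page>
  <title>Fubar</title>
  <ns>0</ns>
  <id>11</id>
  <revision>
    <id>372618</id>
    <text>...</text>
  </revision>
</page>`

// Hypothetical helper, not the regex used in the PR.
function extractRevisionID(xml) {
  // Match the <id> that directly follows the opening <revision> tag,
  // so the page-level <id> above it is skipped.
  const match = xml.match(/<revision>\s*<id>(\d+)<\/id>/)
  return match ? match[1] : null
}

console.log(extractRevisionID(pageXml))
```

Anchoring the pattern on `<revision>` is what keeps the page id and the revision id from being confused.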
Mostly because regex is utterly unreadable, here's an explanation courtesy of ChatGPT: Explanation:
this is spectacular. Thank you.
just kidding - this is released in
hey, could we also grab revisionID from the api when we do a fetch?
@spencermountain - sure can. I don't think this messes anything up but - wanna take a look see? https://en.wikipedia.org/w/api.php?action=query&prop=revisions%7Cpageprops&rvprop=content|ids&maxlag=5&rvslots=main&origin=*&format=json&redirects=true&titles=Toronto_Raptors Note: the
@spencermountain - I got most of the work done for getting revisionID. I will leave the query for looking up a specific revision to you (if you decide to support that). That said, in a junk / play branch I modified the test / expected results for the Italian and CSGO Wikipedia pages, though I am afraid this will cause build failures in future when a page's revision changes and no longer matches the expected value. Let me know how you want me to modify the test and I will submit tomorrow or the next day.
ah, perfect. yeah, that's great. wtf('Fubar', {revisionID: '372618'}) to fetch an older version? thanks for your help
@spencermountain - I am grabbing current revision ID (but I will see about grabbing a previous version if it doesn't get messy).
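For the API route discussed above, here is a rough sketch of building the same query and reading the current revision ID from the response. The helpers `buildRevisionUrl` and `getRevisionID` are hypothetical, not wtf_wikipedia functions. The sample response is hand-written to match the default JSON shape of `action=query` with `prop=revisions` and `rvprop=ids`, and its page id and `parentid` are made up.

```javascript
// Build the query from the URL in this thread using URLSearchParams.
function buildRevisionUrl(title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'revisions|pageprops',
    rvprop: 'content|ids',
    maxlag: '5',
    rvslots: 'main',
    origin: '*',
    format: 'json',
    redirects: 'true',
    titles: title,
  })
  return `https://en.wikipedia.org/w/api.php?${params.toString()}`
}

// Hypothetical helper: pull the revision ID out of a parsed API response.
function getRevisionID(json) {
  const pages = (json.query && json.query.pages) || {}
  // The default response keys pages by page id; take the first one.
  const page = Object.values(pages)[0]
  const revision = page && page.revisions && page.revisions[0]
  return revision ? revision.revid : null
}

// Hand-written sample response (assumed shape, made-up page id and parentid).
const sample = {
  query: {
    pages: {
      1: { title: 'Fubar', revisions: [{ revid: 372618, parentid: 0 }] },
    },
  },
}

console.log(buildRevisionUrl('Toronto_Raptors'))
console.log(getRevisionID(sample))
```

On the test concern raised above: asserting only that the returned value is a positive integer, rather than a fixed ID, would keep live-page tests from breaking whenever the article is edited.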