Upgrade BDB JE to version 7.5.11 - IMPORTANT CHANGE #281

Merged
anjackson merged 4 commits into internetarchive:master from upgrade-bdb-je on Mar 4, 2020

Conversation

anjackson
Collaborator

For #277 and other issues relating to using a very old version of BDB JE, this patch updates to the latest version. Apart from one test which relied on Thread.interrupt() (which makes BDB flake out), the upgrade appears to be straightforward.

However, marking this as a Work In Progress until I get a chance to run a large-scale test with it.
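
The Thread.interrupt() problem mentioned above can be avoided in tests by asking a worker to stop cooperatively rather than interrupting it. The following is a minimal sketch under stated assumptions, not the change made in this PR: `CooperativeStop`, `Worker`, and `runAndStop` are hypothetical names, and the loop body only stands in for code that would touch a BDB JE environment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class CooperativeStop {

    /** Hypothetical worker; a real one would read or write the database in its loop. */
    static final class Worker implements Runnable {
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);
        private final CountDownLatch finished = new CountDownLatch(1);

        /** Ask the worker to finish its current iteration and exit. */
        void requestStop() {
            stopRequested.set(true);
        }

        /** Wait until run() has returned, or until the timeout expires. */
        boolean awaitFinished(long timeoutMillis) throws InterruptedException {
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }

        @Override
        public void run() {
            try {
                while (!stopRequested.get()) {
                    // Assumption: database work would happen here.
                    Thread.yield();
                }
            } finally {
                finished.countDown();
            }
        }
    }

    /** Starts a worker, stops it via the flag (never via interrupt), and reports success. */
    static boolean runAndStop() {
        Worker worker = new Worker();
        Thread thread = new Thread(worker, "worker");
        thread.start();
        worker.requestStop();
        try {
            boolean done = worker.awaitFinished(5000);
            thread.join(5000);
            return done && !thread.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + runAndStop());
    }
}
```

The flag is checked between units of work, so the thread is never interrupted partway through a database operation.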

@anjackson
Collaborator Author

Small scale testing works fine and indicates that the .jdb file re-use problem in #277 is resolved by this change.

@anjackson
Collaborator Author

Having tested this in production, it seems to work fine. The only issue was that the on-disk format has changed since 4.1.6. See https://docs.oracle.com/cd/E17277_02/html/changelog.html

Specifically, 4.1.6 databases require at least one manual step (see the DbPreUpgrade_4_1 part of the changelog). I found this difficult to execute in concert with Heritrix3's checkpoint management.

So, the question is: do we want to try to support upgrading existing crawler state databases, or just say that this new release means you need to wipe your state?
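
If the "wipe your state" route is taken, one way an operator could notice leftover pre-upgrade state is to look for existing .jdb files before starting a crawl. This is only a sketch: `StateDirCheck`, `containsJdbFiles`, and the default `state` directory name are assumptions for illustration, not part of Heritrix or of this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class StateDirCheck {

    /** Returns true if the directory exists and directly contains any entry ending in .jdb. */
    static boolean containsJdbFiles(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        // The glob is applied to entry names in this directory only (no recursion).
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jdb")) {
            return stream.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the state directory is passed as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "state");
        if (containsJdbFiles(dir)) {
            System.out.println("Found .jdb files in " + dir
                    + "; they may have been written by an older BDB JE version.");
        } else {
            System.out.println("No .jdb files found in " + dir + ".");
        }
    }
}
```

The check cannot tell which JE version wrote the files; it only flags that state exists and should be reviewed or removed before the upgraded crawler starts.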

@ato
Collaborator

ato commented Oct 17, 2019

We discussed this on the OH-SOS call and nobody present had a need to preserve state between crawler upgrades. It sounds reasonable to me that an upgrade will invalidate checkpoints. I guess if someone really needs to resume a pre-upgrade crawl they can still use the recover log, right?

@anjackson
Collaborator Author

That seems reasonable to me.

Have we heard from anyone from IA?

@ato ato removed the needs testing label Oct 19, 2019
@nlevitt
Contributor

nlevitt commented Dec 4, 2019

Hey, sorry. Andy brought this to my attention again on Slack. Archive-It is coming up with a plan. I will alert other IA folks using Heritrix.

@csrster csrster mentioned this pull request Dec 10, 2019
@anjackson
Collaborator Author

Okay, so I think we're good to go ahead here.

@anjackson anjackson merged commit 7f80b4f into internetarchive:master Mar 4, 2020
@anjackson anjackson deleted the upgrade-bdb-je branch March 4, 2020 13:36
@anjackson anjackson changed the title from "WIP: Upgrade BDB JE to version 7.5.11" to "Upgrade BDB JE to version 7.5.11 - IMPORTANT CHANGE" Mar 4, 2020
@ato ato linked an issue Mar 10, 2020 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Checkpoints 'spoiled' when used to resume crawls
3 participants