New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to use fcrepo-ocfl-storage #110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @pwinckles as always. Just a few cosmetic issues.
return new DefaultOcflObjectSessionFactory(ocflRepo, stagingDir, objectMapper, CommitType.NEW_VERSION, | ||
"Generated by Fedora 3 to Fedora 6 migration", user, userUri); | ||
} else { | ||
return new VanillaOcflObjectSessionFactory(ocflRepo, stagingDir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of VanillaOcflObjectSessionFactory, may I suggest BasicOcflObjectSessionFactory or SimpleOcflObjectSessionFactory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
if ("RE".contains(dv.getDatastreamInfo().getControlGroup())) { | ||
InputStream content = null; | ||
// Write a file for external content only for vanilla OCFL migration | ||
if (migrationType == MigrationType.VANILLA_OCFL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment about use of Vanilla name.
|
||
if (dv.isFirstVersionIn(ov.getObject())) { | ||
dsCreateDates.put(dsId, dv.getCreated()); | ||
} | ||
final var createDate = dsCreateDates.get(dsId); | ||
|
||
final var datastreamHeaders = createDatastreamHeaders(dv, f6DsId, f6ObjectId, | ||
datastreamFilename, mimeType, createDate, contentPath); | ||
datastreamFilename, mimeType, createDate); | ||
|
||
if ("RE".contains(dv.getDatastreamInfo().getControlGroup())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining the purpose of this line? Also perhaps the above "R" and "E" map keys should be made into constants and this line should reference them? Just a suggestion for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not write that code nor interact with it in this PR, but, yes, I can clean it up.
.workDir(stagingDir) | ||
.buildMutable(); | ||
|
||
if (migrationType == MigrationType.F6_OCFL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to call the migration type "F6_OCFL" thus tying in the version to the migration type? Or should it be something like "FEDORA_OCFL"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my objection about the datastream ids, I personally don't think this name is inappropriate. If there is an F7 and it uses OCFL, what's to say that it uses the same format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I have less of a problem with changing this name than I do with the variable names. The problem with the variable names is that the class is working with two different versions of Fedora, so it makes sense to distinguish between versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the enum
@@ -148,68 +137,57 @@ public void processObjectVersions(final Iterable<ObjectVersionReference> version | |||
final String dsId = dv.getDatastreamInfo().getDatastreamId(); | |||
final String f6DsId = resolveF6DatastreamId(dsId, f6ObjectId, mimeType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to use fedoraDsId rather than embedding the version in the variable name unless you have an objection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with the resolveF6... method name.
@@ -315,57 +357,63 @@ public void processObjectSingleVersionVanillaFormatWithExternalBinary() { | |||
final var ds2 = datastreamVersion(dsId2, true, "E", "text/plain", "", "https://external"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - also method name change.
final int index) { | ||
final var captor = ArgumentCaptor.forClass(InputStream.class); | ||
verify(session, atLeastOnce()).put(eq(path), captor.capture()); | ||
private void verifyBinary(final String content, final DatastreamVersion datastreamVersion) { | ||
try { | ||
if ("RE".contains(datastreamVersion.getDatastreamInfo().getControlGroup())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"RE" to constant
} | ||
} | ||
|
||
private void verifyVanillaDescRdf(final String content, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method name change.
/** | ||
* @author pwinckles | ||
*/ | ||
public class VanillaOcflObjectSessionTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class name change
private Path staging; | ||
|
||
private MutableOcflRepository ocflRepo; | ||
private OcflObjectSessionFactory vanillaSessionFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change variable name.
Hmm... I'm not sure if I agree about the variable names. All this class can claim to do is migrate from F3 to F6. What were the datastream ids in F3 are no longer the ids in F6. Calling them simply |
I guess my concern is that we'll be able to use this tool to migrate to F7, F8, etc so we don't want the code to indicate a specific version. But I see your point as well. I'll leave it up to your professional judgement. |
@awoods : would you be willing to take a look at my suggestions here re Fedora 6 method and variable names in the code? I don't think it's a big deal if we leave as is since this tool will likely serve as a migration path from F3->F6. If the on disc format changes from F6->F7 or Fn we will need a separate migration/upgrade tool to accomplish that task. Perhaps it is in fact better to indicate Fedora6 to make it clear that this is migrating to F6 only. I only brought this up because I thought our goal with the OCFL setup was to have a file layout less subject to major changes between versions. So in theory, users of migration-utils could migrate from F3-> any Fedora readable OCFL regardless of the version. |
I tried running the following per the instructions on the README:
Trying it with the uw-madison tar gz file @awoods sent me I got this error:
@pwinckles or @awoods : do you see anything that I'm doing wrong in my command line? |
@dbernstein Try it with |
@pwinckles : thanks for the suggestion. I tried that:
I wondering if this may be a cross platform issue? |
Interesting... That could very well be |
What's strange is that both the uwm sample set as well as the test dataset at src/test/resource/legacyFS are running into problems, albeit they appear to be different issues. |
@dbernstein I'm seeing those errors now when I'm running the tests. The cause is that the storage layer was updated to verify content size after creating this PR. Some of the F3 resources are reporting incorrect content sizes, which is causing them to be rejected. Will fix |
@dbernstein Okay, I just pushed the fix for the content size as well as most of your pr comments. I did not change the "f6" variable names, because I think it makes the code in that class substantively harder to understand because it is simultaneously dealing with F3 and F6 identifiers. |
@pwinckles : thank you sir. I will test and assuming all goes well will merge. |
Jira: https://jira.lyrasis.org/browse/FCREPO-3403
What does this Pull Request do?
Updates the migration-utils to use the new
fcrepo-storage-ocfl
library. I had to add another partial implementation to handle the vanilla OCFL output. There should be no difference in the output, just a change in implementation.How should this be tested?
Run the migration utils on a a F3 export and start F6 on the output.
Interested parties
@fcrepo4/committers