Converting Legacy Manifest? #31

Closed
JustCampMan opened this issue Jun 6, 2018 · 1 comment

Comments

@JustCampMan

OK, so using your latest commit, 7022827, it worked 3-4 times, but now it gives an error about converting the legacy manifest. Please help.

screenshot 120

@shadowmoose
Owner

shadowmoose commented Jun 7, 2018

Summary: Those messages aren't a big deal, and RMD will ignore them.

Here's a longer explanation that I can link people back to on the next release:

This is the result of a subtle bug that has gone unnoticed in RMD since version 1. Previously, when RMD parsed a Reddit Comment, it would incorrectly grab the ID of the Comment's parent Submission instead of the Comment's own ID. This was caused by a typo, and it was overlooked until now because Post data was stored in a more basic way, with everything simply dumped as objects in a JSON file. Submissions were handled correctly.
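To make the consequence of that typo concrete, here is a minimal sketch. It is not RMD's actual code: the `Comment` class and both `stored_id_*` functions are hypothetical names used only for illustration. It shows why storing the parent Submission's ID makes every Comment under the same Submission look like the same Post.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    id: str             # the comment's own ID
    submission_id: str  # ID of the submission the comment belongs to


def stored_id_buggy(comment: Comment) -> str:
    # Hypothetical version of the bug: the parent submission's ID is used.
    return comment.submission_id


def stored_id_fixed(comment: Comment) -> str:
    # Intended behaviour: each comment is stored under its own ID.
    return comment.id


# Two different comments on the same submission.
comments = [Comment("c1", "s1"), Comment("c2", "s1")]

buggy_ids = {stored_id_buggy(c) for c in comments}
fixed_ids = {stored_id_fixed(c) for c in comments}

# With the bug, both comments collapse into one ID; fixed, they stay distinct.
print(len(buggy_ids), len(fixed_ids))
```

A flat JSON dump never needed those IDs to be unique, which is consistent with the problem only surfacing once a database started relying on them.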

Going forward with the next release, RMD will store all Post information in a database instead. This makes it much faster and allows it to store more data about Posts. However, it brought to light the fact that Comments were being stored incorrectly, and they need to be fixed so that data can be deduplicated properly. RMD attempts to fix this issue while converting the old Manifest into the new Database.

It would be easier to simply toss out all Comments as it converts over, because it knows those IDs are broken, but that would mean RMD wouldn't recognize if it had already downloaded the files they contained. It would be much better to simply look the comment up and get the proper ID, but RMD doesn't have enough information stored in legacy Manifests to track down those comments. Instead, all it has is the Parent (submission) ID of that Comment.

My solution to this problem is as follows: during the conversion of a legacy Manifest file into the Database, for every Comment it finds, RMD will attempt to load the top-level comments for the stored Parent ID. It then scans those comments, looking for the URLs it knows the original Comment contained. If it finds a match, it reassigns the Comment's ID to the located one. Otherwise, if it cannot find a match among the comments it checks, it will display the errors in your screenshot and remove the invalid Comment.
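The conversion steps described above can be sketched roughly as follows. This is only an illustration, not RMD's real implementation: `LegacyComment`, `LiveComment`, `find_real_id`, `convert`, and the `load_top_level` callback are all hypothetical names, the message text is made up, and the rule that a match requires every known URL to appear in the comment body is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class LegacyComment:
    stored_id: str  # really the parent submission ID, due to the legacy bug
    urls: Set[str]  # URLs the manifest recorded for this comment


@dataclass
class LiveComment:
    id: str    # the comment's real ID
    body: str  # the comment text as currently loaded


def find_real_id(legacy: LegacyComment, top_level: List[LiveComment]) -> Optional[str]:
    """Return the ID of the first top-level comment containing all known URLs."""
    for candidate in top_level:
        # Assumption: a match means every recorded URL appears in the body.
        if legacy.urls and all(url in candidate.body for url in legacy.urls):
            return candidate.id
    return None


def convert(
    legacy_comments: List[LegacyComment],
    load_top_level: Callable[[str], List[LiveComment]],
) -> Tuple[List[LegacyComment], List[LegacyComment]]:
    """Split legacy comments into (fixed, removed) lists."""
    fixed, removed = [], []
    for legacy in legacy_comments:
        real_id = find_real_id(legacy, load_top_level(legacy.stored_id))
        if real_id is None:
            # No match: report it and drop the invalid comment.
            print(f"Could not locate comment under {legacy.stored_id}; removing it.")
            removed.append(legacy)
        else:
            fixed.append(LegacyComment(real_id, legacy.urls))
    return fixed, removed


# Stand-in for loading a submission's top-level comments from Reddit.
fake_reddit = {"s1": [LiveComment("c9", "look: https://example.com/a.jpg")]}
fixed, removed = convert(
    [
        LegacyComment("s1", {"https://example.com/a.jpg"}),
        LegacyComment("s1", {"https://example.com/gone.png"}),
    ],
    lambda submission_id: fake_reddit.get(submission_id, []),
)
```

In this sketch the first legacy entry is reassigned to `c9`, while the second has no matching top-level comment and is removed. That mirrors the failure case in the screenshot, for example when the original comment was a nested reply, was edited, or was deleted.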

Comments that are removed will be picked up again when the next scan is run, but RMD will not remember downloading files from them, and may duplicate downloaded files. As RMD is an archive tool, I decided that option was better than deleting the old files and risking losing them.

I'm open to better options of tracking down missing Comment IDs, but realistically the conversion process should work for most people's setups, and at worst failure may result in some duplicated files from those Comments.
