Replies: 3 comments
I've seen some references to cache_file_json in the code. Where would I be able to find that file? My recent runs have only produced a .bin file. I'm currently doing a run with pretty-print JSON output, but from what I read that only outputs the duplicates, not the fingerprints. If I can get hold of the fingerprint cache in JSON format, I should be able to quickly code something up that finds the dupes using JS associative arrays. I'm a Java/JS developer, so my Rust is not good enough to code it in the same language. I'm seeing references to save_also_as_json in the GUI, but not in the CLI.
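The associative-array idea can be sketched as follows. This is a minimal sketch in Python (a dict plays the role of a JS associative array), and it assumes a hypothetical cache layout: a JSON array of objects with `path` and `fingerprint` keys. The real cache schema is not shown in this thread and may well differ, so the two key names and the sample data are placeholders.

```python
import json
from collections import defaultdict

# Hypothetical cache layout (an assumption, not the tool's real schema):
# a JSON array of {"path": ..., "fingerprint": ...} objects.
SAMPLE_CACHE = """
[
  {"path": "a/one.jpg", "fingerprint": "9f3c"},
  {"path": "b/two.jpg", "fingerprint": "17aa"},
  {"path": "c/one-copy.jpg", "fingerprint": "9f3c"}
]
"""


def find_exact_duplicates(cache_json):
    """Group paths by fingerprint and keep groups with more than one path."""
    groups = defaultdict(list)
    for entry in json.loads(cache_json):
        groups[entry["fingerprint"]].append(entry["path"])
    # A fingerprint shared by two or more paths marks a duplicate group.
    return {fp: paths for fp, paths in groups.items() if len(paths) > 1}


duplicates = find_exact_duplicates(SAMPLE_CACHE)
print(duplicates)  # {'9f3c': ['a/one.jpg', 'c/one-copy.jpg']}
```

Grouping like this takes a single pass over the cache, but it only catches files whose fingerprints are exactly equal. Finding near-duplicates would still need some distance comparison between fingerprints.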
The process finally finished, and the pretty-print JSON seems to have the data I need.
OK, so the pretty-print JSON does not appear to contain the data I need. It only seems to contain the cache entries for the duplicates found, which is unfortunate. However, if there is some way to get hold of the raw cache data in JSON format, I should be able to re-run this code to produce a more accurate result. The console output, with the progress dots and file matches removed, is as follows. It was generated by the following REALLY UGLY code, which is hacked together and in no way production ready, but proves the theory.
Right now the comparison process is the step that takes the most time; it does not need to be this way.
If the caching were approached from the perspective of a graph database, we would first build a list of file paths, modification dates and sizes, then iterate across these to generate fingerprints or extract metadata for comparison. Each fingerprint would be linked to its file object bidirectionally, so finding duplicates becomes a query for fingerprints that are linked to more than one file, rather than an iteration across all files to compare fingerprints.
This would also cache the fingerprint comparison by default, and re-scanning or re-fingerprinting would only be needed for new paths or for files whose modification date or size has changed.
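The proposal above can be illustrated with a small in-memory index. This is only a sketch under stated assumptions, not how the tool's cache works today: `FingerprintIndex`, `FileMeta`, `update` and `duplicates` are hypothetical names, and the fingerprint function is a stand-in that returns the file's content from a dict instead of hashing a real file.

```python
from collections import defaultdict
from typing import NamedTuple


class FileMeta(NamedTuple):
    size: int
    mtime: float


class FingerprintIndex:
    """Hypothetical cache linking files and fingerprints in both directions."""

    def __init__(self, fingerprint_fn):
        self._fingerprint_fn = fingerprint_fn
        self._file_meta = {}                  # path -> FileMeta
        self._file_to_fp = {}                 # path -> fingerprint
        self._fp_to_files = defaultdict(set)  # fingerprint -> set of paths
        self.computed = 0                     # fingerprint computations so far

    def update(self, path, size, mtime):
        """Fingerprint a file only if it is new or its size/mtime changed."""
        meta = FileMeta(size, mtime)
        if self._file_meta.get(path) == meta:
            return  # unchanged since the last scan: reuse the cached links
        self._unlink(path)
        fingerprint = self._fingerprint_fn(path)
        self.computed += 1
        self._file_meta[path] = meta
        self._file_to_fp[path] = fingerprint
        self._fp_to_files[fingerprint].add(path)

    def _unlink(self, path):
        """Remove the old file <-> fingerprint link, if there is one."""
        old = self._file_to_fp.pop(path, None)
        if old is not None:
            self._fp_to_files[old].discard(path)
            if not self._fp_to_files[old]:
                del self._fp_to_files[old]

    def duplicates(self):
        """Query: fingerprints that are linked to more than one file."""
        return {fp: sorted(paths)
                for fp, paths in self._fp_to_files.items() if len(paths) > 1}


# Stand-in "file system": the content itself serves as the fingerprint.
contents = {"a.txt": "hello", "b.txt": "hello", "c.txt": "world"}
index = FingerprintIndex(lambda path: contents[path])

for _ in range(2):  # the second scan finds nothing changed
    for path, data in contents.items():
        index.update(path, size=len(data), mtime=1.0)
print(index.duplicates(), index.computed)  # {'hello': ['a.txt', 'b.txt']} 3

contents["b.txt"] = "world"  # b.txt is modified, so its mtime changes
index.update("b.txt", size=5, mtime=2.0)
print(index.duplicates(), index.computed)  # {'world': ['b.txt', 'c.txt']} 4
```

The duplicate query never compares files pairwise; it only reads the fingerprint-to-files links, and a rescan costs one fingerprint computation per new or changed file. As with any exact-match lookup, this covers identical fingerprints only, so similarity matching would need an extra step.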