Initial Sync stuck at the last ~14.000 assets #355
The sync failing every hour is expected: Apple's URL tokens expire after that, and currently the easy solution is to fail and retry automatically. The weird thing, though, is that each execution run downloads a couple of thousand assets, yet the number of local assets does not increase when they are loaded in the next run (and only a small number of files fail validation and get deleted). How many files do you see locally? And from which run are those logs?
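The fail-and-retry approach described above can be sketched roughly like this (a minimal illustration in Python, not the tool's actual code; `run_sync` is a hypothetical stand-in for a single sync execution that aborts once the URL tokens expire):

```python
import time

def sync_with_retry(run_sync, max_attempts=10, delay=1.0):
    """Keep retrying a sync run until it completes.

    `run_sync` is a hypothetical callable that returns True when the sync
    finished and False when it aborted (e.g. because the download tokens
    expired after ~1 hour). Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        if run_sync():
            return attempt
        time.sleep(delay)  # brief pause; the next attempt re-authenticates
    raise RuntimeError(f"sync did not complete after {max_attempts} attempts")
```

As long as each run makes forward progress before its tokens expire, this loop eventually converges; the problem discussed in this thread is precisely that progress did not seem to accumulate between runs.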
So it seems like there is absolutely no progress being made. 😢
Those logs were from yesterday, 2023-09-28 10:35 - 15:30.
There are a couple of weird things here:
The only thing I could think of is that we are downloading files that already exist. This should not happen, because diffing should notice that the local and remote file are the same. The worst-case scenario would be that we are having file naming conflicts (i.e. the UUID I'm using for naming files is not actually a UUID), because once we are at the point of writing files to disk, I'm simply overwriting them. Let me add some additional logging (once I've fixed a current bug) and then we can validate if this is the case.
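The overwrite concern can be sketched as a guard around the write: instead of blindly replacing a file that already exists, compare contents first, so a naming conflict becomes visible instead of destroying data. This is a hedged Python illustration, not the project's actual implementation; `write_asset` is a hypothetical helper:

```python
import os

def write_asset(path: str, data: bytes) -> str:
    """Write a downloaded asset without silently overwriting a different file.

    Sketch: if a file with the same name already exists, compare contents.
    An identical file can safely be skipped; differing content signals a
    naming conflict (two distinct assets mapped to the same ID).
    """
    if os.path.exists(path):
        with open(path, "rb") as existing:
            if existing.read() == data:
                return "skipped"    # identical file already on disk
        return "conflict"           # same name but different content
    with open(path, "wb") as target:
        target.write(data)
    return "written"
```

With a guard like this, the "couple of thousand downloads with no net progress" symptom would show up as a stream of "skipped" or "conflict" results rather than silent overwrites.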
1.3.0-beta.1 should log a warning in case my assumption is correct. Could you try running it and report back, @halbtuerke?
There are ~5700 entries of the
Here are some examples:
Interestingly enough, there is still some progress being made in the last hours, but that could just be a coincidence:
That's good and bad news! This means our assumption is correct; however, I think this will not be addressed fully until #354, but it is an important consideration! You should see more progress now, because already existing files will be skipped; before the change, the tool would re-download the file. Now the important question is: are those duplicates, or distinct files that share the same ID? Let me clean up the warnings and create a separate folder where those conflicted files are written to. You can then check whether those are duplicates (in which case it would be safe to ignore them) or actually different files that we need to keep.
I've just added some rough (and untested) debugging logic to v1.3.0-nightly.4 that should help us understand the issue better: if a duplicate is detected (the file already exists), the tool will create a new folder within the data dir (
If you could then check the duplicate filenames to see if those are actually the same file or different ones (i.e. is it a naming conflict or a de-duplication problem).
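Such debug logic could look roughly like the following sketch (a hedged Python illustration, not the tool's actual code; the `-orig`/`-dup` suffix naming is an assumption based on the filenames that appear later in this thread):

```python
import os
import shutil

def stash_duplicate(data_dir: str, asset_file: str, new_download: str) -> str:
    """Debug helper (sketch): when a download collides with a file that is
    already on disk, keep both copies in <data_dir>/duplicates for later
    comparison. The original gets an `-orig` suffix, each colliding
    download the next free `-dupN` suffix.
    """
    dup_dir = os.path.join(data_dir, "duplicates")
    os.makedirs(dup_dir, exist_ok=True)
    asset_id = os.path.basename(asset_file)
    orig_copy = os.path.join(dup_dir, f"{asset_id}-orig")
    if not os.path.exists(orig_copy):
        shutil.copyfile(asset_file, orig_copy)  # preserve the first version once
    # find the next free -dupN slot for the newly downloaded copy
    n = 1
    while os.path.exists(os.path.join(dup_dir, f"{asset_id}-dup{n}")):
        n += 1
    dup_copy = os.path.join(dup_dir, f"{asset_id}-dup{n}")
    shutil.copyfile(new_download, dup_copy)
    return dup_copy
```

Comparing each `-orig` file against its `-dupN` siblings then answers the naming-conflict vs. de-duplication question.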
Seems like the logic is not quite right. I'm getting the following error message and the sync immediately fails:
Ah, of course... one sec... And I guess there is already some stuff in the
Yes, there are 12 items in the duplicates folder:
And what does the content of those files look like? Are e.g.
Also: try 1.3.0-nightly.5 (which is currently getting deployed).
The files look the same and they also have the same file sizes. For easier comparison it would be helpful if the files were called
Try 1.3.0-nightly.6 (once deployed) :)
Unfortunately I'm still getting
and
Have a try with the latest nightly. I also added additional logging to the diffing algorithm, to try and understand why those files end up in the download queue even though they are already on disk. If you could search the logs for the duplicated file IDs and share the relevant pieces, that should help.
The most recent version is unfortunately hard crashing and therefore it's not producing any logs in which those duplicated IDs are shown:
After some more investigation I figured out that the crash was caused by the newer Node.js version I have installed on my machine. After installing and using the current LTS, the crashes disappeared.
Here are the log lines for one of the duplicates (Aa_fDRoooKJJY-0nHI30XCM9rfi). The sync ran for ~9 hours and is now stuck at the last 8722 assets:
Let me know if you need any additional information.
I think I also made a wrong assumption about how promise chaining works; that should be fixed now as well. Is there any other reference to the file in your logs? Could you also please share what the 'duplicates' folder looks like (
No, these are all the occurrences.
Here is the output of
As requested, I have compared the files on a byte level. For that I wrote the following Python script, which determines all the unique IDs and then compares all duplicates of each ID. The result is that almost all of those duplicates are completely identical.

```python
#!/usr/bin/env python3
import os
from collections import defaultdict
import filecmp

def compare_files_by_prefix(directory_path):
    # Initialize a dictionary to hold file prefixes and corresponding file names
    files_by_prefix = defaultdict(list)

    # Iterate through the directory to populate the files_by_prefix dictionary
    for file_name in os.listdir(directory_path):
        prefix = file_name.split('-orig')[0].split('-dup')[0]
        files_by_prefix[prefix].append(os.path.join(directory_path, file_name))

    # Initialize a dictionary to hold comparison results
    comparison_results = {}

    # Compare files by prefix
    for prefix, file_list in files_by_prefix.items():
        print(prefix)
        # Skip if there's only one file with this prefix
        if len(file_list) <= 1:
            comparison_results[prefix] = "Only one file with this prefix."
            continue
        # Compare the files on a byte level
        are_identical = all(
            filecmp.cmp(file_list[0], other_file, shallow=False)
            for other_file in file_list[1:]
        )
        comparison_results[prefix] = "Identical" if are_identical else "Different"

    return comparison_results

def print_comparison_results(results):
    for prefix, status in results.items():
        print(f"Files with prefix '{prefix}': {status}")

# For demonstration purposes, let's assume a directory path.
# Note: This is a placeholder, and the actual directory path should be provided.
directory_path = "/Volumes/Extreme SSD/icloud-photos-sync/duplicates"

# Get the comparison results
results = compare_files_by_prefix(directory_path)

# Print the results
print_comparison_results(results)
```

There are only 2 cases where there are actual differences, but only with single duplicates:
In both cases it seems those files are broken and cannot be opened. So in general, it seems like we are mostly downloading legitimate duplicates.
Okay, so these are legit duplicates, and I have no idea why the diffing algorithm does not catch them. In 1.3.0-nightly.10 I've reverted the debug logic; it should now skip already existing files. This should unblock your sync executions, and I will need to address this properly with the planned improvements of the epic in #354.
@halbtuerke, did you succeed in syncing your assets with the workaround introduced in the latest version?
Yes, the initial synchronization was successful. However, I encounter issues when attempting to re-synchronize for updating photos. Below are the error messages displayed:
I am using version 1.3.0.
Is this the 'full' log? Meaning: does it go straight to attempt 35?
No, there were 34 previous attempts. I just terminated it after 35 attempts. 🥲
And what's the timing of those 408s? Could you set
Command used:
Output of the run (started at 7:05 PM):
You can find the logs at https://gist.github.com/halbtuerke/406cced473934ba7684c1d7bfe4269e1
Describe the issue
For roughly 2 days now, the initial sync has somehow been stuck on the last ~14.000 of ~100.000 assets.
It always downloads assets for roughly an hour, and then, after the SYNC_NETWORK error, it shows roughly the same numbers again.
I have attached the terminal output and the debug log:
Is there any other information I can provide you with?
Thanks
How to reproduce the behavior?
No response
Error Code
no error code
Relevant log output
Operating system
macOS 13.6
Execution environment
node
icloud-photos-sync version
1.2.0
Checklist