
KeyError: 'storageFilename' #23

Closed
bighippo999 opened this issue Sep 26, 2023 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@bighippo999

bighippo999 commented Sep 26, 2023

Hi,

Sadly, I'm getting an error when processing duplicates, at 38%.

Final bit of the log:

38%|███▊      | 9857/26081 [02:13<03:40, 73.64it/s]G/ForkPoolWorker-31]
2023-09-27 00:24:00 [2023-09-26 23:24:00,422: ERROR/ForkPoolWorker-31] Task app.tasks.process_duplicates[7cdb40e7-fee2-40b1-99ee-8eaa89969e53] raised unexpected: KeyError('storageFilename')
2023-09-27 00:24:00 Traceback (most recent call last):
2023-09-27 00:24:00   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 477, in trace_task
2023-09-27 00:24:00     R = retval = fun(*args, **kwargs)
2023-09-27 00:24:00   File "/usr/src/app/app/__init__.py", line 25, in __call__
2023-09-27 00:24:00     return self.run(*args, **kwargs)
2023-09-27 00:24:00   File "/usr/src/app/app/tasks.py", line 90, in process_duplicates
2023-09-27 00:24:00     results = task_instance.run()
2023-09-27 00:24:00   File "/usr/src/app/app/lib/process_duplicates_task.py", line 109, in run
2023-09-27 00:24:00     similarity_map = duplicate_detector.calculate_similarity_map()
2023-09-27 00:24:00   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 61, in calculate_similarity_map
2023-09-27 00:24:00     embeddings = self._calculate_embeddings()
2023-09-27 00:24:00   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 113, in _calculate_embeddings
2023-09-27 00:24:00     storage_path = self._get_storage_path(media_item)
2023-09-27 00:24:00   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 297, in _get_storage_path
2023-09-27 00:24:00     return self.image_store.get_storage_path(media_item["storageFilename"])
2023-09-27 00:24:00 KeyError: 'storageFilename'

(Sorry for the weird line breaks; without them, it just pasted as one big string.)
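The last frame of the traceback is a plain dictionary lookup, so the failure can be reproduced without the app at all. A minimal sketch, assuming a media item is an ordinary dict; the item contents and the two helper names are hypothetical:

```python
from typing import Any, Dict, Optional

# Hypothetical media item whose thumbnail download never completed,
# so the "storageFilename" key was never written.
media_item: Dict[str, Any] = {"id": "abc123"}


def lookup_strict(item: Dict[str, Any]) -> str:
    # Same access pattern as _get_storage_path: raises KeyError if the key is missing
    return item["storageFilename"]


def lookup_lenient(item: Dict[str, Any]) -> Optional[str]:
    # dict.get returns None instead of raising
    return item.get("storageFilename")


try:
    lookup_strict(media_item)
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 'storageFilename'
```

The lenient variant only moves the problem: callers now receive None and must handle it themselves.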

@bighippo999
Author

So, GPT suggested this:

from typing import Optional

def _get_storage_path(self, media_item) -> Optional[str]:
    # Items whose thumbnail was never stored have no "storageFilename" key
    if "storageFilename" in media_item:
        return self.image_store.get_storage_path(media_item["storageFilename"])
    else:
        self.logger.error(f"'storageFilename' not found in media_item: {media_item}")
        return None  # Or handle this case differently, perhaps raising a custom exception

It worked, and I now have a screen full of duplicates :)
I'm going to leave this open, though, as it's probably better to fix it for everyone.

@c0sm0t0pian

I'm getting basically the same error (KeyError: 'storageFilename'). I can see that some media_items have no storageFilename in their dictionary. The solution suggested by @bighippo999 doesn't work for me, since returning None throws an error later on. I think the root cause needs to be fixed where the media_items get populated; there must be a reason why no storageFilename is saved for some of them. But I haven't figured out yet where that happens, or why it doesn't... Any help is appreciated...

@mtalcott
Owner

mtalcott commented Oct 15, 2023

storageFilename is set by a separate subtask that downloads the actual thumbnail images, for performance purposes. Right now, if one of those subtasks fails, the main task still proceeds, but then runs into this error (and possibly others) when calculating duplicates.

Two options I see to address:

  1. Skip invalid media items during processing, logging a warning.
  2. Fail the main task if a subtask fails, and prompt the user to start again.
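The two options could look roughly like the sketch below. It is not the project's code: `split_by_thumbnail`, `require_all_thumbnails` and `ThumbnailSubtaskFailed` are hypothetical names, media items are assumed to be plain dicts with an "id" key, and a failed subtask is detected by its symptom (a missing storageFilename) rather than by inspecting the Celery subtask results directly.

```python
import logging
from typing import Any, Dict, List, Tuple

logger = logging.getLogger("duplicates")  # hypothetical logger name

MediaItem = Dict[str, Any]


def split_by_thumbnail(items: List[MediaItem]) -> Tuple[List[MediaItem], List[MediaItem]]:
    """Option 1: keep items with a stored thumbnail; log and skip the rest."""
    ready: List[MediaItem] = []
    skipped: List[MediaItem] = []
    for item in items:
        if "storageFilename" in item:
            ready.append(item)
        else:
            skipped.append(item)
            logger.warning("Skipping media item without storageFilename: %s", item.get("id"))
    return ready, skipped


class ThumbnailSubtaskFailed(RuntimeError):
    """Hypothetical exception used to abort the main task (Option 2)."""


def require_all_thumbnails(items: List[MediaItem]) -> None:
    """Option 2: fail the whole run if any thumbnail was not stored."""
    missing = [item.get("id") for item in items if "storageFilename" not in item]
    if missing:
        raise ThumbnailSubtaskFailed(
            f"{len(missing)} thumbnail download(s) failed; please start again. "
            f"First missing: {missing[:5]}"
        )
```

Option 1 keeps the run going at the cost of silently smaller results; Option 2 surfaces the failure so it can be investigated.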

I'm leaning toward Option 2 for now and will make that change soon. My reasoning: it's also what I do when the daily quota is exceeded. If one subtask failed, others have likely failed as well, and the application is already optimized to skip media items whose thumbnails have already been downloaded, so starting again is cheap.
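Restarting is cheap when downloads are skipped for items that already have a thumbnail. A minimal sketch of that skip logic, assuming media items are dicts and `download` is a hypothetical callable that returns the stored filename and raises on failure (for example, when the quota is exceeded):

```python
from typing import Any, Callable, Dict, List

MediaItem = Dict[str, Any]


def download_missing_thumbnails(
    items: List[MediaItem],
    download: Callable[[MediaItem], str],
) -> int:
    """Download thumbnails only for items that don't have one yet.

    Any exception raised by `download` propagates, aborting the run.
    Returns the number of downloads attempted in this run.
    """
    attempted = 0
    for item in items:
        if "storageFilename" in item:
            continue  # stored on a previous run, so skip it
        attempted += 1
        item["storageFilename"] = download(item)
    return attempted
```

If the first run raises partway through, the items downloaded before the failure keep their storageFilename, so a second run only retries the remaining ones.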

This will also benefit the project, as it will give me a better sense of why the subtasks are failing to set storageFilename, so I can better address the root issue.

@olsw

olsw commented Nov 20, 2023

@mtalcott, any indication of when this will be released? Many thanks.

@mtalcott
Owner

mtalcott commented Dec 2, 2023

@olsw I plan to find some time over the holidays, so by the end of the year.

@minermartijn

minermartijn commented Dec 11, 2023

I think I'm having the same issue, and I'm hoping you can fix it soon.

Edit:
Welllll, something that worked for me was using Google Chrome instead of Firefox!
I also installed the extension before starting the program. Hopefully this will help someone else as well.

Thanks a lot, @mtalcott, this app is awesome; I got to remove so many dupes.
The only downside is that I need to shut down my own Nginx container to get this running. But still awesome!!!!

@mtalcott mtalcott self-assigned this Dec 26, 2023
@mtalcott mtalcott added the bug Something isn't working label Dec 26, 2023
@mtalcott
Owner

I've merged #32, which should help identify the root issue(s) at play here. Please pull the latest from the main branch to get the update.

A new log/celery_worker.log file will be created. I'd appreciate it if anyone would be willing to share a relevant stack trace from theirs here.
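For anyone digging a relevant stack trace out of log/celery_worker.log, a small helper can pull out only the tracebacks that mention storageFilename. This is a hypothetical sketch, not part of the project: it assumes the standard Python traceback layout, and that any prefix in front of "Traceback (most recent call last):" (such as a timestamp) has the same width on every line of that traceback. Chained exceptions come back as separate blocks.

```python
from pathlib import Path
from typing import List

HEADER = "Traceback (most recent call last):"


def extract_tracebacks(log_path: str, needle: str = "storageFilename") -> List[str]:
    """Return traceback blocks from a log file whose text contains `needle`."""
    lines = Path(log_path).read_text(encoding="utf-8", errors="replace").splitlines()
    blocks: List[str] = []
    i = 0
    while i < len(lines):
        start = lines[i].find(HEADER)
        if start == -1:
            i += 1
            continue
        # Text before the header (e.g. a timestamp) is treated as a fixed-width prefix
        block = [lines[i]]
        i += 1
        while i < len(lines):
            body = lines[i][start:]
            block.append(lines[i])
            i += 1
            if not body[:1].isspace():
                break  # first unindented line is the exception message
        text = "\n".join(block)
        if needle in text:
            blocks.append(text)
    return blocks
```

Usage: `for block in extract_tracebacks("log/celery_worker.log"): print(block + "\n")`.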

@mtalcott
Owner

Addressed by #36


5 participants