Import: Index keywords from non-primary filenames as well #920

Closed
levivk opened this issue Jan 19, 2021 · 8 comments
Labels: bug (Something isn't working), released (Available in the stable release)

Comments

@levivk

levivk commented Jan 19, 2021

The docs state:

Original file and folder names are used to create keywords. In case you import and index or only index a directory with the path "Vacation/Africa". All files from this folder get the keywords "vacation" and "africa".

When I did an import of JPEG files, keywords were created as expected. A second import of 3gp files resulted in no path-related keywords. A third import of AVI files also resulted in no path-related keywords. I was able to reproduce this.

Files were imported by placing them in the import directory and running the import from the web interface with the move option enabled. Let me know if you'd like me to try anything else.

@graciousgrey
Member

Thanks for reporting, we will take a look at it!

graciousgrey added the bug (Something isn't working) label on Jan 19, 2021
@lastzero
Member

Keywords and titles are created from the original file name of the primary (JPEG) file in a stack. Take a look at the details in the photo edit dialog. It's also possible that the keywords are on our stoplist, like ios, android, iphone, apple, ..., as those words are all over the place in metadata and not helpful for searching.
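
A minimal sketch of the idea described above, not PhotoPrism's actual code: keywords are taken from the folder and file name of a single original path, and words on a stoplist are dropped. The function name `path_keywords`, the `STOPLIST` contents beyond the four examples named above, and the rule of splitting on non-letters are all assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical stoplist: only the four examples named in the comment above.
STOPLIST = {"ios", "android", "iphone", "apple"}


def path_keywords(original_name):
    """Return lowercase keywords from the folders and file name of a path."""
    path = PurePosixPath(original_name)
    # Folder names plus the file name without its extension.
    parts = list(path.parent.parts) + [path.stem]
    keywords = []
    for part in parts:
        # Assumption: anything that is not a letter separates words.
        for word in re.split(r"[^A-Za-z]+", part):
            word = word.lower()
            if word and word not in STOPLIST and word not in keywords:
                keywords.append(word)
    return keywords


print(path_keywords("Vacation/Africa/iphone_sunset.jpg"))
```

With this sketch, "iphone" is filtered out while "vacation", "africa" and "sunset" are kept. If only the primary file's original name is passed to such a function, a video in the same stack contributes nothing, which matches the behavior reported in this issue.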

lastzero changed the title from "keywords not being created from import paths for some files" to "Metadata: Index keywords from non-primary filenames as well" on Jan 20, 2021
lastzero added the enhancement (Optimization, improvement or maintenance task) and bug (Something isn't working) labels and removed the bug and enhancement labels on Jan 20, 2021
lastzero self-assigned this on Jan 20, 2021
@lastzero
Member

Do you actually import your files (letting PhotoPrism move and rename them) or index them in place (without renaming them)?

We indeed only store the original name of the main file when importing (with renaming), as we expect all related files to be in the same directory and to share the same file name prefix.

Files in different directories or with different names wouldn't be imported together - at best PhotoPrism may be able to stack them later based on their metadata (like same place and time).
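
The grouping expectation described above can be sketched roughly as follows. This is an illustration only: `group_related` is a hypothetical helper, and reading "same file name prefix" as "the file name up to the first dot" is an assumption, not a statement about how PhotoPrism matches related files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_related(paths):
    """Group files that share a directory and a file name prefix."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # Assumption: the prefix is everything before the first dot.
        prefix = path.name.split(".", 1)[0]
        groups[(str(path.parent), prefix)].append(path.name)
    return dict(groups)


files = ["import/IMG_1.jpg", "import/IMG_1.avi", "other/IMG_1.jpg"]
for key, names in group_related(files).items():
    print(key, names)
```

In this sketch the two files in `import/` end up in one group, while the identically named file in `other/` forms its own group, mirroring the statement that files in different directories would not be imported together.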

@levivk
Author

levivk commented Jan 25, 2021

Yeah, I was importing them from the import folder with the move box checked. As for file name prefix, does this mean PhotoPrism assumes imported files already have a consistent naming structure? The files I was importing are old files from unorganized backups that may or may not have been renamed. I can check in an hour or two.

lastzero added the please-test (Ready for acceptance test) label on Jan 25, 2021
@lastzero
Member

Started a preview build, you may test when it's done: https://drone.photoprism.app/photoprism/photoprism/915

lastzero changed the title from "Metadata: Index keywords from non-primary filenames as well" to "Import: Index keywords from non-primary filenames as well" on Jan 26, 2021
@levivk
Author

levivk commented Jan 31, 2021

Thanks for the speedy fix! And apologies for my delay in testing it. Path keywords are now being generated for my 3gp and AVI files.

I noticed a few things about keyword and title generation from the path that weren't expected but may or may not be intended.

  1. All numbers are removed from these keywords. Keeping numbers would be useful for paths such as GermanyTrip/day5 or summer2010 as currently this would generate keywords 'germany' and 'day' for the first, and 'summer' for the second. In this case, the numbers could be valuable information.
  2. Words in paths are extracted individually. A folder named 'Machu Picchu' or 'Machu_Picchu' generates the keywords 'Machu' and 'Picchu'. This becomes more of an issue with folder names like 'Trek over high pass Warmiwanusca', which would generate 5 separate keywords. It would be nice if such a phrase were somehow retained, but I realize keywords are not the best method for that. It is also possible that a phrase in a folder name is an edge case and not worth considering.
  3. Title generation and punctuation handling seem inconsistent. Some punctuation, such as dashes, is removed from titles, but commas are kept. Titles are also truncated at an opening parenthesis '('. For example, the folder 'July8,Thu-Trek over Runkurakay Pass (12,631 ft) to Phuyupatamarca (11,975 ft)' generates the title 'July ,Thu Trek Over Runkurakay Pass'. I'm guessing there is a title length limit, but other, longer titles were also truncated at the '('.

Is all of this intended? I think the exclusion of numbers was the most unexpected.
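
To make the first two observations concrete, here is a small sketch contrasting a tokenizer that keeps only letters with one that keeps digits attached to words. Both functions are hypothetical and are not taken from PhotoPrism; they only illustrate the difference being asked about.

```python
import re


def letters_only(name):
    """Split into lowercase words, discarding digits and punctuation."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", name)]


def letters_and_digits(name):
    """Split into lowercase tokens, keeping digits attached to words."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", name)]


for folder in ["summer2010", "day5", "Machu_Picchu"]:
    print(folder, letters_only(folder), letters_and_digits(folder))
```

In this sketch, `summer2010` becomes `summer` under the first function and stays `summer2010` under the second, while `Machu_Picchu` is split into two words by both.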

graciousgrey added the released (Available in the stable release) label and removed the please-test (Ready for acceptance test) label on Feb 8, 2021
@lastzero
Member

lastzero commented Feb 10, 2021

  1. Numbers and certain stopwords like IMG, JPG, IPHONE, ... may be ignored when generating titles, as there are many files named IMG_1234.JPG out there. Using the title "IMG 1234" is not helpful. When we have too much time, we'll figure out how to recognize useful numbers in all the noise.
  2. Similar to 1., there are many files named IMG_1234.JPG or My_file_title.JPG out there. Using My_file_title or IMG_1234 as keywords is typically not helpful. When we have too much time, we'll figure out when a _ doesn't just serve as a space placeholder.
  3. Basically the same as 1. and 2.: When you export files from Flickr, they'll be named My-file-title.JPG, so dashes serve as a space placeholder. When we have too much time, we'll figure out when a - doesn't just serve as a space placeholder.
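
The three points above can be sketched as a single title function. This is a rough illustration under stated assumptions, not PhotoPrism's implementation: `title_from_name` and `TITLE_STOPWORDS` are hypothetical names, the stopword set contains only the three examples mentioned above, and only tokens made up entirely of digits are dropped.

```python
import re

# Hypothetical stopwords: only the examples named in the comment above.
TITLE_STOPWORDS = {"img", "jpg", "iphone"}


def title_from_name(file_name):
    """Build a title, treating '_' and '-' as space placeholders."""
    # Remove the extension, if there is one.
    stem = file_name.rsplit(".", 1)[0]
    words = re.split(r"[_\-\s]+", stem)
    kept = [
        w for w in words
        if w and not w.isdigit() and w.lower() not in TITLE_STOPWORDS
    ]
    return " ".join(w.capitalize() for w in kept)


for name in ["IMG_1234.JPG", "My_file_title.JPG", "My-file-title.JPG"]:
    print(repr(title_from_name(name)))
```

Here `IMG_1234.JPG` yields an empty title, while both `My_file_title.JPG` and `My-file-title.JPG` yield `My File Title`. The trade-off raised in this thread is visible as well: a folder such as `day_5` would lose its number under the same rule.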

@levivk
Author

levivk commented Feb 10, 2021 via email
