Metadata: Extract date from android whats app filenames #1102

Closed
JamesSwift opened this issue Feb 27, 2021 · 16 comments
Labels
enhancement Enhancement or improvement of an existing feature released Available in the stable release

Comments

@JamesSwift

JamesSwift commented Feb 27, 2021

I've been doing some tests on various photos and I've found that PP gets confused with date extraction sometimes.

For example, this file seems to have no embedded "Date Taken" info:

root@da67aa80bf7b:/photoprism# exiftool "originals/2019/11 November/IMG-20191120-WA0001.jpg"
ExifTool Version Number         : 12.05
File Name                       : IMG-20191120-WA0001.jpg
Directory                       : originals/2019/11 November
File Size                       : 196 kB
File Modification Date/Time     : 2021:02:20 15:33:03+00:00
File Access Date/Time           : 2021:02:27 13:37:34+00:00
File Inode Change Date/Time     : 2021:02:21 20:06:19+00:00
File Permissions                : rwxr-x---
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
JFIF Version                    : 1.01
Resolution Unit                 : None
X Resolution                    : 1
Y Resolution                    : 1
Image Width                     : 1600
Image Height                    : 800
Encoding Process                : Progressive DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Image Size                      : 1600x800
Megapixels                      : 1.3

When importing it, PP sets the right month and year, but sets the day for it (and for all the other images from WhatsApp) to the 1st.

I wondered what source PP uses for date information when the EXIF data doesn't provide it. Is it running a regex on the filename, or possibly on the folder path?

I know that finding dates for photos that lack proper EXIF data is a can of worms, as they can come in all types and formats. But given that WhatsApp will be a major source of photos for some libraries, and that it seems to strip that info, I wondered if we could extract the date with a regex that looks for a few standard date formats in the file name?

@graciousgrey graciousgrey added the question Support request or further testing and details required label Feb 28, 2021
@graciousgrey
Member

PhotoPrism already uses the filename and the folder path to determine the date if there is no "taken at" value in other metadata such as EXIF. That's why the month and year are set correctly in your example.
I guess PhotoPrism either could not recognize the day and used 1 as a fallback, or it misinterpreted the 1 at the end of the filename.
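
To illustrate the first guess, here is a minimal Go sketch of how a folder-path fallback could end up on the 1st of the month. It is not PhotoPrism's actual code: `dateFromDir` and the regular expression are hypothetical names made up for this example, and it assumes directories laid out like `2019/11 November`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// yearMonthDir matches a "YYYY/MM" directory pair such as "2019/11 November".
// The pattern is an assumption for this sketch, not taken from PhotoPrism.
var yearMonthDir = regexp.MustCompile(`(?:^|/)((?:19|20)\d{2})/(0[1-9]|1[0-2])(?:[^0-9/][^/]*)?(?:/|$)`)

// dateFromDir is a hypothetical helper. The directory names carry no day,
// so the only day it can return is the 1st of the month.
func dateFromDir(dir string) (time.Time, bool) {
	m := yearMonthDir.FindStringSubmatch(dir)
	if m == nil {
		return time.Time{}, false
	}
	year, _ := strconv.Atoi(m[1])  // digits are guaranteed by the regex
	month, _ := strconv.Atoi(m[2]) // 01..12 is guaranteed by the regex
	return time.Date(year, time.Month(month), 1, 0, 0, 0, 0, time.UTC), true
}

func main() {
	if t, ok := dateFromDir("originals/2019/11 November"); ok {
		fmt.Println(t.Format("2006-01-02"))
	}
}
```

With only a year and a month available, any fallback of this kind has to pick some day, and the 1st is the obvious default.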

@JamesSwift
Author

Given that WhatsApp photos are likely to be quite common, would it be considered a bug or a feature request if PP doesn't recognize the date from their filenames correctly? IMG-20191120-WA0001.jpg seems fairly straightforward to extract with a regex.

@graciousgrey
Member

Yes, we should definitely add it. Is this the only naming pattern that causes issues for you?

@graciousgrey graciousgrey added enhancement Enhancement or improvement of an existing feature and removed question Support request or further testing and details required labels Mar 8, 2021
@graciousgrey graciousgrey changed the title from "Occasionally Wonky Date Extraction" to "Metadata / Extract date from android whats app filenames" Mar 8, 2021
@graciousgrey
Member

Acceptance Criteria:

  • Extract date from the following filename schema IMG-20191120-WA0001.jpg or VID-20191120-WA0001.jpg
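
A minimal Go sketch of what this criterion asks for, assuming the two filename shapes listed above. The names `whatsAppName` and `whatsAppDate` are hypothetical and not part of PhotoPrism; the extension is deliberately left open because videos may not end in .jpg.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// whatsAppName matches names like IMG-20191120-WA0001.jpg or
// VID-20191120-WA0001.jpg and captures the eight date digits.
var whatsAppName = regexp.MustCompile(`^(?:IMG|VID)-(\d{8})-WA\d{4}\.[A-Za-z0-9]+$`)

// whatsAppDate is a hypothetical helper that returns the date encoded
// in a WhatsApp-style filename, or false if the name does not match.
func whatsAppDate(name string) (time.Time, bool) {
	m := whatsAppName.FindStringSubmatch(name)
	if m == nil {
		return time.Time{}, false
	}
	// time.Parse rejects impossible dates such as month 13.
	t, err := time.Parse("20060102", m[1])
	if err != nil {
		return time.Time{}, false
	}
	return t, true
}

func main() {
	names := []string{"IMG-20191120-WA0001.jpg", "VID-20191120-WA0001.jpg", "holiday.jpg"}
	for _, name := range names {
		if t, ok := whatsAppDate(name); ok {
			fmt.Println(name, "->", t.Format("2006-01-02"))
		} else {
			fmt.Println(name, "-> no date found")
		}
	}
}
```

The filename carries no time of day, so the result is midnight UTC on that date.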

@graciousgrey graciousgrey changed the title from "Metadata / Extract date from android whats app filenames" to "Metadata: Extract date from android whats app filenames" Nov 2, 2021
@matglas

matglas commented Dec 10, 2021

I have seen the same thing with my files after I ran photoprism index -f. Below is the exiftool output.

$ exiftool 20190512_190204_0A7ADC59.jpg 
ExifTool Version Number         : 10.80
File Name                       : 20190512_190204_0A7ADC59.jpg
Directory                       : .
File Size                       : 193 kB
File Modification Date/Time     : 2021:11:12 11:14:17+00:00
File Access Date/Time           : 2021:12:10 08:17:44+00:00
File Inode Change Date/Time     : 2021:12:07 21:27:43+00:00
File Permissions                : rwxr-xr-x
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
JFIF Version                    : 1.01
Resolution Unit                 : None
X Resolution                    : 1
Y Resolution                    : 1
Image Width                     : 1200
Image Height                    : 1600
Encoding Process                : Progressive DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Image Size                      : 1200x1600
Megapixels                      : 1.9

I'm trying to run optimize now to see if that helps.

I'm using version photoprism/photoprism:20211130

@matglas

matglas commented Dec 11, 2021

The optimize command didn't do anything. I had hoped it would try to fix some of the dates based on the file name. Would another index help?

@lastzero
Member

Only indexing will change the taken at time if there is new / different metadata. Optimize mainly updates automatically generated titles, descriptions and location estimates.

@matglas

matglas commented Dec 11, 2021

Great. I'll try a reindex and see if it changes. Thanks

@matglas

matglas commented Dec 11, 2021

I'm running an index now to see.

"20190125_155305_D907B6AB.jpg was taken at 2021-11-11 22:03:38.551550694 +0000 UTC (file mod time)" Does that info help?

As an example my path is /2019/01/20190125_155305_D907B6AB.jpg

Found the place where it's checking for the time.

func (m *MediaFile) TakenAt() (time.Time, string) {

My case ends up in the statement at line 153.
I noticed that the check for the name at line 128 uses txt.Time(), but the test case for my kind of filename only asserts true. It does not include the right date.

t.Run("20130518_142022_3D657EBD.jpg", func(t *testing.T) {
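
For comparison, a minimal Go sketch of parsing this kind of name and checking it against the concrete timestamp rather than just a boolean. This is not how txt.Time() works; `stampedName` and `stampedNameTime` are hypothetical names, and the pattern assumes a `YYYYMMDD_HHMMSS_` prefix followed by eight hex characters, as in my files.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// stampedName matches names like 20190125_155305_D907B6AB.jpg and
// captures the "YYYYMMDD_HHMMSS" part.
var stampedName = regexp.MustCompile(`^(\d{8}_\d{6})_[0-9A-Fa-f]{8}\.[A-Za-z0-9]+$`)

// stampedNameTime is a hypothetical helper that returns the timestamp
// encoded at the start of the filename, or false if there is none.
func stampedNameTime(name string) (time.Time, bool) {
	m := stampedName.FindStringSubmatch(name)
	if m == nil {
		return time.Time{}, false
	}
	t, err := time.Parse("20060102_150405", m[1])
	if err != nil {
		return time.Time{}, false
	}
	return t, true
}

func main() {
	want := time.Date(2019, time.January, 25, 15, 53, 5, 0, time.UTC)
	got, ok := stampedNameTime("20190125_155305_D907B6AB.jpg")
	// Compare against the expected value instead of only checking ok.
	fmt.Println(ok && got.Equal(want))
}
```

The filename has no time zone, so the sketch treats the timestamp as UTC; a real implementation would have to decide how to handle local time.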

@9ycbgf0k8fpg

Hello, is there anything new about this issue? PhotoPrism still doesn't correctly parse dates from WhatsApp filenames. For instance:
media: IMG-20201018-WA0007.jpg was taken at 2022-03-05 10:37:52 +0000 UTC (file mod time)
Thanks

@lastzero
Member

lastzero commented Mar 5, 2022

Is this the standard for WhatsApp? Is there documentation? I haven't seen a filename like the one in your example before.

See public roadmap and release notes for what we are working on right now.

@Bur0k
Contributor

Bur0k commented Jun 12, 2022

All my WhatsApp filenames are in that format as well.

@lastzero
Member

Might be worth asking Facebook to implement filename settings if that doesn't exist yet.

@huwylphi

As a workaround, before importing WhatsApp photos, I use exiftool to batch-set the date taken metadata field from the file name with the following command:
exiftool -P -overwrite_original "-datetimeoriginal<${filename;$_=substr($_,0,13)} 00:00" "-software=WhatsApp" .\*-WA*.jpg

  • The -P flag keeps the date modified field from being updated
  • The -overwrite_original option skips creating a backup of the original file (avoiding duplicates)
  • -datetimeoriginal<${filename sets the date taken metadata field from the timestamp in the file name
  • Since the file name contains only a date and no time, $_=substr($_,0,13)} 00:00 keeps just the first 13 characters (everything up to the end of the date) and sets the time to 00:00:00
  • "-software=WhatsApp" sets the software (program name) metadata field to "WhatsApp" (software can be replaced by make to set the camera maker metadata field instead; I'm not sure the program name is used by PhotoPrism yet)
  • The last parameter, .\*-WA*.jpg, runs the command only on WhatsApp files in the current folder (a typical WhatsApp file name is IMG-20211008-WA0004.jpg).
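
To make the substring step concrete, here is a small Go sketch of the same string handling. It only mirrors the logic of the command and does not call exiftool; `exifDateFromName` is a hypothetical name, and the digit extraction reflects my understanding that exiftool reads the digits out of a loosely formatted date string when writing a date tag.

```go
package main

import (
	"fmt"
	"strings"
)

// exifDateFromName is a hypothetical helper that mirrors the command:
// keep the first 13 characters of the name, keep only their digits,
// and append a midnight time in EXIF "YYYY:MM:DD HH:MM:SS" form.
func exifDateFromName(name string) (string, bool) {
	if len(name) < 13 {
		return "", false
	}
	prefix := name[:13] // e.g. "IMG-20211008-"
	digits := strings.Map(func(r rune) rune {
		if r >= '0' && r <= '9' {
			return r
		}
		return -1 // drop everything that is not a digit
	}, prefix)
	if len(digits) != 8 {
		return "", false
	}
	return fmt.Sprintf("%s:%s:%s 00:00:00", digits[:4], digits[4:6], digits[6:8]), true
}

func main() {
	if s, ok := exifDateFromName("IMG-20211008-WA0004.jpg"); ok {
		fmt.Println(s)
	}
}
```

Cutting the name at 13 characters matters because the WA counter that follows also consists of digits and would otherwise be mixed into the date.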

Also, I sometimes find that the date modified metadata is close to the date taken. I guess the date modified might be the timestamp at which WhatsApp downloaded the file locally on the client/smartphone. In that case the following command might be used:
exiftool -P -overwrite_original "-datetimeoriginal<filemodifydate" .\*-WA*.jpg

@graciousgrey graciousgrey moved this to Preview 🐳 in Roadmap 🚀✨ Jun 29, 2023
@graciousgrey graciousgrey added the please-test Ready for acceptance test label Jun 29, 2023
@philon123

I'm using this script to watch the imports directory and rename any incoming files that match the WhatsApp pattern:

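#!/bin/bash
# Poll the import folder and rename WhatsApp images (IMG-YYYYMMDD-WANNNN.jpg)
# to YYYY-MM-DD_WANNNN.jpg.
folder_to_watch="/opt/media/photos_import"

while true; do
    sleep 1
    # Read NUL-delimited paths so file names containing spaces are handled correctly.
    find "$folder_to_watch" -type f -print0 | while IFS= read -r -d '' file; do
        file_name=$(basename "$file")
        # The dot before "jpg" is escaped so it only matches a literal dot.
        if [[ $file_name =~ ^IMG-[0-9]{8}-WA[0-9]{4}\.jpg$ ]]; then
            file_date="${file_name:4:8}"
            wa_number="${file_name:15:4}"
            new_file_name="${file_date:0:4}-${file_date:4:2}-${file_date:6:2}_WA${wa_number}.jpg"
            mv "$file" "${folder_to_watch}/${new_file_name}"
            #chown photoprism:media "${folder_to_watch}/${new_file_name}"
            echo "Renamed $file_name to $new_file_name" # thanks, ChatGPT!
        fi
    done
done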

@graciousgrey
Member

With the latest preview build, PhotoPrism automatically reads the date from WhatsApp filenames.

Thank you very much @Bur0k ❤️

@graciousgrey graciousgrey added tested Changes have been tested successfully and removed please-test Ready for acceptance test labels Jul 14, 2023
@lastzero lastzero moved this from Preview 🐳 to Released 🌈 in Roadmap 🚀✨ Jul 19, 2023
@lastzero lastzero added released Available in the stable release and removed tested Changes have been tested successfully labels Jul 19, 2023