Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvements #350

Closed
amaleki opened this issue Jan 11, 2020 · 9 comments
Closed

Performance Improvements #350

amaleki opened this issue Jan 11, 2020 · 9 comments

Comments

@amaleki
Copy link
Contributor

amaleki commented Jan 11, 2020

I have been using Elodie to organize my photo library. A very impressive command line utility.

I did some profiling to see if I could improve the per image processing time.

Import 1 file with master branch. - 2.57s

python -m cProfile -o elodie.prof elodie.py import --destination="" <file>
"get_metadata" takes 74% of run time. Every image makes 8 calls to exiftool. Each call to exiftool costs ~200-300ms

image

Import 24 files with master branch - 73.7s

python -m cProfile -o elodie.prof elodie.py import --destination="<dst-dir>" --source="<src-dir>"
"get_metadata" takes 84% of run time. Again every image makes 8 calls to exiftool with each call taking ~200-300ms.

image

After looking over the code two things can be done to improve performance significantly.

  • Cache image metadata to reduce calls to pyexiftools
  • Initialize one instance of pyexiftools which can create 1 exiftool subprocess to improve exiftool lookups by using the "-stay_open" parameter (Right now every call to pyexiftool is creating a new exiftool process)

I've implemented these changes in a fork I created
amaleki@afd5766#diff-5a2b1bdaf59cbc881a6ad218eaabdc42

Here are some results from my testing:

Importing 1 file with the above commits - .690s (73% Improvement)

"get_metadata" now takes 32% of run time.
image

Importing 24 files with the above commits - 3.17s (95% Improvement)

"get_metadata" now takes 37% of run time.
image

Results are from a Windows Docker Desktop container running Alpine Linux on a Xeon E2176M. Visualization are from Snakeviz

I would love to have the community review the code and see what kind of results they see. I'm hoping these aren't "too good to be true"!

elodie-master-profile.zip
elodie-amaleki-branch-commit-afd5766.zip

@amaleki amaleki changed the title Performance Enhancements Performance Improvements Jan 11, 2020
@jmathai
Copy link
Owner

jmathai commented Jan 11, 2020

Hi @amaleki -- thanks for this change. Do you mind opening a pull request? That will trigger the tests to run. Meanwhile, I will have a look at your changes.

@evanjt
Copy link
Contributor

evanjt commented Jan 11, 2020

@amaleki (as mentioned in the raspberry pi performance thread) I tried this importing 86,000 images yesterday and it was done in 12 hours instead of it taking over 2 days and continuing on the master branch. However it wasn't an isolated test, I also used the --trash flag and the changes I've made in my fork to use a local DB for location requests instead of MapQuest.

On a quick visual inspection is appears all my files still exist and the metadata extracted from exiftool looks correct, good work :)

@amaleki
Copy link
Contributor Author

amaleki commented Jan 11, 2020

Hi @jmathai I've created a pull request. Looks like some tests have failed. I'll take a look to see what failed.

@evanjt
Copy link
Contributor

evanjt commented Jan 12, 2020

Hey guys I did a comparison using the latest commits on both forks:
jmathai 75e6590
amaleki afd5766

Setup

  • Removed ~/.elodie and used default settings, therefore:

    • No mapquest API
    • Default folder settings
    • No hash tables for location or images
  • Used same source folder

    • 6,853,366,707 bytes
    • 1315 files
    • 153 subfolders
  • System hardware:

    • Intel i7-8850H (2.60 Ghz 4.30 Ghz boost, 6 cores/12 threads)
    • 32 GB RAM
    • Samsung 970 1 TB SSD (images read and written on same drive, but separate from OS drive)
  • versions:

    • exiftool 11.70
    • python 3.8.1
    • linux kernel version 5.4.8-arch1-1

Results

amaleki

SUMMARY

Metric Count
Success 1314
Error 0

time ./amaleki/elodie.py import /media/tb/elodietest/2019 --trash --destination "/media/tb/elodietest/done_amaleki/"
48.37s user 7.42s system 98% cpu 56.460 total

Files:

  • 6,852,898,461 bytes
  • 1314 files
  • 22 sub-folders

jmathai

SUMMARY

Metric Count
Success 1314
Error 0

time ./elodie/elodie.py import /media/tb/elodietest/2019_2 --trash --destination "/media/tb/elodietest/done_jmathai/"
1132.93s user 112.78s system 98% cpu 21:02.52 total

Files:

  • 6,852,883,168 bytes
  • 1314 files
  • 22 sub-folders (due to default folder settings)

Comments

  • File missing from 1315 to 1314 result is just a .directory file in the source directory
  • Smaller count of directories due to default directory settings in config file
  • 23.4x speedup
  • 15,293 B (14.93 KB) less total total file size in jmathai vs amaleki

@jmathai
Copy link
Owner

jmathai commented Jan 13, 2020

Similar improvements when running unit tests.

amaleki: Ran 310 tests in 14.489s
jmathai: Ran 289 tests in 195.616s

@amaleki
Copy link
Contributor Author

amaleki commented Jan 13, 2020

All PR #352 test issues have been resolved. I moved all changes to a branch off amaleki/elodie. You can find it here.
https://github.com/amaleki/elodie/tree/metadata-cache-exiftool-singleton

Thanks @evanjt for bench-marking both commits.

@TonyWhitley
Copy link

Thanks @amaleki for adding a technique to my armoury, I straight away found a superfluous file write in one of my programs and saved several seconds running time 👍

@amaleki
Copy link
Contributor Author

amaleki commented Jan 17, 2020

@TonyWhitley you bet!

@evanjt I've created a new PR353 with some minor cache improvements. From my testing see ~15% reduction in time for long running imports. Can you see if you get similar process time improvements?

@evanjt
Copy link
Contributor

evanjt commented Jan 17, 2020

Hi amaleki,

I don't have the exact same folder for testing the images, but here's another test:

Original folder:
8.0 GiB (8,546,552,827)
3,795 files, 277 sub-folders

last commit on Amaleki:fs-process-file-media-set-order (0c1142e)

Folder:
8.0 GiB (8,545,983,483)
3,795 files, 90 sub-folders

Success 3795
Error 0

83.34s user
14.09s system
97% cpu
1:39.97 total

Last commits to jmathai:master (d8cee15)

Folder:
8.0 GiB (8,545,991,675)
3,795 files, 90 sub-folders

Success 3795
Error 0

89.21s user
14.33s system
99% cpu
1:44.40 total

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants