This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Skip based on date earlier #427

Closed
peterbe opened this issue Apr 25, 2018 · 0 comments

peterbe commented Apr 25, 2018

Warning! This might be a matter of micro-optimization.

When you run MIN_AGE_LAST_MODIFIED_HOURS=24 INVENTORIES=firefox latest-inventory-to-kinto, it loops over almost 900,000 records from multiple CSV files, and that's just for inventory=firefox. Each record is extracted and analyzed, and we do a datetime comparison between MIN_AGE_LAST_MODIFIED_HOURS and entry['LastModifiedDate'].

The problem is that we do this check quite late. If you run the cron job with a short window like MIN_AGE_LAST_MODIFIED_HOURS=24, you're likely to skip about 99% of the entries. So all the work of converting each entry (from the CSV reader) to a record (a dict) is done in vain about 99% of the time.

I measured the total time spent on everything we could skip for that 99%, and it adds up to ~40 seconds(*). Not much, but now that we have the benefit of hindsight, it would be nice to do this piece of code more correctly.

(*) That's for just 'firefox' inventory. And in that experiment I used my macOS host and not Docker. In Docker it's likely to be much more.
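
A rough sketch of what "checking earlier" could look like, to illustrate the idea rather than the actual code in this repo. The function name `recent_entries`, the `Key` column, and the timestamp format are assumptions made for the example; only `LastModifiedDate` and the hours setting come from the description above.

```python
import csv
import datetime
import io

# Assumed timestamp layout; the real inventory CSVs may differ.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
UTC = datetime.timezone.utc


def recent_entries(csv_file, min_age_hours, now=None):
    """Yield only the CSV rows modified within the last `min_age_hours`.

    Hypothetical helper: the date check runs first, so rows that are
    too old never reach the entry-to-record conversion.
    """
    now = now or datetime.datetime.now(UTC)
    cutoff = now - datetime.timedelta(hours=min_age_hours)
    for entry in csv.DictReader(csv_file):
        last_modified = datetime.datetime.strptime(
            entry["LastModifiedDate"], DATE_FORMAT
        ).replace(tzinfo=UTC)
        if last_modified < cutoff:
            # Too old: skip before doing any of the expensive work.
            continue
        yield entry


# Tiny made-up sample: one stale row and one fresh row.
sample = io.StringIO(
    "Key,LastModifiedDate\n"
    "old.zip,2018-04-01T00:00:00.000Z\n"
    "new.zip,2018-04-25T10:00:00.000Z\n"
)
fixed_now = datetime.datetime(2018, 4, 25, 12, 0, tzinfo=UTC)
keys = [entry["Key"] for entry in recent_entries(sample, 24, now=fixed_now)]
```

With a 24-hour window, only the row for `new.zip` survives the filter, so whatever conversion follows would run on one row instead of two. On the real data that would presumably mean converting roughly 1% of the rows.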

@peterbe peterbe self-assigned this Apr 25, 2018