Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: find_md5 list with mod time and sorted #3445

Closed

Conversation

johnbeckettn2e
Copy link
Contributor

@johnbeckettn2e johnbeckettn2e commented Sep 21, 2020

It was observed that the MD5 would not change after source files had been
modified, looking deeper into the build process it was discovered that
find_md5 build function makes a list of the files being built and then
passes the list to a summing utility on stdin. The resultant MD5 is of
the file list, not the contents of the files.

The MD5 would change if the ordering of the list changed, or items were
removed or deleted. Causing inconsistent and intermittent behaviour.

The proposed fix is to add the modification time after the filename and
then sort the list to prevent find returning files in a different order
falsely re-triggering a rebuild. The MD5 will now change when a file is
modified or files are added/removed from the list.

Signed-off-by: John Beckett john.beckett@net2edge.com

@adschm adschm added the build/scripts/tools pull request/issues for build, scripts and tools related changes label Sep 21, 2020
@johnbeckettn2e
Copy link
Contributor Author

I've raise https://bugs.openwrt.org/index.php?do=details&task_id=3360 to cover this issue

@johnbeckettn2e
Copy link
Contributor Author

Bump.... Are there any objections?

@aparcar aparcar self-assigned this Dec 1, 2020
@@ -13,7 +13,7 @@

DEP_FINDPARAMS := -x "*/.svn*" -x ".*" -x "*:*" -x "*\!*" -x "* *" -x "*\\\#*" -x "*/.*_check" -x "*/.*.swp" -x "*/.pkgdir*"

find_md5=find $(wildcard $(1)) -type f $(patsubst -x,-and -not -path,$(DEP_FINDPARAMS) $(2)) | mkhash md5
find_md5=find $(wildcard $(1)) -type f $(patsubst -x,-and -not -path,$(DEP_FINDPARAMS) $(2)) -printf "%p%TY%Tm%Td%TH%TM%TS%Tz\n" | sort | mkhash md5
Copy link
Member

@aparcar aparcar Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier to use e.g. %p%TF%TT%TZ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aparcar - Thanks for looking at this change.

I've updated the change to use the shorter form arguments and retested.

@aparcar
Copy link
Member

aparcar commented Dec 6, 2020

Thinking about the timezone, I guess it's best if dropped. If you change the TZ all find_md5 results will be changed. What do you think?

@aparcar
Copy link
Member

aparcar commented Dec 6, 2020

Also please squash both commits into a single one

@johnbeckettn2e
Copy link
Contributor Author

johnbeckettn2e commented Dec 6, 2020

Thinking about the timezone, I guess it's best if dropped. If you change the TZ all find_md5 results will be changed. What do you think?

I think this is a question of an acceptable compromise. The TZ is there to address the missed rebuild from a time change to daylight savings. I do accept that this would be a false rebuild everything scenario.

If the change was less strict then failing to rebuild would lead to something undefined in the build.

I do not object to removing timezone if required.

@aparcar
Copy link
Member

aparcar commented Dec 6, 2020

I don't understand, the timezone (e.g. HST) does not include any daylight saving information, does it?

@aparcar
Copy link
Member

aparcar commented Dec 6, 2020

Additionally, the printed information is "precise" enough (2020-12-0523:19:31.6507512920) to avoid any file changes that accidentally happen exactly 1 hour before/after (as daylight change would do). Maybe I have a misunderstanding on a lower level.

@aparcar
Copy link
Member

aparcar commented Dec 6, 2020

Lastly, please squash both commits togehter.

It was observed that the MD5 would not change after source files had been
modified, looking deeper into the build process it was discovered that
find_md5 build function makes a list of the files being built and then
passes the list to a summing utility on stdin.  The resultant MD5 is of
the file list, not the contents of the files.

The MD5 would change if the ordering of the list changed, or items were
removed or deleted.

The proposed fix is to add the modification time after the filename and
then sort the list to prevent find returning files in a different order
falsely re-triggering a rebuild. The MD5 will now change when a file is
modified or files are added/removed from the list.

Using 'T@' to show time in epoch for timezone independent behaviour.

Signed-off-by: John Beckett <john.beckett@net2edge.com>
@johnbeckettn2e
Copy link
Contributor Author

johnbeckettn2e commented Dec 7, 2020

I was assuming that the timezone would be updated to reflect summer vs winter time (GMT/BST for me) or an extra character.

From looking at the man page the '@' specifier can be used to get the time since epoch including fractional seconds - this avoids the issues around a long formatted date and results in less data being added to the hash.

Example time is now : <found file>1607331736.0953444230

The commits have now been squashed

@aparcar
Copy link
Member

aparcar commented Dec 7, 2020

Great, I give it another test and then merge it!

@aparcar
Copy link
Member

aparcar commented Dec 7, 2020

Merged, thank you.

@aparcar aparcar closed this Dec 7, 2020
@johnbeckettn2e johnbeckettn2e deleted the find_md5_bugfix branch December 8, 2020 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build/scripts/tools pull request/issues for build, scripts and tools related changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants