Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

legacy-support: Keep v1.0.13 on 10.4. #21775

Merged
merged 1 commit into from
Dec 23, 2023
Merged

Conversation

fhgwright
Copy link
Contributor

Versions 1.1.0 and 1.1.1 fail to build on 10.4. This hack should be removed once there's a release that works on 10.4.

TESTED:
Tested on 10.4-10.5 ppc, 10.4-10.6 i386, 10.5-10.15 x86_64, and 11.x-14.x arm64.
As before, 1.0.13 on 10.4 builds and passes tests in all cases except ppc +universal, which fails to build.
On 10.5+, the unchanged 1.1.1 still builds in all cases and fails tests in some.

Description

Type(s)
  • bugfix
  • enhancement
  • security fix
Tested on

Mac OS X 10.4.11 8S165, PPC, Xcode 2.5 8M2558
Mac OS X 10.4.11 8S2167, i386, Xcode 2.5 8M2558
Mac OS X 10.5.8 9L31a, PPC, Xcode 3.1.4 9M2809
Mac OS X 10.5.8 9L31a, i386, Xcode 3.1.4 9M2809
Mac OS X 10.5.8 9L31a, x86_64, Xcode 3.1.4 9M2809
Mac OS X 10.6.8 10K549, i386, Xcode 3.2.6 10M2518
Mac OS X 10.6.8 10K549, x86_64, Xcode 3.2.6 10M2518
Mac OS X 10.7.5 11G63, x86_64, Xcode 4.6.3 4H1503
OS X 10.8.5 12F2560, x86_64, Xcode 5.1.1 5B1008
OS X 10.9.5 13F1911, x86_64, Xcode 6.2 6C131e
OS X 10.10.5 14F2511, x86_64, Xcode 7.2 7C68
OS X 10.11.6 15G22010, x86_64, Xcode 8.1 8B62
macOS 10.12.6 16G2136, x86_64, Xcode 9.2 9C40b
macOS 10.13.6 17G14042, x86_64, Xcode 10.1 10B61
macOS 10.14.6 18G9323, x86_64, Xcode 11.3.1 11C505
macOS 10.15.7 19H15, x86_64, Xcode 12.4 12D4e
macOS 11.7.10 20G1427, arm64, Xcode 13.2.1 13C100
macOS 12.7.1 21G920, arm64, Xcode 14.2 14C18
macOS 13.6.1 22G313, arm64, Xcode 15.0.1 15A507
macOS 14.1.1 23B81, arm64, Xcode 15.0.1 15A507

Verification

Have you

  • followed our Commit Message Guidelines?
  • squashed and minimized your commits?
  • checked that there aren't other open pull requests for the same change?
  • [N/A] referenced existing tickets on Trac with full URL?
  • checked your Portfile with port lint --nitpick?
  • tried existing tests with sudo port test?
  • tried a full install with sudo port -vst install?
  • tested basic functionality of all binary files?
  • checked that the Portfile's most important variants haven't been broken?

@macportsbot
Copy link

Notifying maintainers:
@cjones051073 for port legacy-support.

@macportsbot macportsbot added type: bugfix maintainer: open Affects an openmaintainer port labels Dec 14, 2023
@barracuda156
Copy link
Contributor

barracuda156 commented Dec 14, 2023

I do not think 1.1.1 works correctly, at least on PowerPC. However the right solution will be fix the current one – merge the PR what is currently there and, if needed, add a fix for Tiger. Not resurrect some old version, which sort of fails the purpose of the library.

UPD. I have just built legacy-support-devel from macports/macports-legacy-support#69 on 10.4.11 with no issues, and apparently it passes tests normally, as it did before last time I tested it. We should just merge that PR.

@mascguy
Copy link
Member

mascguy commented Dec 14, 2023

While I realize this change is intended to be a temporary fix, it would be preferable if we could avoid it.

And per the discussion/activity for the other ongoing work, it sounds (?) like that might be ready - or close?

@cjones051073 and @kencu , your thoughts?

@barracuda156
Copy link
Contributor

Unless someone can show that the version of legacy-support-devel from PR 69 is broken on Intel, reproducibly, merging that is arguably a superior solution.
On all possible PowerPC systems from Tiger up it works normally, including universal on Leopard.

@fhgwright
Copy link
Contributor Author

@ mascguy

While I realize this change is intended to be a temporary fix, it would be preferable if we could avoid it.

What, exactly, is the upside of avoiding it? The effect of this change is:

  • Non-10.4 installs are unaffected
  • 10.4 installs with versions older than 1.0.13 can now upgrade to 1.0.13, the last working version
  • 10.4 installs with 1.0.13 have no change to the installed content, and no longer see repeated upgrade failures
  • 10.4 installs with versions newer than 1.0.13 don't exist
  • 10.4 installs without legacy-support currently installed no longer have installs of it and any of its dependents blocked by the broken build
  • Dependents that need a newer version than 1.0.13 still won't work on 10.4, but they didn't, anyway

And per the discussion/activity for the other ongoing work, it sounds (?) like that might be ready - or close?

Even if an updated -devel version were available this instant, it would still need substantial testing with various dependents before turning it into a release. It was inadequate testing that caused this problem in the first place.

It also wouldn't be a good idea to release yet another version with the CLOCK_MONOTONIC that isn't really monotonic. I have an upcoming fix for that.

@barracuda156

Unless someone can show that the version of legacy-support-devel from PR 69 is broken on Intel, reproducibly, merging that is arguably a superior solution.

I don't dispute that a newer version that actually works is eventually a superior solution, but that doesn't exist yet. And I have to fix a bug in my testing setup before verifying that PR 69 actually works properly.

As the saying goes, "Don't let the perfect be the enemy of the good".

On all possible PowerPC systems from Tiger up it works normally, including universal on Leopard.

Due to all the Tiger-specific hackery in the Portfile, any testing on Tiger that isn't done via the port is invalid. It would be better if all that were part of the project's build procedure rather than done in the Portfile, but that's not the way it is now.

@barracuda156
Copy link
Contributor

barracuda156 commented Dec 15, 2023

@fhgwright All tests I ran were in fact from the port. It should be evident from my logs. Or if it is not (I copied Terminal output for sake of readability), I can share Macports-produced full log, no issues.

As of now, it looks to me that everything (that is, -devel version with @macportsraf fix) works fine on every PowerPC system, including ppc64 and universal.

I do not have i386 Tiger, maybe @kencu or @catap can test it.

@barracuda156
Copy link
Contributor

@fhgwright Re CLOCK_MONOTONIC: you should discuss that with @catap and @jmroot perhaps.

@catap
Copy link
Contributor

catap commented Dec 15, 2023

@fhgwright you must increase epoch at your pin version.

Also, I not sure that do you mean CLOCK_MONOTONIC isn't monotonic. As far as I can see it may have a few causes (because it is using gettimeofday()):

  1. NTPD may force in the past which leads to negative differences of values CLOCK_MONOTONIC between past and now;
  2. NTP may affect the length of second which makes differences of values CLOCK_MONOTONIC between past and now not that you expected for a bit;
  3. VM may affect system times and time may be frozen for VM.

(1) is edge case and I have no idea how to support it. (2) isn't an issue, isn't it? And (3) is general issue and it affects everything which includes access to system files which is mounted via network which I've encountered quite often.

Can you, please, qualify your claim?

@catap
Copy link
Contributor

catap commented Dec 15, 2023

I do not have i386 Tiger, maybe @kencu or @catap can test it.

I do have only 10.5 i386

@fhgwright
Copy link
Contributor Author

@fhgwright you must increase epoch at your pin version.

No, because newer versions never built successfully and therefore were never installed. An epoch bump is only needed when an actual rollback of an installation is needed (such as the openssl3 update that built but didn't work properly on many platforms).

Also, I not sure that do you mean CLOCK_MONOTONIC isn't monotonic. As far as I can see it may have a few causes (because it is using gettimeofday()):

  1. NTPD may force in the past which leads to negative differences of values CLOCK_MONOTONIC between past and now;

Exactly. This was why CLOCK_MONOTONIC was created in the first place.

  1. NTP may affect the length of second which makes differences of values CLOCK_MONOTONIC between past and now not that you expected for a bit;

Slewing adjustments by NTP are expected in CLOCK_MONOTONIC, but not the step adjustments. Changing CLOCK_MONOTONIC to be based on gettimeofday() reintroduces the step adjustments that it was specifically created to avoid (albeit in a suboptimal fashion).

  1. VM may affect system times and time may be frozen for VM.

If VMs screw around in unexpected ways, then all bets are off. But the goal should be correct behavior in non-VM cases. And fixing VMs to work properly wherever possible. :-) There's a whole whitepaper from VMware on the battles they had with Linux timekeeping.

(1) is edge case and I have no idea how to support it. (2) isn't an issue, isn't it? And (3) is general issue and it affects everything which includes access to system files which is mounted via network which I've encountered quite often.

Can you, please, qualify your claim?

OK, but you have yet to justify why you replaced a slightly wrong algorithm with a drastically wrong algorithm. The commit message contains no explanation.

In general, all clocks are based on some version of the following algorithms:

To obtain a current timestamp:

  1. Read the hardware counter
  2. Subtract its value from the last update cycle (modulo wraparound interval)
  3. Multiply the resulting delta by a scale factor
  4. Add the base value from the last update cycle
  5. Return the result

In the periodic update cycle:

  1. Read the hardware counter
  2. Subtract its value from the last update cycle (modulo wraparound interval)
  3. Multiply the resulting delta by a scale factor
  4. Add the base value from the last update cycle
  5. Store the result and the current hardware counter as the last update values

Note that the two actions are very similar and can share code, but only the update case modifies persistent variables.

For CLOCK_REALTIME, the base value is modified by step adjustments, and the scale factor is temporarily changed for slewing. This requires two additional variables, one to keep track of the correct scale factor (as distinct from the slewing version), and one to keep track of how long the slewing is to remain in effect.

For CLOCK_MONOTONIC, a separate base value is kept to isolate it from CLOCK_REALTIME step adjustments, but the scale factor is shared with CLOCK_REALTIME, so that it's still subject to slewing.

For CLOCK_MONOTONIC_RAW, neither the base value nor the scale factor is shared, so that it simply tracks the underlying hardware clock in standardized units. It keeps relative time as accurately (or inaccurately) as the difference between the actual frequency of the underlying oscillator and its assumed frequency.

The only downside of making CLOCK_MONOTONIC the same as CLOCK_MONOTONIC_RAW (as had been done prior to bb525c1) is that it's not subject to NTP steering. This actually turns out to be mostly a good thing (since its short-term accuracy is actually better), but the most important point here is that gettimeofday() is just CLOCK_REALTIME by another name, and hence subject to step adjustments, including backsteps.

From its inception, OSX/macOS has had the SYSTEM_CLOCK service available via clock_get_time(), and that's at least claimed in its documentation to be the equivalent of CLOCK_MONOTONIC. That's what I've been using for years in the CLOCK_MONOTONIC fallback in ntpsec. I have an upcoming change to apply that here. And yes, it does avoid copying the widely circulated example code for using clock_get_time() with excessive overhead and a "port leak". :-)

In reality, CLOCK_MONOTONIC is almost never the right choice, but since too few people understand that, it's important to keep CLOCK_MONOTONIC working as intended.

@catap
Copy link
Contributor

catap commented Dec 15, 2023

@fhgwright idea of legacy support is implementing some missed system calls for very old systems to make modern software works.

It always has been a trade off between "not working at all" and "working good enough". Here the good example of such trade off where I traded accuracy of CLOCK_* to support more software.

Thus, keep in mind, I really think that soon accurate enough CLOCK_* won't be available for security reason :)

BTW where have you find documentation which states that it works differently? On apple web site it hasn't got any word about it: https://developer.apple.com/documentation/kernel/1420035-clock_get_time/

@mascguy
Copy link
Member

mascguy commented Dec 15, 2023

@fhgwright idea of legacy support is implementing some missed system calls for very old systems to make modern software works.

It always has been a trade off between "not working at all" and "working good enough". Here the good example of such trade off where I traded accuracy of CLOCK_* to support more software.

Agreed. And there was plenty of discussion in the original PR, where this was all vetted:

macports/macports-legacy-support#58

@catap
Copy link
Contributor

catap commented Dec 15, 2023

@mascguy, @fhgwright after reading implementation of clock_get_time(SYSTEM_CLOCK) inside kernel I really think that this should be used instead gettimeofday(), I'll try to make a PR to fix it this or the next week.

@fhgwright
Copy link
Contributor Author

@mascguy, @fhgwright after reading implementation of clock_get_time(SYSTEM_CLOCK) inside kernel I really think that this should be used instead gettimeofday(), I'll try to make a PR to fix it this or the next week.

I already have a commit for that locally, but I haven't tested it widely enough for a PR yet. I have other higher-priority things to attend to, but that can probably happen within the next day or two.

@catap
Copy link
Contributor

catap commented Dec 15, 2023

@mascguy, @fhgwright after reading implementation of clock_get_time(SYSTEM_CLOCK) inside kernel I really think that this should be used instead gettimeofday(), I'll try to make a PR to fix it this or the next week.

I already have a commit for that locally, but I haven't tested it widely enough for a PR yet. I have other higher-priority things to attend to, but that can probably happen within the next day or two.

When you will be the first.

@fhgwright
Copy link
Contributor Author

This is way past the maintainer timeout. Avoiding it just because a proper legacy-support release is expected Real Soon Now (TM) isn't helping anyone. This should have been done three months ago, but better late than never.

Versions 1.1.0 and 1.1.1 fail to build on 10.4.  This hack should
be removed once there's a release that works on 10.4.

TESTED:
Tested on 10.4-10.5 ppc, 10.4-10.6 i386, 10.5-10.15 x86_64, and
11.x-14.x arm64.
As before, 1.0.13 on 10.4 builds and passes tests in all cases except
ppc +universal, which fails to build.
On 10.5+, the unchanged 1.1.1 still builds in all cases and fails
tests in some.
@kencu
Copy link
Contributor

kencu commented Dec 23, 2023

agreed

@kencu kencu merged commit 066c6e8 into macports:master Dec 23, 2023
3 checks passed
@fhgwright fhgwright deleted the legacy-support branch December 23, 2023 08:17
@barracuda156
Copy link
Contributor

Won't investigate now, so FWIW, gawk fails to build on my Tiger system with legacy-support 1.0.13:

/bin/sh ./libtool  --tag=CC   --mode=link /opt/local/bin/gcc-apple-4.2 -std=gnu99  -pipe -Os -arch ppc -Wall -DNDEBUG -module -avoid-version -no-undefined -Wl,-headerpad_max_install_names -L/opt/local/lib -lMacportsLegacySupport -arch ppc -o revtwoway.la -rpath /opt/local/lib/gawk revtwoway.lo -lintl -Wl,-framework -Wl,CoreFoundation 
libtool: link: /opt/local/bin/gcc-apple-4.2 -std=gnu99  -o .libs/readdir.so -bundle  .libs/readdir.o   -L/opt/local/lib -lMacportsLegacySupport /opt/local/lib/libintl.dylib  -Os -arch ppc -Wl,-headerpad_max_install_names -arch ppc -Wl,-framework -Wl,CoreFoundation  
Undefined symbols:
  "_fdopendir$UNIX2003", referenced from:
      _dir_take_control_of in readdir.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[4]: *** [readdir.la] Error 1

@fhgwright
Copy link
Contributor Author

Won't investigate now, so FWIW, gawk fails to build on my Tiger system with legacy-support 1.0.13:

That's certainly not the fault of this PR.

@catap
Copy link
Contributor

catap commented Dec 24, 2023

Won't investigate now, so FWIW, gawk fails to build on my Tiger system with legacy-support 1.0.13:

That's certainly not the fault of this PR.

Unfortently v1.1.0 if I recall right contains significant improvement and rework near dirent.h and your PR dismiss all that work.

@fhgwright
Copy link
Contributor Author

@barracuda156

Won't investigate now, so FWIW, gawk fails to build on my Tiger system with legacy-support 1.0.13:

That's certainly not the fault of this PR.

@catap

Unfortently v1.1.0 if I recall right contains significant improvement and rework near dirent.h and your PR dismiss all that work.

As I stated in the original commit message, neither v1.1.0 nor v1.1.1 is able to build on 10.4 at all. In fact, from the history it appears that v1.1.0 was so awful that it was rolled back literally within minutes of its introduction (without explanation).

@barracuda156
Copy link
Contributor

@fhgwright Well, I am not saying to roll back to 1.1.0. What I think is that rather than rolling back to some broken version we need a pending fix merged and have the same legacysupport for all, fixed one. Which, more importantly, fixes legacysupport on 10.5 (and 10.6 for ppc), likely some other systems.

@fhgwright
Copy link
Contributor Author

@barracuda156

@fhgwright Well, I am not saying to roll back to 1.1.0. What I think is that rather than rolling back to some broken version we need a pending fix merged and have the same legacysupport for all, fixed one. Which, more importantly, fixes legacysupport on 10.5 (and 10.6 for ppc), likely some other systems.

Look, this is not a "rollback". This is simply a recognition that there is no release version of legacy-support later than 1.0.13 that actually works on 10.4. Making a later version the target simply causes even more breakage (as detailed above).

I have better things to do than to keep arguing about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintainer: open Affects an openmaintainer port type: bugfix
7 participants