Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows image has system time delay of 40 seconds #2792

Closed
SabineMa opened this issue Mar 11, 2019 · 31 comments
Closed

Windows image has system time delay of 40 seconds #2792

SabineMa opened this issue Mar 11, 2019 · 31 comments
Labels
Priority: Critical To fix or review as soon as possible Type: Bug

Comments

@SabineMa
Copy link
Contributor

I have the problem that on windows, Time now differs from the system time. At one instance, I have a difference of 40 seconds! (Pharo7) When searching for it, I found this: http://forum.world.st/System-time-td3486236.html

Starting a new image, there is no gap. The image with the 40 seconds gap is running since two weeks
the image is 40 seconds behind the windows system time.

I had a conversation with lamneth on discord and I copy it here:

Bildschirmfoto 2019-03-11 um 13 48 20
Bildschirmfoto 2019-03-11 um 13 48 30

@SabineMa
Copy link
Contributor Author

would be very nice if this could be fixed

@SabineMa
Copy link
Contributor Author

SabineMa commented Mar 11, 2019

@sabine Manaa Duration>>#seconds:nanoSeconds: seems like the 1 suspect with 2 while loops waiting for some magic to happen! That's where we lose those previous microseconds!

@SabineMa
Copy link
Contributor Author

and I attach a screenshot as "proof"

@SabineMa
Copy link
Contributor Author

Bildschirmfoto 2019-03-11 um 13 54 43

@SabineMa
Copy link
Contributor Author

SabineMa commented Mar 11, 2019

so, it is 35 seconds and not 40. Still enough

@svenvc
Copy link
Contributor

svenvc commented Mar 11, 2019 via email

@bstjean
Copy link

bstjean commented Mar 11, 2019

From what can be read in the comments, seems like invoking the primitive 240 (and others) is not always a system call. Time seems to be managed by a heartbeat thread.

From the source at:
https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c

Comment for primitiveUTCMicrosecondClock (primitive 240) in the C code :

/* Return the value of the microsecond clock in the local timezone, as
updated by the heartbeat, as an integer.
This is the number of microseconds since the Smalltalk epoch, 1901/1/1
12:00am. The microsecond clock is at least 60 bits wide which means it'll
get to around August
38435 before it wraps around. Be sure to put it on your calendar. The
coarse clock is
updated by the heartbeat thread and as such is much cheaper than
primitiveUTCMicrosecondClock, which always entails a system call. */

@SabineMa
Copy link
Contributor Author

SabineMa commented Mar 12, 2019

in 16 hours it looses about 2 seconds (Pharo 7 on Windows Server 2012 R2, aws).

This lead to a 40 seconds delay on our image running since 25.2.2019.

Interesting: on another server, same hardware another image but Pharo6 with threaded heartbeat lost "only" 22 seconds in the same time
Bildschirmfoto 2019-03-12 um 09 27 39

I made the excel with this (thank you Sven)

Smalltalk at: #stopRKACheck put: false.
([ ZTimestampPreciseSNTPClient new
  enforceClockDifference: 30 seconds
  ifFail: [ :delta |
   RKALogger log: ('Clock difference {1} > 2s' format: { delta }) ];
  close.
	(Delay forSeconds: 60) wait.
	 Smalltalk at: #stopRKACheck
		] whileFalse) fork..

everything only approximately

@bencoman
Copy link
Contributor

Is your Windows Server installed on bare-metal or virtualized?

@SabineMa
Copy link
Contributor Author

It is virtualized, it is a Amazon EC2 instance

@bencoman
Copy link
Contributor

bencoman commented Mar 12, 2019

Can you overlay the same graph with the clock difference generated from the shell.
It could be the underlying OS rather than Pharo...

You may find this interesting...
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

@bencoman
Copy link
Contributor

bencoman commented Mar 12, 2019

I am running the similar test on my desktop...

Metacello new
  baseline: 'ZTimestamp';
  repository: 'github://svenvc/ztimestamp';
  load.

World menu > System > Process Browser

timeRef := ZTimestampPreciseSNTPClient new.
delayTime := 60.
samples := OrderedCollection new.
startTime := timeRef remoteClock.
[	|sampleTime sample|
	[ 	sampleTime := timeRef remoteClock.
		sampleOffset := (sampleTime - startTime).
		samples add: sampleOffset -> timeRef clockDifference.
		(Delay forSeconds: delayTime) wait.
	] repeat
] forkNamed: 'Time loss check'.
samples inspect.

@SabineMa
Copy link
Contributor Author

SabineMa commented Mar 12, 2019

interesting. But I assume this is not my problem.

comparing the time of my mac with the time of the aws instance shows only 5 seconds difference.
Bildschirmfoto 2019-03-12 um 12 44 52

If there would be a problem with the aws instances time, there should be also a gap of about 22 seconds. Or do I miss something here?

Running your tests now on the aws instance with the 22 minutes delay shows the following:
Bildschirmfoto 2019-03-12 um 12 52 45

The clock difference is slowly growing. The difference of the aws instance and my mac seems to be constant.
BTW We have medium instances and not small.

I will try now on a windows system here which is not virtualized.

@astares
Copy link
Member

astares commented Mar 12, 2019

This is on my local non-virtualized machine:

image

What AWS region/data center do you use?

@SabineMa
Copy link
Contributor Author

SabineMa commented Mar 12, 2019

@astares can you try with bens code please?
how long is the image running?

I took a new pharo 7.0.2 image, loaded ZTimestamp s from the catalog and then started the code of ben. This is the result:

image-2

(Array with: Time now with: (WinProcess resultOfCommand: 'time')) inspect

also shows the same times

I keep it running and come back with the result tomorrow. Then I know if the gap increases.

I use eu-central

@eliotmiranda
Copy link

Hi Sabine, see http://forum.world.st/Time-millisecondClockValue-was-The-Trunk-Morphic-mt-1080-mcz-tp4877661p4877918.html. IMO something like this still has to be implemented.

@bencoman
Copy link
Contributor

bencoman commented Mar 12, 2019

My results, local Win10 desktop,
0:00:49:41.212401861->0:00:00:03.164461029
0:01:40:26.602295480->0:00:00:03.193587586
0:02:31:10.393228559->0:00:00:03.224709307
0:03:21:52.585352708->0:00:00:03.254618240
0:04:12:34.434387566->0:00:00:03.284446966
0:05:03:16.783565619->0:00:00:03.312806326
So 0.16 seconds drift in five hours (but I haven't checked OS local time drift yet myself)

@bstjean
Copy link

bstjean commented Mar 12, 2019

Wrote a quick script (for Pharo 5.0) to collect timings. Will let it run for a few days.. I am on Windows 10.

| delayTime oLatency oStr oTime iLatency iStr iTime pTime pStr target stream f fStream fName comma piDiff poDiff oSec iSec pSec |
" *********************************************************************
Prerequisite:

Gofer new
smalltalkhubUser: 'OS' project: 'OS-Windows';
package: 'ConfigurationOfOSWindows';
load.
(Smalltalk at: #ConfigurationOfOSWindows) load

********************************************************************* "
comma := ','.
delayTime := 120.
fName := 'time-difference.csv'.
f := (File named: fName) openForAppend.
f nextPutAll: 'otime, osec, olatency, itime, isec, ilatency, ptime, psec, p-i, p-o'; cr.
f close.

[[
"Get all different times we need"
oLatency := Time millisecondsToRun: [oStr := (WinProcess resultOfCommand: 'echo %date% %time%')].
iLatency := Time millisecondsToRun: [iStr := (ZnClient new url: 'http://worldtimeapi.org/') get].
pTime := DateAndTime now.

"Pharo time"
pStr := pTime printString.

"OS time"
oStr := oStr trimBoth copyReplaceAll: ',' with: '.'.
oTime := DateAndTime fromString: oStr.

"Internet time"
target := 'The unixtime is:'.
stream := iStr readStream.
stream position: (stream positionOfSubCollection: target) + target size.
stream upTo: $".
stream next.
iStr := stream upTo: $<.
iTime := DateAndTime fromString: iStr.

"Write timing info to CSV file"
f := (File named: fName) openForAppend.
oSec := oTime asSeconds.
iSec := iTime asSeconds.
pSec := pTime asSeconds.
piDiff := pSec - iSec.
poDiff := pSec - oSec.
f nextPutAll: oStr, comma, oSec printString, comma, oLatency printString, comma, iStr, comma, iSec printString, comma, iLatency printString, comma.
f nextPutAll: pTime printString, comma, pSec printString, comma, piDiff printString, comma, poDiff printString.
f cr; close.

(Delay forSeconds: delayTime) wait.
] repeat] forkNamed: 'Time difference problem process'.

@svenvc
Copy link
Contributor

svenvc commented Mar 12, 2019

Benoit, the following:

"Get all different times we need"
oLatency := Time millisecondsToRun: [oStr := (WinProcess resultOfCommand: 'echo %date% %time%')].
iLatency := Time millisecondsToRun: [iStr := (ZnClient new url: 'http://worldtimeapi.org/') get].
pTime := DateAndTime now.

Won't do, this is way to inaccurate, it just takes too long the execute a subprocess or an HTTP call, there is a reason SNTP exists.

@svenvc
Copy link
Contributor

svenvc commented Mar 12, 2019

But it should be way less, on my machine, it is less than 0.03 seconds, although it also went up a but over time, it also went down sometimes (I left one image open unattended for some time while working on other stuff on the same machine)

0:00:00:00.0596007->0:00:00:00.013950757 
0:00:01:00.107850982->0:00:00:00.016683098 
0:00:02:00.164224743->0:00:00:00.018697383 
0:00:03:00.212188175->0:00:00:00.02041713 
0:00:04:00.270209449->0:00:00:00.021989721 
0:00:05:00.324946127->0:00:00:00.021892737 
0:00:06:00.372803932->0:00:00:00.022304285 
0:00:07:00.414584319->0:00:00:00.023938578 
0:00:08:00.460686074->0:00:00:00.024793945 
0:00:09:00.505314576->0:00:00:00.02738841 
0:00:10:00.555243517->0:00:00:00.029270263 
0:00:11:00.601851496->0:00:00:00.02998618 
0:00:12:00.643173077->0:00:00:00.028739454 
0:00:13:00.691288118->0:00:00:00.038389864

BTW, here is the result on a long running Linux image

$ ./repl.sh 
Connecting to Telnet REPL at locahost:41011
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Neo Console Pharo-7.0+alpha.build.660.sha.2eb9bd2f41e7b0bd8f9f4190906910f83c178ab1 (32 Bit)
pharo> get system.uptime
105 days 2 hours 48 minutes
pharo> ZTimestampPreciseSNTPClient new clockDifference 

0:00:00:00.011700557
pharo> 
Bye!
Connection closed by foreign host.

@bstjean
Copy link

bstjean commented Mar 12, 2019 via email

@SabineMa
Copy link
Contributor Author

It lost the time also on the local windows 10 machine here from yesterday to today:
approx 0.5 sconds in 19 hours
will be approx 16 seconds in one month.
I thought this is the proof that it is not a aws issue.

see the screens below.

Bildschirmfoto 2019-03-13 um 10 15 56

Bildschirmfoto 2019-03-13 um 10 16 06

I don't know why but the windows 10 machine had a difference of 38 seconds from the beginning. I was taking a complete fresh Pharo 7.0.2 .

I wanted to make the same without the initial 38 seconds gap and started the new pharo image again on the windows 10 machine. When I started it again, there was no initial time difference and over time also NO gap. I dont understand this. Drives me crazy.

I will now start new images on our servers in a regular interval so that the gap is not too big. e.g. I use cookies for login which are valid only for a short time and when the time difference gets too big, people can not log in.

@SabineMa
Copy link
Contributor Author

SabineMa commented Aug 29, 2019

I want to bring this up again. A summary of the history of this thread is that on windows, the time of the Image (Time now) gets a gap to NTP Time which grows by several seconds each day.

This has already be discussed here
http://forum.world.st/Time-millisecondClockValue-was-The-Trunk-Morphic-mt-1080-mcz-tp4877661p4877918.html
and here http://forum.world.st/Windows-image-has-system-time-delay-td5096643.html#a5096652

I start my production server images new each week since 6 months now because of this issue.

And after a discussion at ESUG yesterday in the evening here again: yes, I could find ways for workarounds. But I think this is a serious issue which should be fixed because if you can not rely on the time in the image you can get various problems.

I have put a link to this issue here pharo-project/pharo-vm#12 (did not know that it is a vm issue when reporting it 6 months ago).

I really would appreciate having a good solution for this.

Time millisecondClockValue (was: The Trunk: Morphic-mt.1080.mcz). > On 15.02.2016, at 15:58, commits@... wrote: > > > Fixes a regression in Morphic's inter-cycle delay. Hacking...
Windows image has system time delay. Is there anyone else aware of this problem ?? https://github.com//issues/2792#issuecomment-471902972 -- ----------------- Benoît St-Jean Yahoo!...

@guillep guillep added Type: Bug Priority: Critical To fix or review as soon as possible System labels Mar 30, 2020
@guillep
Copy link
Member

guillep commented Mar 30, 2020

I've raised it as IMPORTANT

@SabineMa
Copy link
Contributor Author

SabineMa commented Feb 4, 2021

sth new here with the new vm?

@eliotmiranda
Copy link

eliotmiranda commented Feb 4, 2021 via email

@SabineMa
Copy link
Contributor Author

SabineMa commented Feb 4, 2021

hi Sabine,
On Feb 4, 2021, at 12:59 AM, Sabine @.***> wrote:  sth new here with the new vm?
the OpenSmalltalk-vm has moved to a newer higher resolution clock API on Windows, GetSystemTimePreciseAsFileTime, so the issue has been resolved in that source tree.

Hi Eliot,

thank you for the fast answer! I am not following the details of the VM discussions, for this reason my simple question: is this the vm which will be delivered some day with Pharo? Or is this another vm tree?

regards
Sabine

@eliotmiranda
Copy link

eliotmiranda commented Feb 4, 2021 via email

@SabineMa
Copy link
Contributor Author

SabineMa commented Feb 4, 2021

thank you for the information, @eliotmiranda

@guillep
Copy link
Member

guillep commented Apr 2, 2021

Hi @SabineMa I've been testing this for a couple weeks in Pharo9 with the lastest headless VM on Windows 10 (ARM64 using a Surface Pro X) and I have not seen any divergence in time between the system time and the one reported by Pharo (using DateAndTime now).

This will be available in the upcoming Pharo 9 release. I'll close this issue for now, we can reopen it if necessary later.

@guillep guillep closed this as completed Apr 2, 2021
@SabineMa
Copy link
Contributor Author

Hi Guille, sorry for coming bak so late now. We are not yet working with Pharo9. I will check it when we move to Pharo9. Thanks a lot for your work on this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Critical To fix or review as soon as possible Type: Bug
Projects
None yet
Development

No branches or pull requests

7 participants