Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows image has system time delay of 40 seconds #2792

Open
SabineMa opened this Issue Mar 11, 2019 · 22 comments

Comments

Projects
None yet
6 participants
@SabineMa
Copy link

SabineMa commented Mar 11, 2019

I have the problem that on windows, Time now differs from the system time. At one instance, I have a difference of 40 seconds! (Pharo7) When searching for it, I found this: http://forum.world.st/System-time-td3486236.html

Starting a new image, there is no gap. The image with the 40 seconds gap is running since two weeks
the image is 40 seconds behind the windows system time.

I had a conversation with lamneth on discord and I copy it here:

Bildschirmfoto 2019-03-11 um 13 48 20
Bildschirmfoto 2019-03-11 um 13 48 30

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 11, 2019

would be very nice if this could be fixed

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 11, 2019

@sabine Manaa Duration>>#seconds:nanoSeconds: seems like the 1 suspect with 2 while loops waiting for some magic to happen! That's where we lose those previous microseconds!

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 11, 2019

and I attach a screenshot as "proof"

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 11, 2019

Bildschirmfoto 2019-03-11 um 13 54 43

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 11, 2019

so, it is 35 seconds and not 40. Still enough

@svenvc

This comment has been minimized.

Copy link
Contributor

svenvc commented Mar 11, 2019

@bstjean

This comment has been minimized.

Copy link

bstjean commented Mar 11, 2019

From what can be read in the comments, seems like invoking the primitive 240 (and others) is not always a system call. Time seems to be managed by a heartbeat thread.

From the source at:
https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c

Comment for primitiveUTCMicrosecondClock (primitive 240) in the C code :

/* Return the value of the microsecond clock in the local timezone, as
updated by the heartbeat, as an integer.
This is the number of microseconds since the Smalltalk epoch, 1901/1/1
12:00am. The microsecond clock is at least 60 bits wide which means it'll
get to around August
38435 before it wraps around. Be sure to put it on your calendar. The
coarse clock is
updated by the heartbeat thread and as such is much cheaper than
primitiveUTCMicrosecondClock, which always entails a system call. */

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 12, 2019

in 16 hours it looses about 2 seconds (Pharo 7 on Windows Server 2012 R2, aws).

This lead to a 40 seconds delay on our image running since 25.2.2019.

Interesting: on another server, same hardware another image but Pharo6 with threaded heartbeat lost "only" 22 seconds in the same time
Bildschirmfoto 2019-03-12 um 09 27 39

I made the excel with this (thank you Sven)

Smalltalk at: #stopRKACheck put: false.
([ ZTimestampPreciseSNTPClient new
  enforceClockDifference: 30 seconds
  ifFail: [ :delta |
   RKALogger log: ('Clock difference {1} > 2s' format: { delta }) ];
  close.
	(Delay forSeconds: 60) wait.
	 Smalltalk at: #stopRKACheck
		] whileFalse) fork..

everything only approximately

@bencoman

This comment has been minimized.

Copy link
Contributor

bencoman commented Mar 12, 2019

Is your Windows Server installed on bare-metal or virtualized?

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 12, 2019

It is virtualized, it is a Amazon EC2 instance

@bencoman

This comment has been minimized.

Copy link
Contributor

bencoman commented Mar 12, 2019

Can you overlay the same graph with the clock difference generated from the shell.
It could be the underlying OS rather than Pharo...

You may find this interesting...
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

@bencoman

This comment has been minimized.

Copy link
Contributor

bencoman commented Mar 12, 2019

I am running the similar test on my desktop...

Metacello new
  baseline: 'ZTimestamp';
  repository: 'github://svenvc/ztimestamp';
  load.

World menu > System > Process Browser

timeRef := ZTimestampPreciseSNTPClient new.
delayTime := 60.
samples := OrderedCollection new.
startTime := timeRef remoteClock.
[	|sampleTime sample|
	[ 	sampleTime := timeRef remoteClock.
		sampleOffset := (sampleTime - startTime).
		samples add: sampleOffset -> timeRef clockDifference.
		(Delay forSeconds: delayTime) wait.
	] repeat
] forkNamed: 'Time loss check'.
samples inspect.
@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 12, 2019

interesting. But I assume this is not my problem.

comparing the time of my mac with the time of the aws instance shows only 5 seconds difference.
Bildschirmfoto 2019-03-12 um 12 44 52

If there would be a problem with the aws instances time, there should be also a gap of about 22 seconds. Or do I miss something here?

Running your tests now on the aws instance with the 22 minutes delay shows the following:
Bildschirmfoto 2019-03-12 um 12 52 45

The clock difference is slowly growing. The difference of the aws instance and my mac seems to be constant.
BTW We have medium instances and not small.

I will try now on a windows system here which is not virtualized.

@astares

This comment has been minimized.

Copy link
Member

astares commented Mar 12, 2019

This is on my local non-virtualized machine:

image

What AWS region/data center do you use?

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 12, 2019

@astares can you try with bens code please?
how long is the image running?

I took a new pharo 7.0.2 image, loaded ZTimestamp s from the catalog and then started the code of ben. This is the result:

image-2

(Array with: Time now with: (WinProcess resultOfCommand: 'time')) inspect

also shows the same times

I keep it running and come back with the result tomorrow. Then I know if the gap increases.

I use eu-central

@eliotmiranda

This comment has been minimized.

Copy link

eliotmiranda commented Mar 12, 2019

Hi Sabine, see http://forum.world.st/Time-millisecondClockValue-was-The-Trunk-Morphic-mt-1080-mcz-tp4877661p4877918.html. IMO something like this still has to be implemented.

@bencoman

This comment has been minimized.

Copy link
Contributor

bencoman commented Mar 12, 2019

My results, local Win10 desktop,
0:00:49:41.212401861->0:00:00:03.164461029
0:01:40:26.602295480->0:00:00:03.193587586
0:02:31:10.393228559->0:00:00:03.224709307
0:03:21:52.585352708->0:00:00:03.254618240
0:04:12:34.434387566->0:00:00:03.284446966
0:05:03:16.783565619->0:00:00:03.312806326
So 0.16 seconds drift in five hours (but I haven't checked OS local time drift yet myself)

@bstjean

This comment has been minimized.

Copy link

bstjean commented Mar 12, 2019

Wrote a quick script (for Pharo 5.0) to collect timings. Will let it run for a few days.. I am on Windows 10.

| delayTime oLatency oStr oTime iLatency iStr iTime pTime pStr target stream f fStream fName comma piDiff poDiff oSec iSec pSec |
" *********************************************************************
Prerequisite:

Gofer new
smalltalkhubUser: 'OS' project: 'OS-Windows';
package: 'ConfigurationOfOSWindows';
load.
(Smalltalk at: #ConfigurationOfOSWindows) load

********************************************************************* "
comma := ','.
delayTime := 120.
fName := 'time-difference.csv'.
f := (File named: fName) openForAppend.
f nextPutAll: 'otime, osec, olatency, itime, isec, ilatency, ptime, psec, p-i, p-o'; cr.
f close.

[[
"Get all different times we need"
oLatency := Time millisecondsToRun: [oStr := (WinProcess resultOfCommand: 'echo %date% %time%')].
iLatency := Time millisecondsToRun: [iStr := (ZnClient new url: 'http://worldtimeapi.org/') get].
pTime := DateAndTime now.

"Pharo time"
pStr := pTime printString.

"OS time"
oStr := oStr trimBoth copyReplaceAll: ',' with: '.'.
oTime := DateAndTime fromString: oStr.

"Internet time"
target := 'The unixtime is:'.
stream := iStr readStream.
stream position: (stream positionOfSubCollection: target) + target size.
stream upTo: $".
stream next.
iStr := stream upTo: $<.
iTime := DateAndTime fromString: iStr.

"Write timing info to CSV file"
f := (File named: fName) openForAppend.
oSec := oTime asSeconds.
iSec := iTime asSeconds.
pSec := pTime asSeconds.
piDiff := pSec - iSec.
poDiff := pSec - oSec.
f nextPutAll: oStr, comma, oSec printString, comma, oLatency printString, comma, iStr, comma, iSec printString, comma, iLatency printString, comma.
f nextPutAll: pTime printString, comma, pSec printString, comma, piDiff printString, comma, poDiff printString.
f cr; close.

(Delay forSeconds: delayTime) wait.
] repeat] forkNamed: 'Time difference problem process'.

@svenvc

This comment has been minimized.

Copy link
Contributor

svenvc commented Mar 12, 2019

Benoit, the following:

"Get all different times we need"
oLatency := Time millisecondsToRun: [oStr := (WinProcess resultOfCommand: 'echo %date% %time%')].
iLatency := Time millisecondsToRun: [iStr := (ZnClient new url: 'http://worldtimeapi.org/') get].
pTime := DateAndTime now.

Won't do, this is way to inaccurate, it just takes too long the execute a subprocess or an HTTP call, there is a reason SNTP exists.

@svenvc

This comment has been minimized.

Copy link
Contributor

svenvc commented Mar 12, 2019

But it should be way less, on my machine, it is less than 0.03 seconds, although it also went up a but over time, it also went down sometimes (I left one image open unattended for some time while working on other stuff on the same machine)

0:00:00:00.0596007->0:00:00:00.013950757 
0:00:01:00.107850982->0:00:00:00.016683098 
0:00:02:00.164224743->0:00:00:00.018697383 
0:00:03:00.212188175->0:00:00:00.02041713 
0:00:04:00.270209449->0:00:00:00.021989721 
0:00:05:00.324946127->0:00:00:00.021892737 
0:00:06:00.372803932->0:00:00:00.022304285 
0:00:07:00.414584319->0:00:00:00.023938578 
0:00:08:00.460686074->0:00:00:00.024793945 
0:00:09:00.505314576->0:00:00:00.02738841 
0:00:10:00.555243517->0:00:00:00.029270263 
0:00:11:00.601851496->0:00:00:00.02998618 
0:00:12:00.643173077->0:00:00:00.028739454 
0:00:13:00.691288118->0:00:00:00.038389864

BTW, here is the result on a long running Linux image

$ ./repl.sh 
Connecting to Telnet REPL at locahost:41011
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Neo Console Pharo-7.0+alpha.build.660.sha.2eb9bd2f41e7b0bd8f9f4190906910f83c178ab1 (32 Bit)
pharo> get system.uptime
105 days 2 hours 48 minutes
pharo> ZTimestampPreciseSNTPClient new clockDifference 

0:00:00:00.011700557
pharo> 
Bye!
Connection closed by foreign host.
@bstjean

This comment has been minimized.

Copy link

bstjean commented Mar 12, 2019

@SabineMa

This comment has been minimized.

Copy link
Author

SabineMa commented Mar 13, 2019

It lost the time also on the local windows 10 machine here from yesterday to today:
approx 0.5 sconds in 19 hours
will be approx 16 seconds in one month.
I thought this is the proof that it is not a aws issue.

see the screens below.

Bildschirmfoto 2019-03-13 um 10 15 56

Bildschirmfoto 2019-03-13 um 10 16 06

I don't know why but the windows 10 machine had a difference of 38 seconds from the beginning. I was taking a complete fresh Pharo 7.0.2 .

I wanted to make the same without the initial 38 seconds gap and started the new pharo image again on the windows 10 machine. When I started it again, there was no initial time difference and over time also NO gap. I dont understand this. Drives me crazy.

I will now start new images on our servers in a regular interval so that the gap is not too big. e.g. I use cookies for login which are valid only for a short time and when the time difference gets too big, people can not log in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.