Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smooth sntp timechanges (IDFGH-10641) #11873

Open
morgana2313 opened this issue Jul 14, 2023 · 14 comments
Open

smooth sntp timechanges (IDFGH-10641) #11873

morgana2313 opened this issue Jul 14, 2023 · 14 comments
Assignees
Labels
Awaiting Response awaiting a response from the author Status: Opened Issue is new Type: Feature Request Feature request for IDF

Comments

@morgana2313
Copy link

morgana2313 commented Jul 14, 2023

Is your feature request related to a problem?

The esp-idf lwip sntp code adjusts the boot_time (via newlib/time.c:adjtime) for every ntp reply it receives. This is somewhat crude, and makes the boot_time jump up and down with the (network) latency jitter on the latest ntp reply.

To make things worse, newlib/time.c:adjtime tends to adjust the boot_time with the maximum 15_625 usec step when a time function (gettimeofday) is called.

Describe the solution you'd like.

The RTC clock would be much smoother if:

  1. SNTP adjusts the RTC clock frequency and not actual time via boot_time. The hardware manual mentions that the RTC_SLOW_CLK is frequency is adjustable?

SNTP in nodemcu appears to use an adjustable clock rate on ESP/ESP32: [1] and [2]:

This module can compensate for the underlying clock not running at exactly the required rate. The adjustment is in steps of 1 part in 2^32 (i.e. around 0.25 ppb). This adjustment is done automatically if the sntp.sync() is called with the autorepeat flag set. The rate is settable using the set() function below. When the platform is booted, it defaults to 0 (i.e. nominal). A sample of modules shows that the actual clock rate is temperature dependant, but is normally within 5ppm of the nominal rate. This translates to around 15 seconds per month.

  1. Clock-disicpline with a PLL was used. sntp in nodemcu appears to use this as well on ESP32: [1]

Additional context.

I (29524) sn_time: 2023-07-14 10:23:53.166652 => delta:3288
I (44544) sn_time: 2023-07-14 10:24:08.186546 => delta:544
I (59564) sn_time: 2023-07-14 10:24:23.206837 => delta:-117
I (74584) sn_time: 2023-07-14 10:24:38.226897 => delta:306
I (89604) sn_time: 2023-07-14 10:24:53.251079 => delta:131
I (104634) sn_time: 2023-07-14 10:25:08.277078 => delta:87
I (119654) sn_time: 2023-07-14 10:25:23.297100 => delta:45
I (134674) sn_time: 2023-07-14 10:25:38.317357 => delta:204
I (149694) sn_time: 2023-07-14 10:25:53.337909 => delta:-653
I (164724) sn_time: 2023-07-14 10:26:08.367555 => delta:843
I (179744) sn_time: 2023-07-14 10:26:23.387485 => delta:43
I (194764) sn_time: 2023-07-14 10:26:38.407605 => delta:125
I (209784) sn_time: 2023-07-14 10:26:53.427703 => delta:48
I (224804) sn_time: 2023-07-14 10:27:08.447954 => delta:163
I (239824) sn_time: 2023-07-14 10:27:23.472600 => delta:-31
I (254854) sn_time: 2023-07-14 10:27:38.497943 => delta:-165
I (269874) sn_time: 2023-07-14 10:27:53.518400 => delta:-1253 <==== back
I (284904) sn_time: 2023-07-14 10:28:08.551199 => delta:1633  <===== forwards
I (299934) sn_time: 2023-07-14 10:28:23.582582 => delta:456
I (314964) sn_time: 2023-07-14 10:28:38.608522 => delta:-51
I (329984) sn_time: 2023-07-14 10:28:53.630121 => delta:32
I (345014) sn_time: 2023-07-14 10:29:08.658714 => delta:194
I (360034) sn_time: 2023-07-14 10:29:23.678893 => delta:63
I (375054) sn_time: 2023-07-14 10:29:38.700546 => delta:99
I (390084) sn_time: 2023-07-14 10:29:53.730622 => delta:-77
I (405114) sn_time: 2023-07-14 10:30:08.758977 => delta:292
I (420134) sn_time: 2023-07-14 10:30:23.779957 => delta:15
I (435164) sn_time: 2023-07-14 10:30:38.809235 => delta:148
I (450184) sn_time: 2023-07-14 10:30:53.829423 => delta:181
I (465204) sn_time: 2023-07-14 10:31:08.849585 => delta:18
I (480224) sn_time: 2023-07-14 10:31:23.869592 => delta:104
I (495244) sn_time: 2023-07-14 10:31:38.889729 => delta:-68
I (510264) sn_time: 2023-07-14 10:31:53.909615 => delta:-3357 <==== back
I (525294) sn_time: 2023-07-14 10:32:08.939931 => delta:3719 <===== forwards
I (540314) sn_time: 2023-07-14 10:32:23.959807 => delta:49
I (555334) sn_time: 2023-07-14 10:32:38.979908 => delta:144
I (570354) sn_time: 2023-07-14 10:32:54.000191 => delta:183
I (585374) sn_time: 2023-07-14 10:33:09.020238 => delta:37
I (600394) sn_time: 2023-07-14 10:33:24.042006 => delta:71
I (615424) sn_time: 2023-07-14 10:33:39.071071 => delta:-615 <==== back
I (630454) sn_time: 2023-07-14 10:33:54.100612 => delta:809 <===== forwards
I (645474) sn_time: 2023-07-14 10:34:09.120410 => delta:94
I (660494) sn_time: 2023-07-14 10:34:24.140744 => delta:40
I (675514) sn_time: 2023-07-14 10:34:39.160683 => delta:169
I (690534) sn_time: 2023-07-14 10:34:54.181013 => delta:-1521  <==== back
I (705564) sn_time: 2023-07-14 10:35:09.214118 => delta:1641  <===== forwards
I (720594) sn_time: 2023-07-14 10:35:24.241221 => delta:56
I (735614) sn_time: 2023-07-14 10:35:39.261133 => delta:11
I (750634) sn_time: 2023-07-14 10:35:54.281261 => delta:208
@morgana2313 morgana2313 added the Type: Feature Request Feature request for IDF label Jul 14, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jul 14, 2023
@github-actions github-actions bot changed the title smooth sntp timechanges smooth sntp timechanges (IDFGH-10641) Jul 14, 2023
@david-cermak
Copy link
Collaborator

The esp-idf lwip sntp code adjusts the boot_time (via newlib/time.c:adjtime) for every ntp reply it receives.

Just to make sure, have you tried the SNTP_SYNC_MODE_SMOOTH option?

} else if (sntp_sync_mode == SNTP_SYNC_MODE_SMOOTH) {
struct timeval tv_now;
gettimeofday(&tv_now, NULL);
int64_t cpu_time = (int64_t)tv_now.tv_sec * 1000000L + (int64_t)tv_now.tv_usec;
int64_t sntp_time = (int64_t)tv->tv_sec * 1000000L + (int64_t)tv->tv_usec;
int64_t delta = sntp_time - cpu_time;
struct timeval tv_delta = { .tv_sec = delta / 1000000L, .tv_usec = delta % 1000000L };

Note that the callback is declared as weak

https://github.com/espressif/esp-idf/blob/28167ea5a3cdd49227216c4131a9df7f4eda260e/components/lwip/apps/sntp/sntp.c#L40C16-L40C16

so you can also implement your own method of adjusting time.

@morgana2313
Copy link
Author

Thank you David for your reply!

There are three different improvements possible:

The ESP32 itself, after rtc calibration, has a drift of ±100ppm (8 sec/day) and will vary about 1400ppm per 1°C the temperature.

SNTP_SYNC_MODE_SMOOTH does not address the core of the problem; the clock rate will just be way off at certain times.

Changing the clock rate is better than (smoothly) adjusting the boot_time. (Adjusting the boot_time is like trying to keep a constant 50 meter distance between you and the car ahead of you, using a cruise-control with only one fixed speed setting. Every LWIP_SNTP_UPDATE_DELAY you measure the distance (with a random (network) jitter error up to 10 meters) and do some breaking/acceleration to adjust the distance and then stubbornly re-engage the cruise-control at the wrong speed. For a continuous smooth ride you want a fine grained cruise-control.)

The rate the esp_rtc_get_time_us clock advances with every RTC_CNTL_TIME0_REG tick seems to be controlled by RTC_SLOW_CLK_CAL_REG, is that right? Is using RTC_CNTL_TIME0_REG portable to other ESP platforms than the esp32-s3 that were using?

  1. Improvement: use RTC_SLOW_CLK_CAL_REG to change the clock rate.

  2. Improvement: the callback is only called with the t2 timeval. Add the others, necessary to reduce the effects of the (network) jitter.

  3. Improvement: to get a really smooth clock you need the intricate NTP clock discipline algorithm with a PLL/FLL. RTC_SLOW_CLK_CAL_REG is a variable frequency oscillator(VFO). The accuracy a NTP client with clock discipline can achieve is about 0.1 .. 0.5 times the network jitter.

@morgana2313
Copy link
Author

ad 2: in esp-lwip/sntp.c:sntp_process is some code that does use the other times to compute the Clock offset calculation according to RFC 4330. It's enabled with SNTP_COMP_ROUNDTRIP. that appears to be set in sntp_opts.h by default?

@GaryHeaven
Copy link

I will +1 this. I am currently looking to see what is involved in porting a Chrony client. So many of the things we need to fix are outside the scope of that weak linked function.

@KaeLL
Copy link
Contributor

KaeLL commented Aug 9, 2023

The ESP32 itself, after rtc calibration, has a drift of ±100ppm (8 sec/day) and will vary about 1400ppm per 1°C the temperature.

Where did you find this information?

use RTC_SLOW_CLK_CAL_REG to change the clock rate.

Wouldn't that interfere with the ULP co-processor?

Improvement: the callback is only called with the t2 timeval. Add the others, necessary to reduce the effects of the (network) jitter.

Seconded.

@morgana2313
Copy link
Author

  1. Improvement: the callback is only called with the t2 timeval. Add the others, necessary to reduce the effects of the (network) jitter.

It took some analysing to see what's going on in all these layers. The callback is called with the network latency corrected local time. Thats ok, but it would be better if it just passed the adjustment. Now the adjustment is added to local time, later on a (later) localtime is subtracted to calculate the boottime offset. Changing the boottime offset is bad.

  • components/lwip/lwip/src/apps/sntp/sntp.c:sntp_process:
#if SNTP_COMP_ROUNDTRIP                                                                         // always defined in sntp_opts.h
# if SNTP_CHECK_RESPONSE >= 2
  if (timestamps->recv.sec != 0 || timestamps->recv.frac != 0)
# endif
  {
    s32_t dest_sec;
    u32_t dest_frac;
    u32_t step_sec;

    /* Get the destination time stamp, i.e. the current system time */
    SNTP_GET_SYSTEM_TIME_NTP(dest_sec, dest_frac);

    step_sec = (dest_sec < sec) ? ((u32_t)sec - (u32_t)dest_sec)
               : ((u32_t)dest_sec - (u32_t)sec);
    /* In order to avoid overflows, skip the compensation if the clock step
     * is larger than about 34 years. */
    if ((step_sec >> 30) == 0) {
      s64_t t1, t2, t3, t4;

      t4 = SNTP_SEC_FRAC_TO_S64(dest_sec, dest_frac);
      t3 = SNTP_SEC_FRAC_TO_S64(sec, frac);
      t1 = SNTP_TIMESTAMP_TO_S64(timestamps->orig);
      t2 = SNTP_TIMESTAMP_TO_S64(timestamps->recv);
      /* Clock offset calculation according to RFC 4330 */
      t4 += ((t2 - t1) + (t3 - t4)) / 2;

      sec  = (s32_t)((u64_t)t4 >> 32);
      frac = (u32_t)((u64_t)t4);
    }
  }
#endif /* SNTP_COMP_ROUNDTRIP */

  SNTP_SET_SYSTEM_TIME_NTP(sec, frac);

  • components/lwip/lwip/src/apps/sntp/sntp.c:: define SNTP_SET_SYSTEM_TIME_NTP(s, f)
  • components/lwip/port/include/lwipopts.h: #define SNTP_SET_SYSTEM_TIME_US(sec, us) (sntp_set_system_time(sec, us))
  • components/lwip/apps/sntp/sntp.c:sntp_set_system_time(uint32_t sec, uint32_t us) receives a timestamp and passes it as a tv to sntp_sync_time.
  • components/lwip/apps/sntp/sntp.c:sntp_sync_time(tv): sets/adjusts the boottime and passes tv to the time_sync_notification_cb function.

@david-cermak
Copy link
Collaborator

Hi @morgana2313

we will not support the proposed update methods in the near future, but we'd like to allow users to implement various timechange algorithms. If I understand correctly, It is not possible at this moment and these are the points currently missing in IDF:

  1. Allow users to acquire t1, t2, t3 and t4 in the callback
  2. Possibility to enable SNTP_COMP_ROUNDTRIP in lwip

Is there anything else? @GaryHeaven you mentioned that some functionality that you'd like to modify is not customizable, could you be more specific?

@espressif-bot espressif-bot added the Awaiting Response awaiting a response from the author label Mar 19, 2024
@GaryHeaven
Copy link

Ideally the SNTP stack should produce a system time that is consistent with the limitations of the network jitter and the stability of the internal clock source. The current implementation does not do this. I don't think exposing the internal logic is a productive option - the internal stack should just work or, as you have done, allow an external implementation.

I have been able to use the loose bound void sntp_sync_time(struct timeval *tv) function to track a moving average of errors and thus tame the network latency and adjust for the frequency error of the internal clock. I am happy with that as a solution.

@morgana2313
Copy link
Author

morgana2313 commented Mar 20, 2024

I've set up a proper ntp synced disciplined clock for the esp32s3. The key concepts are:

  1. the ticks/second frequency jitters a bit, chip- and ambient temperature are structural influences.
  2. use a combined FLL and PLL: gettimeofday = last_gettimeofday + (ticks_since_gettimeofday * tick_FREQ) + proportional_PHASE_offset : administer the phase offset proportionally over a longer time.
  3. As all ntp measurements have big inaccuracies, only perform a small portion of the apparent offset. Use long running moving averages. An incoming ntp packet will change the tick_FREQ and PHASE offset only slightly.
  4. You can spend an infinite amount of time on getting the time right.

@david-cermak
Copy link
Collaborator

Thanks for sharing your views @morgana2313 @GaryHeaven
If I understand correctly, you both have been able to modify/implement the desired timechange algorithm with current customization options in IDF, correct?
It would be great if you can and are allowed to share some portions of your project and link it here?

@morgana2313
Copy link
Author

Yes, on top of IDF, but I had to implement my own sntp (to get the other parameters and stop sntp for actually changing the time).
I've not yet investigated how to replace newlib's gettimeofday transparently (loosly-linked?) I would appreciate hints for that so I can make a version I could share.

@david-cermak
Copy link
Collaborator

You can probably also use linker wrapping to replace the gettimeofday() implementation, adding

target_link_libraries(${COMPONENT_LIB} INTERFACE "-Wl,--wrap=_gettimeofday_r")

to your project main component makefile and define

int IRAM_ATTR __wrap__gettimeofday_r(struct _reent *r, struct timeval *tv, void *tz)
{
    // My gettod
    return 0;
}

(Note that this is a system calls from gettimeofday() that calls espressif newlib implementation in

int IRAM_ATTR _gettimeofday_r(struct _reent *r, struct timeval *tv, void *tz)

which we're wrapping)

this sounds hacky, but should work. I'll ask around if someone knows of a better way to reroute these syscalls...

@GaryHeaven
Copy link

I wrote this in the arduino framework so I only had access to IDF 1.4 but the logic is there. I also have a local time server so I am prepared to ignore the long network latency but it does need to be considered so exposing t1-t4 would be useful as an interim. In all this we must remember the ESP is a microcontroller and thus every cpu cycle should be a prisoner. I think my implementation below is too code heavy for most situations and I have only used it where I need consistent timestamps to less than 0.5 seconds. The current library is good for a minute resolution in most cases and in most cases that is good enough.

Hopefully my code comments make sense.

`// this implements the weak linked function in ESP IDF
void sntp_sync_time(struct timeval *tv)
{
// The ESP Clock follows the microsecond counter with a boot offset
// adjtime and setlocaltime functions alter that value in a thread safe way.
// SNTP time is jitterish so we have a 8 record buffer to average out the jitter.
// Separately there is a wee task that we control how often it executes and that
// task adjusts the localtime by +/- a nominal microseconds amount (e.g. +/- 50us)
// By changing the frequency of that task we hopefully get a local clock within a
// few milliseconds of NTP time fairly consistantly.

// We need a long run delta between SNTP and the underlying ESP Clock
// and a correction delta in case something bad like losing internet happens
static struct sntp_delta_t {
int64_t cpu_micros = 0;
int64_t sntp_micros = 0;
int64_t gtod_micros = 0;
//int64_t delta_micros = 0;
} sntp_deltas[8];

int64_t sntp_micros_adj = 50LL;
static int sntp_sync_count;
sntp_sync_mode_t sntp_sync_mode;
sntp_sync_mode = sntp_get_sync_mode();
uint32_t sync_interval = sntp_get_sync_interval();
struct timeval gtod_tv;
int64_t esp_time = esp_timer_get_time();
int64_t sntp_time = (int64_t)tv->tv_sec * 1000000LL + (int64_t)tv->tv_usec;

if (sntp_sync_mode == SNTP_SYNC_MODE_IMMED) {
  settimeofday(tv, NULL);
  sntp_set_sync_status(SNTP_SYNC_STATUS_COMPLETED);
}
gettimeofday(&gtod_tv, NULL);
int64_t gtod_time = (int64_t)gtod_tv.tv_sec * 1000000LL + (int64_t)gtod_tv.tv_usec;

sntp_deltas[sntp_sync_count % 8].sntp_micros = sntp_time;
sntp_deltas[sntp_sync_count % 8].cpu_micros = esp_time;
sntp_deltas[sntp_sync_count % 8].gtod_micros = gtod_time;

if (sntp_sync_mode == SNTP_SYNC_MODE_SMOOTH) {
  // Calc deltas. This will be the 'next' count for a full buffer...
  int64_t d1, d2, dms; 
  if (sntp_sync_count > 7U) {
    d1 =  sntp_deltas[sntp_sync_count % 8].sntp_micros - sntp_deltas[(sntp_sync_count + 1) % 8].sntp_micros;
    d2 =  sntp_deltas[sntp_sync_count % 8].cpu_micros - sntp_deltas[(sntp_sync_count + 1) % 8].cpu_micros;
    dms = d1 / 1000L; 
  } else {
    d1 =  sntp_deltas[sntp_sync_count % 8].sntp_micros - sntp_deltas[0].sntp_micros;
    d2 =  sntp_deltas[sntp_sync_count % 8].cpu_micros - sntp_deltas[0].cpu_micros;
    dms = d1 / 1000L; //sync_interval * sntp_sync_count;
  }

  ESP_LOGD(TAG, "sntp: %lld , cpu: %lld, dms: %lld", d1, d2, dms);

  int64_t delta = d1 - d2;
  if (delta < -100LL) {sntp_time_adj_value = 0LL-sntp_micros_adj;} 
  if (delta > 100LL) {sntp_time_adj_value = sntp_micros_adj;}
  ESP_LOGD(TAG, "Delta %lld : ", delta);
  // Calculate the right interval for the deltas
  // Range check first for stupid values
  if (delta < -30000000LL || delta > 30000000LL || sntp_sync_count == 0) {
    // TODO: Toss out any SNTP outliers once we are in sync rather than doing a settimeofday
    settimeofday(tv, NULL);
    sntp_deltas[sntp_sync_count % 8].gtod_micros = sntp_time;
    ESP_LOGD(TAG, "Time was synchronized through settimeofday");
    // TODO: re-initialise averages structure 
  } else {
    // how many sntp_micros_adj (+/- 50 us) 
    // bites of this are needed to correct?
    unsigned long new_interval;
    
    // We know our base clock drift now in the form of delta
    // Now calculate the historical difference of SNTP and GTOD and start 
    // eliminating that offset. This will be due to the averaging lag initally 
    // and then more dominantly jitter.
    int64_t gtod_delta = 0;
    
    for (sntp_delta_t i : sntp_deltas){
      gtod_delta += (i.sntp_micros - i.gtod_micros);
    } gtod_delta = gtod_delta / 8;
    int64_t new_delta = delta + gtod_delta;

    // Now work out what kind of interval we need
    if (new_delta != 0) {
      // TODO: Store the long run interval as a preference to get things stable quicker.
      new_interval = abs((long)(dms/((new_delta/sntp_micros_adj) + 1)));
      // TODO: Work out what is sensible if this becomes really short (i.e. a out of spec crystal )
      if (new_interval < 500L) {new_interval = 500L;}
      tsClockAdj.setInterval(new_interval);
      tsClockAdj.enable();
    }
    ESP_LOGD(TAG, "New Int: %ld, Adj: %lld, SNTP-GTOD: %lld, AVG: %lld ",  new_interval, sntp_time_adj_value, sntp_time - gtod_time, gtod_delta);
  }

}
sntp_sync_count++;

}
// Task Schedular task to periodically correct the clock
void alterClockMicros(){
/*
adjtime on ESP32 iterates if the time adjustment goes over a second.
For small values it executes in one step.
*/
struct timeval rem_adj;
adjtime(NULL, &rem_adj); // get any remaining adjustment
int64_t new_adj_micros = (int64_t)rem_adj.tv_sec * 1000000LL + (int64_t)rem_adj.tv_usec;
new_adj_micros += sntp_time_adj_value;
rem_adj.tv_sec = (time_t)new_adj_micros / 1000000LL;
rem_adj.tv_usec = (suseconds_t)new_adj_micros % 1000000LL;
adjtime(&rem_adj, NULL);
};
`

@david-cermak
Copy link
Collaborator

david-cermak commented Apr 5, 2024

Thanks @GaryHeaven for sharing your implementation. This is very helpful!

@morgana2313 Any news from your side?
Unfortunately there's no easy way to override the time functions at the moment.
We would however make these implementation weak, as well, if CONFIG_NEWLIB_TIME_SYSCALL_USE_NONE=y.
I'm attaching a patch for your evaluation, but you can also use the method mentioned in #11873 (comment) (and just remove the makefile hack once this fix lands on master)


fix-newlib-Allow-for-timefunc-customization.patch.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Response awaiting a response from the author Status: Opened Issue is new Type: Feature Request Feature request for IDF
Projects
None yet
Development

No branches or pull requests

5 participants