Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Forwarded from PttBug)「登入次數」的累計盲點 #104

Open
holishing opened this issue May 30, 2021 · 2 comments
Open

(Forwarded from PttBug)「登入次數」的累計盲點 #104

holishing opened this issue May 30, 2021 · 2 comments

Comments

@holishing
Copy link
Contributor

holishing commented May 30, 2021

有使用者對 2014 年底後實施的登入次數累計方法有以下疑問:
https://www.ptt.cc/bbs/PttBug/M.1622376938.A.CC0.html

故把問題轉錄過來,也順便請教看看其他開發者對此有什麼看法?

@IepIweidieng
Copy link
Contributor

實作登入次數自動增加機制的程式碼,根據以 PttBBS 原始碼為範圍、使用 ripgrep 所取得的字串搜尋結果,推測有兩處:

  • mbbsd/passwd.c pwcuLoginSave()——每日登入時增加一次(「登入時的次數增加」)
    • 次數增加條件:上一次增加登入次數的時間(cuser.lastlogin)在本次登入日 00:00 之前
  • util/update_online.c fastcheck()——單次線上期間每超過一天增加一次(「線上期間的次數增加」)
    • 次數增加條件:每天某時間點時,使用者在線上且距離上一次增加登入次數的時間超過一天。這個時間會依站臺設定而異,在 Ptt 上是 04:00。

上述兩者在增加登入次數(cuser.numlogindays)時,都會設定 cuser.lastlogin 為某個接近目前時間的時間值(「重設用目前時間值」)。

節錄自 util/update_online.c 的相關程式碼

time4_t base;
base = time4(0) - DAY_SECONDS;

if (urec.lastlogin >= base)
continue;
if (verbose)
fprintf(stderr, "update: %s (%s, %d) ->", urec.userid,
Cdatelite(&urec.lastlogin), urec.numlogindays);
/* user still online, let's mock it. */
urec.lastlogin = now;
urec.numlogindays++;
if (verbose)
fprintf(stderr, "(%s, %d).\n", Cdatelite(&urec.lastlogin), urec.numlogindays);
passwd_update(last_uid, &urec);

now = time(NULL);

線上使用者登入後,其線上期間的計算方式為 now_cond - cuser.lastlogin,其中 now_cond 為「判斷用目前時間值」,而 cuser 為對應此使用者的 userec_t 物件。

以下程式碼會阻止線上期間未超過一天的使用者的登入次數增加:

if (urec.lastlogin >= base)
continue;

urec.lastlogin >= base(1)
urec.lastlogin >= now_cond - DAY_SECONDS (now_cond 為「判斷用目前時間值」)
urec.lastlogin + DAY_SECONDS >= now_cond
now_cond <= urec.lastlogin + DAY_SECONDS(2)
now_cond - urec.lastlogin <= DAY_SECONDS
duration <= DAY_SECONDS (duration 為線上期間)

可知在 update_online 所算出的線上期間大於一天的情況下,才會增加該使用者的登入次數。

作為比較,這是 commit 05702e5 時的對應程式碼:

if (now < urec.lastlogin + DAY_SECONDS - 60 * 60)
continue;

上述 (2) 式與之相比多了等號。

根據前面討論所提到的「線上期間的次數增加」機制的描述,以及 update_online 是支單獨程式的事實,我推測 update_online 是透過 cron 每天定時執行一次的。

假設電腦執行速度快得使「重設用目前時間值」(在上述節錄內容內,行號為 77 之行)與「判斷用目前時間值」(在上述節錄內容內,行號為 16 之行)皆相同。(因為參考 util/update_online.c 與其所使用的 common/bbs/cache.c 中的函式的定義,在程式執行流程中的上述兩行程式碼間不存在固定的延時,所以如果電腦執行速度相當快的話,這兩行可視為同時執行)

情況一

如果 cron 某次執行 update_online 時延遲,導致所取得的「重設用目前時間值」比預期延後超過一秒,這個情況下如果下次執行時所取得的時間值與預期相比沒有延後,那麼本次登入後已被 update_online 增加過登入次數且仍在線上的使用者,將會因為所算出的線上期間小於一天,所以其登入次數不會增加,造成少算的狀況。

一種解決方法是重新引入時間判斷的誤差容許值,但在定時執行未出現延遲時會導致允許將線上期間未超過一天的使用者的登入次數增加。

另一種解決方法是將「重設用目前時間值」與「判斷用目前時間值」取為最近的分鐘或小時的開始時間。

又另一種解決方法請見情況四。


假設電腦執行速度快,cron 每天皆一秒不差地定時執行一次 update_online,使得其執行時間的值、「重設用目前時間值」、與「判斷用目前時間值」皆相同。

情況二

如果使用者被過去最近一次 update_online 的每天定時執行增加了登入次數,或如果使用者本次登入後未被 update_online 增加過登入次數,並且使用者登入的時間點為過去最近一次 update_online 執行時所取得的「判斷用目前時間值」的時間點,那麼下一次執行時,這個使用者的線上期間將會等於一天,所以其登入次數不會增加,造成少算的狀況。遇到下二次執行時才會增加,但僅增加一次。

修正方法是移除上述 (1) 式中的等號。

情況三

如果使用者本次登入後未被 update_online 增加過登入次數,並且使用者登入的時間點比過去最近一次 update_online 執行時所取得的「判斷用目前時間值」的時間點還晚超過一秒後登入,那麼這個使用者的線上期間達到大於等於一天的時間會落在下一次到下二次遇到 update_online 每天定時執行之間。這個情況下,如果這個使用者在下二次遇到 update_online 的每天定時執行前離線,將會因為 update_online 不處理離線使用者而不增加其登入次數而造成少算的狀況。

一種解決方法是在使用者登出時會執行的處理函式(例如 pwcuExitSave() )中,加上以下處理邏輯:

如果以 cuser.lastlogin - now 算出的線上期間大於等於一天則增加一次登入次數(「離線時的次數增加」)。

但在使用者的對應處理程序異常終止執行而導致「離線時的次數增加」機制未被執行的情況下,還是會造成少算的狀況。

情況四

以下是虛構的說明範例:

  • 使用者甲在某天 03:59:59 登入,在後天 04:00:01 登出,在線上共 48 時 0 分 2 秒
    • 登入次數增加時間點:第一天 03:59:59、第二天 04:00:00、第三天 04:00:00,共三次
  • 使用者乙在某天 04:00:01 登入,在後天 04:00:03 登出,在線上共 48 時 0 分 2 秒
    • 登入次數增加時間點:第一天 04:00:01、第三天 04:00:00,共兩次

可見使用者甲與乙的線上期間相同,僅有登入與登出的時間有些許前後差別,但乙的登入次數卻少算了一次。

一種解決方法是將「重設用目前時間值」取為 cuser.lastlogin + DAY_SECONDS 以保留實際線上期間的資訊,並搭配使用情況三的解決方法處理離線時的線上期間。

使用此解決方法後的上述說明範例:

  • 使用者甲在某天 03:59:59 登入,在後天 04:00:01 登出,在線上共 48 時 0 分 2 秒
    • 登入次數增加時間點:第一天 03:59:59(登入)、第二天 04:00:00(線上;尚有 1 秒未計)、第三天 04:00:00(線上;尚有 1 秒未計),共三次
  • 使用者乙在某天 04:00:01 登入,在後天 04:00:03 登出,在線上共 48 時 0 分 2 秒
    • 登入次數增加時間點:第一天 04:00:01(登入)、第三天 04:00:00(線上;尚有 23 時 59 分 59 秒未計)、第三天 04:00:03(登出;尚有 2 秒未計),共三次

IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 3, 2021
Legend:
* Event: Name of event ("Login", "UpdateDN" (by update_online), and "Logout")
    * UpdateDN: Update on day N (day 1 is the day of login)
        * UpdateDN is assumed to usually occur at 04:00:00 each day
    * If Login and UpdateD1 occur at the same time,
      UpdateD1 is omitted from the chart for clarity
* Time: Time of event
* TDur: Total online duration since login
* CTime/NTime: Checkpoint (lastlogin) set by the current method/this new method
* CDurB/A: Online duration before/after the checkpoint is set (current method)
* NDurB/A: Online duration before/after the checkpoint is set (new method)
* CIncr/NIncr: Increment of numlogindays (current/new method)

---

Case: Online duration after last checkpoint = 1 day (86400 seconds)

The login count does not increase when the online duration = 1 day
due to a regression introduced in 05bb3eb,
where >= instead of > was used to filter out users
whose last checkpoint time (lastlogin) were later than 1 day before now.

Event: Login     UpdateD2  UpdateD3  UpdateD4  UpdateD5
Time:  04:00:00  28:00:00  52:00:00  76:00:00 100:00:00
TDur:  00h00m00  24h00m00  48h00m00  72h00m00  96h00m00
CTime: 04:00:00  --        52:00:00  --       100:00:00
CDurB: 00h00m00  24h00m00  48h00m00  24h00m00  48h00m00
CDurA: 00h00m00  --        00h00m00  --        00h00m00
CIncr: +1         0        +1         0        +1        Total: +3
NTime: 04:00:00  28:00:00  52:00:00  76:00:00 100:00:00
NDurB: 00h00m00  24h00m00  24h00m00  24h00m00  24h00m00
NDurA: 00h00m00  00h00m00  00h00m00  00h00m00  00h00m00
NIncr: +1        +1        +1        +1        +1        Total: +5

This caused the login count to increases only by 1 per 2 days
for users who never logout.

This regression is fixed by using > for filter out users
to allow users with online duration = 1 day to have their login count increased.

For the rest cases, assume this regression is fixed for the current method.

---

Case: The launch of update_online by cron lagged or even failed

In this fictitious case,
the launch of update_online lags for 1 seconds on day 2 and fails on day 5.

Event: Login     UpdateD2  UpdateD3  UpdateD4  UpdateD6
Time:  04:00:00  28:00:01  52:00:00  76:00:00 124:00:00
TDur:  00h00m00  24h00m01  48h00m00  72h00m00 120h00m00
CTime: 04:00:00  28:00:01  --        76:00:00 124:00:00
CDurB: 00h00m00  24h00m01  23h59m59  47h59m59  48h00m00
CDurA: 00h00m00  00h00m00  --        00h00m00  00h00m00
CIncr: +1        +1         0        +1        +1        Total: +4
NTime: 04:00:00  28:00:00  52:00:00  76:00:00 124:00:00
NDurB: 00h00m00  24h00m01  24h00m00  24h00m00  48h00m00
NDurA: 00h00m00  00h00m01  00h00m00  00h00m00  00h00m00
NIncr: +1        +1        +1        +1        +2        Total: +6

---

Case: Same total online duration, different login count increment

Compare the following cases.

Event: Login     UpdateD2  UpdateD3  Logout
Time:  04:00:01  28:00:00  52:00:00  52:00:03
TDur:  00h00m00  23h59m59  47h59m59  48h00m02
CTime: 04:00:01  --        52:00:00  --
CDurB: 00h00m00  23h59m59  47h59m59  00h00m03
CDurA: 00h00m00  --        00h00m00  --
CIncr: +1         0        +1         0        Total: +2
NTime: 04:00:01  --        28:00:01  52:00:03
NDurB: 00h00m00  23h59m59  47h59m59  24h00m02
NDurA: 00h00m00  00h00m00  23h59m59  00h00m02
NIncr: +1         0        +1        +1        Total: +3

Event: Login     UpdateD1  UpdateD2  UpdateD3  Logout
Time:  03:59:59  04:00:00  28:00:00  52:00:00  52:00:01
TDur:  00h00m00  00h00m01  24h00m01  48h00m01  48h00m02
CTime: 03:59:59  --        52:00:00  --        --
CDurB: 00h00m00  00h00m01  24h00m01  24h00m00  00h00m01
CDurA: 00h00m00  --        00h00m00  00h00m00  --
CIncr: +1         0        +1        +1         0        Total: +3
NTime: 03:59:59  --        27:59:59  51:59:59  --
NDurB: 00h00m00  00h00m01  24h00m01  24h00m01  00h00m02
NDurA: 00h00m00  --        00h00m01  00h00m01  --
NIncr: +1         0        +1        +1         0        Total: +3

---

* mbbsd/passwd.c
* util/update_online.c
IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 15, 2021
The method for increasing user's login count has been refined
to prevent undercounting of the login times. It works as follows:

* Login time increment at login (the same as the current method)

For users who haven't logged in today, increase their login count by 1 at login.

* Login time increments according to the online duration

When either update_online runs *or the user logs out*,
if the user's online duration reaches N days (>= N⋅86400 seconds),
increase the user's login count by N and the user's lastlogin by N days.
(Increasing lastlogin by N days makes the online duration decrease by N days)

Compare it to the current method:
* The online duration is not checked at logout -> is checked at logout
* Increment is performed only when the online duration *exceeds* 1 day (> 86400 seconds).
  -> is performed also when the duration reaches 1 day (= 86400 seconds)
    * Explained below as the login count increment rate issue
* The login count only increases by 1 -> can increase by N
* lastlogin is set to the current time (resets the online duration to 0)
  -> lastlogin is increased by N days (reduces the online duration by N days)
    * A reset would cause users to lose some of their online duration,
      enough to cause undercounting.
    * The reset method is also prone to unpredictable cron job delays.

Executing update_online is no longer necessary to increase users' login count
but is still needed to minimize users' loss of login times
due to abnormal termination of their session.

---

The login count increment rate issue
resulted from the slightly flawed condition for excluding users from increment.

The condition should exclude users whose online duration < 1 day.

When update_online was introduced on 2014-03-25 (UTC+8),
this condition was wrongly flipped as `u->lastlogin + DAY_SECONDS <= now`.
It was corrected within an hour to be `now < urec.lastlogin + DAY_SECONDS`
(equivalently `urec.lastlogin + DAY_SECONDS > now`).
See commits 4f69140 ("Add utility to update user online counter (lastlogin/numlogindays).")
and 6871fbb ("Update using ulist.") for details.

A 1-hour tolerance of online duration was soon introduced and cancelled.
See commit 05702e5 ("update_online: prevent too verbose messages and allow some tolerance in duration.").

Within 1 day, another commit changed this condition to `urec.lastlogin >= base`,
where `base` was defined as 09:40 of the current day (in the local timezone).
It may have meant to increase the login count for users logged in before 09:40.
This was stated to be inspired by the login time increment at login.

However, update_online seems to have run at 00:00 of the day at that time.
This made the condition always false since 09:40 would be a time in the future,
and thus all online users' login count were increased unconditionally.
This caused overcounting of the login times, as users might manage to log in
at 00:00 and before update_online ran, thus getting 2 login times on that day.

(Cron jobs may delay for seconds, especially when the system load is high)

See commit 05bb3eb ("Change update_online to do same way as pwcu does.")
for details.

In order to address this issue,
commits f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
and d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))
were made to redefine `base` as `time(0) - DAY_SECONDS`.
The relevant logic somehow reverted to using the logic in commit 6871fbb ("Update using ulist.").

However, the condition `urec.lastlogin >= base` did not revert accordingly
and became arithmetically equivalent to `urec.lastlogin + DAY_SECONDS >= now`,
which does exclude users whose online duration = 1 day.

Since update_online runs every 1 day in the current system configuration,
the users' online duration between 2 successive runs cannot exceed 1 day
(assume that cron jobs have the same delay in every run).
In other words, when the users' lastlogin was reset in an update_online run,
their login count would not increase in the next update_online run,
which made their login count only increase by 1 every 2 days.

This issue is fixed by changing this condition to `urec.lastlogin > base`
to allow users with online duration = 1 day to have their login count increased.

---

* mbbsd/passwd.c:
    * pwcuExitSave():
        * Check the user's online duration at logout
        * If the duration reaches N days,
          increase the login count by N and lastlogin by N days
* util/update_online.c:
    * fastcheck():
        * Do not exclude users whose online duration = 1 day
          from login count increment
        * If the duration reaches N days,
          increase the login count by N and lastlogin by N days
        * Introduce variable `days`
          for the reached number of days of the online duration
    * main():
        * Drop the assignment of now-unused variable `now`
IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 15, 2021
The method for increasing user's login count has been refined
to prevent undercounting of the login times. It works as follows:

* Login time increment at login (the same as the current method)

For users who haven't logged in today, increase their login count by 1 at login.

* Login time increments according to the online duration

When either update_online runs *or the user logs out*,
if the user's online duration reaches N days (>= N⋅86400 seconds),
increase the user's login count by N and the user's lastlogin by N days.
(Increasing lastlogin by N days makes the online duration decrease by N days)

Compare it to the current method:
* The online duration is not checked at logout -> is checked at logout
* Increment is performed only when the online duration *exceeds* 1 day (> 86400 seconds).
  -> is performed also when the duration reaches 1 day (= 86400 seconds)
    * Explained below as the login count increment rate issue
* The login count only increases by 1 -> can increase by N
* lastlogin is set to the current time (resets the online duration to 0)
  -> lastlogin is increased by N days (reduces the online duration by N days)
    * A reset would cause users to lose some of their online duration,
      enough to cause undercounting.
    * The reset method is also prone to unpredictable cron job delays.

Executing update_online is no longer necessary to increase users' login count
but is still needed to minimize users' loss of login times
due to abnormal termination of their session.

---

The login count increment rate issue
resulted from the slightly flawed condition for excluding users from increment.

The condition should exclude users whose online duration < 1 day.

When update_online was introduced on 2014-03-25 (UTC+8),
this condition was wrongly flipped as `u->lastlogin + DAY_SECONDS <= now`.
It was corrected within an hour to be `now < urec.lastlogin + DAY_SECONDS`
(equivalently `urec.lastlogin + DAY_SECONDS > now`).
See commits 4f69140 ("Add utility to update user online counter (lastlogin/numlogindays).")
and 6871fbb ("Update using ulist.") for details.

A 1-hour tolerance of online duration was soon introduced and cancelled.
See commit 05702e5 ("update_online: prevent too verbose messages and allow some tolerance in duration.").

Within 1 day, another commit changed this condition to `urec.lastlogin >= base`,
where `base` was defined as 09:40 of the current day (in the local time zone).
It may have meant to increase the login count for users logged in before 09:40.
This was stated to be inspired by the login time increment at login.

However, update_online seems to have run at 00:00 of the day at that time.
This made the condition always false since 09:40 would be a time in the future,
and thus all online users' login count were increased unconditionally.
This caused overcounting of the login times, as users might manage to log in
at 00:00 and before update_online ran, thus getting 2 login times on that day.

(Cron jobs may delay for seconds, especially when the system load is high)

See commit 05bb3eb ("Change update_online to do same way as pwcu does.")
for details.

In order to address this issue,
commits f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
and d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))
were made to redefine `base` as `time(0) - DAY_SECONDS`.
The relevant logic somehow reverted to using the logic in commit 6871fbb ("Update using ulist.").

However, the condition `urec.lastlogin >= base` did not revert accordingly
and became arithmetically equivalent to `urec.lastlogin + DAY_SECONDS >= now`,
which does exclude users whose online duration = 1 day.

Since update_online runs every 1 day in the current system configuration,
the users' online duration between 2 successive runs cannot exceed 1 day
(assume that cron jobs have the same delay in every run).
In other words, when the users' lastlogin was reset in an update_online run,
their login count would not increase in the next update_online run,
which made their login count only increase by 1 every 2 days.

This issue is fixed by changing this condition to `urec.lastlogin > base`
to allow users with online duration = 1 day to have their login count increased.

---

* mbbsd/passwd.c:
    * pwcuExitSave():
        * Check the user's online duration at logout
        * If the duration reaches N days,
          increase the login count by N and lastlogin by N days
* util/update_online.c:
    * fastcheck():
        * Do not exclude users whose online duration = 1 day
          from login count increment
        * If the duration reaches N days,
          increase the login count by N and lastlogin by N days
        * Introduce variable `days`
          for the reached number of days of the online duration
    * main():
        * Drop the assignment of now-unused variable `now`
IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 15, 2021
The method for increasing user's login count has been refined
to prevent undercounting of the login times. It works as follows:

* Login time increment at login (the same as the current method)

For users who haven't logged in today, increase their login count by 1 at login.

* Login time increments according to the online duration

When either update_online runs *or the user logs out*,
if the user's online duration reaches N days (>= N⋅86400 seconds),
increase the user's login count by N and the user's lastlogin by N days.
(Increasing lastlogin by N days makes the online duration decrease by N days)

Compare it to the current method:
* The online duration is not checked at logout -> is checked at logout
* Increment is performed only when the online duration *exceeds* 1 day (> 86400 seconds)
  -> is performed also when the duration reaches 1 day (= 86400 seconds)
    * Explained below as the login count increment rate issue
* The login count only increases by 1 -> can increase by N
* lastlogin is set to the current time (resets the online duration to 0)
  -> lastlogin is increased by N days (reduces the online duration by N days)
    * A reset would cause users to lose some of their online duration,
      enough to cause undercounting.
    * The reset method is also prone to unpredictable cron job delays.

Executing update_online is no longer necessary to increase users' login count
but is still needed to minimize users' loss of login times
due to abnormal termination of their session.

---

The login count increment rate issue
resulted from the slightly flawed condition for excluding users from increment.

The condition should exclude users whose online duration < 1 day.

When update_online was introduced on 2014-03-25 (UTC+8),
this condition was wrongly flipped as `u->lastlogin + DAY_SECONDS <= now`.
It was corrected within an hour to be `now < urec.lastlogin + DAY_SECONDS`
(equivalently `urec.lastlogin + DAY_SECONDS > now`).
See commits 4f69140 ("Add utility to update user online counter (lastlogin/numlogindays).")
and 6871fbb ("Update using ulist.") for details.

A 1-hour tolerance of online duration was soon introduced and cancelled.
See commit 05702e5 ("update_online: prevent too verbose messages and allow some tolerance in duration.").

Within 1 day, another commit changed this condition to `urec.lastlogin >= base`,
where `base` was defined as 09:40 of the current day (in the local time zone).
It may have meant to increase the login count for users logged in before 09:40.
This was stated to be inspired by the login time increment at login.

However, update_online seems to have run at 00:00 of the day at that time.
This made the condition always false since 09:40 would be a time in the future,
and thus all online users' login count were increased unconditionally.
This caused overcounting of the login times, as users might manage to log in
at 00:00 and before update_online ran, thus getting 2 login times on that day.

(Cron jobs may delay for seconds, especially when the system load is high)

See commit 05bb3eb ("Change update_online to do same way as pwcu does.")
for details.

In order to address this issue,
commits f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
and d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))
were made to redefine `base` as `time(0) - DAY_SECONDS`.
The relevant logic somehow reverted to using the logic in commit 6871fbb ("Update using ulist.").

However, the condition `urec.lastlogin >= base` did not revert accordingly
and became arithmetically equivalent to `urec.lastlogin + DAY_SECONDS >= now`,
which does exclude users whose online duration = 1 day.

Since update_online runs every 1 day in the current system configuration,
the users' online duration between 2 successive runs cannot exceed 1 day
(assume that cron jobs have the same delay in every run).
In other words, when the users' lastlogin was reset in an update_online run,
their login count would not increase in the next update_online run,
which made their login count only increase by 1 every 2 days.

This issue is fixed by changing this condition to `urec.lastlogin > base`
to allow users with online duration = 1 day to have their login count increased.

---

* mbbsd/passwd.c:
    * pwcuExitSave():
        * Check the user's online duration at logout
        * If the duration reaches N days,
          increase the login count by N and lastlogin by N days
* util/update_online.c:
    * fastcheck():
        * Do not exclude users whose online duration = 1 day
          from login count increment
        * If the duration reaches N days,
          increase the login count by N and lastlogin by N days
        * Introduce variable `days`
          for the reached number of days of the online duration
    * main():
        * Drop the assignment of now-unused variable `now`
IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 8, 2023
…line users [ptt#104]

The algorithm for increasing online users' login count has been improved,
which fixes the issue that the increment was sometimes done only per 2 days.

When update_online was introduced in 4f69140
("Add utility to update user online counter (lastlogin/numlogindays).", 2014-03-25 (UTC+8)),
the condition for excluding users from an update by their online duration
was wrongly flipped as `u->lastlogin + DAY_SECONDS <= now`
but had been corrected within 1 hour into `now < urec.lastlogin + DAY_SECONDS`
(equivalently `urec.lastlogin + DAY_SECONDS > now`) in 6871fbb ("Update using ulist.").

A 1-hour tolerance of online duration was soon introduced in 05702e5
("update_online: prevent too verbose messages and allow some tolerance in duration.")
but had soon been cancelled.

Within 1 day, 05bb3eb ("Change update_online to do same way as pwcu does.")
changed this condition into `urec.lastlogin >= base`,
where `base` was 09:40 of the current day (in the local time zone).

However, update_online seems to have run at 00:00 every day for PTT,
which made the condition always false,
and thus all online users' login count were increased unconditionally.
This had allowed the users to earn 2 login count within a day
by logging in at 00:00 and before update_online ran.
(Note that Cron can lags due to a high system load)

Therefore, f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
and d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))
redefined `base` as `time(0) - DAY_SECONDS`,
which has partially reverted the logic to 6871fbb ("Update using ulist.").

However, the condition `urec.lastlogin >= base` was not reverted accordingly
and has become equivalent to `urec.lastlogin + DAY_SECONDS >= now`,
which does exclude users whose online duration = 1 day.
This has caused online users' login count increase only by 1 per 2 days
(assume that Cron did not lag).

In the new algorithm, users' increment of login count is first calculated,
then users whose login count do not increase are excluded,
and then their login time and count are advanced/increased accordingly.
It works regardless of the time and frequency that update_online runs.

To account for the possible lagging of Cron,
the reference time is now `now`, the nearest minute in the past from now.

Because updating users' login count at pwcuExitSave() would cause unwanted logs,
perform such updating at logout is not considered.
Instead, to minimize online user's possible loss of login count,
update_online can be excuted multiple times per day.

* fastcheck()
    * add local `now` for the nearest minute in the past
    * add local `days` for the count of days since `urec.lastlogin` until `now`
    * make `urec.lastlogin` advance by `days * DAY_SECONDS`
      instead of reset to `now`
    * make `urec.numlogindays` increase by `days` instead of fixed 1
* main()
    * drop the assignment of now-unused global `now`
IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 8, 2023
…line users [ptt#104]

The algorithm for increasing online users' login count has been improved,
which fixes the issue that the increment was sometimes done only per 2 days.

When update_online was introduced in 4f69140
("Add utility to update user online counter (lastlogin/numlogindays).", 2014-03-25 (UTC+8)),
the condition for excluding users from an update by their online duration
was wrongly flipped as `u->lastlogin + DAY_SECONDS <= now`
but had been corrected within 1 hour into `now < urec.lastlogin + DAY_SECONDS`
(equivalently `urec.lastlogin + DAY_SECONDS > now`) in 6871fbb ("Update using ulist.").

A 1-hour tolerance of online duration was soon introduced in 05702e5
("update_online: prevent too verbose messages and allow some tolerance in duration.")
but had soon been cancelled.

Within 1 day, 05bb3eb ("Change update_online to do same way as pwcu does.")
changed this condition into `urec.lastlogin >= base`,
where `base` was 09:40 of the current day (in the local time zone).

However, update_online seems to have run at 00:00 every day for PTT,
which made the condition always false,
and thus all online users' login count were increased unconditionally.
This had allowed the users to earn a login count of 2 within a day
by logging in at 00:00 and before update_online ran.
(Note that Cron can lags due to a high system load)

Therefore, f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
and d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))
redefined `base` as `time(0) - DAY_SECONDS`,
which has partially reverted the logic to 6871fbb ("Update using ulist.").

However, the condition `urec.lastlogin >= base` was not reverted accordingly
and has become equivalent to `urec.lastlogin + DAY_SECONDS >= now`,
which does exclude users whose online duration = 1 day.
This has caused online users' login count increase only by 1 per 2 days
(assume that Cron did not lag).

In the new algorithm, users' increment of login count is first calculated,
then users whose login count do not increase are excluded,
and then their login time and count are advanced/increased accordingly.
It works regardless of the time and frequency that update_online runs.

To account for the possible lagging of Cron,
the reference time is now `now`, the nearest minute in the past from now.

Because updating users' login count at pwcuExitSave() would cause unwanted logs,
perform such updating at logout is not considered.
Instead, to minimize online user's possible loss of login count,
update_online can be excuted multiple times per day.

* fastcheck()
    * add local `now` for the nearest minute in the past
    * add local `days` for the count of days since `urec.lastlogin` until `now`
    * make `urec.lastlogin` advance by `days * DAY_SECONDS`
      instead of reset to `now`
    * make `urec.numlogindays` increase by `days` instead of fixed 1
* main()
    * drop the assignment of now-unused global `now`
IepIweidieng added a commit to IepIweidieng/pttbbs that referenced this issue Jun 8, 2023
…line users [ptt#104]

The algorithm for increasing online users' login count has been improved,
which fixes the issue that the increment was sometimes done only per 2 days.

When update_online was introduced in 4f69140
("Add utility to update user online counter (lastlogin/numlogindays).", 2014-03-25 (UTC+8)),
the condition for excluding users from an update by their online duration
was wrongly flipped as `u->lastlogin + DAY_SECONDS <= now`
but had been corrected within 1 hour into `now < urec.lastlogin + DAY_SECONDS`
(equivalently `urec.lastlogin + DAY_SECONDS > now`) in 6871fbb ("Update using ulist.").

A 1-hour tolerance of online duration was soon introduced in 05702e5
("update_online: prevent too verbose messages and allow some tolerance in duration.")
but had soon been cancelled.

Within 1 day, 05bb3eb ("Change update_online to do same way as pwcu does.")
changed this condition into `urec.lastlogin >= base`,
where `base` was 09:40 of the current day (in the local time zone).

However, update_online seems to have run at 00:00 every day for PTT,
which made the condition always false,
and thus all online users' login count were increased unconditionally.
This had allowed the users to earn a login count of 2 within a day
by logging in at 00:00 and before update_online ran.
(Note that Cron can lags due to a high system load)

Therefore, f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
and d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))
redefined `base` as `time(0) - DAY_SECONDS`,
which had partially reverted the logic to 6871fbb ("Update using ulist.").

However, the condition `urec.lastlogin >= base` was not reverted accordingly
and had become equivalent to `urec.lastlogin + DAY_SECONDS >= now`,
which does exclude users whose online duration = 1 day.
This had caused online users' login count increase only by 1 per 2 days
(assume that Cron did not lag).

In the new algorithm, users' increment of login count is first calculated,
then users whose login count do not increase are excluded,
and then their login time and count are advanced/increased accordingly.
It works regardless of the time and frequency that update_online runs.

To account for the possible lagging of Cron,
the reference time is now `now`, the nearest minute in the past from now.

Because updating users' login count at pwcuExitSave() would cause unwanted logs,
perform such updating at logout is not considered.
Instead, to minimize online user's possible loss of login count,
update_online can be excuted multiple times per day.

* fastcheck()
    * add local `now` for the nearest minute in the past
    * add local `days` for the count of days since `urec.lastlogin` until `now`
    * make `urec.lastlogin` advance by `days * DAY_SECONDS`
      instead of reset to `now`
    * make `urec.numlogindays` increase by `days` instead of fixed 1
* main()
    * drop the assignment of now-unused global `now`
@IepIweidieng
Copy link
Contributor

The current algorithm:
目前演算法:

  • Increment is performed only when the online duration exceeds 1 day (> 86400 seconds)
    只在線上時間 超過 一天(> 86400 秒)時增加登入次數
    • Explained below as the login count increment rate issue
      詳細說明請見底下的「登入次數增加速率問題」
  • The login count only increases by 1
    登入次數一次只加一
  • lastlogin is set to the current time (resets the online duration to 0)
    lastlogin 會設成目前的時間(重設線上時間為 0)
    • A reset would cause users to lose some of their online duration, enough to cause undercounting.
      線上時間的重設會造成使用者損失部份線上時間,足以造成少算。
    • The reset method is also prone to unpredictable cron job delays.
      使用重設的方法也容易遇到 cron 排程工作有不可預期的執行延遲的問題。

Demonstration Case 展示案例

  • Format of time point: D<nth-day> <hour>:<minute>
    時間點格式:D<第幾天> <時>:<分>
  • Format of time duration: <num-of-days>d<hours>h<minutes>
    時間長度格式:<天數>d<小時數>h<分鐘數>
Event
事件
Login
登入
(Logout)
(登出)
Checkpoint
檢查點
(Logout)
(登出)
Checkpoint/logout
檢查點/登出
Time point
時間點
D1 23:50 D3 00:01 D3 04:00 D3 23:50 D4 04:00
Total online duration
總線上時間
0d00h00 1d00h11 1d04h10 2d00h00 2d04h10
Checkpoint (current method)
現方法檢查點
D1 23:50 D1 23:50 D1 23:50
-> D3 04:00
D3 04:00 D3 04:00
Online duration (current method)
現方法線上時間
0d00h00 1d00h11 1d04h10
-> 0d00h00
0d19h50 1d00h00
Login count (current method)
現方法登入次數
+1 (共 +1) +1 (共 +2) +0 (共 +2)
Checkpoint (theoretical)
理論檢查點
D1 23:50 D1 23:50 D1 23:50
-> D2 23:50
D2 23:50 D2 23:50
-> D3 23:50
Online duration (theoretical)
理論線上時間
0d00h00 1d00h11 1d04h10
-> 0d04h10
1d00h00 1d04h10
-> 0d04h10
Login count (theoretical)
理論登入次數
+1 (共 +1) +1 (共 +2) +1 (共 +3)
  • Note that, if the current method is used, the login count is only increased when the online duration exceeds 1d00h00.
    注意若依照現方法,線上時間須 超過 1d00h00 才會增加登入次數。

The Login Count Increment Rate Issue 登入次數增加速率問題

This issue resulted from the slightly flawed condition for excluding users from increment.
此問題來自有點瑕疵的用來排除要增加其登入次數的使用者的條件式。

The condition should exclude users whose online duration < 1 day.
這個條件式應該要排除上線時間小於一天的使用者。

When update_online was introduced on 2014-03-25 (UTC+8), this condition was wrongly flipped as u->lastlogin + DAY_SECONDS <= now.
It had been corrected within 1 hour into now < urec.lastlogin + DAY_SECONDS (equivalently urec.lastlogin + DAY_SECONDS > now).
update_online 在 2014-03-25 (UTC+8) 引進時,這個條件式是效果相反的 u->lastlogin + DAY_SECONDS <= now
而在一小時內就被修正成了 now < urec.lastlogin + DAY_SECONDS(等價於 urec.lastlogin + DAY_SECONDS > now)。

For details, see:
相關細節請見:

  • 4f69140 ("Add utility to update user online counter (lastlogin/numlogindays).")
  • 6871fbb ("Update using ulist.")

A 1-hour tolerance of online duration was soon introduced but had soon been cancelled.
See commit 05702e5 ("update_online: prevent too verbose messages and allow some tolerance in duration.").
不久之後,一小時的線上時間誤差容忍值就被引進,但出現不久就被取消了。
請見 commit 05702e5 ("update_online: prevent too verbose messages and allow some tolerance in duration.")。

Within 1 day, another commit changed this condition to urec.lastlogin >= base, where base was 09:40 of the current day (in the local time zone).
It may have meant to increase the login count for users logged in before 09:40.
This was stated to be inspired by the login time increment at login.
一天之內有另一個 commit 將此條件式修改為 urec.lastlogin >= base,其中 base 被定義為當天的 09:40(目前時區),可能原本是為了要增加在 09:40 前登入的使用者的登入次數。
有說明到說這是受到登入時增加登入次數的啟發。

However, update_online seems to have run at 00:00 every day for PTT.
This made the condition always false since 09:40 would be a time in the future, and thus all online users' login count were increased unconditionally.
This caused overcounting of the login times, as this had allowed users to earn a login count of 2 within a day by logging in at 00:00 and before update_online ran.
但是當時批踢踢的 update_online 似乎是每天在 00:00 執行的,09:40 是未來時間,會讓這個條件式永不成立,所以線上使用者的登入次數會被無條件增加。
這會造成登入次數多算。因為使用者能透過在 00:00 時,update_online 執行前登入,而在一天內取得兩次的登入次數。

(Cron jobs may delay for seconds, especially when the system load is high)
(Cron 排程工作有可能會延遲幾秒執行,尤其是在系統負載高的時候)

For details, see:
相關細節請見:

  • 05bb3eb ("Change update_online to do same way as pwcu does.")。

In order to address this issue, the following commits redefined base as time(0) - DAY_SECONDS:
為了解決這個問題, 以下的 commits 把 base 重新定義成 time(0) - DAY_SECONDS

  • f1328b3 ("Fix calculation error in update_online.", 2014-12-23 (UTC+8))
  • d38b21d ("Fix update_online again, to avoid timezone problems.", 2014-12-24 (UTC+8))

The relevant logic partially reverted to commit 6871fbb ("Update using ulist.").
相關邏輯部份地恢復到了 commit 6871fbb ("Update using ulist.") 的時候。

However, the condition urec.lastlogin >= base was not reverted accordingly and has become equivalent to urec.lastlogin + DAY_SECONDS >= now, which does exclude users whose online duration = 1 day.
但是條件式 urec.lastlogin >= base 沒跟著恢復,結果變成等價於 urec.lastlogin + DAY_SECONDS >= now,會排除線上時間等於一天的使用者。

Since update_online runs every 1 day in the current system configuration for PTT, the users' online duration between 2 successive runs cannot exceed 1 day (assume that cron jobs have the same delay in every run).
In other words, when the users' lastlogin had been reset in an update_online run, their login count would not increase in the next update_online run, which made their login count increase only by 1 per 2 days.
因為根據目前批踢踢的系統設定,update_online 每天會執行一次,所以使用者在兩次執行間所累積的線上時間沒辦法超過一天(假設 cron 排程工作每次執行的延遲都一樣)。
也就是說,如果使用者的 lastlogin 在某次 update_online 執行時被重設了,那下次 update_online 執行時,登入次數就不會增加,結果使用者的登入次數每兩天只會增加一。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants