refactor: use canonical time operations instead of bare uint/float/string #76
Hi @ravsii, I looked through your changes and I've got to say well done! This can go in whenever you are ready with it. Also the findings and bugs you mentioned in the description are 100% correct. Those are the things that made the code a bit too complex to begin with. Need to find a solution for that. |
Thanks @pouriyajamshidi. I'll check the codebase for other durations, which I could have missed / not changed yet. As for bugs, I suggest we fix them in a separate PR/issue |
Totally agree. |
Remove 2 consts (thousandMillisend and oneSecond) which are basically the exact copy of time.Second
Hey, @pouriyajamshidi. Is there a reason that this function exists? Lines 478 to 481 in 8d4aa31
I think I mostly finished with the codebase review in terms of time-related changes and this is the last one |
Hey @ravsii, It once was providing some value but I don't think it really adds anything anymore. |
@pouriyajamshidi |
Yea, it is cleaner your way. |
While reviewing your PR, found another bug that got me side-tracked a bit to fix. Will get back to yours. Need to check a small thing with your PR. Apart from that, awesome job 👍🏻 |
Ok, I did some more testing and as expected, the small downtimes (less than 1 second) are shown as zero. This will propagate throughout the runtime and say after a few days/hours of it running, users will see very strange and "inaccurate" reports. It is better to address it in this PR since these changes (although correct) are causing a bit of drift. Below picture is just a reference: |
@pouriyajamshidi If both I hope this will get magically fixed by one of the comments regarding P. S. guess it's time to add more tests 😄 |
I'm excited to hear about it. However, do not feel the need to rush things. Take your time.
Absolutely right. This is another thing that I am guilty of 😄. The inconsistencies of timing reports. In this case, I refrained from bumping up the time as well and resorted to just using int/uint which you are helping to get away from.
I will get to that comment soon. This bug has indeed existed for a long time but I never get the time I need to dig deep and come up with a fix.
yes, yes and yes 😄 hopefully more people like you would step to lend a hand. |
So, after playing with it for a while, I actually think that the problem is somewhere else, to be exact, in these lines: Line 517 in 761ae0f
Line 495 in 761ae0f
I don't think we can 100% guarantee that this block of code (old version) Lines 529 to 543 in 761ae0f
will take EXACTLY I suggest instead of incrementing it exactly by 1 second we can just pass a duration and increment it by that (actually precise) duration. Of course that doesn't guarantee us that 100 probes would equal exactly 100 seconds, but the data total/longest timestamps and duration should be in sync with one another. Maybe I'm wrong. Honestly, I think instead of
Let me know what you think @pouriyajamshidi |
First off, I appreciate your inputs and the effort you put in. 🤝
Yes, quite plausible that this is the case.
There is really no need to go to the extreme to make sure we are right on the dot. The main goal is to keep the program simple, because it has a relatively basic functionality. As long as we are reporting a close enough result, I think we should be happy. Of course, as we progress, we can always reiterate and improve it.
This sounds reasonable to me. Again, as long as the actual duration represents a close enough relation to the reports, it is fine.
Ticker does sound like a good candidate. However, does it really bring more added value? Considering we cannot guarantee a 100% correct timing anyway with the cost of an additional thread? And in fact, we are roughly spending no more or less than a second in the Please note that I am not opposing the idea here, just brainstorming.
That can be part of another PR. Thankfully, you have made great progress in tidying up and ironing out some of the technical debts. My main concern regarding this PR is showing |
Makes sense, I'll focus on that in this PR then.
My bad for not explaining it the right way. What I meant is the root of the problem is that in total uptime/downtime we are adding exactly 1 second, but when calculating longest uptime/downtime, we're actually using That's why I said we needed exactly 1 sec. |
Aha, now it makes sense. Thanks for clearing it up. I trust in you and your expertise. Feel free to continue with what you have in mind, whether in this PR or another. I have no objection and in fact, welcome it. |
So, if we're targeting specifically "good-looking" results to the end user in this PR, this is probably the only achievable way to do this, @pouriyajamshidi. (commits above) To be honest, while trying many possible changes for this to work, I think some code needs to be reworked, especially the way we handle "probe timeout". It was (and still is) hard-coded as I don't think that there's anything criminal about reporting a total uptime of 6.7 seconds for 7 probes, which would actually be a lot more accurate (in both ways, calculations and data representing). When (if?) we get this to work (maybe even postpone/reject this PR) I think that part needs attention the most. Just some thoughts, but I guess that's for another issue. |
That is correct but don't let that discourage you. This can be addressed here or in the future.
No, definitely nothing criminal about that. Apart from the looks, there is a reality factor to it too. For instance, if a server was up for a minute and you probed it 60 times, regardless of CPU time, RTT, etc... that server was up for 60 seconds. Although I do understand where you are coming from and have no objection in altering the current behavior as long as the drift is insignificant. I quite like this PR and the way you have cleaned up many of the existing hacks. With that being said, I am eager to merge it in whenever you feel confident with it. Maybe we can cosmetically fix some of the nuances but not sure how much of a headache/work that brings for now. |
Thanks for your thoughts. I'm fine with this PR as-is, but if you want me to change something in particular or if you've found bugs, feel free to ask for a change, because I haven't found any, and as my experience says, it's usually a sign that I've missed something haha.
Yeah that totally makes sense. I just don't think we'll be able to produce 100% matching timestamps/durations, or at least not in 100% of cases. Users are probably expecting such output, so it needs some time to think about possible solutions without hard-coding anything. Finally it's ready to be closed/merged, haha |
You have a good eye for catching bugs 😄 . I will run a few tests as well in a while.
At this stage, I think we can be a bit lenient and be fine with not producing a definite exact result. We can always improve it in the future.
Cool. Will get back to you after some tests. |
Like in longest consecutive downtime on the first screenshot? Make it Also total downtime is 2 seconds off, which wasn't expected tbh, but that's what I was talking about when saying
|
Sorry, I should have been more specific. Yes, the longest consecutive downtime on the first screenshot. At this point I am even thinking of coming up with a different message if the downtime is ~1 second. But I will keep it for another PR. Don't wanna clutter this.
Yea, getting the timing right with the mess we had is not so easy 😄 |
haha exactly.
Sure. I'll fix this soon and let's merge it then. |
whoopsie! I missed that too :S
Alrighty, let's merge it in. |
Good job!
Describe your changes
Durations were previously handled differently in different places. This draft PR refactors some of them, just to discuss the changes. Tests for `TestCalcTime` have passed.
Example output (time.Duration as of ravsii@e865f5c):
![image](https://private-user-images.githubusercontent.com/5007271/238060024-2bf1d792-859f-4316-954f-fd1893427daa.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyMjY5MDgsIm5iZiI6MTcyMTIyNjYwOCwicGF0aCI6Ii81MDA3MjcxLzIzODA2MDAyNC0yYmYxZDc5Mi04NTlmLTQzMTYtOTU0Zi1mZDE4OTM0MjdkYWEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcxNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MTdUMTQzMDA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ODI2MzdjYmZhZTJmZjJlZTMwMTI2NjQ2NGQ1MGQ1YzJjNmU2M2EwMzEyYzMzOTg3Zjg5YTc1ZDY0YzkwNGI4ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.oBb6Nu4P-aNnlgCVPlAFFBMhpzFU54_6QS1upKlYppE)
Example output (master as of 8c3d3b7)
![image](https://private-user-images.githubusercontent.com/5007271/238061851-a1a49bc3-0423-41cb-920c-275353dd4cef.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyMjY5MDgsIm5iZiI6MTcyMTIyNjYwOCwicGF0aCI6Ii81MDA3MjcxLzIzODA2MTg1MS1hMWE0OWJjMy0wNDIzLTQxY2ItOTIwYy0yNzUzNTNkZDRjZWYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcxNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MTdUMTQzMDA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NzIzOTQzZTNiZTVjYWRkNWY5Mzk3Yjc0ZjE5MTIxMmNmMTI3YjUzN2E2ZjA1MDQ0OTYzZGIzMGQzYzZiN2YzZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Wh0_OZgNcLY4ujaUs11hfRPWoNSG4EWbP_nW_ythXSk)
There are some bugs at the moment:
(context #73) I think the reason `time.Duration` was avoided initially is that helper functions like `x.Hours()` and `x.Minutes()` return results with decimals, meaning `time.Duration(10*time.Minute).Hours()` will return `0.16...` but not `0`.
Another guess is that `time.Duration` doesn't provide helper functions for calculating lower units without higher units: with `x := time.Duration(70*time.Minute)`, `x.Hours()` returns `1.16...` and `x.Minutes()` returns `70`, but not `10`.
Issue ticket number and link
Closes #73
Checklist before requesting a review
`make check` and there are no failures.
Type of change
Please delete options that are not relevant.