Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change timestamps from custom strings to datetime in result model and to ISO 8601 format in output.xml #4258

Closed
pekkaklarck opened this issue Mar 8, 2022 · 13 comments

Comments

@pekkaklarck
Copy link
Member

We currently have our own get_timestamp function for getting the current timestamp and get_elapsed_time for calculating the difference between two timestamps as milliseconds.

>>> t1 = get_timestamp()
>>> t2 = get_timestamp()
>>> print(t1)
20220308 12:37:43.875
>>> print(t2)
20220308 12:37:49.022
>>> get_elapsed_time(t1, t2)
5147

We should replace these custom functions with functionality provided by Python's standard datetime module. In practice we could get the current time by using datetime.now() and could calculate the elapsed time by simply subtracting datetime objects:

>>> d1 = datetime.now()
>>> d2 = datetime.now()
>>> print(d1)
2022-03-08 12:43:56.109349
>>> print(d2)
2022-03-08 12:44:03.567573
>>> delta = d2 - d1
>>> round(delta.total_seconds() * 1000)
7458

One benefit of using the standard datetime module would be that we could remove lot of our custom code. A bigger benefit is performance measured here using timeit:

>>> timeit(lambda: get_timestamp())
1.0487582740024664
>>> timeit(lambda: datetime.now())
0.2848877740034368
>>> timeit(lambda: get_elapsed_time(t1, t2))
5.105318886999157
>>> timeit(lambda: round((d2-d1).total_seconds() * 1000))
0.24636966099205893

As can seen from the examples, the difference in performance is pretty big when getting the timestamp and rather huge when calculating the elapsed time. These tests each run one million iterations, so the time for one iteration isn't huge and this doesn't matter much with smallish test/task runs. With bigger runs containing e.g. loops times start to add, though, because we call get_timestamp twice and get_elapsed_time once for each keyword, loop iteration, IF/ELSE branch, and so on.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Mar 8, 2022

What exactly we need to change to take datetime objects into use is still a bit unclear. One thing to consider is should we preserve existing starttime and endtime attributes in result model objects for backwards compatibility reasons. In practice we could store actual datetime objects to other properties (possibly under some new parent object like times that could be used like kw.times.start) and the old attributes would be turned to properties.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Mar 8, 2022

If someone wonders why we have even created our own timestamp handling functions and didn't use datetime to begin with, the reason is that earliest Robot Framework releases supported Python 2.2 and datetime was added in Python 2.3. After we had added our own utils, switching didn't anymore make sense. I also got a feeling datetime didn't have as good performance back then as it has now and at least it didn't have all the current features.

@pekkaklarck
Copy link
Member Author

Thought about the API bit more and think this would work:

  1. Model objects get new start_time and end_time attributes that contain the actual datetime objects (and are None by default).
  2. Model objects also get new elapsed_time attribute that is property returning self.end_time - self.start_time. If either start or end time is not defined, should behave same way as nowadays.
  3. Old starttime, endtime and elapsedtime are preserved as deprecated propertys that get the actual value from the aforementioned new attributes. They'd return same values as these attributes return currently.
  4. For consistency, arguments passed to model objects when they are created are renamed from starttime and endtime to start_time and end_time. This is backwards incompatible, but creating these model objects ought to be pretty rare. They are more often only inspected by custom tools.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Mar 8, 2022

We probably should change the formatted timestamps from the current 20220308 16:36:12.123 format to more ISO 8601 compatible format like 2022-03-08 16:36:12.123. We should use it in logs, reports and in output.xml. In the latter we could consider using the official T as the separator between date and time parts, but with logs and reports it's better to use the space character.

The benefits of using the ISO 8601 format is to make the date part easier to interpret for humans and to make dates in output.xml easier to parse for external tools. Very importantly, we could use datetime.isoformat and datetime.fromisoformat that are very fast compared to using datetime.strftime and datetime.strptime we needed to use with custom times:

>>> d = datetime.now()
>>> d.isoformat(' ') == d.strftime('%Y-%m-%d %H:%M:%S.%f')
True
>>> timeit(lambda: d.isoformat(' '))
0.6668484999972861
>>> timeit(lambda: d.strftime('%Y-%m-%d %H:%M:%S.%f'))
1.6757996669912245
>>> 
>>> s = d.isoformat(' ')
>>> datetime.fromisoformat(s) == datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')
True
>>> timeit(lambda: datetime.fromisoformat(s))
0.1304736560123274
>>> timeit(lambda: datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f'))
5.866141628997866 

A drawback of using 2022-03-08 instead of 20220308, and the reason we've used the latter so far, is that it's two characters wider. This probably requires us to adjust styles in log and report, but that's a small task and in general the benefits above are bigger than problems. The deprecated starttime and endtime properties, discussed in the above comment, should nevertheless return the timestamps in the old format.

A bigger drawback of changing the date format in output.xml is that it's a backwards incompatible change. I still consider benefits bigger, but it's a bit questionable to do such changes like that in RF 5.1. If this is considered a big issue, we can decide to post-pone the change to RF 6.0 (or just change RF 5.1 to 6.0).

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Mar 8, 2022

One smallish thing to decide is the timestamp precision. So far we have used milliseconds but datetime.now() uses microsecond precision. Due to Robot's own overhead, I got a feeling microsecond accuracy isn't that precise and would in general prefer using milliseconds.

We should use milliseconds at least in log and report to avoid timestamps getting overly long and, more importantly, to avoid adding three digits for each timestamp. Those added digits would probably increase log.html size considerably.

In output.xml we could still use microseconds if we wanted to. That would have a benefit that when we read the timestamp from there we get the same data as we originally had. If we only store timestamps in millisecond accuracy, then the data after the round-trip is different than what we originally for from datetime.now():

>>> d = datetime.now()
>>> s = d.isoformat()
>>> d == datetime.fromisoformat(s)
True
>>> d.isoformat(' ')[:-3]
'2022-03-08 17:36:38.972'
>>> s = d.isoformat(' ')[:-3]
>>> d == datetime.fromisoformat(s)
False

A drawback of using microseconds in output.xml is the increased file size. One way to reduce the size would be that instead of storing start and end times in output.xml, we'd save start and elapsed time. The elapsed time in microsecond accuracy would look like 0:01:30.123456 and would certainly be shorter than end timestamp in millisecond accuracy. This obviously would be a backwards incompatible change, but if we are anyway changing timestamp format in output.xml this could be fine as well. Often the elapsed time is anyway more interesting than actual start/end times and in those cases having the elapsed time directly available would be convenient. Anyway, if we decide to do this, the change should get its own issue.

If we decide to use milliseconds in output.xml, then we probably should use milliseconds also with the initial start/end times we get during execution. That way model objects would have same start/end time regardless are they created during execution or based on output.xml. That would mean that instead of just using datetime.now(), we needed to use something like this:

def now():
    d = datetime.now()
    ms = round(d.microsecond, -3)
    if ms == 1_000_000:
        return d.replace(microsecond=0) + timedelta(seconds=1)
    return d.replace(microsecond=ms)

The above obviously isn't as fast as using datetime.now() alone would be. In fact, it's even a bit slower than using the current get_timestamp:

>>> timeit(lambda: datetime.now())
0.2810397749999538
>>> timeit(lambda: now())
1.3667088740039617
>>> timeit(lambda: get_timestamp())
1.0332882049988257

Calculating the elapsed time would still be a lot faster than earlier, so even with this approach we'd get performance benefits overall. Anyway, either using microseconds in output.xml or just accepting that model objects have different times depending on do we get them during execution or from output.xml is probably a better idea. I think it is best to use microseconds in output.xml.

@pekkaklarck
Copy link
Member Author

Yet another thing to decide is how to handle timezones. Using datetime terminology, we should decide should timestamps be aware or naive. I believe using naive timestamps is better, because adding timezone information would increase file sizes and all timestamps having something like +02:00 at the end would likely be a distraction rather than bring benefits. Possibly we could store the local timezone information somewhere separately, at least in output.xml, to have it available for those who need it.

@pekkaklarck
Copy link
Member Author

Due to changes needed to output.xml, and output.xml being such an important interface between external systems, it's better to do this in RF 6.0.

A benefit of waiting for RF 6.0 is that it won't anymore support Python 3.6 that doesn't have datetime.fromisoformat.

@pekkaklarck pekkaklarck modified the milestones: v5.1, v6.0 Apr 1, 2022
@pekkaklarck pekkaklarck changed the title Performance enhacements to getting timestamps and calculating elaped times Performance enhancements for getting timestamps and calculating elapsed times Aug 26, 2022
@pekkaklarck
Copy link
Member Author

This will be done in RF 7.0. For forward compatibility, it would be good to add propertys containing start, end and elapsed times as datetime and timedelta objects already in RF 6.1. I'll submit a separate issue about that.

@pekkaklarck
Copy link
Member Author

I commented timestamp precision above and explained how rounding microseconds used by datetime.now() to milliseconds we currently use slows getting the timestamp down. I just noticed that datetime.isoformat has an optional timespec argument that makes it easy to round to milliseconds:

>>> dt = datetime.now()
>>> dt.isoformat(' ')
'2023-08-26 01:19:26.574297'
>>> dt.isoformat(' ', 'milliseconds')
'2023-08-26 01:19:26.574'

The performance with and withouttimespec is the same so it avoids performance issues. There would, however, be still the problem that times would have higher precision during execution (microseconds) than afterwards (milliseconds). My current thinking is that we should use microseconds in output.xml and decide what precision to use in the log file later.

@pekkaklarck
Copy link
Member Author

In addition to changing start and end times of the result objects to datetime, we should change the timestamp of Message objects too.

pekkaklarck added a commit that referenced this issue Sep 8, 2023
This is the first step of using `datatime` for timestamps (#4258).
Result objects now get their timestamps using `datetime.now()` and
they are stored to `start_time` and `end_time` attributes. Old
`starttime` and `endtime` are propertyes that return same string
representation as earlier.

Timestamps are still stored to output.xml in the old format. Moving to
ISO 8601 format for performance and standard compatibility is the next
step.
pekkaklarck added a commit that referenced this issue Sep 8, 2023
Actually end time isn't saved at all, instead we save elapsed. Also
XML attribute names are changed to shorter `start` and `elapsed` from
`starttime` and `endtime`. This is the second part of #4258.

Also output.xml generated time is now in ISO format.
pekkaklarck added a commit that referenced this issue Sep 8, 2023
Timestamps in output.xml and also in the debug file are now in ISO
8601 format. Part of #4258.
pekkaklarck added a commit that referenced this issue Sep 8, 2023
Avoid old `starttime`, `endtime` and `elapsedtime` and use new
`start_time`, `end_time´ and `elapsed_time` instead. Related to #4258.
pekkaklarck added a commit that referenced this issue Sep 8, 2023
Most importantly, fix tests using message timestamps in format
`yyyymmdd hh:mm:ss`. They worked with Python 3.11 and 3.12, because
with them `datetime.fromisoformat` accepted them, but with older there
was a ValueError.
pekkaklarck added a commit that referenced this issue Sep 8, 2023
Some of these aren't needed anymore due to timestamps being created
using `datetime` (#4258). Others were used so few times that preserving
them didn't make sense.

Also introduce new `parse_timestamp` util that parses timestamps to a
`datetime`. It isn't as strict as `datetime.fromisoformat`.
@pekkaklarck pekkaklarck changed the title Performance enhancements for getting timestamps and calculating elapsed times Change timestamps from custom strings to datetime in Python model and to ISO 8601 format in output.xml Sep 9, 2023
pekkaklarck added a commit that referenced this issue Sep 9, 2023
Use new `elapsed_time` instead. Hopefully the final part of #4258.
@pekkaklarck
Copy link
Member Author

pekkaklarck commented Sep 9, 2023

This ought to be now done. Based on my quick tests performance benefits didn't really materialize, but there's a small boost in execution and processing output.xml with Rebot can be up to 10% faster. There are, however, various other benefits with these changes:

  • datetime is much easier to work with than our custom string timestamp.
  • Also timedelta has benefits compared to elapsed time being an integer representing milliseconds.
  • Timestamps in output.xml, syslog, and debug file now use the standard ISO 8601 format. That's especially nice with output.xml because the standard format is easier for external tools to process. Also in our schema we can now use xs:dateTime instead of xs:string that accepts anything.
  • We can easily use the ISO 8601 format with future outputs such as JSON results (Support JSON serialization with result model #4847).
  • Attribute names start_time, end_time and elapsed_time are consistent with Python coding guidelines.
  • We now have elapsed time directly available in output.xml. External tools processing them doesn't need to calculate it based on the start time and the end time. The end time isn't there anymore so it needs to be calculated if needed, but I believe it's not often used than the elapsed time.
  • We store timestamps in microsecond accuracy in our result and in output.xml.
  • output.xml got a bit smaller because instead of starttime="20230907 20:00:01.000" endtime="20230907 20:00:11.123" we nowadays use start="2023-09-07T20:00:01.000001" elapsed="10.122999".

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Sep 9, 2023

As discussed above, this issue caused some backwards incompatible changes and deprecations. I try to summarize them here:

  • output.xml now has start and elapsed attributes instead of starttime and endtime it used to have. Tools processing output.xml files themselves and interested in times need to be updated. Tools using Robot's result model don't need to care.
  • Related to the above, start now contains the timestamp in the ISO 8601 format like start="2023-09-07T20:00:01.000001" when we earlier had a custom format like starttime="20230907 20:00:01.000". The new elapsed attribute is a float with up to six digits and it represents seconds.
  • Still related to the above, timestamp attribute with <msg> elements has been changed to time. Its format has also been changed from our custom timestamp to ISO 8601.
  • Also the generated time in output.xml uses ISO 8601.
  • Result model objects now accept start_time and end_time arguments when they used to be starttime and endtime. This shouldn't matter much, because results objects are seldom created by external tools. Inspecting them works the same way as earlier.
  • starttime, endtime and elapsedtime attributes of the result objects are considered deprecated. There is no actual deprecation warning yet, but such a warning will be added later and these attributes will eventually be removed.
  • Utility functions related to getting and processing timestamps have been deprecated. Using them causes a deprecation warning and these utils will be removed in RF 8.0.

In practice normal users shouldn't be affected. The biggest problem is that external tools processing output.xml files need to updated, but timestamps being in a standard format ought to be nice for them in the long run. Previous Robot versions cannot process new output.xml files, but output.xml files created by previous Robot versions are handled by the new version.

pekkaklarck added a commit that referenced this issue Sep 9, 2023
Setting any two of `start_time`, `end_time` and `elapsed_time` is
enough, because the third can be calculated. Elapsed time is written to
output.xml so better to calculate it immediately. End time is't
generally needed during execution.

This is to some extend related to #4258.
@pekkaklarck pekkaklarck changed the title Change timestamps from custom strings to datetime in Python model and to ISO 8601 format in output.xml Change timestamps from custom strings to datetime in result model and to ISO 8601 format in output.xml Sep 13, 2023
pekkaklarck added a commit that referenced this issue Nov 8, 2023
It seems that string representation of floats that are smaller than
`0.0001` uses scientific notation which we don't want. For example,
`str(0.00004) == '4e-05'`.

Formatting using the `f` modifier solves the issue. It uses six digits
by default which is fine, but it unfortunately doesn't omit trailing
zeros. They could be rstrippped, but it could strip all digits and
leave us with something like `3.`. Stripping also `.` would work, but
then the result could be just `3` which wouldn't technically amyore be
`xs:float`. Easier to just always use six digits.

Related to #4258.
@pekkaklarck
Copy link
Member Author

Due to the changes being pretty badly backwards incompatible, we added --legacy-output option for generating outputs in same format as RF 6 and earlier in c3c5170. It has a bug with Rebot (#5012), but it will be fixed in RF 7.0.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant