Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time.clock_gettime_ns() is too slow #111482

Closed
socketpair opened this issue Oct 30, 2023 · 7 comments
Closed

time.clock_gettime_ns() is too slow #111482

socketpair opened this issue Oct 30, 2023 · 7 comments
Labels
performance Performance or resource usage

Comments

@socketpair
Copy link
Contributor

socketpair commented Oct 30, 2023

Feature or enhancement

Proposal:

Actually, this is a performance bug. Not a feature/enhancement.
clock_gettime_ns() does not use floating point operations and should be faster than old gettimeofday(). Unfotunately in Python it's not so. Checked in Linux X86_64 kernel 6.5.7, CPython 3.11.6

In [11]: timeit time.clock_gettime_ns(0)  # CLOCK_REALTIME
55.4 ns ± 1.22 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [12]: timeit time.clock_gettime_ns(5) # CLOCK_REALTIME_COARSE
45 ns ± 0.324 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [13]: timeit time.time()
36.7 ns ± 0.953 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@socketpair socketpair added the type-feature A feature request or enhancement label Oct 30, 2023
@socketpair socketpair changed the title time.clock_gettime_ns is too slow time.clock_gettime_ns() is too slow Oct 30, 2023
@Eclips4 Eclips4 added performance Performance or resource usage and removed type-feature A feature request or enhancement labels Oct 30, 2023
@pochmann
Copy link
Contributor

"too slow" for what?

@socketpair
Copy link
Contributor Author

socketpair commented Oct 30, 2023

@pochmann too slow comparing to time.time() = gettimeofday(). For example, CLOCK_REALTIME_COARSE by definition should be faster (but coarse). In fact, it's not faster in Python. I did a glimpse on the implementation. For reasons that I don't understand, there are many useless wrappers around simple things.

Why not to do this:

static PyObject *
time_clock_gettime_ns(PyObject *self, PyObject *args)
{
    int clk_id;
    if (!PyArg_ParseTuple(args, "i:clock_gettime", &clk_id)) {
        return NULL;
    }

    struct timespec ts;
    if (clock_gettime((clockid_t)clk_id, &ts)) {
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
    }

    return PyLong_FromLongLong(ts.tv_sec * 1000000000ll + ts.tv_nsec);
}

I beat, it will be faster than current implementation, although I did not check yet.

@vstinner
Copy link
Member

vstinner commented Nov 1, 2023

I'm not sure what this issue is about. If you want to propose a change, go ahead.

@socketpair
Copy link
Contributor Author

I have checked, my optimizations give nothing, so closing the issue :(

vstinner added a commit to vstinner/cpython that referenced this issue Nov 2, 2023
Use Argument Clinic for time.clock_gettime() and
time.clock_gettime_ns() functions.
vstinner added a commit to vstinner/cpython that referenced this issue Nov 2, 2023
Use Argument Clinic for time.clock_gettime() and
time.clock_gettime_ns() functions.

Benchmark:

    import time
    import pyperf
    runner = pyperf.Runner()
    runner.timeit(
        'clock_gettime_ns(CLOCK_MONOTONIC_COARSE)',
        setup='import time; clock_gettime_ns=time.clock_gettime_ns; CLOCK_MONOTONIC_COARSE=6',
        stmt='clock_gettime_ns(CLOCK_MONOTONIC_COARSE)')

Result on Linux with CPU isolation:

Mean +- std dev: [ref] 134 ns +- 1 ns -> [change] 55.7 ns +- 1.4 ns: 2.41x faster
vstinner added a commit to vstinner/cpython that referenced this issue Nov 2, 2023
Use Argument Clinic for time.clock_gettime() and
time.clock_gettime_ns() functions.

Benchmark on time.clock_gettime_ns():

    import time
    import pyperf
    runner = pyperf.Runner()
    runner.timeit(
        'clock_gettime_ns(CLOCK_MONOTONIC_COARSE)',
        setup='import time; clock_gettime_ns=time.clock_gettime_ns; CLOCK_MONOTONIC_COARSE=6',
        stmt='clock_gettime_ns(CLOCK_MONOTONIC_COARSE)')

Result on Linux with CPU isolation:

Mean +- std dev: [ref] 134 ns +- 1 ns -> [change] 55.7 ns +- 1.4 ns: 2.41x faster
@vstinner vstinner reopened this Nov 2, 2023
@vstinner
Copy link
Member

vstinner commented Nov 2, 2023

I reopen the issue: I wrote PR #111641 which makes clock_gettime() and clock_gettime_ns() up to 2x faster by using a faster calling convention (METH_VARARGS => METH_O).

vstinner added a commit that referenced this issue Nov 2, 2023
Use Argument Clinic for time.clock_gettime() and
time.clock_gettime_ns() functions.

Benchmark on time.clock_gettime_ns():

    import time
    import pyperf
    runner = pyperf.Runner()
    runner.timeit(
        'clock_gettime_ns(CLOCK_MONOTONIC_COARSE)',
        setup='import time; clock_gettime_ns=time.clock_gettime_ns; CLOCK_MONOTONIC_COARSE=6',
        stmt='clock_gettime_ns(CLOCK_MONOTONIC_COARSE)')

Result on Linux with CPU isolation:

Mean +- std dev: [ref] 134 ns +- 1 ns -> [change] 55.7 ns +- 1.4 ns: 2.41x faster
FullteaR pushed a commit to FullteaR/cpython that referenced this issue Nov 3, 2023
Use Argument Clinic for time.clock_gettime() and
time.clock_gettime_ns() functions.

Benchmark on time.clock_gettime_ns():

    import time
    import pyperf
    runner = pyperf.Runner()
    runner.timeit(
        'clock_gettime_ns(CLOCK_MONOTONIC_COARSE)',
        setup='import time; clock_gettime_ns=time.clock_gettime_ns; CLOCK_MONOTONIC_COARSE=6',
        stmt='clock_gettime_ns(CLOCK_MONOTONIC_COARSE)')

Result on Linux with CPU isolation:

Mean +- std dev: [ref] 134 ns +- 1 ns -> [change] 55.7 ns +- 1.4 ns: 2.41x faster
@hugovk
Copy link
Member

hugovk commented Nov 9, 2023

#111641 was merged. Does it need backporting? If not, let's close this issue.

@vstinner
Copy link
Member

vstinner commented Nov 9, 2023

I dislike backporting optimizations. Better performance is not worth it compared to the risk of regression. I close the issue. Thanks @socketpair for the bug report.

@vstinner vstinner closed this as completed Nov 9, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Nov 16, 2023
clockid_t is defined as long long on AIX.
vstinner added a commit that referenced this issue Nov 16, 2023
clockid_t is defined as long long on AIX.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Use Argument Clinic for time.clock_gettime() and
time.clock_gettime_ns() functions.

Benchmark on time.clock_gettime_ns():

    import time
    import pyperf
    runner = pyperf.Runner()
    runner.timeit(
        'clock_gettime_ns(CLOCK_MONOTONIC_COARSE)',
        setup='import time; clock_gettime_ns=time.clock_gettime_ns; CLOCK_MONOTONIC_COARSE=6',
        stmt='clock_gettime_ns(CLOCK_MONOTONIC_COARSE)')

Result on Linux with CPU isolation:

Mean +- std dev: [ref] 134 ns +- 1 ns -> [change] 55.7 ns +- 1.4 ns: 2.41x faster
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

5 participants