
MAINT: Performance improvement of polyutils.as_series #25299

Merged
merged 1 commit into numpy:main on Dec 5, 2023

Conversation

florath
Contributor

@florath florath commented Dec 2, 2023

This small patch provides a (small) performance improvement: for the normal (straight, no-error) case the improvement is 1% to 9%; for the error case it is between 3% and 30%. The improvement depends strongly on the input values.

The original code always iterates over all arrays, even though it could stop at the first array found with size zero.

This small patch provides a (small) performance improvement: For the
normal (straight / no error) case the improvement is from 0 - 5%, for
the error case the improvement is between 3 and 30%. The improvement
highly depends on the value of the parameter.

The original code always runs through all arrays, even if it could
stop at the first found with size zero.

Signed-off-by: Andreas Florath <andreas@florath.net>
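The idea behind the patch can be sketched as follows. This is a simplified illustration of the size check only, not the exact diff; the function names are hypothetical:

```python
import numpy as np

def has_empty_original(arrays):
    # Original approach: builds the complete list of sizes before
    # checking the minimum, so every array is visited even when the
    # very first one is already empty.
    return min([a.size for a in arrays]) == 0

def has_empty_optimized(arrays):
    # Patched approach: stop at the first array with size zero.
    for a in arrays:
        if a.size == 0:
            return True
    return False

arrays = [np.array([]), np.ones(100), np.ones(100)]
print(has_empty_original(arrays), has_empty_optimized(arrays))  # True True
```

Both variants agree on the result; the early exit simply avoids touching the remaining arrays once an empty one is found, which explains why the error case benefits most.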
@florath
Contributor Author

florath commented Dec 2, 2023

A short description of how I measured:

I created a virtualenv and installed NumPy. In polyutils.py I added the new function side by side with the original one:

def as_series_orig(alist, trim=True):
    ...

def as_series(alist, trim=True):
    ...

The tests are divided into two groups: 1) the normal (straight) use case and 2) the error case, where an exception is thrown.
The test data was collected from the documentation as well as from the test cases. In addition, randomly generated test cases are used.

import numpy as np
from numpy.polynomial import polyutils as pu

from numpy.testing import (
    assert_raises, assert_equal, assert_,
)

import functools
import timeit

def func_tests_docu(testf):
    '''Functional tests from the documentation'''
    
    a = np.arange(4)
    assert_equal(testf(a), [np.array([0.]), np.array([1.]), np.array([2.]), np.array([3.])])

    b = np.arange(6).reshape((2,3))
    assert_equal(testf(b), [np.array([0., 1., 2.]), np.array([3., 4., 5.])])
    
    assert_equal(testf((1, np.arange(3), np.arange(2, dtype=np.float16))),
                 [np.array([1.]), np.array([0., 1., 2.]), np.array([0., 1.])])

    assert_equal(testf([2, [1.1, 0.]]),
                 [np.array([2.]), np.array([1.1])])

    assert_equal(testf([2, [1.1, 0.]], trim=False),
                 [np.array([2.]), np.array([1.1, 0. ])])


def func_test_unit(testf):
    '''Functional tests from the unit tests which come with numpy'''

    # check exceptions
    assert_raises(ValueError, testf, [[]])
    assert_raises(ValueError, testf, [[[1, 2]]])
    assert_raises(ValueError, testf, [[1], ['a']])
    # check common types
    types = ['i', 'd', 'O']
    for i in range(len(types)):
        for j in range(i):
            ci = np.ones(1, types[i])
            cj = np.ones(1, types[j])
            [resi, resj] = testf([ci, cj])
            assert_(resi.dtype.char == resj.dtype.char)
            assert_(resj.dtype.char == types[i])


def generate_test_data_for_as_series(num_elements=10, chunk_size=5):
    """
    Generate test data for the numpy.polynomial.polyutils.as_series function.
    
    Parameters:
    num_elements (int): Total number of elements to generate.
    chunk_size (int): Size of each chunk in the list.

    Returns:
    list: A list of numpy arrays, each of size `chunk_size`.
    """
    # Generate a numpy array with the specified number of elements
    data = np.random.rand(num_elements)

    # Split the array into chunks of specified size
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
            

def exec_time(testf):
    tdata = [
        np.arange(4),
        np.arange(6).reshape((2,3)),
        (1, np.arange(3), np.arange(2, dtype=np.float16)),
        [2, [1.1, 0.]],
        [0.1, 0.2, 0.3],
        np.array([[0.1, 0.2, 0.3], [1, 2, 3]])
    ]

    res_times = []

    print("Tests from documentation and test cases")
    for td in tdata:
        res = timeit.timeit(functools.partial(testf, td), number=100000)
        res_times.append(res)

    print("Random tests")
    np.random.seed(0)
    for num_elements in (1, 2, 4, 8, 16, 32, 64, 128):
        td = generate_test_data_for_as_series(num_elements)
        res = timeit.timeit(functools.partial(testf, td), number=100000)
        res_times.append(res)
        
    return res_times

def test_wrapper(testf, td):
    try:
        testf(td)
        assert False
    except ValueError:
        pass

def exec_time_exceptions(testf):
    tdata = [
        [[]],
        [[[1, 2]]],
        [[1], ['a']],
        [[1], [], [2], [3]] + [ [7] * 100 ],
    ]

    res_times = []
    
    for td in tdata:
        res = timeit.timeit(functools.partial(test_wrapper, testf, td), number=1000)
        res_times.append(res)

    return res_times


def main():
    print("Running functional tests")
    func_tests_docu(pu.as_series_orig)
    func_test_unit(pu.as_series_orig)
    func_tests_docu(pu.as_series)
    func_test_unit(pu.as_series)
    print("Running performance tests")
    res_orig = exec_time(pu.as_series_orig)
    # print("Orig exec time", res_orig)
    res_opt = exec_time(pu.as_series)
    # print("Opt exec time", res_opt)

    diff = []
    for i in range(len(res_orig)):
        diff.append("%5.3f" % (res_orig[i] / res_opt[i]))
    print("Improvement normal", diff)
    
    res_orig_ex = exec_time_exceptions(pu.as_series_orig)
    # print("Orig exption exec time", res_orig_ex)
    res_opt_ex = exec_time_exceptions(pu.as_series)
    # print("OPT exption exec time", res_opt_ex)

    diff = []
    for i in range(len(res_orig_ex)):
        diff.append("%5.3f" % (res_orig_ex[i] / res_opt_ex[i]))
    print("Improvement exception", diff)
        

if __name__ == '__main__':
    main()

A typical run on Intel i9-9880H using Python 3.11.2 (Debian):

Improvement normal ['1.033', '1.043', '1.033', '1.034', '1.039', '1.056', '1.056', '1.063', '1.078', '1.057', '1.043', '1.029', '1.019', '1.023']
Improvement exception ['1.313', '1.185', '1.041', '1.075']

A typical run on Intel Xeon 6438M using Python 3.10.12 (Ubuntu):

Improvement normal ['1.023', '1.034', '1.036', '1.077', '1.039', '1.038', '1.090', '1.095', '1.098', '1.053', '1.039', '1.021', '1.021', '1.014']
Improvement exception ['1.290', '1.133', '1.053', '1.100']

The numbers are the speedup of the proposed version relative to the original version (original = 1.0) for the different test cases.
While the improvement in the normal (no-error) case is small (1% to 9%), the improvement in the exception case reaches nearly 30% for some test data.

@charris charris changed the title ENH: Performance improvement of polyutils.as_series MAINT: Performance improvement of polyutils.as_series Dec 2, 2023
@ngoldbaum
Member

You could use any and a generator expression instead of a list comprehension and you'd get the same effect, I think. This change is fine though, if a teeny bit less compact. Thanks @florath!
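For reference, the `any`-based variant suggested above could look roughly like this. This is a hedged sketch; the check is illustrative, and the function name is hypothetical, not the merged code:

```python
import numpy as np

def check_not_empty(arrays):
    # any() with a generator expression short-circuits at the first
    # array whose size is zero, so it also avoids scanning the rest
    # of the list, just like an explicit loop with an early exit.
    if any(a.size == 0 for a in arrays):
        raise ValueError("Coefficient array is empty")

check_not_empty([np.ones(3), np.ones(2)])  # passes silently
```

Because the generator expression is lazy, this has the same short-circuit behavior as the explicit loop in the patch, at the cost of one extra generator frame per call.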

@ngoldbaum ngoldbaum merged commit 9fe5ee8 into numpy:main Dec 5, 2023