Linear Search #23203

csbrown · 2023-02-13T01:01:26Z

There is a specific implementation for binary search. Can there also be one for linear search? There are use cases where the first element where [condition] might frequently be very close to the front of the array, but existing methods for executing a linear search in numpy require operations on the entire array. There seems to be some demand for this (see also linked questions). I can be convinced to work on this, but I've never contributed, and would appreciate some tips on where to start with this.

Related: #1835

charris · 2023-02-13T01:34:44Z

There is a method that starts by doubling the interval starting with the last bound until it finds an interval containing the value, then bisects that interval. It is commonly used when looking to interpolate a piecewise function at many values in order. Not sure where the original came from, I first saw it in Numerical Recipes long ago.

seberg · 2023-02-13T19:01:36Z

I think we are talking of different things. One issue is to not work with the full array, the other about how to search within a chunk of data (or the full array).

I think this request basically a duplicate of gh-8513 and maybe another issue or two.

MatteoRaso · 2023-02-14T01:55:08Z

Can there also be one for linear search?

Wouldn't that just be a while loop? I'm not sure if adding a function to alias a single loop is a good idea.

charris · 2023-02-14T02:09:30Z

One issue is to not work with the full array

What I mentioned works for that, but does require a sorted array to be searched. It starts at the beginning, or last found object, and tries intervals of increasing size until it finds one containing the wanted element, then bisects that interval. It is practically linear time if the items searched for are also in order.

MatteoRaso · 2023-02-14T02:54:16Z

What I mentioned works for that, but does require a sorted array to be searched. It starts at the beginning, or last found object, and tries intervals of increasing size until it finds one containing the wanted element, then bisects that interval. It is practically linear time if the items searched for are also in order.

This sounds similar to a binary search, which is already implemented. On top of that, you need to account for the time it takes to sort the array, which can be non-trivial. What the author is describing is a search that checks each element sequentially, which really just requires a while loop.

rkern · 2023-02-14T03:05:45Z

Most of numpy is "just a for loop", but implemented in C for performance and expressiveness.

I have definitely had cases where I want to find the first or last occurrence of something in an unsorted array, and a linear search would be perfect for the task. Think of trim_zeros(), only just returning the found indices for further processing (and application to associated arrays) rather than doing the trimming. I think I've written it by hand in Cython, multiple times.

charris · 2023-02-14T03:50:33Z

This sounds similar to a binary search

It is, but faster for specific applications. For instance, if you have a piece-wise polynomial and want for evaluate it on a dense set of points, each segment might be reused many times and won't need searching to find.

melissawm added 01 - Enhancement 54 - Needs decision labels Feb 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Linear Search #23203

Linear Search #23203

csbrown commented Feb 13, 2023

charris commented Feb 13, 2023

seberg commented Feb 13, 2023

MatteoRaso commented Feb 14, 2023

charris commented Feb 14, 2023

MatteoRaso commented Feb 14, 2023

rkern commented Feb 14, 2023

charris commented Feb 14, 2023

Linear Search #23203

Linear Search #23203

Comments

csbrown commented Feb 13, 2023

charris commented Feb 13, 2023

seberg commented Feb 13, 2023

MatteoRaso commented Feb 14, 2023

charris commented Feb 14, 2023

MatteoRaso commented Feb 14, 2023

rkern commented Feb 14, 2023

charris commented Feb 14, 2023