
Documentation for numpy.fromfunction induces an erroneous interpretation! #15726

Open · anoldmaninthesea opened this issue Mar 8, 2020 · 14 comments


@anoldmaninthesea

The documentation related to numpy.fromfunction states:

numpy.fromfunction(function, shape, **kwargs)
Construct an array by executing a function over each coordinate.
The resulting array therefore has a value fn(x, y, z) at coordinate (x, y, z).

However, when I run the following:
f=lambda m,n: (m,n)
np.fromfunction(f,(6,6),dtype=int)

I don't obtain an array from a list of tuples, but instead this:

(array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5]]), array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]))

@Qiyu8
Member

Qiyu8 commented Mar 9, 2020

There is a misunderstanding about fromfunction: you assume that f(x, y) is called once per coordinate, but in fact f(x, y) is invoked only once. Each input parameter is the full coordinate array for one dimension. In your case,

m is [[0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5, 5]],
and n is [[0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5]],

so the result is correct.
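A quick way to confirm this, sketched below, is to record how often the function runs: it is invoked exactly once, with full (6, 6) index arrays as arguments.

```python
import numpy as np

calls = []

def f(m, n):
    # Record each invocation to show fromfunction calls f exactly once.
    calls.append((m.shape, n.shape))
    return (m, n)

m, n = np.fromfunction(f, (6, 6), dtype=int)

print(len(calls))        # 1 — f was called a single time
print(calls[0])          # ((6, 6), (6, 6)) — both arguments are full index arrays
print(m[3, 0], n[0, 3])  # 3 3 — m varies along axis 0, n along axis 1
```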

@rossbar
Contributor

rossbar commented Mar 10, 2020

I actually do think the docstring is a bit misleading:

Returns
-------
fromfunction : any
    The result of the call to `function` is passed back directly.
    Therefore the shape of `fromfunction` is completely determined by
    `function`.  If `function` returns a scalar value, the shape of
    `fromfunction` would not match the `shape` parameter.

The last sentence seems to be incorrect - if the function returns a scalar value, the shape of fromfunction will match the shape parameter. From the examples in the docstring:

>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

@Qiyu8
Member

Qiyu8 commented Mar 10, 2020

@rossbar, maybe this is more precise? What do you think:

If `function` returns a non-scalar value, the shape of
    `fromfunction` may not match the `shape` parameter.

We could also add anoldmaninthesea's example to the docstring to demonstrate this situation.

@WarrenWeckesser
Member

FYI: I've seen this confusion before, in a question on stackoverflow: https://stackoverflow.com/questions/27612288/unexpected-result-numpy-fromfunction-with-constant-functions

@eric-wieser
Member

eric-wieser commented Mar 10, 2020

The last sentence seems to be incorrect - if the function returns a scalar value, the shape of fromfunction will match the shape parameter. From the examples in the docstring:

@rossbar, that example is one where the function does not return a scalar value (because i and j are themselves not scalars).
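To illustrate the distinction, here is a small sketch: a function that ignores its array arguments and returns a genuine scalar, so the result does not match the shape parameter at all, versus one whose result only looks scalar in source form.

```python
import numpy as np

# A function returning a true scalar: fromfunction passes the result
# back directly, so the output is not a (2, 2) array.
scalar_result = np.fromfunction(lambda i, j: 3.0, (2, 2))
print(scalar_result)        # 3.0

# i + j only looks scalar; i and j are arrays, so the result is too.
array_result = np.fromfunction(lambda i, j: i + j, (2, 2))
print(array_result.shape)   # (2, 2)
```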

@anoldmaninthesea
Author

I would be pleased with a change in the documentation. fromfunction can be a useful method as is; it's just the documentation that seems to lead the user into a mistake.

@rossbar
Contributor

rossbar commented Mar 10, 2020

@rossbar, that example is one where the function does not return a scalar value (because i and j are themselves not scalars).

I see - then this was definitely confusing to me. Whether the function lambda i, j : i + j returns scalars depends on the inputs. My problem was that I had a mental model of the function being applied element-wise at each coordinate rather than treating the coordinate inputs as arrays.

It is additionally confusing in the context of the original example lambda i, j : (i, j) which always returns a sequence. With that as a baseline, the concept of "returns a scalar" becomes even murkier.

Thanks for clearing this up @eric-wieser , though I must say without your expert intervention I was clearly getting the wrong idea from the docstring itself.

Also thanks @WarrenWeckesser for the SO link. I think incorporating part of your explanation into the docstring would go a long way toward clearing up (at least my) issues:

func is called just once, with array arguments.

@rossbar
Contributor

rossbar commented Mar 10, 2020

@Qiyu8 I think adding an example of a function that returns a tuple is a good idea, though the original example is not necessarily a good candidate, as it is equivalent to np.indices (which, upon closer inspection, is how the array inputs to np.fromfunction are constructed):

numpy/numpy/core/numeric.py

Lines 1766 to 1767 in 68224f4

args = indices(shape, dtype=dtype)
return function(*args, **kwargs)
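The equivalence is easy to check directly, as in this small sketch:

```python
import numpy as np

# The original example reproduces np.indices exactly.
from_fn = np.fromfunction(lambda i, j: (i, j), (6, 6), dtype=int)
idx = np.indices((6, 6))

print(np.array_equal(from_fn[0], idx[0]))  # True
print(np.array_equal(from_fn[1], idx[1]))  # True
```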

@jatin-code777

In addition to this, would it be a good idea to have a function that does what we expected numpy.fromfunction to do? It would call the provided function once per element (shape[0]*shape[1]*... times), passing coordinate tuples (0, 0), (0, 1), and so on, and it would always return an np.array of shape shape whose (i, j)-th entry is function(i, j).
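Such a helper is easy to sketch on top of np.ndindex (the name fromfunction_elementwise below is hypothetical, not an existing NumPy API):

```python
import numpy as np

def fromfunction_elementwise(func, shape, dtype=float):
    # Call func once per coordinate tuple, unlike np.fromfunction,
    # which calls it once with whole index arrays.
    out = np.empty(shape, dtype=dtype)
    for idx in np.ndindex(*shape):
        out[idx] = func(*idx)
    return out

# Each call receives plain integer coordinates:
a = fromfunction_elementwise(lambda i, j: 10 * i + j, (3, 3), dtype=int)
print(a)  # [[ 0  1  2] [10 11 12] [20 21 22]]
```

The trade-off is that this runs a Python-level call per element, so it is far slower than the vectorized single call np.fromfunction actually makes.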

@jon-middleton

I have another complaint. Consider the following example:

def foo(x,y): 
    return np.max([x+y, 0])
print(np.fromfunction(foo, (100, 100)))

Then NumPy throws a ValueError:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So np.fromfunction doesn't seem to like conditionals in the function it takes as a parameter. Why is that? And is there a workaround?

@eric-wieser
Member

You should be using np.maximum(x+y, 0) there
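For completeness, a sketch of the corrected function: np.maximum is the element-wise ufunc and broadcasts the scalar 0, whereas np.max tries to reduce the ragged list [x + y, 0] to a single value, which is what triggers the error on array input.

```python
import numpy as np

def foo(x, y):
    # Element-wise maximum of the array x + y and the scalar 0.
    return np.maximum(x + y, 0)

result = np.fromfunction(foo, (3, 3))
print(result.shape)   # (3, 3)
print(result[2, 2])   # 4.0
```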

@gabri94

gabri94 commented Apr 21, 2023

Hi, I'm having similar issues which might be related to this misunderstanding of the use of np.fromfunction:

I want to accelerate the construction of a matrix like this:

los = np.zeros(shape=(len(nodes), len(np_points)), dtype=np.uint8)
for i in range(nodes.shape[0]):
    for j in range(np_points.shape[0]):
         los[i,j] = viewsheds[i, np_points[j,0], np_points[j,1]]

Using np.fromfunction as follows:

los = np.fromfunction(lambda i,j: viewsheds[i,  np_points[j, 0], np_points[j,1]],
                       shape=(len(nodes), len(t_points)),
                       dtype=np.uint8)

np_points is another numpy array of size (n,2)

The matrix has the right size, but its content is seemingly random and does not equal the los matrix computed manually. Am I misunderstanding the usage of the function, or is there an actual bug?
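Without the actual data this is only a guess, but note that dtype in np.fromfunction sets the dtype of the index arrays themselves, so dtype=np.uint8 makes indices wrap around past 255. With a wide integer dtype the construction does vectorize correctly, as in this self-contained sketch (viewsheds, np_points, and nodes below are made-up stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
viewsheds = rng.integers(0, 2, size=(4, 5, 5), dtype=np.uint8)  # stand-in data
np_points = rng.integers(0, 5, size=(3, 2))
nodes = np.arange(4)

# Loop version, as in the comment above:
los = np.zeros(shape=(len(nodes), len(np_points)), dtype=np.uint8)
for i in range(len(nodes)):
    for j in range(len(np_points)):
        los[i, j] = viewsheds[i, np_points[j, 0], np_points[j, 1]]

# Vectorized version; dtype=int keeps the index arrays from wrapping.
los2 = np.fromfunction(
    lambda i, j: viewsheds[i, np_points[j, 0], np_points[j, 1]],
    shape=(len(nodes), len(np_points)), dtype=int)

print(np.array_equal(los, los2))  # True
```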

@matanox

matanox commented Jan 26, 2024

It seems that both broadcasting and vectorization take place when the supplied function is called, in the example code at the top of this issue just as in the following example:

def distance(a ,b):
    print('I have been called')
    return abs(a - b)

def fill(stream: np.ndarray, query: np.ndarray):
    return np.fromfunction(
        lambda i, j: distance(stream[i], query[j]),
        (len(stream), len(query)), dtype=int)

fill(np.array(range(3)), np.array(range(10)))

The best way to understand it is to look at its source code: np.fromfunction simply feeds the given function the array of indices implied by the provided shape, so perhaps the documentation could just say that more directly. This is what enables broadcasting and vectorization wherever the passed function (in this case distance, passed along via the lambda) uses ufuncs on its inputs.

So for example in the above code, the lambda is called once, receiving the array of indices implied by the shape as a single numpy array, which is what indices() yields inside fromfunction if you read its code. Because the first axis of this 3D array of indices has length 2, it is seamlessly destructured into the lambda's two variables, i and j, which in turn lets distance receive the data matrices a and b built from these indices; one call to distance is then enough.
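The destructuring step can be seen in isolation:

```python
import numpy as np

idx = np.indices((3, 10))   # one array stacking both index grids
print(idx.shape)            # (2, 3, 10)

i, j = idx                  # iterating over axis 0 unpacks the two grids
print(i.shape, j.shape)     # (3, 10) (3, 10)
```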

Observing that a and b are numpy objects, the python interpreter simply delegates the - and abs operations to numpy's corresponding ufuncs which operate on arrays. Each one of these two array operations can now be said to be "vectorized" in the sense that it applies a fixed operation onto arrays of data using efficient C code which employs tight loops of computation over the memory locations, which is precisely what numpy does for you when you apply something like - between two arrays.

To sum it up, a function performing arithmetic (distance) got repurposed to operate on arrays by applying np.fromfunction with a lambda expression.

And despite this plan of execution allocating orders of magnitude more main memory for the matrices a and b than the two original 1D input arrays take up, the computation is far faster than looping in plain Python to fill in the cell values by hand, especially as the length of the input arrays grows.

@matanox

matanox commented Jan 26, 2024

As this actually relates back to the documentation: when benchmarking, fill() is more than an order of magnitude faster than using np.vectorize(distance) (with the first argument reshaped into a column vector, as the np.vectorize version requires). It looks like np.fromfunction really does make vectorization happen, whereas np.vectorize does not, which is nearly the opposite of what the naming of these functions suggests.
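The difference in call granularity is easy to measure with a sketch (the distance function here is the same toy example as above):

```python
import numpy as np

calls = {"n": 0}

def distance(a, b):
    calls["n"] += 1
    return abs(a - b)

stream, query = np.arange(3), np.arange(10)

# fromfunction: distance runs once, on whole index arrays.
calls["n"] = 0
np.fromfunction(lambda i, j: distance(stream[i], query[j]), (3, 10), dtype=int)
print(calls["n"])   # 1

# np.vectorize: one Python-level call per output element.
calls["n"] = 0
np.vectorize(distance, otypes=[int])(stream[:, None], query[None, :])
print(calls["n"])   # 30
```

Passing otypes explicitly avoids the extra probing call np.vectorize otherwise makes to infer the output type.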

Is it really the case, as its docs page says, that:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop

It seems that the use of the term vectorization in np.vectorize's documentation goes against numpy's documentation glossary of the term vectorization.

I think either the documentation should be harmonized on this, or perhaps more aggressive compilation should happen inside vectorize if it were actually to compile the function it is given rather than only wrap Python code around it.

As is, the path to learning how to leverage vectorization performance around one's own Python functions performing arithmetic feels almost treacherous as far as documentation and function names are concerned. Harmonizing the documentation's use of the term vectorization, whether defined in a NumPy-specific way or generically, would best serve the cause of clarity about numpy.

It seems that Numba has developed a pair of further vectorizing variants of the vectorize concept, but I suspect it will not feel stable until Numba is tightly integrated with NumPy's development and release cycles; I have run into numerous stability concerns (bad luck?).
