Nested jit annotations #3332

Open
ghost opened this issue Sep 23, 2018 · 16 comments
Labels
bug objectmode object mode related issue

Comments

@ghost

ghost commented Sep 23, 2018

I have code with a structure like the example below:

@numba.jit()
def test():
  state = 0

  @numba.jit(nopython=True)
  def sub(param):
    nonlocal state
    state += param

  for i in range(100):
    param = i # Assume this line is nopython incompatible code
    sub(param)

  print(state)

Because of the nonlocal state variable, the test function has to be jitted. Otherwise, I would just remove the first annotation because what really matters is the sub function. Therefore, I wanted to enable nopython mode for it, which unfortunately fails with an error:

  @numba.jit(nopython=True)
  ^
[1] During: lowering "$0.11 = make_function(name=$const0.10, code=<code object fast_sub at 0x0000028ABF11E6F0, file ".../scratches/scratch.py", line 89>, closure=$0.8, defaults=None)" at .../scratches/scratch.py (89)
-------------------------------------------------------------------------------

In the real code, there are lots of state variables. If there is no better solution, I will probably just use a list which I can pass around to share the state. Nested numba.jit annotations would be much more convenient though. What do you think?

@stuartarchibald
Contributor

Thanks for the report. On Numba 0.40.0, this seems to just work under nopython mode as closures are simply inlined:

from numba import njit

def test():
  state = 0

  def sub(param):
    nonlocal state
    state += param

  for i in range(100):
    param = i
    sub(param)

  print(state)

print(test())
print(njit()(test)())

output:

$ python issue3332.py
4950
None
4950
None

If objectmode is forced it does fail with:

Numba encountered the use of a language feature it does not support in this context: <creating a function from a closure> (op code: make_function not supported). If the feature is explicitly supported it is likely that the result of the expression is being used in an unsupported manner.

File "issue3332.py", line 6:

  def sub(param):
  ^

I think this happens because the looplifting pass runs before the closure inlining pass. If a closure (a make_function opcode) is still present at looplift time, the lifted loop re-enters the pipeline with mutated IR that still contains the illegal make_function, and IR legalization then fails. Switching the order of these two passes in the pipeline seems to fix it.

@stuartarchibald
Contributor

As to your actual query about state etc., Numba 0.40.0 has a new feature, objectmode contexts, which run parts of a nopython mode function in objectmode. Does this help your situation? http://numba.pydata.org/numba-doc/latest/user/withobjmode.html

@ghost
Author

ghost commented Sep 24, 2018

@stuartarchibald I have just upgraded to 0.40.0, but still have the same problem. Have you seen that there is a second annotation on the sub function? Also, I cannot use nopython mode for the test function itself. I will have a look at objectmode later.

@stuartarchibald
Contributor

Yes, I saw that. It's currently illegal/unsupported behaviour, so I removed it, as it looked like it would inline fine under njit via the closure inlining pass. I guess your actual function won't do this, hence the problem? It sounds like you want the opposite of what with objmode does in nopython mode, i.e. run the function in objectmode but use nopython mode in a couple of places? If most of your function would compile under nopython mode, then with objmode will probably help get the bits that won't work to run.

@ghost
Author

ghost commented Sep 27, 2018

I was able to solve the problem with objmode. Just curious, why is it necessary to provide the return types when they can be figured out automatically for function arguments?

Also, I ran into a limitation. I have a pandas data frame with different data types (floats and bools) for which I want to use the numpy array in the numba optimised code. numba.typeof isn't able to return a data type string for it. I guess mixed arrays are not supported yet, right?

@ghost
Author

ghost commented Sep 27, 2018

One more idea. For my use case, the following pattern would be very useful:

def gen():
  while True:
    yield 1

@njit
def test():
  with objmode(g="object"):
    g = gen()

  while True:
    with objmode(val="int"):
      val = next(g)

    # Do something with val

The variable g could not be used inside the nopython code, but in other objmode sections.

@ghost
Author

ghost commented Sep 27, 2018

One last question: Is it possible to return a list or tuple of arrays from objmode? I have tried list(array(float64, 1d, C)), tuple(array(float64, 1d, C)), list(float64[:]), tuple(float64[:]) but none worked.

@stuartarchibald
Contributor

Just curious, why is it necessary to provide the return types when they can be figured out automatically for function arguments?

Perhaps you may be running some function in object mode where type inference can't follow what would be returned? This is also an experimental feature, things may change :)

Also, I ran into a limitation. I have a pandas data frame with different data types (floats and bools) for which I want to use the numpy array in the numba optimised code. numba.typeof isn't able to return a data type string for it. I guess mixed arrays are not supported yet, right?

I think this comes out as a NumPy array of dtype object, which is not supported. You could coerce the dataframe backing array into a dtype:

In [16]: d = pd.DataFrame(data = {'col1':[1., 2.], 'col2':[np.bool(1), np.bool(0)]})

In [17]: d.values.dtype
Out[17]: dtype('O')

In [18]: pd.DataFrame(d, dtype=np.float64).values.dtype
Out[18]: dtype('float64')

but this obviously incurs cost. Another option is to use the to_records() method on the DataFrame to get a NumPy recarray which may be recognised by Numba, again, cost incurred, not sure how efficient it'd be. It'd probably be most efficient to just partition your columns by type so that e.g. Numba would just see primitive types through the use of multiple args (one for each column/homogeneously typed data set).

One more idea. For my use case, the following pattern would be very useful:

Thanks, IIRC there are some plans on the horizon for thinking about pass through cases.

One last question: Is it possible to return a list or tuple of arrays from objmode? I have tried list(array(float64, 1d, C)), tuple(array(float64, 1d, C)), list(float64[:]), tuple(float64[:]) but none worked.

Is this the sort of thing you are after? :

from numba import njit, objmode
import numpy as np

@njit
def test():
    with objmode(val='List(float64[:])'):
        val = [np.arange(10.), np.ones(4)]
    return val

print(test())

or do you mean you want the return statement in the objmode block (not supported!)?

@ghost
Author

ghost commented Sep 27, 2018

That all makes sense.

or do you mean you want the return statement in the objmode block (not supported!)?

No, this is exactly what I wanted. List(...) does not seem to be documented here. Therefore, I just tried to use whatever numba.typeof returned but this didn't work either. How can I figure out the type string which I have to use when it is missing in the documentation? Tuples, for example, would be interesting too.

@stuartarchibald
Contributor

hmmm, that should probably be documented, thanks for raising it, I've opened a ticket #3349. Basically, whatever string you write gets eval'd with numba.types as globals. So if you are looking for a type, that's the place to look. Here's a homogeneous 2-tuple of float64 1D arrays.

from numba import njit, objmode
import numpy as np

@njit
def test():
    with objmode(val='UniTuple(float64[:], 2)'):
        val = (np.arange(10.), np.ones(4))
    return val

print(test())

@ghost
Author

ghost commented Sep 27, 2018

I see. Thank you very much for your excellent help!

@stuartarchibald
Contributor

No problem, thanks for using Numba :)

@ghost
Author

ghost commented Sep 27, 2018

Sorry, I have one more question: It seems like to_records() is supported for my mixed array. At least I can pass such an array to a nopython function. The following should describe the array:

print(vals.shape)  # (10,)
print(vals.dtype)  # (numpy.record, [('index', '<i8'), ('a', '<f8'), ('b', '?')])
print(numba.typeof(vals))  # unaligned array(Record([('index', '<i8'), ('a', '<f8'), ('b', '|b1')]), 1d, C)
print(numba.from_dtype(vals.dtype))  # Record([('index', '<i8'), ('a', '<f8'), ('b', '|b1')])

I have again tried various type string combinations for objmode, but could not get it running. Would you mind explaining in more detail how I can determine the type string from the information above?

@stuartarchibald
Contributor

hmmm, this was hard. I'm not hugely familiar with the recarray impl in Numba so there may be a better way. Independent of this, the str const constraint makes it hard to deal with more advanced types, I'll raise this at the next core developer meeting (but also acknowledge that this is a new, under development and generally experimental feature).

from numba import njit, objmode, typeof, from_dtype, types, numpy_support
import numpy as np
from pandas import DataFrame

df = DataFrame(data = {'col1':[1., 2.], 'col2':[np.bool(1), np.bool(0)]})

pdrec = df.to_records()
dt = numpy_support.from_struct_dtype(pdrec.dtype)

def rec2str(rec):
    attrs = ['descr', 'fields', 'size', 'aligned']
    subsmap = {}
    for x in attrs:
        subsmap[x] = str(getattr(rec, x))
    subsmap['dtype'] = rec.dtype.descr
    template = "Record(\"{descr}\", {fields}, {size}, {aligned}, {dtype})"
    ret = template.format(**subsmap)
    # make sure it's valid
    eval(ret, {}, types.__dict__)
    return ret.replace('"','\\"')

# This gives the record type to paste in the `objmode` type annotation.
print("Formatted str const: %s" % rec2str(dt))

@njit
def test_record_get(recarr):
    with objmode(f="Record(\"[('index', '<i8'), ('col1', '<f8'), ('col2', '|b1')]\", {'index': (int64, 0), 'col1': (float64, 8), 'col2': (bool, 16)}, 17, False, [('index', '<i8'), ('col1', '<f8'), ('col2', '|b1')])"):
        f = recarr[1]
    return f

@njit
def test_record_slice(recarr):

    with objmode(g= "Array(Record(\"[('index', '<i8'), ('col1', '<f8'), ('col2', '|b1')]\", {'index': (int64, 0), 'col1': (float64, 8), 'col2': (bool, 16)}, 17, False, [('index', '<i8'), ('col1', '<f8'), ('col2', '|b1')]), 1, 'C')"):
        g = recarr[1:]
    return g

print(test_record_get(pdrec))
print(test_record_slice(pdrec))

ping @sklam any ideas for a better way?

@sklam
Member

sklam commented Sep 28, 2018

(replying to #3332 (comment))

It's definitely too difficult to use. This is where we need to do something like with objmode(g=typeof(recarra))

@ghost
Author

ghost commented Sep 28, 2018

@sklam Yes, this would be useful (at first, I even thought that I could use the current numba.typeof for this). However, I still think numba should just figure out the type by itself, as it does for function arguments. Then users wouldn't have to deal with it at all.

@stuartarchibald stuartarchibald added the objectmode object mode related issue label Dec 15, 2021