New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: _StringFuncParser to get numerical functions callables from strings #7464
Changes from 6 commits
0ad3710
509ed9f
bb042f6
4fa16f3
c5770fd
69f2a04
94a1af0
331d5d3
a45f217
3bd901e
c143e75
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2661,3 +2661,205 @@ def __exit__(self, exc_type, exc_value, traceback): | |
os.rmdir(path) | ||
except OSError: | ||
pass | ||
|
||
|
||
class _FuncInfo(object): | ||
""" | ||
Class used to store a function | ||
|
||
Each object has: | ||
* The direct function (function) | ||
* The inverse function (inverse) | ||
* A boolean indicating whether the function | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And why is this wrapped so early? It looks weird when the next bullet isn't. |
||
is bounded in the interval 0-1 (bounded_0_1), or | ||
a method that returns the information depending | ||
on this | ||
* A callable (check_params) that returns a bool specifying if a | ||
certain combination of parameters is valid. | ||
|
||
""" | ||
def __init__(self, function, inverse, bounded_0_1=True, check_params=None): | ||
self.function = function | ||
self.inverse = inverse | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Naming preference: I suggest "function" and "inverse" rather than "direct" and "inverse". |
||
|
||
if callable(bounded_0_1): | ||
self._bounded_0_1 = bounded_0_1 | ||
else: | ||
self._bounded_0_1 = lambda x: bounded_0_1 | ||
|
||
if check_params is None: | ||
self._check_params = lambda x: True | ||
elif callable(check_params): | ||
self._check_params = check_params | ||
else: | ||
raise ValueError("Check params must be a callable, returning " | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Who is this message for? What is check params and how would the user know if they messed this up? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This message is for whoever makes instances of _FuncInfo, which given the fact that it is underscored, would most likely be a developer. For now this only happens inside _StringFuncParser, which should have everything right, and never trigger the exception, but I wanted to make it robust enough to be used independently. check_params is a callable that takes a list of the parameters and returns if that list of parameters is allowed for the function. For example: If on the other hand if 'log{-2}(x+{0.1})', was called, then params=[-2.,0.1], and check_params(params) will return False, as -2 is not a valid value for the base. I will make the message a bit more clear, as well and the unofficial documentation of the class. |
||
"a boolean with the validity of the passed " | ||
"parameters, or None.") | ||
|
||
def is_bounded_0_1(self, params=None): | ||
return self._bounded_0_1(params) | ||
|
||
def check_params(self, params=None): | ||
return self._check_params(params) | ||
|
||
|
||
class _StringFuncParser(object): | ||
""" | ||
A class used to convert predefined strings into | ||
_FuncInfo objects, or to directly obtain _FuncInfo | ||
properties. | ||
|
||
""" | ||
|
||
_funcs = {} | ||
_funcs['linear'] = _FuncInfo(lambda x: x, | ||
lambda x: x, | ||
True) | ||
_funcs['quadratic'] = _FuncInfo(np.square, | ||
np.sqrt, | ||
True) | ||
_funcs['cubic'] = _FuncInfo(lambda x: x**3, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. np.power, kwargs={x2:2} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But I am not sure how would that help, you would still need the lambda function to pass the kwargs, right? In any case, I do not think it really is that relevant at this point, we are trying to do something relatively simple for a subset of functions (for which it can be tested) aimed to help FuncNorm and children. If we ever decide to make this accessible class directly accesible to end users, or to let users create their own cuestom _FuncInfo objects, then we can worry about these things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. depends on how it'd be called in FuncNorm...(I'd argue FuncNorm should support passing in f=np.power, kwargs={}) too since that's common enough for numpy. Would also give users a way to call np.piecewise directly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that FuncNorm should support that (even though it is not as useful as it may seem, imagine a case where the variable is the second argument and not the first one). Anyway we can discuss that in the next PR. IMO doing that here is a mistake, as it goes against what we are trying to achieve, which is to provide the simplest possible way to specify a simple function. For more complex cases, the user will always be able to use FuncNorm with callables directly, or with a new kwargs option based on your suggestion. |
||
lambda x: x**(1. / 3), | ||
True) | ||
_funcs['sqrt'] = _FuncInfo(np.sqrt, | ||
np.square, | ||
True) | ||
_funcs['cbrt'] = _FuncInfo(lambda x: x**(1. / 3), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this notation common? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, cbrt is used by numpy as scipy for the cubic root. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, wondering now if there's a way to generically support any function if the name is passed in using a lookup of numpy math function names...(the implementation of the functions would also be pitched to numpy, which maybe should be happening anyway instead of lambdas...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But this does not really work, because it would not give have access to the inverse function, unless, we explicitly include a list of corresponding inverse functions, which will limit in any case the usage to whatever we allow. I think with polynomials, roots, and offset logs we can probably cover everything that basic users will want to do, for normalizations. The advanced users can always provide the callables. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So even without getattr, I still prefer calling the numpy functions when possible because they're tested and optimized in a way a bunch of lambda functions aren't. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, but this wil only be possible for the simples cases, none of the para-metrical cases would work, without the lambda. On the other hand inside the lambda the operators will be overloaded by numpy when working with arrays, so I do not think the reliability can be that different. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shrugs I think there's potential for error/typos/etc. in writing your own and in theory the numpy functions are more robust. Also now wondering do you cast lists to arrays or are the lambda functions done in a comprehension in FuncNorm? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, everything is cast into a numpy type automatically. |
||
lambda x: x**3, | ||
True) | ||
_funcs['log10'] = _FuncInfo(np.log10, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. think it should just be log and log{p} cases There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But by far the most common use case is log10; I think that use of any other base for a norm, or for most plotting purposes, will be somewhere between rare and nonexistent. Therefore it makes sense to leave in 'log10'. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just think that since log10 is implicitly supported by logp, the redundancy isn't worth it unless it has significant time savings. But shrugs defer to you of coursen |
||
lambda x: (10**(x)), | ||
False) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no base2? (Yeah, this all feels weirdly arbitrary) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I should include log2 |
||
_funcs['log'] = _FuncInfo(np.log, | ||
np.exp, | ||
False) | ||
_funcs['log2'] = _FuncInfo(np.log2, | ||
lambda x: (2**x), | ||
False) | ||
_funcs['x**{p}'] = _FuncInfo(lambda x, p: x**p[0], | ||
lambda x, p: x**(1. / p[0]), | ||
True) | ||
_funcs['root{p}(x)'] = _FuncInfo(lambda x, p: x**(1. / p[0]), | ||
lambda x, p: x**p, | ||
True) | ||
_funcs['log{p}(x)'] = _FuncInfo(lambda x, p: (np.log(x) / | ||
np.log(p[0])), | ||
lambda x, p: p[0]**(x), | ||
False, | ||
lambda p: p[0] > 0) | ||
_funcs['log10(x+{p})'] = _FuncInfo(lambda x, p: np.log10(x + p[0]), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. still don't think you need to bother with a log10 since you have the generic case and the user will never know (they have to write log10 anyway) |
||
lambda x, p: 10**x - p[0], | ||
lambda p: p[0] > 0) | ||
_funcs['log(x+{p})'] = _FuncInfo(lambda x, p: np.log(x + p[0]), | ||
lambda x, p: np.exp(x) - p[0], | ||
lambda p: p[0] > 0) | ||
_funcs['log{p}(x+{p})'] = _FuncInfo(lambda x, p: (np.log(x + p[1]) / | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure why you don't do a generic 'log{p}' then instead of log10 since users will have to type log10 anyway There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did it like this historically (there were no parameters at the beginning) and then keep it to in a similar way that log10, log and log2 exist separately in numpy. But you are right, being strings probably we can just keep 'log' and 'log{p}'. In any case I definitely should include 'log{p}' which is now missing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. both log can be called log(x,2) or log(x,10) so I figure one unified call is cleaner here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's decide about this later |
||
np.log(p[0])), | ||
lambda x, p: p[0]**(x) - p[1], | ||
lambda p: p[1] > 0, | ||
lambda p: p[0] > 0) | ||
|
||
def __init__(self, str_func): | ||
""" | ||
Parameters | ||
---------- | ||
str_func : string | ||
String to be parsed. | ||
|
||
""" | ||
|
||
if not isinstance(str_func, six.string_types): | ||
raise ValueError("The argument passed is not a string.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "{} must be a string".format(str_func) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure if it is a good idea to format something that we know is not a string into a string. What about simply " 'str_func' must be a string"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I though about it again, and realized that actually your way it is better for cases when the error shows up to an user that does not know where the exception is being raised exactly, so, thank you for your suggestion, it now produces a message like that one :). |
||
self._str_func = six.text_type(str_func) | ||
self._key, self._params = self._get_key_params() | ||
self._func = self._parse_func() | ||
|
||
def _parse_func(self): | ||
""" | ||
Parses the parameters to build a new _FuncInfo object, | ||
replacing the relevant parameters if necessary in the lambda | ||
functions. | ||
|
||
""" | ||
|
||
func = self._funcs[self._key] | ||
if self._params: | ||
m = func.function | ||
function = (lambda x, m=m: m(x, self._params)) | ||
|
||
m = func.inverse | ||
inverse = (lambda x, m=m: m(x, self._params)) | ||
|
||
is_bounded_0_1 = func.is_bounded_0_1(self._params) | ||
|
||
func = _FuncInfo(function, inverse, | ||
is_bounded_0_1) | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since the else is so much shorter, what about flipplng:
|
||
func = _FuncInfo(func.function, func.inverse, | ||
func.is_bounded_0_1()) | ||
return func | ||
|
||
@property | ||
def func_info(self): | ||
""" | ||
Returns the _FuncInfo object. | ||
|
||
""" | ||
return self._func | ||
|
||
@property | ||
def function(self): | ||
""" | ||
Returns the callable for the direct function. | ||
|
||
""" | ||
return self._func.function | ||
|
||
@property | ||
def inverse(self): | ||
""" | ||
Returns the callable for the inverse function. | ||
|
||
""" | ||
return self._func.inverse | ||
|
||
@property | ||
def is_bounded_0_1(self): | ||
""" | ||
Returns a boolean indicating if the function is bounded | ||
in the [0-1 interval]. | ||
|
||
""" | ||
return self._func.is_bounded_0_1() | ||
|
||
def _get_key_params(self): | ||
str_func = self._str_func | ||
# Checking if it comes with parameters | ||
regex = '\{(.*?)\}' | ||
params = re.findall(regex, str_func) | ||
|
||
if params: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can drop the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree for the loop but there is also a piece of code after the loop and within the if block 'str_func = re.sub(regex, '{p}', str_func)', however I still think you are right, because if the original call to 'params = re.findall(regex, str_func)' returns no elements, then re.sub used with the same regex should not replace any elements. |
||
for i, param in enumerate(params): | ||
try: | ||
params[i] = float(param) | ||
except ValueError: | ||
raise ValueError("Parameter %i is '%s', which is " | ||
"not a number." % | ||
(i, param)) | ||
|
||
str_func = re.sub(regex, '{p}', str_func) | ||
|
||
try: | ||
func = self._funcs[str_func] | ||
except (ValueError, KeyError): | ||
raise ValueError("%s: invalid string. The only strings " | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "%s is an invalid string. |
||
"recognized as functions are %s." % | ||
(str_func, list(self._funcs))) | ||
|
||
# Checking that the parameters are valid | ||
if not func.check_params(params): | ||
raise ValueError("%s: are invalid values for the parameters " | ||
"in %s." % | ||
(params, str_func)) | ||
|
||
return str_func, params |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -517,3 +517,64 @@ def test_flatiter(): | |
|
||
assert 0 == next(it) | ||
assert 1 == next(it) | ||
|
||
|
||
class TestFuncParser(object): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you insist on writing your own lambdas instead of using numpy, then please test that the lambda functions are correct (no stray typos or the like) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a parametric test for each possible case, and it tests both the direct and the inverse, so it should be ok. |
||
x_test = np.linspace(0.01, 0.5, 3) | ||
validstrings = ['linear', 'quadratic', 'cubic', 'sqrt', 'cbrt', | ||
'log', 'log10', 'log2', 'x**{1.5}', 'root{2.5}(x)', | ||
'log{2}(x)', | ||
'log(x+{0.5})', 'log10(x+{0.1})', 'log{2}(x+{0.1})', | ||
'log{2}(x+{0})'] | ||
results = [(lambda x: x), | ||
np.square, | ||
(lambda x: x**3), | ||
np.sqrt, | ||
(lambda x: x**(1. / 3)), | ||
np.log, | ||
np.log10, | ||
np.log2, | ||
(lambda x: x**1.5), | ||
(lambda x: x**(1 / 2.5)), | ||
(lambda x: np.log2(x)), | ||
(lambda x: np.log(x + 0.5)), | ||
(lambda x: np.log10(x + 0.1)), | ||
(lambda x: np.log2(x + 0.1)), | ||
(lambda x: np.log2(x))] | ||
|
||
bounded_list = [True, True, True, True, True, | ||
False, False, False, True, True, | ||
False, | ||
True, True, True, | ||
False] | ||
|
||
@pytest.mark.parametrize("string, func", | ||
zip(validstrings, results), | ||
ids=validstrings) | ||
def test_values(self, string, func): | ||
func_parser = cbook._StringFuncParser(string) | ||
f = func_parser.function | ||
assert_array_almost_equal(f(self.x_test), func(self.x_test)) | ||
|
||
@pytest.mark.parametrize("string", validstrings, ids=validstrings) | ||
def test_inverse(self, string): | ||
func_parser = cbook._StringFuncParser(string) | ||
f = func_parser.func_info | ||
fdir = f.function | ||
finv = f.inverse | ||
assert_array_almost_equal(finv(fdir(self.x_test)), self.x_test) | ||
|
||
@pytest.mark.parametrize("string", validstrings, ids=validstrings) | ||
def test_get_inverse(self, string): | ||
func_parser = cbook._StringFuncParser(string) | ||
finv1 = func_parser.inverse | ||
finv2 = func_parser.func_info.inverse | ||
assert_array_almost_equal(finv1(self.x_test), finv2(self.x_test)) | ||
|
||
@pytest.mark.parametrize("string, bounded", | ||
zip(validstrings, bounded_list), | ||
ids=validstrings) | ||
def test_bounded(self, string, bounded): | ||
func_parser = cbook._StringFuncParser(string) | ||
b = func_parser.is_bounded_0_1 | ||
assert_array_equal(b, bounded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be done in numpydoc style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is a hidden helper class for pure internal use. Most classes in these cases do not even have documentation, so I figured rules for formatting would not be as strict, so did not really put any attention to follow numpydoc... I will change it.