-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Type Error (Check type of Every parameter) #1717
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to work just fine for copy_all_inputs
. Do you want to address the types of every input parameter in a separate PR or this one? I'm happy to approve this one as-is if you can handle the couple simple comments I left.
You suggest should I use this PR or go for another PR ?? |
Up to you, no difference from my side. |
I will go for new PR :) |
Hi @rcurtin Please review this. If I have to change something or not?? |
Hi @Yashwants19, didn't I already say somewhere else that I will review everything when I have a chance? You can see there are nearly 60 PRs open, most of which need review. I do all these reviews in my free time, and I don't have infinite free time. |
Sorry sir my mistake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Yashwants19, thanks for your continued work on these issues. I'm a bit confused because I see two things being solved here and neither are completely done, but they are on the right track. We can do both in this one PR, or you can split them into two PRs if you like. The two things are:
- making sure
copy_all_inputs
has the right type. That code seems to be just fine, but we should add this type check for every parameter, not justcopy_all_inputs
. We should also add a test for this (make sure that an exception is thrown when we call with a wrong type). - casting a 2-d array to a 1-d array when needed; that also seems to work, but the 1-d to 2-d case doesn't seem to be done yet and we should add tests for those too.
Let me know what you'd like to do and how you'd like to proceed. Definitely this is the right direction so we are almost there. 👍
<< ".shape[0] == 1 or " << d.name << "_tuple[0].shape[1] == 1:" | ||
<< std::endl; | ||
std::cout << prefix << " " << prefix << " " << d.name << "_tuple[0] = " | ||
<< d.name << "_tuple" << "[0].ravel()" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can put these two bits together ("_tuple" << "[0].ravel()"
). And actually I think you can simplify the code overall if only lines 135-139 are in the body of the if
---everything else appears to be the same.
We can add task list here. Can do both task on this PR only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yashwants19 looking good, let me know when the rest of the implementation is done. We should also add tests for each of the situations you are fixing. 👍
Hi @rcurtin I was waiting for your review. I will complete these implementation as soon as possible. Thank You :) |
Hi @rcurtin This is ready for review. |
Hi @rcurtin Suggest me should I add more type for typecheck and I will add them as soon as possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, looking good. 👍 (Sorry it takes me so long to get to these. I'd have loved to merge this earlier, but I'm just not able to find the time to review immediately.)
I left a couple comments that could help simplify the code. Let me know what you think or if I can clarify any of those.
Also, the tests look great, but we should also test the following cases:
- Test passing two-dimensional unsigned matrix to
ucol_in
. - Test passing one-dimensional array to
matrix_in
. (Might need to add another parameter for this, since I thinkmatrix_in
expects a certain size.) - Test passing one-dimensional unsigned array to
matrix_in
. - Test passing one-dimensional array to
matrix_with_info_in
.
Those should be super easy adaptations of tests you've already written. 👍
Thanks again for the hard work with this. This will significantly improve the user experience from Python.
{ | ||
std::cout << prefix << "if " << name << " is not " << def << ":" | ||
<< std::endl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the output here is different for bool
s... why? Maybe I missed something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here first I check the parameter as it is bool and then I pass the parameter as True or False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but the code is not doing the isinstance()
check if the default value is not False
.
I think you are doing the def == "False"
check as another way if checking that the type T
is bool
, but honestly I think we can avoid the check entirely if we use a function like GetPrintableType<>()
for the type part of the isinstance()
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the different code is needed for bool
because the defaults for all the others (which can be checked without a type) is None
, but with bool
we have to do the value check on the inside.
Anyway, it would probably be nice to add an empty line after the code above (but before the SetParam[]
is printed), just to break the code up logically a bit. 👍
{ | ||
std::cout << prefix << " if isinstance(" << name << ", int):" | ||
<< std::endl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think you can use GetPrintableType()
here... take a look (get_printable_type.hpp
) and tell me what you think.
std::cout << prefix << " if isinstance(" << name << ", " << GetPrintableType<T>(d)
<< "):" << std::endl;
That could simplify this code quite a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use GetPrintableType()
at the place of int
or float
or str
it is not recognized by python as there is not datatype name double
or string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. The GetPrintableType<>()
functions are used solely for printing documentation, so I would be okay adapting the typenames returned by that function to be the correct Python type. Specifically I can see in the code that double
should change to float
and string
to str
. But this still does not fully solve the problem for list of ...
and the matrix types.
I think that for list types, we should make another overload of this function PrintInputProcessing()
that is specifically for vector types. This would mean using SFINAE to make the following function:
template<typename T>
void PrintInputProcessing(
const util::ParamData& d,
const size_t indent,
const typename boost::enable_if<util::IsStdVector<T>::type* = 0,
const typename boost::disable_if<arma::is_arma_type<T>>::type* = 0,
const typename boost::disable_if<data::HasSerialize<T>>::type* = 0,
const typename boost::disable_if<std::is_same<T,
std::tuple<data::DatasetInfo, arma::mat>>>::type* = 0)
and the implementation of that function would be basically the same as the one I am commenting on here, except that we would need to test two things: isinstance(name, list)
and, if the length of the list is greater than 0, isinstance(name[0], PT)
where PT
is the Python type of the list element (you could access this type in C++ as T::value_type
, so I think you could call GetPrintableType<T::value_type>()
to get the right result here.
In addition, because of the SFINAE signature, you'll have to add the following parameter to all of the other PrintInputProcessing()
overloads (except the one at the bottom of the file):
const typename boost::disable_if<util::IsStdVector<T>>::type* = 0
Do let me know if I can clarify any of this. SFINAE code can be ugly and confusing...
834dcac
to
bb3302b
Compare
Hi @rcurtin This PR is set to be merged |
{ | ||
std::cout << prefix << "if " << name << " is not " << def << ":" | ||
<< std::endl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but the code is not doing the isinstance()
check if the default value is not False
.
I think you are doing the def == "False"
check as another way if checking that the type T
is bool
, but honestly I think we can avoid the check entirely if we use a function like GetPrintableType<>()
for the type part of the isinstance()
call.
{ | ||
std::cout << prefix << " if isinstance(" << name << ", int):" | ||
<< std::endl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. The GetPrintableType<>()
functions are used solely for printing documentation, so I would be okay adapting the typenames returned by that function to be the correct Python type. Specifically I can see in the code that double
should change to float
and string
to str
. But this still does not fully solve the problem for list of ...
and the matrix types.
I think that for list types, we should make another overload of this function PrintInputProcessing()
that is specifically for vector types. This would mean using SFINAE to make the following function:
template<typename T>
void PrintInputProcessing(
const util::ParamData& d,
const size_t indent,
const typename boost::enable_if<util::IsStdVector<T>::type* = 0,
const typename boost::disable_if<arma::is_arma_type<T>>::type* = 0,
const typename boost::disable_if<data::HasSerialize<T>>::type* = 0,
const typename boost::disable_if<std::is_same<T,
std::tuple<data::DatasetInfo, arma::mat>>>::type* = 0)
and the implementation of that function would be basically the same as the one I am commenting on here, except that we would need to test two things: isinstance(name, list)
and, if the length of the list is greater than 0, isinstance(name[0], PT)
where PT
is the Python type of the list element (you could access this type in C++ as T::value_type
, so I think you could call GetPrintableType<T::value_type>()
to get the right result here.
In addition, because of the SFINAE signature, you'll have to add the following parameter to all of the other PrintInputProcessing()
overloads (except the one at the bottom of the file):
const typename boost::disable_if<util::IsStdVector<T>>::type* = 0
Do let me know if I can clarify any of this. SFINAE code can be ugly and confusing...
{ | ||
arma::Mat<size_t> out = | ||
move(CLI::GetParam<arma::Mat<size_t>>("s_umatrix_in")); | ||
out.row(0)*= 2.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to just do out *= 2
here, assuming your intention is to multiply every element by 2 (I think that all inputs to smatrix_in
and s_umatrix_in
have only one row).
Sorry @rcurtin just ignore that I accidentally closed this Pull request while typing and had to reopen it |
ADD CONVERSION FROM 2D TO 1D ADD CONVERSION FROM 1D TO 2D ADD TEST FOR EXPECTION THROWN AND CONVERSIONS
Hi @rcurtin I have made the changes as suggested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Yashwants19, thanks again for the hard work. 👍 I think we are just about ready to merge this, which will be really nice since it solves so many problems. I only have one more major comment about the code (see the comments on to_matrix()
and to_matrix_with_info()
throwing TypeError
s on their own), plus some tiny style issues I found, and when I reviewed the tests more in-depth I had a couple more questions and things to point out.
Thanks!
{ | ||
std::cout << prefix << "if " << name << " is not " << def << ":" | ||
<< std::endl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the different code is needed for bool
because the defaults for all the others (which can be checked without a type) is None
, but with bool
we have to do the value check on the inside.
Anyway, it would probably be nice to add an empty line after the code above (but before the SetParam[]
is printed), just to break the code up logically a bit. 👍
std::cout << prefix << "# Detect if the parameter was passed; set if so." | ||
<< std::endl; | ||
if (!d.required) | ||
{ | ||
std::cout << prefix << "if " << name << " is not " << def << ":" | ||
if (GetPrintableType<T>(d)== "bool") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny little style issue; this should be GetPrintableType<T>(d) == "bool"
.
} | ||
std::cout << std::endl; // Extra line is to clear up the code a bit. | ||
} | ||
|
||
/** | ||
* Print input processing for a matrix type. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment to this method? Print input processing for a vector type.
or similar would be fine.
* if param_name is not None: | ||
* if isinstance(param_name, list): | ||
* if len(param_name) > 0 : | ||
* if isinstance(param_name[0],str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More little style issues---we can do
if len(param_name) > 0:
and
if isinstance(param_name[0], str):
* else: | ||
* raise TypeError("'param_name' must have type | ||
* '(np.ndarray,pd.DataFrame,pd.Series)'!") | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment would seem more appropriate in the body of the function, like the rest of the methods. 👍
@@ -640,13 +677,13 @@ def testMatrixAndInfoPandas(self): | |||
|
|||
for j in range(10): | |||
self.assertEqual(output['matrix_and_info_out'][j, 4], z[cols[4]][j]) | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add blank spaces to the line (sorry, another trivial comment. At least they are easy to fix :)).
x = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd')) | ||
x['e'] = pd.Series(['a', 'b', 'c', 'd', 'a', 'b', 'e', 'c', 'a', 'b'], | ||
x = pd.DataFrame(np.random.rand(9, 4), columns=list('abcd')) | ||
x['e'] = pd.Series(['a', 'b', 'c', 'd', 'a', 'b', 'e', 'c', 'a' ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the switch to 9 from 10 elements?
output = test_python_binding(string_in='hello', | ||
int_in=12, | ||
double_in=4.0, | ||
matrix_and_info_in=z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, technically I think the idea would be that we pass x['e']
directly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I pass x
i.e. 1-D and it is without column from matrix_and_info_in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, now I'm a little bit confused. I thought that this test was testing that passing a Pandas series worked, but it seems like you're just passing a 2-D Pandas dataframe with one column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously I was testing that if we can pass 1-D matrix after casting it to 2-D (as matrix_and_info_in_reshape
) from arma_numpy.numpy_to_mat_d()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, but I don't think that what we're passing is actually a 1-d Pandas object. For that we'd need to pass a Series
object, so I guess we'd need to do z[0]
here in the call for matrix_and_info_in
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, everything else in the PR looks good to me, so I'm ready to approve it once we handle this last comment.
Hi @rcurtin This is ready for review. I have made the changes as suggested |
if not hasattr(x, '__len__') and \ | ||
not hasattr(x, 'shape') and \ | ||
not hasattr(x, '__array__'): | ||
raise TypeError("given argument is not array-like") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I meant that the check is already in these functions, and it's this one right here. So I think we can leave to_matrix()
and to_matrix_with_info()
as they were before the changes you made, since they already throw TypeError
s if they are not array-like. (A nice side-effect of this way of checking, just for __len__
, shape
, and __array__
, means that other array-like things can also be passed to Python bindings, not just numpy and Pandas matrices.)
Hi @rcurtin I have made the changes as suggested. |
Hi @rcurtin This PR is set to be merged |
@@ -81,7 +81,10 @@ def to_matrix_with_info(x, dtype, copy=False): | |||
|
|||
if isinstance(x, np.ndarray): | |||
# It is already an ndarray, so the vector of info is all 0s (all numeric). | |||
d = np.zeros([x.shape[1]], dtype=np.bool) | |||
if len(x.shape) < 2: | |||
d = np.zeros(0, dtype=np.bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be 1 in this case though? Correct me if I'm wrong. This might work and run, but I suspect that there is an invalid memory access from the mlpack side if d
really has length 0. This comment applies to the other d = np.zeros(0, dtype=np.bool)
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rcurtin I was also bit confused about this. Thank you for correcting me. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yashwants19 thanks so much for your hard work on this. I think it is ready for merge. As for the Go bindings, please do feel free to pick them up. If I remember right I had reviewed the PR so there were a lot of comments to be addressed. Also it will need to be adapted to the new Markdown bindings, which changed a few things, but I can help with that when the time comes. 👍
Thanks @rcurtin without your help, it couldn't be implemented. I will start working on Go bindings as soon as my exams are over. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second approval provided automatically after 24 hours. 👍
@Yashwants19 thanks so much for the hard work on this. It's really great to have this merged and it fixes some very big problems. I'll resolve all the related tickets and hopefully nobody will need to reopen them. :) |
Address #1710
TODOS
Casting 2-d to 1-d array or 1-d to 2-d array
Type check for every parameter
Add Test for Thrown Exceptions