Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Type Error (Check type of Every parameter) #1717

Merged
merged 11 commits into from
Mar 28, 2019

Conversation

Yashwants19
Copy link
Member

@Yashwants19 Yashwants19 commented Feb 13, 2019

Address #1710

TODOS

  • Casting 2-d to 1-d array or 1-d to 2-d array
  • Type check for every parameter
  • Add Test for Thrown Exceptions

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work just fine for copy_all_inputs. Do you want to address the types of every input parameter in a separate PR or this one? I'm happy to approve this one as-is if you can handle the couple simple comments I left.

src/mlpack/bindings/python/print_pyx.cpp Outdated Show resolved Hide resolved
src/mlpack/bindings/python/print_pyx.cpp Outdated Show resolved Hide resolved
@Yashwants19
Copy link
Member Author

Yashwants19 commented Feb 16, 2019

You suggest should I use this PR or go for another PR ??

@rcurtin
Copy link
Member

rcurtin commented Feb 16, 2019

Up to you, no difference from my side.

@Yashwants19
Copy link
Member Author

Yashwants19 commented Feb 16, 2019

I will go for new PR :)

@Yashwants19
Copy link
Member Author

Yashwants19 commented Feb 18, 2019

Hi @rcurtin Please review this. If I have to change something or not??

@rcurtin
Copy link
Member

rcurtin commented Feb 18, 2019

Hi @Yashwants19, didn't I already say somewhere else that I will review everything when I have a chance? You can see there are nearly 60 PRs open, most of which need review. I do all these reviews in my free time, and I don't have infinite free time.

@Yashwants19
Copy link
Member Author

Yashwants19 commented Feb 18, 2019

Sorry sir my mistake.

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Yashwants19, thanks for your continued work on these issues. I'm a bit confused because I see two things being solved here and neither are completely done, but they are on the right track. We can do both in this one PR, or you can split them into two PRs if you like. The two things are:

  • making sure copy_all_inputs has the right type. That code seems to be just fine, but we should add this type check for every parameter, not just copy_all_inputs. We should also add a test for this (make sure that an exception is thrown when we call with a wrong type).
  • casting a 2-d array to a 1-d array when needed; that also seems to work, but the 1-d to 2-d case doesn't seem to be done yet and we should add tests for those too.

Let me know what you'd like to do and how you'd like to proceed. Definitely this is the right direction so we are almost there. 👍

src/mlpack/bindings/python/print_input_processing.hpp Outdated Show resolved Hide resolved
src/mlpack/bindings/python/print_input_processing.hpp Outdated Show resolved Hide resolved
<< ".shape[0] == 1 or " << d.name << "_tuple[0].shape[1] == 1:"
<< std::endl;
std::cout << prefix << " " << prefix << " " << d.name << "_tuple[0] = "
<< d.name << "_tuple" << "[0].ravel()" << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can put these two bits together ("_tuple" << "[0].ravel()"). And actually I think you can simplify the code overall if only lines 135-139 are in the body of the if---everything else appears to be the same.

@Yashwants19
Copy link
Member Author

We can add task list here. Can do both task on this PR only.

@Yashwants19 Yashwants19 changed the title Fix Value Error (Check type copy_all_inputs) Fix Type Error (Check type of Every parameter) Feb 25, 2019
Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yashwants19 looking good, let me know when the rest of the implementation is done. We should also add tests for each of the situations you are fixing. 👍

src/mlpack/bindings/python/print_input_processing.hpp Outdated Show resolved Hide resolved
@Yashwants19
Copy link
Member Author

Hi @rcurtin I was waiting for your review. I will complete these implementation as soon as possible. Thank You :)

@Yashwants19
Copy link
Member Author

Hi @rcurtin This is ready for review.

@Yashwants19
Copy link
Member Author

Hi @rcurtin Suggest me should I add more type for typecheck and I will add them as soon as possible

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, looking good. 👍 (Sorry it takes me so long to get to these. I'd have loved to merge this earlier, but I'm just not able to find the time to review immediately.)

I left a couple comments that could help simplify the code. Let me know what you think or if I can clarify any of those.

Also, the tests look great, but we should also test the following cases:

  • Test passing two-dimensional unsigned matrix to ucol_in.
  • Test passing one-dimensional array to matrix_in. (Might need to add another parameter for this, since I think matrix_in expects a certain size.)
  • Test passing one-dimensional unsigned array to matrix_in.
  • Test passing one-dimensional array to matrix_with_info_in.

Those should be super easy adaptations of tests you've already written. 👍

Thanks again for the hard work with this. This will significantly improve the user experience from Python.

src/mlpack/bindings/python/print_input_processing.hpp Outdated Show resolved Hide resolved
{
std::cout << prefix << "if " << name << " is not " << def << ":"
<< std::endl;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the output here is different for bools... why? Maybe I missed something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here first I check the parameter as it is bool and then I pass the parameter as True or False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but the code is not doing the isinstance() check if the default value is not False.

I think you are doing the def == "False" check as another way if checking that the type T is bool, but honestly I think we can avoid the check entirely if we use a function like GetPrintableType<>() for the type part of the isinstance() call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the different code is needed for bool because the defaults for all the others (which can be checked without a type) is None, but with bool we have to do the value check on the inside.

Anyway, it would probably be nice to add an empty line after the code above (but before the SetParam[] is printed), just to break the code up logically a bit. 👍

{
std::cout << prefix << " if isinstance(" << name << ", int):"
<< std::endl;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think you can use GetPrintableType() here... take a look (get_printable_type.hpp) and tell me what you think.

std::cout << prefix << "  if isinstance(" << name << ", " << GetPrintableType<T>(d)
    << "):" << std::endl;

That could simplify this code quite a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use GetPrintableType() at the place of int or float or str it is not recognized by python as there is not datatype name double or string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. The GetPrintableType<>() functions are used solely for printing documentation, so I would be okay adapting the typenames returned by that function to be the correct Python type. Specifically I can see in the code that double should change to float and string to str. But this still does not fully solve the problem for list of ... and the matrix types.

I think that for list types, we should make another overload of this function PrintInputProcessing() that is specifically for vector types. This would mean using SFINAE to make the following function:

template<typename T>
void PrintInputProcessing(
    const util::ParamData& d,
    const size_t indent,
    const typename boost::enable_if<util::IsStdVector<T>::type* = 0,
    const typename boost::disable_if<arma::is_arma_type<T>>::type* = 0,
    const typename boost::disable_if<data::HasSerialize<T>>::type* = 0,
    const typename boost::disable_if<std::is_same<T,
        std::tuple<data::DatasetInfo, arma::mat>>>::type* = 0)

and the implementation of that function would be basically the same as the one I am commenting on here, except that we would need to test two things: isinstance(name, list) and, if the length of the list is greater than 0, isinstance(name[0], PT) where PT is the Python type of the list element (you could access this type in C++ as T::value_type, so I think you could call GetPrintableType<T::value_type>() to get the right result here.

In addition, because of the SFINAE signature, you'll have to add the following parameter to all of the other PrintInputProcessing() overloads (except the one at the bottom of the file):

const typename boost::disable_if<util::IsStdVector<T>>::type* = 0

Do let me know if I can clarify any of this. SFINAE code can be ugly and confusing...

src/mlpack/bindings/python/print_input_processing.hpp Outdated Show resolved Hide resolved
src/mlpack/bindings/python/print_input_processing.hpp Outdated Show resolved Hide resolved
src/mlpack/bindings/python/tests/test_python_binding.py Outdated Show resolved Hide resolved
@Yashwants19 Yashwants19 force-pushed the master branch 2 times, most recently from 834dcac to bb3302b Compare March 15, 2019 02:55
@Yashwants19
Copy link
Member Author

Hi @rcurtin This PR is set to be merged

{
std::cout << prefix << "if " << name << " is not " << def << ":"
<< std::endl;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but the code is not doing the isinstance() check if the default value is not False.

I think you are doing the def == "False" check as another way if checking that the type T is bool, but honestly I think we can avoid the check entirely if we use a function like GetPrintableType<>() for the type part of the isinstance() call.

{
std::cout << prefix << " if isinstance(" << name << ", int):"
<< std::endl;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. The GetPrintableType<>() functions are used solely for printing documentation, so I would be okay adapting the typenames returned by that function to be the correct Python type. Specifically I can see in the code that double should change to float and string to str. But this still does not fully solve the problem for list of ... and the matrix types.

I think that for list types, we should make another overload of this function PrintInputProcessing() that is specifically for vector types. This would mean using SFINAE to make the following function:

template<typename T>
void PrintInputProcessing(
    const util::ParamData& d,
    const size_t indent,
    const typename boost::enable_if<util::IsStdVector<T>::type* = 0,
    const typename boost::disable_if<arma::is_arma_type<T>>::type* = 0,
    const typename boost::disable_if<data::HasSerialize<T>>::type* = 0,
    const typename boost::disable_if<std::is_same<T,
        std::tuple<data::DatasetInfo, arma::mat>>>::type* = 0)

and the implementation of that function would be basically the same as the one I am commenting on here, except that we would need to test two things: isinstance(name, list) and, if the length of the list is greater than 0, isinstance(name[0], PT) where PT is the Python type of the list element (you could access this type in C++ as T::value_type, so I think you could call GetPrintableType<T::value_type>() to get the right result here.

In addition, because of the SFINAE signature, you'll have to add the following parameter to all of the other PrintInputProcessing() overloads (except the one at the bottom of the file):

const typename boost::disable_if<util::IsStdVector<T>>::type* = 0

Do let me know if I can clarify any of this. SFINAE code can be ugly and confusing...

{
arma::Mat<size_t> out =
move(CLI::GetParam<arma::Mat<size_t>>("s_umatrix_in"));
out.row(0)*= 2.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to just do out *= 2 here, assuming your intention is to multiply every element by 2 (I think that all inputs to smatrix_in and s_umatrix_in have only one row).

@Yashwants19
Copy link
Member Author

Sorry @rcurtin just ignore that I accidentally closed this Pull request while typing and had to reopen it

ADD CONVERSION FROM 2D TO 1D
ADD CONVERSION FROM 1D TO 2D
ADD TEST FOR EXPECTION THROWN AND CONVERSIONS
@Yashwants19
Copy link
Member Author

Hi @rcurtin I have made the changes as suggested

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Yashwants19, thanks again for the hard work. 👍 I think we are just about ready to merge this, which will be really nice since it solves so many problems. I only have one more major comment about the code (see the comments on to_matrix() and to_matrix_with_info() throwing TypeErrors on their own), plus some tiny style issues I found, and when I reviewed the tests more in-depth I had a couple more questions and things to point out.

Thanks!

{
std::cout << prefix << "if " << name << " is not " << def << ":"
<< std::endl;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the different code is needed for bool because the defaults for all the others (which can be checked without a type) is None, but with bool we have to do the value check on the inside.

Anyway, it would probably be nice to add an empty line after the code above (but before the SetParam[] is printed), just to break the code up logically a bit. 👍

std::cout << prefix << "# Detect if the parameter was passed; set if so."
<< std::endl;
if (!d.required)
{
std::cout << prefix << "if " << name << " is not " << def << ":"
if (GetPrintableType<T>(d)== "bool")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny little style issue; this should be GetPrintableType<T>(d) == "bool".

}
std::cout << std::endl; // Extra line is to clear up the code a bit.
}

/**
* Print input processing for a matrix type.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to this method? Print input processing for a vector type. or similar would be fine.

* if param_name is not None:
* if isinstance(param_name, list):
* if len(param_name) > 0 :
* if isinstance(param_name[0],str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More little style issues---we can do

if len(param_name) > 0:

and

if isinstance(param_name[0], str):

* else:
* raise TypeError("'param_name' must have type
* '(np.ndarray,pd.DataFrame,pd.Series)'!")
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would seem more appropriate in the body of the function, like the rest of the methods. 👍

@@ -640,13 +677,13 @@ def testMatrixAndInfoPandas(self):

for j in range(10):
self.assertEqual(output['matrix_and_info_out'][j, 4], z[cols[4]][j])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add blank spaces to the line (sorry, another trivial comment. At least they are easy to fix :)).

x = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
x['e'] = pd.Series(['a', 'b', 'c', 'd', 'a', 'b', 'e', 'c', 'a', 'b'],
x = pd.DataFrame(np.random.rand(9, 4), columns=list('abcd'))
x['e'] = pd.Series(['a', 'b', 'c', 'd', 'a', 'b', 'e', 'c', 'a' ],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the switch to 9 from 10 elements?

output = test_python_binding(string_in='hello',
int_in=12,
double_in=4.0,
matrix_and_info_in=z)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, technically I think the idea would be that we pass x['e'] directly here?

Copy link
Member Author

@Yashwants19 Yashwants19 Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I pass x i.e. 1-D and it is without column from matrix_and_info_in

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, now I'm a little bit confused. I thought that this test was testing that passing a Pandas series worked, but it seems like you're just passing a 2-D Pandas dataframe with one column.

Copy link
Member Author

@Yashwants19 Yashwants19 Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously I was testing that if we can pass 1-D matrix after casting it to 2-D (as matrix_and_info_in_reshape) from arma_numpy.numpy_to_mat_d()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, but I don't think that what we're passing is actually a 1-d Pandas object. For that we'd need to pass a Series object, so I guess we'd need to do z[0] here in the call for matrix_and_info_in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, everything else in the PR looks good to me, so I'm ready to approve it once we handle this last comment.

@Yashwants19
Copy link
Member Author

Hi @rcurtin This is ready for review. I have made the changes as suggested

if not hasattr(x, '__len__') and \
not hasattr(x, 'shape') and \
not hasattr(x, '__array__'):
raise TypeError("given argument is not array-like")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I meant that the check is already in these functions, and it's this one right here. So I think we can leave to_matrix() and to_matrix_with_info() as they were before the changes you made, since they already throw TypeErrors if they are not array-like. (A nice side-effect of this way of checking, just for __len__, shape, and __array__, means that other array-like things can also be passed to Python bindings, not just numpy and Pandas matrices.)

@Yashwants19
Copy link
Member Author

Hi @rcurtin I have made the changes as suggested.

@Yashwants19
Copy link
Member Author

Hi @rcurtin This PR is set to be merged

@@ -81,7 +81,10 @@ def to_matrix_with_info(x, dtype, copy=False):

if isinstance(x, np.ndarray):
# It is already an ndarray, so the vector of info is all 0s (all numeric).
d = np.zeros([x.shape[1]], dtype=np.bool)
if len(x.shape) < 2:
d = np.zeros(0, dtype=np.bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be 1 in this case though? Correct me if I'm wrong. This might work and run, but I suspect that there is an invalid memory access from the mlpack side if d really has length 0. This comment applies to the other d = np.zeros(0, dtype=np.bool) too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rcurtin I was also bit confused about this. Thank you for correcting me. :)

@Yashwants19
Copy link
Member Author

Hi @rcurtin if no one is working on #1492(go bindings). I would like to continue this.

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yashwants19 thanks so much for your hard work on this. I think it is ready for merge. As for the Go bindings, please do feel free to pick them up. If I remember right I had reviewed the PR so there were a lot of comments to be addressed. Also it will need to be adapted to the new Markdown bindings, which changed a few things, but I can help with that when the time comes. 👍

@Yashwants19
Copy link
Member Author

Thanks @rcurtin without your help, it couldn't be implemented. I will start working on Go bindings as soon as my exams are over. :)

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@rcurtin rcurtin merged commit f853797 into mlpack:master Mar 28, 2019
@rcurtin
Copy link
Member

rcurtin commented Mar 28, 2019

@Yashwants19 thanks so much for the hard work on this. It's really great to have this merged and it fixes some very big problems. I'll resolve all the related tickets and hopefully nobody will need to reopen them. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants