Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added check_input_matrices option to python bindings #2787

Merged
merged 39 commits into from
Feb 2, 2021
Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
266ae6e
added no_sanity_checks option to python bindings
NippunSharma Dec 28, 2020
f1368e2
made no_sanity_checks more general
NippunSharma Dec 29, 2020
6ef1eca
added arma::Mat<size_t> and categorical data
NippunSharma Dec 31, 2020
22f5932
moved SanityCheck() to IO class and changed no_sanity_checks to check…
NippunSharma Dec 31, 2020
124d39d
changed comments
NippunSharma Dec 31, 2020
e5bfb40
wrapped function with Cython
NippunSharma Dec 31, 2020
c3ff8d3
indentation removed
NippunSharma Dec 31, 2020
ee889c7
removed iostream
NippunSharma Jan 2, 2021
8a8f0ea
changed name from SanityChecks() to CheckInputMatrices()
NippunSharma Jan 2, 2021
62c9f50
made single block in print_pyx.cpp
NippunSharma Jan 2, 2021
be82a0d
reduced num of chars per line in io.cpp
NippunSharma Jan 2, 2021
990ef53
reduced num of chars per line in py_option.hpp
NippunSharma Jan 2, 2021
7d6a209
changed function name while wrapping
NippunSharma Jan 2, 2021
aeb0f49
added inf functionality and tests
Jan 12, 2021
bfff19e
fixed indentations, random indices in tests, changed comments
NippunSharma Jan 12, 2021
7056aeb
fixed indent
NippunSharma Jan 12, 2021
5e0a347
added templated utility function
Jan 13, 2021
dc398bf
fixing errors
Jan 13, 2021
6ad120a
added CheckInputMatrix() to reduce code block in CheckInputMatrices()
Jan 13, 2021
04dee7e
fix errors
Jan 13, 2021
0e8e6f1
simplification
Jan 23, 2021
acc559f
Merge branch 'master' of https://github.com/mlpack/mlpack into sanity…
Jan 23, 2021
654e650
updated HISTORY.md
Jan 23, 2021
33af71d
Update src/mlpack/core/util/io.cpp
NippunSharma Jan 23, 2021
723ecf8
added more tests
Jan 23, 2021
0c60d8c
Merge branch 'sanity_checks' of https://github.com/NippunSharma/mlpac…
Jan 23, 2021
141b92d
fixes
Jan 23, 2021
0d31176
added check for unsigned matrix
Jan 24, 2021
954f3c4
fixing parameter types while checking
Jan 24, 2021
f5b32bc
removing size_t tests
Jan 25, 2021
7ec4058
removed size_t conditions from CheckInputMatrices()
Jan 26, 2021
16ff91b
made definition of TUPLE_TYPE inline
Jan 26, 2021
458b302
removed TUPLE_TYPE
Jan 26, 2021
ab3110e
minor change
Jan 26, 2021
0914fe9
added PARAM_IN_WITH_NAME
Jan 27, 2021
5b76b2f
changed PARAM_IN_WITH_NAME to PARAM_COMPLETE and editted other macros
Jan 28, 2021
3664422
added comments
Jan 28, 2021
23c3903
Update src/mlpack/core/util/io.cpp
NippunSharma Jan 29, 2021
9a3815f
changed name to PARAM
Jan 29, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 3 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
### mlpack ?.?.?
###### ????-??-??
* Add "check_input_matrices" option to python bindings that checks
for NaN and inf values in all the input matrices (#2787).

* Add Adjusted R squared functionality to R2Score::Evaluate (#2624).

* Disabled all the bindings by default in CMake (#2782).
Expand Down
3 changes: 3 additions & 0 deletions src/mlpack/bindings/python/mlpack/io.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,9 @@ cdef extern from "<mlpack/core/util/io.hpp>" namespace "mlpack" nogil:
@staticmethod
void ClearSettings() nogil except +

@staticmethod
void CheckInputMatrices() nogil except +

cdef extern from "<mlpack/bindings/python/mlpack/io_util.hpp>" \
namespace "mlpack::util" nogil:
void SetParam[T](string, T&) nogil except +
Expand Down
10 changes: 10 additions & 0 deletions src/mlpack/bindings/python/print_pyx.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,16 @@ void PrintPYX(const util::BindingDetails& doc,
cout << " IO.SetPassed(<const string> '" << d.name << "')" << endl;
}

// Checking the type of check_input_matrices parameter.
cout << " if not isinstance(check_input_matrices, bool):" << endl;
cout << " raise TypeError(" <<"\"'check_input_matrices\' must have type "
<< "\'bool'!\")" << endl;
cout << endl;

// Before calling mlpackMain(), we check input matrices for NaN values if needed.
cout << " if check_input_matrices:" << endl;
cout << " IO.CheckInputMatrices()" << endl;

// Call the method.
cout << " # Call the mlpack program." << endl;
cout << " mlpackMain()" << endl;
Expand Down
6 changes: 4 additions & 2 deletions src/mlpack/bindings/python/py_option.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -64,8 +64,10 @@ class PyOption
data.required = required;
data.input = input;
data.loaded = false;
// Only "verbose" and "copy_all_inputs" will be persistent.
if (identifier == "verbose" || identifier == "copy_all_inputs")
// Only "verbose", "copy_all_inputs" and "check_input_matrices"
// will be persistent.
if (identifier == "verbose" || identifier == "copy_all_inputs" ||
identifier == "check_input_matrices")
data.persistent = true;
else
data.persistent = false;
Expand Down
96 changes: 96 additions & 0 deletions src/mlpack/bindings/python/tests/test_python_binding.py
Original file line number Diff line number Diff line change
Expand Up @@ -1336,5 +1336,101 @@ def testModelForceCopy(self):
self.assertEqual(output2['model_bw_out'], 20.0)
self.assertEqual(output3['model_bw_out'], 20.0)

def testCheckInputMatricesNaN(self):
"""
Checks that an exception is thrown if the input matrix contains
NaN values.
"""
x = np.random.rand(100, 5)
a = np.random.randint(low=0, high=100)
b = np.random.randint(low=0, high=5)
x[a][b] = np.nan
self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
matrix_in=x,
check_input_matrices=True))

x_vec = np.random.rand(100)
a = np.random.randint(low=0, high=100)
x_vec[a] = np.nan
self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
row_in=x_vec,
check_input_matrices=True))

self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
col_in=x_vec,
check_input_matrices=True))

self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
matrix_and_info_in=x,
check_input_matrices=True))

def testCheckInputMatricesInf(self):
"""
Checks that an exception is thrown if the input matrix contains
inf values.
"""
x = np.random.rand(100, 5)
a = np.random.randint(low=0, high=100)
b = np.random.randint(low=0, high=5)
x[a][b] = np.inf
self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
matrix_in=x,
check_input_matrices=True))
rcurtin marked this conversation as resolved.
Show resolved Hide resolved

x_vec = np.random.rand(100)
a = np.random.randint(low=0, high=100)
x_vec[a] = np.inf
self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
row_in=x_vec,
check_input_matrices=True))

self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
col_in=x_vec,
check_input_matrices=True))

self.assertRaises(RuntimeError,
lambda : test_python_binding(string_in="hello",
int_in=12,
double_in=4.0,
mat_req_in=[[1.0]],
col_req_in=[1.0],
matrix_and_info_in=x,
check_input_matrices=True))

if __name__ == '__main__':
unittest.main()
44 changes: 44 additions & 0 deletions src/mlpack/core/util/io.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -267,3 +267,47 @@ void IO::ClearSettings()
GetSingleton().aliases = persistentAliases;
GetSingleton().functionMap = persistentFunctions;
}

void IO::CheckInputMatrices()
{
typedef typename std::tuple<data::DatasetInfo, arma::mat> TupleType;
std::map<std::string, util::ParamData>::iterator itr;

for (itr = IO::Parameters().begin(); itr != IO::Parameters().end(); ++itr)
{
std::string paramName = itr->first;
std::string paramType = itr->second.cppType;
if (paramType == "arma::mat")
{
IO::CheckInputMatrix(IO::GetParam<arma::mat>(paramName), paramName);
}
else if (paramType == "arma::Mat<size_t>")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now you can remove these if statements, since it's not possible to find a NaN or Inf in a matrix that has type size_t. 👍

{
IO::CheckInputMatrix(
IO::GetParam<arma::Mat<size_t>>(paramName), paramName);
}
else if (paramType == "arma::vec")
{
IO::CheckInputMatrix(IO::GetParam<arma::vec>(paramName), paramName);
}
else if (paramType == "arma::Col<size_t>")
{
IO::CheckInputMatrix(
IO::GetParam<arma::Col<size_t>>(paramName), paramName);
}
else if (paramType == "arma::rowvec")
{
IO::CheckInputMatrix(IO::GetParam<arma::rowvec>(paramName), paramName);
}
else if (paramType == "arma::Row<size_t>")
{
IO::CheckInputMatrix(
IO::GetParam<arma::Row<size_t>>(paramName), paramName);
}
else if (paramType == "TUPLE_TYPE")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should hunt down why this is TUPLE_TYPE; ideally it should actually be std::tuple<mlpack::data::DatasetInfo, arma::mat>. I think it's because PARAM_MATRIX_AND_INFO_IN() is defined like this:

#define TUPLE_TYPE std::tuple<mlpack::data::DatasetInfo, arma::mat>
#define PARAM_MATRIX_AND_INFO_IN(ID, DESC, ALIAS) \
    PARAM_IN(TUPLE_TYPE, ID, DESC, ALIAS, TUPLE_TYPE(), false)

(That's in src/mlpack/core/util/param.hpp.) What happens if you inline that TUPLE_TYPE definition? Then we should hopefully get the right result here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not able to fully understand what you are saying here. Should I remove the line #define TUPLE_TYPE std::tuple<mlpack::data::DatasetInfo, arma::mat> line and use std::tuple<mlpack::DatasetInfo, arma::mat> directly instead of using TUPLE_TYPE ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, exactly---give that a shot, and then I think paramType will be std::tuple<mlpack::data::DatasetInfo, arma::mat>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcurtin the following error is thrown locally after I make this change:
macro "PARAM_IN" passed 8 arguments, but takes just 6
I guess this is because the macro is taking the tuple type as 2 arguments instead of 1. Is this supposed to happen ? I have pushed the changes can you take a look ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right... that must be why I defined TUPLE_TYPE to begin with. Perhaps we can't use the PARAM_IN() macro then. Take a look at how PARAM_MATRIX() is defined:

  #define PARAM_MATRIX(ID, DESC, ALIAS, REQ, TRANS, IN) \
      static mlpack::util::Option<arma::mat> \
      JOIN(io_option_dummy_matrix_, __COUNTER__) \
      (arma::mat(), ID, DESC, ALIAS, "arma::mat", \
      REQ, IN, !TRANS, testName);

You can see that it just constructs an Option<arma::mat> object, and the fifth argument is the name of the type that will be param.cppType. So instead of using PARAM_IN to define PARAM_MATRIX_AND_INFO_IN(), do you think you could try an implementation like that of PARAM_MATRIX instead, which explicitly specifies the fifth parameter to Option<>?

{
IO::CheckInputMatrix(
std::get<1>(IO::GetParam<TupleType>(paramName)), paramName);
NippunSharma marked this conversation as resolved.
Show resolved Hide resolved
}
}
}
14 changes: 14 additions & 0 deletions src/mlpack/core/util/io.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,15 @@ class IO
template<typename T>
static T& GetRawParam(const std::string& identifier);

/**
* Utility function for CheckInputMatrices().
*
* @param matrix Matrix to check.
* @param identifier Name of the parameter in question.
*/
template<typename T>
static void CheckInputMatrix(const T& matrix, const std::string& identifier);

/**
* Given two (matrix) parameters, ensure that the first is an in-place copy of
* the second. This will generally do nothing (as the bindings already do
Expand Down Expand Up @@ -285,6 +294,11 @@ class IO
*/
static void ClearSettings();

/**
* Checks all input matrices for NaN and inf values, exits if found any.
*/
static void CheckInputMatrices();

private:
//! Convenience map from alias values to names.
std::map<char, std::string> aliases;
Expand Down
12 changes: 12 additions & 0 deletions src/mlpack/core/util/io_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,18 @@ T& IO::GetRawParam(const std::string& identifier)
}
}

template<typename T>
void IO::CheckInputMatrix(const T& matrix, const std::string& identifier)
{
std::string errMsg1 = "The input " + identifier + " has NaN values.";
std::string errMsg2 = "The input " + identifier + " has inf values.";

if (matrix.has_nan())
Log::Fatal << errMsg1 << std::endl;
if (matrix.has_inf())
Log::Fatal << errMsg2 << std::endl;
}

} // namespace mlpack

#endif
2 changes: 2 additions & 0 deletions src/mlpack/core/util/mlpack_main.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -230,6 +230,8 @@ PARAM_FLAG("copy_all_inputs", "If specified, all input parameters will be deep"
" copied before the method is run. This is useful for debugging problems "
"where the input parameters are being modified by the algorithm, but can "
"slow down the code.", "");
PARAM_FLAG("check_input_matrices", "If specified, the input matrix is checked for"
" NaN and inf values; an exception is thrown if any are found.", "");

// Nothing else needs to be defined---the binding will use mlpackMain() as-is.

Expand Down