Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mldb 1209 statstable fncts noise injection #884

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

guyd
Copy link
Contributor

@guyd guyd commented May 3, 2017

This is Francois' branch with a change by vbisserie to make it compile and a unit test for the noise generation functions. A small issue was found in the NoiseInjector::add_noise where the rounding was always done to the lower value.

Francois Maillet and others added 12 commits July 3, 2016 16:47
…d laplacian noise to the counts to allow the usage of stats tables training examples to train a classifier
Conflicts:
	plugins/stats_table_procedure.cc
	testing/testing.mk
Conflicts:
	jml/math/xdiv.h
	plugins/stats_table_procedure.cc
	testing/MLDB-873_stats_table_test.py
	testing/testing.mk
…l issue where the rounding was always done to the lower value
…l issue where the rounding was always done to the lower value
…l issue where the rounding was always done to the lower value
@@ -372,11 +376,19 @@ run(const ProcedureRunConfig & run,
/* STATS TABLE FUNCTION */
/*****************************************************************************/

static const std::string INJECT_NOISE_DOC_STR =
"Inject laplacian noise to counts. This is useful when training "
"a classifier on the examples that were used to genereate the counts. "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genereate -> generate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say that this is acceptable only if the count values are relatively large compare to the noise? How large?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add more on label data leakage in the documentation of stats table functions. I feel this comment is hard to understand without an example.


mldb = mldb_wrapper.wrap(mldb) # noqa

class MLDB1209StatstableBiasNoiseTest(MldbUnitTest): # noqa
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not realized that this is also lacking some validation point. I will add these too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done that.

{
}

bool injectNoise;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to transition to C++ 11 member initialization (i.e. bool injectNoise = false)

@@ -245,6 +249,7 @@ struct StatsTablePosNegFunctionConfig {

std::string outcomeToUse;

bool injectNoise;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.


struct NoiseInjector {

NoiseInjector() : mu(0), b(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize defaults in the declaration and kill the constructor.

Copy link
Contributor

@mathieumb mathieumb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 after very minor changes to constructor / member initialization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants