Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ProgramInfo to separate out all the different information #2521

Closed
rcurtin opened this issue Jul 19, 2020 · 6 comments
Closed

Refactor ProgramInfo to separate out all the different information #2521

rcurtin opened this issue Jul 19, 2020 · 6 comments

Comments

@rcurtin
Copy link
Member

rcurtin commented Jul 19, 2020

What is the desired addition or change?

Right now, information for an mlpack binding is specified by the PROGRAM_INFO macro, like this:

PROGRAM_INFO("Principal Components Analysis",
    // Short description.
    "An implementation of several strategies for principal components analysis "
    "(PCA), a common preprocessing step.  Given a dataset and a desired new "
    "dimensionality, this can reduce the dimensionality of the data using the "
    "linear transformation determined by PCA.",
    // Long description.
    "This program performs principal components analysis on the given dataset "
    "using the exact, randomized, randomized block Krylov, or QUIC SVD method. "
    "It will transform the data onto its principal components, optionally "
    "performing dimensionality reduction by ignoring the principal components "
    "with the smallest eigenvalues."
    "\n\n"
    "Use the " + PRINT_PARAM_STRING("input") + " parameter to specify the "
    "dataset to perform PCA on.  A desired new dimensionality can be specified "
    "with the " + PRINT_PARAM_STRING("new_dimensionality") + " parameter, or "
    "the desired variance to retain can be specified with the " +
    PRINT_PARAM_STRING("var_to_retain") + " parameter.  If desired, the "
    "dataset can be scaled before running PCA with the " +
    PRINT_PARAM_STRING("scale") + " parameter."
    "\n\n"
    "Multiple different decomposition techniques can be used.  The method to "
    "use can be specified with the " +
    PRINT_PARAM_STRING("decomposition_method") + " parameter, and it may take "
    "the values 'exact', 'randomized', or 'quic'."
    "\n\n"
    "For example, to reduce the dimensionality of the matrix " +
    PRINT_DATASET("data") + " to 5 dimensions using randomized SVD for the "
    "decomposition, storing the output matrix to " +
    PRINT_DATASET("data_mod") + ", the following command can be used:"
    "\n\n" +
    PRINT_CALL("pca", "input", "data", "new_dimensionality", 5,
        "decomposition_method", "randomized", "output", "data_mod"),
    SEE_ALSO("Principal component analysis on Wikipedia",
        "https://en.wikipedia.org/wiki/Principal_component_analysis"),
    SEE_ALSO("mlpack::pca::PCA C++ class documentation",
        "@doxygen/classmlpack_1_1pca_1_1PCA.html"));

That's from src/mlpack/methods/pca_main.cpp. You can see there are four things specified here:

  1. The name of the binding
  2. The short description
  3. The long description
  4. Some number of "see also" links used in the generated documentation

This information is used to generate help for every language, and also the online documentation too: https://www.mlpack.org/doc/mlpack-git/cli_documentation.html#pca

While working on the R bindings, @Yashwants19 and @coatless found that it would be really useful to split the examples out: Yashwants19#8 (comment)

I think it would be useful to refactor this so that this information isn't contained in just one call to PROGRAM_INFO() but perhaps a series of calls.

What is the motivation for this feature?

As time goes on and as we add more binding type, it may become useful to add additional metadata to these bindings, and the single PROGRAM_INFO() macro is getting a bit unwieldy.

If applicable, describe how this feature would be implemented.

Here is a proposal of what we could do from here:

BINDING_NAME("program name");
BINDING_SHORT_DESC("This is a short, two-sentence description of what the program does.");
BINDING_LONG_DESC("This is a long description of what the program does."
    " It might be many lines long and have lots of details about different options.");
BINDING_EXAMPLE("This contains one example for this particular binding.\n" +
    PROGRAM_CALL(...));
BINDING_EXAMPLE("This contains another example for this particular binding.\n" +
    PROGRAM_CALL(...));
// There could be many of these "see alsos".
BINDING_SEE_ALSO("https://en.wikipedia.org/wiki/Machine_learning");

So, we would have what's written above instead of PROGRAM_INFO().

Additional information?

I didn't provide any implementation details but it should be straightforward. Right now PROGRAM_INFO() creates a singleton that "registers" itself with IO during construction. We could just have several of these instead, one for the name one for the short description, one for the long description, etc., etc.

@Yashwants19
Copy link
Member

Hi @rcurtin, Can we use a single macro for EXAMPLE() and SEE_ALSO()?
e.g.

// Example.
BINDING_EXAMPLE(
    EXAMPLE("This contains one example for this particular binding.\n" +
    PROGRAM_CALL(...)),
    EXAMPLE("This contains another example for this particular binding.\n" +
    PROGRAM_CALL(...))
);

// See Also...
BINDING_SEE_ALSO(
    SEE_ALSO("https://en.wikipedia.org/wiki/Machine_learning"), ...
);

@rcurtin
Copy link
Member Author

rcurtin commented Jul 21, 2020

I dunno, at least personally I think it's nicer to have a different call to EXAMPLE() for every individual example. If the question comes from how to generate unique symbols, we could use __COUNTER__ like is done with the PARAM_*() macros.

@Yashwants19
Copy link
Member

I think it's nicer to have a different call to EXAMPLE() for every individual example

Sounds good, I will follow the same approach for refactoring the documentation.

@mlpack-bot
Copy link

mlpack-bot bot commented Aug 20, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Aug 20, 2020
@rcurtin
Copy link
Member Author

rcurtin commented Aug 20, 2020

We can close this now that #2558 is merged. Thanks again @Yashwants19!

@rcurtin rcurtin closed this as completed Aug 20, 2020
@Yashwants19
Copy link
Member

Thanks @rcurtin for all the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants