Add better print behavior for low_order_moments() #7

Closed
triskadecaepyon opened this issue Nov 1, 2018 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@triskadecaepyon
Contributor

The low order moments class has the advantage of processing an entire array of features at once, but it is hard to see the results without selecting each one individually.

When the whole result object is passed to print(), it would be preferable to print ALL of the results.

import daal4py as d4p

metrics_processor = d4p.low_order_moments()
data = metrics_processor.compute(dataset.values)
print(data)  # this does not do anything smart at the moment
print(data.standardDeviation)  # but this does print the result array

From a data scientist's perspective, better printing behavior would be useful, especially in Jupyter Notebook. If DataFrame support is added, it might also make printing easier.

@triskadecaepyon triskadecaepyon added the enhancement New feature or request label Nov 1, 2018
@oleksandr-pavlyk
Contributor

It's a matter of implementing a __str__ method for the result.

The result may contain the following fields: minimum, maximum, mean, standardDeviation, variance, variation, sum, sumSquared, sumSquaresCentered, secondOrderRawMoment.

I suppose the default 'smart' printing would display min, max, mean and std if available, much like the summary function does in R.

It makes sense to arrange the data into a table. Perhaps reuse pandas for printing?

How about code that maps low_order_result to a pandas DataFrame?

@fschlimb
Contributor

fschlimb commented Jan 10, 2019

I would like to think about the broader picture. The same 'issue' exists for all results/models.

I suggest adding a __str__ method to all results and models

  • printing attribute name and value for standard/trivial types (integer, str, float etc.)
  • printing attribute name and a summary for non-trivial, non-daal types
    • 'array' (or similar), dtype and shape for numpy arrays
    • 'dict' (or similar) and size for dicts
    • as far as I can tell there are no other non-trivial, non-daal types used
  • printing attribute name and daal type for daal types (like models)

Attributes which are not expanded (complex and daal types) can be expanded by explicitly printing them.
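
The rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: `describe` and `describe_value` are hypothetical helpers, not the daal4py implementation, arrays are detected by duck typing (anything with `dtype` and `shape`), and `FakeArray`, `FakeModel` and the attribute names on `result` are stand-ins invented for the example.

```python
from types import SimpleNamespace

# Types whose value is printed directly.
TRIVIAL_TYPES = (bool, int, float, str)


def describe_value(value):
    """Return a one-line description of an attribute value."""
    if isinstance(value, TRIVIAL_TYPES):
        return repr(value)
    if hasattr(value, "dtype") and hasattr(value, "shape"):
        # numpy-like array: show dtype and shape instead of the contents
        return f"array, dtype={value.dtype}, shape={value.shape}"
    if isinstance(value, dict):
        return f"dict, len={len(value)}"
    # anything else (e.g. a nested model): show only the type name
    return type(value).__name__


def describe(obj):
    """Build the multi-line text that a __str__ method could return."""
    lines = []
    for name in sorted(dir(obj)):
        if name.startswith("_"):
            continue
        value = getattr(obj, name)
        if callable(value):
            continue
        lines.append(f"{name}: {describe_value(value)}")
    return "\n".join(lines)


class FakeArray:
    """Hypothetical stand-in for a numpy array."""
    dtype = "float64"
    shape = (1, 3)


class FakeModel:
    """Hypothetical stand-in for a daal model object."""


result = SimpleNamespace(
    nIterations=10, centroids=FakeArray(), extras={"a": 1}, model=FakeModel()
)
print(describe(result))
```

An attribute that is only summarized here, such as `result.centroids`, can still be inspected in full by printing it explicitly.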

Conversion to a pandas DataFrame seems applicable only in very special cases, like low order moments. Conversion to a dict looks more generic, and pandas can create DataFrames from dicts.
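
A generic dict conversion might look like the sketch below. `result_to_dict` is a hypothetical helper, not an existing daal4py function, and it assumes that every public, non-callable attribute of the object is a result value; a `SimpleNamespace` stands in for a real result.

```python
from types import SimpleNamespace


def result_to_dict(result):
    """Collect the public, non-callable attributes of an object into a dict."""
    out = {}
    for name in dir(result):
        if name.startswith("_"):
            continue
        value = getattr(result, name)
        if not callable(value):
            out[name] = value
    return out


# Hypothetical stand-in for a result object holding 1 x n_features values.
result = SimpleNamespace(mean=[[3.0, 1.0]], variance=[[2.5, 4.0]])
as_dict = result_to_dict(result)
print(as_dict)
```

For low order moments, each value would likely need to be flattened to one dimension before `pandas.DataFrame(as_dict)` treats every statistic as a column.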

@Alexander-Makaryev

@fschlimb
Contributor

For algorithms we could print the parameters. This is more involved and probably less important.

This was referenced Jan 31, 2019
@triskadecaepyon
Contributor Author

Yes, I'll second @fschlimb's findings. It is mostly about having __str__ methods for viewing a summary, plus accessors that return values in the right type when a specific one is needed (mean and variance, for example).

@fschlimb
Contributor

fschlimb commented Feb 6, 2019

Done: generic print capability added for results and models through #48.

@fschlimb fschlimb closed this as completed Feb 6, 2019
4 participants