Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reformatting and naming #94

Merged
merged 11 commits into from
Apr 27, 2021
Merged

Reformatting and naming #94

merged 11 commits into from
Apr 27, 2021

Conversation

rosecers
Copy link
Collaborator

@rosecers rosecers commented Apr 8, 2021

This is an on-going PR with a few goals (from https://scikit-learn.org/stable/developers/develop.html#coding-guidelines and elsewhere)

  • Rename all module files using a leading underscore
  • Abbreviations should be used in filenames only when commonplace
  • Imports should be organized based upon pep8
  • Every keyword argument accepted by init should correspond to an attribute on the instance.
  • In init, there should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.
  • Naming Attributes
    • Estimated attributes have a trailing underscore
    • Private attributes have a leading underscore
    • All other attributes have no leading or trailing underscore
  • Use underscores to separate words in non class names: n_samples rather than nsamples.
  • Avoid multiple statements on one line. Prefer a line return after a control flow statement (if/for).
  • Use relative imports for references inside scikit-learn.
  • Unit tests are an exception to the previous rule; they should use absolute imports, exactly as client code would. A corollary is that, if sklearn.foo exports a class or function that is implemented in sklearn.foo.bar.baz, the test should import it from sklearn.foo.
  • Please don’t use import * in any case. It is considered harmful by the official Python recommendations. It makes the code harder to read as the origin of symbols is no longer explicitly referenced, but most important, it prevents using a static analysis tool like pyflakes to automatically find bugs in scikit-learn.
  • Docstring formatting: (https://scikit-learn.org/stable/developers/contributing.html)
    • Use the numpy docstring standard in all your docstrings.
    • Use Python basic types. (bool instead of boolean)
    • Use parenthesis for defining shapes: array-like of shape (n_samples,) or array-like of shape (n_samples, n_features)
    • For strings with multiple options, use brackets: input: {'log', 'squared', 'multinomial'}
    • 1D or 2D data can be a subset of {array-like, ndarray, sparse matrix, dataframe}. Note that array-like can also be a list, while ndarray is explicitly only a numpy.ndarray.
    • When specifying the dtype of an ndarray, use e.g. dtype=np.int32 after defining the shape: ndarray of shape (n_samples,), dtype=np.int32. You can specify multiple dtype as a set: array-like of shape (n_samples,), dtype={np.float64, np.float32}. If one wants to mention arbitrary precision, use integral and floating rather than the Python dtype int and float. When both int and floating are supported, there is no need to specify the dtype.
    • When the default is None, None only needs to be specified at the end with default=None. Be sure to include in the docstring, what it means for the parameter or attribute to be None.
  • Do we use sklearn utilities where warranted? https://scikit-learn.org/stable/developers/utilities.html#developers-utils

Some of these things I anticipate will take place in other PRs, such as the docstring reformatting in #82 and the everything reformatting in #93.

examples/PCovR.ipynb Outdated Show resolved Hide resolved
@Luthaf
Copy link
Collaborator

Luthaf commented Apr 8, 2021

Merging this PR will be a bit painful, since it might introduce a lot conflicts with other PR. Should we try to get this one through as quickly as possible (and rebase other PR), or wait for other PR to be merged first and rebase this one?

EDIT: missed you edit in the top comment 😃. So the plan is to do all of this piecewise? That sounds good!

@rosecers
Copy link
Collaborator Author

rosecers commented Apr 8, 2021

Merging this PR will be a bit painful, since it might introduce a lot conflicts with other PR. Should we try to get this one through as quickly as possible (and rebase other PR), or wait for other PR to be merged first and rebase this one?

This one is still a WIP. I'd like #93 and #82 in first.

@rosecers rosecers force-pushed the reformatting_and_naming branch 6 times, most recently from 6c373b4 to c59f946 Compare April 9, 2021 08:33
@rosecers
Copy link
Collaborator Author

rosecers commented Apr 9, 2021

So I think my plan is this, because this checklist was easier to get through than I anticipated. Given that skcosmo/feature_selection, skcosmo/sample_selection and skcosmo/preprocessing are all going to change monumentally with #93 and #82, I'll hold off merging this until those are in. However, I'll make sure to enforce these formatting rules in those (and all future PR's) so that rebasing will be easy. @Luthaf , thoughts?

@Luthaf
Copy link
Collaborator

Luthaf commented Apr 9, 2021

I'll make sure to enforce these formatting rules in those

Enforce them during the review you mean? Or automatically? I'm good if we enforce them manually during a review, using automatic tools would be harder.

@rosecers
Copy link
Collaborator Author

rosecers commented Apr 9, 2021

I'll make sure to enforce these formatting rules in those

Enforce them during the review you mean? Or automatically? I'm good if we enforce them manually during a review, using automatic tools would be harder.

100% agree. I'll add to our internal resources this list for comparison

@rosecers
Copy link
Collaborator Author

rosecers commented Apr 9, 2021

Note that not touching the files in #82 and #93 will cause this to fail CI until rebase.

rosecers added a commit that referenced this pull request Apr 9, 2021
rosecers pushed a commit that referenced this pull request Apr 9, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94
@rosecers rosecers force-pushed the reformatting_and_naming branch 2 times, most recently from 9d2415d to 3135583 Compare April 9, 2021 15:29
rosecers pushed a commit that referenced this pull request Apr 15, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94
rosecers pushed a commit that referenced this pull request Apr 15, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94

added copy parameter
rosecers pushed a commit that referenced this pull request Apr 15, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94

added copy parameter
rosecers pushed a commit that referenced this pull request Apr 15, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94

added copy parameter
rosecers pushed a commit that referenced this pull request Apr 15, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94

added copy parameter
rosecers pushed a commit that referenced this pull request Apr 15, 2021
added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94

added copy parameter
rosecers added a commit that referenced this pull request Apr 16, 2021
* replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

replaced the absolute tolerance with a relative tolerance

added atol, added tests for atol and rtol, expanded and unified documentation, added description of methods

change the default variables and docs

Applying formatting consistent with #94

added copy parameter

* Changed rtol and atol defaults

Co-authored-by: rosecers <rosecersonsky@gmail.com>
@rosecers rosecers marked this pull request as ready for review April 16, 2021 14:27
Copy link
Collaborator

@agoscinski agoscinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I think only preprocessing/flexible_scaler.py should be renamed to preprocessing/_data.py if we follow sklearn with their StandardScaler, what do you think?
https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/preprocessing/_data.py#L564

EDIT: It can also exist in its own file, I just think it should also have an underscore following the naming convention of sklearn

@rosecers
Copy link
Collaborator Author

Looks good.

I think only preprocessing/flexible_scaler.py should be renamed to preprocessing/_data.py if we follow sklearn with their StandardScaler, what do you think?
https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/preprocessing/_data.py#L564

EDIT: It can also exist in its own file, I just think it should also have an underscore following the naming convention of sklearn

Ooh good catch! I was planning on this, following #91, just hadn't remembered to do so.

@agoscinski
Copy link
Collaborator

agoscinski commented Apr 21, 2021

Do we use sklearn utilities where warranted? https://scikit-learn.org/stable/developers/utilities.html#developers-utils

I think we are not doing this in the moment in the test

If your code relies on a random number generator, it should never use functions like numpy.random.random or numpy.random.normal. This approach can lead to repeatability issues in unit tests. Instead, a numpy.random.RandomState object should be used, which is built from a random_state argument passed to the class or function. The function check_random_state, below, can then be used to create a random number generator object.

namely in

  • test_linear_model.py (random.seed)
  • test_metrics.py (random.seed)
  • test_kernel_normalizer.py (random.uniform)
  • test_orthogonalizers.py (random.uniform)
  • test_sparse_kernel_centerer.py (random.uniform)
  • test_standard_flexible_scaler.py (random.uniform)
  • skcosmo/sample_selection/_voronoi_fps.py (random.randint)
  • skcosmo/_selection.py (random.randint)

Its a bit frustrating that they only mention that this could lead to issues, but don't say what kind of issues. But it make sense to have a consistent random state functionality. I already have changed test_metrics.py and test_linear_model.py because these one require a random orthogonal matrix, so I had to switch to use sklearn.utils.extmath.randomized_range_finder, since RandomState seed does not effect numpy seeds.

I think the other points don't match any part of the code.

EDIT: Sorry I misclicked when commenting

@agoscinski agoscinski closed this Apr 21, 2021
@agoscinski agoscinski reopened this Apr 21, 2021
…nd made requisite changes

Renaming utils with leading underscore
…bute on the instance.

 In init, there should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.

Adding fit before checking alpha
checked for absolute imports in tests
checked for import * anywhere
@rosecers
Copy link
Collaborator Author

@Luthaf something fishy is going on with black here. On both my source-built scikit-cosmos, black passes, but doesn't in CI. Thoughts?

Changed all random instances to RandomStates
Copy link
Collaborator

@agoscinski agoscinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@rosecers rosecers merged commit 55d80f3 into main Apr 27, 2021
@rosecers rosecers deleted the reformatting_and_naming branch April 27, 2021 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants