Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize the freud API #179

Closed
bdice opened this issue Jun 21, 2018 · 25 comments
Closed

Standardize the freud API #179

bdice opened this issue Jun 21, 2018 · 25 comments
Labels
documentation enhancement New feature or request
Milestone

Comments

@bdice
Copy link
Member

bdice commented Jun 21, 2018

From roadmap planning session with @vyasr and @bdice. To be assigned after #176 is closed.

Standardize the freud API

Once #176 (doc cleaning) is done, we should identify what we want APIs to look like for everything in freud.

  1. Review existing APIs
  2. Determine a standard for all modules to follow, with minimal exceptions
  3. Create a list of cases where the standard is not currently followed

The behavior of compute/accumulate/property getters is one notable case that should be standardized.

For methods that we want to remove (e.g. getRDF, which should be replaced by a property), we will remove that method from the documentation and add deprecation warnings for version 2.0.

For any class/function where the signature has to change, we will make use of *args, **kwargs to take variable APIs and then dynamically resolve them. Deprecation warnings will be issued wherever appropriate.

@bdice bdice added major enhancement New feature or request documentation and removed major labels Jan 23, 2019
@bdice bdice removed the 2.0 label Feb 9, 2019
@bdice bdice added this to the 2.0 milestone Feb 9, 2019
@csadorf
Copy link
Contributor

csadorf commented Apr 26, 2019

I would expect that similar to the different classes in sklearn, for instance all the different classes in the order module expose the exact same API (as much as possible) so to make it easier to substitute those in one workflow.

@bdice
Copy link
Member Author

bdice commented May 1, 2019

LinkCell.getCell should probably be LinkCell.getCellIndex

@bdice
Copy link
Member Author

bdice commented Jun 6, 2019

The correlation functions expose their data in the property RDF but that is confusing to many users because it's not actually the radial distribution function (it's the correlation function).

@bdice
Copy link
Member Author

bdice commented Jun 6, 2019

  • Computing cluster membership should just be part of the main cluster compute call.

@jinsoo960
Copy link
Contributor

AngularSeparation.computeNeighbor does not allow points=None default argument, while most of the other methods do.
Also, some of the documentation is wrong. The arguments of computeGlobal are ordered in global_ors, ors, equiv_quats, but the documentation does not agree with this.

@jinsoo960
Copy link
Contributor

jinsoo960 commented Jul 2, 2019

BondOrder claims to use cutoff distance, which is passed in as an argument for the constructor as rmax.
However, it is never really used anywhere other than for the rmax_guess for make_default_nlist_nn.
This might lead to unexpected behavior if the user wanted to use a strict cutoff.

@jinsoo960
Copy link
Contributor

jinsoo960 commented Jul 3, 2019

LocalDescriptors documentation seem to be misleading.

  1. num_neighbors and rmax does not seem to be necessary for the constructor, as they are never really used.
  2. The attribute num_neighbors and the paramter num_neighbors passed into compute seem to mean two different things. The parameter num_neighbors is used for the number of neighbors for each point, while the attribute num_neighbors returns the total number of neighbors. It seems more appropriate to name the attribute num_neighbors as num_bonds. Also seems to be mentioned in LocalDescriptors num_neighbors has an unclear name #196.
  3. lmax and rmax are the names of the constructor parameters and l_max and r_max are the names of the attributes.

Also, the tests for LocalDescriptors does not seem to test the correctness of computed values.

@jinsoo960
Copy link
Contributor

jinsoo960 commented Jul 3, 2019

AngluarSeparation also does not really use rmax for cutoff. Also, it uses n for the number of neighbors to consider, while other classes (LocalDescriptors, LocalBondProjection) use names like num_neighbors. And MatchEnv uses k!

LocalBondProjection, TransOrderParameter and HexOrderParameter also does not use rmax for cutoff. It seems like the issue is that make_default_nlist_nn always builds a neighbor list with strict_cut being false, and does not provide an option to change it.
MatchEnv.minRMSDMotif and MatchEnv.matchMotif are not tested.

@jinsoo960
Copy link
Contributor

InterfaceMeasure seem to have its documentation page not well managed.
The properties are redundantly defined in addition to attributes.
We need to add change doc/source/interface.rst so that

.. autoclass:: freud.interface.InterfaceMeasure(box, r_cut)
   :members:

becomes

.. autoclass:: freud.interface.InterfaceMeasure(box, r_cut)
   :members: compute

@jinsoo960
Copy link
Contributor

jinsoo960 commented Jul 3, 2019

List of parameter naming for cutoff distance and number of neighbors. Related to #180.

  • Cluster: rmax
  • GaussianDensity: r_cut
  • LocalDensity: r_cut
  • RDF: rmax
  • FloatCF: rmax
  • ComplexCF: rmax
  • BondOrder: rmax, n
  • LocalDescriptors: rmax, num_neighbors
  • MatchEnv: rmax, k
  • AngularSeparation: rmax, n
  • LocalBondProjection: rmax, num_neighbors
  • InterfaceMeasure : r_cut
  • NeighborList.filter_r: rmax
  • NeighborQuery.query: k
  • NeighborQuery.queryBall: r
  • AABBQuery.query: k, r (used as guess)
  • NearestNeighbors: rmax, n_neigh for constructor num_neighbors for attribute.
  • HexOrderParameter: rmax, n for number of neighbors, and k for symmetry order parameter, and K for attribute of symmetry order parameter.
  • TransOrderParameter: rmax, n for number of neighbors, and k for symmetry order parameter, and K for attribute of normalization value.
  • LocalQl: rmax
  • LocalQlNear: rmax, kn
  • LocalWl: rmax
  • LocalWlNear: rmax, kn
  • SolLiq: rmax
  • SolLiqNear: rmax, kn
  • PMFTR12: r_max

@vyasr
Copy link
Collaborator

vyasr commented Jul 18, 2019

Proposal for final names:

  • Max cutoff distance: r_max
  • Number of neighbors: num_neighbors
  • NeighborQuery or reference points: ref_points
  • points: points
  • query modes: ball, nearest
  • self-exclusion: exclude_ii
  • orientations: orientations (not ors)

Proposal for query_args:
Every class should define a class-level default set of query arguments (for example, PMFTs default to using ball queries). Users can then provide query_args the constructor to set query arguments for all future compute (or accumulate) calls. Alternatively, compute will also accept query arguments. There should be a resolution order for the arguments.

Misc notes:

  • Rename all compute functions to compute, e.g. computeClusters
  • Argument order should always be box, ref_points, [ref_orientations], [ref_values], points, [orientations], [values] (pending changes for NeighborQuery vs (box, ref_points) tuple)

@csadorf
Copy link
Contributor

csadorf commented Jul 18, 2019

Argument order should always be box, ref_points, [ref_orientations], [ref_values], points, [ref_orientations], [ref_values] (pending changes for NeighborQuery vs (box, ref_points) tuple)

Are you listing [ref_orientations] and [ref_values] twice on purpose?

@vyasr
Copy link
Collaborator

vyasr commented Jul 19, 2019

No, that's a typo. Thanks for the catch. I'll edit that comment directly so that we can safely use it as a reference.

@bdice
Copy link
Member Author

bdice commented Jul 23, 2019

Histogram functions like RDF, Correlation Functions, PMFTs should all agree on how bins are specified -- RDF and CFs should use something like nbins instead of dr in their constructor.

@csadorf
Copy link
Contributor

csadorf commented Jul 23, 2019

Histogram functions like RDF, Correlation Functions, PMFTs should all agree on how bins are specified -- RDF and CFs should use something like nbins instead of dr in their constructor.

Are bins always assumed to be linear?

@bdice
Copy link
Member Author

bdice commented Jul 23, 2019

Histogram functions like RDF, Correlation Functions, PMFTs should all agree on how bins are specified -- RDF and CFs should use something like nbins instead of dr in their constructor.

Are bins always assumed to be linear?

Currently yes, we don't have a mechanism for defining bins of uneven spacing like np.histogram. It would also make the bin index calculation O(log N) instead of O(1) because it would have to do something like a binary search over the bins.

@csadorf
Copy link
Contributor

csadorf commented Jul 23, 2019

I'm just bringing this up, because if you don't assume all bins to always be linear, you should probably generalize the API now.

In that sense you should prefer a definition similar to np.histogram:

bins : int or sequence of scalars or str, optional

A argument name bins would therefore be preferable over nbins because you could define it as "number of bins" now and generalize it to "or an array of bin edges" later.

@bdice
Copy link
Member Author

bdice commented Jul 23, 2019

@csadorf That's a great idea. Thanks!

@bdice
Copy link
Member Author

bdice commented Aug 15, 2019

@bdice
Copy link
Member Author

bdice commented Aug 28, 2019

  • box.wrap and related methods should not act in-place on their inputs. They should just return the new arrays.

@bdice
Copy link
Member Author

bdice commented Sep 8, 2019

  • ra_array isn't a great attribute name for RotationalAutocorrelation.

@bdice
Copy link
Member Author

bdice commented Sep 19, 2019

  • preprocess_arguments should be private so it doesn't show in the user-facing API for every PairCompute class.

@bdice
Copy link
Member Author

bdice commented Sep 24, 2019

List of issues discussed with @vyasr

General

  • Module authors should be removed from docstrings and placed in Credits (Docs/move credits #461)
  • Creating a NeighborQuery from a 2D box should fail if z != 0 for any particles (Disallow 2D box with 3D points #462)
  • Classes with no constructor arguments need empty parentheses in the docstrings
  • Box, parallel functions, some pieces of locality module should be in parent namespace

Box

  • Vectors should not be modified in-place (only returned) for Box functions
  • Box.is2D should be a property
  • Box.to_dict should have an example
  • Box functions should all be named in snake_case instead of camelCase
  • Box.makeCoordinates should be renamed Box.make_absolute
  • Box.to_tuple could be removed
  • ParticleBuffer should be renamed to PeriodicBuffer
  • buffer_particles should be renamed to buffer_points

Cluster

  • cluster_COM should be renamed cluster_center
  • cluster_G should be renamed cluster_gyration
  • num_clusters should be removed from ClusterProperties
  • ClusterProperties should be renamed to indicate that it's useful for other non-cluster tasks

Density

  • RDF and CorrelationFunction need better arguments for bounds/range/max
  • CorrelationFunction: Remove box and R properties
  • CorrelationFunction: reset needs better docs
  • CorrelationFunction: escape the underscore in the docstring with query_points
  • CorrelationFunction: rename RDF to correlation
  • Consider reset=True or accumulate=False argument to all the classes supporting accumulation
  • GaussianDensity: rename gaussian_density property to density, remove "Does not accumulate" from docstring
  • LocalDensity: fix docstring "a single data points", figure out if volume matters and remove it if it's just a scalar factor

Order

  • Make order be a system-wide parameter and particle_order a per-particle parameter for all classes

Parallel

  • Rename functions to use snake_case

PMFT

  • Probably don't need to expose r_max anywhere
  • Remove the inverse_jacobian array from PMFTR12
  • Do we need the PCF arrays?

@bdice bdice mentioned this issue Sep 26, 2019
18 tasks
@vyasr vyasr mentioned this issue Sep 27, 2019
9 tasks
@bdice bdice mentioned this issue Sep 27, 2019
14 tasks
@vyasr vyasr mentioned this issue Sep 28, 2019
10 tasks
@bdice bdice mentioned this issue Oct 1, 2019
10 tasks
@vyasr
Copy link
Collaborator

vyasr commented Oct 2, 2019

The only outstanding questions on this issue are:

  • proper naming of ClusterProperties
  • Range-related constructor arguments for the various histograms. Should things like PMFTs accept a tuple specifying the ranges? How would we allow minima, and when would that make sense?
  • Converting all accumulate functions to use compute with a reset flag.

@vyasr vyasr mentioned this issue Oct 2, 2019
10 tasks
@vyasr
Copy link
Collaborator

vyasr commented Oct 2, 2019

Done in #499. accumulate has been removed and the corresponding compute functions accept a reset flag. The other two changes will be made after we get feedback from users during the 2.0 beta.

@vyasr vyasr closed this as completed Oct 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants