Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorter name for NodematchFilter()? #480

Open
krivit opened this issue Aug 8, 2022 · 10 comments
Open

Shorter name for NodematchFilter()? #480

krivit opened this issue Aug 8, 2022 · 10 comments

Comments

@krivit
Copy link
Member

krivit commented Aug 8, 2022

In light of #478, NodematchFilter() operator, which I had originally put together to test the API, may be worth making more prominent, and that might involve renaming it to something more concise. Some candidates, in no particular order:

  1. FNodematch()
  2. NodematchF()
  3. FMatch()
  4. MatchF()
  5. Nodematch()*
  6. Match()*
  7. OnMatch()
  8. OnNodematch()

* --- These candidates may be error prone in that they differ from the match() and nodematch() term by capitalisation alone.

Any preferences?
@sgoodreau , @martinamorris , @CarterButts , @drh20drh20 , @handcock , @chad-klumb

@martinamorris
Copy link
Member

martinamorris commented Aug 9, 2022 via email

@chad-klumb
Copy link
Contributor

If it's a special case of the F operator I'd be inclined to just use that operator instead (possibly spelled out to Filter).

If it needs its own name, 1, 2, or their analogues with F replaced by Filter would be my preferences.

@sgoodreau
Copy link
Contributor

I don't feel like I have much sense of the whole background to give the best feedback. It sounds like "filter" exists as a term in the new statnet packages for the concept of including only parts of the network (via the F operator, but also as a general idea). In that case, it seems good to include F or Filter. Is the "on" prefix already in there somewhere as well, conceptually?

Definitely do not make it just Match or Nodematch

@krivit
Copy link
Member Author

krivit commented Aug 10, 2022

Thanks for the feedback! The arguments for having a separate term are two:

  1. Separate term is more optimised, in that the general F has to keep track of two models, one for the filtering, the other for the evaluation. It makes sense to optimise common special cases. That having been said, this can be accomplished by detecting the node match case and branching to optimised code for that.
  2. Brevity: if the call is shortened, the special case would be shorter and cleaner:
    NodematchF(~gwesp, "a")
    F(~gwesp, ~nodematch("a"))

@martinamorris
Copy link
Member

That having been said, this can be accomplished by detecting the node match case and branching to optimised code for that.

I'm now a bit confused about what the options are. But if the above means you can use the existing "F" operator syntax, rather than creating a new operator for this special case, and still get the optimised code, then that seems like the best soln to me.

@krivit
Copy link
Member Author

krivit commented Aug 10, 2022

I'm now a bit confused about what the options are. But if the above means you can use the existing "F" operator syntax, rather than creating a new operator for this special case, and still get the optimised code, then that seems like the best soln to me.

NodematchFilter() is already there and in fact precedes the more general F() operator chronologically, so I guess one of the options is to deprecate it in favour of F().

The question is, then, from a user interface perspective, is there benefit to having a shortcut for a common case?

@martinamorris
Copy link
Member

Shorter by 3 characters? (if you're referring to the 2 options you show a couple of comments back). I'm thinking not worth it, as long as the optimised code is still used for this case.

@krivit
Copy link
Member Author

krivit commented Aug 10, 2022

Fewer parentheses, too. :-)

I'll see about the special case.

@CarterButts
Copy link

CarterButts commented Oct 11, 2022 via email

@krivit
Copy link
Member Author

krivit commented Oct 11, 2022

Probably best to avoid having official terms that differ only by capitalization - that is likely to lead to issues.

The pattern so far is that ordinary terms are lowercase and snake-cased whereas term operators (i.e., terms that take ergm formulas as arguments) are capitalised and camel-cased. So far, the only naming conflict in effect is Sum() the linear combination operator and sum() the valued term that sums all dyads.

In retrospect, perhaps we should have named Sum() something else (LinCom()?).

NodematchFilter(formula, attr) evaluates formula on a network constructed by taking the LHS network and removing all edges which don't match on attr.

I wouldn't say that we are specifying a language above and beyond what we've already done with basic ergm terms. The only "API" here is that during my sabbatical in 2017 I implemented an API that enabled C change statistics to store and update their internal states, and then I realised that there was no rule against passing an ergm model to an ergm term, both on the R and on the C side, which means that one could implement terms (operators) that wrapped other terms, modifying their inputs (the network), their outputs (the change scores), and their parametrisation (the curved mapping).

That API has evolved since then, but fundamentally that's all there is to it. At this point, we are discussing specific operators and their user interfaces.

The most up to date API document is in a series of ergm package vignettes, particularly the ones in master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants