Add the syntax only for side effect #30

Closed
renkun-ken opened this issue Aug 18, 2014 · 28 comments

@renkun-ken
Owner

Consider the following syntax:

x %>>% (~ expr)         # evaluate expr with . = x and return x
x %>>% ((m) ~ expr)     # evaluate expr with m = x and return x
mtcars %>>%
  (~ cat("Number of columns:",ncol(.),"\n")) %>>%
  (mpg) %>>%
  summary
Number of columns: 11 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.42   19.20   20.09   22.80   33.90 

or

mtcars %>>%
  ((x) ~ cat("Number of columns:",ncol(x),"\n")) %>>%
  (mpg) %>>%
  summary
Number of columns: 11 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.42   19.20   20.09   22.80   33.90 

where (~ expr) or ((x) ~ expr) indicates that the result of expr is ignored and the input is returned, so the step is evaluated only for its side effect. (The formula stresses only one side, which also makes expr look like it is evaluated as a side branch.)
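
To make the "evaluate, discard the result, return the input" rule concrete, here is a minimal base-R sketch. The helper name side_effect is hypothetical and is not part of pipeR; it only imitates what (~ expr) is described as doing, and pipeR's real implementation may differ.

```r
# Hypothetical helper (NOT pipeR's implementation): evaluate `expr` with `.`
# bound to `x`, throw away whatever `expr` returns, and return `x` unchanged.
side_effect <- function(x, expr) {
  env <- new.env(parent = parent.frame())  # child env so `.` does not leak out
  assign(".", x, envir = env)
  eval(substitute(expr), env)              # result is deliberately ignored
  x
}

# cat() returns NULL, but the data frame still comes back untouched.
out <- side_effect(mtcars, cat("Number of columns:", ncol(.), "\n"))
identical(out, mtcars)
```

The call prints the column count and then returns mtcars itself, which is why the next step in a pipeline can keep working on the original input.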

Note that every syntax supported in () also applies to .() in Pipe. Therefore:

Pipe(mtcars)$
  .(~ cat("Number of columns:",ncol(.),"\n"))$
  .(mpg)$
  summary()
Number of columns: 11 
$value : summaryDefault table 
------
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.42   19.20   20.09   22.80   33.90 
@renkun-ken renkun-ken self-assigned this Aug 18, 2014
@renkun-ken
Owner Author

A practical case is plotting a linear model after the plotting layout has been partitioned.

mtcars %>>%
  (~ par(mfrow=c(2,2))) %>>%    # only for side effect (par() returns the arg list)
  (lm(mpg ~ cyl + wt, data = .)) %>>%
  plot()

@renkun-ken
Owner Author

Other examples:

mtcars %>>%
  (~ par(mfrow=c(1,2))) %>>%
  (~ plot(mpg ~ cyl, data = .)) %>>%
  (~ plot(mpg ~ wt, data = .)) %>>%
  (lm(mpg ~ cyl + wt, data = .)) %>>%
  summary() %>>%
  (coefficients)
Pipe(mtcars)$
  .(~ par(mfrow=c(1,2)))$
  .(~ plot(mpg ~ cyl, data = .))$
  .(~ plot(mpg ~ wt, data = .))$
  .(lm(mpg ~ cyl + wt, data = .))$
  summary()$
  .(coefficients)

Do you think it is useful?
Do you think it looks ambiguous even if you know the rule and what it means?

@timelyportfolio @ramnathv @yanlinlin82

@timelyportfolio

Is the question whether to have this functionality at all or what syntax is best?

I will very clearly demonstrate my ignorance here, but just to make sure I am clear: would this accomplish the objective of the magrittr %T>% tee operator? I looked quickly for an equivalent in F#, but could not find any readily available discussions or examples. Are there parallels in F# or other languages where we could borrow the syntax?

Although I use it rarely, it is very nice to have in those rare use cases even beyond logging. I will try to work up the examples where I find it handy and see how this syntax looks. As I work through more examples, I will also see how sticky it is.

I am assuming that deprecation of lambda #31 will be mandatory to prevent confusion.

renkun-ken added a commit that referenced this issue Aug 18, 2014
@renkun-ken
Owner Author

Yes, it is like magrittr's %T>% operator for side effects, and as far as I know it does not come from F# or any other language. It avoids breaking the pipe in cases where we want some side effects in between, which I sometimes find helpful.

magrittr introduces a new operator to do this, and more operators to do other things. Early on, I saw only one or two operators in magrittr; now I see 5 or 6. Instead of introducing new operators, I would like to carefully introduce new syntax that is not confusing and not easily abused.

The feature has been committed to branch 0.4. Would you please try it and give some suggestions? Thanks a lot!

@timelyportfolio

I updated to the newest 0.4 and will test. In the past, I used this mostly with reference classes (R5).

@renkun-ken
Owner Author

Thanks! If you think there is better syntax, please let me know. If the feature costs more than the value it brings, it would not go to master.

@timelyportfolio

Figured I would borrow some code from a package that uses reference classes so I arbitrarily chose lme4. Here is a small snippet where I try to overuse the side effect functionality.

#think this will be useful for reference classes (R5)
install.packages('lme4')
library(lme4)

#borrow from lme4 vignette to test side effects operator
#str(sleepstudy)
#fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
sleepstudy %>>% 
  ( ~ str(.) ) %>>%  #note ( ~ str ) does not print str but still passes through
  #found this in the .Rnw but code is not in final vignette output  
  (~ 
     print(lattice::xyplot(Reaction ~ Days | Subject, ., aspect = "xy",
                    layout = c(9, 2), type = c("g", "p", "r"),
                    index.cond = function(x, y) coef(lm(y ~ x))[2],
                    xlab = "Days of sleep deprivation",
                    ylab = "Average reaction time (ms)",
                    as.table = TRUE))
  ) %>>%
  { lmer( Reaction ~ Days + ( Days | Subject ), . ) } %>>%
  ( ~assign( "fm1", ., envir = .GlobalEnv ) )


#the hard way to accomplish the fm1 above
# formula module
#   parsedFormula <- lFormula(formula = Reaction ~ Days + (Days|Subject),
#                                data = sleepstudy)
# 
#   # objective function module
#   devianceFunction <- do.call(mkLmerDevfun, parsedFormula)
# 
#   # optimization module
#   optimizerOutput <- optimizeLmer(devianceFunction)
# 
#   # output module
#   mkMerMod( rho = environment(devianceFunction),
#                opt = optimizerOutput,
#                reTrms = parsedFormula$reTrms,
#                fr = parsedFormula$fr)

#probably not a likely candidate for pipelining but do it nevertheless
#don't know enough yet about lme4 design to recode yet
sleepstudy %>>%
  ( ~ print("# formula module")) %>>%
  { 
    lFormula (
      formula = Reaction ~ Days + (Days|Subject)
      , data = .
    )
  } %>>%
  ( ~ assign( "parsedFormula", ., envir = .GlobalEnv ) ) %>>%
  ( ~ cat( "test parsedFormula$frame == fm1@frame" ) )%>>%
  ( ~ testthat::is_identical_to(fm1@frame,parsedFormula$fr) ) %>>%
  ( ~ print( "objective function module" ) ) %>>%
  { do.call( mkLmerDevfun, . ) } %>>%
  ( ~ assign( "devianceFunction", ., envir = .GlobalEnv ) ) %>>%
  ( ~ print( "optimization module" ) ) %>>%
  optimizeLmer %>>%
  {
    mkMerMod (
      rho = environment( devianceFunction )
      ,opt = .
      ,reTrms = parsedFormula$reTrms
      ,fr = parsedFormula$fr
    )
  }

@timelyportfolio

Another use similar to logging/documenting that I had not considered would be to test in the pipeline with testthat or rtype.

@renkun-ken
Owner Author

Here's a pseudo computing example :)

Pipe(1:3)$
    .(~ cat("connect",length(.),"elements with 2 more\n"))$
    .(~ Sys.sleep(1))$
    c(4,5)$
    .(~ cat("calculating mean\n"))$
    .(~ Sys.sleep(1))$
    mean()

@timelyportfolio

hardest thing for me so far has been ~ inside of () rather than ~(), but the ( ~ ) makes more sense to me. just harder to type for some reason (probably muscle memory).

@renkun-ken
Owner Author

Sadly, ~(expr) is parsed in a way that breaks the evaluation order, so it cannot be chained.

> as.list(quote(a %>>% ~(x) %>>% y()))
[[1]]
`%>>%`

[[2]]
a

[[3]]
~(x) %>>% y()

Neither does ~ expr work:

> as.list(quote(a %>>% ~x %>>% y()))
[[1]]
`%>>%`

[[2]]
a

[[3]]
~x %>>% y()

But (~expr) is parsed in the desired order:

> as.list(quote(a %>>% (~x) %>>% y()))
[[1]]
`%>>%`

[[2]]
a %>>% (~x)

[[3]]
y()
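
The three parse trees above follow from R's operator precedence: ~ binds more loosely than any %op% operator, so a bare formula swallows the rest of the chain, while parentheses close the formula off. A small sketch that checks this with base R only (quote() merely parses, so %>>% does not need to be defined; the variable names bare and paren are just for illustration):

```r
# quote() parses without evaluating, so pipeR does not need to be loaded.
bare  <- quote(a %>>% ~x %>>% y())
paren <- quote(a %>>% (~x) %>>% y())

# Bare formula: the right-hand side of the first %>>% is the whole formula,
# which has absorbed the second %>>% and y().
deparse(bare[[3]])    # "~x %>>% y()"

# Parenthesised formula: the outermost call is the LAST %>>%, so the chain
# nests left to right as intended.
deparse(paren[[2]])   # "a %>>% (~x)"
deparse(paren[[3]])   # "y()"
```

This is why the side-effect marker has to live inside the parentheses rather than in front of them.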

@renkun-ken
Owner Author

In the syntax I designed, () is the feature hub: it supports more than one feature, depending on the inner syntax, which may cause some confusion. But so far it looks unlikely that someone would use a feature by mistake without knowing about it.

@timelyportfolio

btw, I like the computing example...

I found the lattice plot in the lme4 vignette by digging in the .Rnw source file, so I added to the example above, but thought it would be good to paste separately to see how it looks in isolation.

sleepstudy %>>% 
  ( ~ str(.) ) %>>%  #note ( ~ str ) does not print str but still passes through
  #found this in the .Rnw but code is not in final vignette output  
  (~ 
    print(lattice::xyplot(Reaction ~ Days | Subject, ., aspect = "xy",
                    layout = c(9, 2), type = c("g", "p", "r"),
                    index.cond = function(x, y) coef(lm(y ~ x))[2],
                    xlab = "Days of sleep deprivation",
                    ylab = "Average reaction time (ms)",
                    as.table = TRUE))
  ) %>>%
  { lmer( Reaction ~ Days + ( Days | Subject ), . ) } %>>%
  ( ~ assign( "fm1", ., envir = .GlobalEnv ) ) %>>%
  #test some nested calls with the profile from vignette conclusion
  ( ~ profile( . ) %>>% { print(lattice::splom(.) ) } )

and then if I have it right as a Pipe.

Pipe(sleepstudy)$
  .( ~ str(.) )$
  .( ~ 
     print(lattice::xyplot(Reaction ~ Days | Subject, ., aspect = "xy",
                           layout = c(9, 2), type = c("g", "p", "r"),
                           index.cond = function(x, y) coef(lm(y ~ x))[2],
                           xlab = "Days of sleep deprivation",
                           ylab = "Average reaction time (ms)",
                           as.table = TRUE))
  )$
  .( lmer( Reaction ~ Days + ( Days | Subject ), . ) )$
  .( ~ assign( "fm1", ., envir = .GlobalEnv ) )$
  .( ~ profile( . ) %>>% { print(lattice::splom(.)) } )[]

@renkun-ken
Owner Author

Another step-by-step plotting example:

m <- data.frame(x=1:100,y=rnorm(100))
par(mfrow=c(2,2))
Pipe(m)$
  .(~ plot(y ~ x, data = .))$
  transform(z = y^2)$
  .(~ plot(y ~ z, data = .))$
  transform(w = (y + z))$
  .(~ plot(y ~ w, data = .))$
  transform(q = sin(x)+cos(y))$
  .(~ plot(y ~ q, data = .))

@renkun-ken
Owner Author

I think the main uses of this feature are to

  • avoid breaking the pipe when I suddenly want to do something between two pipes
  • do some logging
  • show some intermediate results (numbers, plots, etc.)

The (~ expr) syntax seems easy to distinguish from non-side-effect use and unlikely to be used by mistake. I can't imagine a user typing this syntax without knowing what it means.

@timelyportfolio

lambda conflict (which I think you have decided to deprecate/eliminate) really is the only side effect of the side effect that I have thought of

@timelyportfolio

I cannot think of much more to throw at it than this monstrosity replicating a post I had done previously.

# from timelyportfolio lme4 error bar post
# http://timelyportfolio.github.io/rCharts_errorbar/ucla_melogit.html
"http://www.ats.ucla.edu/stat/data/hdp.csv" %>>%
  read.csv %>>%
  within( {
    Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
    DID <- factor(DID)
    HID <- factor(HID)
  } ) %>>%
  {
    glmer(remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP +
          CancerStage + Experience + (1 | DID) + (1 | HID),
          data = ., family = binomial, nAGQ=1)
  } %>>%  # show the dotplot as a reference
  (~
     print(lattice::dotplot(
       ranef(., which = "DID", postVar = TRUE),
       scales = list(y = list(alternating = 0))
     ))
  ) %>>%
  { ranef(object  = ., which = "DID", postVar = TRUE)$DID } %>>%
  {
    data.frame(
      "id" = rownames(.),  #this will be our x
      "intercept" = .[,1],            #this will be our y
      "se" = as.numeric(attr( ., "postVar" ))  #this will be our se
    )
  } %>>%  #had not thought of this use to add library
  (~ library(rCharts) ) %>>%  
  #rCharts good ref class reference for side effect helpfulness
  {
    setRefClass(
      "rChartsError"
      ,contains="rCharts"
      ,methods=list(
        initialize = function(){
          callSuper()
        }
        ,getPayload = function(chartId){
          list(chartParams = toJSON2(params), chartId = chartId, lib = basename(lib), liburl = LIB$url)
        }
      )
    )$new() %>>%
        (~ .$setLib("http://timelyportfolio.github.io/rCharts_errorbar") ) %>>%
        (~ .$setTemplate (
          script = "http://timelyportfolio.github.io/rCharts_errorbar/layouts/chart.html"
          ,chartDiv = "<div></div>"
        ) ) %>>%
        (~ .$set(
          data = get(".",parent.env(environment())),  #ugly but don't know better way
          height = 500,
          width = 1000,
          margin = list(top = 10, bottom = 10, right = 50, left = 100),
          x = "id",
          y = "intercept",
          radius = 2,
          sort = list( var = "intercept" ),
          whiskers = "#!function(d){return [d.intercept - 1.96 * d.se, d.intercept + 1.96 * d.se]}!#",
          tooltipLabels = c("id","intercept","se") 
        ))
  }

@timelyportfolio

another use similar to logging would be to write a file with results for reproducibility.

@timelyportfolio

more plotting examples

pdf("test.pdf")
data.frame( x = 1:10, y = 1:10 ) %>>%
  ( ~ plot( x = .[,"x"], y = .[,"y"], type = "b" ) ) %>>%
  ( ~ library(latticeExtra) ) %>>%
  ( ~ xyplot( y ~ x, data = ., type = c("p","l") ) %>>% print %>>%  ( ~ asTheEconomist(.) %>>% print ) ) %>>%
  ( ~ library(ggplot2) ) %>>% 
  ( ~ ggplot( ., aes( x = x, y = y) )  %>>% + geom_line() %>>% + geom_point() %>>% print )
dev.off()

a little different look at it with a focus on ggplot2

data.frame( x = 1:10, y = 1:10 ) %>>%
    ggplot( aes(x=x,y=y) ) %>>%
    ((g1) ~ print( g1  + geom_point()) ) %>>%
    ( ~ print( . + geom_line() )) %>>%
    str

@renkun-ken
Owner Author

A mix for all features:

library(pipeR)
mtcars %>>%
  (~ cat("data:",ncol(.),"columns\n")) %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg,0.95)) %>>%
  ( lm(mpg ~ cyl + disp + wt + factor(vs), data = .) ) %>>%
  summary() %>>%
  (coefficients) %>>%
  ((coe) ~ cat("coefficients:",class(coe),"\n")) %>>%
  ((coe) ~ print(coe)) %>>%
  (coe ~ coe[-1,1]) %>>%
  barplot(main = "coefficients")

I think the ((x) ~ expr) form is not distinct enough and is not obviously a side effect.

I'm considering adopting the following syntax instead:

  • x %>>% ( ~ expr) for side effect with . = x
  • x %>>% ( ~ p ~ expr ) for side effect with p = x

The syntax looks more uniform and makes more sense to me. Luckily, it parses in the desired way.


> as.list(quote(~ x ~ x + 1))
[[1]]
`~`

[[2]]
~x

[[3]]
x + 1

What do you think?

@renkun-ken
Owner Author

It's very interesting that my expression analyzer directly supports the ~ x ~ expr syntax. In fact, whenever the lhs of the formula has length 2, its 2nd element is taken as the symbol for the side-effect expression.

  • (x) is (, x
  • ~x is ~, x
  • f(x) is f, x
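
The "lhs of length 2" observation can be checked directly on the parse trees. This is a base-R sketch of the inspection only, not pipeR's analyzer; the names f1, f2, f3 and the placeholder expr are illustrative.

```r
# Three formulas whose left-hand side is a call with exactly one argument.
f1 <- quote((x) ~ expr)    # lhs is the call `(`(x)
f2 <- quote(~ x ~ expr)    # lhs is the one-sided formula ~x
f3 <- quote(f(x) ~ expr)   # lhs is the call f(x)

# For a two-sided formula, element 2 is the lhs and element 3 is the rhs.
lhs <- lapply(list(f1, f2, f3), function(f) f[[2]])

# Every lhs has length 2: the function or operator, then the symbol.
lens <- vapply(lhs, length, integer(1))
syms <- vapply(lhs, function(l) as.character(l[[2]]), character(1))
lens
syms
```

In all three cases the second element of the lhs is the symbol x, which is what lets the new (~ x ~ expr) form work without changing the analyzer.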

The following code will run without having to change any code:

mtcars %>>%
  (~ cat("data:",ncol(.),"columns\n")) %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg,0.95)) %>>%
  ( lm(mpg ~ cyl + disp + wt + factor(vs), data = .) ) %>>%
  summary() %>>%
  (coefficients) %>>%
  (~ coe ~ cat("coefficients:",class(coe),"\n")) %>>%
  (~ coe ~ print(coe)) %>>%
  (coe ~ coe[-1,1]) %>>%
  barplot(main = "coefficients")

@yanlinlin82

I was wondering why this would be treated as a "side effect".

I prefer to look at it directly as the final return value of the whole pipe expression (A %>>% fun). Since the default return value of a pipe expression is that of the rhs, why not define another operator for this "return the lhs" requirement? I think that would leave the pipe itself clearer.

For example:

1:10 %>>% mean # return mean(1:10)
1:10 %<<% mean # calculate mean(1:10) but only return lhs, i.e. 1:10
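
For comparison, the proposed operator could be sketched in a few lines of base R. This is only an illustration of the proposal: %<<% is hypothetical, it was never part of pipeR, and this simplified version accepts only a function on the right-hand side. The helper log_mean is likewise made up for the example.

```r
# Hypothetical "return the lhs" operator: call rhs on lhs for its side
# effect, then hand back lhs unchanged.
`%<<%` <- function(lhs, rhs) {
  rhs(lhs)   # result discarded; rhs must be a function in this sketch
  lhs
}

# Illustrative side-effect function: report the mean, return nothing useful.
log_mean <- function(v) cat("mean:", mean(v), "\n")

res <- 1:10 %<<% log_mean   # prints "mean: 5.5"
identical(res, 1:10)
```

The operator is easy to write; the question in this thread is whether it is easy to read when it sits at the end of the previous line.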

@yanlinlin82

A more comprehensive example could be like this:

x <- 1:10 # First I have a data set
print(x %>>% plot) # Plot the data set, and the whole pipe expression returns NULL
x %<<% plot %>>% mean # What if I want to calculate mean() while plotting it

I think this should be clearer than:

x %>>% (~ plot) %>>% mean

In the latter case, I need to understand the pipe expression first and only then find out that it is a "side effect".

@renkun-ken
Owner Author

Thanks @yanlinlin82 for your opinion. You just pointed out the core problem in this issue: more operators or more syntax?

Let's look at the example with %<<% as the side-effect operator, and then with magrittr's %T>%.

mtcars %<<%
  (cat("data:",ncol(.),"columns\n")) %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg,0.95)) %>>%
  (lm(mpg ~ cyl + disp + wt + factor(vs), data = .)) %>>%
  summary() %>>%
  (coefficients) %<<%
  (coe ~ cat("coefficients:",class(coe),"\n") ) %<<%
  (coe ~ print(coe)) %>>%
  (coe ~ coe[-1,1]) %>>%
  barplot(main = "coefficients")

I feel I must scan the code very carefully to understand which lines are forward piping and which are only side effects. In this line-by-line layout, I can only be sure whether a line is a side effect by looking back to see which operator precedes it. Nor can I quickly find the input of a "normal" line without carefully looking back through the code.

I think the same problem exists with magrittr's %T>%:

mtcars %T>%
  (l(. ~ cat("data:",ncol(.),"columns\n"))) %>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg,0.95)) %>%
  lm(mpg ~ cyl + disp + wt + factor(vs), data = .) %>%
  summary() %$%
  coefficients %T>%
  (l(. ~ cat("coefficients:",class(.),"\n"))) %>%
  print %>%
  (l(coe ~ coe[-1,1])) %>%
  barplot(main = "coefficients")

Can you quickly tell which object is piped where, and quickly pick out the important "really-doing-stuff" lines? Frankly, I can't: the operator is too small to stand out, and in line-by-line piping it must be written at the end of the previous line, even though it determines how the next line is piped.

Look at the new syntax where one wants to do some logging between pipes:

library(pipeR)
mtcars %>>%
  (~ cat("data:",ncol(.),"columns\n")) %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg,0.95)) %>>%
  ( lm(mpg ~ cyl + disp + wt + factor(vs), data = .) ) %>>%
  summary() %>>%
  (coefficients) %>>%
  (~ coe ~ cat("coefficients:",class(coe),"\n")) %>>%
  (~ coe ~ print(coe)) %>>%
  (coe ~ coe[-1,1]) %>>%
  barplot(main = "coefficients")

With this syntax, a glance at the code is enough, provided I know that (~ expr) or (~ x ~ expr) indicates a side effect (the formula has only one side). I no longer have to care about the operator or look back, because there is only one.

That's why I feel that multiple operators are hard to tell apart at first glance, whereas syntax makes the code much easier to read. It is also why I make () special: it alerts the reader that something special happens, and it is visible inline rather than as an operator at the end of the previous line.

Typically, one uses this feature only occasionally rather than heavily, so the code would look like

Pipe(mtcars)$
  .(~ cat("data:",ncol(.),"columns\n"))$
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg,0.95))$
  .(lm(mpg ~ cyl + disp + wt + factor(vs), data = .))$
  summary()

Glance at the code and it is easy to find all lines that start with (~ (or .(~ in Pipe). To understand the code quickly, just ignore these lines and see what is being done and piped. If the code used more operators, I believe you could not scan it as quickly, because you would have to look carefully at the little symbol at the end of each line.

To find the input of a normal line, look only at the start of each line and go back until you reach a line that does not start with (~: the output of that line is the input you are looking for.

@timelyportfolio

I agree x %>>% ( ~ p ~ expr ) for side effect with p = x is clearer to me.

I also vote against %<<%.

@yanlinlin82

I finally see your point. You are using the "side effect" syntax so that branch steps can be ignored, which makes the main pipe stream easy to find. In that case, I admit it is better than introducing another operator.

renkun-ken added a commit that referenced this issue Aug 19, 2014
renkun-ken added a commit that referenced this issue Aug 19, 2014
renkun-ken added a commit that referenced this issue Aug 19, 2014
@yanlinlin82

By the way, it just occurred to me to ask whether a pipe always has a main stream, with or without other branches. That is, if a data set is to be processed by several procedures simultaneously and we want them all in one pipe, we need to arbitrarily make one procedure primary and the others branches. Is this right?

For example:

x <- c(... some data ...)
proc1: foo1A(x); foo1B(x); foo1C(x); ...
proc2: foo2A(x); foo2B(x); foo2C(x); ...
proc3: foo3A(x); foo3B(x); foo3C(x); ...

Then it could be written like this:

x %>>%
(~ foo1A %>>% foo1B %>>% foo1C %>>% ...) %>>%
(~ foo2A %>>% foo2B %>>% foo2C %>>% ...) %>>%
foo3A %>>% foo3B %>>% foo3C ...

@renkun-ken
Owner Author

@yanlinlin82 That's a very interesting insight! I have not yet thought much about "branching" in a pipeline, but it looks quite interesting. For example,

m <- data.frame(x=1:10)
par(mfrow=c(2,2))
m %>>%
  (~ . %>>% transform(y=x) %>>% plot(type="l")) %>>%
  (~ . %>>% transform(y=x^2) %>>% plot(type="l")) %>>%
  (~ . %>>% transform(y=sin(x/2)) %>>% plot(type="l")) %>>%
  (~ . %>>% transform(y=cos(x/2)) %>>% plot(type="l"))

which has four branches to manipulate one piece of data :)
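
The same branching shape can be imitated without pipeR, which may help show what the side-effect steps are doing. In this sketch the helper branches is hypothetical (not a pipeR function), and cat() stands in for plot() so the example runs without a graphics device.

```r
# Hypothetical helper: run each branch on the same input for its side
# effect, ignore every result, and return the input as the main stream.
branches <- function(x, ...) {
  for (f in list(...)) f(x)
  x
}

m <- data.frame(x = 1:10)

out <- branches(
  m,
  function(d) cat("branch 1, max of x:  ", max(d$x), "\n"),
  function(d) cat("branch 2, max of x^2:", max(d$x^2), "\n")
)

identical(out, m)   # the main stream still carries the original data
```

Each branch sees the untouched data frame, and whatever a branch computes never flows into the next one.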

@renkun-ken renkun-ken added this to the 0.4-2 milestone Sep 12, 2014