require-r not working inside a function #66

Open
awb99 opened this issue Jun 1, 2020 · 31 comments
Labels
discussion discussion / ideas

Comments

@awb99

awb99 commented Jun 1, 2020

(defn init-r []
  (println "configuring clojisr ..")
  (require-r '[base :as base :refer [$ <- $<-]]
             '[utils :as u]
             '[stats :as stats]
             '[graphics :as g]
             '[datasets :refer :all])
  (base/options :width 120 :digits 7)
  (base/set-seed 11228899)
  (pdf-off))

(init-r)

When I have all the forms of this function directly in the namespace, everything works fine.
But if I wrap the initialization in a function to be more modular, the R libraries
loaded by require-r can no longer be found.

I am not sure whether Clojure's require works the same way.

@genmeblog
Member

require-r is a regular function and should work. What error do you get?

@awb99
Author

awb99 commented Jun 1, 2020

It runs without an error. But further down I then get errors that the namespaces are not defined.
The example is in: https://github.com/pink-gorilla/goldly/blob/master/profiles/demo/src/systems/r_telephone.clj

When I had it in a wrapper function, it would not work.

@genmeblog
Member

Maybe it is something with the intern function, which is used to create the namespaces and functions?

@awb99
Author

awb99 commented Jun 2, 2020

I have no idea... Sorry. I just know that the code only works when it is not in a function.
Also, I think that I had to define R variables via let. When I was using the defs macro on
a standalone basis, my functions were also not seeing the R variables. But it might
be that this was an error in the defs macro. I don't know yet.

(defmacro defs
    [& bindings]
    {:pre [(even? (count bindings))]}
    `(do
       ~@(for [[sym init] (partition 2 bindings)]
           `(def ~sym ~init))))

@genmeblog
Member

genmeblog commented Jun 2, 2020

I will test it soon. require-r should work from a function. The defs macro should also work. Can you give me an example of defs usage?

@genmeblog
Member

OK, the main issue with your function is that the base package is not available while the function is being compiled. It is only created by require-r. So this function cannot work and cannot even compile.

The following works on my setup:

(defn init-r []
  (println "configuring clojisr ..") 
  (require-r '[base :as base :refer [$ <- $<-]]
             '[utils :as u]
             '[stats :as stats]
             '[graphics :as g]
             '[datasets :refer :all]))

(init-r)

(base/options :width 120 :digits 7)
;; => $width
;;    [1] 120
;;    $digits
;;    [1] 7

(base/set-seed 11228899)
;; => NULL
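
The compile-time problem described above can be reproduced with plain Clojure, without clojisr. The following is a minimal sketch under that assumption: `late.ns`, `f`, `compiles?` and `call-late` are made-up names, and the `intern` call only stands in for what require-r does when it creates a namespace at run time.

```clojure
;; Returns true if the form can be compiled and evaluated, false otherwise.
(defn compiles? [form]
  (try (eval form) true
       (catch Throwable _ false)))

;; late.ns does not exist yet, so a function body that names late.ns/f
;; is rejected by the compiler before the function is ever called.
(def direct-ok? (compiles? '(defn call-direct [] (late.ns/f 1))))

;; Looking the var up when the function is called avoids the problem.
(defn call-late [x]
  (if-let [f (resolve 'late.ns/f)]
    (f x)
    (throw (ex-info "late.ns/f is not defined yet" {}))))

;; Stand-in for require-r: create the namespace and its function at run time.
(intern (create-ns 'late.ns) 'f inc)

(call-late 1)
```

Here `direct-ok?` is false, while `call-late` works once the namespace has been created. This is why calls such as base/options have to live outside the function that runs require-r, or be resolved late.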

@awb99
Author

awb99 commented Jun 4, 2020

I have an idea:
Example:

;; demo.app is only a placeholder namespace name
(ns demo.app (:require [r.base :as base]))
(base/options {})

What will happen here is that the Clojure analyzer will load the namespace and get the functions in it.
So there are really two ways:

  1. Similar to shadow-cljs, which creates ClojureScript namespaces from package.json, you add an r-clojisr.edn file that defines the R dependencies of the project. You could then make an r-module-generator that creates a file for each library, containing all the information that is available in a namespace. You can then use this generated data to create the namespace.

  2. You skip option 1 and hook into the function resolver, and at compile time you resolve everything.
    So (r.base/options {}) resolves, but (r.base/typo-not-existing {}) would resolve as well. At execution time you then link it to the session, and if you cannot establish a binding at that time, you throw an exception.

@awb99
Author

awb99 commented Jun 4, 2020

I am sorry for being a dick here. But the way the syntax works now makes it very difficult to write helper functions / libraries that use R functions.

@genmeblog genmeblog reopened this Jun 4, 2020
@genmeblog
Member

OK, I'll leave it open. Yes, it's hard to write external helper functions, since the symbols are not available before they are required in the live session.

Maybe something like cljsjs could be helpful here? But someone would need to maintain it. Also, I'm not sure about the different backends and the differences between packages (Renjin vs. R).

@awb99
Author

awb99 commented Jun 4, 2020

I tested around a little bit. So require is tightly coupled with jars. There is no way to use require
without generating a jar first. I think in the long run, this is the best option. You generate
a jar file, and in the jar file you create functions/variables corresponding to the module
you have in R.

For the short term, I think this might work:
https://github.com/pink-junkjard/integrator/blob/master/src/demo/app.clj
https://github.com/pink-junkjard/integrator/blob/master/src/integrator/core.clj

So you essentially do not use require at all, and define the ns completely dynamically. So
you would adapt integrator.core to read some edn file that you generate when you discover an R module.

And then you can write a require-r macro that just redefines the functions you want to have
extracted. You might also use potemkin for this: https://github.com/ztellman/potemkin/blob/master/src/potemkin/namespaces.clj

Then all you need is one dynamic variable that is linked to the session, and then
essentially this variable is used in the generated functions.
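
A rough sketch of this idea, using only plain Clojure. Everything here is an assumption made for illustration: the shape of the edn data, the `*session*` var, and the `send-to-r` stand-in (which only records what would be sent instead of talking to R). It is not how clojisr or integrator is implemented.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical session handle; a real one would wrap an R connection.
(def ^:dynamic *session* nil)

;; Hypothetical stand-in for the R call: it only records what would be sent.
(defn send-to-r [session r-fn-name args]
  {:session session :call r-fn-name :args (vec args)})

;; Assumed shape of the generated discovery data: namespace -> function names.
(def modules
  (edn/read-string "{r.base [sum mean] r.stats [median]}"))

;; Creates the namespace and interns one delegating function per R function.
(defn define-r-namespace! [ns-sym fn-syms]
  (let [r-pkg (subs (name ns-sym) 2)] ; drop the leading "r."
    (create-ns ns-sym)
    (doseq [f fn-syms]
      (intern ns-sym f
              (fn [& args]
                (send-to-r *session* (str r-pkg "::" f) args))))))

(doseq [[ns-sym fn-syms] modules]
  (define-r-namespace! ns-sym fn-syms))

;; The generated function picks up whatever session is bound at call time.
(binding [*session* :demo-session]
  ((resolve 'r.base/sum) 1 2))
;; => {:session :demo-session, :call "base::sum", :args [1 2]}
```

Code compiled after this step can refer to r.base/sum like any other var; the remaining question is when the data is generated (build time or app start).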

@awb99
Author

awb99 commented Jun 4, 2020

lein run

starting..
initializating R..
calculating sin of  3.14
done!

@awb99
Author

awb99 commented Jun 4, 2020

defs usage.

(defs a 1
      b 2)

this is identical to

(def a 1)
(def b 2)
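
The equivalence can be checked with macroexpand-1. This sketch repeats the defs macro from earlier in the thread so that it is self-contained; nothing else is assumed.

```clojure
;; defs as posted earlier in this thread.
(defmacro defs
  [& bindings]
  {:pre [(even? (count bindings))]}
  `(do
     ~@(for [[sym init] (partition 2 bindings)]
         `(def ~sym ~init))))

;; The expansion is a plain do with one def per pair.
(macroexpand-1 '(defs a 1 b 2))
;; => (do (def a 1) (def b 2))

(defs a 1
      b 2)

(+ a b)
;; => 3
```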

@awb99
Author

awb99 commented Jun 4, 2020

I would NOT do something like cljsjs. cljsjs is a thing of the past. shadow-cljs solved the problem of npm dependencies and externs.

r-deps.edn

{:engine :rserv
 :deps [base math dplyr]} 

Then you call

clojisr .

clojisr is an executable (or a lein plugin, it does not matter).
clojisr then generates the file target/r-modules.edn.
It does so by starting Rserve and loading/exploring all modules that were specified.
Then it exits.
So this is a compile step similar to "cljsbuild once" or to a CSS preprocessor.
target/r-modules.edn then contains all the data that was discovered.

Then you adapt integrator.core to read target/r-modules.edn; this means
you have completely normal Clojure namespaces for all the R functions that you want.

You just need to

@awb99
Author

awb99 commented Jun 4, 2020

You can also do the interpreter approach that you use now, so you have the startup cost at
each startup. But you would have the huge advantage that R functions are now completely normal
Clojure functions.

docstrings -> done!

@genmeblog
Member

genmeblog commented Jun 4, 2020

Thanks for the idea. I believe that once we decouple the session from robjects, all of this will be possible. What I see is that packages can differ between R versions (R 3.x, R 4.x), so I don't think it is an option for us to provide dummy namespaces (or edn) in the library.

I need to rethink it, but generally the options we discuss are:

  • (current) load package symbols from live R session
  • (proposed) load package symbols from edn file (which has to be generated by live session).

The only difference I see is "live R session", right?

@awb99
Author

awb99 commented Jun 5, 2020

You are absolutely right! It definitely does not make sense for the library to ship a fixed module-definition edn. Instead, clojisr generates it on the user's machine, with the R setup supplied by the user.

However, this can be done at BUILD TIME (say, via a script that the user can run, ...),
or it can be run AT EACH APP START. shadow-cljs does this at build time.
Currently you do it at app start.

Irrespective of BUILD TIME vs APP START TIME, the idea with in-ns etc. makes sense.
The more clojisr functions behave like normal Clojure functions, the less of an integration
problem ...

@awb99
Author

awb99 commented Jun 5, 2020

With my proposal, the syntax would change from:

(defn load-quakes-r [rmin rmax]
  (-> 'quakes
      (r.dplyr/filter `(& (>= mag ~rmin)
                          (<= mag ~rmax)))))

to this:

(defn load-quakes-r [rmin rmax]
  (-> quakes
      (r.dplyr/filter (& (>= mag rmin)
                          (<= mag rmax)))))

quakes is a Clojure symbol created with def, and it refers to an RObject.
r.dplyr/filter is a Clojure function (that you added via create-ns).

(ns clojisr.executor)
(def ^:dynamic *session* nil)

;; robject?, c->robject and send-to-r are assumed helpers, not defined here
(defn r-fun-exec [r-fun-name & args]
  (let [args-robject (map #(if (robject? %) % (c->robject %)) args)]
    (send-to-r :exec r-fun-name args-robject)))

; this ns is auto generated (user would still not see the code as all is done in code)
(ns r.math)

(defn sum [a b]
  (clojisr.executor/r-fun-exec 'base/sum a b))

@genmeblog
Member

It's not as easy as it looks. There is stuff that often operates on the symbolic level (like formulas). Also, if you want to create an R function from Clojure, you have to do it on the symbolic level. Also, there are functions whose names contain forbidden symbols, and the data types are different (named lists are not maps), etc.

So removing symbolic calls removes part of the possible functionality.
Don't forget that the actual R call is done by passing a properly formatted string.

I agree that dummy handlers for R functions or values can be generated to enable Clojure compilation without a connection to R, but removing symbolic calls is not possible.

From your example

(defn load-quakes-r [rmin rmax]
  (-> 'quakes
      (r.dplyr/filter `(& (>= mag ~rmin)
                          (<= mag ~rmax)))))

The above code will be converted to a string and evaluated entirely on the R side (dplyr/filter expects a symbolic predicate, and this predicate will be evaluated within the quakes context).

But this:

(defn load-quakes-r [rmin rmax]
  (-> quakes
      (r.dplyr/filter (& (>= mag rmin)
                          (<= mag rmax)))))

will not work. First, mag is unknown. Also, Clojure will first try to evaluate &, which will fail.

This is how R works. Plenty of functions treat parameters as symbols and delay execution until needed.
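
To illustrate why the quoted form is needed, here is a toy translator from a quoted Clojure form to an R code string. It is only a sketch and not how clojisr generates code: `form->r`, `infix-ops` and `quakes-filter-code` are hypothetical names, and only a few infix operators are handled.

```clojure
(require '[clojure.string :as str])

;; Operators that are written infix on the R side (a deliberately tiny set).
(def infix-ops #{"&" ">=" "<="})

;; Translates a quoted form into R source text. Namespaces added by
;; syntax-quote are dropped by taking only the name of each symbol.
(defn form->r [form]
  (cond
    (symbol? form) (name form)
    (seq? form)
    (let [op   (form->r (first form))
          args (map form->r (rest form))]
      (if (infix-ops op)
        (str "(" (str/join (str " " op " ") args) ")")
        (str op "(" (str/join ", " args) ")")))
    :else (str form)))

;; Only rmin and rmax are evaluated by Clojure; mag and & stay symbolic.
(defn quakes-filter-code [rmin rmax]
  (form->r `(filter quakes (& (>= mag ~rmin) (<= mag ~rmax)))))

(quakes-filter-code 4 5)
;; => "filter(quakes, ((mag >= 4) & (mag <= 5)))"
```

Without the backtick, Clojure would try to evaluate mag and & itself before anything reaches R, which is the failure described above.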

@awb99
Author

awb99 commented Jun 5, 2020

The forbidden symbol problem: totally agreed! There needs to be some kind of escaping. If I remember correctly, this happens in cljs -> js compilation as well: (defn init! [] ...) becomes init_bang (or similar).

@genmeblog
Member

We escape using tick or backtick.

@awb99
Author

awb99 commented Jun 5, 2020

(send-to-r :exec r-fun-name args-robject)
For Rserve, the send-to-r function in my example will generate the following, which is sent to R:

"
P123 <- function (min max)
    function [mag]
         (Mag >= min) and (mag <= max)
dplyr/filter (quakes P123)
"

The example is a little difficult because it uses an anonymous function as the predicate. I have not yet executed the code I copied; @daslu translated it for me. This is what I actually use in Clojure only:

(defn p-mag [rmin rmax]
  (fn [{:keys [mag]}]
    (and (>= mag rmin) (<= mag rmax))))
(filter (p-mag rmin rmax) quakes-clj)

So the difficult part in my example is that it needs to generate and define an anonymous function (the predicate) in R. And the definition of that function will have to be implemented as a macro in Clojure, because the forms inside a function must not each generate their own wrapped R function; instead, they have to become function calls inside the body of the R function definition.

@awb99
Author

awb99 commented Jun 5, 2020

Sorry for my most likely imprecise R code. I am not an R expert, so my R code is probably not correct. But I hope you get the idea.

@genmeblog
Member

I think I'm missing something. How is the last example going to produce the string given above?

@genmeblog
Member

genmeblog commented Jun 5, 2020

dplyr/filter doesn't take an anonymous function; it takes a symbolic predicate: https://dplyr.tidyverse.org/reference/filter.html

Did you test your code, and is it working?

@awb99
Author

awb99 commented Jun 5, 2020

I didn't test. I think you are right, and it will not work. As I said above.. I only tested the clj code I pasted above.

@genmeblog
Member

We had a discussion with @daslu about whether to use symbols (tick/backtick) or macros to mimic functional behaviour. I agreed that having clojisr parse symbolic forms, instead of providing a set of macros, is more convenient, since it happens at runtime. So we want to stay with this method (unless we find something more useful). Let me repeat: R treats code in arguments as symbols too often for functions to cover all the cases.

@awb99
Author

awb99 commented Jun 5, 2020

In goldly I am using sci to compile short cljs functions to js. I think clojisr might want to use a similar compiler architecture that defines a few control functions (like if) in order to generate the string command to R.

@awb99
Author

awb99 commented Jun 5, 2020

To not lose track of the main point here: the main use case of clojisr will be to call R functions. A simple function call, say (+ 1 2), would essentially need to check each parameter and either convert it to R or use the R variable name if it is already known. Once the R function has been mapped to a Clojure function, the rest becomes easy.

@genmeblog
Member

genmeblog commented Jun 5, 2020

But it's done: (r+ 1 2) does it exactly as you described.

I don't agree that the main use case is only calling functions; it is also accessing vars of different types (like S4 objects, data.frames, matrices) and creating R functions and R datatypes.

When you import a package, the respective Clojure symbols are created. When called in function position, they act as functions. If you provide primitive parameters, they are converted accordingly. (r.base/mean [1 2 3 4]) produces what is expected: mean(c(1,2,3,4)).
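
As a rough illustration of that conversion step, the sketch below turns a few Clojure values into R literals and builds a call string. It is an assumption-laden toy, not clojisr's actual conversion code: `->r-literal` and `r-call` are hypothetical names, and only vectors, strings, booleans, nil and numbers are covered.

```clojure
(require '[clojure.string :as str])

;; Converts a small subset of Clojure values to R source text.
(defn ->r-literal [x]
  (cond
    (nil? x)        "NULL"
    (true? x)       "TRUE"
    (false? x)      "FALSE"
    (string? x)     (pr-str x)
    (sequential? x) (str "c(" (str/join "," (map ->r-literal x)) ")")
    :else           (str x)))

;; Builds the text of an R function call from a name and Clojure arguments.
(defn r-call [fn-name & args]
  (str fn-name "(" (str/join "," (map ->r-literal args)) ")"))

(r-call "mean" [1 2 3 4])
;; => "mean(c(1,2,3,4))"
```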

I have a feeling that we are circling around a problem that I probably don't understand.

We agreed that r.base/mean should be created upfront to allow compiling code without the R backend. It's doable, and we plan to implement it after the session management refactor.

@genmeblog genmeblog added the discussion discussion / ideas label Apr 14, 2024
@behrica
Member

behrica commented May 8, 2024

My 2 cents on this:
It seems to me that the idea of "reusable functions" which "internally" use clojurised R functions
is hard to get done. Such code is "bound" to a concrete R version and to the libraries present, and we cannot express this in terms of deps.edn.
So it can never be fully done.


3 participants