Deepest Regression (DR) estimator #13
Since the CATLINE estimator is for the linear model with a single predictor and is not generalized to the multiple-predictor case, it would be fantastic to have it implemented in the package. The reference for the estimator is
Any contributions are welcome.

R already has this functionality and can be used as a reference:
Hi @jbytecode, I would like to work on this topic.
Okay @fmyilmaz, very well, so what is your plan? Is it possible to write pure Julia code without any dependency on R or C implementations?
The R version of it contains a lot of graph functions, so we should plan how to implement them in pure Julia.
Deepest regression (DR) has multiple implemented methods. The catline is only for a single explanatory variable, and it is not that necessary. As I remember, there is an exact algorithm for 2 explanatory variables. There is also a stochastic one for all dimensions. The methods in the corresponding R package call C and Fortran code at the backend. Since DR algorithms are highly geometric, they are hard to implement. Let's start from the basics:

1. Implement a function that returns the regression depth of a given candidate fit.
2. Maximize that depth over the coefficients.
When the first goal is achieved, the second one is just an optimization problem and relatively easy. If we have a function that returns the depth of a candidate fit, then we can use a genetic algorithm or any other optimizer to search for \hat{\beta_i} for i = 1, 2, ..., p.
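To make those two goals concrete, here is a minimal pure-Julia sketch for the single-predictor case: a naive O(n²) regression depth (following Rousseeuw and Hubert's definition) and a crude random search over lines through pairs of points standing in for the genetic algorithm. The function names are hypothetical and not the package API.

```julia
# Naive O(n^2) regression depth of the line y = b0 + b1 * x
# (Rousseeuw & Hubert's definition, specialized to simple regression).
function regressiondepth(x, y, b0, b1)
    n = length(x)
    r = y .- b0 .- b1 .* x          # residuals of the candidate fit
    depth = n
    for v in x                      # candidate split points along x
        lplus  = count(i -> x[i] <= v && r[i] >= 0, 1:n)
        lminus = count(i -> x[i] <= v && r[i] <= 0, 1:n)
        rplus  = count(i -> x[i] >  v && r[i] >= 0, 1:n)
        rminus = count(i -> x[i] >  v && r[i] <= 0, 1:n)
        depth  = min(depth, lplus + rminus, lminus + rplus)
    end
    return depth
end

# Crude stand-in for the optimizer: try lines through random pairs of
# observations and keep the deepest one found.
function deepestline(x, y; trials = 1000)
    best, bestdepth = (0.0, 0.0), -1
    for _ in 1:trials
        i, j = rand(1:length(x)), rand(1:length(x))
        (i == j || x[i] == x[j]) && continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])   # line through points i and j
        b0 = y[i] - b1 * x[i]
        d = regressiondepth(x, y, b0, b1)
        d > bestdepth && ((best, bestdepth) = ((b0, b1), d))
    end
    return best, bestdepth
end
```

On exact data the true line attains the maximum depth n and a nonfit has depth 0; swapping the random pair search for a genetic algorithm over the coefficients gives exactly the scheme described above.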
Thanks! I will contact you to decide which optimizer we should use.
@tantei3, can I consult your opinion on a subject? Since the regression depth and the deepest regression estimators are difficult to implement and there is a vast amount of legacy code around, today I examined the Fortran code of the deepest regression estimator in the R package
in the macOS terminal. Supposing the library is compiled to `./a.out`, the Julia side is:

```julia
X = convert(Matrix, stackloss)
n, p = size(X)
n = Int32(n)
p = Int32(p)
A = zeros(Float64, p)
maxit = Int32(10000)
iter = Int32(1)
MDEPAPPR = Int32(3)

result = ccall((:sweepmedres_, "./a.out"),
               Cint,
               (Ref{Float64},  # X
                Ref{Int32},    # n
                Ref{Int32},    # np
                Ref{Float64},  # A
                Ref{Cint},     # maxit
                Ref{Cint},     # iter
                Ref{Cint}),    # MDEPAPPR
               X, n, p, A, maxit, iter, MDEPAPPR)

println(A)
```
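For reference, a shared library loadable as `./a.out` can be produced with gfortran, whose default output name is `a.out`; the Fortran source file name below is an assumption inferred from the `sweepmedres_` symbol:

```sh
# Hypothetical build step: -shared -fPIC produces a shared library,
# written to gfortran's default output name, a.out.
gfortran -shared -fPIC sweepmedres.f
```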
The data `stackloss` is the data set in our package. The output is

and the same output is obtained in R:

```
R> rdepthmedian(maxit = 10000, x = stackloss)
$deepest
   intercept slope var. 1 slope var. 2 slope var. 3
-35.37610619   0.82522124   0.44247788  -0.07964602
```

except that the intercept is the last term in the Julia output, whereas it is the first term in R. This is not a problem. Finally, it seems the Fortran code is directly usable; however, the problem is distribution: since the Julia package manager does not compile C or Fortran code, precompiled binaries must be supplied by the package maintainers. What do you think about this?
The package GLMNet is using Fortran code. Here is the link.
I don't have prior experience with this, but the https://julialang.github.io/Pkg.jl/v1/artifacts/ page mentions downloading packages and binaries. So maybe during a release the CI can also build the Fortran/C static/dynamic libraries for the supported platforms and host them on GitHub, and during package installation the corresponding binary can be downloaded and set up depending on the platform. But I think it will get complicated to build for all Julia-supported platforms, with different architectures like ARM and x86 along with different operating systems. So if the amount of code is not too large, it would be better to port it to Julia slowly, in my view.
Actually, for GLMNet it looks like you need to generate the
Yes, it is too complicated, and translating the code is the most secure method. The most important file is the one linked above, and it depends on many other Fortran files.
Hi @tantei3, I found a bucket of code in Julia's GitHub repository, in a file here: The code

```julia
import Base.llvmcall

function f(x, y)
    llvmcall("""%3 = add i64 %1, %0
                ret i64 %3""",
             Int64,
             Tuple{Int64,Int64},
             x,
             y)
end

result = f(5, 6)
println(result)
```

perfectly produces the number 11. Here

```llvm
%3 = add i64 %1, %0
ret i64 %3
```

is the LLVM IR code. What do you think about shipping IR code for the required Fortran functions, generated by an LLVM-based Fortran compiler such as Flang or others?
I think if it is done correctly it should be portable and robust, since Julia should take care of handling LLVM IR, being itself built on top of LLVM. Integrating Julia + LLVM IR might take some effort, though, since all the Fortran functions would need to be translated to LLVM IR and then wrapped in Julia functions. Doing it manually would be painful for Fortran code with many functions. If there is some tool to generate Julia wrapper functions from LLVM IR, that would be best.
@tantei3, how about shipping the Fortran code within the package and trying to compile it during installation, or before the first load? If the compiler is absent, when the user calls
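The "compile on first use" idea above could look roughly like the following sketch; every name here (`libdr`, `sweepmedres.f`, `deepestregression`) is hypothetical, and the error path covers the compiler-absent case:

```julia
using Libdl  # stdlib: gives the platform shared-library extension

# Hypothetical location of the compiled Fortran library.
const LIBDR = joinpath(@__DIR__, "libdr." * Libdl.dlext)

# Is a Fortran compiler available on the user's machine?
havecompiler() = Sys.which("gfortran") !== nothing

function deepestregression(args...)
    if !isfile(LIBDR)
        havecompiler() ||
            error("deepest regression requires gfortran or a prebuilt libdr")
        # Build on first use (source file name is an assumption):
        run(`gfortran -shared -fPIC -o $LIBDR sweepmedres.f`)
    end
    # ... then ccall((:sweepmedres_, LIBDR), ...) as in the earlier comment ...
end
```

The trade-off is exactly the one discussed below: this makes the package depend on a working compiler on the user's machine.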
Somebody suggested JuliaPackaging/BinaryBuilder on Twitter. The link is here.
Requiring a Fortran compiler on the user's machine doesn't seem nice for running Julia code. I think the BinaryBuilder method might be the easiest at the moment (although I still don't know the details). It should take care of making the binary available for all platforms, and we would just need to include the binary as a dependency.
The Fortran code in that R package is compiled for all targets: JuliaPackaging/Yggdrasil#7224 (comment). Now mrfDepth_jll is ready for a possible implementation of
It would be good to have the deepest regression estimator, referenced with

Van Aelst, Stefan, et al. "The deepest regression method." Journal of Multivariate Analysis 81.1 (2002): 138-166.

in the package. Any contributions are welcome.