Deepest Regression (DR) estimator #13

jbytecode · 2020-09-25T19:40:45Z

It would be good to have the deepest regression estimator, described in

Van Aelst, Stefan, et al. "The deepest regression method." Journal of Multivariate Analysis 81.1 (2002): 138-166.

in the package. Any contributions are welcome.


jbytecode · 2020-09-25T19:44:02Z

The CATLINE estimator applies only to the linear model with a single predictor and has not been generalized to the multiple-predictor case, but it would still be great to have it implemented in the package. The reference for the estimator is

Hubert, Mia, and Peter J. Rousseeuw. "The catline for deep regression." Journal of Multivariate Analysis 66.2 (1998): 270-296.

Any contributions are welcome.

jbytecode · 2020-09-25T19:45:40Z

Rousseeuw, Peter J., and Stefan Van Aelst. "An algorithm for deepest multiple regression." COMPSTAT. Physica, Heidelberg, 2000.

jbytecode · 2020-09-25T20:02:09Z

R already has this functionality, which can be used as a reference:
https://cran.r-project.org/web/packages/DepthProc/DepthProc.pdf

fmyilmaz · 2020-10-12T16:14:25Z

Hi @jbytecode, I would like to work on this topic.

jbytecode · 2020-10-12T16:17:38Z

Okay @fmyilmaz, very well. What is your plan? Is it possible to write pure Julia code without any dependency on R or C implementations?

fmyilmaz · 2020-10-12T16:23:24Z

The R version relies on a lot of graph functions, so we should plan how to implement them in pure Julia.

jbytecode · 2020-10-12T16:30:09Z

Deepest regression (DR) has multiple methods implemented. The catline handles only a single explanatory variable and is not that necessary. As I remember, there is an exact algorithm for 2 explanatory variables, and there is also a stochastic one for all dimensions. The methods in the corresponding R package call C and Fortran code in the backend. Since DR algorithms are highly geometric, they are hard to implement.

Let's start with the basics:

1. Calculate the regression depth of any regression hyperplane.
2. Search for the regression estimates that maximize it.

Once the first goal is achieved, the second one is just an optimization problem and relatively easy.

If we had a function like

rdept(setting::RegressionSetting) = ...

then we could use a genetic algorithm or any other optimizer to search for \hat{\beta}_i, i = 1, 2, ..., p.
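
As a rough illustration of the first step, here is a minimal sketch for the single-predictor case only. It assumes the usual counting definition of regression depth for a line: for every split point v taken from the x values, count the observations on the left with nonnegative residuals plus those on the right with nonpositive residuals, do the same with the signs swapped, and take the minimum over all splits. The names rdepth_simple and deepest_line_bruteforce, the tol keyword, and the crude search over lines through pairs of observations are assumptions made for this example. This is not the algorithm of the papers cited above and not the package API.

```julia
# Naive O(n^2) regression depth of the line y = intercept + slope * x.
# `tol` treats residuals within +/- tol as zero (an assumption to absorb
# floating-point noise).
function rdepth_simple(x::AbstractVector, y::AbstractVector,
                       intercept::Real, slope::Real; tol::Real = 1e-9)
    r = y .- (intercept .+ slope .* x)
    nonneg = r .>= -tol
    nonpos = r .<= tol
    depth = length(x)
    for v in x
        left = x .<= v
        right = .!left
        a = count(left .& nonneg) + count(right .& nonpos)
        b = count(right .& nonneg) + count(left .& nonpos)
        depth = min(depth, a, b)
    end
    return depth
end

# Crude candidate search (not guaranteed to be exact): evaluate the depth
# of every line passing through two observations and keep the deepest one.
function deepest_line_bruteforce(x::AbstractVector, y::AbstractVector)
    n = length(x)
    best = (depth = -1, intercept = NaN, slope = NaN)
    for i in 1:(n - 1), j in (i + 1):n
        x[i] == x[j] && continue          # vertical line, skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        d = rdepth_simple(x, y, intercept, slope)
        if d > best.depth
            best = (depth = d, intercept = intercept, slope = slope)
        end
    end
    return best
end

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = 1.0 .+ 2.0 .* x
println(rdepth_simple(x, y, 1.0, 2.0))
println(deepest_line_bruteforce(x, y))
```

With more than one predictor, the same idea needs a search over directions, which is where the geometric difficulty mentioned above comes in.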

@fmyilmaz

fmyilmaz · 2020-10-12T16:35:34Z

Thanks! I will contact you to decide which optimizer we should use.

jbytecode · 2020-10-16T10:15:06Z

@tantei3, may I ask your opinion on something?

Since the regression depth and the deepest regression estimators are difficult to implement and there is a vast amount of legacy code around, I examined today the Fortran code of the Deepest Regression estimator in the R package mrfDepth, hosted in the read-only repository here. I compiled the Fortran code in the src directory using

$ gfortran -shared -fPIC *.f

in the macOS terminal. Assuming the resulting library is a.out, it can be called from Julia using

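# Assumes `stackloss` is the DataFrame shipped with LinRegOutliers.
X = Matrix{Float64}(stackloss)   # the Fortran routine works on Float64 data
n, p = size(X)
n = Int32(n)
p = Int32(p)
A = zeros(Float64, p)            # receives the estimated coefficients
maxit = Int32(10000)
iter = Int32(1)
MDEPAPPR = Int32(3)
# Fortran passes every argument by reference, hence the Ref types.
result = ccall((:sweepmedres_, "./a.out"),
    Cint,
    (Ref{Float64},      # X
     Ref{Int32},        # n
     Ref{Int32},        # np
     Ref{Float64},      # A
     Ref{Int32},        # maxit
     Ref{Int32},        # iter
     Ref{Int32}),       # MDEPAPPR
    X, n, p, A, maxit, iter, MDEPAPPR)

println(A)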

Here stackloss is the data set shipped with our package. The output is

[0.8252212389746946, 0.44247787604338223, -0.0796460177080886, -35.37610619462]

and the same result is obtained in R with

R> rdepthmedian(maxit = 10000, x = stackloss)
$deepest
   intercept slope var. 1 slope var. 2 slope var. 3 
-35.37610619   0.82522124   0.44247788  -0.07964602

except that the intercept is the last term in the Julia output, whereas it is the first term in R. This is not a problem.
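
If the R ordering is preferred, the coefficient vector can simply be rotated by one position. A minimal sketch using the values printed above (circshift is part of Julia's Base; the variable name A_r_order is only illustrative):

```julia
# Coefficients as returned by the Fortran routine: slopes first, intercept last.
A = [0.8252212389746946, 0.44247787604338223, -0.0796460177080886, -35.37610619462]

# Rotate by one position so that the intercept comes first, as in the R output.
A_r_order = circshift(A, 1)

println(A_r_order)
```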

Finally, it seems the Fortran code is directly usable. However, two questions remain:

- How should we package and integrate this code in LinRegOutliers?
- If compiled code is required, what is your opinion, given that it has to be compiled for at least three systems: Windows, Linux, and macOS?

Since the Julia package manager does not compile C or Fortran code, precompiled binaries must be provided by the package maintainers.

What do you think about this?

jbytecode · 2020-10-16T10:56:32Z

The GLMNet package uses Fortran code. Here is the link.
They put the Fortran code in the /deps directory and just ccall it.

tantei3 · 2020-10-16T10:57:21Z

I don't have prior experience with this, but it looks like the https://julialang.github.io/Pkg.jl/v1/artifacts/ page covers downloading packages and binaries. So maybe the CI could, during a release, also build the Fortran/C static or dynamic libraries for the supported platforms and host them on GitHub, and the package install could then download and set up the one matching the user's platform?

But I think building for all Julia-supported platforms will get complicated, with different architectures such as ARM and x86 along with different operating systems. So if the amount of code is not too large, it would be better in my view to port it to Julia gradually.

tantei3 · 2020-10-16T11:00:12Z

Actually, for GLMNet it looks like you need to generate the glmnet_jll binary package using Yggdrasil and BinaryBuilder.jl, I think. Here is the link to the corresponding build recipe: https://github.com/JuliaPackaging/Yggdrasil/blob/master/G/glmnet/build_tarballs.jl

jbytecode · 2020-10-16T11:03:20Z

Yes, it is too complicated, and translating the code is the safest method. The most important file is

RegresDepthDeepest.f

and it depends on many other Fortran files.

jbytecode · 2020-10-21T17:31:59Z

Hi @tantei3 ,

I found a piece of code in Julia's GitHub repository, in the file here:

The code

import Base.llvmcall

function f(x, y)
    llvmcall("""%3 = add i64 %1, %0
                    ret i64 %3""",
            Int64,
            Tuple{Int64,Int64},
            x,
            y)
end


result = f(5, 6)
println(result)

correctly produces the number 11. Here

%3 = add i64 %1, %0
ret i64 %3

is the LLVM IR code. What do you think about shipping IR code for the required Fortran functions, generated with an LLVM Fortran compiler such as Flang or others?

tantei3 · 2020-10-22T04:13:59Z

I think that, if it is done correctly, it should be portable and robust, since Julia is itself built on top of LLVM and should take care of handling the LLVM IR. Integrating Julia and LLVM IR might take some effort, since all the Fortran functions would need to be translated to LLVM IR and then wrapped in Julia functions. Doing that manually seems painful for Fortran code with many functions. If there is a tool that generates Julia wrapper functions from LLVM IR, that might be best.
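
For what it is worth, wrappers around small IR snippets can be generated with ordinary Julia metaprogramming. The sketch below only illustrates that idea. The names ir_add and ir_mul and the IR strings are made up for the example, it reuses the value numbering (%0, %1, %3) of the snippet above, and it says nothing about how real Flang output, with its own runtime calls and calling conventions, could be wrapped.

```julia
# Hypothetical table mapping a wrapper name to an LLVM integer instruction.
ir_ops = (:ir_add => "add", :ir_mul => "mul")

for (name, op) in ir_ops
    # Build the IR body as a string, then splice it into a function definition.
    ir = "%3 = $op i64 %1, %0\nret i64 %3"
    @eval function $name(x::Int64, y::Int64)
        Base.llvmcall($ir, Int64, Tuple{Int64,Int64}, x, y)
    end
end

println(ir_add(5, 6))
println(ir_mul(5, 6))
```

The loop has to run at top level, because the IR string must already be a constant when each wrapper is defined.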

jbytecode · 2020-10-22T10:29:07Z

@tantei3, how about shipping the Fortran code within the package and trying to compile it during installation or before the first load? If the compiler is absent, then when the user calls dr() we can print a message such as "No Fortran compiler is installed. To enable this function, please install a Fortran compiler and then run the installbinaries() function".
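
A minimal sketch of how that check could look, assuming the compiler is found on PATH. The helper find_fortran_compiler, the list of candidate compiler names, the output name libdr, and the body of installbinaries are all hypothetical. Only Sys.which, readdir, run, and Libdl.dlext are existing Julia functionality.

```julia
using Libdl  # for Libdl.dlext, the platform's shared-library extension

# Return the path of the first Fortran compiler found on PATH, or `nothing`.
function find_fortran_compiler(candidates = ("gfortran", "flang", "ifort"))
    for name in candidates
        path = Sys.which(name)
        path === nothing || return path
    end
    return nothing
end

# Hypothetical build step: compile every .f file in `srcdir` into one
# shared library, mirroring the gfortran command used earlier in the thread.
function installbinaries(srcdir::AbstractString;
                         output = joinpath(srcdir, "libdr." * Libdl.dlext))
    compiler = find_fortran_compiler()
    if compiler === nothing
        @warn "No Fortran compiler is installed. To enable dr(), install one and run installbinaries() again."
        return nothing
    end
    sources = filter(f -> endswith(f, ".f"), readdir(srcdir; join = true))
    run(`$compiler -shared -fPIC -o $output $sources`)
    return output
end
```

dr() would then only need to check whether the library file exists before attempting the ccall.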

jbytecode · 2020-10-22T10:50:59Z

Somebody suggested JuliaPackaging/BinaryBuilder on Twitter. The link is here.

tantei3 · 2020-10-22T12:41:51Z

Requiring a Fortran compiler on the user's machine just to run Julia code doesn't seem nice. I think the BinaryBuilder method might be the easiest at the moment (although I still don't know the details). It should take care of making the binary available for all platforms, and we would just need to include the binary as a dependency.

jbytecode · 2023-08-22T19:12:39Z

The Fortran code in that R package is compiled for all targets: JuliaPackaging/Yggdrasil#7224 (comment)

Now mrfDepth_jll is ready for a possible implementation of deepestregression().

jbytecode added the enhancement New feature or request label Sep 25, 2020

jbytecode assigned fmyilmaz Oct 12, 2020

jbytecode unassigned fmyilmaz Oct 13, 2020

jbytecode added a commit that referenced this issue Aug 23, 2023

add deepest regression estimator (#13)

313a007

jbytecode closed this as completed Aug 23, 2023
