Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deepest Regression (DR) estimator #13

Closed
jbytecode opened this issue Sep 25, 2020 · 19 comments
Closed

Deepest Regression (DR) estimator #13

jbytecode opened this issue Sep 25, 2020 · 19 comments
Labels
enhancement New feature or request

Comments

@jbytecode
Copy link
Owner

It is good to have deepest regression estimator referenced with

Van Aelst, Stefan, et al. "The deepest regression method." Journal of Multivariate Analysis 81.1 (2002): 138-166.

in the package. Any contributions are welcome.

@jbytecode jbytecode added the enhancement New feature or request label Sep 25, 2020
@jbytecode
Copy link
Owner Author

Since the CATLINE estimator is for the linear model with a single predictor, it is not generalized for the multiple case, but it would be fantastic to have it is implemented in the package. The reference of the estimator is

Hubert, Mia, and Peter J. Rousseeuw. "The catline for deep regression." Journal of Multivariate Analysis 66.2 (1998): 270-296.

Any contributions are welcome.

@jbytecode
Copy link
Owner Author

It is good to have deepest regression estimator referenced with

Van Aelst, Stefan, et al. "The deepest regression method." Journal of Multivariate Analysis 81.1 (2002): 138-166.

in the package. Any contributions are welcome.

Rousseeuw, Peter J., and Stefan Van Aelst. "An algorithm for deepest multiple regression." COMPSTAT. Physica, Heidelberg, 2000.

@jbytecode
Copy link
Owner Author

R already have this functionality and can be used as a reference:
https://cran.r-project.org/web/packages/DepthProc/DepthProc.pdf

@fmyilmaz
Copy link

I @jbytecode, I would like to work on this topic

@jbytecode
Copy link
Owner Author

Okay @fmyilmaz, very well, so what is your plan? Is it possible to write pure julia code without any dependency of R or C implementations?

@fmyilmaz
Copy link

The R version of it contains a lot of Graph functions. So we should plan how to implement them using pure julia.

@jbytecode
Copy link
Owner Author

jbytecode commented Oct 12, 2020

Deepest regression (DR) have multiple methods implemented. The catline is only for single exploratory variable and it is not that necessary. As I remember, there is an exact algorithm for 2 exploratory variables. There is also a stochastic one for all dimensions. The methods in the corresponding R package calls C and Fortran code at the backend. Since DR algorithms are highly geometric ones, it is hard to implement them.

Lets start from basic:

  1. Calculating regression depth for any regression hyperplanes.
  2. Search regression estimates that maximizes it.

When the first goal is achieved, the second one is just an optimization problem and relatively easy.

If we got a function like

rdept(setting::RegressionSetting) = ...

then we can use a genetic algorithm or any other optimizer to search \hat{\beta_i} for i = 1, 2, ..., p

@fmyilmaz

@fmyilmaz
Copy link

Thanks! ı will contact with you to select which opt should we use.

@jbytecode
Copy link
Owner Author

@tantei3 can I consult your opinions on a subject:

Since the regression depth and the deepest regression estimators are difficult to implement and there is a vast amount of legacy code around, today, I examined the Fortran code of the Deepest Regression estimator in the R package mrfDepth hosted in the read-only repository here. I compiled the Fortran codes shared in the src directory using

$ gfortran -shared -fPIC *.f

in the Mac Os terminal. Supposing the library is a.out, it can be callable in Julia using

X = convert(Matrix, stackloss) 
n, p = size(X)
n = Int32(n)
p = Int32(p)
A = zeros(Float64, p)
maxit = Int32(10000)
iter = Int32(1)
MDEPAPPR = Int32(3)
result = ccall((:sweepmedres_, "./a.out"), 
    Cint, 
    (Ref{Float64},      # X
        Ref{Int32},           # n
        Ref{Int32},           # np
        Ref{Float64},   # A
        Ref{Cint},           # maxit 
        Ref{Cint},           # iter
        Ref{Cint}            # MDEPAPPR
        ), X, n, p, A, maxit, iter, MDEPAPPR)

println(A)

the data stackloss is the data set in our package. The output is

[0.8252212389746946, 0.44247787604338223, -0.0796460177080886, -35.37610619462]

and the same output is obtained in the R as

R> rdepthmedian(maxit = 10000, x = stackloss)
$deepest
   intercept slope var. 1 slope var. 2 slope var. 3 
-35.37610619   0.82522124   0.44247788  -0.07964602 

except the intercept is the last term in Julia output whereas it is the first term in R. This is not a problem.

Finally, it seems the Fortran code is directly useful, however the problem is to

  • how to pack and integrate this stuff in the package LinRegOutliers
  • if compiled code is required, what is your opinion for this because it should be compiled for at least three systems Windows, Linux and MacOs

Since Julia package manager does not compile C or Fortran code, precompiled ones must be supported by the package maintainers.

What do you think about this?

@jbytecode
Copy link
Owner Author

the package GLMNet is using fortran code. Here is the link.
They put the fortran code in directory /deps and just ccall.

@tantei3
Copy link
Contributor

tantei3 commented Oct 16, 2020

I don't have prior experience with this, but to me it looks like https://julialang.github.io/Pkg.jl/v1/artifacts/ page mentions about downloading packages and binaries. So maybe in the CI during a release, it can also generate the Fortran/C static/dynamic libraries for the different supported Platforms and host it on Github, and during the package install, depending on the platform download the corresponding package and setup?

But I think it will get complicated building all Julia supported platforms with different architectures like Arm, x86, along with different OS. So if the amount of code is not high enough, would be better to port to Julia slowly in my view.

@tantei3
Copy link
Contributor

tantei3 commented Oct 16, 2020

Actually GLMNet it looks like you need to generate the glmnet_jll binary package using Yggdrasil BinaryBuilder.jl I think. Here is the corresponding link to the build recipe https://github.com/JuliaPackaging/Yggdrasil/blob/master/G/glmnet/build_tarballs.jl

@jbytecode
Copy link
Owner Author

yes, it is too much complicated and translating the code is the most secure method. The most important file is the

RegresDepthDeepest.f

and it depends many other fortran files.

@jbytecode
Copy link
Owner Author

jbytecode commented Oct 21, 2020

Hi @tantei3 ,

I found a bucket of code in Julia's GitHub repository, in file here:

The code

import Base.llvmcall

function f(x, y)
    llvmcall("""%3 = add i64 %1, %0
                    ret i64 %3""",
            Int64,
            Tuple{Int64,Int64},
            x,
            y)
end


result = f(5, 6)
println(result)

perfectly produces the number 11. Here

%3 = add i64 %1, %0
ret i64 %3

is the LLVM IR code. What do you think about shipping IR code for required Fortran functions using a LLVM Fortran compiler such as FLang or others?

@tantei3
Copy link
Contributor

tantei3 commented Oct 22, 2020

I think if it is done correctly then it should be portable and robust since Julia should take care of handling LLVM IR since it is also built on top of LLVM. Integrating Julia + LLVM IR might take some effort since all the fortran functions would need to be translated to LLVM IR and then wrapped in Julia function. Manually doing it would seem painful for Fortran code with many functions. If there is some tool to generate Julia wrapper functions from LLVM IR, it might be best.

@jbytecode
Copy link
Owner Author

@tantei3 how about shipping the fortran code within the package, trying to compile them in the installation process or before loading first time? If the compiler is absent, when the user calls dr() we can print a message "You have not Fortran compiler installed, If you want to make this function active, please install a fortran compiler and then run the installbinaries() function" ?

@jbytecode
Copy link
Owner Author

Somebody suggested JuliaPackaging/Binary Builder on Twitter. Link is here

@tantei3
Copy link
Contributor

tantei3 commented Oct 22, 2020

Requiring Fortran compiler in the use's machine doesn't seem nice to run Julia code. I think the BinaryBuilder method might be easiest at the moment (although still don't know the details). It should take care of making the binary available compatible with all platforms and just need to include the binary as a dependency.

@jbytecode
Copy link
Owner Author

The Fortran code in that R package is compiled for all targets: JuliaPackaging/Yggdrasil#7224 (comment)

Now mrfDepth_jll is ready for possible implementation of deepestregression().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants