[R-package] Add support for R 4.0 (fixes #3064, fixes #3024) #3065

Merged
merged 44 commits into from
Jun 14, 2020

Conversation

jameslamb
Collaborator

@jameslamb jameslamb commented May 10, 2020

Experimental fix for #3064. Opening this PR to show the way I'm thinking about adding support for R 4.0. I will remove the [WIP] and request reviewers when it's ready for review, but any comments before then are welcomed if you have time!

R4.0 breaking changes that affect LightGBM

  • gendef.exe removed from Rtools 😭
  • mingw32-make.exe is no longer bundled with Rtools 😭
  • Rtools paths removed underscores, so mingw_64/ becomes mingw64/ 😂

Short description of the fix

gendef.exe fix

Building the LightGBM R package on Windows with Visual Studio compilers requires gendef.exe. That was bundled in previous versions of Rtools but isn't part of Rtools 4.0.

This PR replaces it with an R script that uses objdump.exe, software bundled in old and new distributions of Rtools.

mingw32-make.exe fix

This PR uses Rtools/usr/bin/make.exe instead of Rtools/mingw_64/bin/mingw32-make.exe when the R version is 4.0 or greater.

mingw path change fix

This PR also updates docs and CI scripts for the mingw_64/ --> mingw64/ change.

How you can test this

In a Windows environment with R 4.0, Rtools 4.0, CMake, and Visual Studio, run the following:

git clone https://github.com/jameslamb/LightGBM.git
cd LightGBM
git fetch origin fix/r-4.0
git checkout fix/r-4.0

# install LightGBM
Rscript build_r.R

# test that it worked
cd R-package/tests
Rscript testthat.R

Long Description

Here's how building the R package with Visual Studio works today:

  1. The LightGBM R package has to be linked to R.dll to use R-provided C/C++ functions such as Rprintf for printing.
  2. Linking with Visual Studio compilers requires a .lib file (https://docs.microsoft.com/en-us/cpp/build/reference/link-input-files?redirectedfrom=MSDN&view=vs-2019)
  3. To create a .lib file from R.dll on Windows, we create a definition file (.def) and then use that to make a library file (.lib).
    • R.dll ships with R
    • A file R.def is created from R.dll, using gendef.exe
    • A file R.lib is created from R.def using dlltool.exe

The issue: gendef.exe is not part of Rtools 4.0.

My solution in this PR is to use objdump.exe + an R script to generate R.def. objdump creates output intended for humans, not machines: it uses indentation and special characters to make the file visually appealing. So the R script runs objdump and, more importantly, reads in the text it produces and does some simple cleaning to get it into a machine-friendly format that can be rewritten as a .def file.

From the "How to create a def file from a dll" section of this amazing MinGW documentation:

If the previous options don't work, you can still try to create a def file using the output from the objdump program (from the mingw distribution).
Here's an example.
objdump -p file.dll > dll.fil

Search for [Ordinal/Name Pointer] Table in dll.fil and use the list of functions following it to create your def file.
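
The approach described above (find the [Ordinal/Name Pointer] table in the objdump text, keep the names, write a .def file) can be sketched roughly as follows. This is a Python illustration of the idea only, not the R script added in this PR. The sample text is an assumed shape of the `objdump -p` export table, and the helper names `export_names` and `def_file_text` are hypothetical.

```python
import re

# Assumed shape of the relevant part of `objdump -p R.dll` output.
# Real output contains many other sections before and after this table.
SAMPLE_OBJDUMP_OUTPUT = """\
[Ordinal/Name Pointer] Table
\t[   0] R_CheckUserInterrupt
\t[   1] Rprintf
\t[   2] Rf_error

(other sections follow)
"""

# One table entry looks like: <whitespace>[<ordinal>] <symbol name>
ENTRY = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)\s*$")


def export_names(objdump_text):
    """Collect symbol names listed under the [Ordinal/Name Pointer] table."""
    names = []
    in_table = False
    for line in objdump_text.splitlines():
        if "[Ordinal/Name Pointer] Table" in line:
            in_table = True
            continue
        if not in_table:
            continue
        match = ENTRY.match(line)
        if match:
            names.append(match.group(1))
        elif names:
            # First non-entry line after at least one entry: table is over.
            break
    return names


def def_file_text(names, dll_name="R.dll"):
    """Render a module-definition (.def) file listing the exported names."""
    lines = ['LIBRARY "{}"'.format(dll_name), "EXPORTS"]
    lines += ["    " + name for name in names]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(def_file_text(export_names(SAMPLE_OBJDUMP_OUTPUT)))
```

The resulting text plays the role of R.def in step 3 above, which dlltool.exe then turns into R.lib.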

References

A lot of the content for this PR was totally new to me. Here are some links I found very helpful.

@StrikerRUS
Collaborator

@jameslamb Great efforts! Thanks a lot!

  • gendef.exe removed from Rtools 😭
  • mingw32-make.exe is no longer bundled with Rtools

Is there any available discussion, or at least a reason, for why they did it? Or was it done without public discussion? Maybe you know.

@jameslamb
Collaborator Author

@jameslamb Great efforts! Thanks a lot!

  • gendef.exe removed from Rtools 😭
  • mingw32-make.exe is no longer bundled with Rtools

Is there any available discussion, or at least a reason, for why they did it? Or was it done without public discussion? Maybe you know.

I don't know of any, but the fact that I don't know doesn't mean it was done without public discussion. I haven't had much luck using search engines and these changes are not mentioned at the official Rtools40 page, so I just posted a message to the r-package-devel mailing list. Will let you know what they say!

message sent to `r-package-devel`

Hello,

I am a maintainer on the LightGBM project, focused on that project's R package. The R package is not available on CRAN yet (we are working on it), so for now our users must build it from source.

The package includes compilation of a C++ library, and we link to R.dll / R.so to use R-provided functions like Rprintf.

With the release of R4.0 and Rtools40, we recently received reports from our users that they are unable to build our package on Windows systems with R 4.0 and Rtools 4.0 (#3064).

After some investigation, I've learned that the following changes in Rtools40 (relative to Rtools35) broke our installation process:
  • gendef.exe was removed
  • mingw32-make.exe was removed
  • paths like "mingw_64/bin" were changed to "mingw64/bin"

I do not expect that our off-CRAN installation process is supported. I understand that by maintaining our own process, we are taking on the burden of keeping up with new releases of R and Rtools.

My question is this...is there public documentation about why the changes I mentioned above were made? I have not found any mention of them in the following places, and am not sure where else to look.
  • https://cran.r-project.org/bin/windows/Rtools/
  • https://github.com/r-windows/docs/blob/master/faq.md#readme
  • https://jeroen.github.io/rstudio2019

Thank you very much for your time and consideration,

-James Lamb

I am going to keep working on this, so we can unblock users who've started upgrading to R 4.0.

@jameslamb jameslamb mentioned this pull request May 12, 2020
@jameslamb
Collaborator Author

Got a response to the email I sent to the r-package-devel mailing list! The responder, Jeroen Ooms, posted #3064 (comment) as well.

The full response is included below. My takeaways:

  1. I think we should move forward with this PR (see my comment in [R-package] Installation with R4.0.0 on Windows is broken #3064 (comment))
  2. I think we should follow Jeroen's advice and try to use make.exe for everything, instead of keeping mingw32-make.exe for pre-R4.0 and make.exe for R4.0 onwards

Full response

Some utilities that were previously bundled with all rtools
installations can now be installed with the package manager.

To install gendef use:

 pacman -S mingw-w64-{i686,x86_64}-tools

To install mingw32-make.exe use:

  pacman -S mingw-w64-{i686,x86_64}-make

To install both of them at once you can use:

  pacman -S mingw-w64-{i686,x86_64}-{tools,make}

Note that it's often better to use 'make.exe' which is included with
all rtools installations instead of mingw32-make.exe.

These changes were a result of switching to an msys2 based toolchain.
The default rtools40 installer only includes the things needed to
build CRAN packages or base-R. Extra stuff such as system libraries,
debuggers, cmake, etc, are all optionally available via the package
manager.

@StrikerRUS
Collaborator

These changes were a result of switching to an msys2 based toolchain.

Makes sense!
As a side note, with R 4.0 we will get a CI test with the MSYS toolchain for compiling LightGBM for free, in addition to our current MinGW-w64 and MSVC tests.

My takeaways:

Absolutely agree with you! (But I'm not an R maintainer!)

@StrikerRUS
Collaborator

@jameslamb Found the following note in CMake docs. Just want to let you know.

Use this generator under a Windows command prompt with MinGW (Minimalist GNU for Windows) in the PATH and using mingw32-make as the build tool. The generated makefiles use cmd.exe as the shell to launch build rules. They are not compatible with MSYS or a unix shell.
To build under the MSYS shell, use the MSYS Makefiles generator.
https://cmake.org/cmake/help/v3.17/generator/MinGW%20Makefiles.html

@jameslamb jameslamb force-pushed the fix/r-4.0 branch 6 times, most recently from 36813b7 to 8342c27 Compare May 17, 2020 03:34
@jameslamb
Collaborator Author

@jameslamb Found the following note in CMake docs. Just want to let you know.

Use this generator under a Windows command prompt with MinGW (Minimalist GNU for Windows) in the PATH and using mingw32-make as the build tool. The generated makefiles use cmd.exe as the shell to launch build rules. They are not compatible with MSYS or a unix shell.
To build under the MSYS shell, use the MSYS Makefiles generator.
https://cmake.org/cmake/help/v3.17/generator/MinGW%20Makefiles.html

🤒 yep thank you for sharing this. I've been experimenting with it tonight and found that to be true. That really complicates things. I think until the package is on CRAN, we need to keep up with the changes in the R toolchain in install.libs.R.

That means

  • all R versions: Visual Studio if you have it
  • R 3.x: MinGW Makefiles + mingw32-make
  • R 4.x: MSYS Makefiles + make (since R 4.0 has moved to MSYS and those are the tools bundled in Rtools40)

@StrikerRUS
Collaborator

@jameslamb
What logic do you think should be applied if a user manually specifies use_mingw with R 4.0?

@jameslamb
Collaborator Author

@jameslamb
What logic do you think should be applied if a user manually specifies use_mingw with R 4.0?

In my proposal, if you set use_mingw, the MinGW toolchain (MinGW compilers + mingw32-make) will be used, regardless of R version.

If a user does that with Rtools35, it will "just work" because those tools are bundled in Rtools35. If a user does that on R4.0, they will be responsible for making sure there is a mingw32-make.exe on PATH, maybe by downloading with pacman (#3064 (comment)) or with mingw-get from MinGW (http://www.mingw.org/wiki/getting_started) or some other means.
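
The selection rules discussed in this thread can be summarized as a small decision function. This is only a Python sketch of the proposal, not the logic in install.libs.R. The function name `choose_toolchain` and its parameters are hypothetical, and letting use_mingw take precedence over Visual Studio is an assumption on my part. "MinGW Makefiles" and "MSYS Makefiles" are the CMake generator names from the docs linked above.

```python
def choose_toolchain(r_major, has_visual_studio, use_mingw=False):
    """Return (cmake_generator, build_tool) for a Windows build.

    r_major: major version of R, e.g. 3 or 4.
    has_visual_studio: whether Visual Studio compilers were found.
    use_mingw: whether the user explicitly asked for the MinGW toolchain.
    """
    if use_mingw:
        # Explicit request: MinGW compilers + mingw32-make on any R version.
        # On R 4.x the user must put mingw32-make.exe on PATH themselves.
        return ("MinGW Makefiles", "mingw32-make")
    if has_visual_studio:
        # Placeholder label: the exact CMake generator name depends on the
        # installed Visual Studio version, and no make program is needed.
        return ("Visual Studio", None)
    if r_major >= 4:
        # Rtools40 is MSYS2-based and bundles make.exe.
        return ("MSYS Makefiles", "make")
    # Rtools35 bundles mingw32-make.exe.
    return ("MinGW Makefiles", "mingw32-make")


if __name__ == "__main__":
    for r_major in (3, 4):
        print(r_major, choose_toolchain(r_major, has_visual_studio=False))
```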

Does that make sense?

@jameslamb
Collaborator Author

R CMD check for MSYS toolchain is failing with this new NOTE I've never seen before:

  • checking for non-standard things in the check directory ... NOTE
    Found the following files/directories:
    'data.bin' 'lgb-Dataset.data' 'lgb-model.rds' 'lgb-model.txt'
    'lgb.Dataset.data' 'model.rds' 'model.txt'

Those are all files created from the unit tests or examples. I think that to keep this PR's size from growing (so that it can be reviewed more quickly), we should just bump the allowed NOTEs to 4 here, then have a followup PR that removes it and sets allowed NOTEs back to 3.

@jameslamb jameslamb requested review from StrikerRUS and removed request for StrikerRUS May 17, 2020 17:42
@jameslamb
Collaborator Author

@guolinke @StrikerRUS ok I have this passing all CI, and I've added GitHub Actions jobs for R 4.0. Could you please give this a review when you have time?

Collaborator

@StrikerRUS StrikerRUS left a comment

@jameslamb Great work as always!

Left some very minor comments below:

Resolved review threads: .ci/test_r_package_windows.ps1, R-package/README.md, R-package/inst/make-r-def.R

Collaborator

@StrikerRUS StrikerRUS left a comment

Add R version to the name for fast navigation from a PR page:

name: ${{ matrix.task }} (${{ matrix.os }}, ${{ matrix.compiler }})

Resolved review thread: R-package/inst/make-r-def.R
jameslamb and others added 3 commits June 9, 2020 04:30
@jameslamb
Collaborator Author

Add R version to the name for fast navigation from a PR page:

name: ${{ matrix.task }} (${{ matrix.os }}, ${{ matrix.compiler }})

fixed in c65a910

Collaborator

@StrikerRUS StrikerRUS left a comment

Some more comments based on the latest changes.

Comment on lines 38 to 41
As of `Rtools` 4.0, some common paths changed and software was removed from `Rtools`. If you are using `R` 4.0 or later and the corresponding `Rtools`, you need to add one additional path to `PATH`.

* `Rtools` usr bin:
- example: `C:\Rtools\usr\bin`
Collaborator

Why not just add this path to the "If you have `Rtools` 4.0, example:" case?

Collaborator Author

huh I thought I did that already! sorry, will do

Collaborator Author

oh! now I remember. That section you're referring to is documenting how the R package will fall back to msys2 if you don't have Visual Studio.

Adding this /usr/bin path is important specifically for the case of Visual Studio (the default), because that path is where objdump.exe is.

Collaborator

Sorry, I didn't get it. I'm referring to the following section:

image

Fallbacks are documented much later in the README.

Collaborator Author

ooooooooh I misunderstood! Yes you're right, ok those two are definitely duplicates

Collaborator Author

almost missed this one, sorry! Just fixed this in d0e995f.

I also added one more fix in that commit, noticed during testing: if using system2() instead of {processx}, you have to shQuote(args) because one of the args is a file path to R.dll, which might have spaces. This isn't necessary with {processx} because it does that quoting for you on Windows.
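
To illustrate why quoting matters when arguments are pasted into a single command string, here is a rough Python sketch. It is not R's shQuote() and not the code in this PR. `quote_for_cmd` and `build_command` are hypothetical helpers, and the R.dll path is only an example of a path containing a space.

```python
def quote_for_cmd(arg):
    """Wrap an argument in double quotes, escaping embedded double quotes.

    Simplified stand-in for what a quoting helper does; hypothetical.
    """
    return '"' + arg.replace('"', '\\"') + '"'


def build_command(program, args, quote=True):
    """Join a program and its arguments into one command string."""
    if quote:
        args = [quote_for_cmd(a) for a in args]
    return " ".join([program] + list(args))


if __name__ == "__main__":
    dll = r"C:\Program Files\R\R-4.0.0\bin\x64\R.dll"  # example path only
    # Without quoting, the space in "Program Files" splits the path in two.
    print(build_command("objdump.exe", ["-p", dll], quote=False))
    # With quoting, the path stays a single argument.
    print(build_command("objdump.exe", ["-p", dll]))
```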

Resolved review threads: .ci/test_r_package_windows.ps1, .vsts-ci.yml, build_r.R
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Collaborator

@StrikerRUS StrikerRUS left a comment

LGTM! However, I'm not very happy that now we have 2 Windows R jobs each longer than 10m. Hope they will be moved to GitHub Actions ASAP.

It seems that changing the template of the GitHub Actions job names led to duplicated entries in the check statuses.

image

@jameslamb
Collaborator Author

LGTM! However, I'm not very happy that now we have 2 Windows R jobs each longer than 10m. Hope they will be moved to GitHub Actions ASAP.

I hope we can change it too, but I really think it's necessary to have those jobs since R 4.0 is so new. Once we merge this I will try again with GitHub Actions and Windows.

Seems that changing the template of GitHub Actions job name led to duplicated entries in check statuses.

image

I think that when @guolinke made the 4 GitHub Actions steps from #3119 required, the mechanism in GitHub is to store the job names. So now that we have new names, GitHub thinks 8 non-required things have run and 4 required things have not yet run.

I think the solution will be that once we merge this, the set of required checks needs to be updated. That will have to be done again whenever I manage to move the Windows R jobs, and at least one more time when we introduce CRAN builds (which I am waiting to do until this PR is finalized and merged).

After those things, we should be changing CI jobs much less frequently and focusing on the actual contents of the library.

@jameslamb
Collaborator Author

Thanks as always for the thorough review @StrikerRUS !

@guolinke can you please review when you have time? I would like an approval from an R maintainer before this is merged.

@jameslamb
Collaborator Author

@guolinke apologies for bothering you...since you are the only admin (I think) on the repo, we'll need your help to merge this. The set of required GitHub Actions tasks needs to change since the names of the GitHub Actions tasks have changed: #3065 (comment)

I think you will need to use administrator privileges to merge this pull request, then go into the repo's settings and change the set of required tasks. Sorry for the inconvenience, it is one of the annoying things about GitHub Actions :/

@guolinke
Collaborator

@jameslamb no problem!

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023