
compile time gem dependencies #82

Open
rubys opened this issue Aug 30, 2013 · 12 comments

@rubys

rubys commented Aug 30, 2013

Background:

Nokogumbo provides the ability for a Ruby program to invoke the Gumbo HTML5 parser and to access the result as a Nokogiri::HTML::Document.

Nokogumbo makes use of a single Nokogiri API:

VALUE Nokogiri_wrap_xml_document(VALUE klass, xmlDocPtr doc);

Nokogumbo successfully builds and runs today on Ubuntu Linux and OS X Mountain Lion. I recently converted the Rakefile to use rake-compiler in anticipation of cross compiling to Windows. My next step is to update my Rakefile as follows:

-Rake::ExtensionTask.new('nokogumboc')
+Rake::ExtensionTask.new('nokogumboc', SPEC) do |ext|
+  ext.cross_compile  = true
+  ext.cross_platform = ["x86-mswin32-60", "x86-mingw32"]
+end

The result, predictably, is:

/usr/local/lib/site_ruby/1.9.1/rubygems/dependency.rb:296:in `to_specs': Could not find 'nokogiri' (>= 0) among 6 total gem(s) (Gem::LoadError)

I'm also looking into rake-compiler-dev-box and hitting all sorts of mundane issues (current examples: package_win32_fat_binary.sh includes Ruby 1.8.7, but nokogiri no longer supports that release; running rake cross compile myself results in "Gem rake is not installed"; rake-compiler-dev-box clearly makes use of rvm, but doesn't set up the rvm command).

I'm working through all of these (e.g., . .rvm/scripts/rvm) and will contribute back whatever I learn in the form of pull requests for code, documentation, or whatever else helps. Meanwhile, any advice would be appreciated.

@rubys
Author

rubys commented Aug 30, 2013

Current blocker (both on my Ubuntu host machine, and on rake-compiler-dev-box):

/vagrant/nokogumbo/tmp/x86-mswin32-60/nokogumboc/1.9.3/mkmf.rb:381:in `try_do': The compiler failed to generate an executable file. (RuntimeError)
You have to install development tools first.

Relevant portion of the mkmf.log:

collect2: ld returned 1 exit status
checked program was:
/* begin */
1: #include "ruby.h"
2:
3: #include <winsock2.h>
4: #include <windows.h>
5: int main() {return 0;}
/* end */

Reproduction instructions for rake-compiler-dev-box: first, remove 1.8.7 and 2.0.0 from package_win32_fat_binary.sh and prepare_xrubies.sh (in the process, changing rvm use 1.8.7 to rvm use 1.9.3).

Then, from a fresh 'vagrant up':

sudo apt-get install libxslt-dev libxml2-dev
cd /vagrant
git clone https://github.com/rubys/nokogumbo.git
./package_win32_fat_binary.sh nokogumbo

@luislavena
Contributor

Hello Sam,

First apologies for the lack of response, but I normally work on Open Source
projects during the weekends.

To better help you out, I'm going to go over a few things I noticed in your
project, from requirements to execution, that will help us determine what
is going on and what is failing, ok?

First things first: dependencies.

As you mentioned before, your project clearly depends on Nokogiri, not on
Nokogiri's public interface but on its internals (nokogiri.h).

Depending on Nokogiri internals also pushes a dependency on libxml2 onto
you, as stated in your extconf.rb:

https://github.com/rubys/nokogumbo/blob/master/extconf.rb#L5

Next, you look up the Nokogiri installation to determine its source code
location. This means Nokogiri must be installed in order to compile the
extension, not just in order to use it.
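A lookup like the one described above fails with an uncaught Gem::LoadError (as in the error at the top of this issue) whenever Nokogiri is missing. A minimal sketch, assuming the lookup goes through RubyGems, of making that failure explicit; installed_gem_dir is a hypothetical helper, not code from nokogumbo:

```ruby
require "rubygems"

# Hypothetical helper: return the install directory of a gem, or nil when the
# gem is not installed. Gem::Specification.find_by_name raises Gem::LoadError
# (Gem::MissingSpecError, a subclass, in newer RubyGems) when nothing matches.
def installed_gem_dir(name)
  Gem::Specification.find_by_name(name).gem_dir
rescue Gem::LoadError
  nil
end

# Example use near the top of an extconf.rb (sketch):
#   nokogiri_dir = installed_gem_dir("nokogiri")
#   abort "nokogiri must be installed to build this extension" unless nokogiri_dir
```

This only makes the failure readable; the compile-time dependency on an installed Nokogiri, which is the point being made here, remains.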

Please keep this in mind as I move on to the other points.

The last part of the dependencies is the availability of gumbo parser source
code, which is not clearly outlined in extconf.rb, nor set up as a git
submodule to ease development; it only appears during a packaging phase, where
things are copied around (more about that in the next point).

Gem structure

I couldn't help noticing that your project keeps all its files in the root
folder instead of following the traditional/standardized structure for gems,
as described in the RubyGems Guides here and
here

I did notice that you copy files around in your Rakefile to be able to generate the gem structure before packaging.

I wonder if it wouldn't be better to have that gem structure from the
beginning and avoid copying files around?

Something that I find important is consistency and conventions. One thing you
will notice about rake-compiler is that almost all the projects that use it
follow the same conventions and structure, which greatly reduces the code and
tweaks required, not to mention the learning curve around the codebase.

If you look at rake-compiler's structure recommendation, you will see what I
mean.

This will greatly help both with organizing your project and with welcoming
other developers.
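As a rough illustration of that conventional layout (a Rakefile at the root, Ruby code under lib/, and each extension under ext/<name>/ with its own extconf.rb), here is a small check. missing_layout_paths is hypothetical and not part of rake-compiler, and the path list is a simplification of the structure shown in its README:

```ruby
require "pathname"

# Hypothetical helper: list the conventional rake-compiler paths that a
# project root is missing for a given extension name.
def missing_layout_paths(root, ext_name)
  root = Pathname.new(root)
  expected = [
    "Rakefile",                    # defines the Rake::ExtensionTask
    "lib",                         # Ruby code; rake compile copies the binary here
    "ext/#{ext_name}/extconf.rb",  # one directory per extension
  ]
  expected.reject { |rel| (root + rel).exist? }
end

# Example: missing_layout_paths(".", "nokogumboc") is empty once the layout is in place.
```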

Build chain and compiling dependencies

As mentioned before, requiring the Nokogiri gem to be installed in order to
retrieve sources/headers from it creates a particular scenario: first, it must
be installed to compile during the development phase; second, it must also be
installed whenever the extension is compiled.

This means that compiling natively requires Nokogiri to be installed, and
also that cross-compiling nokogumbo requires the cross-compiled version of
Nokogiri to be installed.

Cross-compiled Ruby cannot be used to install or compile gems, simply because
the interpreter does not run natively; instead, it is faked using the native
interpreter.

Because of this, you cannot install a gem inside the cross-compiled Ruby that
will be used to compile your extension.

This could be avoided if, instead of looking up the Nokogiri source code
through RubyGems dependencies, its source code were added to your own project,
removing the need to look for it at compilation time.

The second limitation is libxml2 and libxslt, which are two dependencies of
Nokogiri that are compiled and cross-compiled when targeting other platforms
like Windows.

In your case, you're not dealing with that dependency at all, which will
cause failures, as the libxml2 headers and support libraries will be missing
for the cross compiler.

If you look at Nokogiri, they deal with the libxml dependency in their extconf:

https://github.com/sparklemotion/nokogiri/blob/master/ext/nokogiri/extconf.rb

Last on this topic, gumbo-parser should be compiled as another dependency
(in this case, cross-compiled too) and be part of the dependency chain
required for the extension to compile successfully.

Copying the gumbo source files, as handled in your extconf.rb, might not
produce the most consistent results across different systems.

Either it should be a submodule that is checked out, with its source code
part of the gem, or it should be compiled separately and then used during
linking.

Cross compiling to Windows

One of rake-compiler's abilities is allowing developers to cross-compile gems
for Windows.

While that might sound like a miracle, it is just a sequence of conventions and
existing tools orchestrated for you.

To make things a bit easier, a VM has been provided that covers most of the
use cases, like targeting what we call fat binaries, which include binaries
for Ruby 1.8.7 through 2.0.0.

Regarding the platform question, RubyInstaller (the official installers)
uses x86-mingw32 (i386-mingw32) as the platform, and x64-mingw32 only for
Ruby 2.0.0 (the 64-bit version).

i386-mswin32-60 is used for legacy purposes, as it supports the old Ruby 1.8.6
(One-Click Installer). Generating that gem is no longer necessary.
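Following those conventions, the cross_platform list from the Rakefile change at the top of this issue could drop x86-mswin32-60. A minimal sketch, where cross_platforms_for is a hypothetical helper and the 2.0.0 cut-off simply encodes the statement above that 64-bit RubyInstaller builds exist only from Ruby 2.0.0:

```ruby
require "rubygems"

# Hypothetical helper: pick rake-compiler cross platforms for the Ruby
# versions being targeted, per the RubyInstaller conventions described above.
def cross_platforms_for(ruby_versions)
  platforms = ["x86-mingw32"] # 32-bit RubyInstaller (i386-mingw32)
  needs_x64 = ruby_versions.any? { |v| Gem::Version.new(v) >= Gem::Version.new("2.0.0") }
  platforms << "x64-mingw32" if needs_x64 # 64-bit builds start with 2.0.0
  platforms
end

# In the Rakefile (sketch):
#   Rake::ExtensionTask.new('nokogumboc', SPEC) do |ext|
#     ext.cross_compile  = true
#     ext.cross_platform = cross_platforms_for(%w[1.9.3 2.0.0])
#   end
```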

In your particular case, these automated scripts might not serve you, as 1.8.x
is no longer supported. That can be solved by simply editing the provided
scripts to your needs.

As said before, the VM follows the standards and existing conventions, but
each project might have its differences. Also, the VM is just for compilation:
it expects the code to be compilable with Bundler and the default commands,
without manual interaction.

But for all these tools to work properly, all the previous points need to be
covered.

Present to developers and the tools a standard/conventional gem structure, so
rake-compiler can work without complicated hidden dependencies.

Deal with the external dependencies either as modules of your project,
pre-compiled, or compiled as a prerequisite of your compilation task. On this
subject, you can take a look at how things are done by projects like
sqlite3-ruby (vendor_sqlite3.rake), rugged, or Nokogiri itself (as linked
before).

I would suggest focusing on being able to generate a native gem (not even the
cross-compiled version) outside your own development machine (inside the VM,
for example, using the native script). That will give you an idea of what
needs improvement to make the entire process idempotent.

Of course, all this is my personal opinion on how to deal with these
dependencies and the code in a more maintainable way.

Hope it helps.

@rubys
Copy link
Author

rubys commented Aug 31, 2013

First apologies for the lack of response, but I normally work on Open Source projects during the weekends.

Not a problem. Thanks for taking the time to respond.

Depending on Nokogiri internals also pushes a dependency on libxml2 onto you

Actually, my code depends heavily on libxml2 and gumbo (calling a number of public interfaces on each), and when done makes a single call to nokogiri.

The last part of the dependencies is the availability of gumbo parser source code,

Gumbo is new and probably won't be installed on most people's machines. However, it is straight C code, and extconf builds a Makefile that will compile it just fine, so my Rakefile will do a git clone and copy the necessary files into the ext/nokogumboc directory if it isn't already present.

Note that this will result in a single .so (or .dll) file which embeds the parser.

I wonder if it wouldn't be better to have that gem structure from the beginning and avoid copying files around?

Since I'm (optionally) copying files into the gem structure, that makes rake targets like clean and clobber more complicated if those directories contain both nokogumbo source files and other files.

Cross-compiled Ruby cannot be used to install or compile gems

This does not make sense to me, as installing a gem is merely a matter of putting the right files into the right places.

In your case, you're not dealing with that dependency at all, which will cause failures, as the libxml2 headers and support libraries will be missing for the cross compiler.

I seem to be finding the libxml2 headers (I develop on Ubuntu, and most of the headaches that nokogiri's build process appears to be working around are due to Mac OS X issues). That being said, you are probably right when it comes to link time.

Present to developers and the tools a standard/conventional gem structure, so rake-compiler can work without complicated hidden dependencies.

This seems to be a sticking point for you, so let's start with that. Let's put aside libxml2 and nokogiri for a moment, and consider gumbo_parser. Everything I need is in the src directory (https://github.com/google/gumbo-parser/tree/master/src). All I need to add is an extconf.rb, and in that file specify $CFLAGS = " -std=c99".

Given that as the requirements, can I ask how you would recommend I structure my repository so that the gumbo parser will not only be compiled but also installed?

@luislavena
Contributor

Actually, my code depends heavily on libxml2 and gumbo (calling a number
of public interfaces on each), and when done makes a single call to
nokogiri.

But what I said remains true: you depend on nokogiri internals.

Also, you're ignoring the point I made about libxml. While it might be
present in some Linux installations, it is not available for cross compilation.

Gumbo is new and probably won't be installed on most people's machines.
However, it is straight C code

This goes back to the same point I mentioned: compilation of the library. By
just including the source code, you're skipping the dependencies that gumbo
might require for the cross compilation.

I gave you pointers to some projects that compile dependencies, rugged and
sqlite3; have you looked at those links?

Since I'm (optionally) copying files into the gem structure, that makes
rake targets like clean and clobber more complicated if those directories
contain both nokogumbo source files and other files.

I tried to get your work running on my machine before making this comment.
While this approach clearly works for you in your environment, it is not clear
what needs to happen for someone else to contribute.

On other projects where I was asked to get involved, I have freely and happily
sent pull requests with improvements and fixes; you can confirm this by
looking at my contributions.

However, in this case, the way things are built isn't solid enough for me
to go ahead and send you my improvements.

This does not make sense to me, as installing a gem is merely a matter of
putting the right files into the right places.

But that covers just the gem, not the dependencies. Can you point me to where
in your Rakefile or extconf you are cross compiling, obtaining packages or
linking against libxml?

Again, please see the work done in nokogiri to cross compile libxml, which
you depend on too.

The cross-compiled Ruby cannot be executed natively; that is why the gem
you depend on (nokogiri) is required to be installed.

Installing the Ubuntu libxml dependencies only solves native compilation,
not cross compilation.

For cross compilation to work, Ruby and the dependencies of your library
need to be compiled and available in the format of the target platform (in
this case, nokogiri compiled for Windows, plus libxml and gumbo).

English is not my native language, so perhaps what I'm explaining does not
translate properly. I would be grateful if you could take a look at what
nokogiri does for cross compilation, and also at projects like rugged and
sqlite3: the former builds a dependency library locally, while the latter
depends on an external package.

This seems to be a sticking point for you, so let's start with that. Let's
put aside libxml2 and nokogiri for a moment, and consider gumbo_parser.
Everything I need is in the src directory. All I need to add is an
extconf.rb, and in that file specify $CFLAGS = " -std=c99".

Please read my comments above: including the source might not be enough to
properly detect the library dependencies for different platforms. Take
rugged, for example. The libgit2 source is available, but it is not simply
included, as the build needs to determine which features the platform
provides.

All my previous comments were made on the basis that I was not able to get
even a basic build working with your code. Those are recommendations, and you
can take them or ignore them.

Given that as the requirements, can I ask how you would recommend I
structure my repository so that the gumbo parser will not only be compiled
but also installed?

I believe my previous comment included links to rubygems guides,
rake-compiler readme structure recommendations and examples; now it's your
turn to extrapolate those to your project.

Something that I learned over the years is that, instead of making the
changes myself, it is more valuable for project maintainers to understand the
reasoning behind these modifications. rake-compiler was born because I got
tired of doing this on every project.

Cross compilation is neither simple nor does it have a single solution. I'm
sharing my knowledge and experience dealing with Ruby and Windows, which has
worked for me on several projects, but I might be wrong, and I would love and
appreciate better solutions to improve it.

Regards.

@rubys
Author

rubys commented Aug 31, 2013

By just including the source code, you're skipping the dependencies that gumbo might require for the cross compilation.

Gumbo is a pure C99 library with no outside dependencies.

I gave you pointers to some projects that compile dependencies, rugged and sqlite3; have you looked at those links?

Those appear to presume that you have installed the necessary dependencies separately, and point to where they are installed. Perhaps I am attempting to be too clever, but as gumbo is a pure C library, I'm instead cloning the repository and copying what I need into the ext/nokogumboc directory prior to running extconf.rb.

Can you point me to where in your Rakefile or extconf you are cross compiling, obtaining packages or linking against libxml?

Again, I may be trying to be too clever, but I let nokogiri do that for me:

  require "#{nokogiri_ext}/extconf.rb"

English is not my native language

You are doing well! And my high school Spanish, while rusty, was good enough to follow this presentation: http://blog.mmediasys.com/2011/11/26/rubyconf-argentina-and-fenix/ :-)

I believe my previous comment included links to rubygems guides, rake-compiler readme structure recommendations and examples; now it's your turn to extrapolate those to your project.

I'm still not getting it :-( That's why I hoped that we could start with something simpler at first. Perhaps with an earlier revision of nokogumbo?

Note that this /nearly/ follows the recommended structure (I didn't know then to insert 'nokogumboc' into the ext directory structure). It consists of a single C file that only has compile time dependencies against Ruby and Gumbo -- and again the latter is a pure C99 library with no other dependencies.

Something that I learned over the years is that, instead of making the changes myself, it is more valuable for project maintainers to understand the reasoning behind these modifications

I can appreciate that. But hopefully we can find a happy medium between you doing all of the work, and me clumsily trying to follow the examples I find only for you to tell me to look at those same examples once again.

Taken as a whole, gumbo + nokogumbo is 12 C99 source files and 14 header files that depend only on Ruby. What I will try to do (probably in a separate branch) is to get just that working before I attempt to tackle pulling in nokogiri and libxml2.

@luislavena
Copy link
Contributor

Gumbo is a pure C99 library with no outside dependencies.

You're correct. I was actually talking about any special library that Gumbo might require when building on Windows, but it seems Gumbo doesn't use anything in particular (judging by autoconf and friends), so just including the code will be OK.

I am attempting to be too clever, but as gumbo is a pure C library, I'm instead cloning the repository and copying what I need into the ext/nokogumboc directory prior to running extconf.rb

Perhaps the gumbo repository can be used as a submodule and then added to the list of objects instead of copied over?

I normally try to avoid doing that and instead produce a static library of the dependency (gumbo in this case) and link against it. That is what Rugged does with libgit2.

I would say that taking a step back and going with a simpler (non-nokogiri) approach first might be better than using Nokogiri internals.

I need to push some commits to the tiny_tds project first, and then I will take a look at your earlier structure approach.

I will send my comments later today.

Thank you.

@rubys
Copy link
Author

rubys commented Sep 1, 2013

Perhaps the gumbo repository can be used as a submodule and then added to the list of objects instead of copied over?

I've added it as a submodule. I can't seem to find documentation for mkmf that covers all of the things you can do with global variables, but from what I can tell, mkmf assumes that everything is in one directory. (You can specify which directory you want to use, but you can't specify multiple). I could be wrong about this, as this is based on reviewing the source code.
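As far as I understand mkmf's source, it can compile files from more than one directory: create_makefile honours a $srcs list, object files are named after source basenames, and make then locates each source through the $VPATH entries. The sketch below assumes that behaviour and a hypothetical submodule path (gumbo-parser/src under ext/nokogumboc); submodule_build_settings is a hypothetical helper that only computes the values, so it can be checked without running mkmf. The generated Makefile is the place to verify the result.

```ruby
# Hypothetical helper: compute mkmf settings for building the extension's own
# C files together with those in a git submodule, without copying anything.
def submodule_build_settings(ext_dir, submodule_src)
  sources = Dir.glob(File.join(ext_dir, "*.c")) +
            Dir.glob(File.join(ext_dir, submodule_src, "*.c"))
  {
    srcs: sources.map { |f| File.basename(f) }.sort, # objects are named after basenames
    vpath: ["$(srcdir)/#{submodule_src}"],             # lets make find the submodule sources
    incflags: " -I$(srcdir)/#{submodule_src}",         # lets the compiler find gumbo's headers
  }
end

# In ext/nokogumboc/extconf.rb (sketch, under the assumptions stated above):
#   require "mkmf"
#   s = submodule_build_settings(__dir__, "gumbo-parser/src")
#   $srcs = s[:srcs]
#   $VPATH.concat(s[:vpath])
#   $INCFLAGS << s[:incflags]
#   $CFLAGS << " -std=c99"   # append rather than assign, keeping mkmf's default flags
#   create_makefile("nokogumboc")
```

One caveat of this design: a C file in the submodule with the same basename as one in ext/nokogumboc would map to the same object file, so the names need to stay distinct.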

I normally try to avoid doing that and instead produce a static library of the dependency (gumbo in this case) and link against it. That is what Rugged does with libgit2.

libgit2 provides Makefile.embed, whereas gumbo-parser is built using ./configure. Not being sure whether ./configure would play nicely with cross compiling, for the moment I'm sticking with extconf and mkmf.

I would say that taking a step back and going with a simpler (non-nokogiri) approach first might be better than using Nokogiri internals.

Such an approach would come with a significant CPU and memory usage penalty, but I've added conditional compilation instructions falling back to such an approach should nokogiri headers not be found.

Current status

At this point in time, nokogumbo is effectively a "pure C" library with no required dependencies other than Ruby itself. As such, I would think that cross compiling should be easy peasy. Unfortunately, I'm still seeing the error I reported above, both on my machine and on rake-compiler-dev-box. The error indicates that winsock can't be linked into the library, something I don't explicitly require.

Building and testing on Ubuntu or Mac OSX is as simple as installing dependencies (via bundle install) and rake.

Running rake cross compile will reproduce the problem I've described. You can verify that the standard ext and lib directories are built prior to the cross compilation process.

@rubys
Author

rubys commented Sep 1, 2013

Success?

Apparently, I was misunderstanding the error message produced... the problem was libxml2 not being found. But as libxml2 is not a hard requirement any more, I rearranged my extconf.rb file and am able to build nokogumboc.so. Despite the name ending in .so, I verified using strings that msvcrt.dll was in the file, as well as names of various Windows APIs such as GetProcAddress and EnterCriticalSection.

Next up: figuring out how to build a Gem for Windows.

@rubys
Author

rubys commented Sep 3, 2013

Not quite successful yet:

invalid ELF header - /home/rubys/git/nokogumbo/lib/nokogumboc.so (LoadError)

Also, it doesn't appear that mingw supports C99. :-(

So, current status is that now I have implemented the recommended directory structure, reference gumbo-parser as a submodule, have no required dependencies beyond Ruby, and while this appears to be sufficient to install the gem on Windows 8 with RailsInstaller, I still can't cross compile.

@luislavena
Contributor

Hello @rubys, sorry for the late response; I've been down the rabbit hole at work for the past few months.

Shared objects (.so) generated for Windows cannot be loaded on Linux; they are not even ELF files, but PE or PE+ instead.

Perhaps you tried to run nokogumbo after the cross compilation? If that was the case, you need to perform a native compile so that the .so gets replaced by one built for your local platform.
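A quick way to tell which kind of binary ended up in lib/ is to look at its first bytes: ELF files start with 0x7F followed by "ELF", while PE files start with an "MZ" DOS header whose 4-byte little-endian field at offset 0x3C points to a "PE" signature followed by two NUL bytes. binary_format is a hypothetical helper sketching that check; Mach-O and anything else is simply reported as :unknown.

```ruby
# Hypothetical helper: identify a compiled extension as ELF (Linux), PE (Windows)
# or unknown, by reading its magic bytes.
def binary_format(path)
  File.open(path, "rb") do |f|
    magic = f.read(4).to_s
    return :elf if magic == "\x7FELF".b
    return :unknown unless magic.start_with?("MZ".b)

    f.seek(0x3C)                  # e_lfanew: offset of the PE signature
    raw = f.read(4)
    return :unknown unless raw && raw.bytesize == 4

    f.seek(raw.unpack1("V"))      # 32-bit little-endian offset
    f.read(4) == "PE\0\0".b ? :pe : :unknown
  end
end
```

Run against lib/nokogumboc.so, a result of :pe after rake cross compile would match the diagnosis above, and a native rake compile should leave an ELF file there again on Linux.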

@rubys
Author

rubys commented Oct 24, 2013

Any chance you can try? As I stated above, the current state is that nokogumbo can now be compiled with no required dependencies beyond Ruby, and I have implemented the recommended directory structure, yet I cannot get it to work. As such, it should be the perfect candidate project for rake-compiler, but I've tried everything and failed.

@luislavena
Contributor

@rubys will try again this weekend while I work on fixing some issues with rake-compiler and rake-compiler-dev-box.
