Compiler API tracking issue #1

m4b · 2017-07-26T06:15:50Z

This is the current, completely experimental API that will likely have to change:

pub trait Artifact {
    fn new(target: Target, name: Option<String>) -> Self;
    fn add_code(&mut self, name: String, code: Code);
    fn add_data(&mut self, name: String, data: Data);
    fn import(&mut self, import: String);
    fn link_import(&mut self, caller: &str, import: &str, offset: usize);
    fn link(&mut self, to: &str, from: &str, offset: usize);
    fn write<T: Write + Seek + ::std::fmt::Debug>(self, file: T) -> error::Result<()>;
}

Things I like about it, and would desperately like to keep if possible:

simple
elegant
simple
abstract

But I don't know what a semi-complicated compiler backend will demand: weak symbols? Complicated relocations (probably not; most object files use only about 4-5 unique relocations on ELF x86_64 or ARM, for example)? Anything else?

My guess is that it needs to be lived in a bit, hence this tracking issue.

m4b · 2017-07-26T06:35:38Z

Suggestions from @ubsan on IRC: the file type should not be required immediately; instead you pass in functions, imports, relocations, etc., and then, perhaps in write (I also like the name emit; it sounds cooler), you'd do:

obj.write::<Elf>(name)?

It also just struck me: if write didn't take self by value, we could also emit several formats at once:

obj.write::<Elf>(name)?;
obj.write::<Mach>(name)?;
obj.write::<PE>(name)?;

etc., which is cool.

m4b · 2017-07-31T03:11:15Z

In hindsight, Artifact should not be a trait, but rather a simple struct with an API like the one above, but:

pub struct Artifact { /* fields elided */ }

// Signatures only; bodies elided.
impl Artifact {
    pub fn new(target: Target, name: Option<String>) -> Self;
    pub fn add_code(&mut self, name: String, code: Code);
    pub fn add_data(&mut self, name: String, data: Data);
    pub fn import(&mut self, import: String);
    pub fn link_import(&mut self, caller: &str, import: &str, offset: usize);
    pub fn link(&mut self, to: &str, from: &str, offset: usize);
    // Both borrow self, so one artifact can be emitted in several formats.
    pub fn emit<O: Object>(&self) -> error::Result<Vec<u8>>;
    pub fn write<O: Object>(&self, name: &str) -> error::Result<()>;
}

Where Object is a trait:

pub trait Object {
    fn to_object(artifact: &Artifact) -> Vec<u8>;
}

Then, Elf, MachO, and Pe will impl Object, and given an artifact, or some intermediate form with code and data, can marshal this into a vector of bytes, which the downstream consumer can then write to disk, etc.

E.g.:

  let mut artifact = Artifact::new(/* bla bla */);
  // bla bla add code data bla  bla
  let elf_bytes = artifact.emit::<Elf>()?;
  // and then we can write, or have artifact do it
  let pe_bytes = artifact.emit::<Pe>()?;
  // etc.
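
For anyone who wants to poke at this shape, here is a minimal compilable sketch of the struct-plus-trait idea. It is an illustration under stated assumptions, not the crate's implementation: the field layout, the `Error` type, the reduced method set, and the `demo` function are all hypothetical, and the two backends only prepend a made-up tag to the raw code bytes rather than producing valid object files.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
pub struct Error(pub String);

pub type Result<T> = std::result::Result<T, Error>;

/// Format-agnostic description of what should end up in the object file.
/// (Assumed layout: only a name and named code blobs.)
#[derive(Default)]
pub struct Artifact {
    name: String,
    code: Vec<(String, Vec<u8>)>,
}

/// A container format that can marshal an artifact into bytes.
pub trait Object {
    fn to_object(artifact: &Artifact) -> Vec<u8>;
}

impl Artifact {
    pub fn new(name: &str) -> Self {
        Artifact { name: name.to_string(), ..Default::default() }
    }

    pub fn add_code(&mut self, name: &str, code: Vec<u8>) {
        self.code.push((name.to_string(), code));
    }

    /// Borrows `self`, so the same artifact can be emitted more than once.
    pub fn emit<O: Object>(&self) -> Result<Vec<u8>> {
        if self.code.is_empty() {
            return Err(Error(format!("artifact `{}` has no code", self.name)));
        }
        Ok(O::to_object(self))
    }
}

pub struct Elf;
pub struct Pe;

/// Toy marshalling shared by both backends: a placeholder tag followed by
/// the concatenated code bytes. This is NOT a real object file layout.
fn tagged(tag: &[u8], artifact: &Artifact) -> Vec<u8> {
    let mut bytes = tag.to_vec();
    for (_, code) in &artifact.code {
        bytes.extend_from_slice(code);
    }
    bytes
}

impl Object for Elf {
    fn to_object(artifact: &Artifact) -> Vec<u8> {
        tagged(b"elf:", artifact)
    }
}

impl Object for Pe {
    fn to_object(artifact: &Artifact) -> Vec<u8> {
        tagged(b"pe:", artifact)
    }
}

/// Mirrors the usage above: one artifact, two formats.
pub fn demo() -> Result<(Vec<u8>, Vec<u8>)> {
    let mut artifact = Artifact::new("demo");
    artifact.add_code("main", vec![0xc3]); // a single x86 `ret`
    let elf_bytes = artifact.emit::<Elf>()?;
    let pe_bytes = artifact.emit::<Pe>()?; // still usable: `emit` only borrowed it
    Ok((elf_bytes, pe_bytes))
}
```

The point of the sketch is the ownership choice: because `emit` takes `&self`, the second call compiles, which it would not if `emit` consumed the artifact.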

sunfishcode · 2017-07-31T19:05:57Z

Is emitting several formats at once useful? Beyond each object format having its own relocations, they also have their own GOT/PLT/etc. schemes, which require different instructions. On top of that, there are platform-specific ABI variations; for example, Windows has a different calling convention from Darwin and ELF platforms on x64.

At first glance, it seems like there'd be too many complications to make use of this flexibility in practice, so it shouldn't be something to design the API around. But I'd be happy to learn otherwise :-).

m4b · 2017-07-31T20:26:13Z

Just briefly (I can respond in more detail later), but first, so it's clear: this repo is a cross-platform object file generator, not a cross-platform assembler or cross-platform linker. I have toyed with a cross-platform linker, but that's not here right now and is (alas) out of scope. One day.

Consequently, the code you dump has to be unrelocated raw bytes (I can't assemble asm for you), and the files it outputs are the platform's version of an object file, so they are unlinked.

Afaik, no platform's object files have any notion of a GOT as an actual structure in the object file; it is generated by the linker at link time depending on the code and relocations (e.g. if it has a GOTPC32 or something).

There definitely isn't a PLT; it is for sure generated by the linker at link time, so these two concerns are less important. You and I and whoever don't really need to worry about the PLT, as the system linker generally constructs it according to whether there are unresolved symbols.

This latter part is the idea behind specifying an import, symbolically, and then this repo takes care of generating the correct platform relocation.
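
To make the "symbolic import in, platform relocation out" idea concrete, here is a small sketch. The relocation names are real constants from the x86_64 ELF, Mach-O, and COFF specifications, but which one this repo actually emits for a given import is an assumption, as are the `Format`, `ImportLink`, `Relocation`, and `lower_import` names, which exist only for illustration.

```rust
/// Container formats, named after the ones discussed in this thread.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format {
    Elf,
    MachO,
    Pe,
}

/// A symbolic link request, mirroring `link_import(caller, import, offset)`.
pub struct ImportLink<'a> {
    pub caller: &'a str,
    pub import: &'a str,
    pub offset: usize,
}

/// A simplified, hypothetical relocation record.
#[derive(Debug, PartialEq)]
pub struct Relocation {
    pub offset: usize,
    pub symbol: String,
    pub kind: &'static str,
}

/// Picks a relocation for a call to an imported function on x86_64.
/// The caller never names a relocation type; only the format decides it.
pub fn lower_import(format: Format, link: &ImportLink) -> Relocation {
    // Assumed mapping: a PC-relative branch/call relocation per format.
    let kind = match format {
        Format::Elf => "R_X86_64_PLT32",
        Format::MachO => "X86_64_RELOC_BRANCH",
        Format::Pe => "IMAGE_REL_AMD64_REL32",
    };
    Relocation {
        offset: link.offset,
        symbol: link.import.to_string(),
        kind,
    }
}
```

Note that even the ELF entry only asks the linker for a PLT-relative call; no PLT exists in the object file itself.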

As for calling conventions, yes, of course they are different, but again, that's a detail of the bytes you're sending in, and not my responsibility; faerie knows nothing about the semantics of your bytes, only that they are code, data, a string, an import, and whatever else we deem necessary to type. It could be that this simplification is too brittle / not comprehensive enough and explicit platform methods are necessary. That's fine; I'm expecting the API to be driven by organic uses, and don't have a general "philosophy" for what this crate is supposed to do, other than get yo dang bytes out to disk!

Lastly you will always have the platform object at your disposal to manipulate and push bytes, relocations, etc., into directly, if that is your desire.

The idea of the Object trait was a sort of generic, uniform, simple, let's-get-started-outputting-basic-functions-and-strings kind of backend option.

I also do think it's possible to design a generic backend like this that does 90% of what you want for all the container formats, and that this would be cool, but it likely won't be a driving force behind the entire library's API, which is primarily to get your bytes out into the world in any modern container format you want.

I hope that clears some things up? Lemme know what you are thinking

:)

sunfishcode · 2017-07-31T21:39:38Z

Makes sense. I don't expect this repo to provide GOT/PLT/etc. or calling convention abstractions. My observation is just that, since there are all these ABI differences between platforms and the use cases I'm imagining will need to know about them, those use cases will be able to pick the container format they need up front.

I don't have a strong need either way; it just seems that telling the API the container format up front might provide the implementation some extra flexibility.
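
One hedged reading of that "extra flexibility", sketched below: if the format is fixed at construction, the implementation can do format-specific work as items arrive instead of deferring everything to emit time. The `Format` enum, the `declare` method, and the `symbols` accessor are hypothetical; the only outside fact relied on is that Darwin toolchains conventionally prefix C symbol names with an underscore.

```rust
/// Only two formats, to keep the sketch short.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format {
    Elf,
    MachO,
}

pub struct Artifact {
    format: Format,
    /// Symbol names already in the form the chosen format expects.
    symbols: Vec<String>,
}

impl Artifact {
    /// The container format is chosen here, not at emit time.
    pub fn new(format: Format) -> Self {
        Artifact { format, symbols: Vec::new() }
    }

    /// Because the format is known, the name can be lowered immediately.
    pub fn declare(&mut self, name: &str) {
        let lowered = match self.format {
            // Darwin convention: C symbols carry a leading underscore.
            Format::MachO => format!("_{}", name),
            Format::Elf => name.to_string(),
        };
        self.symbols.push(lowered);
    }

    pub fn symbols(&self) -> &[String] {
        &self.symbols
    }
}
```

The trade-off is the one discussed above: this version can no longer emit the same artifact in several formats, since the stored state is already format-specific.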

m4b · 2018-01-14T19:03:05Z

This issue isn't really useful anymore; the API is essentially stabilizing around the import/declare/define/link API, afaics.

Thanks to everyone for helping dogfood the initial version to arrive at a better API all around :)

m4b added the enhancement, help wanted, and question labels on Jul 26, 2017

m4b mentioned this issue Jul 26, 2017

native code output bytecodealliance/cranelift#121

Closed

m4b mentioned this issue Dec 14, 2017

support different kinds of relocations for a symbol #25

Closed

m4b closed this as completed Jan 14, 2018
