[RFC] Update .mrb file format #944

Open
masuidrive opened this issue Mar 4, 2013 · 28 comments

Comments

@masuidrive
Contributor

This is a proposal for a new .mrb file format.
The current .mrb file is in hex format. It's too fat.

The new .mrb file can have sections. For now it has only an irep section; I'll add a debug section.
It has #880 in mind. The irep section has an 'endianness' field, so big or little endian can be chosen per irep section.

I'm working on it at https://github.com/masuidrive/mruby/tree/binary

[Figure: mrb file format]

@matz
Member

matz commented Mar 4, 2013

I agree with making the .mrb file compact.

I still think the .mrb format should be endian neutral. There's no reason for a "file format" to be endian sensitive, unless you REALLY want to read files through mmap. All we need is an in-memory counterpart of the .mrb file, which should consume less memory.

FYI, I have a vague plan to remove the irep array from mrb_state; instead, I'd add a small irep array to each irep.

@masuidrive
Contributor Author

The .mrb loader supports both endiannesses,
so a little endian CPU can load a big endian .mrb file.
The 'endianness' flag is for referring directly to the ISEQ in a .mrb file placed in ROM.

What do you think about .mrb files in ROM?
Should I create a new section in the .mrb file for that?

@mattn
Contributor

mattn commented Mar 5, 2013

Why did you store the compiler name/version in the irep section? I'd prefer them in an upper layer.

@mattn
Contributor

mattn commented Mar 5, 2013

The same goes for the bytecode version and the endianness.

@matz
Member

matz commented Mar 5, 2013

In my opinion, the format for ROM-stored data should be separate from the .mrb file format.

  • mrb should be endian neutral; ROM is not.
  • mrb should be error tolerant (e.g. CRC); ROM need not be.
  • mrb should be read and written via I/O; ROM is not.

Designing something that serves both purposes has no merit.

@skandhas
Contributor

skandhas commented Mar 5, 2013

I agree with what @mattn said. The compiler name/version, bytecode version and endianness are redundant in the irep section. That information can be stored in the .mrb file header.

@miura1729
Contributor

I think members whose size is 16 or 32 bits should not be placed at odd addresses. Also, for embedded systems, I think the .mrb file needs a reserved area for extending the format. So I propose adding a reserved member after the endianness field or at the top of IREP record 'B'.

@monaka
Contributor

monaka commented Mar 5, 2013

I prefer @matz's opinion.
I believe there is a need to ROMize.
But there are so many types of ROM-CPU connection on real targets that we, the mruby core team, can't follow all of them.
To add to @matz's examples: alignment and endianness are important for parallel ROMs, but they are not always important for serial ROMs such as SPI-connected ones.
(BTW, I think a CRC check is required even for a programmable ROM. ...Back on topic.)

The solution depends on why a binary format is required.
We can use compression if the reason is compact size.
(This may bring another merit: we could pack media content together with the bytecode.)

And I think we should provide a pluggable dump/load framework if the reason is ROM.

@monaka
Contributor

monaka commented Mar 5, 2013

I basically agree with what @mattn said.
It's useful to store {compiler name | version | bytecode ver. | endian} in the section header instead of the section binary. (Strictly speaking, I don't support adding endianness to a portable .mrb file format.)

@monaka
Contributor

monaka commented Mar 5, 2013

How about adding a file-format-type field to the RITE file header?
If it exists, a loader framework (if that also exists) can dispatch to each reader subsystem.

Just a first plan:
file-format-type is uint32_t.
0 means the traditional mrb format by @matz.
1 means the new(?) mrb ASCII hex format by @masuidrive and others.
2 means the new(?) mrb binary format, similar to 1.
3 to 255 are reserved for future use.
256 to UINT32_MAX are free for applications to use at their own responsibility.
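
The numbering above could be dispatched roughly as follows. This is only a sketch: the position of the field (offset 0), its big-endian encoding, and the name `classify_format_type` are assumptions made for illustration, not part of the proposal.

```python
import struct

# Values taken from the numbering proposed in this comment.
FORMAT_TRADITIONAL = 0
FORMAT_ASCII_HEX = 1
FORMAT_BINARY = 2
RESERVED_MAX = 255  # 3..255 are reserved for future use


def classify_format_type(header: bytes) -> str:
    """Classify a header by a leading uint32 file-format-type field.

    Assumption: the field sits at offset 0 and is stored big endian.
    """
    if len(header) < 4:
        raise ValueError("header too short for file-format-type")
    (format_type,) = struct.unpack_from(">I", header, 0)
    if format_type == FORMAT_TRADITIONAL:
        return "traditional"
    if format_type == FORMAT_ASCII_HEX:
        return "ascii-hex"
    if format_type == FORMAT_BINARY:
        return "binary"
    if format_type <= RESERVED_MAX:
        return "reserved"
    # 256..UINT32_MAX: left to the application.
    return "application"
```

A loader framework would map each returned kind to its own reader subsystem.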

@masuidrive
Contributor Author

@monaka
I don't think you need file-format-type.
You can have application-specific sections in this file.

@monaka
Contributor

monaka commented Mar 5, 2013

OK. Let's go back to the root of this issue, then.

Could you tell us again why we need the new format?
For compaction? For archiving IREPs? For machine readability?

There is more than one merit, so it will be easier to discuss if we focus on one.

@masuidrive
Contributor Author

I moved the compiler name/version to the file header.

But the bytecode version is still in the irep section.
A file can contain several irep sections with different bytecode versions.

I agree to remove the endianness field.
I now treat .mrb and ROM separately.

@masuidrive
Contributor Author

I think the purpose of the new format is extensibility.
The new format can contain more data than just IREPs.

After that, I'll work on including debug information in the file.

@masuidrive
Contributor Author

@monaka
Contributor

monaka commented Mar 5, 2013

I think this work is well worth developing incrementally.

We should focus on IREP archiving for now if that is regarded as the top priority.

If so, is it acceptable for the new file format to be based on "ASCII hex format" and to be "endian neutral"? (for now)

@matsumotory
Member

Do you think an IREP record should have an IREP record header that includes nlocals, nregs, npools, nsyms and so on? I prefer one IREP record header per IREP record. If we have an IREP record header structure, the IREP record section is simple to use, because the original data can be cast to the IREP record structure or the header structure.

@masuidrive
Contributor Author

@monaka
I agree with "endian neutral", but I don't understand the ASCII hex format part. Why do you want to use ASCII?

@monaka
Contributor

monaka commented Mar 5, 2013

There are 2 + 1 reasons why I suggest ASCII.

1: Working step by step. The current version of mrb is ASCII based.

2: Easy to debug. This helps until the format is stable; e.g. we can't paste binary here.

3: Target loading in embedded systems. ASCII-based formats are still in active use in the embedded systems area. Typical examples are Intel HEX and Motorola S-record.

Actually, 3 is not important. Embedded developers will probably choose another format anyway.
My concerns are 1 and 2.

@masuidrive
Contributor Author

  1. Binary generator code is simpler than hex generator code. Also, I want to remove the hex generation code from the current code; it's messy.
  2. You can paste a hexdump of the binary, or upload the binary to a gist. Either way, we need to write a format verification tool for debugging, because bin/hex data is hard for humans to read.
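
To illustrate point 2, a binary file can be turned into pasteable text and back with a few lines. This is a minimal sketch; the dump layout (offset, two spaces, hex bytes) and the names `to_hexdump` and `from_hexdump` are assumptions made for this example, not an existing mruby tool.

```python
def to_hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as lines of '<offset>  <hex bytes>' for pasting."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # bytes.hex(sep) needs Python 3.8 or later.
        lines.append(f"{offset:08x}  {chunk.hex(' ')}")
    return "\n".join(lines)


def from_hexdump(text: str) -> bytes:
    """Rebuild the original bytes from text produced by to_hexdump."""
    out = bytearray()
    for line in text.splitlines():
        # Drop the offset column; keep only the hex bytes.
        _, _, hex_part = line.partition("  ")
        out += bytes.fromhex(hex_part)
    return bytes(out)
```

For example, `to_hexdump(b"RITE")` gives `00000000  52 49 54 45`, and `from_hexdump` reverses it.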

@monaka
Contributor

monaka commented Mar 6, 2013

If you say so, I have no reason to recommend ASCII strongly.

@mattn
Contributor

mattn commented Mar 6, 2013

@masuidrive that link is 404

@monaka
Contributor

monaka commented Mar 6, 2013

Going back to the attached figure.

An IREP section table is probably required, even though we can determine the offset of the next section from the section size.

It's also better to be alignment conscious. This makes the file easy to analyze with binary editors.
You'll fall into tool-making hell if you play down alignment. ;)

I think 4-byte alignment fits this format; 2-byte is also possible.
So the magic of the IREP record should be expanded to 2/4 bytes, or 1/3 bytes should be marked as reserved.
"pool size" and "sym size" should also be considered for alignment.
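
The reserved bytes needed to reach a 2- or 4-byte boundary are simple to compute. The sketch below assumes zero-filled padding, and the helper names `padding_for` and `pad_to_alignment` are hypothetical:

```python
def padding_for(offset: int, alignment: int = 4) -> int:
    """Number of reserved bytes needed so that offset becomes aligned."""
    return (-offset) % alignment


def pad_to_alignment(data: bytes, alignment: int = 4) -> bytes:
    """Append zero bytes (an assumed fill value) up to the next boundary."""
    return data + b"\x00" * padding_for(len(data), alignment)
```

A 1-byte magic therefore needs 3 reserved bytes for 4-byte alignment, or 1 reserved byte for 2-byte alignment, which matches the "reserved 1/3 bytes" suggestion.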

@matz
Member

matz commented Mar 6, 2013

I propose to have two separate irep representations: one for the mrb file format (new mrb), the other for an in-memory packed representation (packed irep).

new mrb format should be:

  • endian neutral
  • binary (should be more compact than the current mrb)
  • with a CRC checksum

packed irep should be:

  • representable as a C array (a la mrbc -B)
  • ROMable
  • endian aware, so that an irep can refer to the iseq section in the packed irep

@monaka
Contributor

monaka commented Mar 6, 2013

I have no right to stop someone's creation.
But I think we can't decide on a spec for ROMable data. This is not because of our skill, of course, but because of the diversity of embedded targets.
So my counterproposal is that we concentrate on the new (not packed) mrb format.

If I understand correctly, the not-packed version of the new mrb format can be converted to a C array and linked in the same way as the current mrb format. Is this right?
If so, it is enough of an advance even when the target is a small embedded system.

@masuidrive
Contributor Author

I updated the new .mrb format.

[Figure: mrb file format]

  • All uint* are big endian.
  • Binary.
  • Has a CRC in the file header.
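
The three properties above can be sketched as a small writer/reader pair. Everything specific here is an assumption for illustration: the field order (magic, CRC, body length), the use of CRC-32 via `zlib.crc32`, and the names `build_file` and `parse_file`. The actual layout is the one in the figure, which is not reproduced in text.

```python
import struct
import zlib

MAGIC = b"RITE"  # the thread mentions a "RITE file header"; this layout is assumed


def build_file(body: bytes) -> bytes:
    """Assumed layout: 4-byte magic, uint32 CRC of body, uint32 length, body."""
    crc = zlib.crc32(body) & 0xFFFFFFFF
    # ">" forces big-endian encoding regardless of the host CPU.
    return MAGIC + struct.pack(">II", crc, len(body)) + body


def parse_file(blob: bytes) -> bytes:
    """Check magic, length and CRC, then return the body."""
    if blob[:4] != MAGIC:
        raise ValueError("bad magic")
    crc, length = struct.unpack_from(">II", blob, 4)
    body = blob[12:12 + length]
    if len(body) != length:
        raise ValueError("truncated file")
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    return body
```

Because every integer is written with an explicit byte order, the same file loads identically on little and big endian hosts, which is the "endian neutral" property discussed earlier in the thread.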

@beoran

beoran commented Mar 9, 2013

That image is too messy, so I made a diagram in Dia:

[Figure: new_mrb_format_beoran]

Download the Dia file here (use "save as" functionality of your browser):
http://www.beoran.net/eruta/uploads/diagrams/new_mrb_format.dia

@mattn
Contributor

mattn commented Mar 11, 2013

@beoran thank you. it's cool.

takahashim pushed a commit to takahashim/mruby that referenced this issue Nov 3, 2013

8 participants