Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode codemap and span information in crate metadata. #22235

Merged
merged 1 commit into from Mar 4, 2015

Conversation

Projects
None yet
6 participants
@michaelwoerister
Copy link
Contributor

michaelwoerister commented Feb 12, 2015

This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, positions of line-beginnings, etc. but not the actual source code itself.

Crate metadata size is increased by this change because spans in the encoded ASTs take up space now:

                BEFORE    AFTER
libcore         36 MiB    39.6 MiB    +10%
libsyntax       51.1 MiB  60.5 MiB    +18.4%
libcollections  11.2 MiB  12.8 MiB    +14.3%

This only affects binaries containing metadata (rlibs and dylibs), executables should not be affected in size.

Fixes #19228 and probably #22226.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Feb 16, 2015

Do you know if it would be easy enough to track which spans are associated with generic functions and which aren't? It'd be nice to just cut down on the metadata size slightly (4mb for libcore is pretty big)

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Feb 18, 2015

I'll see what I can do to omit span data for types and other stuff that is not used later anyway. I'll also experiment with encoding spans in LEB128 and see if that helps in addition to the LZ77 encoding we are doing anyway.

I'll be on vacation for the rest of the week, however, so please don't expect any updates before next week.

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Feb 18, 2015

Needs a rebase, sorry.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Feb 24, 2015

So, I've been investigating different options of reducing space usage for spans but I've not been really successful.

  • At the moment it's not possible to omit some spans from metadata but not others. The reason is that spans occur in the AST and we use derive-generated Encode/Decode implementations for these data structures. If we had a #[dont_serialize_me] attributes for fields, we could do something here. However, I didn't want to add this feature in a rudimentary, ad-hoc fashion, just for this one use case.
  • I made the compiler store spans in the variable-sized LEB128 format also used in DWARF. This however didn't give the expected space reduction because our RBML writer is super-wasteful, tagging each u8 I emit with a type-tag and a byte storing the size of the number. This results in at least 3 bytes and up to 12 bytes per BytePos and between 6 and 24 bytes per Span which totally defeats the purpose of storing things in LEB128.

A maybe acceptable intermediate solution is to store spans as one u64 in order to cut down on RBML tagging overhead. In a preliminary test this worked quite well under the circumstances. I'll push a cleaned up version of this the day after tomorrow.

The real problem though is how metadata is encoded in general (an issue already recorded in #21482). It's super-verbose and I doubt that all this tagging really has much of a benefit. Coming up with a better data format will be necessary sooner or later.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Feb 25, 2015

Wow that is... depressing. It's not necessarily the end of the world to inflate metadata size right now, it's just something to think about :). If you've got a smaller format working though, that sounds great!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Feb 25, 2015

Wow that is... depressing.

That's one way to see it. I see the current metadata encoding as a lustrous garden of low-hanging fruit, overripe and ready for the picking. How often does one get the chance to shave off a third of the space consumption at no cost at all.

We'll have to pin down the requirements for the metadata format though before the feast can start.

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:cross-crate-spans branch 6 times, most recently from 15d0206 to 855fe9f Feb 26, 2015

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Feb 26, 2015

OK, this is updated now. Numbers are better than before.

                BEFORE    AFTER
libcore         39   MiB  41   MiB    +5%
libsyntax       54.6 MiB  61.2 MiB    +12%
libcollections  14.3 MiB  15.3 MiB    +7%
@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Feb 27, 2015

Ok, this is all looking good to me, nice work @michaelwoerister!

I'd like a second set of eyes on this, however.

r? @nrc or @brson

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Feb 27, 2015

I need to give this a proper read, but I think the idea is good.

One question: what about macro expansions? I guess we don't use them in debug info? Is it ever possible we'd want to have this information cross-crate?

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Feb 28, 2015

Yes, macro expansion information is not used in debuginfo. DWARF has some provisions for recording C-macro definitions (not expansions) but that's not even supported by LLVM. I guess this won't be a topic for us in the foreseeable future.

About having this information cross-crate in general: It would be easy to implement, I think, but it costs additional space and at the moment I don't know of a use case. For now, I'd just leave it out.

}
}

return true;

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

nit: unnecessary return

@@ -1564,3 +1563,19 @@ pub fn is_default_trait<'tcx>(cdata: Cmd, id: ast::NodeId) -> bool {
_ => false
}
}

pub fn get_codemap(metadata: &[u8]) -> Vec<codemap::FileMap> {

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

this could have a better name since it doesn't actually return a codemap

/// within the local crate's codemap. `creader::import_codemap()` will
/// already have allocated any additionally needed FileMaps in the local
/// codemap.
pub fn tr_span(&self, span: Span) -> Span {

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

it would be good to s/tr_span/translate_span. If this is a huge amount of work, don't worry about it (I see it was called this before).

codemap::DUMMY_SP // FIXME (#1972): handle span properly

/// Translates a `Span` from an extern crate to the corresponding `Span`
/// within the local crate's codemap. `creader::import_codemap()` will

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

this 'will' is a bit worrying to me - who's responsibility is it to ensure this invariant? Is it possible to cause an error by using this function without establishing that invariant.

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

I assume this is just a nit about the comment, I don't expect you'll need to change any code, just clarify the comment

@@ -144,7 +151,10 @@ pub fn decode_inlined_item<'tcx>(cdata: &cstore::crate_metadata,
cdata: cdata,
tcx: tcx,
from_id_range: from_id_range,
to_id_range: to_id_range
to_id_range: to_id_range,
translate_spans: cdata.codemap_import_info.len() > 0 &&

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

would it be harmful to always translate spans? I don't understand why the first predicate would be false - is this for backwards compatibility with existing crates? If we do the translate work and it is unnecessary (second predicate) - is that a real waste of resources? Does if actually introduce an error?

In general, I like to avoid flags like this - I feel the increased scope for error outweighs the performance benefit. But if it is necessary, then leave it.

// the lines list is sorted and individual lines are
// probably not that long. Because of that we can store lines
// as a difference list, using as little space as possible
// for the differences.

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

nice :-)

let mut lines = Vec::with_capacity(num_lines as usize);

if num_lines > 0 {
// read the number of bytes used per diff

This comment has been minimized.

@nrc

nrc Mar 1, 2015

Member

Nit^2: Comments should be proper sentences - start with a capital, end with a full stop (sorry for nittiness).

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Mar 1, 2015

r+ with the nits addressed

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 2, 2015

Thanks for the review. I work in your comments on Wednesday. Don't worry about nittiness :)

@lifthrasiir

This comment has been minimized.

Copy link
Contributor

lifthrasiir commented Mar 3, 2015

The space optimization for filemap may have to be revisited after #22971 lands. The relevant commit is fe73d38, which makes the tag represent the size of following integers. If you keep individual tags in the filemap then you don't need separate bytes_per_diff; otherwise you will need Opaque to elide tags at all.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 3, 2015

☔️ The latest upstream changes (presumably #22971) made this pull request unmergeable. Please resolve the merge conflicts.

Encode codemap and span information in crate metadata.
This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics).
Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, line-beginnings, etc. but not the actual source code itself. We are thus missing the opportunity of making Rust the first "open-source-only" programming language out there. Pity.

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:cross-crate-spans branch from 855fe9f to 2f88655 Mar 4, 2015

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 4, 2015

I think I've addressed all comments. The encoding of the line data in filemaps is not optimal yet. I started implementing a better version using emit_opaque() as @lifthrasiir suggested but it got so ugly and unwieldy that I decided to do this in a separate PR later.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 4, 2015

@bors r=nrc

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 4, 2015

@bors r=michaelwoerister 2f88655

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 4, 2015

⌛️ Testing commit 2f88655 with merge 3b3bb0e...

bors added a commit that referenced this pull request Mar 4, 2015

Auto merge of #22235 - michaelwoerister:cross-crate-spans, r=michaelw…
…oerister

This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, positions of line-beginnings, etc. but not the actual source code itself.

Crate metadata size is increased by this change because spans in the encoded ASTs take up space now:
```
                BEFORE    AFTER
libcore         36 MiB    39.6 MiB    +10%
libsyntax       51.1 MiB  60.5 MiB    +18.4%
libcollections  11.2 MiB  12.8 MiB    +14.3%
```
This only affects binaries containing metadata (rlibs and dylibs), executables should not be affected in size. 

Fixes #19228 and probably #22226.

@bors bors merged commit 2f88655 into rust-lang:master Mar 4, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.