Encode codemap and span information in crate metadata. #22235
michaelwoerister referenced this pull request Feb 12, 2015
Closed: Parameterized struct doesn't get debug symbols for any of its methods #22226
Do you know if it would be easy enough to track which spans are associated with generic functions and which aren't? It'd be nice to just cut down on the metadata size slightly (4 MB for libcore is pretty big).
I'll see what I can do to omit span data for types and other stuff that is not used later anyway. I'll also experiment with encoding spans in LEB128 and see if that helps in addition to the LZ77 encoding we are doing anyway. I'll be on vacation for the rest of the week, however, so please don't expect any updates before next week.
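For readers unfamiliar with LEB128, here is a minimal sketch of how a span could be written in that format. It is not the code from this pull request: the function names (`write_uleb128`, `read_uleb128`, `encode_span`, `decode_span`) are hypothetical, and storing a span as `lo` followed by its length is an assumption made for illustration only.

```rust
/// Appends `value` to `out` as unsigned LEB128: seven payload bits per byte,
/// with the high bit set on every byte except the last one.
pub fn write_uleb128(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Reads one unsigned LEB128 value from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or the value does not terminate within five bytes.
pub fn read_uleb128(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= ((byte & 0x7f) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

/// Hypothetical span layout: the start position followed by the span length.
/// Lengths are usually small, so they tend to fit in a single byte.
pub fn encode_span(out: &mut Vec<u8>, lo: u32, hi: u32) {
    assert!(lo <= hi, "span end must not precede span start");
    write_uleb128(out, lo);
    write_uleb128(out, hi - lo);
}

/// Inverse of `encode_span`; returns `(lo, hi)`.
pub fn decode_span(bytes: &[u8]) -> Option<(u32, u32)> {
    let (lo, used) = read_uleb128(bytes)?;
    let (len, _) = read_uleb128(&bytes[used..])?;
    Some((lo, lo.checked_add(len)?))
}
```

With this layout, a span from byte 1000 to byte 1010 takes three bytes instead of the eight needed for two fixed-width `u32` values. Whether that still helps once the general-purpose compression has run is exactly the open question raised above.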
Needs a rebase, sorry.
So, I've been investigating different options for reducing space usage for spans, but I haven't been very successful.
A possibly acceptable intermediate solution is to store spans as one [...]
The real problem, though, is how metadata is encoded in general (an issue already recorded in #21482). It's super-verbose and I doubt that all this tagging really has much of a benefit. Coming up with a better data format will be necessary sooner or later.
Wow that is... depressing. It's not necessarily the end of the world to inflate metadata size right now, it's just something to think about :). If you've got a smaller format working though, that sounds great!
That's one way to see it. I see the current metadata encoding as a lustrous garden of low-hanging fruit, overripe and ready for the picking. How often does one get the chance to shave off a third of the space consumption at no cost at all? We'll have to pin down the requirements for the metadata format, though, before the feast can start.
michaelwoerister force-pushed the michaelwoerister:cross-crate-spans branch 6 times, most recently from 15d0206 to 855fe9f Feb 26, 2015
OK, this is updated now. Numbers are better than before.
Ok, this is all looking good to me, nice work @michaelwoerister! I'd like a second set of eyes on this, however.
rust-highfive assigned nrc Feb 27, 2015
I need to give this a proper read, but I think the idea is good. One question: what about macro expansions? I guess we don't use them in debug info? Is it ever possible we'd want to have this information cross-crate?
Yes, macro expansion information is not used in debuginfo. DWARF has some provisions for recording C-macro definitions (not expansions) but that's not even supported by LLVM. I guess this won't be a topic for us in the foreseeable future. About having this information cross-crate in general: It would be easy to implement, I think, but it costs additional space and at the moment I don't know of a use case. For now, I'd just leave it out.
nrc reviewed Mar 1, 2015
@@ -1564,3 +1563,19 @@ pub fn is_default_trait<'tcx>(cdata: Cmd, id: ast::NodeId) -> bool {
        _ => false
    }
}

pub fn get_codemap(metadata: &[u8]) -> Vec<codemap::FileMap> {
nrc reviewed Mar 1, 2015
/// within the local crate's codemap. `creader::import_codemap()` will
/// already have allocated any additionally needed FileMaps in the local
/// codemap.
pub fn tr_span(&self, span: Span) -> Span {
nrc (Member) commented Mar 1, 2015
It would be good to s/tr_span/translate_span. If this is a huge amount of work, don't worry about it (I see it was called this before).
nrc reviewed Mar 1, 2015
codemap::DUMMY_SP // FIXME (#1972): handle span properly

/// Translates a `Span` from an extern crate to the corresponding `Span`
/// within the local crate's codemap. `creader::import_codemap()` will
nrc (Member) commented Mar 1, 2015
This 'will' is a bit worrying to me: whose responsibility is it to ensure this invariant? Is it possible to cause an error by using this function without establishing that invariant?
nrc (Member) commented Mar 1, 2015
I assume this is just a nit about the comment. I don't expect you'll need to change any code, just clarify the comment.
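To make the invariant question concrete, here is a rough sketch of what span translation could look like. It is not the code from this pull request: `Span`, `DUMMY_SP`, `ImportedFileMap` and `translate_span` are simplified stand-ins written for this example, and the choice to fall back to a dummy span when no imported file covers the input is an assumption about one way to stay safe if the import step has not run.

```rust
/// Simplified stand-in for a source span: byte offsets into a codemap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
}

/// Placeholder returned when a span cannot be translated.
pub const DUMMY_SP: Span = Span { lo: 0, hi: 0 };

/// Hypothetical record created when an extern crate's file map is imported:
/// the range the file occupied in the extern codemap, and where the copy of
/// that file starts in the local codemap.
pub struct ImportedFileMap {
    pub original_start: u32,
    pub original_end: u32,
    pub local_start: u32,
}

/// Translates `span` from the extern crate's codemap into the local one.
/// `imported` must be sorted by `original_start`. If no imported file covers
/// the whole span (for example because nothing was imported), the function
/// returns `DUMMY_SP` instead of producing a bogus position.
pub fn translate_span(imported: &[ImportedFileMap], span: Span) -> Span {
    if span.lo > span.hi {
        return DUMMY_SP;
    }
    // Index of the first file that starts after `span.lo`; the candidate
    // file is the one just before it.
    let idx = imported.partition_point(|fm| fm.original_start <= span.lo);
    if idx == 0 {
        return DUMMY_SP;
    }
    let fm = &imported[idx - 1];
    if span.hi > fm.original_end {
        return DUMMY_SP;
    }
    Span {
        lo: span.lo - fm.original_start + fm.local_start,
        hi: span.hi - fm.original_start + fm.local_start,
    }
}
```

In this sketch the caller is still responsible for importing the file maps first, but forgetting to do so degrades to missing line information rather than an incorrect span.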
nrc reviewed Mar 1, 2015
@@ -144,7 +151,10 @@ pub fn decode_inlined_item<'tcx>(cdata: &cstore::crate_metadata,
     cdata: cdata,
     tcx: tcx,
     from_id_range: from_id_range,
-    to_id_range: to_id_range
+    to_id_range: to_id_range,
+    translate_spans: cdata.codemap_import_info.len() > 0 &&
nrc (Member) commented Mar 1, 2015
Would it be harmful to always translate spans? I don't understand why the first predicate would be false: is this for backwards compatibility with existing crates? If we do the translate work and it is unnecessary (second predicate), is that a real waste of resources? Does it actually introduce an error?
In general, I like to avoid flags like this. I feel the increased scope for error outweighs the performance benefit. But if it is necessary, then leave it.
nrc reviewed Mar 1, 2015
// the lines list is sorted and individual lines are
// probably not that long. Because of that we can store lines
// as a difference list, using as little space as possible
// for the differences.
nrc reviewed Mar 1, 2015
let mut lines = Vec::with_capacity(num_lines as usize);

if num_lines > 0 {
    // read the number of bytes used per diff
nrc (Member) commented Mar 1, 2015
Nit^2: Comments should be proper sentences: start with a capital, end with a full stop (sorry for nittiness).
r+ with the nits addressed
Thanks for the review. I'll work in your comments on Wednesday. Don't worry about nittiness.
The space optimization for filemap may have to be revisited after #22971 lands. The relevant commit is fe73d38, which makes the tag represent the size of following integers. If you keep individual tags in the filemap then you don't need separate [...]
michaelwoerister force-pushed the michaelwoerister:cross-crate-spans branch from 855fe9f to 2f88655 Mar 4, 2015
I think I've addressed all comments. The encoding of the line data in filemaps is not optimal yet. I started implementing a better version using [...]
@bors r=nrc
michaelwoerister commented Feb 12, 2015
This makes it possible to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just the filename, positions of line beginnings, etc., but not the actual source code itself.
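As a rough illustration of why the line-beginning positions are enough for debuginfo, the sketch below maps a byte position to a line number without any source text. `FileMapInfo` and `lookup_line` are hypothetical names for this example and do not reflect the actual `codemap::FileMap` definition.

```rust
/// Hypothetical, stripped-down view of what gets stored per source file:
/// no source text, only the name and position bookkeeping.
pub struct FileMapInfo {
    pub name: String,
    /// Position of the first byte of this file within the codemap.
    pub start_pos: u32,
    /// Position just past the last byte of this file.
    pub end_pos: u32,
    /// Sorted positions at which each line begins.
    pub lines: Vec<u32>,
}

impl FileMapInfo {
    /// Returns the 1-based line number containing `pos`, or `None` if `pos`
    /// lies outside this file or before its first recorded line start.
    pub fn lookup_line(&self, pos: u32) -> Option<usize> {
        if pos < self.start_pos || pos > self.end_pos {
            return None;
        }
        // Number of line starts at or before `pos` is the line number.
        let count = self.lines.partition_point(|&start| start <= pos);
        if count == 0 {
            None
        } else {
            Some(count)
        }
    }
}
```

A debuginfo emitter that has a translated span only needs a lookup of this kind plus the file name to attach a line number to an instruction.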
Crate metadata size is increased by this change because spans in the encoded ASTs take up space now:
This only affects binaries containing metadata (rlibs and dylibs); executables should not be affected in size.
Fixes #19228 and probably #22226.