Bug 1493527 - Optimize analysis merging #154
Conversation
Mostly nits, thanks for doing this!
tools/src/format.rs
@@ -138,7 +134,7 @@ pub fn format_code(jumps: &HashMap<String, Jump>, format: FormatAs,
     let items = d.iter().map(|item| {
         let mut obj = json::Object::new();
         obj.insert("pretty".to_string(), Json::String(item.pretty.clone()));
-        obj.insert("sym".to_string(), Json::String(item.sym.clone()));
+        obj.insert("sym".to_string(), Json::String(item.sym.clone().join(",")));
I think the clone() before join() can go away.
tools/src/file_format/analysis.rs
@@ -13,6 +14,12 @@ pub struct Location {
    pub col_end: u32,
}

impl fmt::Display for Location {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
nit: I think the <'_> can be omitted in these impls.
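A toy sketch of the reviewer's point (hypothetical struct, not the one from the patch): the anonymous lifetime on fmt::Formatter can be elided.

```rust
use std::fmt;

// Hypothetical struct for illustration only.
struct Location {
    row: u32,
    col: u32,
}

impl fmt::Display for Location {
    // `&mut fmt::Formatter` with no `<'_>` compiles fine; the lifetime is elided.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}:{}", self.row, self.col)
    }
}

fn main() {
    assert_eq!(format!("{}", Location { row: 3, col: 7 }), "3:7");
}
```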
tools/src/file_format/analysis.rs
    Err(_) => return vec![],
};
let reader = BufReader::new(&file);
read_analyses(vec![ filename ], filter)
nit: I don't think spaces are put after [ / before ].
tools/src/file_format/analysis.rs
    read_analyses(vec![ filename ], filter)
}

pub fn read_analyses<T>(filenames: Vec<&str>, filter: &Fn(&Object) -> Option<T>) -> Vec<WithLocation<Vec<T>>> {
nit: Does this need to take a Vec? Or would just &[&str] suffice?
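A minimal sketch of the suggestion, with a stand-in body and no filter parameter: taking a slice instead of a Vec means callers don't have to allocate or give up ownership.

```rust
// Hypothetical simplified signature; the real function also takes a filter
// callback and parses analysis files.
fn read_analyses(filenames: &[&str]) -> usize {
    // Stand-in body: the real code would open and parse each file.
    filenames.len()
}

fn main() {
    let owned = vec!["a.json", "b.json"];
    assert_eq!(read_analyses(&owned), 2); // &Vec<&str> coerces to &[&str]
    assert_eq!(read_analyses(&["c.json"]), 1); // array references work too
}
```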
tools/src/file_format/analysis.rs
/// Also asserts that the `pretty` and `no_crossref` fields are
/// the same because otherwise the merge doesn't really make sense.
pub fn merge(&mut self, mut other: AnalysisSource) {
    assert!(self.pretty == other.pretty);
nit: assert_eq! gives you a better error message.
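A toy illustration of the difference: on failure, assert! only reports the stringified condition, while assert_eq! also prints the left and right values.

```rust
fn main() {
    let expected = "foo";
    let actual = "foo";
    // On failure, assert! would report only `assertion failed: expected == actual`;
    // assert_eq! would additionally print both values, which is the better message.
    assert!(expected == actual);
    assert_eq!(expected, actual);
}
```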
if let Some(tgt) = read_target(obj) {
    let loc = parse_location(obj.get("loc").unwrap().as_string().unwrap());
    let target_str = format!("{}", WithLocation{ data: tgt, loc: loc });
    if !unique_targets.contains(&target_str) {
nit: You could use the return value of insert, though I guess that means that you need to clone it... Not sure what's better, probably can leave as-is, but I just thought I'd point it out.
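A sketch of the pattern the reviewer is pointing at, with a hypothetical helper name: HashSet::insert returns true only when the value was newly added, so a separate contains() check can be folded into the insert call.

```rust
use std::collections::HashSet;

// Hypothetical helper for illustration; the real code inlines this logic.
fn add_if_new(seen: &mut HashSet<String>, target_str: &str) -> bool {
    // insert() returns true if the set did not already contain the value.
    seen.insert(target_str.to_string())
}

fn main() {
    let mut unique_targets = HashSet::new();
    assert!(add_if_new(&mut unique_targets, "a:1"));  // first time: inserted
    assert!(!add_if_new(&mut unique_targets, "a:1")); // duplicate: rejected
}
```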
tools/src/bin/merge-analyses.rs
// hashset to deduplicate them.
if let Some(tgt) = read_target(obj) {
    let loc = parse_location(obj.get("loc").unwrap().as_string().unwrap());
    let target_str = format!("{}", WithLocation{ data: tgt, loc: loc });
nit: space after WithLocation, and can avoid loc: loc and just use loc.
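A toy example of the field-init shorthand the reviewer mentions (simplified, non-generic struct): when a local variable has the same name as the field, `loc` replaces `loc: loc`.

```rust
// Simplified stand-in for the patch's generic WithLocation<T>.
struct WithLocation {
    data: u32,
    loc: u32,
}

fn main() {
    let loc = 5;
    let w = WithLocation { data: 1, loc }; // shorthand for `loc: loc`
    assert_eq!(w.loc, 5);
    assert_eq!(w.data, 1);
}
```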
if let Some(e) = last_entry {
    loc_data.data.push(e);
}
print!("{}", loc_data);
In retrospect, a new trait for this could've been better (maybe most of them could be written using serde annotations), but it's not a big deal in any case.
The analysis data structures are modified slightly to bring them more in line with what the C++ indexer emits and the documented format of the analysis data. This consists of a few changes. The first is a trivial change that reorders some of the fields in the "source" analysis struct to match that of the C++ indexer. This doesn't actually matter but I'm OCD like that. The next is to use a vec of strings to represent the comma-separated list of symbols allowed in source analysis lines, rather than keeping it as a single string. Making this change exposed a defect in the output file formatter, where a token may have been co-highlighted improperly because it was keyed on a list of symbols instead of a single symbol; the change to format.rs should fix that by picking a single symbol. The final change is to sort and deduplicate the `syntax` and `sym` fields in the source analysis files, since conceptually these properties are sets rather than lists. Sorting them also helps produce the normalized output format that the next patch relies on.
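A minimal sketch of the sort-and-deduplicate step described above (hypothetical helper name; the real code works on struct fields rather than raw strings):

```rust
// Hypothetical helper: a comma-separated `sym` string becomes a sorted,
// deduplicated Vec<String>, treating the field as a set rather than a list.
fn normalize_syms(raw: &str) -> Vec<String> {
    let mut syms: Vec<String> = raw.split(',').map(|s| s.to_string()).collect();
    syms.sort();
    syms.dedup(); // dedup only removes adjacent duplicates, hence the sort first
    syms
}

fn main() {
    assert_eq!(normalize_syms("b,a,b"), vec!["a", "b"]);
}
```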
This implements the Display trait on some structures so that they can easily be JSON-ified, with the output format being normalized to what the C++ indexer emits. "Normalized" means given any set of equivalent input analysis lines, the output should always be the same - ordering of properties in the emitted JSON, ordering of tokens with property values that are comma-separated, omission of optional properties, formatting of numbers, etc. are all made consistent. This allows us to do cheap deduplication of equivalent lines, as well as just being easier to debug.
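The normalization idea can be sketched like this (field names taken from the diff above, output format hypothetical): Display writes every field in a fixed order, so equivalent locations always serialize to byte-identical strings.

```rust
use std::fmt;

struct Location {
    lineno: u32,
    col_start: u32,
    col_end: u32,
}

impl fmt::Display for Location {
    // Fixed field order and formatting; the actual emitted format differs.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}:{}-{}", self.lineno, self.col_start, self.col_end)
    }
}

fn main() {
    let a = Location { lineno: 12, col_start: 4, col_end: 9 };
    let b = Location { lineno: 12, col_start: 4, col_end: 9 };
    // Byte-identical strings make cheap string-level deduplication possible.
    assert_eq!(format!("{}", a), format!("{}", b));
    assert_eq!(format!("{}", a), "12:4-9");
}
```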
This modifies the existing read_analysis function to take a list of files instead of just a single file, and to read the analysis data from all of those files. The function is renamed to read_analyses to reflect the change, and a read_analysis function with the old signature is added back that delegates to the multiple-file version. When viewing this patch, view the ignore-whitespace diff, as most of the patch is just a change in indentation.
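The delegation pattern described above looks roughly like this (types heavily simplified; the real signatures also take a filter callback and return parsed analysis structures):

```rust
// Multi-file entry point, matching the Vec<&str> signature from the diff.
fn read_analyses(filenames: Vec<&str>) -> Vec<String> {
    // Stand-in body: the real code opens and parses each file.
    filenames.iter().map(|f| format!("parsed {}", f)).collect()
}

// Single-file wrapper with the old shape, delegating to the multi-file version.
fn read_analysis(filename: &str) -> Vec<String> {
    read_analyses(vec![filename])
}

fn main() {
    assert_eq!(read_analysis("x.json"), vec!["parsed x.json"]);
}
```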
This will be needed in the next patch, where we pass in a lambda that needs to modify captured state.
This rewrites the merge-analyses.py script in Rust. There is a minor semantic difference in the Rust version in that it automatically deduplicates the target lines, while the Python version does not. This makes it unnecessary for the calling script to pipe the output through `| sort | uniq` to deduplicate that. Logically this deduplication step belongs with the merging tool since the duplicate lines should only appear because of the need to handle multiple input files. The output from the Rust version is also better normalized to a consistent format (with respect to ordering of properties, etc. within a given line). While the Python version preserved the order of properties from the input files, the Rust version takes it a step further and normalizes them to a known order.
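The built-in deduplication that replaces the `| sort | uniq` pipeline can be sketched as follows (hypothetical helper operating on plain strings; the real tool works on normalized target lines):

```rust
use std::collections::HashSet;

// Each normalized target line is emitted only the first time it is seen,
// preserving first-occurrence order.
fn dedup_target_lines(lines: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for line in lines {
        if seen.insert(line.to_string()) {
            out.push(line.to_string());
        }
    }
    out
}

fn main() {
    let merged = dedup_target_lines(&["t1", "t2", "t1", "t2"]);
    assert_eq!(merged, vec!["t1", "t2"]);
}
```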
Thanks for the quick review! Addressed the nits, except I decided to leave the contains() check as-is since I feel it's slightly more readable.