-
Notifications
You must be signed in to change notification settings - Fork 155
Implement file database for indexing and querying #235
Conversation
ec20740 to
308a435
Compare
b31f519 to
05a73d2
Compare
Performance SummaryComparing base 6c96fb2 with head b83aea1 on microsoft/TypeScript@v4.9.5. For details see workflow artifacts. Note that performance is tested on the last commits with changes in BeforeAfter |
|
I am very surprised by the change in memory usage. I don't really know how to explain that 🤔 |
b6f0100 to
451ebec
Compare
Performance SummaryComparing base 6c96fb2 with head 67db2ae on microsoft/TypeScript@v4.9.5. For details see workflow artifacts. Note that performance is tested on the last commits with changes in BeforeAfter |
| source_root, | ||
| &source_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended that only one of these is a reference? Same for lines 128-129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! Both were passed by reference, but source_root was already a reference, while source_path was a value. However, Rick made a nice suggestion above that allows using source_path now.
rewinfrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have more questions than comments or suggestions 😄, but left a couple minor things for you to consider. I think the SQLite implementation and additional database commands look good to me.
| } | ||
|
|
||
| impl DatabaseArgs { | ||
| pub fn default_for_crate(crate_name: &str) -> anyhow::Result<PathBuf> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to define the Default trait for DatabaseArgs? That would mean callers could use DatabaseArgs::default().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is perhaps a bit confusing here, since it does not actually create or act upon DatabaseArgs values. It's an associated method to easily get a default database location for a given crate name. I'll move this to be a module level function instead, so that it's clearer that it does not use DatabaseArgs in any way.
| let mut seen_mark = false; | ||
| let mut db = SQLiteWriter::open(&db_path)?; | ||
| for source_path in &self.source_paths { | ||
| let source_path = source_path.canonicalize()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like source_path is used always as a reference. If that's the intent, could this be updated to let source_path = &source_path.canonicalize()?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about that, that works :). That's a bit cleaner!
I thought that the value created by canonicalize() would not live long enough if you immediately take the reference but don't capture the value itself as well. Here it is accepted, although I know sometimes it doesn't work. Now I'm not sure when it is or is not allowed 🤔.
| } | ||
| } | ||
| Displayer(self, graph) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner Displayer struct here to close over self and graph is pretty clever!
It makes me think that AssertionSource could hold the graph as an additional field, and I think would let you implement the Display trait for AssertionSource directly, but that further complicates the AssertionSource struct with the graph lifetime, so I'm not sure it'd be worth spilling the complexity across the code base where AssertionSource is used for the sake of a single trait implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be right, that we could capture the graph in the AssertionSource itself. Without having looked at all the current uses, I think this might make it a lot less flexible to use, though. As I recall, these are constructed from tests before the graph even exists (and their construction does not need a graph either). So capturing the graph earlier would restrict users quite a bit.
In one of the later following PRs I've dropped the use of AssertionSource in the CLI again, because it requires full LSP positions which require reading the source file to be calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked in the source, and it is not exactly as I said. The graph must already exist, because we store a Handle<File> in AssertionSource. But I think my point is still valid, because capturing the graph would e.g. not allow any &mut references as long as the AssertionSource exists. In other words, it would still force a user to have the graph fully constructed before creating AssertionSources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point that having a reference to the graph in AssertionSource would complicate &mut references elsewhere. I still think the current solution is 👍 and a nice way to handle this.
| pub(crate) scope_stack_precondition: PartialScopeStack, | ||
| pub(crate) scope_stack_postcondition: PartialScopeStack, | ||
| pub(crate) edges: PartialPathEdgeList, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL pub(crate) visibility specifier, that's very handy 👍
stack-graphs/src/stitching.rs
Outdated
| graph: &StackGraph, | ||
| partials: &mut PartialPaths, | ||
| db: &mut Database, | ||
| _db: &mut Database, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this parameter not needed now, or is this required for some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially it was there for consistency with the from_partial_paths method, which did use it. However, that was mostly because the constructors were already doing a bit of work. I've changed that at some point so that the constructor only sets up the stitcher, but doesn't do work until process_next_phase is called. Given that neither of the constructors use db anymore, I think we can remove it.
stack-graphs/src/storage.rs
Outdated
| Ok(()) | ||
| } | ||
|
|
||
| /// Add a partial path for a file to the database. Throws an error if the file does not exist in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: Could this comment be updated to say Returns an error if the file does not exist?
stack-graphs/src/storage.rs
Outdated
| } | ||
|
|
||
| /// Open a file database. If the file does not exist, it is automatically created. | ||
| /// An error is thrown if the database version is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: Could this be updated to say An error is returned ... instead of throws?
| "added path {} must start in given file {} or at root", | ||
| path.display(graph, partials), | ||
| graph[file].name() | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: when a function can panic, should that be indicated in the comment for the function? I'm not sure if there is a Rust convention for that, and this isn't intended to be a suggestion here, it's only a drive-by question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think you're right that it is a good idea to add that in the comment!
…te comments to reflect current behavior
Performance SummaryComparing base 6c96fb2 with head 2b9e261 on microsoft/TypeScript@v4.9.5. For details see workflow artifacts. Note that performance is tested on the last commits with changes in BeforeAfter |
Performance SummaryComparing base 6c96fb2 with head 8a1df24 on microsoft/TypeScript@v4.9.5. For details see workflow artifacts. Note that performance is tested on the last commits with changes in BeforeAfter |
|
@rewinfrey & @rnkaufman, thank you for taking a look at this. I think I addressed all your comments. I've included one more change, which changes the command-line syntax for queries. Before, a query was I've done this for two reasons:
|
rewinfrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this looks great to me, thank you for the additional context and the answers 😄
| } | ||
| } | ||
| Displayer(self, graph) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point that having a reference to the graph in AssertionSource would complicate &mut references elsewhere. I still think the current solution is 👍 and a nice way to handle this.
#248 should be merged as soon as possible after this one to prevent CI for other PRs based of
mainto be broken.This PR is part of the work to implement an LSP server (#242). This implements file storage for stack graphs and partial paths, allowing separate
indexandqueryinvocations.Changes
storagemodule in stack-graphs has the reader and writer types for the SQLite database we use. The reader supports piecemeal loading of paths into the runtime structures. It is guarded by thestorageCargo feature.indexcommand replaces theanalyzecommand. It takes a database argument where the results are stored.querycommand can be used to query the database. It currently has a single subcommanddefinitionto query for definitions.cleancommand can be used to clean data from the database. It either expects source paths, in which case all descendants of these paths are cleaned, or the--allflag, in which case the whole database is cleaned.--forceis passed).Demo
First index
Cached index
Forced index
Clean and re-index
Query definition