Add SourceGraph to Oak database#1214
Conversation
b405165 to
f9880bc
Compare
| #[derive(Clone, Debug, PartialEq, Eq, Hash)] | ||
| pub enum PackageOrigin { | ||
| Workspace { root: PathBuf }, | ||
| Installed { version: String, libpath: PathBuf }, |
There was a problem hiding this comment.
Is the idea that version is required to make a unique identifier in case the package is reinstalled?
There was a problem hiding this comment.
hmm I don't think so, we'll need to depend on all package metadata and R files.
Let me delete both version and libpath here, we don't use either of those in future PRs.
| pub fn new() -> Self { | ||
| Self::default() | ||
| let db = Self::default(); | ||
| oak_db::SourceGraph::new(&db, vec![], vec![], vec![]); |
There was a problem hiding this comment.
For an example of this odd pattern, see ty's Program::from_settings() in ProjectDatabase::new(), where Program is a singleton as well
| let db = Self::default(); | ||
| oak_db::SourceGraph::new(&db, vec![], vec![], vec![]); | ||
| db | ||
| } |
There was a problem hiding this comment.
I see that ty sets .durability(Durability::HIGH) on their Program singleton, but that's presumably because it is rare that it ever changes
Our SourceGraph on the other hand is likely to change quite a bit, I think
There was a problem hiding this comment.
As a whole yes. Down the line we'll want to set high durability on installed packages, and low durability on scripts and workspace packages.
| fn source_graph(&self) -> SourceGraph { | ||
| SourceGraph::get(self) | ||
| } |
There was a problem hiding this comment.
This might be splitting hairs a bit, but, just to confirm, you want to inflict a SourceGraph on all salsa Databases we might ever want to create?
Like, IIUC, ty has an inheritance-like approach where they have this base db
/// Most basic database that gives access to files, the host system, source code, and parsed AST.
pub trait Db: salsa::Database {
fn vendored(&self) -> &VendoredFileSystem;
fn system(&self) -> &dyn System;
fn files(&self) -> &Files;
fn python_version(&self) -> PythonVersion;
}they reference this as SourceDb throughout the project, i.e. all databases should be able to provide file sources and ASTs
then further up they have something they refer to as a SemanticDb, which inherits from SemanticDb -> ModuleResolverDb -> SourceDb, but is noted as
/// Database giving access to semantic information about a Python program.
it looks like this with these extra methods
// SemanticDb
pub trait Db: ModuleResolverDb {
/// Returns `true` if the file should be checked.
fn should_check_file(&self, file: File) -> bool;
/// Resolves the rule selection for a given file.
fn rule_selection(&self, file: File) -> &RuleSelection;
fn lint_registry(&self) -> &LintRegistry;
fn analysis_settings(&self, file: File) -> &AnalysisSettings;
/// Whether ty is running with logging verbosity INFO or higher (`-v` or more).
fn verbose(&self) -> bool;
}
// ModuleResolverDb
pub trait Db: SourceDb {
/// Returns the search paths for module resolution.
fn search_paths(&self) -> &SearchPaths;
}And, for example, this search_paths() method is where they use their singleton Program::get()
#[salsa::db]
impl ty_module_resolver::Db for ProjectDatabase {
fn search_paths(&self) -> &SearchPaths {
Program::get(self).search_paths(self)
}
}So I just wondered if we should at least have this base Db with no trait methods for now (it will probably have files() eventually) and then a SemanticDb that has source_graph()
There was a problem hiding this comment.
I'm not sure I see the need for thinking about abstraction now?
I also don't think we "inflict" anything with the source graph. Are you concerned about more verbose tests? The other side of the coin is that it makes them more realistic.
For context, I can see the semantic index being useful on its own, independent of oak_db. So we should be careful to keep it that way (e.g. the ImportResolver coming up is generic) But as soon as cross-file resolution is needed, we need the full DB and its graph.
| #[salsa::interned] | ||
| pub struct Name<'db> { | ||
| #[returns(ref)] | ||
| pub text: String, |
There was a problem hiding this comment.
Might be interesting to consider CompactString used by ty at some point
There was a problem hiding this comment.
Good idea, that's a Salsa feature. I've just switched.
f9880bc to
e797f3e
Compare
e797f3e to
eedc2cf
Compare
eedc2cf to
78e798c
Compare
Branched from #1213
Progress towards #1212
Progress towards #1183
Just simple data structures for now.
The source graph contains all workspace scripts, workspace packages, and installed packages. The edges are encoded differently for
ScriptandPackagenodes. In scripts they are populated fromlibrary()or::occurrences, stored in the file's semantic index. For packages, the package's metadata in thenamespacefield defines edges.