Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exploration: supported languages as "singleton values" #52

Merged
merged 3 commits into from
Jul 13, 2023

Conversation

helixbass
Copy link
Owner

In this PR:

  • explore the idea of using fixed-size collections to represent the "all supported languages" collection data structures

Based on more-languages

#[derive(Copy, Clone, Debug, Eq, PartialEq, ValueEnum, Hash)]
pub enum SupportedLanguageName {
#[derive(Copy, Clone, Debug, Eq, PartialEq, ValueEnum)]
pub enum SupportedLanguage {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so probably the idea would be that most of the stuff in this file would get generated by some kind of macro "spec"

SupportedLanguage is kind of a "token", just a simple enum that gets used as an index into all of the "all of the supported languages" collections

.unwrap()
}
}
pub type BySupportedLanguage<T> = [T; 22];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use fixed-size arrays as the actual data structure for the "all supported languages" collections

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea for (proc?) macro for generating this stuff:

enumerated_collection! {
  name => "SupportedLanguage",
  variants => [
    "Rust",
    ...
  ]
}

would generate:

  • the SupportedLanguage "token" enum type (I guess with #[derive(clap::ValueEnum)] "baked-in"?)
  • the BySupportedLanguage type alias
  • the Index implementation
  • the From<usize> implementation
  • the ALL_SUPPORTED_LANGUAGES static collection
  • a by_supported_language!() proc macro (to be used for "not messing up" the definitions of the languages/names for ignore select collections) that could be invoked like:
pub static ALL_SUPPORTED_LANGUAGES: BySupportedLanguage<SupportedLanguage> = by_supported_language! {
  Rust => tree_sitter_rust::language(),
  ...
};

(and it would know the right order for those enum variants in the generated array literal regardless of what order you wrote the variants in, and it would fail if you didn't supply exactly the right expected variants)

Actually wait you probably can't generate proc macros from proc macros (at least not that are then able to be "used immediately after") can you?

Ok so maybe by_supported_language!() would be implemented as a declarative macro that just passes along its argument (as a token tree?) + a hard-coded list of the expected enum variants (in the order they should be in the generated array) to an existing proc macro eg by_enumerated_collection!()?

pub name: SupportedLanguageName,
pub name_for_ignore_select: &'static str,
fn index(&self, index: SupportedLanguage) -> &Self::Output {
&self[index as usize]
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically encapsulate some enum <-> usize massaging

value if value == Self::Json as usize => Self::Json,
value if value == Self::Css as usize => Self::Css,
value if value == Self::Lua as usize => Self::Lua,
_ => unreachable!(),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously this is not technically true (that no usize argument could trigger this code path) but I think we'd be ok panicking if we were trying to convert a non-"known good" usize back into the enum, so using From (instead of eg TryFrom) seems more ergonomically desirable

tree_sitter_query::language(),
tree_sitter_json::language(),
tree_sitter_css::language(),
tree_sitter_lua::language(),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would obviously be sketchy as far as not messing up the order if the idea wasn't that it would be macro-generated by something where "the relevant things are grouped together"

I guess I decided to make these + the name_for_ignore_select's their own separate data structures (vs in the existing code they are fields on the SupportedLanguage struct) since that seemed like the simplest way to "free up" SupportedLanguage to just be a simple "token" enum

But might be interesting to explore how you'd represent the "main type that's getting passed around" (eg SupportedLanguage) as something with a uniform set of fields too

.get(&language.name)
.unwrap()
.get_or_init(|| maybe_get_query(query_source, language.language).map(Arc::new))
self.0[language]
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess this was one of the goals, getting rid of the "fallible lookup" (ie the .unwrap())

.filter(|(_, value)| value.get().is_some())
.map(|(index, value)| -> (SupportedLanguage, _) { (index.into(), value) })
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usize <-> enum massaging

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: maybe (eg maybe via a newtype wrapper on BySupportedLanguage?) BySupportedLanguage should expose more of a "HashMap-style iteration"?

Since conceptually it is a key-value collection?

Here that would avoid the manual .enumerate() + usize-massaging, which feels like boilerplate

} else {
for language in ALL_SUPPORTED_LANGUAGES.iter() {
types_builder.select(language.name_for_ignore_select);
for name_for_ignore_select in SUPPORTED_LANGUAGE_NAMES_FOR_IGNORE_SELECT.iter() {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of interesting that with the approach of "moving all of these other fields into their own separate fixed-size data structures" now this usage pattern becomes an iteration over "the data structure that just has that one relevant thing in it"

name: SupportedLanguageName::Lua,
name_for_ignore_select: "lua",
},
pub static ALL_SUPPORTED_LANGUAGES: BySupportedLanguage<SupportedLanguage> = {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was able to be truly static...

]
};

static SUPPORTED_LANGUAGE_LANGUAGES: Lazy<BySupportedLanguage<Language>> = Lazy::new(|| {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...but this one needed a Lazy wrapper since I guess tree_sitter::Language's aren't const?

Copy link
Collaborator

@peterstuart peterstuart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this seems like a good approach (assuming that some macros were making it all “safe”).

.filter(|(_, value)| value.get().is_some())
.map(|(index, value)| -> (SupportedLanguage, _) { (index.into(), value) })
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: maybe (eg maybe via a newtype wrapper on BySupportedLanguage?) BySupportedLanguage should expose more of a "HashMap-style iteration"?

Since conceptually it is a key-value collection?

Here that would avoid the manual .enumerate() + usize-massaging, which feels like boilerplate

@@ -222,7 +214,7 @@ pub fn run(args: Args) {
let query_context = QueryContext::new(
query,
capture_index,
language.language,
get_supported_language_language(language),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: use .into() instead for "type-unique" "mappings" from the token type (eg here SupportedLanguage -> tree_sitter::Language)?

Or actually per other idea, this could just be a (manually-implemented) .language() method on SupportedLanguage?

pub static ALL_SUPPORTED_LANGUAGES_BY_NAME_FOR_IGNORE_SELECT: Lazy<
HashMap<&'static str, SupportedLanguage>,
> = Lazy::new(|| {
ALL_SUPPORTED_LANGUAGES
.iter()
.map(|supported_language| {
(
supported_language.name_for_ignore_select,
get_supported_language_name_for_ignore_select(*supported_language),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: just because with the current approach the "all supported languages collection" for "names for ignore select" is a separate "top-level" data structure doesn't mean that this mapping has to be a top-level function, could just manually add an impl block for SupportedLanguage that exposes an eg name_for_ignore_select() method (whose body does a lookup of self in the "all supported languages collection" for "names for ignore select")?

.unwrap()
}
}
pub type BySupportedLanguage<T> = [T; 22];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea for (proc?) macro for generating this stuff:

enumerated_collection! {
  name => "SupportedLanguage",
  variants => [
    "Rust",
    ...
  ]
}

would generate:

  • the SupportedLanguage "token" enum type (I guess with #[derive(clap::ValueEnum)] "baked-in"?)
  • the BySupportedLanguage type alias
  • the Index implementation
  • the From<usize> implementation
  • the ALL_SUPPORTED_LANGUAGES static collection
  • a by_supported_language!() proc macro (to be used for "not messing up" the definitions of the languages/names for ignore select collections) that could be invoked like:
pub static ALL_SUPPORTED_LANGUAGES: BySupportedLanguage<SupportedLanguage> = by_supported_language! {
  Rust => tree_sitter_rust::language(),
  ...
};

(and it would know the right order for those enum variants in the generated array literal regardless of what order you wrote the variants in, and it would fail if you didn't supply exactly the right expected variants)

Actually wait you probably can't generate proc macros from proc macros (at least not that are then able to be "used immediately after") can you?

Ok so maybe by_supported_language!() would be implemented as a declarative macro that just passes along its argument (as a token tree?) + a hard-coded list of the expected enum variants (in the order they should be in the generated array) to an existing proc macro eg by_enumerated_collection!()?

@helixbass helixbass changed the base branch from more-languages to split-tests-file July 12, 2023 10:41
Base automatically changed from split-tests-file to master July 12, 2023 11:28
@helixbass helixbass merged commit 2cf848b into master Jul 13, 2023
2 checks passed
@helixbass helixbass deleted the supported-languages-fixed-size branch July 13, 2023 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants