Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract language definitions into data file #699

Merged
merged 40 commits into from
Jul 9, 2022
Merged

Extract language definitions into data file #699

merged 40 commits into from
Jul 9, 2022

Conversation

spenserblack
Copy link
Collaborator

@spenserblack spenserblack commented Jul 5, 2022

This uses a templating tool, Tera, to read the language data and generate the appropriate Rust code,
instead of using a macro. Some of the benefits of this:

  • A familiar format that makes it easy for users to contribute (github-linguist also uses YAML).
  • The templating language allows more enforcement at compile time instead of runtime. For example,
    matching basic and true color lengths are now enforced with a compiler macro instead of a test runtime.
  • Other tools will be able to read this data without onefetch needing to export it with a public API.

There are some unnecessary commits in this PR: making a temporary onefetch lib implementation,
making a temp binary to export onefetch data to JSON, etc. That doesn't need to stay in this PR
and can be rebased away, but I thought it was useful to preserve the history in some form.

For #696

To do

  • Finish redefining the languages' colors
  • Update CONTRIBUTING.md

@spenserblack spenserblack marked this pull request as ready for review July 5, 2022 18:34
@spenserblack spenserblack requested a review from o2sh as a code owner July 5, 2022 18:34
@spenserblack
Copy link
Collaborator Author

There will likely be more changes needed, but I think this is at least ready for an initial review.

@spenserblack
Copy link
Collaborator Author

These types of things always make we wonder if the data files should be discoverable by linguist, since in this case it's used in code generation.

@spenserblack
Copy link
Collaborator Author

I think we're going to need to add some integration tests at some point. This PR shouldn't change any behavior at all, but right we basically ensure that by checking if it the output looks good. I automated the bulk of this with a quick, lazy, buggy script, so I don't have 100% confidence that there weren't some unintended changes to logos. It definitely mangled the Haskell logo, which I had to fix.

languages.yaml Outdated
- [255, 255, 255]
- [0, 24, 201]
- [12, 10, 124]
chip: [2, 248, 140]
Copy link
Owner

@o2sh o2sh Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
chip: [2, 248, 140]
chip: Rgb[2, 248, 140]

We should maybe keep the Rgb prefix everywhere to make it more self explanatory and obvious that we're talking about colors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that valid YAML? AFAIK that would deserialize into a string, which would require some additional parsing in addition to deserialization in build.rs.
It could perhaps be an object, like

chip:
  r: 2
  g: 248
  b: 140

but I was trying to reduce the length of the config while maintaining human readability. Or maybe change the key to chip-rgb.

Copy link
Owner

@o2sh o2sh Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes or chip-color. Too bad owo_colors doesn't support hex colors 🤔

Copy link
Collaborator Author

@spenserblack spenserblack Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could register a hex-to-rgb filter in the build script.
What do you think about this?

ansi:
  - red
  - green
  - blue
rgb: # maybe another name, "hex" or "true" or "fullcolor"?
  - '#FF0000' # mainly just for consistency with chip, so we aren't mixing RGB and hex
  - '#00FF00'
  - '#0000FF'
chip: '#02F88C' # sadly # is a comment, so it needs quotes -_-

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! we're getting closer to the GitHub linguist format: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml

hex sounds good to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@o2sh done 👍 9f02ad7

languages.yaml Outdated Show resolved Hide resolved
templates/language.rs Outdated Show resolved Hide resolved
Comment on lines 100 to 129
{% for language, attrs in languages -%}
{% if attrs.colors.rgb %}
{% set ansi_length = attrs.colors.ansi | length %}
{% set rgb_length = attrs.colors.rgb | length %}
{% if ansi_length != rgb_length %}
compile_error!("{{ language }}: ansi and rgb colors must be the same length");
{% endif %}
{% endif %}
{% endfor %}

{% set max_width = 40 %}
{# NOTE Permitting trailing newline #}
{% set max_height = 26 %}


{% for language, attrs in languages -%}
{% set lines = attrs.ascii | split(pat="\n") %}
{% set height = lines | length %}
{% if height > max_height %}
compile_error!("{{ language }}: ascii art must have {{ max_height - 1 }} or less lines, has {{ height }}");
{% endif %}

{% for line in lines %}
{% set cleaned_line = line | strip_color_indices %}
{% set width = cleaned_line | length %}
{% if width > max_width %}
compile_error!("{{ language }}: ascii art line {{ loop.index }} must be {{ max_width }} or less characters wide");
{% endif %}
{% endfor %}
{% endfor %}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why but if you look at the generated language.rs by tera in the /target/debut/onefetch-*/out folder, this part of the template results in thousands of empty lines.

Besisdes, IMO these should be unit tests not compilation errors. We should maybe find a way to put them back in the original language.rs file.

Copy link
Collaborator Author

@spenserblack spenserblack Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the empty lines are from the if statements never being entered. E.g.

{# 3 empty lines #}
{% for n in [1,2,3] %}
  {% if n > 4 %}println!("Hello!"){% endif %}
{% endfor %}

IMO we don't need to worry too much about how pretty the generated code is: generated code isn't really intended to be human-readable.

For compile errors vs tests, the benefit of compile errors is that you can't even run onefetch if the values are invalid. Additionally, IMO it's preferable to assert the validity of values at compile-time instead of a runtime (in this case the test runtime). IMO tests should really be for behavior, not validating internal values. As a reductionist example, it's kind of like doing this:

/// A number between 0 and 255
const SMALL_NUMBER: usize = make_number!();

#[test]
fn number_is_small() {
    assert!(SMALL_NUMBER < 256);
}

instead of this:

/// A number between 0 and 255
const SMALL_NUMBER: u8 = make_number!();

Making them compile errors also should result in a faster CI failure, since the presence of a compile_error! macro in the generated code will make compilation fail pretty quickly.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍

Yeah, the empty lines are from the if statements never being entered. E.g.

Maybe add some -% to reduce the clutter.

@o2sh
Copy link
Owner

o2sh commented Jul 7, 2022

Very impressive @spenserblack 😮
You must have broken the record of the biggest PR (number of files changed) on this repo 😄 🎉

I left a few comments, but I'll finish the review over the week end 👍

@o2sh
Copy link
Owner

o2sh commented Jul 8, 2022

I pushed a few changes (I hope you don't mind).

@spenserblack Do you think we can merge the PR as is? Or do we wait until we've added unit/integration tests?

@spenserblack
Copy link
Collaborator Author

I pushed a few changes (I hope you don't mind).

I don't mind at all, please and thanks!

Do you think we can merge the PR as is?

I'm definitely not opposed to adding some tests, but I looped through all of the languages and they all looked good. Looks like you already fixed the logos in commits like 3e759a8, thanks!


Given that the only thing that's "public" is the binary itself and its output to the terminal, I was thinking we could start with testing several variations of Printer. For this PR in particular, the biggest concern was breaking the logo output.

TBH I was thinking of putting off these types of tests until Hacktoberfest, because it would be pretty repetitive 😅 . IMO we'd want 2 tests for each ASCII logo, both basic and true color, to guarantee stability. At this point, I think the output is stable enough that we'd want tests to guarantee stability during refactors like this.

I'm focusing mainly on logos just because of this PR, but we'd probably want to test the stats output, too. And with that comes variations of the git repository being analyzed, so that would come with testing different repositories (either real repositories or by mocking gitoxide). We might also want to test the CLI itself. While working on #685 I was surprised by a few panics. Tests could ensure that our CLI that we put a lot of work in doesn't break with clap updates, or for any other reason.

So that was a lot of words to say that, because this is a binary with no library, if our tests guarantee stability of input via the CLI and output to the terminal, I think we're good.

@o2sh
Copy link
Owner

o2sh commented Jul 9, 2022

Glad to see you've already put some thoughts into it. IMO, we won't be able to skip on tests for much longer as more people seem to be playing with this tool.

Maybe we can merge this PR as is to be able to work on #696 and create a separate issue on that matter as a milestone for the next release.

@o2sh o2sh merged commit d4e6cda into o2sh:main Jul 9, 2022
@spenserblack spenserblack deleted the dev/json-languages branch July 9, 2022 13:15
@o2sh o2sh mentioned this pull request Jul 25, 2022
32 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants