Extract language definitions into data file #699

spenserblack · 2022-07-05T16:03:37Z

This uses a templating tool, Tera, to read the language data and generate the appropriate Rust code,
instead of using a macro. Some of the benefits of this:

A familiar format that makes it easy for users to contribute (github-linguist also uses YAML).
The templating language allows more checks at compile time instead of at runtime. For example,
matching lengths of the basic and true color lists are now enforced with a compile_error! macro instead of at test runtime.
Other tools will be able to read this data without onefetch needing to export it with a public API.
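As a rough illustration of the build-script idea (not onefetch's actual build.rs, which renders a Tera template), the sketch below generates Rust source from in-memory language data using only the standard library. The `Language` struct, its fields, and `generate` are hypothetical. A real build script would write the returned string to a file under `OUT_DIR` and pull it in with `include!(concat!(env!("OUT_DIR"), "/language.rs"))`.

```rust
use std::fmt::Write;

/// Hypothetical, simplified language definition as it might be read from languages.yaml.
struct Language {
    name: &'static str,
    ansi: Vec<&'static str>,
    hex: Vec<&'static str>,
}

/// Render Rust source for an enum of languages. Any language that defines hex
/// colors with a different count than its ansi colors gets a `compile_error!`,
/// so the generated crate refuses to build.
fn generate(languages: &[Language]) -> String {
    let mut out = String::new();
    for lang in languages {
        // Mirrors the template: only check when true colors are present.
        if !lang.hex.is_empty() && lang.ansi.len() != lang.hex.len() {
            writeln!(
                out,
                "compile_error!(\"{}: ansi and hex colors must be the same length\");",
                lang.name
            )
            .unwrap();
        }
    }
    writeln!(out, "pub enum Language {{").unwrap();
    for lang in languages {
        writeln!(out, "    {},", lang.name).unwrap();
    }
    writeln!(out, "}}").unwrap();
    out
}

fn main() {
    // Illustrative data only; these are not onefetch's real color values.
    let languages = vec![
        Language { name: "Rust", ansi: vec!["red", "white"], hex: vec!["#FF0000", "#FFFFFF"] },
        Language { name: "Go", ansi: vec!["cyan"], hex: vec![] },
    ];
    print!("{}", generate(&languages));
}
```

The appeal of this design is that invalid data never reaches a test run: the generated file itself fails to compile.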

There are some unnecessary commits in this PR: making a temporary onefetch lib implementation,
making a temp binary to export onefetch data to JSON, etc. That doesn't need to stay in this PR
and can be rebased away, but I thought it was useful to preserve the history in some form.

For #696

To do

Finish redefining the languages' colors
Update CONTRIBUTING.md


spenserblack · 2022-07-05T18:34:47Z

There will likely be more changes needed, but I think this is at least ready for an initial review.

spenserblack · 2022-07-05T20:12:09Z

These types of things always make me wonder whether the data files should be discoverable by Linguist, since in this case they're used in code generation.

spenserblack · 2022-07-06T15:01:40Z

I think we're going to need to add some integration tests at some point. This PR shouldn't change any behavior at all, but right now we basically ensure that by checking whether the output looks good. I automated the bulk of this with a quick, lazy, buggy script, so I don't have 100% confidence that there weren't some unintended changes to logos. It definitely mangled the Haskell logo, which I had to fix.

o2sh · 2022-07-07T14:21:33Z

languages.yaml

+      - [255, 255, 255]
+      - [0, 24, 201]
+      - [12, 10, 124]
+    chip: [2, 248, 140]


Suggested change

chip: [2, 248, 140]

chip: Rgb[2, 248, 140]

We should maybe keep the Rgb prefix everywhere to make it more self-explanatory and obvious that we're talking about colors.

Is that valid YAML? AFAIK that would deserialize into a string, which would require some additional parsing on top of deserialization in build.rs.
It could perhaps be an object, like

chip:
  r: 2
  g: 248
  b: 140

but I was trying to reduce the length of the config while maintaining human readability. Or maybe change the key to chip-rgb.

Yes or chip-color. Too bad owo_colors doesn't support hex colors 🤔

I could register a hex-to-rgb filter in the build script.
What do you think about this?

ansi:
  - red
  - green
  - blue
rgb: # maybe another name, "hex" or "true" or "fullcolor"?
  - '#FF0000' # mainly just for consistency with chip, so we aren't mixing RGB and hex
  - '#00FF00'
  - '#0000FF'
chip: '#02F88C' # sadly # is a comment, so it needs quotes -_-

Great idea! We're getting closer to the GitHub linguist format: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml

hex sounds good to me.

@o2sh done 👍 9f02ad7
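The hex-to-RGB filter idea boils down to parsing a `#RRGGBB` string into three bytes. Below is a minimal std-only sketch of that conversion; `hex_to_rgb` is a hypothetical name, and wiring it into Tera as a filter (Tera provides `Tera::register_filter` for this) is left out because the PR's actual filter isn't shown here.

```rust
/// Parse a color like "#02F88C" into its (red, green, blue) components.
/// Returns None unless the input is '#' followed by exactly six hex digits.
fn hex_to_rgb(hex: &str) -> Option<(u8, u8, u8)> {
    let digits = hex.strip_prefix('#')?;
    if digits.len() != 6 || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Each pair of hex digits is one 8-bit channel; the checks above
    // guarantee the string is ASCII, so byte slicing is safe.
    let channel = |i: usize| u8::from_str_radix(&digits[i..i + 2], 16).ok();
    Some((channel(0)?, channel(2)?, channel(4)?))
}

fn main() {
    // The chip color from the review comment: [2, 248, 140] is '#02F88C'.
    assert_eq!(hex_to_rgb("#02F88C"), Some((2, 248, 140)));
    println!("{:?}", hex_to_rgb("#FF0000"));
}
```

Keeping every color as a quoted hex string, as proposed, means one parser covers both the true color list and the chip.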


o2sh · 2022-07-07T15:01:01Z

templates/language.rs

+{% for language, attrs in languages -%}
+    {% if attrs.colors.rgb %}
+        {% set ansi_length = attrs.colors.ansi | length %}
+        {% set rgb_length = attrs.colors.rgb | length %}
+        {% if ansi_length != rgb_length %}
+            compile_error!("{{ language }}: ansi and rgb colors must be the same length");
+        {% endif %}
+    {% endif %}
+{% endfor %}
+
+{% set max_width = 40 %}
+{# NOTE Permitting trailing newline #}
+{% set max_height = 26 %}
+
+
+{% for language, attrs in languages -%}
+    {% set lines = attrs.ascii | split(pat="\n") %}
+    {% set height = lines | length %}
+    {% if height > max_height %}
+        compile_error!("{{ language }}: ascii art must have {{ max_height - 1 }} or less lines, has {{ height }}");
+    {% endif %}
+
+    {% for line in lines %}
+        {% set cleaned_line = line | strip_color_indices %}
+        {% set width = cleaned_line | length %}
+        {% if width > max_width %}
+            compile_error!("{{ language }}: ascii art line {{ loop.index }} must be {{ max_width }} or less characters wide");
+        {% endif %}
+    {% endfor %}
+{% endfor %}


I'm not sure why, but if you look at the language.rs generated by Tera in the /target/debug/build/onefetch-*/out folder, this part of the template results in thousands of empty lines.

Besides, IMO these should be unit tests, not compilation errors. We should maybe find a way to put them back in the original language.rs file.
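For comparison with the unit-test suggestion, the checks in the template quoted above can be written as plain Rust. This is a sketch only: it assumes color placeholders look like `{0}`, `{1}`, and so on (which is what a filter named `strip_color_indices` suggests), and both function names are illustrative rather than the build script's actual code.

```rust
const MAX_WIDTH: usize = 40;
// Like the template, this permits one trailing newline (an empty last line).
const MAX_HEIGHT: usize = 26;

/// Remove color placeholders such as "{0}" or "{12}" from a line of ASCII art,
/// leaving only the characters that are actually printed.
fn strip_color_indices(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '{' {
            out.push(c);
            continue;
        }
        // Collect digits after '{'; it's a placeholder only if '}' follows them.
        let mut digits = String::new();
        while let Some(&d) = chars.peek() {
            if !d.is_ascii_digit() {
                break;
            }
            digits.push(d);
            chars.next();
        }
        if !digits.is_empty() && chars.peek() == Some(&'}') {
            chars.next(); // drop the whole placeholder
        } else {
            out.push(c); // not a placeholder: keep what was consumed
            out.push_str(&digits);
        }
    }
    out
}

/// Return one error message per violated size constraint.
fn validate_ascii(language: &str, ascii: &str) -> Vec<String> {
    let mut errors = Vec::new();
    let lines: Vec<&str> = ascii.split('\n').collect();
    if lines.len() > MAX_HEIGHT {
        errors.push(format!(
            "{}: ascii art must have {} or fewer lines, has {}",
            language, MAX_HEIGHT - 1, lines.len()
        ));
    }
    for (i, line) in lines.iter().enumerate() {
        let width = strip_color_indices(line).chars().count();
        if width > MAX_WIDTH {
            errors.push(format!(
                "{}: ascii art line {} must be {} or fewer characters wide",
                language, i + 1, MAX_WIDTH
            ));
        }
    }
    errors
}

fn main() {
    let logo = "{0}/\\_/\\\n{1}( o.o )\n";
    println!("{:?}", validate_ascii("Example", logo));
}
```

Whether these run in `#[test]` functions or drive a `compile_error!` in generated code, the logic is the same; the discussion below is about when the failure surfaces.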

Yeah, the empty lines are from the if statements never being entered. E.g.

{# 3 empty lines #}
{% for n in [1,2,3] %}
  {% if n > 4 %}println!("Hello!"){% endif %}
{% endfor %}

IMO we don't need to worry too much about how pretty the generated code is: generated code isn't really intended to be human-readable.

For compile errors vs. tests, the benefit of compile errors is that you can't even run onefetch if the values are invalid. Additionally, IMO it's preferable to assert the validity of values at compile time instead of at runtime (in this case the test runtime). IMO tests should really be for behavior, not for validating internal values. As a reductionist example, it's kind of like doing this:

/// A number between 0 and 255
const SMALL_NUMBER: usize = make_number!();

#[test]
fn number_is_small() {
    assert!(SMALL_NUMBER < 256);
}

instead of this:

/// A number between 0 and 255
const SMALL_NUMBER: u8 = make_number!();

Making them compile errors also should result in a faster CI failure, since the presence of a compile_error! macro in the generated code will make compilation fail pretty quickly.
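A related way to get compile-time guarantees without emitting `compile_error!` from a template is a const assertion, which stable Rust supports since `panic!`/`assert!` became usable in const contexts (Rust 1.57). This is a different technique from the one in the PR, shown only for comparison; the constants below are illustrative.

```rust
/// Illustrative color tables for one language; in the PR these come from languages.yaml.
const ANSI_COLORS: &[&str] = &["red", "white"];
const HEX_COLORS: &[&str] = &["#FF0000", "#FFFFFF"];

// Evaluated at compile time: if the lengths differ, the crate fails to build
// instead of a test failing later.
const _: () = assert!(ANSI_COLORS.len() == HEX_COLORS.len());

/// With a precise type, the range check needs no assertion at all: a u8
/// literal above 255 is rejected by the deny-by-default `overflowing_literals` lint.
const SMALL_NUMBER: u8 = 200;

fn main() {
    for (ansi, hex) in ANSI_COLORS.iter().zip(HEX_COLORS) {
        println!("{ansi} -> {hex}");
    }
    println!("{}", SMALL_NUMBER);
}
```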

Agreed 👍

Yeah, the empty lines are from the if statements never being entered. E.g.

Maybe add some -% to reduce the clutter.

o2sh · 2022-07-07T15:07:52Z

Very impressive @spenserblack 😮
You must have broken the record of the biggest PR (number of files changed) on this repo 😄 🎉

I left a few comments, but I'll finish the review over the weekend 👍


o2sh · 2022-07-08T21:33:53Z

I pushed a few changes (I hope you don't mind).

@spenserblack Do you think we can merge the PR as is? Or do we wait until we've added unit/integration tests?


spenserblack · 2022-07-08T22:30:06Z

I pushed a few changes (I hope you don't mind).

I don't mind at all, please and thanks!

Do you think we can merge the PR as is?

I'm definitely not opposed to adding some tests, but I looped through all of the languages and they all looked good. Looks like you already fixed the logos in commits like 3e759a8, thanks!

Given that the only thing that's "public" is the binary itself and its output to the terminal, I was thinking we could start with testing several variations of Printer. For this PR in particular, the biggest concern was breaking the logo output.

TBH I was thinking of putting off these types of tests until Hacktoberfest, because it would be pretty repetitive 😅 . IMO we'd want 2 tests for each ASCII logo, both basic and true color, to guarantee stability. At this point, I think the output is stable enough that we'd want tests to guarantee stability during refactors like this.

I'm focusing mainly on logos just because of this PR, but we'd probably want to test the stats output, too. And with that come variations of the git repository being analyzed, so that would mean testing different repositories (either real repositories or by mocking gitoxide). We might also want to test the CLI itself. While working on #685 I was surprised by a few panics. Tests could ensure that the CLI we put a lot of work into doesn't break with clap updates, or for any other reason.

So that was a lot of words to say that, because this is a binary with no library, if our tests guarantee stability of input via the CLI and output to the terminal, I think we're good.
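A sketch of the kind of logo test described above: render a logo in both basic and true color and compare against the exact expected terminal output. `render_logo`, `ColorMode`, the `{N}` placeholder syntax, and the color tables are assumptions for illustration; onefetch's real `Printer` API isn't shown in this thread. The escape sequences are standard SGR codes (`ESC[31m` for basic red, `ESC[38;2;R;G;Bm` for 24-bit color, `ESC[0m` to reset).

```rust
/// How color placeholders should be rendered.
enum ColorMode<'a> {
    /// 8-color ANSI SGR codes, e.g. 31 for red.
    Basic(&'a [u8]),
    /// 24-bit "true color" RGB triples.
    TrueColor(&'a [(u8, u8, u8)]),
}

/// Replace `{N}` placeholders with the escape sequence for color N and reset at the end.
/// Panics if the logo references a color index that the mode doesn't define.
fn render_logo(ascii: &str, mode: &ColorMode) -> String {
    let mut out = String::new();
    let mut rest = ascii;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        let placeholder = after
            .find('}')
            .and_then(|end| after[..end].parse::<usize>().ok().map(|i| (end, i)));
        match placeholder {
            Some((end, index)) => {
                let code = match mode {
                    ColorMode::Basic(codes) => format!("\x1b[{}m", codes[index]),
                    ColorMode::TrueColor(rgb) => {
                        let (r, g, b) = rgb[index];
                        format!("\x1b[38;2;{};{};{}m", r, g, b)
                    }
                };
                out.push_str(&code);
                rest = &after[end + 1..];
            }
            None => {
                // A literal '{' that isn't a placeholder is printed as-is.
                out.push('{');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out.push_str("\x1b[0m");
    out
}

fn main() {
    let logo = "{0}ab\n{1}cd";
    println!("{}", render_logo(logo, &ColorMode::Basic(&[31, 34])));
}
```

In a real suite, each logo would get one basic and one true color assertion against expected output checked into the repository, so a refactor like this PR's would fail loudly if any logo changed.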

o2sh · 2022-07-09T09:01:31Z

Glad to see you've already put some thought into it. IMO, we won't be able to skip tests for much longer as more people seem to be playing with this tool.

Maybe we can merge this PR as is to be able to work on #696 and create a separate issue on that matter as a milestone for the next release.

spenserblack added 16 commits July 4, 2022 12:42

Export languages as JSON

8f71b69

Make language name the JSON keys

094bfa7

Alphabetize JSON

2142e42

Use Tokei value names as JSON keys

d0eb796

Add ASCII filenames to JSON

01da2c9

Clean JSON

cee37f7

Prettify JSON more

c8fe473

Generate language data with build script

7eb1ead

This changes the data format from JSON to YAML to be easier to use and embed multiline strings. It then uses the YAML to fill in a template Rust file with relevant values.

Cleanup unneeded JS files

390e2b1

Was part of temporary changes when generating JSON export.

Update CODEOWNERS

7e34cea

Remove unneeded TODO

8cd1fcf

Fix clippy errors on generated tests

92745e0

Remove paste dependency

f6353e0

It is no longer needed, as test function names are now generated with templates instead of macros.

Fix alphabetization of languages

c014c6b

Add back clap serialization macro

b76784e

Add back language chips/circles

49c1d1b

spenserblack marked this pull request as ready for review July 5, 2022 18:34

spenserblack requested a review from o2sh as a code owner July 5, 2022 18:34

spenserblack added 3 commits July 5, 2022 15:50

Clarify TODO

372c0b6

Remove redundant name: fields

e1e09c2

Update CONTRIBUTING.md

c824643

spenserblack added 3 commits July 6, 2022 09:21

Add script to create YAML with embedded ASCII

5d7bfc0

Embed ASCII in YAML

713e4f6

Remove temporary script

fe3ce50

Update CONTRIBUTING.md

9e15944

o2sh reviewed Jul 7, 2022

languages.yaml (outdated)

o2sh reviewed Jul 7, 2022

templates/language.rs (outdated)

o2sh reviewed Jul 7, 2022


Place template alongside source code

8d6841b

o2sh reviewed Jul 7, 2022

src/info/langs/language.tera.rs (outdated)

spenserblack and others added 6 commits July 7, 2022 18:03

Fix trailing space after template if

4988dcb

Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

Add script to convert RGB to hex

c745b64

Use hex values in languages.yaml

9f02ad7

Rename default to white, remap to Default with filter

5be782b

Fix ASCII formatting mangled by hex remapping

cd75fb0

Fix clippy warnings

7f47c79

o2sh reviewed Jul 8, 2022

src/info/langs/language.tera.rs (outdated)

spenserblack and others added 9 commits July 8, 2022 11:22

Simplify remapping white to default

98b9f9a

Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

small cleanup of build.rs

2086a88

revert to env::var

3636cda

cargo fmt

331d1fe

fix markdown logo

3e759a8

missing serialization attrs

9ed7fc9

Update CONTRIBUTING.md

8c0a34b

remove trailing empty lines

5e1787e

Merge branch 'dev/json-languages' of github.com:spenserblack/onefetch…

c8d19d4

… into dev/json-languages

spenserblack commented Jul 8, 2022

CONTRIBUTING.md

remove table of contents from contributing.md

927321e

o2sh merged commit d4e6cda into o2sh:main Jul 9, 2022

spenserblack deleted the dev/json-languages branch July 9, 2022 13:15

o2sh mentioned this pull request Jul 25, 2022

Improve test coverage of onefetch #700 (open, 32 tasks)
