Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode Support #36

Merged
merged 21 commits into from
Nov 14, 2022
Merged

Unicode Support #36

merged 21 commits into from
Nov 14, 2022

Conversation

0b11001111
Copy link
Contributor

@0b11001111 0b11001111 commented Oct 31, 2022

This PR explores potential Unicode support, see discussion at #5

@0b11001111 0b11001111 mentioned this pull request Oct 31, 2022
@ndd7xv
Copy link
Owner

ndd7xv commented Nov 1, 2022

Thanks so much for the PR! I've been pretty busy with work recently but I'll try and look at this sometime this weekend :) This is honestly some really good stuff 😄

Considering the problems mentioned in #5 (comment), I think we could put this behind some sort of feature toggle, like running heh -u/heh --unicode (which would indicate representing bytes with UTF-8 for now) so users would have the option to view bytes as UTF-8 unicode. As a result, we can just mark the work here as "in development" as its being worked on.

Just glimpsed at your PR and I have to say it's really neat! I'll definitely try and get this merged when I have the time (might ask questions/request mild changes + documentation, but if you're too busy I'll get around to it sooner or later).

@0b11001111
Copy link
Contributor Author

That's great news! I've rewritten the whole decoder though and it should be much cleaner now. The architecture of my code allows for simple extension (looking at you utf-16) and supports runtime configuration. However, the code still needs some better test and refactoring as it is partly redundant to bytes.rs. Also, as of now the colouring is chosen under the assumption that all is ASCII...

@0b11001111
Copy link
Contributor Author

... and escaping is missing for control, whitespace and other non displayable characters.

@0b11001111
Copy link
Contributor Author

0b11001111 commented Nov 3, 2022

I made a few more changes and now it almost works as I wish. Also, it should be trivial to integrate other Unicode encodings.

A few quirks I observed:

  • Decoded text may end up overflowing the box. This is caused by a not so monospaced representation of some Unicode characters by the font used by your terminal.
  • For displaying, characters are escaped using the methods the standard library provides us with. However, I doubt that this is complete and there aren't any characters in the Unicode space that mess up the representation entirely.

Personally, I think solving these issues is far beyond the scope of this little feature and exceeds my Unicode knowledge anyway. Since all this is optional to the user, I'd keep it as is

Demo Time

heh demo.md
grafik

heh --encoding utf8 demo.md
grafik

@ndd7xv
Copy link
Owner

ndd7xv commented Nov 5, 2022

Hello! I'm glancing over your code right now and may make a couple of tweaks, but it looks promising (and very nice demo :D)! There is one thing I'm looking to change - If I were to use heh on a file with the following:

Examples of emoji are 😂, 😃, 🧘🏻‍♂️, 🌍, 🌦️, 🍞, 🚗, 📞, 🎉, ❤️, 🍆, 🍑 and 🏁.

regardless of encoding option displays
image

while the current version of heh displays
image

You'll notice that some bytes aren't showing up (A7 98 F0 in the second row). I'm gonna look into fixing this but I think after that I'll squash everything and merge.

Copy link
Owner

@ndd7xv ndd7xv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a couple of questions/comments/requests! Let me know what you think. If you're too busy to work on this anymore, I can also help pick up the work, but again, I really appreciate the PR :)

src/decoder.rs Outdated Show resolved Hide resolved
src/decoder.rs Show resolved Hide resolved
src/screen.rs Outdated Show resolved Hide resolved
src/screen.rs Outdated Show resolved Hide resolved
src/decoder.rs Outdated Show resolved Hide resolved
@ndd7xv ndd7xv merged commit 6442c00 into ndd7xv:main Nov 14, 2022
@ndd7xv
Copy link
Owner

ndd7xv commented Nov 14, 2022

Sorry for leaving you hanging! I think this is great and I'll merge it now; I'll get around to creating issues for smaller things/concerns (e.g. better escape characters/colors as you mentioned) as I find the need to. I should probably get around to testing and all that other good stuff too..

Thank you again for your contribution, it genuinely means a lot to me 😄 I'm looking to publish another release by the end of the year, but let me know if you'd want me to try to do something earlier and I'll try and fit in what I want by then 🙂

@0b11001111
Copy link
Contributor Author

Cool cool :) By the end of the year I'll have more spare time and may jump in again if I feel like doing some Rust hacking... Let's see :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants