Updating unicode versions #1

Open
daurnimator opened this issue Nov 19, 2018 · 11 comments

Comments

@daurnimator

Unicode is not a static standard, so the tables need to be updated from time to time.
However, for compatibility, you sometimes need to use a specific version of Unicode for a given function.

One thing I've been thinking about with Zig is whether we could have a function that parses the official Unicode data files (https://www.unicode.org/Public/11.0.0/ucd/).
Most people would use it at compile time, but if it could be used at run-time too, then we would have a very powerful tool!
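
For illustration only, here is a minimal sketch of what parsing one of those files could look like. It is written in Go rather than Zig, and the names `codeRange`, `rangesForCategory`, and `sample` are hypothetical. It assumes the semicolon-separated layout of UnicodeData.txt (code point in field 0, General_Category in field 2) and ignores the `<..., First>` / `<..., Last>` range pairs that a real parser would have to expand.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample holds a few lines in UnicodeData.txt format (hypothetical test input).
const sample = `0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0660;ARABIC-INDIC DIGIT ZERO;Nd;0;AN;;0;0;0;N;;;;;
`

// codeRange is a hypothetical inclusive range of code points.
type codeRange struct{ Lo, Hi rune }

// rangesForCategory scans UnicodeData.txt-style text and merges the code
// points whose General_Category equals cat into contiguous ranges.
// Assumption: input lines are sorted by code point, as in the real file.
func rangesForCategory(data, cat string) ([]codeRange, error) {
	var out []codeRange
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		fields := strings.Split(line, ";")
		if len(fields) < 3 {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		if fields[2] != cat {
			continue
		}
		cp, err := strconv.ParseUint(fields[0], 16, 32)
		if err != nil {
			return nil, err
		}
		r := rune(cp)
		// Extend the previous range when this code point is adjacent to it.
		if n := len(out); n > 0 && out[n-1].Hi+1 == r {
			out[n-1].Hi = r
		} else {
			out = append(out, codeRange{r, r})
		}
	}
	return out, sc.Err()
}

func main() {
	ranges, err := rangesForCategory(sample, "Nd")
	if err != nil {
		panic(err)
	}
	for _, r := range ranges {
		fmt.Printf("%U..%U\n", r.Lo, r.Hi)
	}
}
```

Because the parser takes the file contents as input, the same routine could in principle be pointed at the data files of any Unicode version.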

I've been thinking about writing such a tool/library myself, but I just found your repository and thought it would be a bad idea to duplicate work/have competing libraries.

@daurnimator
Author

Just saw your post at https://www.reddit.com/r/Zig/comments/9u2qnu/github_gernestzunicode_port_of_go_standard/e9mqbtf which indicates this might already be on your mind?

@gernest
Owner

gernest commented Nov 19, 2018

@daurnimator as I said in that Reddit comment ^^, we generate the tables programmatically. This is also noted at the top of the tables.zig file:

// Code generated by maketables; DO NOT EDIT.

We can easily update to new versions of Unicode at any time. The only reason the script is not here is that it is written in Go, and I didn't want to pollute the repo, since I was hoping to port it to Zig one day.

For now we are using Unicode 10.0.0. I am still working on stabilizing the API and making sure all the tests are accounted for. Upgrading to 11.0.0 will be a matter of running the script, but it is not important to me for now.

Do you have a special need for 11.0.0? I can upgrade for you.

This is the tool/script that does what you described: https://github.com/gernest/matrix/blob/master/script/make_tables.go . This library uses it to generate the tables.zig file.

@daurnimator
Author

> Do you have a special need for 11.0.0? I can upgrade for you.

Actually I need Unicode 3.2.0 to implement XMPP's nodeprep function.

@gernest
Owner

gernest commented Nov 19, 2018

So the Unicode versions are not backward compatible?
I can generate the tables.zig file for 3.2.0 (not now though, I'm on mobile) and you can replace it; the rest of the lib will just work. However, I have no plans to support older versions or more than one version, so I will be upgrading to the latest Unicode version from time to time.

@daurnimator
Author

> So the Unicode versions are not backward compatible?

Correct. Every upgrade will have the potential to break libraries/applications.

> I can generate the tables.zig file for 3.2.0 (not now though, I'm on mobile) and you can replace it; the rest of the lib will just work.

No hurry, I have weeks until I actually need it.

> I have no plans to support older versions or more than one version, so I will be upgrading to the latest Unicode version from time to time.

If you implement the parsing in zig then we get support for all versions!
I'm happy to wait for you, or possibly do this work myself.

@gernest
Owner

gernest commented Nov 19, 2018

> If you implement the parsing in zig then we get support for all versions!
> I'm happy to wait for you, or possibly do this work myself.

It will take time before I port the script (it is super low on my priority list), so I would appreciate it if you took a stab at it. I will be around if there is anything you need to know to help with porting.

@daurnimator
Author

daurnimator commented Nov 25, 2018

FWIW I started playing around with it at https://github.com/daurnimator/zig-unicode but ran into some issues. I asked in the zig irc channel and got this reply from @andrewrk:

< andrewrk> | this use case of using @embedFile and parsing the stuff at comptime, is a good use case of zig. but I think zig is too immature to handle it right now. it'll be worth trying this again when self hosted is done

So I guess I'll put this project on hold for a while.

@gernest
Owner

gernest commented Nov 25, 2018

@daurnimator Maybe I misunderstood your concerns. Is there something else that this lib is lacking or not doing right? You will still need to generate the tables/symbols, and doing it at runtime is just not cool (expensive, etc.).

I mean, I get it when you say you want to use older versions of Unicode, which I believe is possible (just generate tables.zig with the old Unicode version).

I kinda worked hard on this, so any feedback that helps me improve it is highly appreciated. That way I can see if we can resolve the issue, and I can feel much better about myself (yeah, I just don't wanna be the guy who builds stuff that no one ever uses).

@daurnimator
Author

> Maybe I misunderstood your concerns. Is there something else that this lib is lacking or not doing right? You will still need to generate the tables/symbols, and doing it at runtime is just not cool (expensive, etc.).

I mainly wanted a place to play with writing my own table code in zig. I attempted to do it with zunicode but it got in the way more than it helped, so I started fresh.

> I kinda worked hard on this, so any feedback that helps me improve it is highly appreciated.

A few misc things:

  • the codebase is inconsistent about using u32 versus i32 for code points; they should really be u21 (a code point never needs more than 21 bits)
  • return switch (self) {
    could just be a call to std.meta.tagName
  • I don't understand your split between Range16 and Range32
  • Unicode attributes are missing e.g. NumericValue

@gernest
Owner

gernest commented Nov 26, 2018

Thanks for the feedback

> I mainly wanted a place to play with writing my own table code in zig. I attempted to do it with zunicode but it got in the way more than it helped, so I started fresh.

I see. Table generation is handled entirely by Go, as I think I said before. The limitation isn't in zunicode but in Zig; otherwise I would have ported it to Zig already.

> the codebase has lots of inconsistencies between using u32 and i32, they should really be u21

Remember that this is a direct port of the Go unicode standard library. I'm not a domain expert in Unicode, and I'm not a Zig expert either. In the Reddit post you linked, I was calling for help to improve it. I really don't mind using u21, I just don't know how, so we can collaborate and I will do my best to help, as long as it works.

Note that I also had to port the test suite to ensure I was achieving correct behaviour.

> could just be a call to std.meta.tagName

std.meta.tagName returns []const u8, but the parent fn wants to return a *RangeTable symbol. Using a switch seemed cleaner because it avoids multiple std.mem.eql calls to check which tag is which. Again, this is my first month of Zig; if you don't mind, can you show me a snippet where std.meta.tagName would fit better? I will update the table generator ASAP.

> I don't understand your split between Range16 and Range32

Me neither, I took it from Go, and it works. I will ditch it in a heartbeat if there is another way.

> Unicode attributes are missing e.g. NumericValue

Maybe it is a naming issue? There is an isNumber fn for checking whether a character is numeric, along with a test suite for it.

@daurnimator
Author

> std.meta.tagName returns []const u8, but the parent fn wants to return a *RangeTable symbol. Using a switch seemed cleaner because it avoids multiple std.mem.eql calls to check which tag is which. Again, this is my first month of Zig; if you don't mind, can you show me a snippet where std.meta.tagName would fit better? I will update the table generator ASAP.

I think this will work?

@field(RangeTable, std.meta.tagName(x))

kivikakk referenced this issue in kivikakk/zunicode Apr 22, 2024
add native support for Zigmod package manager