Unac #15306

smgoller · 2012-10-04T20:27:46Z

This formula is for unac, a C library for removing accents from a string. I'm submitting it because it's required by flactag, the program I really want to get into Homebrew. flactag's home page is at http://flactag.sourceforge.net/ . If/when this formula gets accepted, I'll be submitting one for flactag.

unac is a very stable project (it hasn't been modified for quite a while) and at this point it seems to be maintained by Debian. This formula pulls the latest source from them, as well as their patches. The local patches are there to get things to build properly on Mac OS.

This also exists in MacPorts.

unac is a C library and command that removes accents from a string. For instance, the string été becomes ete. It provides a command-line interface (the unaccent command) that removes accents from an input stream or from a string given as an argument. I would consider this package extremely stable. Even though the project was somewhat abandoned in the 2002-2004 timeframe, Debian has continued to ensure the package works. The formula gets the code and patches from them, then applies local patches so that it compiles properly under Mac OS.
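To illustrate the été to ete behaviour described above, here is a rough Ruby sketch. It is not unac and does not use unac's code or tables; it assumes Ruby 2.2 or later and approximates the effect by decomposing characters and dropping combining marks. The helper name strip_accents is made up for this example.

```ruby
# Hypothetical helper, NOT part of unac: approximates accent removal by
# decomposing each character (NFD) and deleting the combining marks.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("été")
```

This only handles accents that decompose into a base letter plus a combining mark; a dedicated library such as unac may cover more cases.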

smgoller · 2012-10-04T22:04:13Z

Library/Formula/unac.rb

@@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-


Emacs added the coding line. Should I remove it, even though it indicates there's UTF-8 in this formula?

No, definitely keep it.

Would it make more sense to try and handle this globally, i.e. by adding -KU to our shebang lines and the places where we invoke the interpreter directly?

I thought so, but I can't seem to find the right magic invocation. -KU according to the manpage is for kanji and in practice I was still getting encoding errors:

invalid multibyte escape: /^\037\213/ (SyntaxError)
invalid multibyte escape: /^\037\235/
invalid multibyte escape: /^\xFD7zXZ\x00/

Which is weird. Similar story with -Eutf-8:utf-8.

Well, those sequences aren't utf-8 (the last one even has an embedded null!). I suppose we have to compile them with /n to disable multibyte interpretation under 1.9.

Hm, but if they're not valid utf-8 why does it work with the utf-8 magic comment?

Because we are talking about two different encodings here. The first is the source encoding, i.e. the encoding of the actual bytes that make up the file. This is what the magic comment and command-line switch address.

The second is the encoding used by the Regexp engine when compiling /^\037\213/, etc., which under 1.9 is utf-8. The errors you see are generated after the source file is loaded.

IOW, the source text "/^\037\213/" is perfectly valid utf-8 (obviously it's also valid ASCII), so the file can be read as utf-8. But when those escapes are interpreted later, they produce the bytes 0x1F 0x8B, which are not a valid utf-8 sequence. Under 1.9 the regexp engine is not encoding-agnostic and needs to be told to compile such patterns without an encoding, e.g.

[irb(main)]$ /^\037\213/
SyntaxError: (irb):4: invalid multibyte escape: /^\037\213/
        from /usr/local/opt/ruby/bin/irb:12:in `<main>'
[irb(main)]$ /^\037\213/n
====> /^\037\213/n
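The same distinction can be reproduced in a small script. This is a sketch rather than Homebrew's actual code: the variable names are made up for illustration, and it uses Regexp.new inside a rescue so the failure shows up as a RegexpError at run time instead of a SyntaxError that would stop the file from parsing.

```ruby
# First three bytes of a gzip stream, built as a binary (ASCII-8BIT) string.
# "\037\213" in octal is 0x1F 0x8B.
gzip_header = [0x1F, 0x8B, 0x08].pack("C*")

# Compiling the escapes from a UTF-8 pattern string fails: 0x8B on its own
# is not a complete UTF-8 character.
utf8_error = begin
  Regexp.new('^\037\213')
  nil
rescue RegexpError => e
  e.message
end

# The /n flag compiles the pattern with no encoding (ASCII-8BIT), so the
# escapes are treated as plain bytes.
gzip_magic = /^\037\213/n

puts utf8_error
puts gzip_magic.encoding
puts(gzip_magic =~ gzip_header ? "matches gzip header" : "no match")
```

Note that a /n pattern containing high bytes is meant for binary data; matching it against a UTF-8 string with non-ASCII characters raises an Encoding::CompatibilityError.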

mistydemeo · 2012-10-04T22:06:57Z

It doesn't work with the system automake on Xcode 3.2.6, but can use the system versions of autoconf and libtool.

adamv · 2012-10-05T04:25:25Z

Should note the autotools version requirements.

smgoller · 2012-10-05T21:52:03Z

To be honest, I'm not sure what the autotools version requirements are. I'm running Mountain Lion with the latest Xcode.

mistydemeo · 2012-10-09T16:45:29Z

@smgoller Just needs a note about why any hard dependencies are required - in this case automake requires some version newer than the one which used to ship with Xcode.

adamv · 2013-01-07T21:05:04Z

Sorry for the delay in pulling this; thanks for the submission.

smgoller · 2013-01-07T21:49:46Z

No worries, thanks for incorporating it!

-Sean.


unac is a C library and command that removes accents from a string. Closes Homebrew#15306. Signed-off-by: Adam Vandenberg <flangy@gmail.com>

smgoller added 3 commits October 4, 2012 12:00

Add homepage for project.

c6ab842

Fix indentation, document DATA patch, and switch autoconf and automak…

7576349

…e to use symbol syntax.

smgoller reviewed Oct 4, 2012
Add comment regarding automake requirements.

c4e7564

adamv closed this in da8f135 Jan 7, 2013

dholm pushed a commit to dholm/homebrew that referenced this pull request Jan 14, 2013

unac 1.8.0

9c7364b

unac is a C library and command that removes accents from a string. Closes Homebrew#15306. Signed-off-by: Adam Vandenberg <flangy@gmail.com>

Homebrew locked and limited conversation to collaborators Feb 16, 2016

smgoller commented Oct 4, 2012

smgoller Oct 4, 2012

mistydemeo Oct 4, 2012

jacknagel Oct 5, 2012

mistydemeo Oct 5, 2012

jacknagel Oct 5, 2012

mistydemeo Oct 5, 2012

jacknagel Oct 5, 2012

mistydemeo commented Oct 4, 2012

adamv commented Oct 5, 2012

smgoller commented Oct 5, 2012

mistydemeo commented Oct 9, 2012

adamv commented Jan 7, 2013

smgoller commented Jan 7, 2013