Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capitalize initials of all names in a string #48

Closed
derekmahar opened this issue Mar 20, 2021 · 11 comments
Closed

Capitalize initials of all names in a string #48

derekmahar opened this issue Mar 20, 2021 · 11 comments

Comments

@derekmahar
Copy link
Contributor

derekmahar commented Mar 20, 2021

Given a string that contains multiple lower case names (words), how might I capitalize only the initial letter of these names?

I think that the existing set of string functions in csvq cannot perform this operation. However, this can be done by applying a PCRE2 regular expression substitution which converts all initials to upper case.

Unfortunately, according to Regular Expressions 101, Golang does not support the PCRE2 substitution operator \u, so I presume that the new regular expression functions REGEXP_REPLACE in csvq v1.14.1-pr.1 likewise also cannot perform this operation. If this is the case, would you consider implementing a specific string function or generic set of string functions to capitalize all words in a string?

@derekmahar
Copy link
Contributor Author

I verified that REGEXP_REPLACE in csvq v1.14.1-pr.1 does not support substitution operator \u:

derek@derek-TB350-BTC:~/Downloads/csvq-v1.14.1-pr.1-linux-amd64$ ./csvq "SELECT REGEXP_REPLACE('abc abc', '(\w+)', '\u\$1')"
+----------------------------------------------+
| REGEXP_REPLACE('abc abc', '(\\w+)', '\\u$1') |
+----------------------------------------------+
| \uabc \uabc                                  |
+----------------------------------------------+

@mithrandie
Copy link
Owner

I'm sorry if you feel this is rude, but I have a question.
Are the features you suggested what you actually needed in some situations? Or are you aiming for the perfect tool?

Csvq is not originally designed to do very complex processing, and there are usually other good solutions for such things. There is no need to try to do everything with one tool when there are many other great tools in the world.

By the way, you can get the results with the current features of csvq. For example, you can create a user-defined function to get the result you want.

Declare the user-defined function in a file named "capitalize.sql".

DECLARE CAPITALIZE FUNCTION (@str) AS BEGIN
    DECLARE @result := '';

    DECLARE @list := REGEXP_FIND_ALL(@str, '\w+');
    DECLARE @i := 0;

    WHILE TRUE
    DO
        DECLARE @item := JSON_VALUE('[' || @i || '][0]', @list);
        IF @item IS NULL THEN
            BREAK;
        END IF;

        IF 0 < LEN(@item) THEN
            IF 0 < LEN(@result) THEN
                @result := @result || ' ';
            END IF;

            @result := @result || UPPER(SUBSTRING(@item FROM 1 FOR 1));
        END IF;

        IF 1 < LEN(@item) THEN
            @result := @result || SUBSTRING(@item FROM 2);
        END IF;

        @i := @i + 1;
    END WHILE;

    RETURN @result;
END;

And run a sql with the SOURCE command.

$ csvq "SOURCE `capitalize.sql`; SELECT CAPITALIZE('capitalize initials of all names')"
+------------------------------------------------+
| CAPITALIZE('capitalize initials of all names') |
+------------------------------------------------+
| Capitalize Initials Of All Names               |
+------------------------------------------------+

@derekmahar
Copy link
Contributor Author

derekmahar commented Mar 20, 2021

Thank you for this solution. I don't think your response was rude at all.

I do have a specific purpose. I want to capitalize the initials of names in a column that contains full names in mixed case. ;-) I think this is not such a complicated problem and so I prefer to use csvq than resort to another tool or a lower level programming language like Rust, Java, or heaven forbid, Python.

I've yet to use user defined functions in csvq and I had forgotten that csvq supported them. I have read several sections of the csvq documentation many times, but this time I somehow overlooked the section on user defined functions. I sometimes confuse Miller and csvq's features and so mistakenly thought that Miller had user defined functions, but csvq did not.

@derekmahar
Copy link
Contributor Author

derekmahar commented Mar 20, 2021

By the way, for what it's worth, as far as I could tell, SQLite string functions can't do this, either, and SQLite doesn't support user defined functions except those in the form of loadable modules implemented in C.

@derekmahar
Copy link
Contributor Author

Is JSON_VALUE the only function in csvq which can operate on an array of strings?

@mithrandie
Copy link
Owner

mithrandie commented Mar 20, 2021

You can also use the function LIST_ELEM.

$ csvq "SELECT LIST_ELEM('word1 word2', ' ', 1)"
+----------------------------------+
| LIST_ELEM('word1 word2', ' ', 1) |
+----------------------------------+
| word2                            |
+----------------------------------+

I'd like to implement composite types of Array and Map, and syntax and functions to handle them, but it may take a while.

@derekmahar
Copy link
Contributor Author

I'll be patient, then.

@mithrandie
Copy link
Owner

Thanks for your response.
If you use user-defined functions frequently, you can define them in the $HOME/.csvqrc so that you don't need to run the SOURCE command every time. (See: https://mithrandie.github.io/csvq/reference/command.html#configurations)

The function to capitalize will be provided in the future, but regular expressions will remain as they are now.

@derekmahar
Copy link
Contributor Author

Thank you. I was just thinking about how to load user defined functions.

I'm satisfied with the user defined CAPITALIZE function.

@derekmahar
Copy link
Contributor Author

I implemented user defined function CAPITALIZE_WORDS that uses function LIST_ELEM instead of JSON_VALUE:

$ cat csvqrc
DECLARE CAPITALIZE_WORDS FUNCTION (@words) AS
BEGIN
  DECLARE @result := CAPITALIZE(GET_WORD(@words, 1));
  DECLARE @word := '';
  DECLARE @word_count := 2;

  WHILE (@word := GET_WORD(@words, @word_count)) IS NOT NULL
  DO
    @result := @result || ' ' || CAPITALIZE(@word);
    @word_count := @word_count + 1;
  END WHILE;

  RETURN @result;
END;

DECLARE CAPITALIZE FUNCTION (@word) AS
BEGIN
  IF @word = '' THEN
    RETURN '';
  ELSE
    RETURN UPPER(SUBSTR(@word, 0, 1))
      || LOWER(IFNULL(SUBSTR(@word, 1), ''));
  END IF;
END;

DECLARE GET_WORD FUNCTION (@words, @index) AS
BEGIN
  RETURN LIST_ELEM(@words, ' ', @index - 1);
END;
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('  abc  def  ') || '\'';"
+--------------------------------------------------+
| '\'' || CAPITALIZE_WORDS('  abc  def  ') || '\'' |
+--------------------------------------------------+
| '  Abc  Def  '                                   |
+--------------------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('abc def') || '\'';"
+---------------------------------------------+
| '\'' || CAPITALIZE_WORDS('abc def') || '\'' |
+---------------------------------------------+
| 'Abc Def'                                   |
+---------------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('abc') || '\'';"
+-----------------------------------------+
| '\'' || CAPITALIZE_WORDS('abc') || '\'' |
+-----------------------------------------+
| 'Abc'                                   |
+-----------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('') || '\'';"
+--------------------------------------+
| '\'' || CAPITALIZE_WORDS('') || '\'' |
+--------------------------------------+
| ''                                   |
+--------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS(NULL) || '\'';"
+----------------------------------------+
| '\'' || CAPITALIZE_WORDS(NULL) || '\'' |
+----------------------------------------+
|                  NULL                  |
+----------------------------------------+

mithrandie added a commit that referenced this issue May 3, 2022
- TITLE_CASE ([GitHub #48](#48))
mithrandie added a commit that referenced this issue May 3, 2022
- Support column name wildcard for tables. ([GitHub #68](#68))
- Add a built-in function.
  - TITLE_CASE ([GitHub #48](#48))
- Modify integer and float handling.
  - Fixed a bug in the conversion between integer and float.
  - Intuitive handling of integer and float.
  - Allow exponential notation for float.
@mithrandie
Copy link
Owner

TITLE_CASE function has been included in version 1.17.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants