-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gtools does not group strL variables correctly #39
Comments
Actually, this is a limitation of the stata plugin interface version I am using. I wrote the plugin using version 2, which makes no mention of strL variables being an issue (here). However, version 3 notes that the API method to access string variables only works for non-strL variables, and they added special functions to deal with strL variables (here). First of all, I have to throw an error if the variable type is strL. I don't think version 2.0 of the plugin can deal with it correctly. The error will note:
That's done (will merge to master soon). I now need to compile a separate version of the plugin for Stata 14 and above, but I don't own Stata 14.I might be able to do it through servers or my library's computers, so It'll take me a bit longer to fix. For now the function just throws an error for strL variables. |
Pending - For now, throws error for strL variable: #39 Plugin version 2.0 makes no mention of strL variables; however, plugin version 3.0 notes that they are actually not supported by the macros that I use to access strings. I think I'll have to compile a sepparate version of the plugin for Stata 13 and Stata 14 and above. The Stata 13 version will throw an error for strL variables. This is currently the only version until I figure out where to test Stata 14 and how to implement this switch. Bug fixes - Fixes #38 When the quantile requested was N - 1 out of an array of length N, I makde an exception so the function picked up the largest value, instead of passing N - 1 to the selection algorithm. However, I made a mistake and quantiles between 100 - 150 / N and 100 - 100 / N, left-inclusive right-exclusive, would give the wrong quantile.
Pending - For now, throws error for strL variable; see #39 Plugin version 2.0 makes no mention of strL variables; however, plugin version 3.0 notes that they are actually not supported by the macros that I use to access strings. I think I'll have to compile a sepparate version of the plugin for Stata 13 and Stata 14 and above. The Stata 13 version will throw an error for strL variables. This is currently the only version until I figure out where to test Stata 14 and how to implement this switch. Bug fixes - Fixes #38 When the quantile requested was N - 1 out of an array of length N, I makde an exception so the function picked up the largest value, instead of passing N - 1 to the selection algorithm. However, I made a mistake and quantiles between 100 - 150 / N and 100 - 100 / N, left-inclusive right-exclusive, would give the wrong quantile.
Actually, as I'm getting around to this I've noticed that I can't write to a strL variable, even with the 3.0 interface. I think I can implement this for most functions anyway, but, crucially, not for |
Would it even make sense to collapse a strL? One alternative is to just recast it as a str#, and if not possible (max length is too long, to just raise an error) |
This is basically what happens at the moment. Though I don't recast as str# (I'm not entirely sure I should; though I suppose I could...), I just suggest the user try that. It's just annoying that I can't entirely support |
I guess I could add an option, |
This turned out to be way more complicated than I thought because I have to make a special case for when strL contains binary data; in that case all the string functions I use will give the wrong answer. I'd have to create a special array that contains the number of bytes in each entry, and change At any rate, I will implement support for very long strings in version |
Features * Adds `gduplicates` as a replacement of `duplicates`. This is basically a wrapper for `gegen tag` and `gegen count`. * `fasterxtile` and `gquantiles` now accept weights (including `by()`) * Note that stata might not handle weights accurately in pctile and xtile (or, at the very least, it does not seem to follow its documented formulas in all cases). * `strL` partial support (long strings only). * Option `compress` tries to recast strL variables as str# * Option `forcestrl` ignores binary values check (use with caution). * Fixes #31 (added option `mlast` to `hashsort` to imitate `gsort` when sorting numbers in inverse order and they contain missing values). Bug Fixes * `semean` returns missing with fewer than 2 observations. * If `strL` contains binary data `gtools` functions now throw an error. * `strL` missing values now read correctly. * Added `strL`-specific tests * Closes #39 `strL` partial support. * Closes #41 `wild` with existing variables gets a warning. * Fixes #42 `gunique` typo. Enhancements * `_gtools_internals.ado` exits when `GTOOLS_CALLER` is empty. Tests * All passing in Stata 13 in Linux, Windows * All passing in Stata 15 in Linux, Windows * All passing in Stata 14 in OSX
Hi, when doing gcollapse by groups with some group being data type of "strL", the results seem to be problematic. Reproduced as follows:
The text was updated successfully, but these errors were encountered: