-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
semi-colon ini text value causes truncation #42
Comments
Here is a cleaned up example (all <> brackets changed to [] to avoid text disappearing in this post): Value in INI file (Email_Message=): Value Read by INI component: |
Note that the semi-colon in rgb(255, 255, 192);"] is indeed causing the problem because removing it in the INI entry fixes the issue. |
Also note that the value returned starts with a literal double-quote: "[HTML]... |
The problem is due to finding even number of double quotes in a segment that doesn't actually end with double quotes. `private bool HaveUnmatchedQuotesIn(IEnumerable parts) {
}` to: `private bool HaveUnmatchedQuotesIn(IEnumerable parts) {
}` |
Just submitted a pull request with the code to solve this issue. |
Thanks for reporting! Looking at #43 I'm not convinced that that's the best way to resolve this (though it's working for your use-case). Looking at the first SO post I found (https://stackoverflow.com/questions/11046234/adding-quotes-to-ini-file-values-php#11046544) and the wikipedia entry at https://en.wikipedia.org/wiki/INI_file#Escape_characters, I think that the best way forward is to support escaping quotes with The others aren't that interesting in a string which is surrounded with double-quotes, like you have done:
For now, I'm going to leave these as-is, unless you or someone else would find them really useful, but I've fixed up the parsing of values to do as above: double-quotes which are escaped by a backslash will not be counted as part of an unmatched pair, so will cause the value-grabber to keep on going, gobbling up the rest of the line. There's a test at
Since this behavior breaks existing usage for some people, there's a way to opt out via constructor parameter and property on INIFile. I've also renamed the namespace to match the package - if I'm going to break stuff, I might as well do it properly! |
Davyd, how would your solution handle cases with a literal backslash followed by double quote (at the end of the entry or in the middle of the entry?) |
Davyd, how would your solution handle cases with a literal *\"* (at the end
of the entry or in the middle of the entry?)?
My pull request solution handles those cases without a problem and doesn't
change the behavior of the component.
…On Tue, Jan 26, 2021 at 3:28 AM Davyd McColl ***@***.***> wrote:
Thanks for reporting!
Looking at #43 <#43> I'm
not convinced that that's the best way to resolve this (though it's working
for your use-case). Looking at the first SO post I found (
https://stackoverflow.com/questions/11046234/adding-quotes-to-ini-file-values-php#11046544)
and the wikipedia entry at
https://en.wikipedia.org/wiki/INI_file#Escape_characters, I think that
the best way forward is to support escaping quotes with \" and ensuring
that parsing takes the entire value in the outer quotes. Also, that article
suggests that I should revisit INIFile to support escaping other common
characters, especially newlines and carriage returns; perhaps tabs too? T
he others aren't that interesting in a string which is surrounded with
double-quotes, like you have done:
- #, ', ;, : don't break anything in quotes
- nulls, bells and backspaces aren't going to be easily supported in
string values
- = is already catered for by only accepting the first = as a
name/value separator
For now, I'm going to leave these as-is, unless you or someone else would
find them really useful, but I've fixed up the parsing of values to do as
above: double-quotes which are escaped will not be counted as part of an
unmatched pair, so will cause the value-grabber to keep on going, gobbling
up the rest of the line.
There's a test at
https://github.com/fluffynuts/PeanutButter/blob/f32e905caccc618c9b408adfe5e8e9ce425a8bf6/source/INI/PeanutButter.INI.Tests/TestINI.cs#L1488
using your example. This update should be available in PeanutButter.INI
version 1.2.397.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#42 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGKG6KFPYEYTAEX5QGEEBSDS3Z4KJANCNFSM4WRCWHUA>
.
|
a literal backslash also has to be escaped, ie as
I thought I'd rather pick something which would be expected by a new user: that backslashes could be used to escape quotes (meaning that lone backslashes also need escaping), as per https://en.wikipedia.org/wiki/INI_file#Escape_characters . The idea is to try to have as conformant a parser as possible. Honestly, I feel like the fact that I missed the concept of escaped characters and the complex use-cases that people have found (yours included) meant that I introduced buggy behavior into the parser, and I apologise! So the default is to escape everything but the caller can opt out. I didn't see a test for your PR, but there's an obvious problem in that quotes inside comments would cause incorrect interpretation of the value, ie:
and it's non-conformant behavior. To be quite honest, INIFile was one of the first things I wrote in PeanutButter, though GitHub only shows history back to 2016, I'm quite sure I first made it a few years prior, since I'd come out of C++ land, where I had my own parser, into .net land, around 2012, and I needed a similar parser. I really wish I'd thought about escape characters back then! It's (imo) the right thing to do, though it means that your INI files are negatively affected, I know, and I'm really sorry: it's the result of me doing it wrong the first time. A setting like:
should have always given parsing errors, though I guess I could figure a way around it. If it's a huge issue, please let me know and I'll see if I can figure out more tolerant parsing to maintain older behavior. It's a tough situation tho, because if I apply the old parsing mechanism to a value which is correctly escaped, ie:
then the consuming code would see the value of I could add a third parsing "mode" (where we currently have "no escaping" and "full escaping", where I try "best effort" and I see what algorithm I can come up with that solves the best effort without breaking the others - I could literally switch on the algo and say that for "best effort", do the best I can. There will still be cases where the parser might do unexpected things - whereas the fully-escaped and totally-not-escaped paths are very predictable. I don't want to leave a user in a bad spot, so, as I say, let me know what you think - if it's highly impractical for you to use "proper" escaping, as per the wiki article, let's see what we can do. If it were me, I'd write a pre-parser for the config files to do the conversion, since you know the format of your files and what to expect, so it wouldn't be a blanket "try to fix all values" - you could literally read that line out of the file, look for Something like: void EnsureSettingIsEscaped(
string iniPath, // path to the ini file
string setting) // setting name to fix up
{
var lines = File.ReadAllLines(iniPath);
var idx = -1;
foreach (var line in lines)
{
idx++;
var parts = line.Split('=');
if (parts[0] != setting)
{
continue;
}
var rawSetting = string.Join("=", parts.Skip(1));
if (rawSetting.Length > 2)
{
// trim off containing quotes, but only one at either end!
if (rawSetting[0] == '"')
{
rawSetting = rawSetting.Substring(1);
}
if (rawSetting[rawSetting.Length - 1] == '"')
{
rawSetting = rawSetting.Substring(0, rawSetting.Length - 1);
}
}
var escaped = rawSetting.Replace("\"", "\\\"");
lines[idx] = $"{parts[0]}=\"{escaped}\"";
}
File.WriteAllLines(iniPath, lines, System.Text.Encoding.UTF8);
} though I've written this "by the seat of my pants", so I wouldn't be surprised if there's an error! |
Hi Davyd,
The easiest way out of this is to add a "Parsing_Mode" property and default
it to "*Legacy*".
Then, add a mode of "*No_Escaped_Characters*"
For that mode, you can simply use my code change.
That would give us an *immediate fix without change in behavior*.
You can then take your time to think through character escaping logic and
parsing modes that break existing behavior.
By the way, going forward, I suggest you state that all future parsing
modes *do not support inline comments*.
All comments must start on a new line.
Regards,
- Ido
…On Tue, Jan 26, 2021 at 12:37 PM Davyd McColl ***@***.***> wrote:
a literal backslash also has to be escaped, ie as \\, or the caller can
opt out of escaped characters (an option I had to add back in again for
backwards compatibility because I know I have a user with settings like:
[section]
value="\\server\share\path\"
I thought I'd rather pick something which would be expected by a new user:
that backslashes could be used to escape quotes (meaning that lone
backslashes also need escaping), as per
https://en.wikipedia.org/wiki/INI_file#Escape_characters . The idea is to
try to have as conformant a parser as possible.
So the default is to escape everything but the caller can opt out.
I didn't see a test for your PR, but there's an obvious problem in that
quotes inside comments would cause incorrect interpretation of the value,
ie:
[section]
key="some value;" ;"force something"
and it's non-conformant behavior. To be quite honest, INIFile was one of
the first things I wrote in PeanutButter, though GitHub only shows history
back to 2016, I'm quite sure I first made it a few years prior, since I'd
come out of C++ land, where I had my own parser, into .net land, around
2012, and I needed a similar parser. I really wish I'd thought about escape
characters back then! It's (imo) the right thing to do, though it means
that your INI files are negatively affected, I know, and I'm really sorry:
it's the result of me doing it wrong the first time. A setting like:
[section]
key="some value "with quotes" and more"
should have always given parsing errors, though I guess I could figure a
way around it. If it's a huge issue, please let me know and I'll see if I
can figure out more tolerant parsing to maintain older behavior. It's a
tough situation tho, because if I apply the old parsing mechanism to a
value which is correctly escaped, ie:
[section]
key="some value \"with quotes\" and more"
then the consuming code would see the value of key to be `some value
"with quotes" and more", ie the slashes would be unexpected.
I could add a third parsing "mode" (where we currently have "no escaping"
and "full escaping", where I try "best effort" and I see what algorithm I
can come up with that solves the best effort without breaking the others -
I could literally switch on the algo and say that for "best effort", do the
best I can. There will still be cases where the parser might do unexpected
things - whereas the fully-escaped and totally-not-escaped paths are very
predictable. I don't want to leave a user in a bad spot, so, as I say, let
me know what you think - if it's highly impractical for you to use "proper"
escaping, as per the wiki article, let's see what we can do. If it were me,
I'd write a pre-parser for the config files to do the conversion, since you
know the format of your files and what to expect, so it wouldn't be a
blanket "try to fix all values" - you could literally read that line out of
the file, look for " without preceding = or \ where it's not at the end
of the line " and replace with \" to have a winner.
Something like:
void EnsureSettingIsEscaped(string iniPath, string setting)
{
var lines = File.ReadAllLines(iniPath);
var idx = -1;
foreach (var line in lines)
{
idx++;
var parts = line.Split('=');
if (parts[0] != setting)
{
continue;
}
var rawSetting = string.Join("=", parts.Skip(1));
if (rawSetting.Length > 2)
{
// trim off containing quotes, but only one at either end!
if (rawSetting[0] == '"')
{
rawSetting = rawSetting.Substring(1);
}
if (rawSetting[rawSetting.Length - 1] == '"')
{
rawSetting = rawSetting.Substring(0, rawSetting.Length - 1);
}
}
var escaped = rawSetting.Replace("\"", "\\\"");
lines[idx] = $"{parts[0]}=\"{escaped}\"";
}
File.WriteAllLines(iniPath, lines, System.Text.Encoding.UTF8);
}
though I've written this "by the seat of my pants", so I wouldn't be
surprised if there's an error!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#42 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGKG6KFU7WFU5OUBBED4AULS334WRANCNFSM4WRCWHUA>
.
|
Better idea that also fixes the test case with comment error:
Please add a Parse_Mode of "*No_Escaped_Characters_No_inLine_Comments*"
For that mode, you can simply grab the whole line (no need to Split on ";")
Regards,
- Ido
…On Tue, Jan 26, 2021 at 12:51 PM Ido Millet ***@***.***> wrote:
Hi Davyd,
The easiest way out of this is to add a "Parsing_Mode" property and
default it to "*Legacy*".
Then, add a mode of "*No_Escaped_Characters*"
For that mode, you can simply use my code change.
That would give us an *immediate fix without change in behavior*.
You can then take your time to think through character escaping logic and
parsing modes that break existing behavior.
By the way, going forward, I suggest you state that all future parsing
modes *do not support inline comments*.
All comments must start on a new line.
Regards,
- Ido
On Tue, Jan 26, 2021 at 12:37 PM Davyd McColl ***@***.***>
wrote:
> a literal backslash also has to be escaped, ie as \\, or the caller can
> opt out of escaped characters (an option I had to add back in again for
> backwards compatibility because I know I have a user with settings like:
>
> [section]
> value="\\server\share\path\"
>
> I thought I'd rather pick something which would be expected by a new
> user: that backslashes could be used to escape quotes (meaning that lone
> backslashes also need escaping), as per
> https://en.wikipedia.org/wiki/INI_file#Escape_characters . The idea is
> to try to have as conformant a parser as possible.
>
> So the default is to escape everything but the caller can opt out.
>
> I didn't see a test for your PR, but there's an obvious problem in that
> quotes inside comments would cause incorrect interpretation of the value,
> ie:
>
> [section]
> key="some value;" ;"force something"
>
> and it's non-conformant behavior. To be quite honest, INIFile was one of
> the first things I wrote in PeanutButter, though GitHub only shows history
> back to 2016, I'm quite sure I first made it a few years prior, since I'd
> come out of C++ land, where I had my own parser, into .net land, around
> 2012, and I needed a similar parser. I really wish I'd thought about escape
> characters back then! It's (imo) the right thing to do, though it means
> that your INI files are negatively affected, I know, and I'm really sorry:
> it's the result of me doing it wrong the first time. A setting like:
>
> [section]
> key="some value "with quotes" and more"
>
> should have always given parsing errors, though I guess I could figure a
> way around it. If it's a huge issue, please let me know and I'll see if I
> can figure out more tolerant parsing to maintain older behavior. It's a
> tough situation tho, because if I apply the old parsing mechanism to a
> value which is correctly escaped, ie:
>
> [section]
> key="some value \"with quotes\" and more"
>
> then the consuming code would see the value of key to be `some value
> "with quotes" and more", ie the slashes would be unexpected.
>
> I could add a third parsing "mode" (where we currently have "no escaping"
> and "full escaping", where I try "best effort" and I see what algorithm I
> can come up with that solves the best effort without breaking the others -
> I could literally switch on the algo and say that for "best effort", do the
> best I can. There will still be cases where the parser might do unexpected
> things - whereas the fully-escaped and totally-not-escaped paths are very
> predictable. I don't want to leave a user in a bad spot, so, as I say, let
> me know what you think - if it's highly impractical for you to use "proper"
> escaping, as per the wiki article, let's see what we can do. If it were me,
> I'd write a pre-parser for the config files to do the conversion, since you
> know the format of your files and what to expect, so it wouldn't be a
> blanket "try to fix all values" - you could literally read that line out of
> the file, look for " without preceding = or \ where it's not at the end
> of the line " and replace with \" to have a winner.
>
> Something like:
>
> void EnsureSettingIsEscaped(string iniPath, string setting)
> {
> var lines = File.ReadAllLines(iniPath);
> var idx = -1;
> foreach (var line in lines)
> {
> idx++;
> var parts = line.Split('=');
> if (parts[0] != setting)
> {
> continue;
> }
> var rawSetting = string.Join("=", parts.Skip(1));
> if (rawSetting.Length > 2)
> {
> // trim off containing quotes, but only one at either end!
> if (rawSetting[0] == '"')
> {
> rawSetting = rawSetting.Substring(1);
> }
> if (rawSetting[rawSetting.Length - 1] == '"')
> {
> rawSetting = rawSetting.Substring(0, rawSetting.Length - 1);
> }
> }
> var escaped = rawSetting.Replace("\"", "\\\"");
> lines[idx] = $"{parts[0]}=\"{escaped}\"";
> }
> File.WriteAllLines(iniPath, lines, System.Text.Encoding.UTF8);
> }
>
> though I've written this "by the seat of my pants", so I wouldn't be
> surprised if there's an error!
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <#42 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AGKG6KFU7WFU5OUBBED4AULS334WRANCNFSM4WRCWHUA>
> .
>
|
This probably has lower impact than other changes as I imagine that inline comments are a rarer use-case than fancy value parsing; however, i would still need to know up-front that the caller doesn't intend to use escaped characters so that, eg, perhaps then, we need to have a parsing feature matrix, of which we can include:
It seems that the broadest use-case might be catered for by:
Since the last package release was a major version bump to 2.0.0, it shouldn't have affected existing users unless they choose to upgrade there (and a major version upgrade is a warning that things may break). So I'm happy to still allow a little "breakage", in that 2.0.1 would not behave like 2.0.0, but would have better backward compatibility. |
Sounds good to me. |
Ok, I think I've gone one better - using TDD to drive out the implementation, I have two current strategies for parsing ini files: BestEffort and Strict.
Parsing is achieved per-line with a selected instance of the public interface I've also taken this time to ensure that PeanutButter has xmldoc emitted and cleaned up a bit of code here and there. The new logic is available as of published package 2.0.1 . Please test and report any issues. |
New default behavior solves the issue for me. Many thanks! Consider adding a Simple strategy, which assumes no inline comments and no escaped characters. |
Setting gINIFile.ParseStrategy = ParseStrategies.BestEffort seems to get proper behavior on the outside. Please add a simple Parse strategy that doesn't expect and doesn't force escaping. |
write-out escaping when using BestEffort should only happen if the incoming line was escaped - please provide an example ini to work off of for me to create a unit test and a fix if that's happening. There's even a test around this (: You shouldn't have to set anything btw - default strategy is BestEffort. If performance is hugely impacted, please provide an example too - this is my order of interest for all things:
so I'm sure you can see that, whilst performance matters to me, it's by no means top of the list, and I get quite stubborn about it. |
The comment editor on this page keeps dropping content and I don't know how to work around it. |
I'll look into it in the morning; best-effort should only apply escaping if the incoming value had escaping |
btw, if your values are written out escaped, that may not be a bad thing - it means that future reads will have a precise handle on what data should be there. As mentioned before, i consider it a bug that PeanutButter.INI didn't have escaping from the start |
I apologize for the false alarm. This is working fine for me now. Many thanks for the lightning-fast new version release! |
The following ini entry was saved properly. But this entry gets truncated starting with the first semi-colon when reading the value.
Email_Message=" <title>message</title> <style type="text/css"> body{font-family: Verdana, Geneva, sans-serif; font-size:10pt;} td{font-family: Verdana, Geneva, sans-serif;font-size:10pt;} p{margin-top: 10px; margin-bottom: 10px;} ol,ul{margin-top: -10px; margin-bottom: -10px;} </style> test {MovedFileFullPath} "
So when reading that entry, the text starting with the first semi-colon is lost, resulting in just this text:
" <title>message</title> <meta http-equiv="content-type" content="text/html
Looks like the value-embedded semi-colon is falsely handled as a start of a comment.
Looks like there's a need to improve comment detection.
The text was updated successfully, but these errors were encountered: