Commit

[Minor] Improve words wrap algorithm
vstakhov committed Jan 21, 2023
1 parent 59fcbfa commit 6fea589
Showing 1 changed file with 23 additions and 1 deletion.
24 changes: 23 additions & 1 deletion src/client/rspamc.cxx
@@ -875,7 +875,29 @@ rspamc_symbol_human_output(FILE *out, const ucl_object_t *obj)
         return;
     }
     for (size_t pos = 0; pos < line.size(); ) {
-        auto s = line.substr(pos, pos ? (maxlen-indent) : maxlen);
+        /*
+         * First, find the longest sequence of words, delimited by space of punctuation,
+         * and adjust `maxlen` if needed
+         */
+        auto split_len = pos ? (maxlen-indent) : maxlen;
+        auto word_len = 0;
+        auto suffix = std::string_view(line).substr(pos);
+        for (;;) {
+            auto delim_pos = suffix.find_first_of(" \t,;[]():");
+            if (word_len + delim_pos + 1 < split_len && delim_pos != std::string_view::npos && delim_pos < suffix.size()) {
+                word_len += delim_pos + 1;
+                suffix = suffix.substr(delim_pos + 1);
+            }
+            else {
+                break;
+            }
+        }
+
+        if (word_len > 0 && word_len < split_len && line.size() + pos > split_len) {
+            split_len = word_len;
+        }
+
+        auto s = std::string_view(line).substr(pos, split_len);
         if (indent && pos) {
             fmt::print(out, "{:>{}}", " ", indent);
         }
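To make the added logic easier to follow in isolation, here is a minimal sketch that lifts the chunk-length computation out of the loop into a standalone function. The name `chunk_len` and its signature are hypothetical and not part of rspamc; `maxlen`, `indent`, `pos`, `split_len` and `word_len` mirror the variables in the diff. The scan condition is reordered to test for `npos` first, which should behave the same as the original three-part condition.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not in rspamc): returns the length that the diff's
// logic would pass to substr() for the row starting at `pos`.
std::size_t chunk_len(std::string_view line, std::size_t pos,
                      std::size_t maxlen, std::size_t indent)
{
    assert(pos <= line.size());
    assert(indent < maxlen);

    // Continuation rows are narrower by `indent` columns.
    std::size_t split_len = pos ? (maxlen - indent) : maxlen;
    std::size_t word_len = 0;
    auto suffix = line.substr(pos);

    // Grow word_len one delimiter at a time while the words, counted
    // together with their trailing delimiter, stay strictly below split_len.
    for (;;) {
        auto delim_pos = suffix.find_first_of(" \t,;[]():");
        if (delim_pos == std::string_view::npos ||
            word_len + delim_pos + 1 >= split_len) {
            break;
        }
        word_len += delim_pos + 1;
        suffix = suffix.substr(delim_pos + 1);
    }

    // Final check copied from the diff without changes.
    if (word_len > 0 && word_len < split_len && line.size() + pos > split_len) {
        split_len = word_len;
    }
    return split_len;
}
```

For example, `chunk_len("alpha beta gamma delta", 0, 12, 0)` yields 11, so the first row would be `"alpha beta "` including the trailing space. A line without any delimiter falls back to the full `split_len`, and `substr` clamps the length when fewer characters remain.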

3 comments on commit 6fea589

@amishmm (Contributor) commented on 6fea589, Jan 22, 2023:
Something is wrong in the wrapping logic above.

See the report below:

$ rspamc -R < sample-nonspam.txt
3.00/0.00/0.00/15.00,action=5:no action,spam=0,skipped=0
Content analysis details:   (3.00 points, 15.00 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 ARC_NA                 ARC signature absent
 0.0 ASN                    [asn:701, ipnet:199.172.0.0/15, country:US]
 1.0 AUTH_NA                Authenticating message via SPF/DKIM/DMARC/ARC 
                            not 
                            available
 1.0 DATE_IN_PAST           Message date is in past [190713]
 0.0 DMARC_NA               No DMARC record [std.com]
 0.0 FROM_HAS_DN            From header has a display name
 0.0 FROM_NEQ_ENVFROM       From address is different to the envelope [
                            dawson@world.std.com,tbtf-approval@world.std.com]
 0.0 HAS_REPLYTO            Has Reply-To header [tbtf-approval@europe.std.com]
 0.5 MID_RHS_IP_LITERAL     Message-ID RHS is an IP-literal
-0.1 MIME_GOOD              Known content-type [text/plain]

Notice how "not available" has been split across two lines:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 AUTH_NA                Authenticating message via SPF/DKIM/DMARC/ARC 
                            not 
                            available

In fact, "not" would have fit on the first line, but it was still placed on a new line.

The related JSON output is:

"AUTH_NA": {
    "name": "AUTH_NA",
    "score": 1.0,
    "metric_score": 1.0,
    "description": "Authenticating message via SPF/DKIM/DMARC/ARC not available"
},

Please check. Thank you.
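The report above can be traced with a small standalone sketch of the loop from the diff. Several things here are assumptions rather than facts from the commit: the function name `wrap_like_diff` is hypothetical, `maxlen` is taken to be 50 (the width of the description column in the report), `indent` is set to 0 for simplicity, and the loop is assumed to advance with `pos += s.size()`, since that part of the file is not visible in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical driver (not in rspamc): collects the rows that the diff's
// chunking logic would produce instead of printing them.
std::vector<std::string> wrap_like_diff(std::string_view line,
                                        std::size_t maxlen, std::size_t indent)
{
    assert(indent < maxlen);
    std::vector<std::string> rows;
    for (std::size_t pos = 0; pos < line.size();) {
        std::size_t split_len = pos ? (maxlen - indent) : maxlen;
        std::size_t word_len = 0;
        auto suffix = line.substr(pos);
        for (;;) {
            auto delim_pos = suffix.find_first_of(" \t,;[]():");
            if (delim_pos == std::string_view::npos ||
                word_len + delim_pos + 1 >= split_len) {
                break;
            }
            word_len += delim_pos + 1;
            suffix = suffix.substr(delim_pos + 1);
        }
        // Same final check as in the diff.
        if (word_len > 0 && word_len < split_len &&
            line.size() + pos > split_len) {
            split_len = word_len;
        }
        auto s = line.substr(pos, split_len);
        rows.emplace_back(s);
        pos += s.size(); // assumed advance step, not shown in the diff
    }
    return rows;
}
```

Under these assumptions, the 59-character AUTH_NA description yields three rows: `"Authenticating message via SPF/DKIM/DMARC/ARC "`, `"not "` and `"available"`, which matches the report, trailing spaces included. Two details of the trace stand out. On the first row, `"not"` is rejected because 46 + 3 + 1 is not strictly less than 50, although the 49 visible characters would fit. On the second row, the final check compares `line.size() + pos` (59 + 46) with `split_len`, so the row is cut after `"not "` even though only 13 characters remain. Whether these are the causes in rspamc itself is not confirmed by the thread.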

@vstakhov (Member, Author) commented:

I cannot reproduce it so far.

@vstakhov (Member, Author) commented:

I can now :)
