Add support for French dates
I’ve tweaked frwiki’s config many times trying to understand why the automatic detection of warning templates on user talk pages was not working properly.

It turns out that the date format in French signatures differs from the English one: we write the date first and then the time, while enwiki does the opposite, and this frwiki specificity was not handled yet.

This PR adds some logic for French wikis, plus a few comments along the way.

I think I handled memory correctly, but don't hesitate to check.
framawiki authored and benapetr committed Mar 2, 2024
1 parent a8efedb commit 939b2a2
Showing 1 changed file with 32 additions and 8 deletions.
src/huggle_core/huggleparser.cpp: 40 changes (32 additions, 8 deletions)
@@ -230,7 +230,9 @@ byte_ht HuggleParser::GetLevel(QString page, QDate bt, WikiSite *site)
HUGGLE_PROFILER_INCRCALL(BOOST_CURRENT_FUNCTION);
if (Configuration::HuggleConfiguration->SystemConfig_TrimOldWarnings)
{
// we need to get rid of old warnings now
// we want to know the highest warning level present on this talk page
// cut the talk page content into sections (paragraphs separated by a blank line)
// to find warnings (by the tags they carry) and dates (discarding messages older than the configured age)
QStringList sections;
// windows fix
page.replace("\r", "");
@@ -268,15 +270,34 @@ byte_ht HuggleParser::GetLevel(QString page, QDate bt, WikiSite *site)
CurrentIndex++;
continue;
}

// discard content after CET or CEST, because we know the date is just before it
QString section = sections.at(CurrentIndex);
section = section.mid(0, dp).trimmed();
if (!section.contains(site->GetProjectConfig()->Parser_Date_Prefix))
{
// this is some borked date let's remove it
CurrentIndex++;
continue;

// language-specific logic may be needed to parse the date from a signature
// French signatures put the date in a fixed position before the CET marker: 30 novembre 2023 à 22:10 (CET)
// English ones put it exactly between a comma and UTC: 22:58, 17 July 2023 (UTC)
QString time;
if (site->Name.startsWith("fr")) {
QStringList parts_section = section.split(' ');
// the day, month and year are the 5th, 4th and 3rd space-separated tokens before the CET marker (now the end of the string)
if (parts_section.length() < 5) {
// this is some borked date let's remove it
CurrentIndex++;
continue;
}
time = parts_section.at(parts_section.length() - 5) + " " + parts_section.at(parts_section.length() - 4) + " " + parts_section.at(parts_section.length() - 3);
} else {
if (!section.contains(site->GetProjectConfig()->Parser_Date_Prefix))
{
// this is some borked date let's remove it
CurrentIndex++;
continue;
}
time = section.mid(section.lastIndexOf(site->GetProjectConfig()->Parser_Date_Prefix) + site->GetProjectConfig()->Parser_Date_Prefix.length());
}
QString time = section.mid(section.lastIndexOf(site->GetProjectConfig()->Parser_Date_Prefix) + site->GetProjectConfig()->Parser_Date_Prefix.length());

// now we need this uberhack so that we can get a month name from the localized version
// let's hope that the month is a word in the middle of the string
time = time.trimmed();
@@ -319,7 +340,7 @@ byte_ht HuggleParser::GetLevel(QString page, QDate bt, WikiSite *site)
continue;
} else
{
// now check if it's at least 1 month old
// now check if it's more recent than the age configured for the wiki (i.e. 1 month)
if (bt.addDays(site->ProjectConfig->TemplateAge) > date)
{
// we don't want to parse this thing
@@ -331,6 +352,9 @@ byte_ht HuggleParser::GetLevel(QString page, QDate bt, WikiSite *site)
CurrentIndex++;
}
}

// now search the user talk page for the tags defined in the wiki config, like <!-- Template:Huggle/warn-spam-1 -->
// and keep the highest level found
byte_ht level = 4;
while (level > 0)
{
