Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pie chart percentages add up to more than 100 % #1056

Closed
KommX opened this issue Oct 23, 2020 · 11 comments · Fixed by #1057
Closed

Pie chart percentages add up to more than 100 % #1056

KommX opened this issue Oct 23, 2020 · 11 comments · Fixed by #1057

Comments

@KommX
Copy link

KommX commented Oct 23, 2020

Versions

  • Pi-hole version is v5.1.2 (Latest: v5.1.2)
  • AdminLTE version is v5.1.1 (Latest: v5.1.1)
  • FTL version is v5.2 (Latest: v5.2)

Platform

  • OS and version: Raspbian GNU/Linux 10 (buster)
  • Platform: Raspberry Pi

Expected behavior

The percentages in the mouseover tool tips of the pie charts should add up to (roughly) 100 %.

Actual behavior / bug

The percentages in the mouseover tool tips of the Queries answered by pie chart adds up to more than 100 %.

Steps to reproduce

Steps to reproduce the behavior:

  1. Go to the main dashboard
  2. Hover over the three sections blocklist, cache, localhost of the Queries answered by pie chart.
  3. Calculate the sum of the three percentages.
  4. The sum is bigger than 100 %, see screenshots, in my case 109

Debug Token

Screenshots

blocklist
cache
localhost

@alexis-coulombe
Copy link

Indeed, mine show 107.5%

@pralor-bot
Copy link

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/wrong-percent-calculation-in-pie-chart/44277/7

@Daxtorim
Copy link
Contributor

Daxtorim commented Feb 12, 2021

This is really an issue with FTL rather than AdminLTE, but I will add my findings here anyway.


The problem is that queries with certain status types are counted twice when calculating the stats. At least when reading from the database, e.g. after pihole restartdns. Here is where the stats are calculated:

FTL/src/api/api.c

Lines 521 to 576 in da89cc6

totalqueries = counters->forwarded + counters->cached + counters->blocked;
// Loop over available forward destinations
for(int i = -2; i < min(counters->upstreams, 8); i++)
{
float percentage = 0.0f;
const char* ip, *name;
in_port_t upstream_port = 0;
if(i == -2)
{
// Blocked queries (local lists)
ip = "blocklist";
name = ip;
if(totalqueries > 0)
// Whats the percentage of locked queries on the total amount of queries?
percentage = 1e2f * counters->blocked / totalqueries;
}
else if(i == -1)
{
// Local cache
ip = "cache";
name = ip;
if(totalqueries > 0)
// Whats the percentage of cached queries on the total amount of queries?
percentage = 1e2f * counters->cached / totalqueries;
}
else
{
// Regular upstream destination
// Get sorted indices
int upstreamID;
if(sort)
upstreamID = temparray[i][0];
else
upstreamID = i;
// Get upstream pointer
const upstreamsData* upstream = getUpstream(upstreamID, true);
if(upstream == NULL)
continue;
// Get IP and host name of upstream destination if available
ip = getstr(upstream->ippos);
if(upstream->namepos != 0)
name = getstr(upstream->namepos);
else
name = getstr(upstream->ippos);
upstream_port = upstream->port;
// Get percentage
if(totalqueries > 0)
percentage = 1e2f * upstream->count / totalqueries;
}

The important variables to keep in mind are totalqueries, counters->blocked and upstream->count.

When queries are read form the database, the associated upstream server needs to be converted into an ID. This is done here:

const char *upstream = NULL;
int upstreamID = -1; // Default if not forwarded
// Determine upstreamID only when status == 2 (forwarded) as the
// field need not to be filled for other query status types
if(sqlite3_column_bytes(stmt, 6) > 0 &&
(upstream = (const char *)sqlite3_column_text(stmt, 6)) != NULL)
{
// Get IP address and port of upstream destination
char serv_addr[INET6_ADDRSTRLEN] = { 0 };
unsigned int serv_port = 53;
// We limit the number of bytes written into the serv_addr buffer
// to prevent buffer overflows. If there is no port available in
// the database, we skip extracting them and use the default port
sscanf(upstream, "%"xstr(INET6_ADDRSTRLEN)"[^#]#%u", serv_addr, &serv_port);
serv_addr[INET6_ADDRSTRLEN-1] = '\0';
upstreamID = findUpstreamID(serv_addr, (in_port_t)serv_port, true);
}

While doing so, the call of findUpstreamID() will increment upstream->count for every query with an associated upstream server, i.e. all queries with these status types (the comment in the code is not correct anymore):

QUERY_FORWARDED
QUERY_GRAVITY_CNAME
QUERY_REGEX_CNAME
QUERY_BLACKLIST_CNAME
QUERY_EXTERNAL_BLOCKED_IP
QUERY_EXTERNAL_BLOCKED_NULL
QUERY_EXTERNAL_BLOCKED_NXRA

However, further down the line, every one of the blocked queries is also incrementing the counters->blocked variable:

// Increment status counters
switch(status)
{
case QUERY_UNKNOWN: // Unknown
counters->unknown++;
break;
case QUERY_GRAVITY: // Blocked by gravity
case QUERY_REGEX: // Blocked by regex blacklist
case QUERY_BLACKLIST: // Blocked by exact blacklist
case QUERY_EXTERNAL_BLOCKED_IP: // Blocked by external provider
case QUERY_EXTERNAL_BLOCKED_NULL: // Blocked by external provider
case QUERY_EXTERNAL_BLOCKED_NXRA: // Blocked by external provider
case QUERY_GRAVITY_CNAME: // Blocked by gravity (inside CNAME path)
case QUERY_REGEX_CNAME: // Blocked by regex blacklist (inside CNAME path)
case QUERY_BLACKLIST_CNAME: // Blocked by exact blacklist (inside CNAME path)
counters->blocked++;
query->flags.blocked = true;
// Get domain pointer
domainsData* domain = getDomain(domainID, true);
domain->blockedcount++;
change_clientcount(client, 0, 1, -1, 0);
// Update overTime data structure
overTime[timeidx].blocked++;
break;
case QUERY_FORWARDED: // Forwarded
counters->forwarded++;
// Update overTime data structure
overTime[timeidx].forwarded++;
break;
case QUERY_CACHE: // Cached or local config
counters->cached++;
// Update overTime data structure
overTime[timeidx].cached++;
break;
case QUERY_RETRIED: // Retried query
case QUERY_RETRIED_DNSSEC: // fall through
// Nothing to be done here
break;
default:
logg("Warning: Found unknown status %i in long term database!", status);
break;
}

This leads to the problem:
The blocked queries are counted twice in upstream->count and counters->blocked, but only go once into totalqueries. Only queries with actual status QUERY_FORWARDED are incrementing counters->forwarded. Due to this, the calculated percentages of the upstream servers are too high and everything adds up to more than 100 %.

Solution 1: Make the problematic query types increment counters->forwarded as well to balance everything out:

                [...]
		// Increment status counters
		switch(status)
		{
			case QUERY_UNKNOWN: // Unknown
				counters->unknown++;
				break;

                        // These types were originally forwarded and then blocked later.
                        // They are already counted once as part of the respective upstream server
                        // and thus need to be counted as forwarded too to make the statistics work.
                        // https://github.com/pi-hole/AdminLTE/issues/1612
			case QUERY_EXTERNAL_BLOCKED_IP: // Blocked by external provider
			case QUERY_EXTERNAL_BLOCKED_NULL: // Blocked by external provider
			case QUERY_EXTERNAL_BLOCKED_NXRA: // Blocked by external provider
			case QUERY_GRAVITY_CNAME: // Blocked by gravity (inside CNAME path)
			case QUERY_REGEX_CNAME: // Blocked by regex blacklist (inside CNAME path)
			case QUERY_BLACKLIST_CNAME: // Blocked by exact blacklist (inside CNAME path)

                                 counters->forwarded++;

			case QUERY_GRAVITY: // Blocked by gravity
			case QUERY_REGEX: // Blocked by regex blacklist
			case QUERY_BLACKLIST: // Blocked by exact blacklist
				counters->blocked++;
				query->flags.blocked = true;
				// Get domain pointer
				domainsData* domain = getDomain(domainID, true);
				domain->blockedcount++;
				change_clientcount(client, 0, 1, -1, 0);
				// Update overTime data structure
				overTime[timeidx].blocked++;
				break;


			case QUERY_FORWARDED: // Forwarded
				counters->forwarded++;
				// Update overTime data structure
				overTime[timeidx].forwarded++;
				break;
                [...]

Solution 2: Exempt these blocked queries from being counted to the respective upstream server in the first place:

            [...]
		const char *upstream = NULL;
		int upstreamID = -1; // Default if not forwarded
		// Determine upstreamID only when status == 2 (forwarded) as the
		// field need not to be filled for other query status types
		if(sqlite3_column_bytes(stmt, 6) > 0 &&
		   (upstream = (const char *)sqlite3_column_text(stmt, 6)) != NULL)
		{
			// Get IP address and port of upstream destination
			char serv_addr[INET6_ADDRSTRLEN] = { 0 };
			unsigned int serv_port = 53;
			// We limit the number of bytes written into the serv_addr buffer
			// to prevent buffer overflows. If there is no port available in
			// the database, we skip extracting them and use the default port
			sscanf(upstream, "%"xstr(INET6_ADDRSTRLEN)"[^#]#%u", serv_addr, &serv_port);
			serv_addr[INET6_ADDRSTRLEN-1] = '\0';

                        // Queries with these status types were really blocked and should not
                        // be counted as being forwarded to the respective upstream server
                        // https://github.com/pi-hole/AdminLTE/issues/1612
			if(status == QUERY_EXTERNAL_BLOCKED_IP ||
			   status == QUERY_EXTERNAL_BLOCKED_NULL ||
			   status == QUERY_EXTERNAL_BLOCKED_NXRA ||
			   status == QUERY_GRAVITY_CNAME ||
			   status == QUERY_REGEX_CNAME ||
			   status == QUERY_BLACKLIST_CNAME)
			{
			        upstreamID = findUpstreamID(serv_addr, (in_port_t)serv_port, false);
			}
			else
			{
				upstreamID = findUpstreamID(serv_addr, (in_port_t)serv_port, true);
			}
		}
            [...]

I don't know if this is also true if memory is generated live or if that is really only the case when read from the database. I didn't yet get a good understanding of the Dnsmasq interface.

@Daxtorim
Copy link
Contributor

Okay, I dug a little deeper and I think I understand the Dnsmasq interface a little better. This code is also affected.

There are two places were the counters a corrected after the fact:

  1. After a query was determined to be blocked after CNAME inspection:

    FTL/src/dnsmasq_interface.c

    Lines 1415 to 1439 in da89cc6

    static void query_blocked(queriesData* query, domainsData* domain, clientsData* client, const unsigned char new_status)
    {
    // Get response time
    struct timeval response;
    gettimeofday(&response, 0);
    save_reply_type(blocking_flags, NULL, query, response);
    // Adjust counters if we recorded a non-blocking status
    if(query->status == QUERY_UNKNOWN)
    {
    counters->unknown--;
    }
    else if(query->status == QUERY_FORWARDED)
    {
    counters->forwarded--;
    }
    else if(query->status == QUERY_CACHE)
    {
    counters->cached--;
    }
    else
    {
    // Already a blocked query, no need to change anything
    return;
    }
  2. After a query was determined to be externally blocked:

    FTL/src/dnsmasq_interface.c

    Lines 1251 to 1287 in da89cc6

    static void query_externally_blocked(const int queryID, const enum query_status status)
    {
    // Get query pointer
    queriesData* query = getQuery(queryID, true);
    if(query == NULL)
    {
    // Memory error, skip check for this query
    return;
    }
    // Get time index
    const unsigned int timeidx = query->timeidx;
    // If query is already known to be externally blocked,
    // then we have nothing to do here
    if(query->status == QUERY_EXTERNAL_BLOCKED_IP ||
    query->status == QUERY_EXTERNAL_BLOCKED_NULL ||
    query->status == QUERY_EXTERNAL_BLOCKED_NXRA)
    return;
    // Correct counters if necessary ...
    if(query->status == QUERY_FORWARDED)
    {
    counters->forwarded--;
    overTime[timeidx].forwarded--;
    // Get forward pointer
    upstreamsData* upstream = getUpstream(query->upstreamID, true);
    if(upstream != NULL)
    upstream->count--;
    }
    // Mark query as blocked
    domainsData* domain = getDomain(query->domainID, true);
    clientsData* client = getClient(query->clientID, true);
    query_blocked(query, domain, client, status);
    }

Notice how 2. is just calling 1. at the end. Because of that, counters->forwarded is actually decremented twice when a query was blocked externally. On the other hand, upstream->count is not corrected at all when a query was blocked after CNAME inspection. The correction of the counters can be moved from 2. to 1. and it should give a more consistent result.

static void query_blocked(queriesData* query, domainsData* domain, clientsData* client, const unsigned char new_status)
{
	// Get response time
	struct timeval response;
	gettimeofday(&response, 0);
	save_reply_type(blocking_flags, NULL, query, response);

	// Adjust counters if we recorded a non-blocking status
	if(query->status == QUERY_UNKNOWN)
	{
		counters->unknown--;
	}
	else if(query->status == QUERY_FORWARDED)
	{
               // Correct counters if necessary ...
		counters->forwarded--;
		overTime[query->timeidx].forwarded--;

		// Get forward pointer
		upstreamsData* upstream = getUpstream(query->upstreamID, true);
		if(upstream != NULL)
			upstream->count--;
	}
	else if(query->status == QUERY_CACHE)
	{
		counters->cached--;
	}
	else
	{
		// Already a blocked query, no need to change anything
		return;
	}

	// Count as blocked query
	counters->blocked++;
	overTime[query->timeidx].blocked++;
	if(domain != NULL)
		domain->blockedcount++;
	if(client != NULL)
		change_clientcount(client, 0, 1, -1, 0);

	// Update status
	query->status = new_status;
	query->flags.blocked = true;
}

*I haven't actually tested any of the modified code yet.


This begs the question:
Should the counters be corrected at all?
Are a blocked query and a forwarded query mutually exclusive?

These queries were forwarded and did create a load on the DNS server before they were shot down. In case they aren't exclusive and we choose Solution 1, the correction of the counters should be removed completely. This would mean that externally blocked queries and those blocked after CNAME inspection are contributing to both counters->blocked and counters->forwarded. All parts of the pie chart will be proportionally affected, they should then still be adding up to 100%. That is what this is trying to fix after all.

Solution 2 would mean we ignore the fact that these queries were forwarded at all and just pretend they were blocked immediately. In that case the counters need to be corrected of course.

(There is also FTL/src/gc.c that is also modifying these counters. Need to take a look at that as well.)

I'll see if I can find some time to do a little testing over the weekend.

Opinions on which solution to pick? Anything obvious I missed?

@pralor-bot
Copy link

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/wrong-percent-calculation-in-pie-chart/44277/8

@yubiuser
Copy link
Member

Thanks for that digging!

Are a blocked query and a forwarded query mutually exclusive?

These queries were forwarded and did create a load on the DNS server before they were shot down. In case they aren't exclusive and we choose Solution 1, the correction of the counters should be removed completely. This would mean that externally blocked queries and those blocked after CNAME inspection are contributing to both counters->blocked and counters->forwarded

In my opinion, they are mutually exclusive. From a users point of view, I would not care about where the query was blocked but only if it was blocked. For the counters I would use the status definition from here
https://docs.pi-hole.net/database/ftl/#supported-status-types

My understanding is that everything that is blocked there should only contribute to blocked queries and not forwarded. This is in line with the current query log, where externally and CNAME blocked queries are shown in red (aka. blocked).

@DL6ER
Copy link
Member

DL6ER commented Feb 13, 2021

Thanks for the research. I agree that they are considered mutually exclusive and maybe they don't have to. In the end, the question is what the diagrams should show. With the current concept, you know that everything in the forward destination is not blocked (= permitted). The same goes for the cache. However, if we'd start mixing things (like externally blocked queries contribute to both), this sharp distinction wouldn't be there anymore. Even when this would reflect better what happened on the wire, this may be less what a user expects.

The GC function keeps all the internal counters consistent to ensure FTL can run months (or years) without the need for a restart. Any issues with the counters here would likely show up even earlier.

The modification of the counters being inlined in the find...ID subroutines was fine when FTL was a lot simpler years ago, however, meanwhile, this should be improved to avoid such mistakes. We already removed these things from the v6.0 code, however, it won't hurt to fix things in v5.x code and push a bugfix release until v6.0 becomes ready (this will take some time).

I will move this ticket into the FTL namespace as this is where it belongs.

@DL6ER DL6ER transferred this issue from pi-hole/web Feb 13, 2021
@DL6ER
Copy link
Member

DL6ER commented Feb 13, 2021

Proposed fix is #1057

@Daxtorim
Copy link
Contributor

@DL6ER

The GC function keeps all the internal counters consistent

Do queries with status types QUERY_RETRIED and QUERY_RETRIED_DNSSEC increment counters->forwarded at some point that I missed? Because they certainly do not increment the counter when being read from the database:

case QUERY_RETRIED: // Retried query
case QUERY_RETRIED_DNSSEC: // fall through
case QUERY_IN_PROGRESS:
// Nothing to be done here
break;

But when those queries are removed, counters->forwarded is decremented by GC:

FTL/src/gc.c

Lines 89 to 109 in da89cc6

// Change other counters according to status of this query
switch(query->status)
{
case QUERY_UNKNOWN:
// Unknown (?)
counters->unknown--;
break;
case QUERY_FORWARDED: // (fall through)
case QUERY_RETRIED: // (fall through)
case QUERY_RETRIED_DNSSEC:
// Forwarded to an upstream DNS server
// Adjust counters
counters->forwarded--;
if(query->upstreamID > -1)
{
upstreamsData* upstream = getUpstream(query->upstreamID, true);
if(upstream != NULL)
upstream->count--;
}
overTime[timeidx].forwarded--;
break;

@DL6ER
Copy link
Member

DL6ER commented Feb 13, 2021

Do queries with status types QUERY_RETRIED and QUERY_RETRIED_DNSSEC increment counters->forwarded at some point that I missed?

Yes, the flow is the following:

  1. A query is sent upstream: _FTL_forwarded is called, it does counters->forwarded++
  2. The query is retried by the requesting client: FTL_forwarding_retried is called (it does not modify the counters)
    This function marks the previous query as having been retried and doesn't do anything else (exception for DNSSEC retried, but they don't count anything, either)
  3. The new query is sent upstream: _FTL_forwarded is called, it does counters->forwarded++

The database code was incorrect. I pushed a fix to the same branch.

@DL6ER DL6ER linked a pull request Feb 13, 2021 that will close this issue
5 tasks
@DL6ER
Copy link
Member

DL6ER commented Feb 16, 2021

Pi-hole FTL v5.7 has been released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants