Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make adlist ID available, rename queries.regex_id -> queries.list_id #1841

Merged
merged 6 commits into from
Feb 9, 2024

Conversation

DL6ER
Copy link
Member

@DL6ER DL6ER commented Dec 25, 2023

What does this implement/fix?

Important

This PR needs both core and web branches with the same name to work.

Furthermore, you will have to run pihole -g at least once to upgrade your gravity database such that the list IDs become available for FTL. This should be done automatically during checkout/update.

Rename query_storage.regex_id to query_storage.list_id as it is already now used to also store exact matching domainlist entries by their ID. This commit further extends this to also store the (first) matching anti-/gravity list (if available).

Screenshot from 2023-12-25 05-32-55
Screenshot from 2023-12-25 05-33-10


Related issue or feature (if applicable): N/A

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

  • The code change is tested and works locally.
  • I based my code and PRs against the repositories developmental branch.
  • I signed off all commits. Pi-hole enforces the DCO for all contributions
  • I signed all my commits. Pi-hole requires signatures to verify authorship
  • I have read the above and my PR is ready for review.

…dy now used to also store exact matching domainlist entries by their ID. This commit further extends this to also store the (first) matching anti-/gravity list (if available)

Signed-off-by: DL6ER <dl6er@dl6er.de>
…istinguish from domains rather easily. Users are free to foil this method when they force negative IDs into the database but they will never be automatically created

Signed-off-by: DL6ER <dl6er@dl6er.de>
…ading to a crash in heavy TCP worker activity

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Copy link

github-actions bot commented Jan 1, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Copy link

github-actions bot commented Jan 1, 2024

Conflicts have been resolved.

@PromoFaux
Copy link
Member

What if a blocked domain is on multiple adlists? Are we just taking the id/name of the first adlist that we find a match for?

@DL6ER
Copy link
Member Author

DL6ER commented Jan 6, 2024

What if a blocked domain is on multiple adlists? Are we just taking the id/name of the first adlist that we find a match for?

Yes, and there is not even a guarantee which one this is as it highly depends on the tree-structure of the database index. Regardless, the ID returned here will be the one that triggered the block so that's always correct. But yeah, as you implied, there is no guarantee that disabling/removing this one adlist will cause the domain to be permitted afterwards.

@PromoFaux
Copy link
Member

Is it worth potentially making list_id return an array?

@DL6ER
Copy link
Member Author

DL6ER commented Jan 6, 2024

We don't have more than one adlist_id (the first one) because we stop looking for additional matches once we find the first hit. This is a necessary simplification as it avoids a needless full scan of the index (which eats additional time). If you would like to be fully-correct, we'd even have to continue scanning if there is possibly some additional stuff like regex or deny list entries blocking this ... you see where this is going ...

@yubiuser
Copy link
Member

Actually, I don't like adding the adlist id to the query log. The simple reason is that the information is not very useful when users use more than 1 list but it adds quite some overhead to all 3 Pi-hole components. The proper tool to identify which adlist(s) contain the queried domain is query.sh/Search Lists. I could instead imagine a link from each blocked query's details to the search page with pre-filled (and executed?) search input field.


Renaming queries.regex_id -> queries.list_id is fine to me.

@rdwebdesign
Copy link
Member

rdwebdesign commented Jan 12, 2024

I agree with yubiuser.

Since the same domain can be on multiple lists (and many users have more than one list), knowing the "the first one" adlist_id is not very useful.

I could instead imagine a link from each blocked query's details to the search page with pre-filled (and executed?) search input field.

We could add a link like /admin/search?domain=example.com on each query and add the javascript code to search the domain.

@DL6ER
Copy link
Member Author

DL6ER commented Jan 13, 2024

Note that this mindset may also question regex_id (or list_id after the renaming) as we have the same underlying issue here: There may be several regex matching a given domain but we will also stop when we see the first match for efficiency reasons. Does this mean this feature is to be removed altogether and - instead - always use a link to /search ?

@yubiuser
Copy link
Member

yubiuser commented Jan 14, 2024

Does this mean this feature is to be removed altogether and - instead - always use a link to /search ?

I like this idea. It seems like a regression at first, but is the right thing to do after consideration. (It's a bit like switching from line to bar graphs). If there are multiple adlists and/or regex that influence an allow/block decision it seems wrong to show a single "random" responsible adlist/regex.

Copy link
Member

@rdwebdesign rdwebdesign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with web and core branches

@PromoFaux
Copy link
Member

Agree with Yubi here, I'm unsure of the benefit of showing a link to the first adlist we find the domain in. Lets say this causes the user to rethink that adlist and remove it. If it is in more than one list, then they wait and see it blocked again, and again click the link and maybe rethink that next list..... it's all a rather drawn out process.

"Domain was blocked: click here to find out what lists it was on" with a link to /search seems to be the best way to go, both in computational efficiency and in meaningfulness (is that a word? It is now) to an end user

@rdwebdesign
Copy link
Member

@PromoFaux

The image on the PR is outdated.

The current code is showing a link to the Search page, as it was suggested.

image

@PromoFaux
Copy link
Member

Lol, I didn't yet check or test the code - and as I didn't see any commits after the conversation.... I guess the old code was FP'd away :)

@rdwebdesign
Copy link
Member

The related code was changed in web PR, but the conversation was kept here, in one place.

@DL6ER DL6ER merged commit 276ba42 into development-v6 Feb 9, 2024
17 checks passed
@DL6ER DL6ER deleted the tweak/list_id branch February 9, 2024 19:52
DL6ER added a commit that referenced this pull request Feb 9, 2024
Signed-off-by: DL6ER <dl6er@dl6er.de>
DL6ER added a commit that referenced this pull request Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants