Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce number of false positives when gravity unable to import domains - change the wording of the output #5128

Merged
merged 2 commits into from
Jan 22, 2023

Conversation

PromoFaux
Copy link
Member

@PromoFaux PromoFaux commented Jan 17, 2023

What does this PR aim to accomplish?:

Currently when we analyse an adlist, we print out a warning if any invalid domains are detected. However, it is not always strictly true that there is something wrong with the entry - it is more that they are incompatible with Pi-hole

https://discourse.pi-hole.net/t/fresh-install-steven-black-list-invalid-domains/60523

This PR introduces a small list of common localhost domains (as included at the top of StevenBlack's list to discount these as being counted as "invalid", and also changes the wording of invalid domains to unusable entries.

Example output before these changes:

[i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [i] Analyzed 168834 domains, 15 domains invalid!
      Sample of invalid domains:
      - 0.0.0.0
      - broadcasthost
      - fe
      - ff
  [i] List stayed unchanged

[i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [i] Analyzed 168834 domains, 15 domains invalid!
      Sample of invalid domains:
      - 0.0.0.0
      - broadcasthost
      - fe
      - ff
  [i] List stayed unchanged

Example output after these changes:

  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [i] Imported 168819 domains, ignoring 2 non-domains
      Sample of non-domain entries:
      - fe
      - ff

  [i] Target: https://raw.githubusercontent.com/jankytay/pihole-troubleshooting/master/issue5008.txt
  [✓] Status: Retrieval successful
  [i] Imported 3 domains, ignoring 5 non-domains
      Sample of non-domain entries:
      - (\.|^)example6\.com$
      - (^|\.)example5\.com
      - (^|\.)example8\.com;reply=none
      - *.example3.com
      - ^example77?

The copy could probably be further clarified to mention that this is not an error - just for information, and that nothing is wrong, but it is late and I'm struggling to find the words


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code and I have tested my changes.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

  • I have read the above and my PR is ready for review. Check this box to confirm

@PromoFaux PromoFaux requested a review from a team January 17, 2023 23:42
@PromoFaux PromoFaux force-pushed the reduce-false-positives-in-gravity-output branch from 15e4de8 to 9a2e5cd Compare January 17, 2023 23:43
@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/fresh-install-steven-black-list-invalid-domains/60523/27

@PromoFaux PromoFaux force-pushed the reduce-false-positives-in-gravity-output branch 2 times, most recently from 1b0d449 to fba322d Compare January 18, 2023 16:42
@PromoFaux
Copy link
Member Author

@jfb-pihole can I have your thoughts on this version, too, please?

@jfb-pihole
Copy link
Sponsor Member

"Sample of unusable entries:"

I would change this to "sample of entries not imported into Pi-hole"

@rdwebdesign
Copy link
Member

The expression "unusable entries" tries to make it clear Pi-hole can't use those entries...

I know the difference is very small (if any), but I think the expression "not imported into Pi-hole" will raise a lot of questions like "Why Pi-hole didn't import everything? What is wrong?".

We can explain better using something like "X lines without domains not imported by Pi-hole", but this looks too long.

@jfb-pihole
Copy link
Sponsor Member

I'm OK with that change. We just need to indicate that the domains weren't imported.

@PromoFaux
Copy link
Member Author

  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [i] Imported 168819 domains, ignoring 2 non-domains
      Sample of non-domain entries:
      - fe
      - ff

  [i] Target: https://raw.githubusercontent.com/jankytay/pihole-troubleshooting/master/issue5008.txt
  [✓] Status: Retrieval successful
  [i] Imported 3 domains, ignoring 5 non-domains
      Sample of non-domain entries:
      - (\.|^)example6\.com$
      - (^|\.)example5\.com
      - (^|\.)example8\.com;reply=none
      - *.example3.com
      - ^example77?

gravity.sh Outdated Show resolved Hide resolved
gravity.sh Outdated Show resolved Hide resolved
@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/stevenblack-adlists-show-15-invalid-domains/60641/2

…'s host file), and discount them when calculating the number of "invalid" domains in a given list

Soften the output message when reporting on unsuable domains

Signed-off-by: Adam Warner <me@adamwarner.co.uk>
@PromoFaux PromoFaux force-pushed the reduce-false-positives-in-gravity-output branch from bf09c9d to 79f4a7c Compare January 20, 2023 18:52
@PromoFaux PromoFaux requested a review from a team January 20, 2023 18:55
@PromoFaux PromoFaux added the PR: Approval Required Open Pull Request, needs approval label Jan 20, 2023
rdwebdesign
rdwebdesign previously approved these changes Jan 20, 2023
yubiuser
yubiuser previously approved these changes Jan 21, 2023
@PromoFaux PromoFaux changed the title Reduce number of false positives when gravity unable to import domains Reduce number of false positives when gravity unable to import domains - change the wording of the output Jan 21, 2023
@PromoFaux PromoFaux dismissed stale reviews from yubiuser and rdwebdesign via 8bd1a1e January 21, 2023 15:01
@PromoFaux PromoFaux requested a review from a team January 21, 2023 18:33
gravity.sh Outdated Show resolved Hide resolved
@PromoFaux PromoFaux force-pushed the reduce-false-positives-in-gravity-output branch 2 times, most recently from d2d92eb to d45b483 Compare January 21, 2023 23:56
…learer

Signed-off-by: Adam Warner <me@adamwarner.co.uk>
@PromoFaux PromoFaux force-pushed the reduce-false-positives-in-gravity-output branch from d45b483 to 9939cf1 Compare January 21, 2023 23:57
@PromoFaux PromoFaux merged commit 81a31b9 into development Jan 22, 2023
@PromoFaux PromoFaux deleted the reduce-false-positives-in-gravity-output branch January 22, 2023 11:06
@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pi-hole-web-v5-18-2-and-core-v5-15-1-released/60695/1

@thomasmerz
Copy link
Sponsor

Great PR! I love it. Because now it's not "something wrong" anymore, but only "an info in a footnote" 😆

grafik

@dschaper
Copy link
Member

These changes will lead to a high system load when fed a list that contains nothing of value or use. Any list that is in ABP format will try to filter out the common localhosts from a long string of invalid information (read ABP regex) and Pi-hole will try its hardest to salvage the list.

Looks like we need to have an intermediate step that looks for any hosts entry in the supplied list and if there are none then stop. I thought we had ABP filtering before but that may have been to try to convert ABP in to hosts format. If I can find that detection regex then we can reuse it to just remove the invalid entries and then count the length of the array of hosts entries. If there are none then we don't need to proceed. Sample list that doesn't work for Pi-hole anymore is https://big.oisd.nl. The older https://dbl.oisd.nl has been removed and we safely note that the list is gone. But people will try to replace dbl with big and we shouldn't crash just because they are feeding garbage to Pi-hole now.

@yubiuser
Copy link
Member

yubiuser commented Feb 14, 2023

Looks like we need to have an intermediate step that looks for any hosts entry in the supplied list and if there are none then stop.

If you want to go for an all-or-nothing approach, just check the number of lines of ${target} and if it's 0 skip the removal of false positive.

@dschaper
Copy link
Member

Okay, ideally we should only remove false positives from potential positives but I haven't looked too deeply at the code to see how it all is being processed now. (sed and regex is such a nightmare in bash...)

@yubiuser
Copy link
Member

The false_positive is misleading here. They are false_positive of being "unusable". We first get all non-domains and remove the false_positive from this list leaving the real "unusable". Therefore we first need all the garbage to sort out a few items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: Approval Required Open Pull Request, needs approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants