Commit

Merge pull request #759 from rix1337/dev
v.19.0.2 - Detect cloudflare blogs on empty SF/FF feed
rix1337 committed Nov 12, 2023
2 parents a793dfd + 115fb83 commit 32c4de4
Showing 7 changed files with 443 additions and 436 deletions.
2 changes: 2 additions & 0 deletions .github/Changelog.md
@@ -6,6 +6,8 @@

### Changelog FeedCrawler:

+ - **19.0.2** For SF/FF, check the feed from 3 days ago when detecting Cloudflare blocks.
+   This prevents false-positive block detection when today's feed is (still) empty.
- **19.0.1** [FeedCrawler Sponsors Helper](https://github.com/rix1337/FeedCrawler/wiki/5.-FeedCrawler-Sponsors-Helper) closes Chrome automatically once links have been passed to the GUI (#755)
- **19.0.0** Removed the web-based GUI for the [FeedCrawler Sponsors Helper](https://github.com/rix1337/FeedCrawler/wiki/5.-FeedCrawler-Sponsors-Helper)
- **19.0.0** New method for communicating active sponsor status between the [FeedCrawler Sponsors Helper](https://github.com/rix1337/FeedCrawler/wiki/5.-FeedCrawler-Sponsors-Helper) and FeedCrawler.
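The 19.0.2 entry boils down to requesting an `/updates/<date>` page for a date a few days in the past instead of today. A minimal sketch of that date handling (the helper name `updates_feed_url` is hypothetical, not part of the repository):

```python
from datetime import datetime, timedelta

def updates_feed_url(base_url, days_back=3):
    # Build the /updates/<YYYY-MM-DD> URL for a feed from `days_back` days
    # ago, so an empty feed *today* is not mistaken for a Cloudflare block.
    day = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")
    return base_url + "/updates/" + day
```

A feed from three days back is virtually guaranteed to contain entries on an unblocked site, which is what makes it a safer block-detection probe than today's feed.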
8 changes: 4 additions & 4 deletions feedcrawler/providers/url_functions.py
@@ -4,7 +4,7 @@
# This module provides all functions for checking and interacting with URLs.

import concurrent.futures
-import datetime
+from datetime import datetime, timedelta

from feedcrawler.providers import shared_state
from feedcrawler.providers.config import CrawlerConfig
@@ -36,7 +36,7 @@ def check_url(start_time):
db_status.delete(site + "_advanced")
sponsors_helper_url = get_solver_url("sponsors_helper")
flaresolverr_url = get_solver_url("flaresolverr")
-skip_sites = ["HW", "WW", ]  # SJ/DJ not listed, because they rarely block scraping attempts
+skip_sites = ["HW", "WW", ]
skip_normal_ip = (sponsors_helper_url or flaresolverr_url) and (site in skip_sites)
if skip_normal_ip:
blocked_with_normal_ip = True
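The hunk above decides whether to skip the normal-IP request entirely: if a solver (Sponsors Helper or FlareSolverr) is available and the site is on the skip list, the site is treated as blocked up front. A standalone sketch of that decision (the function name `should_skip_normal_ip` is hypothetical):

```python
def should_skip_normal_ip(site, sponsors_helper_url, flaresolverr_url,
                          skip_sites=("HW", "WW")):
    # When a solver is configured and the site is known to block plain
    # requests anyway, don't bother attempting a normal-IP request.
    return bool(sponsors_helper_url or flaresolverr_url) and site in skip_sites
```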
@@ -73,13 +73,13 @@ def check_if_blocked(site, url):
return True
# Custom checks required
elif site in ["SF"]:
-delta = datetime.datetime.now().strftime("%Y-%m-%d")
+delta = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
sf_test = cached_request(url + '/updates/' + delta, dont_cache=True)
if not sf_test["text"] or sf_test["status_code"] is not (
200 or 304) or '<h3><a href="/' not in sf_test["text"]:
return True
elif site in ["FF"]:
-delta = datetime.datetime.now().strftime("%Y-%m-%d")
+delta = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
ff_test = cached_request(url + '/updates/' + delta, dont_cache=True)
if not ff_test["text"] or ff_test["status_code"] is not (
200 or 304) or '<div class="list blog"' not in ff_test["text"]:
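A side note on the condition left unchanged by this commit: `status_code is not (200 or 304)` does not test membership. `(200 or 304)` evaluates to `200`, so the check effectively becomes `status_code is not 200`, and `is` on integers is an identity test besides. Assuming the intent is "blocked unless the status is 200 or 304 and the expected marker is present", a membership test would look like this (the helper `is_blocked` and its dict shape are illustrative, not the repository's API):

```python
def is_blocked(response, marker):
    # Treat the site as blocked when the body is empty, the status code is
    # neither 200 nor 304, or the expected HTML marker is missing.
    return (not response["text"]
            or response["status_code"] not in (200, 304)
            or marker not in response["text"])
```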
2 changes: 1 addition & 1 deletion feedcrawler/providers/version.py
@@ -9,7 +9,7 @@


def get_version():
-return "19.0.1"
+return "19.0.2"


def create_version_file():
