v0.3.0
New Release: Version 0.3.0
We're excited to announce a significant overhaul of the Fundus crawling core logic in this release! We've transitioned from using asyncio to a ThreadPool-based solution, resulting in a more robust and performant system. Now, each publisher operates on its own thread, synchronized seamlessly through a queue.
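For readers curious what "one thread per publisher, synchronized through a queue" can look like, here is a minimal standard-library sketch. It is not the Fundus implementation: `crawl_threaded`, the `publishers` mapping, and the `_DONE` sentinel are hypothetical names chosen for illustration, and the "crawling" is simulated with string formatting instead of network requests.

```python
import queue
import threading
from typing import Dict, Iterator, List

# Sentinel a worker puts on the queue once it has no more items (hypothetical name).
_DONE = object()


def crawl_threaded(publishers: Dict[str, List[str]]) -> Iterator[str]:
    """Yield items produced by one worker thread per publisher.

    `publishers` maps a publisher name to the URLs it should process.
    This is an illustrative stand-in, not part of the Fundus API.
    """
    results: queue.Queue = queue.Queue()

    def worker(name: str, urls: List[str]) -> None:
        try:
            for url in urls:
                # A real crawler would download and parse the page here.
                results.put(f"{name}: {url}")
        finally:
            # Always signal completion, even if the worker raised.
            results.put(_DONE)

    threads = [
        threading.Thread(target=worker, args=(name, urls), daemon=True)
        for name, urls in publishers.items()
    ]
    for thread in threads:
        thread.start()

    # The consumer reads from the shared queue until every worker has finished.
    remaining = len(threads)
    while remaining:
        item = results.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item

    for thread in threads:
        thread.join()
```

Because all workers write to a single queue, the consumer sees items in whatever order the threads produce them, so callers should not rely on a particular interleaving between publishers.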
Breaking changes
To provide a more streamlined experience, we've relocated every crawler-type specific parameter to its respective constructor. As a result, these parameters are no longer accessible through the crawl method:
- `delay` -> `Crawler`
- `start`, `end` -> `CCNewsCrawler`
Furthermore, since we removed asyncio, the crawl_async method is no longer available.
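The shape of this change can be illustrated with a small stand-in class. `DelayedCrawler` below is hypothetical and is not the Fundus `Crawler`. It only shows the pattern these notes describe: crawler-specific settings such as `delay` are passed once to the constructor, while `crawl` keeps the options that apply per call and is consumed synchronously.

```python
import time
from typing import Iterator, Optional


class DelayedCrawler:
    """Hypothetical stand-in showing constructor-level configuration."""

    def __init__(self, *sources: str, delay: float = 0.0) -> None:
        # Crawler-specific settings now live on the instance.
        self.sources = sources
        self.delay = delay

    def crawl(self, max_articles: Optional[int] = None) -> Iterator[str]:
        # Per-call options stay on the method; there is no async variant.
        count = 0
        for source in self.sources:
            if max_articles is not None and count >= max_articles:
                return
            if count > 0:
                # Wait between consecutive items, as configured at construction.
                time.sleep(self.delay)
            yield f"article from {source}"
            count += 1


# Previously the delay would have been an argument to crawl();
# in this sketch it is set once when the crawler is created.
crawler = DelayedCrawler("source-a", "source-b", "source-c", delay=0.0)
articles = list(crawler.crawl(max_articles=2))
```

Code that previously awaited an asynchronous crawl would, under this pattern, iterate over the generator returned by `crawl` in a plain `for` loop.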
What's new
- Unbatch Fundus by @MaxDall in #357
- Add `free_access` as attribute to `Article` by @MaxDall in #421
- Add query parameter [Based on #357] by @addie9800 in #403
- Rework `ExtractionFilter` to adapt to boolean values by @MaxDall in #423
New publisher
- Add Lithuanian News Source by @addie9800 in #393
- Add US version of business insider by @MaxDall in #356
- Adding a swiss publisher (SRF) by @addie9800 in #410
- Add `Rheinische Post` as publisher by @MaxDall in #416
Updating existing publisher
- This is a renewed PR for BI Germany, that keeps the mostly Test files unmodified by @addie9800 in #402
- Bump `WAZ` to version `V1_1` by @MaxDall in #388
- Update `FAZ` parser by @MaxDall in #419
- bi authentication bug workaround by @addie9800 in #406
Bug fixes
- Fix domains for several publishers by @MaxDall in #398
- Restrict `typing-extensions` version to >= 4.6 by @MaxDall in #405
- Bump `mypy` to version 1.9.0 by @MaxDall in #412
- Fixed a bug in `documentation.yaml` by @MaxDall in #415
- Fix a bug in generate_parser_test_files.py by @MaxDall in #418
- Fix a bug in bf_search regarding boolean values by @MaxDall in #422
QoL
- Adds Pretty Print for PublisherCollection and PublisherSpec by @addie9800 in #399
- Add custom filter for `publisher_coverage` to skip boolean values by @MaxDall in #408
- Documentation Update: Explain Addition of New Countries by @addie9800 in #413
- Attributes Parameter in Test Generation Script by @addie9800 in #411
- Add `body` to unit tests by @MaxDall in #338
- Adds a part about `generate_tables` script to the documentation by @MaxDall in #424
Maintenance
- Update relevant actions to versions utilizing node 20 by @MaxDall in #417
- Disable `strict_query` parsing for URL validation by @MaxDall in #407
Full Changelog: v0.2.2...v0.3.0