v0.5.4
🛠️ Maintenance Update 🛠️
This release introduces quality-of-life improvements that streamline the process of updating existing parsers. Putting them to use, we improved 20 existing publishers and added 2 new ones. In addition, this release fixes several bugs related to xpath_search, encoding detection, and sitemap parsing.
✨ Quality of Life Improvements
- Add check_coverage script by @MaxDall in #839
- Apply general quality improvements by @MaxDall in #859
🚀 Publishers
🆕 New
- Add German publisher T-Online by @freylily in #805
- Add klassegegenklasse (DE) publisher + parser + tests + tables by @baurlaur in #809
🔧 Updates
- Adjust paragraph_selector for Rheinische Post by @MaxDall in #838
- Fix CBC News by @MaxDall in #842
- Deprecate FreiePresse by @MaxDall in #857
- Update Dagbladet parser to version V1_1 by @MaxDall in #856
- Update SeznamZpravy parser by @MaxDall in #843
- Fix Tageszeitung by @MaxDall in #855
- Update TheMirror parser by @MaxDall in #850
- Deprecate authors for ThePortugalNews by @MaxDall in #845
- Update selectors by @addie9800 in #828
- Update parser for SalzburgerNachrichten by @MaxDall in #854
- Deprecate Morgunbladid by @MaxDall in #853
- Update NTV parser by @MaxDall in #846
- Update Euronews parser to version V1_1 by @MaxDall in #852
- Update DailyMaverick parser by @MaxDall in #851
- Fix SRF summary selector by @MaxDall in #861
- Fix summary selector for 20Minutes by @MaxDall in #860
- Fix sitemaps for BR by @MaxDall in #862
🐛 Bug fixes
- Fix a bug in the encoding detection by @MaxDall in #841
- Fix escaping in xpath_search by @MaxDall in #840
- Skip lazy loading images by @MaxDall in #849
- Catch unexpected HTML by @MaxDall in #863
New Contributors
Full Changelog: v0.5.3...v0.5.4