Releases: medialab/hyphe
2025 up in the Skybox
ChangeLog:
- Fix Default WebEntityCreationRule not always applied when different from domain (upgrades to hyphe-traph v2.2) (#499)
- Add an option in the web interface to load tags from a CSV file along with importing new or existing WebEntities (#503)
- Add the possibility to set a crawl job as reviewed (#478)
- Allow renaming a corpus (#457)
- Better handle WebEntities with prefixes including special characters in the path (#447)
- Distinguish crawled pages with errors from simple redirections (#492)
- Auto-resolve more URLs directly within the crawler (#463)
- Fix automatic feeding of recent UserAgents, whether behind a proxy or not
- Small fixes for INA & BnF Web Archives (#502 + permalinks with misformatted dates)
- Minor fixes to lookups logic, config loading, manual installation doc, corpus landing page (#487) and backend logs display
Full Changelog: v1.11.0...v1.12.0
Early 2024
Back-to-school papercuts
ChangeLog:
- Add a button to export metadata from all pages of a webentity (#318)
- Explicitly separate startpages warnings regarding redirected pages and faulty ones (#379)
- Allow to set a specific User-Agent per crawl within the web interface (#461)
- Display hints on the meaning of the different possible status of a crawl (#474)
- Highlight corresponding webentities when hovering a status or a tag in the network legend (#459)
- Switch User-Agents list used within crawls to relying on https://www.useragents.me/ (#453)
- Various improvements (cleaner backend logs, remove empty traphs directories (#475), updated heuristics for webentity links calculation rhythm, visual fixes (#476, #477))
Hot Summer '23
ChangeLog:
- migrated caching WELinks to (working) files instead of mongo to handle huge corpora
- allow setting archives pass as ENV variable for docker instances
- display time required by links indexation on overview
Summer '23
ChangeLog:
- Added handling of more webarchives as sources (Arquivo.pt + INA DLWeb) + fixed various webarchives frontend info (#469, #471,
- Added a corpus setting "ignore internal links" to crawl but not record links within the currently crawled webentity, in order to drastically speed up indexation of entities with crazy amounts of links (at a cost in terms of functionality, since the network of internal pages is then not available, and entities that are split after a crawl will need to be recrawled) (cf #371, #378, #433)
- Better handle frontend warning on pending actions when trying to close a tab (#465, #466)
- Minor fixes (#448, #460, #467, #468, #470, 50d97e8, 85decf2)
Better, faster, stronger traph, there it is!
ChangeLog:
- Switched to the new, breaking version 2.1 of hyphe-traph, which should help speed up indexation on big networks, but requires rebuilding corpora from scratch
- Make iterator traph calls less recurrent to leave priority to quick user actions
- Fixed stack on calling empty callback in List Webentities
- Upgraded urllib3 to handle SSL deprecation
- Froze dependencies to maintain python2.7 compat
Summer '22
ChangeLog:
- Upgraded User Agents list
- Added extra default WebEntity CreationRules for Github, Instagram, TikTok, Reddit and a bunch of blog platforms
- Added perma.cc to list of default autofollowlinks
- Diverse fixes and extra features for webarchives (links to archive permalinks, etc.)
- Minor bugfixes
Spring '22
ChangeLog:
- Added a distinction between successful and errored crawled pages to identify Suspicious crawls (#425)
- Fixed frontend compatibility within Hyphe-Browser (medialab/hyphe-browser#212)
- Fixed WebArchives crawling interface (#431) and behavior with BnF's archives (#426)
- Improved network page's interaction using latest sigma.js v2.2 (node highlight etc & #367)
- Allowed frontend to automatically restart a closed corpus when reopening the frontend directly on a specific corpus link (#440)
- Allowed checking contiguous checkboxes in frontend's lists of webentities using the shift key (#438)
- Allowed tuning the frontend's header color from the config (#430)
- Published Hyphe on Zenodo & Software Heritage
- Minor fixes (#397, #388, #432, #429, #437, #343, #341, #444, #325)
Robots sensitive crawls (stabilized)
ChangeLog:
- Fixed environment variable OBEY_ROBOTS for Docker instance
- Added explanation helpers in frontend
- Fixed undeletable corpora