Releases: paultraf/makestaticsite
Maintenance: Wget workaround and refactoring of variable assignment
In this maintenance release, a Wget workaround improves native support for the Wayback Machine without recourse to the Wayback Machine Downloader, though the workaround is currently limited to a single page.
The assignment of option variables has been refactored, which should help with maintenance and make it easier to add custom options in `.cfg` files.
Maintenance: bug fixes and performance enhancements to asset processing
This release fixes a bug that degraded the performance of versions 0.29.2 and 0.29.3, specifically in the conversion of asset links from absolute to relative URLs. In addition, a new switch, `url_wildcard_capture` (`yes` or `no`), generally reduces the amount of postprocessing, especially for sites with numerous external assets, though possibly at some cost to the accuracy of the output.
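The general idea of converting absolute asset links to relative ones can be pictured with the minimal Bash sketch below. It is not MakeStaticSite's implementation: the function names are hypothetical, page paths are assumed to be given relative to the mirror root, and the base URL is assumed to end in a slash.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; relative_prefix and absolute_to_relative are
# hypothetical names, not MakeStaticSite functions.

# Print one "../" for each directory level that a page (given as a path
# relative to the mirror root) sits below that root.
relative_prefix() {
  local dir prefix=""
  dir=$(dirname "$1")
  while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
    prefix="../$prefix"
    dir=$(dirname "$dir")
  done
  printf '%s\n' "$prefix"
}

# Read HTML on stdin and replace every literal occurrence of base_url
# (assumed to end in "/") with the page's relative prefix.
absolute_to_relative() {
  local page="$1" base_url="$2" prefix line
  prefix=$(relative_prefix "$page")
  while IFS= read -r line || [ -n "$line" ]; do
    # Quoting "$base_url" makes bash match it literally, not as a glob.
    printf '%s\n' "${line//"$base_url"/$prefix}"
  done
}
```

For example, piping `<img src="https://example.com/images/logo.png">` through `absolute_to_relative blog/2024/post.html https://example.com/` yields `<img src="../../images/logo.png">`. A real converter also has to confine replacements to link attributes, deal with query strings and skip URLs outside the mirrored site, none of which this sketch attempts.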
Other refinements include additional checks that help routines handle edge cases with fewer errors.
Maintenance: bug fixes and enhancements to mirror output
This release focuses on the postprocessing of output generated by Wget, correcting a few edge cases in the handling of query strings along with some other tidying up. It also introduces a progress bar for the conversion of absolute links to relative links, the most CPU-intensive routine in this phase.
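A text progress bar of the kind described can be drawn by repeatedly printing a fixed-width bar preceded by a carriage return. The Bash sketch below shows one common way to build the bar string; the function name and bar style are hypothetical and need not match what MakeStaticSite displays.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render a fixed-width text progress bar such as
# "[#####-----]  50%" for `current` items completed out of `total`.
progress_bar() {
  local current=$1 total=$2 width=${3:-20} filled pct bar rest
  (( total > 0 )) || total=1               # avoid division by zero
  pct=$(( current * 100 / total ))
  filled=$(( current * width / total ))
  printf -v bar '%*s' "$filled" ''         # $filled spaces...
  bar=${bar// /#}                          # ...turned into '#'
  printf -v rest '%*s' $(( width - filled )) ''
  printf '[%s%s] %3d%%' "$bar" "${rest// /-}" "$pct"
}
```

Inside a conversion loop, `printf '\r%s' "$(progress_bar "$i" "$n")"` redraws the bar on the same terminal line, with a final `printf '\n'` once the loop ends.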
Extended conversion of assets for offline usage
This release features two main developments:
- a more thorough conversion of links from absolute to relative URLs, escaping metacharacters in extra asset URLs before they are processed as regular expressions by `sed`;
- multiple runs of phase 3 (`wget_extra_urls()`) to extend the retrieval of extra assets for offline use.
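Escaping matters because characters such as `.` and `*` in a URL would otherwise act as regex operators in `sed`, so `example.com` would also match `exampleXcom`. The Bash sketch below shows one way to backslash-escape a URL for the search side of a `sed` `s/.../.../` command; the helper name is hypothetical, and GNU `sed` with basic regular expressions and `/` as the delimiter is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: backslash-escape characters that are special in a
# sed basic regular expression (BRE), plus the "/" delimiter, so that a
# URL can be matched literally inside s/.../.../ (GNU sed assumed).
escape_sed_bre() {
  # "|" is used as this command's own delimiter so that "/" and "\" are
  # plain members of the bracket expression: ] [ \ / . * ^ $
  # ("]" must come first to be taken literally).
  printf '%s\n' "$1" | sed -e 's|[][\/.*^$]|\\&|g'
}
```

A call such as `sed "s/$(escape_sed_bre "$url")/$local_path/g"` then matches `$url` literally. The replacement side of `s///` needs its own escaping (of `\`, `&` and the delimiter), which this sketch does not cover.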
Maintenance: bug fixes for the fine-tuning of output
This release fixes bugs in the fine-tuning carried out in postprocessing after the initial run of Wget:
- more robust handling of the Wget `--exclude` and `--include` (domain) options;
- corrected the position of `asset_exclude_dirs` in `find` commands so that directories are duly excluded;
- inserted missing context for the regex search that prunes links with query strings appended;
- all HTML Tidy errors are now appended to the same file (via `stderr`).
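Where an exclusion sits in a `find` expression affects what gets printed. The Bash sketch below builds a `-prune` clause from a list of directories and puts the only explicit `-print` on the other side of `-o`, so the pruned directories are neither descended into nor listed. The function name is hypothetical, and treating `asset_exclude_dirs` as a space-separated list of directory names relative to the mirror root is an assumption for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list HTML files under a mirror root, skipping the
# directories named in a space-separated exclusion list (the format of
# asset_exclude_dirs is assumed here, not taken from MakeStaticSite).
# The root should be given without a trailing slash so -path can match.
list_html_files() {
  local root="$1" d
  local -a dirs prune=()
  read -r -a dirs <<< "$2"
  for d in "${dirs[@]}"; do
    if [ ${#prune[@]} -gt 0 ]; then prune+=( -o ); fi
    prune+=( -path "$root/$d" )
  done
  if [ ${#prune[@]} -eq 0 ]; then
    find "$root" -type f -name '*.html' -print
  else
    # -prune stops find descending into a matching directory; because
    # -print appears only after -o, the pruned directories are not listed.
    find "$root" \( "${prune[@]}" \) -prune -o -type f -name '*.html' -print
  fi
}
```

For instance, `list_html_files ./mirror "private tmp"` lists every `.html` file under `./mirror` except those inside `./mirror/private` and `./mirror/tmp`.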
Restructuring and extended postprocessing
This release includes some code refactoring and extends the postprocessing of output from the initial run of Wget (phase 2), adding support for scheme-relative URLs, CORS, and assets with long filenames.
More generally, the selection of assets for localisation is more refined: with more options to restrict processing, there should be fewer errors of commission (false positives) when transforming absolute URLs to relative URLs.
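A scheme-relative URL such as `//cdn.example.com/app.js` inherits the scheme of the page that references it. One simple way for a postprocessor to treat such URLs like any other absolute URL is to prefix the page's scheme first, as in this Bash sketch; the function name and the `https` default are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve a scheme-relative URL ("//host/path")
# against the scheme of the page being mirrored; other URLs pass through.
resolve_scheme_relative() {
  local url="$1" page_scheme="${2:-https}"   # default scheme is assumed
  case "$url" in
    //*) printf '%s:%s\n' "$page_scheme" "$url" ;;
    *)   printf '%s\n' "$url" ;;
  esac
}
```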
Maintenance: improved asset (post-)processing
This release improves asset processing in phase 4, where downloaded assets are consolidated within a designated folder. New checks ensure that directories are not moved into subdirectories of themselves. There is also a fix for broken links arising in the clause that updates hyperlinks to assets, such as images, downloaded from other domains.
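A check of the kind described can compare canonical paths: a move is unsafe if the destination equals the source directory or lies beneath it. The Bash sketch below is one way to express that; the function name is hypothetical, both directories are assumed to exist, and `pwd -P` is used so that symbolic links cannot disguise the relationship.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed (return 0) only if moving directory $1 into
# directory $2 would not place it inside itself or one of its descendants.
safe_to_move() {
  local src dest
  src=$(cd "$1" && pwd -P) || return 1    # canonical path, symlinks resolved
  dest=$(cd "$2" && pwd -P) || return 1
  # Trailing slashes stop /a from being mistaken for a parent of /ab.
  case "$dest/" in
    "$src/"*) return 1 ;;                 # dest is src or lies beneath it
  esac
  return 0
}
```

Used as `safe_to_move "$src" "$dest" && mv "$src" "$dest"`, the move only happens when the guard passes.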
Maintenance: cookie handling
This release focuses mainly on authenticated sessions via web form logins, specifically a more robust treatment of how Wget cookies are affected by the value of the option `wget_user_agent`, defined in `constants.sh`.
Maintenance: custom user agent strings
This maintenance release fixes the handling of custom user agent strings for use by `curl` and `wget`.
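User agent strings usually contain spaces, parentheses and semicolons, so they must reach `curl` and `wget` as a single argument, and some sites may tie a login session to the user agent that created it. The Bash sketch below keeps one user agent in Bash arrays so that it is passed intact to both tools; the example string and the `cookies.txt` file name are hypothetical, and the sketch does not reproduce how MakeStaticSite handles `wget_user_agent`.

```bash
#!/usr/bin/env bash
# Illustrative value only; MakeStaticSite sets the user agent through its
# wget_user_agent option, whose handling this sketch does not reproduce.
wget_user_agent='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'

# Arrays keep the user agent as one argument; an unquoted string variable
# would be split on spaces.
curl_opts=( --user-agent "$wget_user_agent" )

# Options shared by the login request and the later mirror run, so both
# present the same user agent and keep session cookies.
wget_common=( "--user-agent=$wget_user_agent" --keep-session-cookies )
```

A login request might then run `wget "${wget_common[@]}" --save-cookies cookies.txt ...` and the mirror run `wget "${wget_common[@]}" --load-cookies cookies.txt --mirror ...`, with the same user agent in both.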
Improved support for URLs with redirects
The setup script now follows redirects in supplied URLs, up to the number set by a new constant, `max_redirects`. This should help keep site mirrors up to date and ensure that sites are captured properly.
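Resolving a supplied URL to its final destination, with a cap on the number of hops, can be done with `curl`, as in this Bash sketch. The function name and the default of 10 are assumptions, and the setup script may follow redirects differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: follow redirects for a supplied URL, up to
# max_redirects hops, and print the URL that curl finally reaches.
max_redirects=${max_redirects:-10}   # illustrative default

final_url() {
  # -s silent, -L follow redirects, --max-redirs caps the hops,
  # -o /dev/null discards the body, -w prints the effective URL.
  # curl exits non-zero (e.g. 47) if the redirect limit is exceeded.
  curl -s -L --max-redirs "$max_redirects" -o /dev/null \
       -w '%{url_effective}' "$1"
}
```

Calling `url=$(final_url "$url")` before mirroring means Wget starts from the URL the site actually serves rather than from a redirect.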