Version 3.23.0rc0
mborsetti committed May 14, 2024
1 parent 77300a7 commit 203fe04
Showing 13 changed files with 388 additions and 145 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@

# NOTE: in this package tox runs 'pre-commit run -a'

minimum_pre_commit_version: 3.4.0 # based on what's available at https://pre-commit.ci/
minimum_pre_commit_version: 3.7.1 # based on what's available at https://pre-commit.ci/

# Force all unspecified python hooks to run python3
default_language_version:
Expand Down
31 changes: 29 additions & 2 deletions CHANGELOG.rst
Expand Up @@ -33,6 +33,33 @@ can check out the `wish list <https://github.com/mborsetti/webchanges/blob/main/
Internals, for changes that don't affect users. [triggers a minor patch]
Version 3.23.0rc0
===================
Unreleased


⚠ Breaking Changes
------------------
* The ``ai-google`` (BETA) differ now defaults to using the new ``gemini-1.5-flash`` model (see documentation `here
<https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash-expandable>`__), as it still supports
1M tokens, "excels at summarization" (per `here <https://blog
.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/#gemini-model-updates:~:text=1
.5%20flash%20excels%20at%20summarization%2C>`__), allows for a higher number of requests per minute (in the
free version, 15 vs. 2 of ``gemini-1.5-pro``), is faster, and, if you're paying for it, cheaper. To continue to
use ``gemini-1.5-pro``, which may produce more "complex" results, specify it in the job's ``differ`` directive.

Fixed
-----
* Fixed header of ``deepdiff`` and ``image`` (BETA) differs to be more consistent with the default ``unified`` differ.
* Fixed the way images are handled in the email reporter so that they now display correctly in clients such as Gmail.

Internals
---------
* Command line argument ``--test-differ`` now processes the new ``mime_type`` attribute correctly (``mime_type`` is
an internal work-in-progress attribute to facilitate future automation of filtering, diffing, and reporting).



Version 3.22
===================
2024-04-25
Expand Down Expand Up @@ -64,8 +91,8 @@ Changed
* Updated the command line argument ``--dump-history`` to display the ``mime_type`` attribute when present.
* Enhanced differs functionality:

- Standardized headers for ``deepdiff`` and ``imagediff`` to align more closely with those of ``unified``.
- Improved the ``google_ai`` differ:
- Standardized headers for ``deepdiff`` and ``imagediff`` (BETA) to align more closely with those of ``unified``.
- Improved the ``google_ai`` differ (BETA):

- Enhanced error handling: now, the differ will continue operation and report errors rather than failing outright
when Google API errors occur.
Expand Down
56 changes: 11 additions & 45 deletions RELEASE.rst
@@ -1,53 +1,19 @@
⚠ Breaking Changes
------------------
* Developers integrating custom Python code (``hooks.py``) should refer to the "Internals" section below for important
changes.

Changed
-------
* Snapshot database

- Moved the snapshot database from the "user_cache" directory (typically not backed up) to the "user_data" directory.
The new paths are (typically):

- Linux: ``~/.local/share/webchanges`` or ``$XDG_DATA_HOME/webchanges``
- macOS: ``~/Library/Application Support/webchanges``
- Windows: ``%LOCALAPPDATA%\webchanges\webchanges``

- Renamed the file from ``cache.db`` to ``snapshots.db`` to more clearly denote its contents.
- Introduced a new command line option ``--database`` to specify the filename for the snapshot database, replacing
the previous ``--cache`` option (which is deprecated but still supported).
- Many thanks to `Markus Weimar <https://github.com/Markus00000>`__ for pointing this problem out in issue `#75
<https://github.com/mborsetti/webchanges/issues/75>`__.

* Modified the command line argument ``--test-differ`` to accept a second parameter, specifying the maximum number of
diffs to generate.
* Updated the command line argument ``--dump-history`` to display the ``mime_type`` attribute when present.
* Enhanced differs functionality:

- Standardized headers for ``deepdiff`` and ``imagediff`` to align more closely with those of ``unified``.
- Improved the ``google_ai`` differ:

- Enhanced error handling: now, the differ will continue operation and report errors rather than failing outright
when Google API errors occur.
- Improved the default prompt to ``Analyze this unified diff and create a summary listing only the
changes:\n\n{unified_diff}`` for improved results.
* The ``ai-google`` (BETA) differ now defaults to using the new ``gemini-1.5-flash`` model (see documentation `here
<https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash-expandable>`__), as it still supports
1M tokens, "excels at summarization" (per `here <https://blog
.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/#gemini-model-updates:~:text=1
.5%20flash%20excels%20at%20summarization%2C>`__), allows for a higher number of requests per minute (in the
free version, 15 vs. 2 of ``gemini-1.5-pro``), is faster, and, if you're paying for it, cheaper. To continue to
use ``gemini-1.5-pro``, which may produce more "complex" results, specify it in the job's ``differ`` directive.

Fixed
-----
* Fixed an AttributeError Exception when the fallback HTTP client package ``requests`` is not installed, as reported
by `yubiuser <https://github.com/yubiuser>`__ in `issue #76 <https://github.com/mborsetti/webchanges/issues/76>`__.
* Addressed a ValueError in the ``--test-differ`` command, a regression reported by `Markus Weimar
<https://github.com/Markus00000>`__ in `issue #79 <https://github.com/mborsetti/webchanges/issues/79>`__.
* To prevent overlooking changes, webchanges now refrains from saving a new snapshot if a differ operation fails
with an Exception.
* Fixed header of ``deepdiff`` and ``image`` (BETA) differs to be more consistent with the default ``unified`` differ.
* Fixed the way images are handled in the email reporter so that they now display correctly in clients such as Gmail.

Internals
---------
* New ``mime_type`` attribute: we are now capturing and storing the data type (as a MIME type) alongside data in the
snapshot database to facilitate future automation of filtering, diffing, and reporting. Developers using custom
Python code will need to update their filter and retrieval methods in classes inheriting from FilterBase and
JobBase, respectively, to accommodate the ``mime_type`` attribute. Detailed updates are available in the `hooks
documentation <https://webchanges.readthedocs.io/en/stable/hooks.html#:~:text=Changed%20in%20version%203.22>`__.
* Updated terminology: References to ``cache`` in object names have been replaced with ``ssdb`` (snapshot database).
* Int
* Command line argument ``--test-differ`` now processes the new ``mime_type`` attribute correctly (``mime_type`` is
an internal work-in-progress attribute to facilitate future automation of filtering, diffing, and reporting).
100 changes: 52 additions & 48 deletions docs/configuration.rst
Expand Up @@ -5,7 +5,7 @@ Configuration
=============
The global configuration for :program:`webchanges` contains basic settings for the generic behavior of
:program:`webchanges`, including its :ref:`reports <reports>` and :ref:`reporters <reporters>`. It is written in **YAML
format**, is called ``config.yaml``, and is located in the in the following directory:
format**, is called ``config.yaml``, and is located in the following directory:

* Linux: ``~/.config/webchanges``
* MacOS: ``~/Library/Preferences/webchanges``
Expand Down Expand Up @@ -120,6 +120,8 @@ Reporters are implemented in a hierarchy, and configuration settings of a report
Setting the ``email`` reporter's ``html`` option to ``true`` will cause it to inherit from the ``html``
configuration.



.. _job_defaults:

Job Defaults
Expand All @@ -143,7 +145,7 @@ config file. The following example will set default headers for all ``url`` jobs
Sec-Fetch-User: ?1
Sec-GCP: 1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
The above config file sets all ``url`` jobs without the ``browser`` directive to use the specified headers.

Expand All @@ -155,92 +157,94 @@ The possible sub-directives to ``job_defaults`` are:
* ``browser``: Applies only to jobs with the directives ``url`` and ``use_browser: true``;
* ``command``: Applies only to jobs with the directive ``command``.

See :ref:`jobs <jobs>` about the different job kinds and directives that can be set.
See :ref:`jobs <jobs>` for an explanation of the different job kinds and their directives.

Duplicate handling
******************
If a directive is specified both in ``all`` and either in ``url``, ``browser`` or ``command``, the one in ``all``
will be overridden, with the contents of ``headers`` being handled as if they were separate directives before being
overridden.
Handling of duplicate directives
````````````````````````````````
Any directive specified in ``url``, ``browser``, or ``command`` overrides the same directive specified in ``all``.
For the ``headers`` directive, the overriding is done on a header-by-header basis.
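
The override rule can be sketched in Python as follows. This is only an illustration of the behavior described above, not the code :program:`webchanges` uses: the function name ``merge_job_defaults`` is hypothetical, and the directive values are made up for the example.

```python
from copy import deepcopy


def merge_job_defaults(all_defaults: dict, kind_defaults: dict) -> dict:
    """Hypothetical helper: directives of a job kind override those in ``all``."""
    merged = deepcopy(all_defaults)
    for directive, value in kind_defaults.items():
        both_dicts = isinstance(value, dict) and isinstance(merged.get(directive), dict)
        if directive == "headers" and both_dicts:
            # Headers are overridden one by one instead of being replaced wholesale.
            merged["headers"] = {**merged["headers"], **value}
        else:
            merged[directive] = deepcopy(value)
    return merged


# Example values, chosen for illustration only.
all_defaults = {"timeout": 30, "headers": {"Accept": "text/html", "DNT": "1"}}
url_defaults = {"timeout": 10, "headers": {"Accept": "application/json"}}
merged = merge_job_defaults(all_defaults, url_defaults)
```

In this sketch ``timeout`` and the ``Accept`` header come from ``url``, while the ``DNT`` header set in ``all`` survives because no ``url`` header replaces it. Header names are compared case-sensitively here, which is a simplification.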



Database configuration
----------------------
If you want to change some settings for all your jobs, edit the ``database`` section in your config file:
The ``database`` section in your config file contains information on how snapshots are stored from run to run:

.. code-block:: yaml

   database:
     engine: sqlite3
     max_snapshots: 4

.. _database_engine:

Default database engine
-------------------------
Database engine
```````````````
``engine``

You can select one of the engines from this list; the default engine can also be changed on an individual run with the
``--cache-engine`` command line argument.
You can select one of the database engines as specified below; this can be overridden with the ``--cache-engine``
command line argument.

Default (``sqlite3``)
*********************
In version 3.2 we migrated the internal database system to one that relies on the built-in ``sqlite3`` engine. This
is more efficient due to indexing, creates smaller files due to data compression with `msgpack <https://msgpack
.org/index.html>`__, and provides additional functionality such as no data corruption in case of an execution error.
``sqlite3``
:::::::::::
The default database engine. It uses the ``sqlite3`` database built into Python, with data compression provided by
`msgpack <https://msgpack.org/index.html>`__. It is the most advanced option: indexing makes it fast, its data files
are small, and an execution error neither corrupts data nor stores a snapshot.

This has also allowed us to remove the requirement for the ``minidb`` Python package. Migration of the latest snapshots
from the legacy (minidb) database is done automatically and the old file is preserved for manual deletion.
The migration to this engine in version 3.2 allowed us to remove the requirement for the ``minidb`` Python package.

Text files (``textfiles``)
**************************
To have the latest snapshot of each job saved as a separate text file instead of as a record in a database, use
``textfiles``.
``textfiles``
:::::::::::::
Saves the latest snapshot of each job as its own individual text file. Only one snapshot can be saved, and both the
ETag (allowing the speeding up of web data retrieval) and MIME type (enabling some diffing and reporting automation)
will be lost.

Legacy (``minidb``)
*******************
This will use a database that is backwards compatible with version 3.1 and with :program:`urlwatch` 2. The ``minidb``
Python package must be installed for this to work.

Redis (``redis://...`` or ``rediss://...``)
*******************************************
``redis://...`` or ``rediss://...``
:::::::::::::::::::::::::::::::::::
To use Redis as a database (cache) backend, specify a redis URI:

``minidb``
::::::::::
The deprecated legacy database engine, which is backwards compatible with :program:`urlwatch`. It requires the
``minidb`` Python package to be installed. MIME types are not stored, the database is not indexed, data is not
compressed, and the database file will grow indefinitely.

.. code-block:: yaml

   database:
     engine: redis://localhost:6379/

For this to work, optional dependencies need to be installed; please see :ref:`here <dependencies>`

There is no migration path from an existing database: the Redis database will be empty the first time it is used.
To use Redis, optional dependencies need to be installed; please see :ref:`here <dependencies>`.

.. note:: Switching from Legacy (``minidb``) to Default (``sqlite3``) triggers an automatic data migration as long
as the ``minidb`` Python package is installed; the old database file is preserved for manual deletion. There is
no migration path between any other database types; for example, switching to Redis will create a new empty
database at the first run.


.. _database_max_snapshots:

Maximum number of snapshots to save
***********************************
``max_snapshots``
`````````````````
Maximum number of snapshots to save

Each time you run :program:`webchanges`, it captures the data downloaded from the URL (or the output of the command
specified), applies filters, and if it finds a change it saves the resulting snapshot to a database for future
comparison. By default¹ only the last 4 changed snapshots are kept, but this number can be modified either in the
configuration file or, for an individual run, with the with the ``--max-snapshots`` command line argument.
comparison. By default, only the last 4 changed snapshots are kept, but this number can be modified either in the
configuration file or with the ``--max-snapshots`` command line argument.

If set to 0, all changed snapshots are retained (the database will grow unbounded).
If set to 0, all changed snapshots are retained (the database will grow indefinitely).
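
The retention rule can be illustrated with Python's built-in ``sqlite3`` module. This is a minimal sketch under stated assumptions: the ``snapshots`` table, its columns, and the ``prune_snapshots`` function are hypothetical and do not reflect the actual schema or code of :program:`webchanges`.

```python
import sqlite3


def prune_snapshots(conn: sqlite3.Connection, job_id: str, max_snapshots: int) -> int:
    """Keep only the newest ``max_snapshots`` rows for a job; 0 keeps everything."""
    if max_snapshots == 0:
        return 0  # retain all snapshots: the table grows indefinitely
    cursor = conn.execute(
        "DELETE FROM snapshots WHERE job_id = ? AND rowid NOT IN ("
        "SELECT rowid FROM snapshots WHERE job_id = ? "
        "ORDER BY timestamp DESC LIMIT ?)",
        (job_id, job_id, max_snapshots),
    )
    conn.commit()
    return cursor.rowcount  # number of snapshots deleted


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (job_id TEXT, timestamp REAL, data TEXT)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [("job1", float(t), f"version {t}") for t in range(6)],
)
deleted = prune_snapshots(conn, "job1", 4)
remaining = [row[0] for row in conn.execute("SELECT data FROM snapshots ORDER BY timestamp")]
```

With six stored snapshots and a limit of 4, the two oldest are deleted and the four most recent remain.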

.. tip:: Changes (diffs) between saved snapshots can be redisplayed with the ``--test-differ`` command line argument (see
:ref:`here <test-differ>`).
.. note:: Only applicable to the ``sqlite3`` (default) database engine. When using ``redis`` or ``minidb`` database
engines all snapshots will be kept (the database will grow indefinitely), while when using the ``textfiles``
database engine only the last snapshot is kept.

¹ Note that when using ``redis`` or ``minidb`` database engines all snapshots will be kept, while when using the
``textfiles`` database engine only the last snapshot is kept.
.. tip:: Changes (diffs) between saved snapshots can be redisplayed with the ``--test-differ`` command line argument
(see :ref:`here <test-differ>`).


.. versionadded:: 3.11
for default ``sqlite3`` database engine only.
For default ``sqlite3`` database engine only.



Expand All @@ -253,8 +257,8 @@ line argument.



Keys starting with underline are ignored
----------------------------------------
Keys that start with underline are ignored and can be used for remarks.
Remarks
-------
YAML files do not allow for remarks; however, keys that start with an underscore are ignored and can be used for
remarks.
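
As an illustration of this convention, the sketch below drops every key that starts with an underscore from an already-parsed configuration. The ``strip_remarks`` function and the ``_note`` key are hypothetical, and whether :program:`webchanges` applies the rule at every nesting level is an assumption made for the example.

```python
def strip_remarks(config):
    """Return a copy of ``config`` without keys that start with an underscore."""
    if isinstance(config, dict):
        return {
            key: strip_remarks(value)
            for key, value in config.items()
            if not (isinstance(key, str) and key.startswith("_"))
        }
    if isinstance(config, list):
        return [strip_remarks(item) for item in config]
    return config


# Example configuration as it might look after YAML parsing (values are illustrative).
parsed = {
    "database": {
        "_note": "keep one week of history",  # remark, ignored
        "engine": "sqlite3",
        "max_snapshots": 4,
    },
}
cleaned = strip_remarks(parsed)
```

After the call, ``cleaned`` contains only the ``engine`` and ``max_snapshots`` settings; the original ``parsed`` dictionary is left unchanged.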

.. versionadded:: 3.11
7 changes: 4 additions & 3 deletions docs/differs.rst
Expand Up @@ -420,9 +420,10 @@ This differ is currently in BETA and the directives may change in the future.
requires the package ``numpy`` to be installed (default: 2.5).

.. note:: If you pass a ``url`` or ``filename`` to the differ, it will detect changes only if the url or
filename changes, not if the image behind the url/filename does. To detect changes in an image when the url or
filename doesn't change, build a job that captures the image itself encoded in Ascii85 (preferably, see the
:ref:`ascii85` filter or Base64 and set ``data_type`` accordingly.
filename changes, not if the image behind the url/filename does; no change will be reported if the url or filename
changes but the image doesn't. To detect changes in an image when the url or filename doesn't change, build a job
that captures the image itself encoded in Ascii85 (preferably, see the :ref:`ascii85` filter) or Base64 and set
``data_type`` accordingly.
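
To illustrate why capturing the encoded image detects changes that a fixed url or filename cannot, the sketch below encodes image bytes with Python's standard ``base64`` module. The ``encode_image`` helper is hypothetical, the byte strings are stand-ins for real image files, and the ``data_type`` names ``ascii85`` and ``base64`` are assumed here rather than taken from the text above.

```python
import base64


def encode_image(image_bytes: bytes, data_type: str = "ascii85") -> str:
    """Encode raw image bytes as text so that a job can capture the image itself."""
    if data_type == "ascii85":
        return base64.a85encode(image_bytes).decode("ascii")
    if data_type == "base64":
        return base64.b64encode(image_bytes).decode("ascii")
    raise ValueError(f"unsupported data_type: {data_type}")


# Stand-ins for two versions of an image served from the same url.
old_image = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
new_image = b"\x89PNG\r\n\x1a\n" + b"\x01" * 16

old_snapshot = encode_image(old_image)
new_snapshot = encode_image(new_image)
changed = old_snapshot != new_snapshot  # the url is the same, but the data differs
```

Ascii85 produces shorter text than Base64 for the same data (roughly 5 characters per 4 bytes versus 4 characters per 3 bytes), which is one reason to prefer it for stored snapshots.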

Required packages
`````````````````
Expand Down