Version 3.20rc0

mborsetti committed Mar 15, 2024
1 parent ff1d02d commit 5fab962
Showing 24 changed files with 276 additions and 81 deletions.
34 changes: 26 additions & 8 deletions CHANGELOG.rst
@@ -33,6 +33,25 @@ can check out the `wish list <https://github.com/mborsetti/webchanges/blob/main/
Internals, for changes that don't affect users. [triggers a minor patch]
Version 3.20rc0
===================
2024-03-15

Added
-----
* ``re.findall`` filter to extract, delete or replace non-overlapping text using Python ``re.findall``.

Changed
-------
* ``--test-reporter`` now allows testing of reporters that are not enabled; the error raised when a reporter is not
enabled is now a warning. This simplifies testing.
* The ``email`` reporter supports sending to multiple "to" addresses (both SMTP and sendmail).

Fixed
-----
* Reports from jobs with ``monospace: true`` were not being rendered correctly in Gmail.


Version 3.19.1
===================
2024-03-07
@@ -61,8 +80,8 @@ Added
(see the advanced section of the documentation for a suggestion of elements to block). This was available under
Pyppeteer and has been reintroduced for Playwright.
* ``init_script`` directive for jobs with ``use_browser: true`` to execute JavaScript in Chrome after launching it
and before navigating to ``url``. This can be useful to e.g. unset certain default Chrome ``navigator``
properties by calling a JavaScript function to do so.
and before navigating to ``url``. This can be useful to e.g. unset certain default Chrome ``navigator`` properties
by calling a JavaScript function to do so.


Version 3.18.1
@@ -670,9 +689,9 @@ Internals
---------
* Updated licensing file to `GitHub naming standards
<https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository>`__
and updated its contents to more clearly state that this software redistributes source code of release 2.21
of urlwatch (https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f), retaining its license,
which is distributed as part of the source code.
and updated its contents to more clearly state that this software redistributes source code of release 2.21 dated 30
July 2020 of urlwatch (https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f) retaining its
license, which is distributed as part of the source code.
* Pyppeteer has been removed from the test suite.
* Deprecated ``webchanges.jobs.ShellError`` exception in favor of Python's native ``subprocess.SubprocessError`` one and
its subclasses.
@@ -1350,7 +1369,7 @@ Version 3.0

Milestone
---------
Initial release of **webchanges**, based on reworking of code from *urlwatch* 2.21.
Initial release of **webchanges**, based on reworking of code from *urlwatch* 2.21 dated 30 July 2020.

Added
-----
@@ -1473,5 +1492,4 @@ Relative to *urlwatch* 2.21:

Known bugs
----------
* Documentation could be more complete
* Almost complete lack of inline docstrings in the code
* None
2 changes: 1 addition & 1 deletion LICENSE
@@ -28,7 +28,7 @@ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
SOURCE CODE REDISTRIBUTION NOTICE
(urlwatch by Thomas Perl)

This software redistributes source code of release 2.21 of July 30, 2020 of
This software redistributes source code of release 2.21 dated 30 July 2020 of
urlwatch
https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f,
which is subject to the following copyright notice and license (from
23 changes: 13 additions & 10 deletions README.rst
@@ -9,8 +9,9 @@
webchanges |downloads|
======================

**webchanges** checks web content and notifies you via e-mail (or one of many other supported services) if a change is
detected. **webchanges** can also check the output of local commands. The notification includes the changed URL or
**webchanges** checks web content and notifies you via email (or one of many other `supported services
<https://webchanges.readthedocs.io/en/stable/introduction.html#reporters-list>`__) if a change is detected.
**webchanges** can also check the output of local commands. The notification includes the changed URL or
command and a summary (diff) of what has changed.

**webchanges** *anonymously* alerts you of web changes.
@@ -62,7 +63,7 @@ Initialize
#. Run the following command to change the default `configuration
<https://webchanges.readthedocs.io/en/stable/configuration.html>`__, e.g. to receive change notifications
("`reports <https://webchanges.readthedocs.io/en/stable/reporters.html>`__")
by `e-mail <https://webchanges.readthedocs.io/en/stable/reporters.html#smtp>`__ and/or one of many other methods:
by `email <https://webchanges.readthedocs.io/en/stable/reporters.html#smtp>`__ and/or one of many other methods:

.. code-block:: bash
@@ -71,7 +72,7 @@ Initialize
Run
---
To check the sources in your jobs and report on (e.g. display or via e-mail) any changes found from the previous
To check the sources in your jobs and report on (e.g. display or via email) any changes found from the previous
execution, just run:

.. code-block:: bash
@@ -110,18 +111,20 @@ License
|license|

Released under the `MIT License <https://opensource.org/licenses/MIT>`__ but redistributing modified source code from
`urlwatch 2.21 <https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f>`__ licensed under a
`BSD 3-Clause License
`urlwatch 2.21 <https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f>`__ dated 30 July 2020
licensed under a `BSD 3-Clause License
<https://raw.githubusercontent.com/thp/urlwatch/346b25914b0418342ffe2fb0529bed702fddc01f/COPYING>`__. See the complete
license `here <https://github.com/mborsetti/webchanges/blob/main/LICENSE>`__.


Compatibility with **urlwatch**
================================

This project is based on code from `urlwatch <https://github.com/thp/urlwatch>`__ 2.21. You can easily upgrade from
the current version of **urlwatch** (see `here <https://webchanges.readthedocs.io/en/stable/migration.html>`__)
using the same job and configuration files and benefit from many HTML-focused improvements, including:
This project is based on code from `urlwatch 2.21
<https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f>`__ dated 30 July 2020. You can
easily upgrade from the current version of **urlwatch** (see `here
<https://webchanges.readthedocs.io/en/stable/migration.html>`__) using the same job and configuration files and
benefit from many HTML-focused improvements, including:

* Report links that are `clickable <https://pypi.org/project/webchanges/>`__!
* Original formatting such as **bolding / headers**, *italics*, :underline:`underlining`, list bullets (•) and
@@ -134,7 +137,7 @@ using the same job and configuration files and benefit from many HTML-focused im
which makes it easier to track content that was added without the distractions of the content that was deleted;
* New features such as ``--errors`` to catch jobs that no longer work;
* Much better `documentation <https://webchanges.readthedocs.io/>`__;
* More reliability and stability, including a 30+ percentage point increase in testing coverage;
* More reliability and stability, including a ~30 percentage point increase in testing coverage;
* Many other additions, refinements and fixes (see `detailed information
<https://webchanges.readthedocs.io/en/stable/migration.html#upgrade-details>`__).

14 changes: 11 additions & 3 deletions RELEASE.rst
@@ -1,5 +1,13 @@
Added
-----
* ``re.findall`` filter to extract, delete or replace non-overlapping text using Python ``re.findall``.

Changed
-------
* ``--test-reporter`` now allows testing of reporters that are not enabled; the error raised when a reporter is not
enabled is now a warning. This simplifies testing.
* The ``email`` reporter supports sending to multiple "to" addresses (both SMTP and sendmail).

Fixed
-----
* Added the ``Date`` header field to SMTP email messages to ensure the timestamp is present even when it is not added
by the server upon receipt. Contributed by `Dominik <https://github.com/DL6ER>`__ in `#71
<https://github.com/mborsetti/webchanges/pull/71>`__.
* Reports from jobs with ``monospace: true`` were not being rendered correctly in Gmail.
4 changes: 2 additions & 2 deletions docs/cli_help.txt
@@ -15,8 +15,8 @@ usage: webchanges [-h] [-V] [-v] [--jobs FILE] [--config FILE] [--hooks FILE]
[joblist ...]

Checks web content to detect any changes since the prior run. If any are found,
it shows what changed ('diff') and/or sends it via e-mail and/or other
supported services. Can check the output of local commands as well.
it shows what changed ('diff') and/or sends it via email and/or other supported
services. Can check the output of local commands as well.

positional arguments:
joblist job(s) to run (by index as per --list) (default: run
87 changes: 74 additions & 13 deletions docs/filters.rst
@@ -105,6 +105,8 @@ At the moment, the following filters are available:
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__.
- :ref:`re.sub`: Replace or remove text matching a `Python regular expression
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__.
- :ref:`re.findall`: Extract, replace or remove all non-overlapping text matching a `Python regular expression
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__.
- :ref:`strip`: Strip leading and/or trailing whitespace or specified characters.
- :ref:`sort`: Sort lines.
- :ref:`remove_repeated`: Remove repeated items (lines).
@@ -1055,21 +1057,81 @@ To run jobs with the ``password`` sub-directive, then use the following:
.. _re.findall:

re.findall
----------
This filter extracts, deletes or replaces non-overlapping text using Python `re.findall
<https://docs.python.org/3/library/re.html#re.findall>`__ `regular expression
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__ operation.

Just specifying a regular expression (regex) or string as the value will extract every match. Patterns can be
replaced with another string using ``pattern`` as the expression and ``repl`` as the replacement, or deleted by
setting ``repl`` to an empty string.

All features are described in Python’s `re.findall documentation
<https://docs.python.org/3/library/re.html#re.findall>`__. The ``pattern`` is matched iteratively using
`re.finditer <https://docs.python.org/3/library/re.html#re.finditer>`__ and the ``repl`` value is applied to each
non-overlapping match; if ``repl`` is missing, group 0 (the entire match) is extracted.

Each match is output on its own line.

The following example applies the filter twice:

1. The first filter specifies only a pattern as the value, so the full match is output (the default).
2. The second filter back-references a group (``()``) with ``\1`` in ``repl`` (``\2`` etc. for further groups).

.. code-block:: yaml

   url: https://example.com/regex-findall.html
   filter:
     - re.findall: '<span class="price">.*</span>'
     - re.findall:
         pattern: 'Price: \$([0-9]+)'
         repl: '\1'

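The two filters in the example above can be approximated in plain Python, which may help when trying out a pattern
before adding it to a job. This is a minimal sketch based only on the description in this section (``re.finditer``,
``repl`` expanded for each match, one match per line), not the filter's actual source; ``findall_filter`` and the
sample HTML are hypothetical.

```python
import re


def findall_filter(data: str, pattern: str, repl: str = r'\g<0>') -> str:
    # Hypothetical stand-in for the ``re.findall`` filter: find every non-overlapping
    # match, expand ``repl`` against it (back-references work as in re.sub), and
    # output one result per line.
    return '\n'.join(match.expand(repl) for match in re.finditer(pattern, data))


# Hypothetical page content with one price span per line.
html = (
    '<p><span class="price">Price: $10</span></p>\n'
    '<p><span class="price">Price: $25</span></p>\n'
)

# First filter: value only, so the entire match (group 0) is kept.
spans = findall_filter(html, r'<span class="price">.*</span>')

# Second filter: ``pattern`` plus ``repl``, keeping only the captured number.
prices = findall_filter(spans, r'Price: \$([0-9]+)', r'\1')

print(prices)  # 10 and 25, on separate lines
```

The greedy ``.*`` in the first pattern works here because each line holds a single price span; if several spans could
share a line, the non-greedy ``.*?`` would return them as separate matches.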
.. tip:: Remember that some useful Python regex flags, such as
`IGNORECASE <https://docs.python.org/3/library/re.html#re.IGNORECASE>`__,
`MULTILINE <https://docs.python.org/3/library/re.html#re.MULTILINE>`__,
`DOTALL <https://docs.python.org/3/library/re.html#re.DOTALL>`__, and
`VERBOSE <https://docs.python.org/3/library/re.html#re.VERBOSE>`__,
can be specified as inline flags and therefore can be used with :program:`webchanges`.

You can use the entire range of Python's `regular expression (regex) syntax
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__.
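The sub-directives documented here have no separate setting for flags, so they go inside the pattern itself. The
plain-Python snippet below (with made-up sample text) illustrates the effect of two inline flags. Note that since
Python 3.11 a global inline flag such as ``(?i)`` is only accepted at the very start of the pattern; ``(?i:...)``
applies it to a single group instead.

```python
import re

text = 'PRICE: $10\nprice: $25'

# Case-sensitive by default: only the lowercase line matches.
case_sensitive = re.findall(r'price: \$([0-9]+)', text)

# A leading (?i) has the same effect as passing flags=re.IGNORECASE.
ignorecase = re.findall(r'(?i)price: \$([0-9]+)', text)

html = '<p>line 1\nline 2</p>'

# Without DOTALL, '.' does not match a newline, so nothing is found...
no_dotall = re.findall(r'<p>(.*?)</p>', html)
# ...while a leading (?s) lets the match span both lines.
dotall = re.findall(r'(?s)<p>(.*?)</p>', html)

print(case_sensitive, ignorecase, no_dotall, dotall)
```

In a job, the flag is simply part of the value, e.g. ``re.findall: '(?i)price: \$([0-9]+)'``.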

Optional sub-directives
"""""""""""""""""""""""
* ``pattern``: Regular expression pattern or string for matching; this sub-directive must be specified when
using the ``repl`` sub-directive, otherwise the pattern can be specified as the value of ``re.findall`` (in which
case every match will be extracted).
* ``repl``: The string applied iteratively to each match (default: ``\g<0>``, i.e. extract each entire match).

.. versionadded:: 3.20



.. _re.sub:

re.sub
------
This filter deletes or replaces text using Python `regular expressions
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__.
This filter deletes or replaces text using Python `re.sub
<https://docs.python.org/3/library/re.html#re.sub>`__ `regular expression
<https://docs.python.org/3/library/re.html#regular-expression-syntax>`__ operation.

Just specifying a regular expression (regex) as the value will remove the match. Patterns can be replaced with another
string using ``pattern`` as the expression and ``repl`` as the replacement.
Just specifying a regular expression (regex) or string as the value will remove every match. Patterns can be replaced
with another string by specifying the expression as ``pattern`` and the replacement as ``repl``.

All features are described in Python’s re.sub `documentation <https://docs.python.org/3/library/re.html#re.sub>`__. The
``pattern`` and ``repl`` values are passed to this function as-is; if ``repl`` is missing, then it's considered to be an
empty string, and this filter deletes the the leftmost non-overlapping occurrences of ``pattern``.
All features are described in Python’s `re.sub documentation <https://docs.python.org/3/library/re.html#re.sub>`__.
The ``pattern`` and ``repl`` values are passed to this function as-is; if ``repl`` is missing, it is treated as an
empty string, and this filter deletes all non-overlapping occurrences of ``pattern``.
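A minimal plain-Python sketch of this behavior, assuming only what is stated above (``pattern`` and ``repl`` passed
to ``re.sub`` unchanged, a missing ``repl`` treated as an empty string); ``sub_filter`` and the sample text are
hypothetical.

```python
import re


def sub_filter(data: str, pattern: str, repl: str = '') -> str:
    # Hypothetical stand-in for the ``re.sub`` filter: arguments are passed through
    # unchanged, and a missing ``repl`` deletes every match.
    return re.sub(pattern, repl, data)


text = 'Updated 2024-03-15 by admin\nVersion 3.20'

# Value only: the matched date (and the space after it) is removed.
no_date = sub_filter(text, r'[0-9]{4}-[0-9]{2}-[0-9]{2} ')

# ``pattern`` plus ``repl``: groups are re-ordered with back-references.
day_first = sub_filter(text, r'([0-9]{4})-([0-9]{2})-([0-9]{2})', r'\3/\2/\1')

print(no_date)
print(day_first)
```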

.. tip:: Remember that some useful Python regxx flags, such as
.. tip:: Remember that some useful Python regex flags, such as
`IGNORECASE <https://docs.python.org/3/library/re.html#re.IGNORECASE>`__,
`MULTILINE <https://docs.python.org/3/library/re.html#re.MULTILINE>`__,
`DOTALL <https://docs.python.org/3/library/re.html#re.DOTALL>`__, and
@@ -1109,11 +1171,10 @@ never changes):
Optional sub-directives
"""""""""""""""""""""""
* ``pattern``: Regular expression to match for replacement; this sub-directive must be specified when using the ``repl``
sub-directive, otherwise the pattern can be specified as the value of ``re.sub`` (in which case a match will be
deleted).
* ``repl``: The string for replacement. If this sub-directive is missing, defaults to empty string (i.e. deletes the
string matched in ``pattern``).
* ``pattern``: Regular expression pattern or string to match for replacement; this sub-directive must be specified when
using the ``repl`` sub-directive, otherwise the pattern can be specified as the value of ``re.sub`` (in which case
a match will be deleted).
* ``repl``: The string for replacement (default: empty string, i.e. deletes the string matched in ``pattern``).



18 changes: 13 additions & 5 deletions docs/reporters.rst
@@ -28,6 +28,8 @@ Each reporter has a directive called ``enabled`` that can be set to true or fals
Tip: If you are running :program:`webchanges` on a cloud server in a different time zone (e.g. UTC), see :ref:`tz`
below to set the time zone to be used for reporting.

.. _reporters-list:

At the moment, the following reporters are available:

* :ref:`stdout` (enabled by default): Display on stdout (the console).
@@ -45,15 +47,17 @@ At the moment, the following reporters are available:
* :ref:`webhook`: Send to an e.g. Slack or Mattermost channel using the service's webhook.
* :ref:`xmpp`: Send using the Extensible Messaging and Presence Protocol (XMPP).

Programmers can write their own reporter(s) in a :ref:`hook <hooks>` file.

.. To convert the "webchanges --features" output, use:
webchanges --features | sed -e 's/^ \* \(.*\) - \(.*\)$/- **\1**: \2/'
Please note that many reporters require additional Python packages to be installed, as noted below and in
:ref:`dependencies <dependencies>`.


.. tip:: While jobs are executed in parallel for speed, they are sorted alphabetically in reports so you can use
:ref:`name` to control the order in which they appear in the report.
.. tip:: While jobs are executed in parallel for speed, reports list them sorted alphabetically by name, so you can
use the :ref:`name` directive to control the order in which they appear.

.. versionchanged:: 3.11
Reports are sorted by job name.
@@ -174,7 +178,8 @@ Sub-directives
~~~~~~~~~~~~~~
* ``method``: Either ``smtp`` or ``sendmail``.
* ``from``: The sender's email address. **Do not use your main email address** but create a throwaway one!
* ``to``: The destination email address.
* ``to``: The destination email address(es); to send to more than one recipient, separate the addresses with a comma
(``,``).
* ``subject``: The subject line. Use ``{count}`` for the number of reports, ``{jobs}`` for the title of jobs
reported, and ``{jobs_files}`` for a space followed by the name of the jobs file(s) used within parentheses, stripped
of preceding ``jobs-``, if not using the default ``jobs.yaml``. Default: ``[webchanges] {count}
@@ -200,7 +205,7 @@ low-risk way to run unattended.
email:
enabled: true # don't forget to set this to true! :)
from: webchanges <throwawayaccount@example.com> # (edit accordingly; don't use your primary account for this!!)
to: myself@example.com # The email address of where want to receive reports
to: myself@example.com, someoneelse@example.com # The email address(es) where you want to receive reports
subject: "[webchanges] {count} changes: {jobs}"
html: true
method: smtp
@@ -317,7 +322,7 @@ sendmail
~~~~~~~~

Calls the external `sendmail <https://www.proofpoint.com/us/products/email-protection/open-source-email-solution>`__
program, which must already be installed and configured.
program (Linux only), which must already be installed and configured.

Optional packages
~~~~~~~~~~~~~~~~~
@@ -333,6 +338,9 @@ If using a Keychain to store the password, you also need to:
pip install --upgrade webchanges[safe_password]
.. versionchanged:: 3.20
Can specify multiple "to" email addresses.
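The email reporter's ``to`` value is a single comma-separated string. As a rough illustration using only Python's
standard library (not necessarily how :program:`webchanges` parses it), such a value can be split into individual
recipients with ``email.utils.getaddresses``; ``split_recipients`` and the addresses are hypothetical.

```python
from email.utils import getaddresses


def split_recipients(to: str) -> list[str]:
    # Hypothetical helper: split a comma-separated "to" value into bare addresses.
    # getaddresses() also handles display names such as "Me <me@example.com>";
    # entries without an address are dropped.
    return [address for _name, address in getaddresses([to]) if address]


recipients = split_recipients('myself@example.com, someone@example.com')
print(recipients)  # ['myself@example.com', 'someone@example.com']
```

A list like this is what ``smtplib.SMTP.send_message`` accepts as its ``to_addrs`` argument.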



.. _ifttt: