Fixes to the configurable user-agent changes introduced in 3.11 #97

jikamens · 2020-02-06T16:39:04Z

No description provided.

Profpatsch · 2020-02-06T17:28:02Z

rss2email/post_process/redirect.py

@@ -61,7 +61,7 @@ def process(feed, parsed, entry, guid, message):
    for link in links:
        try:
            request = urllib.request.Request(link)
-            request.add_header('User-agent', rss2email.feed._USER_AGENT)
+            request.add_header('User-agent', feed.user_agent)


We don’t hit this line with our tests?

git grep redirect in the test directory returns no matches, so it does seem that way.
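For reference, a test for this line would not need network access: building the `Request` and reading the header back is enough. This is a minimal sketch in which `FakeFeed` and `build_redirect_request` are hypothetical stand-ins. The real redirect post-processor goes on to open the URL, which is left out here.

```python
import urllib.request


class FakeFeed:
    """Hypothetical stand-in for an rss2email feed; only user_agent is modeled."""

    def __init__(self, user_agent):
        self.user_agent = user_agent


def build_redirect_request(feed, link):
    # Mirrors the patched line: the header value comes from the feed object,
    # not from a module-level constant.
    request = urllib.request.Request(link)
    request.add_header('User-agent', feed.user_agent)
    return request


feed = FakeFeed('rss2email/3.11 (https://github.com/rss2email/rss2email)')
req = build_redirect_request(feed, 'http://example.com/item')
# Request.add_header stores keys via str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))
```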

Profpatsch · 2020-02-06T17:33:37Z

rss2email/feed.py

+        feed. We've fixed that problem, but we want to go back now and
+        repair feeds that got the wrong user agent value in the
+        interim.
+        """


Am I reading this right, and instead of __VERSION__ the previous code would serialize the substituted value into the configuration?

Yes. Here's what happens in the v3.11 code (without this PR) when a feed is added:

1. Config is loaded.
2. __VERSION__ and __URL__ are substituted into the user agent string at that time.
3. When the config for the feed is saved, the code recognizes that the value of this setting for the feed -- since it has been substituted -- is different from the value in the DEFAULT section, so it saves the feed-specific value.
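The load/save round trip described above can be reproduced in a few lines of the standard `configparser` module. This is a simplified model, not rss2email's actual loading or saving code. The version and URL are example values, and the rule "write a feed value only if it differs from DEFAULT" is assumed from the description above.

```python
import configparser
import io

VERSION = '3.11'  # example values, not read from rss2email itself
URL = 'https://github.com/rss2email/rss2email'

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
user-agent = rss2email/__VERSION__ (__URL__)

[feed.example]
url = http://example.com/feed.rss
""")

section = config['feed.example']
# Load time (3.11 behavior): the inherited template is substituted right away.
loaded_agent = section['user-agent'].replace('__VERSION__', VERSION).replace('__URL__', URL)

# Save time: the substituted value no longer equals DEFAULT, so it is
# written into the feed's own section and pinned there for good.
if loaded_agent != config['DEFAULT']['user-agent']:
    section['user-agent'] = loaded_agent

out = io.StringIO()
config.write(out)
print(out.getvalue())
```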

ghost · 2020-02-06T18:53:52Z

Please see https://github.com/rss2email/rss2email/blob/master/test/allthingsrss/4.config and https://github.com/rss2email/rss2email/blob/master/test/allthingsrss/4.expected for testing.

Edit:
Ah, I see that the proper thing to do is add __VERSION__ and __URL__ to the test cases.

Profpatsch · 2020-02-06T19:21:36Z

> Ah, I see that the proper thing to do is add __VERSION__ and __URL__ to the test cases.

Let’s add them in these fixes then.

jikamens · 2020-02-06T19:34:58Z

Here's what happens when I try to run PYTHONPATH=. ./test/test.py:

E..
======================================================================
ERROR: test_send (__main__.TestEmails)
Emails generated from already-fetched feed data are correct
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test.py", line 129, in test_send
    self.run_single_test(dirname=this_path)
  File "./test/test.py", line 83, in run_single_test
    self.run_single_test(dirname=dirname, config_path=config_path, force=force)
  File "./test/test.py", line 109, in run_single_test
    '\n'.join(diff_lines)))
ValueError: error processing disqus/1.config
--- expected
+++ generated
@@ -39,7 +39,7 @@
 Message-ID: <...@dev.null.invalid>
 User-Agent: rss2email/...
 X-RSS-Feed: disqus/feed.rss
-X-RSS-ID: ab03f2100069a1cd0876b997be87976c18d48e8a
+X-RSS-ID: a52375ec78a988241fe9864a2243d4d910538d52
 X-RSS-URL: http://software-carpentry.org/2012/11/who-wants-to-write-a-little-code/#comment-713578640
 
 @Hans-Martin  

----------------------------------------------------------------------
Ran 3 tests in 1.705s

FAILED (errors=1)

Here's what happens when I try PYTHONPATH=. ./test/test.py test/allthingsrss/4.config:

E
======================================================================
ERROR: test/allthingsrss/4 (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute 'test/allthingsrss/4'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

I'm pretty sure I'm running the commands as specified in test/README.

Both of these errors occur on master, not just on the branch with my changes, so they are not related to my changes.

I am happy to update my PR to include tests, but I do not have the bandwidth to make the test framework work as a prerequisite for that, so please either (a) tell me what I am doing wrong when trying to invoke the test framework or (b) fix it if it's broken. Thanks.

Profpatsch · 2020-02-07T22:53:09Z

The README seems outdated.

env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py

works for me.

Profpatsch · 2020-02-07T23:38:24Z

I have a correction up at #98

jikamens · 2020-02-08T14:17:13Z

Tests are still failing on master for me:

$ env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py
E..
======================================================================
ERROR: test_send (__main__.TestEmails)
Emails generated from already-fetched feed data are correct
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test.py", line 129, in test_send
    self.run_single_test(dirname=this_path)
  File "./test/test.py", line 83, in run_single_test
    self.run_single_test(dirname=dirname, config_path=config_path, force=force)
  File "./test/test.py", line 109, in run_single_test
    '\n'.join(diff_lines)))
ValueError: error processing disqus/1.config
--- expected
+++ generated
@@ -39,7 +39,7 @@
 Message-ID: <...@dev.null.invalid>
 User-Agent: rss2email/...
 X-RSS-Feed: disqus/feed.rss
-X-RSS-ID: ab03f2100069a1cd0876b997be87976c18d48e8a
+X-RSS-ID: a52375ec78a988241fe9864a2243d4d910538d52
 X-RSS-URL: http://software-carpentry.org/2012/11/who-wants-to-write-a-little-code/#comment-713578640
 
 @Hans-Martin  

----------------------------------------------------------------------
Ran 3 tests in 1.531s

FAILED (errors=1)

Y'all let me know when you have tests actually passing on master and I'll work on adding tests for my changes. I don't want to have to battle someone else's failing tests while trying to add my own, since that'll make it difficult for me to understand what failures are my fault.

Profpatsch · 2020-02-08T22:05:23Z

Which version of python are you using, and what’s the python path?

If you look at the CI, we are running tests against ~~three~~four different python versions, so I’m assuming the environment is different.

Inside the nix-shell (see hacking.md) the following dependencies will be available:

> printenv PYTHONPATH | tr ':' '\n' | sort | uniq
/nix/store/3y9iqi6bk24i5jpfi6c7qb9950jjid3g-python3.7-html2text-2018.1.9/lib/python3.7/site-packages
/nix/store/k5rdcbcwwpvj7l9f1yvd5mfggcfz16kk-python3-3.7.5/lib/python3.7/site-packages
/nix/store/pza4bi76fq3qlx8qs7p1a8rkvfv012vb-update-copyright-0.6.2/lib/python3.7/site-packages
/nix/store/q0ssx731z2zlrgsswxy7dsy6x7shisfr-python3.7-feedparser-5.2.1/lib/python3.7/site-packages

The setup.py specifies

    install_requires=[
        'feedparser>=5.0.1',
        'html2text>=3.0.1',
        ],

maybe we need to revise that.

jikamens · 2020-02-08T22:33:37Z

Oh, boy, another package manager to deal with, just what I always wanted.

OK, now I've installed nix and run nix-shell and run env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py inside that shell. Now the error above is no longer happening, but it's only running three tests, which is clearly far fewer than there actually are. Furthermore, neither env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py test/allthingsrss nor env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py test/allthingsrss test/allthingsrss/1.config work inside the nix shell. For example:

[nix-shell:~/src/rss2email-1]$ env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py test/allthingsrss/1.config 
E
======================================================================
ERROR: test/allthingsrss/1 (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute 'test/allthingsrss/1'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

So even in the nix shell there is a disconnect between how your documentation says the tests work and how they work for me.

Here is the output of the command you gave above on my system inside the nix shell where I'm running the tests:

[nix-shell:~/src/rss2email-1]$ printenv PYTHONPATH | tr ':' '\n' | sort | uniq
/nix/store/3y9iqi6bk24i5jpfi6c7qb9950jjid3g-python3.7-html2text-2018.1.9/lib/python3.7/site-packages
/nix/store/k5rdcbcwwpvj7l9f1yvd5mfggcfz16kk-python3-3.7.5/lib/python3.7/site-packages
/nix/store/pza4bi76fq3qlx8qs7p1a8rkvfv012vb-update-copyright-0.6.2/lib/python3.7/site-packages
/nix/store/q0ssx731z2zlrgsswxy7dsy6x7shisfr-python3.7-feedparser-5.2.1/lib/python3.7/site-packages

This is getting frustrating. I'm trying to do the right thing here -- you said you wanted tests, I'm trying to add tests, because as someone who has maintained open-source software for over 30 years, I know how frustrating it is when people submit incomplete patches -- but these roadblocks are making this take a lot more time than I had hoped to spend on this. Given that the patches I've submitted reflect two different, real bugs in your code, and my efforts to get the tests working so I can add the tests have been repeatedly stymied, maybe one of the main developers of the package can meet me halfway here and write the tests?

Profpatsch · 2020-02-08T23:08:23Z

> Given that the patches I've submitted reflect two different, real bugs in your code, and my efforts to get the tests working so I can add the tests have been repeatedly stymied, maybe one of the main developers of the package can meet me halfway here and write the tests?

There are no main developers in this project at the moment; we are trying to maintain the existing code as best we can in our spare time. Incidentally, the main original author, Aaron Swartz, no longer rests among the living.

We have been trying our best to consolidate the different forks and patches that have been lying around for years, but there is only so much time people have been able to spare to push this project into a working state again.

$ env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py

… runs the tests for me inside the nix shell.

Referencing a config throws the error you described; it looks to me like the docs were wrong in this case (they are pretty old, from before this maintainership effort). @kaashif has restructured the test suite a bit; maybe they can give more input on what the intended usage is.

jikamens · 2020-02-09T00:55:35Z

I had no idea Aaron was the original author of this project. Now I'm depressed. :-/

Note that when I run nix-build -A pythonVersions.rss2email-python_3_7 nix/release.nix that also only runs three tests. I gather it should be running a lot more than that? I honestly don't have any idea where to go from here.

The `_USER_AGENT` attribute has been removed from `rss2email.feed` and moved to each specific feed (as `feed.user_agent`), so the redirect post-processor needs to get it from the new location.

If we substitute `__VERSION__` and `__URL__` in the user agent string when it is loaded from the configuration file, then whenever a new feed is added it will end up with the current version number and URL hard-coded into its `user-agent` setting, which means that even when either of those changes in a new version of rss2email, that feed will keep using the old ones. This fix changes that by doing the substitution at the time the user agent string is actually used, i.e., included in an HTTP request or email header, rather than when the config file is loaded.
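A minimal sketch of use-time substitution, assuming a hypothetical `expand_user_agent` helper (not rss2email's actual function). The stored setting keeps the placeholders, and concrete values are filled in only when a header is built, so an upgrade changes the header without anything being rewritten in the config file.

```python
def expand_user_agent(template, version, url):
    """Fill in the placeholders at the moment a header is about to be sent."""
    return template.replace('__VERSION__', version).replace('__URL__', url)


# What the config file stores: placeholders only, never concrete values.
stored = 'rss2email/__VERSION__ (__URL__)'
URL = 'https://github.com/rss2email/rss2email'

# Example versions: the same stored setting tracks whichever release is running.
print(expand_user_agent(stored, '3.11', URL))
print(expand_user_agent(stored, '3.12', URL))
```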

Because we were substituting `__VERSION__` and `__URL__` into the user agent string when the config was read in 3.11, any feed added with 3.11 would end up with a per-feed user-agent setting with substituted current values, so that feed would keep using those values even when the version or URL of rss2email changed. That bug is fixed in another commit, but now we want to put in some migration code to repair feeds that were broken in the interim. We can't get this 100% right since we can only safely do the repair when the user is using the default user-agent setting, but that will cover most users so it is worth doing.
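The repair can be sketched as follows. Several assumptions here do not come from rss2email: feed sections are named `feed.<name>`, the stock default is `rss2email/__VERSION__ (__URL__)`, and a substituted value looks like a dotted version number followed by a parenthesized URL. The repair is skipped entirely when DEFAULT holds anything other than the stock template, which matches the caveat above.

```python
import configparser
import re

# Assumed stock default; a user who customized DEFAULT is left alone.
DEFAULT_TEMPLATE = 'rss2email/__VERSION__ (__URL__)'


def _expanded_default_regex(template):
    """Regex matching the template after __VERSION__/__URL__ were substituted."""
    pattern = re.escape(template)
    # re.escape leaves letters and underscores alone, so the placeholders survive.
    pattern = pattern.replace('__VERSION__', r'[0-9][0-9.]*')
    pattern = pattern.replace('__URL__', r'\S+')
    return re.compile(pattern + r'\Z')


def repair_substituted_user_agents(config):
    """Drop per-feed user-agent values that are expanded copies of the default.

    Returns the names of the sections that were repaired.
    """
    if config.defaults().get('user-agent') != DEFAULT_TEMPLATE:
        return []
    expanded = _expanded_default_regex(DEFAULT_TEMPLATE)
    repaired = []
    for name in config.sections():
        value = config.get(name, 'user-agent', raw=True, fallback=None)
        # An inherited (unsubstituted) template never matches, so only explicit
        # per-feed copies of the expanded default are removed.
        if value is not None and expanded.match(value):
            config.remove_option(name, 'user-agent')
            repaired.append(name)
    return repaired


config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
user-agent = rss2email/__VERSION__ (__URL__)

[feed.added-with-3.11]
url = http://a.example/rss
user-agent = rss2email/3.11 (https://github.com/rss2email/rss2email)

[feed.custom]
url = http://b.example/rss
user-agent = MyCustomAgent/1.0
""")
print(repair_substituted_user_agents(config))  # ['feed.added-with-3.11']
```

Deliberately pinned values of the same shape would also be reset, which is the price of the heuristic.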

jikamens · 2020-02-09T02:17:46Z

OK, additional info...

I understand much better how the tests work, and I've added an additional commit to this branch which I believe makes them work much better now. See the commit message for the second-to-last commit on this branch for details.

Upon further investigation I determined that the current tests were testing that __VERSION__ and __URL__ were substituted properly in the user agent string, but they weren't doing quite a good enough job of it. I've added an additional commit to fix this; that's the last commit on the branch.

Testing the "User agent needs to be substituted at use time, not at load time" fix is harder and not easily doable within the current test framework, so I'm not adding a test for that at this time.

Use the `parameterized` module to treat each `.config` file as a separate test, so that it's clear which of them is failing and so that the functionality built into `unittest` can be used to run subsets of the tests, or even individual tests. While implementing this I discovered that all of the test cases depended on a call to `_os.chdir` made in just one of them. That is problematic if you're only trying to execute a subset of the tests, and relying on the order in which tests execute is a bad idea anyway, so I moved the `_os.chdir` into `ExecContext` and made sure all the code that depends on the current directory being changed is wrapped within an `ExecContext`. This commit also adds code to `test.py` that adjusts the Python search path as necessary, so that the user running the tests no longer needs to do that, and updates `test/README` to reflect that change and to explain that running a subset of the tests now goes through the `unittest` framework.
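The pattern of changing directory per test and always changing back can be expressed as a small context manager. This is a sketch with `working_directory` as a hypothetical name; the real `ExecContext` also manages the config and data files. Python 3.11 and later ship `contextlib.chdir`, which does the same job.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily make `path` the cwd; restore the previous cwd even on error."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)


# Each test enters its own context, so none depends on another test's chdir.
start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()  # the temporary directory (symlinks resolved)
    back = os.getcwd()        # the original directory again
```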

The regexp we were using to clean up the user agent string didn't have backslashes before the parentheses, so it wasn't actually checking that they were there, and it wasn't confirming that `__URL__` had been properly substituted with a URL. Both of these problems have been fixed.
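To see the difference, the old and new patterns from the `test/test.py` diff can be run against a few header lines. The sample lines are made up for illustration.

```python
import re

# Patterns copied from the test/test.py diff (old and new USER_AGENT_REGEXP).
OLD = re.compile(r'^User-Agent: rss2email/[0-9.]* (\S*)$', re.MULTILINE)
NEW = re.compile(r'^User-Agent: rss2email/[0-9.]* \(https:\S*\)$', re.MULTILINE)

samples = {
    'substituted': 'User-Agent: rss2email/3.11 (https://github.com/rss2email/rss2email)',
    'url placeholder left': 'User-Agent: rss2email/3.11 (__URL__)',
    'no parentheses': 'User-Agent: rss2email/3.11 https://github.com/rss2email/rss2email',
}

for label, line in samples.items():
    print(f'{label:22} old={bool(OLD.search(line))} new={bool(NEW.search(line))}')
```

Without the backslashes, `(\S*)` is a capture group rather than literal parentheses, so any single non-space token, including an unsubstituted `__URL__`, satisfied the old pattern.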

jikamens · 2020-02-09T03:13:02Z

The build is failing because the `parameterized` package's own unit tests fail under Python 3.8. Those failures apparently do not affect the package's functionality.

I have spent over an hour now trying to figure out how to modify nix/release.nix not to run the unit tests for parameterized when building it for Python 3.8. It's supposedly possible to do this, but apparently I'm too stupid to understand nix's configuration language and syntax well enough to be able to figure it out despite banging my head against it for over an hour.

Perhaps whoever decided to add nix to rss2email understands it well enough to do in a few minutes what I was unable to do despite over an hour of trying to figure it out?

See http://www.rssboard.org/media-rss#media-credit.

jikamens · 2020-02-13T15:31:45Z

I am not sure what to do at this point. I need some guidance.

I believe my commit to use parameterized.expand to run the individual test cases as separate tests is a good and correct change which improves the code base.

I have several other good and useful fixes on this branch.

However, it can't pass right now because the parameterized module doesn't work properly in Nix for Python 3.8 (the unit tests for the module don't pass, though the module actually works fine in Python 3.8 for our purposes if that is ignored), and the Travis build includes Python 3.8.

Options:

1. Fix the Nix build not to run unit tests for parameterized for Python 3.8, or heck just not to bother running the unit tests for parameterized at all. I tried and failed at this. I asked here if anyone can help and no one answered. I posted in the Nix discourse asking for help and no one responded there either. I have hit a brick wall with this option.
2. Remove Python 3.8 from the release and Travis builds until parameterized is fixed to work in Python 3.8.
3. Remove my parameterized commit from this PR and try again when parameterized is working in Python 3.8 in Nix.
4. Ignore the build failures and merge anyway (probably not a good option, but mentioned for completeness).
5. If there's no one on the current rss2email team who is proficient enough with Nix to be able to handle an issue like this, then perhaps stop using Nix. 🤷‍♂️
6. Some other option I'm missing.

Thanks.

Profpatsch · 2020-02-13T18:07:59Z

Sorry, busy with dayjob. Will take a look later.

Profpatsch · 2020-02-16T00:38:08Z

rss2email/feed.py

+                    data['author'] = a
+                    break
+            except (AttributeError, KeyError, StopIteration):
+                pass


This change should go into a separate PR, as it’s adding new functionality.
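For context on what the truncated hunk above is doing, here is a standalone sketch of taking the author from `media:credit` (see http://www.rssboard.org/media-rss#media-credit). It assumes a feedparser-style entry where `media_credit` is a list of dicts with the credited name under `content`; treat that structure as an assumption rather than a documented API. `author_from_media_credit` is a hypothetical name.

```python
def author_from_media_credit(entry):
    """Return the first non-empty media:credit name, or None if there is none.

    Assumes entry['media_credit'] is a list of dicts whose 'content' key holds
    the credited name (feedparser-style; an assumption, not a documented API).
    """
    for credit in entry.get('media_credit', []):
        name = (credit.get('content') or '').strip()
        if name:
            return name
    return None


entry = {'media_credit': [{'role': 'author', 'content': 'Jane Doe'}]}
print(author_from_media_credit(entry))  # Jane Doe
```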

Profpatsch · 2020-02-16T00:38:35Z

test/test.py

@@ -63,7 +63,7 @@ def __init__(self, *args, **kwargs):
        self.MESSAGE_ID_REGEXP = _re.compile(
            '^Message-ID: <[^@]*@dev.null.invalid>$', _re.MULTILINE)
        self.USER_AGENT_REGEXP = _re.compile(
-            r'^User-Agent: rss2email/[0-9.]* (\S*)$', _re.MULTILINE)
+            r'^User-Agent: rss2email/[0-9.]* \(https:\S*\)$', _re.MULTILINE)


Please open a new PR for this change, as it adds new functionality.

Profpatsch · 2020-02-16T00:40:06Z

test/test.py

@@ -74,61 +93,37 @@ def clean_result(self, text):
            text = regexp.sub(replacement, text)
        return text

-    def run_single_test(self, dirname=None, config_path=None, force=False):
-        if dirname is None:
+    @parameterized.expand(((p) for p in find_email_tests()))


Instead of blocking on this package, we can create one def test_<name> function per email which calls into test_send.
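One way to get one `test_<name>` method per email config without the `parameterized` dependency is to attach methods to the `TestCase` class at import time. In this sketch, `find_email_tests` returns hard-coded example paths and `check_config` is a placeholder for the real send-and-diff logic; neither matches the current `test.py`.

```python
import unittest


def find_email_tests():
    # Hypothetical stand-in; the real helper would scan the test directory.
    return ['allthingsrss/1.config', 'disqus/1.config']


class TestEmails(unittest.TestCase):
    def check_config(self, config_path):
        # Placeholder for running rss2email on config_path and diffing the output.
        self.assertTrue(config_path.endswith('.config'))


def _make_test(config_path):
    # The factory binds config_path now; a function defined directly in the
    # loop below would see only the loop variable's final value.
    def test(self):
        self.check_config(config_path)
    return test


for _path in find_email_tests():
    # 'allthingsrss/1.config' -> 'test_allthingsrss_1'
    _name = 'test_' + _path[:-len('.config')].replace('/', '_').replace('.', '_')
    setattr(TestEmails, _name, _make_test(_path))
```

Each generated method then shows up by name in the `unittest` output and can be selected individually, for example `TestEmails.test_disqus_1`.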

Profpatsch · 2020-02-16T00:50:13Z

test/test.py

+                _os.remove(self.data_path)
+            _os.rmdir(self.tmpdir)
+        finally:
+            _os.chdir(self.orig_dir)


Let’s remove any chdirs and use absolute paths instead.

If ExecContext doesn’t do path switching, we don’t need to rely on its magic presence at the right moments.

Profpatsch · 2020-02-16T00:50:28Z

test/test.py

+                _os.remove(self.cfg_path)
+            if _os.path.exists(self.data_path):
+                _os.remove(self.data_path)
+            _os.rmdir(self.tmpdir)


just remove the whole temporary directory instead?
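A sketch of both suggestions together: every path is absolute, so no `chdir` is needed, and cleanup is a single `shutil.rmtree` of the temporary directory instead of removing each file. `IsolatedRun` and the file names are hypothetical, not the existing `ExecContext`.

```python
import os
import shutil
import tempfile


class IsolatedRun:
    """Hypothetical test fixture: absolute paths only, no chdir, one-shot cleanup."""

    def __init__(self, config_text):
        # mkdtemp creates a fresh private directory; abspath guards against a relative TMPDIR.
        self.tmpdir = os.path.abspath(tempfile.mkdtemp(prefix='r2e-test-'))
        self.cfg_path = os.path.join(self.tmpdir, 'rss2email.cfg')
        self.data_path = os.path.join(self.tmpdir, 'rss2email.json')
        with open(self.cfg_path, 'w') as f:
            f.write(config_text)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Removes the config, the data file and anything else the run created.
        shutil.rmtree(self.tmpdir, ignore_errors=True)
        return False


with IsolatedRun('[DEFAULT]\n') as run:
    # A real test would hand run.cfg_path and run.data_path to rss2email here.
    leftover = run.tmpdir
```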

Profpatsch · 2020-02-16T01:04:20Z

test/test.py

-                    config_path,
-                    '\n'.join(diff_lines)))
-
-    def test_send(self):


what happened to test_send?

Profpatsch · 2020-02-16T01:07:15Z

test/test.py

@@ -140,21 +135,30 @@ class ExecContext:
        context.call("run", "--no-send")

    """
-    def __init__(self, config):
+    def __init__(self, config=None):


What doesn’t pass a config?

Profpatsch · 2020-02-16T01:10:20Z

I think we can simplify this a lot by not making the ExecContext do magic setup.

kaashif · 2020-02-16T02:13:27Z

Hey @jikamens, sorry about all of the mess in the tests - what we have now is an improvement over what existed before, but they are still pretty low-coverage and flaky in some cases. Thanks for your efforts in this PR, bug fixes are always welcome.

I am pretty sure your test failure is due to #11 - feedparser doesn't order HTML attributes deterministically, so the hash of the content sometimes changes despite the content really not changing. I get this failure too sometimes, but I don't know any easy way to fix it (and I don't have much time to look into it). This test failure is long-standing, it predates the entire current team of maintainers.
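To illustrate why this breaks the hash, and one possible (untested) mitigation: two serializations that differ only in attribute order produce different digests, while hashing a form with sorted attributes does not. This is not what rss2email currently does. `canonical_digest` is hypothetical, and SHA-1 is used only for illustration. Comments and doctypes are dropped and character references are decoded, which is acceptable for hashing but not for display.

```python
import hashlib
from html.parser import HTMLParser


class AttributeSortingCanonicalizer(HTMLParser):
    """Rebuild HTML with each tag's attributes sorted by name (for hashing only)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Sort by attribute name only; valueless attributes carry None.
        attrs = sorted(attrs, key=lambda pair: pair[0])
        rendered = ''.join(
            f' {name}' if value is None else f' {name}="{value}"'
            for name, value in attrs)
        self.parts.append(f'<{tag}{rendered}>')

    def handle_endtag(self, tag):
        self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        self.parts.append(data)


def canonical_digest(html_text):
    parser = AttributeSortingCanonicalizer()
    parser.feed(html_text)
    parser.close()
    return hashlib.sha1(''.join(parser.parts).encode('utf-8')).hexdigest()


a = '<a href="http://example.com/" title="x">link</a>'
b = '<a title="x" href="http://example.com/">link</a>'
print(hashlib.sha1(a.encode()).hexdigest() == hashlib.sha1(b.encode()).hexdigest())  # False
print(canonical_digest(a) == canonical_digest(b))  # True
```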

My advice is to ignore anything to do with improving the existing tests, since fixing them up is probably not a good use of your time; we (the maintainers) should deal with that.

> main developers of the package can meet me halfway here and write the tests?

Although the real main developers are gone, I agree with the sentiment in your comment here. The test suite needs serious cleanup and documentation and I started to do that in my other PR here: #94 before I noticed that your work overlapped with what I was trying to do.

To be clear, don't worry about the tests: I'll be writing some tests tomorrow anyway, and I can write a few for this too. I can sense your frustration through your writing, which is a shame (for us) since you've already written some great bug fixes I want to get merged.

jikamens · 2020-03-07T14:49:32Z

Closing in favor of other pull requests.

jikamens mentioned this pull request Feb 6, 2020

Add --maximum option to run command, fix bugs in new user agent configurability #95

Closed

Profpatsch reviewed Feb 6, 2020

jikamens added 3 commits February 8, 2020 21:10

Redirect post-processor needs to get user agent from feed

8836c7c

jikamens force-pushed the user_agent_fixes branch from b642d2a to 82035ee February 9, 2020 02:11

jikamens added 2 commits February 8, 2020 22:09

jikamens force-pushed the user_agent_fixes branch from 6369367 to 8baad9d February 9, 2020 03:09

Pull author from media:credit element if it's present

3322949

Profpatsch requested changes Feb 16, 2020

kaashif mentioned this pull request Feb 16, 2020

Fixes to the configurable user-agent changes introduced in 3.11 #100

Merged

jikamens closed this Mar 7, 2020

kaashif mentioned this pull request Mar 14, 2020

Lock the datafile at the start, only release at the end #94

Merged

auouymous mentioned this pull request Apr 21, 2022

Fix a couple of HTTP user agent regressions #216

Merged
