Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to the configurable user-agent changes introduced in 3.11 #97

Closed
wants to merge 6 commits into from

Conversation

jikamens
Copy link
Contributor

@jikamens jikamens commented Feb 6, 2020

No description provided.

@@ -61,7 +61,7 @@ def process(feed, parsed, entry, guid, message):
for link in links:
try:
request = urllib.request.Request(link)
request.add_header('User-agent', rss2email.feed._USER_AGENT)
request.add_header('User-agent', feed.user_agent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t hit this line with our tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git grep redirect in the test directory returns no matches, so it does seem that way.

feed. We've fixed that problem, but we want to go back now and
repair feeds that got the wrong user agent value in the
interim.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I reading this right, and instead of __VERSION__ the previous code would serialize the substituted value into the configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Here's what happens in the v3.11 code (without this PR) when a feed is added:

  1. Config is loaded.
  2. __VERSION__ and __URL__ are substituted into the user agent string at that time.
  3. When the config for the feed is saved, the code recognizes that the value of this setting for the feed -- since it has been substituted -- is different for the value in the DEFAULT section, so it saves the feed-specific value.

@ghost
Copy link

ghost commented Feb 6, 2020

Please see https://github.com/rss2email/rss2email/blob/master/test/allthingsrss/4.config and https://github.com/rss2email/rss2email/blob/master/test/allthingsrss/4.expected for testing.

Edit:
Ah, I see that the proper thing to do is add __VERSION__ and __URL__ to the test cases.

@Profpatsch
Copy link
Contributor

Ah, I see that the proper thing to do is add __VERSION__ and __URL__ to the test cases.

Let’s add them in these fixes then.

@jikamens
Copy link
Contributor Author

jikamens commented Feb 6, 2020

Here's what happens when I try to run PYTHONPATH=. ./test/test.py:

E..
======================================================================
ERROR: test_send (__main__.TestEmails)
Emails generated from already-fetched feed data are correct
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test.py", line 129, in test_send
    self.run_single_test(dirname=this_path)
  File "./test/test.py", line 83, in run_single_test
    self.run_single_test(dirname=dirname, config_path=config_path, force=force)
  File "./test/test.py", line 109, in run_single_test
    '\n'.join(diff_lines)))
ValueError: error processing disqus/1.config
--- expected
+++ generated
@@ -39,7 +39,7 @@
 Message-ID: <...@dev.null.invalid>
 User-Agent: rss2email/...
 X-RSS-Feed: disqus/feed.rss
-X-RSS-ID: ab03f2100069a1cd0876b997be87976c18d48e8a
+X-RSS-ID: a52375ec78a988241fe9864a2243d4d910538d52
 X-RSS-URL: http://software-carpentry.org/2012/11/who-wants-to-write-a-little-code/#comment-713578640
 
 @Hans-Martin  

----------------------------------------------------------------------
Ran 3 tests in 1.705s

FAILED (errors=1)

Here's what happens when I try PYTHONPATH=. ./test/test.py test/allthingsrss/4.config:

E
======================================================================
ERROR: test/allthingsrss/4 (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute 'test/allthingsrss/4'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

I'm pretty sure I'm running the commands as specified in test/README.

Both of these errors occur on master, not just on the branch with my changes, so they are not related to my changes.

I am happy to update my PR to include tests, but I do not have the bandwidth to make the test framework as a prerequisite for that, so please either (a) tell me what I am doing wrong when trying to invoke the test framework or (b) fix it if it's broken. Thanks.

@Profpatsch
Copy link
Contributor

The README seems outdated.

env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py

works for me.

@Profpatsch
Copy link
Contributor

I have a correction up at #98

@jikamens
Copy link
Contributor Author

jikamens commented Feb 8, 2020

Tests are still failing on master for me:

$ env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py
E..
======================================================================
ERROR: test_send (__main__.TestEmails)
Emails generated from already-fetched feed data are correct
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test.py", line 129, in test_send
    self.run_single_test(dirname=this_path)
  File "./test/test.py", line 83, in run_single_test
    self.run_single_test(dirname=dirname, config_path=config_path, force=force)
  File "./test/test.py", line 109, in run_single_test
    '\n'.join(diff_lines)))
ValueError: error processing disqus/1.config
--- expected
+++ generated
@@ -39,7 +39,7 @@
 Message-ID: <...@dev.null.invalid>
 User-Agent: rss2email/...
 X-RSS-Feed: disqus/feed.rss
-X-RSS-ID: ab03f2100069a1cd0876b997be87976c18d48e8a
+X-RSS-ID: a52375ec78a988241fe9864a2243d4d910538d52
 X-RSS-URL: http://software-carpentry.org/2012/11/who-wants-to-write-a-little-code/#comment-713578640
 
 @Hans-Martin  

----------------------------------------------------------------------
Ran 3 tests in 1.531s

FAILED (errors=1)

Y'all let me know when you have tests actually passing on master and I'll work on adding tests for my changes. I don't want to have to battle someone else's failing tests while trying to add my own, since that'll make it difficult for me to understand what failures are my fault.

@Profpatsch
Copy link
Contributor

Profpatsch commented Feb 8, 2020

Which version of python are you using, and what’s the python path?

If you look at the CI, we are running tests against threefour different python versions, so I’m assuming the environment is different.

Inside the nix-shell (See hacking.md) the following dependencies will be available:

> printenv PYTHONPATH | tr ':' '\n' | sort | uniq
/nix/store/3y9iqi6bk24i5jpfi6c7qb9950jjid3g-python3.7-html2text-2018.1.9/lib/python3.7/site-packages
/nix/store/k5rdcbcwwpvj7l9f1yvd5mfggcfz16kk-python3-3.7.5/lib/python3.7/site-packages
/nix/store/pza4bi76fq3qlx8qs7p1a8rkvfv012vb-update-copyright-0.6.2/lib/python3.7/site-packages
/nix/store/q0ssx731z2zlrgsswxy7dsy6x7shisfr-python3.7-feedparser-5.2.1/lib/python3.7/site-packages

The setup.py specifies

    install_requires=[
        'feedparser>=5.0.1',
        'html2text>=3.0.1',
        ],

maybe we need to revise that.

@jikamens
Copy link
Contributor Author

jikamens commented Feb 8, 2020

Oh, boy, another package manager to deal with, just what I always wanted.

OK, now I've installed nix and run nix-shell and run env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py inside that shell. Now the error above is no longer happening, but it's only running three tests, which is clearly much fewer tests than there actually are. Furthermore, neither env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py test/allthingsrss nor env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py test/allthingsrss test/allthingsrss/1.config work inside the nix shell. For example:

[nix-shell:~/src/rss2email-1]$ env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py test/allthingsrss/1.config 
E
======================================================================
ERROR: test/allthingsrss/1 (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute 'test/allthingsrss/1'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

So even in the nix shell there is a disconnect between how your documentation says the tests work and how they work for me.

Here is the output of the command you gave above on my system inside the nix shell where I'm running the tests:

[nix-shell:~/src/rss2email-1]$ printenv PYTHONPATH | tr ':' '\n' | sort | uniq
/nix/store/3y9iqi6bk24i5jpfi6c7qb9950jjid3g-python3.7-html2text-2018.1.9/lib/python3.7/site-packages
/nix/store/k5rdcbcwwpvj7l9f1yvd5mfggcfz16kk-python3-3.7.5/lib/python3.7/site-packages
/nix/store/pza4bi76fq3qlx8qs7p1a8rkvfv012vb-update-copyright-0.6.2/lib/python3.7/site-packages
/nix/store/q0ssx731z2zlrgsswxy7dsy6x7shisfr-python3.7-feedparser-5.2.1/lib/python3.7/site-packages

This is getting frustrating. I'm trying to do the right thing here -- you said you wanted tests, I'm trying to add tests, because as someone who has maintained open-source software for over 30 years, I know how frustrating it is when people submit incomplete patches -- but these roadblocks are making this take a lot more time than I had hoped to spend on this. Given that the patches I've submitted reflect two different, real bugs in your code, and my efforts to get the tests working so I can add the tests have been repeatedly stymied, maybe one of the main developers of the package can meet me halfway here and write the tests?

@Profpatsch
Copy link
Contributor

Profpatsch commented Feb 8, 2020

Given that the patches I've submitted reflect two different, real bugs in your code, and my efforts to get the tests working so I can add the tests have been repeatedly stymied, maybe one of the main developers of the package can meet me halfway here and write the tests?

There are no main developers in this project at the moment, we are trying to maintain the existing code as best we can in our spare time. Incidentally, the main original author, Aaron Swartz, no longer rests among the living.

We have been trying our best consolidating the different forks and patches that have been lying around for years, but there is only so much time people have been able to spare to push this project into a working state again.

$ env "PATH=$(pwd):$PATH" "PYTHONPATH=$(pwd):$PYTHONPATH" ./test/test.py

… runs the tests for me inside the nix shell.

Referencing a config throws the error you described, it looks to me like the docs were wrong in this case (they are pretty old, from before this maintainership effort). @kaashif has restructured the test suite a bit, maybe they can give more input on what the intended usage is.

@jikamens
Copy link
Contributor Author

jikamens commented Feb 9, 2020

I had no idea Aaron was the original author of this project. Now I'm depressed. :-/

Note that when I run nix-build -A pythonVersions.rss2email-python_3_7 nix/release.nix that also only runs three tests. I gather it should be running a lot more than that? I honestly don't have any idea where to go from here.

The _USER_AGENT attribute has been removed from
`rss2email.feed._USER_AGENT` and moved to each specific feed, so the
redirect post-processor needs to get it from the new location.
If we substitute `__VERSION__` and `__URL__` in the user agent string
when it is loaded from the configuration file, then whenever a new
feed is added it will end up with the current version number and URL
hard-coded into its `user-agent` setting, which means that even when
either of those changes in a new version of rss2email, that feed will
keep using the old ones.

This fix changes that by doing the substitution at the time the user
agent string is actually used, i.e., included in an HTTP request or
email header, rather than when the config file is loaded.
Because we were substituting `__VERSION__` and `__URL__` into the user
agent string when the config was read in 3.11, any feed added with
3.11 would end up with a per-feed user-agent setting with substituted
current values, so that feed would keep using those values even when
the version or URL of rss2email changed. That bug is fixed in another
commit, but now we want to put in some migration code to repair feeds
that were broken in the interim. We can't get this 100% right since we
can only safely do the repair when the user is using the default
user-agent setting, but that will cover most users so it is worth
doing.
@jikamens
Copy link
Contributor Author

jikamens commented Feb 9, 2020

OK, additional info...

I understand much better how the tests work, and I've added an additional commit to this branch which I believe makes them work much better now. See the commit message for the second to last commit on this branch for details.

Upon further investigation I determined that the current tests were testing that __VERSION__ and __URL__ were substituted properly in the user agent string, but they weren't doing quite a good enough job of it. I've added an additional commit to fix this; that's the last commit on the branch.

Testing the "User agent needs to be substituted at use time, not at load time" is harder and not easily doable within the current test framework so I'm not adding a test for that at this time.

Use the `parameterized` module to treat each `.config` file as a
separate test, so that it's clear which of them is failing, and so
that it is possible to use the functionality built into `unittest` for
running subsets of the tests or even individual tests.

While I was implementing this I discovered that all of the test cases
were depending on a call to `_os.chdir` being done in just one of
them, which is problematic if you're only trying to execute a subset
of the tests and also because it's a bad idea to rely on the order in
which tests execute, so I moved the `_os.chdir` into `ExecContext` and
made sure all the code which depends on the current directory being
changed is wrapped within an `ExecContext`.

This commit also puts code into `test.py` for adjusting the python
search path as necessary so that the user calling the tests no longer
needs to do that, and updates `tests/README` to reflect that change
and update the instructions for running a subset of the tests to
reflect that they are now being executed by the `unittest` framework.
The regexp we were using to clean up the user agent string didn't have
backslashes on the parentheses so it wasn't actually making sure they
were there, and it wasn't actually confirming that `__VERSION__` had
been substituted properly with a URL. These problems have both been
fixed.
@jikamens
Copy link
Contributor Author

jikamens commented Feb 9, 2020

The build is failing because there are problems in the parameterized unit tests for Python 3.8. They apparently do not affect the functionality of the package.

I have spent over an hour now trying to figure out how to modify nix/release.nix not to run the unit tests for parameterized when building it for Python 3.8. It's supposedly possible to do this, but apparently I'm too stupid to understand nix's configuration language and syntax well enough to be able to figure it out despite banging my head against it for over an hour.

Perhaps whoever decided to add nix to rss2email understands it well enough to do in a few minutes what I was unable to do despite over an hour of trying to figure it out?

@jikamens
Copy link
Contributor Author

I am not sure what to do at this point. I need some guidance.

I believe my commit to use parameterized.expand to run the individual test cases as separate tests is a good and correct change which improves the code base.

I have several other good and useful fixes on this branch.

However, it can't pass right now because the parameterized module doesn't work properly in Nix for Python 3.8 (the unit tests for the module don't pass, though the module actually works fine in Python 3.8 for our purposes if that is ignored), and the Travis build includes Python 3.8.

Options:

  1. Fix the Nix build not to run unit tests for parameterized for Python 3.8, or heck just not to bother running the unit tests for parameterized at all. I tried and failed at this. I asked here if anyone can help and no one answered. I posted in the Nix discourse asking for help and no one responded there either. I have hit a brick wall with this option.

  2. Remove Python 3.8 from the release and Travis builds until parameterized is fixed to work in Python 3.8.

  3. Remove my parameterized commit from this PR and try again when parameterized is working in Python 3.8 in Nix.

  4. Ignore the build failures and merge anyway (probably not a good option, but mentioned for completeness).

  5. If there's no one on the current rss2email team who is proficient enough with Nix to be able to handle an issue like this, then perhaps stop using Nix. 🤷‍♂️

  6. Some other option I'm missing.

Thanks.

@Profpatsch
Copy link
Contributor

Sorry, busy with dayjob. Will take a look later.

data['author'] = a
break
except (AttributeError, KeyError, StopIteration):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should go into a separate PR, as it’s adding new functionality.

@@ -63,7 +63,7 @@ def __init__(self, *args, **kwargs):
self.MESSAGE_ID_REGEXP = _re.compile(
'^Message-ID: <[^@]*@dev.null.invalid>$', _re.MULTILINE)
self.USER_AGENT_REGEXP = _re.compile(
r'^User-Agent: rss2email/[0-9.]* (\S*)$', _re.MULTILINE)
r'^User-Agent: rss2email/[0-9.]* \(https:\S*\)$', _re.MULTILINE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open a new PR for this change, as it adds new functionality.

@@ -74,61 +93,37 @@ def clean_result(self, text):
text = regexp.sub(replacement, text)
return text

def run_single_test(self, dirname=None, config_path=None, force=False):
if dirname is None:
@parameterized.expand(((p) for p in find_email_tests()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of blocking on this package, we can create one def test_<name> function per email which calls into test_send.

_os.remove(self.data_path)
_os.rmdir(self.tmpdir)
finally:
_os.chdir(self.orig_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s remove any chdirs and use absolute paths instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ExecContext doesn’t do path switching, we don’t need to rely on its magic presence at the right moments.

_os.remove(self.cfg_path)
if _os.path.exists(self.data_path):
_os.remove(self.data_path)
_os.rmdir(self.tmpdir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just remove the whole temporary directory instead?

config_path,
'\n'.join(diff_lines)))

def test_send(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened to test_send?

@@ -140,21 +135,30 @@ class ExecContext:
context.call("run", "--no-send")

"""
def __init__(self, config):
def __init__(self, config=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What doesn’t pass a config?

@Profpatsch
Copy link
Contributor

I think we can simplify this a lot by not making the ExecContext do magic setup.

@kaashif
Copy link
Contributor

kaashif commented Feb 16, 2020

Hey @jikamens, sorry about all of the mess in the tests - what we have now is an improvement over what existed before, but they are still pretty low-coverage and flaky in some cases. Thanks for your efforts in this PR, bug fixes are always welcome.

I am pretty sure your test failure is due to #11 - feedparser doesn't order HTML attributes deterministically, so the hash of the content sometimes changes despite the content really not changing. I get this failure too sometimes, but I don't know any easy way to fix it (and I don't have much time to look into it). This test failure is long-standing, it predates the entire current team of maintainers.

My advice is to ignore anything to do with improving the existing tests, since fixing them up is probably not a good use of your time, we (the maintainers) should deal with that.

main developers of the package can meet me halfway here and write the tests?

Although the real main developers are gone, I agree with the sentiment in your comment here. The test suite needs serious cleanup and documentation and I started to do that in my other PR here: #94 before I noticed that your work overlapped with what I was trying to do.

To be clear, don't worry about the tests, I'll be writing some tests tomorrow anyway, I can write a few for this too. I can sense your frustration through your writing, which is a shame (for us) since you've already written some great bug fixes I want to get merged.

@jikamens
Copy link
Contributor Author

jikamens commented Mar 7, 2020

Closing in favor of other pull requests.

@jikamens jikamens closed this Mar 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants