-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes to the configurable user-agent changes introduced in 3.11 #97
Conversation
@@ -61,7 +61,7 @@ def process(feed, parsed, entry, guid, message): | |||
for link in links: | |||
try: | |||
request = urllib.request.Request(link) | |||
request.add_header('User-agent', rss2email.feed._USER_AGENT) | |||
request.add_header('User-agent', feed.user_agent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don’t hit this line with our tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git grep redirect
in the test
directory returns no matches, so it does seem that way.
feed. We've fixed that problem, but we want to go back now and | ||
repair feeds that got the wrong user agent value in the | ||
interim. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I reading this right, and instead of __VERSION__
the previous code would serialize the substituted value into the configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Here's what happens in the v3.11 code (without this PR) when a feed is added:
- Config is loaded.
__VERSION__
and__URL__
are substituted into the user agent string at that time.- When the config for the feed is saved, the code recognizes that the value of this setting for the feed -- since it has been substituted -- is different for the value in the
DEFAULT
section, so it saves the feed-specific value.
Please see https://github.com/rss2email/rss2email/blob/master/test/allthingsrss/4.config and https://github.com/rss2email/rss2email/blob/master/test/allthingsrss/4.expected for testing. Edit: |
Let’s add them in these fixes then. |
Here's what happens when I try to run
Here's what happens when I try
I'm pretty sure I'm running the commands as specified in test/README. Both of these errors occur on master, not just on the branch with my changes, so they are not related to my changes. I am happy to update my PR to include tests, but I do not have the bandwidth to make the test framework as a prerequisite for that, so please either (a) tell me what I am doing wrong when trying to invoke the test framework or (b) fix it if it's broken. Thanks. |
The README seems outdated.
works for me. |
I have a correction up at #98 |
Tests are still failing on master for me:
Y'all let me know when you have tests actually passing on master and I'll work on adding tests for my changes. I don't want to have to battle someone else's failing tests while trying to add my own, since that'll make it difficult for me to understand what failures are my fault. |
Which version of python are you using, and what’s the python path? If you look at the CI, we are running tests against Inside the nix-shell (See hacking.md) the following dependencies will be available:
The
maybe we need to revise that. |
Oh, boy, another package manager to deal with, just what I always wanted. OK, now I've installed nix and run
So even in the nix shell there is a disconnect between how your documentation says the tests work and how they work for me. Here is the output of the command you gave above on my system inside the nix shell where I'm running the tests:
This is getting frustrating. I'm trying to do the right thing here -- you said you wanted tests, I'm trying to add tests, because as someone who has maintained open-source software for over 30 years, I know how frustrating it is when people submit incomplete patches -- but these roadblocks are making this take a lot more time than I had hoped to spend on this. Given that the patches I've submitted reflect two different, real bugs in your code, and my efforts to get the tests working so I can add the tests have been repeatedly stymied, maybe one of the main developers of the package can meet me halfway here and write the tests? |
There are no main developers in this project at the moment, we are trying to maintain the existing code as best we can in our spare time. Incidentally, the main original author, Aaron Swartz, no longer rests among the living. We have been trying our best consolidating the different forks and patches that have been lying around for years, but there is only so much time people have been able to spare to push this project into a working state again.
… runs the tests for me inside the nix shell. Referencing a config throws the error you described, it looks to me like the docs were wrong in this case (they are pretty old, from before this maintainership effort). @kaashif has restructured the test suite a bit, maybe they can give more input on what the intended usage is. |
I had no idea Aaron was the original author of this project. Now I'm depressed. :-/ Note that when I run |
The _USER_AGENT attribute has been removed from `rss2email.feed._USER_AGENT` and moved to each specific feed, so the redirect post-processor needs to get it from the new location.
If we substitute `__VERSION__` and `__URL__` in the user agent string when it is loaded from the configuration file, then whenever a new feed is added it will end up with the current version number and URL hard-coded into its `user-agent` setting, which means that even when either of those changes in a new version of rss2email, that feed will keep using the old ones. This fix changes that by doing the substitution at the time the user agent string is actually used, i.e., included in an HTTP request or email header, rather than when the config file is loaded.
Because we were substituting `__VERSION__` and `__URL__` into the user agent string when the config was read in 3.11, any feed added with 3.11 would end up with a per-feed user-agent setting with substituted current values, so that feed would keep using those values even when the version or URL of rss2email changed. That bug is fixed in another commit, but now we want to put in some migration code to repair feeds that were broken in the interim. We can't get this 100% right since we can only safely do the repair when the user is using the default user-agent setting, but that will cover most users so it is worth doing.
b642d2a
to
82035ee
Compare
OK, additional info... I understand much better how the tests work, and I've added an additional commit to this branch which I believe makes them work much better now. See the commit message for the second to last commit on this branch for details. Upon further investigation I determined that the current tests were testing that Testing the "User agent needs to be substituted at use time, not at load time" is harder and not easily doable within the current test framework so I'm not adding a test for that at this time. |
Use the `parameterized` module to treat each `.config` file as a separate test, so that it's clear which of them is failing, and so that it is possible to use the functionality built into `unittest` for running subsets of the tests or even individual tests. While I was implementing this I discovered that all of the test cases were depending on a call to `_os.chdir` being done in just one of them, which is problematic if you're only trying to execute a subset of the tests and also because it's a bad idea to rely on the order in which tests execute, so I moved the `_os.chdir` into `ExecContext` and made sure all the code which depends on the current directory being changed is wrapped within an `ExecContext`. This commit also puts code into `test.py` for adjusting the python search path as necessary so that the user calling the tests no longer needs to do that, and updates `tests/README` to reflect that change and update the instructions for running a subset of the tests to reflect that they are now being executed by the `unittest` framework.
The regexp we were using to clean up the user agent string didn't have backslashes on the parentheses so it wasn't actually making sure they were there, and it wasn't actually confirming that `__VERSION__` had been substituted properly with a URL. These problems have both been fixed.
6369367
to
8baad9d
Compare
The build is failing because there are problems in the I have spent over an hour now trying to figure out how to modify Perhaps whoever decided to add |
I am not sure what to do at this point. I need some guidance. I believe my commit to use parameterized.expand to run the individual test cases as separate tests is a good and correct change which improves the code base. I have several other good and useful fixes on this branch. However, it can't pass right now because the parameterized module doesn't work properly in Nix for Python 3.8 (the unit tests for the module don't pass, though the module actually works fine in Python 3.8 for our purposes if that is ignored), and the Travis build includes Python 3.8. Options:
Thanks. |
Sorry, busy with dayjob. Will take a look later. |
data['author'] = a | ||
break | ||
except (AttributeError, KeyError, StopIteration): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should go into a separate PR, as it’s adding new functionality.
@@ -63,7 +63,7 @@ def __init__(self, *args, **kwargs): | |||
self.MESSAGE_ID_REGEXP = _re.compile( | |||
'^Message-ID: <[^@]*@dev.null.invalid>$', _re.MULTILINE) | |||
self.USER_AGENT_REGEXP = _re.compile( | |||
r'^User-Agent: rss2email/[0-9.]* (\S*)$', _re.MULTILINE) | |||
r'^User-Agent: rss2email/[0-9.]* \(https:\S*\)$', _re.MULTILINE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a new PR for this change, as it adds new functionality.
@@ -74,61 +93,37 @@ def clean_result(self, text): | |||
text = regexp.sub(replacement, text) | |||
return text | |||
|
|||
def run_single_test(self, dirname=None, config_path=None, force=False): | |||
if dirname is None: | |||
@parameterized.expand(((p) for p in find_email_tests())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of blocking on this package, we can create one def test_<name>
function per email which calls into test_send.
_os.remove(self.data_path) | ||
_os.rmdir(self.tmpdir) | ||
finally: | ||
_os.chdir(self.orig_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s remove any chdirs and use absolute paths instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ExecContext
doesn’t do path switching, we don’t need to rely on its magic presence at the right moments.
_os.remove(self.cfg_path) | ||
if _os.path.exists(self.data_path): | ||
_os.remove(self.data_path) | ||
_os.rmdir(self.tmpdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just remove the whole temporary directory instead?
config_path, | ||
'\n'.join(diff_lines))) | ||
|
||
def test_send(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened to test_send
?
@@ -140,21 +135,30 @@ class ExecContext: | |||
context.call("run", "--no-send") | |||
|
|||
""" | |||
def __init__(self, config): | |||
def __init__(self, config=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What doesn’t pass a config?
I think we can simplify this a lot by not making the |
Hey @jikamens, sorry about all of the mess in the tests - what we have now is an improvement over what existed before, but they are still pretty low-coverage and flaky in some cases. Thanks for your efforts in this PR, bug fixes are always welcome. I am pretty sure your test failure is due to #11 - feedparser doesn't order HTML attributes deterministically, so the hash of the content sometimes changes despite the content really not changing. I get this failure too sometimes, but I don't know any easy way to fix it (and I don't have much time to look into it). This test failure is long-standing, it predates the entire current team of maintainers. My advice is to ignore anything to do with improving the existing tests, since fixing them up is probably not a good use of your time, we (the maintainers) should deal with that.
Although the real main developers are gone, I agree with the sentiment in your comment here. The test suite needs serious cleanup and documentation and I started to do that in my other PR here: #94 before I noticed that your work overlapped with what I was trying to do. To be clear, don't worry about the tests, I'll be writing some tests tomorrow anyway, I can write a few for this too. I can sense your frustration through your writing, which is a shame (for us) since you've already written some great bug fixes I want to get merged. |
Closing in favor of other pull requests. |
No description provided.