Option to change the User-Agent in HTTP headers #154
Comments
Is this different from the existing |
According to the documentation, "user-agent" is used in outgoing mails, not in HTTP headers. Even if that patch makes it apply to HTTP headers as well, I am not sure this value is appropriate for both purposes (so I used a different option name).
#120 was waiting for someone to write a regression test, but you are correct about the documentation saying I don't think there should be separate user-agent settings. The current setting would need to be renamed to Using separate values is only needed in order to spoof. You should instead contact the faulty feed and let them know that whitelisting is lazy and prevents access from valid clients. |
Sorry, could you explain? |
Setting UA to 'Mozilla/Linux' is lying (spoofing) about which client you are using. Anyone changing user-agent for security reasons would change both mail and http to the same obscure value. |
Sorry, but if you use #120 to change the user-agent in HTTP headers, is that not "lying" about the client you are using? I still do not understand.
Setting user-agent to nothing, 'none of your business' or simply 'rss2email' without a version isn't lying, it is just sharing less information with others who don't need to know which version or even which client you use. |
Ok I want to use: |
Re-reading #120 and #74, it sounds like the existing user-agent is mail-user-agent indeed, so there is a place for http-user-agent as this patch suggests. @kjonca Would it be possible for you to add documentation for this new setting (in r2e.1), and to add a test for this new feature so we don't break it unknowingly? |
Oh, I missed one comment! @auouymous Feeds requiring spoofing are unfortunately numerous, and trying to fight them all is way more work than adding an HTTP User-Agent parameter. This being said, maybe the documentation for user-agent should mention that people should contact the feed owner before using it? This way, people could read feeds, and at the same time broken feeds would hopefully be notified (even though most feed owners probably have literally no idea how their feed is being operated) |
Blocking clients with an empty UA is common for web servers so it is good to not allow the user to set a blank UA. But the man page and config comment should mention that leaving this blank will advertise as feedparser with feedparser's version.
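The fallback described above could look roughly like the following sketch. It is not rss2email's actual code: the option name `http-user-agent`, the section names, and the helper `resolve_http_user_agent` are assumptions made for illustration. A blank or whitespace-only value resolves to `None`, which here stands for "let the HTTP library advertise its own default" rather than sending an empty header.

```python
import configparser


def resolve_http_user_agent(config, section="DEFAULT", option="http-user-agent"):
    """Return the configured HTTP User-Agent, or None when it is blank.

    None is meant as "do not override"; the caller would then leave the
    fetching library's own default User-Agent in place.
    """
    value = config.get(section, option, fallback="")
    value = value.strip()
    return value or None


# Hypothetical configuration: a blank default and one feed with an override.
config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
http-user-agent =

[feed.custom]
http-user-agent = rss2email
""")

default_agent = resolve_http_user_agent(config)
custom_agent = resolve_http_user_agent(config, section="feed.custom")
```

Treating blank as "not set" means a user cannot accidentally send an empty User-Agent, which is the case that web servers commonly block.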
Sorry for my English. I will need some time for the test, as I am not a Python expert.
@kjonca The documentation you added looks good to me, thank you :) Do you still plan on adding a test (which can be added using the procedure described in the top post of #159)? Also, would it be possible for you to either open a pull request with your code changes, or if not to consolidate all the patches and suggestions in a single patch, so it's possible to review without having to go through currently 3, hopefully 4, code blocks? :) |
What exactly would a test for this do? Can tests set up a local web server and compare the headers sent to it?
Yes we do :) In test/test.py, the TestFetch task group spins up a webserver. Adjusting the NoLogHandler to return a special feed when a request arrives with a certain user-agent, and then adding a test that uses said user-agent to fetch the feed at an otherwise non-existent URL, should be a reasonable way to do it, I think.
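As a rough illustration of that idea, here is a self-contained sketch using only the standard library. It does not reproduce the project's test/test.py or its NoLogHandler; the handler class, the `EXPECTED_AGENT` value, the feed body, and the `fetch` helper are all hypothetical, and `urllib` stands in for whatever rss2email actually uses to fetch feeds.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED_AGENT = "rss2email-test-agent"  # hypothetical value for the test
FEED = b"<?xml version='1.0'?><rss version='2.0'><channel><title>ua-test</title></channel></rss>"


class UserAgentHandler(BaseHTTPRequestHandler):
    """Serve the feed only when the request carries the expected User-Agent."""

    def do_GET(self):
        if self.headers.get("User-Agent") == EXPECTED_AGENT:
            self.send_response(200)
            self.send_header("Content-Type", "application/rss+xml")
            self.send_header("Content-Length", str(len(FEED)))
            self.end_headers()
            self.wfile.write(FEED)
        else:
            self.send_error(403)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch(url, agent):
    """Fetch url with the given User-Agent; return (status, body)."""
    # An empty ProxyHandler keeps the request local even if proxy
    # environment variables are set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": agent})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        error.close()
        return error.code, b""


def run_check():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), UserAgentHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/feed.xml" % server.server_address[1]
        return fetch(url, EXPECTED_AGENT), fetch(url, "some-other-agent")
    finally:
        server.shutdown()
        server.server_close()


accepted, rejected = run_check()
```

A real regression test would replace `fetch` with the project's own fetch path configured through the new setting, so that a broken option would make the server answer 403 and the test fail.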
Sorry for the delay. I forked this repository and tried to run the tests (before making any changes) but they
Hmm this is weird… how are you attempting to test? Here is a transcript of a local run:
(I did not use nix-shell; I have to check it out.)
You are in 'detached HEAD' state. You can look around, make experimental
If you want to create a new branch to retain commits you create, you may
git switch -c
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at e2a6d91 Add wrap-links option to prevent links from wrapping over multiple lines (#172)
$ python3 --version
Hmm… do you have all the development packages installed? If you have nix installed, nix-shell does it for you; if not, I think you have to use poetry to install things, potentially in a virtualenv. It has been a very long time since I developed without nix, so I'm a bit fuzzy on how things work nowadays, especially as IIRC poetry wasn't a thing when I last did Python development pre-nix.
Some sites refuse to serve RSS when the user-agent is not known to them. I prepared a simple change which allows defining "http-user-agent-option" and passes it to the parser (the diff is against the Debian version, but I think this is not an issue).