
Prevent cache pollution #675

Merged
merged 1 commit on Apr 12, 2021
Conversation

@Alkarex (Contributor) commented Mar 28, 2021

Retrieving a feed with distinct cURL parameters such as user-agent might give different responses, which should not share the same cache.

Follow-up and generalisation of #643

The code to compute the cache filename is extracted into its own new function `SimplePie::get_cache_filename($url)` so that clients can easily obtain this information as well.

Downstream PR: FreshRSS/FreshRSS#3502
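To illustrate the intent, here is a minimal sketch (not the PR's actual code: `cache_filename_sketch()` is a hypothetical helper, and `md5` is assumed as SimplePie's default `cache_name_function`). Two fetches of the same URL with different cURL options should map to different cache filenames:

```php
<?php
// Hypothetical sketch of the idea behind this PR: build the cache key from
// the URL plus any non-default request options, so feeds fetched with
// different cURL parameters do not collide in the cache.
function cache_filename_sketch($url, array $options = array())
{
    if (!empty($options)) {
        ksort($options);  // stable key order regardless of insertion order
        $url .= '#' . urlencode(var_export($options, true));
    }
    return md5($url);  // md5 assumed here as the default cache_name_function
}

// Same URL, different user-agent: two distinct cache entries.
$a = cache_filename_sketch('https://example.net/feed');
$b = cache_filename_sketch('https://example.net/feed', array('useragent' => 'FreshRSS'));
var_export($a !== $b);  // true
```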

```php
// Append custom parameters to the URL to avoid cache pollution in case of multiple calls with different parameters.
$url .= $this->force_feed ? '#force_feed' : '';
$options = array();
if ($this->timeout != 10)
```
@Alkarex (Contributor, Author):

These checks preserve the best possible compatibility with existing cache filenames: when no custom cURL options are used, the cache filenames remain unchanged.
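The compatibility point can be sketched as follows (a hypothetical helper, not the PR's exact code; the parameter names are illustrative): only non-default settings are recorded, so a default configuration leaves the URL, and hence the cache filename, unchanged.

```php
<?php
// Hypothetical sketch of the compatibility behaviour: options are only
// recorded when they differ from the defaults (e.g. timeout 10), so a
// default configuration keeps its existing cache filename.
function cache_url_sketch($url, $timeout = 10, $useragent = null)
{
    $options = array();
    if ($timeout != 10) {
        $options['timeout'] = $timeout;
    }
    if ($useragent !== null) {
        $options['useragent'] = $useragent;
    }
    if (!empty($options)) {
        ksort($options);
        $url .= '#' . urlencode(var_export($options, true));
    }
    return $url;
}

echo cache_url_sketch('https://example.net/feed') . "\n";      // unchanged URL
echo cache_url_sketch('https://example.net/feed', 30) . "\n";  // gains a #... fragment
```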

```php
if (!empty($options))
{
    ksort($options);
    $url .= '#' . urlencode(var_export($options, true));
}
```
@Alkarex (Contributor, Author):

The modified URL is still a valid URL, in case a custom cache_name_function() expects one.
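A quick check (a sketch assuming the fragment-append behaviour shown above) that the modified URL still round-trips through parse_url(), which matters if a custom cache_name_function() expects a well-formed URL:

```php
<?php
// The appended fragment keeps the URL parseable: a custom
// cache_name_function() receiving it can still use parse_url().
$options = array('useragent' => 'FreshRSS');
ksort($options);
$url = 'https://example.net/feed#' . urlencode(var_export($options, true));

$parts = parse_url($url);
echo $parts['scheme'] . "\n";  // https
echo $parts['host'] . "\n";    // example.net
echo $parts['path'] . "\n";    // /feed
```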

@mblaney (Member) commented Apr 11, 2021

Could this be optional? I don't feel that changing your agent necessarily means you want to cache both versions, but this change enforces it. I think you will also need to move force_feed into your options array, because you could end up with two # segments in the URL.

@Alkarex (Contributor, Author) commented Apr 11, 2021

  1. Yes, we can make this an option if you want, though I am not sure what use-case it would address. In any case, I think it is more standard and less error-prone not to share the cache by default: per the HTTP standard, caches are not supposed to be shared across requests with different parameters, and should even obey response headers such as Vary. If you confirm that you would rather have an option, and if yes what its default should be, I will edit accordingly.

  2. Extra hash signs are legal in a URL fragment, e.g. https://example.net/#one#two#three . See e.g. https://tools.ietf.org/html/rfc3986#appendix-B : any character is allowed after the first hash sign. The corresponding PHP function parses it just fine:

```php
<?php
$url = 'https://example.net/#one#two#three';
print_r(parse_url($url));
```

```
Array
(
    [scheme] => https
    [host] => example.net
    [path] => /
    [fragment] => one#two#three
)
```

@mblaney (Member) commented Apr 12, 2021

OK, fair enough. I was thinking caches might grow unexpectedly from this change, but that's really only a problem if you're changing the agent options a lot. Thanks for the update!
