Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use correct Cache-Tag (CDN) and X-RTD-Project header on subprojects #7593

Merged
merged 1 commit into from Oct 26, 2020

Conversation

humitos
Copy link
Member

@humitos humitos commented Oct 22, 2020

Currently, when accessing a subproject (eg. edx-guide-for-students being
subproject of edx) we are generating Cache-Tag: edx,edx-latest. Then, when
a build for edx-guide-for-students on version latest succeeds, we are purging
the cache tags edx-guide-for-students-latest.

This commit makes the Cache-Tag generated for subprojects to use the
subproject's slug instead of the parent's/superproject's slug and matching the
cache tag that we are purging when the build succeeds.

Besides, it fixes the X-RTD-Project header as well, since it was using the
same parent's/superproject's slug as value.

Fixes #7590

Currently, when accessing a subproject (eg. `edx-guide-for-students` being
subproject of `edx`) we are generating `Cache-Tag: edx,edx-latest`. Then, when
a build for `edx-guide-for-students` on version `latest` succeeds, we are purging
the cache tags `edx-guide-for-students-latest`.

This commit makes the `Cache-Tag` generated for subprojects to use the
subproject's slug instead of the parent's/superproject's slug and matching the
cache tag that we are purging when the build succeeds.

Besides, it fixes the `X-RTD-Project` header as well, since it was using the
same parent's/superproject's slug as value.
@humitos humitos requested review from ericholscher and a team October 22, 2020 17:35
# I think it's not accurate saying that it's `subdomain` the method
# that we use to get the project slug here, since it was in fact the
# URL's path but we don't have that feature built
self.assertEqual(r['X-RTD-Project-Method'], 'subdomain')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to this PR, but I wanted to make a note here, tho. Maybe a TODO where we set this header is better?

if hasattr(request, 'rtdheader'):
response['X-RTD-Project-Method'] = 'rtdheader'
elif hasattr(request, 'subdomain'):
response['X-RTD-Project-Method'] = 'subdomain'
elif hasattr(request, 'cname'):
response['X-RTD-Project-Method'] = 'cname'

This header is wrong to all the subprojects when they are accessed via the domain of their parent's, eg. project.readthedocs.io/projects/subproject/en/latest/

@@ -120,7 +120,7 @@ class ProxitoMiddleware(MiddlewareMixin):
def add_proxito_headers(self, request, response):
"""Add debugging headers to proxito responses."""

project_slug = getattr(request, 'host_project_slug', '')
project_slug = getattr(request, 'path_project_slug', '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to use host_project_slug sometimes as well? It seems like this is breaking existing logic, unless path_project_slug is getting set other places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to use host_project_slug sometimes as well?

As a fallback, you mean? In case that path_project_slug is not available? Why it would not be available?

It seems like this is breaking existing logic, unless path_project_slug is getting set other places?

What existing logic do you think it's breaking? If you have an example on mind I can try to write a test case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the same logic for version_path_project that follows this line. It's used here and set in _get_project_data_from_request (

request.path_version_slug = version_slug
) which is the first function that is called when serving a page at
final_project, lang_slug, version_slug, filename = _get_project_data_from_request( # noqa
request,
project_slug=project_slug,
subproject_slug=subproject_slug,
lang_slug=lang_slug,
version_slug=version_slug,
filename=filename,
)

The host_project_slug is returned by map_host_project_slug which is accurate,

ret = map_host_to_project_slug(request)
. However, the project slug we get from the host sometimes (in subprojects or translations, for example) is not the project slug being served.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, are other uses of host_project_slug unaffected by this issue? It just seems bad to have 2 different variables for the project_slug, where we use them in different places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seems bad to have 2 different variables for the project_slug, where we use them in different places.

Well, if I'm understand correctly this they are different things:

  • host_project_slug is the project slug that we infer from the host/domain's URL
    • not accurate for subdomain/translations
    • it's fine for custom domains and .readthedocs.io
    • it's set at the beginning of the request in the middleware
  • path_project_slug is the project slug that we can infer from the path's URL mainly
    • it's good for subdomain/translations
    • it's the same as what we have been calling final_project (maybe it's better to call it final_project_slug?)
    • it's set in the _get_project_data_from_request which is called from inside the ServeDocs view in Proxito as the first step of the request after all the middlewares

It seems that both are required. We use host_project_slug as first step trying to get the Project object (map_project_slug decorator). Then, inside the _get_project_data_from_request we use host_project_slug to check single project, translation and subproject and set the path_project_slug which in those cases could be different than host_project_slug.

Gotcha, are other uses of host_project_slug unaffected by this issue?

Grepping the code I didn't find anything that clearly needs to be adapted. Maybe there is something deeper that I'm not seeing, tho

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. We should probably document all these variables (as you have done above) in the code/docs, so we don't get confused in the future.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, though we should audit to make sure we aren't creating additional bugs by not using the same method of resolving the project_slug everywhere.

@ericholscher
Copy link
Member

Gonna merge this, but @humitos please check the other places this might be needed.

@ericholscher ericholscher merged commit 4cf4aca into master Oct 26, 2020
2 checks passed
@ericholscher ericholscher deleted the humitos/cache-tag-subprojects branch October 26, 2020 19:42
@nedbat
Copy link
Contributor

nedbat commented Oct 26, 2020

Thanks so much, I love the responsiveness!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.inv files are over-cached?
3 participants