Skip to content
This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

Bug 1362438: additional redirects from SCL3 Apache to Django #4231

Merged
merged 3 commits into from
Aug 3, 2017

Conversation

bookshelfdave
Copy link
Contributor

source:
https://github.com/mozilla/kuma/blob/master/configs/htaccess

The remaining rules to address from htaccess are in this gist for now.

  • Rewrite rules that don't redirect (typically annotated with [L] in this file) will most likely need to be handled by Whitenoise or some other method:
# Links to FTP'ed code samples and examples
RewriteRule ^patches(.*) data/www/patches$1 [L]
RewriteRule ^presentations(.*) data/www/presentations$1 [L]
RewriteRule ^samples(.*) data/www/samples$1 [L]
RewriteRule ^diagrams(.*) data/www/diagrams$1 [L]
RewriteRule ^web-tech(.*) data/www/web-tech$1 [L]
RewriteRule ^css(.*) data/www/css$1 [L]


# Rewrites to robots & sitemaps
RewriteRule ^robots.txt$ media/robots.txt [L]
RewriteRule ^sitemap.xml$ media/sitemap.xml [L]
RewriteRule ^sitemaps/([\w\-]*)/sitemap.xml$ media/sitemaps/$1/sitemap.xml [L]
  • These seem to be broken:
# Miscellaneous off-site redirects
# broken
RewriteRule ^contests/$ http://labs.mozilla.com/contests/extendfirefox/ [R=302]
# broken
RewriteRule ^contests/extendfirefox(/.*)? http://labs.mozilla.com/contests/extendfirefox$1 [R=302]
# http://wiki.ecmascript.org doesn't seem to respond.
RewriteRule ^es4(/.*)?$ http://wiki.ecmascript.org/ [R]
  • These remain unimplemented:
RewriteCond %{QUERY_STRING} ^title=http [NC]
RewriteRule ^index\.php$ / [L,NC]

RewriteCond %{QUERY_STRING} ^title=(.*)$ [NC]
RewriteRule ^index\.php$ %1 [R=301,L,NC]

# HACK: Django will eventually redirect the user to the right spot, but skip a
# couple of redirects for these known legacy locales
RewriteRule ^en/(.*)$     /mwsgi/en-US/$1 [L,QSA,NE,NC]
RewriteRule ^cn/(.*)$     /mwsgi/zh-CN/$1 [L,QSA,NE,NC]
RewriteRule ^zh_cn/(.*)$  /mwsgi/zh-CN/$1 [L,QSA,NE,NC]
RewriteRule ^zh_tw/(.*)$  /mwsgi/zh-TW/$1 [L,QSA,NE,NC]

# Everything else passes through the Django handler
RewriteRule ^(.*)$ /mwsgi/$1 [L,QSA,NE]
  • For the following rule, we discussed using a Django view to serve robots.txt via a template to change content based on dev/stage/prod:
RewriteCond %{HTTP_HOST} allizom
RewriteRule ^robots.txt$ media/robots-go-away.txt [L]

@bookshelfdave
Copy link
Contributor Author

bookshelfdave commented May 10, 2017

I'm not sure if the Travis errors are related to this PR.

@bookshelfdave bookshelfdave changed the title additional redirects from SCL3 Apache to Django Bug 1362438: additional redirects from SCL3 Apache to Django May 10, 2017
redirect(
r'^(?P<pre>[\w\-]*)/HTML5$',
'/{pre}/docs/HTML/HTML5',
re_flags='i', prepend_locale=False, permanent=True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is causing the Kuma test failure. I'll need to look closer at Kuma and commit history to decide which needs to bend.

@jwhitlock
Copy link
Contributor

OK, here's what's happening. The failing test is requesting this path:

en-US/HTML/HTML5?view=edit

The redirect rule is parsing this as:

  • locale=en-US/
  • pre=HTML
  • (rest) = /HTML5

This is because locale_prefix=True, which is the default, and automatically looks for and captures an optional locale in the pattern.

So, the redirect pattern uses the rule:

{locale}{pre}/docs/HTML/HTML5

to redirect to:

en-US/HTML/docs/HTML/HTML5?view=edit

breaking the test, which is trying to detect MindTouch URLs like ?view=edit and convert them to Kuma URLs like $edit. Which is another level of dumb.

In any case, URL patterns that are trying to detect locales (this one and a few others) need to lean on locale_prefix instead.

I think I own this PR now, having made @metadave wait over 2 months for me to figure it out.

@jwhitlock jwhitlock force-pushed the dp_apache_redirects_part_2 branch 2 times, most recently from 39d8cd2 to 175321b Compare August 2, 2017 16:56
bookshelfdave and others added 3 commits August 3, 2017 10:05
Add helper function to explicitly set locale_prefix and prepend_locale
to False for "regular" redirects, to avoid matching on non-locale
prefixes.

Add helper function locale_redirect to set locale_prefix to True, and
default prepend_locale to True. Adjust locale redirect patterns to rely
on locale matching in redirect_urls library.

The /Learn topic has moved to /docs/Learn, but this redirect is
enforced in Apache rules. Adding a TODO to change the redirect after AWS
move.

Adjusted some test patterns to use valid locales, so that tests will
pass against Apache in SCL3 and locally.
@codecov-io
Copy link

Codecov Report

Merging #4231 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4231      +/-   ##
==========================================
+ Coverage   88.32%   88.33%   +<.01%     
==========================================
  Files         163      163              
  Lines       10229    10233       +4     
  Branches     1420     1420              
==========================================
+ Hits         9035     9039       +4     
  Misses        968      968              
  Partials      226      226
Impacted Files Coverage Δ
kuma/redirects/redirects.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f08f03c...17ec129. Read the comment docs.

Copy link
Contributor

@jwhitlock jwhitlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Django tests are passing, and the redirect tests pass against Apache/SCL3 and just Django. Thanks, @metadave!

@jwhitlock jwhitlock merged commit 3b6807e into master Aug 3, 2017
@jwhitlock jwhitlock deleted the dp_apache_redirects_part_2 branch August 3, 2017 15:23
@jwhitlock
Copy link
Contributor

jwhitlock commented Aug 23, 2017

As part of mozmeao/infra#436, I took a look at what's left from the Apache config.

# Links to FTP'ed code samples and examples
RewriteRule ^patches(.*) data/www/patches$1 [L]
RewriteRule ^presentations(.*) data/www/presentations$1 [L]
RewriteRule ^samples(.*) data/www/samples$1 [L]
RewriteRule ^diagrams(.*) data/www/diagrams$1 [L]
RewriteRule ^web-tech(.*) data/www/web-tech$1 [L]
RewriteRule ^css(.*) data/www/css$1 [L]

presentations, diagrams, and samples are handled in kuma PR 4365 and infra PR 431.

patches, web-tech, and css are no longer present in the public folder on SCL3. There are some files in a non-served location for web-tech, which makes it appear to be a WordPress site from 2009-2010. The files are some scaled images, no text content. The internet will not miss these files, and we should start returning 404s.

RewriteRule ^robots.txt$ media/robots.txt [L]

robots.txt is a Django view, fixed in kuma PR 4273

RewriteRule ^sitemap.xml$ media/sitemap.xml [L]
RewriteRule ^sitemaps/([\w\-]*)/sitemap.xml$ media/sitemaps/$1/sitemap.xml [L]

These files are periodically generated, and accessed by Google and others to see what is new. We'll need to serve these from Django instead. If we move MEDIA_ROOT up a level, then we should be OK.

# Miscellaneous off-site redirects
# broken
RewriteRule ^contests/$ http://labs.mozilla.com/contests/extendfirefox/ [R=302]
# broken
RewriteRule ^contests/extendfirefox(/.*)? http://labs.mozilla.com/contests/extendfirefox$1 [R=302]
# http://wiki.ecmascript.org doesn't seem to respond.
RewriteRule ^es4(/.*)?$ http://wiki.ecmascript.org/ [R]

contests appears to refer to the Extend Firefox contest from 2008. There's no good place to send these requests, but http://www.mozillalabs.com/ feels as good as any. There were 3 requests in the last week.

es4 appears to have been the ECMAScript 4 committee wiki. It looks like Mozilla hosted it in 2006, before it moved. ES4 was abandoned. Wikipedia points to links on www.ecmascript.org, but these don't resolve. The new home appears to be http://www.ecma-international.org/memento/TC39.htm, which seems as good as any other place to send people.

RewriteCond %{QUERY_STRING} ^title=http [NC]
RewriteRule ^index\.php$ / [L,NC]

RewriteCond %{QUERY_STRING} ^title=(.*)$ [NC]
RewriteRule ^index\.php$ %1 [R=301,L,NC]

This must have been the way to load pages in the MediaWiki/DekiWiki days. I looked at the requests and they are roughly:

  • 90% over 1600 requests for recent changes RSS feeds. We have some feeds, but I think there is little value in making these legacy feed URLs start working.
  • 6% requests for legacy pages, mostly from search engine crawlers, none from internal referrers
  • 4% "sniffing" attempts, looking for Joomla admin, etc.

I don't see any reason to handle these requests, and they should return 404s

# HACK: Django will eventually redirect the user to the right spot, but skip a
# couple of redirects for these known legacy locales
RewriteRule ^en/(.*)$     /mwsgi/en-US/$1 [L,QSA,NE,NC]
RewriteRule ^cn/(.*)$     /mwsgi/zh-CN/$1 [L,QSA,NE,NC]
RewriteRule ^zh_cn/(.*)$  /mwsgi/zh-CN/$1 [L,QSA,NE,NC]
RewriteRule ^zh_tw/(.*)$  /mwsgi/zh-TW/$1 [L,QSA,NE,NC]

This is tracked in bug 962148. This rule serves up the duplicate content under a different URL, and instead we want a permanent redirect.

# Everything else passes through the Django handler
RewriteRule ^(.*)$ /mwsgi/$1 [L,QSA,NE]

This is the Apache alias for the Django modwsgi handler, and it not needed in AWS.

To determine the impact of these rules, I looked the the Apache access logs on developer1, which is one of six servers for developer.mozilla.org, developer.cdn.mozilla.net, and mozillademos.org.

for site in developer.mozilla.org mozillademos.org developer.cdn.mozilla.net; do
  echo $site;
  find $site -name 'access*' | xargs zgrep "GET /presentations" | wc -l;
done
Path developer.mozilla.org mozillademos.org developer.cdn.mozilla.net
/presentations 214 0 0
/patches 0 0 0
/samples 1083 0 0
/diagrams 0 0 0
/web-tech 339 0 0
/css 290 1 0
/robots.txt 835 203 3
/sitemap.xml 33 7 0
/sitemaps 181 0 0
/contests 2 0 0
/es4 13 0 0
/index.php 1769 0 0
/en/ 114320 0 0
/cn/ 451 0 0
/zh_cn/ 10 0 0
/zh_tw/ 96 0 0

Update 1: We do have recent changes feeds. I still think we should 404 the requests to the legacy feeds, rather than redirect them to "Create a New Page".

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants