New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finalize sitemap #2492
Finalize sitemap #2492
Conversation
2a0baaa
to
b5610ba
Compare
@@ -379,6 +379,14 @@ default: &default_settings | |||
# | |||
font_proximanovasoft_url: | |||
|
|||
# Maximum number of links in sitemap.xml. | |||
# | |||
# The default number is 5000. Increasing the limit may affect on performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's either "affect performance" or "have effect on performance"
aff9338
to
0ea4e1b
Compare
@@ -3,3 +3,4 @@ | |||
# Add new mime types for use in respond_to blocks: | |||
# Mime::Type.register "text/richtext", :rtf | |||
# Mime::Type.register_alias "text/html", :iphone | |||
Mime::Type.register "application/x-gzip", :xml_gz, [], ["xml.gz"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed anymore? The Rack::Deflater handles now the gzipping, and in the response we send:
Content-Encoding: gzip
Content-Type: application/xml
Full cURL:
➜ ~ curl 'http://test.lvh.me:3000/sitemap.xml' -H 'Pragma: no-cache' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8,fi;q=0.6,de;q=0.4,fr;q=0.2' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' -H 'Cache-Control: no-cache' -H 'Cookie: _sharetribe_session_newnewnew=2194b4c4245929eb8ac04fb7ae4ebf50; _sharetribe_session_newnewnew4=814a4b5ea25b5777676c9b8c5724415f; _st_session=c9d6795069a8a65b41f936ff5c1db67e' -H 'Connection: keep-alive' -i
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Disposition: attachment; filename="sitemap.xml"
Content-Transfer-Encoding: binary
Content-Type: application/xml
Cache-Control: private
Etag: W/"e0002c666520520bb6e5ded9b894e758"
X-Meta-Request-Version: 0.4.0
X-Request-Id: 35c92e0a-d63f-4b7f-82f6-541dcb6116ce
X-Runtime: 0.048266
Vary: Accept-Encoding
Content-Encoding: gzip
Server: WEBrick/1.3.1 (Ruby/2.3.1/2016-04-26)
Date: Thu, 01 Sep 2016 13:57:19 GMT
Content-Length: 467
Connection: Keep-Alive
Set-Cookie: _st_session=c9d6795069a8a65b41f936ff5c1db67e; path=/; expires=Sat, 01 Oct 2016 13:57:19 -0000; HttpOnly
So to my understanding we can remove this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that mime type applies to content being transferred and transfer encoding only applies to the transfer, i.e. the sitemap.xml is delivered over the wire as gzipped but the client delivers it to requesting app as xml. With application/x-gzip the content itself is gzipped so if you serve a sitemap.xml.gz file then that's appropriate mime type. I assume we only serve the xml and not a .gz file containing the xml? In that case application/xml is the correct mime type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for explanation!
|
||
if APP_CONFIG.asset_host.present? | ||
redirect_to ActionController::Base.helpers.asset_url( | ||
"/sitemap/generate.xml?sitemap_host=#{request.host}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to pass the host instead of pulling it in generate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok got it, we use different host for assets if asset_host is set.
include ActionController::Rescue | ||
include ActionController::Head | ||
include ActionController::Redirecting | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With metal controller you want to add:
# Ensure ActiveSupport::Notifications events are fired
include ActionController::Instrumentation
This triggers the structured request logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I checked my facts. Instrumentation alone is not enough. append_into_to_payload method also needs to be overridden. The version used in LandingPageController is:
# Override basic instrumentation and provide additional info for
# lograge to consume. These are pulled and logged in environment
# configs.
def append_info_to_payload(payload)
super
payload[:host] = request.host
payload[:community_id] = community(request)&.id
payload[:current_user_id] = nil
payload[:request_uuid] = request.uuid
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
1cbe9d1
to
c787314
Compare
c787314
to
90f817a
Compare
Only display active listings for public communities. Use the rails caching to generate it once a day. can now push sitemap from controller without intermediate file added support for rails caching not needed, all done at runtime fixed issue where community id wasn't being passed appropriately, and added more tests remove generate method because it isn't used anymore (no cdn support at this time). also refactored out the call to find the listings actually really call out to the abstraction use nice url in sitemap converted to metal controller and send responses back when a community is private or deleted replace default host with simpler solution moving to using currently open limit to 50k urls corrected name of method make class method private use id as a cache key so that we don't invalidate if the community object changes, just once a day use methods rather than accessing members fix for cache value--making it more generic than just the community id compress the data before adding it to the cache switched to using pluck, per request use more elegant map method for organizing plucked data fixing errors found by rubocop fixing rubucop errors that slipped through last time
Only display active listings for public communities. Use the rails caching to generate it once a day. can now push sitemap from controller without intermediate file added support for rails caching not needed, all done at runtime fixed issue where community id wasn't being passed appropriately, and added more tests remove generate method because it isn't used anymore (no cdn support at this time). also refactored out the call to find the listings actually really call out to the abstraction use nice url in sitemap converted to metal controller and send responses back when a community is private or deleted replace default host with simpler solution moving to using currently open limit to 50k urls corrected name of method make class method private use id as a cache key so that we don't invalidate if the community object changes, just once a day use methods rather than accessing members fix for cache value--making it more generic than just the community id compress the data before adding it to the cache switched to using pluck, per request use more elegant map method for organizing plucked data fixing errors found by rubocop fixing rubucop errors that slipped through last time
- Content negotiation (send gzip only if client accepts gzip) - Use Rack::Deflate for gzippping - Add max_sitemap_limit to config_defaults.yml
- Controller specs don't work well with Metal controllers
This was originally added for sitemap.xml.gz. However, we changed it so that we don't serve sitemap.xml.gz, but only sitemap.xml and let the Rack::Deflate middle ware to handle the transfer encoding.
The format that the crawlers expect when then they request sitemap.xml is xml. It doesn't make sense to redirect the request to URL which returns HTML. It's better just to send 404 status code, which is informative enough for the crawler.
8ae2e11
to
fb37ba5
Compare
- Generates path without protocol, i.e. starts with //asset.host.com
Cherry-picked commits from #2303