Suggested updates to tumblr.rb for better metadata preservation, and SEO. #54

Merged
merged 5 commits into from Sep 16, 2013

4 participants
@benguild
Contributor

benguild commented Sep 3, 2013

No description provided.

benguild added some commits Sep 3, 2013

Preserving exact date in YAML headers (including time), and adding "canonical" HREF pointer for SEO in generated Tumblr pointer files. Also added "noindex,follow" tag for robots.
Preserving original Tumblr URL in YAML header. Could be made into an option, but doesn't hurt and good for future redirect efforts.
Adding optional "index.html" file in /post/ID/ folder, for better SEO.
Tumblr doesn't discriminate by "slug" text. While it's not possible at the current time to do rule-based redirects on some hosts (such as GitHub Pages) ... this will at least prevent search-engines from stacking up a bunch of 404's checking that directory.

Unfortunately, there is still no way for hosts like GitHub Pages to do "catch-all" slug text, unlike Tumblr. So, people with messed up links to your blog that used to work will still return 404 unless they've simply left off the slug text.
Removing overkill <meta> tag. May hide pages temporarily from Google Search.

https://productforums.google.com/d/msg/webmasters/0sqRrolO_Ss/igOdQIjwKdEJ

"One reason for this is that we sometimes find a non-canonical URL first. If this URL has a noindex robots meta tag, we might decide not to index anything until we crawl and index the canonical URL. Without the noindex robots meta tag (with the rel=canonical link element) we can start by indexing that URL and show it to users in search results. As soon as we crawl the canonical URL, we can change to the canonical URL instead. It's also much safer because you don't have to worry about serving different versions of the content depending on the exact URL :-)."

Member

mattr- commented Sep 4, 2013

Cool. Thanks! ❤️

@parkr your turn. 😃

@@ -121,7 +121,9 @@ def self.post_to_hash(post, format)
:header => {
"layout" => "post",
"title" => title,
"date" => DateTime.parse(post['date']).strftime('%Y-%m-%d %H:%M:%S'),

parkr Sep 4, 2013

Member

Let's add the timezone here :)

benguild Sep 5, 2013

Contributor

I left it out since the timezone can be set globally in _config.yml, so in my opinion it would only matter if the user had changed timezones over the life of their blog.

However, it wouldn't hurt, I guess. Would it just be "%Y-%m-%d %H:%M:%S %:z"? (Some countries do have a timezone offset that differs by minutes, not whole hours ... I've been to them.)
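
For reference, a minimal sketch of what the proposed format string would produce. The helper name and the sample date strings are assumptions made for illustration; they are not taken from tumblr.rb or from a real Tumblr API response.

```ruby
require 'date'

# Hypothetical helper (not part of tumblr.rb): format a date string for the
# YAML header while keeping the UTC offset explicit via %:z.
def header_date_with_offset(raw_date)
  DateTime.parse(raw_date).strftime('%Y-%m-%d %H:%M:%S %:z')
end

# Sample inputs are made up; the second uses an offset that is not a whole
# number of hours.
puts header_date_with_offset('2013-09-03 12:34:56 GMT')
puts header_date_with_offset('2013-09-03 12:34:56 +05:45')
```

With `%:z` the offset is written as hours and minutes separated by a colon, so fractional-hour offsets survive the round trip.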

parkr Sep 16, 2013

Member

The timezone configuration option doesn't change this timezone – in fact, when it's set, this usually reverts to UTC and is output in the specified timezone. I'd suggest using a built-in method like iso8601. We use Time.parse to parse these datetimes in Jekyll so as long as it can be read properly there, we're all good.
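
A rough sketch of the iso8601 suggestion, assuming the post date arrives as a string that DateTime.parse understands. The helper name and sample date are hypothetical; only DateTime#iso8601 and Time.parse come from the comment above.

```ruby
require 'date'
require 'time'

# Hypothetical helper (not part of tumblr.rb): emit the header date as an
# ISO 8601 string, which always carries an explicit UTC offset.
def header_date_iso8601(raw_date)
  DateTime.parse(raw_date).iso8601
end

stamp = header_date_iso8601('2013-09-03 12:34:56 GMT')
puts stamp

# Time.parse reads the string back, offset included.
puts Time.parse(stamp).utc
```

Because the offset is part of the string, the value reads back as the same instant regardless of the timezone configured for the site.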

@@ -217,6 +220,7 @@ def self.add_syntax_highlights(content)
lines[start] = "{% highlight #{lang} %}"
lines[i - 1] = "{% endhighlight %}"
end
FileUtils.cp(redirect_dir + "index.html", redirect_dir + "../" + "index.html")

parkr Sep 4, 2013

Member

Why is this happening?

benguild Sep 5, 2013

Contributor

Because a permalink with just an ID and no slug text is still a valid link for that post on Tumblr. Try it.
Some people link to posts that way.

The additional "index.html" in the directory above catches search engines (at least) that request just http://blog/post/postid/ without a slug, and tells them the actual target so the URLs can be merged. Otherwise a 404 is returned, which could risk the directory being delisted by a search engine due to confusion. (In theory, a root shouldn't return a 404 and then have a page inside it. Plenty of people make this mistake, but I think it's best not to raise any red flags when Google comes crawling.)

Also, Tumblr accepts any "slug" text in a link. I can make up my own for any post, like
http://blog/post/postid/some-stupid-title ... or http://blog/post/postid/some-even-more-stupid-title

... and both will work as permalinks. This can also create some confusion, since truncated links will still work. Unfortunately, without real redirect rules, it's impossible to catch these without creating them manually ... but I think the code above makes sense because the permalink with just an ID and no slug is still valid on the old platform. Can't hurt, in my opinion.
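
To make the idea concrete, here is a small sketch of writing a pointer file under /post/ID/slug/ and copying it one level up to /post/ID/. The function name, the HTML body, and the sample ID, slug, and target URL are all assumptions for illustration; the importer's actual markup may differ.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not the importer's code): write a redirect pointer
# for /post/<id>/<slug>/ and copy it to /post/<id>/ so that ID-only links
# resolve instead of returning a 404.
def write_redirects(site_root, post_id, slug, target_url)
  redirect_dir = File.join(site_root, 'post', post_id, slug)
  FileUtils.mkdir_p(redirect_dir)

  # Illustrative markup: a canonical link plus a meta refresh to the new URL.
  html = <<~HTML
    <html><head>
    <link rel="canonical" href="#{target_url}">
    <meta http-equiv="refresh" content="0; url=#{target_url}">
    </head><body></body></html>
  HTML

  slug_file = File.join(redirect_dir, 'index.html')
  File.write(slug_file, html)

  # Same pointer, one directory up, for links that omit the slug.
  id_only_file = File.join(site_root, 'post', post_id, 'index.html')
  FileUtils.cp(slug_file, id_only_file)

  [slug_file, id_only_file]
end

# Usage with made-up values, written into a throwaway directory.
Dir.mktmpdir do |root|
  files = write_redirects(root, '12345', 'some-title', '/2013/09/03/some-title/')
  files.each { |path| puts path.sub(root, '') }
end
```

Links with an arbitrary or truncated slug still are not covered; that would need real redirect rules on the host.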

Member

parkr commented Sep 16, 2013

LGTM other than the timezone offset!

Contributor

benguild commented Sep 16, 2013

Do you want to pull it and make the timezone change thereafter? I'm not entirely sure why UTC wouldn't be better in this case.

Member

parkr commented Sep 16, 2013

It's just always better to be explicit, I'd say. Is vagueness desired here?

Contributor

benguild commented Sep 16, 2013

It's up to you. I'm OK with UTC. If you want to make the change to something else, pull it and do so after.

Member

parkr commented Sep 16, 2013

OK.

parkr added a commit that referenced this pull request Sep 16, 2013

@parkr parkr merged commit 27fe445 into jekyll:master Sep 16, 2013

1 check passed

default The Travis CI build passed

parkr added a commit that referenced this pull request Sep 16, 2013

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017
