Generate a GUID when creating a new draft or generating the rss file #198

Open
wzpan opened this Issue Aug 2, 2013 · 13 comments

@wzpan
wzpan commented Aug 2, 2013

Hi,

I wonder if we could generate a GUID (Globally Unique Identifier) when creating a new draft, or at least attach such information via the RSS generator.

Without the GUID, it is difficult for applications such as RSS readers to determine whether an article is a new post or just an update.

It also makes migrating articles from one site to another tedious (for instance, we may need to do a lot of work to redirect the comments from the old site).

@plusjade
Member
plusjade commented Aug 2, 2013

Would you propose to just do a Digest::MD5 hash of the file contents (like assets) and collect the url -> fingerprint key pair somewhere? I don't personally understand the use but GUID would be possible in the way I've described. Do you know if Wordpress has a way to manage new vs updated posts in RSS so that I may better understand?
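A minimal sketch of the fingerprint idea described above, assuming posts are available as a plain Hash of URL to raw file contents. The `fingerprint_map` helper and the sample URLs are hypothetical and are not part of ruhoh.

```ruby
require "digest/md5"

# Hypothetical helper: map each post URL to an MD5 fingerprint of its content.
# `posts` is an assumed Hash of url => raw file contents, not a ruhoh structure.
def fingerprint_map(posts)
  posts.each_with_object({}) do |(url, content), map|
    map[url] = Digest::MD5.hexdigest(content)
  end
end

posts = {
  "/posts/the-answer" => "don't panic",
  "/posts/video-released" => "Video Released!"
}
fingerprint_map(posts).each { |url, digest| puts "#{url} -> #{digest}" }
```

Note that a content hash changes whenever the post is edited, so on its own it identifies a revision rather than a post.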

@wzpan
wzpan commented Aug 2, 2013

Hi @plusjade ,

  • To understand the role of GUID in RSS, you can first take a glance at the RSS 2.0 spec
  • I don't know the exact method for calculating a unique GUID, but it should be possible to generate one even offline. You can take a look at the related Ruby documentation
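As one possible illustration of offline generation, Ruby's standard library can produce a random UUID without any network access. Whether this is the method the linked document describes is an assumption; `new_guid` is a hypothetical helper name.

```ruby
require "securerandom"

# Hypothetical helper: return a random (version 4) UUID string.
# Generation is purely local, so it also works offline.
def new_guid
  SecureRandom.uuid
end

puts new_guid
```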

Is that clear? ;-)

@coolaj86
Member
coolaj86 commented Aug 3, 2013

hmm... according to the related reading they actually mean permalink, not guid or uuid. It's poor nomenclature.

In RSS feeds each <item> may have a <guid>, which may contain either the canonical permalink or an arbitrary string (such as a true uuid).

The purpose is so that if the name of the article were to change, the aggregator would count it as the same article, not a new one.

Judging by the vagueness of the specification and the example given as a URL, I'd say it's a safer bet to use the permalink than a true uuid.
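To make the two options concrete, here is a rough sketch that renders an RSS `<item>` with either kind of `<guid>`. The `rss_item` and `escape_xml` helpers are hypothetical, hand-rolled for illustration, and are not how ruhoh generates its feed. In RSS 2.0 the `isPermaLink` attribute tells the reader whether the guid is a URL that can be opened.

```ruby
# Minimal XML escaping for text nodes.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_xml(text)
  text.to_s.gsub(/[&<>]/, XML_ESCAPES)
end

# Hypothetical helper: build one RSS <item>.
# permalink: true  -> the guid is the post's URL
# permalink: false -> the guid is an opaque string such as a UUID
def rss_item(title:, link:, guid:, permalink: true)
  <<~XML
    <item>
      <title>#{escape_xml(title)}</title>
      <link>#{escape_xml(link)}</link>
      <guid isPermaLink="#{permalink}">#{escape_xml(guid)}</guid>
    </item>
  XML
end

# Example values are made up for illustration.
puts rss_item(title: "the answer",
              link: "http://example.com/the-answer",
              guid: "http://example.com/the-answer")
```

With a permalink guid, renaming the post changes the guid; with an opaque guid, the title and URL can change while the identifier stays stable.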

@wzpan
wzpan commented Aug 3, 2013

Hi @coolaj86 ,

hmm... according to the related reading they actually mean permalink, not guid or uuid. It's poor nomenclature.

Yes! A GUID can be a permalink! Here is a clearer specification of one scheme for such identifiers, the "tag" URI scheme: http://www.ietf.org/rfc/rfc4151.txt

When generating rss.xml, adding the guid tag would also be useful.

@wzpan
wzpan commented Aug 3, 2013

@plusjade ,

Do you know if Wordpress has a way to manage new vs updated posts in RSS so that I may better understand?

Yes. Maybe ruhoh needs to know whether a post is a new one or just an update. Otherwise, if we modify the title, even though it is just an update, the RSS reader will regard it as a new post because the guid has changed (to the new permalink).

As far as I know (not 100% sure), WordPress first generates a postid when the user creates a draft. The postid can be a short id that is only used to identify articles within the same site, so it doesn't need to be globally unique. For ruhoh, we can generate one and attach it to the YAML post metadata. An example:

---
date: '2013-4-2'
title: the answer
description: don't panic
tags: [explore]
categories: fiction
postid: '42'
---

When generating the RSS, also generate a guid from the postid. I think a good guid would be production_url/?p=postid, such as http://hahack.com/?p=42. Then we add the guid tag to the rss file.
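A rough sketch of that idea, assuming the front matter sits between the first two `---` lines of the post. The `front_matter` and `guid_for` helpers and the fallback permalink are hypothetical; ruhoh's real parser may work differently.

```ruby
require "yaml"

# Hypothetical helper: parse the YAML front matter between the first two
# "---" lines of a post. Returns an empty Hash if there is none.
def front_matter(source)
  match = source.match(/\A---\s*\n(.*?)\n---\s*$/m)
  match ? (YAML.safe_load(match[1]) || {}) : {}
end

# Build the guid as production_url/?p=postid, falling back to the
# permalink when the post has no postid.
def guid_for(meta, production_url:, permalink:)
  postid = meta["postid"]
  return permalink if postid.nil?
  "#{production_url.chomp('/')}/?p=#{postid}"
end

post = <<~POST
  ---
  title: the answer
  postid: '42'
  ---
  Don't panic.
POST

meta = front_matter(post)
# The permalink below is a made-up example value.
puts guid_for(meta, production_url: "http://hahack.com",
                    permalink: "http://hahack.com/the-answer")
```

Because the guid depends only on the postid, retitling the post leaves it unchanged.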

@wzpan
wzpan commented Aug 3, 2013

The more I think about it, the more I feel the importance of postid - it is an ideal basis for ordering posts!

I am not quite in favor of the way ruhoh sorts my posts - they are sorted neither alphabetically nor exactly by time. For example, take a look at these two posts from my homepage.

  1. Alternative Video Source According to IP Address
  2. Video Released!
    The first one appears first on my homepage, but I actually created it earlier than the second one. In ruhoh the granularity of sorting is the date, but both posts were created on 2013-8-1, so ruhoh doesn't sort them correctly.

What's worse, all the articles from my wiki are sorted even more "randomly", because I didn't add the "date" metadata so ruhoh seems to sort them in a strange way.

Therefore, I think a better choice is to sort the posts by postid - just like WordPress does. When creating a draft, also generate a number. It can be an integer that works like a post counter and grows incrementally.

A step probably works better. For example, when I create draft A, ruhoh assigns it post id 10. Then I create a new draft C, and ruhoh assigns it post id 15 instead of 11. If I now need to insert a draft B between A and C, ruhoh assigns it post id 20, and I can simply change that in the metadata to a number in [11,14] without having to edit the post id of draft C (and drafts D, E, F, ...)!

A log is needed to keep track of the post id counter: each time ruhoh creates a draft, it reads the last post id from the log file, calculates the new post id (= id_old + step), and writes it into the draft. After that, it writes the new post id back into the log file.
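The counter-with-step scheme could be sketched roughly as follows. The file name `postid.log`, the `next_postid` helper, and starting from 0 are all assumptions made for illustration; nothing like this exists in ruhoh.

```ruby
require "tmpdir"

DEFAULT_STEP = 5

# Hypothetical helper: read the last id from the counter file (0 if the file
# is missing), add the step, write the new id back, and return it.
def next_postid(counter_path, step: DEFAULT_STEP)
  last = File.exist?(counter_path) ? File.read(counter_path).to_i : 0
  new_id = last + step
  File.write(counter_path, new_id.to_s)
  new_id
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "postid.log")
  a = next_postid(path) # draft A
  c = next_postid(path) # draft C
  b = next_postid(path) # draft B, created last
  puts({ "A" => a, "C" => c, "B" => b }.inspect)
  # B's id can then be edited by hand to any free value between a and c.
end
```

This sketch does no file locking, so two drafts created at the same moment could receive the same id.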

Also make it optional. Without the postid, ruhoh sorts the articles by time and attaches the permalink as the guid tag in rss.xml. That guarantees backward compatibility.

I hope you carefully consider my advice. It is the key to save my wiki! ;-)

@coolaj86
Member
coolaj86 commented Aug 3, 2013

I think that extending the DATE to include the TIME would be a better approach than a number that increments by 5.

Time is much more granular.

Although I wouldn't necessarily want it to display the second at which I created the post, it should be there for history's sake.

I'd prefer the dates to be created_at and modified_at, so that once there's an online editor we can see both the original date and when the post was updated.

@wzpan
wzpan commented Aug 3, 2013

Time is more complex and longer than a post id, so it brings more storage and transfer cost.

Although I wouldn't want it to necessarily display the second of the time I created the post, It should be there for history's sake.

In fact I seldom use the draft command to create drafts; I create them directly in my Emacs editor and generate the post metadata with the help of yasnippet. So I will not be able to write the implicit creation time into ruhoh's history. I would also be exhausted trying to append the missing creation time to all the posts from my wiki, because neither I nor the Linux file system can remember the time!

Compared to time, postid should be more transparent and controllable. The step can be changed too; I suggest making it a variable in config.yml. 5 can be the default value. Sites that need more space to insert articles would use a larger number.

The process id in Linux is incremental with a step, too.

@karfau
karfau commented Aug 4, 2013

I like both ideas, but I don't think they should get mixed up:
maybe the postid should rather be called permanentid and should never be changed (after first release).
This would be great to use for the guid-thing in RSS.
But I think it shouldn't matter what's in there: let it be some speaking text, some uuid or some integer, but the compiler should check that it is unique ;)

But why should this affect the ordering of the articles?
use the created(_at) field if available
or the date-field if available
or the file created date/time
for sorting.
And if those fields support adding a time when required it would help sorting articles from the same day.
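The fallback order listed above might look roughly like this. The post shape (a Hash with optional `"created_at"` and `"date"` strings plus a `"path"`) and the `sort_time` helper are assumptions, and the file's modification time stands in for the creation time, which is not portably available. The times of day in the example are invented.

```ruby
require "time"

# Hypothetical helper: pick the timestamp to sort a post by.
# Order: created_at, then date, then the file's modification time
# (used here as a stand-in for the file creation time).
def sort_time(post)
  stamp = post["created_at"] || post["date"]
  return Time.parse(stamp.to_s) if stamp
  File.mtime(post["path"])
end

# Two posts from the same day; the times of day are made up.
posts = [
  { "title" => "Video Released!", "created_at" => "2013-08-01 14:00" },
  { "title" => "Alternative Video Source According to IP Address",
    "created_at" => "2013-08-01 09:30" }
]

puts posts.sort_by { |post| sort_time(post) }.map { |post| post["title"] }
```

Once a time of day is present, posts created on the same date sort in creation order.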

This should be pretty easy to understand for everybody.
(Did I forget any aspect of the discussion?)

just my 2c

@wzpan
wzpan commented Aug 5, 2013

Hi @karfau ,

Thanks for your comment.

But why should this affect the ordering of the articles?

OK, I will try to (informally) explain more on why I dislike using time for sorting.

In most situations, like blogging, yes - sorting by time is enough.

However, suppose I decide to write a book on Python via ruhoh (why not? 😺). I've written the first chapter, titled Python: Basic, and the second chapter, titled Python: Data Struct, when suddenly I think "Oh rats! I forgot to write something about data types before introducing data structs!" Then I write a new post titled Python: Data Type. But sadly, since it was created later, it is sorted as the third chapter instead of the second one!

Now how to fix it? Well, you may think that modifying the created time of the second or third post can change the order. But isn't that dirty?

Now the problem gets even more troublesome: I've finished all the chapters. "That's so nice. I'm really great!" Then suddenly I come up with an idea: "Oh shit! I forgot to insert an exercise chapter after each chapter!!!" Now I need to make so many evil modifications to the created times. Finally I got mad and suicided.

If the posts are sorted by post id, however, I can easily modify the post id of each post to move it ahead. It saved my life.

or the file created date/time

But *nix file systems are NOT able to record the file creation time!

@karfau
karfau commented Aug 5, 2013

Ok, I totally get your point about the ordering.
But re-reading the whole discussion from the beginning, it starts with how RSS readers should know whether something is a new article or just an update of an existing one.
Solving this issue with a postid is one valid possible solution (if you don't change those ids after the first release).

In my view then there is another issue mixed in here: which is the one about ordering (which should be discussed in another ticket maybe?):
For posts the most natural way to sort things would be to use the date/time persisted in some way.
I see that in the specific use case you describe this makes no sense.

I had this problem multiple times, especially for sites that needed a(t least one) custom, changeable order.
I solved it by using a list of the sites somewhere in a config or even in a special page.
All those times I thought about how nice it would be to be able to tell ruhoh to sort a collection of items by a specific attribute.
This would be the most flexible solution, and could fix a lot of issues with ordering.

@plusjade
Member
plusjade commented Aug 5, 2013

@karfau

I had this problem multiple times, specially for sites, which I needed a(t least one) custom, changable order for (...) All those times I thought about, how nice it would be, having the possibility to tell ruhoh to sort a collection of items after a specific attribute.

Custom sort order is supported in v2+ via the base model_view

You can specify the attribute and sort direction on a per collection basis by updating config.yml:

#config.yml

essays :
    sort : ['guid', 'asc'] # Array is required

This will sort essays by the guid attribute in ascending order. Ascending/descending is handled by Ruby's native comparison operator (<=>), so it will handle dates, numbers, and strings (alphabetically).
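For readers unfamiliar with `<=>`, the sketch below shows how an `[attribute, direction]` pair like the one in config.yml could drive a sort. It is an illustration only, not ruhoh's actual implementation; `sort_collection` and the sample items are hypothetical.

```ruby
# Hypothetical helper: sort Hash-like items by one attribute.
# `sort` is an [attribute, direction] pair, e.g. ['guid', 'asc'].
def sort_collection(items, sort)
  attribute, direction = sort
  sorted = items.sort { |a, b| a[attribute] <=> b[attribute] }
  direction == "desc" ? sorted.reverse : sorted
end

essays = [
  { "title" => "Python: Data Struct", "guid" => 15 },
  { "title" => "Python: Basic", "guid" => 10 },
  { "title" => "Python: Data Type", "guid" => 12 }
]

puts sort_collection(essays, ["guid", "asc"]).map { |essay| essay["title"] }
```

`<=>` returns nil when the two values cannot be compared (for example an integer and a string), in which case `sort` raises an ArgumentError, so the attribute should hold one type across the collection.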

Sorry this is not documented =/.
Regarding the primary thread topic, I've been working with @coolaj86 on this issue and I'll reply here after I get up to speed on all feedback in this thread.
