Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracting date and slug from filename #560

Closed
wants to merge 4 commits into from
Closed

Extracting date and slug from filename #560

wants to merge 4 commits into from

Conversation

yegle
Copy link
Contributor

@yegle yegle commented Oct 25, 2012

This is a quick fix for issue #552

Depending on DEFAULT_DATE="filename" and DEFAULT_SLUG="filename"

@bbinet
Copy link
Contributor

bbinet commented Oct 26, 2012

Thanks @yegle for your work on this. Here are my comments:
I'd like to avoid introducing new settings for this feature, we could do without settings['DEFAULT_DATE'] = "filename" and settings['DEFAULT_SLUG'] = "filename".
Instead, for the date, we could do:

  1. if date is available in metadata: use it from here
  2. else if date is available in the filename: use it from here
  3. else use settings['DEFAULT_DATE']

and for the slug, I now understand the reason why you introduced the DEFAULT_SLUG new setting. Actually there is a problem because currently, if slug is not available in the metadata, slug is generated (slugified) from the title metadata (which is mandatory). So if we choose to parse the slug from the filename, we break compat.
So if @getpelican/contributors agree to introduce a new setting for this feature, I would suggest a boolean setting, something like:
SLUG_FROM_FILENAME which would default to False.

@yegle
Copy link
Contributor Author

yegle commented Oct 26, 2012

@bbinet

The reason of settings['DEFAULT_DATE'] = "filename" is that I was afraid someone may rely on the old behavior and this may introduce backward incompatibility.

It's fine to add SLUG_FROM_FILENAME setting as a boolean. But extracting slug from filename depends on whether date was specified in filename. There maybe some edge cases.

e.g. how should we parse 2012-10-20-some-text-here.md? If I really want the date as part of the slug, I may need to:

  1. Use 2012-10-20-2012-10-20-some-text-here.md as filename
  2. Specify date meta data in file and name the file 2012-10-20-some-text-here.md

Neither way is a good solution from my point of view.

@bbinet
Copy link
Contributor

bbinet commented Oct 26, 2012

On 27 October 2012 00:01, Yuchen Ying notifications@github.com wrote:

@bbinet https://github.com/bbinet

The reason of settings['DEFAULT_DATE'] = "filename" is that I was afraid
someone may rely on the old behavior and this may introduce backward
incompatibility.

I think the risk is very low because that would only affect posts which
filename includes the date, and moreover this date should have been
correctly parsed.

It's fine to add SLUG_FROM_FILENAME setting as a boolean. But extracting
slug from filename depends on whether date was specified in filename. There
maybe some edge cases.

I think we could extract the date/slug from the filename using a regex:
this should flexible enough to handle the case where date is not specified
in the filename.
But, before working on a new patch, please wait for @ametaireau or other
@getpelican/contributors opinion on introducing this new
SLUG_FROM_FILENAME setting

e.g. how should we parse 2012-10-20-some-text-here.md? If I really want
the date as part of the slug, I may need to:

  1. Use 2012-10-20-2012-10-20-some-text-here.md as filename
  2. Specify date meta data in file and name the file
    2012-10-20-some-text-here.md

Neither way is a good solution from my point of view.

Why would you want the date to be part of the slug?
If it is because you want the date to part of the article url, then don't
bother with this, urls are configurable, see:
http://docs.getpelican.com/en/3.0/settings.html#url-settings


Reply to this email directly or view it on GitHubhttps://github.com//pull/560#issuecomment-9828464.

@almet
Copy link
Member

almet commented Oct 28, 2012

Different things:

  1. Reading metadata from the filename seems interesting and useful, thanks for bringing this on the table
  2. We should probably do this work in the readers, in a specialized function
  3. I'm wondering how this will impact the parsing of the files, since each time we will encounter a file, we will need to check if it's a valid date (fallbacking on different dates formats, even) and setting the metadata after.
  4. Who wins between the metadata in the article and the ones in the filename in case both are specified?
  5. +1 for the SLUG_FROM_FILENAME feature, but this is a different issue IMO.

@bbinet
Copy link
Contributor

bbinet commented Oct 28, 2012

On 28 October 2012 23:59, Alexis Metaireau notifications@github.com wrote:

Different things:

  1. Reading metadata from the filename seems interesting and useful,
    thanks for bringing this on the table
  2. We should probably do this work in the readers, in a specialized
    function

yes, +1

  1. I'm wondering how this will impact the parsing of the files, since
    each time we will encounter a file, we will need to check if it's a valid
    date (fallbacking on different dates formats, even) and setting the
    metadata after.

Do you really want to fallback on different date formats? I thought that
we could enforce a date format to follow. What do you mean by "how this
will impact the parsing of the files"?

  1. Who wins between the metadata in the article and the ones in the
    filename in case both are specified?

I would say: the metadata wins against the filename. (going this way,
backward compat is preserved)

  1. +1 for the SLUG_FROM_FILENAME feature, but this is a different
    issue IMO.


    Reply to this email directly or view it on GitHubhttps://github.com/Extracting date and slug from filename #560#issuecomment-9852527.

@almet
Copy link
Member

almet commented Oct 28, 2012

I'm thinking about the performance of regexps, and how this addition would make the generation slower

@bbinet
Copy link
Contributor

bbinet commented Oct 29, 2012

I don't think parsing filenames (would rarely be more than 50 chars long) with a regexp will introduce a noticeable performance penalty.

@almet
Copy link
Member

almet commented Oct 30, 2012

Well, it depends the number of files we proceed, it will slow down things, but I don't know by how much. I'm concerned by this since the performance of pelican when generating dropped a bit with 3.0, so I would like to take care of this, hence my concerns, but that doesn't mean it's a no-go.

Let's do this :-)

@jawher
Copy link
Contributor

jawher commented Oct 31, 2012

I'm interested in this feature too, as I name my post files <data>-<slug>.md, and end up duplicating these metadata in the file itself.

How about using special values in the file to tell Pelican to use the filename ? Something like:

date: <file>
slug: <file>

We could toss in an extra special value to get the date from the file timestamp, e.g.:

date: <fs>
slug: <file>

@almet
Copy link
Member

almet commented Nov 5, 2012

I think that's better to directly tell pelican where to look for in the settings. having date: sounds weird to me, isn't it?

timestamp fro the FS will be removed as it's not a good value to rely on (it's changed if we move the file for instance).

@bbinet
Copy link
Contributor

bbinet commented Nov 6, 2012

On 5 November 2012 21:02, Alexis Metaireau notifications@github.com wrote:

I think that's better to directly tell pelican where to look for in the
settings. having date: sounds weird to me, isn't it?

If possible, I'd rather not to introduce a new setting for that: I think it
would make sense to decide on who wins between the metadata in the article
and the ones in the filename, because the case where both are specified
seems unusual to me.
As I told before, my preference would be: the metadata wins against the
filename. (going this way, backward compat is preserved)

timestamp fro the FS will be removed as it's not a good value to rely on
(it's changed if we move the file for instance).


Reply to this email directly or view it on GitHubhttps://github.com//pull/560#issuecomment-10085467.

@davidjb
Copy link
Contributor

davidjb commented Nov 7, 2012

Having the slug available from the filename, aside from anything else, would be a great help for those importing their blog to keep the same URLs with ease (eg the importer creates file names based upon their Wordpress post_name).

@bbinet
Copy link
Contributor

bbinet commented Nov 13, 2012

@yegle are you still working on this one?

@yegle
Copy link
Contributor Author

yegle commented Nov 13, 2012

@bbinet
Sorry I'm a little confused.

So our conclusion is:

  1. If metadata provides date and slug, use metadata information
  2. If don't, the date should be extracted from filename then file modification time, the slug should be the filename without date part.

If it's correct, I can update my pull request today.

@bbinet
Copy link
Contributor

bbinet commented Nov 13, 2012

On 13 November 2012 22:59, Yuchen Ying notifications@github.com wrote:

@bbinet https://github.com/bbinet
Sorry I'm a little confused.

So our conclusion is:

  1. If metadata provides date and slug, use metadata information
  2. If don't, the date should be extracted from filename then file
    modification time, the slug should be the filename without date part.

Yes it is my position.
Note that extracting date from file mtime needs to be removed (but can be
addressed in a separate pull request).
The work of extracting date and slug from filename can be done in a
dedicated function in readers.py.

@ametaireau do you agree on this before @yegle updates this pr?

If it's correct, I can update my pull request today.


Reply to this email directly or view it on GitHubhttps://github.com//pull/560#issuecomment-10345719.

@almet
Copy link
Member

almet commented Nov 16, 2012

Yep, that's perfectly fine for me, go ahead!

@yegle yegle closed this Nov 24, 2012
@yegle yegle reopened this Nov 24, 2012
@yegle yegle closed this Nov 24, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants