Html metadata (#2825)
* Extract metadata from HTML meta tags like Pelican does (Issue #1923)

* updated changelog

* docs, use html_metadata map

* doc tweak

* doc tweak
ralsina committed Jun 6, 2017
1 parent c9e4caa commit 156277ad0f740595c09a9bf619657f604e42aae5
Showing 3 changed files with 64 additions and 2 deletions.
@@ -1,3 +1,11 @@
New in master


* Extract metadata from HTML meta and title tags like Pelican (Issue #1923)

New in v7.8.7

@@ -418,6 +418,7 @@ other static site generators. The currently supported metadata formats are:
* TOML, between ``+++`` (Hugo)
* reST docinfo (Pelican)
* Markdown metadata extension (Pelican)
* HTML meta tags (Pelican)

You can add arbitrary meta fields in any format.

@@ -529,6 +530,30 @@ the `markdown metadata extension docs <

Note that keys are converted to lowercase automatically.

HTML meta tags

For HTML source files, metadata will be extracted from ``meta`` tags, and the title from the ``title`` tag.
Following Pelican's behaviour, tags can be put in a "tags" meta tag or in a "keywords" meta tag. Example:

.. code:: html

    <title>My super title</title>
    <meta name="tags" content="thats, awesome" />
    <meta name="date" content="2012-07-09 22:28" />
    <meta name="modified" content="2012-07-10 20:14" />
    <meta name="category" content="yeah" />
    <meta name="authors" content="Conan Doyle" />
    <meta name="summary" content="Short version for index and feeds" />
    This is the content of my super blog post.
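
As a rough illustration of the behaviour described above, the following sketch collects the ``title`` text and the ``meta`` name/content pairs from an HTML snippet. It deliberately uses the standard-library `html.parser` as a stand-in so it runs without extra dependencies; the compiler in this commit uses `lxml.html` instead. The names `MetaTagParser` and `extract_html_metadata` are hypothetical and not part of Nikola.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            key = (attrs.get("name") or "").lower()
            if not key:
                return  # e.g. <meta charset="utf-8"> has no name attribute
            if key == "keywords":
                key = "tags"  # "keywords" is treated as an alias for "tags"
            self.metadata[key] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            # Title text may arrive in several chunks, so append.
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_html_metadata(source):
    """Return a dict of metadata found in the given HTML source string."""
    parser = MetaTagParser()
    parser.feed(source)
    parser.close()
    return parser.metadata


SAMPLE = """
<title>My super title</title>
<meta name="keywords" content="thats, awesome" />
<meta name="date" content="2012-07-09 22:28" />
<meta name="summary" content="Short version for index and feeds" />
This is the content of my super blog post.
"""

metadata = extract_html_metadata(SAMPLE)
```

In this sketch the values stay as raw strings, so the tag list is still the comma-separated text from the ``content`` attribute; any splitting or date parsing is assumed to happen later.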

Mapping metadata from other formats

@@ -541,6 +566,7 @@ For Pelican, use:
"rest_docinfo": {"summary": "description", "modified": "updated"},
"markdown_metadata": {"summary": "description", "modified": "updated"},
"html_metadata": {"summary": "description", "modified": "updated"}

For Hugo, use:
@@ -28,12 +28,14 @@

 from __future__ import unicode_literals

-import os
 import io
+import os
+
+import lxml.html

 from nikola import shortcodes as sc
 from nikola.plugin_categories import PageCompiler
-from nikola.utils import makedirs, write_metadata
+from nikola.utils import LocaleBorg, makedirs, map_metadata, write_metadata

class CompileHtml(PageCompiler):
@@ -84,3 +86,29 @@ def create_post(self, path, **kw):

    def read_metadata(self, post, file_metadata_regexp=None, unslugify_titles=False, lang=None):
        """Read the metadata from a post's meta tags, and return a metadata dict."""
        if lang is None:
            lang = LocaleBorg().current_lang
        source_path = post.translated_source_path(lang)

        with io.open(source_path, 'r', encoding='utf-8') as inf:
            data = inf.read()

        metadata = {}
        doc = lxml.html.document_fromstring(data)
        title_tag = doc.find('*//title')
        if title_tag is not None:
            metadata['title'] = title_tag.text
        meta_tags = doc.findall('*//meta')
        for tag in meta_tags:
            # Meta tags without a name attribute (e.g. charset) carry no metadata key.
            k = tag.get('name', '').lower()
            if not k:
                continue
            elif k == 'keywords':
                # Following Pelican, "keywords" is an alias for "tags".
                k = 'tags'
            metadata[k] = tag.get('content', '')
        map_metadata(metadata, 'html_metadata', self.site.config)
        return metadata
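
The last step of `read_metadata` applies the `html_metadata` mapping from the documentation hunk above. A minimal sketch of what such a mapping does to a parsed metadata dict might look like the following. The helper `rename_keys` is hypothetical and is not Nikola's `map_metadata`; in particular, copying the value under the new name while leaving the original key in place is an assumption of this sketch.

```python
def rename_keys(metadata, mapping):
    """Return a copy of metadata with values also exposed under mapped names.

    Hypothetical helper for illustration; not Nikola's map_metadata.
    """
    result = dict(metadata)
    for foreign_key, native_key in mapping.items():
        if foreign_key in result:
            # Assumption: copy the value and keep the original key as well.
            result[native_key] = result[foreign_key]
    return result


# The Pelican-style mapping from the documentation hunk.
html_mapping = {"summary": "description", "modified": "updated"}

parsed = {
    "title": "My super title",
    "summary": "Short version for index and feeds",
    "modified": "2012-07-10 20:14",
}

mapped = rename_keys(parsed, html_mapping)
```

With this mapping, a post written with Pelican's ``summary`` and ``modified`` meta tags ends up with ``description`` and ``updated`` values as well.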
