Html metadata (#2825)
* Extract metadata from HTML meta tags like Pelican does (Issue #1923)

* updated changelog

* docs, use html_metadata map

* doc tweak

* doc tweak
New in master


* Extract metadata from HTML meta and title tags like Pelican (Issue #1923)

New in v7.8.7

* TOML, between ``+++`` (Hugo)
* reST docinfo (Pelican)
* Markdown metadata extension (Pelican)
* HTML meta tags (Pelican)

You can add arbitrary meta fields in any format.

Note that keys are converted to lowercase automatically.

HTML meta tags

For HTML source files, metadata will be extracted from ``meta`` tags, and the title from the ``title`` tag.
Following Pelican's behaviour, tags can be put in a "tags" meta tag or in a "keywords" meta tag. Example:

<html>
    <head>
        <title>My super title</title>
        <meta name="tags" content="thats, awesome" />
        <meta name="date" content="2012-07-09 22:28" />
        <meta name="modified" content="2012-07-10 20:14" />
        <meta name="category" content="yeah" />
        <meta name="authors" content="Conan Doyle" />
        <meta name="summary" content="Short version for index and feeds" />
    </head>
    <body>
        This is the content of my super blog post.
    </body>
</html>
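
The extraction rules described above can be sketched in a few lines of Python 3. This is only an illustration and not the code Nikola runs: it uses the standard library's ``html.parser`` instead of lxml so that it has no dependencies, and the names ``MetaTagParser`` and ``extract_metadata`` are made up for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta':
            attrs = dict(attrs)
            key = (attrs.get('name') or '').lower()  # keys are lowercased
            if not key:
                return  # e.g. <meta charset="utf-8"> has no name, skip it
            if key == 'keywords':
                key = 'tags'  # "keywords" is treated as an alias for "tags"
            self.metadata[key] = attrs.get('content') or ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata['title'] = self.metadata.get('title', '') + data


def extract_metadata(html_text):
    """Return a metadata dict built from the title and meta tags (hypothetical helper)."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

Feeding the example document above to ``extract_metadata`` would yield keys such as ``title``, ``tags``, ``date`` and ``summary``, each holding the raw string from the corresponding tag.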

@@ -541,6 +566,7 @@ For Pelican, use:
"rest_docinfo": {"summary": "description", "modified": "updated"},
"markdown_metadata": {"summary": "description", "modified": "updated"},
"html_metadata": {"summary": "description", "modified": "updated"}
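
To show what such a mapping is meant to achieve, here is a minimal sketch. It assumes that each entry maps a foreign key (left) to the Nikola name (right) and that the value is copied to the new key while the original key is kept. ``apply_mapping`` is a hypothetical helper written for this example, not Nikola's ``map_metadata``.

```python
def apply_mapping(metadata, mapping):
    """Copy values stored under foreign keys to their mapped names (hypothetical helper)."""
    for foreign, ours in mapping.items():
        if foreign in metadata:
            metadata[ours] = metadata[foreign]
    return metadata


# The "html_metadata" entry from the Pelican mapping above.
html_mapping = {"summary": "description", "modified": "updated"}

# Metadata as it might come out of an HTML file's meta tags.
extracted = {
    "title": "My super title",
    "summary": "Short version for index and feeds",
    "modified": "2012-07-10 20:14",
}

result = apply_mapping(dict(extracted), html_mapping)
```

After the call, ``result`` carries the summary under ``description`` and the modification date under ``updated``, alongside the untouched ``title``.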

For Hugo, use:
from __future__ import unicode_literals

import io
import os

import lxml.html

from nikola import shortcodes as sc
from nikola.plugin_categories import PageCompiler
from nikola.utils import LocaleBorg, makedirs, map_metadata, write_metadata

class CompileHtml(PageCompiler):
    def read_metadata(self, post, file_metadata_regexp=None, unslugify_titles=False, lang=None):
        """Read the metadata from a post's meta tags, and return a metadata dict."""
        if lang is None:
            lang = LocaleBorg().current_lang
        source_path = post.translated_source_path(lang)

        with io.open(source_path, 'r', encoding='utf-8') as inf:
            data = inf.read()

        metadata = {}
        doc = lxml.html.document_fromstring(data)
        title_tag = doc.find('*//title')
        if title_tag is not None:
            metadata['title'] = title_tag.text
        meta_tags = doc.findall('*//meta')
        for tag in meta_tags:
            # Default to '' so meta tags without a name (e.g. charset) do not raise.
            k = tag.get('name', '').lower()
            if not k:
                continue
            elif k == 'keywords':
                # Pelican accepts "keywords" as an alias for "tags".
                k = 'tags'
            metadata[k] = tag.get('content', '')
        # The third argument was lost in this hunk; the site config is assumed here.
        map_metadata(metadata, 'html_metadata', self.site.config)
        return metadata

