Add tag weights #19

Merged
merged 28 commits into from Feb 11, 2022
28 commits
3b3e9ee
Add tag weights
Mar 19, 2020
9d9b66e
Use existing pad instead of creating a new one.
Mar 22, 2020
c4204f6
Compute tag counts only if necessary
Mar 22, 2020
1e9d758
typo
Mar 29, 2020
53508fe
Replace arguments `a` and `b` by `lower` and `upper`
Mar 29, 2020
10b9213
Replace `collections.defaultdict(int)` by `collections.Counter()`.
Mar 29, 2020
1f3b141
Replace the `tagweight` dictionary with a `TagWeight` class
Mar 29, 2020
16e8c5b
Fix tag count
Mar 29, 2020
4b13f0d
Removed some magic: explict is better than implicit
Mar 29, 2020
586029e
Add AUTHORS and LICENSE files
Mar 30, 2020
464f52c
Remove my name from lektor_tags.py (my name has been added to AUTHORS…
Mar 30, 2020
2a59bce
[tag weights] Even with `lektor server`, tag weights are re-calculate…
Apr 2, 2020
4f49660
Add comment
Apr 3, 2020
0c3407d
Merge branch 'master' into weights
Feb 9, 2022
95e0f25
The primary data structure is a mapping of tag names to TagWeight() o…
Feb 9, 2022
e7d43cc
Update authors
Feb 9, 2022
388268c
Update documentation to match 95e0f251914fdcd000bfa67c6338d34e537868fc
Feb 10, 2022
63aa420
Apply black and reorder-python-imports
Feb 10, 2022
8637745
Proposed tests for tagweights PR
dairiki Feb 10, 2022
8d010e9
Turn TagWeight into a dataclass
Feb 11, 2022
d2b6899
TagWeight ordering: Use functools.total_ordering; raise NotImplemente…
Feb 11, 2022
6be1c9d
Return {} if there is not tags
Feb 11, 2022
91f5da1
Cosmetic change to the way TagWeight.linear() and TagWeight.log() are…
Feb 11, 2022
e1cfa91
Set ``tagweight`` jinja environment variable only once.
Feb 11, 2022
cf9753a
TagWeight: Do not fail with pages without any tags (or with a tag fie…
Feb 11, 2022
b98be0f
TagWeight tests: Use a group of three elements for loggroup() test
Feb 11, 2022
d55baa8
Blacken tests/test_tagweights.py
dairiki Feb 11, 2022
e2f8f4d
Fix TagWeight rich comparison methods
dairiki Feb 11, 2022
7 changes: 7 additions & 0 deletions AUTHORS
@@ -0,0 +1,7 @@
Original author: A. Jesse Jiryu Davis

Contributors:

- Joseph Nix (release, bug fixes)
- Jakob Schnitzer (release, bug fixes)
- Louis Paternault (tag weights)
8 changes: 8 additions & 0 deletions LICENSE
@@ -0,0 +1,8 @@
Copyright 2016 A. Jesse Jiryu Davis
Copyright 2018 Joseph Nix

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
86 changes: 86 additions & 0 deletions README.md
@@ -233,3 +233,89 @@ tags = ["tag1", "tag2"]
See [the Lektor documentation for queries](https://www.getlektor.com/docs/api/db/query/).

Tags are always deduplicated. Tags are sorted in the order listed in the contents.lr / admin, allowing you to control their order manually. Since `{{ tags }}` simply returns a list, you can always apply any Jinja2 filter on that list such as sort, slice, or rejectattr.

## Tag cloud & tag weights

This plugin won't automatically build a tag cloud, but it provides the tools to build one.

The Jinja2 context has a `tagweights()` function, which returns a dictionary mapping each tag to a weight object. That object exposes the weight through the attribute and methods described below, with examples of how they can be used in a template.

Unused tags are ignored.

### TL;DR Which weight function should I use?

- To get the number of pages tagged with each tag, use `count`.
- To map tags to numbers (font sizes, for instance), use `log(lower, upper)`.
- To map tags to anything else (CSS classes, for instance), use `loggroup(list)`.

### `count` — Number of pages tagged with this tag

This is the basic weight, used as the base for the weights described below.

#### Example: Tags (with tag count) sorted by tag count (most used first)

```jinja
<ul>
{% for tag, weight in (tagweights() | dictsort(by='value', reverse=true)) %}
<li>{{ tag }} ({{ weight.count }} articles).</li>
{% endfor %}
</ul>
```
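
For readers who want to see where `count` comes from, here is a minimal Python sketch of the counting step, assuming each page stores its tags as a list. The `pages` dictionary and the `count_tags` helper are hypothetical names used for illustration only; the plugin itself iterates over Lektor pages rather than a plain dictionary.

```python
from collections import Counter

# Hypothetical pages and their tags (illustrative data, not from the plugin).
pages = {
    "post-a": ["python", "lektor"],
    "post-b": ["python"],
    "post-c": [],  # a page without tags contributes nothing
}


def count_tags(pages):
    """Map each tag to the number of pages that carry it."""
    counts = Counter()
    for tags in pages.values():
        counts.update(tags)
    return counts


counts = count_tags(pages)
# The smallest and largest counts are the reference points for every other weight.
least, most = min(counts.values()), max(counts.values())
```

Every other weight is derived from three numbers per tag: its own count, plus the smallest and largest counts over all tags.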

### `linear` — Tags are mapped with a number between `lower` and `upper`.

The least used tag is mapped to `lower` and the most used tag to `upper`. The two bounds may be equal, and `upper` may be smaller than `lower`.

Mapping is done using a linear function.

The result is a float: you might want to convert it to an integer before using it (see the example for `log`).

Unless you know what you are doing, you should use `log` instead.
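
As an illustration of the linear mapping, the sketch below restates the interpolation as a standalone Python function. The name `linear_weight` and its flat argument list are assumptions made for this example; in the plugin the same arithmetic lives on the weight object.

```python
def linear_weight(count, mincount, maxcount, lower, upper):
    """Interpolate linearly between `lower` and `upper` (standalone sketch)."""
    if mincount == maxcount:
        # All tags are used equally often: everything maps to `lower`.
        return lower
    return lower + (upper - lower) * (count - mincount) / (maxcount - mincount)


# Three tags used 1, 2 and 3 times, mapped to font sizes between 100 and 200.
sizes = [linear_weight(c, 1, 3, 100, 200) for c in (1, 2, 3)]

# Swapping the bounds reverses the scale: the most used tag gets the smallest value.
reversed_size = linear_weight(3, 1, 3, 200, 100)
```

With these inputs the middle tag lands exactly halfway between the two bounds.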

### `log` — Logarithm of tag counts are mapped with a number between `lower` and `upper`.

The least used tag is mapped to `lower` and the most used tag to `upper`. The two bounds may be equal, and `upper` may be smaller than `lower`.

Mapping is done using a linear function over the logarithm of tag counts.

The result is a float: you might want to convert it to an integer before using it (see example).

#### Example: Most used tag is twice as big as least used tag

```jinja
{% for tag, weight in tagweights()|dictsort %}
<a
href="{{ ('/blog@tag/' ~ tag)|url }}"
style="font-size: {{ weight.log(100, 200)|round|int }}%;"
>
{{ tag }}
</a>
{% endfor %}
```
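
The docstring of `log` in `lektor_tags.py` argues that the base of the logarithm does not matter. The sketch below checks that claim numerically for one tag. The function name `log_weight` and its `log` parameter are hypothetical, introduced here only so that different bases can be swapped in.

```python
import math


def log_weight(count, mincount, maxcount, lower, upper, log=math.log):
    """Interpolate between `lower` and `upper` on a logarithmic scale (sketch)."""
    if mincount == maxcount:
        return lower
    # The ratio of two logarithms is the same in every base.
    return lower + (upper - lower) * log(count / mincount) / log(maxcount / mincount)


# A tag used 12 times, where counts range from 1 to 100, sized between 100% and 200%.
natural = log_weight(12, 1, 100, 100, 200)
base2 = log_weight(12, 1, 100, 100, 200, log=math.log2)
base10 = log_weight(12, 1, 100, 100, 200, log=math.log10)
```

The three results agree up to floating-point rounding, which is why the template can call `log(100, 200)` without choosing a base.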

### `lineargroup` — Map each tag with an item of the list given in argument

The least used tag is mapped to the first item, the most used tag to the last item.

Mapping is done using a linear function.

Unless you know what you are doing, you should use `loggroup` instead.

### `loggroup` — Logarithm of tag counts are mapped with an item of the list given in argument

The least used tag is mapped to the first item, the most used tag to the last item.

Mapping is done using a linear function over the logarithm of tag counts.

#### Example: Tags are given CSS classes `tagcloud-tiny`, `tagcloud-small`, etc.

```jinja
{% for tag, weight in tagweights()|dictsort %}
<a
href="{{ ('/blog@tag/' ~ tag)|url }}"
class="tagcloud-{{ weight.loggroup(["tiny", "small", "normal", "big", "large"]) }}"
>
{{ tag }}
</a>
{% endfor %}
```
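
To see why the logarithmic grouping is usually the better default, here is a small Python comparison of the two grouping strategies on a skewed set of counts. The tag names, the counts and the helper names (`pick_group`, `linear_fraction`, `log_fraction`) are made up for this sketch and are not part of the plugin.

```python
import math

GROUPS = ["tiny", "small", "normal", "big", "large"]


def pick_group(fraction, groups):
    """Pick the item whose index is nearest to a position between 0 and 1."""
    return groups[int(round(fraction * (len(groups) - 1)))]


def linear_fraction(count, mincount, maxcount):
    if mincount == maxcount:
        return 0.0
    return (count - mincount) / (maxcount - mincount)


def log_fraction(count, mincount, maxcount):
    if mincount == maxcount:
        return 0.0
    return math.log(count / mincount) / math.log(maxcount / mincount)


# Hypothetical, heavily skewed tag counts: one tag dominates.
counts = {"python": 100, "lektor": 12, "jinja": 3, "misc": 1}
lo, hi = min(counts.values()), max(counts.values())

linear_classes = {t: pick_group(linear_fraction(c, lo, hi), GROUPS) for t, c in counts.items()}
log_classes = {t: pick_group(log_fraction(c, lo, hi), GROUPS) for t, c in counts.items()}
```

With these counts the linear grouping puts every tag except the dominant one in the first group, while the logarithmic grouping spreads them over several groups.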
114 changes: 114 additions & 0 deletions lektor_tags.py
@@ -1,8 +1,14 @@
# -*- coding: utf-8 -*-
import collections
import contextlib
import posixpath
from dataclasses import dataclass
from functools import total_ordering
from math import log

import pkg_resources
from lektor.build_programs import BuildProgram
from lektor.context import get_ctx
from lektor.environment import Expression
from lektor.environment import FormatExpression
from lektor.pluginsystem import Plugin
@@ -64,6 +70,84 @@ def build_artifact(self, artifact):
        artifact.render_template_into(self.source.template_name, this=self.source)


@total_ordering
@dataclass
class TagWeight:
Review comment (contributor, PR author):
This could be a dataclass, but they do not exist yet in python3.6, which is supported by lektor.

This comment was marked as resolved.

Reply (contributor, PR author):
Done 8d010e9


    count: int
    mincount: int
    maxcount: int

    def __lt__(self, other):
        if isinstance(other, self.__class__):
            return self.count < other.count
        return NotImplemented

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.count == other.count
        return NotImplemented

    def linear(self, lower, upper):
        """Map tag with a number between `lower` and `upper`.

        The least used tag is mapped `lower`, the most used tag is mapped `upper`.
        Mapping is done using a linear function.
        """
        if self.mincount == self.maxcount:
            return lower
        return lower + (upper - lower) * (self.count - self.mincount) / (
            self.maxcount - self.mincount
        )

    def lineargroup(self, groups):
        """Map each tag with an item of list `groups`.

        The least used tag is mapped with the first item, the most used tag is mapped with the last item.
        Mapping is done using a linear function.
        """
        return groups[int(round(self.linear(0, len(groups) - 1)))]

    def log(self, lower, upper):
        """Map each tag with a number between `lower` and `upper`.

        The least used tag is mapped `lower`, the most used tag is mapped `upper`.
        Mapping is done using a linear function over the logarithm of tag counts.

        Theorem: The base of the logarithm used in this function is irrelevant.

        Proof (idea of):
            Let t0 and t1 be the tag counts of the least and most used tag,
            a and b the `lower` and `upper` arguments of this function, and l
            the base of the logarithm used in this function. Let t be the tag
            count of an arbitrary tag.
            To what number is t mapped?

            Let f be the linear function such that f(log(t0)/log(l))=a and
            f(log(t1)/log(l))=b.

            The expression of this function is:
            f(x) = ((b-a)×log(l)×x+a×log(t1)-b×log(t0))/(log(t1)-log(t0)).

            Thus, the arbitrary tag t is mapped to f(log(t)/log(l)), and
            the `log(l)` is crossed out and `l` disappears: the number `l`
            is irrelevant.
        """
        if self.mincount == self.maxcount:
            return lower
        return lower + (upper - lower) * log(self.count / self.mincount) / log(
            self.maxcount / self.mincount
        )

    def loggroup(self, groups):
        """Map each tag with an item of list `groups`.

        The least used tag is mapped with the first item, the most used tag is mapped with the last item.
        Mapping is done using a linear function over the logarithm of tag counts.
        """
        return groups[int(round(self.log(0, len(groups) - 1)))]


class TagsPlugin(Plugin):
    name = u"Tags"
    description = u"Lektor plugin to add tags."
@@ -74,6 +158,7 @@ class TagsPlugin(Plugin):
    def on_setup_env(self, **extra):
        pkg_dir = pkg_resources.resource_filename("lektor_tags", "templates")
        self.env.jinja_env.loader.searchpath.append(pkg_dir)
        self.env.jinja_env.globals["tagweights"] = self.tagweights
        self.env.add_build_program(TagPage, TagPageBuildProgram)

        @self.env.urlresolver
@@ -150,3 +235,32 @@ def get_all_tags(self, parent):

    def ignore_missing(self):
        return bool_from_string(self.get_config().get("ignore_missing"), False)

    def tagcount(self):
        """Map each tag to the number of pages tagged with it."""
        # Count tags, to be aggregated as "tag weights". Note that tags that
        # only appear in non-discoverable pages are ignored.
        tagcount = collections.Counter()
        for page in get_ctx().pad.query(self.get_parent_path()):
            with contextlib.suppress(KeyError, TypeError):
                tagcount.update(page[self.get_tag_field_name()])
        return tagcount

    def tagweights(self):
        """Return the dictionary of tag weights.

        That is:
        - keys are tags (strings);
        - weights are TagWeight objects.

        This function is to be called AFTER the build has started
        (so that ``get_ctx()`` returns something).
        """
        tagcount = self.tagcount()
        if sum(tagcount.values()) == 0:
            return {}

        return {
            tag: TagWeight(count, min(tagcount.values()), max(tagcount.values()))
            for tag, count in tagcount.items()
        }
2 changes: 2 additions & 0 deletions setup.cfg
@@ -21,6 +21,8 @@ include_package_data = True
setup_requires =
    setuptools >= 45
    setuptools_scm >= 6
install_requires =
    dataclasses;python_version<"3.7"

[options.entry_points]
lektor.plugins =
96 changes: 96 additions & 0 deletions tests/test_tagweights.py
@@ -0,0 +1,96 @@
from collections import Counter

import pytest
from lektor.context import Context

from lektor_tags import TagWeight


@pytest.fixture
def tags_plugin(env):
    return env.plugins["tags"]


@pytest.fixture
def lektor_context(pad):
    with Context(pad=pad) as ctx:
        yield ctx


@pytest.mark.usefixtures("lektor_context")
def test_tagcount(tags_plugin):
    assert tags_plugin.tagcount() == Counter({"tag1": 2, "tag2": 1, "tag3": 1})


@pytest.mark.usefixtures("lektor_context")
def test_tagweights(tags_plugin):
    assert tags_plugin.tagweights() == {
        "tag1": TagWeight(2, 1, 2),
        "tag2": TagWeight(1, 1, 2),
        "tag3": TagWeight(1, 1, 2),
    }


@pytest.mark.usefixtures("lektor_context")
def test_tagweights_no_tags(pad, tags_plugin):
    config = tags_plugin.get_config()
    config["tags_field"] = "test_no_tags"
    assert tags_plugin.tagweights() == {}


@pytest.fixture
def tagweight(count, mincount, maxcount):
    return TagWeight(count, mincount, maxcount)


@pytest.mark.parametrize(
    "count, mincount, maxcount, lower, upper, expected",
    [
        (1, 1, 1, 1, 2, 1),
        (1, 1, 3, 1, 2, 1),
        (2, 1, 3, 1, 2, 1.5),
        (3, 1, 3, 1, 2, 2),
    ],
)
def test_TagWeight_linear(tagweight, lower, upper, expected):
    assert tagweight.linear(lower, upper) == expected


@pytest.mark.parametrize(
    "count, mincount, maxcount, groups, expected",
    [
        (1, 1, 4, ("a", "b"), "a"),
        (2, 1, 4, ("a", "b"), "a"),
        (3, 1, 4, ("a", "b"), "b"),
        (4, 1, 4, ("a", "b"), "b"),
    ],
)
def test_TagWeight_lineargroup(tagweight, groups, expected):
    assert tagweight.lineargroup(groups) == expected


@pytest.mark.parametrize(
    "count, mincount, maxcount, lower, upper, expected",
    [
        (1, 1, 1, 1, 3, 1),
        (1, 1, 4, 1, 3, 1),
        (2, 1, 4, 1, 3, 2),
        (4, 1, 4, 1, 3, 3),
    ],
)
def test_TagWeight_log(tagweight, lower, upper, expected):
    assert tagweight.log(lower, upper) == expected


@pytest.mark.parametrize(
    "count, mincount, maxcount, groups, expected",
    [
        (1, 1, 100, ("a", "b", "c"), "a"),
        (3, 1, 100, ("a", "b", "c"), "a"),
        (12, 1, 100, ("a", "b", "c"), "b"),
        (90, 1, 100, ("a", "b", "c"), "c"),
        (100, 1, 100, ("a", "b", "c"), "c"),
    ],
)
def test_TagWeight_loggroup(tagweight, groups, expected):
    assert tagweight.loggroup(groups) == expected