Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add tag weights #19

Open
wants to merge 12 commits into
base: master
from
Open

[WIP] Add tag weights #19

wants to merge 12 commits into from

Conversation

@paternal
Copy link

paternal commented Mar 21, 2020

Closes #18.

Theorem: The base of the logarithm used in this function is irrelevant.
Proof (ideo of):
Let t0 and t1 be the tag counts of the least and most used tag,
a and b the arguments of this function, and l the base of the
logarithm used in this function. Let t be the tag count of an
arbitrary tag. To what number is t mapped?
Let f be the linear function such that f(log(t0)/log(l))=a and
f(log(t1)/log(l))=b.
The expression of this function is:
f(x) = ((b-a)×log(l)×x+a×log(t0)-b×log(t1))/(log(t1)-log(t0)).
Thus, the arbitrary tag t is mapped to f(log(t)/log(l)), and
the `log(l)` is crossed out and `l` disappears: the number `l`
is irrelevant.
Comment on lines 167 to 183

This comment has been minimized.

Copy link
@paternal

paternal Mar 21, 2020

Author

I'm a math teacher. Sorry, I couldn't help it… 😜 Feel free to remove it if not needed.

@paternal

This comment has been minimized.

Copy link
Author

paternal commented Mar 21, 2020

Almost done. The remaining features to add are (1) don't precalculate the tag cloud unless it's used somewhere and (2) if it's used, don't calculate it more than once per build.
I can do it by caching the tag count somewhere (where?). Then, when needed, if the cache exists, use it; if not calculate it. At the end (or beginning) of each build, remove cache. This would solve both points, but I do not know if it is the "lektor way" to do it? Is there already a mechanism implemented in lektor to cache plugin data?
Done 2a59bce

@runfalk, @goanpeca ?

@paternal paternal mentioned this pull request Mar 21, 2020
@paternal

This comment has been minimized.

Copy link
Author

paternal commented Mar 22, 2020

It works.

  • lektor build: weights are not calculated, unless needed, and the results are cached, and they are calculated only once.
  • lektor serve: same thing, but they are not calculated again if tags are changed. I see three solutions:
    • there exists an event that is fired when lektor serve restarts a build process, and the cache is cleared there (but I could not see such an event);
    • we consider that lektor serve is only used for development purpose, and we explain this limitation in the documentation;
    • when lektor serve is used instead of lektor build, tags are never cached (and re-calculated for each new page where they are used).

What do you think?

Done : 2a59bce

Copy link

runfalk left a comment

Maybe the weight prefixed functions should be put in their own class TagWeights and then the Jinja global will just be an instance of that class. You could then expose the tag to count dictionary as a @property.

I have not yet looked at the documentation.

@@ -148,3 +166,103 @@ def get_all_tags(self, parent):

def ignore_missing(self):
return bool_from_string(self.get_config().get("ignore_missing"), False)

def _get_tagcount(self):

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 23, 2020

I have not verified this, but will self._tagcount get cleared between rebuilds when Lektor runs in server mode (lektor server)?

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 23, 2020

Couldn't this be renamed to _weight_count?

This comment has been minimized.

Copy link
@paternal

paternal Mar 29, 2020

Author

Couldn't this be renamed to _weight_count?

I have renamed it to count() (in the TagWeight class).

I have not verified this, but will self._tagcount get cleared between rebuilds when Lektor runs in server mode (lektor server)?

No. See #19 (comment). I will have another look at it, but later: my daughter is nagging at me right now… Yes: 2a59bce

def _weight_count(self):
"""Map each tag to the number of pages tagged with it."""
return self._get_tagcount()
Comment on lines 195 to 197

This comment has been minimized.

Copy link
@runfalk

This comment has been minimized.

Copy link
@paternal

paternal Mar 29, 2020

Author

Removed in 1f3b141


# Count tags, to be aggregated as "tag weights". Note that tags that
# only appear in non-discoverable pages are ignored.
self._tagcount = collections.defaultdict(int)

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 23, 2020

You can use collections.Counter here and simplify the nested for loop to self._tagcount.update(page[self.get_tag_field_name()]).

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 23, 2020

You can probably also use the most_common method to replace min/max calculation as well.

This comment has been minimized.

Copy link
@paternal

paternal Mar 29, 2020

Author

Done in 10b9213. I had never used the collections.Counter class; nice…

def _weight_linear(self, a, b):
"""Map each tag with a number between `a` and `b`.
The less used tag is mapped `a`, the most used tag is mapped `b`.

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 23, 2020

less -> least?

This comment has been minimized.

Copy link
@paternal
weights = self._weight_linear(0, len(groups) - 1)
return {tag: groups[int(round(weights[tag]))] for tag in self._get_tagcount()}

def _weight_log(self, a, b):

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 23, 2020

I know it's a bit more verbose (and less mathematical 😅) but I think lower/upper_bound or just lower/upper are simpler to understand when looking at the signature.

If you want to you can still assign a = lower_bound when you use them.

This comment has been minimized.

Copy link
@paternal
Copy link
Author

paternal left a comment

Maybe the weight prefixed functions should be put in their own class TagWeights and then the Jinja global will just be an instance of that class. You could then expose the tag to count dictionary as a @property.

Done : 1f3b141

@@ -1,6 +1,11 @@
# -*- coding: utf-8 -*-

# Tag weights contributed by Louis Paternault <spalax+python(at)gresille(dot)org>.

This comment has been minimized.

Copy link
@paternal

paternal Mar 29, 2020

Author

Where is the proper place to write my name? There is no AUTHORS or COPYRIGHT file, the original author does not appear in this file…

This comment has been minimized.

Copy link
@runfalk

runfalk Mar 29, 2020

I don't think it's that common in Python to such things. I never do such things unless I fork the project, and then I'll add a line to the LICENSE file. It seems like such a file is missing so @nixjdm probably has to add one. It seems the intended license is MIT (https://github.com/nixjdm/lektor-tags/blob/master/setup.py#L24)

This comment has been minimized.

Copy link
@paternal

paternal Mar 30, 2020

Author

I removed my name here, and created files AUTHORS and LICENSE files (based on the commit history).

# Add the `tagweights` dictionary to the jinja environment
self.env.jinja_env.globals["tagweights"] = TagWeights(self)
Comment on lines +190 to +191

This comment has been minimized.

Copy link
@paternal

paternal Apr 2, 2020

Author

This should be useless, since the very same line is present in the on_before_build_all() method, but on_before_build_all() is not always called before the very first build with lektor server. Should I report a bug on lektor?

@paternal

This comment has been minimized.

Copy link
Author

paternal commented Apr 2, 2020

Everything is OK for me.
@runfalk: Waiting for your review/merge. 😃

@runfalk

This comment has been minimized.

Copy link

runfalk commented Apr 2, 2020

@nixjdm would have to do that since this is his private project. I and other Lektor maintainers don't have access to this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

2 participants
You can’t perform that action at this time.