Proposal: option to provide a static url->title mapping for the "Page titles" report #234

Open: wants to merge 2 commits into base: master

mackuba commented Nov 28, 2018

I want to use Matomo to track visits on my site, and I've decided to use only log analytics. I'm aware of the limitations: some things will be missing compared to JS tracking, such as resolution info, outgoing links, plugin support etc., and I'm fine with that.

However, I've realized that on a site that changes relatively rarely, like mine (a blog with a few articles posted per year at most), I can solve one of these things: the "Page titles" report, whose titles are normally read in JS from the <title> tag. The idea is to provide a static list of url -> title mappings in a file passed as a parameter to the import script.

The file looks like this (the first = on each line is treated as the separator and whitespace is trimmed from both sides, but of course we can change the format to e.g. CSV or something else):

https://mackuba.eu/ = mackuba.eu
https://mackuba.eu/2018/09/07/new-stuff-from-wwdc-2018/ = New stuff from WWDC 2018 &ndash; mackuba.eu
https://mackuba.eu/2018/07/10/dark-side-mac-2/ = Dark Side of the Mac: Updating Your App &ndash; mackuba.eu
https://mackuba.eu/2018/06/11/notifications-in-ios-12/ = What's new in notifications in iOS 12 &ndash; mackuba.eu
...
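To make the format concrete, here is a minimal sketch of how such a file could be parsed. It is not the code from this PR; the function name `parse_title_mappings` and the choice to silently skip blank or malformed lines are assumptions for illustration.

```python
def parse_title_mappings(lines):
    """Parse 'url = title' lines into a dict, splitting on the first '='."""
    titles = {}
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            # Assumption: blank lines and lines without a separator are ignored.
            continue
        # Only the first '=' separates URL and title, so titles may contain '='.
        url, title = line.split("=", 1)
        titles[url.strip()] = title.strip()
    return titles


if __name__ == "__main__":
    sample = [
        "https://mackuba.eu/ = mackuba.eu",
        "https://mackuba.eu/2018/07/10/dark-side-mac-2/ = "
        "Dark Side of the Mac: Updating Your App &ndash; mackuba.eu",
    ]
    for url, title in parse_title_mappings(sample).items():
        print(url, "->", title)
```

One consequence of splitting on the first = is that a URL containing a query string (for example `?id=3`) would be cut at the wrong place, which may be a reason to prefer CSV or another format.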

Now, when I call import_logs.py with --page-titles-from=page_titles.txt, whenever the parser sees a hit with a URL such as https://mackuba.eu/2018/07/10/dark-side-mac-2/, it sets the action_name to "Dark Side of the Mac: Updating Your App &ndash; mackuba.eu", and so on. My "Page titles" report looks just like it does with the JS tracker, and I only need to remember to update the file whenever I post a new article (or better, automate it).
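The lookup described above amounts to a dictionary access per hit. The sketch below shows the idea only; `apply_page_title` and the shape of `tracking_args` are hypothetical and do not reflect how import_logs.py is actually structured. It assumes the hit's full URL is available under the `url` key and that the mapping was loaded into a dict beforehand.

```python
def apply_page_title(tracking_args, titles):
    """Set action_name when the hit's URL has an entry in the static mapping.

    tracking_args: dict of parameters for one hit (hypothetical structure).
    titles: dict mapping full URL -> page title.
    """
    title = titles.get(tracking_args.get("url"))
    if title is not None:
        tracking_args["action_name"] = title
    # Hits with no matching URL are left untouched.
    return tracking_args


if __name__ == "__main__":
    titles = {
        "https://mackuba.eu/2018/07/10/dark-side-mac-2/":
            "Dark Side of the Mac: Updating Your App &ndash; mackuba.eu",
    }
    hit = {"url": "https://mackuba.eu/2018/07/10/dark-side-mac-2/"}
    print(apply_page_title(hit, titles)["action_name"])
```

The match here is exact, so URLs that differ only by a trailing slash or a query string would need their own entries or some normalization step.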

I believe quite a lot of sites using log analytics could use something like this. Depending on the size and type of the site, the file can be maintained manually, or built from a database of articles/pages using a script or an action on the server. In my case, I wrote a small script that loads my sitemap.xml file, goes through all the links listed there, fetches the HTML of each one and extracts its <title>.
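A generator script along those lines could look roughly like this. It is a sketch, not the script mentioned above: it assumes a standard sitemap with `<loc>` elements in the sitemaps.org namespace, uses a simple regular expression rather than a full HTML parser, and all function names are made up for illustration. The raw contents of the title tag are kept as-is, so entities such as `&ndash;` are not decoded.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (assumed here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL found in the sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def extract_title(html):
    """Return the raw <title> text with whitespace collapsed, or None."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else None


def fetch(url):
    """Download a page and decode it as UTF-8 (encoding is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def build_mapping_lines(sitemap_xml, fetch_page=fetch):
    """Build 'url = title' lines for every sitemap URL that has a title."""
    lines = []
    for url in sitemap_urls(sitemap_xml):
        title = extract_title(fetch_page(url))
        if title:
            lines.append("%s = %s" % (url, title))
    return lines


if __name__ == "__main__":
    sitemap = fetch("https://mackuba.eu/sitemap.xml")  # sitemap location is assumed
    print("\n".join(build_mapping_lines(sitemap)))
```

Passing the fetch function as a parameter keeps the parsing logic testable without network access; the output can be redirected into page_titles.txt.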

@mackuba mackuba force-pushed the mackuba:page_titles branch from e7085a3 to 8c55c0d Jan 26, 2019
