Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lots of unmatched time when using a simple rule that should tag everything #129

Open
nomeata opened this issue Apr 23, 2021 · 24 comments
Open

Comments

@nomeata
Copy link
Owner

nomeata commented Apr 23, 2021

Original report by Konzertheld (Bitbucket: Konzertheld, GitHub: Konzertheld).


Hello there,

I have a simple rule in my configuration: tag Program:$current.program that should tag every entry. When I run statistics: arbtt-stats -c Program --for-each Month -m 0.1 I get a lot of unmatched time though. What could be happening?

I use arbtt-stats version 0.10.4 on Manjaro Linux. Following is the entire categorizing file:

-- -*- mode: haskell; -*-
-- Comments in this file use the Haskell syntax:
-- A "--" comments the rest of the line.
-- A set of {- ... -} comments out a group of lines.

-- This defines some aliases, to make the reports look nicer:
aliases (
	"sun-awt-X11-XFramePeer"  -> "java",
	"sun-awt-X11-XDialogPeer" -> "java",
	"sun-awt-X11-XWindowPeer" -> "java",
	"gramps.py"               -> "gramps",
	"___nforschung"           -> "ahnenforschung",
	"Pidgin"                  -> "pidgin"
	)

-- A rule that probably everybody wants. Being inactive for over a minute
-- causes this sample to be ignored by default.
$idle > 60 ==> tag inactive,

-- A rule that matches on a list of strings
current window $program == ["Navigator","galeon"] ==> tag Web,

current window $program == "sun-awt-X11-XFramePeer" &&
current window $title == "I3P"
  ==> tag Program:I3P,

current window $program == "sun-awt-X11-XDialogPeer" &&
current window $title == " " &&
any window $title == "I3P"
  ==> tag Program:I3P,

-- Simple rule that just tags the current program
tag Program:$current.program,

-- Another simple rule, just tags the current desktop (a.k.a. workspace)
tag Desktop:$desktop,

tag Title:$current.title,

-- I'd like to know what evolution folders I'm working in. But when sending a
-- mail, the window title only contains the (not very helpful) subject. So I do
-- not tag necessarily by the active window title, but the title that contains
-- the folder
current window $program == "evolution" &&
any window ($program == "evolution" && $title =~ /^(.*) \([0-9]+/)
  ==> tag Evo-Folder:$1,

-- A general rule that works well with gvim and gnome-terminal and tells me
-- what project I'm currently working on
current window $title =~ m!(?:~|home/jojo)/projekte/(?:programming/(?:haskell/)?)?([^/)]*)!
  ==> tag Project:$1,
current window $title =~ m!(?:~|home/jojo)/debian!
  ==> tag Project:Debian,

-- This was a frequently looked-at pdf-File
current window $title =~ m!output.pdf! &&
any window ($title =~ /nforschung/)
  ==> tag Project:ahnenforschung,


-- My diploma thesis is in a different directory
current window $title =~ [ m!(?:~|home/jojo)/dokumente/Uni/DA!
                         , m!Diplomarbeit.pdf!
                         , m!LoopSubgroupPaper.pdf! ]
  ==> tag Project:DA,

current window $title =~ m!TDM!
  ==> tag Project:TDM,

( $date >= 2010-08-01 &&
  $date <= 2010-12-01 &&
  ( current window $program == "sun-awt-X11-XFramePeer" &&
      current window $title == "I3P" ||
    current window $program == "sun-awt-X11-XDialogPeer" &&
      current window $title == " " &&
      any window $title == "I3P" ||
    current window $title =~ m!(?:~|home/jojo)/dokumente/Uni/SA! ||
    current window $title =~ m!Isabelle200! ||
    current window $title =~ m!isar-ref.pdf! ||
    current window $title =~ m!document.pdf! ||
    current window $title =~ m!outline.pdf! ||
    current window $title =~ m!Studienarbeit.pdf! )
) ==> tag Project:SA,


-- Out of curiosity: what percentage of my time am I actually coding Haskell?
current window ($program == "gvim" && $title =~ /^[^ ]+\.hs \(/ )
  ==> tag Editing-Haskell,

{-
-- Example of time-related rules. I do not use these myself.

-- To be able to match on the time of day, I introduce tags for that as well.
-- $time evaluates to local time.
$time >=  2:00 && $time <  8:00 ==> tag time-of-day:night,
$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning,
$time >= 12:00 && $time < 14:00 ==> tag time-of-day:lunchtime,
$time >= 14:00 && $time < 18:00 ==> tag time-of-day:afternoon,
$time >= 18:00 && $time < 22:00 ==> tag time-of-day:evening,
$time >= 22:00 || $time <  2:00 ==> tag time-of-day:late-evening,

-- This tag always refers to the last 24h
$sampleage <= 24:00 ==> tag last-day,

-- To categorize by calendar periods (months, weeks, or arbitrary periods),
-- I use $date variable, and some auxiliary functions. All these functions
-- evaluate dates in local time. Set TZ environment variable if you need
-- statistics in a different time zone.

-- You can compare dates:
$date >= 2001-01-01 ==> tag this_century,
-- You have to write them in YYYY-MM-DD format, else they will not be recognized.

-- “format $date” produces a string with the date in ISO 8601 format
-- (YYYY-MM-DD), it may be compared with strings. For example, to match
-- everything on and after a particular date I can use
format $date =~ ".*-03-19"  ==> tag period:on_a_special_day,
-- but note that this is a rather expensive operation and will slow down your
-- data processing considerably.

-- “day of month $date” gives the day of month (1..31),
-- “day of week $date” gives a sequence number of the day of week
-- (1..7, Monday is 1):
(day of month $date == 13) && (day of week $date == 5) ==> tag day:friday_13,

year $date == 2010 ==> tag year:2010,

-- “$now” evaluates to the current time
day of month $now == day of month $date ==> tag current-day,
month $now == month $date ==> tag current-month,
year $now == year $date ==> tag current-year,
-}
-- “month $date” gives a month number (1..12), “year $date” gives a year:
month $date == 1 ==> tag month:Januar,
month $date == 2 ==> tag month:Februar,
month $date == 3 ==> tag month:März,
month $date == 4 ==> tag month:April,
month $date == 5 ==> tag month:Mai,
month $date == 6 ==> tag month:Juni,
month $date == 7 ==> tag month:Juli,
month $date == 8 ==> tag month:August,
month $date == 9 ==> tag month:September,
month $date == 10 ==> tag month:Oktober,
month $date == 11 ==> tag month:November,
month $date == 12 ==> tag month:Dezember,

(day of month $date == 10) && (month $date == 11) ==> tag DoM:7th,
(day of month $date == 14) && (month $date == 11) ==> tag DoM:14th,
(day of month $date == 21) && (month $date == 11) ==> tag DoM:21th,
(day of month $date == 28) && (month $date == 11) ==> tag DoM:28th,

--------------- ADDED 2020 -----------------
--------------------------------------------

current window $title =~ m!de\.wikipedia\.org! ==> tag Service:Wikipedia,

Thanks for reading and hopefully someone has an idea. Or maybe I found some bug, who knows. :slight_smile:

@nomeata
Copy link
Owner Author

nomeata commented Apr 23, 2021

Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).


Odd indeed on first glance. Maybe you can minimize the configuration and check if it also occurs if you have just that rule?

Also, arbtt-stats --dump-samples is very useful, as it shows the raw data next to the tags.

@nomeata
Copy link
Owner Author

nomeata commented Apr 27, 2021

Original comment by Konzertheld (Bitbucket: Konzertheld, GitHub: Konzertheld).


Thanks for your answer. I tried a configuration file that contains nothing but the rule mentioned in the first sentence and used the same command mentioned above again, nothing changed. So, it’s not related to anything else in my config.

Then I learned I should not use --dump-samples without less. 😆

I then ran arbtt-stats --dump-samples --categorizefile=/home/christian/tmp/testcategorizearbtt | grep "^\W+Program:$" -B 8 | grep "(*)". A number of recurring windows appeared (sample):

() : franklin.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() : franklin.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() : franklin.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() : glxgears
() : glxgears
() : glxgears
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() processing-app-Base: 11-Arduino-I2C-Gangschalthebel-live | Arduino 1.8.13
() : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() google-chrome: BigBlueButton - NaGruSe-Christian - https://fitzgerald.hrz.tu-chemnitz.de/ - Google Chrome
() : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() processing-app-Base: 11-Arduino-I2C-Gangschalthebel-live | Arduino 1.8.13
() : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() processing-app-Base: 11-Arduino-I2C-Gangschalthebel-live | Arduino 1.8.13
(*) : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.

I find it interesting that some do have a program associated and some do not. The first lines are from BigBlueButton, a video conference system, when I shared my screen (propably). I had hope that would explain a lot of the time as I worked as a lecturer and spent quite some time in that system. So I updated my test config to match most of the lines from my search. It reduced the amount of unmatched time by amazing 24 seconds, leaving 4 days and almost 10 hours. So, I still have no clue. :slight_smile:

The lines that my grep search turned out are not too many, so I also have doubt those lines explain the problem at all.

Edit: I missed the inactive tag clause in my test config. I added it. It only accounts for 2 hours of those 4+ days. (That way, I also confirmed that filtering out inactive time works.)

@nomeata
Copy link
Owner Author

nomeata commented Apr 27, 2021

Original comment by Konzertheld (Bitbucket: Konzertheld, GitHub: Konzertheld).


Additional info from things I came up with today. An even simpler rule tag bla:asdf that tags everything with something static works. It matches 100% of the entries. current window $program == "" ==> tag lol:lol, matched nothing.

@nomeata
Copy link
Owner Author

nomeata commented Apr 27, 2021

Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).


So the problem is that current window $program is empty for many windows?
And then additionally it is not possible to match on current window $program?
I also don’t really have a clue 😕
Could you maybe do a fresh recording (so that we have only a few samples to look at) and send the the capture.log and categorize.cfg ? Then I can maybe debug this.

@nomeata nomeata added major and removed bug labels Jun 3, 2021
@nomeata
Copy link
Owner Author

nomeata commented Aug 10, 2021

Thanks for the test data! When I look at

arbtt-dump -f capture.log

I see that quite a few samples don't have an active window (the (*) in the first column). But some do. This would explain why current window-based rules fail, when there is no current window.

So the question is why that happens. Maybe there are modal dialogs open that are not recognized by arbtt? Or maybe one of the used applications is somehow confusing arbtt?

@Konzertheld
Copy link

Konzertheld commented Jan 10, 2022

New approach: Are there known issues with Wayland? I saw that there is a wayland compatible port of arbtt-capture. My system uses Wayland. (Asking because today I again have 6h 50min uptime but only 6h 1min captured including inactive times)

@nomeata
Copy link
Owner Author

nomeata commented Jan 11, 2022

Or maybe arbtt-capture's wake up interval isn't precise enough? If arbtt-capture takes 1s, and you set the interval to 10s, if I did the sleep naively, it might explain 10% lost time.

What's your interval time?

@nomeata
Copy link
Owner Author

nomeata commented Jan 11, 2022

(I don't use Wayland yet, and have no idea how arbtt will behave there)

@Konzertheld
Copy link

Hmm, my interval is 3 seconds. I can change it to 10 and see if it makes a difference.

@nomeata
Copy link
Owner Author

nomeata commented Jan 11, 2022

Ah, that might be an explanation. arbtt-capture is a bit naive in it’s interval calculation, and will always sleep the configured 3s between taking captures; not taking the time it takes to take the capture into account. At the default capture interval of one minute this might be ok, but with very short capture times, the error accumulates.

It shouldn’t be too hard to fix that, though.

@Konzertheld
Copy link

Ah well, if a fix is possible, that's even better. I use short intervals because I got quite a few captures of windows I accidentally alt-tabbed to which were then logged as whatever interval I had set instead of the 200 ms I actually focused them. With short intervals, those captures will disappear in irrelevance.

@nomeata
Copy link
Owner Author

nomeata commented Jan 11, 2022

My theory is that even with 60s interval, this will even out (you’ll catch them less often), but yes, depending on your needs and use cases this might be too imprecise. At the cost of a greatly increased log file, of course.

@Konzertheld
Copy link

Hmm, I'm trying to come up with a mathematical explanation but from my first thoughts, I think you're right.

@Konzertheld
Copy link

Update: It seems to decrease with increased intervals. I had 12 minutes leakage on 6 hours today with an interval of 10 seconds.

@nomeata
Copy link
Owner Author

nomeata commented Jan 12, 2022

#145 might improve the situation. Do you want to try it out? Can you build from a branch or need help with that?

@Konzertheld
Copy link

#145 might improve the situation. Do you want to try it out? Can you build from a branch or need help with that?

I can build from a branch. I thought I already did but I built master. Interestingly, it is extra bad today, almost a third of the time is lost. But I will try the branch now for the next few hours.

@Konzertheld
Copy link

Is this command correct? I get a weird output trying to check if the manually compiled version from the branch works at all.

dist/build/arbtt-stats/arbtt-stats -c Project --filter '$sampleage <= 00:05' --also-inactive                                                          ✔  33s   ghc-9.0.1  
Processing data [=======================================================================================================================================================================================================================] 101%
Processing data [=======================================================================================================================================================================================================================] 101%
Statistics for category "Project"
=================================
______________Tag_|_________Time_|_Percentage_
Project:MailsNews | 43d06h48m50s |      68.10
    Project:arbtt | 20d06h33m10s |      31.90
     (total time) | 63d13h22m00s |     100.00

Not only did it ignore the filter, the results are also completely wrong.

@nomeata
Copy link
Owner Author

nomeata commented Jan 13, 2022

This might be an old file, look at the time-stamp. Modern cabal puts its files in dist-newstyle.

@Konzertheld
Copy link

Nope, the directory did not exist before and the timestamp matches what I expected it to be.

@Konzertheld
Copy link

It was probably a bad idea that I just continued to use the old log file with no migration, right?

@Konzertheld
Copy link

Well, this is interesting. I tried a new logfile. It is already 27 MB large after not even five minutes. Processing takes way too long. While below, there are supposedly 4 days recorded, there are now 31 days recorded already. So either I messed up the build completely or something does not work at all.

ps aux | grep arbtt
[...] /AUR/arbtt-git/dist/build/arbtt-capture/arbtt-capture --sample-rate=10 --logfile=...experimental-logfile

dist/build/arbtt-stats/arbtt-stats -c Project --filter '$sampleage <= 00:05' --also-inactive --logfile=...experimental-logfile
Processing data [=======================================================================================================================================================================================================================] 101%
Processing data [=======================================================================================================================================================================================================================] 102%
Processing data [=======================================================================================================================================================================================================================] 103%
Processing data [=======================================================================================================================================================================================================================] 104%
Processing data [=======================================================================================================================================================================================================================] 105%
Statistics for category "Project"
=================================
__________Tag_|_________Time_|_Percentage_
Project:arbtt |  4d23h10m00s |     100.00
 (total time) |  4d23h10m00s |     100.00

Build commands were taken from the AUR package, I just removed those that write to system directories:

runhaskell Setup configure -O --enable-shared --enable-executable-dynamic --disable-library-vanilla \
        --dynlibdir=/usr/lib --libsubdir=\$compiler/site-local/\$pkgid
	
runhaskell Setup build

@nomeata
Copy link
Owner Author

nomeata commented Jan 14, 2022

Hmm, maybe my changes on the branch are just broken. :-/. Will have to careful look at them again.

@nomeata
Copy link
Owner Author

nomeata commented Jan 14, 2022

Did you run multiple arbtt-capture instances at the same time maybe?

@Konzertheld
Copy link

Did you run multiple arbtt-capture instances at the same time maybe?

no :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants