Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Dates #1

sirupsen opened this Issue Feb 21, 2012 · 29 comments


None yet
5 participants

I envy the simplicity of Atea, however, even though it would add to the complexity, wouldn't it be beneficial to see the dates in the CSV of when a task was created and/or finished?


pkamenarsky commented Feb 21, 2012

Yes, I've thought of that too but it would require a bit of additional code mainly for detection of new and closed tasks. I'll give it a try whenever I have the time.

I think it'll definitely be worth it. Improves the "[..] get a clear picture of what you have been doing the last couple of months." part.

@ghost ghost assigned pkamenarsky Feb 21, 2012

Actually, if you're going to implement this, it'd be really nice if you could have an option to write out each time someone starts and finishes working on a task. The nice thing about this is it lets you create a calendar that lets you visualize exactly what you've been spending your time on.


pkamenarsky commented Feb 22, 2012

Yes, I'm not quite sure what would be the best way of doing this though - producing a time entry on start/stop or on create/close?

Start/stop is more valuable as information but it would potentially produce huge files after a couple of months, so idk :/

One option would be to store the times in a directory rather than a single file, perhaps creating a new file per month or per year or just letting the user click a button that says start the next file. I know that each option adds a new layer of complexity, but I really think this is worth it.

Doing some quick math:

Let's say that a person starts and stops a task once every 5 minutes, 24 hours a day for a month, and that each line is 500 bytes. I feel like all of these are fairly generous overestimates. This comes out to ((60 * 24) / 5) * 30 * 500 bytes = 4 megabytes per month as an upper bound. That doesn't seem too bad to me..

Being able to nicely track what you were doing and when you were doing it is a huge step up information-wise from how long you spent on each task (and a visual calendar view gives you a lot of information in a little amount of time).

Also, thanks for creating this in the first place.


pkamenarsky commented Feb 22, 2012

Glad you like it :)

Yes, I think a file per month per .tasks file shouldn't be that bad. In this case it would actually be ok to abandon the current scheme at all, and just append a new entry to the current month file whenever someone starts or stops a task. Currently the csv file would grow indefinitely, since deleted items in the tasks file stay in the csv file.

The only problem I see with that scheme is that it wouldn't be possible to show the total time / task in the GUI anymore; I guess it would be possible to maintain a running counter for every csv entry and then try to find a match for the GUI task in all of the months files, but this is kinda messy.

What do people think about abandoning the total time per task in the GUI (called "Sum" currently)?

I think it's nice to be able to see at a glance how much total time you've spent on something, especially for short-lived tasks that just take an hour or two. I guess what I was imagining wasn't a replacement of the original file, but rather just another file that recorded start/stop times. There wouldn't be any need to synchronize the two files, it'd really just be an "events" file per month that let you do additional processing/visualization.

I might be missing something though -- is there a problem that you see with having the original file that kept sums and then a start/stop file per month?


pkamenarsky commented Feb 22, 2012

I guess there's no problem per se - but the original idea was to have one tasks / csv file per project or module in order to keep the csv file small, since it keeps all created tasks, ever.

However, if we now keep a csv file for months or years (+ a start/stop file), it could grow indefinitely - I don't know if this is really gonna be a problem, since a text file isn't gonna be the best solution for managing thousands of bugs. Then again, who knows - vim's (and I guess Emacs') search function is pretty awesome, so who knows what people will abuse that for :)

Keep in mind that even if removing the total time, the "session" time (the time elapsed since the last start work click) will still be visible.

For total time of a task I could imagine an additional shell script - #21

But I don't know, I'd like if more people would chime in :)

So a row for each interval worked:

project name worked estimate
[Atea] Add dates 5 45
[Atea] Fix #49 45 30
[Atea] Fix #42 0 60
[Atea] Add dates 10 45
[Atea] Fix #49 30 45

Results in duplicates: estimate and project. Furthermore, the identifier of a task is the name which doesn't seem very ideal as: "Laundry" or "Walk the walrus" may appear multiple times.

Why is the .csv file interesting? Because you can inspect your history. A single entry will rarely be interesting by itself.
For instance you might want to find out whether you are usually quicker or slower than your estimate of doing your laundry. Thus you will average "worked" on all rows where name matches /laundry/i and compare it to the average "estimate" for the same rows, and calculate a relative or absolute difference in order to determine how your actual completed time compares to your estimate for this type of case.

Doing this with the csv would require either an SQL query, or clever use of a spreadsheets program. The target group of this application seems to be programmers like yourself who generally prefer SQL. Programmers who love to use their text editors, and enjoy a menu bar icon too. Focusing on this target group I believe is the strength and what would differ this application from the dozens of other similar applications.

I propose the usage of sqlite to store this historical data, with three tables: projects, tasks and sessions. sqlite is fairly simple, it adds to the complexity, however, I believe it's worth it. Additionally you can easily keep sum in the app, by running an SQL sum query on the sessions of the task with the given id. An sqlite database can, like csv, be stored in a Dropbox. It adds a dependency, however, it could optional: No sqlite, no time-tracking, or fall back on the simple csv approach. Most users of this application would know how to install it with homebrew or macports if needed.

A project has many tasks, a task has many sessions. This allows for very interesting analysis via SQL which I believe other users of this application would love. It makes the app more opinionated, however, in my opinion for the better for the target group.

If you do something like proposed #21, you could make some really interesting analysis and charts with this approach that encourages lots of data. E.g. sessions would have "started_at" and "ended_at", and tasks the same (where ended_at is set when it is deleted from tasks.txt)

Perhaps using different CSV files to represent worksheets referencing each other would work too?


pkamenarsky commented Feb 22, 2012

@sirupsen this is a nice fresh point of view - I like it. Instead of having to do some awk / set mumbo-jumbo, just run a structured query and you can get whatever you like.

Some considerations:

  • what would be the latency of opening a sqllite file, running a query on let's say 10k records, and returning a result? Currently Atea is parsing the .tasks file whenever the status item icon is clicked so if this takes more than, idk, 100ms, it's gonna be a problem.
  • what is the dropbox sync behavior of sqllite, i.e. if you just change one row in one table, how much data are you gonna have to transfer (I think dropbox uses rsync)?
  • are there other single file databases that might work, i.e. berkleydb or smth like mongo etc?


$ sqlite3 test.db "select count(*) from tasks;"
$ time sqlite3 test.db "select * from tasks where name LIKE '%32%' AND id > 9800;"
ohai 73208|9831
ohai 32643|9888
ohai 32170|9902
ohai 13432|9917
ohai 32795|9927
ohai 32209|9930
ohai 12321|9933
ohai 15327|9946
ohai 53235|9977
ohai 30322|9982
ohai 53432|9997
ohai 32753|9998

real    0m0.008s
user    0m0.002s
sys 0m0.003s

sqlite is pretty fast! It's much faster and goes much further than most people expect, it also has e.g. free text search, which could come in handy too.


$ cp test.db test_10_001.db
$ sqlite3 test_10_001.db "insert into tasks values('ohai 328423', null);"
$ diff -a test.db test_10_001.db | wc -l
$ cp test.db test_update.db
$ sqlite3 test_update.db "update tasks set name = 'testing' where id = 1234;"
$ diff -a test.db test_update.db | wc -l

The -a flag treats the pseudo binary db as text (doesn't make sense to diff a binary), and only 9 lines (that aren't very long) update on updating, and 12 on insertion. I believe Dropbox/rsync works by diffing existing, so that should be perfect.

10 000 rows is:

$ ls -lh | grep "test.db"
-rw-r--r--   1 sirup  staff   184K Feb 22 18:04 test.db

184k, i.e. 184kB / 10 000 = 0.0184 kB = 18.4 b per row (and some of this is presumably boilerplate for the db, i.e. table columns and so on.), this will ultimately be bigger since a task will have more to it then than just an id and name, however, it gives an idea of the scale.


I have no experience with any of the others.


pkamenarsky commented Feb 22, 2012

Thanks for the analysis - I'm already almost convinced this is the way to go. I'm gonna try and look for a textfile based database if there's any, but if not - sqllite (or whatever else) it is.

Cross referencing CSV sheets would probably work well, however, I see the major benefit of this being SQL and the immense power that lies behind it for further analysis. I'm only mentioning sqlite because it's what I know. I know Zed Shaws use it for configuration for the mongrel2 project. Check his blog post about why.

I'm totally with @sirupsen on this one, tho it will 'ruin' the simplicity and minimalism of the app. I would much rather see that it moved all of the done task to a text file or something of that matter than implement a database structure

@metamorfos It becomes more difficult to do analysis then though. It won't ruin the simplicity of the app itself, it will look the same, the backbone will simply be changed. It will, however, add a layer to a beautifully simple implementation.


pkamenarsky commented Feb 22, 2012

@metamorfos the input .tasks file would still be plain text - what would change is that the output file wouldn't be csv anymore but rather a database. And once you have more than 20-30 tasks in a csv file you'll need some sort of analysis tool anyway - be it sed, awk or a shell script - and in that case you would be much better off with a structured sql query than trying to parse text files and make sense of them.

Sure, it would add some complexity to the implementation (not the user interface), but I'd rather have a more useful app than just simple code.

Well, I'm a fan of simple code.
I look forward to the changes tho!


profer commented Feb 22, 2012

An SQLite3 backend sure would open up all kinds of interesting possibilities but to me personally, ateas striking simplicity is its biggest asset.

There's a lot of beauty in having your tasks and projects open in a buffer in VIM/Emacs/TextMate and just have atea take care of the time tracking. Keeping all the data in an SQLite database with a unique ID for tasks and sessions would also necessitate a GUI for adding/editing tasks and if you want to go in that direction you might be better of with using an app like Billings or On The Job from the start.

For keeping a log of the start/stop times I would instead create a CSV file per day and append an entry each time a task is stopped.

Project, Task, Started, Duration
[Atea], "Dates, Issue #1", 2012-01-17 10:21, 00:42:22
[Atea], "Dates, Issue #1", 2012-02-20 09:56, 02:34:22
[Atea], "Dates, Issue #1", 2012-02-21 22:06, 01:30:22

For easier reporting it might be beneficial to record the start time as a unix timestamp and the duration as an integer in seconds. If I have my work log in separate CSV files for each day I can then simply open them in Numbers or Excel and do some quick charts or stay in the terminal/VIM and do crazy stuff like the following to get the number of hours spent on Atea in February 2012.

>> ls task-times*csv                                     
task-times-2010-02-20.csv task-times-2010-02-21.csv

>> cat task-times-2010-02-21.csv                         
[Atea], Issue #1, 1329943213, 1234
[Atea], Issue #1, 1329941344, 32144
[Atea], Issue #1, 1329912331, 5545

>> grep "[Atea]" ~/task-times-2010-02-*.csv | cut -d ',' -f 4 | paste -sd+ - | bc | xargs printf '%d / 3600.0\n' | bc

Anyway, if you decide to implement an SQLite3 backend there is no need to install it via homebrew. Apple ships it with OS X by default as it's the preferred backend for Core Data.


pkamenarsky commented Feb 22, 2012

@profer Yes, I would never exchange the .tasks file for a database. The idea so far is that Atea reads the text file and displays its items in the GUI, like it does now; it would however store the tracked times in a database, i.e. whenever a task is started / stopped a new time_intervals (or whatever) row is added.

The task ID would just be a hash of project + description, since this is everything identifying an entry in the .tasks file anyway.

The advantage is that by having a real database, you can run all sort of structured queries on it that you wouldn't be able to do with awk, sed etc in the command line with csv files.

From an implementation point of view, storing the intervals in csv files per day / month would mean that every time Atea is activated it'd then have to read / cache all of the csv files just in order to compute the total time of a task, or alternatively maintain a separate csv file with the total times - it gets messy fast. With sqllite the app would just need to run a quick query and be done.


pkamenarsky commented Feb 23, 2012

One thing tough that a database might not be that well suited for is manual correction of elapsed times - I just forgot to activate a task, remembered 15 minutes later and only had to edit the minutes in the csv file in order to adjust the time.

I have to admit this is something that happens pretty often to me, and I loved Fogbugz' feature of being able to edit the timetable when forgetting to start / stop the time.

With a database I guess this would make an extra command line tool necessary.

I'd have initially suggest sqlite myself, but I had assumed you wouldn't want it. But if you're on board with that, I think that's great.

Regarding editing you start/stop times, I think you could have the GUI have something where you can easily edit the past 10 start/stop times, and then maybe add a start/stop time for a task that you did when you didn't have your computer with you. I don't think you need a command line tool for this.


profer commented Feb 23, 2012

@pkamenarsky It depends on what kind of app you want atea to be. For me its a time tracker where the main interface for the tasks and the time record is a texteditor. I can build all kinds of fancy stuff myself to have charting, standard deviations, reporting etc. but the app does not force that complexity on me. A database backend feels to me like opening a can of worms, so bear with me while I ramble along :)

Adding SQLite to the mix might get messy fast - maybe not from an implementation standpoint but certainly from a UI standpoint. Then I'd have a textfile where I keep my projects in, the menubar item to actually track time and a bunch of shell scripts and/or sqlite3 itself for anything I want to do with my time records.
Also, the full benefit of the SQLite DB would only be realized if you move to an at least somewhat normalized datamodel like @sirupsen proposed, with tables for projects, tasks and sessions. But the question then is how to map the tasks file to the DB. Using hashes for the keys would make it even more unwieldly to work with the data. Users who don't want to tinker with SQL would be dependent on the provided shell scripts to do anything.

If you change or add something to the datamodel later you would need to have a system similar to RoR migrations which would make the datamodel more and more opaque over time.

When you're using any other device, where atea isn't installed or not available like a Windows machine at work or an iOS device with Dropbox, you can't do anything with your records at all. Otherwise, if you want to track some time or look something up you can still use any of the Dropbox enabled texteditors available on iOS/Android or Notepad, Excel, LibreOffice etc.

And if you really want to use SQL to mess with the data, no one is stopping you: http://www.sqlite.org/cvstrac/wiki?p=ImportingFiles

So the only real benefit I see with SQLite is that it would be easier to calculate the total time ('Sum') in the GUI. As I was going to suggest to change that into the total time spent the current day I can live with that :)

@profer Agree with Wolfgang, it's all up to what you want this app to be. It's amazing with all these insightful replies. I hope we can take the best of all the ideas.


pkamenarsky commented Feb 23, 2012

Ok this is a tough decision so I'd like to be as pragmatic about it as possible. What about the following structure:

  • total-times.csv - 1:1 mapping of the task file - its only use is to track the total time for the interface. If you remove a task in the .tasks file it is also going to be removed from total-times, meaning that renaming a task to something and then renaming it back would clear the total amount of time.
  • a csv file per month for the start / stop times & estimates
  • an external script for normalization / import into sqllite - task ID would be key & desciption.

I see one big additional benefit with this scheme - all changes to a task's attributes (priority and estimate so far) would be tracked explicitly in the csv files and can then be handled differently by the import script - it can just ignore them or alternatively, build a more sophisticated database schema where those changes are reflected somehow.

Especially changes to estimates may be very useful for statistical analysis - i.e. what is the "estimate quality" of a person before starting a task and after; is the second estimate given more accurate if given after having worked 2h on a 10h task or after 9.5h? And so on.

Having a built in schema allowing for tracking those changes would basically mean either having a flat table mirroring the csv structure or having a relatively complex schema that would needlessly complicate the code and probably necessitate migration scripts in the future.

Also, importing one year worth of csv files into sqllite for further analysis will take how much, 5 - 10 minutes? If even that - I think that's acceptable.

What do you guys think?

I like that approach. I think the external script for sqlite is the right thing to do.

Also, loading 100000 records with 5 columns into sqlite takes something like 1-2 seconds.

@pkamenarsky pkamenarsky reopened this Feb 23, 2012


pkamenarsky commented Feb 23, 2012

Oops, wrong button :)

@arjun810 I imagine normalization would take a little longer, maybe a whole 5 seconds :)

Yea, I like that approach a lot.

That's true -- when I just tested it, it was just loading some randomly generated data. Either way, definitely acceptable :)


profer commented Feb 23, 2012

@pkamenarsky I like your proposed solution, although I found a very minor issue when it comes to my intended usecase. If I manually add/edit the times in the monthly record, the sum in the UI would not be updated to reflect those changes.

I'll wait till you've implemented it and then I'll immediately file a bug report :-P

Other than that I think we've found a good way of keeping track of when (and not just how much) time was spent on a task.


pkamenarsky commented Feb 23, 2012

@profer Propose a better solution before filing future bug reports please :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment