Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Core #668

Merged
merged 10 commits into from
Mar 13, 2022
Merged

New Core #668

merged 10 commits into from
Mar 13, 2022

Conversation

diegogangl
Copy link
Contributor

@diegogangl diegogangl commented May 8, 2021

small_CC_emerghed_121210_wg

This is it! The new core, at least the first part of it.
Note that this isn't plugged into GTG at all. You can only interact with it through unit tests and the developer console.

This is heavily WIP and subject to a lot of change. I also push --force a lot. Also this doesn't include any UI changes that will be necessary to actually plug this into gtg.

Inviting @Neui @jaesivsm @leio @zeddo123 to chime in on this. No pressure, it's not a blocker for 0.6. Let me know what you think of the API, internals, etc. ☕

New stuff

  • Clearer separation of concerns. Where do you find everything related to tags? In the tags module. Tasks? The Tasks module. The basic types (Task, Tag, Saved Search) are relativley dumb dataclasses that don't know anything outside of them and their children (exception for tasks and their parents). The store classes take care of adding, removing, parenting, loading, saving, filtering and sorting. The datastore, like the rug in the big lebowsky, pulls everything together. All the code is similar and around the same place.
  • Better performance. It can load the entire bryce data set in roughly 1 sec. Filtering and sorting is also pretty fast. Saving the bryce set takes 75ms.
  • Made for unit testing and debugging. The store classes barely touch each other and the basic data types don't know anything outside of themselves. This makes it easier to write unit tests, since everything is separated. On top of that the classes include useful string reprs and print functions to inspect them with the developer console.
  • Saved searches are now a type of data and not tags with a special parent.
  • You can pop a developer console at any point and create a new datastore to play with. This will be possible even after the new core is plugged into the UI. It should make debugging weird task issues a lot easier.
  • Adds 43 new unit tests
  • Mostly Gtk4-ready. We can (probably 🤞 ) turn the dataclasses into GObject subclasses and the stores into nested list models, and get that sweet sweet performance. (Maybe we can do the former now and just be done with it?)
  • Supports filtering by multiple tags (picking only tasks that have the requested tags)
  • Includes a custom filter function that can be used by plugins to create their own views
  • Sorting is quite flexible and plugins can also create their own sorting
  • Lots of opportunities and place to add caches and improve performance (mostly in Taskstore)
  • Can also filter by parent tasks only or children only (probably only useful for debugging tho)
  • Adds GTK signals when adding, removing and re-parenting
  • Adds task counts per-tab (open, actionable, closed)
  • Adds a new function to the datastore to generate any number of tasks randomly for performance testing, etc
  • Can load a 13.5 MB XML file in roughly 5 seconds (10,000 tasks + 5,200 tags + 706 saved searches)

Not done yet

  • Search. The search module is quite well done and separated. It can be adapted very easily to the new core when the time comes.
  • The repeating task code. This is still being debated
  • Missing unit tests for all the sorting and filtering
  • Probably missing more unit tests for bad behaviours (suggestions welcome!)
  • Haven't tested the backends at all. I think the loading system might be unit-testeable now.
  • I think task and tag purging should be here too (in the appropiate store classes)
  • Signals. I haven't added any Gtk signals yet, as I need to research them more thorougly. This probably means all the store classes will have to subclass GObject.
  • Lots of bugs lurking in the darkness :)
  • Not exactly related, but I want to add a function or module that generates a large dataset of tasks, tags and saved searches with random words and properties. Something more real and dynamic than 'mmmm mmmm mmmmmmm' :p
  • More documentation, examples, "spit and polish" stuff, etc`

Testing

You can run tests with the usual ./run-tests. Or enable the Developer Console plugin and press F12 to open it.

From there import whatever modules and classes you want to play with. Almost all of them work independently. You need to give TaskStore.from_xml() a TagStore.

Examples:

from GTG.core.datastore2 import Datastore2
ds = Datastore2()

ds.print_info()  # (will say it's empty)

# Replace this with your path to GTG
ds.load_file('/home/you/gtg/tmp/default/xdg/data/gtg/gtg_data.xml')

ds.print_info()
# Example from the bryce set
#
# Datastore [Initialized]
# - Tags: 78
# - Saved Searches: 0
# - Tasks: 1771

task = ds.tasks.new('My new task')
task.id  # Will print the UUID

task.status # Will print  "open"
task.toggle_status()  

task.status # Will print  "closed"
task.toggle_status()  

tag = ds.tags.new('my cool tag')
tag.icon = ''

task.add_tag(tag)

# Goodbye task!
ds.tasks.remove(task.id)

# Each store class also has these two functions to print the list,
# as well as a tree of their items
ds.tasks.print_tree()
ds.tags.print_list()

# Get a list of dismissed tasks
from GTG.core.tasks2 import Filter, Status
dismissed = ds.tasks.filter(Filter.STATUS, Status.DISMISSED)

# Sort them by modified date
ds.tasks.sort(dismissed, 'date_modified')

Here's a one-liner import for the datastore and the xml to make life easier:

from GTG.core.datastore2 import Datastore2;ds = Datastore2();ds.load_file('/home/you/gtg/tmp/default/xdg/data/gtg/gtg_data.xml')

Here's a one-liner to init a datastore and fill it with random data (500 tasks):

from GTG.core.datastore2 import Datastore2;ds = Datastore2(); ds.fill_with_samples(500);

@diegogangl diegogangl added enhancement Pull Request performance Issues affecting the speed and responsiveness of the application labels May 8, 2021
Copy link
Contributor

@Neui Neui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is late here, not so throughout basic look at the diff, mostly about the implementation itself.
I like that properties are now being used.

  • Not exactly related, but I want to add a function or module that generates a large dataset of tasks, tags and saved searches with random words and properties. Something more real and dynamic than 'mmmm mmmm mmmmmmm' :p

There is a stress test script in the scripts folder, but it uses an old currently removed DBus interface.

Ok too late here, good night.

GTG/core/tasks2.py Show resolved Hide resolved
GTG/core/tasks2.py Outdated Show resolved Hide resolved
GTG/core/tasks2.py Outdated Show resolved Hide resolved
GTG/core/tasks2.py Show resolved Hide resolved
tests/core/test_task2.py Outdated Show resolved Hide resolved

def save_file(self, path: str) -> None:

temp_file = path + '__'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tempfile.mkstemp might be better, just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not actually creating the file there, just figuring out the name. Later we rename the main file into the temp name before saving.

encoding='UTF-8')

except (IOError, FileNotFoundError):
log.error('Could not write XML file at %r', path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about passing errors. Like imagine if you exit GTG and for some reason you can't write the data. We could then display a dialog box telling the user that and the user could decide not to exit GTG to maybe export the precious tasks somewhere else.

Comment on lines 248 to 342
except PermissionError:
log.debug('Not allowed to open: %r. Trying next.', filepath)
continue

except etree.XMLSyntaxError as error:
log.debug('Syntax error in %r. %r. Trying next.',
filepath, error)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe save that as well and maybe report to the user? Like another dialog box saying that it found files but couldn't open them (and thus loaded the defaulted one)

GTG/core/saved_searches.py Outdated Show resolved Hide resolved
GTG/core/tasks2.py Outdated Show resolved Hide resolved
@diegogangl
Copy link
Contributor Author

@Neui thanks for the quick review! I'll hopefully get to these by the weekend.

There is a stress test script in the scripts folder, but it uses an old currently removed DBus interface.
From what I could see the stress test script just generates tasks filled with UUID4s in the content. It's too far from real world usage to be really useful (not having tags, links, subtasks, long stretches of text, etc).

What I'm thinking about is a module that fills up the datastore and we can use from the dev console, then we can write that to a xml file to profile and test.

About errors, I kept swinging between letting the exceptions happen and get caught at a higher level or trying to manage the errors somehow. Maybe this can tie with your work on the error handler?

@Neui
Copy link
Contributor

Neui commented May 10, 2021

About errors, I kept swinging between letting the exceptions happen and get caught at a higher level or trying to manage the errors somehow.

I think letting it throw exception is better (unless it does something like trying all files while trying to open the database).

Maybe this can tie with your work on the error handler?

The error handler more for generic, unexpected errors and is a bit technical for normal end-users.
I'd guess some parts could be re-used (like hide the stack trace behind the additional options thing), but I'd rather make new dialog boxes for that kind of stuff.

Copy link
Contributor

@jaesivsm jaesivsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only commented on what I seem to understand the best which are the tasks stuff.

All in all it seems so much cleaner 👍 :)

GTG/core/tasks2.py Outdated Show resolved Hide resolved
GTG/core/tasks2.py Outdated Show resolved Hide resolved
GTG/core/tasks2.py Show resolved Hide resolved

return task


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both method below from_xml and to_xml seems misplaced, don't they ? I may be mistaking but I see them more belonging in an xml related module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that each store class is responsible for its xml representation. So taskstore reads/writes <tasklist> tags, tagstore does taglist and so on. Then the Datastore pulls them all together to read/write the entire file.

@diegogangl
Copy link
Contributor Author

Quick update: I did most of the suggested changes. I've also converted the dataclasses and store objects to GObjects and I've started adding signals. Will squash the Ws over the week.

Does anyone know if there's a performance hit for connecting lots of signals? For instance if we were to add signals to task objects and connect them in the taskstore. Would it be slow if there were +1000 task objects?

@diegogangl
Copy link
Contributor Author

Hey @zeddo123 o/, I've completed almost everything in this branch. I think it's a good time to talk about how to implement recurring tasks, do you have any ideas? I remember we talked about a python library to parse dates on some other ticket.

I've added an overview doc here:
https://github.com/getting-things-gnome/gtg/blob/c22c1e08fa498322d712c5a5ca235355cfbbc034/docs/contributors/core%20architecture.md

And the new Task class and TaskStore are here:
https://github.com/getting-things-gnome/gtg/blob/c22c1e08fa498322d712c5a5ca235355cfbbc034/GTG/core/tasks2.py

@Neui
Copy link
Contributor

Neui commented Sep 4, 2021

I remember we talked about a python library to parse dates on some other ticket.

According to #645 (comment) it was dateutil.

@zeddo123
Copy link
Member

zeddo123 commented Sep 4, 2021

Great, I'll try the new core and write up a small doc of how we could do repeating tasks. A while ago, I've made some experiments with dateutil. I found myself writing less code for repeating tasks overall.

To get the next occurrence, I only had to do:

rruleset.after(today, True) # to get the date
rruleset.exdate(self.task.due_date) # to exclude the previous date 

@zeddo123
Copy link
Member

zeddo123 commented Sep 5, 2021

I thought we could first agree on the related fields that repeating tasks should have. Here's what I came up with:

Fields related to repeating tasks

All of these fields should cover the issues we've had with repeating tasks.

  • enabled (True or False)
  • repeats_on (start_date, and/or due_date)
  • next_tid -- hold the id of the tasks that was duplicated after the current task (None if task is still active, uuid otherwise)
  • rules -- to support multiple repeating terms
    • rule (rrule object from the dateutil lib)
  • timestamp -- updated each time the repeating information is modified.
  • count

XML example

<repeating enabled="true">
  <repeats_on>ON_DUE</repeats_on>
  <next_tid>234234-234234-234234-234-234</next_tid>
  <timestamp>23-04-2020</timestamp>
  <count>3</count>
  <rules>
    <rule>
      <freq>TUE</freq>
      <start>05-02-2020</start>
    </rule>
    <rule>
      <freq>WEEKLY</freq>
      <start>05-02-2020</start>
    </rule>
  </rules>
</repeating>

We could also make a dedicated Repeating class that holds these fields separate from the Task class but I haven't thought of a way to make them loosely coupled (if that's even possible).

@jaesivsm
Copy link
Contributor

jaesivsm commented Sep 5, 2021

XML example

If I may, rrule casting to string might be simplified to great extent. Considering the following snippet :

>>> from dateutil import rrule
>>> rrule.rrulestr('RRULE:FREQ=MONTHLY')
<dateutil.rrule.rrule at 0x7fc5427e0b50>
>>> str(rrule.rrulestr('RRULE:FREQ=MONTHLY'))
'DTSTART:20210905T231505\nRRULE:FREQ=MONTHLY'

And considering the DTSTART might be held somewhere else on the task (say, Task.added_date the current implementation), very basic parsing would allow to hold all needed information in something like :

<repeating enabled="true">RRULE:FREQ=MONTHLY</repeating>

removing the need to parse out the actual rule in the xml.

next_tid -- hold the id of the tasks that was duplicated after the current task (None if task is still active, uuid otherwise)

As stated here and there I'm really not in favor of cloning task for the sake of the repeating task feature.

I would add to the list of benefit from another implementation that we would skip the whole "keeping-track-of-previous-iteration" thingy that is then mandatory and adds a lot of complexity to (among other thing) the xml payload.

@zeddo123
Copy link
Member

zeddo123 commented Sep 6, 2021

And considering the DTSTART might be held somewhere else on the task (say, Task.added_date the current implementation)

That might work for one repeating term. However if we had more than one, they would have different DTSTART dates.

As stated here and there I'm really not in favor of cloning task for the sake of the repeating task feature.

I'm not that attached to cloning tasks either. On the other hand, users might rely on having a history of the tasks they've done.
In any case, if we are to remove duplication, our work would be pretty much straight forward. And the count field we would have a simple way to track how many times the tasks was repeated.

@jaesivsm
Copy link
Contributor

jaesivsm commented Sep 6, 2021

That might work for one repeating term. However if we had more than one, they would have different DTSTART dates.

Rrule allow you various concurrent repetition, see here, twice a week on Tuesday and Thursday

RRULE:FREQ=WEEKLY;BYDAY=TU,TH

If BYDAY isn't specified, DTSTART is used to determine the first occurrences and the time of the repetitions.

That would cover several repetition in a single term (weekly / monthly and all) but it wouldn't allow various repetition term (somehow weekly and monthly for example). Is that something we want ?

I'm not that attached to cloning tasks either. On the other hand, users might rely on having a history of the tasks they've done.

The history is an interesting point. There is a feature in the caldav spec where you exclude occurrences but nothing to actually keep track of history. In the same way, we could mark history as a list of passed occurrences. It's definitely not handled by the dateutil.rrule module.

Although, a simpler way to look at it would be : history are just all occurrences from DTSTART/added_date.

TL;DR I feel it adds in complexity to an already complex feature.

@diegogangl
Copy link
Contributor Author

@zeddo123 I like where this is going!

timestamp -- updated each time the repeating information is modified.

Is there a reason to have a specific update for the repeating fields, instead of using the modified field in the tasks?

That would cover several repetition in a single term (weekly / monthly and all) but it wouldn't allow various repetition term (somehow weekly and monthly for example). Is that something we want ?

That's a good question. Multiple repetition terms would be a powerful feature, but it would also increase complexity. Are there any good use cases?

The history is an interesting point. There is a feature in the caldav spec where you exclude occurrences but nothing to actually keep track of history. In the same way, we could mark history as a list of passed occurrences. It's definitely not handled by the dateutil.rrule module.

Although, a simpler way to look at it would be : history are just all occurrences from DTSTART/added_date.

What would happen if the user changes the recurring term? Say going from weekly to monthly? Would that "obliterate" the previously done tasks from history?

Maybe we should keep track of closed occurrences (their date). That would probably bloat the file too, so we would have to include them in the method that purges tasks.

@jaesivsm
Copy link
Contributor

jaesivsm commented Sep 6, 2021

What would happen if the user changes the recurring term? Say going from weekly to monthly? Would that "obliterate" the previously done tasks from history?

I can't say what we should do, but taking example on various implementation of Caldav, when the recursion is changed, either the DTSTART contained in the rule was nonexistent and is added, thus ignoring the DTSTART (equiv to added_date in GTG) of the task, or it is updated to "now". Previous value is indeed obliterated and history is lost.

But again, do we need this history ? We don't keep it for any other value change (category, status, or anything).

Maybe we should keep track of closed occurrences (their date). That would probably bloat the file too, so we would have to include them in the method that purges tasks.

A simple way to look at that is that would be that all closed occurrences, are those that happened between added_date and closed_date at the pace of the rrule. But we would lose the precise moment at which each occurrences but the last one has been closed. Seems an ok deal to me regarding the complexity avoided.

@zeddo123
Copy link
Member

zeddo123 commented Sep 6, 2021

Is there a reason to have a specific update for the repeating fields, instead of using the modified field in the tasks?

I guess not. with dateutil begin powerful enough we might get a way with using modified (or not needing it at all)

That's a good question. Multiple repetition terms would be a powerful feature, but it would also increase complexity. Are there any good use cases?

I tried to think about some cases where it might be needed, but I could find any. @nekohayo might have some thoughts on this.
There is also a case for making the repeating module extensive enough that we don't have to add anything major in the future (e.g if a user needed complex repeating rules, it could be added with very minimal development (just some UI))

Maybe we should keep track of closed occurrences (their date). That would probably bloat the file too, so we would have to include them in the method that purges tasks.

As an alternative to cloning tasks, this has less bloat. Instead of having many duplicated tasks, we would have a (very) long task. The downside is displaying the history. Could we make a "meta" task? (a task that appears in the trees, but isn't really a task)

@diegogangl
Copy link
Contributor Author

diegogangl commented Sep 6, 2021

But again, do we need this history ? We don't keep it for any other value change (category, status, or anything).

Mostly for statistics, or to see if you payed that bill that you were supposed to pay.

As an alternative to cloning tasks, this has less bloat. Instead of having many duplicated tasks, we would have a (very) long task. The downside is displaying the history. Could we make a "meta" task? (a task that appears in the trees, but isn't really a task)

Yup, with this branch we are now in "anything goes" territory. Ideally we would always use the stores as ListStores for Gtk4's new list widgets, but we already have a filter() method that returns a new list. We could use that list to build a ListView manually, "making up" rows for tasks in the recurrent history.

@diegogangl diegogangl marked this pull request as ready for review September 9, 2021 13:58
@diegogangl diegogangl force-pushed the new_core branch 3 times, most recently from 12fb8c8 to 34f5f08 Compare September 11, 2021 00:26
@diegogangl
Copy link
Contributor Author

@zeddo123 doesn't look like I can let you push to this branch, what do you think about making a new branch to work on the repeating tasks once this is merged?

I'm totally ready to push the big green button now, but will wait for objections from @Neui , @jaesivsm and the others :)

@diegogangl
Copy link
Contributor Author

Ah that makes sense, I'll change that too

@diegogangl
Copy link
Contributor Author

Alright, fixed most of the issues. Still need to take care of backends

@diegogangl diegogangl force-pushed the new_core branch 2 times, most recently from b494b80 to 4228a95 Compare September 24, 2021 00:29
@diegogangl
Copy link
Contributor Author

@Neui pushed all the changes you suggested now. The BaseStore.remove() method has become a lot simpler :D
Did you have any other notes?

@zeddo123
Copy link
Member

zeddo123 commented Feb 3, 2022

As an alternative to cloning tasks, this has less bloat. Instead of having many duplicated tasks, we would have a (very) long task. The downside is displaying the history. Could we make a "meta" task? (a task that appears in the trees, but isn't really a task)

I've been thinking about this, and I am not sure if we should drop cloning tasks. one behaviour that I liked when repeating tasks were cloned is that they had their separate description. For instance, if I had a repeating task prepare presentation for class every:week I am able to add specific comments in the task's body about the presentation of that week.

The discussions about repeating tasks have been all over the place, we probably should stick to a single thread.

@diegogangl
Copy link
Contributor Author

Yeah it might be better to stick with what we know, at least for the initial port.

@diegogangl
Copy link
Contributor Author

@zeddo123 the eagle has landed. Feel free to hack on the repeating dates there.
Remember it's not plugged into anything yet (there will be another PR for that). You can play with it using the dev console and unit tests.

@nekohayo nekohayo added the maintainability Automated tests suite, tooling, refactoring, or anything that makes it easier for developers label Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement maintainability Automated tests suite, tooling, refactoring, or anything that makes it easier for developers performance Issues affecting the speed and responsiveness of the application Pull Request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants