Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I don't know what's happening during update_feeds() #204

Closed
lemon24 opened this issue Dec 7, 2020 · 7 comments
Closed

I don't know what's happening during update_feeds() #204

lemon24 opened this issue Dec 7, 2020 · 7 comments

Comments

@lemon24
Copy link
Owner

lemon24 commented Dec 7, 2020

I don't know what's happening during update_feeds() until it finishes. More so, there's no straightforward way to know programmatically which feeds failed (logging or guessing from feed.last_exception don't count).

From https://clig.dev/#robustness-guidelines:

Responsive is more important than fast. Print something to the user in <100ms. If you’re making a network request, print something before you do it so it doesn’t hang and look broken.

Show progress if something takes a long time. If your program displays no output for a while, it will look broken. A good spinner or progress indicator can make a program appear to be faster than it is.

Doing either of these is hard to do at the moment.

@lemon24
Copy link
Owner Author

lemon24 commented Dec 7, 2020

The simplest solution (to implement) is making an update_feeds_iter() that yields (feed URL, exception or none) tuples. update_feeds() could just call that (in a way, it already does). This could also allow us to bubble up _NotModified (since it may be useful to users).

Update: What about warnings?

Update: Should we also return the (new, updated) counts?

@lemon24
Copy link
Owner Author

lemon24 commented Feb 16, 2021

Here's the whole update data flow as of v1.13.

  • dashed arrows show "secondary output" (logging) that we may want to return to the user as data
    • parse warnings (e.g. from feedparser "survivable" exceptions)
    • feed and entry update reasons (currently logged for debugging)
    • entry new/updated counts (currently logged for debugging / CLI details)
  • "not modified" is returned via a signalling exception; as mentioned above, we may want to change that

reader-update-dataflow

lemon24 added a commit that referenced this issue Feb 16, 2021
@lemon24
Copy link
Owner Author

lemon24 commented Feb 16, 2021

We have two possible ways of using update_feeds_iter():

# a progressbar(item_show_func=...) will show the last completed item

with click.progressbar(update_feeds_iter()) as bar: 
    for result in bar:
        # item is already done
        pass

# a progressbar(item_show_func=...) will show the item that was last started;
# in practice, it will be identical to the snippet above due to
# https://github.com/pallets/click/issues/1353

with click.progressbar(update_feeds_iter()) as bar:
    for url, process in bar:
        # the item is not done yet
        result = process()
        # now it's done

The first version is straightforward to implement.

The second one is not:

  • Since retrieving/parsing happens in parallel, what would process() actually mean? Adding the feed to a queue of things to be done just to enable this would require moving away from the map() model (Update feeds concurrently #152 (comment)).
  • Also, if we were to do that, the progressbar would still lie about the item being processed, e.g.: item A starts, bar shows "updating A", item B starts, bar shows "updating B", B finishes, A hangs – but the progressbar still shows "updating B", which is a lie.
    • To not lie, you'd need something like the https://github.com/mitsuhiko/indicatif MultiProgress to show all the things that are in progress, but you'd need to expose at least an update_start() callback that's called with the feed for update before it's sent to be retrieved; it would look something like this:
    • "All the things that are in progress" could be 1 progressbar/download + 1 progressbar for the rest, or 1 progressbar for each stage of the pipeline (1 for downloads, 1 for the rest); at this point, people may just want to re-implement update_feeds() to get better control, and we should likely enable them to do that.
# much pseudo-code

def update_start(feed):
    nonlocal multi
    # start a sub-bar with one item (either not done or done)
    multi.add(feed.url, Progress(label=feed.url, length=1))
     
with MultiProgress(update_feeds_iter(update_start=update_started)) as multi:
    for result in multi:
        # finishes the bar corresponding to url
        multi.update(result.url, 1)

Given that the last example is essentially the first version + update_start (i.e. we can add update_start later), I'll just start with the first version.

@lemon24
Copy link
Owner Author

lemon24 commented Feb 17, 2021

So, to decide what update_feed_iter() should return, we'll work backwards from what we'd like the output of the update command to be:

  • update:
    • progressbar with last successful; only if stdout is a terminal (click does this by default); should likely be on stderr (?)
    • at the end, print counts: updated feeds (with new or updated entries), error feeds, new/updated entries
  • update -v:
    • one line per feed; tab-separated: progress (1/82), url, new/updated entries OR not updated OR error
    • maybe the same progressbar, but it "floats to the bottom"; can't be done with click (printing during progressbar breaks it)
    • same summary at the end
  • update -vv: like update -v, but also logging (on stderr?)

The minimum thing to be returned for that would be something like:

@dataclass(frozen=True)
class UpdateResult:
    url: str
    # updated/new and exception are mutually-exclusive;
    # maybe "not modified" is signaled by updated/new is None AND exception is None
    updated: Optional[int]
    new: Optional[int]
    exception: Optional[ParseError]

Alternatively, we could model the mutual-exclusiveness from the start:

@dataclass(frozen=True)
class UpdateResultOK:
    url: str
    updated: int
    new: int

@dataclass(frozen=True)
class UpdateResult:
    url: str
    value: Optional[UpdateResultOK, ParseError]
    
    # we can still have the attributes from the first version as properties
    # (i.e. we can go with that first, and extend to this later)

    @property
    def updated(self) -> Optional[int]:
        return self.value.updated if isinstance(self.value, UpdateOK) else None

    ...

To either of these, we can later add Feed/EntryData or Feed/EntryUpdateIntents, if needed (e.g. if we want to allow people to do stuff for the updated entries without using a plugin).


Also, to show how many feeds are left, get_feed_counts() needs to support the same filter as update_feeds(), i.e. new_only.

@lemon24
Copy link
Owner Author

lemon24 commented Feb 17, 2021

The final version of the types.

Getting the count of feeds to be updated will happen in #217.

class UpdatedFeed:
    url: str
    updated: int
    new: int

class UpdateResult(NamedTuple):
    url: str
    value: Union[UpdatedFeed, None, ReaderError]
    # the exception type is ReaderError and not ParseError
    # to allow suppressing new errors without breaking the API:
    # adding a new type to the union breaks the API,
    # not raising an exception type anymore doesn't.
    # currently, storage or plugin-raised exceptions
    # prevent updates for the following feeds,
    # but that's not necessarily by design.

    # convenience methods, to show the semantics; optional

    @property
    def error(self) -> Optional[ReaderError]:
        return self.value if isinstance(self.value, Exception) else None
    
    @property
    def ok(self) -> Optional[UpdatedFeed]:
        return self.value if not isinstance(self.value, Exception) else None

    @property
    def not_modified(self) -> bool:
        return self.value is None

def update_feeds_iter(...) -> Iterable[UpdateResult]:
    # raises StorageError
    ...

def update_feeds(...) -> None:
    # raises StorageError
    ...

# def update_feed(...) -> None:
def update_feed(...) -> Optional[UpdatedFeed]:
    # raises FeedNotFoundError
    # raises ParseError
    # raises StorageError
    ...

@lemon24
Copy link
Owner Author

lemon24 commented Feb 18, 2021

Remaining:

  • code
    • replace raising _NotModified with returning None
    • UpdatedFeed, UpdateResult
    • update_feed_iter()
    • update_feed() return type
    • tests
    • reader update progressbar
  • docs
    • API stuff, versionadded/versionchanged
    • user guide
    • dev notes
    • changelog

lemon24 added a commit that referenced this issue Feb 19, 2021
lemon24 added a commit that referenced this issue Feb 19, 2021
Related to #204.
lemon24 added a commit that referenced this issue Feb 19, 2021
Related to #204.
lemon24 added a commit that referenced this issue Feb 19, 2021
Related to #204.
lemon24 added a commit that referenced this issue Feb 19, 2021
lemon24 added a commit that referenced this issue Feb 19, 2021
lemon24 added a commit that referenced this issue Feb 21, 2021
lemon24 added a commit that referenced this issue Feb 21, 2021
lemon24 added a commit that referenced this issue Feb 21, 2021
lemon24 added a commit that referenced this issue Feb 21, 2021
lemon24 added a commit that referenced this issue Feb 21, 2021
lemon24 added a commit that referenced this issue Feb 22, 2021
lemon24 added a commit that referenced this issue Feb 22, 2021
@lemon24
Copy link
Owner Author

lemon24 commented Feb 22, 2021

Time spent:

                hours
thing                
design            7.0
refactoring       2.5
implementation    3.0
tests             1.0
cli               2.5
documentation     1.0

17.0

The implementation bit includes the docstrings; the documentation bit only means the user guide and changelog.

The refactoring part should have taken only 1 hour, but I made things more complicated than they should have been: instead of removing NotModified completely from the start, I made it internal to Parser and wasted time handling it (81da965, dd7a1a3).

@lemon24 lemon24 closed this as completed Feb 22, 2021
lemon24 added a commit that referenced this issue Jul 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant