Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bukudb] Various improvements #660

Merged
merged 2 commits into from Jan 27, 2023
Merged

Conversation

LeXofLeviafan
Copy link
Collaborator

@LeXofLeviafan LeXofLeviafan commented Jan 7, 2023

fixes #648:

  • implementing general-purpose private methods in BukuDb for fetching all/one bookmark (the latter returns None as empty result)
  • changing BukuDb code to use these fetch methods where applicable, replacing -1 result with None when an ID is returned
  • changing library, server & test code to handle the None results

fixes #654:

  • changing BukuDb methods returning a list to always return a list (instead of returning None occassionally)
  • simplifying the handling of these results

fixes #652:

  • replacing BookmarkVar definition with a typed namedtuple
  • converting bookmark values from the database into BookmarkVars
  • replacing tuple index access via magic numbers with named field access for BookmarkVars

fixes #653:

  • removing Optional[] type hint where it's unnecessary
  • removing the word "optional" from the docs for arguments that don't have an Optional[] type hint

fixes #664 (again):

  • fixing sys_platform dependency condition to match the possible values set
  • moving readline import into main() (to avoid loading it when buku is used as a library)

also:

The changes are split in two commits (return values & refactoring) for better viewing experience.

buku Show resolved Hide resolved
buku Show resolved Hide resolved
@LeXofLeviafan
Copy link
Collaborator Author

…I'll re-push these commits with rebase master applied once I make sure the tests actually pass.

@LeXofLeviafan
Copy link
Collaborator Author

…Alright, I'm confused: it says there's still some changes requested, but I only see two threads and both are marked as "resolved" 🤔

@jarun
Copy link
Owner

jarun commented Jan 9, 2023

I will need time to review this.

@@ -553,35 +567,33 @@ class BukuDb:
Returns
-------
int
DB index, or -1 if URL not found in DB.
DB index, or None if URL not found in DB.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index is of integral type, what was the problem with returning -1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed in #648. -1 being a number is not a good reason to use it as a substitude for None in a high-level language.

  • Python is not C so None is always a valid return value
  • None is basically meant for this kind of purpose
  • the row numbers start with 1 so returning a falsey value on failure would make the success checks more natural (i.e. if buku.add_rec(…))
  • even the explicit value comparison becomes more transparent (is None instead of == -1).

Incidentally, you approved this change in that discussion.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you have sent out so many notes that I thought of seeing the actual changes. No, -1 is fine with integral types. Please revert these changes.


def get_max_id(self) -> int:
"""Fetch the ID of the last record.

Returns
-------
int
ID of the record if any record exists, else -1.
ID of the record if any record exists, else None.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Why should max id be None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as well – when there's no value to return, None is what should be used instead of an arbitrary number.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 is not arbitrary. It's documented when it is being returned. Please revert.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because it's documented doesn't make it any less arbitrary. -1 is a number and it means a negative amount or index (which isn't what it's being used for), None is a value meant specifically to represent a lack of a thing.

buku Outdated
immutable: Optional[int] = 0,
delay_commit: Optional[bool] = False,
fetch: Optional[bool] = True) -> int:
immutable: int = FLAG_NONE,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be bool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can and really should in my opinion, but you shot down this suggestion back when I made it 😅

Copy link
Owner

@jarun jarun Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this bool.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are pointing repeatedly to the fact that you have shared this and I responded some other way - this is a piece of code I wrote years back. I have forgotten the logic and python by now. Nor do I care much about cosmetic changes in a bookmarking utility anymore. I wanted to see all the actual changes in one place to decide what to use and what not to.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on. Keep this int. The idea was to keep space for more flags.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…As I mentioned before, my original idea was to change it to bool in the API (as it makes more sense to be used this way), but to keep it a bitmask in the DB (to avoid data migration).

@@ -95,6 +95,8 @@ COLORMAP = {k: '\x1b[%sm' % v for k, v in {
'x': '0', 'X': '1', 'y': '7', 'Y': '7;1', 'z': '2',
}.items()}

[FLAG_NONE, FLAG_IMMUTABLE] = [0x00, 0x01]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why these 2 have been linked together. How are they related?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if they are just 2 values for immutable, why do we need them instead of True and False?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're bitmasks which are used as magic numbers in the current master (both in API and in DB). I simply provided explicit names for them here. And yes, they're directly related – they're values assigned to the flags field in DB (I imagine it can be migrated to a boolean field should you prefer it, though I'm not sure how viable this is, but my first attempt at this had indeed converted it to bool across the API).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. This is fine. Just add a comment above that says - various DB flags.

Fetch page from web and parse for data

Returns
-------
int
DB index of new bookmark on success, -1 on failure.
DB index of new bookmark on success, None on failure.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I prefer to keep the integral values integral. Also, this breaks the caller side for third-party utilities using this API.

@@ -1137,7 +1149,7 @@ class BukuDb:
if index == -1:
# Edit the last records
index = self.get_max_id()
if index == -1:
if not index:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reversal, this becomes -1 again.

buku Outdated
@@ -1148,14 +1160,13 @@ class BukuDb:

# If reading from DB, show empty title and desc as empty lines. We have to convert because
# even in case of add with a blank title or desc, '' is used as initializer to show '-'.
result = edit_rec(editor, rec[1], rec[2] if rec[2] != '' else None,
rec[3], rec[4] if rec[4] != '' else None)
result = edit_rec(editor, rec.url, rec.title or None, rec.tags, rec.desc or None)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm empty rec.title fails the check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list or None
List of search results, or None if no matches.
list
List of search results.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently third-party callers check for None. Will that work with this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my original argumentation:

  • The claim "this function returns None when there's no matches" was not accurate in the first place: None was returned only in certain cases, outside of them the returned "no matches" result was [].
    (More specifically, the "certain cases" thing concerns specific checks in searchdb() and search_by_tag(); as for get_all_rec(), search_keywords_and_filter_by_tags() and exclude_results_from_search() – they always [1] [2] [3] returned a list(), the latter two explicitly and the former by docs of the returned call.)
  • Both [] and None are falsey, idiomatically for Python – checking specifically for is None makes little here when any conditional can simply accept the value itself without losing out on correctness (rather, doing a regular falsiness check is more correct, because [] is returned by all of these functions).
  • Returning [] instead of None makes processing of such results simpler, when there's a need to do something more complex than an emptiness check (like how it's done in search_keywords_and_filter_by_tags(), which previously needed workarounds for the None value).

(…Somehow I keep getting the feeling that this code was written in C originally then translated into Python directly without accounting for the language differences. 😅)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, if the caller checks for None as in the existing API, will this work or break? Does the caller need to change? That's my greatest concern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, the existing API does not return values as described in docs.

If the caller checks specifically for is None (as opposed to the idiomatic falsiness check that makes more sense when dealing with functions returning a list) in searchdb() or search_by_tag(), then he's not checking for empty search results (which are returned as [] in the current version) but specifically for invalid input passed to the function. Since input validation isn't a part of the documented API, there's no problem in changing it out.

And if the caller checks for is None in get_all_rec(), search_keywords_and_filter_by_tags() or exclude_results_from_search(), his code will keep not working just as it had from the very beginning (because these functions never returned None in the first place).

True if record should not be committed to the DB,
leaving commit responsibility to caller. Default is False.
"""

# Return if the last index left in DB was just deleted
max_id = self.get_max_id()
if max_id == -1:
if not max_id:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With reversal to return -1 this also has to be reverted.

@@ -1784,7 +1788,7 @@ class BukuDb:
if not is_range and index < 0:
# Show the last n records
_id = self.get_max_id()
if _id == -1:
if not _id:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With reversal to return -1 this also has to be reverted.

@jarun
Copy link
Owner

jarun commented Jan 15, 2023

I am done with the review. I think the only difference of opinion here is whether to return -1 or None for indices.

My issues with None are:

  • the index is of integral type
  • thrid-party callers explicitly checking for -1 may break and we can't afford that in a years old project

@rachmadaniHaryono what's your opinion? Also, please review.

@LeXofLeviafan
Copy link
Collaborator Author

My issues with None are:

Since we're waiting for @rachmadaniHaryono to give an opinion, I'll leave a direct link to the original discussion which starts with my arguments for making it closer to idiomatic Python.

(…Also it just occurred to me that -1 is indeed an idiomatically valid index value in Python, which makes it even worse of a fit for "no index" value for this language.)

I am done with the review.

…I'm still confused about the immutable values: do you want to change them to bools in the API calls (but not in DB), or just to add a comment for the bitmask constants? Note that it doesn't make much sense to keep it an integer in the API since the argument name refers only to one bit of the bitmask (so if you end up adding another flag, it's going to need a separate argument to be used anyway).

@jarun
Copy link
Owner

jarun commented Jan 15, 2023

immutable values: do you want to change them to bools in the API calls (but not in DB)

This is fine.

@jarun
Copy link
Owner

jarun commented Jan 21, 2023

@rachmadaniHaryono we are waiting on your input for this.

@rachmadaniHaryono
Copy link
Collaborator

rachmadaniHaryono commented Jan 22, 2023

replacing BookmarkVar definition with a typed namedtuple

i thought dataclasses maybe better but looking at following SO link, it is maybe better to use namedtuple

https://stackoverflow.com/questions/51671699/data-classes-vs-typing-namedtuple-primary-use-cases

immutable on record

maybe add immutable property on BookmarkVar

class BookmarkVar(NamedTuple):
    ...
    @property
    def immutable(self)->bool:
        ...

[FLAG_NONE, FLAG_IMMUTABLE] = [0x00, 0x01]

i would rather do this because there is no other flag yet

FLAG_NONE = 0
FLAG_IMMUTABLE = 1

i see there is some change on type on docstring

i would rather remove it entirely and just have it on function type hint


big change

my comment on dec 15 2022

#648 (comment)

im undecided on this

the reasoning seem good but it may break other program that use buku

but after checking related buku project, not a lot of program use buku with python

also the next release should be major version, so maybe just go for it?

i havent check bukuserver yet

things changed

on my break i tried to write program to use buku as bookmark manager for my logseq pkm tool

and there is indeed problem especially when the documentation is only on source code

i will explain on next section

also the next release should be major version, so maybe just go for it?

after i think more about it, if we dont add it on the next major release we have to release it on another major release and i dont think it is better.

just put it on one big release

but before next release we should also add more documentation

https://www.youtube.com/watch?v=azf6yzuJt54

i do have some note on example on using buku for developer and will expand on that


for my program i need to get buku record on url and there is no direct function for it

before that i know that to get rec_id, so naively i think this should work

>>> import buku
>>> bdb = buku.BukuDb()
rec = bdb.get_rec_by_id( bdb.add_rec('http://example.com') )

this is wrong when url is already on buku

correct solution should be

>>> url = 'https://example.com'
>>> rec_id = bdb.add_rec(url)
... if rec_id == -1:
...     rec_id = bdb.get_rec_id(url)
>>> bdb.get_rec_by_id(rec_id)

if operation return None, this could be easier

rec = bdb.get_rec_by_id(bdb.add_rec(url) or bdb.get_rec_id(url))

this also show that if a function require int parameter we could check for None value and raise error

other solution is to create function get_or_create_rec(url)

similar function

https://docs.djangoproject.com/en/4.2/ref/models/querysets/#get-or-create

@jarun
Copy link
Owner

jarun commented Jan 22, 2023

@LeXofLeviafan please proceed as suggested by @rachmadaniHaryono.

@jarun
Copy link
Owner

jarun commented Jan 22, 2023

@rachmadaniHaryono can you please list all the changes made in the bukuserver side since the last release?

@rachmadaniHaryono
Copy link
Collaborator

rachmadaniHaryono commented Jan 22, 2023

wait, @jarun so we can change it to return None?

if not clear, i agree with @LeXofLeviafan to change return value from -1 to None

e:

list all the changes made in the bukuserver side since the last release

i will add that as task on #657

@jarun
Copy link
Owner

jarun commented Jan 22, 2023

Yes, it's OK to change it to -1, I lost 2:1. ;)

Please add the list to #484 under Cooking.

@rachmadaniHaryono
Copy link
Collaborator

another thing for @LeXofLeviafan

there is https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict which could simplify namedtuple to dict on some function

@LeXofLeviafan
Copy link
Collaborator Author

[FLAG_NONE, FLAG_IMMUTABLE] = [0x00, 0x01]

i would rather do this because there is no other flag yet

FLAG_NONE = 0
FLAG_IMMUTABLE = 1

The reason I'm using hexadecimal notation is specifically to make it clear that it's a bitmask (i.e. the decimal values don't actually matter, only enabled bits are); not to mention this notation also makes it clear which bits are enabled in the bitmask.

there is https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict which could simplify namedtuple to dict on some function

That is indeed a convenient method… though it's not actually that much useful in this case, seeing as BookmarkVars contain raw DB data instead of the data format used in JSON export & bukuserver API (i.e. tags are a string wrapped in commas, and immutable is hidden within flags… though the latter isn't exported by buku, it'd be present in _asdict anyway).

maybe add immutable property on BookmarkVar

…I'm suddenly getting an urge to rename tags to _tags, and create two properties: tags that matches _tags[1:-1] (which is used way too often in the source, let's be honest), and tagslist that matches [x for x in bookmark.tags.split(',') if x] (this one is less prominent but certainly is convenient to have).

@LeXofLeviafan
Copy link
Collaborator Author

Also, @jarun – there was a discussion in #665 regarding the import of readline/pyreadline3 (which affects I/O properties of the program, and therefore should not be done when buku is used as a library – as opposed to running it as a CLI program).

I suggested moving the import to the beginning of main(), and @rachmadaniHaryono suggested that I add the change to this pull request. Do you have any problem with that?

@LeXofLeviafan
Copy link
Collaborator Author

Note: did a rebase.

@LeXofLeviafan
Copy link
Collaborator Author

i see there is some change on type on docstring

i would rather remove it entirely and just have it on function type hint

That makes sense, I suppose (type hints are included in the function docs – at least the ones produced by help()), but the types are listed in docstrings pretty much everywhere so they should probably be changed everywhere; I'd rather leave them alone for now, and maybe refactor the docstrings on a separate pull request or something. …Also, @jarun – your opinion on that?

@jarun
Copy link
Owner

jarun commented Jan 22, 2023

I suggested moving the import to the beginning of main(), and @rachmadaniHaryono suggested that I add the change to this pull request. Do you have any problem with that?

This is fine.

Also, @jarun – your opinion on that?

You can leave it for now.

BTW, once this is in, can we think about some good features? One long-standing request has been to add urls to the wayback machine but I didn't find any simple way to do that without adding tons of deps.

@jarun
Copy link
Owner

jarun commented Jan 22, 2023

@LeXofLeviafan please document the API changes in #484 so that it is available for third-party utilities.

@LeXofLeviafan
Copy link
Collaborator Author

One long-standing request has been to add urls to the wayback machine but I didn't find any simple way to do that without adding tons of deps.

This one?

Browse a bookmark (possibly dead URL) on Wayback machine

It seems to be marked as done, for some reason (or what does checked checkbox mean in the TODO list?) 😅

As for "how", I'm guessing querying their API is an option. (…Incidentally, this page also mentions a "Memento API", which appears to be intended for other sites with similar purpose; maybe this should be looked into as well/instead)

@jarun
Copy link
Owner

jarun commented Jan 22, 2023

Looking up on wayback machine is available. Not adding to it. I was looking for a solution with minimum deps.

@LeXofLeviafan
Copy link
Collaborator Author

…Alright, this seems to be all (except for possibly CLI changes for --immutable in case they're accepted as well).

And yes, the field filtering code is rather unintuitive, so I refactored it too.

This was added to the "also" list:

  • moved readline import into main() (to avoid loading it when buku is used as a library)
  • fixed sys_platform dependency condition to match the possible values set
  • changed field filtering implementation to declarative + updated JSON generation to adhere to --format description

P.S. Also updated the branch with bukuserver functional tests – should be ready for a pull-request after this one is merged.

@LeXofLeviafan
Copy link
Collaborator Author

LeXofLeviafan commented Jan 23, 2023

Edit: fix for #666

@LeXofLeviafan LeXofLeviafan force-pushed the lib-refactor branch 2 times, most recently from 1f9cbfd to 87a4d5b Compare January 23, 2023 21:05
@LeXofLeviafan
Copy link
Collaborator Author

LeXofLeviafan commented Jan 23, 2023

Edit: fixed flask-reverse-proxy-fix version in bukuserver/requirements.txt

@LeXofLeviafan
Copy link
Collaborator Author

…On second thought, let's do those in a separate pull-request 😅

@jarun
Copy link
Owner

jarun commented Jan 23, 2023

@rachmadaniHaryono please confirm if this is ready to merge.

buku Outdated Show resolved Hide resolved
@rachmadaniHaryono
Copy link
Collaborator

rachmadaniHaryono commented Jan 26, 2023

i have seen this several time

{
'id': bookmark.id,
'url': bookmark.url,
'title': bookmark.title,
'tags': bookmark.taglist,
'description': bookmark.desc,
}

maybe extend _asdict so it can be done in single function

for example

# from
res = jsonify({
  'url': bookmark.url,
  'title': bookmark.title,
  'tags': bookmark.taglist,
  'description': bookmark.desc,
})
# to
res = jsonify({bookmark._asdict())

@LeXofLeviafan
Copy link
Collaborator Author

…Like I said, that's not what _asdict() will produce on named-tuples of this type. The result will have keys id, url, title, tags_raw, desc and flags. Even if you don't care about raw DB fields being included, the properties will not be in it. Also, the keys for some fields in bukuserver API differ from keys in JSON generated by buku; so the best we can do here is make a separate function for generating a bukuserver API entity out of a BookmarkVar (in bukuserver/api.py).

@LeXofLeviafan
Copy link
Collaborator Author

Note: did a rebase, fixed typo, minor refactoring.

@LeXofLeviafan
Copy link
Collaborator Author

Smh the local check missed unused imports 😅

@rachmadaniHaryono
Copy link
Collaborator

…Like I said, that's not what _asdict() will produce on named-tuples of this type. The result will have keys id, url, title, tags_raw, desc and flags. Even if you don't care about raw DB fields being included, the properties will not be in it. Also, the keys for some fields in bukuserver API differ from keys in JSON generated by buku; so the best we can do here is make a separate function for generating a bukuserver API entity out of a BookmarkVar (in bukuserver/api.py).

i thought _asdict could be simpler like this

def _asdict(self):
    return {k: v for k, v in self._asdict().items() for k in ('url', 'title', ...)}

but i do like entity function implementation better


this one is optional. is it ok to use id as parameter? it is not reserved word but it is built in function

https://stackoverflow.com/a/6350862

i will approve it for now

@LeXofLeviafan
Copy link
Collaborator Author

this one is optional. is it ok to use id as parameter? it is not reserved word but it is built in function

Well it's used that way already in the code, both as a parameter and as a local variable. Anyway, there's no usecases for id() in the codebase, so overriding it locally shouldn't cause any issues.

@jarun jarun merged commit 4267909 into jarun:master Jan 27, 2023
@jarun
Copy link
Owner

jarun commented Jan 27, 2023

Thank you!

@thetechnodino
Copy link

Thanks to the help I received from you guys, I was able to get the software installed and working.

Thanks

John

@github-actions github-actions bot locked and limited conversation to collaborators Mar 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants