Doesn't seem that it'd be useful here, since we're already reading from a database (the Firefox history database). Caching that info to another cachew database wouldn't make much sense.
Can't cache the live Firefox history file because that keeps changing, so the only place cachew would improve performance is if we were spending a long time in merge_visits. But that doesn't even do any I/O; it's just a loop with a set, so that's doubtful.
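To illustrate why there's little to cache here, a minimal sketch of what a merge like merge_visits amounts to (the field names on the simplified Visit below are assumptions, not ffexport's actual model):

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Visit(NamedTuple):
    # simplified stand-in for ffexport.model.Visit
    url: str
    visit_date: datetime


def merge_visits(*sources: Iterable[Visit]) -> Iterator[Visit]:
    # deduplicate across all sources using a set of (url, visit_date) keys;
    # this is pure in-memory work, no I/O, so caching its output buys little
    seen = set()
    for source in sources:
        for visit in source:
            key = (visit.url, visit.visit_date)
            if key not in seen:
                seen.add(key)
                yield visit
```

The cost is one set lookup per visit, so even for ~100k entries it finishes in well under a second.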
For reference:
[ ~ ] $ time sh -c 'HPI_LOGS=debug python3 -c "from my.browsing import history; x = list(history())"'
[DEBUG 2020-09-05 03:07:21,267 my.browsing __init__.py:681] using inferred type <class 'ffexport.model.Visit'>
[D 200905 03:07:21 save_hist:66] backing up /home/sean/.mozilla/firefox/lsinsptf.dev-edition-default/places.sqlite to /tmp/tmpxvxci5yl/places-20200905100721.sqlite
[D 200905 03:07:21 save_hist:70] done!
[D 200905 03:07:21 merge_db:48] merging information from 2 databases...
[DEBUG 2020-09-05 03:07:21,303 my.browsing __init__.py:728] using /tmp/browser-cachw/homeseandatafirefoxdbsplaces-20200828223058.sqlite for db cache
[DEBUG 2020-09-05 03:07:21,303 my.browsing __init__.py:734] new hash: cachew: 0.7.0, schema: {'url': <class 'str'>, 'visit_date': <class 'datetime.datetime'>, 'visit_type': <class 'int'>, 'title': typing.Union[str, NoneType], 'description': typing.Union[str, NoneType], 'preview_image': typing.Union[str, NoneType]}, hash: 1598653858
[DEBUG 2020-09-05 03:07:21,310 my.browsing __init__.py:761] old hash: cachew: 0.7.0, schema: {'url': <class 'str'>, 'visit_date': <class 'datetime.datetime'>, 'visit_type': <class 'int'>, 'title': typing.Union[str, NoneType], 'description': typing.Union[str, NoneType], 'preview_image': typing.Union[str, NoneType]}, hash: 1598653858
[DEBUG 2020-09-05 03:07:21,310 my.browsing __init__.py:764] hash matched: loading from cache
[DEBUG 2020-09-05 03:07:22,083 my.browsing __init__.py:728] using /tmp/browser-cachw/tmptmpxvxci5ylplaces-20200905100721.sqlite for db cache
[DEBUG 2020-09-05 03:07:22,083 my.browsing __init__.py:734] new hash: cachew: 0.7.0, schema: {'url': <class 'str'>, 'visit_date': <class 'datetime.datetime'>, 'visit_type': <class 'int'>, 'title': typing.Union[str, NoneType], 'description': typing.Union[str, NoneType], 'preview_image': typing.Union[str, NoneType]}, hash: 1599300441
[DEBUG 2020-09-05 03:07:22,085 my.browsing __init__.py:761] old hash: None
[DEBUG 2020-09-05 03:07:22,085 my.browsing __init__.py:770] hash mismatch: computing data and writing to db
[D 200905 03:07:22 parse_db:69] Parsing visits from /tmp/tmpxvxci5yl/places-20200905100721.sqlite...
[D 200905 03:07:22 parse_db:88] Parsing sitedata from /tmp/tmpxvxci5yl/places-20200905100721.sqlite...
[D 200905 03:07:28 merge_db:60] Summary: removed 91,787 duplicates...
[D 200905 03:07:28 merge_db:61] Summary: returning 98,609 visit entries...
sh -c 7.46s user 0.19s system 99% cpu 7.711 total
[ ~ ] $ time sh -c 'HPI_LOGS=debug python3 -c "from my.browsing import history; x = list(history())"'
[D 200905 03:07:48 save_hist:66] backing up /home/sean/.mozilla/firefox/lsinsptf.dev-edition-default/places.sqlite to /tmp/tmpsvri7hr8/places-20200905100748.sqlite
[D 200905 03:07:48 save_hist:70] done!
[D 200905 03:07:48 merge_db:48] merging information from 2 databases...
[D 200905 03:07:48 parse_db:69] Parsing visits from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[D 200905 03:07:48 parse_db:88] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[D 200905 03:07:49 parse_db:69] Parsing visits from /tmp/tmpsvri7hr8/places-20200905100748.sqlite...
[D 200905 03:07:49 parse_db:88] Parsing sitedata from /tmp/tmpsvri7hr8/places-20200905100748.sqlite...
[D 200905 03:07:50 merge_db:60] Summary: removed 91,787 duplicates...
[D 200905 03:07:50 merge_db:61] Summary: returning 98,609 visit entries...
sh -c 1.65s user 0.10s system 99% cpu 1.759 total
The first run takes about 7.5 seconds, with a cachew cache hit for the backed-up database. The second reads from both databases directly, which takes about 1.7 seconds.
For reference, this is how I modified my.browsing from HPI:
diff --git a/my/browsing.py b/my/browsing.py
index 9f44322..af66530 100644
--- a/my/browsing.py
+++ b/my/browsing.py
@@ -25,17 +25,25 @@ import tempfile
from pathlib import Path
from typing import Iterator, Sequence
-from .core.common import listify, get_files
+from .core.common import listify, get_files, mcachew
+from .kython.klogging import LazyLogger, mklevel
# monkey patch ffexport logs
if "HPI_LOGS" in os.environ:
- from .kython.klogging import mklevel
os.environ["FFEXPORT_LOGS"] = str(mklevel(os.environ["HPI_LOGS"]))
+logger = LazyLogger(__name__, level="info")
-from ffexport import read_and_merge, Visit
+CACHEW_PATH = "/tmp/browser-cachw"
+
+# create cache path
+os.makedirs(CACHEW_PATH, exist_ok=True)
+
+from ffexport import Visit
from ffexport.save_hist import backup_history
+from ffexport.parse_db import read_visits
+from ffexport.merge_db import merge_visits
@listify
def inputs() -> Sequence[Path]:
@@ -60,7 +68,20 @@ def history(from_paths=inputs) -> Results:
import my.browsing
visits = list(my.browsing.history())
"""
- yield from read_and_merge(*from_paths())
+ # only load items that are in the config.export path using cachew
+ # the 'live_file' is always going to be uncached
+ db_paths = list(from_paths())
+ tmp_path = db_paths.pop()
+ yield from merge_visits(*map(_read_history, db_paths), _read_history(tmp_path))
+
+
+def _browser_mtime(p: Path) -> int:
+ return int(p.stat().st_mtime)
+
+@mcachew(hashf=_browser_mtime, logger=logger, cache_path=lambda db_path: f"{CACHEW_PATH}/{str(db_path).replace('/','')}")
+def _read_history(db: Path) -> Iterator[Visit]:
+ yield from read_visits(db)
+
def stats():
from .core import stat
After using this for a while, I can say with a fair amount of confidence that it ends up being slower. It's better to read/merge from the databases themselves.
If you're querying this all the time, I think it'd be most efficient to periodically read it in using read_and_merge, dump the result to a pickle file, and load that back into memory whenever you need it.
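That pickle approach could look something like this. This is a sketch, not part of HPI; the cache location and function names are made up, and the real thing would pass the actual database paths to read_and_merge:

```python
import pickle
from pathlib import Path

# hypothetical cache location, not anything HPI/ffexport defines
CACHE_FILE = Path("/tmp/browsing_visits.pickle")


def dump_history(visits) -> None:
    # run periodically (e.g. from cron): materialize the merged visits
    # (in practice the output of read_and_merge) and pickle the list
    with CACHE_FILE.open("wb") as f:
        pickle.dump(list(visits), f)


def load_history() -> list:
    # whenever you need the data: unpickling a flat list is much faster
    # than re-parsing and re-merging the sqlite databases
    with CACHE_FILE.open("rb") as f:
        return pickle.load(f)
```

The trade-off is staleness: visits made after the last dump won't appear until the next one.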