Add new source for Viber Desktop local sqlite db #204

Merged
merged 21 commits into from
Feb 27, 2021

Conversation

ankostis
Contributor


Viber Desktop (at least on Linux, which is what I checked) stores its messages in a local SQLite file,
~/.ViberPC/{user-phone-number}/viber.db, which contains everything needed to derive a stream of Visits,
as shown in the sample below:
image

The indexer collects:

  • chat-name
  • sender
  • chat text
  • title extracted from any previewed url
  • chat-tags
  • any URLs on the chat-text

The only thing missing is a working Locator.href; I couldn't think of anything other than:

sqlite:///home/username/.ViberPC/123456789/viber.db?immutable=1#!Messages.EventId=3445
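A minimal sketch of how such an href could be assembled, and how the same `immutable=1` flag can be used to open the database read-only from Python. Both function names are hypothetical (they are not part of this PR), and the `#!Messages.EventId=` fragment is only the ad-hoc convention proposed above, not something SQLite itself interprets.

```python
import sqlite3


def viber_locator_href(db_path: str, event_id: int) -> str:
    """Build the ad-hoc locator proposed above (hypothetical helper)."""
    # db_path is assumed to be absolute, so "sqlite://" + "/home/..." yields three slashes.
    return f"sqlite://{db_path}?immutable=1#!Messages.EventId={event_id}"


def open_immutable(db_path: str) -> sqlite3.Connection:
    """Open the database read-only via SQLite's URI filename syntax (hypothetical helper)."""
    # immutable=1 tells SQLite the file will not change, so it skips locking.
    # Paths containing '?', '#' or '%' would need URL-quoting first.
    return sqlite3.connect(f"file:{db_path}?immutable=1", uri=True)


href = viber_locator_href("/home/username/.ViberPC/123456789/viber.db", 3445)
print(href)
```

Opening with `immutable=1` is only safe while Viber is not writing to the file at the same time.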

It is configured with:

SOURCES = [
    Source(viber_desktop.index, "~/.ViberPC/your-phone-number/viber.db"),
]
  • Docs have been updated;
  • Sample config has been updated.

bc cannot control src-name otherwise.
try:
    yield from _handle_row(row)
except Exception as ex:
    logger.warning(
Owner


Replying to your comment here

Basically the options people usually use are:

  1. throw immediately (as you suggested in the comment)
    This is problematic because then in any unexpected situation (like 1/10000 messages parsed incorrectly), the program will break. We have to deal with closed protocols whose formats change without warning, and it's annoying when your software stays broken until you fix it, so we need to be more flexible and defensive here. In addition, imagine one data source breaking the whole program -- sometimes that's justified (e.g. if you were working with important medical data or something like that), but not in the case of this tool.

  2. log and continue (as you're doing here)
    It works nicely in terms of not breaking immediately. However, there is another danger -- imagine that, for example, Viber changes the format of the 'time' column to store an ISO-formatted string (instead of epoch), for whatever reason. Then the Promnesia indexer will carry on working (because of the exception catching) -- that's good. However, you won't notice the error unless you regularly review the logs, or maybe when you know for sure that Promnesia should fire but it doesn't.

So in Promnesia (and related projects that process data, like HPI) I'm doing something intermediate.

  • catch the exceptions (where it makes sense, e.g. parsing)
  • possibly log it if necessary
  • but then yield it to the parent function, as a normal value (there is even an alias for it: Result)

This allows the top-level handler to exit with code 1 (so you get a cron email/systemd job failure etc. and notice the error), while still doing what the program is supposed to do (process as much as it can and insert it into the database).

As more examples why it's useful in Promnesia, errors are also inserted in the database:

  • this makes it easier to test in some cases (instead of intercepting logs, just check the database)
  • later it will allow reviewing the errors from the browser, which would be much more convenient for most users than messing with systemd logs or whatnot

I write more about it here, you might enjoy it!
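A minimal sketch of the "yield the error as a value" pattern described above. The `Visit` tuple, the row layout and the `run` function are simplified stand-ins invented for illustration; they are not Promnesia's real types or entry point.

```python
import logging
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple, Union

logger = logging.getLogger("viber_sketch")


class Visit(NamedTuple):  # simplified stand-in, not Promnesia's real Visit
    url: str
    dt: datetime


Result = Union[Visit, Exception]


def _handle_row(row: dict) -> Iterator[Visit]:
    # Raises ValueError if 'time' is not an epoch number (e.g. the format changed upstream).
    dt = datetime.fromtimestamp(int(row["time"]), tz=timezone.utc)
    yield Visit(url=row["url"], dt=dt)


def index(rows: Iterable[dict]) -> Iterator[Result]:
    for row in rows:
        try:
            yield from _handle_row(row)
        except Exception as ex:
            logger.warning("failed to parse row %r: %s", row, ex)
            yield ex  # hand the error to the caller instead of swallowing it


def run(rows: Iterable[dict]) -> int:
    visits, errors = [], []
    for res in index(rows):
        (errors if isinstance(res, Exception) else visits).append(res)
    # A real indexer would store both visits and errors in the database here.
    return 1 if errors else 0  # non-zero exit code makes cron/systemd notice


rows = [
    {"time": 1614384000, "url": "https://example.com"},
    {"time": "2021-02-27T00:00:00", "url": "https://example.org"},  # unexpected format
]
exit_code = run(rows)
```

A top-level script would finish with `sys.exit(run(rows))`, so the good row is still processed while the job as a whole is reported as failed.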

Owner

@karlicoss karlicoss left a comment


Very nice, thanks so much, especially for figuring out the query!
Looks good, there are some minor comments (some of them optional, so it's up to you whether to fix them, otherwise I can clean up sometime later).

I don't have Viber so I can't test it (to ensure it doesn't break later). Do you know, by any chance, if there is some public Viber database somewhere on GitHub, so it would be possible to run some basic tests against it? Otherwise it's not a big problem; hopefully the most potentially fragile part (the query) won't need to change at all for a while.

Contributor Author

@ankostis ankostis left a comment


Do you know, by any chance, if there is some public Viber database somewhere on GitHub

No.
I could empty mine almost completely, keeping just a few rows for running the query, and upload it here, or upload the creation SQL scripts.
Even better if I could add the DDL + insert statements to the sources, to create a test case there.

Besides this requiring some time investment from my side, it leads to another question:
maybe we should move the query part of the code to HPI, no?

@karlicoss
Owner

Posting as a separate comment so it's not buried in the code comments; might be useful for someone else later.

So, neither symlink/hardlink for .db -> .sqlite file worked for me so I've gone down the rabbithole trying to debug it...
I checked mime types, and seems all good (same for .sqlite extension)

$ file karlicoss.db 
karlicoss.db: SQLite 3.x database, last written using SQLite version 3027002
$ mimetype karlicoss.db
karlicoss.db: application/vnd.sqlite3
$ mimeopen karlicoss.db 
Opening "karlicoss.db" with DB Browser for SQLite  (application/vnd.sqlite3

Ok, but the mimetype tool not only examines the extension, it also looks at the 'magic bytes' in the file (that would explain how it can guess '.db'). The browser, however, can only guess from the extension; it can't look inside the file before downloading... let's check how the extension guessing works...

$ mimetype does-not-exist.pdf
does-not-exist.pdf: application/pdf # OK 
$ mimetype does-not-exist.sqlite
does-not-exist.sqlite: 
$ mimetype does-not-exist.db
does-not-exist.db:

Aha! So the system doesn't actually know .sqlite (let alone .db) extensions.
So what can we do?

Some searching (https://stackoverflow.com/a/31836/706389) suggests that you need to prepare an xml file and install it with xdg-mime (at least if you're on Linux).
But I wonder which sqlite file extensions it expects by default?

$ rg -s 'pattern.*sqlite' /usr/share/mime
/usr/share/mime/application/vnd.sqlite3.xml
54:  <glob pattern="*.sqlite3"/>

/usr/share/mime/application/x-sqlite2.xml
54:  <glob pattern="*.sqlite2"/>

/usr/share/mime/packages/freedesktop.org.xml
2105:    <glob pattern="*.sqlite2"/>
2161:    <glob pattern="*.sqlite3"/>

So here we go -- apparently it knows .sqlite2/.sqlite3, but not .sqlite.

Ok, let's prepare the xml...

$ cat sqlite-mime.xml
<?xml version="1.0"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="application/sqlite-extra">
    <comment>Extra sqlite extensions</comment>
    <glob pattern="*.db"/>
    <glob pattern="*.sqlite"/>
  </mime-type>
</mime-info>

$ xdg-mime install sqlite-mime.xml && update-desktop-database

And...

$ mimetype does-not-exist.db
does-not-exist.db: application/sqlite-extra

Win! Now we only need to associate sqlitebrowser with it:

$ xdg-mime default sqlitebrowser.desktop application/sqlite-extra

And after that Firefox finally offers to open the database!
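The manual steps above could be wrapped in a small script. This is only a sketch: `build_mime_xml` and `install` are hypothetical helper names, and `install` simply shells out to the same three commands shown in this comment (it is defined but not called here, since it changes the user's desktop configuration).

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List
from xml.sax.saxutils import escape, quoteattr


def build_mime_xml(mime_type: str, comment: str, patterns: List[str]) -> str:
    """Render a shared-mime-info package like the sqlite-mime.xml shown above."""
    globs = "\n".join(f"    <glob pattern={quoteattr(p)}/>" for p in patterns)
    return (
        '<?xml version="1.0"?>\n'
        "<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>\n"
        f"  <mime-type type={quoteattr(mime_type)}>\n"
        f"    <comment>{escape(comment)}</comment>\n"
        f"{globs}\n"
        "  </mime-type>\n"
        "</mime-info>\n"
    )


def install(xml_text: str, mime_type: str, desktop_file: str) -> None:
    """Register the mime type and associate an application with it (Linux only)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sqlite-mime.xml"  # same file name as in the comment above
        path.write_text(xml_text)
        subprocess.run(["xdg-mime", "install", str(path)], check=True)
    subprocess.run(["update-desktop-database"], check=True)
    subprocess.run(["xdg-mime", "default", desktop_file, mime_type], check=True)


xml_text = build_mime_xml(
    "application/sqlite-extra", "Extra sqlite extensions", ["*.db", "*.sqlite"]
)
print(xml_text)
```

Running `install(xml_text, "application/sqlite-extra", "sqlitebrowser.desktop")` would then repeat the steps above in one go.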

@karlicoss
Owner

Also while debugging the above, found another way, posting it here since might be useful somewhere else later...

Found some help on changing the application handler in Firefox, but unclear how to add a new format: https://support.mozilla.org/en-US/kb/applications-panel-set-how-firefox-handles-files

Following advice from here: https://support.mozilla.org/en-US/questions/1235051#answer-1158433

$ ~/.mozilla/firefox/$PROFILEDIR/handlers.json

"mimeTypes": {
    "application/pdf": {
        "action": 3,
        "extensions": [
            "pdf"
        ]
    },
    "application/vnd.sqlite3": {
        "action": 4,
        "extensions": [
            "sqlite",
            "db"
        ],
        "handlers": [
            {
                "name": "sqlitebrowser",
                "path": "/usr/bin/sqlitebrowser"
            }
        ]
    }

Explanation for action field from here: https://support.mozilla.org/fr/questions/1262664#answer-1232236

saveToDisk = 0;
alwaysAsk = 1;
useHelperApp = 2;
handleInternally = 3;
useSystemDefault = 4;

After reloading Firefox this way also works. But it's a bit sad the functionality isn't exposed to the GUI :(
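For anyone scripting this edit, here is a sketch that adds the same entry to an already-parsed handlers.json. The `Action` enum just names the numeric values listed above, and `add_sqlite_handler` is a hypothetical helper; reading and writing the real file (with Firefox closed) is left out.

```python
import json
from enum import IntEnum


class Action(IntEnum):
    # Values as listed in the explanation of the "action" field above.
    SAVE_TO_DISK = 0
    ALWAYS_ASK = 1
    USE_HELPER_APP = 2
    HANDLE_INTERNALLY = 3
    USE_SYSTEM_DEFAULT = 4


def add_sqlite_handler(handlers: dict) -> dict:
    """Add the sqlite entry shown above, leaving other mime types untouched."""
    mime_types = handlers.setdefault("mimeTypes", {})
    mime_types["application/vnd.sqlite3"] = {
        "action": int(Action.USE_SYSTEM_DEFAULT),
        "extensions": ["sqlite", "db"],
        "handlers": [{"name": "sqlitebrowser", "path": "/usr/bin/sqlitebrowser"}],
    }
    return handlers


# Stand-in for the parsed contents of handlers.json (only the part shown above).
existing = json.loads(
    '{"mimeTypes": {"application/pdf": {"action": 3, "extensions": ["pdf"]}}}'
)
updated = add_sqlite_handler(existing)
print(json.dumps(updated, indent=4))
```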

Also, neither of the ways work in Chrome -- it seems to have a mind of its own, and quick search seems to suggest there is no way without custom extensions 🤮

@karlicoss
Owner

No. I could empty mine almost completely, keeping just a few rows for running the query, and upload it here, or upload the creation SQL scripts. Even better if I could add the DDL + insert statements to the sources, to create a test case there.

Up to you -- I don't expect anyone to break it, and I don't mind if we add tests for this later. I know how tedious and time-consuming this 'database emptying' can be.
For uploading, I usually use submodules (e.g. https://github.com/karlicoss/HPI/blob/ad924ebca84b0846c98401b10e61488c45cf1e9c/.gitmodules#L1-L3); this keeps the main repository lean (since database binaries are only needed in tests anyway).

Besides this requiring some time investment from my side, it leads to another question: maybe we should move the query part of the code to HPI, no?

Yep, could do, e.g. iterator over messages could be useful, and then it would be imported in Promnesia for URL extraction.
But also similar -- I'm happy to merge it here, and later you can move it over to HPI -- up to you!
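A rough sketch of what that split could look like: one layer only iterates over messages (the HPI side), and a second layer extracts URLs from them (the Promnesia side). The `Message` fields, the function names and the URL regex are all assumptions made for illustration; the real modules may look quite different.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple, Tuple


class Message(NamedTuple):  # hypothetical shape of what an HPI module might yield
    chat: str
    sender: str
    text: str
    dt: datetime


# Deliberately simple pattern; a real indexer would use a more careful URL extractor.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> Iterator[str]:
    for match in URL_RE.finditer(text):
        # Drop punctuation that commonly trails a URL inside a sentence.
        yield match.group(0).rstrip(".,;:!?)")


def urls_from_messages(messages: Iterable[Message]) -> Iterator[Tuple[str, Message]]:
    """Promnesia-side step: pair each URL with the message it came from."""
    for msg in messages:
        for url in extract_urls(msg.text):
            yield url, msg


sample = [
    Message(
        chat="family",
        sender="alice",
        text="see https://example.com/a, and http://example.org.",
        dt=datetime(2021, 2, 27, tzinfo=timezone.utc),
    ),
    Message(chat="family", sender="bob", text="no links here",
            dt=datetime(2021, 2, 27, tzinfo=timezone.utc)),
]
found = [url for url, _ in urls_from_messages(sample)]
print(found)
```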

@karlicoss
Owner

Looks good to me! 🎉 Let me know what you think about the hardlink thing, and after that I'm happy to merge if you are.

@ankostis
Contributor Author

ankostis commented Feb 25, 2021

Let me know what you think about the hardlink thing, and after that I'm happy to merge if you are.

I will remove the hard-link (your xdg handler fits the purpose), and then you can merge it.
Tests will come later (submodules are the only thing in git that I'm not fluent in).

ankostis added a commit to ankostis/promnesia that referenced this pull request Feb 26, 2021
prefer @karlicoss teaching firefox to recognize `.db` mime-type:
karlicoss#204 (comment)
prefer @karlicoss's teaching firefox to recognize `.db` mime-type:
karlicoss#204 (comment)
@ankostis
Contributor Author

Removed the hard-link hack in 264f543.
Ready to be merged.

Tell me when to split it in HPI, better in 2 separate PRs, no?

@karlicoss
Owner

Thank you! 🎉 merged

For two PRs -- you mean one would be to HPI (e.g. my.viber.local module), another to Promnesia to actually switch to use my.viber.local data? Yeah that makes sense

ankostis added a commit to ankostis/Garden that referenced this pull request Feb 27, 2021
@ankostis
Contributor Author

Ok, let's prepare the xml...

Many thanks!

Would it make sense to augment karlicoss/open-in-editor to install those handlers you crafted in those 2 messages?

ankostis added a commit to ankostis/Garden that referenced this pull request Feb 27, 2021
@karlicoss
Owner

Not sure about adding this to open-in-editor.. I guess it's somewhat unrelated.
I guess I won't mind it as a separate script in promnesia/script or open-in-editor repository perhaps? And maybe later it will be clearer where all this stuff actually belongs?

@ankostis
Contributor Author

Will you add it?
It's really useful, since just the mimetype registration made my GNOME offer me a dialog to choose which app to use to view sqlite files, and after choosing sqlitebrowser it works thereafter.

@karlicoss
Owner

Ah, great to know that even mimetype is enough! Yep, will add later.

ankostis added a commit to ankostis/Garden that referenced this pull request Mar 2, 2021
ankostis added a commit to ankostis/Garden that referenced this pull request Mar 3, 2021
karlicoss added a commit that referenced this pull request Mar 5, 2021
also fix missed mypy issues after #204
@ankostis ankostis deleted the viber-src branch March 30, 2021 22:53