refactor databases #1025

jackrobison · 2017-12-12T15:28:35Z

No description provided.

lyoshenka · 2017-12-13T14:33:26Z

@jackrobison who needs to review this? i glanced at it briefly and i still think more columns should be not null

jackrobison · 2017-12-13T14:55:49Z

@lyoshenka please check out fd0a7c3 , I rebased in some changes

lyoshenka · 2017-12-15T14:55:26Z

lbrynet/database/migrator/migrate5to6.py

+
+            create table stream_blob (
+                stream_hash char(96) not null,
+                blob_hash char(96),


my note disappeared somehow. blob_hash should be not null. a stream_blob's purpose is to connect a stream and a blob. if there's no blob, there's no stream_blob

@jackrobison this still applies

this is to handle stream terminators, which don't have a blob hash but do have an iv, position, and length.

kauffj · 2017-12-19T15:14:17Z

I think file_list should probably retain all of: outpoint, claim_id, channel_claim_id (doesn't exist as first class field currently and is instead currently buried 4 fields deep in an inconsistent way), and metadata (but this could presumably be what is currently file.metadata.stream.metadata).

I don't see how an application using LBRY can re-render to a user what they downloaded without these fields, or an efficient way to select them again.

lyoshenka · 2018-01-29T18:37:24Z

lbrynet/core/BlobManager.py


 log = logging.getLogger(__name__)


 class DiskBlobManager(DHTHashSupplier):
-    def __init__(self, hash_announcer, blob_dir, db_dir):
+    def __init__(self, hash_announcer, blob_dir, storage):


is it possible to require a class or interface for arguments? i feel like storage should be an interface with functions like "addBlobs", "getBlobs", etc. Then we can implement multiple types of storage (e.g. sqlite for production, in-memory for testing) and it's clear what the contract is.

A simple way of doing this now without too much refactoring is to create an interface that has all the functions that are already in use. Then we can pare it down later.

lyoshenka · 2018-01-29T18:39:33Z

lbrynet/database/migrator/migrate5to6.py

+
+            create table stream_blob (
+                stream_hash char(96) not null,
+                blob_hash char(96),


@jackrobison this still applies

eukreign

A bunch of my comments are structural but a lot of the other ones are code style, specifically where arguments go if they don't fit on the same line as the function call.

It would be nice to have a style guide for this.

Personally, I find putting arguments on a new line to be much more aesthetically pleasing and most of the time all of the arguments then fit on a single line, so it reads easier. With the "pan handle" approach (arguments line up with function call open parenthesis) you often end up having to overflow the arguments onto more lines, this issue compounds when the argument itself is long... prime case in this particular pull request is SQL statements which actually fit most of the time on a single line if you do the new-line-4-spaces-indent for the arguments.

So,

txid, nout = lbryfile_cursor.execute("select txid, n from lbry_file_metadata "
                                     "where lbry_file=?",
                                     (file_infos[stream_hash]['rowid'],)).fetchone()

vs.

txid, nout = lbryfile_cursor.execute(
    "select txid, n from lbry_file_metadata where lbry_file=?", (file_infos[stream_hash]['rowid'],)
).fetchone()

I think the second option above looks nicer and works well in more cases than the "pan handle" approach.

eukreign · 2018-01-29T17:49:01Z

lbrynet/core/StreamDescriptor.py

+        return d
+
+    d.addCallback(format_info)
+    return d


Could above be broken down into two functions to make it easier to test? Something like:

@defer.inlineCallbacks def format_sd_info(stream_info, include_blobs): fields = { 'stream_type': EncryptedFileStreamType, 'stream_name': stream_info[1], 'key': stream_info[0], 'suggested_file_name': stream_info[2], 'stream_hash': stream_hash } if include_blobs: blobs = yield storage.get_blobs_for_stream(stream_hash) formatted_blobs = [] for blob_info in blobs: blob = {} if blob_info.length != 0: blob['blob_hash'] = str(blob_info.blob_hash) blob['blob_num'] = blob_info.blob_num blob['iv'] = str(blob_info.iv) blob['length'] = blob_info.length formatted_blobs.append(blob) fields['blobs'] = formatted_blobs defer.returnValue(fields) @defer.inlineCallbacks def get_sd_info(storage, stream_hash, include_blobs): stream_info = yield storage.get_stream_info(stream_hash) formatted_info = yield format_sd_info(stream_info, include_blobs) defer.returnValue(formatted_info)

https://github.com/lbryio/lbry/blob/be57cc199f42c8f12b0d39f2c7eca81c36d1b2c4/lbrynet/core/StreamDescriptor.py#L271

eukreign · 2018-01-29T17:52:38Z

lbrynet/core/BlobManager.py

@@ -75,7 +63,7 @@ def _immediate_announce(self, blob_hashes):
    def blob_completed(self, blob, next_announce_time=None, should_announce=True):
        if next_announce_time is None:
            next_announce_time = self.get_next_announce_time()
-        yield self._add_completed_blob(blob.blob_hash, blob.length,
+        yield self.storage.add_completed_blob(blob.blob_hash, blob.length,
                                       next_announce_time, should_announce)


I guess we don't have a style guide for this yet but I've always personally preferred the arguments-on-new-line-indented-4-spaces whenever there are too many arguments to a function, this seems more consistent (indentation wise with the rest of the soruce) and looks nicer (less jagged and you can fit more arguments on a single line vs. trying to line it up with parenthesis):

yield self.storage.add_completed_blob( blob.blob_hash, blob.length, next_announce_time, should_announce )

eukreign · 2018-01-29T17:54:16Z

lbrynet/core/BlobManager.py


    def set_should_announce(self, blob_hash, should_announce):
        if blob_hash in self.blobs:
            blob = self.blobs[blob_hash]
            if blob.get_is_verified():
-                return self._set_should_announce(blob_hash,
+                return self.storage.set_should_announce(blob_hash,
                                                 self.get_next_announce_time(),
                                                 should_announce)


Same comment as before about putting all arguments on a new line and indented 4 spaces.

return self.storage.set_should_announce( blob_hash, self.get_next_announce_time(), should_announce )

eukreign · 2018-01-29T17:56:58Z

lbrynet/core/Session.py

@@ -300,7 +302,7 @@ def _setup_other_components(self):
            else:
                self.blob_manager = DiskBlobManager(self.hash_announcer,
                                                    self.blob_dir,
-                                                    self.db_dir)
+                                                    self.storage)


Same comment as before about all arguments on single new line below function call indented 4 spaces.

eukreign · 2018-01-29T17:58:56Z

lbrynet/core/Session.py

+        d = self.storage.setup()
+        d.addCallback(lambda _: self.wallet.start())
+        d.addCallback(lambda _: self.blob_tracker.start())
+        return d


This change doesn't have negative performance side affect? Seems like before the manager setup and wallet start up happened in parallel and now they are serial. Just making sure I'm understanding this change correctly.

It shouldn't - setup for storage is just creating the sqlite tables.

eukreign · 2018-01-29T18:54:59Z

lbrynet/database/storage.py

+    def get_sd_blob_hash_for_stream(self, stream_hash):
+        d = self.db.runQuery("select sd_hash from stream where stream_hash=?", (stream_hash,))
+        d.addCallback(lambda r: None if not r else r[0][0])
+        return d


Use _get_one_or_none.

eukreign · 2018-01-29T18:55:04Z

lbrynet/database/storage.py

+    def get_stream_hash_for_sd_hash(self, sd_blob_hash):
+        d = self.db.runQuery("select stream_hash from stream where sd_hash = ?", (sd_blob_hash, ))
+        d.addCallback(lambda r: None if not r else r[0][0])
+        return d


Use _get_one_or_none.

eukreign · 2018-01-29T18:55:39Z

lbrynet/database/storage.py

+    def get_lbry_file_status(self, rowid):
+        d = self.db.runQuery("select status from file where rowid = ?", (rowid,))
+        d.addCallback(lambda r: (r[0][0] if len(r) else None))
+        return d


Use _get_one_or_none.

eukreign · 2018-01-29T18:55:44Z

lbrynet/database/storage.py

+    def get_rowid_for_stream_hash(self, stream_hash):
+        d = self.db.runQuery("select rowid from file where stream_hash=?", (stream_hash, ))
+        d.addCallback(lambda r: (r[0][0] if len(r) else None))
+        return d


Use _get_one_or_none.

eukreign · 2018-01-29T19:07:16Z

lbrynet/file_manager/EncryptedFileCreator.py

+            blob['length'] = blob_info.length
+            formatted_blobs.append(blob)
+        sd_blob['blobs'] = formatted_blobs
+        return defer.succeed(sd_blob)


I'm having a bit of déjà vu here, is this function defined already somewhere else? lbrynet/core/StreamDescriptor.py They may do something slightly different but the function names are identical and they appear to be very similar internally.

now the formatting is only in lbrynet/core/StreamDescriptor.py

eukreign

This is such a massive pull request :-) Difficult to review in one sitting!

Anyways, I added a couple of minor new things but there are still some comments from the last time I reviewed that are valid.

eukreign · 2018-02-06T03:21:17Z

.pylintrc

@@ -290,7 +290,7 @@ spelling-store-unknown-words=no
 [FORMAT]

 # Maximum number of characters on a single line.
-max-line-length=100
+max-line-length=120


I've been thinking that maybe 120 is too much. In cases where you want to have two files side by side, even in PyCharm at slightly enlarged font (like mine), I can't comfortably fit two 120 lines side by side. 100 fits though... what do you think? Also, Python PEP8 suggests 100 as the max: https://www.python.org/dev/peps/pep-0008/#maximum-line-length

I like 120 but don't particularly prefer it to 100.

I like 120 in general, but on the rare occasion that you want to have two files open at the same time, it helps a lot to have slightly more reasonable lines.

Actually, since you use the PyCharm git tools, I'd think you would prefer the shorter line length too since PyCharm shows diffs side by side.

eukreign · 2018-02-06T03:26:17Z

lbrynet/tests/unit/lbryfilemanager/test_EncryptedFileCreator.py

-        out = yield EncryptedFileCreator.create_lbry_file(
-            session, manager, filename, handle, key, iv_generator())
+        out = yield EncryptedFileCreator.create_lbry_file(self.session, manager, filename, handle,
+                                                          key, iv_generator())


This seems like a reverse formatting operation back to the pan handle :-D

-remove blob upload/download history

-remove CachedClaim and wallet storage classes

-remove lbrynet/lbry_file/StreamDescriptor.py

-remove stream info manager (DBEncryptedMetadataManager) -split `add_lbry_file` into separate `add_published_file` and `add_downloaded_file` functions -set the download path upon adding file to the db, use the source file path for publishes -remove the lbry file manager-wide download directory, set for each file individually -add claim `metadata`, `claim_name`, `claim_id`, `outpoint`, `txid`, `nout`, `channel_claim_id`, and `channel_name` attributes to EncryptedFileDownloader

-also fixes a few wallet tests that weren't checking exceptions correctly

jackrobison force-pushed the database-refactor branch 3 times, most recently from 0df7d75 to 1defab6 Compare December 12, 2017 17:24

lyoshenka mentioned this pull request Dec 12, 2017

File info fix lbryio/lbry-desktop#735

Closed

lyoshenka assigned jackrobison Dec 13, 2017

jackrobison force-pushed the database-refactor branch from 1defab6 to fd0a7c3 Compare December 13, 2017 14:53

lyoshenka reviewed Dec 15, 2017

View reviewed changes

lyoshenka mentioned this pull request Jan 9, 2018

file_list not updated after claim update #944

Closed

jackrobison force-pushed the database-refactor branch from fd0a7c3 to 599214e Compare January 16, 2018 17:12

This was referenced Jan 22, 2018

download_path is incorrect for files uploaded from outside of download directory #827

Closed

LBRY forgets specified destination filenames when daemon restarted #926

Closed

review #1025 #1077

Closed

jackrobison force-pushed the database-refactor branch from 52e9977 to 22af6eb Compare January 29, 2018 03:20

lyoshenka reviewed Jan 29, 2018

View reviewed changes

eukreign requested changes Jan 29, 2018

View reviewed changes

lbry-bot assigned jackrobison and unassigned jackrobison Jan 29, 2018

jackrobison force-pushed the database-refactor branch 6 times, most recently from 569956c to 706645a Compare February 6, 2018 03:00

eukreign requested changes Feb 6, 2018

View reviewed changes

lbry-bot assigned jackrobison and unassigned jackrobison Feb 6, 2018

jackrobison force-pushed the database-refactor branch from d82eae9 to db1e444 Compare February 9, 2018 20:04

jackrobison added 15 commits February 13, 2018 10:27

update BlobManager to use SQLiteStorage, remove old database functions

e671005

-remove blob upload/download history

remove sqlite_helpers

af87df3

update Wallet to use SQLiteStorage

0b1f419

-remove CachedClaim and wallet storage classes

refactor lbrynet.core.StreamDescriptor

f8c33b6

-remove lbrynet/lbry_file/StreamDescriptor.py

update reflector client and server to use SQLiteStorage

9a8cac2

update Daemon to use SQLiteStorage

ce4b82d

update Downloader and Publisher

0df383b

re-enable filtering lbry files by claim attributes

c15850c

migrate databases

9d539aa

whitespace

e9b9118

update unit tests, add test_SQLiteStorage

49507b9

update functional tests

b7d0191

cleaner foreign keys

9e11c07

pylint

68c906a

jackrobison force-pushed the database-refactor branch from 7980834 to 68c906a Compare February 13, 2018 15:28

lbry-bot assigned jackrobison and unassigned jackrobison Feb 13, 2018

jackrobison added 3 commits February 13, 2018 14:22

fix setup and teardown for test_streamify

3769287

fix setup and teardown for blob manager tests

9d87de1

fix setup and teardown for wallet tests

73f09ed

-also fixes a few wallet tests that weren't checking exceptions correctly

jackrobison force-pushed the database-refactor branch 2 times, most recently from 969cc12 to 16fe885 Compare February 13, 2018 19:41

fix default db_revision

e40d4f7

jackrobison force-pushed the database-refactor branch from 16fe885 to e40d4f7 Compare February 13, 2018 19:43

jackrobison added 2 commits February 13, 2018 15:02

fix file name for files that already exist

a756736

changelog

3da25b8

jackrobison mentioned this pull request Feb 13, 2018

Refactor file_manager, stream_info_manager #1020

Closed

jackrobison merged commit 3da25b8 into master Feb 13, 2018

lyoshenka deleted the database-refactor branch February 23, 2018 18:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor databases #1025

refactor databases #1025

jackrobison commented Dec 12, 2017

lyoshenka commented Dec 13, 2017

jackrobison commented Dec 13, 2017

lyoshenka Dec 15, 2017

lyoshenka Jan 29, 2018

jackrobison Jan 29, 2018 •

edited

kauffj commented Dec 19, 2017

lyoshenka Jan 29, 2018

lyoshenka Jan 29, 2018

eukreign left a comment

eukreign Jan 29, 2018

jackrobison Feb 12, 2018

eukreign Jan 29, 2018

jackrobison Feb 12, 2018

eukreign Jan 29, 2018

eukreign Jan 29, 2018

eukreign Jan 29, 2018

jackrobison Feb 12, 2018 •

edited

eukreign Jan 29, 2018

eukreign Jan 29, 2018

eukreign Jan 29, 2018

eukreign Jan 29, 2018

eukreign Jan 29, 2018

jackrobison Feb 12, 2018

eukreign left a comment

eukreign Feb 6, 2018

jackrobison Feb 12, 2018

eukreign Feb 12, 2018

eukreign Feb 6, 2018

refactor databases #1025

refactor databases #1025

Conversation

jackrobison commented Dec 12, 2017

lyoshenka commented Dec 13, 2017

jackrobison commented Dec 13, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jackrobison Jan 29, 2018 • edited

Choose a reason for hiding this comment

kauffj commented Dec 19, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eukreign left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jackrobison Feb 12, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eukreign left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jackrobison Jan 29, 2018 •

edited

jackrobison Feb 12, 2018 •

edited