-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor databases #1025
refactor databases #1025
Conversation
0df7d75
to
1defab6
Compare
@jackrobison who needs to review this? i glanced at it briefly and i still think more columns should be |
1defab6
to
fd0a7c3
Compare
@lyoshenka please check out fd0a7c3 , I rebased in some changes |
|
||
create table stream_blob ( | ||
stream_hash char(96) not null, | ||
blob_hash char(96), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my note disappeared somehow. blob_hash
should be not null. a stream_blob's purpose is to connect a stream and a blob. if there's no blob, there's no stream_blob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackrobison this still applies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to handle stream terminators, which don't have a blob hash but do have an iv, position, and length.
I think I don't see how an application using LBRY can re-render to a user what they downloaded without these fields, or an efficient way to select them again. |
fd0a7c3
to
599214e
Compare
52e9977
to
22af6eb
Compare
|
||
log = logging.getLogger(__name__) | ||
|
||
|
||
class DiskBlobManager(DHTHashSupplier): | ||
def __init__(self, hash_announcer, blob_dir, db_dir): | ||
def __init__(self, hash_announcer, blob_dir, storage): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to require a class or interface for arguments? i feel like storage
should be an interface with functions like "addBlobs", "getBlobs", etc. Then we can implement multiple types of storage (e.g. sqlite
for production, in-memory
for testing) and it's clear what the contract is.
A simple way of doing this now without too much refactoring is to create an interface that has all the functions that are already in use. Then we can pare it down later.
|
||
create table stream_blob ( | ||
stream_hash char(96) not null, | ||
blob_hash char(96), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackrobison this still applies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of my comments are structural but a lot of the other ones are code style, specifically where arguments go if they don't fit on the same line as the function call.
It would be nice to have a style guide for this.
Personally, I find putting arguments on a new line to be much more aesthetically pleasing and most of the time all of the arguments then fit on a single line, so it reads easier. With the "pan handle" approach (arguments line up with function call open parenthesis) you often end up having to overflow the arguments onto more lines, this issue compounds when the argument itself is long... prime case in this particular pull request is SQL statements which actually fit most of the time on a single line if you do the new-line-4-spaces-indent for the arguments.
So,
txid, nout = lbryfile_cursor.execute("select txid, n from lbry_file_metadata "
"where lbry_file=?",
(file_infos[stream_hash]['rowid'],)).fetchone()
vs.
txid, nout = lbryfile_cursor.execute(
"select txid, n from lbry_file_metadata where lbry_file=?", (file_infos[stream_hash]['rowid'],)
).fetchone()
I think the second option above looks nicer and works well in more cases than the "pan handle" approach.
lbrynet/core/StreamDescriptor.py
Outdated
return d | ||
|
||
d.addCallback(format_info) | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could above be broken down into two functions to make it easier to test? Something like:
@defer.inlineCallbacks
def format_sd_info(stream_info, include_blobs):
fields = {
'stream_type': EncryptedFileStreamType,
'stream_name': stream_info[1],
'key': stream_info[0],
'suggested_file_name': stream_info[2],
'stream_hash': stream_hash
}
if include_blobs:
blobs = yield storage.get_blobs_for_stream(stream_hash)
formatted_blobs = []
for blob_info in blobs:
blob = {}
if blob_info.length != 0:
blob['blob_hash'] = str(blob_info.blob_hash)
blob['blob_num'] = blob_info.blob_num
blob['iv'] = str(blob_info.iv)
blob['length'] = blob_info.length
formatted_blobs.append(blob)
fields['blobs'] = formatted_blobs
defer.returnValue(fields)
@defer.inlineCallbacks
def get_sd_info(storage, stream_hash, include_blobs):
stream_info = yield storage.get_stream_info(stream_hash)
formatted_info = yield format_sd_info(stream_info, include_blobs)
defer.returnValue(formatted_info)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lbrynet/core/BlobManager.py
Outdated
@@ -75,7 +63,7 @@ def _immediate_announce(self, blob_hashes): | |||
def blob_completed(self, blob, next_announce_time=None, should_announce=True): | |||
if next_announce_time is None: | |||
next_announce_time = self.get_next_announce_time() | |||
yield self._add_completed_blob(blob.blob_hash, blob.length, | |||
yield self.storage.add_completed_blob(blob.blob_hash, blob.length, | |||
next_announce_time, should_announce) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't have a style guide for this yet but I've always personally preferred the arguments-on-new-line-indented-4-spaces whenever there are too many arguments to a function, this seems more consistent (indentation wise with the rest of the soruce) and looks nicer (less jagged and you can fit more arguments on a single line vs. trying to line it up with parenthesis):
yield self.storage.add_completed_blob(
blob.blob_hash, blob.length, next_announce_time, should_announce
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
lbrynet/core/BlobManager.py
Outdated
|
||
def set_should_announce(self, blob_hash, should_announce): | ||
if blob_hash in self.blobs: | ||
blob = self.blobs[blob_hash] | ||
if blob.get_is_verified(): | ||
return self._set_should_announce(blob_hash, | ||
return self.storage.set_should_announce(blob_hash, | ||
self.get_next_announce_time(), | ||
should_announce) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before about putting all arguments on a new line and indented 4 spaces.
return self.storage.set_should_announce(
blob_hash, self.get_next_announce_time(), should_announce
)
lbrynet/core/Session.py
Outdated
@@ -300,7 +302,7 @@ def _setup_other_components(self): | |||
else: | |||
self.blob_manager = DiskBlobManager(self.hash_announcer, | |||
self.blob_dir, | |||
self.db_dir) | |||
self.storage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before about all arguments on single new line below function call indented 4 spaces.
d = self.storage.setup() | ||
d.addCallback(lambda _: self.wallet.start()) | ||
d.addCallback(lambda _: self.blob_tracker.start()) | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't have negative performance side affect? Seems like before the manager setup and wallet start up happened in parallel and now they are serial. Just making sure I'm understanding this change correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't - setup for storage
is just creating the sqlite tables.
lbrynet/database/storage.py
Outdated
def get_sd_blob_hash_for_stream(self, stream_hash): | ||
d = self.db.runQuery("select sd_hash from stream where stream_hash=?", (stream_hash,)) | ||
d.addCallback(lambda r: None if not r else r[0][0]) | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use _get_one_or_none
.
lbrynet/database/storage.py
Outdated
def get_stream_hash_for_sd_hash(self, sd_blob_hash): | ||
d = self.db.runQuery("select stream_hash from stream where sd_hash = ?", (sd_blob_hash, )) | ||
d.addCallback(lambda r: None if not r else r[0][0]) | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use _get_one_or_none
.
lbrynet/database/storage.py
Outdated
def get_lbry_file_status(self, rowid): | ||
d = self.db.runQuery("select status from file where rowid = ?", (rowid,)) | ||
d.addCallback(lambda r: (r[0][0] if len(r) else None)) | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use _get_one_or_none
.
lbrynet/database/storage.py
Outdated
def get_rowid_for_stream_hash(self, stream_hash): | ||
d = self.db.runQuery("select rowid from file where stream_hash=?", (stream_hash, )) | ||
d.addCallback(lambda r: (r[0][0] if len(r) else None)) | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use _get_one_or_none
.
blob['length'] = blob_info.length | ||
formatted_blobs.append(blob) | ||
sd_blob['blobs'] = formatted_blobs | ||
return defer.succeed(sd_blob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of déjà vu here, is this function defined already somewhere else? lbrynet/core/StreamDescriptor.py
They may do something slightly different but the function names are identical and they appear to be very similar internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now the formatting is only in lbrynet/core/StreamDescriptor.py
569956c
to
706645a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is such a massive pull request :-) Difficult to review in one sitting!
Anyways, I added a couple of minor new things but there are still some comments from the last time I reviewed that are valid.
@@ -290,7 +290,7 @@ spelling-store-unknown-words=no | |||
[FORMAT] | |||
|
|||
# Maximum number of characters on a single line. | |||
max-line-length=100 | |||
max-line-length=120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking that maybe 120 is too much. In cases where you want to have two files side by side, even in PyCharm at slightly enlarged font (like mine), I can't comfortably fit two 120 lines side by side. 100 fits though... what do you think? Also, Python PEP8 suggests 100 as the max: https://www.python.org/dev/peps/pep-0008/#maximum-line-length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like 120 but don't particularly prefer it to 100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like 120 in general, but on the rare occasion that you want to have two files open at the same time, it helps a lot to have slightly more reasonable lines.
Actually, since you use the PyCharm git tools, I'd think you would prefer the shorter line length too since PyCharm shows diffs side by side.
out = yield EncryptedFileCreator.create_lbry_file( | ||
session, manager, filename, handle, key, iv_generator()) | ||
out = yield EncryptedFileCreator.create_lbry_file(self.session, manager, filename, handle, | ||
key, iv_generator()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reverse formatting operation back to the pan handle :-D
d82eae9
to
db1e444
Compare
-remove blob upload/download history
-remove CachedClaim and wallet storage classes
-remove lbrynet/lbry_file/StreamDescriptor.py
-remove stream info manager (DBEncryptedMetadataManager) -split `add_lbry_file` into separate `add_published_file` and `add_downloaded_file` functions -set the download path upon adding file to the db, use the source file path for publishes -remove the lbry file manager-wide download directory, set for each file individually -add claim `metadata`, `claim_name`, `claim_id`, `outpoint`, `txid`, `nout`, `channel_claim_id`, and `channel_name` attributes to EncryptedFileDownloader
7980834
to
68c906a
Compare
-also fixes a few wallet tests that weren't checking exceptions correctly
969cc12
to
16fe885
Compare
16fe885
to
e40d4f7
Compare
No description provided.