Streamreader #421
The `SharedBuffer` name is pretty badly chosen, but the plan is to make it replace `Buffer` so the name is only temporary.
3e97db7 to 744bc97
Codecov Report
@@ Coverage Diff @@
## master #421 +/- ##
==========================================
+ Coverage 47.17% 47.39% +0.22%
==========================================
Files 66 73 +7
Lines 3305 3317 +12
Branches 1422 1426 +4
==========================================
+ Hits 1559 1572 +13
+ Misses 1746 1742 -4
- Partials 0 3 +3
`Buffer` is no longer an abstract class but the concrete class containing the data. It is mostly a (size-aware) wrapper around a `shared_ptr`.
We don't need to keep a reference to the `Buffer` to keep the data alive. We can simply use the internal `shared_ptr`.
As we now use an internal `shared_ptr` to properly free the buffer memory (and make `Blob` do the same instead of using a `shared_ptr<Buffer>`), we can directly use a `Buffer` instead of a `shared_ptr<Buffer>`.
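A minimal sketch of the idea described in these commit messages, not the actual libzim code: `Buffer` is a value type that pairs a size with a `shared_ptr` to the bytes, and `Blob` keeps the same `shared_ptr` alive. The members shown here (`dataPtr()`, the `Blob` constructor, the helper `makeBlob`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical, simplified types; the real libzim classes differ in detail.
using DataPtr = std::shared_ptr<const char>;

class Buffer {
public:
  // Copy `size` bytes; they are freed when the last DataPtr goes away.
  static Buffer makeBuffer(const char* src, std::size_t size) {
    assert(src != nullptr);
    char* raw = new char[size];
    std::memcpy(raw, src, size);
    return Buffer(DataPtr(raw, [](const char* p) { delete[] p; }), size);
  }

  const char* data() const { return m_data.get(); }
  std::size_t size() const { return m_size; }
  const DataPtr& dataPtr() const { return m_data; }

private:
  Buffer(DataPtr data, std::size_t size)
    : m_data(std::move(data)), m_size(size) {}

  DataPtr m_data;      // shared ownership of the bytes
  std::size_t m_size;  // the "size aware" part
};

// Blob shares the bytes through the internal pointer, not through the Buffer.
class Blob {
public:
  explicit Blob(const Buffer& b) : m_data(b.dataPtr()), m_size(b.size()) {}
  const char* data() const { return m_data.get(); }
  std::size_t size() const { return m_size; }

private:
  DataPtr m_data;
  std::size_t m_size;
};

Blob makeBlob(const char* text) {
  Buffer buffer = Buffer::makeBuffer(text, std::strlen(text));
  return Blob(buffer);  // `buffer` is destroyed here; the bytes live on in the Blob
}
```

Because the ownership lives in the inner pointer, a `Buffer` can be passed and stored by value without an extra `shared_ptr<Buffer>` layer.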
- Rename `IDataStream` to `IStreamReader`.
- `IStreamReader` returns a (sub)reader instead of an `IDataStream::Blob`. While it may seem a bit odd, it allows the stream reader to delay the actual read until we really want the data.
`BufferStreamer` will not be used in place of an `IStreamReader` and it doesn't share the same methods, so we don't need it to inherit from `IStreamReader`.
The cluster now internally uses an `IStreamReader` to read the cluster data. As we are reading the data sequentially, we have to store the previous blobs' data in a vector. As we don't want to read/mmap the data of blobs we won't read, we don't store a buffer but a reader. Depending on the `IStreamReader`, the stored readers may be:
- A `BufferReader` (on the decompressed data). We have no choice here: we need to store the data if we don't want to decompress it twice.
- A `FileReader` (on uncompressed data). This is a lightweight object with just an offset to the data to read. The data will be read only when we really want to access the blob.
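The two kinds of stored readers can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the `Reader` interface, `FakeFile` (an in-memory stand-in for a file that counts fetched bytes) and `indexBlobs` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the classes discussed in this PR.
struct Reader {
  virtual ~Reader() = default;
  virtual std::string read() const = 0;  // materialize the blob's bytes
};

// Compressed case: the decompressed bytes must be kept, so the reader owns them.
class BufferReader : public Reader {
public:
  explicit BufferReader(std::string bytes) : m_bytes(std::move(bytes)) {}
  std::string read() const override { return m_bytes; }
private:
  std::string m_bytes;
};

// Stand-in for a file on disk; counts how many bytes were actually fetched.
struct FakeFile {
  std::string content;
  mutable std::size_t bytesRead = 0;
};

// Uncompressed case: only remembers where the blob lives.
class FileReader : public Reader {
public:
  FileReader(std::shared_ptr<const FakeFile> file, std::size_t offset, std::size_t size)
    : m_file(std::move(file)), m_offset(offset), m_size(size) {
    assert(m_offset + m_size <= m_file->content.size());
  }
  std::string read() const override {
    m_file->bytesRead += m_size;  // the fetch happens only now
    return m_file->content.substr(m_offset, m_size);
  }
private:
  std::shared_ptr<const FakeFile> m_file;
  std::size_t m_offset;
  std::size_t m_size;
};

// Walk the blob sizes sequentially and store one lightweight reader per blob.
std::vector<std::unique_ptr<Reader>> indexBlobs(std::shared_ptr<const FakeFile> file,
                                                const std::vector<std::size_t>& sizes) {
  std::vector<std::unique_ptr<Reader>> readers;
  std::size_t offset = 0;
  for (std::size_t size : sizes) {
    readers.push_back(std::unique_ptr<Reader>(new FileReader(file, offset, size)));
    offset += size;
  }
  return readers;
}
```

Indexing the blobs touches none of the file content; only the blob that is eventually accessed pays for a read.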
744bc97 to 8b83dc1
The duplication between `Blob` and `Buffer` is an unjustified redundancy at this point.
There seems to be only one serious issue that can lead to a bug in the future (the `const_cast` in `Buffer::data()`). Otherwise, this is just an uglified version of my implementation 😄
static const Buffer makeBuffer(const char* data, zsize_t size);
static const Buffer makeBuffer(const DataPtr& data, zsize_t size);
`const` qualifiers on return types offer no real protection if we talk about return-by-value of a type with reference semantics.
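To illustrate the point with a sketch (the `Handle` type and both functions below are hypothetical, not code from this PR): when a type shares its state between copies, a `const` return value can simply be copied into a non-const object, and writes through that copy reach the shared state.

```cpp
#include <cassert>
#include <memory>

// Hypothetical handle type with reference semantics: copies share one int.
class Handle {
public:
  explicit Handle(int v) : m_value(std::make_shared<int>(v)) {}
  int get() const { return *m_value; }
  void set(int v) { *m_value = v; }  // non-const: mutates the shared state
private:
  std::shared_ptr<int> m_value;
};

// The const here only qualifies the returned temporary.
const Handle makeHandle(const Handle& src) { return src; }

int mutateThroughCopy(const Handle& original) {
  Handle copy = makeHandle(original);  // initializing a non-const object drops the const
  copy.set(42);                        // writes the state shared with `original`
  assert(original.get() == 42);        // the "const" original has changed
  return original.get();
}
```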
One issue from the previous review (#421 (comment)) has not been fixed.
All issues should be fixed.
Approving these changes wouldn't be sincere of me, so please let me simply state that I don't see in this PR any issues that should block it from being merged.
@veloman-yunkan Thx
@mgautierfr There are 3 small cosmetic things reported here: https://www.codefactor.io/repository/github/openzim/libzim/pull/421
I can't see what the issues are. Can you?
This PR is equivalent to #411 and #420.
It mainly reuses the commits of @veloman-yunkan but architects the code differently:
- `IDataStream`, `DecodeDataStream` and `ReaderStreamWrapper` are renamed to `IStreamReader`, `DecoderStreamReader` and `RawStreamReader`.
- `BufDataStream` is renamed to `BufferStreamer` and does not inherit from `IStreamReader`.
- `IStreamReader` returns a `Reader` instead of a `Buffer`.
This allows the uncompressed cluster to sequentially get (and store) a `Reader` for each blob in the cluster without copying any data.
The commits are more or less a succession of @veloman-yunkan's commits from #411/#420 without change, followed by a commit to adapt the code to this PR's design.