Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamreader #421

Merged
merged 29 commits into from
Sep 23, 2020
Merged

Streamreader #421

merged 29 commits into from
Sep 23, 2020

Conversation

mgautierfr
Copy link
Collaborator

@mgautierfr mgautierfr commented Sep 16, 2020

This PR is equivalent to #411 and #420.

It mainly reuse the commits of @veloman-yunkan but architecture the code differently.

  • Reuse the simplification of Buffer (but do not merge it with Blob).
  • Reuse the IDataStream, DecodeDataStream and ReaderStreamWrapper. But rename it as IStreamReader, DecoderStreamReader and RawStreamReader.
  • Reuse BufDataStream, but rename it BufferStreamer and don't make it inherit from ISrtreamReader.

IStreamReader allow to get Reader instead of a Buffer.
It allow the uncompressed cluster to sequentially get (and store) a Reader for each blob in the cluster with copying any data.

The commits are somehow a succession of @veloman-yunkan's commits from #411/#420 without change followed by a commit to adapt the code to this PR's design.

mgautierfr and others added 3 commits September 16, 2020 16:56
The `SharedBuffer` name is pretty badly chosen, but the plan is to make
it replace `Buffer` so the name is only temporary.
@codecov
Copy link

codecov bot commented Sep 16, 2020

Codecov Report

Merging #421 into master will increase coverage by 0.22%.
The diff coverage is 90.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #421      +/-   ##
==========================================
+ Coverage   47.17%   47.39%   +0.22%     
==========================================
  Files          66       73       +7     
  Lines        3305     3317      +12     
  Branches     1422     1426       +4     
==========================================
+ Hits         1559     1572      +13     
+ Misses       1746     1742       -4     
- Partials        0        3       +3     
Impacted Files Coverage Δ
include/zim/fileheader.h 100.00% <ø> (ø)
src/file_reader.h 66.66% <ø> (-33.34%) ⬇️
src/reader.h 100.00% <ø> (ø)
src/fileimpl.cpp 80.33% <38.46%> (+0.33%) ⬆️
src/istreamreader.h 57.14% <57.14%> (ø)
include/zim/blob.h 91.66% <66.66%> (ø)
src/buffer.h 60.00% <75.00%> (-24.22%) ⬇️
src/file_reader.cpp 69.00% <82.60%> (+1.67%) ⬆️
src/buffer_reader.cpp 85.00% <85.00%> (ø)
src/decoderstreamreader.h 96.15% <96.15%> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79e6500...004afcb. Read the comment docs.

veloman-yunkan and others added 17 commits September 16, 2020 18:34
`Buffer` is no more a abstract class but the real class containing the
data.
This is mostly a (size aware) wrapper around a `shared_ptr`.
We don't need to keep a reference to the `Buffer` to keep the data
alive.
We can simply use the internal `shared_ptr`.
As we now use a internal `shared_ptr` to properly free the buffer memory
(and make `Blob` do the same instead of using a shared_ptr<Buffer>),
we can directly use a `Buffer` instead of a `shared_ptr<Buffer>`
- Rename `IDataStream` to `IStreamReader`
- `IStreamReader` allow to get a (sub)reader instead of a `IDataStream::Blob`.
  While it may seem a bit odd, it allows streamReader to delay the
  actual read when we really want the data.
BufferStreamer will not be use "in place" of a IStreamReader and it doesn't
share the same methods.
So we don't need it to inherint the `IStreamReader`.
The cluster now internally use a `IStreamReader` to read the cluster data.
As we are reading the data sequentially, we have to store the previous
blob's data in a vector.

As we don't want to read/mmap the data of the blob we won't read, we don't
store a buffer but a reader.
Depending of the `IStreamReader`, the stored readers may be :
- A BufferReader (on the decompressed data). We have no choice here, we
  need to store the data if we don't want to decompress the data twice.
- A FileReader (on uncompressed data). This is a lightweight object with
  just a offset to the data to read. The data will be read only when we
  really want to access the blob.
Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication between Blob and Buffer is an unjustified redundancy at this point.

There seems to be only one serious issue that can lead to a bug in the future (the const_cast in Buffer::data()). Otherwise, this is just an uglified version of my implementation 😄

src/buffer.cpp Outdated Show resolved Hide resolved
Comment on lines +41 to +42
static const Buffer makeBuffer(const char* data, zsize_t size);
static const Buffer makeBuffer(const DataPtr& data, zsize_t size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const qualifiers on return types offer no real protection if we talk about return-by-value of a type with reference semantics

src/istreamreader.cpp Outdated Show resolved Hide resolved
src/istreamreader.cpp Outdated Show resolved Hide resolved
src/decoderstreamreader.h Show resolved Hide resolved
src/bufferstreamer.h Outdated Show resolved Hide resolved
src/decoderstreamreader.h Outdated Show resolved Hide resolved
src/rawstreamreader.h Outdated Show resolved Hide resolved
src/buffer.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One [issue from the previous review] (#421 (comment)) has not been fixed

@mgautierfr
Copy link
Collaborator Author

All issues should be fixed.
I've also move the three write_to_buffer test functions into one.

Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving these changes wouldn't be sincere of me, so please let me simply state that I don't see in this PR any issues that should block it from being merged.

@kelson42
Copy link
Contributor

@veloman-yunkan Thx
@mgautierfr Any chance to get the split of compressed/uncompressed clusters @veloman-yunkan did in an other PR? It seems there is an agreement that this is a good approach?

@kelson42
Copy link
Contributor

@mgautierfr This 3 small comestic things reported here https://www.codefactor.io/repository/github/openzim/libzim/pull/421

@mgautierfr
Copy link
Collaborator Author

@mgautierfr This 3 small comestic things reported here codefactor.io/repository/github//pull/421

I can't see what are the issues. Can you ?

@kelson42 kelson42 merged commit 2b4558b into master Sep 23, 2020
@kelson42 kelson42 deleted the streamreader branch September 23, 2020 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming decompression Do partial cluster decompression
3 participants