Streamreader #421

mgautierfr · 2020-09-16T09:07:16Z

This PR is equivalent to #411 and #420.

It mainly reuses the commits of @veloman-yunkan but architects the code differently:

Reuse the simplification of Buffer (but do not merge it with Blob).
Reuse IDataStream, DecodedDataStream and ReaderDataStreamWrapper, but rename them to IStreamReader, DecoderStreamReader and RawStreamReader.
Reuse BufDataStream, but rename it to BufferStreamer and don't make it inherit from IStreamReader.

IStreamReader allows getting a Reader instead of a Buffer.
This lets the uncompressed cluster sequentially get (and store) a Reader for each blob in the cluster without copying any data.

The commits are more or less a succession of @veloman-yunkan's commits from #411/#420, left unchanged, each followed by a commit adapting the code to this PR's design.

The `SharedBuffer` name is pretty badly chosen, but the plan is to make it replace `Buffer` so the name is only temporary.

codecov · 2020-09-16T16:19:29Z

Codecov Report

Merging #421 into master will increase coverage by 0.22%.
The diff coverage is 90.47%.

@@            Coverage Diff             @@
##           master     #421      +/-   ##
==========================================
+ Coverage   47.17%   47.39%   +0.22%     
==========================================
  Files          66       73       +7     
  Lines        3305     3317      +12     
  Branches     1422     1426       +4     
==========================================
+ Hits         1559     1572      +13     
+ Misses       1746     1742       -4     
- Partials        0        3       +3

Impacted Files	Coverage Δ
include/zim/fileheader.h	`100.00% <ø> (ø)`
src/file_reader.h	`66.66% <ø> (-33.34%)`	⬇️
src/reader.h	`100.00% <ø> (ø)`
src/fileimpl.cpp	`80.33% <38.46%> (+0.33%)`	⬆️
src/istreamreader.h	`57.14% <57.14%> (ø)`
include/zim/blob.h	`91.66% <66.66%> (ø)`
src/buffer.h	`60.00% <75.00%> (-24.22%)`	⬇️
src/file_reader.cpp	`69.00% <82.60%> (+1.67%)`	⬆️
src/buffer_reader.cpp	`85.00% <85.00%> (ø)`
src/decoderstreamreader.h	`96.15% <96.15%> (ø)`
... and 21 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 79e6500...004afcb.

`Buffer` is no longer an abstract class but the real class containing the data. It is mostly a (size-aware) wrapper around a `shared_ptr`.

We don't need to keep a reference to the `Buffer` to keep the data alive. We can simply use the internal `shared_ptr`.

As we now use an internal `shared_ptr` to properly free the buffer memory (and make `Blob` do the same instead of using a `shared_ptr<Buffer>`), we can directly use a `Buffer` instead of a `shared_ptr<Buffer>`.
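To make the ownership model above concrete, here is a minimal sketch of a size-aware `shared_ptr` wrapper. It is not the libzim code: the factory names `view` and `copyOf`, the method `sub_buffer` and the helper `survivingSlice` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the PR's Buffer: a shared_ptr plus a size.
class Buffer {
public:
  using DataPtr = std::shared_ptr<const char>;

  // Non-owning view: the no-op deleter leaves the memory to the caller.
  static Buffer view(const char* data, std::size_t size) {
    return Buffer(DataPtr(data, [](const char*) {}), size);
  }

  // Owning copy: the storage is released when the last holder goes away.
  static Buffer copyOf(const char* data, std::size_t size) {
    char* storage = new char[size];
    std::memcpy(storage, data, size);
    return Buffer(DataPtr(storage, [](const char* p) { delete[] p; }), size);
  }

  // A slice that shares ownership with this buffer (aliasing constructor);
  // no bytes are copied.
  Buffer sub_buffer(std::size_t offset, std::size_t size) const {
    assert(offset <= m_size && size <= m_size - offset);
    return Buffer(DataPtr(m_data, m_data.get() + offset), size);
  }

  const char* data() const { return m_data.get(); }
  std::size_t size() const { return m_size; }

private:
  Buffer(DataPtr data, std::size_t size)
    : m_data(std::move(data)), m_size(size) {}

  DataPtr m_data;
  std::size_t m_size;
};

// Illustration helper: the slice outlives the Buffer it was cut from.
std::string survivingSlice() {
  Buffer slice = Buffer::view("", 0);
  {
    const Buffer whole = Buffer::copyOf("hello world", 11);
    slice = whole.sub_buffer(6, 5);
  }  // `whole` is destroyed here; the shared storage is not.
  return std::string(slice.data(), slice.size());
}
```

Because the slice holds its own `shared_ptr`, nothing has to keep the original `Buffer` object around, which is the point made above about `Blob` not needing a `shared_ptr<Buffer>`.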

- Rename `IDataStream` to `IStreamReader`.
- `IStreamReader` allows getting a (sub)reader instead of an `IDataStream::Blob`. While it may seem a bit odd, it allows the stream reader to delay the actual read until we really want the data.

BufferStreamer will not be used "in place" of an IStreamReader and it doesn't share the same methods, so we don't need it to inherit from `IStreamReader`.
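As a rough illustration of why such a class can stand alone, the sketch below shows a cursor over bytes that are already in memory, with no virtual interface. The constructor signature, the method names `read`, `skip` and `left`, and the little-endian decoding are assumptions for this example, not the actual libzim declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical sketch: a plain cursor over memory we already hold.
// It has no base class because nothing needs to use it polymorphically.
class BufferStreamer {
public:
  BufferStreamer(const char* data, std::size_t size)
    : m_current(data), m_left(size) {}

  // Decode an unsigned integer stored least-significant byte first
  // (assumed byte order) and advance past it.
  template <typename T>
  T read() {
    const char* bytes = m_current;
    skip(sizeof(T));  // throws before any byte is decoded
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      const T byte = static_cast<T>(static_cast<unsigned char>(bytes[i]));
      value = static_cast<T>(value | (byte << (8 * i)));
    }
    return value;
  }

  // Move the cursor forward without decoding anything.
  void skip(std::size_t n) {
    if (n > m_left) {
      throw std::out_of_range("BufferStreamer: read past end of buffer");
    }
    m_current += n;
    m_left -= n;
  }

  std::size_t left() const { return m_left; }

private:
  const char* m_current;
  std::size_t m_left;
};
```

A parser for a fixed-layout structure such as a header can then call `read<std::uint32_t>()`, `read<std::uint16_t>()` and so on in field order, without tracking offsets by hand.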

The cluster now internally uses an `IStreamReader` to read the cluster data. As we are reading the data sequentially, we have to store the previous blobs' data in a vector. As we don't want to read/mmap the data of blobs we won't read, we don't store a buffer but a reader. Depending on the `IStreamReader`, the stored readers may be:
- A BufferReader (on the decompressed data). We have no choice here: we need to store the data if we don't want to decompress it twice.
- A FileReader (on uncompressed data). This is a lightweight object with just an offset to the data to read. The data will be read only when we really want to access the blob.
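The two cases can be sketched as follows. Every class here is a simplified, hypothetical stand-in: `OffsetReader` plays the role of a FileReader over an in-memory string, `CopyingStreamReader` fakes the decompressing stream, and `collectBlobReaders` is an invented helper name for the loop the cluster performs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the classes named in this PR.
struct Reader {
  virtual ~Reader() = default;
  virtual std::size_t size() const = 0;
  virtual std::string read() const = 0;  // the actual data access
};

// Owns its bytes, as needed for data that came out of a decompressor.
class BufferReader : public Reader {
public:
  explicit BufferReader(std::string data) : m_data(std::move(data)) {}
  std::size_t size() const override { return m_data.size(); }
  std::string read() const override { return m_data; }
private:
  std::string m_data;
};

// Remembers only where the bytes live (plays the role of a FileReader).
class OffsetReader : public Reader {
public:
  OffsetReader(std::shared_ptr<const std::string> source,
               std::size_t offset, std::size_t size)
    : m_source(std::move(source)), m_offset(offset), m_size(size) {}
  std::size_t size() const override { return m_size; }
  std::string read() const override {
    return m_source->substr(m_offset, m_size);
  }
private:
  std::shared_ptr<const std::string> m_source;
  std::size_t m_offset;
  std::size_t m_size;
};

struct IStreamReader {
  virtual ~IStreamReader() = default;
  // Hand out a reader on the next `size` bytes and move past them.
  virtual std::unique_ptr<Reader> sub_reader(std::size_t size) = 0;
};

// Uncompressed case: no bytes are touched until Reader::read() is called.
class RawStreamReader : public IStreamReader {
public:
  explicit RawStreamReader(std::shared_ptr<const std::string> source)
    : m_source(std::move(source)) {}
  std::unique_ptr<Reader> sub_reader(std::size_t size) override {
    std::unique_ptr<Reader> reader(new OffsetReader(m_source, m_offset, size));
    m_offset += size;
    return reader;
  }
private:
  std::shared_ptr<const std::string> m_source;
  std::size_t m_offset = 0;
};

// Compressed case (decompression itself is faked here): the stream only
// moves forward, so each blob's bytes are copied out as they go by.
class CopyingStreamReader : public IStreamReader {
public:
  explicit CopyingStreamReader(std::string decoded)
    : m_decoded(std::move(decoded)) {}
  std::unique_ptr<Reader> sub_reader(std::size_t size) override {
    std::unique_ptr<Reader> reader(
        new BufferReader(m_decoded.substr(m_offset, size)));
    m_offset += size;
    return reader;
  }
private:
  std::string m_decoded;
  std::size_t m_offset = 0;
};

// What the cluster does: walk the stream once, keep one reader per blob.
std::vector<std::unique_ptr<Reader>> collectBlobReaders(
    IStreamReader& stream, const std::vector<std::size_t>& blobSizes) {
  std::vector<std::unique_ptr<Reader>> readers;
  for (std::size_t size : blobSizes) {
    readers.push_back(stream.sub_reader(size));
  }
  return readers;
}
```

The calling code is identical in both cases; only the kind of reader stored in the vector differs, and with it whether the bytes were copied while walking the stream or are fetched later on access.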

veloman-yunkan

The duplication between Blob and Buffer is an unjustified redundancy at this point.

There seems to be only one serious issue that can lead to a bug in the future (the const_cast in Buffer::data()). Otherwise, this is just an uglified version of my implementation 😄

src/buffer.cpp

veloman-yunkan · 2020-09-16T17:42:18Z

src/buffer.h

+    static const Buffer makeBuffer(const char* data, zsize_t size);
+    static const Buffer makeBuffer(const DataPtr& data, zsize_t size);


const qualifiers on return types offer no real protection if we talk about return-by-value of a type with reference semantics
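A small sketch of the point, using an invented `Handle` type rather than the PR's `Buffer`: when copies share their underlying storage, a `const` return value stops nothing, because the caller can copy it into a non-const object and write through the copy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type with reference semantics: copies share one int.
class Handle {
public:
  explicit Handle(int value) : m_value(std::make_shared<int>(value)) {}
  int get() const { return *m_value; }
  void set(int value) { *m_value = value; }  // non-const: mutates shared state
private:
  std::shared_ptr<int> m_value;
};

// The const here only qualifies the temporary that is returned.
const Handle makeHandle(int value) { return Handle(value); }

int mutateThroughCopy() {
  const Handle locked = makeHandle(1);
  // locked.set(2);        // would not compile: set() is non-const
  Handle alias = locked;   // copying from a const object is allowed...
  alias.set(2);            // ...and the copy writes to the same storage
  return locked.get();     // the "const" object now reports 2
}
```

Protecting the data would need const on the pointee type (for example a `shared_ptr<const int>` member), not on the returned wrapper.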

veloman-yunkan

One issue from the previous review (#421 (comment)) has not been fixed.

mgautierfr · 2020-09-23T12:18:10Z

All issues should be fixed.
I've also merged the three write_to_buffer test functions into one.

veloman-yunkan

Approving these changes wouldn't be sincere of me, so please let me simply state that I don't see in this PR any issues that should block it from being merged.

kelson42 · 2020-09-23T13:25:22Z

@veloman-yunkan Thx
@mgautierfr Any chance to get the split of compressed/uncompressed clusters that @veloman-yunkan did in another PR? It seems there is agreement that this is a good approach?

kelson42 · 2020-09-23T13:30:18Z

@mgautierfr These 3 small cosmetic things are reported here: https://www.codefactor.io/repository/github/openzim/libzim/pull/421

mgautierfr · 2020-09-23T13:59:29Z

> @mgautierfr These 3 small cosmetic things are reported here: https://www.codefactor.io/repository/github/openzim/libzim/pull/421

I can't see what the issues are. Can you?

mgautierfr and others added 3 commits September 16, 2020 16:56

Introduce SharedBuffer.

bd6b210

Dropped MemoryViewBuffer

da21218

Dropped MemoryBuffer

2767c04

mgautierfr force-pushed the streamreader branch from 3e97db7 to 744bc97 Compare September 16, 2020 16:16

veloman-yunkan and others added 17 commits September 16, 2020 18:34

Dropped MMapBuffer

269c942

Remove SharedBuffer and make Buffer the only class to contain data.

fe9754f

Blob do not depend of Buffer.

6382f68

Do not use external shared_ptr to keep buffer memory alive.

2a025ec

Introduced zim::IDataStream

f19fd25

IStreamReader allow to get a reader.

86ef980

Enter DecodedDataStream

a4ed832

Adapt DecoderStreamReader to wrap a Reader instead of a InputStream.

695fb9f

zim::ReaderDataStreamWrapper

227df39

Adapt RawStreamReader to wrap a reader.

9c469f8

Enter BufDataStream

b8f3eb7

Adapt BufferStreamer to wrap a Buffer instead of raw data.

d796085

Got rid of read_size() in cluster.cpp

480780a

Make Dirent use BufferStreamer.

76c60b4

Make FileHeader use BufferStreamer.

9d358d4

Faster Blob/Buffer constructor for non-owned data case

8b83dc1

mgautierfr force-pushed the streamreader branch from 744bc97 to 8b83dc1 Compare September 16, 2020 16:34

mgautierfr requested a review from veloman-yunkan September 16, 2020 16:36

veloman-yunkan reviewed Sep 16, 2020

mgautierfr added 5 commits September 17, 2020 11:33

fixup! Adapt BufferStreamer to wrap a Buffer instead of raw data.

04c4020

fixup! Adapt DecoderStreamReader to wrap a Reader instead of a InputStream.

39533c7

fixup! Adapt RawStreamReader to wrap a reader

04843d9

Move BufferReader to its own file.

4672b19

Remove Buffer.as method.

b3e64fe

This was linked to issues Sep 17, 2020

Do partial cluster decompression #78

Closed

Streaming decompression #394

Closed

mgautierfr requested a review from veloman-yunkan September 21, 2020 09:34

veloman-yunkan reviewed Sep 22, 2020

mgautierfr added 3 commits September 23, 2020 14:02

fixup! Do not use external shared_ptr to keep buffer memo

8a816f2

Rename tempfile.(cpp|h) to tools.(cpp|h)

12218e2

Move write_to_buffer test function to a generic helper function.

f5e682d

This avoids code duplication in tests.

veloman-yunkan reviewed Sep 23, 2020

Remove a few useless empty lines

004afcb

kelson42 merged commit 2b4558b into master Sep 23, 2020

kelson42 deleted the streamreader branch September 23, 2020 17:23

This was referenced Sep 23, 2020

De-Buffer-ization of libzim #420

Closed

Partial/incremental decompression of clusters #411

Closed
