Skip to content
This repository has been archived by the owner on Jan 11, 2021. It is now read-only.

Update Thrift IO structs, add file sink #129

Merged
merged 4 commits into from
Jul 13, 2018

Conversation

sadikovi
Copy link
Collaborator

@sadikovi sadikovi commented Jul 8, 2018

This PR makes the following changes:

  • adds FileSink, which is a buffered wrapper on a std::fs::File and also tracks position without requiring the flush data into a file.
  • renames FileChunk into FileSource
  • removes TMemoryBuffer, as we do not really need it.

@sadikovi
Copy link
Collaborator Author

sadikovi commented Jul 8, 2018

@sunchao could you review this PR, please? I am open to suggestions for new names of structs!
Couple of questions that I would like to know your opinion on:

  • Should we use std::io::Cursor instead of FileSink, e.g. Cursor<BufReader<File>>?
  • Should we use std::io::Seek trait instead of Position trait?

Thanks!

@coveralls
Copy link

coveralls commented Jul 8, 2018

Pull Request Test Coverage Report for Build 547

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 33 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.04%) to 95.28%

Files with Coverage Reduction New Missed Lines %
util/io.rs 1 98.92%
encodings/encoding.rs 1 94.64%
schema/types.rs 6 96.68%
schema/parser.rs 10 96.04%
file/reader.rs 15 96.57%
Totals Coverage Status
Change from base Build 533: 0.04%
Covered Lines: 10861
Relevant Lines: 11399

💛 - Coveralls

@sunchao
Copy link
Owner

sunchao commented Jul 11, 2018

Thanks @sadikovi . I think you are right: Cursor<T> seems only useful when T is Vec<u8> or &[u8], so not quite useful for us. I'm inclined for Seek though since we'd eventually need that, and BufReader<File> already implements that. I'm not sure whether we need a trait for getting position - maybe just add a method to FileSink?

src/util/io.rs Outdated
use thrift::transport::TWriteTransport;

// ----------------------------------------------------------------------
// Read/write wrappers for `File`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Read/write -> Read/Write?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that, will fix.


/// Struct that represents `File` output stream with position tracking.
/// Used as a sink in file writer.
pub struct FileSink {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we named this FileSink, should we rename FileChunk to FileSource?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds good - will update.

src/util/io.rs Outdated
// Thrift IO.

/// A thin read-only wrapper on `T: Read` to be used by Thrift transport.
pub struct TInputBuffer<'a, T> where T: 'a + Read {
Copy link
Owner

@sunchao sunchao Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These (both TInputBuffer and TOutputBuffer) do not seem necessary anymore as we can pass Read and Write to Thrift transport.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I also made changes in the prototype branch to check it compiles. Will remove.

@sadikovi
Copy link
Collaborator Author

@sunchao I updated the PR. We do not use Thrift wrappers, and I replaced Position trait with built-in Seek implementation.

But we only support SeekFrom::Current(0) for FileSink, this means returning the current position in the sink. This is because we would have to flush all data to the file, in order to update position. Right now we just return an Err, if the offset requires cursor movements. (Technically we could support SeekFrom::Start(x), when 0 + x == current position, similar to SeekFrom::End(x), when end - x == current position).

Let me know if this is required, I will update the code.

@sunchao
Copy link
Owner

sunchao commented Jul 11, 2018

@sadikovi : could you explain the motivation for tracking cursor position without flushing the internal buffer to file? is this about sharing file handle among multiple sinks?

Also, could we also implement Seek for FileSource? this can be done via a separate PR though.

@sadikovi
Copy link
Collaborator Author

@sunchao My main idea was minimising number of flushes to a file. We need to extract position quite a few times in the writer, so I thought we should make sure that it does not result in any unnecessary IO. Note that we only need to know the current position, we do not seek in the writer.

Let me know what you think.

For FileSource, do we need to implement all SeekFrom::Start, SeekFrom::Current, SeekFrom::End?

@sunchao
Copy link
Owner

sunchao commented Jul 12, 2018

@sadikovi : I see. Sorry for the confusion. In that case, maybe we don't need Seek after all, since we now just use Seek to get the position - perhaps we just need to add a method pos() for the FileSink?

I can work on implementing Seek for FileSource: seems we just need to implement seek(&mut self, pos: SeekFrom).

@sadikovi
Copy link
Collaborator Author

Yes, you are right. I was thinking if we should instead add Position trait, so both FileSource and FileSeek could implement it. They both have a pointer to the current position. Let me know what you think.

The reason for me to have a trait was testing - I could use Cursor<Vec<u8>> for file sink in tests.

@sunchao
Copy link
Owner

sunchao commented Jul 13, 2018

Sounds good @sadikovi . I agree with you.

@sadikovi
Copy link
Collaborator Author

Okay, thanks. Will make the changes and let you know.

@sadikovi
Copy link
Collaborator Author

I updated the code:

  • Added Position trait that both FileSource and FileSink implement.
  • Made start and end in FileSource to be u64, because one could potentially start with large offset within the file. Note that end - start <= usize, so that it is safe to compare with buf.len().
  • Added some additional tests.

@sadikovi
Copy link
Collaborator Author

@sunchao Let me know if we need to change/update anything!

@sunchao sunchao merged commit 1a70843 into sunchao:master Jul 13, 2018
@sunchao
Copy link
Owner

sunchao commented Jul 13, 2018

Looks good @sadikovi . Merged. Thanks!

@sadikovi
Copy link
Collaborator Author

@sunchao Thanks for merging!

@sadikovi sadikovi deleted the refactor-thrift-io branch July 13, 2018 20:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants