Add SstFileReader to read sst files #4717

Closed
wants to merge 2 commits into from

huachaohuang

A user-friendly sst file reader is useful when we want to access sst
files outside of RocksDB. For example, we can generate an sst file
with SstFileWriter and send it to other places, then use SstFileReader
to read the file and process the entries in other ways.

Also rename the original SstFileReader to SstFileDumper because of the
name conflict; SstFileDumper also seems more appropriate for tools.

TODO: there is only a very simple test now, because I want to get some feedback first.
If the changes look good, I will add more tests soon.

@ajkr ajkr left a comment

lgtm, a couple questions, particularly about the decision to use snapshot in the API

kMaxSequenceNumber;
auto internal_iter = r->table_reader->NewIterator(
options, r->moptions.prefix_extractor.get());
return NewDBIterator(r->options.env, options, r->ioptions, r->moptions,

interesting, so you don't need to expose sequence number?

const Slice& key, PinnableSlice* value);

// Returns a new iterator over the table contents.
Iterator* NewIterator(const ReadOptions& options);

@ajkr ajkr Nov 26, 2018

maybe document how it returns only latest keys unless ReadOptions::snapshot is set to a live snapshot

Do you plan to use this API for reading from snapshots? If so is this API easier for you compared to having an iterator that returns internal keys?

@huachaohuang (Author)

@ajkr IMO, there are two kinds of requirements for reading from an sst file:

  1. We just want to read the valid key-values from the file without relying on the internal implementation. We can provide a more user-oriented interface and hide details like the sequence number or value type of an internal key.

  2. We want to know everything about the file. In this case, we have to know about the internal implementation like InternalIterator and other things from dbformat.h. So for this requirement, we can use the TableReader API directly, which gives us more control.

This PR tries to provide the API for the first use case. As for whether to support snapshot reads, I don't need that personally because I only need to read from sst files generated by SstFileWriter. I handle snapshots here to make SstFileReader more general and more consistent with how we read from a DB.
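
To make the difference between the two use cases concrete, here is a minimal, self-contained sketch of what the user-oriented view does conceptually. It is a toy model, not RocksDB code: `InternalEntry`, `EntryType`, `kNoSnapshot`, `UserView`, and `SampleEntries` are hypothetical names invented for illustration. In the PR itself this job is delegated to the iterator returned by `NewDBIterator`, wrapped around the table reader's internal iterator.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT RocksDB types.
enum class EntryType { kValue, kDeletion };

// Roughly what use case 2 would see: every version of every key,
// including its sequence number and value type.
struct InternalEntry {
  std::string user_key;
  uint64_t sequence;  // larger means newer
  EntryType type;
  std::string value;  // empty for deletions
};

// Stand-in for "no snapshot": every entry is visible.
constexpr uint64_t kNoSnapshot = std::numeric_limits<uint64_t>::max();

// Use case 1: collapse internal entries into user-visible key-values.
// For each user key, pick the newest entry whose sequence is
// <= snapshot_seq, and omit the key if that entry is a deletion.
std::map<std::string, std::string> UserView(
    const std::vector<InternalEntry>& entries,
    uint64_t snapshot_seq = kNoSnapshot) {
  std::map<std::string, const InternalEntry*> newest;
  for (const InternalEntry& e : entries) {
    if (e.sequence > snapshot_seq) continue;  // written after the snapshot
    auto it = newest.find(e.user_key);
    if (it == newest.end() || it->second->sequence < e.sequence) {
      newest[e.user_key] = &e;
    }
  }
  std::map<std::string, std::string> view;
  for (const auto& kv : newest) {
    if (kv.second->type == EntryType::kValue) {
      view[kv.first] = kv.second->value;
    }
  }
  return view;
}

// Small sample "file": key "a" is overwritten, key "b" is deleted.
std::vector<InternalEntry> SampleEntries() {
  return {
      {"a", 1, EntryType::kValue, "a1"},
      {"a", 3, EntryType::kValue, "a3"},
      {"b", 2, EntryType::kValue, "b2"},
      {"b", 4, EntryType::kDeletion, ""},
  };
}
```

With the sample entries, `UserView(SampleEntries())` contains only `a -> a3`, because the newest entry for `b` is a deletion. `UserView(SampleEntries(), 2)` contains `a -> a1` and `b -> b2`, which mirrors reading at an older snapshot. The caller never handles sequence numbers or value types directly, which is the point of the first use case.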

@ajkr ajkr left a comment

Thanks for the explanation @huachaohuang. I guess there's no harm in supporting snapshot reads, even if they won't be used right now.

@facebook-github-bot facebook-github-bot left a comment

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@huachaohuang huachaohuang deleted the sst-file-reader branch November 28, 2018 02:47
DorianZheng pushed a commit to tikv/rocksdb that referenced this pull request Nov 28, 2018
Pull Request resolved: facebook#4717

Differential Revision: D13212686

Pulled By: ajkr

fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56