Skip to content
This repository has been archived by the owner on Jan 11, 2021. It is now read-only.

Refactor Row #89

Merged
merged 5 commits into from
Apr 20, 2018
Merged

Refactor Row #89

merged 5 commits into from
Apr 20, 2018

Conversation

sunchao
Copy link
Owner

@sunchao sunchao commented Apr 15, 2018

This breaks the Row struct into two parts: a RowField
which represent a single field in a row, and a top-level Row
struct which represents the value for a message or struct type.
Only the latter is exposed as public API.

@coveralls
Copy link

coveralls commented Apr 16, 2018

Coverage Status

Coverage remained the same at 94.898% when pulling ef04ba8 on refactor-row into 9681558 on master.

@sadikovi
Copy link
Collaborator

Thanks for making the change!

First of all, sorry for the long and funky tests in record module that you had to change.
And for a conflicting files in this PR.

Looks good, I have a couple questions though.

Now Row represents the top level struct and RowField::Group represents the some internal struct. But they both would need getXXX methods then. Will it result in code duplication? By the way, are getXXX methods ones that are part of RowAccessor in another PR?

With the current approach we can map Row to some other structure, e.g. JSON struct. Is it still possible with this update?

@sadikovi
Copy link
Collaborator

Another thing - should we rename Row to something else? I just think it is a bit weird name for a record:).

@sunchao
Copy link
Owner Author

sunchao commented Apr 16, 2018

Thanks for taking a look @sadikovi .

Now Row represents the top level struct and RowField::Group represents the some internal struct. But they both would need getXXX methods then. Will it result in code duplication? By the way, are getXXX methods ones that are part of RowAccessor in another PR?

Only Row needs to implement the getXXX methods, and we don't need to change anything on RowField::Group (calling getStruct on a Row will yield another Row). And yes, these getXXX methods are the same as in the other PR.

With the current approach we can map Row to some other structure, e.g. JSON struct. Is it still possible with this update?

The mapping can still happen at the RowField level, which not much has changed. We just added a public Row API with a few getters.

Another thing - should we rename Row to something else? I just think it is a bit weird name for a record:).

I'm debating on Row or Record, and BTW, Spark also uses Row for similar purpose. Row also maps to row group. I don't like very much RowField though - maybe we can just call it Field or Column. I'm not quite sure yet...

Copy link
Collaborator

@sadikovi sadikovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it makes sense. It was a bit difficult for me to see it coming together in two separate PRs. As long as we can map Row enum and fields to something and get values, it is good.

Agree, we should just call it Field.

I left some minor comments. And sorry again for all those tests that you need to update.

fields: Vec<(String, RowField)>
}

pub fn make_row(fields: Vec<(String, RowField)>) -> Row {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some doc to the function? Thanks.
Would it be better to keep it as macro, since it is used in different places? Would it be inlined? If so, then it is fine.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll mark it inline and document it.

mod triplet;

pub use self::api::Row;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we export RowField as well or will it work with just Row?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not expose RowField (or Field). For primitive types, the getters will return Rust native types, such as i32, bool etc. We also need to define and expose clean-defined interfaces for array and map. The former should also have getters like getBoolean(rowIndex: usize) to return the value at a particular row, and the latter should have something like getKeys(), getValues(), etc.

I'll take a further look into this. It seems rather verbose if we have to define getXXX for every data type on struct, array, etc. I wonder if there's a cleaner way to do it.

@@ -297,7 +298,24 @@ impl Reader {

/// Reads current record as `Row` from the reader tree.
/// Automatically advances all necessary readers.
/// This must be called on the root level reader (i.e., for Message type).
/// Otherwise, it will panic.
fn read(&mut self) -> Row {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we call read? I am just a bit confused:)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read is called on next() for both RowIter and ReaderIter.

List(Vec<Row>), // List of elements
Map(Vec<(Row, Row)>) // List of key-value pairs
Group(Row), // Struct, child elements are tuples of field-value pairs
List(Vec<RowField>), // List of elements
Copy link
Collaborator

@sadikovi sadikovi Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess when it is list of structs, it will be List(Group(Row), Group(Row), ...). Is that right?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will be something like List(Vec(Group(Row), Group(Row),...)), and the array interface that I proposed above should have something like getStruct(rowIndex) which will return a Row.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

Would it be beneficial to also make List and Map separate, similar to Row, so they can define their own set of methods to get length, elements by index, keys and values?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about to replace List<Vec<Row>> to something like List<Array> (we'll need better naming on this) where the Array implements the public APIs. Similar for Map<Vec<Row, Row>>. What do you think?

Copy link
Collaborator

@sadikovi sadikovi Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you suggested this some time ago. Yes, sounds like a good idea. You could consider calling them Struct, Array, Map, Field, and we would return Struct for a top-level record, so it aligns with get_struct, get_array, and get_map methods.

Copy link
Collaborator

@sadikovi sadikovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I guess, all TODOs will be fixed as part of field accessors work that is open right now.

Can you add docs for Map and List and rebase? Thanks!

@sunchao
Copy link
Owner Author

sunchao commented Apr 20, 2018

Yes all the TODOs will be fixed later. Will add doc for Map & List, and rebase. Thanks

This breaks the `Row` struct into two parts: a `RowField`
which represent a single field in a row, and a top-level `Row`
struct which represents the value for a message or struct type.
Only the latter is exposed as public API.
@sunchao
Copy link
Owner Author

sunchao commented Apr 20, 2018

Merged. Thanks @sadikovi !

@sunchao sunchao deleted the refactor-row branch April 20, 2018 04:22
@sadikovi
Copy link
Collaborator

@sunchao No worries, thanks for the changes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants