Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coappearances graph as exhaustive example. #21

Merged
merged 18 commits into from
Feb 17, 2018
Merged

Conversation

boxdot
Copy link
Collaborator

@boxdot boxdot commented Feb 4, 2018

The goal of this example is to show how to convert a nested data format into a flat representation in flatdata (here: json, a sample file is included). Further, it should introduce and use all available data structures in flatdata, hence exhaustive.

When #2 is implemented, this example could be extended to use enums.

I personally think that this example can be also used for an introductory tutorial in flatdata. Especially, the techniques as sentinels, ranges, saving of string in a raw memory blocks, multivectors etc. can be explained easily.

The goal of this example is to show how to convert a nested data
format in to a flat representation in flatdata (here: json).
Further, it should introduce and use all available data structures
in flatdata.
{
// Since flatdata's mutators are not holding any data, we are creating a vector with a single
// element for holding the data.
flatdata::Vector< co::Meta > data( 1 );
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to do it right now? -- If not, we should think about a better approach. I find this very contra-intuitive.

One approach could be introducing a method start_meta to the builder, which would hide the vector with a single field from the user. close method would then do what set_meta is doing now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always wanted to add a flatdata::Struct< T > or flatdata::Object< T > but there was no time (aka good reason) to. Could do that, if you like (:

Copy link
Collaborator Author

@boxdot boxdot Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, referenced in #23 #22.


auto character = vertices.grow( );
character.name_ref = strings.size( );
strings += data.at( "name" ).get< std::string >( ) + '\0';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty straight-forward, but still too error-prone. It is very easy to forget to add + '\0' and then the reader would read complete garbage (happened to me several times). What do you think about adding a simple helper class? The interfaces could be similar to flatdata::Vector: grow and close.

Copy link
Contributor

@imagovrn imagovrn Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Something like a "StringList" or "StringListBuilder". And raw raw_data is for other uses then.

}
}

vertices_data.next_item( );
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hit me several times already. In vector I need first to call grow and then start adding data. In the multivector it is the other way around: first, add the data (to the bucket), then call next_item. What do you think about unifying the behavior?

We could even go further and unify the interface by replacing next_item by grow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in #25.

Coappearances
-------------

This examples converts a graph of coappearances from json to flatdata. A graph of coappearances is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a wonderful example, finally ٩(◕‿◕。)۶

minor: u8 : 7;
}

// @bound_implicitly( characters: vertices, vertices_data )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never thought of that... "Bound Implicitly" meant binding two vectors together, as a columnar way to store a single structure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was done in #18.

meta : Meta;

@explicit_reference( Character.name_ref, strings )
vertices : vector< Character >;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not characters? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am describing a graph consisting of vertices and edges. This is also a nice way to say vertices are Characters by using types.

int
main( int argc, char const* argv[] )
{
if ( argc < 3 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to use gflags or similar library to make this leaner. As an example, this can afford extra dependency IMO.

return 1;
}
}
catch ( std::runtime_error err )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

return ( static_cast< uint8_t >( id[ 0 ] ) << 8 ) + static_cast< uint8_t >( id[ 1 ] );
}

using CharactersIndex = std::map< uint16_t /* id */, uint16_t /* ref */ >;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need a better comment as id on one side and ref on the other is a little bit confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense. I also realized that I actually do not need convert_id function, since I am not storing ids anyway.

* ]
* }
*/
namespace coappearances {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema needs a few comments. Mostly, about the way things refer to each other. Otherwise presence of both "Relations" and "edges" doesn't make it straightforward to understand..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, make sense. I also added extended docs to the whole schema.

int
main( int argc, char const* argv[] )
{
auto args = docopt::docopt( USAGE, {argv + 1, argv + argc} );
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it makes sense to use something like that for such simple argument parsing.

@boxdot
Copy link
Collaborator Author

boxdot commented Feb 10, 2018

We should wait with merging until #25 is merged. The latter solves an issue with MultiVector and makes it interface easier. These changes should be applied here first.

@boxdot
Copy link
Collaborator Author

boxdot commented Feb 12, 2018

All comments are fixed. Please review.

Also
* Output the number of vertices and edges.
auto to_refs = relation.at( "to" ).get< picojson::array >( );
assert( to_refs.size( ) == 2 );
rel.to_a_ref = characters_index.at( to_refs[ 0 ].get< std::string >( ) );
rel.to_a_ref = characters_index.at( to_refs[ 1 ].get< std::string >( ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assigned the same value twice

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

@boxdot boxdot merged commit 560f83d into master Feb 17, 2018
@boxdot boxdot deleted the exhaustive-example branch February 17, 2018 10:29
gferon pushed a commit to gferon/flatdata that referenced this pull request Feb 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants