Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stronger schema definition for map (structs) mutations #14

Merged
merged 4 commits into from
Jun 15, 2018

Conversation

WolfDan
Copy link
Contributor

@WolfDan WolfDan commented Jun 1, 2018

Hello o/

I've been working with dgraph for a while in a personal project, one of the problems of dgraph is that don't have an strong schema definition, the use of indexes without a "proper" schema is a big problem, here an example of what I mean:

Let's suppose we have a database with movies and their characters, so you define the name edge

name: string @index(term) .

And you add the movie with the name Thor, but you also add the character with the name Thor (which is valid in this context), the two have the same name definition, they could have different edges as well, but if you you do a term search for the name Thor you'll get both the movie and the character, I (and others as well) discussed it with the dev team, for the moment the best solution is creating indexes like this:

movie.name: string @index(term) .
character.name: string @index(term) .

This allow you to be more explicit on exactly what you want to query and don't have mixed data

So this PR take the user defined structs, check if it is an struct and add the name of that struct (for example MyApp.Models.Player and turn it into a "schema" for dgrapg (like this player.(the struct name definition), I also fixed a few problems in code that dializer warned

Summary of the PR

  • Add a stronger schema for mutations making use of Structs
  • Fixing unsafe variables
  • Fixing non-used variables

Greetings ^^/

- Fixing unsafe variables
- Fixing non-used variables
@coveralls
Copy link

coveralls commented Jun 2, 2018

Pull Request Test Coverage Report for Build 109

  • 55 of 64 (85.94%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+4.0%) to 70.996%

Changes Missing Coverage Covered Lines Changed/Added Lines %
lib/exdgraph/utils.ex 7 8 87.5%
lib/exdgraph/protocol.ex 7 9 77.78%
lib/exdgraph/mutation.ex 39 45 86.67%
Totals Coverage Status
Change from base Build 105: 4.0%
Covered Lines: 164
Relevant Lines: 231

💛 - Coveralls

@ospaarmann
Copy link
Owner

Hey @WolfDan ,

thank you for your PR. I will have a closer look asap.

About the issue that nodes don't have a type (or label in different graph databases). I use it in another project where I use exdgraph by adding a bool attribute to the nodes which is by default true. So for a movie node I would add type_movie: true.

This is based on the Dgraph best practises: https://docs.dgraph.io/howto/#giving-nodes-a-type

@WolfDan
Copy link
Contributor Author

WolfDan commented Jun 2, 2018

Hey @ospaarmann

Yeah it is a possible solution, but is not optimal in a bigger project when you need to have more control about the indexes or how to get your data, there is a discussion already in the forum if you like to take a look (I also commented with a solution that I'm working on right now)

https://discuss.dgraph.io/t/migrating-away-from-generic-type-edge-in-our-app-plus-modeling-complex-schema/2107

We use way too many generic predicates. The predicates “VertexType”, “Class”, “Property”, “Relation”, “ExternalId”, “Namespace”, and others are used by ALL entities in the system. Thus as the system scales, we basically must perform all mutations in series, because any two transactions would very often conflict.

The idea is that wherever possible you should use a more fine-grained predicate. For e.g. for nodes which are Class, you could have Class.name and Class.description. This would help with sharding the data.

Greetings ^^/

@ospaarmann
Copy link
Owner

ospaarmann commented Jun 4, 2018

Hey @WolfDan,

so if I understand you correctly what you want is: If you have a struct and use it as the argument in ExDgraph.set_map(conn, %User{}) that every key in the struct is prefixed with user and this value is then used as the predicate name for Dgraph.

user = %User{name: "Bob"}
conn = ExDgraph.conn()
ExDgraph.set_map(conn, user)

would result in a schema of

user.name: string @index(term) .

If that is correct: I like the idea. But your PR doesn't seem to achieve that. Most of the changes are made in the generation of tmp uids. This is a little hack that I use to return a map that already contains the uid set by Dgraph. Let's take a simple example:

map = %{
  name: "Alice",
  friends: [
    %{
      name: "Betty"
    },
    %{
      name: "Bob",
      job: "Truck driver"
    }
  ]
}
ExDgraph.set_map(conn, map)

This would return:

{:ok,
 %{
   context: %ExDgraph.Api.TxnContext{
     aborted: false,
     commit_ts: 236,
     keys: [],
     lin_read: %ExDgraph.Api.LinRead{ids: %{1 => 200}},
     start_ts: 235
   },
   uids: %{"blank-0" => "0x1ac", "blank-1" => "0x1ad", "blank-2" => "0x1ae"}
 }}

Since no uid is set in the json, Dgraph will create a blank node and name it _:blank-0. This will happen also for every predicate and make it very difficult to map the returned uid to my map. Especially when inserting a nested map that creates multiple nodes with edges.

So a solution is to set a uid that doesn't exist, using UUID.uuid4().

Will be converted into the RDFs:

_:blank-0 "diggy" .
_:blank-0 "pizza" .

The result of the mutation would also contain a map, which would have the uid assigned corresponding to the key blank-0. You could specify your own key like

{
"uid": "_:diggy",
"name": "diggy",
"food": "pizza"
}

In this case, the assigned uids map would have a key called diggy with the value being the uid assigned to it.

https://docs.dgraph.io/mutations/#setting-literal-values

So a user inserts their stuff via a map. Each level of the map, I assign a key, value pair

“uid”: “_:SOMEUUIDHERE”

When the mutation comes back, it returns SOMEUUIDHERE with the value of the uid inserted into dgraph. Now I can set the correct uid coming from Dgraph by replacing the tmp uid that I set in the original map with the uid coming back from Dgraph.

In order to achieve what you want you would have to write a helper that replaces every key in the struct with a prefixed key. And maybe create a new function set_struct/2.

Also: This should not be the default behaviour since it changes the db structure without being explicit about it. When sending a query I would have to query for the prefixed predicates:

"""
  {
    user(func: eg(name, "Bob"))
    {
      name
      username
      email
    }
  }
"""

So this should either be triggered by an option for the set_struct/2 function or - even better - in the configuration, with the default for use_strong_schema: false.

This also needs to be documented and added to the Readme.

But in general great idea!

@WolfDan
Copy link
Contributor Author

WolfDan commented Jun 4, 2018

Hey @ospaarmann

I must confess that I quite don't understand your point here ^^', the actual implementation takes the struct and add the prefixes to the mutation (the schema and queries to dgraph are done by hand in the actual library so no need to touch them from my point of view)

The only field that isn't modified is uid since is the unique field and shared for all the database isn't necessary to add a prefix (also dgraph won't allow that), it already takes the uids and applies the UUID.uuid4() thing, when the mutation is done and the result returned the replace_tmp_uids deletes the prefix and return a map (not an struct) with the result data

¿Why a Map? You might ask

structs are build in top of maps in elixir, since elixir is not strongly typed you can bind any values to the struct, so there is no point at all to bring the uids to the given struct and is up to the user if want to bind that map to the struct (is as simple to use builtin function struct/2)

But as I said I quite don't get your point, do you want that the function binds the returning uid into the struct itself and instead of using set_map create a function called set_struct? Or what do you mean?

Copy link
Owner

@ospaarmann ospaarmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WolfDan So I had a look again and could reproduce it. So I take back my statement

But your PR doesn't seem to achieve that.

😉. I still would ask you for a couple of changes:

  • Right now what happens is not transparent enough since you put the function to change the predicate name based on pattern matching (struct or no struct) inside a function that has nothing to do with that (set_tmp_uids/2). AND it is done inside the set_map/2 function, even though it matches for a struct. That is where my confusion came from. Please write a separate function set_struct/2 and a separate helper that only adds the prefixes for any key other than uid.
  • You could create a little pipeline in set_struct/2 and replace map_with_tmp_uids with prepared_map or so.
  • Please write a test to check if nested different structs work as well. When struct A contains struct B, the predicates coming from A should have the prefix a and the predicates coming from B should have the prefix b.
@struct_insert_mutation %MutationTest.Person{
    name: "Alice",
    identifier: "alice_json",
    dogs: [
      %MutationTest.Dog{
        name: "Bello"
      }
    ]
  }
  • Please document what is going on in @doc and in the comments.
  • Users should be able to turn this feature on and off via configuration. Please add a enforce_struct_schema option to the config with default == false. And document that as well

The reason for these change requests is that I want everything to be as transparent as possible. Setting a different predicate name in the db without telling the user or allowing them to turn that off would lead to a lot of confusion. And adding functionality to a function which has a different job will make it much harder for other developers to understand what is going on.

Besides that: Very good idea and can be very helpful! I will definitely use this feature in a project where I throw structs into Dgraph.

@WolfDan
Copy link
Contributor Author

WolfDan commented Jun 12, 2018

As requested @ospaarmann 0/

If I missed something feel free to tell me ^^

Copy link
Owner

@ospaarmann ospaarmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the changes. Looks almost good. I still have two change requests:

  • Please add the default value for the new config in /lib/exdgraph/utils.ex
  • I would prefer to separate the setting of the tmp uids and the adding of the prefix in two separate functions that are named accordingly. This makes it easier for other developers to understand what each does.

Cheers,
Ole

@WolfDan
Copy link
Contributor Author

WolfDan commented Jun 12, 2018

Hey I added to utils.ex the default value,

I didn't understand very well your second request, I've done the changes, but if are wrong just tell me ^^/

@ospaarmann
Copy link
Owner

Hey,

about the second request. Almost there! But still the set_schema is called within tmp_ids_into_struct. This hides what is going on when reading the code quickly. What I mean is that you have a clear pipeline of functions inside set_struct. Something like:

prepared_struct = struct
|> tmp_ids_into_struct()
|> set_schema()

This way it is clear what is happening. I don't even think that you need a specific tmp_ids_into_struct function. The same one should work for maps and structs.

If you need two different functions for setting the temp uids the naming is not consistent (insert_tmp_uids and tmp_ids_into_struct). I would get rid of the two functions. If you really need different functions for structs and maps pattern match for struct or map and then do the schema setting work in a separate function. Chain them all together (see above) and you are good to go.

@ospaarmann
Copy link
Owner

Hey @WolfDan, one more thing. And I hope I don't go on your nerves by now! Please also make sure and test that a map inside a struct and a struct inside a map work as well. You could do that recursively by pattern matching for struct and map in set_schema and either return the changed struct or the unchanged map.

That means this should work as well:

struct_data = %MutationTest.Person{
  name: "Alice",
  identifier: "alice_json",
  dogs: [
    %MutationTest.Dog{
      name: "Bello"
    }
  ],
  some_map: %{
    some: "value",
    map_owner: %MutationTest.Person{
      name: "Bob"
    }
  }
}

By the way: I now encountered a situation in a project where I would like to insert structs and your PR will be very helpful with that.

@WolfDan
Copy link
Contributor Author

WolfDan commented Jun 14, 2018

Hey @ospaarmann

Well I did the test with the map as you requestted (I'm glad you find this PR usefull)

Now for the uids and the schema, I changed the function naming only, basically because you need to re map all the struct and check types again just to change the keys of the schema (and the map don't contains the sctruct info after beign added by the tmp ids), so is easier just to handle it from a single function

Anyway just tell me what do you think about it

@ospaarmann ospaarmann merged commit b43f8ea into ospaarmann:master Jun 15, 2018
@ospaarmann
Copy link
Owner

Now for the uids and the schema, I changed the function naming only, basically because you need to re map all the struct and check types again just to change the keys of the schema (and the map don't contains the sctruct info after beign added by the tmp ids), so is easier just to handle it from a single function

That makes sense. Looks good now. I merged the PR. Thanks for the work!
🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants