Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] How to populate nested "entities" that require self-join #252

Closed
itaranto opened this issue Jul 13, 2023 · 13 comments
Closed

[Question] How to populate nested "entities" that require self-join #252

itaranto opened this issue Jul 13, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@itaranto
Copy link

itaranto commented Jul 13, 2023

I have the following models:

type Components struct {
	ID uuid.UUID `sql:"primary_key"`
	// ...
}

type Vulnerabilities struct {
	ID uuid.UUID `sql:"primary_key"`
	// ...
}

type ComponentVulnerabilities struct {
	ComponentID     uuid.UUID `sql:"primary_key"`
	VulnerabilityID uuid.UUID `sql:"primary_key"`
	// ...
}

type ComponentChildren struct {
	ComponentID uuid.UUID `sql:"primary_key"`
	ChildID     uuid.UUID `sql:"primary_key"`
	// ...
}

And this is the struct where I want the data to be mapped to:

type component struct {
	model.Components
	Vulnerabilities []model.Vulnerabilities
	Children        []component
}

Notice the Children field refers to itself : component instead of model.Components because I want children to have the same data as the parent, in this case, the Vulnerabilities.

If I didn't want to populate the children's sub-relations, I could have done something like this:

type component struct {
	model.Components
	Vulnerabilities []model.Vulnerabilities
	Children        []model.Components `alias:"children"`
}

And then make a query like this:

	tableChildren := table.Components.AS("children")

	stmt := postgres.SELECT(
		table.Components.AllColumns,
		table.Vulnerabilities.AllColumns,
		tableChildren.AllColumns,
	).FROM(
		table.Components.
			LEFT_JOIN(
				table.ComponentVulnerabilities,
				table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				table.Vulnerabilities,
				table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
			).
			LEFT_JOIN(
				table.ComponentChildren,
				table.ComponentChildren.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				tableChildren,
				tableChildren.ID.EQ(table.ComponentChildren.ChildID),
			),
	)

In order to get the children with its sub-entities, I know that I need to do a sub-query but I'm not sure if the QRM supports this.

I tried something like this:

	tmpTableChildren := postgres.SELECT(
		table.Components.AllColumns,
		table.Vulnerabilities.AllColumns,
	).FROM(
		table.Components.
			LEFT_JOIN(
				table.ComponentVulnerabilities,
				table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				table.Vulnerabilities,
				table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
			),
	).AsTable("tmp_children")

	tmpTableChildrenID := table.Components.ID.From(tmpTable)

	stmt := postgres.SELECT(
		table.Components.AllColumns,
		table.Vulnerabilities.AllColumns,
		tmpTableChildren.AllColumns(),
	).FROM(
		table.Components.
			LEFT_JOIN(
				table.ComponentVulnerabilities,
				table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				table.Vulnerabilities,
				table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
			).
			LEFT_JOIN(
				table.ComponentChildren,
				table.ComponentChildren.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				tmpTableChildren,
				tmpTableChildrenID.EQ(table.ComponentChildren.ChildID),
			),
	)

But I can't make the mapper to fill the structs in the way I need it.

@itaranto itaranto changed the title [Question] How to populate ntested "entities" that require self-join [Question] How to populate nested "entities" that require self-join Jul 13, 2023
@itaranto
Copy link
Author

Anyway, as a side-note, I tried the second approach directly into a sample DB and the performance of that query is just terrible.

I may need to evaluate a different approach about how to populate this.

@houten11
Copy link

Yeah, you'll need to be careful with the number of joins. Although the number of joins is not limited with jet, the performance can suffer if the query returns a huge number of rows.
For instance, in your first query, you'll end up with a row count equal to vulnerabilities row count x children row count. If there are 10 vulnerabilities and 10 children you'll end up with 100 rows, which should be fine. But if you also want to get children's vulnerabilities, you might end up with a huge number of rows.

One possible way to do it is to have 2 queries. In the first query, you fill model.Components and Vulnerabilities and in the second Children.

type component struct {
	model.Components			//<-- first query
	Vulnerabilities []model.Vulnerabilities	//<-- first query
	Children        []component		//<-- second query
}

If stmt1 and stmt2 are two queries, you'll scan:

var dest component

err := stmt1.Query(db, &dest)

err := stmt2.Query(db, &dest.Children)

@itaranto
Copy link
Author

itaranto commented Jul 13, 2023

Yeah, you'll need to be careful with the number of joins. Although the number of joins is not limited with jet, the performance can suffer if the query returns a huge number of rows. For instance, in your first query, you'll end up with a row count equal to vulnerabilities row count x children row count. If there are 10 vulnerabilities and 10 children you'll end up with 100 rows, which should be fine. But if you also want to get children's vulnerabilities, you might end up with a huge number of rows.

One possible way to do it is to have 2 queries. In the first query, you fill model.Components and Vulnerabilities and in the second Children.

type component struct {
	model.Components			//<-- first query
	Vulnerabilities []model.Vulnerabilities	//<-- first query
	Children        []component		//<-- second query
}

If stmt1 and stmt2 are two queries, you'll scan:

var dest component

err := stmt1.Query(db, &dest)

err := stmt2.Query(db, &dest.Children)

Thanks for the suggestion, but I cannot do the query you proposed since I need to map the results to a slice of components (I should have made that clear from the start).

Like this:

	components := []*component{}
	if err := stmt.Query(db, &components); err != nil {
		return nil, err
	}

And, if I resolve it naively, it results in the N+1 problem.

@itaranto
Copy link
Author

itaranto commented Jul 13, 2023

Well, I came up with a query that returns the correct information, but I don't know how to map it using the QRM:

	stmt:= postgres.SELECT(
		table.Components.AllColumns,
		table.Vulnerabilities.AllColumns,
		tableChildren.AllColumns,
		tableChildrenVulnerabilities.AllColumns,
	).FROM(
		table.Components.
			LEFT_JOIN(
				table.ComponentVulnerabilities,
				table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				table.Vulnerabilities,
				table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
			).
			LEFT_JOIN(
				table.ComponentChildren,
				table.ComponentChildren.ComponentID.EQ(table.Components.ID),
			).
			LEFT_JOIN(
				tableChildren,
				tableChildren.ID.EQ(table.ComponentChildren.ChildID),
			).
			LEFT_JOIN(
				tableChildrenComponentVulnerabilities,
				tableChildrenComponentVulnerabilities.ComponentID.EQ(tableChildren.ID),
			).
			LEFT_JOIN(
				tableChildrenVulnerabilities,
				tableChildrenVulnerabilities.ID.EQ(tableChildrenComponentVulnerabilities.VulnerabilityID),
			),
	)

In SQL (simplified version with just IDs):

SELECT components.id AS "components.id",
     vulnerabilities.id AS "vulnerabilities.id",
     children.id AS "children.id",
     children_vulnerabilities.id AS "children_vulnerabilities.id"
FROM components
     LEFT JOIN component_vulnerabilities ON (component_vulnerabilities.component_id = components.id)
     LEFT JOIN vulnerabilities ON (vulnerabilities.id = component_vulnerabilities.vulnerability_id)
     LEFT JOIN component_children ON (component_children.component_id = components.id)
     LEFT JOIN components AS children ON (children.id = component_children.child_id)
     LEFT JOIN component_vulnerabilities AS children_component_vulnerabilities ON (children_component_vulnerabilities.component_id = children.id)
     LEFT JOIN vulnerabilities AS children_vulnerabilities ON (children_vulnerabilities.id = children_component_vulnerabilities.vulnerability_id);

@houten11
Copy link

Aha, in that case your destination would need to have a separate type for children. For instance:

type component struct {
	model.Components
	Vulnerabilities []model.Vulnerabilities
	Children        []struct{
		model.Components `alias:"children"`
		Vulnerabilities  []model.Vulnerabilities `alias:"children_vulnerabilities"`
	}
}

You can't use the same component type for children because qrm needs to differentiate component and component children destination type. Also, you can't alias complex type, only model and custom model.
Still, beware of the number of rows this query can return, and maybe consider pagination or some other approach.

@itaranto
Copy link
Author

Aha, in that case your destination would need to have a separate type for children. For instance:

type component struct {
	model.Components
	Vulnerabilities []model.Vulnerabilities
	Children        []struct{
		model.Components `alias:"children"`
		Vulnerabilities  []model.Vulnerabilities `alias:"children_vulnerabilities"`
	}
}

You can't use the same component type for children because qrm needs to differentiate component and component children destination type. Also, you can't alias complex type, only model and custom model.

That worked! This library rocks!

Still, beware of the number of rows this query can return, and maybe consider pagination or some other approach.

Yes, I'm aware of this, thank you.

@itaranto
Copy link
Author

itaranto commented Jul 19, 2023

Sorry to re-open this...

Here's the thing: I thought it worked, but using the query I shared above, it only fills the first vulnerability for each child.

For what I understand, given that my query returns something like:

 components.id | vulnerabilities.id | children.id  | children_vulnerabilities.id
---------------+--------------------+--------------+-----------------------------
 component_00  |  vulnerability_00  | component_01 |       vulnerability_11
 component_00  |  vulnerability_00  | component_01 |       vulnerability_12
 component_00  |  vulnerability_01  | component_01 |       vulnerability_11
 component_00  |  vulnerability_01  | component_01 |       vulnerability_12
 component_00  |  vulnerability_02  | component_02 |       vulnerability_21
 component_00  |  vulnerability_03  | component_02 |       vulnerability_21
 ...

then, using the following structs:

type component struct {
	model.Components
	Vulnerabilities []model.Vulnerabilities
	Children        []child
}

type child struct {
	model.Components `alias:"children"`
	Vulnerabilities  []model.Vulnerabilities `alias:"children_vulnerabilities"`
}

the QRM should populate the structs to something like:

{
  "components": [
    {
      "id": "component_00",
      "vulnerabilities": [
        {
          "id": "vulnerability_00"
        },
        {
          "id": "vulnerability_01"
        },
        {
          "id": "vulnerability_02"
        },
        {
          "id": "vulnerability_03"
        }
      ],
      "children": [
        {
          "id": "component_01",
          "vulnerabilities": [
            {
              "id": "vulnerability_11"
            },
            {
              "id": "vulnerability_12"
            }
          ]
        },
        {
          "id": "component_02",
          "vulnerabilities": [
            {
              "id": "vulnerability_21"
            }
          ]
        }
      ]
    }
  ]
}

Am I doing something wrong?

Is there a more efficient way to debug the QRM?

@itaranto itaranto reopened this Jul 19, 2023
@go-jet
Copy link
Owner

go-jet commented Jul 19, 2023

Hi @itaranto ,
This is a bug. alias:"children_vulnerabilities" is not accounted when qrm group key is constructed. I've made a fix on bug252 branch. Give it a try.
On master branch, workaround can be made by wrapping model.Vulnerabilities into new struct:

type child struct {
    model.Components `alias:"children"`
    Vulnerabilities  []struct { 
        model.Vulnerabilities `alias:"children_vulnerabilities"`
    }
}

@go-jet go-jet added the bug Something isn't working label Jul 19, 2023
@go-jet go-jet added this to the Version 2.10.1 milestone Jul 19, 2023
@itaranto
Copy link
Author

itaranto commented Jul 19, 2023

Well, I'm glad this is a bug then...

I tried your branch and it solved my issue in regards to only populating the first vulnerability for each children.

Now, there's another similar issue this bug252 branch solved, but instead of reproducing it in my unit tests with a dummy database, I found this in a real one.

The problem was the following:
Given lots of children of a given component that have the same vulnerability, then the mapper would map all of those (duplicated) vulnerabilities into the first child and not the others.

My guess is these two issues are basically the same.

@go-jet
Copy link
Owner

go-jet commented Jul 20, 2023

Basically, a bug is if you have two model slices of the same type in your destination, and one slice is aliased then qrm would not populate destination correctly:

type struct Dest {
    ....
    TList []model.T 
    ....
         TList2 []model.T `alias:"t_alias"`
    ....
}
// if insetead TList2 []model.T `alias:"t_alias"` a new struct is used,
// like TList2 struct{ []model.T `alias:"t_alias"` } then it's not an issue

This usually requires, like in your case, self join and 2 N joins on the same level, which is not a common case, and probably a reason why it hasn't been detected so far.

@go-jet
Copy link
Owner

go-jet commented Jul 20, 2023

The problem was the following:
Given lots of children of a given component that have the same vulnerability, then the mapper would map all of those (duplicated) vulnerabilities into the first child and not the others.

Could you share your destination? It would be easier to confirm.

@itaranto
Copy link
Author

itaranto commented Jul 20, 2023

Could you share your destination? It would be easier to confirm.

The destination struct is the same as in my previous comment.

My point was, about two (seemingly) different issues that got solved by this fix:

  • The QRM was only filling the first vulnerability for each child (This was reproduced in my unit-tests with a dummy, but real, database).
  • Given lots of children of a given component that have the same vulnerability, then the mapper would map all of those (duplicated) vulnerabilities into the first child and not the others (I saw this in a real database). Sorry I don't have a more precise reproduction steps for now.

@go-jet
Copy link
Owner

go-jet commented Jul 24, 2023

Fixed with Release v2.10.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants