Support for multiple foreign keys in one table. #185

Closed
JagdishKolhe opened this issue Sep 3, 2020 · 7 comments

@JagdishKolhe

  • SDV version: 0.4.0
  • Python version: 3.6.9
  • Operating System: CentOS

Description

This is a new feature request. Many real databases have such scenarios.

@csala
Contributor

csala commented Sep 14, 2020

Hi @JagdishKolhe, would you mind providing a few more details about what you mean in this case?

Do you mean supporting multiple relationships between the same two tables?

For example, one such scenario might be:

  • You have an employees table that has an employee_id as the primary key.
  • You have a tasks table with, among other things:
    • assignee_id: The employee to whom the task is assigned - defines a first Foreign Key to employees
    • supervisor_id: The employee who will supervise the task - defines a second Foreign Key to employees

Is this what you mean?
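
To make the scenario above concrete, here is a minimal sketch in plain Python that does not use SDV. The rows, the employee names, and the `dangling_keys` helper are all hypothetical and only illustrate two foreign keys in one child table pointing at the same parent table.

```python
# Hypothetical rows: every value here is made up for illustration.
employees = [
    {"employee_id": 1, "name": "Ana"},
    {"employee_id": 2, "name": "Ben"},
    {"employee_id": 3, "name": "Cleo"},
]
tasks = [
    {"task_id": 10, "assignee_id": 1, "supervisor_id": 2},
    {"task_id": 11, "assignee_id": 3, "supervisor_id": 2},
    {"task_id": 12, "assignee_id": 2, "supervisor_id": 3},
]


def dangling_keys(child_rows, foreign_key, parent_rows, primary_key):
    """Return the foreign key values in child_rows that match no parent row."""
    parent_ids = {row[primary_key] for row in parent_rows}
    child_ids = {row[foreign_key] for row in child_rows}
    return sorted(child_ids - parent_ids)


# Both foreign keys in tasks reference the same primary key in employees.
relationships = [
    ("assignee_id", "employee_id"),
    ("supervisor_id", "employee_id"),
]
violations = {
    foreign_key: dangling_keys(tasks, foreign_key, employees, primary_key)
    for foreign_key, primary_key in relationships
}
print(violations)
```

An empty list for each foreign key means every task row points at an existing employee through both columns.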

@kvrameshreddy

Hi @csala,
Does SDV work for the above scenario?

@JagdishKolhe
Author

JagdishKolhe commented Oct 20, 2020

@csala, thanks for your attention.

The above can be one scenario; I have not yet thought much about it.
But we have a similar scenario, as follows:

user table with primary key user_id
product table with primary key product_id
order table with primary key order_id and TWO foreign keys (user.user_id and product.product_id)
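
As a rough illustration of the three tables listed above, the sketch below writes them as plain Python data and groups the foreign keys by child table. It does not use SDV. The row values, the dictionary layout, and the names `primary_keys`, `foreign_keys`, and `parents_of` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical rows for the three tables described above.
tables = {
    "user": [{"user_id": 1}, {"user_id": 2}],
    "product": [{"product_id": 100}, {"product_id": 101}],
    "order": [
        {"order_id": 1000, "user_id": 1, "product_id": 100},
        {"order_id": 1001, "user_id": 1, "product_id": 101},
        {"order_id": 1002, "user_id": 2, "product_id": 100},
    ],
}
primary_keys = {"user": "user_id", "product": "product_id", "order": "order_id"}

# Each entry is (child table, foreign key column, parent table).
foreign_keys = [
    ("order", "user_id", "user"),
    ("order", "product_id", "product"),
]

# Group parents by child: a child with more than one parent is "multi-parent".
parents_of = defaultdict(list)
for child, column, parent in foreign_keys:
    parents_of[child].append(parent)

# Collect every child row whose foreign key value has no matching parent row.
broken = []
for child, column, parent in foreign_keys:
    parent_ids = {row[primary_keys[parent]] for row in tables[parent]}
    for row in tables[child]:
        if row[column] not in parent_ids:
            broken.append((child, column, row[column]))

print(dict(parents_of))
print(broken)
```

Here `order` ends up with two parents, `user` and `product`, which is what distinguishes this case from two foreign keys pointing at a single parent table.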

@csala
Contributor

csala commented Oct 20, 2020

Hi @JagdishKolhe, what you are describing seems to be a multi-parent scenario, support for which was added in #162.

You can test it using the got_families demo, which has this structure:

[image: diagram of the got_families table structure]

>>> from sdv import SDV
>>> from sdv.demo import load_demo
>>> 
>>> metadata, tables = load_demo('got_families', metadata=True)
>>> 
>>> sdv = SDV()
>>> sdv.fit(metadata, tables)
>>> sdv.sample()
{'characters':    character_id      name  age
0             0      Arya   14
1             1      Arya   22
2             2      Robb   25
3             3      Robb   21
4             4  Daenerys   16
5             5      Bran   17
6             6     Sansa   21, 'character_families':    character_id  family_id    type  generation
0             0          4    both           4
1             0          5  mother           7
2             1          5  mother           7
3             2          5  mother           7
4             0          6    both           6
5             1          6    both           6
6             0          7    both          12
7             1          7    both          11, 'families':    family_id       name
0          4  Lannister
1          5      Tully
2          6  Lannister
3          7  Lannister}

You can also see an example of how to define a schema like this in issue #193

@Wim65

Wim65 commented Oct 21, 2020

Hi, would it preserve column-to-column correlations between columns in the parent tables?

(This is Wim Blommaert)

@abhisheknagar1983

@csala: It is clearly evident in your example (got_families) that character_id = 0 is being generated with every family_id (4, 5, 6, 7), and likewise character_id = 1 is being generated with family_id (5, 6, 7).

Now the question arises: how are the samples being generated?

  1. Based on cardinality? (1:1 or 1:N)
  2. In your example there are no rows generated for character_id = (3, 4, 5, 6), so it seems that the system is generating data for only a few of the parent rows (not for all).

Note: We tried the multi-parent scenario on our own data model and our results also look similar.

Kindly let us know how sampling works in the multi-parent scenario, and whether the system considers the correlation between columns.
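
The observation above can be checked directly against the sampled output posted earlier in the thread. The sketch below copies the key columns from that output into plain Python and counts the child rows per parent. It only summarises the sample and does not show how SDV generates it. The variable names are made up for this example.

```python
from collections import Counter

# Key columns copied from the sdv.sample() output shown earlier in the thread.
character_ids = [0, 1, 2, 3, 4, 5, 6]
family_ids = [4, 5, 6, 7]
# (character_id, family_id) pairs from the sampled character_families table.
character_families = [
    (0, 4), (0, 5), (1, 5), (2, 5),
    (0, 6), (1, 6), (0, 7), (1, 7),
]

# Number of child rows attached to each parent row, per parent table.
rows_per_character = Counter(character for character, _ in character_families)
rows_per_family = Counter(family for _, family in character_families)

# Parent rows that never appear in the child table.
missing_characters = [c for c in character_ids if c not in rows_per_character]
missing_families = [f for f in family_ids if f not in rows_per_family]

print(dict(rows_per_character))  # child rows per character_id
print(dict(rows_per_family))     # child rows per family_id
print(missing_characters, missing_families)
```

In this particular sample every family has at least one child row, while characters 3, 4, 5, and 6 have none, which matches point 2 above.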

@csala
Contributor

csala commented Jan 21, 2021

Multi-foreign-key scenarios are supported after #298 so this can be closed.

@csala csala closed this as completed Jan 21, 2021
@csala csala added the feature request Request for a new feature label Jan 21, 2021
@csala csala added this to the 0.6.2 milestone Jan 21, 2021
@csala csala self-assigned this Jan 21, 2021
JonathanDZiegler pushed a commit to JonathanDZiegler/SDV that referenced this issue Feb 7, 2022
* Add working addons

* Add eradicate

* Add dlint

* Decrease complexity (sdv-dev#184)

* Add addon (sdv-dev#186)

* Add `pytest-style` (sdv-dev#192)

* Add addon

* Fix randomized error message

* Add addon (sdv-dev#188)

* Add addon (#191)

* Add `pandas-vet` (sdv-dev#190)

* Add addon

* noqa torch.stack

* remove double quotes (sdv-dev#187)

* Add addon (sdv-dev#185)

* Add `flake8-docstrings` (sdv-dev#193)

* Add addon

* Fix D100

* Add more docstrings

* Fix docstrings

* Update docstrings

* Fix lint

* Add `flake8-builtins` (sdv-dev#189)

* Add addon

* Add variables-names

* Fix bug

* Fix mistakes

* Add `flake8-multiline-containers` (sdv-dev#183)

* Add addon

* Add addon

* Address feedback

* Fix lint

* Fix bugs

* Remove pydoclint

* Ignore D101 errors

* Update ignores