
Use own dataset #192

Closed
Arrmlet opened this issue Sep 14, 2020 · 5 comments
Labels
question General question about the software

Comments

@Arrmlet

Arrmlet commented Sep 14, 2020

  • SDV version: 0.4.1
  • Python version: 3.8
  • Operating System: Linux

Description

Hello everybody,
I am enjoying the SDV package, and I have tried generating synthetic data using the demo models.
Following that, I tried to load my own dataset (a CSV file), but I cannot find a way to do this, although I am sure it is possible.
Can somebody help me with that, or point me to documentation that explains how to do it?

Regards,
arrmlet

What I Did

I do not know where to start.
@csala
Contributor

csala commented Sep 15, 2020

Hello @Arrmlet

If you have your data stored in a CSV file you should be able to load it using pandas.

In most cases you would only need these two lines:

```python
import pandas as pd

your_dataframe = pd.read_csv('path/to/your.csv')
```

But in some cases you might need to tweak the read_csv call with different arguments to fully adapt it to your data format.

Here you should be able to find all the details and tutorials about loading the data with pandas: https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
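As a sketch of the kind of `read_csv` tweaks mentioned above, the example below parses a semicolon-separated file with a date column and an ID column that should stay text. The CSV contents and column names are made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content: semicolon-separated, with a date column and an ID
# that should stay a string (leading zeros would be lost if parsed as int).
raw_csv = """customer_id;signup_date;balance
007;2020-01-15;120.5
042;2020-03-02;80.0
"""

your_dataframe = pd.read_csv(
    io.StringIO(raw_csv),          # replace with 'path/to/your.csv'
    sep=';',                       # field separator used in the file
    dtype={'customer_id': str},    # keep IDs as text
    parse_dates=['signup_date'],   # convert this column to datetime
)

# Quick checks that the columns and types came through as expected
print(your_dataframe.dtypes)
print(your_dataframe.head())
```

Inspecting `dtypes` right after loading is a cheap way to catch a wrong separator or a column parsed with the wrong type before passing the data on.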

@csala added the question label (General question about the software) Sep 15, 2020
@Arrmlet
Author

Arrmlet commented Sep 15, 2020

Hello @csala,
Thanks for the response. I have tried this code example:

```python
from sdv import SDV
from sdv import Metadata
import pandas as pd


# Load csv file into pandas dataframe
df = pd.read_csv('filepath')

sdv = SDV()

# Generate a metadata object
metadata = Metadata()
metadata.add_table('test_df')
tables = {'test_df': df}
# Train the model
sdv.fit(metadata, tables)
```

But when I run this code, I get an error:

```
MetadataError: Unexpected column in table test_df: column_name
```

I think I am not creating the metadata object the right way.

Thank you for your work,
arrmlet

@csala
Contributor

csala commented Sep 15, 2020

```python
# Generate a metadata object
metadata = Metadata()
metadata.add_table('test_df')
```

The problem is that you are only passing the name of the table to the metadata instance, so it does not have the chance to learn about which columns it should expect your dataframe to have. To make it work you will have to pass both the name and the data:

```python
metadata.add_table(name='test_df', data=df)
```

If you change this line, the rest should work as expected.
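To make the failure mode concrete, here is a toy check in the spirit of the error above: if the metadata declares no fields for a table, every dataframe column looks unexpected, while fields learned from the data pass. This is not SDV's actual implementation; the function name `validate_columns` and the use of `ValueError` are hypothetical, chosen only to illustrate why passing `data=df` matters.

```python
import pandas as pd


def validate_columns(table_name, declared_fields, data):
    """Hypothetical check, NOT SDV code: reject columns the metadata doesn't know."""
    for column in data.columns:
        if column not in declared_fields:
            raise ValueError(f'Unexpected column in table {table_name}: {column}')


df = pd.DataFrame({'column_name': [1, 2, 3]})

# Like add_table('test_df') alone: no fields are declared, so validation fails.
error_message = None
try:
    validate_columns('test_df', declared_fields=set(), data=df)
except ValueError as error:
    error_message = str(error)
print(error_message)  # Unexpected column in table test_df: column_name

# Like add_table(name='test_df', data=df): fields are taken from the dataframe.
learned_fields = set(df.columns)
validate_columns('test_df', declared_fields=learned_fields, data=df)
print('validation passed')
```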

In any case, if you are using only one table, you might want to consider using the Tabular models instead of the SDV class, as these do not require you to create the metadata beforehand and offer additional options. You can read more about them here: https://sdv.dev/SDV/user_guides/single_table/gaussian_copula.html

@Arrmlet
Author

Arrmlet commented Sep 15, 2020

@csala Thanks a lot:)

@csala
Contributor

csala commented Oct 13, 2020

Closing this, as the question is already answered.

@csala closed this as completed Oct 13, 2020
JonathanDZiegler pushed a commit to JonathanDZiegler/SDV that referenced this issue Feb 7, 2022