Use own dataset #192

Arrmlet · 2020-09-14T20:28:49Z

SDV version: 0.4.1
Python version: 3.8
Operating System: Linux

Description

Hello everybody,
I am enjoying the SDV package and have tried generating synthetic data with the demo models.
Next, I tried to load my own dataset (a CSV file), but I cannot find a way to do this, although I am sure it is possible.
Can somebody help me with that, or share some documentation where I could find this?

Regards,
arrmlet

What I Did

I cannot imagine what to do.

csala · 2020-09-15T11:16:10Z

Hello @Arrmlet

If you have your data stored in a CSV file you should be able to load it using pandas.

In most cases you would only need these two lines:

import pandas as pd

your_dataframe = pd.read_csv('path/to/your.csv')

But in some cases you might need to tweak the read_csv call with different arguments to fully adapt it to your data format.

Here you should be able to find all the details and tutorials about loading the data with pandas: https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
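
As an illustration of the kind of tweaking mentioned above, here is a minimal sketch. The file contents, column names, and the choice of arguments are hypothetical; which arguments you actually need depends on how your CSV is formatted. The data is read from an in-memory string so the example runs on its own, but a file path works the same way.

```python
import io

import pandas as pd

# Hypothetical file contents: semicolon-separated values, decimal commas,
# and "n/a" used to mark missing values.
raw_csv = io.StringIO(
    "id;signup_date;amount\n"
    "1;2020-01-15;10,5\n"
    "2;2020-02-01;n/a\n"
)

your_dataframe = pd.read_csv(
    raw_csv,                      # a path such as 'path/to/your.csv' also works here
    sep=";",                      # column separator used by the file
    decimal=",",                  # treat "10,5" as the number 10.5
    na_values=["n/a"],            # strings that should be read as missing values
    parse_dates=["signup_date"],  # convert this column to datetimes
)

# Inspect the inferred column types before handing the data to SDV.
print(your_dataframe.dtypes)
```

Checking the dtypes after loading is worthwhile, since a numeric column that was read as text would be modeled as a categorical column.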

Arrmlet · 2020-09-15T12:05:20Z

Hello @csala,
Thanks for the response. I have tried this code example:

from sdv import SDV
from sdv import Metadata
import pandas as pd


# Load csv file into pandas dataframe
df = pd.read_csv('filepath')

sdv = SDV()

# Generate a metadata object
metadata = Metadata()
metadata.add_table('test_df')
tables = {'test_df': df}
# Train the model
sdv.fit(metadata, tables)

But when I run this code I get an error:

MetadataError: Unexpected column in table test_df: column_name

I think I am not creating the metadata object in the right way.

Thank you for your work,
arrmlet

csala · 2020-09-15T12:12:18Z

# Generate a metadata object
metadata = Metadata()
metadata.add_table('test_df')

The problem is that you are only passing the name of the table to the metadata instance, so it does not have the chance to learn about which columns it should expect your dataframe to have. To make it work you will have to pass both the name and the data:

metadata.add_table(name='test_df', data=df)

If you change this line, the rest should work as expected.
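
Put together, the corrected script could look like the sketch below. It assumes the SDV 0.4.x API used in this thread (the SDV and Metadata classes importable from sdv); newer releases may differ. The function name fit_and_sample and its parameters are made up for illustration.

```python
def fit_and_sample(df, table_name="test_df", num_rows=5):
    """Fit SDV on a single table and return sampled rows.

    Assumes the SDV 0.4.x API discussed in this thread.
    """
    # Imported inside the function so it can be defined without SDV installed.
    from sdv import SDV, Metadata

    metadata = Metadata()
    # Passing the data lets the metadata learn which columns to expect.
    metadata.add_table(name=table_name, data=df)

    sdv = SDV()
    sdv.fit(metadata, {table_name: df})

    # sample_all returns a dict that maps table names to sampled dataframes.
    return sdv.sample_all(num_rows)[table_name]


# Example usage with the placeholder path from the question:
#   import pandas as pd
#   df = pd.read_csv('filepath')
#   print(fit_and_sample(df))
```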

In any case, if you are using only one table, you might want to consider using the Tabular models instead of the SDV class, as these do not require you to create a metadata object beforehand and have additional options. You can read more about them here: https://sdv.dev/SDV/user_guides/single_table/gaussian_copula.html
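
For the single-table route, a minimal sketch might look like this. It assumes the sdv.tabular.GaussianCopula class from the 0.4.x releases described in the linked guide; later SDV versions may organize this differently. The helper name sample_with_gaussian_copula is hypothetical.

```python
def sample_with_gaussian_copula(df, num_rows=5):
    """Fit a single-table model on df and return num_rows synthetic rows.

    Assumes the sdv.tabular API from the SDV 0.4.x releases.
    """
    # Imported inside the function so it can be defined without SDV installed.
    from sdv.tabular import GaussianCopula

    model = GaussianCopula()
    # No metadata object is needed: the model inspects the dataframe directly.
    model.fit(df)
    return model.sample(num_rows)


# Example usage with the placeholder path from the question:
#   import pandas as pd
#   df = pd.read_csv('filepath')
#   print(sample_with_gaussian_copula(df, num_rows=10))
```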

Arrmlet · 2020-09-15T12:24:33Z

@csala Thanks a lot:)

csala · 2020-10-13T06:52:49Z

Closing this, as the question is already answered.

csala added the "question" label (General question about the software) on Sep 15, 2020

csala closed this as completed Oct 13, 2020
