Add usage logging #1903

Closed
amontanez24 opened this issue Apr 8, 2024 · 0 comments · Fixed by #1920
Labels
feature request Request for a new feature

Comments

@amontanez24
Contributor

Problem Description

It would be helpful for debugging and overall library management to have a better understanding of SDV usage. We want to log more information about how synthesizers are used.

Expected behavior

For each of the following events, create one INFO-level log statement containing the listed fields:

Synthesizer Initialization

  • Timestamp
  • Synthesizer class name (GaussianCopula, CTGAN, HSA, etc.)
  • Synthesizer ID

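A minimal sketch of the initialization event, assuming a dedicated logger named `SDVUsage` and an ID built from the class name plus a random UUID. Both names are hypothetical, and `DemoSynthesizer` is an illustrative stand-in, not SDV's actual base class. Reading `type(self).__name__` in the base class means every subclass reports its own class name without extra code.

```python
import datetime
import logging
import uuid

# Hypothetical logger name; the issue only asks for a logger separate from
# the ones SDV already uses. Handlers are configured elsewhere.
usage_logger = logging.getLogger('SDVUsage')


class DemoSynthesizer:
    """Illustrative stand-in for a synthesizer base class, not SDV's real one."""

    def __init__(self):
        # Hypothetical ID scheme: class name plus a random UUID.
        self.synthesizer_id = f'{type(self).__name__}_{uuid.uuid4()}'
        self.init_log_entry = {
            'EVENT': 'Initialization',
            'TIMESTAMP': datetime.datetime.now().isoformat(),
            'SYNTHESIZER CLASS NAME': type(self).__name__,
            'SYNTHESIZER ID': self.synthesizer_id,
        }
        # One INFO-level record per initialization.
        usage_logger.info(self.init_log_entry)


class DemoCopulaSynthesizer(DemoSynthesizer):
    """Subclasses inherit the logging; type(self).__name__ reports the subclass."""
```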
Fit

  • Timestamp
  • Synthesizer class name (GaussianCopula, CTGAN, HSA, etc.)
  • Statistics of the fit data:
    • Total # of tables
    • Total # of rows
    • Total # of columns
  • Synthesizer ID

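The fit statistics can be computed with a small helper, sketched here under the assumption that single-table data is a pandas `DataFrame` and multi-table data is a dict mapping table names to `DataFrame`s. The function name `get_data_statistics` and the key names are hypothetical. Because the Sample event asks for the same three totals, the same helper could also summarize sampled output.

```python
import pandas as pd


def get_data_statistics(data):
    """Return total tables, rows and columns for single- or multi-table data.

    Assumes `data` is either a DataFrame (single table) or a dict of
    table name -> DataFrame (multi table).
    """
    # Treat a single DataFrame as a one-table mapping so both cases share code.
    tables = data if isinstance(data, dict) else {'table': data}
    return {
        'TOTAL NUMBER OF TABLES': len(tables),
        'TOTAL NUMBER OF ROWS': sum(len(df) for df in tables.values()),
        'TOTAL NUMBER OF COLUMNS': sum(len(df.columns) for df in tables.values()),
    }
```

For a two-table dict with 3 + 4 rows and 2 + 3 columns, this reports 2 tables, 7 rows and 5 columns.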
Sample

  • Timestamp
  • Synthesizer class name (GaussianCopula, CTGAN, HSA, etc.)
  • Statistics of the sampled data:
    • Total # of tables
    • Total # of rows
    • Total # of columns
  • Synthesizer ID

Synthesizer save and load

  • Timestamp
  • Synthesizer class name (GaussianCopula, CTGAN, HSA, etc.)
  • Synthesizer ID

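A sketch of save and load logging, assuming the synthesizer is persisted with the standard-library `pickle` module and carries its ID as an attribute, so a loaded object logs the same ID it had when saved. `DemoSynthesizer`, `_log_event` and the `SDVUsage` logger name are hypothetical and illustrative only; SDV's real serialization may differ.

```python
import datetime
import logging
import pickle
import uuid

usage_logger = logging.getLogger('SDVUsage')  # hypothetical logger name


def _log_event(event, synthesizer):
    # One INFO record per event with the fields requested for save/load.
    usage_logger.info({
        'EVENT': event,
        'TIMESTAMP': datetime.datetime.now().isoformat(),
        'SYNTHESIZER CLASS NAME': type(synthesizer).__name__,
        'SYNTHESIZER ID': synthesizer.synthesizer_id,
    })


class DemoSynthesizer:
    """Illustrative stand-in, not SDV's actual synthesizer class."""

    def __init__(self):
        self.synthesizer_id = f'{type(self).__name__}_{uuid.uuid4()}'

    def save(self, filepath):
        with open(filepath, 'wb') as output:
            pickle.dump(self, output)
        _log_event('Save', self)

    @classmethod
    def load(cls, filepath):
        with open(filepath, 'rb') as source:
            synthesizer = pickle.load(source)
        # The ID travels with the pickled object, so save and load records match.
        _log_event('Load', synthesizer)
        return synthesizer
```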
Metadata save

  • Timestamp
  • Metadata type (single or multi table)
  • Statistics about the metadata:
    • Total # of tables
    • Total # of columns
    • Total # of relationships

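The metadata statistics could be derived from the metadata's dictionary form. This sketch assumes multi-table metadata has a `tables` mapping (each table with a `columns` mapping) plus a `relationships` list, while single-table metadata has `columns` at the top level; treat that layout and the helper name `get_metadata_statistics` as assumptions.

```python
def get_metadata_statistics(metadata_dict):
    """Summarize a metadata dictionary for the 'Metadata save' event.

    Assumed layout: multi-table metadata has 'tables' and 'relationships' keys;
    single-table metadata has 'columns' at the top level.
    """
    if 'tables' in metadata_dict:
        tables = metadata_dict['tables']
        return {
            'METADATA TYPE': 'multi table',
            'TOTAL NUMBER OF TABLES': len(tables),
            'TOTAL NUMBER OF COLUMNS': sum(
                len(table.get('columns', {})) for table in tables.values()
            ),
            'TOTAL NUMBER OF RELATIONSHIPS': len(metadata_dict.get('relationships', [])),
        }

    # Single-table metadata: exactly one table and no relationships.
    return {
        'METADATA TYPE': 'single table',
        'TOTAL NUMBER OF TABLES': 1,
        'TOTAL NUMBER OF COLUMNS': len(metadata_dict.get('columns', {})),
        'TOTAL NUMBER OF RELATIONSHIPS': 0,
    }
```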
Additional context

This information should be recorded with a dedicated logger, separate from the loggers SDV currently uses.
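One way to keep usage records separate is a logger outside SDV's existing logger names with `propagate` disabled, so its records never reach root or application handlers. The name `SDVUsageLogger` and the function `get_usage_logger` are hypothetical. The `%(asctime)s` formatter field supplies the Timestamp required by every event.

```python
import logging

# Hypothetical logger name, chosen so it does not share a hierarchy with the
# loggers the library already uses.
USAGE_LOGGER_NAME = 'SDVUsageLogger'


def get_usage_logger(filepath):
    """Return a file-backed INFO logger isolated from other logging output."""
    logger = logging.getLogger(USAGE_LOGGER_NAME)
    logger.setLevel(logging.INFO)
    # Do not pass usage records up to the root logger's handlers.
    logger.propagate = False
    # Avoid attaching a duplicate handler if this is called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(filepath)
        # asctime provides the timestamp field requested for each event.
        handler.setFormatter(logging.Formatter('%(asctime)s - %(message)s'))
        logger.addHandler(handler)
    return logger
```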

Proposal

  • Create a YAML file that defines the configuration of the usage logger. Whenever the logger is needed, load that YAML file and configure the logger from the resulting dictionary.
  • Store this YAML file in sdv/logging.
  • Add a variable to the logger configuration that controls how the logs are stored (valid values are None or 'local' for now). If it is set to None, this information should not be logged.
    Example YAML:
log_style: 'local'
version: 1
handlers:
  file:
    class: logging.FileHandler
    filename: sdv_logs.log
loggers:
  BaseSynthesizer:
    level: INFO
    handlers: [file]
    propagate: no
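A sketch of loading this YAML and applying it with `logging.config.dictConfig`, assuming PyYAML is installed. Because `log_style` is not part of the `dictConfig` schema, it is removed before the rest of the dictionary is applied. Treating a missing `log_style` the same as None is an assumption, as is the function name `configure_usage_logger`. Setting `disable_existing_loggers` to False keeps `dictConfig` from silencing loggers that were created before the usage config was applied.

```python
import logging.config

import yaml  # PyYAML


def configure_usage_logger(config_path):
    """Apply the usage-logging YAML; return False when logging is disabled."""
    with open(config_path) as config_file:
        config = yaml.safe_load(config_file)

    # 'log_style' is a custom key, not a dictConfig key, so take it out first.
    log_style = config.pop('log_style', None)
    if log_style is None:
        return False  # None means usage information is not logged
    if log_style != 'local':
        raise ValueError(f'Unsupported log_style: {log_style!r}')

    # Leave loggers created elsewhere enabled when this config is applied.
    config.setdefault('disable_existing_loggers', False)
    logging.config.dictConfig(config)
    return True
```

Note that PyYAML reads `propagate: no` as the boolean False, which is what `dictConfig` expects.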
@amontanez24 amontanez24 added the feature request Request for a new feature label Apr 8, 2024
@amontanez24 amontanez24 added this to the 1.12.2 milestone May 13, 2024