Add summary function and command #92

amontanez24 · 2021-06-03T22:00:56Z

No description provided.

* sdgym-gretel: adding gretel synthesizer
* pr comments and changes discussed in OH
* getting rid of error messages
* moving static method out
* Curate dependencies to avoid conflicts

Co-authored-by: Carles Sala <carles@pythiac.com>

csala

I added a few suggestions to improve the current implementation and interface.

csala · 2021-06-04T14:42:59Z

sdgym/summary.py

@@ -150,3 +153,54 @@ def errors_summary(data):
        all_errors[synthesizer] = errors.fillna(0).astype(int)

    return all_errors
+
+
+def make_summary_spreadsheet(file_path):


I would make a few changes to this function signature:

* I would rename file_path to raw_csv_path (or results_csv_path).
* I would take the output path as an input argument that defaults to None. If an output path is given, use it; otherwise write the output to re.sub('.csv$', '.xlsx', raw_csv_path).
* I would make the baselines a module-level dictionary, with data modalities as keys and baseline lists as values, and also add a baselines argument that defaults to None (meaning use the default one we just defined).
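The defaulting rules suggested above can be sketched as follows. This is only an illustration: `resolve_summary_args` is a hypothetical helper name, and the baseline names in `MODALITY_BASELINES` are placeholders, not the project's real baselines. The sketch also escapes the dot in the pattern, since the unescaped `'.csv$'` would match any character followed by `csv`.

```python
import re

# Module-level defaults: data modality -> list of baseline synthesizer names.
# The names below are placeholders for illustration only.
MODALITY_BASELINES = {
    'single-table': ['BaselineA', 'BaselineB'],
    'multi-table': ['BaselineC'],
}


def resolve_summary_args(raw_csv_path, output_path=None, baselines=None):
    """Apply the suggested defaults for the output path and the baselines."""
    # Fall back to the module-level dictionary when no baselines are given.
    baselines = baselines or MODALITY_BASELINES
    # Derive the spreadsheet path from the CSV path when none is given.
    # The dot is escaped so only a literal ".csv" suffix is replaced.
    output_path = output_path or re.sub(r'\.csv$', '.xlsx', raw_csv_path)
    return output_path, baselines
```

With these defaults, `resolve_summary_args('results.csv')` yields `'results.xlsx'` as the output path, while an explicit `output_path` is returned unchanged.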

At the implementation level, I would make a sub-function called _add_summary which would receive:

* A name (which will be the data modality)
* The data already filtered by data modality
* The baselines list (list, not dict!)
* The writer

The overall process would be something similar to this:

    def _add_summary(data, modality, baselines, writer):
        # Compute the summaries on the data and add the corresponding
        # sheets to the writer, using `{Sheet name} ({modality})` as names

    def make_summary_spreadsheet(raw_csv_path, output_path=None, baselines=None):
        # Preprocess the data
        # Create and configure the writer
        for modality, subset in data.groupby('modality'):
            modality_baselines = baselines[modality]
            _add_summary(subset, modality, modality_baselines, writer)

        writer.save()
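A runnable toy version of this per-modality loop might look like the sketch below. It rests on several assumptions: a plain dict named `sheets` stands in for the Excel writer, the `synthesizer` and `score` columns are hypothetical, and the "summary" is just a mean score per synthesizer rather than the real summary computation.

```python
import pandas as pd


def _add_summary(data, modality, baselines, sheets):
    """Compute a toy summary for one modality and store it as a named sheet."""
    # Mean score per synthesizer (stand-in for the real summary logic).
    summary = data.groupby('synthesizer')['score'].mean().to_frame('mean_score')
    # Flag which synthesizers are baselines for this modality.
    summary['is_baseline'] = summary.index.isin(baselines)
    # A dict stands in for the Excel writer; keys follow `{Sheet name} ({modality})`.
    sheets[f'Summary ({modality})'] = summary


def summarize_by_modality(data, baselines):
    """Split the data by modality and add one summary sheet per modality."""
    sheets = {}
    for modality, subset in data.groupby('modality'):
        # Pass down the baselines list for this modality, not the whole dict.
        _add_summary(subset, modality, baselines[modality], sheets)
    return sheets


data = pd.DataFrame({
    'modality': ['single-table', 'single-table', 'single-table', 'timeseries'],
    'synthesizer': ['A', 'A', 'B', 'C'],
    'score': [0.5, 0.7, 0.9, 0.4],
})
baselines = {'single-table': ['A'], 'timeseries': ['C']}
sheets = summarize_by_modality(data, baselines)
```

Keeping the per-modality work in its own function means the outer function only handles preprocessing, grouping, and saving, which is the split the review comment asks for.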

@amontanez24 we should also add support for reading from and writing to S3 buckets, following an approach similar to the collect command.
This means we need to add aws_key and aws_secret arguments and use the s3 module functions to read and write files (we may need to change something there to write the xlsx file).

csala · 2021-06-04T15:12:04Z

sdgym/summary.py

@@ -3,6 +3,8 @@
 import numpy as np
 import pandas as pd

+from sdgym.results import add_sheet


If the results.py module is not being used anywhere else (which I think it is not), I would copy whatever is necessary here and remove the module.

csala

Comment about how to decide whether to write to S3 or not, and how to do it.

csala · 2021-06-07T16:17:44Z

sdgym/summary.py

+    baselines = baselines or MODALITY_BASELINES
+    output_path = output_path or re.sub('.csv$', '.xlsx', results_csv_path)
+    output = io.BytesIO()
+    writer = pd.ExcelWriter(output) if aws_key and aws_secret else pd.ExcelWriter(output_path)


aws_key and aws_secret cannot be used to decide whether we are writing to S3 or not, since an S3 path can be passed without keys to let boto3 use the system-wide credentials.

To keep things simple, I would just not distinguish between a local or an S3 path here and always write to the BytesIO and pass it down to write_file, which will decide whether to write to a local or remote file based on the given path.
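A minimal sketch of that approach, under stated assumptions: the `write_file` below is a hypothetical stand-in (the signature of the real function in the project's s3 module is not shown in this thread), and `save_report` and its `render` callback are names invented for illustration. The point is that the destination is chosen from the path alone, never from the presence of credentials.

```python
import io
import pathlib

S3_PREFIX = 's3://'


def write_file(contents, path, aws_key=None, aws_secret=None):
    """Write bytes to a local file or to S3, decided only by the path prefix."""
    if path.startswith(S3_PREFIX):
        import boto3  # imported lazily so local writes work without boto3

        bucket, _, key = path[len(S3_PREFIX):].partition('/')
        # With both keys left as None, boto3 falls back to its default
        # (system-wide) credential lookup.
        client = boto3.client(
            's3',
            aws_access_key_id=aws_key,
            aws_secret_access_key=aws_secret,
        )
        client.put_object(Bucket=bucket, Key=key, Body=contents)
    else:
        pathlib.Path(path).write_bytes(contents)


def save_report(render, output_path, aws_key=None, aws_secret=None):
    """Always render into memory, then let write_file pick the destination."""
    output = io.BytesIO()
    render(output)  # e.g. a function that writes the spreadsheet into the buffer
    write_file(output.getvalue(), output_path, aws_key, aws_secret)
```

In the setting of this PR, `render` would be whatever fills the buffer through `pd.ExcelWriter(output)`; the calling code then no longer needs a local-versus-remote branch.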

I see, I didn't know that. This is nice, though, since it will be cleaner in the end.

csala

Good to go! Thanks @amontanez24!

amontanez24 and others added 3 commits May 26, 2021 12:41

sdgym-gretel (#1)

9974517

Merge branch 'master' of https://github.com/sdv-dev/SDGym

dbd524a

Merge branch 'master' of https://github.com/sdv-dev/SDGym

cc2d126

amontanez24 marked this pull request as draft June 3, 2021 22:03

amontanez24 requested a review from csala June 3, 2021 22:03

csala suggested changes Jun 4, 2021

amontanez24 marked this pull request as ready for review June 4, 2021 21:36

amontanez24 requested a review from csala June 4, 2021 21:36

csala suggested changes Jun 7, 2021

amontanez24 and others added 8 commits June 7, 2021 12:20

Merge branch 'master' of https://github.com/sdv-dev/SDGym

c08bc41

Minor improvements to the summary functions

79f1a5f

Add summary function and command

8f4f9b2

adding cli commands and finishing up script

6040019

adding tests

e80c7da

refactoring and changing method signature

6e058a7

Adding s3 parameters and functionality

ff4862e

pr comments and removing unused code

07a6988

amontanez24 force-pushed the sdgym-summary branch from 2f73a59 to 07a6988 Compare June 7, 2021 17:22

amontanez24 changed the base branch from summary-improvements to master June 7, 2021 17:22

csala approved these changes Jun 7, 2021

csala merged commit 7c7b7a9 into master Jun 7, 2021

csala deleted the sdgym-summary branch June 7, 2021 18:45

katxiao added this to the 0.4.0 milestone Jun 17, 2021
