
Add verbosity to metadata auto-detection #2871

@npatki

Problem Description

When auto-detecting metadata, the algorithm makes many nuanced decisions about the columns, sdtypes, and other properties. Right now, there is no easy way to view detected properties such as datetime or regex formats. The visualize function doesn't display this information; the only alternative is to print out the JSON object, which is hard to read.

Expected behavior

For all the metadata detection functions (multi-table detect_from_dataframes and single-table detect_from_dataframe), add a parameter called verbose:

  • (default) False: Do not print out any information about what is detected
  • True: Print out information about what is detected

If set to True, the metadata auto-detection should print out what it's doing. For example:

metadata = Metadata.detect_from_dataframes(data, verbose=True)
Detecting table 'hotels':
- Column 'hotel_id': sdtype='id', regex_format='HID_[0-9]{3}'
- Column 'city': sdtype='categorical'
- Column 'rating': sdtype='numerical', computer_representation='Float'

Detecting table 'guests':
- ...

Detecting primary keys:
- Table 'hotels': primary_key='hotel_id'
- Table 'guests': primary_key=None

Detecting foreign keys:
- Column 'guests.hotel_id' refers to column 'hotels.hotel_id' (updating sdtype to 'id')
- ...
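
To make the proposed format concrete, here is a minimal sketch that builds this kind of report from a plain metadata dictionary. The helper `format_detection_report` is hypothetical and not part of SDV; the dictionary layout (`tables`, `columns`, `primary_key`, `relationships`) is an assumption about what the metadata JSON looks like, and the `guests` table lists only the one column named in the example above.

```python
def format_detection_report(metadata_dict):
    """Build a verbose-style report from a metadata dictionary.

    Hypothetical helper (not an SDV function). Assumes the layout
    {'tables': {name: {'columns': {...}, 'primary_key': ...}},
     'relationships': [...]}.
    """
    lines = []
    tables = metadata_dict.get('tables', {})

    # One section per table, one line per column with all its properties.
    for table_name, table in tables.items():
        lines.append(f"Detecting table '{table_name}':")
        for column_name, properties in table.get('columns', {}).items():
            details = ', '.join(f'{key}={value!r}' for key, value in properties.items())
            lines.append(f"- Column '{column_name}': {details}")
        lines.append('')

    lines.append('Detecting primary keys:')
    for table_name, table in tables.items():
        lines.append(f"- Table '{table_name}': primary_key={table.get('primary_key')!r}")

    # The foreign key section is skipped when there are no relationships.
    relationships = metadata_dict.get('relationships', [])
    if relationships:
        lines.append('')
        lines.append('Detecting foreign keys:')
        for rel in relationships:
            child = f"{rel['child_table_name']}.{rel['child_foreign_key']}"
            parent = f"{rel['parent_table_name']}.{rel['parent_primary_key']}"
            lines.append(f"- Column '{child}' refers to column '{parent}'")

    return '\n'.join(lines)


example = {
    'tables': {
        'hotels': {
            'columns': {
                'hotel_id': {'sdtype': 'id', 'regex_format': 'HID_[0-9]{3}'},
                'city': {'sdtype': 'categorical'},
                'rating': {'sdtype': 'numerical', 'computer_representation': 'Float'},
            },
            'primary_key': 'hotel_id',
        },
        'guests': {
            # Only the column mentioned in the issue; the rest is elided there.
            'columns': {'hotel_id': {'sdtype': 'id'}},
            'primary_key': None,
        },
    },
    'relationships': [
        {
            'parent_table_name': 'hotels',
            'parent_primary_key': 'hotel_id',
            'child_table_name': 'guests',
            'child_foreign_key': 'hotel_id',
        }
    ],
}

report = format_detection_report(example)
print(report)
```

Building the report as a string and printing it only when `verbose=True` would keep the default behavior silent, as requested above.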

Additional context

  • Depending on the parameters, not all sections may be present
  • Some of the sdtypes may change based on the foreign key detection (because a foreign key's sdtype should match that of the primary key it references)
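
The sdtype update in the second point could be reported along these lines. This is only a sketch: `align_foreign_key_sdtype` is a hypothetical helper, the relationship keys are assumed, and the starting sdtype of `'categorical'` for `guests.hotel_id` is invented for illustration since the issue does not say what was detected first.

```python
def align_foreign_key_sdtype(tables, relationship):
    """Make the child foreign key's sdtype match the parent primary key's.

    Hypothetical helper (not an SDV function). Mutates `tables` in place and
    returns the line a verbose run could print for this relationship.
    """
    parent_table = relationship['parent_table_name']
    parent_key = relationship['parent_primary_key']
    child_table = relationship['child_table_name']
    child_key = relationship['child_foreign_key']

    parent_column = tables[parent_table]['columns'][parent_key]
    child_column = tables[child_table]['columns'][child_key]

    message = (
        f"- Column '{child_table}.{child_key}' refers to "
        f"column '{parent_table}.{parent_key}'"
    )
    # Only mention the update when the sdtype actually changes.
    if child_column['sdtype'] != parent_column['sdtype']:
        child_column['sdtype'] = parent_column['sdtype']
        message += f" (updating sdtype to {parent_column['sdtype']!r})"
    return message


tables = {
    'hotels': {'columns': {'hotel_id': {'sdtype': 'id'}}},
    # Assumed initial detection result, for illustration only.
    'guests': {'columns': {'hotel_id': {'sdtype': 'categorical'}}},
}
relationship = {
    'parent_table_name': 'hotels',
    'parent_primary_key': 'hotel_id',
    'child_table_name': 'guests',
    'child_foreign_key': 'hotel_id',
}

first_message = align_foreign_key_sdtype(tables, relationship)
second_message = align_foreign_key_sdtype(tables, relationship)
print(first_message)
print(second_message)
```

On the second call the sdtypes already match, so the line is printed without the update note.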
