Confusion about domain_label_colname in tabletshift/core/features.py #8

Open
lisa-lthorrold opened this issue Jan 18, 2024 · 0 comments

lisa-lthorrold commented Jan 18, 2024

Describe the bug
There is some confusion about domain_label_colname in tabletshift/core/features.py. What is its purpose, and how is it different from domain_split_varname?

Is there a reason it is not added in the self.get_passthrough_columns call? In get_passthrough_columns it seems this is an optional attribute, but it is only being called from one place.

In any case, without it being added, the columns in the dataset are transformed (one-hot encoded or binned), and the column names are adjusted accordingly. At the point this code is run, domain_label_colname == domain_label_varname.

If domain_label_colname is a categorical attribute (as is the case for the anes dataset), then the transformation mangles its column name, so by the time this code is called straight after:

if domain_label_colname:
    # Case: fit the domain label transformer and apply it.
    transformed.loc[:, domain_label_colname] = \
        self.fit_transform_domain_labels(
            transformed.loc[:, domain_label_colname])

we get an exception, because the column name no longer exists (4 new columns with an extended version of that name are present instead). In the diabetes readmission dataset, the column referenced by domain_label_colname is an int, so it retains its column name when this code is called, and no exception is thrown. The transform step that runs just before that block is:

# Fit the feature transformer and apply it.
self.fit_feature_transformer(data, train_idxs, passthrough_columns)
transformed = self.transform_features(data)

transformed = self._post_transform(
    transformed, cast_dtypes=post_transform_cast_dtypes)
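
To make the failure mode concrete, here is a minimal sketch using only pandas. It is not the library's actual transformer: the helper `one_hot`, the toy DataFrame, and the column names `region`, `age`, and `site_id` are all made up for illustration. It only shows the general effect described above: one-hot encoding a categorical domain column removes the original column name, so a later `.loc` lookup by that name fails, while a column excluded from encoding (a passthrough) keeps its name.

```python
import pandas as pd


def one_hot(df, categorical, passthrough=()):
    """One-hot encode `categorical` columns, skipping any in `passthrough`.

    Hypothetical helper standing in for a feature transformer.
    """
    to_encode = [c for c in categorical if c not in passthrough]
    return pd.get_dummies(df, columns=to_encode)


# Toy data: "region" plays the role of a categorical domain column,
# "site_id" the role of an integer domain column.
data = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "age": [31, 45, 27, 52],
    "site_id": [1, 2, 1, 3],
})
domain_label_colname = "region"

# Without passthrough: "region" is replaced by region_east, region_north, ...
encoded = one_hot(data, categorical=["region"])

try:
    encoded.loc[:, domain_label_colname]
    lookup_failed = False
except KeyError:
    # The original column name no longer exists after encoding.
    lookup_failed = True

# With passthrough: the domain column is left untouched and can be looked up.
kept = one_hot(data, categorical=["region"],
               passthrough=[domain_label_colname])
domain_values = kept.loc[:, domain_label_colname]
```

In this sketch the integer column `site_id` survives encoding unchanged in both cases, which mirrors why the diabetes readmission dataset does not hit the exception.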

To Reproduce
Change the dataset to 'anes' in run_expt.py and run it
