Merged
added 18 commits
June 22, 2020 11:55
When generating a training dataset, we need all the values in the feature column. We used to generate each value only at lookup time; however, it's easier to generate the column eagerly so that it can be used natively in pandas.
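A minimal sketch of the difference between lookup-time and eager generation, assuming a pandas-backed table. The table, the `purchases` column, and the `has_purchased` feature are hypothetical names chosen for illustration, not taken from this PR.

```python
import pandas as pd

# Hypothetical table of per-user purchases; names are illustrative only.
table = pd.DataFrame(
    {"user": ["a", "b", "c"], "purchases": [3, 0, 7]}
).set_index("user")


def lookup_lazy(entity):
    """Compute the feature value for one entity at lookup time."""
    return table.loc[entity, "purchases"] > 0


# Eager alternative: materialise the whole feature column once.
has_purchased = (table["purchases"] > 0).rename("has_purchased")

# A single lookup still works...
single_value = has_purchased.loc["a"]
# ...and the full column is available for dataset generation.
all_values = has_purchased.tolist()
```

With the eager column, building a training set becomes an ordinary pandas operation instead of a loop over per-entity lookups.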
Table indices are cast to strings. This is a reasonable default for our current use cases, except when there is no index column. In that case, the row number should be used as the index (and not cast to a string).
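The rule above could be sketched as follows. `normalise_index` and its `index_col` parameter are hypothetical names for illustration; the actual function in this PR is not shown here.

```python
import pandas as pd


def normalise_index(df, index_col=None):
    """Return a copy of df indexed as described in the commit message."""
    if index_col is None:
        # No index column: keep the row number as an integer index.
        return df.reset_index(drop=True)
    indexed = df.set_index(index_col)
    # An explicit index column is cast to strings.
    indexed.index = indexed.index.astype(str)
    return indexed


raw = pd.DataFrame({"id": [10, 20], "amount": [1.5, 2.5]})
by_id = normalise_index(raw, index_col="id")
by_row = normalise_index(raw)
```

Here `by_id` is looked up with string keys such as `"10"`, while `by_row` keeps integer row numbers.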
The Column should have the same name as the feature.
This avoids manual importing of every error in errors.py.
This clarifies that no column is being set to the primary key.
entity_values is an array of the value of each entity, not the value of each feature.
Training sets can be configured by providing labels, features, and entity mappings. The entity mapping is used to get the actual value of each feature for each label.
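One way to picture this configuration, as a sketch only: `build_training_set`, the data, and the shape of the entity mapping (feature name to label column) are assumptions made for illustration and may differ from the PR's actual API.

```python
import pandas as pd

# Hypothetical data: a label table and a feature column keyed by entity.
labels = pd.DataFrame({"user": ["a", "b", "a"], "clicked": [1, 0, 0]})
avg_spend = pd.Series({"a": 12.0, "b": 3.5}, name="avg_spend")

# Entity mapping: feature name -> label column holding that entity's value.
entity_mapping = {"avg_spend": "user"}
features = {"avg_spend": avg_spend}


def build_training_set(labels, features, entity_mapping, label_col):
    """Join each feature onto the labels through its mapped entity column."""
    out = pd.DataFrame(index=labels.index)
    for name, column in features.items():
        # One entity value per label row, used as the key into the feature.
        entity_values = labels[entity_mapping[name]]
        out[name] = entity_values.map(column)
    out[label_col] = labels[label_col]
    return out


training_set = build_training_set(labels, features, entity_mapping, "clicked")
```

Each label row picks up the feature value belonging to its entity, so user `a` receives `12.0` in both of its rows.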
Adds two tests, one to check entity mapping across two CSVs and one where the features are in the same file as the labels.
It's cleaner to not have any characters in the version.
This allows python3 setup.py sdist to behave properly.
The dist/ directory is generated when pushing a new version to PyPI.
It was missing a comma.
Name should only be changed via the rename method. This removes any confusion about this.
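A rename-only name can be enforced with a read-only property. The `Column` class below is a hypothetical stand-in written for illustration, not the class changed in this PR.

```python
class Column:
    """Hypothetical column wrapper with a read-only name."""

    def __init__(self, name, values):
        self._name = name
        self._values = list(values)

    @property
    def name(self):
        # No setter is defined, so `column.name = ...` raises AttributeError.
        return self._name

    def rename(self, new_name):
        """The only supported way to change the name."""
        self._name = new_name
        return self


col = Column("raw", [1, 2, 3])
col.rename("clean")
```

Returning `self` from `rename` is a design choice in this sketch that allows call chaining; the real method may behave differently.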
This simplifies the API and expectations. Currently, renaming only happens when a column is transformed.
This adds a series of simple tests for a column containing all integers from 1 to 100 (inclusive).
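Tests of this kind might look like the sketch below. A plain pandas Series stands in for the project's column type, whose API is not shown here, and the specific checks are assumptions about what "simple tests" could cover.

```python
import pandas as pd

# Stand-in for the project's column type, whose API is not shown here.
column = pd.Series(range(1, 101), name="numbers")


def test_length():
    assert len(column) == 100


def test_bounds_are_inclusive():
    assert column.min() == 1
    assert column.max() == 100


def test_sum():
    # Sum of 1..n is n * (n + 1) / 2.
    assert column.sum() == 100 * 101 // 2


# Run the checks directly; a test runner such as pytest would discover them.
for test in (test_length, test_bounds_are_inclusive, test_sum):
    test()
```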
8b46a59 to
582f1b2