documention checkpoint #129
Merged
Conversation
montanalow added a commit that referenced this pull request on Dec 7, 2018:
* bulk insert support for snowflake (#122)
  * bulk insert support for snowflake
  * always use slices
  * cleanup shadowing slice
* Fix issue where copying between different file systems would break data retrieval (#125)

  `os.rename` only works if the source and destination paths are on the same file system. Copying with `shutil.copy` and then manually removing the source file fixes the issue.

  Traceback:

  ```
  11:21:36.644 ERROR root:293 => Exception: Traceback (most recent call last):
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/bin/lore", line 11, in <module>
      sys.exit(main())
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/__main__.py", line 331, in main
      known.func(known, unknown)
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/__main__.py", line 483, in fit
      model.fit(score=parsed.score, test=parsed.test, **fit_args)
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/models/base.py", line 49, in fit
      x=self.pipeline.encoded_training_data.x,
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 132, in encoded_training_data
      self._encoded_training_data = self.observations(self.training_data)
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 110, in training_data
      self._split_data()
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/util.py", line 210, in wrapper
      return func(*args, **kwargs)
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 234, in _split_data
      self._data = self.get_data()
    File "/home/thomas/code/my_app/my_app/pipelines/product_popularity.py", line 20, in get_data
      lore.io.download(url, cache=True, extract=True)
    File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/io/__init__.py", line 124, in download
      os.rename(temp_path, local_path)
  OSError: [Errno 18] Invalid cross-device link: '/tmp/tmpwl6lvhon' -> '/home/thomas/code/my_app/data/instacart_online_grocery_shopping_2017_05_01.tar.gz'
  ```

* documention checkpoint (#129)
* Create Naive Estimator (#127)
  * Create Naive estimator. A naive estimator simply predicts the mean of the response variable; it is useful for benchmarking models
  * Create simple base class for Naive model
  * Add predict_proba method for xgboost
  * Add predict_proba method to base class
  * Add unit tests for naive model
  * Test for XGBoost predict_proba
  * Return probabilities for both classes à la sklearn/xgboost
  * Add documentation for naive estimator
  * Generalize documentation for multi-class classification
  * Use numpy.full instead of numpy.ones
* Add basic documentation and `predict_proba` to SKLearn Estimator (#130)
  * Add basic documentation for sklearn estimators
  * Expose predict_proba method for sklearn BinaryClassifier
  * Add documentation for predict_proba. This should probably be done in a DRY fashion.
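The naive estimator described above (predict the mean of the response, return probabilities for both classes sklearn-style, and prefer `numpy.full` over `numpy.ones`) might look roughly like this. The class and method names are illustrative assumptions, not Lore's actual API:

```python
import numpy


class NaiveEstimator:
    """Predict the mean of the training response; useful for benchmarking.

    A hypothetical sketch of the naive estimator from #127, not Lore's
    actual implementation.
    """

    def fit(self, x, y):
        self.mean_ = numpy.mean(y)
        return self

    def predict(self, x):
        # numpy.full is clearer than numpy.ones(len(x)) * self.mean_
        return numpy.full(len(x), self.mean_)

    def predict_proba(self, x):
        # Return probabilities for both classes, sklearn/xgboost style:
        # column 0 is P(class 0), column 1 is P(class 1).
        p1 = numpy.full(len(x), self.mean_)
        return numpy.column_stack([1 - p1, p1])
```

Any real model that cannot beat this baseline is not learning anything from the features.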
  But doing it this way for now.
* Improve OneHot encoder (#131)
  * Fix names for OneHot encoded columns
  * Add option to drop first level. This is useful for algorithms like linear regression, which do not like singular matrices
  * Test for drop_first
  * Add percent_occurrences to OneHot
  * Add documentation for OneHot
  * Version bump
* [Lore] Add exception handling for unauthenticated snowflake connections (#132)
  * [Lore] Add exception handling for unauthenticated snowflake connections
  * [Lore] Added stricter error handling for expired snowflake connection renewal
  * [Lore] Added test cases for unauthenticated snowflake connection errors
  * [Lore] Disable failing tests
* Fix tasks invocation (#133)
* python2 compatibility for tests
* Helper for creating prediction dataframe
* Helper for logging predictions
* Store latest predictions on every predict
* Util function for converting df columns to json
* Create Mock model for unit test
* Create test for prediction logging
* Integrate relevant changes from montana/model_store
* Add metadata DB
* Add class method to get_or_create instance
* Change schema for metadata
* Instrument model base class for metadata logging
* Update fitting schema to include model uploads
* Ignore commit data for now
* Add memoized property to utils
* Add basic unit test for fit metadata
* Change metadata schema:
  1. Remove fitting and snapshot status
  2. Change fitting_name to fitting_num
* Add additional imports
* Modify model fit and save for metadata logging
* Save best estimator as fitting with hyper_parameter_search
* Fix paths so model upload works
* Refactor uploading/downloading code
* Modify last_fitting to get correct fitting name
* Modify predictions metadata
* Log predictions
* Modify unit tests for logging
* Add additional columns to metadata
* Change prediction logging to use custom_data column
* Save model URL on upload
* Raise error if no fittings found for model in metadata
* Add test for prediction logging
* Fix last_fitting function
* Fix bug where fitting name was not being set properly on download
* Define model_name as property
* Temp commit
* hunt down sql(alchemy+ite) bug
* Use env-aware default metadata database; use workaround to re-enable watermarking with sqlite; clean up test output
* use in memory database
* test batch mode in CI
* go go postgres in CI
* prevent database schema caching
* prediction log testing is in metadata tests
* Add get() to return classes by key
* Support loading legacy models in lore
* bump version
* add ganesh
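The drop-first behavior mentioned for the OneHot encoder above (dropping one level so the encoded columns are not collinear with an intercept) can be illustrated with pandas, which exposes the same option on `get_dummies`. This is an illustration of the technique, not Lore's encoder:

```python
import pandas

df = pandas.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full one-hot encoding: one column per level. The columns always sum
# to 1, so together with an intercept the design matrix is singular.
full = pandas.get_dummies(df["color"], prefix="color")

# drop_first=True drops the first level, keeping linear models well-posed;
# the dropped level becomes the implicit baseline.
reduced = pandas.get_dummies(df["color"], prefix="color", drop_first=True)

print(list(full.columns))     # three color_* columns
print(list(reduced.columns))  # two color_* columns
```

Tree-based models generally do not need `drop_first`; it matters for linear and other matrix-inversion-based estimators.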
ganesh-krishnan added a commit that referenced this pull request on Dec 7, 2018:
* Helper for creating prediction dataframe
* Helper for logging predictions
* Store latest predictions on every predict
* Util function for converting df columns to json
* Create Mock model for unit test
* Create test for prediction logging
* Integrate relevant changes from montana/model_store
* Add metadata DB
* Add class method to get_or_create instance
* Change schema for metadata
* Instrument model base class for metadata logging
* Update fitting schema to include model uploads
* Ignore commit data for now
* Add memoized property to utils
* Add basic unit test for fit metadata
* Change metadata schema:
  1. Remove fitting and snapshot status
  2. Change fitting_name to fitting_num
* Add additional imports
* Modify model fit and save for metadata logging
* Save best estimator as fitting with hyper_parameter_search
* Fix paths so model upload works
* Refactor uploading/downloading code
* Modify last_fitting to get correct fitting name
* Modify predictions metadata
* Log predictions
* Modify unit tests for logging
* Add additional columns to metadata
* Change prediction logging to use custom_data column
* Save model URL on upload
* Raise error if no fittings found for model in metadata
* Add test for prediction logging
* Fix last_fitting function
* Fix bug where fitting name was not being set properly on download
* Define model_name as property
* Temp commit
* hunt down sql(alchemy+ite) bug (#134)
* bulk insert support for snowflake (#122)
* Fix issue where copying between different file systems would break data retrieval (#125): `os.rename` only works if the source and destination paths are on the same file system; copying with `shutil.copy` and then manually removing the source file fixes the issue
* documention checkpoint (#129)
* Create Naive Estimator (#127)
* Add basic documentation and `predict_proba` to SKLearn Estimator (#130)
* Improve OneHot encoder (#131)
* [Lore] Add exception handling for unauthenticated snowflake connections (#132)
* Fix tasks invocation (#133)
* python2 compatibility for tests
* Use env-aware default metadata database; use workaround to re-enable watermarking with sqlite; clean up test output
* use in memory database
* test batch mode in CI
* go go postgres in CI
* prevent database schema caching
* prediction log testing is in metadata tests
* Add get() to return classes by key
* Support loading legacy models in lore
* bump version
* add ganesh
* Let's make this a Release Candidate before launching broadly
* merge master
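The cross-device fix from #125 described above (replace a bare `os.rename` with a copy followed by removal of the source) can be sketched as follows. The function name is a hypothetical helper for illustration, not Lore's `lore.io.download`:

```python
import os
import shutil


def move_across_devices(temp_path, local_path):
    """Move temp_path to local_path, even across file systems.

    os.rename raises OSError (EXDEV, "Invalid cross-device link") when
    the source and destination live on different file systems, e.g. a
    tmpfs-backed /tmp and a destination on disk. Copying and then
    removing the source works in both cases.
    """
    try:
        os.rename(temp_path, local_path)
    except OSError:
        shutil.copy(temp_path, local_path)
        os.remove(temp_path)
```

Note that `shutil.move` from the standard library implements essentially this same fallback, so it is the usual one-liner for this situation.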
No description provided.