documention checkpoint #129

Merged
merged 1 commit into master on Oct 30, 2018

Conversation

montanalow
Contributor

No description provided.

@montanalow merged commit 69b1fc3 into master Oct 30, 2018
montanalow added a commit that referenced this pull request Dec 7, 2018
* bulk insert support for snowflake (#122)

* bulk insert support for snowflake

* always use slices

* cleanup shadowing slice

* Fix issue where copying between different file systems would break data retrieval (#125)

`os.rename` only works if the source and destination paths are on the same file system.

Copying with `shutil.copy` and then removing the source file fixes the issue (a sketch of this approach follows the traceback below).

Traceback:
```
11:21:36.644  ERROR    root:293 => Exception: Traceback (most recent call last):
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/bin/lore", line 11, in <module>
    sys.exit(main())
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/__main__.py", line 331, in main
    known.func(known, unknown)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/__main__.py", line 483, in fit
    model.fit(score=parsed.score, test=parsed.test, **fit_args)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/models/base.py", line 49, in fit
    x=self.pipeline.encoded_training_data.x,
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 132, in encoded_training_data
    self._encoded_training_data = self.observations(self.training_data)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 110, in training_data
    self._split_data()
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/util.py", line 210, in wrapper
    return func(*args, **kwargs)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 234, in _split_data
    self._data = self.get_data()
  File "/home/thomas/code/my_app/my_app/pipelines/product_popularity.py", line 20, in get_data
    lore.io.download(url, cache=True, extract=True)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/io/__init__.py", line 124, in download
    os.rename(temp_path, local_path)
OSError: [Errno 18] Invalid cross-device link: '/tmp/tmpwl6lvhon' -> '/home/thomas/code/my_app/data/instacart_online_grocery_shopping_2017_05_01.tar.gz'

```
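
A minimal sketch of the copy-then-remove fallback described above, reusing the `temp_path`/`local_path` names from the traceback; the helper itself is illustrative, not lore's actual `lore.io.download` implementation:

```
import os
import shutil


def move_across_filesystems(temp_path, local_path):
    """Move a file even when source and destination live on different file systems."""
    try:
        # Fast path: an atomic rename works when both paths share a file system.
        os.rename(temp_path, local_path)
    except OSError:
        # os.rename fails with EXDEV ("Invalid cross-device link") across devices,
        # so copy the bytes and then remove the original.
        shutil.copy(temp_path, local_path)
        os.remove(temp_path)
```

The standard library's `shutil.move` performs essentially the same fallback (copy, then delete the source) when a rename is not possible.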

* documention checkpoint (#129)

* Create Naive Estimator (#127)

* Create Naive estimator

A naive estimator simply predicts the mean of the response variable.
It is useful for benchmarking models (a sketch follows this change list).

* Create simple base class for Naive model

* Add predict_proba method for xgboost

* Add predict_proba method to base class

* Add unit tests for naive model

* Test for XGBoost predict_proba

* Return probabilities for both classes, à la sklearn/xgboost

* Add documentation for naive estimator

* Generalize documentation for multi-class classification

* Use numpy.full instead of numpy.ones
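
A minimal sketch of the naive estimator idea described above (predict the training mean, expose `predict_proba` for both classes, use `numpy.full`); the class and method names here are illustrative, not lore's actual estimator API:

```
import numpy as np


class NaiveEstimator:
    """Baseline that always predicts the training mean of the response variable."""

    def fit(self, x, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, x):
        # numpy.full fills the prediction vector directly, rather than numpy.ones * mean.
        return np.full(len(x), self.mean_)

    def predict_proba(self, x):
        # For a binary target the mean is the positive-class rate; return
        # probabilities for both classes, as sklearn/xgboost do.
        positive = np.full(len(x), self.mean_)
        return np.column_stack([1.0 - positive, positive])
```

Benchmarking against such a baseline makes it obvious whether a real model learns anything beyond the base rate.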

* Add basic documentation and `predict_proba` to SKLearn Estimator (#130)

* Add basic documentation for sklearn estimators

* Expose predict_proba method for sklearn BinaryClassifier

* Add documentation for predict_proba.

This should probably be done in a DRY fashion, but we are doing it this way for now.

* Improve OneHot encoder (#131)

* Fix names for OneHot encoded columns

* Add option to drop first level

This is useful for algorithms like linear regression, which do not
handle singular matrices well (see the sketch after this change list).

* Test for drop_first

* Add percent_occurrences to OneHot

* Add documentation for OneHot
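
A small illustration of the `drop_first` idea, using `pandas.get_dummies` rather than lore's own OneHot encoder (so the column names below are pandas', not lore's):

```
import pandas as pd

colors = pd.Series(["red", "green", "blue", "green"], name="color")

# Encoding every level makes the columns sum to 1, so together with an intercept
# the design matrix is singular (the "dummy variable trap").
full = pd.get_dummies(colors, prefix="color")

# Dropping the first level removes that redundancy, which keeps linear models happy.
reduced = pd.get_dummies(colors, prefix="color", drop_first=True)

print(full.columns.tolist())     # ['color_blue', 'color_green', 'color_red']
print(reduced.columns.tolist())  # ['color_green', 'color_red']
```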

* Version bump

* [Lore] Add exception handling for unauthenticated snowflake connections (#132)

* [Lore] Add exception handling for unauthenticated snowflake connections
* [Lore] Added stricter error handling for expired snowflake connection renewal
* [Lore] Added test case for unauthenticated snowflake connection error
* [Lore] Disable failing tests

* Fix tasks invocation (#133)

* python2 compatibility for tests

* Helper for creating prediction dataframe

* Helper for logging predictions

* Store latest predictions on every predict

* Util function for converting df columns to json

* Create Mock model for unit test

* Create test for prediction logging

* Integrate relevant changes from montana/model_store

* Add metadata DB

* Add class method to get_or_create instance

* Change schema for metadata

* Instrument model base class for metadata logging

* Update fitting schema to include model uploads

* Ignore commit data for now

* Add memoized property to utils

* Add basic unit test for fit metadata

* Change metadata schema

1. Remove fitting and snapshot status
2. Change fitting_name to fitting_num

* Add additional imports

* Modify model fit and save for metadata logging

* Save best estimator as fitting with hyper_parameter_search

* Fix paths so model upload works

* Refactor uploading/downloading code

* Modify last_fitting to get correct fitting name

* Modify predictions metadata

* Log predictions

* Modify unit tests for logging

* Add additional columns to metadata

* Change prediction logging to use custom_data column

* Save model URL on upload

* Raise error if no fittings found for model in metadata

* Add test for prediction logging

* Fix last_fitting function

* Fix bug where fitting name was not being set properly on download

* Define model_name as property

* Temp commit

* hunt down sql(alchemy+ite) bug

* - use env-aware default metadata database
- use workaround to re-enable watermarking w/ sqlite
- clean up test output

* use in memory database

* test batch mode in CI

* go go postgres in CI

* prevent database schema caching

* prediction log testing is in metadata tests.

* Add get() to return classes by key.

* Support loading legacy models in lore.

* bump version

* add ganesh
ganesh-krishnan added a commit that referenced this pull request Dec 7, 2018

* hunt down sql(alchemy+ite) bug (#134)

* Let's make this a Release Candidate before launching broadly.

* merge master