documention checkpoint #129

Merged
merged 1 commit into master on Oct 30, 2018

Conversation

montanalow
Contributor

No description provided.

@montanalow merged commit 69b1fc3 into master Oct 30, 2018
montanalow added a commit that referenced this pull request Dec 7, 2018
* bulk insert support for snowflake (#122)

* bulk insert support for snowflake

* always use slices

* cleanup shadowing slice

* Fix issue where copying between different file systems would break data retrieval (#125)

`os.rename` only works if the source and destination paths are on the same file system.

Copying with `shutil.copy` and then removing the source file fixes the issue (a sketch of this approach follows the traceback below).

Traceback:
```
11:21:36.644  ERROR    root:293 => Exception: Traceback (most recent call last):
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/bin/lore", line 11, in <module>
    sys.exit(main())
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/__main__.py", line 331, in main
    known.func(known, unknown)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/__main__.py", line 483, in fit
    model.fit(score=parsed.score, test=parsed.test, **fit_args)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/models/base.py", line 49, in fit
    x=self.pipeline.encoded_training_data.x,
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 132, in encoded_training_data
    self._encoded_training_data = self.observations(self.training_data)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 110, in training_data
    self._split_data()
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/util.py", line 210, in wrapper
    return func(*args, **kwargs)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/pipelines/holdout.py", line 234, in _split_data
    self._data = self.get_data()
  File "/home/thomas/code/my_app/my_app/pipelines/product_popularity.py", line 20, in get_data
    lore.io.download(url, cache=True, extract=True)
  File "/home/thomas/.pyenv/versions/3.6.4/envs/my_app/lib/python3.6/site-packages/lore/io/__init__.py", line 124, in download
    os.rename(temp_path, local_path)
OSError: [Errno 18] Invalid cross-device link: '/tmp/tmpwl6lvhon' -> '/home/thomas/code/my_app/data/instacart_online_grocery_shopping_2017_05_01.tar.gz'

```
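
A minimal sketch of the copy-then-remove fallback described above, reusing the `temp_path`/`local_path` names from the traceback; the helper itself is illustrative, not lore's actual `lore.io.download` implementation:

```
import os
import shutil


def move_across_filesystems(temp_path, local_path):
    """Move a file even when source and destination live on different file systems."""
    try:
        # Fast path: an atomic rename works when both paths share a file system.
        os.rename(temp_path, local_path)
    except OSError:
        # os.rename fails with EXDEV ("Invalid cross-device link") across devices,
        # so copy the bytes and then remove the original.
        shutil.copy(temp_path, local_path)
        os.remove(temp_path)
```

The standard library's `shutil.move` performs essentially the same fallback (copy, then delete the source) when a rename is not possible.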

* documention checkpoint (#129)

* Create Naive Estimator (#127)

* Create Naive estimator

A naive estimator simply predicts the mean of the response variable.
It is useful for benchmarking models (a sketch follows this change list).

* Create simple base class for Naive model

* Add predict_proba method for xgboost

* Add predict_proba method to base class

* Add unit tests for naive model

* Test for XGBoost predict_proba

* Return probabilities for both classes, à la sklearn/xgboost

* Add documentation for naive estimator

* Generalize documentation for multi-class classification

* Use numpy.full instead of numpy.ones
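
A minimal sketch of the naive estimator idea described above (predict the training mean, expose `predict_proba` for both classes, use `numpy.full`); the class and method names here are illustrative, not lore's actual estimator API:

```
import numpy as np


class NaiveEstimator:
    """Baseline that always predicts the training mean of the response variable."""

    def fit(self, x, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, x):
        # numpy.full fills the prediction vector directly, rather than numpy.ones * mean.
        return np.full(len(x), self.mean_)

    def predict_proba(self, x):
        # For a binary target the mean is the positive-class rate; return
        # probabilities for both classes, as sklearn/xgboost do.
        positive = np.full(len(x), self.mean_)
        return np.column_stack([1.0 - positive, positive])
```

Benchmarking against such a baseline makes it obvious whether a real model learns anything beyond the base rate.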

* Add basic documentation and `predict_proba` to SKLearn Estimator (#130)

* Add basic documentation for sklearn estimators

* Expose predict_proba method for sklearn BinaryClassifier

* Add documentation for predict_proba.

This should probably be done in a DRY fashion, but we are doing it this way for now.

* Improve OneHot encoder (#131)

* Fix names for OneHot encoded columns

* Add option to drop first level

This is useful for algorithms like linear regression, which do not
handle singular matrices well (see the sketch after this change list).

* Test for drop_first

* Add percent_occurrences to OneHot

* Add documentation for OneHot
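
A small illustration of the `drop_first` idea, using `pandas.get_dummies` rather than lore's own OneHot encoder (so the column names below are pandas', not lore's):

```
import pandas as pd

colors = pd.Series(["red", "green", "blue", "green"], name="color")

# Encoding every level makes the columns sum to 1, so together with an intercept
# the design matrix is singular (the "dummy variable trap").
full = pd.get_dummies(colors, prefix="color")

# Dropping the first level removes that redundancy, which keeps linear models happy.
reduced = pd.get_dummies(colors, prefix="color", drop_first=True)

print(full.columns.tolist())     # ['color_blue', 'color_green', 'color_red']
print(reduced.columns.tolist())  # ['color_green', 'color_red']
```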

* Version bump

* [Lore] Add exception handling for unauthenticated snowflake connections (#132)

* [Lore] Add exception handling for unauthenticated snowflake connections
* [Lore] Added stricter error handling for expired snowflake connection renewal
* [Lore] Added test case for unauthenticated snowflake connection error
* [Lore] Disable failing tests

* Fix tasks invocation (#133)

* python2 compatibility for tests

* Helper for creating prediction dataframe

* Helper for logging predictions

* Store latest predictions on every predict

* Util function for converting df columns to json

* Create Mock model for unit test

* Create test for prediction logging

* Integrate relevant changes from montana/model_store

* Add metadata DB

* Add class method to get_or_create instance

* Change schema for metadata

* Instrument model base class for metadata logging

* Update fitting schema to include model uploads

* Ignore commit data for now

* Add memoized property to utils

* Add basic unit test for fit metadata

* Change metadata schema

1. Remove fitting and snapshot status
2. Change fitting_name to fitting_num

* Add additional imports

* Modify model fit and save for metadata logging

* Save best estimator as fitting with hyper_parameter_search

* Fix paths so model upload works

* Refactor uploading/downloading code

* Modify last_fitting to get correct fitting name

* Modify predictions metadata

* Log predictions

* Modify unit tests for logging

* Add additional columns to metadata

* Change prediction logging to use custom_data column

* Save model URL on upload

* Raise error if no fittings found for model in metadata

* Add test for prediction logging

* Fix last_fitting function

* Fix bug where fitting name was not being set properly on download

* Define model_name as property

* Temp commit

* hunt down sql(alchemy+ite) bug

* - use env-aware default metadata database
- use workaround to re-enable watermarking w/ sqlite
- clean up test output

* use in memory database

* test batch mode in CI

* go go postgres in CI

* prevent database schema caching

* prediction log testing is in metadata tests.

* Add get() to return classes by key.

* Support loading legacy models in lore.

* bump version

* add ganesh
ganesh-krishnan added a commit that referenced this pull request Dec 7, 2018

* hunt down sql(alchemy+ite) bug (#134)

* Let's make this a Release Candidate before launching broadly.

* merge master