Allow installation of additional primitives #326

Merged
merged 51 commits into from Dec 11, 2018

Member

kmax12 commented Nov 25, 2018

This PR introduces a method to install additional primitives into an existing Featuretools installation.

Installation from command line

$ featuretools install s3://featuretools-static/primitives_to_install.tar.gz
Install primitives: CustomSum, CustomMean, CustomMax? (Y/n) Y
100%|███████████████| 3/3 [00:00<00:00, 1187.63it/s]

Now a user can access the primitives either by importing them or by referring to them by string name:

import featuretools as ft
from featuretools.primitives import CustomSum

es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="customers",
                                      agg_primitives=[CustomSum, "custommax"],
                                      trans_primitives=[],
                                      max_depth=2)

This returns

            zip_code  CUSTOMSUM(transactions.amount)                         ...                          CUSTOMMAX(sessions.CUSTOMSUM(transactions.amount))  CUSTOMMAX(sessions.CUSTOMMAX(transactions.amount))
customer_id                                                                  ...
1              60091                         9025.62                         ...                                                                    1613.93                                              139.43
2              13244                         7200.28                         ...                                                                    1320.64                                              146.81
3              13244                         6236.62                         ...                                                                    1477.97                                              149.15
4              60091                         8727.68                         ...                                                                    1351.46                                              149.95
5              60091                         6349.66                         ...                                                                    1700.67                                              149.02

[5 rows x 7 columns]

How it works

The installation command takes a directory or tar archive of primitive source files. If it is an archive, the installation script extracts it to a directory. The archive can also be remote, as in the example above, in which case it is first downloaded to a temporary directory. We use smart_open to handle the download, so primitives can be fetched from S3, HDFS, and HTTP/HTTPS.
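The directory-or-archive handling described above can be sketched roughly as follows. This is a minimal illustration for local paths only, not the code in this PR; the helper name resolve_primitives_dir and its workdir parameter are hypothetical.

```python
import os
import tarfile
import tempfile


def resolve_primitives_dir(path, workdir=None):
    """Return a directory of primitive files for a local directory or tar archive.

    Hypothetical helper for illustration; not the function used in this PR.
    """
    if os.path.isdir(path):
        # Already a directory: nothing to extract.
        return path
    if os.path.isfile(path) and tarfile.is_tarfile(path):
        # Extract into a fresh temporary directory unless one was supplied.
        workdir = workdir or tempfile.mkdtemp()
        with tarfile.open(path) as archive:
            archive.extractall(workdir)
        return workdir
    raise ValueError("expected a directory or tar archive: %r" % path)
```

Note that extractall trusts the member paths stored in the archive, so a sketch like this should only be pointed at archives from a trusted source.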

Primitives are installed by copying their source files into a new directory, installed, within the featuretools.primitives submodule. The installation script only considers files with a .py extension. It detects whether a file contains a primitive by looking for an object that is a subclass of PrimitiveBase. Each file must contain exactly one primitive; otherwise the installation will throw an error. When featuretools.primitives is loaded, the primitive source files in installed/ are automatically imported.
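As a rough sketch of the "exactly one PrimitiveBase subclass per file" check, assuming the file's source is available as a string: the PrimitiveBase class below is a stand-in so the example runs without Featuretools installed, and find_primitives and check_exactly_one are hypothetical names, not the functions added in this PR.

```python
import inspect


class PrimitiveBase:
    """Stand-in for featuretools' PrimitiveBase so the sketch runs on its own."""


def find_primitives(source, base=PrimitiveBase):
    """Execute a primitive file's source and return the subclasses of `base` it defines."""
    # Expose the base class under the name the primitive file refers to.
    namespace = {"PrimitiveBase": base, "__name__": "primitive_file"}
    exec(compile(source, "<primitive file>", "exec"), namespace)
    return [obj for obj in namespace.values()
            if inspect.isclass(obj) and issubclass(obj, base) and obj is not base]


def check_exactly_one(source, base=PrimitiveBase):
    """Return the single primitive defined in `source`, or raise if there is not exactly one."""
    found = find_primitives(source, base)
    if len(found) != 1:
        raise RuntimeError("expected exactly one primitive, found %d" % len(found))
    return found[0]
```

A file defining only CustomSum would pass this check, while a file defining both CustomSum and CustomMax, or no primitive at all, would raise.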

To support the CLI, there is a new entry_points configuration in setup.py, based on the instructions here: https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/
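For readers unfamiliar with entry_points, a console script is declared as a "command = module:callable" string. The fragment below is only a sketch of that shape; the target featuretools.__main__:cli is an assumed module path and function name, not copied from this PR.

```python
# Hypothetical fragment of a setup.py. The target "featuretools.__main__:cli"
# is an assumed module path and function name, not taken from this PR.
ENTRY_POINTS = {
    "console_scripts": [
        # Format: "<command> = <module>:<callable>"
        "featuretools = featuretools.__main__:cli",
    ],
}

# In setup.py this dict would be passed as: setup(..., entry_points=ENTRY_POINTS)
```

On installation, setuptools generates a featuretools executable that imports the named module and calls the named function.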

Users can also install from a Python script using ft.install_primitives(...):

import featuretools as ft
ft.install_primitives("s3://featuretools-static/primitives_to_install.tar.gz")
Install primitives: CustomSum, CustomMean, CustomMax? (Y/n) Y
100%|███████████████| 3/3 [00:00<00:00, 930.28it/s]

In this PR I also updated the structure of featuretools.primitives to be more organized. It doesn't change any of the external API, but there are now

  • featuretools.primitives.standard - all the primitives that come with featuretools
  • featuretools.primitives.installed - all the primitives that are installed into featuretools
  • featuretools.primitives.base - the base classes used by primitives, e.g. PrimitiveBase, AggregationPrimitive, etc.

Finally, I also removed our usage of tox. It wasn't providing any useful functionality after we separated each version into its own CircleCI job.

TODO before ready for review

  • Finalize the submodule structure

Future Development

  • Add more details to documentation
  • Host some new downloadable primitives on S3 for users to access
  • Support uninstalling primitives
  • Improve visual display during installation process
  • Support installing dependencies of primitives
  • Handle trying to reinstall the same primitive or install another primitive with the same name
  • Give primitives more annotations that point to the author of the primitive, the license, recommended use cases, etc
  • Provide scaffolding for easily testing primitives
  • Discuss if builtin primitives should also have the restriction of one primitive per file

kmax12 added some commits Nov 21, 2018

@kmax12 kmax12 changed the base branch from clean-primitive-tests to master Nov 27, 2018

kmax12 added some commits Nov 27, 2018

@kmax12 kmax12 changed the title (WIP) Allow installation of additional primitives Allow installation of additional primitives Nov 27, 2018

kmax12 and others added some commits Dec 7, 2018

@@ -1,14 +0,0 @@
[tox]

@rwedge

rwedge Dec 10, 2018

Contributor

We can probably remove tox from dev-requirements.txt as well

kmax12 added some commits Dec 10, 2018

s3 = s3fs.S3FileSystem(anon=True)
remote_archive = s3.open(uri, 'rb')

f.write(remote_archive.read())

@rwedge

rwedge Dec 10, 2018

Contributor

Instead of reading the whole archive at once, should we read it line by line or some number of bytes at a time?

@kmax12

kmax12 Dec 10, 2018

Member

i don't think it matters, does it? at least for now, I don't expect these archives to be too big
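For reference, the chunked alternative raised in this thread could look roughly like the sketch below, which uses the standard library's shutil.copyfileobj to copy a fixed number of bytes at a time instead of calling read() on the whole archive. The helper name copy_in_chunks and the 1 MiB chunk size are assumptions for illustration; any binary file-like object, such as the one returned by s3.open(uri, 'rb'), could be passed as src.

```python
import io
import shutil


def copy_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Copy a binary file-like object to another, chunk_size bytes at a time.

    Hypothetical helper for illustration; not the code merged in this PR.
    """
    shutil.copyfileobj(src, dst, chunk_size)


# Usage with in-memory stand-ins for the remote archive and the local file.
remote_archive = io.BytesIO(b"archive bytes" * 1000)
local_file = io.BytesIO()
copy_in_chunks(remote_archive, local_file, chunk_size=4096)
```

This keeps memory use bounded by the chunk size, which only matters if the archives grow large.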

kmax12 and others added some commits Dec 10, 2018

@rwedge

rwedge approved these changes Dec 11, 2018

@kmax12 kmax12 merged commit e13e8a6 into master Dec 11, 2018

1 check passed

license/cla Contributor License Agreement is signed.

@rwedge rwedge referenced this pull request Dec 17, 2018

Merged

v0.5.0 #351
