Python Documentation (#114)
moritzmeister committed Oct 23, 2020
1 parent 38d9291 commit 4128aa5
Showing 43 changed files with 1,411 additions and 244 deletions.
5 changes: 4 additions & 1 deletion CONTRIBUTING.md
@@ -71,7 +71,7 @@ We use `mkdocs` to build the documentation and a plugin called `keras-autodoc` t
1. Currently we are using our own version of `keras-autodoc`

```bash
pip install git+https://github.com/moritzmeister/keras-autodoc@split-tags
pip install git+https://github.com/moritzmeister/keras-autodoc@split-tags-properties
```

2. Install HSFS with `docs` extras:
@@ -129,3 +129,6 @@ Some extra content here.
```
Finally, run the `auto_doc.py` script, as described above, to update the documentation.
For information about Markdown syntax and possible admonitions, highlighting, etc., see
the [Material for MkDocs reference documentation](https://squidfunk.github.io/mkdocs-material/reference/abbreviations/).
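
For reference, the `generate()` function defined in `auto_doc.py` (shown further down in this commit) can also be invoked programmatically. A minimal sketch, assuming the repository root is on the Python path and that `docs` is the intended destination directory (the script's actual CLI entry point is not visible in this diff):

```python
import pathlib

import auto_doc  # the auto_doc.py module at the repository root

# Copies CONTRIBUTING.md and README.md into the destination and regenerates
# the API pages under <dest>/generated, mirroring what the script does.
auto_doc.generate(pathlib.Path("docs"))
```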
4 changes: 2 additions & 2 deletions Dockerfile
@@ -7,6 +7,6 @@ RUN apt-get update && \
RUN pip3 install twine \
mkdocs \
mkdocs-material \
git+https://github.com/moritzmeister/keras-autodoc@split-tags
git+https://github.com/moritzmeister/keras-autodoc@split-tags-properties

RUN mkdir -p /.local && chmod -R 777 /.local
77 changes: 59 additions & 18 deletions README.md
@@ -1,20 +1,49 @@
Hopsworks Feature Store API
===========================
# Hopsworks Feature Store

<p align="center">
<a href="https://community.hopsworks.ai"><img
src="https://img.shields.io/discourse/users?label=Hopsworks%20Community&server=https%3A%2F%2Fcommunity.hopsworks.ai"
alt="Hopsworks Community"
/></a>
<a href="https://docs.hopsworks.ai"><img
src="https://img.shields.io/badge/docs-HSFS-orange"
alt="Hopsworks Feature Store Documentation"
/></a>
<a href="https://pypi.org/project/hsfs/"><img
src="https://img.shields.io/pypi/v/hsfs?color=blue"
alt="PyPiStatus"
/></a>
<a href="https://archiva.hops.works/#artifact/com.logicalclocks/hsfs"><img
src="https://img.shields.io/badge/java-HSFS-green"
alt="Scala/Java Artifacts"
/></a>
<a href="https://pepy.tech/project/hsfs/month"><img
src="https://pepy.tech/badge/hsfs/month"
alt="Downloads"
/></a>
<a href="https://github.com/psf/black"><img
src="https://img.shields.io/badge/code%20style-black-000000.svg"
alt="CodeStyle"
/></a>
<a><img
src="https://img.shields.io/pypi/l/hsfs?color=green"
alt="License"
/></a>
</p>

HSFS is the library to interact with the Hopsworks Feature Store. The library makes creating new features, feature groups and training datasets easy.

The library can be used in two modes:
The library is environment independent and can be used in two modes:

- Spark mode : For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides binding both for Python and JVM languages.
- Spark mode: For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides bindings both for Python and JVM languages.

- Python mode : For data science jobs to explore the features available in the feature store, generate training datasets and feed them in a training pipeline. Python mode requires just a Python interpreter and can be used both in Hopsworks from Python Jobs/Jupyter Kernels, Amazon SageMaker, KubeFlow.
- Python mode: For data science jobs to explore the features available in the feature store, generate training datasets and feed them in a training pipeline. Python mode requires just a Python interpreter and can be used both in Hopsworks from Python Jobs/Jupyter Kernels, Amazon SageMaker or KubeFlow.

The library automatically configures itself based on the environment in which it is run.
However, to connect from an external environment such as Databricks or AWS SageMaker,
additional connection information, such as host and port, is required. For more information about the setup from external environments, see the setup section.
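
A hedged sketch of such an external connection is shown below; the host, project name and API key are placeholders, and the exact keyword arguments should be verified against the `hsfs.connection.Connection` API documentation:

```python
import hsfs

# Placeholder values; in practice these come from your Hopsworks deployment.
connection = hsfs.connection(
    host="my-instance.cloud.hopsworks.ai",  # REST endpoint of the cluster
    port=443,
    project="my_project",
    api_key_value="<your-api-key>",
)

# Handle to the project's feature store (get_feature_store is documented
# in the Connection API referenced by auto_doc.py in this commit).
fs = connection.get_feature_store()
```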

You can read more about the Hopsworks Feature Store and its concepts [here](https://hopsworks.readthedocs.io).

Getting Started
---------------
## Getting Started On Hopsworks

Instantiate a connection and get the project feature store handler
```python
@@ -38,7 +67,7 @@ fg.save(dataframe)
Join features together
```python
feature_join = rain_fg.select_all()
.join(temperature_fg.select_all(), ["date", "location_id"])
.join(temperature_fg.select_all(), on=["date", "location_id"])
.join(location_fg.select_all())

feature_join.show(5)
@@ -47,23 +76,35 @@ feature_join.show(5)
Use the query object to create a training dataset:
```python
td = fs.create_training_dataset("training_dataset",
version=1,
data_format="tfrecords",
description="A test training dataset saved in TfRecords format",
splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})
version=1,
data_format="tfrecords",
description="A test training dataset saved in TfRecords format",
splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})

td.save(feature_join)
```
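
In a later job, the saved training dataset can typically be retrieved again by name and version rather than recreated; a sketch, assuming a `get_training_dataset` accessor on the feature store handle (not shown in this diff):

```python
# Hypothetical retrieval of the training dataset created above.
td = fs.get_training_dataset("training_dataset", version=1)
```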

Feed the training dataset to a TensorFlow model:
```python
train_input_feeder = training_dataset.feed(target_name='label',split='train', is_training=True)
train_input_feeder = training_dataset.feed(target_name="label",
split="train",
is_training=True)
train_input = train_input_feeder.tf_record_dataset()
```
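
For completeness, a sketch of consuming the feeder output in training, assuming `tf_record_dataset()` returns a batched `tf.data.Dataset` of `(features, label)` pairs; the model below is purely illustrative:

```python
import tensorflow as tf

# Illustrative architecture only; layer sizes depend on the feature schema.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# train_input is the tf.data.Dataset produced in the snippet above.
model.fit(train_input, epochs=5)
```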

You can find more examples on how to use the library in our [hops-examples](https://github.com/logicalclocks/hops-examples) repository.

Issues
------
## Documentation

Documentation is available at [Hopsworks Feature Store Documentation](https://docs.hopsworks.ai/).

## Issues

For general questions about the usage of Hopsworks and the Feature Store please open a topic on [Hopsworks Community](https://community.hopsworks.ai/).

Please report any issue using [Github issue tracking](https://github.com/logicalclocks/feature-store-api/issues).


## Contributing

Please report any issue using [Github issue tracking](https://github.com/logicalclocks/feature-store-api/issues)
If you would like to contribute to this library, please see the [Contribution Guidelines](CONTRIBUTING.md).
62 changes: 57 additions & 5 deletions auto_doc.py
@@ -4,10 +4,61 @@
import keras_autodoc

PAGES = {
"connection.md": [
"hsfs.connection.Connection.connection",
"hsfs.connection.Connection.setup_databricks",
]
"project.md": {
"connection": ["hsfs.connection.Connection"],
"connection_methods": keras_autodoc.get_methods(
"hsfs.connection.Connection", exclude=["connection"]
),
},
"feature_store.md": {
"fs_get": ["hsfs.connection.Connection.get_feature_store"],
"fs_properties": keras_autodoc.get_properties(
"hsfs.feature_store.FeatureStore"
),
"fs_methods": keras_autodoc.get_methods(
"hsfs.feature_store.FeatureStore", exclude=["from_response_json"]
),
},
"feature_group.md": {
"fg_create": ["hsfs.feature_store.FeatureStore.create_feature_group"],
"fg_get": ["hsfs.feature_store.FeatureStore.get_feature_group"],
"fg_properties": keras_autodoc.get_properties(
"hsfs.feature_group.FeatureGroup"
),
"fg_methods": keras_autodoc.get_methods(
"hsfs.feature_group.FeatureGroup",
exclude=[
"from_response_json",
"update_from_response_json",
"json",
"to_dict",
],
),
},
"api/connection_api.md": {
"connection": ["hsfs.connection.Connection"],
"connection_properties": keras_autodoc.get_properties(
"hsfs.connection.Connection"
),
"connection_methods": keras_autodoc.get_methods("hsfs.connection.Connection"),
},
"api/feature_store_api.md": {
"fs": ["hsfs.feature_store.FeatureStore"],
"fs_get": ["hsfs.connection.Connection.get_feature_store"],
"fs_properties": keras_autodoc.get_properties(
"hsfs.feature_store.FeatureStore"
),
"fs_methods": keras_autodoc.get_methods("hsfs.feature_store.FeatureStore"),
},
"api/feature_group_api.md": {
"fg": ["hsfs.feature_group.FeatureGroup"],
"fg_create": ["hsfs.feature_store.FeatureStore.create_feature_group"],
"fg_get": ["hsfs.feature_store.FeatureStore.get_feature_group"],
"fg_properties": keras_autodoc.get_properties(
"hsfs.feature_group.FeatureGroup"
),
"fg_methods": keras_autodoc.get_methods("hsfs.feature_group.FeatureGroup"),
},
}
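
To illustrate the structure, a further page could hypothetically be registered in `PAGES` in the same way; the `hsfs.training_dataset.TrainingDataset` module path below is an assumption used only for this sketch:

```python
PAGES["api/training_dataset_api.md"] = {
    "td": ["hsfs.training_dataset.TrainingDataset"],
    "td_methods": keras_autodoc.get_methods(
        "hsfs.training_dataset.TrainingDataset"
    ),
}
```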

hsfs_dir = pathlib.Path(__file__).resolve().parents[0]
@@ -18,8 +69,9 @@ def generate(dest_dir):
PAGES,
project_url="https://github.com/logicalclocks/feature-store-api/blob/master/python",
template_dir="./docs/templates",
titles_size="###",
)
shutil.copyfile(hsfs_dir / "CONTRIBUTING.md", dest_dir / "contributing.md")
shutil.copyfile(hsfs_dir / "CONTRIBUTING.md", dest_dir / "CONTRIBUTING.md")
shutil.copyfile(hsfs_dir / "README.md", dest_dir / "index.md")

doc_generator.generate(dest_dir / "generated")
5 changes: 4 additions & 1 deletion docs/contributing.md → docs/CONTRIBUTING.md
@@ -71,7 +71,7 @@ We use `mkdocs` to build the documentation and a plugin called `keras-autodoc` t
1. Currently we are using our own version of `keras-autodoc`

```bash
pip install git+https://github.com/moritzmeister/keras-autodoc@split-tags
pip install git+https://github.com/moritzmeister/keras-autodoc@split-tags-properties
```

2. Install HSFS with `docs` extras:
@@ -129,3 +129,6 @@ Some extra content here.
```
Finally, run the `auto_doc.py` script, as described above, to update the documentation.
For information about Markdown syntax and possible admonitions, highlighting, etc., see
the [Material for MkDocs reference documentation](https://squidfunk.github.io/mkdocs-material/reference/abbreviations/).
Binary file added docs/assets/images/api-key.png
Binary file added docs/assets/images/benefits.png
Binary file added docs/assets/images/extract-zip.png
Binary file added docs/assets/images/featurestore-sharing-2.png
Binary file added docs/assets/images/featurestore-sharing.png
Binary file added docs/assets/images/fs-concepts.png
Binary file added docs/assets/images/hopsworks-version.png
Binary file added docs/assets/images/hw-concepts.png
Binary file added docs/assets/images/integrations.png
Binary file added docs/assets/images/offline-online.png
Binary file added docs/assets/images/parameter-store-policy.png
Binary file added docs/assets/images/parameter-store.png
Binary file added docs/assets/images/quickstart.png
Binary file added docs/assets/images/sagemaker-role.png
Binary file added docs/assets/images/secrets-manager-1.png
Binary file added docs/assets/images/secrets-manager-2.png
Binary file added docs/assets/images/secrets-manager-policy.png
5 changes: 2 additions & 3 deletions docs/css/custom.css
@@ -1,6 +1,5 @@

.md-header {
background-color: #1EB382 !important;
:root {
--md-primary-fg-color: #1EB382;
}

.md-logo {
Binary file removed docs/img/favicon.ico
77 changes: 59 additions & 18 deletions docs/index.md
@@ -1,20 +1,49 @@
Hopsworks Feature Store API
===========================
# Hopsworks Feature Store

<p align="center">
<a href="https://community.hopsworks.ai"><img
src="https://img.shields.io/discourse/users?label=Hopsworks%20Community&server=https%3A%2F%2Fcommunity.hopsworks.ai"
alt="Hopsworks Community"
/></a>
<a href="https://docs.hopsworks.ai"><img
src="https://img.shields.io/badge/docs-HSFS-orange"
alt="Hopsworks Feature Store Documentation"
/></a>
<a href="https://pypi.org/project/hsfs/"><img
src="https://img.shields.io/pypi/v/hsfs?color=blue"
alt="PyPiStatus"
/></a>
<a href="https://archiva.hops.works/#artifact/com.logicalclocks/hsfs"><img
src="https://img.shields.io/badge/java-HSFS-green"
alt="Scala/Java Artifacts"
/></a>
<a href="https://pepy.tech/project/hsfs/month"><img
src="https://pepy.tech/badge/hsfs/month"
alt="Downloads"
/></a>
<a href="https://github.com/psf/black"><img
src="https://img.shields.io/badge/code%20style-black-000000.svg"
alt="CodeStyle"
/></a>
<a><img
src="https://img.shields.io/pypi/l/hsfs?color=green"
alt="License"
/></a>
</p>

HSFS is the library to interact with the Hopsworks Feature Store. The library makes creating new features, feature groups and training datasets easy.

The library can be used in two modes:
The library is environment independent and can be used in two modes:

- Spark mode : For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides binding both for Python and JVM languages.
- Spark mode: For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides bindings both for Python and JVM languages.

- Python mode : For data science jobs to explore the features available in the feature store, generate training datasets and feed them in a training pipeline. Python mode requires just a Python interpreter and can be used both in Hopsworks from Python Jobs/Jupyter Kernels, Amazon SageMaker, KubeFlow.
- Python mode: For data science jobs to explore the features available in the feature store, generate training datasets and feed them in a training pipeline. Python mode requires just a Python interpreter and can be used both in Hopsworks from Python Jobs/Jupyter Kernels, Amazon SageMaker or KubeFlow.

The library automatically configures itself based on the environment in which it is run.
However, to connect from an external environment such as Databricks or AWS SageMaker,
additional connection information, such as host and port, is required. For more information about the setup from external environments, see the setup section.

You can read more about the Hopsworks Feature Store and its concepts [here](https://hopsworks.readthedocs.io).

Getting Started
---------------
## Getting Started On Hopsworks

Instantiate a connection and get the project feature store handler
```python
@@ -38,7 +67,7 @@ fg.save(dataframe)
Join features together
```python
feature_join = rain_fg.select_all()
.join(temperature_fg.select_all(), ["date", "location_id"])
.join(temperature_fg.select_all(), on=["date", "location_id"])
.join(location_fg.select_all())

feature_join.show(5)
@@ -47,23 +76,35 @@ feature_join.show(5)
Use the query object to create a training dataset:
```python
td = fs.create_training_dataset("training_dataset",
version=1,
data_format="tfrecords",
description="A test training dataset saved in TfRecords format",
splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})
version=1,
data_format="tfrecords",
description="A test training dataset saved in TfRecords format",
splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})

td.save(feature_join)
```

Feed the training dataset to a TensorFlow model:
```python
train_input_feeder = training_dataset.feed(target_name='label',split='train', is_training=True)
train_input_feeder = training_dataset.feed(target_name="label",
split="train",
is_training=True)
train_input = train_input_feeder.tf_record_dataset()
```

You can find more examples on how to use the library in our [hops-examples](https://github.com/logicalclocks/hops-examples) repository.

Issues
------
## Documentation

Documentation is available at [Hopsworks Feature Store Documentation](https://docs.hopsworks.ai/).

## Issues

For general questions about the usage of Hopsworks and the Feature Store please open a topic on [Hopsworks Community](https://community.hopsworks.ai/).

Please report any issue using [Github issue tracking](https://github.com/logicalclocks/feature-store-api/issues).


## Contributing

Please report any issue using [Github issue tracking](https://github.com/logicalclocks/feature-store-api/issues)
If you would like to contribute to this library, please see the [Contribution Guidelines](CONTRIBUTING.md).
1 change: 0 additions & 1 deletion docs/installation.md

This file was deleted.

3 changes: 3 additions & 0 deletions docs/integrations/databricks.md
@@ -0,0 +1,3 @@
# Databricks Integration

## TBD
