Move deprecated integrations docs and clean up code files (#1655)
* clean up bigquery plugin files

Signed-off-by: nikki everett <nikki@union.ai>

* update index page to reflect that deprecated integration docs have moved

Signed-off-by: nikki everett <nikki@union.ai>

* remove markdown from deprecated examples

Signed-off-by: nikki everett <nikki@union.ai>

* clean up READMEs

Signed-off-by: nikki everett <nikki@union.ai>

* exclude deprecated integrations from flytesnacks docs build

Signed-off-by: nikki everett <nikki@union.ai>

* fix typo

Signed-off-by: nikki everett <nikki@union.ai>

* formatting

Signed-off-by: nikki everett <nikki@union.ai>

---------

Signed-off-by: nikki everett <nikki@union.ai>
neverett committed Apr 25, 2024
1 parent 1ff40f2 commit 9bbf486
Showing 12 changed files with 38 additions and 379 deletions.
4 changes: 4 additions & 0 deletions docs/conf.py
@@ -182,6 +182,10 @@
"extending",
"productionizing",
"testing",
"bigquery_plugin",
"databricks_plugin",
"mmcloud_plugin",
"snowflake_plugin",
]

# intersphinx configuration
17 changes: 0 additions & 17 deletions docs/deprecated_integrations.md

This file was deleted.

10 changes: 5 additions & 5 deletions docs/index.md
@@ -138,11 +138,11 @@ auto_examples/whylogs_plugin/index
:caption: Deprecated integrations
:hidden:
Deprecated integrations <deprecated_integrations>
auto_examples/bigquery_plugin/index
auto_examples/databricks_plugin/index
auto_examples/mmcloud_plugin/index
auto_examples/snowflake_plugin/index
Deprecated integrations <https://docs.flyte.org/en/latest/deprecated_integrations/index.html>
BigQuery plugin <https://docs.flyte.org/en/latest/deprecated_integrations/bigquery_plugin/index.html>
Databricks plugin <https://docs.flyte.org/en/latest/deprecated_integrations/databricks_plugin/index.html>
MMCloud plugin <https://docs.flyte.org/en/latest/deprecated_integrations/mmcloud_plugin/index.html>
Snowflake plugin <https://docs.flyte.org/en/latest/deprecated_integrations/snowflake_plugin/index.html>
```

```{toctree}
41 changes: 3 additions & 38 deletions examples/bigquery_plugin/README.md
@@ -1,40 +1,5 @@
(bigquery_plugin)=
# BigQuery plugin example

# BigQuery plugin
**Note:** This example code uses the legacy implementation of the BigQuery integration. We recommend using the [BigQuery agent](https://docs.flyte.org/en/latest/flytesnacks/examples/bigquery_agent/index.html) instead.

```{note}
This is a legacy implementation of the BigQuery integration. We recommend using the {ref}`BigQuery agent <bigquery_agent>` instead.
```

## Installation

To install the BigQuery plugin, run the following command:

```{eval-rst}
.. prompt:: bash

   pip install flytekitplugins-bigquery
```

This plugin is purely a spec. Since SQL is completely portable, there is no need to build a Docker container.

## Example usage

For a usage example, see the {doc}`BigQuery example query<bigquery_plugin_example>` page.

## Flyte deployment configuration

The BigQuery plugin is [enabled in FlytePropeller's config](https://docs.flyte.org/en/latest/deployment/plugin_setup/gcp/bigquery.html#deployment-plugin-setup-gcp-bigquery).

To run the BigQuery plugin on a Flyte cluster, you must configure it in your Flyte deployment. For more information, see the
{std:ref}`BigQuery plugin setup guide <flyte:deployment-plugin-setup-gcp-bigquery>`.

```{toctree}
:maxdepth: -1
:hidden:
bigquery_plugin_example
```
This directory contains example code for the deprecated BigQuery plugin. For documentation on installing and using the plugin, see the [BigQuery plugin documentation](https://docs.flyte.org/en/latest/deprecated_integrations/bigquery_plugin/index.html).
examples/bigquery_plugin/bigquery_plugin/bigquery_plugin_example.py
@@ -1,8 +1,3 @@
# %% [markdown]
# # BigQuery example query
#
# This example shows how to use a Flyte BigQueryTask to execute a query.
# %%
try:
from typing import Annotated
except ImportError:
@@ -12,10 +7,8 @@
from flytekit import StructuredDataset, kwtypes, task, workflow
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask

# %% [markdown]
# This is the world's simplest query. Note that in order for registration to work properly, you'll need to give your
# Note that in order for registration to work properly, you'll need to give your
# BigQuery task a name that's unique across your project/domain for your Flyte installation.
# %%
bigquery_task_no_io = BigQueryTask(
name="sql.bigquery.no_io",
inputs={},
@@ -29,15 +22,6 @@ def no_io_wf():
return bigquery_task_no_io()
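For context, the arguments elided from the `BigQueryTask` constructor above might be filled in roughly as follows. This is a minimal sketch, not the exact code from this commit; the `ProjectID` value is a placeholder.

```python
from flytekit import workflow
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask

# Minimal sketch: the task name must be unique across your project/domain,
# and "your-gcp-project" is a placeholder for a real GCP project ID.
bigquery_task_no_io = BigQueryTask(
    name="sql.bigquery.no_io",
    inputs={},
    query_template="SELECT 1",
    task_config=BigQueryConfig(ProjectID="your-gcp-project"),
)


@workflow
def no_io_wf():
    return bigquery_task_no_io()
```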


# %% [markdown]
# Of course, in real-world applications we are usually more interested in using BigQuery to query a dataset.
# In this case, we use crypto_dogecoin data, a [public dataset in BigQuery](https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&d=crypto_dogecoin&p=bigquery-public-data&t=transactions).
#
# Let's look at how we can parameterize our query to filter results for a specific transaction version, provided as a user input.

# %%
DogeCoinDataset = Annotated[StructuredDataset, kwtypes(hash=str, size=int, block_number=int)]

bigquery_task_templatized_query = BigQueryTask(
@@ -50,10 +34,6 @@ def no_io_wf():
)
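A sketch of how the templatized task might look in full, using BigQuery's `@`-prefixed named-parameter syntax to filter on the user-supplied `version` input. The task name, `ProjectID`, and `LIMIT` clause are illustrative, and the snippet assumes the imports and `DogeCoinDataset` definition shown above.

```python
# Illustrative sketch: name and ProjectID are placeholders.
bigquery_task_templatized_query = BigQueryTask(
    name="sql.bigquery.w_io",
    inputs=kwtypes(version=int),
    output_structured_dataset_type=DogeCoinDataset,
    task_config=BigQueryConfig(ProjectID="your-gcp-project"),
    query_template=(
        "SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions`"
        " WHERE version = @version LIMIT 10;"
    ),
)
```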


# %% [markdown]
# The StructuredDataset transformer can convert the query result to a pandas DataFrame here.
# We can also change `pandas.DataFrame` to `pyarrow.Table` to convert the result to an Arrow table.
# %%
@task
def convert_bq_table_to_pandas_dataframe(sd: DogeCoinDataset) -> pd.DataFrame:
return sd.open(pd.DataFrame).all()
@@ -65,6 +45,4 @@ def full_bigquery_wf(version: int) -> pd.DataFrame:
return convert_bq_table_to_pandas_dataframe(sd=sd)


# %% [markdown]
# Check the query result in the BigQuery console: `https://console.cloud.google.com/bigquery`
#
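As the comment above notes, the query result can also be opened as an Arrow table rather than a pandas DataFrame. A hypothetical variant of the conversion task, assuming the `DogeCoinDataset` definition from earlier in the file:

```python
import pyarrow as pa

from flytekit import task


# Hypothetical variant: open the structured dataset as a pyarrow.Table.
@task
def convert_bq_table_to_arrow_table(sd: DogeCoinDataset) -> pa.Table:
    return sd.open(pa.Table).all()
```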
2 changes: 1 addition & 1 deletion examples/databricks_agent/README.md
@@ -1,6 +1,6 @@
(databricks_agent)=

# Databricks agent
# Databricks agent example

```{eval-rst}
.. tags:: Spark, Integration, DistributedComputing, Data, Advanced
63 changes: 3 additions & 60 deletions examples/databricks_plugin/README.md
@@ -1,62 +1,5 @@
# Databricks plugin
# Databricks plugin example

```{eval-rst}
.. tags:: Spark, Integration, DistributedComputing, Data, Advanced
```
**Note:** This example code uses a legacy implementation of the Databricks integration. We recommend using the [Databricks agent](https://docs.flyte.org/en/latest/flytesnacks/examples/databricks_agent/index.html) instead.

```{note}
This is a legacy implementation of the Databricks integration. We recommend using the {ref}`Databricks agent <databricks_agent>` instead.
```

Flyte can be integrated with the [Databricks](https://www.databricks.com/) service,
enabling you to submit Spark jobs to the Databricks platform.

## Installation

The Databricks plugin comes bundled with the Spark plugin. To install the Spark plugin, run the following command:

```shell
pip install flytekitplugins-spark
```

## Flyte deployment configuration

To run the Databricks plugin on a Flyte cluster, you must configure it in your Flyte deployment. For more information, see the
{std:ref}`Databricks plugin setup guide <flyte:deployment-plugin-setup-webapi-databricks>`.

## Example usage

For a usage example, see the {doc}`Databricks plugin example <databricks_plugin_example>` page.

### Run the example on the Flyte cluster

To run the provided example on a Flyte cluster, use the following command:

```shell
pyflyte run --remote \
--image ghcr.io/flyteorg/flytecookbook:databricks_plugin-latest \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/databricks_plugin/databricks_plugin/databricks_job.py \
my_databricks_job
```

Using Spark on Databricks allows comprehensive versioning through a
custom-built Spark container. This container also facilitates the execution of standard Spark tasks.

To use Spark, the image should employ a base image provided by Databricks,
and the workflow code must be copied to `/databricks/driver`.

```{literalinclude} ../../../examples/databricks_plugin/Dockerfile
:language: docker
:emphasize-lines: 1,7-8,20
```


```{toctree}
:maxdepth: -1
:hidden:
databricks_plugin_example
```
This directory contains example code for the deprecated Databricks plugin. For documentation on installing and using the plugin, see the [Databricks plugin documentation](https://docs.flyte.org/en/latest/deprecated_integrations/databricks_plugin/index.html).
examples/databricks_plugin/databricks_plugin/databricks_job.py
@@ -1,10 +1,3 @@
# %% [markdown]
# (spark_on_databricks_plugin)=
#
# # Running Spark on Databricks
#
# To begin, import the required dependencies.
# %%
import datetime
import random
from operator import add
@@ -14,11 +7,12 @@
from flytekitplugins.spark import Databricks


# %% [markdown]
# To run a Spark job on the Databricks platform, simply include Databricks configuration in the task config.
# The Databricks config is the same as the Databricks job request. For more details, please refer to the
# [Databricks job request](https://docs.databricks.com/dev-tools/api/2.0/jobs.html#request-structure) documentation.
# %%
# To run a Spark job on the Databricks platform,
# include the Databricks configuration in the task config.
# The Databricks configuration is the same as the Databricks job request.
# For more details, please refer to the
# Databricks job request documentation:
# https://docs.databricks.com/dev-tools/api/2.0/jobs.html#request-structure
@task(
task_config=Databricks(
spark_conf={
@@ -56,39 +50,32 @@ def hello_spark(partitions: int) -> float:
return pi_val
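For reference, a Databricks-configured task along the lines of the one truncated above might be sketched as follows. Every configuration value is an illustrative placeholder, not a setting taken from this commit, and the task body is stubbed out.

```python
from flytekit import task
from flytekitplugins.spark import Databricks


# Sketch only: all spark_conf and databricks_conf values are placeholders.
@task(
    task_config=Databricks(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
        },
        databricks_conf={
            "run_name": "flytekit databricks plugin example",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "node_type_id": "n2-highmem-4",
                "num_workers": 4,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
    )
)
def hello_spark_sketch(partitions: int) -> float:
    # Body stubbed for brevity; the real computation appears in the example above.
    return 3.14
```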


# %% [markdown]
# For this particular example,
# we define a function that executes the map-reduce operation within the Spark cluster.
# %%
# Define a function that executes a map-reduce operation
# within the Spark cluster.
def f(_):
x = random.random() * 2 - 1
y = random.random() * 2 - 1
return 1 if x**2 + y**2 <= 1 else 0


# %% [markdown]
# Additionally, we define a standard Flyte task that won't be executed on the Spark cluster.
# %%
# Define a standard Flyte task that won't be executed on the Spark cluster.
@task(cache_version="1")
def print_every_time(value_to_print: float, date_triggered: datetime.datetime) -> int:
print("My printed value: {} @ {}".format(value_to_print, date_triggered))
return 1


# %% [markdown]
# Finally, define a workflow that connects your tasks in a sequence.
# Remember, Spark and non-Spark tasks can be chained together as long as their parameter specifications match.
# %%
# Define a workflow that connects your tasks in a sequence.
# Spark and non-Spark tasks can be chained together
# as long as their parameter specifications match.
@workflow
def my_databricks_job(triggered_date: datetime.datetime = datetime.datetime.now()) -> float:
pi = hello_spark(partitions=1)
print_every_time(value_to_print=pi, date_triggered=triggered_date)
return pi


# %% [markdown]
# You can execute the workflow locally.
# %%
# Execute the workflow locally
if __name__ == "__main__":
print(f"Running {__file__} main...")
print(
99 changes: 2 additions & 97 deletions examples/mmcloud_plugin/README.md
@@ -1,98 +1,3 @@
```{eval-rst}
.. tags:: AWS, GCP, AliCloud, Integration, Advanced
```
**Note:** This example code uses a legacy implementation of the MemVerge Memory Machine Cloud (MMCloud) integration. We recommend using the [MMCloud agent](https://docs.flyte.org/en/latest/flytesnacks/examples/mmcloud_agent/index.html) instead.

(mmcloud_plugin)=

# Memory Machine Cloud plugin

```{note}
This is a legacy implementation of the Memory Machine Cloud integration. We recommend using the {ref}`Memory Machine Cloud agent <mmcloud_agent>` instead.
```

[MemVerge](https://memverge.com/) [Memory Machine Cloud](https://www.mmcloud.io/) (MMCloud)—available on AWS, GCP, and AliCloud—empowers users to continuously optimize cloud resources during runtime, safely execute stateful tasks on spot instances, and monitor resource usage in real time. These capabilities make it an excellent fit for long-running batch workloads. Flyte can be integrated with MMCloud, allowing you to execute Flyte tasks using MMCloud.

## Installation

To install the plugin, run the following command:

```{eval-rst}
.. prompt:: bash

   pip install flytekitplugins-mmcloud
```

To get started with MMCloud, see the [MMCloud user guide](https://docs.memverge.com/mmce/current/userguide/olh/index.html).

## Flyte deployment configuration

The MMCloud plugin is [enabled in FlytePropeller's configuration](https://docs.flyte.org/en/latest/deployment/plugins/memverge/mmcloud.html).

## Getting started

This plugin allows executing `PythonFunctionTask` using MMCloud without changing any function code.

```{eval-rst}
.. testcode:: mmcloud-quickstart

   from flytekit import task
   from flytekitplugins.mmcloud import MMCloudConfig

   @task(task_config=MMCloudConfig())
   def to_str(i: int) -> str:
       return str(i)
```

[Resource](https://docs.flyte.org/en/latest/user_guide/productionizing/customizing_task_resources.html) (cpu and mem) requests and limits, [container](https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/multiple_images_in_a_workflow.html) images, and environment variable specifications are supported.

[ImageSpec](https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/imagespec.html#image-spec-example) may be used to define images to run tasks.
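For instance, a task combining `MMCloudConfig` with explicit resource requests and limits might be sketched as follows; the `submit_extra` flag and resource values are illustrative.

```python
from flytekit import Resources, task
from flytekitplugins.mmcloud import MMCloudConfig


# Illustrative sketch: the submit_extra flag and resource values are placeholders.
@task(
    task_config=MMCloudConfig(submit_extra="--migratePolicy [enable=true]"),
    requests=Resources(cpu="2", mem="4Gi"),
    limits=Resources(cpu="4", mem="16Gi"),
)
def concat(str1: str, str2: str) -> str:
    return str1 + " " + str2
```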

### Credentials

The following [secrets](https://docs.flyte.org/en/latest/user_guide/productionizing/secrets.html) must be defined for the agent server:
* `mmc_address`: MMCloud OpCenter address
* `mmc_username`: MMCloud OpCenter username
* `mmc_password`: MMCloud OpCenter password

### Defaults

Compute resources:
* If only requests are specified, there are no limits.
* If only limits are specified, the requests are equal to the limits.
* If neither resource requests nor limits are specified, the default requests used for job submission are `cpu="1"` and `mem="1Gi"`, and there are no limits.

### Agent image

Install `flytekitplugins-mmcloud` in the agent image.

A `float` binary (obtainable via the OpCenter) is required. Copy it to the agent image `PATH`.

Sample `Dockerfile` for building an agent image:
```dockerfile
FROM python:3.11-slim-bookworm

WORKDIR /root
ENV PYTHONPATH /root

# flytekit will autoload the agent if package is installed.
RUN pip install flytekitplugins-mmcloud
COPY float /usr/local/bin/float

# For flytekit versions <= v1.10.2, use pyflyte serve.
# CMD pyflyte serve --port 8000
# For flytekit versions > v1.10.2, use pyflyte serve agent.
CMD pyflyte serve agent --port 8000
```

## Example usage

For a usage example, see the {ref}`Memory Machine Cloud example <mmcloud_plugin_example>` page.



```{toctree}
:maxdepth: -1
:hidden:
mmcloud_plugin_example
```
This directory contains example code for the deprecated MMCloud plugin. For documentation on installing and using the plugin, see the [MMCloud plugin documentation](https://docs.flyte.org/en/latest/deprecated_integrations/mmcloud_plugin/index.html).
