Move deprecated integrations docs and clean up code files (#1655)
* clean up bigquery plugin files

Signed-off-by: nikki everett <nikki@union.ai>

* update index page to reflect that deprecated integration docs have moved

Signed-off-by: nikki everett <nikki@union.ai>

* remove markdown from deprecated examples

Signed-off-by: nikki everett <nikki@union.ai>

* clean up READMEs

Signed-off-by: nikki everett <nikki@union.ai>

* exclude deprecated integrations from flytesnacks docs build

Signed-off-by: nikki everett <nikki@union.ai>

* fix typo

Signed-off-by: nikki everett <nikki@union.ai>

* formatting

Signed-off-by: nikki everett <nikki@union.ai>

---------

Signed-off-by: nikki everett <nikki@union.ai>
neverett committed Apr 25, 2024
1 parent 1ff40f2 commit 9bbf486
Showing 12 changed files with 38 additions and 379 deletions.
4 changes: 4 additions & 0 deletions docs/conf.py
@@ -182,6 +182,10 @@
"extending",
"productionizing",
"testing",
"bigquery_plugin",
"databricks_plugin",
"mmcloud_plugin",
"snowflake_plugin",
]

# intersphinx configuration
17 changes: 0 additions & 17 deletions docs/deprecated_integrations.md

This file was deleted.

10 changes: 5 additions & 5 deletions docs/index.md
@@ -138,11 +138,11 @@ auto_examples/whylogs_plugin/index
:caption: Deprecated integrations
:hidden:
Deprecated integrations <deprecated_integrations>
auto_examples/bigquery_plugin/index
auto_examples/databricks_plugin/index
auto_examples/mmcloud_plugin/index
auto_examples/snowflake_plugin/index
Deprecated integrations <https://docs.flyte.org/en/latest/deprecated_integrations/index.html>
BigQuery plugin <https://docs.flyte.org/en/latest/deprecated_integrations/bigquery_plugin/index.html>
Databricks plugin <https://docs.flyte.org/en/latest/deprecated_integrations/databricks_plugin/index.html>
MMCloud plugin <https://docs.flyte.org/en/latest/deprecated_integrations/mmcloud_plugin/index.html>
Snowflake plugin <https://docs.flyte.org/en/latest/deprecated_integrations/snowflake_plugin/index.html>
```

```{toctree}
41 changes: 3 additions & 38 deletions examples/bigquery_plugin/README.md
@@ -1,40 +1,5 @@
(bigquery_plugin)=
# BigQuery plugin example

# BigQuery plugin
**Note:** This example code uses the legacy implementation of the BigQuery integration. We recommend using the [BigQuery agent](https://docs.flyte.org/en/latest/flytesnacks/examples/bigquery_agent/index.html) instead.

```{note}
This is a legacy implementation of the BigQuery integration. We recommend using the {ref}`BigQuery agent <bigquery_agent>` instead.
```

## Installation

To install the BigQuery plugin, run the following command:

```{eval-rst}
.. prompt:: bash

   pip install flytekitplugins-bigquery
```

This plugin is purely a spec. Since SQL is completely portable, there is no need to build a Docker container.

## Example usage

For a usage example, see the {doc}`BigQuery example query<bigquery_plugin_example>` page.

## Flyte deployment configuration

The BigQuery plugin is [enabled in FlytePropeller's config](https://docs.flyte.org/en/latest/deployment/plugin_setup/gcp/bigquery.html#deployment-plugin-setup-gcp-bigquery).

To run the BigQuery plugin on a Flyte cluster, you must configure it in your Flyte deployment. For more information, see the
{std:ref}`BigQuery plugin setup guide <flyte:deployment-plugin-setup-gcp-bigquery>`.

```{toctree}
:maxdepth: -1
:hidden:
bigquery_plugin_example
```
This directory contains example code for the deprecated BigQuery plugin. For documentation on installing and using the plugin, see the [BigQuery plugin documentation](https://docs.flyte.org/en/latest/deprecated_integrations/bigquery_plugin/index.html).
examples/bigquery_plugin/bigquery_plugin/bigquery_plugin_example.py
@@ -1,8 +1,3 @@
# %% [markdown]
# # BigQuery example query
#
# This example shows how to use a Flyte BigQueryTask to execute a query.
# %%
try:
from typing import Annotated
except ImportError:
@@ -12,10 +7,8 @@
from flytekit import StructuredDataset, kwtypes, task, workflow
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask

# %% [markdown]
# This is the world's simplest query. Note that in order for registration to work properly, you'll need to give your
# Note that in order for registration to work properly, you'll need to give your
# BigQuery task a name that's unique across your project/domain for your Flyte installation.
# %%
bigquery_task_no_io = BigQueryTask(
name="sql.bigquery.no_io",
inputs={},
@@ -29,15 +22,6 @@ def no_io_wf():
return bigquery_task_no_io()
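For context, the arguments elided from the `BigQueryTask` constructor above might be filled in roughly as follows. This is a minimal sketch, not the exact code from this commit; the `ProjectID` value is a placeholder.

```python
from flytekit import workflow
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask

# Minimal sketch: the task name must be unique across your project/domain,
# and "your-gcp-project" is a placeholder for a real GCP project ID.
bigquery_task_no_io = BigQueryTask(
    name="sql.bigquery.no_io",
    inputs={},
    query_template="SELECT 1",
    task_config=BigQueryConfig(ProjectID="your-gcp-project"),
)


@workflow
def no_io_wf():
    return bigquery_task_no_io()
```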


# %% [markdown]
# Of course, in real-world applications we are usually more interested in using BigQuery to query a dataset.
# In this case, we use crypto_dogecoin data, a [public dataset in BigQuery](https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&d=crypto_dogecoin&p=bigquery-public-data&t=transactions).
#
# Let's look at how we can parameterize our query to filter results for a specific transaction version, provided as a user input.

# %%
DogeCoinDataset = Annotated[StructuredDataset, kwtypes(hash=str, size=int, block_number=int)]

bigquery_task_templatized_query = BigQueryTask(
@@ -50,10 +34,6 @@ def no_io_wf():
)
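A sketch of how the templatized task might look in full, using BigQuery's `@`-prefixed named-parameter syntax to filter on the user-supplied `version` input. The task name, `ProjectID`, and `LIMIT` clause are illustrative, and the snippet assumes the imports and `DogeCoinDataset` definition shown above.

```python
# Illustrative sketch: name and ProjectID are placeholders.
bigquery_task_templatized_query = BigQueryTask(
    name="sql.bigquery.w_io",
    inputs=kwtypes(version=int),
    output_structured_dataset_type=DogeCoinDataset,
    task_config=BigQueryConfig(ProjectID="your-gcp-project"),
    query_template=(
        "SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions`"
        " WHERE version = @version LIMIT 10;"
    ),
)
```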


# %% [markdown]
# The StructuredDataset transformer can convert the query result to a pandas DataFrame here.
# We can also change `pandas.DataFrame` to `pyarrow.Table` to convert the result to an Arrow table.
# %%
@task
def convert_bq_table_to_pandas_dataframe(sd: DogeCoinDataset) -> pd.DataFrame:
return sd.open(pd.DataFrame).all()
@@ -65,6 +45,4 @@ def full_bigquery_wf(version: int) -> pd.DataFrame:
return convert_bq_table_to_pandas_dataframe(sd=sd)


# %% [markdown]
# Check the query result in the BigQuery console: `https://console.cloud.google.com/bigquery`
#
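As the comment above notes, the query result can also be opened as an Arrow table rather than a pandas DataFrame. A hypothetical variant of the conversion task, assuming the `DogeCoinDataset` definition from earlier in the file:

```python
import pyarrow as pa

from flytekit import task


# Hypothetical variant: open the structured dataset as a pyarrow.Table.
@task
def convert_bq_table_to_arrow_table(sd: DogeCoinDataset) -> pa.Table:
    return sd.open(pa.Table).all()
```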
2 changes: 1 addition & 1 deletion examples/databricks_agent/README.md
@@ -1,6 +1,6 @@
(databricks_agent)=

# Databricks agent
# Databricks agent example

```{eval-rst}
.. tags:: Spark, Integration, DistributedComputing, Data, Advanced
63 changes: 3 additions & 60 deletions examples/databricks_plugin/README.md
@@ -1,62 +1,5 @@
# Databricks plugin
# Databricks plugin example

```{eval-rst}
.. tags:: Spark, Integration, DistributedComputing, Data, Advanced
```
**Note:** This example code uses a legacy implementation of the Databricks integration. We recommend using the [Databricks agent](https://docs.flyte.org/en/latest/flytesnacks/examples/databricks_agent/index.html) instead.

```{note}
This is a legacy implementation of the Databricks integration. We recommend using the {ref}`Databricks agent <databricks_agent>` instead.
```

Flyte can be integrated with the [Databricks](https://www.databricks.com/) service,
enabling you to submit Spark jobs to the Databricks platform.

## Installation

The Databricks plugin comes bundled with the Spark plugin. To install the Spark plugin, run the following command:

```shell
pip install flytekitplugins-spark
```

## Flyte deployment configuration

To run the Databricks plugin on a Flyte cluster, you must configure it in your Flyte deployment. For more information, see the
{std:ref}`Databricks plugin setup guide <flyte:deployment-plugin-setup-webapi-databricks>`.

## Example usage

For a usage example, see the {doc}`Databricks plugin example <databricks_plugin_example>` page.

### Run the example on the Flyte cluster

To run the provided example on a Flyte cluster, use the following command:

```shell
pyflyte run --remote \
--image ghcr.io/flyteorg/flytecookbook:databricks_plugin-latest \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/databricks_plugin/databricks_plugin/databricks_job.py \
my_databricks_job
```

Using Spark on Databricks allows comprehensive versioning through a
custom-built Spark container. This container also facilitates the execution of standard Spark tasks.

To use Spark, the image should employ a base image provided by Databricks,
and the workflow code must be copied to `/databricks/driver`.

```{literalinclude} ../../../examples/databricks_plugin/Dockerfile
:language: docker
:emphasize-lines: 1,7-8,20
```


```{toctree}
:maxdepth: -1
:hidden:
databricks_plugin_example
```
This directory contains example code for the deprecated Databricks plugin. For documentation on installing and using the plugin, see the [Databricks plugin documentation](https://docs.flyte.org/en/latest/deprecated_integrations/databricks_plugin/index.html).
examples/databricks_plugin/databricks_plugin/databricks_job.py
@@ -1,10 +1,3 @@
# %% [markdown]
# (spark_on_databricks_plugin)=
#
# # Running Spark on Databricks
#
# To begin, import the required dependencies.
# %%
import datetime
import random
from operator import add
@@ -14,11 +7,12 @@
from flytekitplugins.spark import Databricks


# %% [markdown]
# To run a Spark job on the Databricks platform, simply include Databricks configuration in the task config.
# The Databricks config is the same as the Databricks job request. For more details, please refer to the
# [Databricks job request](https://docs.databricks.com/dev-tools/api/2.0/jobs.html#request-structure) documentation.
# %%
# To run a Spark job on the Databricks platform,
# include the Databricks configuration in the task config.
# The Databricks configuration is the same as the Databricks job request.
# For more details, please refer to the
# Databricks job request documentation:
# https://docs.databricks.com/dev-tools/api/2.0/jobs.html#request-structure
@task(
task_config=Databricks(
spark_conf={
@@ -56,39 +50,32 @@ def hello_spark(partitions: int) -> float:
return pi_val
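For reference, a Databricks-configured task along the lines of the one truncated above might be sketched as follows. Every configuration value is an illustrative placeholder, not a setting taken from this commit, and the task body is stubbed out.

```python
from flytekit import task
from flytekitplugins.spark import Databricks


# Sketch only: all spark_conf and databricks_conf values are placeholders.
@task(
    task_config=Databricks(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
        },
        databricks_conf={
            "run_name": "flytekit databricks plugin example",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "node_type_id": "n2-highmem-4",
                "num_workers": 4,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
    )
)
def hello_spark_sketch(partitions: int) -> float:
    # Body stubbed for brevity; the real computation appears in the example above.
    return 3.14
```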


# %% [markdown]
# For this particular example,
# we define a function that executes the map-reduce operation within the Spark cluster.
# %%
# Define a function that executes a map-reduce operation
# within the Spark cluster.
def f(_):
x = random.random() * 2 - 1
y = random.random() * 2 - 1
return 1 if x**2 + y**2 <= 1 else 0


# %% [markdown]
# Additionally, we define a standard Flyte task that won't be executed on the Spark cluster.
# %%
# Define a standard Flyte task that won't be executed on the Spark cluster.
@task(cache_version="1")
def print_every_time(value_to_print: float, date_triggered: datetime.datetime) -> int:
print("My printed value: {} @ {}".format(value_to_print, date_triggered))
return 1


# %% [markdown]
# Finally, define a workflow that connects your tasks in a sequence.
# Remember, Spark and non-Spark tasks can be chained together as long as their parameter specifications match.
# %%
# Define a workflow that connects your tasks in a sequence.
# Spark and non-Spark tasks can be chained together
# as long as their parameter specifications match.
@workflow
def my_databricks_job(triggered_date: datetime.datetime = datetime.datetime.now()) -> float:
pi = hello_spark(partitions=1)
print_every_time(value_to_print=pi, date_triggered=triggered_date)
return pi


# %% [markdown]
# You can execute the workflow locally.
# %%
# Execute the workflow locally
if __name__ == "__main__":
print(f"Running {__file__} main...")
print(
99 changes: 2 additions & 97 deletions examples/mmcloud_plugin/README.md
@@ -1,98 +1,3 @@
```{eval-rst}
.. tags:: AWS, GCP, AliCloud, Integration, Advanced
```
**Note:** This example code uses a legacy implementation of the MemVerge Memory Machine Cloud (MMCloud) integration. We recommend using the [MMCloud agent](https://docs.flyte.org/en/latest/flytesnacks/examples/mmcloud_agent/index.html) instead.

(mmcloud_plugin)=

# Memory Machine Cloud plugin

```{note}
This is a legacy implementation of the Memory Machine Cloud integration. We recommend using the {ref}`Memory Machine Cloud agent <mmcloud_agent>` instead.
```

[MemVerge](https://memverge.com/) [Memory Machine Cloud](https://www.mmcloud.io/) (MMCloud)—available on AWS, GCP, and AliCloud—empowers users to continuously optimize cloud resources during runtime, safely execute stateful tasks on spot instances, and monitor resource usage in real time. These capabilities make it an excellent fit for long-running batch workloads. Flyte can be integrated with MMCloud, allowing you to execute Flyte tasks using MMCloud.

## Installation

To install the plugin, run the following command:

```{eval-rst}
.. prompt:: bash

   pip install flytekitplugins-mmcloud
```

To get started with MMCloud, see the [MMCloud user guide](https://docs.memverge.com/mmce/current/userguide/olh/index.html).

## Flyte deployment configuration

The MMCloud plugin is [enabled in FlytePropeller's configuration](https://docs.flyte.org/en/latest/deployment/plugins/memverge/mmcloud.html).

## Getting started

This plugin allows executing `PythonFunctionTask` using MMCloud without changing any function code.

```{eval-rst}
.. testcode:: mmcloud-quickstart

   from flytekit import task
   from flytekitplugins.mmcloud import MMCloudConfig

   @task(task_config=MMCloudConfig())
   def to_str(i: int) -> str:
       return str(i)
```

[Resource](https://docs.flyte.org/en/latest/user_guide/productionizing/customizing_task_resources.html) (cpu and mem) requests and limits, [container](https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/multiple_images_in_a_workflow.html) images, and environment variable specifications are supported.

[ImageSpec](https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/imagespec.html#image-spec-example) may be used to define images to run tasks.
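For instance, a task combining `MMCloudConfig` with explicit resource requests and limits might be sketched as follows; the `submit_extra` flag and resource values are illustrative.

```python
from flytekit import Resources, task
from flytekitplugins.mmcloud import MMCloudConfig


# Illustrative sketch: the submit_extra flag and resource values are placeholders.
@task(
    task_config=MMCloudConfig(submit_extra="--migratePolicy [enable=true]"),
    requests=Resources(cpu="2", mem="4Gi"),
    limits=Resources(cpu="4", mem="16Gi"),
)
def concat(str1: str, str2: str) -> str:
    return str1 + " " + str2
```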

### Credentials

The following [secrets](https://docs.flyte.org/en/latest/user_guide/productionizing/secrets.html) must be defined for the agent server:
* `mmc_address`: MMCloud OpCenter address
* `mmc_username`: MMCloud OpCenter username
* `mmc_password`: MMCloud OpCenter password

### Defaults

Compute resources:
* If only requests are specified, there are no limits.
* If only limits are specified, the requests are equal to the limits.
* If neither resource requests nor limits are specified, the default requests used for job submission are `cpu="1"` and `mem="1Gi"`, and there are no limits.

### Agent image

Install `flytekitplugins-mmcloud` in the agent image.

A `float` binary (obtainable via the OpCenter) is required. Copy it to the agent image `PATH`.

Sample `Dockerfile` for building an agent image:
```dockerfile
FROM python:3.11-slim-bookworm

WORKDIR /root
ENV PYTHONPATH /root

# flytekit will autoload the agent if package is installed.
RUN pip install flytekitplugins-mmcloud
COPY float /usr/local/bin/float

# For flytekit versions <= v1.10.2, use pyflyte serve.
# CMD pyflyte serve --port 8000
# For flytekit versions > v1.10.2, use pyflyte serve agent.
CMD pyflyte serve agent --port 8000
```

## Example usage

For a usage example, see the {ref}`Memory Machine Cloud example <mmcloud_plugin_example>` page.



```{toctree}
:maxdepth: -1
:hidden:
mmcloud_plugin_example
```
This directory contains example code for the deprecated MMCloud plugin. For documentation on installing and using the plugin, see the [MMCloud plugin documentation](https://docs.flyte.org/en/latest/deprecated_integrations/mmcloud_plugin/index.html).
