# S3 Virtual Schema configuration

[S3 Virtual Schema](https://github.com/exasol/s3-document-files-virtual-schema) is an Exasol extension that provides access
to structured and semi-structured documents stored in AWS S3 buckets. Once it is configured, you can query JSON,
Parquet and CSV data directly from the database, as if the data had been imported into Exasol tables.

This notebook sets up the extension in the database: it uploads the adapter JAR to BucketFS and creates the required adapter and import scripts.

In [None]:
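import enum

from exasol.nb_connector import github, bfs_utils, cloud_storage
from exasol.nb_connector.connections import open_bucketfs_connection, open_pyexasol_connection


# TODO: to be moved into notebook-connector's Project enum
class MyProj(enum.Enum):
    S3_DOCUMENT_VS = "s3-document-files-virtual-schema"


# NOTE: `ai_lab_config` is not defined in this notebook; it is assumed to have been
# created earlier in the session.

# Fetch the adapter JAR from GitHub; with use_local_cache=True a previously
# downloaded copy is expected to be reused.
jar_local_path = github.retrieve_jar(MyProj.S3_DOCUMENT_VS, use_local_cache=True)

# Upload the JAR to BucketFS so that the scripts created below can reference it.
bfs_bucket = open_bucketfs_connection(ai_lab_config)
bfs_path = bfs_utils.put_file(bfs_bucket, jar_local_path)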

In [None]:
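# Statements executed in order. The {schema!i} and {jar_path!r} placeholders
# are filled in from query_params when the statements are executed.
SQLS = [
    # Make the target schema current so that the scripts are created in it.
    "OPEN SCHEMA {schema!i}",
    # Adapter script that serves the virtual schema requests.
    """
--/
CREATE OR REPLACE JAVA ADAPTER SCRIPT S3_FILES_ADAPTER AS
    %scriptclass com.exasol.adapter.RequestDispatcher;
    %jar {jar_path!r};
/
    """,
    # UDF script that loads the document data from S3.
    """
--/
CREATE OR REPLACE JAVA SET SCRIPT IMPORT_FROM_S3_DOCUMENT_FILES(
  DATA_LOADER VARCHAR(2000000),
  SCHEMA_MAPPING_REQUEST VARCHAR(2000000),
  CONNECTION_NAME VARCHAR(500))
  EMITS(...) AS
    %scriptclass com.exasol.adapter.document.UdfEntryPoint;
    %jar {jar_path!r};
/
    """,
]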
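
The `{schema!i}` and `{jar_path!r}` placeholders in the statements above are expanded by pyexasol when `query_params` is passed to `execute`. The sketch below imitates that expansion with the standard library only, so it can be run without a database. It assumes that `!i` renders a double-quoted identifier and that `!r` inserts the value verbatim; `SketchFormatter` and the example values are hypothetical stand-ins, not part of pyexasol.

```python
import string


class SketchFormatter(string.Formatter):
    """Hypothetical stand-in for pyexasol's query formatter (illustration only)."""

    def convert_field(self, value, conversion):
        if conversion == "i":
            # Identifier: wrap in double quotes and double any embedded quotes.
            return '"' + str(value).replace('"', '""') + '"'
        if conversion == "r":
            # Raw: insert the value as-is, without quoting or escaping.
            return str(value)
        # Fall back to the standard conversions (!s, !r is overridden above, !a).
        return super().convert_field(value, conversion)


fmt = SketchFormatter()

# Hypothetical values; the notebook takes them from ai_lab_config and bfs_path.
params = {
    "schema": "AI_LAB",
    "jar_path": "/buckets/bfsdefault/default/example-virtual-schema.jar",
}

open_schema_sql = fmt.format("OPEN SCHEMA {schema!i}", **params)
jar_line_sql = fmt.format("%jar {jar_path!r};", **params)

print(open_schema_sql)
print(jar_line_sql)
```

Because the raw conversion performs no escaping, it is best reserved for trusted values such as the BucketFS path computed earlier in the notebook.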

In [None]:
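# Run all statements on one connection so that OPEN SCHEMA applies to the
# CREATE statements that follow it.
with open_pyexasol_connection(ai_lab_config) as conn:
    for sql in SQLS:
        conn.execute(sql, query_params={
            "schema": ai_lab_config.db_schema,
            # Path of the uploaded JAR as seen from inside a UDF.
            "jar_path": bfs_path.as_udf_path(),
        })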

print("S3 Virtual Schema was initialized")
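
To confirm the setup, one could check that both scripts now appear in the schema. The following is a minimal sketch, assuming the scripts are listed in the `SYS.EXA_ALL_SCRIPTS` system table, that `conn.execute(...)` returns an object with `fetchall()` as pyexasol statements do, and that rows come back as tuples. The helper name `find_missing_scripts` is hypothetical.

```python
EXPECTED_SCRIPTS = frozenset({"S3_FILES_ADAPTER", "IMPORT_FROM_S3_DOCUMENT_FILES"})


def find_missing_scripts(conn, schema, expected=EXPECTED_SCRIPTS):
    """Return the expected script names that are not present in `schema`.

    `conn` is assumed to behave like a pyexasol connection.
    """
    rows = conn.execute(
        "SELECT SCRIPT_NAME FROM SYS.EXA_ALL_SCRIPTS WHERE SCRIPT_SCHEMA = {schema!s}",
        query_params={"schema": schema},
    ).fetchall()
    # Each row is assumed to be a tuple whose first item is the script name.
    found = {row[0] for row in rows}
    return set(expected) - found


# Usage inside the notebook (needs a live database, so it is left commented out):
# with open_pyexasol_connection(ai_lab_config) as conn:
#     print(find_missing_scripts(conn, ai_lab_config.db_schema))
```

An empty set as the result would indicate that both scripts were created.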