diff --git a/spiceaidocs/docs/data-connectors/ftp.md b/spiceaidocs/docs/data-connectors/ftp.md
index e3b11fe2..8aae9270 100644
--- a/spiceaidocs/docs/data-connectors/ftp.md
+++ b/spiceaidocs/docs/data-connectors/ftp.md
@@ -7,72 +7,61 @@ description: 'FTP/SFTP Data Connector Documentation'
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-The FTP/SFTP Data Connector enables federated SQL query across files stored in FTP/SFTP servers.
+The FTP/SFTP Data Connector enables federated SQL query across Parquet/CSV files stored in FTP/SFTP servers.
 
-Supports Parquet and CSV file formats.
+If a folder is provided, all child Parquet/CSV files will be loaded.
 
-If a folder is proivided, all child files will be loaded.
-
-To connect to any FTP/SFTP server, specify `ftp` or `sftp` as a selector in the `from` value for the dataset.
+## Configuration
 
-  ```yaml
-  datasets:
-    - from: ftp:///path/to/folder/
-      name: my_dataset
-  ```
-
-
-  ```yaml
-  datasets:
-    - from: sftp:///path/to/folder/
-      name: my_dataset
-  ```
-
-
+  ### Parameters
-## Configuration
+  The connection to FTP can be configured by providing the following params:
 
-
-
-  - `file_format`: Optional parameter, specifies the requested file format.
+  - `file_format`: Optional, specifies the requested file format.
     - `parquet`: (default) Parquet file format.
     - `csv`: CSV file format.
-  - `ftp_port`: Optional parameter, specifies the port of the FTP server. Default is 21. E.g. `ftp_port: 21`
+  - `ftp_port`: Optional, specifies the port of the FTP server. Default is 21. E.g. `ftp_port: 21`
   - `ftp_user`: The username for the FTP server. E.g. `ftp_user: my-ftp-user`
   - `ftp_pass`: The password for the FTP server. E.g. `ftp_pass: my-ftp-password`
   - `ftp_pass_key`: The secret key container the password to connect with. E.g. `ftp_pass_key: my-ftp-password-key`
+
+  More CSV related parameters can be configured, see [CSV Parameters](../reference/file_format.md#CSV)
+
+  ### Examples
+  ```yaml
+  - from: ftp://remote-ftp-server.com/path/to/folder/
+    name: my_dataset
+    params:
+      file_format: csv
+      ftp_user: my-ftp-user
+      ftp_pass_key: my-ftp-password
+  ```
 
-  - `file_format`: Optional parameter, specifies the requested file format.
+  ### Parameters
+
+  The connection to SFTP can be configured by providing the following params:
+
+  - `file_format`: Optional, specifies the requested file format.
     - `parquet`: (default) Parquet file format.
     - `csv`: CSV file format.
-  - `sftp_port`: Optional parameter, specifies the port of the SFTP server. Default is 22. E.g. `sftp_port: 22`
-  - `sftp_user`: The username for the FTP server. E.g. `sftp_user: my-sftp-user`
-  - `sftp_pass`: The password for the FTP server. E.g. `sftp_pass: my-sftp-password`
+  - `sftp_port`: Optional, specifies the port of the SFTP server. Default is 22. E.g. `sftp_port: 22`
+  - `sftp_user`: The username for the SFTP server. E.g. `sftp_user: my-sftp-user`
+  - `sftp_pass`: The password for the SFTP server. E.g. `sftp_pass: my-sftp-password`
   - `sftp_pass_key`: The secret key container the password to connect with. E.g. `sftp_pass_key: my-sftp-password-key`
-
-
-Configuration `params` are provided either in the top level `dataset` for a dataset source and federated SQL query.
-
-```yaml
-  - from: ftp://remote-ftp-server.com/path/to/folder/
-    name: my_dataset
-    params:
-      file_format: csv
-      ftp_user: my-ftp-user
-      ftp_pass_key: my-ftp-password
-```
-
-```yaml
-  - from: sftp://remote-ftp-server.com/path/to/folder/
-    name: my_dataset
-    params:
-      sftp_port: 20
-      sftp_user: my-ftp-user
-      sftp_pass_key: my-ftp-password
-```
+  More CSV related parameters can be configured, see [CSV Parameters](../reference/file_format.md#CSV)
+  ### Examples
+  ```yaml
+  - from: sftp://remote-sftp-server.com/path/to/folder/
+    name: my_dataset
+    params:
+      sftp_port: 20
+      sftp_user: my-sftp-user
+      sftp_pass_key: my-sftp-password
+  ```
+
+
\ No newline at end of file
diff --git a/spiceaidocs/docs/data-connectors/s3.md b/spiceaidocs/docs/data-connectors/s3.md
index ba0443a3..28516bde 100644
--- a/spiceaidocs/docs/data-connectors/s3.md
+++ b/spiceaidocs/docs/data-connectors/s3.md
@@ -7,11 +7,11 @@ description: 'S3 Data Connector Documentation'
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-The S3 Data Connector enables federated SQL query across Parquet files stored in S3, or S3-compatible storage solutions (e.g. MinIO, Cloudflare R2).
+The S3 Data Connector enables federated SQL query across Parquet/CSV files stored in S3, or S3-compatible storage solutions (e.g. MinIO, Cloudflare R2).
 
-Support for Iceberg, CSV, and other file-formats are on the roadmap.
+Support for Iceberg and other file-formats is on the roadmap.
 
-If a folder is provided, all child Parquet files will be loaded.
+If a folder is provided, all child Parquet/CSV files will be loaded.
 
 ## Dataset Schema Reference
 
@@ -29,12 +29,12 @@ Example: `name: cool_dataset`
 
 ### `params` (optional)
 
-- `file_format`: Specifies the requested file format. Default is `parquet`.
-  - `parquet`: (default) Parquet file format.
-  - `csv`: CSV file format.
 - `endpoint`: The S3 endpoint, or equivalent (e.g. MinIO endpoint), for the S3-compatible storage. Defaults to region endpoint. E.g. `endpoint: https://my.minio.server`
 - `region`: Region of the S3 bucket, if region specific. Default value is `us-east-1` E.g. `region: us-east-1`
 - `timeout`: Specifies timeout for S3 operations. Default value is `30s` E.g. `timeout: 60s`
+- `file_format`: Optional. The file format to query against, either `csv` or `parquet`. Defaults to `parquet`.
+
+More CSV related parameters can be configured, see [CSV Parameters](../reference/file_format.md#CSV)
 
 ## Auth
 
diff --git a/spiceaidocs/docs/reference/file_format.md b/spiceaidocs/docs/reference/file_format.md
new file mode 100644
index 00000000..a37f3e2a
--- /dev/null
+++ b/spiceaidocs/docs/reference/file_format.md
@@ -0,0 +1,20 @@
+---
+title: "File Formats"
+sidebar_label: "File Formats"
+pagination_prev: 'reference/index'
+pagination_next: null
+---
+
+Spice currently supports CSV and Parquet data file-formats. Support for Iceberg and other file-formats is on the roadmap.
+
+The parameters supported for each file format are detailed on this page.
+
+## CSV
+
+### Parameters
+
+- `has_header`: Optional. Indicates whether the CSV file has a header row. Defaults to `true`
+- `quote`: Optional. A one-character string used to quote fields containing special characters. Defaults to `"`
+- `escape`: Optional. A one-character string used to represent special characters or to include characters that would normally be interpreted as delimiters or new line characters within a field value. Defaults to `null`
+- `schema_infer_max_records`: Optional. A number used to set the limit in terms of records to scan to infer the schema. Defaults to `1000`
+- `delimiter`: Optional. A one-character string used to separate individual fields. Defaults to `,`
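
Editor's note: the new CSV parameters are listed without a usage example, so here is a minimal sketch of how they might be combined with a connector's own params. It reuses the FTP example from this diff and assumes the CSV options go in the same `params` map as `file_format`. That placement, the chosen values, and whether values need quoting are assumptions, not something the diff confirms.

```yaml
datasets:
  # Server, path, and credentials are copied from the FTP example in this diff.
  - from: ftp://remote-ftp-server.com/path/to/folder/
    name: my_dataset
    params:
      file_format: csv
      ftp_user: my-ftp-user
      ftp_pass_key: my-ftp-password
      # CSV parameters from reference/file_format.md.
      # Assumption: they sit alongside `file_format` in this map.
      has_header: false              # files have no header row (default: true)
      delimiter: ';'                 # semicolon-separated fields (default: ,)
      schema_infer_max_records: 500  # scan fewer records to infer the schema (default: 1000)
```

Parameters left out (`quote`, `escape`) would fall back to the defaults listed in the CSV section.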