Azure Synapse Analytics includes serverless SQL pools, which are tailored for querying data in a data lake. With a serverless SQL pool, you can use SQL code to query data in files of various common formats without needing to load the file data into database storage. This capability helps data analysts and data engineers analyze and process file data in the data lake using a familiar data processing language, without the need to create or maintain a relational database store. Billing is pay-per-query: you are charged for the data your queries process rather than for reserved resources.

Azure Synapse SQL is a distributed query system in Azure Synapse Analytics that offers two kinds of runtime environments:

* Serverless SQL pool: on-demand SQL query processing, primarily used to work with data in a data lake.
* Dedicated SQL pool: Enterprise-scale relational database instances used to host data warehouses in which data is stored in relational tables.


### Benefits of serverless SQL pool

* A familiar Transact-SQL syntax to query data in place without the need to copy or load data into a specialized store.
* Integrated connectivity from a wide range of business intelligence and ad-hoc querying tools, including the most popular drivers.
* Distributed query processing that is built for large-scale data and computational functions, resulting in fast query performance.
* Built-in query execution fault-tolerance, resulting in high reliability and success rates even for long-running queries involving large data sets.
* No infrastructure to set up or clusters to maintain. A built-in endpoint for this service is provided within every Azure Synapse workspace, so you can start querying data as soon as the workspace is created.
* No charge for reserved resources; you're only charged for the data processed by the queries you run.


### Common use cases for serverless SQL pools

* Data exploration
        Data exploration involves browsing the data lake to get initial insights about the data, and is easily achievable with Azure Synapse Studio. You can browse through the files in your linked data lake storage, and use the built-in serverless SQL pool to automatically generate a SQL script that selects the TOP 100 rows from a file or folder, just as you would with a table in SQL Server. From there, you can apply projections, filtering, grouping, and most other operations over the data as if it were in a regular SQL Server table.
* Data transformation
        While Azure Synapse Analytics provides great data transformation capabilities with Synapse Spark, some data engineers might find data transformation easier to achieve using SQL. Serverless SQL pool enables you to perform SQL-based data transformations, either interactively or as part of an automated data pipeline.
* Logical data warehouse
        After your initial exploration of the data in the data lake, you can define external objects such as tables and views in a serverless SQL database. The data remains stored in the data lake files, but is abstracted by a relational schema that client applications and analytical tools can use to query the data as they would in a relational database hosted in SQL Server.
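
As a rough sketch of the exploration and logical data warehouse use cases, the queries below first filter and group file data in place, then wrap the same file set in a view. The storage URL and column definitions are borrowed from the CSV example in this notebook. The view name `dbo.products_v` is hypothetical, and the view is assumed to be created in a user database rather than in `master`.

```sql
-- Exploration: filter and group file data as if it were a table.
SELECT product_name,
       COUNT(*)        AS item_count,
       AVG(list_price) AS avg_price
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/files/*.csv',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0')
WITH (
    product_id INT,
    product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
    list_price DECIMAL(5,2)
) AS rows
WHERE list_price > 10
GROUP BY product_name;
GO

-- Logical data warehouse: expose the same files through a view
-- (hypothetical name; assumes the current database is a user database).
CREATE VIEW dbo.products_v AS
SELECT product_id, product_name, list_price
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/files/*.csv',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0')
WITH (
    product_id INT,
    product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
    list_price DECIMAL(5,2)
) AS rows;
GO

-- Client tools can now query the view like a regular table.
SELECT TOP 10 * FROM dbo.products_v ORDER BY list_price DESC;
```

The data stays in the lake files; the view only stores the query definition.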


### Querying files using serverless SQL pool
A serverless SQL pool can be used to query CSV, JSON, and Parquet files. The basic syntax for querying is the same for all of these file types, and is built on the <b>OPENROWSET</b> SQL function.

In [None]:
-- Without DATA_SOURCE: for SQL logins, access relies on a server-scoped credential such as a SAS token
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/files/*.csv',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0')
WITH (
    product_id INT,
    product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
    list_price DECIMAL(5,2)
) AS rows
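
Because the basic syntax is shared across formats, the same pattern can be sketched for Parquet and JSON. The file paths below reuse the storage account from the CSV example but are hypothetical, as are the JSON property names. The JSON query uses the common approach of reading each document as a single text column (by choosing terminators that do not occur in the data) and then extracting values with `JSON_VALUE`.

```sql
-- Parquet: the schema is read from the files, so no WITH clause is required.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/files/*.parquet',
    FORMAT = 'parquet') AS rows;

-- JSON: read each document as one NVARCHAR(MAX) column, then parse it.
SELECT JSON_VALUE(doc, '$.product_name') AS product_name,  -- assumed property
       JSON_VALUE(doc, '$.list_price')   AS list_price     -- assumed property
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json',
    FORMAT = 'csv',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0b')
WITH (doc NVARCHAR(MAX)) AS rows;
```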


When OPENROWSET is used without DATA_SOURCE, the authentication mechanism depends on the caller type.

* Any user can use OPENROWSET without DATA_SOURCE to read publicly available files on Azure storage.
* Microsoft Entra logins can access protected files using their own Microsoft Entra identity if Azure storage allows the Microsoft Entra user to access the underlying files (for example, if the caller has the Storage Blob Data Reader role on the storage account).
* SQL logins can also use OPENROWSET without DATA_SOURCE to access publicly available files, or files protected with a SAS token or the managed identity of the Synapse workspace. <b>For protected files, you need to create a server-scoped credential to allow access to the storage files.</b> See https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-storage-access-control?tabs=user-identity
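
A minimal sketch of such a server-scoped credential follows, assuming SAS token access to the `data` container used in the earlier example. The credential name is assumed to need to match the storage URL that queries reference, and the secret is the same placeholder used elsewhere in this notebook, not a real token.

```sql
-- Server-scoped credential (created at the server level, not in a user database).
-- The name is assumed to match the storage account and container URL.
CREATE CREDENTIAL [https://mydatalake.blob.core.windows.net/data]
WITH
    IDENTITY = 'SHARED ACCESS SIGNATURE',
    SECRET = 'sv=xxx...';  -- placeholder SAS token
GO
```

With a credential like this in place, a SQL login running OPENROWSET against a matching URL without DATA_SOURCE would authenticate with the SAS token.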


### Create external database objects
You can use the OPENROWSET function in SQL queries that run in the default master database of the built-in serverless SQL pool to explore data in the data lake. However, sometimes you may want to create a custom database that contains some objects that make it easier to work with external data in the data lake that you need to query frequently.
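
As a sketch of that setup, the statements below create a custom database and a master key before any credentials are defined. The database name `SalesDB` and the password placeholder are hypothetical. The UTF-8 collation mirrors the one used in the earlier column definition, and the master key is included on the assumption that a database scoped credential needs one to protect its secret.

```sql
-- Hypothetical database name; a UTF-8 collation helps with text read from files.
CREATE DATABASE SalesDB
    COLLATE Latin1_General_100_BIN2_UTF8;
GO

USE SalesDB;
GO

-- Assumed prerequisite for CREATE DATABASE SCOPED CREDENTIAL.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';  -- placeholder
GO
```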

In [None]:
CREATE DATABASE SCOPED CREDENTIAL sqlcred
WITH
    IDENTITY='SHARED ACCESS SIGNATURE',  
    SECRET = 'sv=xxx...';
GO

CREATE EXTERNAL DATA SOURCE secureFiles
WITH (
    LOCATION = 'https://mydatalake.blob.core.windows.net/data/secureFiles/',
    CREDENTIAL = sqlcred
);
GO
    
CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS(
            FIELD_TERMINATOR = ',',
            STRING_DELIMITER = '"'
        )
    );
GO
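
With these objects defined, queries can reference the data source and file format by name instead of repeating the full URL. The sketch below assumes a hypothetical `products` folder under the `secureFiles` location, along with a hypothetical table name and column list modeled on the earlier CSV example.

```sql
-- OPENROWSET with DATA_SOURCE: the BULK path is relative to the data source location.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'products/*.csv',          -- hypothetical folder
    DATA_SOURCE = 'secureFiles',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0'
) AS rows;
GO

-- External table: a relational schema over the same files (hypothetical name).
CREATE EXTERNAL TABLE dbo.products
(
    product_id INT,
    product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
    list_price DECIMAL(5,2)
)
WITH (
    LOCATION = 'products/*.csv',
    DATA_SOURCE = secureFiles,
    FILE_FORMAT = CsvFormat
);
GO

-- The external table can then be queried like any other table.
SELECT TOP 10 * FROM dbo.products;
```

In both cases the credential attached to `secureFiles` is used for storage access, so callers do not need to supply a SAS token themselves.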