# Reading External Files with OPENROWSET(BULK)

The `OPENROWSET(BULK)` function lets you query files stored in Azure Data Lake Storage or Azure Blob Storage directly, as if they were database tables. This is especially useful for data exploration and analysis, because you don't need to import or transform the data first.

You can use `OPENROWSET(BULK)` to read files in the following formats:
- Comma-separated values (CSV) format, including variants such as tab-separated and custom encodings
- Parquet format
- [JSON Lines (JSONL)](https://jsonlines.org/) format

To access file content, specify the URI of the target file(s). The function returns the content as a set of rows that you can query and process further.

## Reading Data from a Parquet File

Let's start by exploring the contents of a Parquet file. The following example reads sample data with the `OPENROWSET(BULK)` function. No format option is needed here, because the format is inferred from the `.parquet` file extension:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')

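The query returns the first rows of the file as a table, similar to the following (only some of the columns are shown). Because `TOP` is used without `ORDER BY`, the rows you get back may differ:
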
| id     | updated    | confirmed | deaths | recovered | country_region |
|--------|------------|-----------|--------|-----------|----------------|
| 338995 | 2020-01-21 | 262       | NULL   | 0         | Worldwide      |
| 338996 | 2020-01-22 | 313       | 51     | 0         | Worldwide      |
| 338997 | 2020-01-23 | 578       | 265    | 0         | Worldwide      |
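
The query above relies on the file extension to determine the format. If a file's name doesn't identify its format, you can state it explicitly with the `FORMAT` option. This is a minimal sketch, assuming your warehouse accepts `FORMAT = 'PARQUET'`; check the documentation linked at the end of this notebook for the exact option values:

```sql
-- Same sample query as above, but the file format is stated explicitly
-- (assumed option value) instead of being inferred from the extension.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet',
    FORMAT = 'PARQUET'
) AS data;
```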

## Reading Data from a CSV File

You can also examine data stored in a CSV file. The following example reads the same data set from a CSV file and describes how the file is formatted: it has a header row, rows end with a newline, and fields are separated by commas:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv',
                HEADER_ROW=True,
                ROWTERMINATOR='\n',
                FIELDTERMINATOR=',')

The output should be similar to the Parquet example, because the CSV file contains the same data set. If your file doesn't follow the standard CSV format, you may need to adjust formatting options such as `FIELDTERMINATOR` and `ROWTERMINATOR`.
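
For example, a tab-separated file can be read by changing the field terminator. This is a sketch only: the storage account, container, and file name in the URI are hypothetical placeholders, and the file is assumed to have a header row and newline-terminated rows:

```sql
-- Hypothetical tab-separated file: replace the placeholder URI with your own.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.blob.core.windows.net/<container>/data.tsv',
    FORMAT = 'CSV',          -- tab-separated text is read as a CSV variant
    HEADER_ROW = TRUE,       -- the first line contains the column names
    FIELDTERMINATOR = '\t',  -- fields are separated by tab characters
    ROWTERMINATOR = '\n'     -- each row ends with a newline
) AS data;
```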

## Reading Data from a JSONL File

JSON Lines files, in which each line contains one JSON object, are another supported format you can explore. The following example demonstrates how to read sample data using the `OPENROWSET(BULK)` function:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.jsonl')
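
Because `OPENROWSET(BULK)` returns an ordinary rowset, you can filter and sort the file content as you would a table. The following sketch assumes that the JSONL file exposes the same property names as the Parquet version of this data set (`updated`, `confirmed`, `deaths`, `country_region`); if it doesn't, adjust the column names:

```sql
-- Worldwide figures over time, read directly from the JSONL file.
-- Column names are assumed to match the Parquet sample output.
SELECT TOP 10 updated, confirmed, deaths
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.jsonl') AS data
WHERE country_region = 'Worldwide'
ORDER BY updated;
```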

## Exploring Column Metadata

You can view the columns of a file and their inferred types by passing a query that reads the file with `OPENROWSET(BULK)` to the `sp_describe_first_result_set` system stored procedure. Because the query is passed as a string literal, the single quotes around the URI must be doubled:

In [None]:
EXEC sp_describe_first_result_set 
N'SELECT TOP 0 * 
FROM OPENROWSET(BULK ''https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'')';

The procedure returns a table with the column names, types, and other information, for example:

| column_ordinal | name | is_nullable | system_type_id | system_type_name | max_length | precision | scale | collation_name |
|---|---|---|---|---|---|---|---|---|
| 1 | id | 1 | 56 | int | 4 | 10 | 0 | NULL |
| 2 | updated | 1 | 40 | date | 3 | 10 | 0 | NULL |
| 3 | country_region | 1 | 167 | varchar(max) | -1 | 0 | 0 | Latin1_General_100_BIN2_UTF8 |
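
The same technique works for the CSV file. Every single quote inside the query string must be doubled, including the quotes around the terminator values. This is a sketch that reuses the CSV options from the earlier example:

```sql
-- Inspect the columns and inferred types of the CSV file.
-- Single quotes inside the N'...' literal are doubled ('').
EXEC sp_describe_first_result_set
N'SELECT TOP 0 *
FROM OPENROWSET(BULK ''https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv'',
                HEADER_ROW = TRUE,
                ROWTERMINATOR = ''\n'',
                FIELDTERMINATOR = '','')';
```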

## Specifying the Schema of the OPENROWSET Function

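The `OPENROWSET(BULK)` function infers column types automatically. For text-based formats such as CSV and JSONL, the types are estimated from a sample of the data.

If the sample isn't representative, you might get unexpected column types or sizes. Inferred types can also be wider than necessary, such as the `varchar(max)` type reported for `country_region` above.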

If you know the column types in your files, you can define the schema explicitly in a `WITH` clause:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WITH (updated date,
      load_time datetime2,
      deaths_change smallint,
      id int,
      confirmed int,
      confirmed_change int,
      deaths int,
      recovered int,
      recovered_change int,
      latitude float,
      longitude float,
      iso2 varchar(20),
      iso3 varchar(20),
      country_region varchar(100),
      admin_region_1 varchar(100),
      iso_subdivision varchar(100),
      admin_region_2 varchar(100)
) AS data;

As a result, only the columns defined in the `WITH` clause are returned, in the order they are listed, each with the data type you specified.
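
You don't have to list every column in the file. A `WITH` clause that names only the columns a query needs keeps the result narrow while still giving you explicit types to aggregate on. This is a minimal sketch that reuses column names and types from the schema above:

```sql
-- Read only two columns from the Parquet file and compute the highest
-- confirmed count reported for each country or region.
SELECT country_region,
       MAX(confirmed) AS max_confirmed
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WITH (country_region varchar(100),
      confirmed int) AS data
GROUP BY country_region
ORDER BY max_confirmed DESC;
```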

## Learn More

For detailed guidance on using `OPENROWSET(BULK)` in Fabric Data Warehouse, explore the official documentation:  
[Browse file content with OPENROWSET](https://learn.microsoft.com/fabric/data-warehouse/browse-file-content-with-openrowset)