# Read external files using the OPENROWSET(BULK) function

The `OPENROWSET(BULK)` function enables you to read CSV, Parquet, and JSONL files stored in Azure Data Lake storage. You need to specify the URI of the file(s), and the function returns the content as a set of rows.

## Read content of a Parquet file

In the first example, we will inspect data from a Parquet file. Use the following code to read sample data from a file using the `OPENROWSET(BULK)` function with a Parquet source:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')

## Read content of a CSV file

In the second example, we will inspect data from a CSV file. Use the following code to read sample data from a file using the `OPENROWSET(BULK)` function with a CSV source:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv',
                HEADER_ROW=True,
                ROWTERMINATOR='\n',
                FIELDTERMINATOR=',')

## Read content of a JSONL file

In the third example, we will inspect data from a JSON Lines (JSONL) file. Use the following code to read sample data from a file using the `OPENROWSET(BULK)` function with a JSONL source:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.jsonl')

## Explore column metadata

With the `OPENROWSET(BULK)` function, you can view the columns of a file and their inferred types by passing a query over the file to the `sp_describe_first_result_set` procedure. Because the query is passed as a string literal, the single quotes around the file URI must be doubled:

In [None]:
EXEC sp_describe_first_result_set 
N'SELECT TOP 0 * 
FROM OPENROWSET(BULK ''https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'')';
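
The same approach should work for the other file formats. The following is a minimal sketch that describes the CSV file from the earlier example. It assumes that `sp_describe_first_result_set` accepts the same CSV options used above; note that every single quote inside the query string, including those around the terminators, is doubled:

```sql
-- Sketch: inspect the inferred columns of the CSV file.
-- Assumes the CSV options from the earlier example are valid here as well.
EXEC sp_describe_first_result_set 
N'SELECT TOP 0 * 
FROM OPENROWSET(BULK ''https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv'',
                HEADER_ROW=True,
                ROWTERMINATOR=''\n'',
                FIELDTERMINATOR='','')';
```

Comparing this output with the Parquet output can show where the types inferred from CSV text differ from the types stored in the Parquet file.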

## Specify the schema of the OPENROWSET function

The `OPENROWSET(BULK)` function returns estimated column types based on a sample of the data.

If the sample is not representative, you might get unexpected column types or sizes.

If you know the column types in your files, you can explicitly define the schema of the columns using the `WITH` clause:

In [None]:
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WITH (updated date,
      load_time datetime2,
      deaths_change smallint,
      id int,
      confirmed int,
      confirmed_change int,
      deaths int,
      recovered int,
      recovered_change int,
      latitude float,
      longitude float,
      iso2 varchar(20),
      iso3 varchar(20),
      country_region varchar(100),
      admin_region_1 varchar(100),
      iso_subdivision varchar(100),
      admin_region_2 varchar(100)
) AS data;
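
If you need only a few columns, the `WITH` clause may not have to list all of them. The following is a minimal sketch, assuming that Parquet columns are matched by name so that a subset of the columns from the full schema above can be declared on its own; verify this behavior in your environment before relying on it:

```sql
-- Sketch: declare only the columns of interest.
-- Assumes Parquet columns are matched by name, so the remaining columns can be omitted.
SELECT TOP 10 * 
FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet')
WITH (updated date,
      country_region varchar(100),
      confirmed int,
      deaths int
) AS data;
```

The column names and types here are copied from the full schema in the previous query, so the result should contain the same values for these four columns.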