## Question 7. External table storage

Where is the data stored in the External Table you created?

- Big Query
- Container Registry
- GCP Bucket
- Big Table

Answer: GCP Bucket

An external table in BigQuery only points to files stored outside BigQuery (in our case, Parquet files in GCS), so the data itself stays in the GCS bucket.
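As a minimal sketch of how such a table is defined, the DDL below only registers a pointer to the files. The table name `external_yellow_tripdata_2024` and the bucket path are hypothetical placeholders, not the names used in this homework.

```sql
-- Sketch only: the table name and the gs:// path are assumed placeholders.
-- No data is copied into BigQuery; the Parquet files stay in the bucket.
CREATE OR REPLACE EXTERNAL TABLE `zoomcamp_hw3.external_yellow_tripdata_2024`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://your-bucket-name/yellow_tripdata_2024-*.parquet']
);
```

Dropping an external table defined this way removes only the table definition; the files in the bucket are untouched.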

## Question 8. Clustering best practices

It is best practice in Big Query to always cluster your data:

- True
- False

Answer: False

Clustering is not always best practice.

You should cluster only when:
- you frequently filter / group / order by certain columns
- and the table is large enough for clustering to actually help

Otherwise, clustering can add unnecessary overhead.
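For a case where clustering is worth it, a clustered copy of the table could be created roughly like this. The target table name and the choice of `VendorID` as the clustering column are assumptions for illustration; pick the column your queries actually filter or group on.

```sql
-- Sketch only: the target table name and the clustering column are assumptions.
-- Cluster on a column that queries frequently filter, group, or order by.
CREATE OR REPLACE TABLE `zoomcamp_hw3.yellow_tripdata_2024_clustered`
CLUSTER BY VendorID AS
SELECT *
FROM `zoomcamp_hw3.yellow_tripdata_2024`;
```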

## Question 9. Understanding table scans

Write a SELECT count(*) query FROM the materialized table you created. How many bytes does it estimate will be read? Why?

```sql
SELECT COUNT(*)
FROM `zoomcamp_hw3.yellow_tripdata_2024`;
```

![Question 9](./images/question9.png)

BigQuery's estimate reads "This query will process 0 B when run", yet the query still returns 20332093.

Why does it show 0 B? Because BigQuery can answer `COUNT(*)` using table metadata.

For native (materialized) tables, BigQuery often stores the row count in metadata.
So it does not need to scan any data blocks.

This is why:
- 0 bytes are read
- but you still get the correct row count.

This is expected behaviour. Our table, `zoomcamp_hw3.yellow_tripdata_2024`, is:
- a native BigQuery table (not external)
- not a view

and the query applies no filters.

BigQuery therefore optimizes `COUNT(*)` and answers it directly from metadata.

Earlier we said that `COUNT(*)` usually scans the full table. That is logically true, but in practice BigQuery uses a metadata shortcut when possible.

So in our case the correct explanation is: BigQuery is able to return the result of `COUNT(*)` from table metadata, therefore the query processes 0 bytes.
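One way to see the metadata in question is to read the dataset's `__TABLES__` meta-table, which lists a stored row count and size per table. This is a sketch for inspection only; it is not necessarily the exact mechanism the optimizer uses internally.

```sql
-- Sketch only: reads the stored table metadata rather than the table's data.
SELECT table_id, row_count, size_bytes
FROM `zoomcamp_hw3.__TABLES__`
WHERE table_id = 'yellow_tripdata_2024';

-- For contrast (assumes the table has a VendorID column): adding a filter
-- means the count can no longer come from metadata alone, so the estimate
-- should be non-zero because the filtered column has to be read.
SELECT COUNT(*)
FROM `zoomcamp_hw3.yellow_tripdata_2024`
WHERE VendorID = 1;
```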