
# BigQuery Partitioning & Clustering

This hands-on notebook covers:
- Creating partitioned and clustered tables
- Verifying partitions via `INFORMATION_SCHEMA.PARTITIONS`


**Partitioning** divides a table into segments (e.g., one per `DATE` value), so a query that filters on the partitioning column scans only the matching partitions. This often reduces the bytes scanned and speeds up the query.  
**Clustering** sorts rows within a table (or within each partition) by the selected columns (e.g., `City`), which can reduce the data scanned by filters on the clustering columns.  
**Partition metadata** is available in the `INFORMATION_SCHEMA.PARTITIONS` view.  
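
To make the idea concrete, here is a toy, pure-Python model of partition pruning and clustering. It is only a sketch: BigQuery's real storage is columnar and billing is based on bytes, not rows, and the helper names (`build_partitions`, `scan`, `full_scan`) are hypothetical. The rows mirror the `Id`, `City`, and `Date` values of the first six sample rows loaded later in this notebook.

```python
from collections import defaultdict
from datetime import date

# (Id, City, Date) for the first six sample rows used later in the notebook.
ROWS = [
    (1, "Bengaluru", date(2025, 8, 15)),
    (2, "Mumbai", date(2025, 7, 30)),
    (3, "Chennai", date(2025, 5, 1)),
    (4, "Bengaluru", date(2025, 7, 30)),
    (5, "Mumbai", date(2025, 8, 15)),
    (6, "Chennai", date(2025, 5, 1)),
]


def build_partitions(rows):
    """Group rows by Date (partitioning), then sort each group by City (clustering)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[2]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[1])
    return dict(partitions)


def scan(partitions, wanted_date):
    """Filter by date on the partitioned layout: only one partition is touched."""
    part = partitions.get(wanted_date, [])
    return part, len(part)


def full_scan(rows, wanted_date):
    """Filter by date on the unpartitioned layout: every row is touched."""
    matches = [r for r in rows if r[2] == wanted_date]
    return matches, len(rows)


partitions = build_partitions(ROWS)
pruned_rows, pruned_touched = scan(partitions, date(2025, 7, 30))
all_rows, all_touched = full_scan(ROWS, date(2025, 7, 30))
print(pruned_touched, all_touched)  # prints: 2 6
```

Both layouts return the same two matching rows, but the partitioned layout touches 2 rows instead of 6. Within the partition, rows are ordered by `City`, which is roughly what lets a filter on a clustering column skip unrelated blocks.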



### Authenticate and load BigQuery magics


In [None]:
# Authenticate
from google.colab import auth
auth.authenticate_user()

# Set your GCP project id
PROJECT_ID = "pp-bigquery-02"   # change if needed

import os
os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID

from google.cloud.bigquery.magics import context
context.project = PROJECT_ID
print("Project set to:", context.project)



### Create dataset (schema)


In [None]:

%%bigquery
CREATE SCHEMA IF NOT EXISTS `pp-bigquery-02.partitioning_demo_<your_name>`;



### Create base table (no partitioning)
Columns: `Id, First Name, Last Name, Email Id, City, Country, Date`.


In [None]:

%%bigquery
CREATE OR REPLACE TABLE `pp-bigquery-02.partitioning_demo_<your_name>.non_partitioned_table` (
  Id INT64,
  `First Name` STRING,
  `Last Name` STRING,
  `Email Id` STRING,
  City STRING,
  Country STRING,
  `Date` DATE
);



### Load sample data


In [None]:
%%bigquery
INSERT INTO `pp-bigquery-02.partitioning_demo_<your_name>.non_partitioned_table`
(Id, `First Name`, `Last Name`, `Email Id`, City, Country, `Date`)
VALUES
(1,'John','Dsouza','john.dsouza@noemail.com','Bengaluru','India','2025-08-15'),
(2,'Peter','Pereira','peter.pereira@noemail.com','Mumbai','India','2025-07-30'),
(3,'Mary','Fernandes','mary.fernandes@noemail.com','Chennai','India','2025-05-01'),
(4,'Joseph','Mathew','joseph.mathew@noemail.com','Bengaluru','India','2025-07-30'),
(5,'Thomas','Thomas','thomas.thomas@noemail.com','Mumbai','India','2025-08-15'),
(6,'Grace','Pinto','grace.pinto@noemail.com','Chennai','India','2025-05-01'),
(7,'Harpreet','Singh','harpreet.singh@noemail.com','Bengaluru','India','2025-05-01'),
(8,'Gurpreet','Kaur','gurpreet.kaur@noemail.com','Mumbai','India','2025-07-30'),
(9,'Manjit','Gill','manjit.gill@noemail.com','Chennai','India','2025-08-15'),
(10,'Aamir','Khan','aamir.khan@noemail.com','Bengaluru','India','2025-07-30'),
(11,'Ayesha','Ahmed','ayesha.ahmed@noemail.com','Mumbai','India','2025-05-01'),
(12,'Hemant','Jain','hemant.jain@noemail.com','Chennai','India','2025-07-30'),
(13,'Prachi','Jain','prachi.jain@noemail.com','Bengaluru','India','2025-08-15'),
(14,'Rustom','Irani','rustom.irani@noemail.com','Mumbai','India','2025-08-15'),
(15,'Pavan','Gowda','pavan.gowda@noemail.com','Chennai','India','2025-07-30'),
(16,'Suresh','Kamble','suresh.kamble@noemail.com','Bengaluru','India','2025-05-01'),
(17,'Aarav','Sharma','aarav.sharma@noemail.com','Bengaluru','India','2025-08-15'),
(18,'Vivaan','Verma','vivaan.verma@noemail.com','Mumbai','India','2025-07-30'),
(19,'Aditya','Reddy','aditya.reddy@noemail.com','Chennai','India','2025-05-01'),
(20,'Vihaan','Iyer','vihaan.iyer@noemail.com','Bengaluru','India','2025-07-30'),
(21,'Arjun','Patel','arjun.patel@noemail.com','Mumbai','India','2025-08-15'),
(22,'Reyansh','Gupta','reyansh.gupta@noemail.com','Chennai','India','2025-05-01'),
(23,'Krishna','Naidu','krishna.naidu@noemail.com','Bengaluru','India','2025-05-01'),
(24,'Ishaan','Menon','ishaan.menon@noemail.com','Mumbai','India','2025-07-30'),
(25,'Shaurya','Nair','shaurya.nair@noemail.com','Chennai','India','2025-08-15'),
(26,'Atharv','Bhat','atharv.bhat@noemail.com','Bengaluru','India','2025-07-30'),
(27,'Ananya','Shetty','ananya.shetty@noemail.com','Mumbai','India','2025-05-01'),
(28,'Diya','Rao','diya.rao@noemail.com','Chennai','India','2025-07-30'),
(29,'Kiara','Shukla','kiara.shukla@noemail.com','Bengaluru','India','2025-08-15'),
(30,'Aadhya','Mishra','aadhya.mishra@noemail.com','Mumbai','India','2025-08-15'),
(31,'Ira','Kulkarni','ira.kulkarni@noemail.com','Chennai','India','2025-05-01'),
(32,'Anika','Desai','anika.desai@noemail.com','Bengaluru','India','2025-07-30'),
(33,'Saanvi','Joshi','saanvi.joshi@noemail.com','Mumbai','India','2025-08-15'),
(34,'Myra','Murthy','myra.murthy@noemail.com','Chennai','India','2025-05-01'),
(35,'Navya','Chakraborty','navya.chakraborty@noemail.com','Bengaluru','India','2025-07-30'),
(36,'Aarohi','Banerjee','aarohi.banerjee@noemail.com','Mumbai','India','2025-05-01'),
(37,'Rohan','Chatterjee','rohan.chatterjee@noemail.com','Chennai','India','2025-07-30'),
(38,'Kabir','Mukherjee','kabir.mukherjee@noemail.com','Bengaluru','India','2025-08-15'),
(39,'Siddharth','Das','siddharth.das@noemail.com','Mumbai','India','2025-05-01'),
(40,'Yash','Ghosh','yash.ghosh@noemail.com','Chennai','India','2025-07-30'),
(41,'Kunal','Tripathi','kunal.tripathi@noemail.com','Bengaluru','India','2025-05-01'),
(42,'Ritika','Dubey','ritika.dubey@noemail.com','Mumbai','India','2025-07-30'),
(43,'Shruti','Pandey','shruti.pandey@noemail.com','Chennai','India','2025-08-15'),
(44,'Nisha','Singh','nisha.singh@noemail.com','Bengaluru','India','2025-08-15'),
(45,'Pooja','Yadav','pooja.yadav@noemail.com','Mumbai','India','2025-05-01'),
(46,'Meera','Kumar','meera.kumar@noemail.com','Chennai','India','2025-07-30'),
(47,'Aisha','Agarwal','aisha.agarwal@noemail.com','Bengaluru','India','2025-05-01'),
(48,'Neha','Mehta','neha.mehta@noemail.com','Mumbai','India','2025-08-15'),
(49,'Riya','Patil','riya.patil@noemail.com','Chennai','India','2025-05-01'),
(50,'Priya','Sawant','priya.sawant@noemail.com','Bengaluru','India','2025-07-30');


In [None]:
%%bigquery
SELECT *
FROM `pp-bigquery-02.partitioning_demo_<your_name>.non_partitioned_table`;

In [None]:

%%bigquery
-- A non-partitioned table is expected to show a single row with a NULL partition_id.
SELECT *
FROM `pp-bigquery-02.partitioning_demo_<your_name>`.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'non_partitioned_table'
ORDER BY partition_id;




---


### Create **partitioned** table by `Date`


In [None]:

%%bigquery
CREATE OR REPLACE TABLE `pp-bigquery-02.partitioning_demo_<your_name>.customers_partitioned` (
  Id INT64,
  `First Name` STRING,
  `Last Name` STRING,
  `Email Id` STRING,
  City STRING,
  Country STRING,
  `Date` DATE
)
PARTITION BY `Date`;

INSERT INTO `pp-bigquery-02.partitioning_demo_<your_name>.customers_partitioned`
SELECT * FROM `pp-bigquery-02.partitioning_demo_<your_name>.non_partitioned_table`;



### Check partitions of the partitioned table


In [None]:

%%bigquery
SELECT table_name, partition_id, total_rows
FROM `pp-bigquery-02.partitioning_demo_<your_name>`.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'customers_partitioned'
ORDER BY partition_id;
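
As a sanity check on the query result, the sketch below predicts the `partition_id` values from a list of dates. It assumes daily partitioning on a `DATE` column, where `partition_id` is the date formatted as `YYYYMMDD`; the helper name `expected_partitions` and the short `sample_dates` list are illustrative, not part of the notebook's data load.

```python
from collections import Counter
from datetime import date


def expected_partitions(dates):
    """Return sorted (partition_id, row_count) pairs, assuming YYYYMMDD partition ids."""
    counts = Counter(d.strftime("%Y%m%d") for d in dates)
    return sorted(counts.items())


# Illustrative subset of dates; replace with the dates actually loaded.
sample_dates = [
    date(2025, 8, 15),
    date(2025, 7, 30),
    date(2025, 5, 1),
    date(2025, 7, 30),
]

for partition_id, total_rows in expected_partitions(sample_dates):
    print(partition_id, total_rows)
```

With the sample data loaded above, which uses only three distinct dates, the query should list three partitions: `20250501`, `20250730`, and `20250815`.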



### Create **partitioned + clustered** table (partitioned by `Date`, clustered by `City`)


In [None]:

%%bigquery
CREATE OR REPLACE TABLE `pp-bigquery-02.partitioning_demo_<your_name>.customers_clustered` (
  Id INT64,
  `First Name` STRING,
  `Last Name` STRING,
  `Email Id` STRING,
  City STRING,
  Country STRING,
  `Date` DATE
)
PARTITION BY `Date`
CLUSTER BY City;

INSERT INTO `pp-bigquery-02.partitioning_demo_<your_name>.customers_clustered`
SELECT * FROM `pp-bigquery-02.partitioning_demo_<your_name>.non_partitioned_table`;



### Check partitions of the clustered table

In [None]:

%%bigquery
SELECT *
FROM `pp-bigquery-02.partitioning_demo_<your_name>`.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'customers_clustered'
ORDER BY partition_id;



### Clean up: drop the dataset

In [None]:

%%bigquery
DROP SCHEMA `pp-bigquery-02.partitioning_demo_<your_name>` CASCADE;
