**INTERSECT**
- The **INTERSECT** operator returns the **common rows** between the result sets of **two or more SELECT** statements.

**Topics Covered**
- Ex 01: Basic INTERSECT usage
- Ex 02: Common Employees in Two Tables
- Ex 03: Different column orders, still works if data types and count match
- Ex 04: Error due to different column counts
- Ex 05: Case-sensitive match
- Ex 06: INTERSECT with Multiple Columns

✅ Basic Rules:

**Same Number of Columns:**
- **Both SELECT queries** must return the **same number of columns**.

**Same Data Types (or compatible types):**
- The corresponding columns must have **compatible data types**.

**Column Order Matters:**
- The comparison is based on **column position, not name**.
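
A minimal sketch of the position-based comparison, using hypothetical inline `VALUES` data rather than any table from this notebook. The two inputs deliberately use different column names (`customer_id` and `supplier_id`, both invented for illustration); the rows are still matched by position, and the result takes its column name from the first query.

```sql
-- Hypothetical inline data; the column names differ on purpose.
SELECT customer_id FROM VALUES (1), (2), (3) AS a(customer_id)
INTERSECT
SELECT supplier_id FROM VALUES (2), (3), (4) AS b(supplier_id)
ORDER BY customer_id;
-- Rows returned: 2 and 3, in a single column named customer_id.
```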

**Duplicates Removed:**
- INTERSECT returns only **distinct rows** that appear in **both result sets**.
- It **removes duplicates** by default, so a row that occurs several times in both inputs appears only **once** in the result.
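
The duplicate handling can be illustrated with a small sketch on hypothetical inline data (the aliases `l` and `r` and the columns `id` and `code` are made up for this example). It assumes Spark SQL, where `INTERSECT ALL` is available as the variant that keeps duplicates.

```sql
-- Default behaviour (same as INTERSECT DISTINCT): each common row appears once.
SELECT id, code FROM VALUES (1, 'A'), (1, 'A'), (2, 'B') AS l(id, code)
INTERSECT
SELECT id, code FROM VALUES (1, 'A'), (1, 'A'), (3, 'C') AS r(id, code);
-- Returns one row: (1, 'A')

-- INTERSECT ALL keeps duplicates: a common row is returned as many times
-- as it occurs in both inputs (the smaller of the two counts).
SELECT id, code FROM VALUES (1, 'A'), (1, 'A'), (2, 'B') AS l(id, code)
INTERSECT ALL
SELECT id, code FROM VALUES (1, 'A'), (1, 'A'), (3, 'C') AS r(id, code);
-- Returns two rows: (1, 'A') and (1, 'A')
```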

**ORDER BY** must come **after** the last SELECT of the INTERSECT block; it sorts the combined result, not an individual query.

**Syntax**

     SELECT column1, column2, ...
     FROM table1
     [WHERE condition]
     INTERSECT
     SELECT column1, column2, ...
     FROM table2
     [WHERE condition];

**Ex 01: Basic INTERSECT usage**
- Common **Country & City** in Two Tables

In [0]:
%sql
DROP TABLE IF EXISTS tbl_intersect_Customers;

CREATE TABLE tbl_intersect_Customers(
    ID INT,
    Name VARCHAR(20),
    Country VARCHAR(20),
    City VARCHAR(20)
);

INSERT INTO tbl_intersect_Customers VALUES (1, 'Aakash', 'INDIA', 'Mumbai');
INSERT INTO tbl_intersect_Customers VALUES (2, 'George', 'USA', 'New York');
INSERT INTO tbl_intersect_Customers VALUES (3, 'David', 'INDIA', 'Bangalore');
INSERT INTO tbl_intersect_Customers VALUES (4, 'Leo', 'SPAIN', 'Madrid');
INSERT INTO tbl_intersect_Customers VALUES (5, 'Rahul', 'INDIA', 'Delhi');
INSERT INTO tbl_intersect_Customers VALUES (6, 'Brian', 'USA', 'Chicago');
INSERT INTO tbl_intersect_Customers VALUES (7, 'Justin', 'SPAIN', 'Barcelona');

SELECT * FROM tbl_intersect_Customers;

ID,Name,Country,City
7,Justin,SPAIN,Barcelona
3,David,INDIA,Bangalore
1,Aakash,INDIA,Mumbai
2,George,USA,New York
5,Rahul,INDIA,Delhi
6,Brian,USA,Chicago
4,Leo,SPAIN,Madrid


In [0]:
%sql
DROP TABLE IF EXISTS tbl_intersect_Branches;

CREATE TABLE tbl_intersect_Branches(
    Branch_Code INT,
    Country VARCHAR(20),
    City VARCHAR(20)
);

INSERT INTO tbl_intersect_Branches VALUES (101, 'INDIA', 'Mumbai');
INSERT INTO tbl_intersect_Branches VALUES (201, 'INDIA', 'Bangalore');
INSERT INTO tbl_intersect_Branches VALUES (301, 'USA', 'Chicago');
INSERT INTO tbl_intersect_Branches VALUES (401, 'USA', 'New York');
INSERT INTO tbl_intersect_Branches VALUES (501, 'SPAIN', 'Madrid');

SELECT * FROM tbl_intersect_Branches;

Branch_Code,Country,City
201,INDIA,Bangalore
101,INDIA,Mumbai
401,USA,New York
501,SPAIN,Madrid
301,USA,Chicago


In [0]:
%sql
SELECT Country, City FROM tbl_intersect_Customers
INTERSECT
SELECT Country, City FROM tbl_intersect_Branches
ORDER BY City;

Country,City
INDIA,Bangalore
USA,Chicago
SPAIN,Madrid
INDIA,Mumbai
USA,New York


**Ex 02: Common Employees in Two Tables**
- **Result:** Employees who are part of **both departments**.

In [0]:
%sql
-- Step 1: Create department_A table (drop it first so the cell can be re-run)
DROP TABLE IF EXISTS tbl_intersect_department_A;

CREATE TABLE tbl_intersect_department_A (
    employee_id INT,
    employee_name VARCHAR(50)
);

-- Step 2: Insert data into department_A
INSERT INTO tbl_intersect_department_A (employee_id, employee_name)
VALUES
(1, 'Alekya'),
(2, 'Baskar'),
(3, 'Chandra'),
(4, 'Darshan');

SELECT * FROM tbl_intersect_department_A;

employee_id,employee_name
1,Alekya
2,Baskar
3,Chandra
4,Darshan


In [0]:
%sql

-- Step 1: Create department_B table (drop it first so the cell can be re-run)
DROP TABLE IF EXISTS tbl_intersect_department_B;

CREATE TABLE tbl_intersect_department_B (
    employee_id INT,
    employee_name VARCHAR(50)
);

-- Step 2: Insert data into department_B
INSERT INTO tbl_intersect_department_B (employee_id, employee_name)
VALUES
(3, 'Chandra'),
(4, 'Darshan'),
(5, 'Sampath'),
(6, 'Nirosha');

SELECT * FROM tbl_intersect_department_B;

employee_id,employee_name
3,Chandra
4,Darshan
5,Sampath
6,Nirosha


In [0]:
%sql
-- Query to find common employees in both departments
SELECT employee_name FROM tbl_intersect_department_A
INTERSECT
SELECT employee_name FROM tbl_intersect_department_B;

employee_name
Chandra
Darshan


**Ex 03: Different column orders, still works if data types and count match**

In [0]:
%sql
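-- Drop both tables first so the cell can be re-run
DROP TABLE IF EXISTS tblStudents_2023;
DROP TABLE IF EXISTS tblStudents_2024;

CREATE TABLE tblStudents_2023 (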
    student_id INT,
    student_name VARCHAR(50)
);

CREATE TABLE tblStudents_2024 (
    student_id INT,
    student_name VARCHAR(50)
);

INSERT INTO tblStudents_2023 VALUES (1, 'John'), (2, 'Asha'), (3, 'David'), (4, 'Priya');
INSERT INTO tblStudents_2024 VALUES (2, 'Asha'), (3, 'David'), (5, 'Kiran');

SELECT * FROM tblStudents_2023;
SELECT * FROM tblStudents_2024;

student_id,student_name
2,Asha
3,David
5,Kiran


In [0]:
%sql
SELECT student_id, student_name FROM tblStudents_2023
INTERSECT
SELECT student_id, student_name FROM tblStudents_2024;

student_id,student_name
2,Asha
3,David


In [0]:
%sql
-- Works because both SELECT lists use the same column order (student_name, student_id),
-- so the data types still line up position by position.
SELECT student_name, student_id FROM tblStudents_2023
INTERSECT
SELECT student_name, student_id FROM tblStudents_2024;

student_name,student_id
Asha,2
David,3


**Ex 04: Error due to different column counts**

In [0]:
%sql
-- Error: INTERSECT queries must have the same number of columns.
SELECT student_id FROM tblStudents_2023
INTERSECT
SELECT student_id, student_name FROM tblStudents_2024;

org.apache.spark.sql.catalyst.ExtendedAnalysisException: [NUM_COLUMNS_MISMATCH] INTERSECT can only be performed on inputs with the same number of columns, but the first input has 1 columns and the second input has 2 columns. SQLSTATE: 42826; line 1 pos 0;
'Intersect false
:- Project [student_id#19589]
:  +- SubqueryAlias spark_catalog.default.tblStudents_2023
:     +- Relation spark_catalog.default.tblstudents_2023[student_id#19589,student_name#19590] parquet
+- Project [student_id#19591, student_name#19592]
   +- SubqueryAlias spark_catalog.default.tblStudents_2024
      +- Relation spark_catalog.default.tblstudents_2024[student_id#19591,student_name#19592] parquet

	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:55)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$33(CheckAnalysis.scala:643)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$33$adapted(CheckAnalysis.scala:634)
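
One way to resolve the error, sketched here against the two student tables created in Ex 03, is to project the same number of columns on both sides. Selecting only `student_id` is just one possible choice; selecting both columns on both sides would work as well.

```sql
-- Both inputs now return exactly one column, so the analysis check passes.
SELECT student_id FROM tblStudents_2023
INTERSECT
SELECT student_id FROM tblStudents_2024
ORDER BY student_id;
-- With the data inserted in Ex 03, the common ids are 2 and 3.
```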


**Ex 05: Case-sensitive match**

In [0]:
%sql
-- Add a customer whose Country differs only in letter case ('India' vs 'INDIA')
INSERT INTO tbl_intersect_Customers VALUES (8, 'Suresh', 'India', 'Mumbai');

SELECT * FROM tbl_intersect_Customers;

ID,Name,Country,City
7,Justin,SPAIN,Barcelona
3,David,INDIA,Bangalore
1,Aakash,INDIA,Mumbai
2,George,USA,New York
8,Suresh,India,Mumbai
5,Rahul,INDIA,Delhi
6,Brian,USA,Chicago
4,Leo,SPAIN,Madrid


In [0]:
%sql
SELECT Country FROM tbl_intersect_Customers
INTERSECT
SELECT Country FROM tbl_intersect_Branches;

Country
SPAIN
INDIA
USA
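
The value `India` does not appear in the result above because it matches no `Country` in the branches table, which only stores `INDIA`. The sketch below isolates that behaviour on hypothetical inline data (the aliases `c` and `b` are invented for illustration) and shows one possible workaround, normalizing case with `UPPER` on both sides. It assumes the default, case-sensitive string comparison; a different collation setting could change the outcome.

```sql
-- Case-sensitive comparison: 'India' and 'INDIA' are different values,
-- so only 'USA' is common to both inputs.
SELECT country FROM VALUES ('India'), ('USA') AS c(country)
INTERSECT
SELECT country FROM VALUES ('INDIA'), ('USA') AS b(country);

-- Normalizing case on both sides makes the two spellings compare equal.
SELECT UPPER(country) AS country FROM VALUES ('India'), ('USA') AS c(country)
INTERSECT
SELECT UPPER(country) AS country FROM VALUES ('INDIA'), ('USA') AS b(country)
ORDER BY country;
-- Returns INDIA and USA
```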


**Ex 06: INTERSECT with Multiple Columns**
- **Result:** Employees present in both years.

In [0]:
%sql
-- Step 1: Create employees_2024 table
DROP TABLE IF EXISTS tbl_intersect_employees_2024;

CREATE TABLE tbl_intersect_employees_2024 (
    employee_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    department VARCHAR(50)
);

-- Step 2: Insert data into employees_2024
INSERT INTO tbl_intersect_employees_2024 (employee_id, first_name, last_name, department)
VALUES
(1, 'Pandit', 'Smiti', 'HR'),
(2, 'Naresh', 'Kumar', 'Finance'),
(3, 'Mohan', 'Rao', 'IT'),
(4, 'Andy', 'Smith', 'Marketing');

SELECT * FROM tbl_intersect_employees_2024;

employee_id,first_name,last_name,department
1,Pandit,Smiti,HR
2,Naresh,Kumar,Finance
3,Mohan,Rao,IT
4,Andy,Smith,Marketing


In [0]:
%sql
-- Step 3: Create employees_2025 table
DROP TABLE IF EXISTS tbl_intersect_employees_2025;

CREATE TABLE tbl_intersect_employees_2025 (
    employee_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    department VARCHAR(50)
);

-- Step 4: Insert data into employees_2025
INSERT INTO tbl_intersect_employees_2025 (employee_id, first_name, last_name, department)
VALUES
(5, 'Mohan', 'Rao', 'IT'),
(6, 'Andy', 'Smith', 'Sales'),
(7, 'Elango', 'Thiru', 'Finance'),
(8, 'Faisal', 'Ali', 'IT');

SELECT * FROM tbl_intersect_employees_2025;

employee_id,first_name,last_name,department
5,Mohan,Rao,IT
6,Andy,Smith,Sales
7,Elango,Thiru,Finance
8,Faisal,Ali,IT


In [0]:
%sql
-- INTERSECT Query to find common employees based on name
SELECT first_name, last_name FROM tbl_intersect_employees_2024
INTERSECT
SELECT first_name, last_name FROM tbl_intersect_employees_2025;

first_name,last_name
Mohan,Rao
Andy,Smith
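
INTERSECT compares every selected column, so adding a column can shrink the result. As a sketch that reuses the two employee tables created above, including `department` in both SELECT lists drops Andy Smith, whose department is Marketing in 2024 but Sales in 2025.

```sql
-- All three columns must match for a row to be returned.
SELECT first_name, last_name, department FROM tbl_intersect_employees_2024
INTERSECT
SELECT first_name, last_name, department FROM tbl_intersect_employees_2025;
-- With the data above, only (Mohan, Rao, IT) is common to both tables.
```

Adding `employee_id` as well would return no rows at all, because the ids in the two tables do not overlap.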


**Summary**

| Rule                           | Enforced? |
| ------------------------------ | --------- |
| Same # of columns              | ✅ Yes     |
| Compatible data types          | ✅ Yes     |
| Duplicate rows removed         | ✅ Yes     |
| Case-sensitive string matching | ✅ Yes     |
| ORDER BY only at the end       | ✅ Yes     |