# Databricks SQL Concepts - Multiple Choice Questions (MCQs)
## Topics: Querying Files, Writing to Tables, Advanced Transformations, Higher-Order Functions & UDFs

---

### **Question 1**
Which metadata field replaces the deprecated `input_file_name()` function for retrieving the source file path when querying files?

**A)** `_metadata.file_name`  
**B)** `_metadata.file_path`  
**C)** `file_path()`  
**D)** `input_path()`

**Correct Answer: B**  
*Explanation: The `_metadata.file_path` field returns the full path of the input file, replacing the deprecated `input_file_name()` function.*
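
A minimal sketch of reading the source path for each row, assuming the files are queried through `read_files` and that the file metadata column is exposed as `_metadata` (so the path field is `_metadata.file_path`). The volume path below is hypothetical.

```sql
-- Hypothetical location; replace with a path you can read.
SELECT
  *,
  _metadata.file_path AS source_file  -- full path of the file each row came from
FROM read_files(
  '/Volumes/demo/default/raw/customers-json/',
  format => 'json'
);
```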

---

### **Question 2**
Which command is used to refresh metadata cache for external tables when new files are added?

**A)** `CACHE TABLE table_name`  
**B)** `REFRESH TABLE table_name`  
**C)** `RELOAD TABLE table_name`  
**D)** `UPDATE TABLE table_name`

**Correct Answer: B**  
*Explanation: REFRESH TABLE invalidates and reloads the metadata cache, making Databricks re-check underlying data files and schema.*

---

### **Question 3**
What is the correct syntax to query CSV files directly using the `read_files` function?

**A)** `SELECT * FROM read_files("path", "csv", header=true)`  
**B)** `SELECT * FROM read_files("path", format="csv", header="true")`  
**C)** `SELECT * FROM read_files("path", format => "csv", header => "true")`  
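**D)** `SELECT * FROM read_files("path", format -> "csv", header -> "true")`

**Correct Answer: C**  
*Explanation: read_files takes the path as its first argument and reader options as named arguments written with the => operator. The -> arrow in option D is lambda syntax for higher-order functions, not named-argument syntax.*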
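
As a rough illustration of the named-argument style, the query below passes several CSV options with `=>`. The path and the semicolon delimiter are assumptions made for the example.

```sql
SELECT *
FROM read_files(
  '/Volumes/demo/default/raw/books-csv/',  -- hypothetical path
  format    => 'csv',
  header    => 'true',
  delimiter => ';'                          -- assumed delimiter
);
```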

---

### **Question 4**
Which statement about `CREATE OR REPLACE TABLE` is true?

**A)** It fails if the table doesn't exist  
**B)** It creates a new table version and allows concurrent reads during execution  
**C)** It requires the same schema as the original table  
**D)** It cannot be used with Delta tables

**Correct Answer: B**  
*Explanation: CREATE OR REPLACE TABLE creates a new version, allows concurrent reads, and preserves old data if the operation fails.*
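
A small sketch of this behavior, assuming a Delta table in the current schema. The table name `orders_demo` and its columns are invented for the example. The second statement replaces the table with a different schema, and the history should list one version per replacement.

```sql
-- First definition: two columns
CREATE OR REPLACE TABLE orders_demo AS
SELECT * FROM VALUES (1, 'open'), (2, 'shipped') AS t(order_id, status);

-- Replace with a different schema; no DROP is needed
CREATE OR REPLACE TABLE orders_demo AS
SELECT * FROM VALUES (1, 'open', 9.99) AS t(order_id, status, total);

-- Inspect the table versions written by the statements above
DESCRIBE HISTORY orders_demo;
```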

---

### **Question 5**
What happens when you use `INSERT OVERWRITE` with a schema mismatch?

**A)** It automatically adjusts the schema  
**B)** It ignores extra columns  
**C)** It fails with a schema mismatch error  
**D)** It creates a new table

**Correct Answer: C**  
*Explanation: INSERT OVERWRITE replaces the table's data but keeps its schema, so a query whose columns do not match fails with a schema mismatch error. CREATE OR REPLACE TABLE, by contrast, can redefine the schema.*
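
The contrast can be sketched as follows, assuming default settings with no schema evolution enabled. The table `status_demo` is hypothetical, and the failing statement is left commented out.

```sql
CREATE OR REPLACE TABLE status_demo (order_id INT, status STRING);

-- Matching schema: all existing rows are replaced
INSERT OVERWRITE status_demo
SELECT * FROM VALUES (1, 'open'), (2, 'shipped') AS t(order_id, status);

-- Expected to fail: the query returns three columns, the table has two
-- INSERT OVERWRITE status_demo
-- SELECT * FROM VALUES (1, 'open', 9.99) AS t(order_id, status, total);
```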

---

### **Question 6**
In a MERGE operation, what does the `WHEN NOT MATCHED` clause do?

**A)** Updates existing records  
**B)** Deletes unmatched records  
**C)** Inserts new records that don't exist in the target table  
**D)** Skips unmatched records

**Correct Answer: C**  
*Explanation: WHEN NOT MATCHED handles records in the source that don't exist in the target table by inserting them.*
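
A minimal upsert sketch, using a made-up table `customers_demo` and inline source rows. Customer 1 already exists and is updated; customer 2 has no match in the target and is inserted by the `WHEN NOT MATCHED` clause.

```sql
CREATE OR REPLACE TABLE customers_demo (customer_id INT, email STRING);
INSERT INTO customers_demo VALUES (1, 'a@example.com');

MERGE INTO customers_demo AS c
USING (
  SELECT * FROM VALUES
    (1, 'a.new@example.com'),
    (2, 'b@example.com')
  AS t(customer_id, email)
) AS u
ON c.customer_id = u.customer_id
WHEN MATCHED THEN
  UPDATE SET c.email = u.email            -- existing customer: update
WHEN NOT MATCHED THEN
  INSERT (customer_id, email)             -- new customer: insert
  VALUES (u.customer_id, u.email);
```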

---

### **Question 7**
What does the `from_json()` function require to parse JSON strings?

**A)** Only the JSON column name  
**B)** A schema definition or schema inference  
**C)** A temporary view  
**D)** The JSON file path

**Correct Answer: B**  
*Explanation: from_json() needs a schema to parse JSON strings into structured data; the schema can be written out explicitly or derived from a sample value with schema_of_json().*
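
For illustration, the first query below parses a literal JSON string with an explicit schema, and the second asks `schema_of_json()` to derive a schema from a sample value. The sample JSON is invented.

```sql
-- Explicit schema given as a DDL-style string
SELECT from_json(
  '{"name": "Ada", "age": 36}',
  'name STRING, age INT'
) AS parsed;

-- Derive a schema string from a sample document
SELECT schema_of_json('{"name": "Ada", "age": 36}') AS inferred_schema;
```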

---

### **Question 8**
What is the purpose of the `explode()` function?

**A)** To combine multiple arrays into one  
**B)** To convert a single row with an array into multiple rows  
**C)** To remove duplicates from arrays  
**D)** To sort array elements

**Correct Answer: B**  
*Explanation: explode() transforms each element in an array into a separate row.*
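
A short sketch with inline data (the column names are made up): two input rows holding arrays of two and one elements become three output rows, one per array element.

```sql
SELECT
  order_id,
  explode(items) AS item   -- one output row per element of items
FROM VALUES
  (1, array('pen', 'ink')),
  (2, array('pad'))
AS t(order_id, items);
```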

---

### **Question 9**
What does `collect_set()` do differently from `collect_list()`?

**A)** It collects arrays instead of individual elements  
**B)** It maintains order while collect_list doesn't  
**C)** It returns unique values only  
**D)** It works only with numeric data

**Correct Answer: C**  
*Explanation: collect_set() collects unique elements while collect_list() can include duplicates.*
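
The difference can be seen side by side in a sketch like the one below, which uses invented inline rows. Customer 1 bought book `B01` twice, so it appears twice in `all_books` but only once in `unique_books`. Element order in either array is not guaranteed.

```sql
SELECT
  customer_id,
  collect_list(book_id) AS all_books,     -- keeps duplicates
  collect_set(book_id)  AS unique_books   -- duplicates removed
FROM VALUES
  (1, 'B01'), (1, 'B01'), (1, 'B02'), (2, 'B03')
AS t(customer_id, book_id)
GROUP BY customer_id;
```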

---

### **Question 10**
In the context of `collect_set(books.book_id)` followed by `flatten()` and `array_distinct()`, what is the purpose of the `flatten()` function?

**A)** To remove duplicates  
**B)** To merge nested arrays into a single-level array  
**C)** To sort the array elements  
**D)** To convert strings to lowercase

**Correct Answer: B**  
*Explanation: flatten() merges nested arrays (array of arrays) into a single flat array.*
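
A sketch of the same pattern on invented data: each input row already holds an array of book IDs, so `collect_set` yields an array of arrays, `flatten` merges it into one level, and `array_distinct` removes IDs repeated across the inner arrays.

```sql
SELECT
  customer_id,
  collect_set(book_ids)                          AS nested,        -- array of arrays
  array_distinct(flatten(collect_set(book_ids))) AS unique_books   -- single-level, deduplicated
FROM VALUES
  (1, array('B01', 'B02')),
  (1, array('B02', 'B03'))
AS t(customer_id, book_ids)
GROUP BY customer_id;
```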

---

### **Question 11**
What does the PIVOT operation do?

**A)** Converts columns to rows  
**B)** Converts rows to columns  
**C)** Sorts data by multiple columns  
**D)** Filters data based on conditions

**Correct Answer: B**  
*Explanation: PIVOT transforms row-based data into column-based format, creating columns from unique values in a specified column.*

---

### **Question 12**
In a PIVOT operation, what does the aggregation function (like `sum()`) do?

**A)** It counts the number of rows  
**B)** It computes values for each pivot column  
**C)** It sorts the pivot columns  
**D)** It filters the pivot data

**Correct Answer: B**  
*Explanation: The aggregation function computes the values that will populate each pivot column.*
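
A small PIVOT sketch covering Questions 11 and 12, using invented sales rows. The distinct `quarter` values listed in the `IN` clause become columns, and `sum(amount)` computes the value in each of them per region.

```sql
SELECT *
FROM (
  SELECT * FROM VALUES
    ('north', 'Q1', 100),
    ('north', 'Q2', 150),
    ('south', 'Q1', 80),
    ('south', 'Q1', 20)
  AS sales(region, quarter, amount)
)
PIVOT (
  sum(amount)                      -- aggregate that fills each pivot column
  FOR quarter IN ('Q1', 'Q2')      -- row values that become columns
);
```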

---

### **Question 13**
What is a STRUCT in Databricks SQL?

**A)** A table creation statement  
**B)** A complex nested data type that groups multiple fields  
**C)** A type of JOIN operation  
**D)** A function for string manipulation

**Correct Answer: B**  
*Explanation: STRUCT is like a row within a row, grouping multiple fields together into a single column.*
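
As a rough example, the query below builds a nested STRUCT with `named_struct` and reads its fields with dot notation. The field names and values are made up.

```sql
SELECT
  customer.name         AS name,
  customer.address.city AS city    -- dot notation reaches into nested structs
FROM (
  SELECT named_struct(
    'name', 'Ada',
    'address', named_struct('city', 'London', 'zip', 'N1')
  ) AS customer
) AS t;
```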

---

### **Question 14**
Which higher-order function would you use to keep only array elements that satisfy a condition?

**A)** `transform()`  
**B)** `filter()`  
**C)** `reduce()`  
**D)** `exists()`

**Correct Answer: B**  
*Explanation: filter() keeps elements that satisfy a specified condition.*
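
A minimal sketch on a literal array; the values are arbitrary.

```sql
-- Keeps only the elements greater than 5, that is 8 and 12
SELECT filter(array(1, 5, 8, 12), x -> x > 5) AS large_values;
```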

---

### **Question 15**
What does the `transform()` higher-order function do?

**A)** Filters array elements  
**B)** Applies a function to each element in an array  
**C)** Checks if any element matches a condition  
**D)** Aggregates array elements

**Correct Answer: B**  
*Explanation: transform() applies a lambda function to each element in an array, returning a new array.*
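
A minimal sketch on a literal array; the values and the multiplier are arbitrary. The input array is left unchanged and a new array of the same length is returned.

```sql
-- Multiplies every element by 10
SELECT transform(array(1, 2, 3), x -> x * 10) AS scaled;
```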

---

### **Question 16**
In the expression `FILTER(books, i -> i.quantity >= 2)`, what does `i` represent?

**A)** The index position  
**B)** The entire books array  
**C)** Each individual element in the books array  
**D)** A counter variable

**Correct Answer: C**  
*Explanation: In lambda expressions for higher-order functions, 'i' represents each individual element being processed.*
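
The expression from the question can be tried on inline data, as in this sketch. The `orders` rows and the struct fields `book_id` and `quantity` are invented to match the expression. Here `i` is bound to one struct at a time, so `i.quantity` reads that element's field.

```sql
SELECT
  order_id,
  FILTER(books, i -> i.quantity >= 2) AS multiple_copies
FROM VALUES
  (1, array(
        named_struct('book_id', 'B01', 'quantity', 1),
        named_struct('book_id', 'B02', 'quantity', 3)
      ))
AS orders(order_id, books);
```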

---

### **Question 17**
What is the main purpose of User-Defined Functions (UDFs) in Databricks?

**A)** To replace all built-in functions  
**B)** To extend functionality when built-in functions are insufficient  
**C)** To improve query performance  
**D)** To create temporary views

**Correct Answer: B**  
*Explanation: UDFs extend Databricks' built-in functionality for custom logic that cannot be achieved with native functions.*

---

### **Question 18**
Which statement about SQL UDFs is correct?

**A)** They can only return string values  
**B)** They are always session-scoped  
**C)** They can be defined using CREATE FUNCTION syntax  
**D)** They require Python programming knowledge

**Correct Answer: C**  
*Explanation: SQL UDFs are defined using CREATE FUNCTION syntax and are written in pure SQL.*
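
A short sketch of a SQL UDF, assuming you have permission to create functions in the current schema. The function name `to_celsius` is hypothetical. Because it is created without the `TEMPORARY` keyword, it is stored in the schema rather than being limited to the session.

```sql
CREATE OR REPLACE FUNCTION to_celsius(fahrenheit DOUBLE)
RETURNS DOUBLE
RETURN (fahrenheit - 32) * 5 / 9;   -- body is a plain SQL expression

-- Call it like any built-in function
SELECT to_celsius(212.0) AS celsius;
```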

---

### **Question 19**
What does the `_rescued_data` column contain when using `read_files()`?

**A)** File metadata information  
**B)** Successfully parsed data  
**C)** Values that don't match the inferred schema  
**D)** Duplicate records

**Correct Answer: C**  
*Explanation: The `_rescued_data` column stores values that could not be parsed into the expected schema, such as type mismatches or unexpected columns, as JSON strings.*

---

### **Question 20**
In a MERGE operation, if you get an error about `_rescued_data` during INSERT, what is the best solution?

**A)** Drop the `_rescued_data` column  
**B)** Use INSERT * clause  
**C)** Explicitly specify column names in the INSERT clause  
**D)** Change the source data format

**Correct Answer: C**  
*Explanation: INSERT * assumes the source and target columns line up, which breaks when only one side carries the `_rescued_data` column. Listing the column names explicitly in the INSERT clause avoids the mismatch.*
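
A sketch of the explicit-column form, assuming a target table `books_demo` and a source view `books_updates` that share the columns `book_id`, `title`, and `price`. All of these names are hypothetical. The rescued-data column (named `_rescued_data` when produced by `read_files`) is never referenced, so it does not matter which side carries it.

```sql
-- Hypothetical names: books_demo (target) and books_updates (source)
MERGE INTO books_demo AS b
USING books_updates AS u
ON b.book_id = u.book_id
WHEN NOT MATCHED THEN
  -- Name the columns instead of using INSERT *
  INSERT (book_id, title, price)
  VALUES (u.book_id, u.title, u.price);
```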

---

## **Answer Key Summary**
Q1: B | Q2: B | Q3: C | Q4: B | Q5: C  
Q6: C | Q7: B | Q8: B | Q9: C | Q10: B  
Q11: B | Q12: B | Q13: B | Q14: B | Q15: B  
Q16: C | Q17: B | Q18: C | Q19: C | Q20: C

---

## **Scoring Guide**
- **18-20 correct**: Excellent understanding
- **15-17 correct**: Good grasp of concepts
- **12-14 correct**: Satisfactory knowledge
- **Below 12**: Review recommended