# CHATGPT

Here are **60 SQL questions** that are commonly asked for **Data Scientist** and **Machine Learning Engineer** roles, especially for someone with **5 years of experience**. These questions cover a mix of **basic**, **intermediate**, and **advanced** topics.

---

### 🔥 **Top 60 SQL Interview Questions and Answers:**

---

#### **1. What is the difference between INNER JOIN and LEFT JOIN?**
- **INNER JOIN:** Returns records that have matching values in both tables.
- **LEFT JOIN:** Returns all records from the left table plus the matching records from the right table. For left-table rows with no match, the right table's columns are **NULL**.

---

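#### **2. How do you find the nth highest salary from an Employee table?**
```sql
-- Replace the OFFSET with n - 1: for the 3rd highest salary (n = 3), use OFFSET 2.
SELECT DISTINCT salary 
FROM Employee 
ORDER BY salary DESC 
LIMIT 1 OFFSET 2;
```
This `LIMIT`/`OFFSET` form works in MySQL, PostgreSQL and SQLite; SQL Server uses `OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY` after the `ORDER BY`.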

---

#### **3. What is a Self Join, and when would you use it?**
- A **Self Join** is a table joined with itself.
- Useful for comparing rows within the same table.
```sql
SELECT e1.name AS Employee, e2.name AS Manager 
FROM Employee e1 
JOIN Employee e2 ON e1.manager_id = e2.id;
```

---

#### **4. Explain the difference between WHERE and HAVING.**
- **WHERE:** Filters **rows** before aggregation.
- **HAVING:** Filters **groups** after aggregation.
```sql
SELECT department, COUNT(*) 
FROM Employee 
WHERE salary > 50000 
GROUP BY department 
HAVING COUNT(*) > 5;
```

---

#### **5. What is the purpose of the RANK(), DENSE_RANK(), and ROW_NUMBER() functions?**
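- **RANK():** Tied rows share a rank and the following rank is skipped (1, 2, 2, 4).
- **DENSE_RANK():** Tied rows share a rank with no gap afterwards (1, 2, 2, 3).
- **ROW_NUMBER():** Assigns a unique sequential number to every row, even for ties (1, 2, 3, 4); ties are broken arbitrarily unless the `ORDER BY` is unique.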

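To see the three functions side by side, the sketch below runs them over a column with a tie. It uses Python's built-in `sqlite3` module (SQLite 3.25+ supports window functions) and made-up `Employee` rows; the aliases avoid `rank`, which is a reserved word in MySQL 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, salary INTEGER)")
# Hypothetical sample data with a tie at 90000.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Ann", 100000), ("Bob", 90000), ("Cid", 90000), ("Dee", 80000)],
)

rows = conn.execute("""
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS salary_rank,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS salary_dense_rank
    FROM Employee
    ORDER BY salary DESC, name
""").fetchall()

for row in rows:
    print(row)
# ('Ann', 100000, 1, 1, 1)
# ('Bob', 90000, 2, 2, 2)
# ('Cid', 90000, 3, 2, 2)
# ('Dee', 80000, 4, 4, 3)
```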
---

#### **6. How do you handle NULL values in SQL?**
- Use **IS NULL** and **IS NOT NULL** for conditions.
- Use **COALESCE()** to replace NULL with a default value.
```sql
SELECT COALESCE(salary, 0) AS salary FROM Employee;
```

---

#### **7. How do you find duplicate records in a table?**
```sql
SELECT name, COUNT(*) 
FROM Employee 
GROUP BY name 
HAVING COUNT(*) > 1;
```

---

#### **8. Difference between UNION and UNION ALL.**
- **UNION:** Removes duplicates.
- **UNION ALL:** Keeps all records, including duplicates.

---

#### **9. Explain CTE (Common Table Expression) and its use cases.**
- Temporary result set within a query.
- Useful for recursive queries and improving code readability.
```sql
WITH SalesCTE AS (
    SELECT employee_id, SUM(amount) AS total_sales 
    FROM Sales 
    GROUP BY employee_id
)
SELECT * FROM SalesCTE WHERE total_sales > 10000;
```

---

#### **10. How to write a correlated subquery?**
- A subquery that depends on the outer query.
```sql
SELECT name, salary 
FROM Employee e 
WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department = e.department);
```

---

#### **11. How do you calculate the running total in SQL?**
```sql
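SELECT id, salary, 
       -- ROWS frame plus a unique tiebreaker (id), so tied salaries still accumulate row by row
       SUM(salary) OVER (ORDER BY salary, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total 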
FROM Employee;
```

---

#### **12. What is the difference between DELETE and TRUNCATE?**
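- **DELETE:** Removes the rows matched by an optional `WHERE` clause, logs each deleted row, fires `DELETE` triggers, and can be rolled back.
- **TRUNCATE:** Removes all rows (no `WHERE` clause) with minimal logging, so it is much faster; in SQL Server and MySQL it also resets the identity/auto-increment counter. In Oracle and MySQL it is DDL and commits implicitly, while in SQL Server and PostgreSQL it can be rolled back inside a transaction.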

---

#### **13. How can you find the second lowest salary from a table?**
```sql
SELECT MIN(salary) 
FROM Employee 
WHERE salary > (SELECT MIN(salary) FROM Employee);
```

---

#### **14. Explain Indexes and their types.**
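- Indexes speed up lookups, joins, and sorting on the indexed columns, at the cost of extra storage and slower `INSERT`/`UPDATE`/`DELETE`.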
- Types: **Clustered**, **Non-Clustered**, **Unique**, **Full-Text**.

---

#### **15. What is a Composite Key?**
- A key made of two or more columns to ensure uniqueness.

---

#### **16. What is the difference between CROSS JOIN and FULL OUTER JOIN?**
- **CROSS JOIN:** Cartesian product, no conditions.
- **FULL OUTER JOIN:** Returns matched + unmatched rows from both tables.

---

#### **17. Explain the concept of window functions in SQL.**
- Perform calculations across a set of table rows related to the current row.
```sql
SELECT name, 
       salary, 
       RANK() OVER (ORDER BY salary DESC) AS salary_rank 
FROM Employee;
```

---

#### **18. What is the purpose of the CASE statement?**
- Conditional logic similar to IF-ELSE.
```sql
SELECT name,
       CASE 
           WHEN salary > 50000 THEN 'High'
           ELSE 'Low'
       END AS salary_status 
FROM Employee;
```

---

#### **19. How to find employees with no orders using a LEFT JOIN?**
```sql
SELECT e.name 
FROM Employee e 
LEFT JOIN Orders o ON e.id = o.sales_id 
WHERE o.order_id IS NULL;
```

---

#### **20. What is the use of EXISTS clause?**
- Checks if a subquery returns any rows.
```sql
SELECT name 
FROM Employee e 
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.sales_id = e.id);
```

---

#### **21. Difference between EXCEPT and NOT IN?**
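- **EXCEPT:** Compares whole rows from two queries and returns the distinct rows of the first query that do not appear in the second (Oracle traditionally calls it `MINUS`).
- **NOT IN:** Tests whether a single value is absent from a list or subquery. If the subquery returns any **NULL**, `NOT IN` matches no rows, so prefer `NOT EXISTS` for nullable columns.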

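The NULL trap in `NOT IN` is easiest to see by running it. This sketch uses Python's `sqlite3` module with two hypothetical single-column tables, `A` and `B`, where `B` contains a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (3);
    INSERT INTO B VALUES (2), (NULL);
""")

# EXCEPT compares whole rows and removes duplicates from the result.
except_rows = conn.execute(
    "SELECT id FROM A EXCEPT SELECT id FROM B ORDER BY id"
).fetchall()

# NOT IN: 'id NOT IN (2, NULL)' is never TRUE (only FALSE or UNKNOWN),
# so a single NULL in the subquery empties the result.
not_in_rows = conn.execute(
    "SELECT id FROM A WHERE id NOT IN (SELECT id FROM B)"
).fetchall()

# NOT EXISTS is the NULL-safe alternative (and keeps duplicates from A).
not_exists_rows = conn.execute("""
    SELECT id FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.id = A.id)
    ORDER BY id
""").fetchall()

print(except_rows)      # [(1,), (3,)]
print(not_in_rows)      # []
print(not_exists_rows)  # [(1,), (3,), (3,)]
```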
---

#### **22. How can you find consecutive records in SQL (like attendance)?**
```sql
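SELECT id, date, status,
       LAG(status)  OVER (ORDER BY date) AS prev_status,
       LEAD(status) OVER (ORDER BY date) AS next_status 
FROM Attendance;
```
- A row whose `status` equals `prev_status` or `next_status` is part of a run of consecutive identical records; add `PARTITION BY` (for example per employee) if the table tracks several people.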

---

#### **23. How do you retrieve the current date and time in SQL?**
```sql
SELECT CURRENT_TIMESTAMP;
```

---

#### **24. How can you create a pivot table in SQL?**
```sql
SELECT 
    department,
    SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS male_count,
    SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS female_count
FROM Employee
GROUP BY department;
```

---

#### **25. What are stored procedures and their advantages?**
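- Named SQL routines stored and executed on the database server. Advantages: code reuse, cached execution plans, less network traffic (one call instead of many statements), and security (users can be granted `EXECUTE` without direct table access).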

---

#### **26. Explain normalization and its types.**
- Organizing tables to reduce redundancy.
- Types: 1NF, 2NF, 3NF, BCNF.

---

#### **27. What is a materialized view?**
- A view that stores the result set physically, improving query speed.

---

#### **28. How do you update multiple columns in SQL?**
```sql
UPDATE Employee 
SET salary = salary * 1.1, 
    commission = commission * 1.2 
WHERE department = 'Sales';
```

---

#### **29. How can you delete duplicate records in SQL?**
```sql
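-- Keeps the lowest id in each (name, salary) group.
-- MySQL rejects a subquery on the table being deleted from (error 1093);
-- there, wrap the subquery in a derived table first.
DELETE FROM Employee 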
WHERE id NOT IN (
    SELECT MIN(id) 
    FROM Employee 
    GROUP BY name, salary
);
```

---

#### **30. How do you optimize slow SQL queries?**
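- Use **indexes** on filter and join columns, **avoid correlated subqueries** (rewrite them as joins or window functions), **avoid `SELECT *`**, **consider `EXISTS` instead of `IN`** for large subqueries, and **analyze execution plans**.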

---

#### **31. What is the difference between ROLLUP and CUBE in SQL?**
- **ROLLUP:** Aggregates data with a hierarchical approach, adding subtotals.
- **CUBE:** Aggregates data across all combinations of dimensions.

```sql
SELECT department, role, SUM(salary) 
FROM Employee 
GROUP BY ROLLUP (department, role);
```

---

#### **32. What are Common Table Expressions (CTEs) and Recursive CTEs?**
- CTEs: Temporary result sets for readability.
- Recursive CTEs: Useful for hierarchical data like organizational structures.

```sql
-- PostgreSQL, MySQL 8+ and SQLite require WITH RECURSIVE here; SQL Server uses plain WITH.
WITH RecursiveCTE AS (
    SELECT id, name, manager_id 
    FROM Employee 
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id 
    FROM Employee e 
    JOIN RecursiveCTE r ON e.manager_id = r.id
)
SELECT * FROM RecursiveCTE;
```

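A runnable version of the recursive CTE, extended with a hypothetical `depth` column to show each employee's level below the root. It runs on SQLite through Python's `sqlite3` module, which is why it uses `WITH RECURSIVE`; the org-chart rows are made up. On graph data that can contain cycles, the recursive member would also need a guard (for example a maximum depth).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "VP", 1), (3, "Manager", 2), (4, "Engineer", 3), (5, "Analyst", 2)],
)

rows = conn.execute("""
    WITH RECURSIVE OrgChart AS (
        -- Anchor member: the root(s) of the hierarchy.
        SELECT id, name, manager_id, 0 AS depth
        FROM Employee
        WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: direct reports of the rows found so far.
        SELECT e.id, e.name, e.manager_id, o.depth + 1
        FROM Employee e
        JOIN OrgChart o ON e.manager_id = o.id
    )
    SELECT name, depth FROM OrgChart ORDER BY depth, id
""").fetchall()

print(rows)
# [('CEO', 0), ('VP', 1), ('Manager', 2), ('Analyst', 2), ('Engineer', 3)]
```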
---

#### **33. What is a clustered vs. non-clustered index?**
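- **Clustered Index:** Determines the order in which the table's rows are stored (the leaf level of the index *is* the table data). Only one per table.
- **Non-clustered Index:** A separate structure sorted by the index key whose entries point to the actual rows. A table can have many.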

---

#### **34. How do you use the MERGE statement?**
- Used for **UPSERT** operations (Insert, Update, Delete).
```sql
MERGE Target AS t
USING Source AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name);
```

---

#### **35. What is the purpose of PARTITION BY in window functions?**
- Divides data into partitions to apply window functions.

```sql
SELECT name, 
       salary, 
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank 
FROM Employee;
```

---

#### **36. How to handle transactions in SQL?**
- Use **BEGIN TRANSACTION**, **COMMIT**, and **ROLLBACK** for atomicity.

```sql
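BEGIN TRANSACTION;   -- MySQL: START TRANSACTION;
UPDATE Employee SET salary = salary * 1.1 WHERE department = 'IT';
-- If every statement succeeded, make the changes permanent:
COMMIT;
-- On error, undo everything since BEGIN instead:
-- ROLLBACK;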
```

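Atomicity matters most when a later statement fails. The sketch below uses a hypothetical `Account` table with a `CHECK (balance >= 0)` constraint in SQLite: a transfer that would overdraw the account fails on its second `UPDATE`, and `ROLLBACK` also undoes the first one. `isolation_level=None` makes Python's `sqlite3` module pass `BEGIN`/`COMMIT`/`ROLLBACK` through as written instead of managing transactions itself.

```python
import sqlite3

# isolation_level=None disables the sqlite3 module's implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])

def transfer(amount):
    """Move `amount` from account 1 to account 2 as one atomic unit."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = 2", (amount,))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = 1", (amount,))
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: undo the credit that already ran.
        conn.execute("ROLLBACK")

transfer(30)   # succeeds: balances become 70 and 80
transfer(500)  # violates the CHECK on account 1, so both updates are undone
balances = conn.execute("SELECT id, balance FROM Account ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]
```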
---

#### **37. What is the purpose of SET vs. SELECT for variable assignment?**
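- **SET** (ANSI standard, SQL Server / T-SQL): Assigns one variable per statement and raises an error if the assigning subquery returns more than one row.
- **SELECT**: Can assign several variables in one statement. If the query returns several rows, the variable silently keeps the value from the last row; if it returns no rows, the variable keeps its previous value.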

---

#### **38. How to find the top 3 highest-selling products per category?**
```sql
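SELECT product_id, category, sales_rank
FROM (
    SELECT product_id, category,
           DENSE_RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS sales_rank
    FROM Sales
) ranked
-- A window alias cannot be filtered in the same query's WHERE, so filter outside.
WHERE sales_rank <= 3;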
```

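A runnable check of the top-N-per-group pattern, with the rank filtered in an outer query because a window alias cannot be referenced in the same query's `WHERE`. The `Sales` rows are hypothetical and the query runs on SQLite via Python's `sqlite3`. Because `DENSE_RANK()` gives tied products the same rank, a category can return more than three rows; `ROW_NUMBER()` would return exactly three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (product_id INTEGER, category TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "Books", 500), (2, "Books", 400), (3, "Books", 400), (4, "Books", 300),
     (5, "Books", 100), (6, "Toys", 900), (7, "Toys", 200)],
)

rows = conn.execute("""
    SELECT product_id, category, sales, sales_rank
    FROM (
        SELECT product_id, category, sales,
               DENSE_RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS sales_rank
        FROM Sales
    ) ranked
    WHERE sales_rank <= 3          -- filter in the outer query
    ORDER BY category, sales_rank, product_id
""").fetchall()

for row in rows:
    print(row)
```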
---

#### **39. How do you delete duplicate records but keep one copy?**
```sql
DELETE FROM Employee
WHERE id NOT IN (
    SELECT MIN(id)
    FROM Employee
    GROUP BY name, department
);
```

---

#### **40. How to handle hierarchical data in SQL?**
- Use **Recursive CTEs** for hierarchical structures.
- **Adjacency List** for parent-child relationships.

---

#### **41. What is a surrogate key vs. a natural key?**
- **Surrogate Key:** Artificial, usually an auto-incrementing number.
- **Natural Key:** Real-world attribute (like SSN).

---

#### **42. How do you find all employees who joined in the last 6 months?**
```sql
SELECT * FROM Employee 
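WHERE hire_date >= DATEADD(MONTH, -6, GETDATE());   -- SQL Server
-- PostgreSQL: WHERE hire_date >= CURRENT_DATE - INTERVAL '6 months'
-- MySQL:      WHERE hire_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)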
```

---

#### **43. Explain the concept of data integrity in SQL.**
- Ensures accuracy and consistency of data via:
  - **Entity Integrity:** Unique primary keys.
  - **Referential Integrity:** Foreign keys.
  - **Domain Integrity:** Valid values for each column, enforced by data types and `CHECK`, `NOT NULL`, and `DEFAULT` constraints.

---

#### **44. What is the use of ROLLUP in SQL?**
- Provides hierarchical aggregations with subtotals.
```sql
SELECT department, role, SUM(salary) 
FROM Employee 
GROUP BY ROLLUP(department, role);
```

---

#### **45. How do you use a CROSS APPLY in SQL Server?**
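- Similar to an INNER JOIN, but the right-hand side is evaluated once per left-hand row and can reference that row's columns, so it works with **table-valued functions** and correlated subqueries (for example, the `TOP 3` orders per customer).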

---

#### **46. How to handle SQL injection vulnerabilities?**
- Use parameterized queries or stored procedures.
```sql
SELECT * FROM Employee 
WHERE name = @name;
```

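The difference between pasting input into the SQL text and binding it as a parameter shows up with a classic payload. This sketch uses Python's `sqlite3`, whose placeholder is `?` (SQL Server uses `@name`, psycopg2 uses `%s`); the table and the input string are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [("Ann", 100000), ("Bob", 90000)])

user_input = "x' OR '1'='1"  # malicious value typed into a search box

# UNSAFE: the input becomes part of the SQL text and changes the query's logic.
unsafe_sql = f"SELECT name FROM Employee WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the ? placeholder sends the value separately, as data rather than SQL.
safe_rows = conn.execute(
    "SELECT name FROM Employee WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # 2  (every row leaks)
print(len(safe_rows))    # 0  (no employee has that literal name)
```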
---

#### **47. What are materialized views?**
- Physical views that store query results. Improve read performance.

---

#### **48. How do you calculate the median in SQL?**
```sql
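-- Relies on integer division for the row positions (SQL Server, PostgreSQL, SQLite);
-- in MySQL use DIV instead of /, and in SQL Server use AVG(val * 1.0) to avoid an integer average.
SELECT AVG(val) AS median_val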
FROM (
    SELECT val, 
           ROW_NUMBER() OVER (ORDER BY val) AS rn,
           COUNT(*) OVER () AS total_count
    FROM Scores
) sub
WHERE rn IN ((total_count + 1) / 2, (total_count + 2) / 2);
```

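To check that the position formula handles both odd and even row counts, the sketch wraps the same query in a small Python helper (`median` is a hypothetical name) backed by an in-memory SQLite database, where `/` between integers truncates as the formula expects.

```python
import sqlite3

def median(values):
    """Run the ROW_NUMBER/COUNT median query over `values` in a throwaway database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Scores (val INTEGER)")
    conn.executemany("INSERT INTO Scores VALUES (?)", [(v,) for v in values])
    (result,) = conn.execute("""
        SELECT AVG(val) AS median_val
        FROM (
            SELECT val,
                   ROW_NUMBER() OVER (ORDER BY val) AS rn,
                   COUNT(*) OVER () AS total_count
            FROM Scores
        ) sub
        WHERE rn IN ((total_count + 1) / 2, (total_count + 2) / 2)
    """).fetchone()
    conn.close()
    return result

print(median([7, 1, 3, 9, 5]))  # 5.0  (odd count: both positions are row 3)
print(median([4, 1, 3, 2]))     # 2.5  (even count: rows 2 and 3 are averaged)
```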
---

#### **49. What is the purpose of the FOR XML PATH in SQL Server?**
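- Converts query results to XML format. Before `STRING_AGG` arrived in SQL Server 2017, `FOR XML PATH('')` was also the standard trick for concatenating row values into one string.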

---

#### **50. What are the advantages of using temporary tables vs. table variables?**
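- **Temporary Tables (`#temp`):** Stored in tempdb; support indexes and column statistics, so the optimizer can estimate row counts well. Better for large intermediate datasets.
- **Table Variables (`@table`):** Also stored in tempdb (not purely in memory), scoped to the batch or procedure, have no column statistics, and are not undone by `ROLLBACK`. Best for small row counts.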

---

#### **51. Explain the difference between DATEPART() and DATENAME().**
- **DATEPART():** Returns integer values.
- **DATENAME():** Returns string values.

---

#### **52. How do you find the most frequently ordered product?**
```sql
SELECT product_id, COUNT(*) AS order_count 
FROM Orders 
GROUP BY product_id 
ORDER BY order_count DESC 
LIMIT 1;
```

---

#### **53. What is the use of ROWVERSION in SQL Server?**
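- An automatically generated binary number, unique within the database, that changes every time the row is inserted or updated. It supports optimistic concurrency: read the value, then update only `WHERE rowversion_col = @original_value`; zero affected rows means someone else changed the row in the meantime.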

---

#### **54. How can you implement pagination in SQL?**
```sql
SELECT * 
FROM Employee 
ORDER BY id 
OFFSET 10 ROWS 
FETCH NEXT 10 ROWS ONLY;
```

---

#### **55. What is the difference between NOLOCK and READ UNCOMMITTED?**
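- They have the same effect: both allow dirty reads of uncommitted data. `WITH (NOLOCK)` is a table hint that applies to one table in one query, while `SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED` applies to every read in the session.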

---

#### **56. How do you handle NULLs in aggregate functions?**
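- Aggregate functions such as `SUM`, `AVG`, `MIN`, `MAX`, and `COUNT(column)` ignore NULLs; only `COUNT(*)` counts every row. Use `ISNULL()` (SQL Server) or `COALESCE()` to treat NULLs as a value, keeping in mind that this changes results such as `AVG`.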

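The effect on averages is easy to miss. In this SQLite sketch (Python `sqlite3`, hypothetical `commission` data) half of the values are NULL, so `AVG` over the raw column and `AVG` after `COALESCE(..., 0)` disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, commission INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Ann", 100), ("Bob", None), ("Cid", 200), ("Dee", None)],
)

row = conn.execute("""
    SELECT COUNT(*),                      -- counts rows: 4
           COUNT(commission),             -- skips NULLs: 2
           SUM(commission),               -- 300
           AVG(commission),               -- 300 / 2 = 150.0
           AVG(COALESCE(commission, 0))   -- 300 / 4 = 75.0
    FROM Employee
""").fetchone()
print(row)  # (4, 2, 300, 150.0, 75.0)
```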
---

#### **57. How can you optimize SQL queries?**
- Proper indexing, avoiding subqueries, using EXISTS instead of IN, minimizing joins.

---

#### **58. Explain the purpose of ANALYTIC functions in Oracle SQL.**
- Advanced calculations over a result set (like window functions).

---

#### **59. What is SQL Server Profiler used for?**
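- Capturing and analyzing traces of SQL Server activity (executed queries, durations, deadlocks, logins). Microsoft has deprecated it in favor of **Extended Events**.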

---

#### **60. How do you find the last login time of each user?**
```sql
SELECT user_id, MAX(login_time) AS last_login 
FROM UserLogins 
GROUP BY user_id;
```

---

### 🔥 **Top 30 Advanced SQL Questions (Hard Level):**

---

#### **1. Write a query to find the second highest salary without using LIMIT or TOP.**
```sql
SELECT MAX(salary) 
FROM Employee 
WHERE salary < (SELECT MAX(salary) FROM Employee);
```

---

#### **2. Explain how you can find the Nth highest salary.**
```sql
SELECT DISTINCT salary 
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank 
    FROM Employee
) AS temp 
WHERE salary_rank = 3;   -- replace 3 with the desired N
```

---

#### **3. How do you detect and delete duplicate records while retaining one copy?**
```sql
-- SQL Server: deleting from the CTE deletes the underlying Employee rows.
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
    FROM Employee
)
DELETE FROM CTE WHERE rn > 1;
```

---

#### **4. How to pivot data in SQL without using the PIVOT function?**
```sql
SELECT 
    department, 
    SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS Male,
    SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS Female
FROM Employee
GROUP BY department;
```

---

#### **5. What are Window Functions? Write a query using ROW_NUMBER, RANK, DENSE_RANK.**
```sql
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK() OVER (ORDER BY salary DESC) AS salary_rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
FROM Employee;
```

---

#### **6. Write a query to find consecutive records with the same value.**
```sql
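SELECT emp_id, salary, prev_salary
FROM (
    SELECT emp_id, salary,
           LAG(salary) OVER (ORDER BY emp_id) AS prev_salary
    FROM Employee
) AS t
-- Window functions are not allowed in WHERE, so compare in an outer query.
WHERE salary = prev_salary;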
```

---

#### **7. How do you find gaps in a sequence of numbers in a table?**
```sql
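SELECT t1.id + 1 AS gap_start,
       (SELECT MIN(t3.id) FROM Employee t3 WHERE t3.id > t1.id) - 1 AS gap_end
FROM Employee t1
LEFT JOIN Employee t2 ON t1.id + 1 = t2.id
WHERE t2.id IS NULL                              -- id + 1 is missing
  AND t1.id < (SELECT MAX(id) FROM Employee);    -- the last id is not a gap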
```

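A runnable check of the gap query on hypothetical ids 1, 2, 3, 5, 6, 9. Because `t2` is NULL in the rows kept by the `WHERE`, the gap end has to come from the next existing id (a correlated `MIN` subquery), and the row after the maximum id is excluded so it is not reported as a gap. The sketch runs on SQLite via Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Employee VALUES (?)", [(i,) for i in (1, 2, 3, 5, 6, 9)])

gaps = conn.execute("""
    SELECT t1.id + 1 AS gap_start,
           (SELECT MIN(t3.id) FROM Employee t3 WHERE t3.id > t1.id) - 1 AS gap_end
    FROM Employee t1
    LEFT JOIN Employee t2 ON t1.id + 1 = t2.id
    WHERE t2.id IS NULL                              -- id + 1 is missing
      AND t1.id < (SELECT MAX(id) FROM Employee)     -- ignore the last id
    ORDER BY gap_start
""").fetchall()
print(gaps)  # [(4, 4), (7, 8)]
```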
---

#### **8. Explain the difference between CROSS APPLY and OUTER APPLY.**
- **CROSS APPLY:** Acts like an INNER JOIN.
- **OUTER APPLY:** Acts like a LEFT JOIN, returning all records from the left table.

---

#### **9. Write a query to find employees with higher than average salary in their department.**
```sql
SELECT name, department, salary 
FROM Employee e
WHERE salary > (
    SELECT AVG(salary) 
    FROM Employee 
    WHERE department = e.department
);
```

---

#### **10. How do you handle hierarchical data using recursive CTEs?**
```sql
-- Use WITH RECURSIVE in PostgreSQL, MySQL 8+ and SQLite.
WITH CTE AS (
    SELECT id, name, manager_id 
    FROM Employee 
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id 
    FROM Employee e 
    JOIN CTE c ON e.manager_id = c.id
)
SELECT * FROM CTE;
```

---

#### **11. Write a query to find employees who did not make any sales.**
```sql
SELECT e.name 
FROM Employee e
LEFT JOIN Sales s ON e.id = s.sales_id
WHERE s.sales_id IS NULL;
```

---

#### **12. Explain and write a query using FULL OUTER JOIN.**
```sql
SELECT * 
FROM TableA 
FULL OUTER JOIN TableB 
ON TableA.id = TableB.id;
```
- Retrieves all records from both tables, with unmatched records shown as NULL.

---

#### **13. Write a query to find the median salary.**
```sql
-- Relies on integer division for the row positions; in MySQL use DIV instead of /.
SELECT AVG(salary) AS median_salary
FROM (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary) AS rn,
           COUNT(*) OVER () AS total_count
    FROM Employee
) AS temp
WHERE rn IN ((total_count + 1) / 2, (total_count + 2) / 2);
```

---

#### **14. What is the difference between EXISTS and IN? Provide an example.**
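- **EXISTS:** Returns true as soon as the (usually correlated) subquery finds one row; NULLs in the subquery do not affect it.
- **IN:** Compares a value against a list or a subquery's result set; `NOT IN` returns no rows if that set contains a NULL.
- Example: `SELECT name FROM Employee e WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.sales_id = e.id);` returns the same employees as `SELECT name FROM Employee WHERE id IN (SELECT sales_id FROM Orders);`.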

---

#### **15. Explain and use the CASE statement in a query.**
```sql
SELECT name, 
       CASE 
           WHEN salary >= 100000 THEN 'High'
           WHEN salary >= 50000 THEN 'Medium'
           ELSE 'Low'
       END AS salary_grade
FROM Employee;
```

---

#### **16. How to find the percentage of each department’s total salary?**
```sql
SELECT department, 
       salary, 
       salary * 100.0 / SUM(salary) OVER (PARTITION BY department) AS percent_salary
FROM Employee;
```
- Returns each employee's salary as a percentage of their department's total salary.

---

#### **17. Write a query to find duplicate rows in a table.**
```sql
SELECT name, COUNT(*) AS count 
FROM Employee 
GROUP BY name 
HAVING COUNT(*) > 1;
```

---

#### **18. How to optimize a query with nested subqueries?**
- Use **JOINs** and **CTEs** for better readability and performance.

---

#### **19. Write a query using the NTILE function.**
```sql
SELECT name, salary, 
       NTILE(4) OVER (ORDER BY salary) AS quartile
FROM Employee;
```

---

#### **20. Write a query to get the cumulative sum of salaries.**
```sql
SELECT name, salary,
       SUM(salary) OVER (ORDER BY salary ROWS UNBOUNDED PRECEDING) AS cumulative_salary
FROM Employee;
```

---

#### **21. How do you find the longest consecutive sequence of values?**
```sql
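WITH CTE AS (
    -- value - ROW_NUMBER() is constant within a run of consecutive (distinct) values
    SELECT value, value - ROW_NUMBER() OVER (ORDER BY value) AS grp
    FROM Sequence
)
SELECT MIN(value) AS seq_start, MAX(value) AS seq_end, COUNT(*) AS seq_length
FROM CTE
GROUP BY grp
ORDER BY seq_length DESC;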
```

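The `value - ROW_NUMBER()` trick (often called gaps and islands) can be checked on a small hypothetical sequence. In this SQLite sketch (Python `sqlite3`) each island's start, end and length are aggregated per `grp`; the trick assumes distinct values, so deduplicate with `SELECT DISTINCT` first if needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequence (value INTEGER)")
conn.executemany(
    "INSERT INTO Sequence VALUES (?)",
    [(v,) for v in (1, 2, 3, 7, 8, 10, 11, 12, 13)],
)

runs = conn.execute("""
    WITH Numbered AS (
        -- Within a run of consecutive values, value - row_number is constant.
        SELECT value, value - ROW_NUMBER() OVER (ORDER BY value) AS grp
        FROM Sequence
    )
    SELECT MIN(value) AS seq_start, MAX(value) AS seq_end, COUNT(*) AS seq_length
    FROM Numbered
    GROUP BY grp
    ORDER BY seq_length DESC, seq_start
""").fetchall()
print(runs[0])  # (10, 13, 4): the longest run
```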
---

#### **22. What is a surrogate key, and why is it used?**
- A system-generated unique identifier, used when there is no natural primary key.

---

#### **23. How do you handle dynamic SQL?**
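- Use `EXEC()` or, preferably, `sp_executesql` (SQL Server), which accepts parameters, so user input is never concatenated into the SQL string and execution plans can be reused.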

---

#### **24. How do you track changes in table data (CDC)?**
- Using **Change Data Capture (CDC)** or **Triggers**.

---

#### **25. What is a hash join, and when is it used?**
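- The engine builds an in-memory hash table on the join key of the smaller input, then scans the larger input and probes the table for matches. It suits equality joins on large, unsorted inputs that lack useful indexes.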

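The build and probe phases can be sketched in a few lines of Python. This illustrates the algorithm on lists of dicts (an inner equi-join); it is not how any particular database implements it, and real engines also spill to disk when the build side does not fit in memory.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join of two lists of dicts, hash-join style."""
    # Build phase: hash the smaller input on its join key.
    table = defaultdict(list)
    for row in build_rows:
        if row[build_key] is not None:   # NULL keys never satisfy '='
            table[row[build_key]].append(row)
    # Probe phase: stream the larger input and look up matches (O(1) on average).
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})
    return result

departments = [{"dept_id": 1, "dept": "IT"}, {"dept_id": 2, "dept": "HR"}]
employees = [
    {"name": "Ann", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Cid", "dept_id": 3},  # no matching department, so it is dropped
]
joined = hash_join(departments, employees, "dept_id", "dept_id")
print([(r["name"], r["dept"]) for r in joined])  # [('Ann', 'IT'), ('Bob', 'HR')]
```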
---

#### **26. What is a bitmap index, and why use it?**
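- Stores one bitmap per distinct value, so it is compact and fast for combining filters on low-cardinality columns (e.g. status, gender) in read-mostly data warehouses; frequent concurrent updates make it costly.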

---

#### **27. What is data skew, and how can you handle it?**
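- Imbalanced data distribution (a few key values hold most of the rows), which overloads some partitions or workers. Mitigations include choosing a better partition key, salting hot keys, and broadcasting the small side of a join in distributed engines.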

---

#### **28. Write a query to find gaps in dates.**
```sql
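SELECT hire_date, next_date
FROM (
    SELECT hire_date, 
           LEAD(hire_date) OVER (ORDER BY hire_date) AS next_date 
    FROM Employee
) AS t
-- SQL Server DATEDIFF; next_date can be filtered here because it comes from the subquery.
WHERE DATEDIFF(day, hire_date, next_date) > 1;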
```

---

#### **29. What is query plan caching?**
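- The database stores compiled execution plans and reuses them when the same (ideally parameterized) query text runs again, skipping the cost of re-optimization.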

---

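#### **30. Explain correlated subquery vs. regular subquery with examples.**
- **Regular subquery:** Independent of the outer query and evaluated once, e.g. `WHERE salary > (SELECT AVG(salary) FROM Employee)`.
- **Correlated subquery:** References a column of the outer query, so it is logically evaluated once per outer row, e.g. `WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department = e.department)`.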

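Both kinds of subquery can be compared on the same hypothetical data. The sketch runs on SQLite through Python's `sqlite3`: the regular subquery compares each salary against one company-wide average, the correlated one against each employee's own department average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Ann", "IT", 120), ("Bob", "IT", 80), ("Cid", "HR", 70), ("Dee", "HR", 50)],
)

# Regular subquery: computed once (company average = 80).
above_company_avg = conn.execute("""
    SELECT name FROM Employee
    WHERE salary > (SELECT AVG(salary) FROM Employee)
    ORDER BY name
""").fetchall()

# Correlated subquery: references e.department, so it depends on each outer row.
above_dept_avg = conn.execute("""
    SELECT name FROM Employee e
    WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department = e.department)
    ORDER BY name
""").fetchall()

print(above_company_avg)  # [('Ann',)]
print(above_dept_avg)     # [('Ann',), ('Cid',)]
```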
---


# GEMINI


**1. Question:** Explain the difference between `WHERE` and `HAVING` clauses in SQL. When would you use each?

**Answer:**
   - `WHERE` filters rows *before* aggregation, operating on individual rows.
   - `HAVING` filters rows *after* aggregation, operating on grouped rows.
   - Use `WHERE` for row-level conditions and `HAVING` for conditions based on aggregated values (e.g., `SUM`, `AVG`, `COUNT`).

**2. Question:** Describe the purpose of window functions in SQL and provide an example using `RANK()` or `DENSE_RANK()`.

**Answer:**
   - Window functions perform calculations across a set of rows related to the current row without collapsing them into a single output row.
   - Example (using `DENSE_RANK()`):
     ```sql
     SELECT product_id, sales, DENSE_RANK() OVER (ORDER BY sales DESC) AS sales_rank
     FROM sales_table;
     ```
     This ranks products by sales, handling ties without gaps.

**3. Question:** How do you optimize a slow-running SQL query? List at least three techniques.

**Answer:**
   - **Indexing:** Ensure appropriate indexes are created on frequently queried columns.
   - **Query Rewriting:** Avoid `SELECT *`, use specific column names, and minimize subqueries.
   - **Avoid Correlated Subqueries:** Replace them with `JOIN` operations or window functions.
   - **Partitioning:** Partition large tables to improve query performance on subsets of data.
   - **Execution plan analysis:** Use `EXPLAIN` to understand how the query is being executed and identify bottlenecks.

**4. Question:** Explain the concept of a "common table expression" (CTE) and provide a use case.

**Answer:**
   - A CTE is a temporary named result set that you can reference within a `SELECT`, `INSERT`, `UPDATE`, or `DELETE` statement.
   - Use case: Breaking down complex queries into smaller, more readable parts or recursive queries.
     ```sql
     WITH MonthlySales AS (
         SELECT DATE_TRUNC('month', order_date) AS month, SUM(sales) AS total_sales
         FROM orders
         GROUP BY month
     )
     SELECT month, total_sales
     FROM MonthlySales
     WHERE total_sales > 10000;
     ```

**5. Question:** What is the difference between `UNION` and `UNION ALL`?

**Answer:**
   - `UNION` removes duplicate rows from the combined result set.
   - `UNION ALL` includes all rows, including duplicates.
   - `UNION ALL` is generally faster because it doesn't perform duplicate elimination.

**6. Question:** How would you handle missing values (NULLs) in a SQL query?

**Answer:**
   - Use `COALESCE()` or `IFNULL()` to replace NULLs with default values.
   - Use `CASE` statements to perform conditional logic based on NULL values.
   - Filter out NULLs using `WHERE column IS NOT NULL`.

**7. Question:** Explain the difference between `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL OUTER JOIN`.

**Answer:**
   - `INNER JOIN`: Returns rows where there's a match in both tables.
   - `LEFT JOIN`: Returns all rows from the left table and matched rows from the right table (NULLs for unmatched rows).
   - `RIGHT JOIN`: Returns all rows from the right table and matched rows from the left table (NULLs for unmatched rows).
   - `FULL OUTER JOIN`: Returns all rows from both tables, matching them where possible and filling the missing side with NULLs.

**8. Question:** Describe how to implement a recursive query in SQL.

**Answer:**
   - Use a CTE with a `UNION ALL` clause, where the CTE references itself.
   - This is useful for hierarchical data like organizational structures or bill-of-materials.
     ```sql
     WITH RECURSIVE EmployeeHierarchy AS (
         SELECT employee_id, manager_id, employee_name, 0 AS level
         FROM employees
         WHERE manager_id IS NULL
         UNION ALL
         SELECT e.employee_id, e.manager_id, e.employee_name, eh.level + 1
         FROM employees e
         JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
     )
     SELECT employee_name, level
     FROM EmployeeHierarchy;
     ```

**9. Question:** How do you implement a pivot table in SQL?

**Answer:**
   - Use `CASE` statements and `GROUP BY` to transform rows into columns.
   - Many database systems also have dedicated `PIVOT` operators.

**10. Question:** Explain the concept of transactions and ACID properties.

**Answer:**
   - A transaction is a sequence of SQL operations treated as a single unit of work.
   - ACID properties:
     - Atomicity: All or nothing.
     - Consistency: Maintains database integrity.
     - Isolation: Concurrent transactions don't interfere.
     - Durability: Changes are permanent.

**11. Question:** How do you create and manage database indexes?

**Answer:**
   - `CREATE INDEX index_name ON table_name (column_name);`
   - Use `DROP INDEX` to remove indexes.
   - Monitor index usage and performance to optimize.

**12. Question:** Describe the use of `EXPLAIN` or `EXPLAIN ANALYZE` in SQL.

**Answer:**
   - `EXPLAIN` shows the query execution plan, revealing how the database will execute the query.
   - `EXPLAIN ANALYZE` executes the query and provides actual execution statistics, useful for performance tuning.

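In SQLite the equivalent command is `EXPLAIN QUERY PLAN` (its plain `EXPLAIN` prints bytecode instead). The sketch below, using Python's `sqlite3` and a hypothetical `Employee` table, shows the plan switching from a full scan to an index search once an index exists; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)")
query = "SELECT salary FROM Employee WHERE department = 'IT'"

def plan(sql):
    # The last column of each plan row is the human-readable 'detail' text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # e.g. ['SCAN Employee'] (full table scan)
conn.execute("CREATE INDEX idx_employee_department ON Employee (department)")
after = plan(query)   # e.g. ['SEARCH Employee USING INDEX idx_employee_department (department=?)']

print(before)
print(after)
```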
**13. Question:** How do you handle large datasets in SQL for analytical purposes?

**Answer:**
   - Use partitioning, indexing, and materialized views.
   - Leverage distributed database systems (e.g., Spark SQL, BigQuery).
   - Use sampling techniques for exploratory analysis.

**14. Question:** Explain the difference between `VARCHAR` and `TEXT` data types.

**Answer:**
   - `VARCHAR` has a maximum length limit, while `TEXT` doesn't (or has a very large limit).
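   - `TEXT` is typically used for storing large strings. The practical difference is database-specific: in PostgreSQL they perform the same, while in MySQL a `TEXT` column can only be indexed with a prefix length.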

**15. Question:** How do you enforce data integrity in SQL?

**Answer:**
   - Use constraints: `PRIMARY KEY`, `FOREIGN KEY`, `UNIQUE`, `CHECK`, `NOT NULL`.
   - Use triggers to enforce business rules.

**16. Question:** How would you retrieve the top N rows from each group?

**Answer:**
   - Use window functions like `RANK()` or `ROW_NUMBER()` with `PARTITION BY`.

**17. Question:** How do you find duplicate rows in a table?

**Answer:**
   - Use `GROUP BY` and `HAVING COUNT(*) > 1`.
   - Use window functions such as `ROW_NUMBER()` with a partition on all columns to identify duplicates.

**18. Question:** How do you perform date and time calculations in SQL?

**Answer:**
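   - Use date/time functions, whose names vary by database: `DATE_ADD`/`DATE_SUB` (MySQL), `DATEADD`/`DATEDIFF` (SQL Server), `EXTRACT` (ANSI standard), `DATE_TRUNC` (PostgreSQL, Snowflake, BigQuery).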

**19. Question:** Explain the concept of materialized views.

**Answer:**
   - Materialized views are precomputed result sets stored on disk, improving query performance.
   - They need to be refreshed to reflect changes in the underlying tables.

**20. Question:** How do you handle string manipulation in SQL?

**Answer:**
   - Use functions like `SUBSTRING`, `CONCAT`, `TRIM`, `UPPER`, `LOWER`, `REPLACE`.
   - Use regular expressions for complex pattern matching.

**21. Question:** How do you implement a rolling average in SQL?

**Answer:**
   - Use window functions with `AVG()` and a `ROWS BETWEEN` clause.

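A sketch of a 3-row rolling average on hypothetical daily sales, run on SQLite via Python's `sqlite3`. `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` counts rows, not days, so if dates are missing the window covers a longer calendar span.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DailySales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO DailySales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 30), ("2024-01-04", 40)],
)

rows = conn.execute("""
    SELECT day, amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- current row + 2 before it
           ) AS rolling_avg_3
    FROM DailySales
    ORDER BY day
""").fetchall()
rolling = [r[2] for r in rows]
print(rolling)  # [10.0, 15.0, 20.0, 30.0]
```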
**22. Question:** How do you create a stored procedure or user-defined function?

**Answer:**
   - Use `CREATE PROCEDURE` or `CREATE FUNCTION` statements.

**23. Question:** How would you handle time series data in SQL?

**Answer:**
   - Use date/time data types and functions.
   - Use window functions for rolling calculations.
   - Consider time series databases for large-scale time series data.

**24. Question:** What are the advantages of using stored procedures?

**Answer:**
   - Improved performance, reduced network traffic, enhanced security, code reusability.

**25. Question:** How do you handle JSON data in SQL?

**Answer:**
   - Use JSON functions like `JSON_EXTRACT`, `JSON_ARRAY`, `JSON_OBJECT`.

**26. Question:** How would you implement a funnel analysis in SQL?

**Answer:**
   - Use `CASE` statements and `GROUP BY` to calculate conversion rates at each stage.
   - Use window functions to track progression through stages.

**27. Question:** How do you optimize queries with large `IN` clauses?

**Answer:**
   - Use

Alright, let's dive into 30 more advanced SQL questions designed to challenge a seasoned data scientist/machine learning engineer:

**1. Question:** Describe the concept of a "correlated subquery" and explain how it differs from a regular subquery. Provide a scenario where a correlated subquery is necessary.

**Answer:**
   - A correlated subquery references a column from the outer query. It is logically evaluated once for each row processed by the outer query (optimizers often rewrite it as a join).
   - Regular subqueries execute once and produce a result set used by the outer query.
   - Scenario: Finding employees whose salary is higher than the average salary of their department.

**2. Question:** Explain the concept of "query hints" and when you might use them. What are the potential drawbacks?

**Answer:**
   - Query hints are directives to the database optimizer, influencing execution plans (e.g., index usage, join order).
   - Use when the optimizer's choice is suboptimal.
   - Drawbacks: Can make queries less portable, dependent on specific database versions, and potentially hinder future optimizations.

**3. Question:** How would you design a SQL query to detect anomalies in time series data, such as sudden spikes or dips?

**Answer:**
   - Use window functions (e.g., `AVG()`, `STDDEV()`) with a sliding window to calculate rolling statistics.
   - Identify data points outside a certain number of standard deviations from the mean as anomalies.

**4. Question:** Explain the concept of "database partitioning" and its benefits for large datasets. Describe different partitioning strategies.

**Answer:**
   - Partitioning divides a large table into smaller, more manageable pieces.
   - Benefits: Improved query performance, easier maintenance, reduced backup/restore times.
   - Strategies: Range partitioning, list partitioning, hash partitioning, composite partitioning.

**5. Question:** How do you implement a "fuzzy string matching" algorithm in SQL to find similar strings, even with typos or variations?

**Answer:**
   - Use functions like `SOUNDEX()` (SQL Server, MySQL, Oracle), `DIFFERENCE()` (SQL Server), or `levenshtein()` (PostgreSQL's `fuzzystrmatch` extension).
   - Implement custom logic using `LIKE` with wildcards and regular expressions.

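Where no built-in edit-distance function exists, one option is to register your own. This sketch uses `Connection.create_function` from Python's `sqlite3` to expose a hand-written Levenshtein distance as `LEVENSHTEIN(x, y)` in SQL; the function, the `Customer` table and the names are hypothetical.

```python
import sqlite3

def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    if a is None or b is None:
        return None                      # mirror SQL NULL semantics
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it with two arguments.
conn.create_function("LEVENSHTEIN", 2, levenshtein)
conn.execute("CREATE TABLE Customer (name TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?)",
    [("Jonathan",), ("Jonathon",), ("Johnathan",), ("Nathan",)],
)

matches = conn.execute(
    "SELECT name FROM Customer WHERE LEVENSHTEIN(name, ?) <= 1 ORDER BY name",
    ("Jonathan",),
).fetchall()
print(matches)  # [('Johnathan',), ('Jonathan',), ('Jonathon',)]
```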
**6. Question:** How would you design a SQL query to calculate a cohort analysis, tracking user behavior over time based on their signup date?

**Answer:**
   - Extract the signup month/year.
   - Group users by signup cohort and calculate metrics (e.g., retention rate) over subsequent periods.

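A minimal cohort sketch on SQLite (Python `sqlite3`), assuming hypothetical `Users(user_id, signup_date)` and `Activity(user_id, activity_date)` tables with ISO date strings. It counts distinct active users per signup month and month offset; dividing each count by the cohort size (users per signup month) gives the retention rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER, signup_date TEXT);
    CREATE TABLE Activity (user_id INTEGER, activity_date TEXT);
    INSERT INTO Users VALUES (1, '2024-01-05'), (2, '2024-01-20'), (3, '2024-02-03');
    INSERT INTO Activity VALUES
        (1, '2024-01-06'), (1, '2024-02-10'), (2, '2024-01-21'),
        (3, '2024-02-04'), (3, '2024-03-01');
""")

cohorts = conn.execute("""
    WITH Events AS (
        SELECT u.user_id,
               strftime('%Y-%m', u.signup_date) AS cohort,
               -- whole calendar months between the signup month and the activity month
               (CAST(strftime('%Y', a.activity_date) AS INTEGER) * 12
                  + CAST(strftime('%m', a.activity_date) AS INTEGER))
             - (CAST(strftime('%Y', u.signup_date) AS INTEGER) * 12
                  + CAST(strftime('%m', u.signup_date) AS INTEGER)) AS month_offset
        FROM Users u
        JOIN Activity a ON a.user_id = u.user_id
    )
    SELECT cohort, month_offset, COUNT(DISTINCT user_id) AS active_users
    FROM Events
    GROUP BY cohort, month_offset
    ORDER BY cohort, month_offset
""").fetchall()
for row in cohorts:
    print(row)
```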
**7. Question:** Explain the concept of "database normalization" and its different normal forms (1NF, 2NF, 3NF, BCNF). What are the trade-offs?

**Answer:**
   - Normalization reduces data redundancy and improves data integrity.
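   - Normal forms: 1NF (atomic values, no repeating groups), 2NF (no non-key column depends on only part of a composite key), 3NF (no transitive dependencies), BCNF (the determinant of every non-trivial functional dependency is a candidate key).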
   - Trade-offs: Increased complexity, potential performance overhead due to joins.

**8. Question:** How would you design a SQL query to perform a "sessionization" analysis, grouping user actions into sessions based on inactivity gaps?

**Answer:**
   - Use window functions (e.g., `LAG()`) to track timestamps and identify inactivity gaps.
   - Assign session IDs based on these gaps.

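A sessionization sketch on SQLite (Python `sqlite3`): `LAG` flags an event as a session start when it is the user's first event or follows a gap longer than an assumed 30-minute threshold, and a running `SUM` of those flags numbers the sessions. The `Events` table with integer-second timestamps is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (user_id INTEGER, ts INTEGER)")  # ts in seconds
conn.executemany(
    "INSERT INTO Events VALUES (?, ?)",
    [(1, 0), (1, 600), (1, 4000), (1, 4300), (2, 100), (2, 5000)],
)

SESSION_GAP = 1800  # assumed threshold: 30 minutes of inactivity starts a new session

rows = conn.execute("""
    WITH Flagged AS (
        SELECT user_id, ts,
               CASE WHEN LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) IS NULL
                      OR ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > ?
                    THEN 1 ELSE 0 END AS new_session
        FROM Events
    )
    SELECT user_id, ts,
           -- running count of session starts = session number within each user
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts
                                  ROWS UNBOUNDED PRECEDING) AS session_id
    FROM Flagged
    ORDER BY user_id, ts
""", (SESSION_GAP,)).fetchall()
for row in rows:
    print(row)
```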
**9. Question:** Explain the concept of "database sharding" and when you might use it. How does it differ from partitioning?

**Answer:**
   - Sharding distributes data across multiple database servers.
   - Used for horizontal scalability and handling very large datasets.
   - Partitioning splits a table into pieces managed by a single database server, while sharding spreads rows across multiple independent servers, each holding a subset of the data.

**10. Question:** How do you implement a "graph traversal" algorithm in SQL to find relationships between entities in a graph-like data structure?

**Answer:**
   - Use recursive CTEs to traverse edges and find paths.
   - Consider specialized graph databases for complex graph operations.

**11. Question:** Describe the concept of "data warehousing" and its key components (ETL, OLAP, star schema, snowflake schema).

**Answer:**
   - Data warehousing consolidates data from multiple sources for analytical purposes.
   - ETL (Extract, Transform, Load) processes data.
   - OLAP (Online Analytical Processing) performs multidimensional analysis.
   - Star/snowflake schemas organize data into fact and dimension tables.

**12. Question:** How would you design a SQL query to perform a "market basket analysis" to find frequently co-occurring items in transaction data?

**Answer:**
   - Use self-joins and `GROUP BY` to count item combinations.
   - Calculate association metrics (e.g., support, confidence, lift).

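A sketch of pair counting on SQLite (Python `sqlite3`) with a hypothetical `OrderItems` table in which each item appears at most once per order. The `a.item < b.item` condition keeps each unordered pair once; confidence for A -> B would be `pair_orders` divided by the number of orders containing A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderItems (order_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO OrderItems VALUES (?, ?)",
    [(1, "bread"), (1, "butter"), (1, "milk"),
     (2, "bread"), (2, "butter"),
     (3, "bread"), (3, "milk"),
     (4, "milk")],
)

pairs = conn.execute("""
    WITH Totals AS (SELECT COUNT(DISTINCT order_id) AS n_orders FROM OrderItems)
    SELECT a.item AS item_a, b.item AS item_b,
           COUNT(*) AS pair_orders,
           1.0 * COUNT(*) / (SELECT n_orders FROM Totals) AS support
    FROM OrderItems a
    JOIN OrderItems b
      ON a.order_id = b.order_id
     AND a.item < b.item          -- each unordered pair once, no self-pairs
    GROUP BY a.item, b.item
    ORDER BY pair_orders DESC, item_a, item_b
""").fetchall()
for row in pairs:
    print(row)
```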
**13. Question:** Explain the concept of "database replication" and its different types (synchronous, asynchronous). What are the use cases?

**Answer:**
   - Replication copies data from one database to another.
   - Synchronous replication ensures data consistency but can impact performance.
   - Asynchronous replication is faster but may have data lag.
   - Use cases: High availability, disaster recovery, read scaling.

**14. Question:** How do you implement a "geospatial query" in SQL to find points within a certain distance of a location or perform spatial joins?

**Answer:**
   - Use geospatial data types and functions (e.g., `ST_Distance()`, `ST_Contains()`).
   - Leverage spatial indexes for performance.

**15. Question:** Explain the concept of "database triggers" and their use cases. What are the potential drawbacks?

**Answer:**
   - Triggers are blocks of procedural code that the database runs automatically in response to data events (e.g., `INSERT`, `UPDATE`, `DELETE`) on a table.
   - Use cases: Enforcing business rules, auditing, data validation.
   - Drawbacks: Hidden side effects make database behavior complex and difficult to debug, and per-row triggers add overhead to every write.
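
As an illustration of the auditing use case, here is a sketch in SQLite syntax. PostgreSQL instead attaches a separate trigger function, and MySQL and SQL Server syntax differs again. The `accounts` and `accounts_audit` tables are hypothetical. The `WHEN` clause skips updates that leave the balance unchanged.

```sql
-- SQLite syntax; hypothetical tables
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);
CREATE TABLE accounts_audit (
    account_id  INTEGER,
    old_balance NUMERIC,
    new_balance NUMERIC,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Record every real balance change automatically
CREATE TRIGGER trg_accounts_audit
AFTER UPDATE OF balance ON accounts
FOR EACH ROW
WHEN OLD.balance <> NEW.balance
BEGIN
    INSERT INTO accounts_audit (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO accounts VALUES (1, 100), (2, 50);
UPDATE accounts SET balance = balance - 30 WHERE id = 1;

SELECT account_id, old_balance, new_balance FROM accounts_audit;
```

After the update, the audit table holds one row (account 1, 100 to 70) even though no application code wrote to it. That convenience is also the debugging drawback listed above.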

**16. Question:** How would you design a SQL query to perform a "survival analysis" to estimate the time until an event occurs (e.g., churn, failure)?

**Answer:**
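   - Use SQL to build the analysis table (one row per subject with a duration and an event-or-censored flag) and to compute a Kaplan-Meier survival curve.
   - Cox proportional hazards models (hazard ratios) are usually fitted in a statistics library on the extracted data rather than in SQL.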
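
A sketch of the Kaplan-Meier step, assuming a hypothetical `customer_tenure` table with one row per customer. `tenure_days` is the time to churn or to the end of observation, and `churned` is 1 for an observed churn and 0 for a censored (still active) customer. At each distinct time t, S(t) = S(previous t) * (1 - d/n), where d is the number of churn events at t and n is the number of customers still at risk just before t; the recursive CTE carries that running product. Written for SQLite 3.25+ and PostgreSQL.

```sql
-- Hypothetical data: churned = 1 (event observed) or 0 (censored)
CREATE TABLE customer_tenure (customer_id INTEGER, tenure_days INTEGER, churned INTEGER);
INSERT INTO customer_tenure VALUES
  (1, 10, 1), (2, 10, 0), (3, 20, 1), (4, 30, 1), (5, 30, 0);

WITH RECURSIVE
by_time AS (                       -- events and exits at each distinct tenure
    SELECT tenure_days AS t,
           SUM(churned) AS d,      -- churn events at t
           COUNT(*)     AS exits   -- everyone leaving the risk set at t (events + censored)
    FROM customer_tenure
    GROUP BY tenure_days
),
risk AS (                          -- n = customers with tenure >= t
    SELECT t, d,
           SUM(exits) OVER (ORDER BY t DESC) AS n,
           ROW_NUMBER() OVER (ORDER BY t)    AS rn
    FROM by_time
),
km (rn, t, survival) AS (          -- running product of (1 - d/n)
    SELECT rn, t, 1.0 - 1.0 * d / n
    FROM risk
    WHERE rn = 1
    UNION ALL
    SELECT r.rn, r.t, k.survival * (1.0 - 1.0 * r.d / r.n)
    FROM km k
    JOIN risk r ON r.rn = k.rn + 1
)
SELECT t AS tenure_days, ROUND(survival, 4) AS survival
FROM km
ORDER BY t;
```

With the sample rows this yields 0.8 at day 10, 0.5333 at day 20 and 0.2667 at day 30. Censored customers count toward the risk set at their last observed time but never as events.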

**17. Question:** Describe the concept of "data lineage" and how you would track it in a data warehouse environment.

**Answer:**
   - Data lineage tracks the origin and transformation of data.
   - Use metadata repositories, logging, and data governance tools.

**18. Question:** How do you handle "slowly changing dimensions" (SCDs) in a data warehouse? Describe different SCD types.

**Answer:**
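   - SCDs handle changes to dimension attributes over time.
   - Types: SCD1 (overwrite, no history), SCD2 (add a new row per version, with validity dates or a current flag), SCD3 (add a column holding the previous value), SCD4 (track history in a separate table) and SCD6 (a hybrid of types 1, 2 and 3).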
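
A minimal Type 2 sketch, assuming a hypothetical `dim_customer` table with a surrogate key, validity dates and an `is_current` flag. When customer 42 moves, the current row is closed and a new version is inserted in the same transaction, so fact rows loaded earlier keep pointing at the old version. Surrogate keys are supplied explicitly to keep the example portable between SQLite and PostgreSQL.

```sql
-- Hypothetical customer dimension with SCD Type 2 tracking columns
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key, one per version
    customer_id  INTEGER NOT NULL,      -- natural (business) key
    city         TEXT,
    valid_from   DATE,
    valid_to     DATE,                  -- '9999-12-31' marks the open-ended version
    is_current   INTEGER NOT NULL
);
INSERT INTO dim_customer VALUES (1, 42, 'Austin', '2023-01-01', '9999-12-31', 1);

-- Customer 42 moves to Denver on 2024-06-01: expire the old version, add a new one
BEGIN;
UPDATE dim_customer
SET valid_to = '2024-06-01', is_current = 0
WHERE customer_id = 42 AND is_current = 1;

INSERT INTO dim_customer VALUES (2, 42, 'Denver', '2024-06-01', '9999-12-31', 1);
COMMIT;

SELECT customer_key, city, valid_from, valid_to, is_current
FROM dim_customer
WHERE customer_id = 42
ORDER BY valid_from;
```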

**19. Question:** How would you design a SQL query to perform an "A/B testing" analysis, comparing the performance of different variations?

**Answer:**
   - Start from data in which users were randomly assigned to variants (the assignment happens upstream, not in the query).
   - Calculate metrics (e.g., conversion rate, revenue per user) for each variant.
   - Perform statistical tests (e.g., t-test, chi-squared test) to determine significance.
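
The per-variant aggregation step might look like this, assuming a hypothetical `ab_results` table with one row per randomized user (`converted` is 0 or 1). The counts it returns are the inputs to the significance test (e.g., a chi-squared or two-proportion z-test), which is usually run in a statistics library rather than in SQL.

```sql
-- Hypothetical experiment results: one row per user
CREATE TABLE ab_results (user_id INTEGER, variant TEXT, converted INTEGER, revenue NUMERIC);
INSERT INTO ab_results VALUES
  (1, 'A', 1, 20), (2, 'A', 0, 0), (3, 'A', 0, 0), (4, 'A', 1, 30),
  (5, 'B', 1, 25), (6, 'B', 1, 40), (7, 'B', 0, 0), (8, 'B', 1, 15);

SELECT variant,
       COUNT(*)                        AS users,
       SUM(converted)                  AS conversions,
       1.0 * SUM(converted) / COUNT(*) AS conversion_rate,   -- 1.0 avoids integer division
       1.0 * SUM(revenue) / COUNT(*)   AS revenue_per_user
FROM ab_results
GROUP BY variant
ORDER BY variant;
```

With the sample rows, variant A converts 0.5 of users (12.5 revenue per user) and variant B converts 0.75 (20.0 per user).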

**20. Question:** Explain the concept of "database indexing strategies" like bitmap indexes, hash indexes, and clustered vs. non-clustered indexes.

**Answer:**
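   - Bitmap indexes: efficient for low-cardinality columns (few distinct values), especially in read-heavy warehouses.
   - Hash indexes: fast for equality lookups, but they cannot serve range queries or sorting.
   - Clustered index: the table's rows are stored in index order, so a table has at most one. Non-clustered index: a separate structure of keys plus pointers to the rows.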

**21. Question:** How would you implement a "recommendation system" in SQL, based on user-item interactions or collaborative filtering?

**Answer:**
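   - Calculate similarity scores between users or items (e.g., co-occurrence counts or cosine similarity) by self-joining the interaction table.
   - Model-based approaches such as matrix factorization are usually trained outside SQL; SQL supplies the training data and stores the resulting scores.
   - Rank candidate items per user by predicted score, excluding items the user has already interacted with.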
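
A sketch of item-based collaborative filtering that uses plain co-occurrence counts as a simple stand-in for cosine or Jaccard similarity, assuming a hypothetical `interactions(user_id, item_id)` table of implicit feedback. For user 2, every other user who shares an item with them "votes" for the rest of their own items, and items user 2 already has are excluded.

```sql
-- Hypothetical implicit-feedback interactions (user viewed or bought item)
CREATE TABLE interactions (user_id INTEGER, item_id TEXT);
INSERT INTO interactions VALUES
  (1, 'A'), (1, 'B'), (1, 'C'),
  (2, 'A'), (2, 'B'),
  (3, 'B'), (3, 'C'), (3, 'D'),
  (4, 'A'), (4, 'D');

SELECT cand.item_id,
       COUNT(*) AS score                                   -- number of co-occurrence paths
FROM interactions mine                                     -- items user 2 already has
JOIN interactions peer ON peer.item_id = mine.item_id
                      AND peer.user_id <> mine.user_id     -- other users sharing that item
JOIN interactions cand ON cand.user_id = peer.user_id      -- everything those users touched
WHERE mine.user_id = 2
  AND cand.item_id NOT IN (SELECT item_id FROM interactions WHERE user_id = 2)
GROUP BY cand.item_id
ORDER BY score DESC, cand.item_id;
```

For this sample, item C scores 3 and item D scores 2, so C would be recommended first.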

**22. Question:** Describe the concept of "data masking" and its importance in data security and privacy.

**Answer:**
   - Data masking conceals sensitive data by replacing it with realistic but fake data.
   - Used to protect sensitive information during development or analysis.

**23. Question:** How do you efficiently handle and process large volumes of streaming data in SQL?

**Answer:**
   - Use stream processing frameworks with SQL interfaces (e.g., Apache Flink SQL, or ksqlDB, formerly KSQL, on Apache Kafka).
   - Use windowed aggregations (tumbling, hopping or session windows) to aggregate unbounded streams over time.

**24. Question:** Explain the concept of "database deadlock" and how to prevent or resolve it.

**Answer:**
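   - Deadlock occurs when two or more transactions each hold a lock that another needs, so none of them can proceed.
   - Prevent it by acquiring locks in a consistent order, keeping transactions short and setting lock timeouts.
   - Resolve it: most databases detect the cycle and abort one transaction as the victim, so the application should catch the error and retry.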

**25. Question:** How would you design a SQL query to analyze customer churn patterns using survival analysis and predictive modeling?

**Answer:**
   - Calculate churn rates using survival analysis.
   - Build predictive models (e.g., logistic regression, decision trees) to identify churn risk factors.

**26. Question:** What are the advantages and disadvantages of using NoSQL databases compared to relational databases for data science applications?

**Answer:**
   - NoSQL: flexible schemas, scalability, suitable for unstructured data.
   - Relational: strong consistency, complex queries, ACID properties.
   - Choose based on data characteristics and application requirements.



# Perplexity

Here are 30 SQL interview questions and answers tailored for a data scientist or machine learning engineer with 5 years of experience. These questions cover foundational concepts, advanced techniques, and problem-solving skills.

---

### **Foundational SQL Questions**
1. **What is SQL, and why is it important for data science?**  
   SQL is a language used to manage and query relational databases, essential for extracting, cleaning, and analyzing structured data.

2. **Explain the difference between `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL OUTER JOIN`.**  
   - `INNER JOIN`: Returns rows with matching values in both tables.  
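   - `LEFT JOIN`: Returns all rows from the left table plus matching rows from the right table; where there is no match, the right table's columns are NULL.  
   - `RIGHT JOIN`: Returns all rows from the right table plus matching rows from the left table (the mirror image of `LEFT JOIN`).  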
   - `FULL OUTER JOIN`: Returns all rows from both tables, with nulls for unmatched rows.

3. **What are primary keys and foreign keys?**  
   - Primary Key: A unique identifier for each row in a table.  
   - Foreign Key: A field in one table that references the primary key in another table.

4. **What is normalization? Why is it important?**  
   Normalization organizes data to reduce redundancy and improve integrity by dividing tables into smaller ones and defining relationships.

5. **What are indexes in SQL? What happens if a table has no index?**  
   Indexes speed up query performance by allowing faster retrieval of data. Without an index, queries may require full table scans, which are slower.

---

### **Intermediate SQL Questions**
6. **How do you find duplicate records in a table?**  
   ```sql
   SELECT column_name, COUNT(*)
   FROM table_name
   GROUP BY column_name
   HAVING COUNT(*) > 1;
   ```

7. **Explain the difference between `RANK()` and `DENSE_RANK()`.**  
   - `RANK()`: Skips ranks when there are ties.  
   - `DENSE_RANK()`: Does not skip ranks for ties.
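
   A small illustration, assuming a hypothetical `scores` table with a tie at 90 (SQLite 3.25+ or PostgreSQL). `ROW_NUMBER()` is added for contrast and breaks the tie by `player` so its output is deterministic:
   ```sql
   -- Hypothetical sample data with a tie at 90
   CREATE TABLE scores (player TEXT, score INTEGER);
   INSERT INTO scores VALUES ('ann', 95), ('bob', 90), ('cid', 90), ('dee', 85);

   SELECT player, score,
          RANK()       OVER (ORDER BY score DESC)         AS rnk,        -- 1, 2, 2, 4
          DENSE_RANK() OVER (ORDER BY score DESC)         AS dense_rnk,  -- 1, 2, 2, 3
          ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num     -- 1, 2, 3, 4
   FROM scores
   ORDER BY score DESC, player;
   ```
   `dee` ranks 4th under `RANK()` but 3rd under `DENSE_RANK()`, because `RANK()` leaves a gap after the two-way tie.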

8. **What are Common Table Expressions (CTEs)? Provide an example.**  
   CTEs simplify complex queries by creating temporary result sets:  
   ```sql
   WITH temp AS (
       SELECT id, sales FROM orders WHERE sales > 1000
   )
   SELECT * FROM temp;
   ```

9. **How do you fetch alternate records from a table?**  
   Use a modulo operation on the id (this returns even ids, and alternates rows only if the ids have no gaps):  
   ```sql
   SELECT * FROM table_name WHERE MOD(id, 2) = 0;
   ```
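
   Because `MOD(id, 2)` filters on id values, gaps left by deletes break the alternation. A sketch that numbers the rows first, assuming a hypothetical `events` table; the `%` operator is used because SQLite, PostgreSQL, MySQL and SQL Server all accept it:
   ```sql
   -- Hypothetical table whose ids have gaps
   CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
   INSERT INTO events VALUES (1, 'a'), (2, 'b'), (4, 'c'), (7, 'd'), (8, 'e');

   -- Number rows in id order, then keep every second one
   SELECT id, name
   FROM (
       SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rn
       FROM events
   ) numbered
   WHERE rn % 2 = 0
   ORDER BY id;
   ```
   This returns ids 2 and 7 (the 2nd and 4th rows), whereas `MOD(id, 2) = 0` would return ids 2, 4 and 8.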

10. **What is the difference between views and tables in SQL?**  
    - Views: Virtual tables created using queries; they do not store data physically.  
    - Tables: Physical storage of data.

---

### **Advanced SQL Questions**
11. **Explain window functions like `ROW_NUMBER()`, `LEAD()`, and `LAG()`.**  
    - `ROW_NUMBER()`: Assigns unique sequential numbers to rows within partitions.  
    - `LEAD()`: Accesses subsequent row values within partitions.  
    - `LAG()`: Accesses previous row values within partitions.
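
    For example, with a hypothetical `daily_revenue` table, `LAG()` and `LEAD()` fetch the neighboring day's value so a day-over-day change can be computed without a self-join (SQLite 3.25+ or PostgreSQL):
    ```sql
    -- Hypothetical daily revenue
    CREATE TABLE daily_revenue (sale_date DATE, revenue INTEGER);
    INSERT INTO daily_revenue VALUES
      ('2024-01-01', 100), ('2024-01-02', 120), ('2024-01-03', 90);

    SELECT sale_date, revenue,
           LAG(revenue)  OVER (ORDER BY sale_date) AS prev_day,   -- NULL on the first row
           LEAD(revenue) OVER (ORDER BY sale_date) AS next_day,   -- NULL on the last row
           revenue - LAG(revenue) OVER (ORDER BY sale_date) AS change_vs_prev
    FROM daily_revenue
    ORDER BY sale_date;
    ```
    For 2024-01-02 this gives `prev_day` 100, `next_day` 90 and `change_vs_prev` 20.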

12. **How do you optimize an SQL query for performance?**  
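    - Create indexes on columns used in joins, filters and sorts, and check the execution plan (`EXPLAIN`) to confirm they are used.
    - Avoid `SELECT *`; select only the columns you need.
    - Rewrite correlated subqueries as joins when the optimizer does not do it for you.
    - Filter early so fewer rows are scanned, joined and returned.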

13. **Write a query to find the third-highest salary from an employee table.**  
    ```sql
    SELECT DISTINCT salary 
    FROM employees 
    ORDER BY salary DESC 
    LIMIT 1 OFFSET 2;
    ```

14. **What are set operators in SQL (`UNION`, `INTERSECT`, `EXCEPT`)?**  
    - `UNION`: Combines results from multiple queries (removes duplicates).  
    - `INTERSECT`: Returns common rows between queries.  
    - `EXCEPT`: Returns rows present in the first query but not in the second.

15. **Explain conditional logic using `CASE WHEN`. Provide an example.**  
    ```sql
    SELECT id,
           CASE 
               WHEN sales > 1000 THEN 'High'
               ELSE 'Low'
           END AS sales_category
    FROM orders;
    ```

---

### **Scenario-Based Questions**
16. **How would you handle missing or null values in a dataset using SQL?**  
    Use functions like `COALESCE()` or conditional logic (`CASE WHEN`).

17. **Write a query to calculate the percentage contribution of each product's sales to total sales.**  
    ```sql
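    SELECT product_id, 
           -- 100.0 forces decimal division (integer / integer truncates in PostgreSQL, SQLite and SQL Server)
           100.0 * SUM(sales) / (SELECT SUM(sales) FROM products) AS percentage_contribution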
    FROM products
    GROUP BY product_id;
    ```

18. **How do you find customers who bought all products?**  
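    Use relational division: group each customer's purchases and keep the customers whose count of distinct products equals the total number of products (or use a double `NOT EXISTS`).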
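
    One common relational-division pattern, assuming hypothetical `products(product_id)` and `purchases(customer_id, product_id)` tables, counts each customer's distinct products and compares that with the catalog size. `COUNT(DISTINCT ...)` keeps repeat purchases from inflating the count, and the pattern assumes every `purchases.product_id` exists in `products` (e.g., enforced by a foreign key):
    ```sql
    -- Hypothetical tables
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);
    CREATE TABLE purchases (customer_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1), (2), (3);
    INSERT INTO purchases VALUES
      (10, 1), (10, 2), (10, 3),   -- customer 10 bought everything
      (20, 1), (20, 1), (20, 2),   -- customer 20 repeats product 1 but never buys 3
      (30, 3);

    SELECT customer_id
    FROM purchases
    GROUP BY customer_id
    HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products);
    ```
    Only customer 10 qualifies; customer 20's three purchase rows cover just two distinct products.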

19. **Explain how you would implement pagination in SQL queries for large datasets.**  
    Use LIMIT and OFFSET with a deterministic ORDER BY (this returns rows 21 to 30):  
    ```sql
    SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 20;
    ```

20. **Write a query to concatenate first name and last name columns into a full name column.**  
    ```sql
    SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
    ```

---

### **Problem-Solving Questions**
21. **How do you find the top N records based on a column value?**  
    ```sql
    SELECT * FROM employees ORDER BY salary DESC LIMIT N;
    ```

22. **Write a query to count distinct values in a column:**  
    ```sql
    SELECT COUNT(DISTINCT column_name) FROM table_name;
    ```

23. **How do you delete duplicate records while keeping one instance?**  
    ```sql
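    -- The derived table (keep_ids) also lets this run on MySQL, which rejects
    -- a subquery that reads directly from the table being deleted from
    DELETE FROM table_name
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT MIN(id) AS id
            FROM table_name
            GROUP BY column_to_check_duplicates
        ) AS keep_ids
    );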
    ```

24. **Explain how subqueries work with examples.**  
    Subqueries are nested queries used inside another query:  
    ```sql
    SELECT name 
    FROM employees
    WHERE department_id IN (SELECT id FROM departments WHERE name = 'HR');
    ```




### **Advanced SQL Concepts**
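1. **Explain the concept of normalization and its types (1NF, 2NF, 3NF, BCNF).**  
   Normalization minimizes redundancy and organizes data efficiently[1][4].  
   - 1NF: each column holds atomic values and there are no repeating groups.  
   - 2NF: 1NF, and no non-key column depends on only part of a composite key.  
   - 3NF: 2NF, and no non-key column depends on another non-key column (no transitive dependencies).  
   - BCNF: every determinant (left side of a functional dependency) is a candidate key.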

2. **What is the difference between `DDL` and `DML` commands? Provide examples.**  
   DDL defines schema structures (e.g., `CREATE`, `ALTER`), while DML manipulates data (e.g., `INSERT`, `UPDATE`)[1].

3. **What is the difference between `UNION` and `UNION ALL`?**  
   `UNION` removes duplicates; `UNION ALL` includes duplicates[1][6].

4. **Explain window functions like `ROW_NUMBER()`, `RANK()`, and `DENSE_RANK()`.**  
   - `ROW_NUMBER()`: Sequential numbering within partitions.  
   - `RANK()`: Skips ranks for ties.  
   - `DENSE_RANK()`: Does not skip ranks for ties[3][6].

5. **How do you optimize SQL queries for performance?**  
   Use indexes, avoid SELECT *, limit returned rows, and optimize joins[2][6].

---

### **Complex Queries**
6. **Write a query to find duplicate records in a table.**  
   ```sql
   SELECT column_name, COUNT(*)
   FROM table_name
   GROUP BY column_name
   HAVING COUNT(*) > 1;
   ```

7. **How do you find the nth highest salary from an employee table?**  
   ```sql
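   -- Replace n-1 with a literal (e.g., OFFSET 2 for the 3rd highest); MySQL accepts only constants in LIMIT/OFFSET
   SELECT DISTINCT salary 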
   FROM employees 
   ORDER BY salary DESC 
   LIMIT 1 OFFSET n-1;
   ```
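
   An alternative uses `DENSE_RANK()`, so tied salaries share a rank and the target rank sits in an ordinary `WHERE` clause, where any expression or bind parameter is allowed. A sketch with a hypothetical `employees` table (SQLite 3.25+ or PostgreSQL):
   ```sql
   -- Hypothetical employees with a tie at the top
   CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
   INSERT INTO employees VALUES
     (1, 'Ana', 9000), (2, 'Ben', 9000), (3, 'Cal', 7000), (4, 'Dia', 5000);

   -- 2nd highest distinct salary; change 2 to any n
   SELECT DISTINCT salary
   FROM (
       SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
       FROM employees
   ) ranked
   WHERE salary_rank = 2;
   ```
   This returns 7000, and it returns no row at all when there are fewer than n distinct salaries.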

8. **Explain how to calculate running totals using SQL window functions.**  
   ```sql
   SELECT id, sales, SUM(sales) OVER (ORDER BY id) AS running_total
   FROM orders;
   ```

9. **Write a query to fetch alternate rows from a table.**  
   ```sql
   SELECT * FROM table_name WHERE MOD(id, 2) = 0;
   ```

10. **How do you transpose rows into columns dynamically in SQL?**  
    Use pivoting techniques or dynamic SQL[1][6].
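
    The static form of pivoting is conditional aggregation, sketched here with a hypothetical `quarterly_sales` table. A dynamic version would build the list of `CASE` expressions from `SELECT DISTINCT quarter` and run the generated statement as dynamic SQL.
    ```sql
    -- Hypothetical long-format sales: one row per (region, quarter) entry
    CREATE TABLE quarterly_sales (region TEXT, quarter TEXT, amount INTEGER);
    INSERT INTO quarterly_sales VALUES
      ('North', 'Q1', 100), ('North', 'Q2', 150),
      ('South', 'Q1', 80),  ('South', 'Q2', 60), ('South', 'Q2', 20);

    -- One output column per known quarter value
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM quarterly_sales
    GROUP BY region
    ORDER BY region;
    ```
    This yields North (100, 150) and South (80, 80).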

---

### **Scenario-Based Questions**
11. **Find customers who bought all products in a dataset.**  
    Use relational division or subqueries[3].

12. **Write a query to calculate the percentage contribution of each product's sales to total sales.**  
    ```sql
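    SELECT product_id, 
           -- 100.0 forces decimal division (integer / integer truncates in PostgreSQL, SQLite and SQL Server)
           100.0 * SUM(sales) / (SELECT SUM(sales) FROM products) AS percentage_contribution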
    FROM products
    GROUP BY product_id;
    ```

13. **How do you implement pagination in SQL queries?**  
    ```sql
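    -- page_number starts at 0; page_size and page_number are placeholders.
    -- MySQL requires constants here, so compute page_number * page_size beforehand.
    SELECT * FROM orders ORDER BY id LIMIT page_size OFFSET page_number * page_size;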
    ```

14. **Explain how to handle missing values in SQL datasets.**  
    Use functions like `COALESCE()` or conditional logic (`CASE WHEN`)[2][4].

15. **Write a query to concatenate first name and last name into a full name column.**  
    ```sql
    SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
    ```

---

### **Data Manipulation and Analysis**
16. **What are stored procedures, and why are they useful?**  
    Stored procedures encapsulate reusable logic for complex operations[3][7].

17. **Explain the difference between views and tables in SQL.**  
    Views are virtual tables created using queries; tables store data physically[3][7].

18. **How do you delete duplicate records while retaining one instance?**  
    ```sql
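    -- The derived table (keep_ids) also lets this run on MySQL, which rejects
    -- a subquery that reads directly from the table being deleted from
    DELETE FROM table_name
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT MIN(id) AS id
            FROM table_name
            GROUP BY column_to_check_duplicates
        ) AS keep_ids
    );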
    ```

19. **Write a query to find customers who made repeated payments at the same merchant within 10 minutes of each other.**  
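    ```sql
    -- MySQL syntax (TIMESTAMPDIFF); counts pairs of matching payments
    SELECT COUNT(*)
    FROM payments p1
    JOIN payments p2 ON p1.merchant_id = p2.merchant_id 
                     AND p1.card_number = p2.card_number 
                     AND p1.amount = p2.amount 
                     -- transaction_id is an assumed unique id column: it stops a row matching itself and counts each pair once
                     AND p1.transaction_id < p2.transaction_id
                     AND ABS(TIMESTAMPDIFF(MINUTE, p1.timestamp, p2.timestamp)) <= 10;
    ```

21. **Write a query to find employees who earn more than the average salary of their department.**  
    ```sql
    SELECT *
    FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);
    ```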

22. **How do you group data by month from a date column?**  
    ```sql
    -- MySQL; PostgreSQL: TO_CHAR(date_column, 'YYYY-MM'); SQLite: strftime('%Y-%m', date_column)
    SELECT DATE_FORMAT(date_column, '%Y-%m') AS month, COUNT(*) 
    FROM orders 
    GROUP BY month;
    ```

23. **Explain how subqueries work with examples.**  
    Subqueries are nested queries used inside another query:  
    ```sql
    SELECT name 
    FROM employees
    WHERE department_id IN (SELECT id FROM departments WHERE name = 'HR');
    ```

24. **How would you rank products based on sales quantity?**
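
One common approach, sketched with a hypothetical `order_lines(order_id, product_id, quantity)` table, totals the quantity per product and ranks the totals with a window function. `RANK()` leaves a gap after ties; swap in `DENSE_RANK()` if gaps are unwanted. Written for SQLite 3.25+ or PostgreSQL.

```sql
-- Hypothetical order lines
CREATE TABLE order_lines (order_id INTEGER, product_id TEXT, quantity INTEGER);
INSERT INTO order_lines VALUES
  (1, 'P1', 5), (2, 'P2', 3), (3, 'P1', 2), (4, 'P3', 4), (5, 'P2', 4);

-- Window functions run after GROUP BY, so they can rank aggregated totals
SELECT product_id,
       SUM(quantity) AS total_qty,
       RANK() OVER (ORDER BY SUM(quantity) DESC) AS qty_rank
FROM order_lines
GROUP BY product_id
ORDER BY qty_rank, product_id;
```

Here P1 and P2 (7 units each) share rank 1, and P3 (4 units) ranks 3.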


---
Answer from Perplexity: https://www.perplexity.ai/search/8875e1a0-e1c6-469b-8d9d-1cde07de3927?utm_source=copy_output