# Problem Statement
Write a solution to select the product id, year, quantity, and price for the first year of every product sold.

Return the resulting table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 
Sales table:
| sale_id | product_id | year | quantity | price |
| ------- | ---------- | ---- | -------- | ----- |
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |

Product table:
| product_id | product_name |
| ---------- | ------------ |
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |

**Output:** 
| product_id | first_year | quantity | price |
| ---------- | ---------- | -------- | ----- |
| 100        | 2008       | 10       | 5000  |
| 200        | 2011       | 15       | 9000  |
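
To try the queries in this write-up locally, the example data can be recreated with a setup script along these lines. This is only a sketch: the column types are assumptions, since the problem statement shows column names and sample values but no schema.

```sql
-- Assumed column types; the problem statement does not specify them.
CREATE TABLE Product (
    product_id   INT,
    product_name VARCHAR(50)
);

CREATE TABLE Sales (
    sale_id    INT,
    product_id INT,
    year       INT,
    quantity   INT,
    price      INT
);

-- Rows copied from the Example 1 input tables.
INSERT INTO Product (product_id, product_name) VALUES
    (100, 'Nokia'),
    (200, 'Apple'),
    (300, 'Samsung');

INSERT INTO Sales (sale_id, product_id, year, quantity, price) VALUES
    (1, 100, 2008, 10, 5000),
    (2, 100, 2009, 12, 5000),
    (7, 200, 2011, 15, 9000);
```

Product 300 has no rows in `Sales`, which is why it does not appear in the expected output.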

# Intuition
We need to pinpoint the first sale for each product in the Sales table. This involves selecting the earliest sale record for each product. There are several approaches to solving this problem, each with its own merits and challenges.

Initial Approach:

My first attempt at this problem used `DENSE_RANK()` within a Common Table Expression (CTE). While this solution was functional and passed the LeetCode submission, it turned out to be the slowest of the three approaches I tried. This experience highlights a common pitfall in SQL query optimization: reaching for a window function like `DENSE_RANK()` when a plain aggregate would do.

# Lessons Learned:

This problem serves as an excellent case study for understanding different SQL optimization strategies. Here's what I learned:

Efficiency over Complexity: Sometimes, simpler SQL constructs can outperform more complex ones like window functions, especially in terms of execution speed and resource usage.


# Approach

## Solution 1: Using `DENSE_RANK()`

```sql
WITH first_sales AS (
    SELECT 
        product_id,
        year,
        quantity,
        price,
        DENSE_RANK() OVER (PARTITION BY product_id ORDER BY year ASC) AS rn
    FROM 
        Sales
)
SELECT 
    product_id,
    year AS first_year,
    quantity,
    price
FROM 
    first_sales
WHERE 
    rn = 1
ORDER BY 
    product_id;
```
**Pros:**
- Returns every sale from a product's first year, because rows that tie on `year` share rank 1.

**Cons:**
- Requires support for window functions (`DENSE_RANK()`).
- Might be slower for very large datasets, since each partition has to be sorted by `year` before ranks can be assigned.

**When to Use:**
- When the database supports `DENSE_RANK()` and a product may have multiple sales in its first year.
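
The point about ties can be checked with a small self-contained query. The sketch below uses a hypothetical inline sample (the `sale_id` 8 row is made up to create a tie in 2008) and puts `DENSE_RANK()` next to `ROW_NUMBER()` for comparison.

```sql
-- Hypothetical sample: sale_id 8 is invented to create a tie in the first year.
WITH sample AS (
    SELECT 1 AS sale_id, 100 AS product_id, 2008 AS year
    UNION ALL SELECT 8, 100, 2008
    UNION ALL SELECT 2, 100, 2009
)
SELECT
    sale_id,
    product_id,
    year,
    -- Tied years share a rank, so both 2008 rows get 1.
    DENSE_RANK() OVER (PARTITION BY product_id ORDER BY year) AS dense_rnk,
    -- Row numbers are unique, so only one of the 2008 rows gets 1.
    ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY year) AS row_num
FROM sample;
```

Filtering on `dense_rnk = 1` keeps both 2008 sales, while filtering on `row_num = 1` would keep only one of them, and which one is not defined without an extra tiebreaker in the `ORDER BY`.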

## Solution 2: Using `MIN()` in a CTE with JOIN
```sql
WITH first_year_sales AS (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales 
    GROUP BY product_id
)
SELECT s.product_id, fys.first_year, s.quantity, s.price
FROM Sales s
JOIN first_year_sales fys ON s.product_id = fys.product_id AND s.year = fys.first_year;
```

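**Pros:**
- One `GROUP BY` with `MIN(year)` finds each product's first year, and an equality join retrieves the matching rows.
- Works well with indexed columns: an index on `(product_id, year)` can serve both the aggregation and the join.
- Returns every sale from the first year, just like Solution 1, because the join matches all rows with that `product_id` and `year` rather than picking one.

**Cons:**
- References `Sales` twice (once in the CTE, once in the join), so without a suitable index the table may be read twice.

**When to Use:**
- Preferred for most SQL databases, especially when you can leverage an index on `(product_id, year)` for performance.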
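
For the indexing point, a composite index along these lines is one option. The index name is hypothetical, and whether the optimizer actually uses it depends on the database and the data, so treat this as a sketch rather than a guaranteed speed-up.

```sql
-- Hypothetical index name. product_id comes first so that each product's
-- rows are grouped together and already ordered by year within the index.
CREATE INDEX idx_sales_product_year ON Sales (product_id, year);
```

With `product_id` as the leading column, the smallest `year` for each product sits at the start of that product's index range, which is the lookup that `MIN(year) ... GROUP BY product_id` needs.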

---

## Solution 3: Using IN with Subquery
```sql
SELECT product_id, year AS first_year, quantity, price
FROM Sales
WHERE (product_id, year) IN 
(SELECT product_id, MIN(year) FROM Sales GROUP BY product_id);
```

**Pros:**
- Simple and straightforward for small to medium datasets.

**Cons:**
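- The row-value form `(product_id, year) IN (...)` is not accepted by every database (SQL Server rejects it, as far as I know).
- Depending on the optimizer, `IN` with a subquery might not use indexes as efficiently as an explicit join, potentially leading to performance issues with very large datasets.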

**When to Use:**
- When simplicity is valued over performance or when dealing with smaller datasets where the performance difference is negligible.
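
Where the row-value `IN` syntax is not available, the same filter can be written as a correlated subquery. This is an alternative sketch and not part of the original three solutions; the aliases `s` and `s2` are just illustrative names.

```sql
SELECT s.product_id, s.year AS first_year, s.quantity, s.price
FROM Sales s
WHERE s.year = (
    -- Earliest year for the product of the current outer row.
    SELECT MIN(s2.year)
    FROM Sales s2
    WHERE s2.product_id = s.product_id
);
```

It returns the same rows as Solution 3, including all sales that share a product's first year.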

---

# Complexity Analysis

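**Time Complexity:**
- **Solution 2 (JOIN)** is generally the fastest: the `MIN`/`GROUP BY` step is a single aggregation pass, and the join is an equality match that can use an index.
- **Solution 3 (IN)** could be slower on large datasets, depending on how the optimizer processes the `IN` subquery. Many engines rewrite it as a semi-join, in which case it behaves much like Solution 2.
- **Solution 1 (DENSE_RANK())** might be the slowest because each partition must be sorted by `year` before ranking, which is roughly O(n log n) without a helpful index.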

**Space Complexity:**
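- All solutions involve some temporary data storage. **Solution 1** builds an intermediate result with one ranked row per sale (the extra `rn` column), while **Solution 2** and **Solution 3** only need one `(product_id, first_year)` pair per product, so they are usually more space-efficient.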
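
Since these comparisons depend on the engine and the data, it is worth checking the actual plan instead of relying on general rules. A minimal sketch, assuming PostgreSQL or a recent MySQL 8.0 release where `EXPLAIN ANALYZE` is available (other systems use different syntax):

```sql
-- EXPLAIN ANALYZE executes the query and reports the plan with timings.
EXPLAIN ANALYZE
SELECT s.product_id, fys.first_year, s.quantity, s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) fys ON s.product_id = fys.product_id AND s.year = fys.first_year;
```

Running the same command on each of the three solutions shows whether the window sort, the join, or the `IN` subquery dominates on a particular dataset.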

---

# Conclusion

**Most Efficient in Time and Space:**
- The **JOIN solution (Solution 2)** is the most efficient for most practical scenarios, especially when `Sales` is indexed on the join columns, and it keeps space usage manageable. However, the actual performance can vary based on dataset size, data distribution, and database system optimizations.

**Choice:**
- Use **Solution 2** for general purposes where performance is a concern. This is the winner!
- Use **Solution 1** if you're working with a system that supports `DENSE_RANK()` and prefer a window-function style. All three solutions return every first-year sale, so tie handling is not the deciding factor.
- Use **Solution 3** for simpler queries or when working with smaller datasets where performance differences are less critical.
