## Higher-Order Functions

Higher-order functions in Databricks SQL take a lambda expression as an argument and apply it to the elements of complex types such as arrays and maps. They let you run custom logic on each element of a collection directly within a SQL query, which simplifies processing nested or structured data.

This section focuses on two commonly used higher-order functions:

- **`FILTER`**: Selects array elements that satisfy a given condition.
- **`TRANSFORM`**: Transforms each element in an array by applying an expression.

>Higher-order functions are especially useful when dealing with semi-structured data (e.g., JSON) or columns that contain arrays of values. Instead of exploding arrays into separate rows, you can manipulate the data in place while maintaining the original structure.
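
To make the contrast with exploding concrete, here is a minimal sketch that doubles every number in an array two ways. The inline `VALUES` data and the names `t`, `id`, `nums`, and `exploded` are hypothetical and used only for illustration.

```sql
-- In place: one row in, one row out, array order preserved.
SELECT id, TRANSFORM(nums, n -> n * 2) AS doubled
FROM VALUES (1, array(1, 2, 3)), (2, array(4, 5)) AS t(id, nums);

-- Explode and re-aggregate: more steps, and collect_list does not
-- guarantee that the original element order is preserved.
SELECT id, collect_list(n * 2) AS doubled
FROM (
  SELECT id, explode(nums) AS n
  FROM VALUES (1, array(1, 2, 3)), (2, array(4, 5)) AS t(id, nums)
) AS exploded
GROUP BY id;
```

The first query keeps each array attached to its row, which is usually what you want when the array is only one of several columns you need to carry through.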

### FILTER Syntax
Returns a new array containing only the elements for which the condition is true.

`FILTER(array_expression, element -> boolean_expression)`

**Parameters:**

- `array_expression`: the array you want to process.
- `element`: a variable representing each array element.
- `boolean_expression`: an expression returning `TRUE` to keep the element or `FALSE` to discard it.

**Example:**

`FILTER(array(10, 20, 30), x -> x > 15)` returns `[20, 30]`.
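
Two further variations may be useful. The sketch below uses literal arrays only, so it does not depend on any table; the two-argument lambda form, which also receives the 0-based element index, is assumed to be available in your runtime (it was added in Spark 3.0).

```sql
SELECT
  -- Drop NULL elements from an array.
  FILTER(array(1, NULL, 3), x -> x IS NOT NULL) AS non_null,      -- [1, 3]
  -- Two-argument lambda: x is the element, i is its 0-based index.
  FILTER(array(0, 2, 3), (x, i) -> x > i) AS greater_than_index;  -- [2, 3]
```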

### Create Temporary View with sample data

In [0]:
CREATE OR REPLACE TEMP VIEW employees AS
SELECT * FROM VALUES
  (101, '{"first_name":"Alice","last_name":"Brown","gender":"Female","skills":["Python","SQL","Databricks"],"address":{"city":"Toronto","country":"Canada"}}'),
  (102, '{"first_name":"Bob","last_name":"Smith","gender":"Male","skills":["Scala","Spark"],"address":{"city":"Vancouver","country":"Canada"}}'),
  (103, '{"first_name":"Carol","last_name":"Johnson","gender":"Female","skills":["Java","SQL"],"address":{"city":"Calgary","country":"Canada"}}')
AS t(employee_id, profile);

In [0]:
SELECT * FROM employees;

### Create Parsed View

In [0]:
CREATE OR REPLACE TEMP VIEW parsed_employees AS
SELECT
  employee_id,
  from_json(
    profile,
    schema_of_json('{
      "first_name":"Example",
      "last_name":"Example",
      "gender":"Example",
      "skills":["Skill1","Skill2"],
      "address":{"city":"City","country":"Country"}
    }')
  ) AS profile_struct
FROM employees;

In [0]:
SELECT * FROM parsed_employees;
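
`schema_of_json` infers the schema from a sample document, which is convenient but ties the result to that sample. As an alternative sketch, `from_json` also accepts a DDL-formatted schema string; the field types below are assumptions read off the sample data in the `employees` view.

```sql
SELECT
  employee_id,
  from_json(
    profile,
    -- Explicit schema: every field typed by hand rather than inferred.
    'first_name STRING, last_name STRING, gender STRING, skills ARRAY<STRING>, address STRUCT<city: STRING, country: STRING>'
  ) AS profile_struct
FROM employees;
```

Spelling the schema out makes the expected types visible in the query itself, at the cost of having to update it when the JSON shape changes.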

### TRANSFORM Syntax
`TRANSFORM` applies an expression to each element in the array and returns a new array with the transformed elements.

`TRANSFORM(array_expression, element -> expression)`

**Parameters:**

- `array_expression`: the array you want to process.
- `element`: a variable representing each array element.
- `expression`: the expression applied to each element to produce its transformed value.

**Example:**

`TRANSFORM(array(1, 2, 3), x -> x * 10)` returns `[10, 20, 30]`.

### Convert all skill names to uppercase using TRANSFORM

In [0]:
SELECT
  employee_id,
  profile_struct.first_name AS first_name,
  profile_struct.last_name AS last_name,
  TRANSFORM(profile_struct.skills, skill -> upper(skill)) AS skills_uppercase,
  profile_struct.address.city AS city
FROM parsed_employees;
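
`TRANSFORM` can also take a two-argument lambda whose second parameter is the 0-based position of the element. The following sketch numbers a literal list of skills; the column alias `numbered_skills` is hypothetical, and the index form is assumed to be available in your runtime (Spark 3.0 and later).

```sql
SELECT
  TRANSFORM(
    array('Python', 'SQL', 'Databricks'),
    -- skill is the element, i is its 0-based index.
    (skill, i) -> concat(cast(i + 1 AS STRING), '. ', skill)
  ) AS numbered_skills;
-- Returns: ['1. Python', '2. SQL', '3. Databricks']
```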

### Keep only skills containing "SQL":

In [0]:
SELECT
  employee_id,
  FILTER(profile_struct.skills, s -> instr(s, 'SQL') > 0) AS sql_skills
FROM parsed_employees;
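
With the sample data, employee 102 has no skill containing "SQL", so `FILTER` returns an empty array for that row instead of dropping the row. If you only want rows with at least one match, one possible approach is to check the size of the filtered array; the subquery alias `filtered` is hypothetical.

```sql
SELECT employee_id, sql_skills
FROM (
  SELECT
    employee_id,
    FILTER(profile_struct.skills, s -> instr(s, 'SQL') > 0) AS sql_skills
  FROM parsed_employees
) AS filtered
-- Keep only employees whose filtered array is not empty.
WHERE size(sql_skills) > 0;
```

Note that `instr` is case-sensitive, so a skill spelled "sql" would not match this condition.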

### Concatenate skills into a comma-separated string with `array_join`

In [0]:
SELECT
  employee_id,
  array_join(profile_struct.skills, ',') AS skills_csv
FROM parsed_employees;
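
Because `TRANSFORM` returns an ordinary array, its result can be passed straight to `array_join`. A small sketch, assuming the `parsed_employees` view defined earlier; the alias `skills_upper_csv` is hypothetical.

```sql
SELECT
  employee_id,
  -- Uppercase each skill first, then join the array into one string.
  array_join(TRANSFORM(profile_struct.skills, s -> upper(s)), ', ') AS skills_upper_csv
FROM parsed_employees;
```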

### Filter skills containing "SQL", then make them uppercase:

In [0]:
SELECT
  employee_id,
  TRANSFORM(
    FILTER(profile_struct.skills, s -> instr(s, 'SQL') > 0),
    s -> upper(s)
  ) AS sql_skills_upper
FROM parsed_employees;

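### Another Example of FILTER

This example filters an array of structs rather than an array of strings, so the lambda reads a field of each element.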

In [0]:
CREATE OR REPLACE TEMP VIEW product_table AS
SELECT *
FROM VALUES
  (
    1001,
    array(
      named_struct('offer_code', 'OFF10', 'discount_percent', 50),
      named_struct('offer_code', 'OFF20', 'discount_percent', 70),
      named_struct('offer_code', 'OFF30', 'discount_percent', 30)
    )
  ),
  (
    1002,
    array(
      named_struct('offer_code', 'OFF40', 'discount_percent', 80),
      named_struct('offer_code', 'OFF50', 'discount_percent', 60)
    )
  ),
  (
    1003,
    array(
      named_struct('offer_code', 'OFF60', 'discount_percent', 20)
    )
  )
AS t(product_id, offers);


### Get all offers with at least 60% discount:

In [0]:
SELECT
  product_id,
  offers,
  FILTER(
    offers,
    o -> o.discount_percent >= 60
  ) AS high_discount_offers
FROM product_table
ORDER BY product_id;


If you want just the offer codes, you can use `TRANSFORM` over the filtered array:

In [0]:
SELECT
  product_id,
  TRANSFORM(
    FILTER(offers, o -> o.discount_percent >= 60),
    o -> o.offer_code
  ) AS high_discount_offer_codes
FROM product_table
ORDER BY product_id;