# Create `openalex.institutions.raw_affiliation_strings_institutions_mv`

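Creates a materialized view that transforms the affiliation strings lookup table (`openalex.institutions.affiliation_strings_lookup`) by:
1. Applying `institution_ids_override` when it is non-empty (old-style curations); otherwise the model's `institution_ids` are used
2. Applying new verb-based curations (add/remove) from `ras_curations`, joined on `raw_affiliation_string`
3. Filtering out invalid institution IDs: null and `-1` entries are dropped from either source, and a model `institution_ids` array whose first element is null is discarded entirely
4. Normalizing countries: a null array or `array('')` becomes an empty array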
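Steps 1, 3 and 4 can be modeled per row in plain Python. This is a minimal sketch, not part of the pipeline: the function names `base_institution_ids` and `clean_countries` are hypothetical, and `None` stands in for SQL NULL.

```python
from typing import List, Optional, Sequence

# Plain-Python model of one row of the view; None stands in for SQL NULL.


def base_institution_ids(
    override: Optional[Sequence[Optional[int]]],
    model_ids: Optional[Sequence[Optional[int]]],
) -> List[int]:
    """Mirror the CASE + FILTER expression that builds the base ID list."""
    if override:
        # Non-empty override wins (old-style curation). A NULL or empty
        # override falls through, like `override != array()` in SQL.
        candidates = list(override)
    elif model_ids and model_ids[0] is None:
        # Model output whose first element is NULL is discarded entirely.
        candidates = []
    else:
        # COALESCE(asl.institution_ids, array())
        candidates = list(model_ids or [])
    # FILTER(x -> x IS NOT NULL AND x != -1)
    return [x for x in candidates if x is not None and x != -1]


def clean_countries(countries: Optional[Sequence[str]]) -> List[str]:
    """NULL and array('') both become an empty list; anything else is kept."""
    values = list(countries or [])
    return [] if values == [""] else values
```

For example, with no override and model output `[1, None, -1, 2]`, the base is `[1, 2]`: the first element is not null, so only the individual null and `-1` entries are filtered out.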

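**Priority order**: (base + curated_adds) - curated_removes, where base = override > model (the override if non-empty, otherwise the model's `institution_ids`). Removes are applied last, so a curated remove takes precedence over both the base and a curated add.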
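The combine step can be sketched in plain Python as below (the name `apply_curations` is hypothetical). Spark documents `ARRAY_UNION` and `ARRAY_EXCEPT` as returning elements "without duplicates"; keeping first-seen order here is an assumption of the sketch, not a documented guarantee.

```python
from typing import List, Optional, Sequence


def apply_curations(
    base: Sequence[int],
    adds: Optional[Sequence[int]],
    removes: Optional[Sequence[int]],
) -> List[int]:
    """(base + curated_adds) - curated_removes, with NULL lists treated as empty."""
    union: List[int] = []
    for x in list(base) + list(adds or []):
        if x not in union:  # ARRAY_UNION: no duplicates, first-seen order assumed
            union.append(x)
    removed = set(removes or [])
    # ARRAY_EXCEPT: keep only elements absent from the remove list
    return [x for x in union if x not in removed]
```

Because the removal runs last, an ID that appears in both `curated_add_ids` and `curated_remove_ids` ends up excluded.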

This materialized view is used by `work_author_affiliations_mv` to join affiliations with institution IDs.

%sql
CREATE OR REPLACE MATERIALIZED VIEW openalex.institutions.raw_affiliation_strings_institutions_mv
CLUSTER BY (raw_affiliation_string)
AS
SELECT 
  asl.raw_affiliation_string,
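  -- Three-layer priority: (base + curated_adds) - curated_removes
  -- base = override (if non-empty) > model institution_ids
  ARRAY_EXCEPT(
    ARRAY_UNION(
      FILTER(
        CASE
          -- Old-style curation: a non-empty override replaces the model output
          -- (a NULL override compares as NULL here and falls through)
          WHEN asl.institution_ids_override != array() THEN asl.institution_ids_override
          -- Model output whose first element is NULL is discarded entirely
          WHEN SIZE(asl.institution_ids) > 0 AND asl.institution_ids[0] IS NULL THEN array()
          ELSE COALESCE(asl.institution_ids, array())
        END,
        -- Drop any remaining NULLs and -1 values (invalid IDs)
        x -> x IS NOT NULL AND x != -1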
      ),
      COALESCE(rac.curated_add_ids, array())
    ),
    COALESCE(rac.curated_remove_ids, array())
  ) AS institution_ids,
  -- Normalize NULL and array('') to an empty array
  CASE
    WHEN COALESCE(asl.countries, array()) = array('') THEN array()
    ELSE COALESCE(asl.countries, array())
  END AS countries,
  asl.source,
  -- Inputs to the priority logic, exposed alongside the final institution_ids
  asl.institution_ids AS model_institution_ids,
  asl.institution_ids_override,
  rac.curated_add_ids,
  rac.curated_remove_ids,
  asl.created_datetime,
  asl.updated_datetime
FROM openalex.institutions.affiliation_strings_lookup asl
LEFT JOIN openalex.institutions.ras_curations rac
  ON asl.raw_affiliation_string = rac.raw_affiliation_string
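The `LEFT JOIN` keeps every lookup row and assumes `ras_curations` holds at most one row per `raw_affiliation_string`; a duplicate key would emit the lookup row twice in the view. The plain-Python sketch below (hypothetical function and example strings, not part of the pipeline) mirrors the join and makes that assumption explicit by raising on duplicate keys. Unmatched rows get `None` for the curation columns: the `institution_ids` expression coalesces them to empty arrays, while the exposed `curated_add_ids` and `curated_remove_ids` columns stay NULL.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def left_join_curations(
    lookup_rows: Iterable[Row], curation_rows: Iterable[Row]
) -> List[Row]:
    """LEFT JOIN lookup rows to curations on raw_affiliation_string."""
    index: Dict[str, Row] = {}
    for cur in curation_rows:
        key = cur["raw_affiliation_string"]
        if key in index:
            # In SQL this would silently duplicate the lookup row instead.
            raise ValueError(f"duplicate curation row for {key!r}")
        index[key] = cur
    joined: List[Row] = []
    for row in lookup_rows:
        key = row["raw_affiliation_string"]
        # A NULL key never matches in SQL, so treat None as unmatched.
        cur = index.get(key, {}) if key is not None else {}
        joined.append({
            **row,
            "curated_add_ids": cur.get("curated_add_ids"),
            "curated_remove_ids": cur.get("curated_remove_ids"),
        })
    return joined
```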