# Refresh RAS Works Counts

Rebuilds the `affiliation_strings_lookup_with_counts` table by combining fresh works counts per raw affiliation string (RAS) from `work_authorships` with institution IDs from the `raw_affiliation_strings_institutions_mv` MV, which already has curations applied.

This keeps the affiliations dashboard in sync with actual works data.

**Runs after**: Guardrails (needs finalized works data)

**Feeds**: `sync_affiliation_strings_to_elastic_v2` (ES sync for dashboard)

## Step 1: Rebuild works counts per RAS

In [ ]:
%sql
-- Rebuild works counts by exploding authorships from work_authorships.
-- Reads the narrow work_authorships table instead of OpenAlex_works, so the scan is much faster.
-- Replaces the entire table with fresh counts.
CREATE OR REPLACE TABLE openalex.institutions.affiliation_string_works_counts AS
SELECT
    raw_aff_string,
    COUNT(DISTINCT w.work_id) AS works_count
FROM openalex.works.work_authorships w
LATERAL VIEW EXPLODE(authorships) AS authorship
LATERAL VIEW EXPLODE(authorship.raw_affiliation_strings) AS raw_aff_string
GROUP BY raw_aff_string
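A plain-Python sketch (not part of the pipeline) of what the two `LATERAL VIEW EXPLODE` clauses and `COUNT(DISTINCT w.work_id)` compute. The sample rows and the `works_counts` helper are hypothetical. The point it illustrates: a work whose RAS appears on several authorships is still counted once for that RAS, and an authorship with a NULL or empty string array contributes nothing.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like work_authorships: (work_id, authorships),
# where each authorship carries a list of raw_affiliation_strings.
rows = [
    ("W1", [{"raw_affiliation_strings": ["MIT", "Harvard"]},
            {"raw_affiliation_strings": ["MIT"]}]),
    ("W2", [{"raw_affiliation_strings": ["MIT"]}]),
    ("W3", [{"raw_affiliation_strings": None}]),
]


def works_counts(rows):
    """Mimic two LATERAL VIEW EXPLODEs followed by COUNT(DISTINCT work_id)."""
    works_per_ras = defaultdict(set)
    for work_id, authorships in rows:
        for authorship in authorships or []:  # first EXPLODE: one row per authorship
            # second EXPLODE: one row per string; a NULL or empty array yields no rows
            for ras in authorship.get("raw_affiliation_strings") or []:
                works_per_ras[ras].add(work_id)  # the set keeps each work once per RAS
    return {ras: len(work_ids) for ras, work_ids in works_per_ras.items()}


counts = works_counts(rows)
print(counts)  # {'MIT': 2, 'Harvard': 1}
```

W1 lists "MIT" on two authorships but contributes 1 to its count, and W3 produces no rows at all.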

In [ ]:
%sql
-- Quick sanity check
SELECT
  COUNT(*) AS total_unique_ras,
  SUM(works_count) AS total_works_count,
  MIN(works_count) AS min_works,
  MAX(works_count) AS max_works
FROM openalex.institutions.affiliation_string_works_counts
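When reading the sanity check, note that `total_works_count` is not the number of distinct works: a work listing several RAS is counted once under each of them. A minimal plain-Python sketch with a hypothetical work-to-RAS mapping reproduces the four aggregates and shows the gap.

```python
# Hypothetical mapping from each work to the distinct RAS it lists.
work_to_ras = {
    "W1": {"MIT", "Harvard"},
    "W2": {"MIT"},
}

# Step 1 equivalent: RAS -> number of distinct works.
works_count = {}
for work_id, ras_set in work_to_ras.items():
    for ras in ras_set:
        works_count[ras] = works_count.get(ras, 0) + 1

# Same four aggregates as the sanity-check query.
summary = {
    "total_unique_ras": len(works_count),
    "total_works_count": sum(works_count.values()),
    "min_works": min(works_count.values()),
    "max_works": max(works_count.values()),
}
print(summary)  # {'total_unique_ras': 2, 'total_works_count': 3, 'min_works': 1, 'max_works': 2}
print("distinct works:", len(work_to_ras))  # distinct works: 2
```

Here `total_works_count` is 3 even though only 2 works exist, so a sum larger than the works table is expected rather than a sign of double loading.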

## Step 2: Rebuild lookup with counts

Joins the MV (which applies curations via a 3-layer priority) with the fresh counts from Step 1.
Because this is an inner join, it keeps only RAS that appear in at least one work and also have a row in the MV.

In [ ]:
%sql
CREATE OR REPLACE TABLE openalex.institutions.affiliation_strings_lookup_with_counts AS
SELECT 
    mv.raw_affiliation_string,
    mv.institution_ids AS institution_ids_final,
    mv.model_institution_ids AS institution_ids_from_model,
    mv.institution_ids_override,
    mv.countries,
    mv.source,
    mv.created_datetime,
    mv.updated_datetime,
    c.works_count
FROM openalex.institutions.raw_affiliation_strings_institutions_mv mv
INNER JOIN openalex.institutions.affiliation_string_works_counts c
    ON mv.raw_affiliation_string = c.raw_aff_string
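A plain-Python sketch of the inner-join semantics, assuming one MV row per raw string. `MvRow`, `build_lookup`, the sample strings, and the institution IDs (`"I1"`, `"I2"`) are hypothetical and cover only a few of the real columns. It shows that the join drops rows from both sides: an MV string with no works, and a counted string with no MV row.

```python
from typing import Dict, List, NamedTuple


class MvRow(NamedTuple):
    """Hypothetical subset of raw_affiliation_strings_institutions_mv columns."""
    raw_affiliation_string: str
    institution_ids: List[str]


def build_lookup(mv_rows: List[MvRow], works_counts: Dict[str, int]) -> List[dict]:
    """Inner join on the raw string: a row survives only if it exists on both sides."""
    return [
        {
            "raw_affiliation_string": row.raw_affiliation_string,
            "institution_ids_final": row.institution_ids,
            "works_count": works_counts[row.raw_affiliation_string],
        }
        for row in mv_rows
        if row.raw_affiliation_string in works_counts
    ]


mv_rows = [MvRow("MIT", ["I1"]), MvRow("Unused Lab", ["I2"])]  # hypothetical IDs
works_counts = {"MIT": 2, "Harvard": 1}
lookup = build_lookup(mv_rows, works_counts)
print(lookup)  # [{'raw_affiliation_string': 'MIT', 'institution_ids_final': ['I1'], 'works_count': 2}]
```

"Unused Lab" is dropped because no work lists it, and "Harvard" is dropped because the MV has no row for it.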

In [ ]:
%sql
-- Verify rebuild
SELECT
  COUNT(*) AS total_rows,
  COUNT(CASE WHEN SIZE(institution_ids_final) > 0 THEN 1 END) AS rows_with_institutions,
  ROUND(COUNT(CASE WHEN SIZE(institution_ids_final) > 0 THEN 1 END) * 100.0 / COUNT(*), 1) AS pct_with_institutions
FROM openalex.institutions.affiliation_strings_lookup_with_counts
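The verification arithmetic can be mirrored in a few lines of Python, as a rough sketch over a hypothetical list of `institution_ids_final` values. In the SQL, `SIZE(...) > 0` is not true for empty arrays or NULLs, so neither is counted as having institutions. The empty-input guard is an addition in this sketch; the query itself divides by `COUNT(*)` directly. Python's `round` uses round-half-to-even, while Spark's `ROUND` rounds half up, so the two can differ on exact ties.

```python
from typing import List, Optional


def verify(institution_ids_final: List[Optional[List[str]]]) -> dict:
    """Mirror the verification query over a list of institution_ids_final values."""
    total_rows = len(institution_ids_final)
    # SIZE(...) > 0 in the SQL: empty arrays and NULLs are not counted
    rows_with_institutions = sum(1 for ids in institution_ids_final if ids)
    pct = (
        round(rows_with_institutions * 100.0 / total_rows, 1)
        if total_rows  # guard added in this sketch only
        else None
    )
    return {
        "total_rows": total_rows,
        "rows_with_institutions": rows_with_institutions,
        "pct_with_institutions": pct,
    }


print(verify([["I1"], ["I2", "I3"], [], None]))
# {'total_rows': 4, 'rows_with_institutions': 2, 'pct_with_institutions': 50.0}
```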