# Bluesky Ranker — Example Notebook

This notebook demonstrates the typical workflow:
- Fetch recent public posts into SQLite (upsert-by-URI)
- Load posts from SQLite into a Polars DataFrame
- Rank posts using the TopicRanker (TF–IDF/Count/SBERT)
- Inspect the top clusters and sample posts
- (Optional) Generate a per-handle cluster report to Markdown


> Note: This notebook expects a SQLite DB with posts.
Create one via the sample (no network): `python -m blueskyranker.sample_db --db newsflows_sample.db`
or fetch live data via the fetcher CLI.


In [1]:
# Imports and setup
import polars as pl
from blueskyranker.fetcher import Fetcher, ensure_db, load_posts_df
from blueskyranker.ranker import TopicRanker

## 1) Fetch recent posts into SQLite

- Adjust `max_age_days` (`--max-age-days` on the CLI) to control the time window.
- Upsert ensures engagement metrics refresh over time.
- You can also call the fetcher via CLI if you prefer.
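
To illustrate what upsert-by-URI means, here is a minimal sketch using only the standard-library `sqlite3` module. The three-column `posts` table and the `upsert_posts` helper are hypothetical simplifications for this example; the fetcher's real schema has many more columns and its actual SQL may differ.

```python
import sqlite3

# Hypothetical, simplified schema: the real posts table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (uri TEXT PRIMARY KEY, text TEXT, like_count INTEGER)"
)

UPSERT_SQL = """
INSERT INTO posts (uri, text, like_count)
VALUES (:uri, :text, :like_count)
ON CONFLICT(uri) DO UPDATE SET
    text = excluded.text,
    like_count = excluded.like_count
"""

def upsert_posts(conn, posts):
    """Insert new posts; refresh the stored fields of posts already seen."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, posts)

# First fetch: the post has no likes yet.
upsert_posts(conn, [{"uri": "at://example/post/1", "text": "hello", "like_count": 0}])
# Later fetch of the same URI: the row is updated, not duplicated.
upsert_posts(conn, [{"uri": "at://example/post/1", "text": "hello", "like_count": 5}])

rows = conn.execute("SELECT uri, like_count FROM posts").fetchall()
print(rows)  # [('at://example/post/1', 5)]
```

Because the URI is the primary key, re-fetching the same time window is safe: existing rows only have their engagement counts refreshed.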


In [3]:
fetcher = Fetcher()
result = fetcher.fetch(max_age_days=3, 
                       extract_articles=True, 
                       handles=['news-flows-nl.bsky.social'])  
print(result)

# 5 min for 438 posts

Posts fetched (all handles): 0post [00:00, ?post/s]
Posts fetched (all handles): 100post [01:19,  1.23post/s]
Posts fetched (all handles): 200post [02:48,  1.15post/s]
get_author_feed failed for news-flows-nl.bsky.social (cursor=2025-11-09T19:59:53Z) attempt 1/3: 
get_author_feed failed for news-flows-nl.bsky.social (cursor=2025-11-09T19:59:53Z) attempt 2/3: 
get_author_feed failed for news-flows-nl.bsky.social (cursor=2025-11-09T19:59:53Z) attempt 3/3: 
Aborting fetch for news-flows-nl.bsky.social after 3 failed attempts

                                                         
Handles: 100%|██████████| 1/1 [03:07<00:00, 187.94s/handle]
Posts fetched (all handles): 200post [03:07,  1.06post/s]

✅ DONE news-flows-nl.bsky.social: upserted 200 posts into SQLite

FINAL REPORT

Handle: news-flows-nl.bsky.social
  Pages fetched         : 2
  Posts fetched         : 200
    - originals         : 200
    - replies           : 0
    - reposts           : 0
  Engagement (sums)
    - likes             : 0
    - reposts           : 0
    - replies           : 1
    - quotes            : 2
  Engagement (averages per post)
    - likes             : 0.00
    - reposts           : 0.00
    - replies           : 0.01
    - quotes            : 0.01
  Time range            : 2025-11-09T19:59:53+00:00  →  2025-11-10T10:33:00+00:00
  Time taken            : 187.93s
  Effective rate        : 1.06 posts/sec
  WARN embed anomalies  :
    - empty news_title  : 0
    - empty news_descr. : 2
    - empty news_uri    : 0

------------------------------------------------------------------------
All handles combined
------------------------------------------------------------------------
  Total pages     




## 2) Load posts from SQLite

- Choose a handle you want to rank.
- You can limit rows or change ordering as needed.


In [2]:
import os
os.getcwd()

# change into the blueskyranker directory
os.chdir('blueskyranker')

In [3]:
conn = ensure_db('newsflows.db')
handle = 'news-flows-nl.bsky.social'  # pick one of your handles
data = load_posts_df(conn, handle = handle, order_by='createdAt', descending=False)
data.head()

uri,cid,author_handle,author_did,indexedAt,createdAt,text,reply_root_uri,reply_parent_uri,is_repost,like_count,repost_count,reply_count,quote_count,news_title,news_description,news_uri,news_content,createdAt_ns
str,str,str,str,str,str,str,null,null,i64,i64,i64,i64,i64,str,str,str,str,i64
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreihtsw5y6eq4vpy524gi2gwcc…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-11-09T20:31:07.460Z""","""2025-11-09T19:59:53.000000Z""","""Vliegverkeer luchthaven Luik w…",,,0,0,0,0,0,"""Vliegverkeer luchthaven Luik w…","""Vliegverkeer luchthaven Luik d…","""https://www.metronieuws.nl/in-…","""Het vliegverkeer bij de luchth…",1762718393000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreid23kqcxoeu256q34dulh3hd…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-11-09T20:22:20.960Z""","""2025-11-09T20:00:39.000000Z""","""Wat moet die paashaas hier in …",,,0,0,0,0,0,"""Wat moet die paashaas hier in …","""Renske Kruitbosch schrijft twe…","""https://www.ad.nl/binnenland/w…","""Eén infuus in Milaan redde het…",1762718439000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreicjjljc7yvg2svkk7d2k3g43…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-11-09T20:16:58.159Z""","""2025-11-09T20:02:09.000000Z""","""Til maakt indruk bij PSV met '…",,,0,0,0,0,0,"""Til maakt indruk bij PSV met '…","""Guus Til kwam begin oktober pe…","""https://www.nu.nl/voetbal/6375…","""Guus Til kwam begin oktober pe…",1762718529000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreihoheekt6tvdgbsrurr6xtwd…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-11-09T20:17:03.257Z""","""2025-11-09T20:02:33.000000Z""","""Slot hekelt afgekeurde goal Va…",,,0,0,0,0,0,"""Slot hekelt afgekeurde goal Va…","""Arne Slot baalt van de afgekeu…","""https://www.nu.nl/voetbal/6375…","""9 nov 2025 om 21:02Update: 2 u…",1762718553000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreigsq7j275udlaqvwav5i4cvv…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-11-09T20:47:07.558Z""","""2025-11-09T20:05:17.000000Z""","""Ondanks knap staaltje schadebe…",,,0,0,0,0,0,"""Ondanks knap staaltje schadebe…","""Formule 1: In São Paulo haalde…","""https://www.nrc.nl/nieuws/2025…","""„We hebben het in elk geval ge…",1762718717000000000


In [5]:
import pandas as pd
df = pd.DataFrame(data, columns=data.columns)
print(df.shape)
df.head()

(200, 19)


Unnamed: 0,uri,cid,author_handle,author_did,indexedAt,createdAt,text,reply_root_uri,reply_parent_uri,is_repost,like_count,repost_count,reply_count,quote_count,news_title,news_description,news_uri,news_content,createdAt_ns
0,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreihtsw5y6eq4vpy524gi2gwccwkkwqslgm64dg6lx...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-11-09T20:31:07.460Z,2025-11-09T19:59:53.000000Z,Vliegverkeer luchthaven Luik weer stilgelegd o...,,,0,0,0,0,0,Vliegverkeer luchthaven Luik weer stilgelegd o...,Vliegverkeer luchthaven Luik door drones opnie...,https://www.metronieuws.nl/in-het-nieuws/buite...,Het vliegverkeer bij de luchthaven van Luik is...,1762718393000000000
1,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreid23kqcxoeu256q34dulh3hdpnv2uio5zneei7p4...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-11-09T20:22:20.960Z,2025-11-09T20:00:39.000000Z,Wat moet die paashaas hier in de lente nog?\n\...,,,0,0,0,0,0,Wat moet die paashaas hier in de lente nog?,Renske Kruitbosch schrijft twee keer per week ...,https://www.ad.nl/binnenland/wat-moet-die-paas...,Eén infuus in Milaan redde het leven van Mees ...,1762718439000000000
2,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreicjjljc7yvg2svkk7d2k3g435dkpnhczdmq67dtl...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-11-09T20:16:58.159Z,2025-11-09T20:02:09.000000Z,Til maakt indruk bij PSV met 'bijzondere goal'...,,,0,0,0,0,0,Til maakt indruk bij PSV met 'bijzondere goal'...,Guus Til kwam begin oktober per toeval in de s...,https://www.nu.nl/voetbal/6375399/til-maakt-in...,Guus Til kwam begin oktober per toeval in de s...,1762718529000000000
3,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreihoheekt6tvdgbsrurr6xtwdvjwpy45zxhezvdwm...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-11-09T20:17:03.257Z,2025-11-09T20:02:33.000000Z,Slot hekelt afgekeurde goal Van Dijk tegen Cit...,,,0,0,0,0,0,Slot hekelt afgekeurde goal Van Dijk tegen Cit...,Arne Slot baalt van de afgekeurde goal van Vir...,https://www.nu.nl/voetbal/6375398/slot-hekelt-...,9 nov 2025 om 21:02Update: 2 uur geleden\n\nAr...,1762718553000000000
4,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreigsq7j275udlaqvwav5i4cvvum4cl7szyx4xd77a...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-11-09T20:47:07.558Z,2025-11-09T20:05:17.000000Z,Ondanks knap staaltje schadebeperking lijkt ti...,,,0,0,0,0,0,Ondanks knap staaltje schadebeperking lijkt ti...,Formule 1: In São Paulo haalde Max Verstappen ...,https://www.nrc.nl/nieuws/2025/11/09/ondanks-k...,„We hebben het in elk geval geprobeerd.” Aan M...,1762718717000000000


In [8]:
from blueskyranker.actor_annotator import ActorAnnotator
annotator = ActorAnnotator(model_name="gpt-oss:20b", seed=0)
print("Actor annotator initialized successfully!")

print("Testing with 20 articles first...")
test_df = df.sample(n=20, random_state=42)

test_annotated = annotator.process_dataframe(
    df=test_df, 
    text_column='news_content', 
    title_column='news_title',
    id_column='uri',
    timeout_seconds=60
)

Actor annotator initialized successfully!
Testing with 20 articles first...
Checking availability of model: gpt-oss:20b
Model gpt-oss:20b already available
Processing 20 articles...


Extracting actors:   5%|▌         | 1/20 [02:00<38:02, 120.11s/it]

Error during extraction: timed out


Extracting actors:  10%|█         | 2/20 [02:52<24:07, 80.39s/it] 

Empty actors array found


Extracting actors:  15%|█▌        | 3/20 [04:52<27:55, 98.54s/it]

Error during extraction: timed out


Extracting actors:  45%|████▌     | 9/20 [07:36<05:16, 28.73s/it]

Empty actors array found


Extracting actors:  50%|█████     | 10/20 [07:48<03:54, 23.50s/it]

Empty actors array found


Extracting actors:  55%|█████▌    | 11/20 [08:08<03:22, 22.48s/it]

Empty actors array found


Extracting actors:  60%|██████    | 12/20 [08:25<02:46, 20.81s/it]

Empty actors array found


Extracting actors:  70%|███████   | 14/20 [10:35<03:50, 38.46s/it]

Empty actors array found


Extracting actors: 100%|██████████| 20/20 [17:25<00:00, 52.25s/it]

Error during extraction: timed out





In [9]:
test_annotated.head()

Unnamed: 0,uri,full_text,news_actors,raw_response
95,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,Norris zet in Brazilië grote stap naar titel: ...,,
15,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,Afvaller was klaar met gekonkel op eiland van ...,"```json\n{\n ""actors"": []\n}\n```",
30,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,Het aanvalsduo van PSV ‘Til-Saibari’ domineert...,,
158,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,Humberto Tan over RTL Tonight: ‘Tot nu toe gaa...,"```json\n{\n ""actors"": [\n {\n ""actor...","{""actors"": [{""actor_name"": ""Humberto Tan"", ""ac..."
128,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,Lewis Hamilton ziet keerzijde van zijn Ferrari...,"```json\n{\n ""actors"": [\n {\n ""actor...","{""actors"": [{""actor_name"": ""Lewis Hamilton"", ""..."


In [10]:
test_annotated.raw_response.value_counts(dropna=False)

raw_response
None                                                                                                                                                                                                                                                                                                                                                                                                                        6
                                                                                                                                                                                                                                                                                                                                                                                                                            3
{"actors": [{"actor_name": "Humberto Tan", "actor_function": "b", "actor_pp": ""}, {"actor_name": "Carlo van Lienden", "actor_function": "b", "actor_pp": ""}, 

In [16]:
# Expand test results to actor-level
from blueskyranker.enrich_actors import ActorEnricher

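enricher = ActorEnricher(actor_df=test_annotated, id_column='uri')
# The recorded call took no arguments and raised the TypeError shown below;
# the error message asks for `actor_df` and `id_column`, so pass them explicitly.
test_actors_df = enricher.expand_actors_to_rows(actor_df=test_annotated, id_column='uri')
test_actors_df.head()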

2025-11-11 10:32:19 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.11.0.json: 435kB [00:00, 34.8MB/s]                    
2025-11-11 10:32:20 INFO: Downloaded file to /Users/elifkilik/stanza_resources/resources.json
2025-11-11 10:32:20 INFO: Loading these models for language: en (English):
| Processor | Package                   |
-----------------------------------------
| tokenize  | combined                  |
| mwt       | combined                  |
| ner       | ontonotes-ww-multi_charlm |

2025-11-11 10:32:20 INFO: Using device: cpu
2025-11-11 10:32:20 INFO: Loading: tokenize
2025-11-11 10:32:20 INFO: Loading: mwt
2025-11-11 10:32:20 INFO: Loading: ner
2025-11-11 10:32:22 INFO: Done loading processors!


TypeError: ActorEnricher.expand_actors_to_rows() missing 2 required positional arguments: 'actor_df' and 'id_column'

In [10]:
test_actors_df.actor_function.value_counts()

actor_function
b    5
a    4
     4
Name: count, dtype: int64

In [13]:
import stanza
from blueskyranker.actor_annotator import extract_core_name
nlp = stanza.Pipeline(lang='nl', processors='tokenize,ner')
test_actors_df['core_actor_name'] = test_actors_df['actor_name'].apply(extract_core_name, nlp=nlp)

2025-10-20 13:04:43 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.11.0.json: 436kB [00:00, 36.2MB/s]                    
2025-10-20 13:04:44 INFO: Downloaded file to /Users/elifkilik/stanza_resources/resources.json
2025-10-20 13:04:44 INFO: Loading these models for language: nl (Dutch):
| Processor | Package |
-----------------------
| tokenize  | alpino  |
| mwt       | alpino  |
| ner       | conll02 |

2025-10-20 13:04:44 INFO: Using device: cpu
2025-10-20 13:04:44 INFO: Loading: tokenize
2025-10-20 13:04:44 INFO: Loading: mwt
2025-10-20 13:04:44 INFO: Loading: ner
2025-10-20 13:04:45 INFO: Done loading processors!


In [14]:
# print each actor with function 'a' next to the core name extracted by NER
for i in test_actors_df[test_actors_df['actor_function'] == 'a'].index:
    print("Actor Names:", test_actors_df.at[i, 'actor_name'])
    print("Actor NER:", test_actors_df.at[i, 'core_actor_name'])
    print("\n" + "="*80 + "\n")

Actor Names: kolonel Mickaël Randrianirina
Actor NER: Mickaël Randrianirina


Actor Names: Hooggerechtshof in Madagaskar
Actor NER: None


Actor Names: Gouverneur Oleh Syniehubov van de regio
Actor NER: Oleh Syniehubov


Actor Names: de aanklager
Actor NER: None




In [None]:
# Requires: pip install SPARQLWrapper requests pandas
from SPARQLWrapper import SPARQLWrapper, JSON
import requests
import pandas as pd

WDQS = "https://query.wikidata.org/sparql"
HEADERS = {"User-Agent": "PartyLookup/0.1 (your-email@example.com)"}

def query_sparql(sparql):
    sparqlw = SPARQLWrapper(WDQS, agent=HEADERS["User-Agent"])
    sparqlw.setQuery(sparql)
    sparqlw.setReturnFormat(JSON)
    return sparqlw.query().convert()

def search_wikidata(name, language="en"):
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "format": "json",
        "limit": 1
    }
    # A timeout keeps a stalled request from hanging the whole apply() loop
    resp = requests.get("https://www.wikidata.org/w/api.php", params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    hits = resp.json().get("search", [])
    return hits[0]["id"] if hits else None

def get_latest_party_name(name, language="en"):
    qid = search_wikidata(name, language=language)
    if not qid:
        return None
    
    sparql = f"""
    SELECT ?partyLabel ?start ?end WHERE {{
      VALUES ?person {{ wd:{qid} }}
      ?person p:P102 ?stmt .
      ?stmt ps:P102 ?party .
      OPTIONAL {{ ?stmt pq:P580 ?start. }}
      OPTIONAL {{ ?stmt pq:P582 ?end. }}
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language},en". }}
    }}
    """
    results = query_sparql(sparql)
    df = pd.DataFrame([{
        "party": r["partyLabel"]["value"],
        "start": r.get("start", {}).get("value"),
        "end": r.get("end", {}).get("value"),
    } for r in results["results"]["bindings"]])
    
    if df.empty:
        return None
    
    # Sort newest first by start date, then by end date; undated rows (NaT) sort last
    df['start'] = pd.to_datetime(df['start'], errors='coerce')
    df['end'] = pd.to_datetime(df['end'], errors='coerce')
    
    df = df.sort_values(by=['start', 'end'], ascending=[False, False]).reset_index(drop=True)
    
    return df['party'].iloc[0]


In [None]:
test_actors_df['party'] = test_actors_df['core_actor_name'].apply(lambda x: get_latest_party_name(x) if pd.notna(x) else None)

In [None]:
# print the core name and looked-up party for each actor with function 'a'
for i in test_actors_df[test_actors_df['actor_function'] == 'a'].index:
    print("Actor Names:", test_actors_df.at[i, 'core_actor_name'])
    print("Actor party:", test_actors_df.at[i, 'party'])
    print("\n" + "="*80 + "\n")

In [None]:
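# `actor_df` is not defined earlier in this notebook; as an assumption,
# reuse the expanded test sample so the following cells can run.
actor_df = test_actors_df
actor_df.actor_function.value_counts(dropna=False)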

In [None]:
# Look up a party for actors with function 'a', or with no recorded function
def lookup_party(row):
    name = row['core_actor_name']
    if pd.isna(name):  # also covers None
        return None
    func = row['actor_function']
    if pd.isna(func) or func.lower() == 'a':
        return get_latest_party_name(name)
    return None

actor_df['actor_wikiparty'] = actor_df.apply(lookup_party, axis=1)

In [None]:
actor_df.head()

In [None]:
actor_df.actor_wikiparty.value_counts(dropna=False)

In [None]:
actor_df[actor_df['actor_function'] == 'a'].actor_name.value_counts(dropna=False)

In [None]:
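# Columns are named, not positional: position 16 is news_uri, position 15 is news_description
df['domain'] = df['news_uri'].map(lambda x: " ".join(x.replace("www.","").split('.')[:1]))
df['isempty'] = df['news_description'].isnull()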
pd.crosstab(df['domain'], df['isempty']).sort_values(by=True, ascending=False).head(20)

In [None]:
# read example_news.csv
example_df = pd.read_csv("blueskyranker/example_news.csv")
print(example_df.shape)
example_df.head()

In [None]:
# print news_uri, news_description and text
example_df[['news_uri', 'news_description', 'text']].values[1]

## 3) Rank posts by topic

- Methods: `networkclustering-tfidf`, `networkclustering-count`, `networkclustering-sbert` (slower, higher semantic quality).
- `similarity_threshold`: raise it to link fewer posts, which gives tighter clusters.
- `vectorizer_stopwords`: 'english' | list of words | None.
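
To make the role of `similarity_threshold` concrete, here is a minimal sketch of threshold-based network clustering in plain Python. It is an illustration only, not `TopicRanker`'s implementation: it uses raw word counts with cosine similarity, links two texts when their similarity reaches the threshold, and treats each connected component as a cluster. The function names `cosine` and `cluster_by_threshold` are hypothetical.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_by_threshold(texts, similarity_threshold):
    """Link texts whose similarity reaches the threshold; label connected components."""
    vectors = [Counter(t.lower().split()) for t in texts]
    parent = list(range(len(texts)))  # union-find forest, one node per text

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= similarity_threshold:
                parent[find(j)] = find(i)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(texts))]

texts = [
    "psv wins the match",
    "psv wins again",
    "drones close liege airport",
    "liege airport closed again",
]
loose = cluster_by_threshold(texts, 0.2)
medium = cluster_by_threshold(texts, 0.4)
strict = cluster_by_threshold(texts, 0.6)
print(loose, medium, strict)
```

In this toy data, a threshold of 0.2 chains all four texts into one cluster through the shared word "again", 0.4 separates the football and airport stories, and 0.6 leaves every text on its own.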


In [None]:
ranker = TopicRanker(
    returnformat='dataframe',
    method='networkclustering-sbert',  # slower, more semantic; try 'networkclustering-tfidf' for a faster lexical baseline
    descending=True,
    similarity_threshold=0.2,
    vectorizer_stopwords='english',
    # Optional windows (days):
    cluster_window_days=7,
    engagement_window_days=3,
    push_window_days=1,
)
ranking = ranker.rank(data)
ranking.head()


## 4) Inspect top clusters and posts

- We show the 3 most engaged clusters.
- For each, we list the 5 most recent posts with key fields.


In [None]:
clusters = (
    ranking.group_by('cluster')
    .agg([
        pl.col('cluster_size').first().alias('size'),
        pl.col('cluster_engagement_count').first().alias('engagement')
    ])
    .sort('engagement', descending=True)
    .head(3)
)
for row in clusters.iter_rows(named=True):
    cid = row['cluster']
    size = int(row['size']) if row['size'] is not None else 0
    eng = int(row['engagement']) if row['engagement'] is not None else 0
    print(f"\n=== Cluster {cid} | size={size} | engagement={eng}")
    subset = (
        ranking.filter(pl.col('cluster') == cid)
        .sort('createdAt', descending=True)
        .head(5)
    )
    for rec in subset.select(['uri','text','news_title','news_description','news_uri']).iter_rows(named=True):
        print(f"- uri: {rec['uri']}")
        print(f"  text: {rec.get('text')}")
        print(f"  news_title: {rec.get('news_title')}")
        print(f"  news_description: {rec.get('news_description')}")
        print(f"  news_uri: {rec.get('news_uri')}")


## 5) (Optional) Generate a cluster report

- This writes `cluster_report.md` with top clusters per handle.
- You can adjust method, threshold, and stopwords.


In [None]:
from blueskyranker.cluster_report import generate_cluster_report
generate_cluster_report(db_path='newsflows.db', output_path='cluster_report.md',
                        method='networkclustering-sbert', sample_max=300,
                        similarity_threshold=0.2, vectorizer_stopwords='english')
print('Wrote cluster_report.md')


## 6) (Optional) End-to-end: fetch → rank → push (per handle)

- Runs the whole flow and logs a short cluster summary to `push.log`.


In [None]:
from blueskyranker.pipeline import run_fetch_rank_push
run_fetch_rank_push(
    handles=[handle],
    method='networkclustering-sbert', similarity_threshold=0.5,
    cluster_window_days=7, engagement_window_days=1, push_window_days=1,
    include_pins=False, test=True, log_path='push.log'
)


### Pipeline updates (priority and demotion)

- Priority assignment now starts at 1000 for the first item and decreases by 1 (1000, 999, 998, …). The minimum is clamped at 1. Items explicitly demoted are sent with priority 0.
- Demotion: by default, all posts from the last 48 hours that are not in the current prioritisation are sent with priority 0. Configure via `--demote-window-hours`.
- Export filenames use a human‑readable UTC timestamp: `push_{handle}_{YYYY-MM-DDTHH-mm-ssZ}.json`.
- Server responses: short responses print to stdout; long responses are saved to `push_exports/prioritize_response_{handle}_{YYYY-MM-DDTHH-mm-ssZ}.{json|txt}`.
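
The priority and demotion rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's code: `assign_priorities`, `build_push_payload`, and `export_filename` are hypothetical helper names, and posts are represented by bare URI strings.

```python
from datetime import datetime, timezone

def assign_priorities(ranked_uris, start=1000, floor=1):
    """First item gets `start`, each next item one less, never below `floor`."""
    return {uri: max(start - i, floor) for i, uri in enumerate(ranked_uris)}

def build_push_payload(ranked_uris, recent_uris):
    """Rank the prioritised posts; demote recent posts that were not ranked."""
    priorities = assign_priorities(ranked_uris)
    for uri in recent_uris:
        priorities.setdefault(uri, 0)  # priority 0 marks an explicit demotion
    return priorities

def export_filename(handle, when):
    """Build the export name with a UTC timestamp (YYYY-MM-DDTHH-mm-ssZ)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"push_{handle}_{stamp}.json"

payload = build_push_payload(ranked_uris=["a", "b", "c"], recent_uris=["b", "x"])
print(payload)  # {'a': 1000, 'b': 999, 'c': 998, 'x': 0}

# With more than 1000 ranked items, the tail is clamped at 1.
long_tail = assign_priorities([f"post-{i}" for i in range(1002)])
print(long_tail["post-999"], long_tail["post-1001"])  # 1 1

name = export_filename(
    "news-flows-nl.bsky.social",
    datetime(2025, 11, 10, 10, 33, 0, tzinfo=timezone.utc),
)
print(name)  # push_news-flows-nl.bsky.social_2025-11-10T10-33-00Z.json
```

Here `recent_uris` stands in for the posts inside the demotion window (48 hours by default); anything in that set that is already ranked keeps its positive priority.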

Example CLI:

```
python -m blueskyranker.pipeline \
  --handles news-flows-nl.bsky.social news-flows-fr.bsky.social \
  --method networkclustering-tfidf \
  --similarity-threshold 0.2 \
  --cluster-window-days 7 \
  --engagement-window-days 1 \
  --push-window-days 2 \
  --demote-last \
  --demote-window-hours 48 \
  --log-path push.log \
  --no-test
```

Programmatic call:

```python
from blueskyranker.pipeline import run_fetch_rank_push
run_fetch_rank_push(
    handles=['news-flows-nl.bsky.social'],
    method='networkclustering-tfidf', similarity_threshold=0.2,
    cluster_window_days=7, engagement_window_days=1, push_window_days=2,
    demote_last=True, demote_window_hours=48,
    include_pins=False, test=True, log_path='push.log')
```


### Ordering logic (time windows)

- Clustering window: clusters are built from posts in this window (e.g., 7 days).
- Engagement window: cluster engagement is computed here to derive `cluster_engagement_rank` (1 = most engaged).
- Push window: only posts in this window are eligible for the final feed.

Order of posts:

1) Filter to the push window.

2) Order clusters by engagement rank (most engaged first).

3) Within each cluster, sort by recency (newest first).

4) Interleave round‑robin across clusters in rank order (1, 2, 3, … then repeat).

Result: the first post is the most‑recent item from the most‑engaged cluster that has posts in the push window.
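
The four ordering steps can be sketched as follows. This is a minimal illustration, assuming each post is a dict with a `cluster_engagement_rank` (as described above) plus hypothetical `uri` and `created_at` keys; `order_for_push` is not a function from the package.

```python
from datetime import datetime, timedelta, timezone
from itertools import zip_longest

_SKIP = object()  # filler for clusters that run out of posts

def order_for_push(posts, now, push_window_days):
    # 1) Keep only posts inside the push window.
    cutoff = now - timedelta(days=push_window_days)
    eligible = [p for p in posts if p["created_at"] >= cutoff]

    # 2) Group by cluster, keyed by engagement rank (1 = most engaged).
    by_rank = {}
    for p in eligible:
        by_rank.setdefault(p["cluster_engagement_rank"], []).append(p)

    # 3) Within each cluster, newest first; clusters in rank order.
    queues = [
        sorted(by_rank[rank], key=lambda p: p["created_at"], reverse=True)
        for rank in sorted(by_rank)
    ]

    # 4) Round-robin: one post per cluster per round, skipping exhausted clusters.
    return [
        p
        for round_ in zip_longest(*queues, fillvalue=_SKIP)
        for p in round_
        if p is not _SKIP
    ]

now = datetime(2025, 11, 10, 12, 0, tzinfo=timezone.utc)

def post(uri, rank, hours_ago):
    return {"uri": uri, "cluster_engagement_rank": rank,
            "created_at": now - timedelta(hours=hours_ago)}

posts = [
    post("a2", 1, 5), post("a1", 1, 1), post("a_old", 1, 72),
    post("b1", 2, 2),
    post("c2", 3, 4), post("c1", 3, 0.5),
]
feed = [p["uri"] for p in order_for_push(posts, now, push_window_days=1)]
print(feed)  # ['a1', 'b1', 'c1', 'a2', 'c2']
```

`a_old` is dropped because it falls outside the one-day push window, and the second round skips cluster 2, which has no posts left.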
