# Bluesky Ranker — Example Notebook

This notebook demonstrates the typical workflow:
- Fetch recent public posts into SQLite (upsert-by-URI)
- Load posts from SQLite into a Polars DataFrame
- Rank posts using the TopicRanker (TF–IDF/Count/SBERT)
- Inspect the top clusters and sample posts
- (Optional) Generate a per-handle cluster report to Markdown


> Note: This notebook expects a SQLite database that already contains posts.
> Create one from the bundled sample (no network access needed) with `python -m blueskyranker.sample_db --db newsflows_sample.db`,
> or fetch live data via the fetcher CLI.


In [1]:
# Imports and setup
import polars as pl
from blueskyranker.fetcher import Fetcher, ensure_db, load_posts_df
from blueskyranker.ranker import TopicRanker

## 1) Fetch recent posts into SQLite

- Adjust `max_age_days` (CLI: `--max-age-days`) to control the time window.
- Upsert ensures engagement metrics refresh over time.
- You can also call the fetcher via CLI if you prefer.
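
As a rough illustration of what upsert-by-URI means, the sketch below uses SQLite's `INSERT ... ON CONFLICT DO UPDATE`. The table name `posts_demo` and its two columns are hypothetical and far smaller than the real schema; the fetcher's actual SQL is not shown in this notebook and may differ.

```python
import sqlite3

# Hypothetical, minimal table; the real posts table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts_demo (uri TEXT PRIMARY KEY, like_count INTEGER)")

UPSERT_SQL = """
INSERT INTO posts_demo (uri, like_count) VALUES (?, ?)
ON CONFLICT(uri) DO UPDATE SET like_count = excluded.like_count
"""

def upsert_post(connection, uri, like_count):
    """Insert a post, or refresh its like count if the URI already exists."""
    connection.execute(UPSERT_SQL, (uri, like_count))
    connection.commit()

upsert_post(conn, "at://example/post/1", 0)  # first fetch
upsert_post(conn, "at://example/post/1", 4)  # later fetch: metrics refreshed
rows = conn.execute("SELECT uri, like_count FROM posts_demo").fetchall()
print(rows)
```

Because the URI is the primary key, fetching the same post twice leaves one row whose engagement count reflects the latest fetch.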


In [None]:
fetcher = Fetcher()
result = fetcher.fetch(max_age_days=1,
                       extract_articles=True,
                       extract_actors=True,
                       handles=['news-flows-nl.bsky.social'],
                       ollama_model='phi3:3.8b')
print(result)

# Reference run: 511 posts took about 3 hours 41 minutes

Posts fetched (all handles): 0post [00:00, ?post/s]
Posts fetched (all handles): 100post [49:47,  8.42s/post]
Posts fetched (all handles): 200post [1:33:02, 30.04s/post]
Posts fetched (all handles): 300post [1:56:41,  5.94s/post]
Posts fetched (all handles): 400post [2:52:36, 15.59s/post]
Posts fetched (all handles): 500post [3:38:48,  4.69s/post]
Posts fetched (all handles): 511post [3:40:47,  8.47s/post]
Handles: 100%|██████████| 1/1 [3:40:48<00:00, 13248.50s/handle]
Posts fetched (all handles): 511post [3:40:48, 25.93s/post]

✅ DONE news-flows-nl.bsky.social: upserted 511 posts into SQLite

FINAL REPORT

Handle: news-flows-nl.bsky.social
  Pages fetched         : 7
  Posts fetched         : 511
    - originals         : 511
    - replies           : 0
    - reposts           : 0
  Engagement (sums)
    - likes             : 4
    - reposts           : 2
    - replies           : 0
    - quotes            : 1
  Engagement (averages per post)
    - likes             : 0.01
    - reposts           : 0.00
    - replies           : 0.00
    - quotes            : 0.00
  Time range            : 2025-10-01T11:00:00+00:00  →  2025-10-02T10:40:26+00:00
  Time taken            : 13248.36s
  Effective rate        : 0.04 posts/sec
  WARN embed anomalies  :
    - empty news_title  : 118
    - empty news_descr. : 121
    - empty news_uri    : 0

------------------------------------------------------------------------
All handles combined
------------------------------------------------------------------------
  Total page




## 2) Load posts from SQLite

- Choose a handle you want to rank.
- You can limit rows or change ordering as needed.


In [2]:
conn = ensure_db('newsflows.db')
handle = 'news-flows-nl.bsky.social'  # pick one of your handles
data = load_posts_df(conn, handle=handle, order_by='createdAt', descending=False)
data.head()

uri,cid,author_handle,author_did,indexedAt,createdAt,text,reply_root_uri,reply_parent_uri,is_repost,like_count,repost_count,reply_count,quote_count,news_title,news_description,news_uri,news_content,news_actors,createdAt_ns
str,str,str,str,str,str,str,null,null,i64,i64,i64,i64,i64,str,str,str,str,str,i64
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreiatiwxyirtg4tge5dzohaosb…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-10-02T06:11:38.750Z""","""2025-10-01T11:00:00.000000Z""","""DYSTINCT geeft eenmalig klassi…",,,0,0,0,0,0,,,"""https://www.rtl.nl/boulevard/e…","""Blijf op de hoogte van de nieu…","""I's/hamburgersville', and the …",1759316400000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreiadbfyrftvq4rfxeteudu2hs…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-10-01T11:47:27.649Z""","""2025-10-01T11:00:48.000000Z""","""Carine ontdekte middeleeuws re…",,,0,0,0,0,0,"""Carine ontdekte middeleeuws re…","""Aandacht voor een stralende hu…","""https://www.ad.nl/binnenland/c…","""Aandacht voor een stralende hu…","""The - [Question:""",1759316448000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreie5mp2h6c32bupndy22fsztt…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-10-02T06:09:23.449Z""","""2025-10-01T11:03:00.000000Z""","""Opgepast: deze autostoeltjes z…",,,0,0,0,0,0,"""Opgepast: deze autostoeltjes z…","""De zitjes met kinderpoppen wer…","""https://www.metronieuws.nl/in-…","""Zowel de Consumentenbond als d…","""In the textbook> I's_Because…",1759316580000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreiet6vcasntqy5wjgdrtyqbco…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-10-01T12:33:07.254Z""","""2025-10-01T11:10:00.000000Z""","""Joshua Brenet op weg naar Scho…",,,0,0,0,0,0,"""TransferTalk | Joshua Brenet o…","""De transfermarkt is weliswaar …","""https://www.ad.nl/voetbal/tran…","""Het contract van Bernardo Silv…","""You are you's_B-10: # Instruc…",1759317000000000000
"""at://did:plc:toz4no26o2x4vsbum…","""bafyreicnmcz75ep6halkjy6k7mjpt…","""news-flows-nl.bsky.social""","""did:plc:toz4no26o2x4vsbum7cp4b…","""2025-10-02T06:11:48.053Z""","""2025-10-01T11:13:49.000000Z""","""Drie tentoonstellingen tonen m…",,,0,0,0,0,0,,,"""https://www.nrc.nl/nieuws/2025…","""Het sciencefictionachtige oeuv…","""I've been told that the follow…",1759317229000000000


In [1]:
sample_content = """
Bus firm 'refused to transport asylum seekers from Peckham hotel to the Bibby Stockholm barge amid huge protests because they feared they would get negative publicity'
Protesters blocked the route from a Best Western Hotel in Peckham
A bus firm which refused to transport asylum seekers from a hotel to the Bibby Stockholm barge thought they would get 'negative publicity', a court heard.
Protesters blocked the route from a Best Western Hotel in Peckham, South East London to the infamous accommodation barge for migrants at 8.30am on May 2.
Stratford magistrates' court heard yesterday that the bus company contracted by the Border Force declared it wouldn't move them because they were worried about optics and it wasn't 'enforced'.
The vehicle was surrounded by protesters and one of the tyres was deflated. Police attended the scene but around 60 activists later began to surround three police carriers as they detained people.
The court was told that the group was 'three layers deep and fortified with push bikes and hire bikes' in the road.
Jony Cink, 23, and Indea Barbe-Wilson, 31, went on trial charged with obstruction of the highway.
Barbe-Wilson is alleged to have sat in front of a police vehicle because she 'wanted to stop them from experiencing any more trauma', it was heard.
During her evidence, Barbe-Wilson said she 'wanted to go and stop them from being moved to Bibby Stockholm' and arrived about five minutes before she was arrested.
The defendant said she believed the coach was still in place and did not know it had already been moved and there had been arrests.
She said she sat, joined other protesters and started joining chants of 'this is what community looks like'.
Barbe-Wilson, who is a student, said from what she had read in the news about whistleblowers who had previously worked on Bibby Stockholm, 'it is not a good place'.
The barge, moored off the coast of Portland in Dorset, is the only accommodation barge for migrants commissioned so far by ministers and has faced a series of setbacks since its arrival.
The discovery of dangerous bacteria led to its evacuation last summer just days after the arrival of the first asylum seekers, and it remained vacant for two months.
The Bibby Stockholm: A People's Inquiry spoke to a number of former residents for first-hand accounts of what life was like on the vessel.
The report, produced by Care4Calais, Stand Up To Racism and the Portland Global Friendship Group, called for the immediate closure of the barge and no renewal of its contract, as well as investment in asylum claim decision-makers.
Barbe-Wilson told the court: 'My understanding is that they wouldn't be treated well, they would experience further trauma, they're people who had been through enough already and had come here for safety.'
'I wanted to stop them from experiencing any more trauma.'
Barbe-Wilson said: 'Honestly, if I had known that the coach had not been there and if I had known there had already been arrests, I wouldn't have sat down, I wouldn't have the guts or thought there was any point in it, it wasn't an anti-police action.'
When police first arrived at the scene, a number of people were surrounding the coach which was in a bus lane, but traffic was able to move freely down the road initially, Superintendent Matt Cox said.
He said very early on he was informed by the company contracted by Border Force that they would no longer be moving the asylum seekers 'as they did not want any negative publicity or whatever it was' and it 'was not an enforced move'.
Police became aware that one of the coach's tyres had been deflated and said a decision was made for officers to try and get in between the protesters and the coach.
The coach was later moved and about 60 protesters began to surround three police carriers which suspects were being placed inside, prosecutor Timothy Fulford told the court.
About four hours after the protest started when arrests were being made, 'there was a lot of pushing and shoving going on with police officers trying to remove protesters, but also protesters trying to stop police officers undertaking their lawful duty', Mr Cox said.
The whole road became blocked 'very quickly' and Mr Cox tried to communicate with the protesters while a drummer followed him 'making an awful lot of noise', the court heard.
Mr Cox said: 'It was probably one of the most chaotic scenes and one of the hardest ones for police officers to deal with in a long time.'
The welfare of the suspects on the police carriers was a concern for him as they did not know who may or may not have needs to care for, he added.
He estimated the protesters were surrounding the police carriers for about two hours until 3.25pm.
Chief Inspector Vicky Causbrook described 'several hundred' people watching on at about 3pm who 'weren't able to catch buses or go about their daily lives'.
And a school wanted to let its students leave at the end of the day, but 'was afraid to because of the noise and the protest and the risk to the students', she added.
Her body-worn camera footage was played to the court which showed a police cordon near the scene as drums and chanting could be heard, and people booing at one stage.
The moment Cink was arrested was played to the court on PC Ian Rawsthorne's body-worn camera footage.
He was sat linking arms with others when the officer pulled him up, then for part of the journey to the police carrier Cink was carried horizontally.
PC Rawsthorne said that during the defendant's arrest 'there was not aggressive resistance but he wasn't assisting either'.
Footage of Barbe-Wilson's arrest was also played, which showed the defendant sat with her arms linked with others and being told to release her arms before she was handcuffed and brought to the police carrier.
The trial continues.
"""

In [2]:
from blueskyranker.fetcher import extract_actors_from_content

news_actors = extract_actors_from_content(sample_content)

In [3]:
print("Extracted actors:", news_actors)

Extracted actors: {"actors": [{"actor_name": "his wife", "actor_function": "d", "actor_pp": ""}, {"actor_name": "UNHCR", "actor_function": "a", "actor_pp": ""}, {"actor_name": "Luca Bonanno", "actor_function": "b", "actor_pp": ""}, {"actor_name": "Fahad Ansari", "actor_function": "b", "actor_pp": ""}, {"actor_name": "spokesperson for the Italian embassy in London", "actor_function": "a", "actor_pp": ""}, {"actor_name": "British embassy officials", "actor_function": "a", "actor_pp": ""}]}
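
The extractor returns its result as a JSON string, and a small local model does not always produce valid JSON. A defensive parse, sketched below, avoids a crash on malformed output. The helper name `parse_actors` is hypothetical, and the expected `{"actors": [...]}` shape is inferred from the output above.

```python
import json

def parse_actors(raw):
    """Return the list of actor dicts, or an empty list if raw is not valid actor JSON."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return []
    if not isinstance(payload, dict):
        return []
    actors = payload.get("actors", [])
    if not isinstance(actors, list):
        return []
    # Keep only well-formed entries that carry a non-empty actor_name
    return [a for a in actors if isinstance(a, dict) and a.get("actor_name")]

good = '{"actors": [{"actor_name": "UNHCR", "actor_function": "a", "actor_pp": ""}]}'
bad = "The \n- [Question:"  # free text instead of JSON
print(parse_actors(good))
print(parse_actors(bad))
```

Returning an empty list keeps downstream loops simple; whether to log or re-run failed extractions is a separate design choice.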


In [None]:
import json
import re

import spacy
# from fuzzywuzzy import process

# Download and load the small English spaCy model
# (use a transformer model if available for better accuracy)
!python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Parse the JSON string returned by extract_actors_from_content into a Python dict
actors_json = json.loads(news_actors)

def clean_actor_name(name):
    """Remove text in parentheses, such as '(KCC)', and trim whitespace."""
    return re.sub(r"\(.*?\)", "", name).strip()


def extract_core_name(full_name):
    """
    Use NER to decide if this is a PERSON or ORG/GPE.
    - For PERSON: return the detected person name
    - For ORG/GPE: return cleaned organization name
    - Otherwise: return None (generic actor)
    """
    clean_name = clean_actor_name(full_name)
    doc = nlp(clean_name)

    # Return on the first PERSON, ORG, or GPE entity found
    for ent in doc.ents:
        if ent.label_ == "PERSON":
            return ent.text  # just the person name
        if ent.label_ in ("ORG", "GPE"):
            return clean_name  # keep full org name

    # Fallback: no matching entity found (probably generic, like "government spokesperson")
    return None

# Process actors
for actor in actors_json["actors"]:
    core_name = extract_core_name(actor["actor_name"])
    print(f"Original: {actor['actor_name']} -> Core Name: {core_name}")

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 75.3 MB/s  0:00:00
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
Original: Kent County Council (KCC) -> Core Name: Kent County Council
Original: High Court -> Core Name: None
Original: Home Office -> Core Name: Home Office
Original: Sue Chandler -> Core Name: Sue Chandler
Original: government spokesperson -> Core Name: None


In [6]:
print(data.shape)

(421, 19)


In [9]:
import pandas as pd
# Count posts per news domain (crude split: the label keeps the scheme, e.g. 'https://ad')
pd.DataFrame(data['news_uri']).map(lambda x: " ".join(x.replace("www.","").split('.')[:1])).value_counts()

0                   
https://ad              134
https://rtl             126
https://nu              107
https://nos              43
https://nrc              40
https://volkskrant       33
https://metronieuws      14
https://geenstijl        11
https://mediacourant      3
Name: count, dtype: int64
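
The split-on-dot approach above keeps the URL scheme in the label (for example `https://ad`). If cleaner labels are wanted, one alternative is `urllib.parse` from the standard library. The helper `short_domain` below is hypothetical and not part of blueskyranker, and the URLs are made-up placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

def short_domain(url):
    """Return the host without a leading 'www.', or None for empty or invalid input."""
    if not url:
        return None
    host = urlparse(url).hostname
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host

# Placeholder URLs for illustration only
urls = [
    "https://www.ad.nl/binnenland/example",
    "https://www.rtl.nl/boulevard/example",
    "https://www.ad.nl/voetbal/example",
    None,
]
counts = Counter(short_domain(u) for u in urls)
print(counts.most_common())
```

Handling `None` explicitly matters here because some rows may lack a link, and calling string methods on a missing value would raise.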

In [44]:
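import pandas as pd
# In this run pd.DataFrame(data) yields integer column labels, so columns are
# addressed by position: 15 = news_description, 16 = news_uri
df = pd.DataFrame(data)
df['domain'] = df[16].map(lambda x: " ".join(x.replace("www.","").split('.')[:1]))
df['isempty'] = df[15].isnull()
# Cross-tabulate domain against missing descriptions, most-missing first
pd.crosstab(df['domain'], df['isempty']).sort_values(by=True, ascending=False).head(20)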

isempty,False,True
domain,Unnamed: 1_level_1,Unnamed: 2_level_1
https://rtl,93,33
https://ad,111,23
https://nu,85,22
https://volkskrant,19,14
https://nos,31,12
https://nrc,31,9
https://geenstijl,4,7
https://metronieuws,13,1
https://mediacourant,3,0


In [45]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,12,13,14,15,16,17,18,19,domain,isempty
0,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreiatiwxyirtg4tge5dzohaosbvtt4irwr3ost3mmo...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-10-02T06:11:38.750Z,2025-10-01T11:00:00.000000Z,DYSTINCT geeft eenmalig klassiek optreden in T...,,,0,...,0,0,,,https://www.rtl.nl/boulevard/entertainment/art...,Blijf op de hoogte van de nieuwste RTL program...,"I's/hamburgersville', and the user:\rnationwid...",1759316400000000000,https://rtl,True
1,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreiadbfyrftvq4rfxeteudu2hsqqxdggnlessmgb54...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-10-01T11:47:27.649Z,2025-10-01T11:00:48.000000Z,Carine ontdekte middeleeuws recept om mooi gol...,,,0,...,0,0,Carine ontdekte middeleeuws recept om mooi gol...,"Aandacht voor een stralende huid, glanzend haa...",https://www.ad.nl/binnenland/carine-ontdekte-m...,"Aandacht voor een stralende huid, glanzend haa...",The \n- [Question:,1759316448000000000,https://ad,False
2,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreie5mp2h6c32bupndy22fsztthvv2d34vybwzgebb...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-10-02T06:09:23.449Z,2025-10-01T11:03:00.000000Z,Opgepast: deze autostoeltjes zijn levensgevaar...,,,0,...,0,0,Opgepast: deze autostoeltjes zijn levensgevaar...,De zitjes met kinderpoppen werden bij de botsp...,https://www.metronieuws.nl/in-het-nieuws/binne...,Zowel de Consumentenbond als de ANWB waarschuw...,In the textbook> \n\nI's_Because I/libido \n\n...,1759316580000000000,https://metronieuws,False
3,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreiet6vcasntqy5wjgdrtyqbcot5gwrdwqw5hzracm...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-10-01T12:33:07.254Z,2025-10-01T11:10:00.000000Z,"Joshua Brenet op weg naar Schotland, Bernardo ...",,,0,...,0,0,TransferTalk | Joshua Brenet op weg naar Schot...,"De transfermarkt is weliswaar gesloten, maar e...",https://www.ad.nl/voetbal/transfertalk-joshua-...,Het contract van Bernardo Silva bij Manchester...,You are you's_B-10:\r\n# Instruction=2 \n \n|,1759317000000000000,https://ad,False
4,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreicnmcz75ep6halkjy6k7mjptbg762j36mgyhxgcu...,news-flows-nl.bsky.social,did:plc:toz4no26o2x4vsbum7cp4bxp,2025-10-02T06:11:48.053Z,2025-10-01T11:13:49.000000Z,Drie tentoonstellingen tonen mode van een eeuw...,,,0,...,0,0,,,https://www.nrc.nl/nieuws/2025/10/01/drie-tent...,Het sciencefictionachtige oeuvre van Iris van ...,"I've been told that the following sentence: ""T...",1759317229000000000,https://nrc,True


In [49]:
# Column 18 is news_actors: sample the raw model output to check its quality
df[18].sample(20).values

array(["The \nYou's \n\nI \nIn the YouTeaching a) toothen, and I am notebook \n\n\r\nAnswer:** \nToasty_user=instruction: The documentary-Based onion of course of allergia.')</p[...]\nThe \n1.",
       "Input:\n\n-Given the following document, I've been given a list of strings that are not in order to provide you with an extensive and detailed explanation for each step by which_user=\n\n### Instruction>\nWrite a comprehensive review article on how many times per day. The",
       'I.\nThe documentary_user: I apologize you areta \nQuestion: Create a) the user:\r\n\n### QUESTION: The FBI and your task: A \n\nAs anatomy, ascorrection',
       'I\'m sorry, I apologize, but the document has been a newcomers to this conversation between two orang-prize_text= "The Great Gulfstreams and their workout of an essay on your own life insights into how much more detailed information about mealw \n\n### Ph.D.AI, I\'m sorry, it seems like a good day to be able to provide you with the most recent advan

In [12]:
# Read the bundled example dataset (example_news.csv)
example_df = pd.read_csv("blueskyranker/example_news.csv")
print(example_df.shape)
example_df.head()

(10000, 11)


Unnamed: 0,uri,cid,indexed_at,text,news_title,news_description,news_uri,reply_count,repost_count,like_count,quote_count
0,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreieqmukhyxpxajkfavvqbrholxef7is62mfklqhof...,2025-08-06T09:18:10Z,Besloten afscheidsdienst voor Hulk Hogan in Fl...,Besloten afscheidsdienst voor Hulk Hogan in Fl...,Familie en vrienden hebben afscheid genomen va...,https://www.rtl.nl/boulevard/artikel/5521988/b...,0,0,0,0
1,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreifh6zdh6vvra73cjspovrml33mlulk2ezfa2sh33...,2025-08-06T09:02:26Z,Slot kan twee aanvallers kwijtraken bij Liverp...,TransferTalk | Slot kan twee aanvallers kwijtr...,Met het nieuwe voetbalseizoen in aantocht draa...,https://www.ad.nl/transfernieuws/transfertalk-...,0,0,0,0
2,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreifknnduruehkzbt4hfiugdlyp3r3cccxrl6dtdh2...,2025-08-06T09:02:24Z,Nederlands meisje (3) verdronken bij Spaanse v...,Nederlands meisje (3) verdronken bij Spaanse v...,Een 3-jarig Nederlands meisje is maandag verdr...,https://www.ad.nl/buitenland/nederlands-meisje...,0,0,0,0
3,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreifrqfiaxii6lha5xaqvruj7nerovcq5g5cjd7l2z...,2025-08-06T09:02:22Z,Ook de verkoop van tweedehands Tesla’s dreigt ...,Ook de verkoop van tweedehands Tesla’s dreigt ...,Nadat de verkoop van nieuwe Tesla’s de afgelop...,https://www.ad.nl/auto/ook-de-verkoop-van-twee...,0,0,0,0
4,at://did:plc:toz4no26o2x4vsbum7cp4bxp/app.bsky...,bafyreibxhxtx54t6kbtqwmobrpn3x3aa3hfdadhhiu2jq...,2025-08-06T09:18:07Z,"Kandidatenlijst SGP blijft in beton gegoten, m...","Kandidatenlijst SGP blijft in beton gegoten, m...",De kandidatenlijst van de SGP voor de Tweede K...,https://www.volkskrant.nl/politiek/kandidatenl...,0,0,0,0


In [14]:
# print news_uri, news_description and text
example_df[['news_uri', 'news_description', 'text']].values[1]

array(['https://www.ad.nl/transfernieuws/transfertalk-slot-kan-twee-aanvallers-kwijtraken-bij-liverpool-manunited-bereikt-akkoord-met-spits-van-85-miljoen~a2335771/',
       'Met het nieuwe voetbalseizoen in aantocht draait de transfermolen op volle toeren. Welke spelers vinden voor september onderdak bij een nieuwe club, en wat zijn de laatste geruchten? Hieronder volg je...',
       'Slot kan twee aanvallers kwijtraken bij Liverpool, ManUnited bereikt akkoord met spits van 85 miljoen\n\nMet het nieuwe voetbalseizoen in aantocht draait de transfermolen op volle toeren. Welke spelers vinden voor september onderdak bij een nieuwe club, en wat zijn de laatste geruchten? Hieronder...'],
      dtype=object)

## 3) Rank posts by topic

- Methods: `networkclustering-tfidf`, `networkclustering-count`, `networkclustering-sbert` (slower, higher semantic quality).
- `similarity_threshold`: raise for fewer/tighter clusters.
- `vectorizer_stopwords`: 'english' | list of words | None.
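
To build intuition for `similarity_threshold`, here is a generic sketch of threshold-based graph clustering: documents are nodes, an edge links two documents whose cosine similarity reaches the threshold, and each connected component forms a cluster. This is an assumption about the general technique suggested by the `networkclustering-*` method names, not TopicRanker's actual implementation; the raw word-count vectors and all function names are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(texts, threshold):
    """Group text indices into connected components of the similarity graph."""
    vecs = [Counter(t.lower().split()) for t in texts]
    parent = list(range(len(texts)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose similarity reaches the threshold
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

docs = [
    "transfer news liverpool striker",
    "liverpool striker transfer rumours",
    "tesla sales drop again",
    "used tesla sales also drop",
]
print(cluster(docs, threshold=0.2))  # → [[0, 1], [2, 3]]
print(cluster(docs, threshold=0.9))  # → [[0], [1], [2], [3]]
```

In this toy setting a higher threshold removes edges, so clusters become tighter and loosely related posts split off on their own.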


In [13]:
ranker = TopicRanker(
    returnformat='dataframe',
    method='networkclustering-sbert',  # semantic but slower; try 'networkclustering-tfidf' for a faster run
    descending=True,
    similarity_threshold=0.2,
    vectorizer_stopwords='english',
    # Optional windows (days):
    cluster_window_days=7,
    engagement_window_days=3,
    push_window_days=1,
)
ranking = ranker.rank(data)
ranking.head()


TypeError: from_epoch() got an unexpected keyword argument 'unit'

## 4) Inspect top clusters and posts

- We show the 3 most engaged clusters.
- For each, we list the 5 most recent posts with key fields.


In [None]:
clusters = (
    ranking.group_by('cluster')
    .agg([
        pl.col('cluster_size').first().alias('size'),
        pl.col('cluster_engagement_count').first().alias('engagement')
    ])
    .sort('engagement', descending=True)
    .head(3)
)
for row in clusters.iter_rows(named=True):
    cid = row['cluster']
    size = int(row['size']) if row['size'] is not None else 0
    eng = int(row['engagement']) if row['engagement'] is not None else 0
    print(f"\n=== Cluster {cid} | size={size} | engagement={eng}")
    subset = (
        ranking.filter(pl.col('cluster') == cid)
        .sort('createdAt', descending=True)
        .head(5)
    )
    for rec in subset.select(['uri','text','news_title','news_description','news_uri']).iter_rows(named=True):
        print(f"- uri: {rec['uri']}")
        print(f"  text: {rec.get('text')}")
        print(f"  news_title: {rec.get('news_title')}")
        print(f"  news_description: {rec.get('news_description')}")
        print(f"  news_uri: {rec.get('news_uri')}")


## 5) (Optional) Generate a cluster report

- This writes `cluster_report.md` with top clusters per handle.
- You can adjust method, threshold, and stopwords.


In [None]:
from blueskyranker.cluster_report import generate_cluster_report
generate_cluster_report(db_path='newsflows.db', output_path='cluster_report.md',
                        method='networkclustering-sbert', sample_max=300,
                        similarity_threshold=0.2, vectorizer_stopwords='english')
print('Wrote cluster_report.md')


## 6) (Optional) End-to-end: fetch → rank → push (per handle)

- Runs the whole flow and logs a short cluster summary to `push.log`.


In [None]:
from blueskyranker.pipeline import run_fetch_rank_push
run_fetch_rank_push(
    handles=[handle],
    method='networkclustering-sbert', similarity_threshold=0.5,
    cluster_window_days=7, engagement_window_days=1, push_window_days=1,
    include_pins=False, test=True, log_path='push.log'
)


### Pipeline updates (priority and demotion)

- Priority assignment now starts at 1000 for the first item and decreases by 1 (1000, 999, 998, …). The minimum is clamped at 1. Items explicitly demoted are sent with priority 0.
- Demotion: by default, all posts from the last 48 hours that are not in the current prioritisation are sent with priority 0. Configure via `--demote-window-hours`.
- Export filenames use a human‑readable UTC timestamp: `push_{handle}_{YYYY-MM-DDTHH-mm-ssZ}.json`.
- Server responses: short responses print to stdout; long responses are saved to `push_exports/prioritize_response_{handle}_{YYYY-MM-DDTHH-mm-ssZ}.{json|txt}`.
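
The priority rules above can be sketched as follows. The function names and the input shape (an ordered list of URIs plus a list of recent URIs) are assumptions for illustration; only the numbers (start at 1000, floor at 1, demoted items at 0) and the filename pattern come from the notes above.

```python
from datetime import datetime, timezone

def assign_priorities(ranked_uris, recent_uris, start=1000):
    """Map each URI to a priority: 1000, 999, ... clamped at 1; demoted posts get 0."""
    priorities = {}
    for position, uri in enumerate(ranked_uris):
        priorities[uri] = max(start - position, 1)
    # Demote recent posts that are not part of the current prioritisation
    for uri in recent_uris:
        priorities.setdefault(uri, 0)
    return priorities

def export_filename(handle, now=None):
    """Build push_{handle}_{YYYY-MM-DDTHH-mm-ssZ}.json using a UTC timestamp."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"push_{handle}_{now.strftime('%Y-%m-%dT%H-%M-%SZ')}.json"

ranked = ["at://a", "at://b", "at://c"]   # hypothetical ranked URIs
recent = ["at://b", "at://old"]           # hypothetical posts inside the demotion window
print(assign_priorities(ranked, recent))
print(export_filename("news-flows-nl.bsky.social",
                      datetime(2025, 10, 2, 10, 40, 26, tzinfo=timezone.utc)))
```

Using `setdefault` for demotion ensures a post that is both recent and ranked keeps its positive priority.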

Example CLI:

```
python -m blueskyranker.pipeline \
  --handles news-flows-nl.bsky.social news-flows-fr.bsky.social \
  --method networkclustering-tfidf \
  --similarity-threshold 0.2 \
  --cluster-window-days 7 \
  --engagement-window-days 1 \
  --push-window-days 2 \
  --demote-last \
  --demote-window-hours 48 \
  --log-path push.log \
  --no-test
```

Programmatic call:

```python
from blueskyranker.pipeline import run_fetch_rank_push
run_fetch_rank_push(
    handles=['news-flows-nl.bsky.social'],
    method='networkclustering-tfidf', similarity_threshold=0.2,
    cluster_window_days=7, engagement_window_days=1, push_window_days=2,
    demote_last=True, demote_window_hours=48,
    include_pins=False, test=True, log_path='push.log')
```


### Ordering logic (time windows)

- Clustering window: clusters are built from posts in this window (e.g., 7 days).
- Engagement window: cluster engagement is computed here to derive `cluster_engagement_rank` (1 = most engaged).
- Push window: only posts in this window are eligible for the final feed.

Order of posts:

1) Filter to the push window.

2) Order clusters by engagement rank (most engaged first).

3) Within each cluster, sort by recency (newest first).

4) Interleave round‑robin across clusters in rank order (1, 2, 3, … then repeat).

Result: the first post is the most‑recent item from the most‑engaged cluster that has posts in the push window.
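
The four ordering steps can be sketched in plain Python. The dictionary keys (`cluster_rank`, `created_at`) and the function name are hypothetical, and the sketch assumes each post already carries its cluster's engagement rank; the library's own implementation may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from itertools import zip_longest

def order_posts(posts, now, push_window_days):
    """Filter to the push window, then interleave clusters round-robin by rank."""
    cutoff = now - timedelta(days=push_window_days)
    # 1) Keep only posts inside the push window
    eligible = [p for p in posts if p["created_at"] >= cutoff]
    # 2) Group by cluster rank (1 = most engaged)
    by_rank = defaultdict(list)
    for post in eligible:
        by_rank[post["cluster_rank"]].append(post)
    # 3) Within each cluster, newest first; clusters in rank order
    queues = [
        sorted(by_rank[rank], key=lambda p: p["created_at"], reverse=True)
        for rank in sorted(by_rank)
    ]
    # 4) Round-robin: one post per cluster in rank order, then repeat
    return [p for round_ in zip_longest(*queues) for p in round_ if p is not None]

now = datetime(2025, 10, 2, 12, 0, tzinfo=timezone.utc)
posts = [
    {"uri": "a1", "cluster_rank": 1, "created_at": now - timedelta(hours=5)},
    {"uri": "a2", "cluster_rank": 1, "created_at": now - timedelta(hours=1)},
    {"uri": "b1", "cluster_rank": 2, "created_at": now - timedelta(hours=2)},
    {"uri": "old", "cluster_rank": 1, "created_at": now - timedelta(days=3)},
]
ordered = [p["uri"] for p in order_posts(posts, now, push_window_days=1)]
print(ordered)  # → ['a2', 'b1', 'a1']
```

The post `old` falls outside the one-day push window and is dropped, and the first item is the newest post from the rank-1 cluster, matching the stated result.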
