Estimate elapsed narrative ("fictive") time in passages of English-language fiction using large language models. This library factors Ted Underwood's public GPT-4 experiment into reusable, testable components (preprocessing, prompting, LLM gateway, parsing, evaluation) plus a convenient CLI.
- Chunking & normalization of raw long fiction text (
Preprocessor) - Prompt construction with (optionally shuffled) four Underwood exemplars (
UnderwoodPrompter) or a minimal prompt (Prompter) - Pluggable LLM gateway abstraction (
LLMGateway) with OpenAI implementation + retry logic (OpenAIClient) and an offline mock (DummyGateway) - Robust parsing that extracts summary, reasoning, unit conversion, final minutes, and confidence (
ResponseParser) - High-level synchronous & async estimation (
NarrativeTimeEstimator) - Structured result objects (
TimeEstimate,TimeEstimateBatch) with DataFrame export - Correlation utilities for human vs model comparison (
CorrelationEvaluator) - CLI command
fic-timefor quick experimentation; JSONL output option for pipelines
Minimal (without OpenAI client β only dummy gateway):
pip install fic-timeWith OpenAI support:
pip install "fic-time[openai]"Optional scientific stack (already included in base dependencies: pandas, scipy). Ensure Python β₯ 3.9.
Set your key (required only when using OpenAIClient):
export OPENAI_API_KEY="sk-..."| Component | Purpose | Typical Use |
|---|---|---|
Preprocessor |
Normalize & split long text into size-limited passages | Custom max chars / external sentence splitter integration |
Prompter / UnderwoodPrompter |
Build chat message lists | Few-shot replication / reproducibility (shuffle control) |
LLMGateway |
Abstract async chat() interface | Implement new providers (Anthropic, Azure, etc.) |
OpenAIClient |
OpenAI chat with retry (tenacity) | Production usage |
DummyGateway |
Offline deterministic response | Tests / examples without API calls |
ResponseParser |
Parse raw model output to minutes | Custom formatting experiments |
NarrativeTimeEstimator |
High-level batch estimation (sync & async) | Main entry point |
TimeEstimateBatch |
Container + DataFrame helper | Downstream analysis |
CorrelationEvaluator |
Pearson r (raw/log) | Validation against human annotations |
All symbols are re-exported from fic_time.__init__.
from fic_time import NarrativeTimeEstimator, UnderwoodPrompter, DummyGateway
text = """It was raining again the next morning, a fine curtain of mist..."""
# Offline deterministic demo (no API key needed)
estimator = NarrativeTimeEstimator(gateway=DummyGateway(), prompter=UnderwoodPrompter(shuffle=False))
batch = estimator.estimate(text)
for item in batch.items:
print(item.index, item.total_minutes, item.confidence)from fic_time import NarrativeTimeEstimator, UnderwoodPrompter, OpenAIClient
estimator = NarrativeTimeEstimator(
gateway=OpenAIClient(), # requires OPENAI_API_KEY
prompter=UnderwoodPrompter(shuffle=True), # shuffle exemplars (default)
max_chars=1200,
)
batch = estimator.estimate(open("novel_segment.txt", "r", encoding="utf-8").read())
print(batch.to_dataframe().head())Basic estimation:
fic-time estimate passage.txt --provider openai --model gpt-4o-miniOffline (dummy):
fic-time estimate passage.txt --provider dummyDeterministic (disable exemplar shuffle):
fic-time estimate passage.txt --provider openai --no-shuffleJSON Lines output (for piping / later aggregation):
fic-time estimate passage.txt --provider openai --jsonl > out.jsonlKey options:
--model: OpenAI model name (default: gpt-4o-mini)--max-chars: Passage split threshold--no-shuffle: Keep exemplar order fixed (reproducibility)
Exit code 0 on success; non-estimate subcommand prints help (reserved for future extensions).
For large corpora you may prefer streaming results without holding all passages:
import asyncio
from fic_time import NarrativeTimeEstimator, OpenAIClient, UnderwoodPrompter
async def run(text: str):
est = NarrativeTimeEstimator(gateway=OpenAIClient(), prompter=UnderwoodPrompter())
async for item in est.estimate_async_iter(text):
print(item.index, item.total_minutes)
asyncio.run(run(open("long.txt", "r", encoding="utf-8").read()))UnderwoodPrompter(shuffle=True)(default): builds system + 4 few-shot exemplars + target passage (order randomized each call)UnderwoodPrompter(shuffle=False): stable order β reproducible experimentsPrompter(): minimal single-pass prompt (no exemplars) β faster, potentially weaker performance
Switching is as simple as providing a different prompter to the estimator.
If you want to inspect / debug model formatting:
from fic_time import ResponseParser
raw = "1: A quick summary 2: Reasoning... 3: 1.5 hours => 90 minutes 4: 90 minutes 5: High confidence"
parsed = ResponseParser().parse(raw)
print(parsed.minutes, parsed.confidence)The parser tolerates ranges (60-90 minutes), decimal hours (1.5 hours), and long units (2 weeks).
batch = estimator.estimate(text)
minutes_list = batch.to_minutes_series() # [float | None, ...]
df = batch.to_dataframe() # requires pandas
df.to_csv("estimates.csv", index=False)Each TimeEstimate contains:
indexβ passage orderraw_textβ the passage stringmodel_judgmentβ full multi-step LLM answertotal_minutesβ parsed numeric value (may be None if parsing failed)confidenceβ free-text (e.g., High / Moderate / Low)
Assume a TSV file: segment_id<TAB>minutes matching passage ordering.
import csv
from fic_time import CorrelationEvaluator
human = []
with open("human.tsv", "r", encoding="utf-8") as f:
for row in csv.reader(f, delimiter='\t'):
human.append(float(row[1]))
model_minutes = [m for m in minutes_list if m is not None]
r_log, p_log = CorrelationEvaluator().pearson_log(human, model_minutes)
r_raw, p_raw = CorrelationEvaluator().pearson_raw(human, model_minutes)
print("log r=", r_log, "p=", p_log)The log transform uses log(x + 0.1) mirroring the original notebook to avoid log(0).
Implement another provider by subclassing LLMGateway:
from fic_time import LLMGateway
from typing import Sequence, Dict
class MyGateway(LLMGateway):
async def chat(self, messages: Sequence[Dict[str, str]], model: str) -> str:
# Call your provider here (pseudo-code)
resp = await my_client.generate_chat(messages=messages, model=model)
return resp.content
# Usage
from fic_time import NarrativeTimeEstimator, Prompter
estimator = NarrativeTimeEstimator(gateway=MyGateway(), prompter=Prompter())Contract: chat(messages, model) -> str returns a single textual reply.
- Pin package & OpenAI model versions in
requirements.txt - Use
UnderwoodPrompter(shuffle=False)or set--no-shufflein the CLI - (Optional) Seed Python's random:
import random; random.seed(42)before constructing the prompter - Keep raw model outputs (
--jsonl) if auditing later
- OpenAI calls retry up to 3 times with exponential backoff (
tenacity) - If parsing fails to find a numeric expression,
total_minutesbecomesNone; handle downstream accordingly - For batch operations, one malformed passage does not abort the rest
from fic_time import NarrativeTimeEstimator, OpenAIClient, UnderwoodPrompter
texts = [open(p).read() for p in corpus_paths] # Or pre-split yourself
est = NarrativeTimeEstimator(gateway=OpenAIClient(), prompter=UnderwoodPrompter())
all_minutes = []
for t in texts:
for item in est.estimate(t).items:
all_minutes.append(item.total_minutes)Consider adding rate limiting / sleep if provider quotas apply.
from fic_time import NarrativeTimeEstimator, DummyGateway, Prompter
def test_basic():
est = NarrativeTimeEstimator(gateway=DummyGateway(), prompter=Prompter())
batch = est.estimate("Short passage.")
assert len(batch.items) == 1Why minutes as the canonical unit? Enables direct aggregation and comparison; parser converts larger units.
Does the parser trust model arithmetic? It re-extracts the first numeric + unit pattern and performs its own conversion.
Why not tokenize instead of max char split? Simplicity & model-agnostic; you can pre-chunk with your own logic before passing text.
Can I keep the five-step format but change wording? Provide a custom PromptTemplate and your own Prompter subclass.
MIT License. See LICENSE.
- 0.1.0 β Initial packaged release (structure extracted from experiment notebooks)
Issues / PRs welcome: add new gateways, better parsing heuristics, additional evaluation metrics.
If this toolkit supports published research, cite Ted Underwood's original experiment/blog ( https://github.com/tedunderwood/fictional-time-with-GPT4/tree/main / https://tedunderwood.com/2023/03/19/using-gpt-4-to-measure-the-passage-of-time-in-fiction/ ).
Happy analyzing!