Srcfull

srcfull is a package-first toolkit for extracting and upgrading web image URLs.

It is designed as a standalone library and CLI for image extraction and source resolution. The focus is:

- extract image candidates from HTML
- filter obvious junk like logos and icons
- resolve CDN/transformed URLs back to larger originals
- probe likely source variants when no curated pattern exists
- optionally plug in HTML fetchers like ScrapingBee and fallback image providers like Firecrawl
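
To make "resolving a transformed URL back to a larger original" concrete, here is a minimal sketch of one common heuristic: dropping resize-related query parameters. This is an illustration only, not srcfull's actual implementation; the `stripResizeParams` helper and the `RESIZE_PARAMS` list are hypothetical, and real CDNs often encode sizes in the path instead of the query string.

```typescript
// Hypothetical list of query parameters that commonly request a resized
// or recompressed variant. Not taken from srcfull.
const RESIZE_PARAMS = ["w", "h", "width", "height", "q", "quality", "fit", "dpr"];

// Return the URL with known resize parameters removed, leaving any other
// query parameters (cache busters, signatures, and so on) untouched.
function stripResizeParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const name of RESIZE_PARAMS) {
    url.searchParams.delete(name);
  }
  return url.toString();
}

console.log(stripResizeParams("https://cdn.example.com/image.jpg?w=400&q=80"));
// → https://cdn.example.com/image.jpg
```

A candidate produced this way still has to be validated against the network, since some hosts reject requests that omit their sizing parameters.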

It handles the page-shape problems that usually make this kind of package annoying in practice:

- relative image paths resolved against the page URL
- lazy-loaded image attributes like data-src, data-srcset, and data-original
- img srcset, picture source, inline background images, and social/meta image tags
- private-host blocking for both page scraping and image validation
- HEAD fallback to ranged GET for hosts that refuse metadata requests
- persistent file-backed cache/pattern stores for repeat runs
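
As an illustration of what private-host blocking involves, the sketch below rejects URLs whose hostname is a loopback, link-local, or private-range address. It is a simplified assumption of how such a check can look, not srcfull's own code: the `isPrivateHost` name is hypothetical, and the check only inspects the literal hostname, so it does not catch public DNS names that resolve to private addresses.

```typescript
// Hypothetical helper: true when the URL points at a loopback, link-local,
// or private-range host based on the hostname text alone.
function isPrivateHost(rawUrl: string): boolean {
  // URL.hostname wraps IPv6 literals in brackets, so strip them first.
  const host = new URL(rawUrl).hostname.toLowerCase().replace(/^\[|\]$/g, "");

  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const v4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    return (
      a === 0 ||                          // 0.0.0.0/8
      a === 10 ||                         // 10.0.0.0/8
      a === 127 ||                        // loopback
      (a === 169 && b === 254) ||         // link-local
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168)            // 192.168.0.0/16
    );
  }

  if (host.includes(":")) {
    // IPv6: loopback, unique local (fc00::/7), link-local (fe80::/10).
    return host === "::1" || /^f[cd]/.test(host) || /^fe[89ab]/.test(host);
  }

  return false;
}

console.log(isPrivateHost("http://192.168.1.10/x.jpg"));    // → true
console.log(isPrivateHost("https://cdn.example.com/a.jpg")); // → false
```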

Install

pnpm add @howells/srcfull

Library Usage

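import { scrapePage, resolveImageUrl } from "@howells/srcfull";

// Resolve a single transformed CDN URL toward a larger original.
const resolved = await resolveImageUrl(
  "https://cdn.example.com/image.jpg?w=400&q=80"
);

// Fetch a page, extract its image candidates, and resolve them.
const page = await scrapePage("https://example.com/product-page");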

scrapePage() normalizes relative candidates against the page URL before validation and resolution, so typical product/article HTML works without extra preprocessing.
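
The normalization described above can be pictured with the standard `URL` constructor, which resolves a relative reference against a base. The sketch below is an assumption about the general technique rather than srcfull's internals; `toAbsolute` and `largestFromSrcset` are hypothetical helpers, and the srcset parsing is deliberately naive (it splits on commas, so URLs that contain commas are not handled).

```typescript
// Resolve one candidate against the page URL; return null if it cannot be parsed.
function toAbsolute(candidate: string, pageUrl: string): string | null {
  try {
    return new URL(candidate.trim(), pageUrl).href;
  } catch {
    return null;
  }
}

// Pick the entry with the largest descriptor ("800w", "2x") from a srcset
// value and return it as an absolute URL. Entries without a descriptor
// count as "1x".
function largestFromSrcset(srcset: string, pageUrl: string): string | null {
  let bestUrl: string | null = null;
  let bestWeight = -Infinity;
  for (const entry of srcset.split(",")) {
    const [rawUrl, descriptor] = entry.trim().split(/\s+/);
    if (!rawUrl) continue;
    const parsed = parseFloat(descriptor ?? "1x");
    const weight = Number.isNaN(parsed) ? 1 : parsed;
    const absolute = toAbsolute(rawUrl, pageUrl);
    if (absolute !== null && weight > bestWeight) {
      bestUrl = absolute;
      bestWeight = weight;
    }
  }
  return bestUrl;
}

const pageUrl = "https://example.com/shop/product-page";
console.log(toAbsolute("../img/a.jpg", pageUrl));
// → https://example.com/img/a.jpg
console.log(toAbsolute("//cdn.example.com/b.png", pageUrl));
// → https://cdn.example.com/b.png
console.log(largestFromSrcset("img/a-400.jpg 400w, img/a-800.jpg 800w", pageUrl));
// → https://example.com/shop/img/a-800.jpg
```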

If you need rendered HTML instead of plain fetch, inject a custom fetcher:

import { scrapePage } from "@howells/srcfull";
import { createScrapingBeeHtmlFetcher } from "@howells/srcfull/providers/scrapingbee";

const fetchHtml = createScrapingBeeHtmlFetcher({
  apiKey: process.env.SCRAPINGBEE_API_KEY!,
});

const result = await scrapePage("https://example.com", { fetchHtml });

If you want the built-in fetcher with different timeout or header behavior:

import { createDefaultHtmlFetcher, scrapePage } from "@howells/srcfull";

const fetchHtml = createDefaultHtmlFetcher({
  timeoutMs: 15_000,
  headers: {
    "Accept-Language": "en-GB,en;q=0.9",
  },
});

const result = await scrapePage("https://example.com", { fetchHtml });

For an image-only fallback, import the Firecrawl provider:

import { createFirecrawlImageFallback } from "@howells/srcfull/providers/firecrawl";

If you want candidate extraction without the rest of the pipeline:

import { extractImageCandidatesFromHtml } from "@howells/srcfull";

const candidates = extractImageCandidatesFromHtml(
  html,
  "https://example.com/product-page"
);

For repeat jobs, persist cache and learned patterns on disk:

import {
  createFileCache,
  createFilePatternStore,
  resolveImageUrl,
} from "@howells/srcfull";

const cache = createFileCache({ filePath: ".srcfull/cache.json" });
const patternStore = createFilePatternStore({
  filePath: ".srcfull/patterns.json",
});

const result = await resolveImageUrl("https://cdn.example.com/photo.jpg?w=400", {
  cache,
  patternStore,
});

CLI

srcfull resolve 'https://cdn.example.com/photo.jpg?w=300'
srcfull scrape 'https://example.com/listing' --max-images=12
srcfull scrape 'https://example.com/listing' --max-images=12 --min-size=300 --resolve-concurrency=8
srcfull --version

The JSON response from scrape includes stats.returned as well as found, resolved, failed, and durationMs.

Demo Page

There is a self-contained demo page generated at demo/index.html.

pnpm demo:build
pnpm demo:serve

The page is generated from real calls to the package, so the HTML samples, extracted candidates, resolved URLs, and persisted cache/pattern snapshots are actual outputs rather than hand-written mockups.

Development

pnpm test
pnpm test:live-patterns
pnpm typecheck
pnpm build

pnpm test:live-patterns revalidates the researched real-world CDN fixtures in test/fixtures/curated-patterns.json against the network.
