# SEO visibility and content gap

Given an AI visibility gap in a certain topic:

- checks if this is due to missing content, or content not being indexed/positioned well by Google
    - extracts (from datocat) or derives (with LLM) N Google keywords related to the prompt cluster
    - compares visibility in Google with competitors
    - TODO: checks for additional (non-ranked) content using _branded_ keywords (e.g. "ayudas coches eléctricos Peugeot", i.e. "electric car subsidies Peugeot")
- checks own/competitor's web content that _is_ positioned
    - deeply analyses competitor web content:
        - presence of structured data, listicles, FAQs etc.
        - entities mentioned
        - internal and external linking
    - TODO: compares with own brand content
    - TODO: recommends content to create/modify

ChatGPT context: https://chatgpt.com/share/6925a9d4-f81c-8005-8362-9ca5e8d57c4d

In [1]:
import { load } from "@std/dotenv";

import * as pl from "npm:nodejs-polars";
import * as Plot from "npm:@observablehq/plot";
import { Defuddle } from "npm:defuddle/node";
import TurndownService from "npm:turndown";
import { document } from "jsr:@manzt/jupyter-helper";

import { z } from '@zod/zod';

import { askOpenAISafe } from "shared/openai.ts";

import * as gpt from "../../src/apis/brightdata.ts?v=1";
import { type ModelResult } from '../../src/schemas/models.schema.ts';

import * as utils from "../../src/utils.ts?v=1";
import * as brands from "../../src/brands.ts?v=1";
import * as scrape from "../../src/apis/hasdata/scrape.ts?v=1";
import * as serp from "../../src/apis/hasdata/serp.ts?v=1";
import * as gap from "../../src/analysis/gap.ts?v=1";

void await load({
  envPath: "../../.env",
  export: true,
});

const CACHE = "./cache.json"

In [2]:
const { md, html, display } = Deno.jupyter;

In [3]:
// await utils.clearCache(CACHE);

In [4]:
let CONFIG = {
    brandDomain: "freepik.com",
    sector: "ai photo enhancement tools",
    country: "us",
    language: "en",
}

# Define topic queries

Define or generate (with LLM) a number of search queries for the topic (or prompt cluster) we're investigating.

In [5]:
let promptsDf = pl
    .readCSV("./prompts_ai_photo_enhancement_brands_json.csv")
    .drop("ROWID");

await display(promptsDf.slice(0, 5).drop("all_entities_json"));

let kwdPrompts = promptsDf.toRecords();
let kwds = kwdPrompts.map(k => k.keyword) as Array<string>;
let prompts = kwdPrompts.map(k => k.prompt) as Array<string>;

keyword,prompt,avgMonthlySearches,searchVolume,relevanceScore,intent,isBranded,purchaseProbability,models_with_brands,all_ranked_brands_json
photo enhancer,Can you help me enhance a photo?,1000000,"[823000, 1000000, 1000000, 1000000, 1000000, 1000000, 1000000, 1220000, 1220000, 1220000, 1220000, 1220000]",0.85,informational,non-branded,40,2,"[""Canva""]"
ai photo enhancer,Can you recommend an AI tool to enhance photos?,246000,"[246000, 246000, 246000, 246000, 301000, 246000, 246000, 246000, 301000, 246000, 301000, 246000]",0.85,informational,non-branded,65,2,"[""Canva""]"
image enhancer ai,Can you recommend an AI tool for enhancing images?,60500,"[49500, 60500, 60500, 60500, 74000, 74000, 60500, 60500, 74000, 74000, 74000, 74000]",0.9,informational,non-branded,40,2,"[""Canva""]"
ai picture enhancer,Can you help me enhance a picture using AI?,14800,"[14800, 14800, 18100, 14800, 18100, 14800, 14800, 14800, 14800, 14800, 14800, 14800]",0.85,informational,non-branded,40,2,"[""Adobe Firefly"",""Canva""]"
enhance photo ai,How can I enhance a photo using AI?,9900,"[8100, 8100, 9900, 9900, 9900, 9900, 9900, 9900, 9900, 9900, 9900, 9900]",0.85,informational,non-branded,40,2,"[""Runway"",""Canva""]"


In [6]:
promptsDf.shape

{ height: 31, width: 11 }

In [7]:
for (let p of prompts) {
    console.log(p);
}

Can you help me enhance a photo?
Can you recommend an AI tool to enhance photos?
Can you recommend an AI tool for enhancing images?
Can you help me enhance a picture using AI?
How can I enhance a photo using AI?
Can you recommend an AI tool for enhancing quality?
Can you help me enhance a photo using AI?
Can you help me enhance the quality of an AI-generated image?
Can you help me sharpen an image using AI?
Can you help me enhance a photo using AI?
What is the best AI photo enhancer?
Can you recommend a good AI tool to enhance videos?
Can you help me with AI tools for product photography?
Can you tell me about Adobe's image enhancer?
Can you help me enhance photos using AI?
What is the best AI image enhancer?
Can you help me enhance the resolution of a photo using AI?
Can you help me sharpen a photo using AI?
What are the best image enhancement software options?
How can I use AI to enhance a photo?
Can you recommend some AI image enhancer software?
What is the best AI tool for enhancin

# Brand and competitors

We compare the organic visibility gap between the brand and its competitors.

In [8]:
let regenBrands = false;

let knownBrands = ["radiant photo", "fotor", "picsart", "luminar neo", "adobe", "remini", "let's enhance", "topaz photo ai", "pixlr", "ON1"];
let briefing = `Include at least these brands: ${JSON.stringify(knownBrands)}.`;

let brs = await utils.fromCache(CACHE, 'brands') as brands.FlaggedBrand[] | null;
if (!brs || regenBrands) {
    console.log('Generating brand data...');
    const brand = await brands.generateBrandInfo({
        brandDomain: CONFIG.brandDomain,
        language: CONFIG.language,
        sector: CONFIG.sector,
        market: CONFIG.country,
    });
    const competitors = await brands.generateCompetitorsInfo({
        brandDomain: CONFIG.brandDomain,
        language: CONFIG.language,
        sector: CONFIG.sector,
        market: CONFIG.country,
        briefing: briefing
    });
    brs = brands.concatBrands([brand], competitors);
    await utils.toCache(CACHE, brs, 'overwrite', 'brands');
} else {
    console.log('Loaded brand data from cache');
}

Loaded brand data from cache


In [9]:
brs!.map(b => [b.shortName, b.domain]);

[
  [ "Freepik", "freepik.com" ],
  [ "PicsArt", "picsart.com" ],
  [ "Fotor", "fotor.com" ],
  [ "Adobe", "adobe.com" ],
  [ "Remini", "remini.app" ],
  [ "Let’s Enhance", "letsenhance.io" ],
  [ "Topaz Photo AI", "topazlabs.com" ],
  [ "Pixlr", "pixlr.com" ],
  [ "Radiant Photo", "radiantimaginglabs.com" ],
  [ "ON1 Photo RAW", "on1.com" ],
  [ "DxO PhotoLab", "dxo.com" ]
]

# SERPs

Given the search queries, get the SERPs and optionally expand with the PeopleAlsoAsk and RelatedQuestion components of the results.

In [10]:
let regenSerps = false;

let serps = await utils.fromCache(CACHE, 'serps') as Array<serp.SerpResponse> | null;
if (!serps || regenSerps) {
    console.log('Fetching Serps...');
    serps = await serp.fetchSerpBatch(kwds!, {
        country: CONFIG.country,
        location: CONFIG.country,
        language: CONFIG.language
    });
    await utils.toCache(CACHE, serps, 'overwrite', 'serps');
}
else {
    console.log('Loaded Serps from cache');
}

Loaded Serps from cache


In [11]:
import * as gap from "../../src/analysis/gap.ts?v=4401";

let orgResults = gap.extractOrganicResults(serps!);
let aioResults = gap.extractAIOResults(serps!);
console.log(`Extracted ${orgResults.length} organic results and ${aioResults.length} AIO results.`);

Extracted 31 organic results and 31 AIO results.


In [12]:
// TODO: Expand SERPs with relatedQuestions and PAA
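
A minimal sketch of how this expansion could look. The `SerpLike` shape, its `relatedQuestions` and `peopleAlsoAsk` fields, and the `expandQueries` helper are assumptions made for illustration; the real `serp.SerpResponse` type may be structured differently.

```typescript
// Hypothetical, simplified SERP shape; the real serp.SerpResponse may differ.
interface SerpLike {
    query: string;
    relatedQuestions?: { question: string }[];
    peopleAlsoAsk?: { question: string }[];
}

// Collect follow-up questions that are not already in the keyword list
// (case-insensitive comparison, first occurrence wins).
function expandQueries(serpList: SerpLike[], known: string[]): string[] {
    const seen = new Set(known.map((k) => k.trim().toLowerCase()));
    const extra: string[] = [];
    for (const s of serpList) {
        const questions = [...(s.relatedQuestions ?? []), ...(s.peopleAlsoAsk ?? [])];
        for (const { question } of questions) {
            const key = question.trim().toLowerCase();
            if (key && !seen.has(key)) {
                seen.add(key);
                extra.push(question.trim());
            }
        }
    }
    return extra;
}
```

The returned questions could then be fed back into the SERP fetch as a second batch, keeping the original keywords as the first batch.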

# Scrape ChatGPT

TODO: repeat N times for better confidence.
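
One possible way to use repeated runs is to report, per brand, the share of runs in which it was mentioned. The sketch below is an assumption rather than part of `gap.ts`: `RunMentions` and `mentionRate` are hypothetical names, and each run is reduced to the set of brand names it mentioned.

```typescript
// Hypothetical shape: for one prompt, the brands mentioned in one run.
type RunMentions = Set<string>;

// Share of runs (0..1) in which each brand was mentioned at least once.
function mentionRate(runs: RunMentions[], brandNames: string[]): Record<string, number> {
    const rates: Record<string, number> = {};
    for (const name of brandNames) {
        const hits = runs.filter((run) => run.has(name)).length;
        rates[name] = runs.length > 0 ? hits / runs.length : 0;
    }
    return rates;
}

// Illustrative data only, not real scrape results.
const exampleRuns: RunMentions[] = [
    new Set(["Adobe", "Fotor"]),
    new Set(["Adobe"]),
    new Set(["Adobe", "Remini"]),
    new Set(["Fotor"]),
];
const exampleRates = mentionRate(exampleRuns, ["Adobe", "Fotor", "Freepik"]);
```

A rate per brand is easier to compare across prompts than a single yes/no observation, since individual ChatGPT responses can vary between runs.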

In [13]:
let regenGPT = false;

let gptResults = await utils.fromCache(CACHE, 'gptResults') as Array<ModelResult> | null;
if (!gptResults || regenGPT) {
    console.log('Fetching ChatGPT responses...');
    gptResults = await gpt.scrapeGPTBatch({
        prompts: prompts,
        countryISOCode: CONFIG.country.toUpperCase(),
        useSearch: true,
    });
    await utils.toCache(CACHE, gptResults, 'overwrite', 'gptResults');
} else {
    console.log('Loaded ChatGPT responses from cache');
}

Loaded ChatGPT responses from cache


# Engine Visibility

Analyze the visibility of the brand and its competitors in SERP organic results, AI Overviews and ChatGPT responses.

In [14]:
import * as gap from "../../src/analysis/gap.ts?v=4326";

let annOrgResults = gap.annotateBrandVisibility(orgResults, brs);
let annAioResults = gap.annotateBrandVisibility(aioResults, brs);
let annGptResults = gap.annotateBrandVisibility(gptResults, brs);

let orgVis = gap.aggregateBrandVisibility(annOrgResults, brs);
let aioVis = gap.aggregateBrandVisibility(annAioResults, brs);
let gptVis = gap.aggregateBrandVisibility(annGptResults, brs);
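
The internals of `gap.annotateBrandVisibility` and `gap.aggregateBrandVisibility` are not shown in this notebook. As a rough illustration of what such counting could involve, the sketch below counts answers that mention a brand name and citations whose host belongs to the brand domain. `EngineResult`, `BrandLike` and `countVisibility` are hypothetical names, and the matching rules (case-insensitive substring, host suffix) are assumptions.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface EngineResult { answer: string; citations: string[]; }
interface BrandLike { shortName: string; domain: string; }

function countVisibility(results: EngineResult[], brand: BrandLike) {
    let answer = 0;
    let citations = 0;
    const unique = new Set<string>();
    for (const r of results) {
        // Answer visibility: brand name appears anywhere in the answer text.
        if (r.answer.toLowerCase().includes(brand.shortName.toLowerCase())) answer++;
        for (const url of r.citations) {
            let host: string;
            try {
                host = new URL(url).hostname;
            } catch {
                continue; // skip malformed URLs
            }
            // Citation visibility: host is the brand domain or one of its subdomains.
            if (host === brand.domain || host.endsWith("." + brand.domain)) {
                citations++;
                unique.add(url);
            }
        }
    }
    return { name: brand.shortName, answer, citations, uniqueCitations: unique.size };
}
```

Plain substring matching is crude: short brand names can match unrelated words, so a real implementation would likely need aliases and word boundaries.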

In [15]:
// Number of SERPs with an AI Overview
annAioResults.filter(p => p.answer != '').length

13

In [16]:
function prefixColumnNames(df: pl.DataFrame, prefix: string): pl.DataFrame {
    const inCols = df.columns.filter(col => col != 'name');
    const map = Object.fromEntries(inCols.map(name => [name, prefix + name.charAt(0).toUpperCase() + name.slice(1)]));
    return df.rename(map);
}

In [17]:
let orgVisDf = pl.DataFrame(orgVis).select(['name', 'citations', 'uniqueCitations']);
let aioVisDf = pl.DataFrame(aioVis);
let gptVisDf = pl.DataFrame(gptVis);

orgVisDf = prefixColumnNames(orgVisDf, 'org');
aioVisDf = prefixColumnNames(aioVisDf, 'aio');
gptVisDf = prefixColumnNames(gptVisDf, 'gpt');

let allVisDf = orgVisDf
    .join(aioVisDf, { on: 'name', how: 'inner' })
    .join(gptVisDf, { on: 'name', how: 'inner' });

let allVis = allVisDf.toRecords();
await display(allVisDf.sort('orgCitations', true));

name,orgCitations,orgUniqueCitations,aioAnswer,aioCitations,aioUniqueCitations,aioReferences,aioUniqueReferences,gptAnswer,gptCitations,gptUniqueCitations,gptReferences,gptUniqueReferences
Adobe,24,7,11,12,5,12,5,10,10,10,20,19
PicsArt,20,2,4,6,2,6,2,1,0,0,0,0
Topaz Photo AI,9,4,8,1,1,1,1,8,2,2,5,4
Fotor,1,1,10,4,2,4,2,8,4,3,6,5
Let’s Enhance,1,1,0,0,0,0,0,6,1,1,3,2
Radiant Photo,0,0,0,0,0,0,0,12,0,0,0,0
Remini,0,0,9,0,0,0,0,5,0,0,0,0
ON1 Photo RAW,0,0,1,1,1,1,1,1,0,0,0,0
Freepik,0,0,0,0,0,0,0,0,0,0,0,0
Pixlr,0,0,2,1,1,1,1,0,0,0,0,0


Note that there are 31 keywords/prompts in total, but only 13 of them returned an AI Overview. `aioAnswer` can therefore be at most 13, while `gptAnswer` can reach 31.
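
Because the denominators differ, raw counts are easier to compare once turned into shares. A small sketch of that arithmetic, using the figures from the table above; the `share` helper and the constant names are introduced here for illustration, and it assumes `aioAnswer` and `gptAnswer` count answers that mention the brand.

```typescript
const TOTAL_PROMPTS = 31;  // keywords/prompts in this run
const SERPS_WITH_AIO = 13; // SERPs that returned an AI Overview

// Fraction of `total` covered by `count`; 0 when there is nothing to divide by.
function share(count: number, total: number): number {
    return total > 0 ? count / total : 0;
}

// Roughly 42% of the keywords triggered an AI Overview.
const aioCoverage = share(SERPS_WITH_AIO, TOTAL_PROMPTS);

// Adobe, from the table above: aioAnswer = 11, gptAnswer = 10.
const adobeAioShare = share(11, SERPS_WITH_AIO); // share of AI Overviews
const adobeGptShare = share(10, TOTAL_PROMPTS);  // share of ChatGPT answers
```

On this basis Adobe appears in about 85% of the AI Overviews but in only about 32% of the ChatGPT answers, a difference the raw counts (11 vs 10) hide.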

In [18]:
let numericCols = ["orgCitations", "aioCitations", "aioAnswer", "gptCitations", "gptAnswer"];
await display(allVisDf.select(['name'].concat(numericCols)));

name,orgCitations,aioCitations,aioAnswer,gptCitations,gptAnswer
Radiant Photo,0,0,0,0,12
Adobe,24,12,11,10,10
Fotor,1,4,10,4,8
Topaz Photo AI,9,1,8,2,8
Let’s Enhance,1,0,0,1,6
Remini,0,0,9,0,5
PicsArt,20,6,4,0,1
ON1 Photo RAW,0,1,1,0,1
Freepik,0,0,0,0,0
Pixlr,0,1,2,0,0


In [19]:
// Calculate the Spearman correlation for every pair of numeric columns (full matrix, including the diagonal)
let corrMatrix: Array<{ col1: string; col2: string; correlation: number }> = [];
for (let i = 0; i < numericCols.length; i++) {
    for (let j = 0; j < numericCols.length; j++) {
        let corr = allVisDf
            .select(pl.spearmanRankCorr(numericCols[i], numericCols[j])) // Internally handles ranking
            .row(0)[0] as number;
        if (true) {//(!Number.isNaN(corr)) {
            corrMatrix.push({
                col1: numericCols[j],
                col2: numericCols[i],
                correlation: corr
            });
        }
    }
}

let corrDf = pl.readRecords(corrMatrix).pivot({ on: "col2", index: "col1", values: "correlation" });
// await display(corrDf);

// Visualize as heatmap (cells with a NaN or zero correlation are left unlabeled)
Plot.plot({
    document,
    marks: [
        Plot.cell(corrMatrix, {
            x: "col1",
            y: "col2",
            fill: "correlation",
            tip: true,
        }),
        Plot.text(corrMatrix, {
            x: "col1",
            y: "col2",
            text: d => d.correlation ? d.correlation.toFixed(2) : "",
            fill: d => Math.abs(d.correlation) > 0.5 ? "white" : "black",
            fontSize: 10,
        })
    ],
    color: {
        scheme: "RdBu",
        domain: [-1, 1],
    },
    x: { tickRotate: -45, label: null, domain: numericCols },
    y: { label: null, domain: numericCols },
    title: "Spearman Rank Correlation Matrix",
    style: { backgroundColor: "white" },
    marginBottom: 80,
    marginLeft: 100,
});
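
For reference, Spearman's rank correlation is the Pearson correlation of the rank-transformed columns. The sketch below is a plain TypeScript version with average ranks for ties; `averageRanks`, `pearson` and `spearman` are names introduced here, and whether `pl.spearmanRankCorr` treats ties the same way has not been verified.

```typescript
// Average ranks (1-based); tied values share the mean of their positions.
function averageRanks(values: number[]): number[] {
    const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
    const ranks = new Array<number>(values.length);
    let start = 0;
    while (start < order.length) {
        let end = start;
        while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
        const avg = (start + end) / 2 + 1;
        for (let k = start; k <= end; k++) ranks[order[k].i] = avg;
        start = end + 1;
    }
    return ranks;
}

// Pearson correlation; returns NaN when either input is constant (0/0).
function pearson(x: number[], y: number[]): number {
    const n = x.length;
    const mx = x.reduce((a, b) => a + b, 0) / n;
    const my = y.reduce((a, b) => a + b, 0) / n;
    let sxy = 0, sxx = 0, syy = 0;
    for (let i = 0; i < n; i++) {
        const dx = x[i] - mx;
        const dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / Math.sqrt(sxx * syy);
}

function spearman(x: number[], y: number[]): number {
    return pearson(averageRanks(x), averageRanks(y));
}
```

With only ten brands and many tied zeros in the visibility table, the coefficients should be read as rough indications rather than precise estimates.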

In [20]:
function scatter(data: any[], xvar: string, yvar: string) {
    return Plot.plot({
        document,
        marks: [
            Plot.dot(data, {
                x: xvar,
                y: yvar,
                tip: true,
                fill: "currentColor",
            }),
            Plot.text(data, {
                x: xvar,
                y: yvar,
                text: "name",
                dy: -10,
                fontSize: 10,
            })
        ],
        x: { type: "band" },
        y: { grid: true },
        title: `Content Gap Analysis: ${yvar} vs ${xvar}`,
        style: { backgroundColor: "white" },
    });
}

display(await scatter(allVis, "orgCitations", "aioAnswer"));
display(await scatter(allVis, "orgCitations", "gptAnswer"));

Promise { undefined }

# Off-page visibility

In [21]:
import * as gap from "../../src/analysis/gap.ts?v=613";

let opUrls = gap.extractCitedUrls([orgResults, aioResults, gptResults], true, brs, ['org', 'aio', 'gpt']);

In [22]:
console.log(Object.keys(opUrls).length);

511


In [23]:
let opUrlRecords = Object.entries(opUrls).map(([k, v]) => ({
    url: k,
    total: v.total,
    ...v.engines
}));

let opUrlDf = pl.readRecords(opUrlRecords).sort('total', true)
opUrlDf

url,total,org,aio,gpt
https://www.canva.com/features/image-enhancer/,23,20,3,0
https://www.photogrid.app/en/image-enhancer/,18,17,1,0
https://www.youtube.com/watch,18,4,14,0
https://airbrush.com/image-enhancer,17,14,3,0
https://imgupscaler.ai/,16,15,1,0
https://en.wikipedia.org/wiki/luminar_neo,16,0,0,16
https://medium.com/freelancers-hub/i-tried-7-ai-image-enhancers-heres-my-review-about-what-s-the-best-d4dbba08b6dd,15,12,1,2
https://techcommunity.microsoft.com/discussions/windows10space/what-is-the-best-ai-photo-enhancer-for-improving-image-quality/4466428,14,1,5,8
https://www.cutout.pro/photo-enhancer-sharpener-upscaler,13,13,0,0
https://www.upscale.media/,13,10,2,1


## Off-page domains

How often URLs from each domain were cited, and how many unique URLs each domain contributed.

In [24]:
import * as gap from "../../src/analysis/gap.ts?v=605";

let opDomains = gap.aggregateCitedDomains(opUrls);
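
The implementation of `gap.aggregateCitedDomains` is not shown here. A minimal sketch of this kind of roll-up, under stated assumptions: `UrlStats`, `DomainStats`, `naiveDomain` and `aggregateByDomain` are hypothetical names, and the domain is approximated by the last two host labels, which is wrong for suffixes such as `co.uk`.

```typescript
// Hypothetical shapes mirroring the per-URL records used in this notebook.
interface UrlStats { total: number; engines: Record<string, number>; }
interface DomainStats { total: number; engines: Record<string, number>; urls: Set<string>; }

// Naive registrable domain: the last two host labels (breaks on "co.uk" etc.).
function naiveDomain(url: string): string {
    return new URL(url).hostname.split(".").slice(-2).join(".");
}

// Sum citation counts per domain and keep the set of distinct URLs.
function aggregateByDomain(urls: Record<string, UrlStats>): Record<string, DomainStats> {
    const out: Record<string, DomainStats> = {};
    for (const [url, stats] of Object.entries(urls)) {
        const domain = naiveDomain(url);
        const entry = (out[domain] ??= { total: 0, engines: {}, urls: new Set<string>() });
        entry.total += stats.total;
        for (const [engine, n] of Object.entries(stats.engines)) {
            entry.engines[engine] = (entry.engines[engine] ?? 0) + n;
        }
        entry.urls.add(url);
    }
    return out;
}
```

A production version would more likely rely on a public-suffix list than on label counting.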

In [25]:
let opDomainsArray = Object.entries(opDomains)
    .map(([domain, stats]) => ({
        domain: domain,
        total: stats.total,
        ...stats.engines,
        unique: stats.urls.size,
        urls: Array.from(stats['urls']).sort()

    }));

let opDomainsDf = pl.readRecords(opDomainsArray);
await display(opDomainsDf.sort('total', true).head(20).drop('urls'));
await display(opDomains['wikipedia.org']);

domain,total,org,aio,gpt,unique
wikipedia.org,37,0,0,37,6
perfectcorp.com,34,7,9,18,13
medium.com,30,12,2,16,10
upscale.media,29,11,2,16,14
canva.com,28,24,4,0,5
photogrid.app,19,18,1,0,2
reddit.com,19,17,1,1,10
airbrush.com,18,14,3,1,2
imgupscaler.ai,18,17,1,0,2
youtube.com,18,4,14,0,1


{
  total: 37,
  engines: { org: 0, aio: 0, gpt: 37 },
  urls: Set(6) {
    "https://en.wikipedia.org/wiki/luminar_neo",
    "https://en.wikipedia.org/wiki/radiant_photo",
    "https://en.wikipedia.org/wiki/snapseed",
    "https://en.wikipedia.org/wiki/adobe_firefly",
    "https://en.wikipedia.org/wiki/gimp",
    "https://en.wikipedia.org/wiki/retouch4me"
  }
}

In [26]:
opDomainsDf.select('unique').sum();

unique
511


## Scrape off-page content

In [27]:
import * as scrape from "../../src/apis/hasdata/scrape.ts?v=8";

let regenSourceContent = false;

type SourceContentType = Record<string, scrape.ScrapeResponse>;

let sourceContent = await utils.fromCache(CACHE, 'sourceContent') as SourceContentType | null;
if (!sourceContent || regenSourceContent) {
    console.log(`Scraping source URL content...`);
    let urls = Object.keys(opUrls);
    let content = await scrape.runBatchScrape(
        urls,
        {
            formats: ['text', 'markdown', 'html'],
            jsRendering: true,
        }
    );
    sourceContent = Object.fromEntries(
        content.map((resp) => [resp.url, resp])
    );

    await utils.toCache(CACHE, sourceContent, 'overwrite', 'sourceContent');
} else {
    console.log('Loaded source content from cache');
}

Object.entries(sourceContent).length

Loaded source content from cache


517

## Visibility in off-page content by engine


In [28]:
import * as gap from "../../src/analysis/gap.ts?v=1002";

let annSourceContent = gap.annotateBrandVisibilityInScrapedPages(sourceContent, brs);

In [29]:
let opResults = { aio: {}, gpt: {} };

for (let engine of ['aio', 'gpt']) {
    let engineAnnResponse = gap.filterByEngine(annSourceContent, opUrls, engine);
    let engineVis = gap.aggregateBrandVisibility(Object.values(engineAnnResponse), brs);
    let engineVisDf = pl.DataFrame(engineVis).select(['name', 'answer', 'citations']);
    engineVisDf = prefixColumnNames(engineVisDf, engine + 'Op');
    opResults[engine]['annResponses'] = engineAnnResponse;
    opResults[engine]['vis'] = engineVis;
    opResults[engine]['visDf'] = engineVisDf;
}

let opVisDf = opResults.aio.visDf.join(opResults.gpt.visDf, { on: 'name', how: 'inner' });
void 0;
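
The loop above relies on `gap.filterByEngine` to keep only the scraped pages that a given engine cited. Its implementation is not shown; a minimal sketch of that selection, assuming the per-URL records carry an `engines` count map as in the `opUrls` output. `CitedUrl` and `pagesCitedBy` are hypothetical names.

```typescript
// Hypothetical shape of one entry in the cited-URL map.
interface CitedUrl { total: number; engines: Record<string, number>; }

// Keep only the pages whose URL was cited at least once by the given engine.
function pagesCitedBy<T>(
    pages: Record<string, T>,
    cited: Record<string, CitedUrl>,
    engine: string,
): Record<string, T> {
    return Object.fromEntries(
        Object.entries(pages).filter(([url]) => (cited[url]?.engines[engine] ?? 0) > 0),
    );
}
```

Pages whose URL is missing from the cited-URL map are dropped by this sketch, which matters if the scraper returns a different (for example redirected) URL than the one requested.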

In [30]:
Object.entries(opResults['gpt'].annResponses)
    .filter(([url, result]) => result.visibilities["Adobe"].inContent == true)
    .map(([url, result]) => url)
    .sort();

[
  "https://ai-productreviews.com/topazlabs-review/",
  "https://aitubo.ai/blog/post/best-ai-photo-enhancers/",
  "https://akvis.com/press-kit/akvis-enhancer-1.1-en.html",
  "https://akvis.com/press-kit/akvis-enhancer-11.0-en.php",
  "https://beebom.com/best-ai-photo-enhancers/",
  "https://blog.bestai.com/best-ai-photo-editing-tools-for-content-creators-in-june-2025/",
  "https://blog.erazor.app/blog/top-ai-photo-editing-tools-2025",
  "https://blog.pikes.ai/top-10-ai-tools-for-product-photography-in-2025/",
  "https://blog.prodia.com/post/10-free-ai-image-upscalers-for-quick-enhancements",
  "https://blog.prosper7.com/best-ai-photo-editing-software-in-2025/",
  "https://borisfx.com/blog/12-best-ai-plugins-for-photoshop-in-2025/",
  "https://bostoninstituteofanalytics.org/blog/top-10-ai-image-enhancer-tools-for-2024-revealed/",
  "https://creati.ai/ai-tools/ai-

In [31]:
let holVisDf = allVisDf.join(opVisDf, { on: 'name', how: 'inner' });

let showCols = [
    "orgCitations",
    "aioCitations",
    "aioAnswer",
    "aioOpAnswer",
    "gptCitations",
    "gptAnswer",
    "gptOpAnswer",
]

await display(holVisDf.sort("aioAnswer", true).select(["name", ...showCols]));

name,orgCitations,aioCitations,aioAnswer,aioOpAnswer,gptCitations,gptAnswer,gptOpAnswer
Adobe,24,12,11,36,10,10,166
Fotor,1,4,10,26,4,8,91
Remini,0,0,9,15,0,5,74
Topaz Photo AI,9,1,8,12,2,8,37
PicsArt,20,6,4,8,0,1,32
Pixlr,0,1,2,11,0,0,39
ON1 Photo RAW,0,1,1,3,0,1,16
Let’s Enhance,1,0,0,2,1,6,59
DxO PhotoLab,0,0,0,6,0,0,13
Freepik,0,0,0,0,0,0,12


In [32]:
// Calculate the Spearman correlation for every pair of numeric columns (full matrix, including the diagonal)
numericCols = showCols;

let corrMatrix: Array<{ col1: string; col2: string; correlation: number }> = [];
for (let i = 0; i < numericCols.length; i++) {
    for (let j = 0; j < numericCols.length; j++) {
        let corr = holVisDf
            .select(pl.spearmanRankCorr(numericCols[i], numericCols[j])) // Internally handles ranking
            .row(0)[0] as number;
        if (true) {//(!Number.isNaN(corr)) {
            corrMatrix.push({
                col1: numericCols[j],
                col2: numericCols[i],
                correlation: corr
            });
        }
    }
}

let corrDf = pl.readRecords(corrMatrix).pivot({ on: "col2", index: "col1", values: "correlation" });
// await display(corrDf);

// Visualize as heatmap (cells with a NaN or zero correlation are left unlabeled)
Plot.plot({
    document,
    marks: [
        Plot.cell(corrMatrix, {
            x: "col1",
            y: "col2",
            fill: "correlation",
            tip: true,
        }),
        Plot.text(corrMatrix, {
            x: "col1",
            y: "col2",
            text: d => d.correlation ? d.correlation.toFixed(2) : "",
            fill: d => Math.abs(d.correlation) > 0.5 ? "white" : "black",
            fontSize: 10,
        })
    ],
    color: {
        scheme: "RdBu",
        domain: [-1, 1],
    },
    x: { tickRotate: -45, label: null, domain: numericCols },
    y: { label: null, domain: numericCols },
    title: "Spearman Rank Correlation Matrix",
    style: { backgroundColor: "white" },
    marginBottom: 130,
    marginLeft: 180,
    width: 800,
    height: 600
});

# Branded visibility

Do we rank at all, or can we find any relevant content, when the brand name is included in the topic queries?

In [33]:
// TODO
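
A possible starting point for this step is to derive branded variants of the existing keywords and fetch SERPs for them. In the sketch below `brandedKeywords` is a hypothetical helper; appending the brand name to the keyword is an assumption, and other patterns (brand first, brand plus product name) may work better.

```typescript
// Build branded variants of the topic keywords, e.g. "photo enhancer Freepik".
function brandedKeywords(keywords: string[], brandName: string): string[] {
    const brandLc = brandName.toLowerCase();
    const out = new Set<string>(); // Set removes duplicates, keeps insertion order
    for (const kw of keywords) {
        const base = kw.trim();
        if (!base) continue;
        // Leave keywords that already contain the brand untouched.
        out.add(base.toLowerCase().includes(brandLc) ? base : `${base} ${brandName}`);
    }
    return [...out];
}
```

The resulting list could be passed to the same SERP fetch used for the non-branded keywords, so that branded and non-branded rankings are directly comparable.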

# Brand content

In [34]:
await display(holVisDf.sort("aioAnswer", true));

name,orgCitations,orgUniqueCitations,aioAnswer,aioCitations,aioUniqueCitations,aioReferences,aioUniqueReferences,gptAnswer,gptCitations,gptUniqueCitations,gptReferences,gptUniqueReferences,aioOpAnswer,aioOpCitations,gptOpAnswer,gptOpCitations
Adobe,24,7,11,12,5,12,5,10,10,10,20,19,36,22,166,85
Fotor,1,1,10,4,2,4,2,8,4,3,6,5,26,15,91,27
Remini,0,0,9,0,0,0,0,5,0,0,0,0,15,0,74,0
Topaz Photo AI,9,4,8,1,1,1,1,8,2,2,5,4,12,7,37,69
PicsArt,20,2,4,6,2,6,2,1,0,0,0,0,8,2,32,12
Pixlr,0,0,2,1,1,1,1,0,0,0,0,0,11,3,39,7
ON1 Photo RAW,0,0,1,1,1,1,1,1,0,0,0,0,3,1,16,12
Let’s Enhance,1,1,0,0,0,0,0,6,1,1,3,2,2,2,59,39
DxO PhotoLab,0,0,0,0,0,0,0,0,0,0,0,0,6,1,13,15
Freepik,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,7


In [35]:
// Brand URLs visible in organic results only
import * as gap from "../../src/analysis/gap.ts?v=1007";

let brandUrls = {};

for (let brand of brs!) {
    let brandName = brand.shortName;
    let urls = new Set<string>();
    for (let result of annOrgResults) {
        let citations = result.visibilities[brandName]?.citations || [];
        let references = result.visibilities[brandName]?.references || [];
        for (let url of citations.concat(references)) {
            urls.add(url);
        }
    }
    brandUrls[brandName] = Array.from(urls).sort();
}

brandUrls

{
  Freepik: [],
  PicsArt: [
    "https://picsart.com/ai-image-enhancer/",
    "https://picsart.com/ai-image-enhancer/sharpen-image/"
  ],
  Fotor: [ "https://www.fotor.com/video-enhancer/" ],
  Adobe: [
    "https://exchange.adobe.com/apps/cc/57ce87f0/remini-ai-photo-enhancer",
    "https://helpx.adobe.com/photoshop/desktop/repair-retouch/clean-restore-images/enhance-image-quality-with-generative-upscale.html",
    "https://www.adobe.com/creativecloud/photography/discover/image-sharpener.html",
    "https://www.adobe.com/express/feature/image/enhance",
    "https://www.adobe.com/products/firefly.html",
    "https://www.adobe.com/products/photoshop-lightroom/super-resolution.html",
    "https://www.adobe.com/products/photoshop/image-upscaler.html"
  ],
  Remini: [],
  "Let’s Enhance": [ "https://letsenhance.io/" ],
  "Topaz Photo AI": [
    "https://ww

In [36]:
let regenBrandContent = false;

type ContentType = Record<string, scrape.ScrapeResponse>;

let content = await utils.fromCache(CACHE, 'brandContent') as ContentType | null;
if (!content || regenBrandContent) {
    console.log(`Scraping brands' URL content...`);
    let flatUrls = Object.values(brandUrls).flat();
    let responses = await scrape.runBatchScrape(
        flatUrls,
        {
            formats: ['text', 'markdown', 'html'],
            jsRendering: true,
        }
    );
    content = Object.fromEntries(responses.map((resp) => [resp.url, resp]));
    await utils.toCache(CACHE, content, 'overwrite', 'brandContent');
} else {
    console.log(`Loaded brands' content from cache`);
}

Loaded brands' content from cache


# Content analysis

In [37]:
import * as parse from "../../src/analysis/parseHtml.ts?v=1002";
import * as analyse from "../../src/analysis/analyse.ts?v=11";

In [88]:
let inclBrands = ["PicsArt", "Fotor", "Let’s Enhance", "Topaz Photo AI"];
let brandAnalyses = analyse.analyzeBrandContent(brandUrls, content!, inclBrands);
let someAnalysis = brandAnalyses["Topaz Photo AI"][0]
console.log(someAnalysis.url);
someAnalysis["stats"]

https://www.topazlabs.com/


{
  numSchemas: 1,
  schemaStats: {
    Article: false,
    Author: false,
    BlogPosting: false,
    BreadcrumbList: false,
    Event: false,
    FAQPage: false,
    HowTo: false,
    JobPosting: false,
    LocalBusiness: false,
    Organization: true,
    OrganizationFields: {
      name: true,
      url: true,
      logo: true,
      sameAs: false,
      brand: false,
      contactPoint: false,
      address: true
    },
    Person: false,
    Product: false,
    Recipe: false,
    Review: false,
    Service: false,
    SoftwareApplication: false,
    VideoObject: false,
    WebSite: false
  },
  headingStats: {
    totalHeadings: 85,
    oneH1: false,
    maxDepth: 4,
    avgSubheadings: 5.5

In [90]:
// Write all brandAnalyses to JSON files:
// One folder per brand, one file per analysis
import { join } from "@std/path";

for (let [brand, analyses] of Object.entries(brandAnalyses)) {
    let brandDir = join("./analyzedContent", brand.replace(/\s+/g, '_').toLowerCase());
    await Deno.mkdir(brandDir, { recursive: true });
    for (let analysis of analyses) {
        let fileName = analysis.url.replace(/[^a-z0-9]/gi, '_').toLowerCase().slice(0, 50) + ".json";
        let filePath = join(brandDir, fileName);
        await Deno.writeTextFile(filePath, JSON.stringify(analysis, null, 2));
    }
}
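
Truncating the slug to 50 characters means two URLs that share a long prefix would be written to the same file. One way to guard against that is to append a short hash of the full URL. The sketch below uses a 32-bit FNV-1a hash over UTF-16 code units; `fnv1a` and `urlToFileName` are names introduced here, not helpers from the project.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
    let h = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        h ^= text.charCodeAt(i);
        h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h.toString(16).padStart(8, "0");
}

// Same slug rule as the cell above, plus a hash suffix to keep names unique.
function urlToFileName(url: string, maxLen = 50): string {
    const slug = url.replace(/[^a-z0-9]/gi, "_").toLowerCase().slice(0, maxLen);
    return `${slug}_${fnv1a(url)}.json`;
}
```

The hash is not cryptographic; it only makes accidental name clashes within one brand folder very unlikely.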

In [91]:
let aggStats = {};
for (const brand of inclBrands) {
    aggStats[brand] = analyse.aggregateBrandStats(brandAnalyses, [brand]);
}

{
  count: 4,
  avgSchemas: 1.8,
  schemaStats: { Organization: 4 },
  headingStats: {
    avgHeadings: 88.3,
    pctOneH1: 0,
    avgMaxDepth: 4,
    avgSubheadings: 8.4,
    avgSkippedLevels: 11.8,
    avgEmptyHeadings: 0,
    avgDuplicateHeadings: 20,
    avgHeadingCounts: { h1: 6.8, h2: 61.5, h3: 11, h4: 9, h5: 0, h6: 0 }
  },
  avgParagraphs: 46.8,
  avgParagraphLength: 87.7,
  avgLists: 0.8,
  avgListLength: NaN,
  avgTables: 0,
  avgLinks: 37.5,
  avgInternalLinks: 31,
  avgExternalLinks: 6.5,
  avgQuestions: 7.5,
  avgForms: 0.8,
  avgWords: 1764.8,
  avgChars: 14723.5
}
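
`avgListLength` is `NaN` in the output above. A mean taken over zero items (0/0), or over values that already include `NaN`, would produce this, although the actual cause inside `analyse.aggregateBrandStats` has not been confirmed. Note also that `JSON.stringify` serializes `NaN` as `null`. A small sketch of a guarded mean; `safeMean` is a hypothetical helper.

```typescript
// Mean that ignores non-finite values and returns null when nothing is left,
// instead of silently producing NaN.
function safeMean(values: number[]): number | null {
    const finite = values.filter((v) => Number.isFinite(v));
    if (finite.length === 0) return null;
    return finite.reduce((a, b) => a + b, 0) / finite.length;
}
```

Returning `null` explicitly makes the "no data" case the same in memory and in the saved JSON, rather than `NaN` in one and `null` in the other.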

In [92]:
// Save aggStats to JSON:
// One file for each brand in ./analyzedStats
for (let [brand, stats] of Object.entries(aggStats)) {
    let brandDir = join("./analyzedStats");
    await Deno.mkdir(brandDir, { recursive: true });
    let fileName = brand.replace(/\s+/g, '_').toLowerCase() + "_stats.json";
    let filePath = join(brandDir, fileName);
    await Deno.writeTextFile(filePath, JSON.stringify(stats, null, 2));
}

## Scraped page to LLM input

Also check: https://huggingface.co/jinaai/ReaderLM-v2

In [87]:
import { load } from "@std/dotenv";

import * as scrape from "../../src/apis/hasdata/scrape.ts?v=2";
import * as parse from "../../src/analysis/parseHtml.ts?v=2";
import * as analyse from "../../src/analysis/analyse.ts?v=3";
import * as clf from '../../src/analysis/classifyPage.ts?v=12';

void await load({
  envPath: "../../.env",
  export: true,
});

In [43]:
let someUrl = "https://www.toyota.es/world-of-toyota/articles-news-events/como-desgravarse-compra-coche-electrico"
let someResp = await scrape.scrapeWeb(someUrl, { formats: ['text', 'markdown', 'html'], jsRendering: true })

In [44]:
let someAnalysis = analyse.analyzeBrandContent(
    { "Peugeot": [someUrl] },
    { [someUrl]: someResp },
)

In [47]:
let someMd = await responseToMd(someResp, false)
console.log(`Original HTML length: ${someResp.html?.length}`);
console.log(`Original MD length: ${someResp.markdown?.length}`);
console.log(`Cleaned MD length: ${someMd.length}`);

Original HTML length: 604570
Original MD length: 38741
Cleaned MD length: 11103


In [75]:
let someContext = await clf.responseToContext(someResp, someAnalysis["Peugeot"][0], clf.CLASSIFICATION_PRESET);
console.log(someContext.length);

18021


In [78]:
let aiSummary = await clf.classifyPageFreestyle(someContext);
aiSummary

{
  type: "object",
  topic: "Deducción fiscal por compra de coche eléctrico en España",
  subtopics: [
    "Tipos de coches eléctricos deducibles",
    "Requisitos para deducción en IRPF",
    "Porcentaje deducible en el IRPF",
    "Deducción de puntos de carga",
    "Dónde declarar la deducción",
    "Modelos Toyota eléctricos que aplican",
    "Compatibilidad con el Plan MOVES III",
    "Ventajas fiscales y ayudas estatales",
    "Comparativa de vehículos eléctricos vs convencionales"
  ],
  summary: "Este artículo de Toyota España es una completa guía informativa sobre cómo desgravar fiscalmente la compra de un coche eléctrico en España. Explica qué tipos de vehículos eléctricos –incluyendo BEV, PHEV y FCEV– pueden beneficiarse de la deducción en el IRPF, y detalla los requisitos necesarios (como fechas de compra, porcentaje del pago anual, uso y precio máximo del vehí

In [79]:
let aiClf = await clf.classifyPage(someContext, 'gpt-5.1', { reasoning: { effort: 'low' } });
aiClf

{
  topic: "Content",
  subtopic: "Guide",
  summary: "Informational guide from Toyota España explaining how to deduct the purchase of an electric car in Spain on the IRPF, including eligible vehicle types, tax requirements, deduction percentages, treatment of charging points, compatibility with Plan MOVES III, and examples of Toyota electric models that qualify.",
  reasoning: "The page is structured as an in-depth informational resource with headings addressing specific questions about tax deductions for electric vehicles. It belongs to Toyota’s content/blog area and uses Article schema, but its primary purpose is to guide users through requirements and options rather than to purely promote a specific product or offer step-by-step technical instructions. This aligns best with a ‘Content > Guide’ classification rather than ‘Instructional > How-to’ or a commercial product/solution page."
}

## Content categories

- Article, Blog Post, Listicle, Comparison (table), Calculator, Product page, Hub, How-to, News

In [None]:
let ContentTypeSchema = z.object({
    type: z.enum([
        "Article",
        "Blog Post",
        "Listicle",
        "Comparison",
        "Calculator",
        "Product Page",
        "Hub",
        "How-to",
        "News"
    ]).describe("Category best describing the web page. Select the most specific if applicable (e.g. Listicle, Product Page), otherwise more general (e.g. Article)."),
    reason: z.string().describe("Brief explanation (single phrase) of why this content type was assigned."),
}).describe("Categorization of the web page.");

let ContentElementSchema = z.object({
    elementType: z.enum([
        "List",
        "Listicle",
        "Comparison",
        "Calculator"
    ]).describe("Type of content element found on the page. Although a page may not be categorized specifically as a Listicle or Comparison, it may still contain such elements."),
    contextHeading: z.string().describe("The heading or section title under which this content element is found."),
}).describe("Specific content elements identified within the web page that contribute to its overall categorization.");

let StructuredContentSchema = z.object({
    types: z.array(ContentTypeSchema).describe("One or more content types. A web page can be both a general type (e.g. Article) and a more specific type (e.g. Listicle). If multiple types are assigned, they should be listed from most specific to most general. Do NOT include types that are not applicable."),
    elements: z.array(ContentElementSchema).describe("List of specific content elements identified on the page that support its categorization."),
}).describe("Structured representation of the web page's content type and its constituent elements based on HTML analysis.");
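The schema asks the model to list `types` from most specific to most general, but nothing enforces that order once the response comes back. The sketch below shows one way to deduplicate and reorder the returned types after the call. The `SPECIFICITY` ranking and the `normalizeTypes` helper are assumptions made for illustration, not part of the notebook or of `askOpenAISafe`.

```typescript
// Page-level categories, mirroring the enum in ContentTypeSchema.
type ContentType =
  | "Article" | "Blog Post" | "Listicle" | "Comparison" | "Calculator"
  | "Product Page" | "Hub" | "How-to" | "News";

// Hypothetical ranking (an assumption): lower number = more specific.
const SPECIFICITY: Record<ContentType, number> = {
  "Calculator": 0,
  "Comparison": 0,
  "Listicle": 0,
  "Product Page": 0,
  "How-to": 1,
  "Hub": 1,
  "News": 1,
  "Blog Post": 2,
  "Article": 3,
};

interface AssignedType {
  type: ContentType;
  reason: string;
}

// Drop repeated types (keeping the first occurrence) and order the rest
// from most specific to most general.
function normalizeTypes(types: AssignedType[]): AssignedType[] {
  const seen = new Set<ContentType>();
  const unique = types.filter((t) => {
    if (seen.has(t.type)) return false;
    seen.add(t.type);
    return true;
  });
  // Array.prototype.sort is stable, so equally ranked types keep the model's order.
  return unique.sort((a, b) => SPECIFICITY[a.type] - SPECIFICITY[b.type]);
}
```

Applied to `result.parsed.types`, this would keep downstream comparisons independent of the order in which the model happened to emit the categories.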

In [None]:
// ------ HTML based ------
let catPromptHTML = `
Analyze the following web page HTML content and check whether it belongs to one or more of the following categories:
Article, Blog Post, Listicle, Comparison, Calculator, Product page, Hub, How-to, News. Also check whether any of the following
content elements are present: List, Listicle, Comparison, Calculator. In the output provide the list of applicable categories
(with reason) and the list of detected content blocks (with heading and direct url if available).
To classify the overall page category (e.g. Article, Blog Post, Listicle etc.) focus on the content of the headings (h1, h2).
For Listicles and Comparisons, focus on the presence and content of lists or tables.
For product pages, focus on the presence of product information in the structured data, e.g. lists of products or offers,
or headings that suggest product listings. To detect a hub, focus on the quantity of internal links to related articles or sections.

## Base URI
${currUrl}

## HTML Content
{content}
`
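    // Use a replacer function so that `$` sequences in the HTML (e.g. `$&`) are inserted literally.
    .replace("{content}", () => $.html())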
    .trim();

let result = await askOpenAISafe(
    catPromptHTML,
    'gpt-5.1',
    StructuredContentSchema,
    { reasoning: { effort: 'low' } }
);

if (result.parsed) {
    console.log(result.parsed);
}
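One detail worth checking when filling a `{content}` placeholder with raw page content: `String.prototype.replace` treats sequences such as `$&` or `$'` in a replacement *string* as special patterns, so HTML or inline scripts containing `$` can be altered on the way into the prompt. A replacer function avoids this. The `fillTemplate` helper below is a hypothetical sketch, not a function from this repository.

```typescript
// Fill `{name}` placeholders in a single pass. Values are returned from a
// replacer function, so `$` sequences inside them are inserted literally.
// Placeholders without a matching key are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string): string => {
    const hasKey = Object.prototype.hasOwnProperty.call(values, key);
    return hasKey ? String(values[key]) : match;
  });
}

// With a replacement string, `$&` expands to the matched placeholder itself.
const naive = "HTML: {content}".replace("{content}", "<b>$&</b>");
console.log(naive); // HTML: <b>{content}</b>

// With the replacer function, the content arrives unchanged.
const safe = fillTemplate("HTML: {content}", { content: "<b>$&</b>" });
console.log(safe); // HTML: <b>$&</b>
```

Because substitution happens in one pass, placeholders that appear inside the inserted content itself are not expanded again.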


In [None]:
// ------ Structure based ------
let catPromptStruct = `
Analyze the following structured web page content and check whether it belongs to one or more of the following categories:
Article, Blog Post, Listicle, Comparison, Calculator, Product page, Hub, How-to, News. Also check whether any of the following
content elements are present: List, Listicle, Comparison, Calculator. In the output provide the list of applicable categories
(with reason) and the list of detected content blocks (with heading and direct url if available).
To classify the overall page category (e.g. Article, Blog Post, Listicle etc.) focus on the schemas and the content of the headings (h1, h2).
For Listicles and Comparisons, focus on the presence and content of lists or tables.
For product pages, focus on the presence of product information in the structured data, e.g. lists of products or offers,
or headings that suggest product listings.
To detect a hub, focus on the quantity of internal links to related articles or sections.

## Base URI
${currUrl}

## Structured Content
{content}
`
    // Use a replacer function so that `$` sequences in the JSON are inserted literally.
    .replace("{content}", () => JSON.stringify(structuredContent, null, 2))
    .trim();

let result = await askOpenAISafe(
    catPromptStruct,
    'gpt-5.1',
    StructuredContentSchema,
    { reasoning: { effort: 'low' } }
);

if (result.parsed) {
    console.log(result.parsed);
}
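Both cells above store their output in `result`, so the second run replaces the first. To see where the HTML-based and structure-based prompts agree, the two parsed outputs could be kept under separate names and compared. The sketch below assumes each output follows the shape of `StructuredContentSchema`; the `compareClassifications` helper and the sample values are hypothetical.

```typescript
interface TypeAssignment {
  type: string;
  reason: string;
}

interface Classification {
  types: TypeAssignment[];
}

// Report which content types both runs assigned and which only one of them did.
function compareClassifications(first: Classification, second: Classification) {
  const firstTypes = new Set(first.types.map((t) => t.type));
  const secondTypes = new Set(second.types.map((t) => t.type));
  return {
    agreed: Array.from(firstTypes).filter((t) => secondTypes.has(t)),
    onlyFirst: Array.from(firstTypes).filter((t) => !secondTypes.has(t)),
    onlySecond: Array.from(secondTypes).filter((t) => !firstTypes.has(t)),
  };
}

// Made-up sample values, for illustration only.
const htmlBased: Classification = {
  types: [
    { type: "Listicle", reason: "numbered product list" },
    { type: "Article", reason: "long-form text" },
  ],
};
const structureBased: Classification = {
  types: [
    { type: "Article", reason: "Article schema present" },
    { type: "How-to", reason: "step headings" },
  ],
};

console.log(compareClassifications(htmlBased, structureBased));
```

Types that appear in only one of the two runs are the natural candidates for manual review when deciding which input representation to keep.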


## Content element categories

In [None]:
import * as parse from "../../src/analysis/parseHtml.ts?v=207";

let classifiedLists = await parse.classifyElements(lists, 'list');
let classifiedTables = await parse.classifyElements(tables, 'table');
let classifiedForms = await parse.classifyElements(forms, 'form');

console.log(`Classified ${classifiedLists.length} lists, ${classifiedTables.length} tables, ${classifiedForms.length} forms`);

In [None]:
classifiedLists

## Entities
Which entities are mentioned in well-ranking pages?

In [97]:
import * as entities from "../../src/entities.ts?v=6";

let instructions = `
Extract any relevant entities or keywords from the text related to AI photo enhancement products.
These will be used to brief content creation, so focus on terms that would help in writing informative articles
similar to the input text but for a different brand.
`.trim();

let bodyEnts = await entities.extractAnyEntities(text, instructions, 'gpt-5.1', { reasoning: { effort: 'none' } });

In [101]:
pl.readRecords(bodyEnts).sort('type')

name,type
video footage,asset_type
family video,asset_type
old and blurry video,asset_type
childhood memory,asset_type
wedding footage,asset_type
high-quality video,attribute
4k,attribute
60 fps,attribute
240p,attribute
360p,attribute
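
Since the entities are meant to brief content creation, it may help to group them by `type` before handing them to a writer or to another prompt. The following is a minimal sketch in plain TypeScript, assuming each record has the `name` and `type` fields shown in the table above. The `groupByType` and `toBriefLines` helpers are hypothetical.

```typescript
interface Entity {
  name: string;
  type: string;
}

// Collect entity names under their type, preserving input order within each type.
function groupByType(entities: Entity[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { name, type } of entities) {
    const names = groups.get(type) ?? [];
    names.push(name);
    groups.set(type, names);
  }
  return groups;
}

// Render one "type: name, name, ..." line per group, sorted by type.
function toBriefLines(groups: Map<string, string[]>): string[] {
  return Array.from(groups.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([type, names]) => `${type}: ${names.join(", ")}`);
}

// A few rows taken from the table above.
const sampleEntities: Entity[] = [
  { name: "4k", type: "attribute" },
  { name: "wedding footage", type: "asset_type" },
  { name: "60 fps", type: "attribute" },
];

console.log(toBriefLines(groupByType(sampleEntities)).join("\n"));
```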


# Own content

In [None]:
...

# Create Content

Create content automatically, or generate a brief describing the content that should be created.

## Auto-generate FAQ