Improve referrer matching and add domains #45803

Merged
johnjeremiah merged 2 commits into main from jj-referral-attribution-fix-issue
May 22, 2026
@johnjeremiah
Contributor

@johnjeremiah johnjeremiah commented May 19, 2026

This fixes a small bug in our attribution logic.

Replace exact equality checks with startsWith() (via Array.some).

The classification uses Array.includes(), which requires an exact string match.
The lists contain bare root URLs with no path, for example:

'https://www.linkedin.com/'
'https://www.reddit.com/'
'https://x.com/'

The stored referrer is always the full URL of the specific page the visitor came from, including the path.

So if an Organic Social referrer is https://www.linkedin.com/feed/, it does not equal https://www.linkedin.com/, the match fails, and the visit incorrectly falls through to Web Referral.
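
The mismatch can be reproduced in a few lines. This is a minimal sketch rather than the helper's actual code: the list name `SOCIAL_PREFIXES` and both function names are hypothetical, and the list holds only the three example entries quoted in this description.

```javascript
// Hypothetical list holding only the three example entries quoted above.
const SOCIAL_PREFIXES = [
  'https://www.linkedin.com/',
  'https://www.reddit.com/',
  'https://x.com/',
];

// Old behavior: Array.includes() succeeds only on an exact string match.
function matchesExactly(referrer) {
  return SOCIAL_PREFIXES.includes(referrer);
}

// New behavior: a match succeeds when any list entry is a prefix of the referrer.
function matchesByPrefix(referrer) {
  return SOCIAL_PREFIXES.some((prefix) => referrer.startsWith(prefix));
}

const referrer = 'https://www.linkedin.com/feed/';
console.log(matchesExactly(referrer));  // false
console.log(matchesByPrefix(referrer)); // true
```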

Also, expand the search engine list (generalized Google domain; added Startpage, Brave, Kagi, and Yandex entries) and significantly extend the social referrer list (LinkedIn variants, lnkd.in, Hacker News, X/Twitter variants, Facebook mobile/l.facebook, Instagram, Threads, Bsky, Mastodon instances, YouTube, Reddit variants, Pinterest, Quora, etc.).

These changes make organic search/social detection more accurate across country TLDs, subdomains, and shortened links.

Summary by CodeRabbit

  • Bug Fixes
    • Improved Salesforce traffic attribution by expanding organic referrer allowlists and enhancing matching to better detect organic search and social sources.
    • Normalized referrer values when absent or malformed to prevent misclassification, yielding more accurate traffic categorization and attribution reporting.


Broaden referrer matching for organic attribution: replace exact equality checks with startsWith() (via Array.some). Expand search engine list (generalized Google domain, added Startpage, Brave, Kagi, Yandex entries) and significantly extend social referrer list (LinkedIn variants, lnkd.in, Hacker News, X/Twitter variants, Facebook mobile/l.facebook, Instagram, Threads, Bsky, Mastodon instances, YouTube, Reddit variants, Pinterest, Quora, etc.). These changes make Organic search/social detection more accurate across country TLDs, subdomains and shortened links.
@fleet-release fleet-release requested a review from eashaw May 19, 2026 14:47

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@coderabbitai
Contributor

coderabbitai Bot commented May 19, 2026

Walkthrough

The PR updates Salesforce contact attribution classification logic. It expands the allowlists for organic search and organic social referrer domains and changes the matching strategy from includes() equality checks to startsWith() prefix matching. This affects how the code assigns source channel details and campaign values for contacts classified as "Organic search (OS)" or "Organic social (SOC)" versus the default "Web referral (WR)" fallback.
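
The fall-through order described in this walkthrough can be sketched as below. This is an illustration under stated assumptions: the two list names come from the review comments in this thread, but their contents here are short placeholder subsets, and `classifyReferrer` is a hypothetical function. How the helper maps each label onto source channel details and campaign values is not shown in this thread.

```javascript
// Placeholder subsets; the lists in the helper are much longer.
const REFERRER_DOMAINS_FOR_ORGANIC_SEARCH = [
  'https://www.bing.com/',
  'https://duckduckgo.com/',
];
const REFERRER_DOMAINS_FOR_ORGANIC_SOCIAL = [
  'https://www.linkedin.com/',
  'https://www.reddit.com/',
];

// Hypothetical function: returns the channel label for a referrer string.
function classifyReferrer(referrer) {
  if (REFERRER_DOMAINS_FOR_ORGANIC_SEARCH.some((domain) => referrer.startsWith(domain))) {
    return 'Organic search (OS)';
  }
  if (REFERRER_DOMAINS_FOR_ORGANIC_SOCIAL.some((domain) => referrer.startsWith(domain))) {
    return 'Organic social (SOC)';
  }
  // Anything not on either list falls through to the default.
  return 'Web referral (WR)';
}

console.log(classifyReferrer('https://www.bing.com/search?q=fleet')); // Organic search (OS)
console.log(classifyReferrer('https://www.reddit.com/r/sysadmin/'));  // Organic social (SOC)
console.log(classifyReferrer('https://example.com/blog/'));           // Web referral (WR)
```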

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The description clearly explains the bug, solution, and domain additions, but does not follow the repository's required checklist template with sections for changes files, testing, database migrations, etc. | Add the standard checklist sections from the template (changes files, testing, database migrations, etc.) with appropriate checkboxes marked or deleted as applicable to this PR.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title 'Improve referrer matching and add domains' directly and accurately summarizes the main changes: fixing referrer matching logic and expanding domain allowlists.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@website/api/helpers/salesforce/update-or-create-contact-and-account.js`:
- Around line 284-285: The current referrer check uses the unsafe prefix string
'https://www.google.' (and similar prefix matches) which can be spoofed by hosts
like 'www.google.evil.com'; modify the referrer validation in
update-or-create-contact-and-account.js to parse the referrer URL (new
URL(referrer).hostname) and validate the hostname rather than doing a
substring/prefix match on the full URL—either compare the hostname against an
explicit whitelist of allowed hostnames (e.g. canonical google hostnames) or use
a regex that enforces a single-label root plus a valid TLD such as matching
/^www\.google\.[a-z]{2,}$/i; replace the 'https://www.google.' prefix entry and
any similar prefix-based checks (e.g. the list containing 'https://www.google.'
and 'https://www.bing.com/') with this hostname-based validation.
- Around line 323-327: Guard the referrer value before calling .startsWith():
ensure marketingAttributionCookie.referrer is a string (coerce with String(...)
or check typeof === 'string') and handle null/undefined/other types before
running the domain tests; update the logic around
REFERRER_DOMAINS_FOR_ORGANIC_SEARCH and REFERRER_DOMAINS_FOR_ORGANIC_SOCIAL
checks to use the validated/coerced referrer (or skip matching and set a
default) so attributionDetails.sourceChannelDetails and
attributionDetails.campaign only get set when a safe string match succeeds.
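
Both findings can be illustrated with a short sketch. The names `normalizeReferrer` and `GOOGLE_PREFIX` are hypothetical and do not appear in the helper; the spoofed URL is an invented example built on the host named in the first finding.

```javascript
// Hypothetical helper: fall back to an empty string when the referrer is
// missing or not a string, so that .startsWith() is always safe to call.
function normalizeReferrer(value) {
  return typeof value === 'string' ? value : '';
}

// The prefix entry flagged in the first finding.
const GOOGLE_PREFIX = 'https://www.google.';

// A prefix test accepts an unrelated host that merely begins with "www.google.".
const spoofed = 'https://www.google.evil.com/search?q=fleet';
console.log(spoofed.startsWith(GOOGLE_PREFIX)); // true
console.log(new URL(spoofed).hostname);         // www.google.evil.com

// With the guard in place, a missing referrer simply fails to match.
console.log(normalizeReferrer(undefined).startsWith(GOOGLE_PREFIX)); // false
```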
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 83e99ec7-580d-46fa-ad2e-22f6a4d09655

📥 Commits

Reviewing files that changed from the base of the PR and between d7fa35e and 4cc44b0.

📒 Files selected for processing (1)
  • website/api/helpers/salesforce/update-or-create-contact-and-account.js

Comment thread website/api/helpers/salesforce/update-or-create-contact-and-account.js Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
website/api/helpers/salesforce/update-or-create-contact-and-account.js (1)

327-327: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use hostname validation for Google referrer matching (prefix is spoofable).

Line 327 still allows false positives because https://www.google. can match attacker-controlled hosts (for example, https://www.google.evil.com/...). Parse hostname and validate it explicitly before classifying as Organic search.

Suggested minimal fix
-          let REFERRER_DOMAINS_FOR_ORGANIC_SEARCH = [
-            'https://www.google.',      // covers all ~190 country variants (google.com, google.co.uk, google.de, etc.)
+          let REFERRER_DOMAINS_FOR_ORGANIC_SEARCH = [
             'https://www.bing.com/',
             'https://search.yahoo.com/',
             'https://duckduckgo.com/',
@@
             'https://yandex.ru/',
           ];
@@
-          if(REFERRER_DOMAINS_FOR_ORGANIC_SEARCH.some((domain) => referrer.startsWith(domain))) {
+          let referrerHost = '';
+          try { referrerHost = (new URL(referrer)).hostname.toLowerCase(); } catch (e) {}
+          let isGoogleReferrer = /^www\.google\.[a-z]{2,}(?:\.[a-z]{2})?$/i.test(referrerHost);
+
+          if(isGoogleReferrer || REFERRER_DOMAINS_FOR_ORGANIC_SEARCH.some((domain) => referrer.startsWith(domain))) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@website/api/helpers/salesforce/update-or-create-contact-and-account.js` at
line 327, The referrer check using startsWith is spoofable; change it to parse
the referrer with new URL(referrer) (wrap in try/catch to handle invalid URLs)
and compare url.hostname against REFERRER_DOMAINS_FOR_ORGANIC_SEARCH by
verifying hostname === domain or hostname.endsWith('.' + domain) rather than
using startsWith; update the conditional that uses
REFERRER_DOMAINS_FOR_ORGANIC_SEARCH and referrer to perform this hostname-based
validation so only true Google (and other listed) hostnames match.
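
One way the hostname-based validation described in this prompt might look is sketched below. It rests on several assumptions: `SEARCH_HOSTNAMES`, `GOOGLE_HOSTNAMES`, `getHostname`, and `isOrganicSearchReferrer` are hypothetical names, the entries are illustrative subsets, and the list in the PR stores URL prefixes such as 'https://www.bing.com/', so it would have to be converted to bare hostnames before a comparison like this could work.

```javascript
// Hypothetical bare-hostname list (the PR's list stores full URL prefixes).
const SEARCH_HOSTNAMES = ['bing.com', 'search.yahoo.com', 'duckduckgo.com'];

// Illustrative subset of Google hostnames; a real allowlist would be longer.
const GOOGLE_HOSTNAMES = new Set(['www.google.com', 'www.google.co.uk', 'www.google.de']);

// Returns the lowercased hostname, or '' when the referrer cannot be parsed.
function getHostname(referrer) {
  try {
    return new URL(referrer).hostname.toLowerCase();
  } catch (err) {
    return '';
  }
}

function isOrganicSearchReferrer(referrer) {
  const hostname = getHostname(referrer);
  if (!hostname) {
    return false;
  }
  if (GOOGLE_HOSTNAMES.has(hostname)) {
    return true;
  }
  // Accept the listed hostname itself or any subdomain of it.
  return SEARCH_HOSTNAMES.some(
    (domain) => hostname === domain || hostname.endsWith('.' + domain)
  );
}

console.log(isOrganicSearchReferrer('https://www.google.co.uk/search?q=fleet')); // true
console.log(isOrganicSearchReferrer('https://www.google.evil.com/search'));      // false
console.log(isOrganicSearchReferrer('https://www.bing.com/search?q=fleet'));     // true
```

The sketch uses an explicit set of Google hostnames rather than the regular expression in the suggested fix, because a pattern with an optional two-letter suffix also accepts a host such as www.google.evil.io. The trade-off is that the set has to list each country variant individually.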
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3e5eef2-636b-4222-a20f-2ae45eaece59

📥 Commits

Reviewing files that changed from the base of the PR and between 4cc44b0 and a78d5d7.

📒 Files selected for processing (1)
  • website/api/helpers/salesforce/update-or-create-contact-and-account.js

@johnjeremiah
Contributor Author

@eashaw bumping this back up for your review - thanks in advance

@johnjeremiah johnjeremiah self-assigned this May 21, 2026
@johnjeremiah johnjeremiah merged commit 66f5b46 into main May 22, 2026
12 checks passed
@johnjeremiah johnjeremiah deleted the jj-referral-attribution-fix-issue branch May 22, 2026 13:00
2 participants