Improve referrer matching and add domains #45803
Broaden referrer matching for organic attribution: replace exact equality checks with startsWith() (via Array.some). Expand search engine list (generalized Google domain, added Startpage, Brave, Kagi, Yandex entries) and significantly extend social referrer list (LinkedIn variants, lnkd.in, Hacker News, X/Twitter variants, Facebook mobile/l.facebook, Instagram, Threads, Bsky, Mastodon instances, YouTube, Reddit variants, Pinterest, Quora, etc.). These changes make Organic search/social detection more accurate across country TLDs, subdomains and shortened links.
Walkthrough
The PR updates Salesforce contact attribution classification logic. It expands the allowlists for organic search and organic social referrer domains and changes the matching strategy from exact equality to startsWith() prefix matching.
Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@website/api/helpers/salesforce/update-or-create-contact-and-account.js`:
- Around line 284-285: The current referrer check uses the unsafe prefix string
'https://www.google.' (and similar prefix matches) which can be spoofed by hosts
like 'www.google.evil.com'; modify the referrer validation in
update-or-create-contact-and-account.js to parse the referrer URL (new
URL(referrer).hostname) and validate the hostname rather than doing a
substring/prefix match on the full URL—either compare the hostname against an
explicit whitelist of allowed hostnames (e.g. canonical google hostnames) or use
a regex that enforces a single-label root plus a valid TLD such as matching
/^www\.google\.[a-z]{2,}$/i; replace the 'https://www.google.' prefix entry and
any similar prefix-based checks (e.g. the list containing 'https://www.google.'
and 'https://www.bing.com/') with this hostname-based validation.
- Around line 323-327: Guard the referrer value before calling .startsWith():
ensure marketingAttributionCookie.referrer is a string (coerce with String(...)
or check typeof === 'string') and handle null/undefined/other types before
running the domain tests; update the logic around
REFERRER_DOMAINS_FOR_ORGANIC_SEARCH and REFERRER_DOMAINS_FOR_ORGANIC_SOCIAL
checks to use the validated/coerced referrer (or skip matching and set a
default) so attributionDetails.sourceChannelDetails and
attributionDetails.campaign only get set when a safe string match succeeds.
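The guard described in the second comment could look roughly like the sketch below. It is only an illustration: `safeReferrer`, `isOrganicSearch` and the shortened `SEARCH_PREFIXES` list are hypothetical names, and the cookie is assumed to be a plain object with a `referrer` property.

```javascript
// Hypothetical, shortened stand-in for REFERRER_DOMAINS_FOR_ORGANIC_SEARCH.
const SEARCH_PREFIXES = ['https://www.bing.com/', 'https://duckduckgo.com/'];

// Return the referrer when it is a string; otherwise return ''.
function safeReferrer(marketingAttributionCookie) {
  const referrer = marketingAttributionCookie && marketingAttributionCookie.referrer;
  return typeof referrer === 'string' ? referrer : '';
}

// True only when a string referrer starts with one of the listed prefixes.
// Missing, null, or non-string referrers never reach .startsWith().
function isOrganicSearch(marketingAttributionCookie) {
  const referrer = safeReferrer(marketingAttributionCookie);
  return referrer !== '' && SEARCH_PREFIXES.some((prefix) => referrer.startsWith(prefix));
}
```

With this shape, a missing or malformed referrer simply skips the organic classification instead of throwing.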
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 83e99ec7-580d-46fa-ad2e-22f6a4d09655
📒 Files selected for processing (1)
website/api/helpers/salesforce/update-or-create-contact-and-account.js
♻️ Duplicate comments (1)
website/api/helpers/salesforce/update-or-create-contact-and-account.js (1)
327-327: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Use hostname validation for Google referrer matching (prefix is spoofable).
Line 327 still allows false positives because `https://www.google.` can match attacker-controlled hosts (for example, `https://www.google.evil.com/...`). Parse hostname and validate it explicitly before classifying as Organic search.
Suggested minimal fix
-    let REFERRER_DOMAINS_FOR_ORGANIC_SEARCH = [
-      'https://www.google.', // covers all ~190 country variants (google.com, google.co.uk, google.de, etc.)
+    let REFERRER_DOMAINS_FOR_ORGANIC_SEARCH = [
       'https://www.bing.com/',
       'https://search.yahoo.com/',
       'https://duckduckgo.com/',
@@
       'https://yandex.ru/',
     ];
@@
-    if(REFERRER_DOMAINS_FOR_ORGANIC_SEARCH.some((domain) => referrer.startsWith(domain))) {
+    let referrerHost = '';
+    try { referrerHost = (new URL(referrer)).hostname.toLowerCase(); } catch (e) {}
+    let isGoogleReferrer = /^www\.google\.[a-z]{2,}(?:\.[a-z]{2})?$/i.test(referrerHost);
+
+    if(isGoogleReferrer || REFERRER_DOMAINS_FOR_ORGANIC_SEARCH.some((domain) => referrer.startsWith(domain))) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@website/api/helpers/salesforce/update-or-create-contact-and-account.js` at line 327, The referrer check using startsWith is spoofable; change it to parse the referrer with new URL(referrer) (wrap in try/catch to handle invalid URLs) and compare url.hostname against REFERRER_DOMAINS_FOR_ORGANIC_SEARCH by verifying hostname === domain or hostname.endsWith('.' + domain) rather than using startsWith; update the conditional that uses REFERRER_DOMAINS_FOR_ORGANIC_SEARCH and referrer to perform this hostname-based validation so only true Google (and other listed) hostnames match.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@website/api/helpers/salesforce/update-or-create-contact-and-account.js`:
- Line 327: The referrer check using startsWith is spoofable; change it to parse
the referrer with new URL(referrer) (wrap in try/catch to handle invalid URLs)
and compare url.hostname against REFERRER_DOMAINS_FOR_ORGANIC_SEARCH by
verifying hostname === domain or hostname.endsWith('.' + domain) rather than
using startsWith; update the conditional that uses
REFERRER_DOMAINS_FOR_ORGANIC_SEARCH and referrer to perform this hostname-based
validation so only true Google (and other listed) hostnames match.
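As a rough sketch of the hostname-based comparison described above, assuming the list were converted to bare hostnames (the actual list stores URL prefixes such as 'https://www.bing.com/'); `SEARCH_HOSTNAMES`, `referrerHostname` and `isListedHost` are hypothetical names used only for illustration:

```javascript
// Hypothetical bare-hostname list; not the list from the PR.
const SEARCH_HOSTNAMES = ['bing.com', 'duckduckgo.com', 'search.yahoo.com'];

// Parse the referrer and return its lowercased hostname, or '' if it is not a valid URL.
function referrerHostname(referrer) {
  try {
    return new URL(referrer).hostname.toLowerCase();
  } catch (err) {
    return '';
  }
}

// True when the hostname equals a listed domain or is a subdomain of one.
function isListedHost(referrer, hostnames) {
  const host = referrerHostname(referrer);
  if (host === '') {
    return false;
  }
  return hostnames.some((domain) => host === domain || host.endsWith('.' + domain));
}
```

Because the comparison runs on the parsed hostname, a referrer such as 'https://www.bing.com.evil.com/' is rejected, while 'https://www.bing.com/search?q=x' still matches.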
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d3e5eef2-636b-4222-a20f-2ae45eaece59
📒 Files selected for processing (1)
website/api/helpers/salesforce/update-or-create-contact-and-account.js
@eashaw bumping this back up for your review - thanks in advance
This fixes a small bug in our attribution logic.
Replace exact equality checks with startsWith() (via Array.some).
The classification uses Array.includes(), which requires an exact string match.
The lists contain bare root domains, for example https://www.linkedin.com/
The stored referrer is always the full URL of the specific page the visitor came from, including the path.
So if an Organic social referrer is https://www.linkedin.com/feed/, it does not equal https://www.linkedin.com/, the match fails, and the visit falls through to Web Referral, which is incorrect.
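The difference between the two checks can be shown with a minimal sketch. The two-entry `SOCIAL_PREFIXES` list here is a shortened, hypothetical stand-in for the real organic social list:

```javascript
// Shortened, hypothetical stand-in for the organic social referrer list.
const SOCIAL_PREFIXES = ['https://www.linkedin.com/', 'https://news.ycombinator.com/'];

// A stored referrer is the full page URL, including the path.
const referrer = 'https://www.linkedin.com/feed/';

// Exact match: fails because the referrer carries a path after the root domain.
const exactMatch = SOCIAL_PREFIXES.includes(referrer);

// Prefix match: succeeds because the referrer starts with a listed entry.
const prefixMatch = SOCIAL_PREFIXES.some((domain) => referrer.startsWith(domain));

console.log({ exactMatch, prefixMatch }); // { exactMatch: false, prefixMatch: true }
```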
Also: expand the search engine list (generalized Google domain, added Startpage, Brave, Kagi, Yandex entries) and significantly extend the social referrer list (LinkedIn variants, lnkd.in, Hacker News, X/Twitter variants, Facebook mobile/l.facebook, Instagram, Threads, Bsky, Mastodon instances, YouTube, Reddit variants, Pinterest, Quora, etc.).
These changes make Organic search/social detection more accurate across country TLDs, subdomains and shortened links.