Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client side keyword detection #48

Merged
merged 3 commits into from
Apr 15, 2021
Merged

Conversation

davidfischer
Copy link
Contributor

  • Detects certain keywords on the client side and uses the 10 most common as input to the ad decision.
  • Uses detected keywords in addition to any manually specified ones
  • Tries to find the main content and not use navigation in the keyword detection. Looks first for <main> then role="main" then just <body>
  • Keyword detection is pretty fast and take no more than single digit milliseconds on a reasonable computer

- Detects certain keywords on the client side
  and uses them as input to the ad decision.
  Uses the top 10 most common keywords.
- Uses detected keywords in addition to any manually
  specified ones
- Tries to find the main content and not use navigation
  in the keyword detection.
- Keyword detection is pretty fast and take no
  more than single digit milliseconds
This is required for IE11 to work correctly.
This adds ~7kb to the minified bundle and ~1kb after gzip.
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice and clean for an initial test.

if (a[1] < b[1]) return 1;
return 0;
}
).slice(0, MAX_KEYWORDS).map((x) => x[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so functional it makes my brain hurt. The final map is basically just pulling the value from a 'keyword': count dict item?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. keywordHist is a dict with keys as the keyword and the value is the count of its occurrences. Object.entries is basically the same as dict.items in Python. It returns an array of length 2 arrays (key, value). We then sort that array such that the most common items are first. Then we take the first MAX_KEYWORDS items from the array so as to only get the top X keywords. Lastly, we drop the number of occurrences and just return the keywords.

// Additional categorization can be done on the server side for pages
// that request ads commonly but this quick and easy categorization
// works decently well most of the time.
const KEYWORDS = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a mapping -- I assume the key is the page copy and the value is the server-side keyword?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@@ -58,8 +182,11 @@ export class Placement {

this.response = null;

this.keywords = keywords || "";
this.campaign_types = campaign_types || "paid|community|house";
this.keywords = (keywords || []).concat(this.detectKeywords());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful on the server-side to have the user-defined & automatic keywords separate in the request. Maybe we don't need it for a v1 given that it requires server changes (eg. automatic_keywords alongside keywords), but I do think we might want to treat them differently in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I could see that. It's possible also that this client side stuff is only temporary too. We'll have to see.

@@ -92,7 +219,7 @@ export class Placement {
}

let classes = (element.className || "").split(" ");
if (classes.includes("loaded")) {
if (classes.indexOf("loaded") >= 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that includes requires a polyfill in IE11. That's why I removed it.

@davidfischer davidfischer merged commit 3c962ad into master Apr 15, 2021
@stsewd stsewd deleted the davidfischer/keyword-extraction branch April 15, 2021 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants