Dynamic Creative Use Case #36

Closed
appascoe opened this issue Jun 24, 2020 · 17 comments

Comments

@appascoe
Collaborator

Use Case

A common use case in online advertising is dynamic creative. While there
are many recommendation algorithms that would function well in the TURTLEDOVE framework,
one of the most basic applications is somewhat problematic: recommending a set of products
the user has already viewed, an "identity" recommender.

Advertisers with a large number of products could face issues where certain products are
viewed so infrequently that just using an interest group would not be sufficiently
differentially private. Smaller advertisers may be completely locked out of this
functionality altogether. As such, a solution would be useful to a broad base of
clients.

Proposed Solution

Perhaps I am missing something, but I propose that in addition to the interest request,
advertisers have an opportunity to write web bundles into the browser as the user is
on the advertiser's site, when a pixel is fired. I see that:

  • Advertisers would be able to add very granular data into the web bundle that would
    enable individual products to be recommended, regardless of how many views they receive.
  • There appear to be no additional privacy concerns: since these data would be
    written while on the advertiser's site, first-party cookie tracking would be completely
    available anyway.
  • This capability has benefits beyond dynamic creative, as it could be applied to very
    granular interest groups in general to select much more targeted ads upfront. Indeed,
    the functionality could be fully duplicative of the interest group request in general,
    providing a bidding function and the interest group response package to later be
    combined with a contextual package in the browser when on a publisher site.

While perhaps this is an additional opportunity to write bundles beyond the interest
group request, I could see this as a replacement, or rather time-shift, of the interest
group request as a whole. For any advertisement, it provides an ability to use more
fine-grained machine learning models without revealing to advertisers any more
information than they already have.

If we wanted to avoid some inefficiency with advertisers sending web bundles back on
every pixel fired, we could provide some guarantee in the browser that if a set of
interest groups for an advertiser does not have an associated web bundle, the browser
would then at a random later time make an interest group request, complete with
differential privacy, as a last opportunity to provide a web bundle after the user has
left the advertiser's site. This gives advertisers a chance to hedge their bets without
inundating the client with loads of data, with some confidence that they won't
completely miss an opportunity for delivery.

One concern would be that an ad in the web bundle would have such a specific ID that
it would deanonymize the user, but the Aggregate Reporting API should handle this by
not reporting delivery data until it's met some differential privacy bar. Advertisers
would be incentivized to not provide IDs that are too granular so they could receive
reporting back in a timely manner.

Any additional thoughts, questions, and discussion are most welcome.

@michaelkleber
Collaborator

Hi Andrew,

I think what you're suggesting is the same as, or very similar to, the proposal in #31 from @jonasz. Please take a look there and see if I'm missing something?

@appascoe
Collaborator Author

Hey, Michael,

I think that the very specific use case of an identity recommender would be covered by #31, but the proposal there is not as powerful as the proposal here. This is starting to diverge a bit from the use case of an identity recommender, but I think the following example will explain why this proposal is more powerful while preserving just as much privacy.

The current proposal in TURTLEDOVE allows adding a browser to a set of interest groups based on browsing behavior during the pixel firing. But advertisers know that during the subsequent interest group request, the browser will make some differential privacy decisions on which interest groups to include in this request. To hedge their bets, advertisers will be incentivized to add browsers to a hierarchy of interest groups. Suppose that a user looks at a specific pair of sneakers. The advertiser may want to write a set of interest groups such as:

{ "shoes_id_12345", "womens_sneakers", "sneakers",  "womens_shoes", "shoes", "womens_apparel", "apparel" }

Perhaps this seems like overkill, but given that the advertiser doesn't know what will be sufficiently private for the browser, the advertiser is going to hope that it can get as specific as possible. Later, during the interest group request, the browser sends:

{ "womens_shoes", "shoes", "womens_apparel", "apparel" }

The interest groups not included did not have audiences of sufficient size to be included.
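
To make the filtering step concrete, here is a minimal sketch in Python. The audience counts and the minimum size of 1,000 are made-up assumptions chosen so the result matches the example above; the actual browser policy (and any noise it adds for differential privacy) is not specified in this thread, and `groups_to_send` is a hypothetical helper name.

```python
# Hypothetical audience sizes per interest group; illustrative numbers only.
AUDIENCE_SIZES = {
    "shoes_id_12345": 3,
    "womens_sneakers": 400,
    "sneakers": 900,
    "womens_shoes": 5_000,
    "shoes": 40_000,
    "womens_apparel": 60_000,
    "apparel": 250_000,
}

# Assumed minimum audience size; not taken from the TURTLEDOVE spec.
MIN_AUDIENCE = 1_000


def groups_to_send(written_groups, audience_sizes, min_audience):
    """Keep only the groups whose audience meets the minimum size."""
    return [g for g in written_groups if audience_sizes.get(g, 0) >= min_audience]


written = list(AUDIENCE_SIZES)  # everything the advertiser wrote at pixel time
sent = groups_to_send(written, AUDIENCE_SIZES, MIN_AUDIENCE)
print(sent)  # ['womens_shoes', 'shoes', 'womens_apparel', 'apparel']
```

A plain threshold is the simplest stand-in for whatever privacy check the browser would really apply; the point is only that the advertiser cannot know in advance which of its groups survive.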

With the TURTLEDOVE spec, this would be the point at which any type of machine learning models are applied and the web bundle is chosen to send back to the browser. Issue #31 allows for personalization of the ad with some additional personalizedData bundle that is sent to the browser at pixel time, but a critical piece of advertiser performance is the choice of bid based on granular data and what is actually included in the ad. In this case, the machine learning is still only receiving a smaller set of interest groups to inform what an appropriate bid would be. For example, in this case, it doesn't know that the user was looking at sneakers, as opposed to boots, dress shoes, etc.

The proposal here suggests that, at pixel time, the advertiser writes into the browser:

  • The full set of interest groups
  • The web bundle of the ad to be rendered
  • The bidding javascript and any metadata including the output of machine learning models in the backend, evaluated over the full set of interest groups

As the browser is on the advertiser's site and has a first-party identifier to understand the user's history with the site anyway, no further information is leaked to the advertiser than they already know. However, there's a benefit gained by employing more sophisticated and granular machine learning models that will improve the efficiency of pricing, better maintaining advertiser performance when third-party cookies go away. Issue #31 doesn't cover this.

The downside, as mentioned, would be that it may be prohibitive, from a bandwidth or computation perspective, to write a web bundle or evaluate a machine learning pricing model on every pixel fire. To prevent this, we can make writing the web bundle and the bidding javascript/model evaluation optional. If, after some length of time, the browser detects that a set of interest groups has been written but has no associated web bundle or bidding data, the browser performs an interest group request as currently specified in TURTLEDOVE, where the submitted interest groups have differential privacy guarantees. This ensures that the advertiser, perhaps having hedged its bets by not providing a web bundle upfront, still has an opportunity to serve a display ad, even if it's not evaluated on the full set of interest groups in the browser.
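
The fallback rule could be sketched roughly as follows. This is only an illustration in Python: `AdvertiserState`, `needs_fallback_request`, and the grace period are hypothetical names and parameters, not part of TURTLEDOVE, and the sketch treats a missing bundle or missing bidding data as equally "incomplete".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvertiserState:
    """What the browser holds for one advertiser (hypothetical structure)."""
    interest_groups: list
    written_at: float                    # time of the pixel fire, in seconds
    web_bundle: Optional[bytes] = None   # ad written at pixel time, if any
    bidding_data: Optional[dict] = None  # bidding script / model output, if any


def needs_fallback_request(state: AdvertiserState, now: float, grace_period: float) -> bool:
    """True when groups were written but the bundle or bidding data never arrived."""
    incomplete = state.web_bundle is None or state.bidding_data is None
    waited_long_enough = (now - state.written_at) >= grace_period
    return bool(state.interest_groups) and incomplete and waited_long_enough


# Groups written at pixel time, nothing else supplied.
hedged = AdvertiserState(["womens_shoes", "shoes"], written_at=0.0)
# Everything supplied upfront.
complete = AdvertiserState(["womens_shoes"], 0.0, b"<bundle bytes>", {"bid": 1.25})

print(needs_fallback_request(hedged, now=3600.0, grace_period=1800.0))
print(needs_fallback_request(complete, now=3600.0, grace_period=1800.0))
```

When the check returns true, the browser would fall back to the ordinary interest group request, sending only the groups that pass its privacy check.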

@michaelkleber
Collaborator

Got it, thanks for pointing out those differences. I do think that the two proposals might end up more similar than you expect, since @jonasz included an open-ended personalizationData object... which could include an encoded product image, or even a whole web bundle as a value!

Handing the ad bundle itself to the browser at the same time that it joins the interest group makes a lot of sense. It won't help with a use case like "That lamp you looked at is now 25% off!", or with being able to start a new ad campaign later which targets people you saw earlier. But it does allow for more specificity, as you say.

It seems to me that there are two privacy questions to think about:

  1. The privacy UX question discussed in that other issue: it can be used to show someone an ad that says "Hi 👀 Michael Kleber 👀 we're watching you!" This could make people unhappy, even though it doesn't actually allow any server to track them.

  2. Link decoration at ad click time. If a site knows something about me, then when I click an outbound link from that site, we need to worry about information carried along with the navigation event. With TURTLEDOVE, the ad doesn't know exactly who I am, just the interest group I'm in, and that helps mitigate the risk at ad click time. In this variant, the ad knows more about me, so there's more to leak.
    Perhaps this could be mitigated if we added the restriction that the ad click's destination must be the same site that stashed the ad in your browser in the first place.
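
As a rough illustration of that restriction, a check along these lines could run at click time. This is a sketch in Python under stated assumptions: `click_allowed` is a hypothetical function, the example hosts are invented, and comparing exact hostnames is a simplification (a browser would more likely compare registrable domains and would also need to consider redirects).

```python
from urllib.parse import urlparse


def click_allowed(ad_owner_origin: str, click_url: str) -> bool:
    """Allow a click only if it lands on the host that stored the ad."""
    owner_host = urlparse(ad_owner_origin).hostname
    target_host = urlparse(click_url).hostname
    # Reject anything without a parseable host rather than guessing.
    return owner_host is not None and owner_host == target_host


# The ad was stored by shop.example, so a direct landing page is fine...
print(click_allowed("https://shop.example", "https://shop.example/product/42"))
# ...but routing the click through a third-party redirector is not.
print(click_allowed("https://shop.example", "https://tracker.example/r?to=https://shop.example/"))
```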

@appascoe
Collaborator Author

So I didn't realize personalizedData would be able to pack so much stuff inside it. If we also allow personalizedData to include any ML predictions (and perhaps a bidding.js file), then I suppose it would cover the specificity of this proposal and also allow for starting of new campaigns or representing sale opportunities as well. I'd be open to that.

As for the privacy questions:

  1. Yeah, I agree that this proposal and Browser-side personalization (eliminating the privacy-personalization tradeoff) #31 both allow for this to happen. Pretty much like everything else with all privacy-sandbox proposals, there's an inherent tradeoff here. I don't know how frequently display ads, even at present, employ such deanonymizing content. My guess is that they probably wouldn't even perform well. On the other side, while I don't have any numbers to share at present, I do have an intuition around granularity in predictions propping up CPMs, and thus, publisher revenue. Not saying it's a given, but this may be a tradeoff worth making, especially if the mechanism is fundamentally privacy-preserving under the hood.
  2. Hm, I think I'm seeing two things here:

i) On an advertiser site, there's nothing stopping outbound links from dynamically including deanonymizing content which can then be passed on to a third-party site upon click. This doesn't have anything to do with the ad itself, as far as I can tell, but is a privacy-violating vector. It's unclear to me how to prevent this, and TURTLEDOVE is silent on the matter. For clarity, an egregious example, though requiring savvy infra, could be:

Advertiser_A site has a link: https://advertiser_b/page?advertiser_a_fpc=12345

Advertiser_B serves the request, and drops advertiser_b_fpc=67890, but reads advertiser_a_fpc=12345. This allows them to build a cookie matching table ({"67890": {"advertiser_a": 12345} }) which can be leveraged or sold that ties the browsing behavior on both sites together.

Advertiser_B then makes a server-to-server call back to Advertiser_A which allows them to cookie match in the other direction ({"12345": {"advertiser_b": 67890} }).
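
Spelled out as a short Python sketch, using the same made-up identifiers as above (the variable names are only for illustration, and the identifiers end up as strings because they are read from the URL):

```python
from urllib.parse import urlparse, parse_qs

# Decorated outbound link placed on Advertiser_A's site.
link = "https://advertiser_b/page?advertiser_a_fpc=12345"

# Advertiser_B reads the decorated parameter while serving the request.
params = parse_qs(urlparse(link).query)
a_fpc = params["advertiser_a_fpc"][0]
b_fpc = "67890"  # first-party cookie Advertiser_B drops on this visit

# Advertiser_B's match table, keyed by its own cookie.
b_table = {b_fpc: {"advertiser_a": a_fpc}}
# After the server-to-server call, Advertiser_A can build the reverse table.
a_table = {a_fpc: {"advertiser_b": b_fpc}}

print(b_table)
print(a_table)
```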

There's nothing special about advertisers doing this. Publishers could get in on the game as well. A sufficiently large coop network could communicate a lot through this mechanism, I suspect, since a great deal of browsing happens through hyperlink clicking. The reason that I don't know how to prevent this is because URL parameters are basically critical for the web.

ii) The ad click URL itself. Just so I understand, the concern isn't that the advertiser's site, through the pixel, could deanonymize that a click happened for a specific user, right? I mean, if they're landing on the advertiser's page, the advertiser would have access to the fpc, and could do a lookup to get the browsing history on the site anyway. The real concern is that the click URL would be prepended with some third-party deanonymizing tracking service, and then redirect to the advertiser's page?

Sure, applying a restriction on the click URL's destination would cut this off at the pass. Seems like a reasonable solution to me.

@michaelkleber
Collaborator

As you say, targeting data that can vary by individuals instead of groups is inherently a trade-off. Any data that could back up your intuition on the extra value from different minimum group sizes — say, 1 vs. 10 vs 100 vs 1000 — would help contribute to the discussions I'm sure we'll have on this question.

Regarding link decoration, you're quite right that information carried in the URL is a problem that browsers need to think about in general. But there might be some hope of understanding that a click from site A to site B carries some info between those two sites, and letting the browser or the person clicking make informed choices as a result. In a single-person-ad scenario, a click from site A to site B could instead carry information from site C, which placed the ad. Perhaps this is no worse than any other click; we just need to think it through.

@jonasz
Contributor

jonasz commented Jun 26, 2020

Hi Andrew, Michael,

It seems we are trying to simultaneously satisfy:
a) Preventing PII being displayed within the ads
b) Better item recommendations that are more useful for the user

If we were to enforce minimum audience thresholds on components of the ads (that is: items to be recommended) - not on ads themselves (which in reality are collections of items), I think we can keep a) and greatly boost b).

Technically, here's one basic proposal: let's say we introduce item_interest_groups the user can be added to, and which correspond to specific items within the advertiser's catalog. Now:

  • fetch_item_web_bundle is (conceptually) called once per item_interest_group and fetches a single item's web assets. (Note that this also improves the networking performance - we only fetch each item at most once. Currently, with interest_groups overlapping in terms of items, we would regularly fetch some items multiple times.)
  • interest_group is no longer used to fetch the web_bundle, but is used to fetch bidding_logic and item_selection_logic from the DSP

Upon winning a bid (which could work as in original Turtledove), the browser would supervise the assembly of an ad from the web_bundles corresponding to the user's item_interest_groups. We'd have to agree on how the ad is assembled, and what flexibility the browser supports.

// Note that the use case supported by current TD design would be a special case of the logic above.
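
A minimal sketch of what item-level selection could look like, in Python. Everything here is an assumption for illustration: `select_items`, the item ids, the audience counts, the threshold, and the number of ad slots are all invented, and the real assembly rules would still need to be agreed on.

```python
def select_items(item_groups, item_bundles, audience_sizes, min_audience, max_slots):
    """Return bundles for items whose own audience passes the threshold."""
    chosen = []
    for item in item_groups:  # assumed ordered, most recently viewed first
        if audience_sizes.get(item, 0) < min_audience:
            continue  # too rare: this item is dropped, not the whole ad
        if item in item_bundles:
            chosen.append(item_bundles[item])
        if len(chosen) == max_slots:
            break
    return chosen


# Hypothetical per-item state held by the browser.
viewed = ["lamp_1", "lamp_2", "rare_vase", "sofa_9"]
sizes = {"lamp_1": 5_000, "lamp_2": 1_200, "rare_vase": 4, "sofa_9": 3_000}
bundles = {item: f"<bundle for {item}>" for item in viewed}

ad_items = select_items(viewed, bundles, sizes, min_audience=1_000, max_slots=3)
print(ad_items)
```

The rare item is simply skipped and the next eligible item takes its slot, which is why a "last seen" style recommender stays feasible even when some products have tiny audiences.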

With this, the "last seen recommender" would be much more feasible for small advertisers. While this doesn't solve the problem of "recommending a single rare item", I think that would still be a great step towards dynamic creatives, and would definitely be very valuable in ecommerce retargeting. Maybe that could be the tradeoff we are looking for?

@michaelkleber
Collaborator

Very interesting idea! As you say, there would be a bunch of design work in figuring out how the ad could be sensibly assembled inside the browser after its components pass the not-too-personal threshold. But I like it — this seems like a feasible way to meet the privacy and business goals.

@appascoe
Collaborator Author

@michaelkleber Regarding your question about the size of interest groups, we're working on that internally. However, I do think that's a bit tangential to the proposal here. The tradeoff I'm talking about is the granularity of features in ML models (and subsequent performance improvements) vs. the user experience. I can definitively say that 1) identity recommenders tend to perform well, and 2) the reason that all of these adtech companies invest in granular ML is for meaningful perf gains; it's easier and cheaper to work with fewer features, so we wouldn't bother if the results weren't significant.

One number I can share is that we were performing an internal migration, and as part of an A/B test, we dropped all of our user features, employing only contextual features. We didn't run this experiment for long because we saw a performance penalty of ~30% (on metrics such as cost-per-action and cost-per-click). This seems in the ballpark of Google's and Facebook's own research of publisher revenues dropping by 50% when removing personalization.

@jonasz Maybe I'm missing something again, but wouldn't sending out a request for every item be deanonymizing for infrequently viewed items? I think that this proposal, or an extension of #31 to include bidding logic in personalizedData is more privacy-preserving, though, admittedly, opens the door for the "creepy ad factor." It's still unclear to me how much of a user experience problem this actually is even at present.

As for the browser cobbling together an ad on the fly, I wouldn't expect the browser to provide that functionality. The way I read the TURTLEDOVE docs would be that a web bundle could contain 1) a dynamic ad that itself knows how to employ some logic, 2) the set of product images (or other creative content) to load. When the ad gets rendered on the page, it would trigger code to pull the necessary components from the web bundle, without any additional network calls, to display the ad as intended. At least, in principle, this is how our dynamic ads work today; we refer to the code as an "armature." I was expecting we'd be able to continue using that same mechanism.

@jonasz
Contributor

jonasz commented Jun 29, 2020

@michaelkleber

> Very interesting idea! As you say, there would be a bunch of design work in figuring out how the ad could be sensibly assembled inside the browser after its components pass the not-too-personal threshold. But I like it — this seems like a feasible way to meet the privacy and business goals.

Great to hear that! Please let me know what'd be the best way to push this idea further. I'm happy to prepare a more detailed proposal in the form of a pull request. Would that be helpful?

@appascoe

> wouldn't sending out a request for every item be deanonymizing for infrequently viewed items?

You can think of it as "item-level Turtledove", if you like. Requests for infrequently viewed items would be analogous to requests for small interest groups.

> As for the browser cobbling together an ad on the fly, I wouldn't expect the browser to provide that functionality.

Pushing ad creation browser-side is limiting in some ways - that's the price we would pay to attain better recommendation quality. Some design work is required, but it would be great to retain as much flexibility as possible for different styles of creatives.

I fully agree with your sentiment that higher recommendation quality brings real value, both to the users and to the RTB actors. At RTB House we've been looking for ways to improve recommendation quality in Turtledove while eliminating the PII-in-ads concerns; "item level Turtledove" is our best idea so far. If you see room for improvement, it would be great to hear your feedback. Of course, we are still more than happy to explore other ideas as well.

@appascoe
Collaborator Author

I see no reason to push ad creation to the browser. The ad in the web bundle is more than capable of constructing the ad itself without doing any network calls. This will ensure that not every dynamic creative looks the same across different DSPs.

I'm still in favor of either 1) having the ability to send web bundles at pixel fire time, or 2) providing a web bundle and bidding logic in the personalizedData object a la #31 . Infrequently-viewed items being subject to interest group differential privacy is the problem this GitHub issue is trying to solve, and "item-level TURTLEDOVE" does not solve it.

@jonasz
Contributor

jonasz commented Jun 29, 2020

> I see no reason to push ad creation to the browser

I'd like to stress that there are strong reasons for enabling item level thresholds, and pushing ad creation browser side seems like a feasible way to get there. It just seems we were thinking of different use cases - sorry for mixing up the discussion on the two in one thread. I guess it's best if we move the item-level idea somewhere else, and pursue each issue separately. Thanks for clarifying!

@appascoe
Collaborator Author

Item-level thresholds and ad creation on the browser side are orthogonal. You can still have item-level thresholds, but then allow the ad in the web bundle to perform the assembly at render-time.

@michaelkleber
Collaborator

I think there may be some terminology differences that are getting in the way of this discussion. I think all three of us like the ideas of (1) some kind of item-level thresholds, and (2) ad bundles that assemble themselves out of those items at render time.

I can imagine a few different ways to implement that, including a hybrid between ideas already discussed above: a personalizedData object which lists the items, including some text and/or images for each one, and then a web bundle fetched using the normal TURTLEDOVE mechanism for the interest group which gets to insert that text/image into the creative at render time.
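
One way to picture that hybrid is sketched below in Python. All names and markup are hypothetical: the shape of the personalizedData object, the `render` helper, and the templates are invented for illustration, and in practice the creative's own script would do this inside the rendered ad rather than Python.

```python
import html
from string import Template

# Creative fetched through the normal interest group request (hypothetical markup).
CREATIVE = Template("<div class='ad'><h1>$headline</h1>$items</div>")
ITEM = Template("<a href='$url'><img src='$image' alt='$title'></a>")

# Item list stored in the browser at pixel time (assumed shape).
personalized_data = {
    "items": [
        {"url": "https://shop.example/p/1", "image": "lamp.png", "title": "Desk lamp"},
        {"url": "https://shop.example/p/2", "image": "sofa.png", "title": "Sofa"},
    ]
}


def render(creative, item_template, data, headline):
    """Insert each stored item into the creative at render time, no network calls."""
    items_html = "".join(
        # Escape stored values so item text cannot inject markup into the ad.
        item_template.substitute({k: html.escape(v) for k, v in item.items()})
        for item in data["items"]
    )
    return creative.substitute(headline=html.escape(headline), items=items_html)


ad_html = render(CREATIVE, ITEM, personalized_data, "Still thinking it over?")
print(ad_html)
```

The creative itself is shared by the whole interest group; only the inserted items differ per browser, which is where any item-level threshold would apply.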

@appascoe
Collaborator Author

appascoe commented Jul 8, 2020

I would like to add some more color to this issue. Through some internal analysis, we've determined that our bids drop by ~50% 10 minutes after a user is added to a segment (interest group). In other words, given the current ecosystem, it is extremely valuable to advertisers to reach users very soon after they've been on the site. This is likely due to a multitude of reasons, but one that relates to this issue is being able to recommend products based on recent browsing history. The TURTLEDOVE spec casually mentions something like a 4-6 hour delay for the interest group request, which would be severely detrimental to publisher revenues.

Given this data point, I would now characterize myself as:

  1. Against the personalizedData proposal if it would adhere to delays of 4-6 hours before having an opportunity to even generate an ad.
  2. For eliminating the interest group request entirely, and instead writing interest groups and web bundles into the browser at pixel time.
  3. Against item-level thresholds since they provide no additional privacy under point 2.

@michaelkleber
Collaborator

Very interesting, @appascoe, thanks for that perspective.

The benefit of the interest group request is, of course, that you might want to show someone a different ad the following week.

It seems like we could accommodate both flows by allowing you to hand the browser a personalized web bundle immediately (at the same time that you add someone to an interest group), without any worry about group size, and then the periodic update only takes place if/when the group is large enough.

@appascoe
Collaborator Author

appascoe commented Jul 9, 2020

That sounds reasonable to me. I misspoke when I recently wrote "eliminating the interest group request entirely;" both the original post and the followup (#36 (comment)) also advocate for some type of interest group request for bandwidth/computation efficiency reasons, but being able to modify the ad later (such as for sale prices) is another reason to maintain this additional request.

@JensenPaul
Collaborator

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.


4 participants