Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723

Merged
merged 33 commits into from Jul 21, 2023

Conversation

linabutler
Copy link
Contributor

@linabutler linabutler commented Jul 12, 2023

This commit adds a new Suggest component to power the Firefox Suggest feature. It downloads a dataset of suggestions from Remote Settings, stores them in an SQLite database with a simple schema, and makes them available to queries.

We'll iterate on this over time: more comprehensive tests, additions to the interface, and schema changes will definitely be coming soon, and you can take a peek at some of that upcoming work in this Jira epic! This initial PR is meant to provide a foundation for that work, and not so much a production-quality component that we'll ship right away.

It can be reviewed commit-by-commit—that might be helpful for folks newer to Rust and Application Services, as an example of how to get a new component off the ground—or as a mega-diff. For the newer folks, some of the patterns in this new component show up in our other components, too; if you're curious, I definitely recommend checking out the .udl files for the others!

/cc @mozilla/disco-services

Pull Request checklist

  • Breaking changes: This PR follows our breaking change policy
    • This PR follows the breaking change policy:
      • This PR has no breaking API changes, or
      • There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
  • Quality: This PR builds and tests run cleanly
    • Note:
      • For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
      • If this pull request includes a breaking change, consider cutting a new release after merging.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due dilligence applied in selecting them.

Branch builds: add [firefox-android: branch-name] to the PR title.

@linabutler linabutler requested a review from tiftran July 12, 2023 18:54
@linabutler linabutler force-pushed the suggest-component branch 2 times, most recently from 660cfe3 to ad34efd Compare July 14, 2023 16:19
@linabutler linabutler force-pushed the suggest-component branch 2 times, most recently from b64c945 to bef40d8 Compare July 14, 2023 22:47
@linabutler linabutler marked this pull request as ready for review July 14, 2023 23:08
@linabutler linabutler changed the title Implement the foundation of a Firefox Suggest component Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component Jul 14, 2023

This component currently supports the basic Suggest experience only. The basic experience shows suggestions for sponsored and web content from a canned dataset. The component downloads the dataset from [Remote Settings](https://remote-settings.readthedocs.io/en/latest/), stores the suggestions in a local database, and makes them available to the Firefox address bar. Because matching is done locally, Mozilla never sees the user's query.

The opt-in "Improved Firefox Suggest Experience", which sends user queries to a [Mozilla-owned proxy server](https://mozilla-services.github.io/merino/intro.html) for server-side matching, is not currently supported.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗒️ I'm hedging a little bit with the "not currently" here 😅 Eventually, I'd love for us to add a Merino client to this component, and have it return suggestions in the same format. We could use that to power AccuWeather suggestions, and bring the whole opt-in Merino experience to mobile. For now, though, I wanted to make it clear that we only support the "offline" experience.

I also used "basic Suggest experience" and "improved Suggest experience" here, instead of "offline" and "online" or "Suggest" and "Suggest Plus", because we haven't used that terminology outside of our team yet, and "improved" is how the SUMO FAQ page describes it.

}

#[test]
fn ingest() -> anyhow::Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗒️ This is an end-to-end test that exercises the happy path. We'll definitely want to add more coverage, but I'm curious about folks' opinions on the right balance of tests to add in this PR, vs. a follow-up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A followup is fine IMO, especially if we assume you'd like others on the team to help contribute to them. Your team can then make their own decision about when the coverage is "good enough"

components/suggest/src/suggest.udl Outdated Show resolved Hide resolved
components/suggest/src/store.rs Outdated Show resolved Hide resolved
/// the application. A mobile device on a metered connection might want to
/// request a small subset of the Suggest data and fetch the rest later,
/// while a desktop on a fast link might request the entire dataset on first
/// launch.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Some other ideas for using constraints:

  • Imagine a future where we have "offline" location-based suggestions. We could have a scheme that chunks these suggestion by region in Remote Settings, and constrain the ingestion to just the chunk based on the user's current (coarse) location—so that we'll only download suggestions near where the user is, instead of downloading the entire dataset and using up space to store suggestions for places they might never visit.
  • If the user just installed Firefox, we might want to ingest an initial set of "higher-value" suggestions that we can show right away, and ingest the rest later. This would also require changing our Remote Settings schema, as I don't think records currently know what suggestions are in their attachments—but it's a thought for making Suggest more useful to first-time users.
  • If the user is on cellular, we could similarly limit the amount of work we do. Android's WorkManager lets you schedule a task based on the network connection type, but we might still want to fetch a smaller set of suggestions even on cellular—just not a full fetch.

pub struct IngestLimits {
/// The maximum number of records to request from Remote Settings.
/// Each record has about 200 suggestions.
pub records: Option<u64>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I added this constraint mostly for smoke testing against a real Remote Settings, so that I could kick the tires on the schema without ingesting the entire 7 MB dataset every time...

But, I think "approximate number of suggestions" might be a more useful constraint than number of records, if it's valuable at all. "Number of records" (and chunks) is an implementation detail; if the consumer requests about 500 suggestions, we can do the calculation in the component that it's 3 records, without leaking that detail to consumers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to hiding the details of what a record is - I like the idea of constraints in that it helps for Mobile vs Desktop network connection, but I wonder how the consumer will make the decision of the constraint.. is the plan to run some smoke tests with real devices on throttled networks to see how long the network takes? Or do we plan to have dynamic configuration of limits based on a user's network 👀

Regardless I feel like this is a followup but I'm interested in the methodology to pick a limit as I'm sure we can learn from that in other places (Nimbus, other consumers of Remote Settings, Sync?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we plan to have dynamic configuration of limits based on a user's network 👀

Yep, I was thinking something closer to this. I'd love to explore generalizing it to our other components, too, but definitely in the far future!

My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.

Taking that even further, we could make the limits adaptive: see how long it takes to download some data, and figure out what to prioritize and how much to download based on that.

I'm not sure it's super helpful for where we are right now, since Suggest is a ~7 MB, infrequently changing dataset (1-2x/month; when it does change, it's only a few chunks, not the entire dataset) that's chunked arbitrarily now.

But I could see us wanting to do this if we ever want to support novel suggestion types, like location-based suggestions, in the future—where mirroring the entire dataset might not work well.

How does that line up with what you're thinking? 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realize I didn't get back to this 😅

My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.

I like it! And seems like it would serve our use case here of making sure the user gets a good experience even if they are on a low data network connection - once we ship and see how well our configuration story is in practice I'd love for us to visit if there's value in the other components, but like you said, that's in the far future!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.

I guess one vague concern about this is that "arbitrary but sensible" is really a contradiction. Picking something sensible probably means the app needs to know more about the implementation than really makes sense. I'm wondering if it's better for the app to communicate what it does know for sure (ie, it's in a kind of data-saver mode) which the component uses to choose appropriate (but not arbitrary ;) limits.

components/suggest/src/store.rs Show resolved Hide resolved
Copy link
Member

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! While some further tests would be nice, I see no reason they can't be done in a followup - indeed, there's literally nothing here which I think must be fixed before landing even if some tweaks are desired before releasing - so I think you should land this whenever you like :)

components/suggest/src/db.rs Outdated Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
components/suggest/src/store.rs Outdated Show resolved Hide resolved
}

#[test]
fn ingest() -> anyhow::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A followup is fine IMO, especially if we assume you'd like others on the team to help contribute to them. Your team can then make their own decision about when the coverage is "good enough"

components/suggest/src/suggest.udl Outdated Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
components/suggest/src/store.rs Outdated Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
Copy link
Member

@tarikeshaq tarikeshaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @linabutler 🚀 🚀 This looks great!!

Had mostly questions as I was going through - didn't dig too much into the business logic 😛 This looks great to me and I'm super excited for another cross-platform component 🤩

components/suggest/src/lib.rs Outdated Show resolved Hide resolved
components/suggest/src/store.rs Show resolved Hide resolved
components/suggest/src/suggest.udl Outdated Show resolved Hide resolved
pub struct IngestLimits {
/// The maximum number of records to request from Remote Settings.
/// Each record has about 200 suggestions.
pub records: Option<u64>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to hiding the details of what a record is - I like the idea of constraints in that it helps for Mobile vs Desktop network connection, but I wonder how the consumer will make the decision of the constraint.. is the plan to run some smoke tests with real devices on throttled networks to see how long the network takes? Or do we plan to have dynamic configuration of limits based on a user's network 👀

Regardless I feel like this is a followup but I'm interested in the methodology to pick a limit as I'm sure we can learn from that in other places (Nimbus, other consumers of Remote Settings, Sync?)

components/suggest/src/store.rs Show resolved Hide resolved
components/suggest/src/store.rs Show resolved Hide resolved
components/suggest/src/store.rs Show resolved Hide resolved
components/suggest/src/keyword.rs Show resolved Hide resolved
components/suggest/src/keyword.rs Show resolved Hide resolved
components/suggest/src/store.rs Outdated Show resolved Hide resolved
components/suggest/src/schema.rs Show resolved Hide resolved
components/suggest/src/db.rs Show resolved Hide resolved
components/suggest/src/db.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jul 18, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (fad43b7) 89.86% compared to head (c43a481) 89.86%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5723   +/-   ##
=======================================
  Coverage   89.86%   89.86%           
=======================================
  Files           1        1           
  Lines         148      148           
=======================================
  Hits          133      133           
  Misses         15       15           

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@ncloudioj ncloudioj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Lina! This is cool stuff. 🚢

This commit adds a new component to fetch search suggestions from
Remote Settings and store them in an SQLite database.
Before, creating a provider would block the calling thread until the
database was set up. Now, migrations will run the first time the
database is accessed, which we already expect to happen on a background
thread.
@linabutler linabutler force-pushed the suggest-component branch 2 times, most recently from 29795bd to 43245de Compare July 20, 2023 23:06
This commit adds an empty UniFFI interface for the new Suggest
component.
In other uses, "provider" refers to the source of a suggestion, like
adM, Wikipedia, AMO, etc. Since `SuggestionProvider` aggregates
suggestions from all these sources, calling it a "provider" could be
confusing, especially if we support filtering suggestions from specific
sources in the future.
linabutler and others added 22 commits July 20, 2023 16:33
The Gecko JS UniFFI bindings don't support synchronous function calls
yet (https://bugzilla.mozilla.org/show_bug.cgi?id=1842433), so the
builder API will be a little awkward to use on Desktop.

This commit replaces the builder with a dictionary, with default
fields to make it more ergonomic.
This commit adds documentation and tests for `full_keyword`, and
reimplements it using the Rust iterator methods.
This commit exposes `SuggestDb.clear()` as `SuggestStore.clear()`, and
ensures that it removes all icons in addition to suggestions and
metadata.
This commit changes the visibility of crate-internal types and methods
from `pub` to `pub(crate)`, and clarifies the difference between the two
error types:

* `SuggestApiError` is the public error type, for all errors that should
  be handled by the application.
* `Error` is the internal error type, used in the implementation of the
  component, and is not part of the public interface.
This commit makes it possible to combine multiple database accesses
into a single transaction. Previously, each `SuggestDb` method would
lock the database connection, and optionally run its own transaction
before releasing the lock. This meant that a single logical operation,
like ingesting all suggestions from an attachment, would involve
multiple transactions.

This commit moves the database accesses into a new `SuggestDao` type,
which takes a reference to an existing locked connection. The new
`SuggestDb::{read, write}` methods handle locking and transaction
management, and take a closure that receives a `SuggestDao` for reading
and writing.

Co-authored-by: Tif Tran <ttran@mozilla.com>
`SuggestDb` already protects the connection with a Rust mutex, so
SQLite's per-connection mutex is unnecessary.
This commit also generalizes `IngestLimits.records` to
`SuggestIngestionConstraints.max_suggestions`, since the partitioning
scheme and number of suggestions per Remote Settings attachment is an
implementation detail.
"Fetching" is a bit of an overloaded term that can refer to querying
ingested suggestions, or downloading records from Remote Settings. To
reduce confusion, let's replace those uses with more precise
terminology:

* "Downloading" for when we request suggestions from Remote Settings.
* "Ingesting" for storing the downloaded suggestions in the database.
* "Querying" for when the application requests ingested suggestions.
This matches how we pass dictionaries across the FFI boundary in our
other components.
A bit of future-proofing, in case we support multiple Remote Settings
collections in the future.
This commit renames:

* `RemoteSuggestion` to `DownloadedSuggestion` and
  `SuggestDataAttachmentContents` to `DownloadedSuggestDataAttachment`,
  for consistency with the "downloading" terminology.
* `RemoteRecordId` to `SuggestRecordId`, for consistency with the other
  Remote Settings types.
@linabutler linabutler added this pull request to the merge queue Jul 20, 2023
Merged via the queue into main with commit 1519c69 Jul 21, 2023
17 checks passed
@linabutler linabutler deleted the suggest-component branch July 21, 2023 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants