Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723

linabutler · 2023-07-12T18:54:48Z

This commit adds a new Suggest component to power the Firefox Suggest feature. It downloads a dataset of suggestions from Remote Settings, stores them in an SQLite database with a simple schema, and makes them available to queries.

We'll iterate on this over time: more comprehensive tests, additions to the interface, and schema changes will definitely be coming soon, and you can take a peek at some of that upcoming work in this Jira epic! This initial PR is meant to provide a foundation for that work, and not so much a production-quality component that we'll ship right away.

It can be reviewed commit-by-commit—that might be helpful for folks newer to Rust and Application Services, as an example of how to get a new component off the ground—or as a mega-diff. For the newer folks, some of the patterns in this new component show up in our other components, too; if you're curious, I definitely recommend checking out the .udl files for the others!

/cc @mozilla/disco-services

Pull Request checklist

Breaking changes: This PR follows our breaking change policy
- This PR follows the breaking change policy:
  - This PR has no breaking API changes, or
  - There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
Quality: This PR builds and tests run cleanly
- Note:
  - For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  - If this pull request includes a breaking change, consider cutting a new release after merging.
Tests: This PR includes thorough tests or an explanation of why it does not
Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
- Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
Dependencies: This PR follows our dependency management guidelines
- Any new dependencies are accompanied by a summary of the due dilligence applied in selecting them.

Branch builds: add [firefox-android: branch-name] to the PR title.

linabutler · 2023-07-14T23:39:37Z

components/suggest/README.md

+
+This component currently supports the basic Suggest experience only. The basic experience shows suggestions for sponsored and web content from a canned dataset. The component downloads the dataset from [Remote Settings](https://remote-settings.readthedocs.io/en/latest/), stores the suggestions in a local database, and makes them available to the Firefox address bar. Because matching is done locally, Mozilla never sees the user's query.
+
+The opt-in "Improved Firefox Suggest Experience", which sends user queries to a [Mozilla-owned proxy server](https://mozilla-services.github.io/merino/intro.html) for server-side matching, is not currently supported.


🗒️ I'm hedging a little bit with the "not currently" here 😅 Eventually, I'd love for us to add a Merino client to this component, and have it return suggestions in the same format. We could use that to power AccuWeather suggestions, and bring the whole opt-in Merino experience to mobile. For now, though, I wanted to make it clear that we only support the "offline" experience.

I also used "basic Suggest experience" and "improved Suggest experience" here, instead of "offline" and "online" or "Suggest" and "Suggest Plus", because we haven't used that terminology outside of our team yet, and "improved" is how the SUMO FAQ page describes it.

linabutler · 2023-07-14T23:43:08Z

components/suggest/src/store.rs

+    }
+
+    #[test]
+    fn ingest() -> anyhow::Result<()> {


🗒️ This is an end-to-end test that exercises the happy path. We'll definitely want to add more coverage, but I'm curious about folks' opinions on the right balance of tests to add in this PR, vs. a follow-up?

A followup is fine IMO, especially if we assume you'd like others on the team to help contribute to them. Your team can then make their own decision about when the coverage is "good enough"

components/suggest/src/suggest.udl

components/suggest/src/store.rs

linabutler · 2023-07-15T00:05:03Z

components/suggest/src/store.rs

+///    the application. A mobile device on a metered connection might want to
+///    request a small subset of the Suggest data and fetch the rest later,
+///    while a desktop on a fast link might request the entire dataset on first
+///    launch.


💭 Some other ideas for using constraints:

Imagine a future where we have "offline" location-based suggestions. We could have a scheme that chunks these suggestion by region in Remote Settings, and constrain the ingestion to just the chunk based on the user's current (coarse) location—so that we'll only download suggestions near where the user is, instead of downloading the entire dataset and using up space to store suggestions for places they might never visit.

If the user just installed Firefox, we might want to ingest an initial set of "higher-value" suggestions that we can show right away, and ingest the rest later. This would also require changing our Remote Settings schema, as I don't think records currently know what suggestions are in their attachments—but it's a thought for making Suggest more useful to first-time users.

If the user is on cellular, we could similarly limit the amount of work we do. Android's WorkManager lets you schedule a task based on the network connection type, but we might still want to fetch a smaller set of suggestions even on cellular—just not a full fetch.

linabutler · 2023-07-15T00:08:57Z

components/suggest/src/store.rs

+pub struct IngestLimits {
+    /// The maximum number of records to request from Remote Settings.
+    /// Each record has about 200 suggestions.
+    pub records: Option<u64>,


💭 I added this constraint mostly for smoke testing against a real Remote Settings, so that I could kick the tires on the schema without ingesting the entire 7 MB dataset every time...

But, I think "approximate number of suggestions" might be a more useful constraint than number of records, if it's valuable at all. "Number of records" (and chunks) is an implementation detail; if the consumer requests about 500 suggestions, we can do the calculation in the component that it's 3 records, without leaking that detail to consumers.

+1 to hiding the details of what a record is - I like the idea of constraints in that it helps for Mobile vs Desktop network connection, but I wonder how the consumer will make the decision of the constraint.. is the plan to run some smoke tests with real devices on throttled networks to see how long the network takes? Or do we plan to have dynamic configuration of limits based on a user's network 👀

Regardless I feel like this is a followup but I'm interested in the methodology to pick a limit as I'm sure we can learn from that in other places (Nimbus, other consumers of Remote Settings, Sync?)

Or do we plan to have dynamic configuration of limits based on a user's network 👀

Yep, I was thinking something closer to this. I'd love to explore generalizing it to our other components, too, but definitely in the far future!

My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.

Taking that even further, we could make the limits adaptive: see how long it takes to download some data, and figure out what to prioritize and how much to download based on that.

I'm not sure it's super helpful for where we are right now, since Suggest is a ~7 MB, infrequently changing dataset (1-2x/month; when it does change, it's only a few chunks, not the entire dataset) that's chunked arbitrarily now.

But I could see us wanting to do this if we ever want to support novel suggestion types, like location-based suggestions, in the future—where mirroring the entire dataset might not work well.

How does that line up with what you're thinking? 😊

Didn't realize I didn't get back to this 😅

My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.

I like it! And seems like it would serve our use case here of making sure the user gets a good experience even if they are on a low data network connection - once we ship and see how well our configuration story is in practice I'd love for us to visit if there's value in the other components, but like you said, that's in the far future!

My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.

I guess one vague concern about this is that "arbitrary but sensible" is really a contradiction. Picking something sensible probably means the app needs to know more about the implementation than really makes sense. I'm wondering if it's better for the app to communicate what it does know for sure (ie, it's in a kind of data-saver mode) which the component uses to choose appropriate (but not arbitrary ;) limits.

components/suggest/src/store.rs

mhammond

This is fantastic! While some further tests would be nice, I see no reason they can't be done in a followup - indeed, there's literally nothing here which I think must be fixed before landing even if some tweaks are desired before releasing - so I think you should land this whenever you like :)

components/suggest/src/db.rs

components/suggest/src/store.rs

mhammond · 2023-07-15T15:32:20Z

components/suggest/src/store.rs

+    }
+
+    #[test]
+    fn ingest() -> anyhow::Result<()> {


A followup is fine IMO, especially if we assume you'd like others on the team to help contribute to them. Your team can then make their own decision about when the coverage is "good enough"

components/suggest/src/suggest.udl

components/suggest/src/db.rs

components/suggest/src/store.rs

components/suggest/src/db.rs

tarikeshaq

Thanks @linabutler 🚀 🚀 This looks great!!

Had mostly questions as I was going through - didn't dig too much into the business logic 😛 This looks great to me and I'm super excited for another cross-platform component 🤩

components/suggest/src/lib.rs

components/suggest/src/store.rs

components/suggest/src/suggest.udl

tarikeshaq · 2023-07-17T15:01:18Z

components/suggest/src/store.rs

+pub struct IngestLimits {
+    /// The maximum number of records to request from Remote Settings.
+    /// Each record has about 200 suggestions.
+    pub records: Option<u64>,


+1 to hiding the details of what a record is - I like the idea of constraints in that it helps for Mobile vs Desktop network connection, but I wonder how the consumer will make the decision of the constraint.. is the plan to run some smoke tests with real devices on throttled networks to see how long the network takes? Or do we plan to have dynamic configuration of limits based on a user's network 👀

Regardless I feel like this is a followup but I'm interested in the methodology to pick a limit as I'm sure we can learn from that in other places (Nimbus, other consumers of Remote Settings, Sync?)

components/suggest/src/store.rs

components/suggest/src/keyword.rs

components/suggest/src/store.rs

components/suggest/src/schema.rs

components/suggest/src/db.rs

codecov-commenter · 2023-07-18T22:47:52Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (fad43b7) 89.86% compared to head (c43a481) 89.86%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5723   +/-   ##
=======================================
  Coverage   89.86%   89.86%           
=======================================
  Files           1        1           
  Lines         148      148           
=======================================
  Hits          133      133           
  Misses         15       15

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

ncloudioj

Thanks, Lina! This is cool stuff. 🚢

This commit adds a new component to fetch search suggestions from Remote Settings and store them in an SQLite database.

Before, creating a provider would block the calling thread until the database was set up. Now, migrations will run the first time the database is accessed, which we already expect to happen on a background thread.

This commit adds an empty UniFFI interface for the new Suggest component.

In other uses, "provider" refers to the source of a suggestion, like adM, Wikipedia, AMO, etc. Since `SuggestionProvider` aggregates suggestions from all these sources, calling it a "provider" could be confusing, especially if we support filtering suggestions from specific sources in the future.

The Gecko JS UniFFI bindings don't support synchronous function calls yet (https://bugzilla.mozilla.org/show_bug.cgi?id=1842433), so the builder API will be a little awkward to use on Desktop. This commit replaces the builder with a dictionary, with default fields to make it more ergonomic.

This commit adds documentation and tests for `full_keyword`, and reimplements it using the Rust iterator methods.

This commit exposes `SuggestDb.clear()` as `SuggestStore.clear()`, and ensures that it removes all icons in addition to suggestions and metadata.

This commit changes the visibility of crate-internal types and methods from `pub` to `pub(crate)`, and clarifies the difference between the two error types: * `SuggestApiError` is the public error type, for all errors that should be handled by the application. * `Error` is the internal error type, used in the implementation of the component, and is not part of the public interface.

This commit makes it possible to combine multiple database accesses into a single transaction. Previously, each `SuggestDb` method would lock the database connection, and optionally run its own transaction before releasing the lock. This meant that a single logical operation, like ingesting all suggestions from an attachment, would involve multiple transactions. This commit moves the database accesses into a new `SuggestDao` type, which takes a reference to an existing locked connection. The new `SuggestDb::{read, write}` methods handle locking and transaction management, and take a closure that receives a `SuggestDao` for reading and writing. Co-authored-by: Tif Tran <ttran@mozilla.com>

`SuggestDb` already protects the connection with a Rust mutex, so SQLite's per-connection mutex is unnecessary.

This commit also generalizes `IngestLimits.records` to `SuggestIngestionConstraints.max_suggestions`, since the partitioning scheme and number of suggestions per Remote Settings attachment is an implementation detail.

"Fetching" is a bit of an overloaded term that can refer to querying ingested suggestions, or downloading records from Remote Settings. To reduce confusion, let's replace those uses with more precise terminology: * "Downloading" for when we request suggestions from Remote Settings. * "Ingesting" for storing the downloaded suggestions in the database. * "Querying" for when the application requests ingested suggestions.

This matches how we pass dictionaries across the FFI boundary in our other components.

A bit of future-proofing, in case we support multiple Remote Settings collections in the future.

This commit renames: * `RemoteSuggestion` to `DownloadedSuggestion` and `SuggestDataAttachmentContents` to `DownloadedSuggestDataAttachment`, for consistency with the "downloading" terminology. * `RemoteRecordId` to `SuggestRecordId`, for consistency with the other Remote Settings types.

linabutler requested a review from tiftran July 12, 2023 18:54

linabutler force-pushed the suggest-component branch 2 times, most recently from 660cfe3 to ad34efd Compare July 14, 2023 16:19

linabutler requested a review from ncloudioj July 14, 2023 18:22

linabutler force-pushed the suggest-component branch 2 times, most recently from b64c945 to bef40d8 Compare July 14, 2023 22:47

linabutler requested review from mhammond and tarikeshaq July 14, 2023 23:03

linabutler marked this pull request as ready for review July 14, 2023 23:08

linabutler changed the title ~~Implement the foundation of a Firefox Suggest component~~ Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component Jul 14, 2023

linabutler commented Jul 15, 2023

View reviewed changes

mhammond approved these changes Jul 15, 2023

View reviewed changes

linabutler commented Jul 16, 2023

View reviewed changes

tarikeshaq reviewed Jul 17, 2023

View reviewed changes

ncloudioj reviewed Jul 18, 2023

View reviewed changes

components/suggest/src/schema.rs Show resolved Hide resolved

components/suggest/src/db.rs Show resolved Hide resolved

components/suggest/src/db.rs Outdated Show resolved Hide resolved

ncloudioj approved these changes Jul 20, 2023

View reviewed changes

linabutler added 2 commits July 20, 2023 13:34

Initial scaffolding for a Suggest component.

19a06ea

This commit adds a new component to fetch search suggestions from Remote Settings and store them in an SQLite database.

suggest: Open database connections lazily.

643cae3

Before, creating a provider would block the calling thread until the database was set up. Now, migrations will run the first time the database is accessed, which we already expect to happen on a background thread.

linabutler force-pushed the suggest-component branch 2 times, most recently from 29795bd to 43245de Compare July 20, 2023 23:06

linabutler added 9 commits July 20, 2023 16:33

suggest: Build config changes for Kotlin UniFFI bindings.

a564ef3

This commit adds an empty UniFFI interface for the new Suggest component.

suggest: Add UniFFI bindings for SuggestionProvider.

5b65098

suggest: Use the same full keyword heuristics as Desktop.

e9dfbe4

suggest: Store and return icons.

a57579a

suggest: Derive built-in traits for Suggestion.

591b789

suggest: Fix Clippy lints.

32fe892

suggest: Flag sponsored suggestions.

7faef03

suggest: Rename SuggestError to SuggestApiError.

5506349

linabutler and others added 22 commits July 20, 2023 16:33

suggest: Experiment with a SuggestionQuery builder API.

5badadc

suggest: Make block_id an integer.

11352c1

suggest: Add ingestion test with mock Remote Settings data.

d816056

suggest: Refactor full_keyword.

d964697

This commit adds documentation and tests for `full_keyword`, and reimplements it using the Rust iterator methods.

suggest: Support clearing the suggestions store.

43248d6

This commit exposes `SuggestDb.clear()` as `SuggestStore.clear()`, and ensures that it removes all icons in addition to suggestions and metadata.

suggest: Add README and docs for public interfaces.

cdc3971

suggest: Add MPL license headers.

c4a4a96

suggest: Switch to Parking Lot.

54d8a08

suggest: Use inline table syntax for dependencies in Cargo.toml.

b372e9f

suggest: Drop serde_derive.

959ac2c

suggest: Open SQLite connections with SQLITE_OPEN_NO_MUTEX.

548e941

`SuggestDb` already protects the connection with a Rust mutex, so SQLite's per-connection mutex is unnecessary.

suggest: Rename IngestLimits to SuggestIngestionConstraints.

46bdeba

This commit also generalizes `IngestLimits.records` to `SuggestIngestionConstraints.max_suggestions`, since the partitioning scheme and number of suggestions per Remote Settings attachment is an implementation detail.

suggest: Don't pass SuggestionQuery [ByRef].

1e7bd53

This matches how we pass dictionaries across the FFI boundary in our other components.

suggest: Include the collection name in the "last ingest" metadata key.

a427eb2

A bit of future-proofing, in case we support multiple Remote Settings collections in the future.

suggest: Improve rustdoc comments.

3c52626

suggest: Make all SuggestionQuery fields required.

44e61d3

suggest: Add CHANGELOG entry.

c43a481

linabutler force-pushed the suggest-component branch from 43245de to c43a481 Compare July 20, 2023 23:34

linabutler added this pull request to the merge queue Jul 20, 2023

Merged via the queue into main with commit 1519c69 Jul 21, 2023
17 checks passed

linabutler deleted the suggest-component branch July 21, 2023 00:14

linabutler mentioned this pull request Jul 21, 2023

Create a Rust component for a Merino client (#5345) #5346

Closed

4 tasks

0c0w3 mentioned this pull request Mar 1, 2024

Bug 1882954 - Expose icon mimetype in suggestions #6151

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723

Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723

linabutler commented Jul 12, 2023 •

edited

linabutler Jul 14, 2023 •

edited

linabutler Jul 14, 2023

mhammond Jul 15, 2023

linabutler Jul 15, 2023

linabutler Jul 15, 2023

tarikeshaq Jul 17, 2023

linabutler Jul 17, 2023

tarikeshaq Jul 20, 2023

mhammond Jul 21, 2023

mhammond left a comment

mhammond Jul 15, 2023

tarikeshaq left a comment

tarikeshaq Jul 17, 2023

codecov-commenter commented Jul 18, 2023 •

edited

ncloudioj left a comment


		This component currently supports the basic Suggest experience only. The basic experience shows suggestions for sponsored and web content from a canned dataset. The component downloads the dataset from [Remote Settings](https://remote-settings.readthedocs.io/en/latest/), stores the suggestions in a local database, and makes them available to the Firefox address bar. Because matching is done locally, Mozilla never sees the user's query.

		The opt-in "Improved Firefox Suggest Experience", which sends user queries to a [Mozilla-owned proxy server](https://mozilla-services.github.io/merino/intro.html) for server-side matching, is not currently supported.

Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723

Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723

Conversation

linabutler commented Jul 12, 2023 • edited

Pull Request checklist

linabutler Jul 14, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhammond left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarikeshaq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Jul 18, 2023 • edited

Codecov Report

ncloudioj left a comment

Choose a reason for hiding this comment

linabutler commented Jul 12, 2023 •

edited

linabutler Jul 14, 2023 •

edited

codecov-commenter commented Jul 18, 2023 •

edited