Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1298918 - Show Highlights in Activity Stream Panel #2084

Merged
merged 14 commits into from Sep 23, 2016

Conversation

sleroux
Copy link

@sleroux sleroux commented Sep 7, 2016

First version of implementing a similiar algorithm for determining 'highlights' from a combination of a user's history and bookmarks. The query mostly implements the pseudo query Sebastian wrote here [1] with the except of a host blacklist. Unfortunately we don't store when a bookmark was created on iOS and is not something that comes down from sync bookmarks [2]. The best we can do is use the last_modified time of the bookmark. Probably not the worst thing as in most cases would reflect creation time otherwise it would indicate some source of activity on the bookmark which is significant in itself. One question I have with this approach is that the query assumes local_modified and server_modified come from the same clock which isn't true. Are these comparable in any way?

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1293710#c5
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=676563

@rnewman
Copy link
Contributor

rnewman commented Sep 8, 2016

One question I have with this approach is that the query assumes local_modified and server_modified come from the same clock which isn't true. Are these comparable in any way?

They're not comparable, but:

import XCGLogger
import Deferred

/// Small enum to help strength parameter requirements for potential bookmark highlights query below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strengthen

@rnewman
Copy link
Contributor

rnewman commented Sep 8, 2016

First-pass feedback done, @sleroux. Lemme know when you want me to look again!

@sleroux
Copy link
Author

sleroux commented Sep 13, 2016

@rnewman Thanks for the suggestion to the view on local bookmarks. With the changes to the query and the adjusted view, this looks a lot simpler and seems to be a much faster query. I'll follow up with some measureBlock tests on the query. When you get a chance, could you give this another pass?

@sleroux sleroux force-pushed the sleroux/Bug1298918-ActivityStreamHighlights branch from 9eb26cf to ce12b46 Compare September 13, 2016 14:47
@sleroux
Copy link
Author

sleroux commented Sep 13, 2016

Updated query for easier reading: https://gist.github.com/sleroux/07f060b594903eb23bdfc3df642493a9

Check whether you need this wrapper select.

I guess SQLite doesn't like handling LIMIT on queries that are about to be UNION'ed so the suggestion was to wrap those queries in a SELECT * query to work around that.

@farhanpatel
Copy link
Contributor

Are you planning to add support for Hiding items from Highlights? Similar to how TopSites work.

@sleroux
Copy link
Author

sleroux commented Sep 16, 2016

Are you planning to add support for Hiding items from Highlights? Similar to how TopSites work.

Good question. I'm not actually sure. Also makes me think - does highlights also include what we want to show for Top Sites or are we keeping that algorithm the same?

cc @bryanbell

@bkmunar
Copy link
Contributor

bkmunar commented Sep 16, 2016

@sleroux We do want to support hiding items for highlights per Bryan Bell's request to add a "dismiss" option for the context menu. Also, idk to me I feel like Top Sites algo is the same b/c my desktop top sites seem to be my phone top sites

Copy link
Contributor

@rnewman rnewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That UNION needs a look, @sleroux. Otherwise good.

@@ -100,7 +100,7 @@ private let log = Logger.syncLogger
* We rely on SQLiteHistory having initialized the favicon table first.
*/
public class BrowserTable: Table {
static let DefaultVersion = 16 // Bug 1185038.
static let DefaultVersion = 17 // Bug 1185038.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the bug number.

extension SQLiteHistory: HistoryRecommendations {
public func getHighlights() -> Deferred<Maybe<Cursor<Site>>> {
let limit = 20
let microsecondsPerMinute: Timestamp = 60_000_000 // 1000 * 1000 * 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UInt64

let thirtyMinutesAgo = now - 30 * microsecondsPerMinute
let threeDaysAgo = now - (60 * microsecondsPerMinute) * 24 * 3
let bookmarkLimit = 1
let historyLimit = limit - bookmarkLimit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move all the limit stuff together.

" FROM (",
" SELECT siteID AS s, max(date) AS visitDate",
" FROM \(TableVisits)",
" WHERE date < \(thirtyMinutesAgo)",
Copy link
Contributor

@rnewman rnewman Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thirtyMinutesAgo is the only dynamic part of this query. If you extract that as a value in args, the compiler should be able to avoid dynamically concatenating this string. (Same below for threeDaysAgo.)

In the future, if we need to squeeze more perf out of the system, that will also allow us to pre-parse statements rather than parsing SQL each time we fetch.

" WHERE visitCount <= 3 AND title NOT NULL AND title != ''",
" LIMIT \(historyLimit)",
")"
].joinWithSeparator(" ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implied: do this with string concatenation instead of dynamically using joinWithSeparator. I think the Swift compiler is much better at type analysis now, so this shouldn't blow up compile time, but probably not smart enough to recognize joinWithSeparator on constants.

" FROM (",
" SELECT bmkUri",
" FROM \(ViewBookmarksLocalOnMirror)",
" WHERE \(ViewBookmarksLocalOnMirror).server_modified > \(threeDaysAgo) OR \(ViewBookmarksLocalOnMirror).local_modified > \(threeDaysAgo)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear in mind that you'll need to enable bidirectional syncing in order to test this feature.

"SELECT \(highlightProjection)",
"FROM (",
bookmarkHighlights,
"UNION",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This UNION might not do what you expect.

Your bookmark highlights always return NULL AS visitDate. Your non-recent history query always returns a max visit date. visitDate is in highlightProjection, so this UNION is always equivalent to UNION ALL, and you're relying on GROUP BY url below to randomly select one of the two rows in the event that you have a bookmark that's been visited more than three times more than 30 minutes ago.

The observed result is probably that sometimes you'll have a date in a row, and sometimes you won't.

@sleroux sleroux force-pushed the sleroux/Bug1298918-ActivityStreamHighlights branch from ce12b46 to 3b10982 Compare September 19, 2016 18:59
@sleroux
Copy link
Author

sleroux commented Sep 19, 2016

@rnewman I've rebased from master to support Xcode 8/Swift 2.3 and address the UNION issue. I think by using max(visitDate) in with the GROUP BY statement we'll always select the one with the date if there is a bookmark as well. I've also added a isBookmarked flag to more explicitly indicate that the visit is indeed a bookmark instead of relying on a NULL visitDate for that info.

I haven't tested this yet with remote bookmarks since I'm seeing an issue on iOS 10 with sync but I'll look into that this afternoon.

@sleroux sleroux force-pushed the sleroux/Bug1298918-ActivityStreamHighlights branch from 3b10982 to 61b5fea Compare September 20, 2016 15:56
@rnewman
Copy link
Contributor

rnewman commented Sep 20, 2016

Figured I'd flesh out my review comment for posterity.

In order to use UNION (or even DISTINCT!) to get unique results, you need to either narrow the set of columns so that UNION removes the duplicates (which means discarding all of the stuff that makes bookmarks and history differ!), use an enclosing query to transform the result set, or…

Conceptually you're trying to do a FULL OUTER JOIN: that is, give me things from this set, and things from this set, and if they overlap on a given criterion, merge them together.

SQLite doesn't support FOJ, and emulating it isn't cheap, but you could do it.

The approach you've taken here is an enclosing query, using GROUP BY to collapse rows together when they refer to the same URL. That needs a teensy bit more work, because the behavior when you return a non-aggregate value from a grouped result set is undefined. Comment inline…

")"

let highlightsQuery =
"SELECT historyID, url, title, guid, visitCount, max(visitDate) AS visitDate, isBookmarked, iconID, iconURL, iconType, iconDate, iconWidth " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these values come from individual rows, and differ on the row. url is fixed by virtue of being the grouping characteristic, but visitCount might come from the bookmark row, or isBookmarked might come from the history row!

Here you see why this is really a FULL OUTER JOIN: we want to merge the two together.

Now, we know there's at most one history row, and at most one bookmark URL, for each URL. If a bookmark row is returned, it'll already be joined against history, so the title etc. will match. The remaining column is isBookmarked, so we can try to phrase this projection in terms of aggregates:

SELECT historyID, url, title, guid, visitCount, max(visitDate) AS visitDate, max(isBookmarked) AS isBookmarked, iconID, iconURL, iconType, iconDate, iconWidth

However, we're half-way to just doing a FOJ — you're selecting bookmarks and LEFT JOINing them against history. Another option here, then, is for you to alter the history side of the UNION to be history items that match but are not bookmarked. You can then safely avoid grouping by anything, and just use a UNION ALL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option here, then, is for you to alter the history side of the UNION to be history items that match but are not bookmarked. You can then safely avoid grouping by anything, and just use a UNION ALL.

That's smart. I feel like I can avoid a lot of hassle doing that instead since I know that the bookmark/recent history queries will be mutually exclusive.

I think I can modify the query to include a bookmark check in the project like so (excludes the date checks):

    SELECT historyID, url, title, latest, visitCount FROM ( 
        SELECT history.id as historyID, url, title, latest, 
            (SELECT COUNT(1) FROM visits WHERE s = visits.siteID) AS visitCount,
            (SELECT COUNT(1) FROM view_bookmarksLocal_on_mirror WHERE view_bookmarksLocal_on_mirror.bmkUri == url) AS bookmarked
        FROM (
            SELECT siteID AS s, max(date) AS latest
            FROM visits
            GROUP BY siteID
        )
        LEFT JOIN history ON history.id = s
        WHERE visitCount <= 3 AND title NOT NULL AND title != '' AND bookmarked == 0
        LIMIT 19
    )

Copy link
Contributor

@rnewman rnewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but remember you can now use UNION ALL for a little speed boost.

You might also consider restoring isBookmarked, with hardcoded 0/1 values, so that highlightsQuery includes that for each row — I imagine the UI wants to throw a little star in there, or know whether to offer to bookmark the highlight or not…

@sleroux sleroux force-pushed the sleroux/Bug1298918-ActivityStreamHighlights branch from 4d30e71 to 213efaf Compare September 23, 2016 17:20
@sleroux sleroux force-pushed the sleroux/Bug1298918-ActivityStreamHighlights branch from e07ce72 to ac7715b Compare September 23, 2016 18:19
@sleroux sleroux merged commit 1207536 into master Sep 23, 2016
@sleroux sleroux deleted the sleroux/Bug1298918-ActivityStreamHighlights branch September 23, 2016 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants