Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI needs help getting stats for pages #795

Closed
travjenkins opened this issue Nov 14, 2022 · 7 comments
Closed

UI needs help getting stats for pages #795

travjenkins opened this issue Nov 14, 2022 · 7 comments
Labels
data-plane enhance New feature or enhancement with UX impact

Comments

@travjenkins
Copy link
Member

TL;DR
UI needs to query an array of catalog_names and narrow down the tss so we receive a single item back per catalog_name.

Background
The initial plan for fetching stats with the UI was to have each row query to get its own stats via the catalog_name and sort by the ts and limit to a single response. However, this would mean the UI is making a lot of calls for stats.

Instead of this - the UI is wanting to query once to fetch all the stats needed for the catalogs in a single view. This means we are using the in operator. When the UI uses this approach we run into issues with limiting this to only fetch the tss we want. For instance, we cannot query "All catalogs latest daily stat".

Queries
To give an example of what kind of queries we want to run. We would like to lookup up via a single catalog_name or via an array of catalog_names and fetch one of these time frames:
Last 6 Months
Last Month
This Month
Last Week
This Week
Yesterday
Today

Links
I initially thought that I would want to query with group/distinct to get what we need... but it looks like Postgrest does not support this:
PostgREST/postgrest#915

@travjenkins travjenkins added enhance New feature or enhancement with UX impact data-plane labels Nov 14, 2022
@travjenkins
Copy link
Member Author

I am thinking this would get solved with a view or something... but this is not my specialty. If there is a way to get this with postgrest that would be great. Let me know.

@travjenkins travjenkins changed the title UI needs help getting stats for tables UI needs help getting stats for pages Nov 14, 2022
@jgraettinger
Copy link
Member

How about using a ts.eq, ts.lte, ts.gte etc predicates? Timestamps are always truncated to the beginning of the time grain in question, making them very predictable.

Examples:

  • For grain 'hour' it's 2022-11-14T18:00:00Z.
  • For grain 'day' it's 2022-11-14T00:00:00Z.
  • For grain 'month' it's 2022-11-01T00:00:00Z.

Then, fetching the last 7 days would mean taking the current timestamp and truncating it to a 'day' grain, and then subtracting 7 days to identify the beginning of the range.

@williamhbaker
Copy link
Member

How about using a ts.eq, ts.lte, ts.gte etc predicates? Timestamps are always truncated to the beginning of the time grain in question, making them very predictable.

Probably stating the obvious, but there is an ambiguity in timestamps between 'day' and 'month' grains, where the first day of the month is the same timestamp as the month grain itself. But there is also a specific column calling out the grain to delineate these cases. The grain timestamps are constructed here by the typescript derivation:

const grainsFromTS = (ts: Date): TimeGrain[] => {
ts.setUTCMilliseconds(0);
ts.setUTCSeconds(0);
ts.setUTCMinutes(0);
const hourlyTS = ts.toISOString();
ts.setUTCHours(0);
const dailyTS = ts.toISOString();
ts.setUTCDate(1);
const monthlyTS = ts.toISOString();
return [
{
grain: 'hourly' as Document['grain'],
ts: hourlyTS,
},
{
grain: 'daily' as Document['grain'],
ts: dailyTS,
},
{
grain: 'monthly' as Document['grain'],
ts: monthlyTS,
},
];
};

So perhaps a chained a set of filters with one being for the list of catalog names, another for the computed timestamp predicate, and an eq for the specific grain in question would work.

@travjenkins
Copy link
Member Author

travjenkins commented Dec 15, 2022

UI PR that will add stats: estuary/ui#422

Below are example queries that the UI will be using for stats. One thing that I didn't think about until now is that the app uses Date Fns and it extends the build in Date object in the browser. This means all the queries would be based on the user's timezone. I think this is okay/right... but might be something we want to change. Because I do not think we have discussed too much how we want to handle time zones.

I also am not sure what we feel is best in terms of using a gt/lt together to query for a range of times or just use an eq and match the beginning of the time grain. I know we use the beginning of a grain today but wasn't sure if we think that could change in the cuture.

Examples:

Testing at December 14, 2022 10:16pm

today
ts: gte.2022-12-13T23:59:59-05:00
grain: eq.hourly

yesterday
ts: gte.2022-12-13T00:00:00-05:00
ts: lte.2022-12-13T23:59:59-05:00
grain: eq.daily

this week
ts: gte.2022-12-11T00:00:00-05:00
ts: lte.2022-12-17T23:59:59-05:00
grain: eq.daily

last week
ts: gte.2022-12-04T00:00:00-05:00
ts: lte.2022-12-10T23:59:59-05:00
grain: eq.daily

this month
ts: gte.2022-12-01T00:00:00-05:00
ts: lte.2022-12-31T23:59:59-05:00
grain: eq.monthly

last month
ts: gte.2022-11-01T00:00:00-04:00
ts: lte.2022-11-30T23:59:59-05:00
grain: eq.monthly

@travjenkins
Copy link
Member Author

After a talk with Will I am switching this query to use equals where possible and follow the pattern where the stats are stored at the start of a grain.

Examples:
Testing at December 20, 2022 11:55am

today
ts: eq.2022-12-20 00:00:00+00
grain: eq.daily

ts: eq.2022-12-19 00:00:00+00
grain: eq.daily

this week
ts: gte.2022-12-18 00:00:00+00
ts: lte.2022-12-24 23:00:00+00
grain: eq.daily

last week
ts: gte.2022-12-11 00:00:00+00
ts: lte.2022-12-17 23:00:00+00
grain: eq.daily

this month
ts: eq.2022-12-01 00:00:00+00
grain: eq.monthly

last month
ts: eq.2022-11-01 00:00:00+00
grain: eq.monthly

@travjenkins
Copy link
Member Author

I think we're good with the UI using ts to fetch what we need and gonna close this.

@jgraettinger
Copy link
Member

jgraettinger commented Dec 20, 2022

I'm not sure where the timezone discussion netted out, but one note about local vs UTC:
If we don't use UTC, then summing up a 24 hour period in the user's timezone can give a different answer than the daily grain, which may be confusing.

Edit to add: this probably is irrelevant for current-day data, as "past 24 hours" and "current UTC day" are different things, but could be confusing when looking at historical data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data-plane enhance New feature or enhancement with UX impact
Projects
None yet
Development

No branches or pull requests

3 participants