Skip to content

Conversation

@Swatinem
Copy link
Contributor

@Swatinem Swatinem commented Dec 17, 2020

This is a new API for querying sessions and has a query interface similar to discover, and returns timeseries data.

Example URL: http://localhost:8000/api/0/organizations/sentry/sessions/?project=1&statsPeriod=1d&interval=6h&field=sum(session)&field=count_unique(user)&groupBy=release&groupBy=session.status

TODO:

  • handle errors (unknown fields, etc) correctly
  • group by project, restrict filtered projects to ones we have permission to access
  • default to all org projects if no project is specifically selected
  • get a data generator or something going… (maybe someone has one already)
  • implement filtering
  • more "virtual" fields, such as "release.major", etc.

This is a new API for querying sessions and has a query
interface similar to discover, and returns
timeseries data.
# print(result_totals, result_timeseries)

with sentry_sdk.start_span(op="sessions.endpoint", description="massage_sessions_result"):
result = massage_sessions_result(query, result_totals, result_timeseries)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use a 'serializer' to convert the internal formats to the ones the client is expecting.

@Swatinem
Copy link
Contributor Author

Swatinem commented Jan 12, 2021

Some notes to clarify some strange things we saw while testing:

Why do I get 3 intervals when using statsPeriod=2d&interval=1d:

Lets visualize this using epic box drawing:

          ┌─ 2021-01-10T11:05:01 ─ 2021-01-12T11:05:01 ─┐
├─────────────────────┼─────────────────────┼─────────────────────┤
└─2021-01-10T00:00:00─└─2021-01-11T00:00:00─└─2021-01-12T00:00:00─

The statsPeriod is not rounded, but exact, so it spans 3 rounded intervals. This is also reflected in snuba:

greaterOrEquals(started, toDateTime('2021-01-10T11:05:01', 'Universal')) AND less(started, toDateTime('2021-01-12T11:05:01', 'Universal'))

If you know how it works, it is kind of expected. However, in this example, you only get half of the data for the first period, which is weird from a user perspective. Better would be to round the statsPeriod as well to the interval.

Why are multi-day intervals so weird?

So When using statsPeriod=4w&interval=2w, we actually get:

["2020-12-10T00:00:00Z","2020-12-24T00:00:00Z","2021-01-07T00:00:00Z"]

… Which is not really what we would expect, and today (2021-01-12), we are halfway through the last interval, why?

Well, both sentry (the code that I copy/pasted from somewhere else) and snuba do very basic arithmetic rounding/truncating based on timestamps, for example:

(toDateTime(multiply(intDiv(toUInt32(started), 1209600), 1209600), 'Universal') AS _snuba_bucketed_started)

And it just so happens that this is the arithmetic rounding you get for unix timestamps. so they will be widely inconsistent across years. At least sentry and snuba agree on the rounding rules, so the intervals are matching ;-)

Conclusion:

  • Don’t use interval > 1d, just don’t.
  • I use an internal util function to do the statsPeriod parsing. Maybe it makes sense to introduce a new parameter to round to a certain interval. Though I should be very careful doing that, since I have no idea what other implications this has throughout sentry. I hope @markstory can tell me more?
  • @matejminar For comparing current period with last period, you can use statsPeriod=2w&interval=1d for the current period, and statsPeriodStart=4w&statsPeriodEnd=2w&interval=1d for the previous period.

@markstory
Copy link
Member

Though I should be very careful doing that, since I have no idea what other implications this has throughout sentry. I hope @markstory can tell me more?

An additional parameter should be relatively safe. Would you need to change the rounding of the interval, or would adjusting the start date to be aligned with a bucket boundary work? For example a start of 2021-01-12 10:03:00 with a 1d bucket size could be aligned to 2021-01-12 00:00:00 (as the day is not done) or 2021-01-13 00:00:00 (to get partial results from the most recent bin. I'm partial towards aligning on the future value.

We have some precedent for moving the start dates, as we quantize start dates to 5 min boundaries in discover to improve cache efficacy.

@Swatinem
Copy link
Contributor Author

I think this should round "upward", so for example now at 19:57 on 2021-01-12, when I request 2 days worth of 1 day intervals, it would round to 2021-01-11T00:00:00 <= X < 2021-01-13T00:00:00 (or <= 2021-01-12T23:59:59), so it has 2 days in it.

I mean either way the rounding problem also becomes a timezone problem, since we round to UTC, so maybe expectations of people in other timezones would be different in that case, hm.

Copy link
Member

@markstory markstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a couple of endpoint tests that check succesful queries, queries with invalid fields, and project permission problems.

Copy link
Member

@markstory markstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the endpoint tests. You may need to relax your assertions to get the tests to pass.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants