
New Sentry Architecture #2

Merged
merged 8 commits into from
Sep 28, 2022
Conversation

mitsuhiko
Member

@mitsuhiko mitsuhiko commented Jul 21, 2022

This is a living document with some of my collected thoughts about changes to Sentry's core architecture on a longer time horizon.

Rendered RFC


@fpacifici fpacifici left a comment


more comments to come ...

This will cause issues in the test suite, as the Sentry test suite currently creates events
regularly; pragmatic solutions for this need to be found.

### Remove Pickle in Rabbit


yes please

Are we also considering serialization formats other than JSON? I don't think we ever tried anything seriously, but it may be interesting to explore alternatives to improve parsing/serialization (mostly parsing).


### Remove Celery and RabbitMQ for Kafka

Our use of RabbitMQ is reaching the limits of what can be done with this system safely. If we hit


Do you have some references about this? When you say "hit the disk", are you talking about Rabbit, or the tasks themselves?
Rabbit has some advantages, specifically in terms of autoscaling. In Kafka you are bound to the number of partitions of your topic. Traffic is not redistributed across partitions when you add consumers to a consumer group; only partitions are reshuffled. So in order to autoscale and preserve a balanced load you need a number of partitions much larger than the number of consumers, which presents problems of its own, as it is very heavy on the broker and causes massive rebalancing.
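The partition-bound scaling constraint can be illustrated with a small sketch (plain Python with made-up numbers; Kafka's real assignors such as range or round-robin differ in details but share the constraint): each partition is owned by exactly one member of the consumer group, so consumers beyond the partition count sit idle.

```python
def assign_partitions(partitions: int, consumers: int) -> dict[int, list[int]]:
    """Toy partition assignment: every partition is owned by exactly one
    member of the consumer group, so adding consumers beyond the number
    of partitions leaves the extras with nothing to do."""
    assignment: dict[int, list[int]] = {c: [] for c in range(consumers)}
    for p in range(partitions):
        assignment[p % consumers].append(p)
    return assignment

# With 8 partitions and 12 consumers, four consumers own no partitions
# at all, so scaling the group up did not add any throughput.
idle = [c for c, owned in assign_partitions(8, 12).items() if not owned]
```

This is why a balanced autoscaling setup needs far more partitions than consumers, which in turn stresses the broker and makes rebalances heavier.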

Member Author


Today, if we ever hit disk on Rabbit, the whole thing slows to a crawl. I am unfortunately not able to find the last incident involving this as it predates the current incident process, but it's something we are generally aware of. We could probably operate Rabbit so that going to disk is an option, but today that does not appear to be something we can do.

Member


I wish we fully understood what is going on with RabbitMQ in that scenario and did tests that re-validate that understanding.

It should however still be possible to move this to a Kafka model where batches of tasks are redistributed
to slower and faster topics. For instance the processing of the task could be started and if it's not
finishing within a deadline (or the code already knows that this execution can't happen) it would be
dispatched to increasingly slower topics. With sufficient concurrency on the consumers this is probably


This is how we would do parallelism right in a scenario where we have different classes of traffic that do not have to be processed in strict order:
https://github.com/confluentinc/parallel-consumer
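A rough sketch of the deadline-based escalation the RFC describes (topic names, deadlines, and the `run`/`publish` callables are hypothetical, not an existing API): a task that misses its tier's deadline is re-published to the next, slower topic instead of blocking the fast one.

```python
# Hypothetical tiers and deadlines; the last tier has no deadline, so a
# task there is simply expected to eventually finish.
TIERS = ["tasks-fast", "tasks-slow", "tasks-very-slow"]
DEADLINES = {"tasks-fast": 1.0, "tasks-slow": 30.0, "tasks-very-slow": None}

def process_with_escalation(task, topic, run, publish):
    """run(task, deadline) returns True if the task finished within the
    deadline; publish(topic, task) re-enqueues it on a slower topic."""
    if run(task, DEADLINES[topic]):
        return "done"
    # Deadline missed (or known-unrunnable): escalate one tier down.
    next_topic = TIERS[TIERS.index(topic) + 1]
    publish(next_topic, task)
    return f"escalated to {next_topic}"
```

With enough concurrency per consumer, the fast topic stays responsive because slow work keeps moving out of it.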

the entire pipeline whereas a native event can spend up to 30 minutes or more in the pipeline in very
extreme cases (completely cold caches against a slow symbol server).

It should however still be possible to move this to a Kafka model where batches of tasks are redistributed


I feel the split between classes of traffic and the need to move away from Rabbit are orthogonal. Or am I missing something?
Multiple queues can be implemented in either system.

Member Author


Generally speaking, I think we just need to make a decision about how we feel about Rabbit vs. more Kafka. Everything else is downstream from that. If we no longer want Rabbit, we need to build some stuff to replicate the benefits we get from it with Kafka.


Do we need to, though?
I feel that Rabbit and Kafka solve two related problems but provide different functionality. Some problems can be solved by both, but not all of them.
Rabbit:

  • Autoscales well.
  • Cheaper?

Kafka:

  • Can provide in-order delivery guarantees (which Rabbit cannot provide).
  • Has better delivery guarantees (we can easily do both at-least-once and at-most-once).
  • It seems to scale massively, but partitioning makes autoscaling less effective.

So if we want to pick one over the other, it has to be Kafka, because of the functional requirements around delivery guarantees that Rabbit does not provide. That means we cannot do without Kafka.

I guess the question is whether we have real use cases where we use RabbitMQ that would be hard to replace with Kafka.

Member Author


From where I'm standing we have RabbitMQ for two reasons at the moment: one is that it does not have head-of-line blocking, which makes it a pretty trivial choice for parts of the processing library where we are not quite sure how long something takes. The second reason is that we built a lot of stuff on top of Celery, particularly the cron-based stuff.

The first reason is a real benefit, but it can also be achieved with Kafka. The second reason is just a historic thing, I believe. I'm not sure if we still have actual cost or scaling benefits with RabbitMQ, but that's something I would like to explore.

Member Author


Particularly on the RabbitMQ side, we currently operate with non-durable, federated queues.

Member


One concern I have is that Celery is a great abstraction for quick prototyping while Kafka is everything but. Moving every trivially low-scale task that doesn't have consistency requirements to Kafka is today much harder than creating a task that runs on worker-glob.

One thing we could do is implement a Celery broker on top of Kafka. This would make it easier to gather operational experience with running certain product features on Kafka vs RabbitMQ, and could significantly sweeten the deal if we decided we're going to ditch RabbitMQ entirely for whatever reason.

Member


Have we considered a managed cloud service for task queues instead of RabbitMQ? GCP Tasks or Pub/Sub? Here's a comparison.

I can confirm as well that scaling workers on RabbitMQ is simpler than on Kafka, due to the one-consumer-per-partition limit and scaling of consumer groups causing rebalancing.

Member


If we did want to do everything with Kafka, I agree with @fpacifici that the parallel consumer should be investigated for non-ordered workloads.

Member Author


I will say that I have no preference on whether we use RabbitMQ (or a hosted queue) or Kafka. I can in general live with both, but if we stay with RabbitMQ I would like to move us into a world where different codebases, and particularly other languages, can use the queue and not just the monolith.

text/0002-new-architecture.md (outdated; resolved)
This is generally quite inefficient, but we are also running into limitations in the workers. For instance,
we rely exclusively on memcache for caches, which means that we are bound by our cache size limits.
Anything above the cache size is not processable, which can cause a bad experience for users depending
on very large source maps.
Member


Having one way to apply symbols sounds like a good simplification.

Currently quite a bit of code lives in `getsentry` for reasons of assumed convenience. However this has created
the situation that it is quite easy to ship regressions because the changes in `getsentry` were not considered.
There is likely a whole range of code that does not need to live in `getsentry` and moving it to `sentry` would
make the testsuite more reliable, faster to run and catch more regressions early.
Member


I could see usage periods and usage collection being something that could move to sentry. We would keep invoicing and subscriptions in getsentry but the usage subsystem could be moved out. This could also help align product features like orgstats and dynamic sampling suggestions with the usage periods that we use for billing in saas more easily.

text/0002-new-architecture.md (outdated; resolved)
text/0002-new-architecture.md (outdated; resolved)
Comment on lines +190 to +196
The Sentry monolith currently houses a range of small services that are clustered together. For instance the
processing pipeline really could be separated entirely from the rest of the system, and the same is true for quite
a few other parts of the product. For instance, certain parts of the user experience are already sufficiently independent
in the UI code today that they could turn into a separate service that just supplies UI views together with some
APIs. While it's unclear if the existing experiences are worth splitting up into new services, it's likely that
we can at least structure their code so that we reduce the total amount of code that needs to be checked and
tested against changes.
Member


Is it worthwhile being explicit that service compartmentalization doesn't imply splitting the monolith into multiple repos and running them as independent processes? Similarly, compartmentalizing the UI doesn't mean 'micro-frontends' that are stitched together.

My understanding is that we want to compartmentalize build and test dependencies so that CI and code review can be done more efficiently; is that what you had in mind?

Member Author


Pretty much, yes. I think the discussion about single vs. multi repo is largely orthogonal to how things are organized. We have pretty bad tooling for both mono-repo and multi-repo setups at the moment, and in the absence of a clear direction I would personally keep the status quo for now.

We can still have a separate conversation later if we want to move things more into one repo or into multiple repos.


I feel we should clarify what we really want to achieve with a service infrastructure. I think different people have different ideas about the goal, and depending on the goal the solution is different.
I have heard about testability, isolating features to reduce risk, and isolating features to reduce deployment cost (which would require independent standalone services).

I would add another one, which is architectural sanity (when everything depends on everything there is no architecture).
Do we have other reasons?

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that covers it.

(For me one hidden reason about the service discussion is that if we acknowledge that we already operate services, we would do a better job articulating the operational requirements. Today all services we create are one-off creations which makes operating the entirety outside of the primary sentry.io installation challenging. Though that applies less to the monolith than other things we operate but which we might separate from the monolith in the future.)

Comment on lines +203 to +207
The biggest limiting factor today in creating new services is the configuration of these services. When an engineer adds
a new service there is typically a followup PR against the internal ops repo to configure these services for the main
sentry installation. Then there is some sort of ad-hoc workaround to make the whole thing work in local development,
on a single tenant, or on a self-hosted installation. Each new service undergoes a new phase of rediscovering how
suboptimal the rules and policies around scaling these services are.
Member


Are we overloading 'services'? Do we need a different term for the processes that a product service needs in order to operate?

Member Author


Agreed we should probably be more explicit about what those things are.


Semantic nit, maybe: should we start talking about domain boundaries and bounded contexts (from domain-driven design) instead of services?
Defining bounded contexts is a logical partitioning of the data model (which requires having a coherent data model to begin with). That then implies clear dependencies between features/components, as your component cannot directly access the part of the data model that belongs to another bounded context.
Having clear dependencies then allows us to isolate what we call services today, whether or not we split the code base.

Member


Should we start talking about domain boundaries and bounded contexts (from domain driven design) instead of services?
Defining bounded contexts is a logical partition of the data model (which requires having a coherent data model to begin with).

Totally agree that separating 'services', within the monolith or outside of it, should start with bounded contexts and with what consistency guarantees we need within each context.

Member Author


For the service separation within the monolith, I'm trying to work out something sensible as a near-term roadmap.

text/0002-new-architecture.md (outdated; resolved)
mitsuhiko and others added 4 commits August 21, 2022 16:42
Co-authored-by: Mark Story <mark@mark-story.com>
Co-authored-by: Mark Story <mark@mark-story.com>
Co-authored-by: Mark Story <mark@mark-story.com>
Co-authored-by: Mark Story <mark@mark-story.com>
Comment on lines +13 to +15
decisions have to be made about the future of our codebase. As we are talking about creating
more Sentry deployments in different regions and for smaller clusters, some of these problems
are becoming more pressing as they create complexity for each added installation.

@fpacifici fpacifici Aug 21, 2022


I think there is an implication for our architecture coming from the idea of multiple smaller clusters that we are not covering here: operation and monitoring.
Essentially the focus moves from the ability to run large scalable clusters to efficiently operating many clusters, where each has smaller scalability issues.
There are two ways to address this problem (and probably both have to be pursued together).

  • Make the architecture simpler, with fewer components or fewer types of components to be deployed. This could drive us to consolidate components like multiple Snuba consumers into fewer deployment units.
  • Massively invest in automation so that managing multiple clusters (adding components, tuning components) becomes no harder than managing one. Example: we cannot be scaling individual Kafka consumer clusters up and down by hand across 10 regions, and we cannot have to add a new Kafka topic manually to each region.

I think the above should be a requirement for the new Sentry architecture.
Maybe this should be a workstream of its own.


Comment on lines +131 to +132
Within Relay some changes are likely to be necessary to continue providing a stable service. In
particular, as more and more work moves into Relay, certain risks that the current architecture is carrying need to


Is it a good idea that more and more work keeps moving into Relay?
I see a lot of work that cannot be anywhere else, like bucketing metrics and processing dynamic sampling, as we want customer relays to be able to drop transactions.
But the more logic we put in Relay without a split between infra code and product code, the harder it is to keep it stable.
Product logic that maybe should not be there includes:

  • extracting span details (for the suspect spans feature)
  • various replay-specific processing logic
  • a lot of event-specific normalization (the more envelope types we add, the more product logic will end up in Relay to normalize the data structures)
  • ...

Is there something that prevents us from making Relay simpler and moving all product logic into the specific data category ingestion pipelines, to further isolate them?
Also, did a bug specific to one data category's processing ever cause an incident in Relay that impacted others?

Member Author


Is it a good idea that more and more work keeps moving into Relay?

Unclear, but I have been talking with @jan-auer already about the idea that, if we keep doing this, we might want to split Relay into two pieces connected through a persistent queue. And once we do something like that we can also modularize it. Metrics aggregation I'm conflicted about, but in particular there is a lot of longer-term reason to attempt symbolication before something hits the innermost parts of the pipeline, particularly now that we have a lot more mobile/native traffic.

Is there something that prevents us from making Relay simpler and moving all product logic into the specific data category ingestion pipeline to further isolate them?

Anything we need to know ahead of dynamic sampling needs to be expanded/processed by Relay.

Member


This needs a more concrete plan. We've been talking for a long time about doing symbolication in Relay but so far it's been a pie-in-the-sky idea.

I think the problems this pipeline-reordering brings up are mostly impossible to solve. I have a rough idea on how to do grouping-aware filtering in some cases, but I don't see how we can stackwalk before dynamic sampling without any access to our massive internal system symbols, for example.

Comment on lines +139 to +140
layers. For instance we can achieve much better aggregations by ensuring that data from related metrics keys are
forwarded to the same processing relays.


I think there are three types of traffic routing problems we are facing today, though they are really instances of the same problem, and we could have one system to solve all three.

  1. Traffic affinity, which is what you mentioned.
  2. Partitioning by organization to run parallel slices of the architecture. Everything based on Kafka does not scale indefinitely horizontally on one topic. We generally first scale by adding partitions, then by adding parallelism on the consumer, but at some point (above 64 partitions) we break the traffic into independent topics and partition organizations across topics. As we scale further, this last scenario will happen more often.
  3. Smart semantic partitioning. Besides Relay metrics aggregation, there are a number of Kafka consumers that become more efficient if traffic is semantically partitioned, for example if an organization is, tentatively, only present in a small number of partitions. The gain is cache locality.

I think the problem is always the same:

  • map an organization to a logical shard;
  • have a component (which can even be a service) that dynamically or statically routes a logical shard to a physical resource (a Relay node, a Kafka topic or partition, etc.);
  • have a feedback loop for dynamic traffic routing that observes the load on the physical resource and informs the component that performs the routing.

I think there is some value in building this as a Relay-agnostic system that works based on an abstract concept of partition and physical resource.
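The steps above could be sketched like this (shard count, topic names, and the static routing table are made-up illustrations; in a real system the table would be rewritten by the load feedback loop rather than fixed):

```python
import hashlib

LOGICAL_SHARDS = 256  # fixed, so the org -> logical shard mapping stays stable

def logical_shard(org_id: int) -> int:
    # Step 1: stable hash of the organization onto a logical shard.
    digest = hashlib.sha256(str(org_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % LOGICAL_SHARDS

# Step 2: routing table from logical shard to physical resource (a Kafka
# topic here, but it could equally be a partition or a pool of Relay nodes).
routing_table = {s: f"ingest-events-{s % 4}" for s in range(LOGICAL_SHARDS)}

def route(org_id: int) -> str:
    # Step 3 (not shown): a feedback loop observing load per physical
    # resource would update routing_table dynamically.
    return routing_table[logical_shard(org_id)]
```

Because the org-to-shard mapping is stable, moving load around only requires remapping logical shards to resources, never re-hashing organizations.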

I wrote down some of that in the comments here
https://www.notion.so/sentry/Traffic-Steering-f2244ea8c40d405ebca998fe8dc6fd05
This is the scenario in the case 2 above https://www.notion.so/sentry/Traffic-routing-between-shards-d5f15d5e419f4dc5aac67c1c1140fa98

Member


I agree with Filippo, and I wish we would start building this service sooner rather than later. Our current initiatives don't have a common interface to code against, and since multiple teams are working on sharding/slicing/hybrid-cloud solutions, I fear there will be some avoidable fragmentation in approaches.



more Sentry deployments in different regions and for smaller clusters, some of these problems
are becoming more pressing as they create complexity for each added installation.

# Expected Workstreams


I think we may have another workstream to deal with.
We have a few independent code bases today (a few independent services) and likely, in the future, we will have more. But we do not have a foundational set of Sentry services and libraries that solve common problems for all of these code bases, so we end up rebuilding things.
There are a number of common components I am thinking about:

  • Rate limiting libraries (a number of rate-limiting policies; today we literally reimplemented this in Snuba and Sentry).
  • A runtime config flags service + SDK with an admin UI usable by everybody. We have 3 systems now, none of which is good.
  • A shared settings system across services.
  • A traffic routing SDK.
  • A streaming platform that allows product teams to run their product logic without caring too much about the deployment of a Kafka consumer. If we move more towards Kafka to replace Rabbit, we will have a lot more consumers running product logic.
  • Postgres models (we will certainly not allow different services to share a Postgres instance, but different services could each access their own Postgres instance and take advantage of reusing the Sentry abstractions).
  • ...
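As an illustration of the first bullet, here is a minimal token-bucket limiter of the kind such a shared library could provide; the class and its parameters are hypothetical, not an existing Sentry API.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter. A shared foundation library
    could provide a family of these policies instead of each service
    (Snuba, Sentry, ...) reimplementing them."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, then try to spend `cost` tokens.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A shared library would also need a distributed backing store (e.g. Redis) for cross-process limits; the in-memory version only shows the policy itself.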

Comment on lines +231 to +232
However it's likely that we will be unable to reduce cardinality in the long run and the data model should ideally
be able to represent this high cardinality data.


Is supporting high cardinality metrics the first problem to solve?
The product fundamentally relies on the assumption that real-world data is low cardinality (at least for what concerns transaction names). The product concept falls apart if that is not true, i.e. if you cannot group by transaction names effectively, because the whole product experience is based on aggregating by transaction names.

So, knowing that real-world data is high cardinality, shouldn't we first decide whether we change the product assumption (and then build a high cardinality storage), or whether we keep the product assumption and fix data cardinality instead?

Member

@untitaker untitaker Aug 30, 2022


@fpacifici I think you're mostly thinking about transaction names, but people are also wishing for significantly more tags on performance metrics AND release health, tags the product would be able to deal with very well but the infrastructure would not.

I think it would help to note down some concrete examples here of product features or queries that we would want to enable with this.

Member


Some examples of high cardinality data outside of URLs:

  • User / Customer ID
  • Shopping Cart ID
  • Request ID
  • IP address

scale is quite tricky and in some situations we need to needlessly spawn more processes and containers even though they
could be colocated.

## Data Store


A few other ideas about data store improvements:

  • The next generation of our ClickHouse fleet, which should support dynamic schemas, elasticity, separation of storage and compute, and self service. I had some discussion with @nikhars about this.
  • An intermediate high-volume storage between Postgres (for consistency) and ClickHouse (for scale). We should identify which cases fit here, and specifically whether we should revisit the main storage for our issue data model.


# Expected Workstreams

This captures all expected streams of work that are in the scope of overhauling Sentry's


Another aspect we may want to capture regarding the new architecture: the concept of ownership and self service. We have seen that we cannot live anymore in a world where ops and SnS own everything and that product teams need to take ownership of their system, which can include application code in the monolith but also systems (like kafka consumers or storage instances). This also includes being accountable for the resources they use so that they can take responsibility for decisions between a short term non-scalable solution to deliver faster and a long term scalable solution.

I think there are two fundamental ways to approach the problem that generate different architectures:

  • physically segregate systems by ownership: product teams would have full-stack ownership of their resources, including k8s deployments, ClickHouse clusters, and k8s namespaces.
  • a mixed model where the product owns application logic running on a serverless infrastructure managed by infra teams.

For Snuba we are going towards the second model. I am outlining it here: https://www.notion.so/sentry/Improve-product-development-speed-92c5b9f06a6744288454437f5a8c3db6

Though this problem will apply to Sentry as well.

We still use pickle as serialization format for Celery which means that it's not possible for code
outside of the Sentry monolith to dispatch tasks. This is both an issue for the separation of
`sentry` and `sentry_pipeline` (as import paths are changing) as well as it causes problems for
Member

@untitaker untitaker Aug 30, 2022

While I absolutely support moving away from pickle, I would challenge that this is a blocker for renaming sentry to sentry_pipeline. Celery tasks are explicitly named using the instrumented_task decorator, and while the names usually correspond to the import path, you can move tasks around and rename modules without breaking task producers. We already did that successfully.

The only place where I think actual import paths are serialized for Celery today is when a complex object is pickled as a task argument. There is prior art to restricting what objects may be pickled in load-it-up.py. A custom pickle serializer wired up to Celery that warns if a non-JSON-serializable object is being serialized should be step 0 before moving to a different serialization format. Ideally we'd be doing the same for org/project/global options.

This step 0 will require a lot of coordination and followup tasks for many product areas as I am fairly sure we pickle complex objects in both task arguments and postgres
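A minimal sketch of that step 0, assuming nothing about Sentry's actual serializer wiring (the function name `checked_dumps` and the warning text are made up): keep pickle on the wire, but emit a warning whenever a task payload would not survive a JSON round trip.

```python
import json
import pickle
import warnings


def checked_dumps(obj):
    """Serialize with pickle, but warn if the payload is not JSON-safe.

    The idea: the wire format stays unchanged, while every task argument
    that would break a future JSON migration gets surfaced as a warning.
    """
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        warnings.warn(
            "task payload is not JSON-serializable: %r" % type(obj),
            stacklevel=2,
        )
    return pickle.dumps(obj)
```

In practice this would be registered as the Celery task serializer (Celery serializers are wired up through kombu's `serialization.register`); routing the warnings into Sentry itself would yield a per-task inventory of non-JSON-safe arguments to drive the follow-up work.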

Member Author

The issue is not the naming of the task but the arguments. If I today want to dispatch a task containing a model from outside the sentry codebase, I need to create a message that pickles into a django model from within the sentry codebase. The only realistic and portable way to do this is to import the monolith, create the instance and dispatch the task that way.
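One way out, sketched here with hypothetical payload and helper names, is to make the contract scalar-only: external producers enqueue plain identifiers, and only the worker (inside the monolith) refetches the model.

```python
import json


def build_payload(group_id: int, project_id: int) -> str:
    """Producer side: any external service can emit this without
    importing the monolith, because it contains only scalars."""
    return json.dumps({"group_id": group_id, "project_id": project_id})


def handle(payload: str, fetch_group):
    """Worker side (inside the monolith): refetch the model by primary
    key. `fetch_group` stands in for e.g. Group.objects.get(id=...)."""
    args = json.loads(payload)
    return fetch_group(args["group_id"])
```

The cost is an extra database read per task, but the producer no longer needs to construct a picklable Django model instance at all.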

Member Author

The non-JSON-safe detection code, by the way, is already in Sentry. The issue is that tasks are still passing these complex objects in practice. That's a thing we need to start addressing before we can even entertain the idea of moving off pickle.


### Separation of Traffic Classes

Relay currently forwards all envelopes to the next relay in the chain in the same way. There are, however, benefits to being able
Member

@untitaker untitaker Aug 30, 2022

open question:

  • Where should this split occur? I assume the simplest answer is to do the routing in PoPs and only separate processing relays, but could it make sense to do it in customer relays already, so that PoPs are also segmented?
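Wherever the split happens, the routing step itself is small. A sketch of the idea (the item-type names and the two traffic classes are illustrative, not Relay's actual taxonomy):

```python
from collections import defaultdict

# Hypothetical classification: envelopes carrying error events or
# sessions go to a latency-sensitive path, everything else to bulk.
HIGH_PRIORITY = {"event", "session"}


def traffic_class(item_types):
    """Classify an envelope by the item types it contains."""
    if any(t in HIGH_PRIORITY for t in item_types):
        return "realtime"
    return "bulk"


def route(envelopes):
    """Partition envelopes so that bulk traffic (attachments, replays)
    cannot starve the error-event path behind it."""
    queues = defaultdict(list)
    for item_types in envelopes:
        queues[traffic_class(item_types)].append(item_types)
    return queues
```

The same partitioning works regardless of whether the split happens in PoPs or in customer relays; only the place where `route` runs changes.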


### Serverless Function Declarations for Consumers

The smallest unit of execution at Sentry is not a docker image but in fact a function. We currently have no real serverless
Member

Related to the RabbitMQ vs. Kafka discussion, but we already have this abstraction in Sentry: a Celery task.

We can consider two almost orthogonal things here:

  1. whether we want to continue using Celery as our task abstraction, or build something simpler and more restrictive in-house
  2. what the backing queue should be (RabbitMQ, Kafka, something else)
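Option 1 could be quite small. A sketch of a deliberately restrictive in-house abstraction (all names here are hypothetical): tasks are addressed by an explicit stable name rather than an import path, and dispatch rejects anything that is not JSON-safe, so producers outside the monolith can enqueue work freely.

```python
import json
from typing import Callable, Dict

_REGISTRY: Dict[str, Callable] = {}


def task(name: str):
    """Register a function under a stable name, decoupled from its
    import path, so modules can move without breaking producers."""
    def decorator(fn):
        _REGISTRY[name] = fn
        return fn
    return decorator


def dispatch(name: str, **kwargs) -> bytes:
    """Build a wire message; json.dumps raises TypeError for anything
    that would not survive a JSON round trip (no pickle escape hatch)."""
    return json.dumps({"task": name, "kwargs": kwargs}).encode()


def execute(payload: bytes):
    """Consumer side: look up the task by name and invoke it."""
    msg = json.loads(payload)
    return _REGISTRY[msg["task"]](**msg["kwargs"])


@task("events.cleanup")
def cleanup(project_id: int):
    return f"cleaned {project_id}"
```

The backing queue (question 2) then becomes an implementation detail behind `dispatch`: the same message bytes could go to RabbitMQ, a Kafka topic, or anything else.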

handling of moving such code into environments where other code wants to access it. We already require depickling
in the data pipeline for such cases.

### Phase out Buffers where Possible
Member

@untitaker untitaker Aug 30, 2022

Honestly, I think buffers are conceptually a good idea to keep. They provide batching for a subset of operations in your task/function without forcing you to give up isolation of tasks/functions. I instead wish we made buffers more scalable under the hood, encouraged/allowed isolation per product domain, and extended the abstraction to work for more things than models.

Moving to bespoke Kafka consumers that do batching is a pretty big investment to make for each product, and I would prefer that we continue to have higher-level building blocks (higher-level than Arroyo). To me that would be a rewrite of buffers.

Member Author

I think buffers are a good concept and they should stay. The way they are built is maybe not ideal (in particular the model and pickle dependency), and particularly given extended outages as a possibility, it would be preferable for this to be backed by Kafka rather than Redis, for instance. I don't think everything should build a Kafka consumer, but a buffer service could be a possibility.
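Independent of the backing store, the core of what buffers buy us is write coalescing. A minimal in-memory sketch of the concept (a real implementation would back `pending` with Redis or a Kafka topic, and flush on a schedule):

```python
from collections import defaultdict


class Buffer:
    """Coalesce many small increments and apply them as batched writes.

    Many events incrementing the same counter become a single write per
    key at flush time, instead of one write per event.
    """

    def __init__(self):
        self.pending = defaultdict(int)

    def incr(self, key, delta=1):
        """Record an increment in memory; no database write happens here."""
        self.pending[key] += delta

    def flush(self, apply):
        """Apply one combined write per distinct key, then reset."""
        for key, delta in self.pending.items():
            apply(key, delta)
        self.pending.clear()
```

Nothing in this interface is model-specific, which is the point untitaker raises above: the same abstraction could batch arbitrary commutative operations per product domain, not just model column increments.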

markstory added a commit that referenced this pull request Sep 8, 2022
Building on the ideas presented in #2, this RFC aims to provide a more
formal description of how services/domains within the monolith could be
structured.
@mitsuhiko mitsuhiko merged commit 9581e0e into main Sep 28, 2022
@mitsuhiko mitsuhiko deleted the feature/arch branch September 28, 2022 09:38
@mitsuhiko
Member Author

I merged this RFC and will maintain it as an aspirational living document. The link to the PR is maintained so that old discussions can be found.
