Performance inching isn't a strategy, and AI models aren't exceptions

Anh-Tho Chuong edited this page Apr 14, 2024 · 1 revision

When touting their products against competitors, developers often use performance as the core criterion. Assuming you’re a developer, you’ve definitely seen the slogans. “Our framework is 40% more efficient than its predecessor.” “Reach 2x faster speeds than Y library.” And so on.

It’s a fair thing to care about. Notably, performance woes with an existing solution often prompt a search for an upgrade. If your application can’t handle your users’ needs, then something needs to change. For most companies, however, that search doesn’t result in a solution with peak performance. Instead, organizations often choose something with satisfactory performance that has other markers of success, such as a great community.

In short, we believe that performance is rarely more than a checkbox.

Our personal experience

About a year ago, we were running up against a wall with our database setup. We used Postgres for everything, but the general-purpose database was failing our analytical needs. Analytical queries were hogging our database’s CPU and memory, degrading its availability for other business-critical processes.

We eventually made a decision. Postgres’s efficiency in answering analytical queries wasn’t cutting it. We needed a more efficient solution—an analytics-friendly OLAP database, to be specific. And we had tons of options. Pinot. Apache Druid. Timescale.

We ended up choosing ClickHouse. ClickHouse was blazing fast and definitely above the threshold needed to rectify our problem. However, while our search was prompted by speed issues, we didn’t choose ClickHouse because it was the fastest. It was technically slower than [QuestDB](https://questdb.io/time-series-benchmark-suite/), for instance. What mattered was that it was fast enough. Other highlights sealed the deal: (i) an amazing community and (ii) a strong managed offering.

Let’s back up and look at this story a bit more holistically.

Performance is a heavily weighted checkbox

In general, developers tend to take a holistic approach when choosing products. And there are many checkboxes. For some teams, products need to be open source. Other teams need explicit compatibility with their other tools.

Typically, performance is another one of these checkboxes. A product simply needs to be fast enough, compressed enough, and economical enough.

For instance, we had two requirements for our new analytical database: (i) analytical queries couldn’t lock up the server, and (ii) queries had to be answered in sub-second time. Technically, we could’ve satisfied the first requirement by reading from a Postgres replica alone, but never the second. We needed a new product, and ClickHouse solved both. And while Pinot, Druid, and Timescale were technically faster, our goal wasn’t returning the fastest possible result to the user. Our users waiting a heartbeat for the loaders to disappear was A-OK. So, once we checked the box, we opted for other considerations (which, for us, was community).
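The replica point above can be made concrete. Here is a minimal sketch of read routing, assuming a hypothetical primary/replica DSN pair and a naive first-keyword check (a real router would also consider transactions and session state):

```python
# Sketch: route read-only analytical queries to a Postgres replica so they
# can't lock up the primary. Both DSNs below are hypothetical examples.
PRIMARY_DSN = "postgres://primary.internal:5432/app"  # handles writes
REPLICA_DSN = "postgres://replica.internal:5432/app"  # handles analytics

# Statements that are read-only in this simplified model.
READ_KEYWORDS = ("select", "with")

def route(query: str) -> str:
    """Return the DSN a query should run against, based on its first keyword."""
    first_word = query.lstrip().split(None, 1)[0].lower()
    return REPLICA_DSN if first_word in READ_KEYWORDS else PRIMARY_DSN
```

Routing like this offloads the CPU-heavy reads, but it does nothing for requirement (ii): the replica answers queries no faster than the primary does.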

There are exceptions

As with any mantra, there are exceptions. In certain cases, performance is the most important consideration, particularly if the sought product is closely tied to the business’s value prop. For instance, Dropbox—a company in the business of storing things—may care about achieving peak compression, given that it’s directly tied to Dropbox’s profit margin. Likewise, a team such as AWS CloudFront may care about optimal servers to serve its CDN content faster than competitors.

But even here, there are holdouts. PostHog is in the business of storing and displaying website analytical data but still opted for ClickHouse for similar reasons to ours.

In a nutshell, performance is about crossing a threshold, not pushing a boundary. But, if it’s critical to an organization’s core offering, the reverse may be applicable.

Community is an underrated marker

The real reason we chose ClickHouse was its massive community of adopters: builders who were dealing with similar analytical problems. While ClickHouse has plenty of competitors, it’s a relatively old tool, one of the first OLAP databases of the modern big-data era. Its long, battle-tested history has won over a lot of developers.

Community is a big deal. And I’m not referencing a heap of available Stack Overflow posts. It’s more that products with big communities naturally flesh out. Big communities mean more online guides. Big communities mean more battle-tested pairings with other common frameworks. Big communities create a market for commercially-driven managed solutions to launch, scale, and maintain the product (e.g., Altinity Cloud for ClickHouse). And, as icing on the cake, adopting a popular project makes it easy to attract engineering talent.

Today, we have a relatively popular stack at Lago. We would like to think of it as a fancy, bespoke solution, but frankly, Postgres + ClickHouse has grown into quite a common database split. We needed very little creativity when implementing it. Scaling it was easy with Altinity. Connecting it was simplified by prebuilt Kafka connectors. All of this was possible because of ClickHouse’s popularity.
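To illustrate the Kafka pairing, ingestion can be as simple as batching topic messages into a few large ClickHouse inserts. A hedged sketch follows—the table and column names are invented, and a real connector also handles delivery guarantees, escaping, and schema evolution:

```python
import json

def build_insert(table: str, messages: list) -> str:
    """Batch JSON messages from a Kafka topic into one ClickHouse INSERT.

    ClickHouse prefers a few large inserts over many small ones, so a
    consumer should flush in batches rather than per message.
    """
    rows = [json.loads(m) for m in messages]
    values = ", ".join(
        "({}, '{}')".format(int(r["amount_cents"]), r["event"]) for r in rows
    )
    return f"INSERT INTO {table} (amount_cents, event) VALUES {values}"
```

A batch of two billing events renders as a single statement, which is exactly the shape of write ClickHouse is optimized for.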

Open source plays a role here

Many developer tools—especially tools shaped around performance—are open source. And, for open-source tools, community is inherently important. A strong community means more third-party plugins built for the tool. It means more members auditing code, flagging bugs, and preventing security issues. In short, community activation around open-source projects means better projects in the long run.

This point is slightly self-fulfilling: open source attracts avid communities precisely because of its free and open nature. There’s no chance ClickHouse would’ve reached its modern adoption had it been closed source.

Often, cost precedes performance

Assuming all contenders have satisfactory performance, another common tie-breaker is cost. A product that is twice as fast might cost twice as much. Even if it costs only 50% more, it’s likely not the ideal choice if the slower product’s performance is satisfactory.

For us, cost wasn’t a major consideration, given that all of these databases are priced by scale, and projecting our scale was tricky to begin with. Other companies may be satisfied with the Porsche over the Bugatti.

What does this mean for performance-inching?

The intention of this article isn’t to shame folks working on performance-inching tools (e.g., something with a 10-100% improvement over the previous leader). The development ecosystem moves forward because of small projects that aspire to move the needle.

Instead, the message here is that those projects are not going to succeed through broad adoption when there’s a widely popular incumbent. This is, more or less, an engineering version of the VC “10x or bust” mentality. Frankly, a 20-30% improvement isn’t going to convince organizations to migrate from a significantly more popular tool.

At the same time, that’s not a death sentence for projects with marginal performance improvements. Instead of succeeding through broad adoption, they can charge more and cater to organizations whose bottom line is directly attached to the underlying performance. For instance, a project that achieves marginally faster AI training may find serious, lucrative success with organizations that sell AI training as a service. It just may not succeed with companies that need training only as a subprocess.

Closing thoughts

Performance might not be a common north star, but it is still important. What’s more important, however, is building well-rounded projects: amazing communities, extensions, sensible pricing plans, and open code. Of course, some companies may see performance as a be-all and end-all factor, but they’re increasingly rare!