In [4]:
!pip -q install redis redis-cli

In [None]:
import redis
r = redis.Redis(host="127.0.0.1", port=6379, db=0)

## Redis HyperLogLog Explained - [Link to Video](https://youtu.be/UAL2dxl1fsE)

### Overview

HyperLogLog (HLL) estimates the **cardinality** (number of distinct elements) of large sets using fixed memory, trading perfect accuracy for **~0.81% standard error** and **≤12 KB** per structure ([Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/), [PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/)). Use HLLs when you need approximate distinct counts with minimal memory; use **Sets** when you need exact membership or precise counts (with higher memory) ([Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)).

Core commands:
- **`PFADD`**: add elements to an HLL ([PFADD docs](https://redis.io/docs/latest/commands/pfadd/)).
- **`PFCOUNT`**: get the estimated cardinality of one or more HLLs; with multiple keys it estimates the **union** cardinality without storing the union ([PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/)).

Applications mentioned in the video: counting unique website visitors, ad clickers, and traffic flows, all while limiting stored personal data, because HLLs **do not store raw members** ([Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)).

---

### Data Model: Traffic Heat Map (San Francisco Intersections)

Each major intersection has an HLL key. Example: intersection of **Market St (ID 12)** and **10th St (ID 27)** ⇒ key `count:sf:12:27`. Record each scanned **license plate** as an element; the HLL gives a per-intersection distinct-vehicle estimate.

```python
# add a license plate observed at Market (12) & 10th (27)
r.execute_command("PFADD", "count:sf:12:27", "CA-7ABC123")

# add 60 more observations (unique plate IDs), as in the video
for i in range(2, 62):
    r.execute_command("PFADD", "count:sf:12:27", f"CA-{i:06d}")

# verify: print the estimated distinct vehicles at this intersection
estimate = r.execute_command("PFCOUNT", "count:sf:12:27")
print(estimate)  # the video reports 61; the value is an estimate
```

Notes:

* Duplicate scans (e.g., a U-turn seen twice) don’t inflate counts: HLL estimates **unique** elements ([PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/)).
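The key layout used above can be captured in a small helper so that every writer and reader builds keys the same way. This is a minimal sketch: the function name `intersection_key` and its parameters are hypothetical, and only the `count:sf:12:27` layout comes from the example.

```python
def intersection_key(city: str, street_a: int, street_b: int) -> str:
    """Build the HLL key for an intersection, e.g. count:sf:12:27.

    Hypothetical helper: only the key layout comes from the example above.
    """
    return f"count:{city}:{street_a}:{street_b}"


# Market St (ID 12) and 10th St (ID 27)
key = intersection_key("sf", 12, 27)
print(key)  # count:sf:12:27
```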

---

### Single-Intersection Count (`PFCOUNT`)

Retrieve per-key cardinality to drive the heat map intensity.

```python
# get the count for Market & 10th
print(r.execute_command("PFCOUNT", "count:sf:12:27"))  # 61 in the video
```

The result is an **approximate** integer with \~0.81% standard error ([PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/), [Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)).
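As background that the video does not cover: the 0.81% figure agrees with the standard HyperLogLog error formula, 1.04 / sqrt(m), evaluated for the 16384 registers that Redis uses, and 16384 registers of 6 bits each come to 12 KB (before a small header). The sketch below only works through that arithmetic; the helper `error_band` is a hypothetical name introduced for illustration.

```python
import math

REGISTERS = 16384       # register count in Redis's HLL (background, not from the video)
BITS_PER_REGISTER = 6

std_error = 1.04 / math.sqrt(REGISTERS)           # relative standard error
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8  # size of the dense register array


def error_band(estimate, sigmas=1.0):
    """Hypothetical helper: range implied by `sigmas` standard errors around an estimate."""
    delta = estimate * std_error * sigmas
    return round(estimate - delta), round(estimate + delta)


print(f"{std_error:.4%}")      # 0.8125%
print(dense_bytes)             # 12288
print(error_band(1_000_000))   # (991875, 1008125)
```

For a heat map, an error of this size rarely changes which color bucket an intersection falls into, which is why the approximation is acceptable here.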

---

### Neighborhood / Union Count (`PFCOUNT` with Multiple Keys)

Zooming out requires counting distinct vehicles across multiple intersections. `PFCOUNT` with multiple keys estimates the **union** without persisting it.

```python
# union cardinality across multiple intersections (Market & 10th, 11th, 12th)
union_estimate = r.execute_command(
    "PFCOUNT",
    "count:sf:12:27",
    "count:sf:12:28",
    "count:sf:12:29",
)
print(union_estimate)  # approximate union cardinality (integer)
```

This performs an **on-the-fly merge** to compute the union estimate ([PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/)).
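Adding up the per-intersection counts would not give the same answer, because a vehicle seen at several intersections would be counted once per intersection. The sketch below uses exact Python sets and made-up plates (no Redis involved) to show the overlap that a union estimate accounts for; the variable names are illustrative only.

```python
# Made-up plates observed at three neighboring intersections
market_10th = {"CA-000001", "CA-000002", "CA-000003"}
market_11th = {"CA-000002", "CA-000003", "CA-000004"}
market_12th = {"CA-000003", "CA-000005"}

# Naive approach: add up the per-intersection distinct counts
sum_of_counts = len(market_10th) + len(market_11th) + len(market_12th)

# What a neighborhood view needs: distinct vehicles across all three
union_count = len(market_10th | market_11th | market_12th)

print(sum_of_counts)  # 8
print(union_count)    # 5
```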

---

### Sliding Window with Key Expiration (Freshness)

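The video leaves time-windowing as an exercise. A typical approach is **rolling HLL keys with a TTL**: write to one key per time bucket (for example, per day) and let old buckets expire, which approximates a sliding window at bucket granularity.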

```python
# example: daily buckets per intersection with expiry for a 7-day sliding window
key = "count:sf:12:27:2025-09-21"  # ISO date partition
r.execute_command("PFADD", key, "CA-7ABC123")
r.execute_command("EXPIRE", key, 7 * 24 * 3600)  # 7 days

# verify expiry is set (> 0)
print(r.execute_command("TTL", key))  # > 0 seconds remaining
```

To estimate distinct vehicles over the whole window, union the last N daily keys with `PFCOUNT key1 key2 ...` ([PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/), [EXPIRE docs](https://redis.io/docs/latest/commands/expire/)).
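Unioning the last N daily keys requires generating their names first. The following is a minimal sketch, assuming the ISO-date suffix shown above; the helper name `last_n_day_keys` is hypothetical.

```python
from datetime import date, timedelta


def last_n_day_keys(base_key: str, end: date, n: int) -> list:
    """Return the n daily bucket keys ending at `end`, newest first.

    Hypothetical helper; assumes keys of the form <base_key>:<YYYY-MM-DD>.
    """
    return [f"{base_key}:{(end - timedelta(days=i)).isoformat()}" for i in range(n)]


keys = last_n_day_keys("count:sf:12:27", date(2025, 9, 21), 7)
print(keys[0])   # count:sf:12:27:2025-09-21
print(keys[-1])  # count:sf:12:27:2025-09-15

# With the notebook's connected client `r`, the 7-day estimate would be:
# r.execute_command("PFCOUNT", *keys)
```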

---

### Properties and Trade-offs

* **Fixed memory**: up to **12 KB** per HLL, with sparse→dense automatic encoding ([PFCOUNT docs: representation](https://redis.io/docs/latest/commands/pfcount/), [Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)).
* **Accuracy**: \~**0.81%** standard error ([Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)).
* **No member retrieval**: HLLs maintain registers, **not elements**; you can’t list what was added ([Redis HLL docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)).
* **Union options**:

  * **Ad-hoc**: `PFCOUNT k1 k2 ...` computes a temporary union estimate ([PFCOUNT docs](https://redis.io/docs/latest/commands/pfcount/)).
  * **Materialized**: `PFMERGE dest k1 k2 ...` produces a stored union sketch for repeated queries ([PFMERGE docs](https://redis.io/docs/latest/commands/pfmerge/)).
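To make the register and union properties above concrete, here is a toy HyperLogLog in pure Python. It is a teaching sketch and not Redis's implementation: the class name `ToyHLL`, the 1024-register size, and the SHA-1 hash are all assumptions chosen for brevity, whereas Redis uses 16384 registers with its own hash, encodings, and estimator. It does show the two ideas from the list: only per-register maxima are kept (so members cannot be listed), and a union is a register-wise maximum.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog with 2**p registers (illustrative; not Redis's encoding)."""

    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m  # one small integer per register, no members

    def add(self, item: str) -> int:
        """Add an item; return 1 if a register changed, else 0 (like PFADD)."""
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")  # 64-bit hash
        idx = h >> (64 - self.p)                       # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the first 1 bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank
            return 1
        return 0

    def count(self) -> int:
        """Estimate cardinality with the classic HLL formula."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))  # linear counting for small sets
        return round(raw)

    def merge(self, other: "ToyHLL") -> "ToyHLL":
        """Union of two sketches: register-wise maximum."""
        merged = ToyHLL(self.p)
        merged.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return merged


a, b = ToyHLL(), ToyHLL()
for i in range(12_000):
    a.add(f"CA-{i:06d}")
for i in range(8_000, 20_000):   # overlaps with `a` on 4,000 plates
    b.add(f"CA-{i:06d}")

union = a.merge(b)
print(len(union.registers))  # 1024
print(union.count())         # near the true union size of 20,000, not 24,000
```

With only 1024 registers the expected error is roughly 3%, noticeably worse than Redis's; the trade-off between register count and accuracy is the same one Redis resolves at 12 KB.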

---

### Command Examples (Concise)

```python
# add elements; returns 1 if the HLL’s internal registers changed, else 0
changed = r.execute_command("PFADD", "count:sf:12:27", "CA-7ABC123", "CA-7XYZ999")
print(changed)  # 1 or 0

# single-key cardinality
print(r.execute_command("PFCOUNT", "count:sf:12:27"))  # approximate integer

# multi-key union cardinality (temporary)
print(r.execute_command("PFCOUNT", "count:sf:12:27", "count:sf:12:28"))  # approximate integer

# materialize a union for repeated queries
r.execute_command("PFMERGE", "count:sf:12:market_corridor", "count:sf:12:27", "count:sf:12:28")
print(r.execute_command("PFCOUNT", "count:sf:12:market_corridor"))  # approximate integer
```
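redis-py also exposes these commands as client methods (`pfadd`, `pfcount`, `pfmerge`), which avoids passing command names as strings. The following is a minimal sketch, assuming `client` is a connected `redis.Redis` instance; the function name `corridor_estimate` is hypothetical.

```python
def corridor_estimate(client, dest_key, source_keys):
    """Materialize the union of source_keys into dest_key and return its estimate.

    Hypothetical helper; `client` is assumed to be a connected redis.Redis instance.
    """
    client.pfmerge(dest_key, *source_keys)  # PFMERGE dest k1 k2 ...
    return client.pfcount(dest_key)         # PFCOUNT dest


# Usage with the notebook's client `r`:
# corridor_estimate(
#     r,
#     "count:sf:12:market_corridor",
#     ["count:sf:12:27", "count:sf:12:28"],
# )
```

Materializing the union this way suits dashboards that query the same zone repeatedly; for a one-off zoom level, a multi-key `PFCOUNT` avoids writing an extra key.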

---

> **Sidenote**
>
> [Link: PFADD — Redis Docs](https://redis.io/docs/latest/commands/pfadd/)
>
> **Command**: `PFADD` → Add elements to a HyperLogLog.
>
> **Pattern**: `PFADD key element [element ...]`
>
> **Example**: `PFADD count:sf:12:27 CA-7ABC123 CA-7XYZ999`
>
> **Result**:
>
> Integer reply: `1` if at least one internal register changed; `0` otherwise.

> **Sidenote**
>
> [Link: PFCOUNT — Redis Docs](https://redis.io/docs/latest/commands/pfcount/)
>
> **Command**: `PFCOUNT` → Estimate cardinality of one HLL or the union of many.
>
> **Pattern**: `PFCOUNT key [key ...]`
>
> **Example**: `PFCOUNT count:sf:12:27 count:sf:12:28`
>
> **Result**:
>
> Integer reply: approximate distinct count; with multiple keys, an on-the-fly union estimate.

> **Sidenote**
>
> [Link: PFMERGE — Redis Docs](https://redis.io/docs/latest/commands/pfmerge/)
>
> **Command**: `PFMERGE` → Merge HLLs into a destination HLL (approximate union).
>
> **Pattern**: `PFMERGE destkey sourcekey [sourcekey ...]`
>
> **Example**: `PFMERGE count:sf:12:market_corridor count:sf:12:27 count:sf:12:28`
>
> **Result**:
>
> Simple string reply: `OK`; `PFCOUNT destkey` then returns the merged union estimate.

> **Sidenote**
>
> [Link: EXPIRE — Redis Docs](https://redis.io/docs/latest/commands/expire/)
>
> **Command**: `EXPIRE` → Set a TTL on a key to support sliding windows.
>
> **Pattern**: `EXPIRE key seconds [NX|XX|GT|LT]`
>
> **Example**: `EXPIRE count:sf:12:27:2025-09-21 604800`
>
> **Result**:
>
> Integer reply: `1` if the timeout was set; `0` otherwise.

---

> **Sidenote**
>
> [Link: HyperLogLog — Redis Docs](https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/)
>
> **Concept**: `HyperLogLog` → Probabilistic structure for distinct counting with fixed memory and small error.
>
> **Context**: Use when tracking unique items at scale (e.g., intersections, visitors) where approximate counts are acceptable.
>
> **Example**: Per-intersection keys capturing license plates; heat map color is driven by `PFCOUNT` results.
>
> **Implication**:
>
> Constant memory (≤12 KB) and \~0.81% error enable high-scale, privacy-aware counting without storing actual identifiers.

> **Sidenote**
>
> [Link: PFCOUNT — Representation & Limits](https://redis.io/docs/latest/commands/pfcount/)
>
> **Concept**: `Union Semantics` → Multi-key `PFCOUNT` computes a temporary union; `PFMERGE` materializes a reusable union sketch.
>
> **Context**: Temporary vs. repeated neighborhood/zone queries.
>
> **Example**: `PFCOUNT kA kB kC` for a one-off zoomed view; `PFMERGE kDest kA kB kC` for dashboards.
>
> **Implication**:
>
> Choose ad-hoc or materialized unions to balance latency, persistence, and query frequency.

### Original Transcript

Video title: Redis HyperLogLog Explained
    
Video URL: https://youtu.be/UAL2dxl1fsE
    
Video language: English (United States)
    
--------------------------------

A 45-minute commute? That's way too long. I'd love a way to see where all the traffic is throughout the day. We can use the HyperLogLog to create a solution. HyperLogLogs are all about efficient counting. We can count unique visits to a website, daily clicks from users on a specific advertisement, even track traffic. All while preserving privacy. Let's see how we can use a HyperLogLog to power a traffic heat map for the city of San Francisco. Hop on in, let's go for a ride.  A HyperLogLog is a data structure that answers a simple question. What's the approximate size or cardinality of a set? That's it. HyperLogLogs optimize for space or memory over perfect accuracy. So you use them when you're counting really large sets and you don't need or have the space for perfectly accurate counts. If you need more set-like functionality or perfect count accuracy, then you should use a Redis Set instead. But remember that when counting large sets, a Redis Set will use a lot more memory than a HyperLogLog. With only two key commands, it's easy to master HyperLogLogs. We'll add elements to the HyperLogLog with the PFADD command. We'll fetch the cardinality of one or more HyperLogLogs using the PFCOUNT command. To create our traffic heat map of San Francisco, we'll create a HyperLogLog for each major intersection in the city. To count the vehicles, we'll add the license plate numbers scanned at each intersection to that intersection's HyperLogLog. When we render our traffic map, we'll use the cardinality estimate for each intersection to show how busy it is. The higher the cardinality, the hotter the intersection. Let's imagine we're at the corner of Market and 10th in San Francisco. We'll refer to Market Street by the ID 12 and we'll refer to 10th Street by the ID 27. To record a license plate in the HyperLogLog, we'll use the PFADD command like so. PFADD count:sf:12:27 and our license plate number. 
I'll add 60 more license plate numbers to our HyperLogLog to further our example. Now, I want to point out something with this traffic counting system. A HyperLogLog maintains a count of unique members. So if a car pulls a U-turn in an intersection and gets scanned twice, the HyperLogLog will count it only once. This means our count might not always perfectly reflect the San Francisco drivers' unique driving habits. Also, a quick word about privacy. Like most people, San Franciscans value their privacy. No one really wants us tracking their movements around the city. What's interesting about the HyperLogLog is that it naturally helps to preserve privacy. Once we put a license plate into a HyperLogLog, we can't get it out. A HyperLogLog stores as little information as possible to estimate its counts. This means that HyperLogLogs are a great way to count unique members without explicitly storing personal data. Now that we are recording vehicle movements around San Francisco, we need a way to get each HyperLogLog's estimate of the traffic at each intersection. We run the PFCOUNT command to get a HyperLogLog's count. To check our Market and 10th Street HyperLogLog, we call PFCOUNT count:sf:12:27. 61 is returned. To keep our heat map up to date, we'll call the PFCOUNT periodically in all of our intersections. What if we need to zoom out on our heat map to see a coarser-grained view at a neighborhood level? To do that, we'll need to consider data stored in several HyperLogLogs. PFCOUNT lets us do that too. When we pass multiple HyperLogLog keys to PFCOUNT, we get a count representing the cardinality of the union of those HyperLogLogs. Our command would be PFCOUNT followed by our intersection keys. This will give us an approximation of the number of distinct vehicles moving around a larger area on our map. Yeesh, don't go to Civic Center during lunch hour. Our example records vehicles at each intersection. 
But we never specify in what time frame or how long we keep that HyperLogLog. A sliding window method of recording HyperLogLog with key expiry would be one solution. We'll leave that exercise to the viewer. But if you'd like to see a similar approach to sliding window data, check out our two-part explainer videos on Redis Sets. OK, review time. We've learned how to add elements to a HyperLogLog with PFADD. And we've learned how to get the cardinality of one or more HyperLogLogs with PFCOUNT. What will you do with a Redis HyperLogLog? HyperLogLogs are ideal for tracking IP addresses to estimate unique visitors to a website, user check-ins at locations, and monitoring any large set of elements where approximating cardinality is the main focus. To learn more about Redis HyperLogLogs and other data structures, sign up for our free online courses at Redis University, our online learning platform for all things Redis. Thanks for taking a ride with me around Redis HyperLogLogs. Hope to see you again soon, and make sure to look both ways before you PFCOUNT.