In [None]:
import random

In [None]:
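def load_edges(path):
    """Read an edge list as undirected edges, one frozenset({u, v}) per edge."""
    edges = []
    seen = set()
    with open(path, "r") as f:
        for line in f:
            parts = line.split()
            # Skip blank lines, '#' comment headers, and malformed rows.
            if len(parts) != 2 or parts[0].startswith("#"):
                continue
            edge = frozenset(parts)
            # Drop self-loops and the reverse copy of an already-seen edge.
            if len(edge) == 2 and edge not in seen:
                seen.add(edge)
                edges.append(edge)
    return edges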

edges = load_edges("Wiki-Vote.txt")
print(f"Loaded {len(edges)} edges.")
edges[:5]

In [None]:
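def generateNeighboorhood(edges, node_u, node_v):
    """Return the common neighbors of node_u and node_v among the given edges."""
    neighbors_u = set()
    neighbors_v = set()
    for edge in edges:
        # Each edge is a frozenset of two endpoints; removing one leaves the other.
        if node_u in edge:
            neighbors_u.update(edge - {node_u})
        if node_v in edge:
            neighbors_v.update(edge - {node_v})
    # Every common neighbor w closes one triangle {u, v, w}.
    return neighbors_u & neighbors_v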

In [None]:
def updateCounter(sign, u, v, reservoir, triangleCounter):
    neighborSet = generateNeighboorhood(reservoir, u, v)
    if sign == "minus":
        triangleCounter -= len(neighborSet)
    else:
        triangleCounter += len(neighborSet)
    return triangleCounter

In [None]:
def reservoir_sampling_base(iteration, reservoir_size, reservoir, triangleCounter):
    if iteration < reservoir_size:
        return [True, reservoir, triangleCounter]
    rnd = random.randint(0, iteration)
    if rnd < reservoir_size:
        replaced = reservoir[rnd]
        triangleCounter = updateCounter("minus", list(replaced)[0], list(replaced)[1], reservoir, triangleCounter)
        reservoir.pop(rnd)
        return [True, reservoir, triangleCounter]
    return [False, reservoir, triangleCounter]

In [None]:
def trieste_base(stream, reservoir_size):
    reservoir = []
    triangleCounter = 0
    est = 0
    
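    for i, edge in enumerate(stream):
        t = i + 1  # number of edges seen so far, including this one
        # Decide whether this edge enters the sample; an eviction also subtracts its triangles.
        isAdded, reservoir, triangleCounter = reservoir_sampling_base(i, reservoir_size, reservoir, triangleCounter)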
        if isAdded:
            reservoir.append(edge)
            triangleCounter = updateCounter("plus", list(edge)[0], list(edge)[1], reservoir, triangleCounter)
        if t <= reservoir_size:
            est = 1
        else:
            est = (t * (t - 1) * (t - 2)) / (reservoir_size * (reservoir_size - 1) * (reservoir_size - 2))
    return triangleCounter * est

trieste_base(edges, 10000)


In [None]:
def generateNeighboorhood_improved(edges, node_u, node_v):
    neighbors_u = set()
    neighbors_v = set()
    for edge in edges:
        if node_u in edge:
            neighbors_u.update(edge - {(node_u)})
        if node_v in edge:
            neighbors_v.update(edge - {(node_v)})
    return neighbors_u.intersection(neighbors_v)

def updateCounter_improved(u, v, reservoir, triangleCounter, t, reservoir_size):
    neighborSet = generateNeighboorhood_improved(reservoir, u, v)
    weightedIncrease = 0
    if (t <= reservoir_size):
        weightedIncrease = 1
    else:
        weightedIncrease = max(1, (((t - 1) * (t - 2))/((reservoir_size * (reservoir_size - 1)))))
    triangleCounter += (len(neighborSet) * weightedIncrease)
    return triangleCounter

def reservoir_sampling_base_improved(iteration, reservoir_size, reservoir, triangleCounter):
    if iteration < reservoir_size:
        return [True, reservoir, triangleCounter]
    rnd = random.randint(0, iteration)
    if rnd < reservoir_size:
        reservoir.pop(rnd)
        return [True, reservoir, triangleCounter]
    return [False, reservoir, triangleCounter]

def trieste_improved(stream, reservoir_size):
    reservoir = []
    triangleCounter = 0

    for i, edge in enumerate(stream):
        t = i + 1  # number of edges seen so far, including this one
        # Count the weighted triangles closed by this edge before the sampling decision.
        triangleCounter = updateCounter_improved(list(edge)[0], list(edge)[1], reservoir, triangleCounter, t, reservoir_size)
        isAdded, reservoir, triangleCounter = reservoir_sampling_base_improved(i, reservoir_size, reservoir, triangleCounter)
        if isAdded:
            reservoir.append(edge)
    return triangleCounter 

trieste_improved(edges, 10000)

## Report
### Instructions
The Wiki-Vote.txt file was downloaded from Canvas and placed next to the notebook. We treated the directed edges as undirected. Both TRIÈSTE-Base and TRIÈSTE-Improved were implemented and tested in the notebook.

### Introduction
Triangle counting is expensive because even moderately large graphs can contain millions of triangles, and in a streaming setting we can’t store the full graph. TRIÈSTE addresses this by keeping only a fixed-size sample (a reservoir) of previously seen edges. From that sample, it estimates how many triangles the full graph likely contains. The assignment required implementing both the original “base” method and an improved variation.

### Triangle Estimation With Reservoir Sampling
A triangle {u,v,w} is detected only if its first two edges are both still in the reservoir when its third edge arrives. Since the reservoir holds only a fraction of the stream, most triangles are missed. The estimator corrects for this by scaling each observed triangle up by the inverse of the probability of observing it.
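
The detection step can be sketched with adjacency sets instead of a scan over the whole edge list. This is only an illustration: `closed_triangles` and the `adjacency` dictionary are hypothetical names, not part of the notebook code, which scans the reservoir's edge list directly.

```python
from collections import defaultdict

def closed_triangles(adjacency, u, v):
    """Return the nodes w such that (u, w) and (v, w) are both sampled edges."""
    return adjacency.get(u, set()) & adjacency.get(v, set())

# Hypothetical sample holding the edges a-b and b-c.
adjacency = defaultdict(set)
for x, y in [("a", "b"), ("b", "c")]:
    adjacency[x].add(y)
    adjacency[y].add(x)

# The arriving edge (a, c) closes exactly one triangle, through b.
print(closed_triangles(adjacency, "a", "c"))  # {'b'}
```

With adjacency sets the lookup cost depends on the degrees of u and v rather than on the reservoir size, at the price of updating the dictionary on every insertion and eviction.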

### Reservoir Sampling Overview
Reservoir sampling maintains a uniform sample of size k out of t seen edges:  
- First k edges are kept.  
- Each later edge is kept with probability k/t.  
- If kept, it replaces a random edge in the reservoir.  
This guarantees fairness but causes edges (and thus triangle evidence) to appear and disappear over time.
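
The three rules above can be sketched as a standalone function. The name `reservoir_sample` and the optional `rng` parameter are assumptions made for this example. The notebook code uses the module-level `random` functions and folds the triangle bookkeeping into the same step.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of at most k items from an iterable."""
    rng = rng or random.Random()
    sample = []
    for t, item in enumerate(stream, start=1):
        if t <= k:
            sample.append(item)              # the first k items are always kept
        elif rng.random() < k / t:           # later items are kept with probability k/t
            sample[rng.randrange(k)] = item  # overwrite a uniformly chosen slot
    return sample

sample = reservoir_sample(range(1000), k=10, rng=random.Random(0))
print(len(sample))  # 10
```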

### TRIÈSTE-Base
The base algorithm works as follows:
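- When a new edge (u,v) arrives, reservoir sampling first decides whether it enters the sample. Only if it does, the algorithm looks up the common neighbors of u and v in the reservoir. Each one is a triangle that is now fully contained in the sample, so the counter is incremented.  
- Once the reservoir is full, inserting an edge evicts a random old edge. If the evicted edge belonged to sampled triangles, the algorithm subtracts those contributions.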

This approach looks simple, but it has some serious drawbacks:
- Subtracting triangles during eviction introduces big fluctuations, especially if the removed edge connects to many nodes.  
- The estimator relies on a final global scaling factor, which amplifies noise.  
- Implementation becomes messy due to the constant add/remove bookkeeping.
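
For reference, the global scaling factor can be written out. The symbols are chosen here for illustration: k is the reservoir size, t the number of edges seen so far, and τ(t) the number of triangles currently inside the sample. The factor ξ(t) corresponds to the `est` variable in the notebook code.

$$
\hat{T}(t) = \tau(t)\,\xi(t), \qquad
\xi(t) =
\begin{cases}
1 & t \le k \\
\dfrac{t(t-1)(t-2)}{k(k-1)(k-2)} & t > k
\end{cases}
$$

For t much larger than k, ξ(t) grows roughly like (t/k)³, so a change of one in τ(t) moves the estimate by ξ(t). This is why a single eviction can show up as a large jump.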

### TRIÈSTE-Improved
The improved version fixes the instability while keeping the same basic idea.

**1. Count before sampling**  
Instead of counting only if the new edge survives sampling, it always checks for triangles first. This removes the need for triangle subtraction altogether and makes the estimate much smoother over time.

**2. Local weighting**  
Each triangle increment gets its own weight:
    w(t) = max(1, (t-1)(t-2) / (k(k-1)))
This is the inverse of the probability that the triangle's two earlier edges are both in the reservoir when its third edge arrives at time t.  
- Avoids the large global correction the base version applies at the end.  
- Reduces variance significantly.
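
As a small worked example, the weight can be computed in isolation. The helper name `improved_weight` is hypothetical. The notebook computes the same quantity inline in `updateCounter_improved`.

```python
def improved_weight(t, k):
    """Weight for a triangle closed by the t-th edge with reservoir size k."""
    if t <= k:
        return 1.0
    return max(1.0, (t - 1) * (t - 2) / (k * (k - 1)))

# While the reservoir is not yet full, every closed triangle counts once.
print(improved_weight(5, 10))  # 1.0
# Tiny example: k = 3, t = 7 gives (6 * 5) / (3 * 2) = 5.0
print(improved_weight(7, 3))   # 5.0
```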

**3. Simpler internal state**  
The maintained triangle counter is already the final estimate—no reversal logic or eviction fixes required.

### Empirical Results
Running both methods on Wiki-Vote with reservoir size 10,000 showed:  
- Base: noisy estimates caused by subtraction during edge evictions.  
- Improved: a stable, smooth estimate that grows predictably as more triangles are encountered.  
Runtime stayed similar, but the improved method required fewer special cases.
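
One way to put numbers on the stability comparison would be to repeat each estimator several times and look at the spread. The sketch below is an assumption about how that could be done, not something the notebook runs. `summarize_runs` and `noisy_estimator` are hypothetical names, and the stand-in estimator only exists to make the example self-contained.

```python
import random
import statistics

def summarize_runs(estimator, stream, reservoir_size, runs=5, seed=0):
    """Run a randomized estimator several times and report mean and standard deviation."""
    estimates = []
    for r in range(runs):
        random.seed(seed + r)  # fresh but reproducible randomness per run
        estimates.append(estimator(stream, reservoir_size))
    return statistics.mean(estimates), statistics.stdev(estimates)

# Stand-in estimator for demonstration only: a fixed value plus random noise.
def noisy_estimator(stream, reservoir_size):
    return 100 + random.gauss(0, 1)

mean, spread = summarize_runs(noisy_estimator, stream=[], reservoir_size=10)
print(f"mean={mean:.1f}, stdev={spread:.2f}")
```

Because the notebook's estimators draw from the module-level `random` generator, passing `trieste_base` or `trieste_improved` as `estimator` should work the same way, although each run then takes as long as a full pass over the stream.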

### Conclusion
TRIÈSTE-Base demonstrates the core idea but suffers from volatility and complicated correction steps.  
The improved version solves these issues through reordered logic and per-edge weighting, giving lower variance and a cleaner implementation.  
For any realistic streaming setting, the improved version is the better and more usable estimator.
