# CPU Cycles and Degree of Data Selectivity

This experiment evaluates the total CPU cycles spent transferring different percentages of the whole **dataset (50GB)** from multiple servers to a client using a simple data filtering function. The transmission size ranges from 10% to 90% of the dataset, which is filled with digits only, drawn from a uniform distribution, to represent different degrees of data selectivity. We make the following assumptions in this experiment:

- Both the servers and the client have the same CPU model.
- Producing a workset of high data selectivity (smaller transmission size) consumes more CPU cycles than producing a workset of low data selectivity (larger transmission size).
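The dataset described above (digits only, uniform distribution) can be reproduced at a much smaller scale for local testing. This is a minimal sketch, assuming Python's `random` module is an acceptable source of uniformly distributed digits; `make_digit_dataset` and its `seed` parameter are hypothetical names, and the generator actually used for the 50GB dataset is not specified in this document.

```python
import random
from collections import Counter

DIGITS = b'0123456789'

def make_digit_dataset(n_bytes, seed=0):
    """Hypothetical helper: return n_bytes of ASCII digits drawn uniformly at random."""
    rng = random.Random(seed)
    # choices() samples with replacement; the population is the ints 48..57
    return bytes(rng.choices(DIGITS, k=n_bytes))

if __name__ == '__main__':
    data = make_digit_dataset(100_000)
    counts = Counter(data)
    # Each of the ten digits should account for roughly 10% of the bytes
    for byte, count in sorted(counts.items()):
        print(chr(byte), f'{count / len(data):.3f}')
```

Seeding the generator keeps the dataset reproducible across servers, so every server can produce identical data without shipping it around.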

## Data Filtering Function

The data filtering function evaluated in this experiment is the following lambda function:
```python
lambda v: v not in list(b'0123456789')[:LEN]
```
where `LEN`, an integer between 1 and 9, controls the degree of data selectivity.
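The filter rejects the first `LEN` digits, so with uniformly distributed digits roughly (10 - `LEN`)/10 of the bytes pass: about 90% for `LEN` = 1 (about 45GB of the 50GB dataset) down to about 10% for `LEN` = 9 (about 5GB), which matches the 10% to 90% range above. The minimal sketch below, assuming an in-memory `bytes` chunk, applies the filter byte by byte and prints the expected fraction for every `LEN`; `make_filter`, `filter_chunk`, `expected_fraction` and `DATASET_GB` are hypothetical names introduced for illustration.

```python
DATASET_GB = 50  # dataset size stated in this document

def make_filter(LEN):
    """Return the evaluated filter lambda with LEN bound (1 <= LEN <= 9)."""
    if not 1 <= LEN <= 9:
        raise ValueError('LEN must be between 1 and 9')
    # Same expression as above: the list is rebuilt and sliced on every call
    return lambda v: v not in list(b'0123456789')[:LEN]

def filter_chunk(data, LEN):
    """Keep only the bytes of `data` that the filter allows to be transmitted."""
    expr = make_filter(LEN)
    # Iterating over a bytes object yields ints (48..57 for '0'..'9')
    return bytes(v for v in data if expr(v))

def expected_fraction(LEN):
    """Fraction of uniformly distributed digits that pass: 10 - LEN out of 10."""
    return (10 - LEN) / 10

print(filter_chunk(b'0123456789', 1))  # b'123456789'
print(filter_chunk(b'0123456789', 9))  # b'9'
for LEN in range(1, 10):
    frac = expected_fraction(LEN)
    print(f'LEN={LEN}: {frac:.0%} transmitted, ~{DATASET_GB * frac:.0f} GB of {DATASET_GB} GB')
```

Because Python 3 yields integers when iterating over `bytes`, the lambda compares `v` against `list(b'0123456789')` (the integers 48 to 57) rather than against characters.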

For each byte in the dataset, we use the above function to check whether that byte should be transmitted. A bigger `LEN` rejects more digits and therefore results in high data selectivity, while a smaller `LEN` lets more bytes of the dataset through, i.e. low data selectivity. To verify that high data selectivity costs more CPU cycles than low data selectivity with this function, the following proof of concept compares the time and cycles consumed in these two cases:


```bash
docker run --rm -ti --cap-add SYS_ADMIN ljishen/perf:4.9-python3 stat \
    python3 -m timeit -s "expr = lambda v: v not in list(b'0123456789')[:9]" "expr(48)"
1000000 loops, best of 3: 0.53 usec per loop

 Performance counter stats for 'python3 -m timeit -s expr = lambda x: x not in list(b'0123456789')[:9] expr(48)':

       2498.401472      task-clock (msec)         #    1.000 CPUs utilized          
                 3      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
              3627      page-faults               #    0.001 M/sec                  
        6914129443      cycles                    #    2.767 GHz                    
       16216707149      instructions              #    2.35  insn per cycle         
        3711921245      branches                  # 1485.718 M/sec                  
           5161883      branch-misses             #    0.14% of all branches        

       2.499271785 seconds time elapsed
```
---

```bash
docker run --rm -ti --cap-add SYS_ADMIN ljishen/perf:4.9-python3 stat \
    python3 -m timeit -s "expr = lambda v: v not in list(b'0123456789')[:1]" "expr(48)"
1000000 loops, best of 3: 0.46 usec per loop

 Performance counter stats for 'python3 -m timeit -s expr = lambda x: x not in list(b'0123456789')[:1] expr(48)':

       2211.099283      task-clock (msec)         #    0.996 CPUs utilized          
                 6      context-switches          #    0.003 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
              3629      page-faults               #    0.002 M/sec                  
        6741664677      cycles                    #    3.049 GHz                    
       15562575721      instructions              #    2.31  insn per cycle         
        3547408130      branches                  # 1604.364 M/sec                  
           8796172      branch-misses             #    0.25% of all branches        

       2.219600636 seconds time elapsed
```
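The two runs above count cycles for the entire `python3 -m timeit` process under `perf stat`, so interpreter start-up and timeit's automatic choice of the loop count are included. Also, the probe value 48 is the byte `0`, the first element of the list for every `LEN`, so the `in` check returns immediately and the difference between the runs likely reflects mostly the length of the slice. A complementary in-process sweep over every `LEN` with the standard-library `timeit` module might look like the sketch below; it additionally probes 57 (`9`), which is never in the list and forces a full scan. `sweep` and its `number`, `repeat` and `probes` parameters are hypothetical choices, and no results are claimed here since timings depend on the machine.

```python
import timeit

def sweep(number=100_000, repeat=3, probes=(48, 57)):
    """Time the filter lambda for every LEN and probe byte; return best usec per call."""
    results = {}
    for LEN in range(1, 10):
        setup = f"expr = lambda v: v not in list(b'0123456789')[:{LEN}]"
        for probe in probes:
            # Best of `repeat` runs, each executing the statement `number` times
            best = min(timeit.repeat(f'expr({probe})', setup=setup,
                                     number=number, repeat=repeat))
            results[(LEN, probe)] = best / number * 1e6
    return results

if __name__ == '__main__':
    for (LEN, probe), usec in sorted(sweep().items()):
        print(f'LEN={LEN} probe={probe}: {usec:.3f} usec per loop')
```

Taking the minimum of the repeats follows the same "best of N" convention that `python3 -m timeit` reports.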

To summarize:

|Selectivity | Function                                      | CPU Cycles (whole `perf stat` run) | Time (usec per loop, best of 3) |
|:-----------|:----------------------------------------------|:-----------------------------------|:--------------------------------|
|High        |`lambda v: v not in list(b'0123456789')[:9]`   | 6914129443                         | 0.53                            |
|Low         |`lambda v: v not in list(b'0123456789')[:1]`   | 6741664677                         | 0.46                            |
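The relative gap between the two rows can be computed directly from the table. This minimal sketch uses only the figures above; `relative_increase` is a hypothetical helper. Because the cycle column covers the whole `perf stat` run rather than the loops alone, its relative gap is likely diluted by fixed costs such as interpreter start-up.

```python
# Figures copied from the table above (perf stat cycles for the whole run;
# timeit best-of-3 time per loop)
high = {'cycles': 6914129443, 'usec_per_loop': 0.53}  # LEN = 9
low = {'cycles': 6741664677, 'usec_per_loop': 0.46}   # LEN = 1

def relative_increase(a, b):
    """Percentage by which a exceeds b."""
    return (a - b) / b * 100

print(f"cycles: +{relative_increase(high['cycles'], low['cycles']):.1f}%")  # cycles: +2.6%
print(f"time per loop: +{relative_increase(high['usec_per_loop'], low['usec_per_loop']):.1f}%")  # time per loop: +15.2%
```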

## Experiment Setup

![experiment_setup](https://raw.githubusercontent.com/ljishen/eucycles/master/analysis/images/experiment_setup.png)