# Intro

This notebook analyses the `checkIsType` method calls made while running the `http-ballerina-tests` test suite. The types passed to each call and the time taken to execute it were logged to CSV, and the analysis of that log follows. To measure the execution time, the `checkIsType` function was modified to run 10,000 times for every call, in the hope that this would reduce the noise associated with measuring small durations. However, we still noticed that multiple calls with the same type pair can differ by an order of magnitude. Thus, execution times of the same order of magnitude are considered comparable.
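The notebook does not spell out how "same order of magnitude" is tested. The sketch below assumes one reading, that two timings are comparable when the larger is less than ten times the smaller, and converts a logged value into an approximate per-invocation cost, assuming each logged time covers all 10,000 iterations. `comparable`, `per_iteration_ns` and `ITERATIONS_PER_LOG` are hypothetical helpers, not part of the original analysis.

```python
# Assumption: "same order of magnitude" is read as "the larger timing is less
# than 10x the smaller one"; the notebook does not define it precisely.
ITERATIONS_PER_LOG = 10_000  # checkIsType was looped 10,000 times per logged call


def comparable(a_ns: int, b_ns: int) -> bool:
    """Return True if two positive timings differ by less than a factor of 10."""
    if a_ns <= 0 or b_ns <= 0:
        raise ValueError("timings must be positive")
    lo, hi = sorted((a_ns, b_ns))
    return hi < 10 * lo


def per_iteration_ns(logged_ns: int) -> float:
    """Approximate cost of one checkIsType call, assuming the log covers the whole loop."""
    return logged_ns / ITERATIONS_PER_LOG


# Timings taken from the per-call table later in this notebook.
print(comparable(496830, 914886))    # int / int vs boolean / boolean
print(comparable(496830, 27093571))  # int / int vs error / error
print(per_iteration_ns(496830))      # int / int
```

A ratio test avoids the edge effect of binning by `floor(log10(t))`, where 999 ns and 1,000 ns would land in different buckets.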

# Data Processing

In [45]:
import csv
from tabulate import tabulate

In [46]:
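def get_key(vals: list) -> str:
    # Build a "<type> / <type>" key from the first two CSV columns (the two types compared)
    return vals[0] + " / " + vals[1]

# Format a float value from the dict as a string with thousands separators
def print_f(d: dict, key: str) -> str:
    return f"{d[key]:,.2f}"

# Format an integer value from the dict as a string with thousands separators
def print_i(d: dict, key: str) -> str:
    return f"{d[key]:,d}"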

In [47]:
count = 0
dict_count = {}       # type pair -> number of occurrences
dict_total_time = {}  # type pair -> cumulative time (ns)
dict_time = {}        # type pair -> mean time per call (ns)

with open('./test-10000-full.csv', newline='') as f:
    reader = csv.reader(f)
    # Each row: two type names followed by the measured time in ns
    for row in reader:
        count += 1
        key = get_key(row[:2])
        if key in dict_count:
            dict_count[key] += 1
            dict_total_time[key] += int(row[2])
        else:
            dict_count[key] = 1
            dict_total_time[key] = int(row[2])

# Mean time per call, truncated to an integer
for key in dict_count:
    dict_time[key] = int(dict_total_time[key] / dict_count[key])

# Sort each dict by value, largest first
dict_count_sorted = sorted(dict_count.items(), key=lambda item: item[1], reverse=True)
dict_total_time_sorted = sorted(dict_total_time.items(), key=lambda item: item[1], reverse=True)
dict_time_sorted = sorted(dict_time.items(), key=lambda item: item[1], reverse=True)
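The aggregation above can also be written more compactly with `collections.defaultdict`. This is a sketch, not the code that produced the outputs here: `aggregate` is a hypothetical helper, it assumes the same CSV layout the cell above relies on (two type names, then a time in ns), and it uses floor division `//`, which for non-negative values agrees with `int(total / count)` apart from possible floating-point rounding in the latter.

```python
import csv
import io
from collections import defaultdict


def aggregate(rows):
    """Group (type, type, time_ns) rows by type pair.

    Returns (counts, totals, means) dicts keyed "<type> / <type>",
    the same key format that get_key produces.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for source, target, time_ns, *_ in rows:  # extra columns, if any, are ignored
        key = f"{source} / {target}"
        counts[key] += 1
        totals[key] += int(time_ns)
    means = {key: totals[key] // counts[key] for key in counts}
    return dict(counts), dict(totals), means


# A tiny in-memory sample in the assumed three-column layout.
sample = io.StringIO("int,int,500\nint,int,700\nerror,error,9000\n")
counts, totals, means = aggregate(csv.reader(sample))
print(counts)
print(means)
```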

In [48]:
print("Number of entries:", count)
print("Number of unique entries:", len(dict_count))

Number of entries: 32951
Number of unique entries: 119


# Visualization

In [49]:
show_amount = 30

In [50]:
print("Top %s common:"%show_amount)
print(tabulate(dict_count_sorted[:show_amount], headers=['Type', 'Number of occurrences'], tablefmt='orgtbl'))

Top 30 common:
| Type                                                |   Number of occurrences |
|-----------------------------------------------------+-------------------------|
| int / int                                           |                    3582 |
| string / string                                     |                    2809 |
| boolean / boolean                                   |                    2653 |
| map / any                                           |                    1252 |
| float / float                                       |                    1217 |
| string / $anonType$_148|$anonType$_149              |                    1021 |
| http:ClientConfiguration / http:ClientConfiguration |                     985 |
| string / $anonType$_148                             |                     881 |
| http:CacheConfig / http:CacheConfig                 |                     830 |
| () / http:ClientSecureSocket|()                     |                     751 |

## Observations
* Comparisons between identical primitive types are very common
* Null checks on object types are common
* Object type checks are common
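To put a number on the first observation, the counts of the four same-type primitive pairs shown in the table above can be summed and divided by the total number of entries. Only rows visible in that output are used, so this is a lower bound on how often primitive comparisons occur; `() / ()` and any primitive pairs outside the displayed rows are not included.

```python
# Occurrence counts copied from the "Top 30 common" output above.
total_entries = 32951
primitive_counts = {
    "int / int": 3582,
    "string / string": 2809,
    "boolean / boolean": 2653,
    "float / float": 1217,
}

primitive_total = sum(primitive_counts.values())
share = primitive_total / total_entries
print(f"{primitive_total:,d} of {total_entries:,d} checks ({share:.1%})")
```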

In [51]:
print("Top %s cumulative time:"%show_amount)
print(tabulate(dict_total_time_sorted[:show_amount], headers=['Type', 'Total time (ns)'], tablefmt='orgtbl'))

Top 30 cumulative time:
| Type                                                   |   Total time (ns) |
|--------------------------------------------------------+-------------------|
| http:HttpCachingClient / http:HttpClient               |     3860426963447 |
| http:CircuitBreakerClient / http:HttpClient            |      324655560331 |
| http:RedirectClient / http:HttpClient                  |      167709948957 |
| cache:LruEvictionPolicy / cache:AbstractEvictionPolicy |       12264443190 |
| string / $anonType$_148|$anonType$_149                 |        7465699204 |
| string / $anonType$_65|$anonType$_66|$anonType$_67     |        6300899780 |
| string / $anonType$_46|$anonType$_47|$anonType$_48     |        5904880226 |
| string / $anonType$_43|$anonType$_44|$anonType$_45     |        5134332369 |
| $$returnType$$ / lang.array:$anonType$_5|()            |        3454401319 |
| map / any                                              |        2645499423 |

In [59]:
print_i(dict_count,'http:HttpCachingClient / http:HttpClient')

'162'

## Observations
* Type checks between similar object types (`http:HttpCachingClient / http:HttpClient`, `http:CircuitBreakerClient / http:HttpClient` and `http:RedirectClient / http:HttpClient`) take up most of the processing time, even though `http:HttpCachingClient / http:HttpClient`, for example, occurs only 162 times
* This makes sense because the `checkIsType` logic is written to fail fast for dissimilar types, so checks between similar object types go through the longest execution paths.
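The cumulative and per-call figures are consistent with each other: dividing the cumulative time for `http:HttpCachingClient / http:HttpClient` by its 162 occurrences reproduces its entry in the per-call table that follows, and dividing again by the 10,000 loop iterations gives a rough cost for one invocation. The per-invocation figure assumes each logged time covers all 10,000 iterations.

```python
# Figures copied from the outputs above (times in nanoseconds).
ITERATIONS_PER_LOG = 10_000
total_ns = 3_860_426_963_447  # cumulative time, http:HttpCachingClient / http:HttpClient
calls = 162                   # print_i(dict_count, 'http:HttpCachingClient / http:HttpClient')

mean_per_call_ns = total_ns // calls
per_iteration_ms = mean_per_call_ns / ITERATIONS_PER_LOG / 1e6
print(f"{mean_per_call_ns:,d} ns per logged call")
print(f"~{per_iteration_ms:.2f} ms per single checkIsType invocation")
```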

In [52]:
print("Top %s time:"%show_amount)
print(tabulate(dict_time_sorted[:show_amount], headers=['Type', 'Single execution time (ns)'], tablefmt='orgtbl'))

Top 30 time:
| Type                                                    |   Single execution time (ns) |
|---------------------------------------------------------+------------------------------|
| http:HttpCachingClient / http:HttpClient                |                  23829796070 |
| http:CircuitBreakerClient / http:HttpClient             |                  23189682880 |
| http:RedirectClient / http:HttpClient                   |                  20963743619 |
| error / error|()                                        |                     73928241 |
| cache:LruEvictionPolicy / cache:AbstractEvictionPolicy  |                     43801582 |
| lang.array:ArrayIterator / lang.array:$anonType$_1 { function next() returns (lang.array:$anonType$_0|()) } |                     36767445 |
| error / error                                           |                     27093571 |

In [60]:
print_i(dict_count,'error / error')

'2'

In [57]:
dict_time_sorted[-10:]

[('() / mime:MediaType|()', 1062664),
 ('() / http:ResponseCacheControl|()', 1053566),
 ('string / json', 1043614),
 ('boolean / boolean', 914886),
 ('string / any|error', 892767),
 ('mime:Entity / mime:Entity', 767544),
 ('string / string', 513192),
 ('() / ()', 512570),
 ('float / float', 500185),
 ('int / int', 496830)]

## Observations
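* The earlier observation about similar object types holds here too: the three `... / http:HttpClient` object checks have by far the highest time per call
* Comparisons of identical primitive types execute the fastest
* The `error / error` time appears to be an anomaly. It follows the same execution path as `int / int` and should have produced a similar execution time; note, however, that it was logged only twice, so its mean rests on very few samples.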
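The spread in the per-call table can be quantified by dividing each mean by the `int / int` mean. The values below are copied from the outputs above; the ratios show the object-type checks are more than four orders of magnitude slower than `int / int`, while `error / error` is roughly 50 times slower, beyond the order-of-magnitude noise described in the introduction.

```python
import math

# Mean time per logged call (ns), copied from the outputs above.
mean_ns = {
    "http:HttpCachingClient / http:HttpClient": 23_829_796_070,
    "error / error": 27_093_571,
    "int / int": 496_830,
}

baseline = mean_ns["int / int"]
for key, value in mean_ns.items():
    ratio = value / baseline
    print(f"{key}: {ratio:,.1f}x int / int ({math.log10(ratio):.1f} orders of magnitude)")
```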

# Conclusions and recommendations

* Comparing similar object types contributes the most to the execution time of a "quasi real world" workload such as the `http-ballerina-tests` suite analysed here
* Comparing identical primitive types is not an issue at the moment
* A relatively easy fix for object type comparison would be to cache the results of object type checks. This has the potential to give a significant performance boost to real world workloads.
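The caching recommendation can be sketched as memoizing the check on the `(source, target)` type pair. The sketch is in Python for consistency with this notebook, whereas the real `checkIsType` lives in the Ballerina runtime; `structural_check`, `cached_check` and the member sets in `OBJECT_TYPES` are hypothetical stand-ins. It assumes type identities are stable and the check has no side effects, which is what makes caching its result safe.

```python
from functools import lru_cache

# Hypothetical stand-in for the expensive structural check: the target's
# members must all be present in the source object type.
CALLS = {"structural": 0}

OBJECT_TYPES = {
    "http:HttpClient": frozenset({"get", "post", "put"}),
    "http:HttpCachingClient": frozenset({"get", "post", "put", "getCache"}),
}


def structural_check(source: str, target: str) -> bool:
    CALLS["structural"] += 1
    return OBJECT_TYPES[target] <= OBJECT_TYPES[source]


@lru_cache(maxsize=None)
def cached_check(source: str, target: str) -> bool:
    # Memoize on the (source, target) pair; only the first call pays the full cost.
    return structural_check(source, target)


for _ in range(162):
    cached_check("http:HttpCachingClient", "http:HttpClient")
print(CALLS["structural"])  # 1
```

Under these assumptions, the 162 `http:HttpCachingClient / http:HttpClient` checks seen in this run would take the expensive path once, and the remaining 161 would be cache lookups.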