# Summary of ALMA antenna logs

October 13, 2018.

The raw dataset was extracted from a subset of ALMA logs, INFO level and above. It covers 14 days of antenna movements, including both maintenance and science observations.

# Preamble

The following code is required to load the processed data (colors and delays) used in the rest of this summary.

In [1]:
from src import *
from src.models.AlmaClasses import *

# Colors found
palette = PaletteFileDB(
    filename='../data/processed/colors-almaAntenna.pkl', 
    colorFunction=paintedForAlmaAntennas )
colors=palette.getColors()

# Delays found
AntennaObserving = DelaysFileDB( caseName="CaseAntennaObserving", path= '../' + config.FILEPATH_DB+"/delays")  

# Method

## Colorization

Colorization clusters events by text similarity and assigns each cluster a unique number (its *color*). Similarity is measured after removing numbers from the text, while keeping specific strings that identify equipment that must remain distinct, for example IFPROC_1 and IFPROC_2.

In [2]:
# Found colors:
len(colors)

1468

In [3]:
# Example of color: all 
colors[638]

'[CONTROL/${ANT}/cppContainer-GL - ] Switched state of component CONTROL/${ANT}/LORR: DESTROYING -> DEFUNCT'
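
The idea behind colorization can be illustrated with a minimal sketch. It is not the actual implementation: it replaces similarity clustering with exact matching after normalization, the `PROTECTED` pattern only covers the IFPROC example mentioned above, and the sample messages are made up.

```python
import re

# Assumption: equipment identifiers whose numeric suffix must be kept.
# Only IFPROC_<n> comes from the text; extend the pattern as needed.
PROTECTED = re.compile(r"IFPROC_\d+")
NUMBER = re.compile(r"\d+")


def normalize(message):
    """Remove numbers from a log message, except inside protected tokens."""
    parts = []
    last = 0
    for match in PROTECTED.finditer(message):
        # Strip digits from the text before the protected token...
        parts.append(NUMBER.sub("", message[last:match.start()]))
        # ...and copy the protected token unchanged.
        parts.append(match.group(0))
        last = match.end()
    parts.append(NUMBER.sub("", message[last:]))
    return "".join(parts)


def colorize(messages):
    """Assign one integer (color) per distinct normalized message."""
    color_of = {}
    colors = []
    for message in messages:
        key = normalize(message)
        if key not in color_of:
            color_of[key] = len(color_of)
        colors.append(color_of[key])
    return colors, color_of


# Made-up messages, for illustration only.
sample = [
    "Antenna DV01 moved in 12 ms",
    "Antenna DV07 moved in 340 ms",
    "IFPROC_1 locked after 5 retries",
    "IFPROC_2 locked after 9 retries",
]
sample_colors, sample_table = colorize(sample)
```

In this toy run the first two messages share a color because they differ only in numbers, while the two IFPROC messages receive different colors because the protected token is kept intact.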

## Delay Extraction

Details of the process are described elsewhere. In brief, the set of all logs is split into *cases*, each representing the same high-level task. The time differences between all *comparable pairs* (those with the same cardinality within a single case) are then extracted and stored as a sequence of *delays*, labelled with the pair of colors _(A,B)_.
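
As a rough sketch of this extraction for a single case, under two assumptions that are not confirmed by this summary: "same cardinality" is taken to mean that both colors occur the same number of times in the case, and a pair _(A,B)_ is kept only when every occurrence of B follows the matching occurrence of A. The function name `extract_delays` and the toy events are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def extract_delays(case_events):
    """Return {(A, B): [delays]} for one case.

    case_events is a list of (timestamp_seconds, color) tuples.
    """
    times_by_color = defaultdict(list)
    for timestamp, color in sorted(case_events):
        times_by_color[color].append(timestamp)

    delays = {}
    for a, b in permutations(sorted(times_by_color), 2):
        times_a, times_b = times_by_color[a], times_by_color[b]
        # Assumption: a pair is comparable when both colors occur
        # the same number of times in this case.
        if len(times_a) != len(times_b):
            continue
        # Pair the i-th occurrence of A with the i-th occurrence of B.
        diffs = [tb - ta for ta, tb in zip(times_a, times_b)]
        # Assumption: keep only pairs where B never precedes A.
        if all(d >= 0 for d in diffs):
            delays[(a, b)] = diffs
    return delays


# Made-up events: colors 464 and 471 occur twice each, 492 once.
toy_case = [(0.0, 464), (1.5, 471), (10.0, 464), (12.0, 471), (3.0, 492)]
toy_delays = extract_delays(toy_case)
```

Here only the pair (464, 471) survives, with two delays from one case, which is why a pair can accumulate more delays than the number of cases it appears in.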

Three tasks were analyzed. The most interesting high-level task, _Antenna Observing_, is shown below.

In [4]:
# Unique colors found among all cases
len(AntennaObserving.unique_colors())

125

In [5]:
# Number of (A,B) pairs found among all cases
AntennaObserving.total_pairs()

4025

In [9]:
# Total cases found in all logs
total_cases = AntennaObserving.total_cases()

total_cases

351

In [10]:
instances_per_pair = sorted( AntennaObserving.instances_per_pair() , key=lambda pair: pair[1], reverse=True )

# Show a sample of ((A,B), seq_len)
instances_per_pair[0:2000:300]

[((464, 471), 351),
 ((492, 495), 150),
 ((487, 580), 78),
 ((581, 579), 44),
 ((510, 526), 23),
 ((480, 512), 15),
 ((481, 587), 9)]

In [29]:
delays_per_pair = AntennaObserving.delays_per_pair()

In [33]:
# How many pairs (should be the same as total_pairs)
len(delays_per_pair.keys())

4025

In [36]:
# Example amount of delays for a specific pair
delays_per_pair[(402, 400)]

224

## Filter out by percentage of cases

The pairs found for Antenna Observing can be reduced to the meaningful ones. The first criterion for this filtering is the fraction of cases in which a specific pair ```(A,B)``` appears. Note that the total number of delays for a specific pair is at least the number of cases in which it appears, since a single case can contribute several delays.
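
To make the distinction between cases and delays concrete, here is a toy example with made-up numbers. The names `delays_by_case` and `TOTAL_CASES` are hypothetical and do not come from the `DelaysFileDB` API.

```python
# Made-up delays for one pair (A, B), grouped by the case they came from.
delays_by_case = {
    "case-1": [0.8, 1.1],
    "case-2": [0.9],
    "case-3": [1.0, 1.2, 0.7],
}
TOTAL_CASES = 4  # assumption: one of the four cases never shows this pair

# Number of cases in which the pair appears at least once.
cases_with_pair = len(delays_by_case)
# Total number of delays, summed over all cases.
total_delays = sum(len(d) for d in delays_by_case.values())
# Fraction of cases, the quantity compared against the threshold.
fraction_of_cases = float(cases_with_pair) / TOTAL_CASES
```

With these numbers the pair appears in 3 of 4 cases (a fraction of 0.75) but contributes 6 delays in total.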

In [11]:
MIN_PERCENTAGE_OF_CASES = 0.2

instances_long = [ ((A,B), cases_len) for ((A,B), cases_len) in instances_per_pair if float(cases_len)/total_cases > MIN_PERCENTAGE_OF_CASES ]

In [12]:
# Size of filtered set
len(instances_long)

671

In [13]:
# Percentage of filtered values
"%0.2f %%" % ( 100.0*len(instances_long)/len(instances_per_pair) )

'16.67 %'

This filter also restricts the set of colors over which the analysis is performed:

In [14]:
filtered_colors = set( [ A for ((A,B), seq_len) in instances_long ] + [ B for ((A,B), seq_len) in instances_long ] )

In [15]:
len(filtered_colors)

59

The cell below builds an N x N matrix in which cell (A,B) holds ```seq_len``` when the pair is present in ```instances_long``` and is left blank otherwise. Note that this data is already available as ```matrix_by_cases.csv```.

In [69]:
# Column names are the colors as strings (sorted lexicographically)
col_names = sorted("%s" % c for c in filtered_colors)
# Lookup table: (A, B) -> number of cases, keyed by the string names
cells = dict(((str(A), str(B)), str(seq_len)) for (A,B), seq_len in instances_long)

matrix = "AxB," + ",".join(col_names)
for row_name in col_names:
    # Cells without a value keep the blank placeholder
    row = [cells.get((row_name, col_name), "   ") for col_name in col_names]
    matrix += "\n%s," % row_name + ",".join(row)

# Text mode, since matrix is a string
with open('matrix_by_cases.csv','w') as csv_file:
    csv_file.write(matrix)
#print matrix
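
For reference, the same layout could also be produced with the standard `csv` module. This is only a sketch under a few assumptions: it targets Python 3, the helper `pairs_to_csv` is hypothetical, colors are sorted numerically rather than as strings, empty cells are left truly empty, and the toy counts are illustrative.

```python
import csv
import io


def pairs_to_csv(pair_counts):
    """Render {(A, B): count} as an N x N CSV string with empty blank cells."""
    # Every color that appears on either side of a pair becomes a row/column.
    names = sorted(set(c for pair in pair_counts for c in pair))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["AxB"] + names)
    for a in names:
        writer.writerow([a] + [pair_counts.get((a, b), "") for b in names])
    return buffer.getvalue()


# Illustrative counts only.
toy_counts = {(464, 471): 351, (492, 495): 150, (471, 492): 7}
toy_csv = pairs_to_csv(toy_counts)
```

The first line of `toy_csv` is the header `AxB,464,471,492,495`, and each following line holds one row A with its counts toward every column B.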

For example, the number of cases where the pair ```(A,B) = (402,400)``` has at least 1 delay sequence is:

In [22]:
[ cases_len for ((A,B), cases_len) in instances_per_pair if (A,B) == (402,400) ] [0]

120

## Filter by amount of delays

The same analysis can be done using the total number of delays for a pair ```(A,B)```. Sorting the number of delays per pair gives a glimpse of how these lengths are distributed.

In [52]:
delays_per_pair_length = sorted(delays_per_pair.values())

In [53]:
min(delays_per_pair_length), max(delays_per_pair_length)

(1, 2021)
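
Beyond the minimum and maximum, a few quantiles of the sorted list give a quick view of the distribution. The sketch below uses a simple floor-index quantile on a made-up list; `quantile` and `toy_lengths` are hypothetical names, and the same function could be applied to `delays_per_pair_length`.

```python
def quantile(sorted_values, q):
    """Floor-index quantile of an already sorted, non-empty list (0 <= q <= 1)."""
    index = int(q * (len(sorted_values) - 1))
    return sorted_values[index]


# Made-up lengths, for illustration only.
toy_lengths = sorted([1, 1, 2, 3, 5, 8, 40, 120, 224, 2021])

# (quantile, value) pairs: minimum, median, 90th percentile, maximum.
summary = [(q, quantile(toy_lengths, q)) for q in (0.0, 0.5, 0.9, 1.0)]

# How many entries would survive a threshold of 100 delays.
above_threshold = sum(1 for n in toy_lengths if n >= 100)
```

A heavily skewed list like this one, where the median is far below the maximum, suggests that a fixed threshold will discard most pairs.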

Again, we can keep only those pairs whose total number of delays (across all cases) is at least a given threshold:

In [54]:
MIN_TOTAL_DELAY_LENGTH = 100

In [62]:
delays_filtered_by_length = [ ( (A,B), delays_per_pair[(A,B)] ) for (A,B) in AntennaObserving.pair_names() if delays_per_pair[(A,B)]  >= MIN_TOTAL_DELAY_LENGTH ]

In [63]:
len(delays_filtered_by_length)

609

In [64]:
filtered_colors_by_length = set( [ A for ((A,B), seq_len) in delays_filtered_by_length ] + [ B for ((A,B), seq_len) in delays_filtered_by_length ] )

In [65]:
len(filtered_colors_by_length)

60

Now let's dump the result to ```matrix_by_delays.csv```:

In [68]:
# Column names are the colors as strings (sorted lexicographically)
col_names = sorted("%s" % c for c in filtered_colors_by_length)
# Lookup table: (A, B) -> total number of delays, keyed by the string names
cells = dict(((str(A), str(B)), str(seq_len)) for (A,B), seq_len in delays_filtered_by_length)

matrix = "AxB," + ",".join(col_names)
for row_name in col_names:
    # Cells without a value keep the blank placeholder
    row = [cells.get((row_name, col_name), "   ") for col_name in col_names]
    matrix += "\n%s," % row_name + ",".join(row)

# Text mode, since matrix is a string
with open('matrix_by_delays.csv','w') as csv_file:
    csv_file.write(matrix)
#print matrix

In [71]:
# How many delays does a specific pair have?
[ delays_len for ((A,B), delays_len) in delays_filtered_by_length if (A,B) == (402,400) ] [0]

224