# Summary of ALMA antenna logs

October 13, 2018.

The raw dataset was extracted from a subset of ALMA logs at INFO level and above. It covers 14 days of antenna movements, including both maintenance and science observations.

# Preamble

The following code is required to read the actual data.

In [1]:
from src import *
from src.models.AlmaClasses import *

# Colors found
palette = PaletteFileDB(
    filename='../data/processed/colors-almaAntenna.pkl',
    colorFunction=paintedForAlmaAntennas)
colors = palette.getColors()

# Delays found
AntennaObserving = DelaysFileDB(
    caseName="CaseAntennaObserving",
    path='../' + config.FILEPATH_DB + "/delays")

# Method

## Colorization

Colorization clusters events by text similarity and assigns a unique number (a *color*) to each cluster. Similarity is measured after removing numbers from the message, while keeping specific strings that identify equipment and must remain distinct, for example IFPROC_1 and IFPROC_2.

In [2]:
# Found colors:
len(colors)

1468

In [3]:
# Example of color: all 
colors[638]

'[CONTROL/${ANT}/cppContainer-GL - ] Switched state of component CONTROL/${ANT}/LORR: DESTROYING -> DEFUNCT'
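
As a rough illustration of the colorization idea, the sketch below normalizes messages by masking numbers while protecting equipment names, and then assigns one integer per distinct template. It is a simplification and not the code behind `PaletteFileDB`: exact template matching stands in for the similarity measure, and the names `PROTECTED`, `normalize` and `colorize` are hypothetical.

```python
import re

# Assumption: equipment names whose numeric suffix must be preserved.
# Only IFPROC_<n> is listed here; the real set is not given in this summary.
PROTECTED = re.compile(r"\b(IFPROC_\d+)\b")


def normalize(message):
    """Replace every run of digits with '#', except inside protected names."""
    parts = PROTECTED.split(message)
    # With one capture group, re.split puts the protected tokens at odd indices.
    return "".join(
        part if index % 2 else re.sub(r"\d+", "#", part)
        for index, part in enumerate(parts)
    )


def colorize(messages):
    """Return one integer color per message; equal templates share a color."""
    colors = {}
    return [colors.setdefault(normalize(m), len(colors)) for m in messages]


sample = [
    "DA41 IFPROC_1 temp 12",
    "DA42 IFPROC_1 temp 7",
    "DA41 IFPROC_2 temp 12",
]
print(colorize(sample))
```

The first two messages differ only in numbers and receive the same color, while the third keeps a separate color because `IFPROC_2` is protected from masking.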

## Delay Extraction

The details of the process are described elsewhere. In brief, the set of all logs is split into *cases*, each representing the same high-level task. Then the time differences between all *comparable pairs* (those with the same cardinality within a single case) are extracted and stored as a sequence of *delays*, labelled with the pair of colors _(A,B)_.
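
The pairing rule can be sketched as follows for a single case. This is one interpretation rather than the actual extraction code: it assumes that "same cardinality" means both colors occur the same number of times in the case, that the i-th occurrence of A is matched with the i-th occurrence of B, and that only pairs where A always precedes B are kept. The function name `extract_delays` and the `(timestamp, color)` event format are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def extract_delays(case):
    """Compute delays for the comparable color pairs of one case.

    case: iterable of (timestamp, color) events (assumed format).
    Returns {(A, B): [delay, ...]}.
    """
    times = defaultdict(list)
    for timestamp, color in sorted(case):
        times[color].append(timestamp)

    delays = {}
    for a, b in permutations(times, 2):
        if len(times[a]) != len(times[b]):
            continue  # different cardinality: the pair is not comparable
        diffs = [tb - ta for ta, tb in zip(times[a], times[b])]
        # Assumption: keep the pair only when A never comes after B.
        if all(d >= 0 for d in diffs):
            delays[(a, b)] = diffs
    return delays


case = [(0.0, 1), (1.5, 2), (10.0, 1), (12.0, 2), (3.0, 3)]
print(extract_delays(case))
```

In this toy case colors 1 and 2 both occur twice, so they form a comparable pair with two delays, whereas color 3 occurs once and pairs with neither.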

Three tasks were analyzed. The most interesting high-level task, _Antenna Observing_, is shown below.

In [4]:
# Unique colors found among all cases
len(AntennaObserving.unique_colors())

125

In [5]:
# Number of (A,B) pairs found among all cases
AntennaObserving.total_pairs()

4025

In [23]:
# Total cases found in all logs
total_cases=AntennaObserving.total_cases()
total_cases

351

In [6]:
# Sort pairs by their count, largest first
instances = sorted(AntennaObserving.instances_per_pair(), key=lambda pair: pair[1], reverse=True)

# Show a sample of ((A,B), seq_len)
instances[0:2000:300]

[((464, 471), 351),
 ((492, 495), 150),
 ((487, 580), 78),
 ((581, 579), 44),
 ((510, 526), 23),
 ((480, 512), 15),
 ((481, 587), 9)]

## Filter out by percentage of cases

The pairs found for Antenna Observing can be reduced to the meaningful ones. The first criterion for this filtering is how often a specific pair ```(A,B)``` appears across cases. Note that the total number of delays belonging to a specific pair can exceed the number of cases in which the pair appears, because a pair may occur more than once within a single case.
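
To make the difference between the two counts concrete, here is a small sketch on toy data. It assumes each case is available as a dictionary `{(A, B): [delays]}`; this structure and the name `count_cases_and_delays` are hypothetical and do not reflect the `DelaysFileDB` interface.

```python
from collections import Counter


def count_cases_and_delays(cases):
    """Return (cases per pair, total delays per pair) as two Counters."""
    cases_per_pair = Counter()
    delays_per_pair = Counter()
    for case in cases:
        for pair, delays in case.items():
            cases_per_pair[pair] += 1          # the pair appears in this case
            delays_per_pair[pair] += len(delays)  # all delays of the pair
    return cases_per_pair, delays_per_pair


# Toy data: pair (1, 2) appears in two cases with three delays in total.
toy_cases = [
    {(1, 2): [0.5, 0.7]},
    {(1, 2): [0.4], (2, 3): [1.0]},
]
per_case, per_delay = count_cases_and_delays(toy_cases)
print(per_case[(1, 2)], per_delay[(1, 2)])
```

The filter in this section works on the first count (cases per pair), not on the total number of delays.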

In [35]:
MIN_PERCENTAGE_OF_CASES = 0.2

instances_long = [ ((A,B), cases_len) for ((A,B), cases_len) in instances if float(cases_len)/total_cases > MIN_PERCENTAGE_OF_CASES ]

In [36]:
# Size of filtered set
len(instances_long)

671

In [37]:
# Percentage of pairs kept by the filter
"%0.2f %%" % ( 100.0*len(instances_long)/len(instances) )

'16.67 %'

With this filter, the set of colors over which the analysis is performed is also restricted:

In [38]:
filtered_colors = set( [ A for ((A,B), seq_len) in instances_long ] + [ B for ((A,B), seq_len) in instances_long ] )

In [39]:
len(filtered_colors)

59

Below is an attempt at an N x N matrix where cell (A,B) holds seq_len if the pair is defined in ```instances_long```. Note that the data below is also available as ```matrix.csv``` and ```matrix.xls```.

In [40]:
col_names = sorted(str(c) for c in filtered_colors)

# Index the filtered pairs by their (row, column) names for direct lookup
lookup = dict(((str(A), str(B)), str(seq_len)) for (A, B), seq_len in instances_long)

print("AxB," + ",".join(col_names))
for row_name in col_names:
    # Cells without a value are filled with three spaces
    row = [lookup.get((row_name, col_name), "   ") for col_name in col_names]
    print("%s," % row_name + ",".join(row))

AxB,387,395,398,400,402,439,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,486,487,488,489,490,491,492,493,494,495,497,498,500,501,502,503,504,505,506,507,509,510,511,512,514,515,578,579,580,581,584,585
387,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,78,   ,   ,   ,   ,72,72,72,72,72,72,72,72,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   
395,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   
398,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   
400,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   , 
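
A matrix like the one printed above could also be produced with the standard `csv` module, which handles separators and quoting. The sketch below is only an illustration on toy values: the function `matrix_rows` is hypothetical, it sorts colors numerically, and it leaves undefined cells empty instead of padding them with spaces.

```python
import csv
import sys


def matrix_rows(pair_counts):
    """Build header and rows of the A x B matrix from {(A, B): count}."""
    names = sorted({color for pair in pair_counts for color in pair})
    rows = [["AxB"] + [str(name) for name in names]]
    for a in names:
        cells = [str(pair_counts.get((a, b), "")) for b in names]
        rows.append([str(a)] + cells)
    return rows


# Toy counts, not taken from the ALMA data
toy_counts = {(1, 2): 5, (2, 1): 3}
csv.writer(sys.stdout).writerows(matrix_rows(toy_counts))
```

Passing an open file handle instead of `sys.stdout` would write the same rows to disk.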

For example, the number of cases where the pair ```(A,B) = (402,400)``` has at least 1 delay sequence is:

In [43]:
[ cases_len for ((A,B), cases_len) in instances if (A,B) == (402,400) ] [0]


120

## Filter by number of delays


... this is where I left off!!!