This document explains the strategy used to cluster free-text logs by looking at their constant part. First, you need to import the code available at https://bitbucket.org/jpgil_cl/procdelays.

# How to color
The colors below were generated with the `paintedForAlmaAntennas` function, which removes numbers but keeps the identifiers of some specific equipment that must remain distinguishable.

In [1]:
from src.models.AlmaClasses import paintedForAlmaAntennas

In [2]:
paintedForAlmaAntennas("Example")

'Example'

In [3]:
paintedForAlmaAntennas("Example with 1 number")

'Example with ${N} number'

In [4]:
paintedForAlmaAntennas("Specific 2 antennas: DV01 and CM12")

'Specific ${N} antennas: ${ANT} and ${ANT}'

In [5]:
paintedForAlmaAntennas("Specific hardware: IFProc0 and IFProc1. Compare with others like DTX0, DTX1, and so on.")

'Specific hardware: IFProc_A and IFProc_B. Compare with others like DTX${N}, DTX${N}, and so on.'
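
The behaviour shown in the examples above can be approximated with a few regular expressions. The following is only a minimal sketch, not the actual implementation of `paintedForAlmaAntennas`: the function name `paint_sketch`, the antenna prefixes (`DV`, `DA`, `PM`, `CM`) and the hardware mapping are assumptions made for illustration.

```python
import re

# Assumption: antenna names are a two-letter prefix followed by two digits.
ANTENNA = re.compile(r"\b(?:DV|DA|PM|CM)\d{2}\b")

# Assumption: these devices must stay distinguishable, so their numeric
# suffix is renamed to a letter before generic numbers are removed.
KEPT_HARDWARE = {"IFProc0": "IFProc_A", "IFProc1": "IFProc_B"}

NUMBER = re.compile(r"\d+")


def paint_sketch(line):
    """Return the 'color' of a log line: its constant part with
    variable tokens replaced by placeholders."""
    # 1. Protect the specific hardware whose identity matters.
    for raw, kept in KEPT_HARDWARE.items():
        line = line.replace(raw, kept)
    # 2. Collapse every antenna name into one placeholder.
    line = ANTENNA.sub("${ANT}", line)
    # 3. Collapse every remaining run of digits.
    return NUMBER.sub("${N}", line)
```

The order of the steps matters: if numbers were replaced first, `IFProc0` and `DV01` would already have lost the digits that identify them.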

# Discovered Palette
Some statistics and counts over the colors. The palette keeps a dictionary that persists across executions (it merges the analysis of all files), but it does not count colors or instances per case. For that, you need to use a dedicated class called CaseStats.

In [6]:
from src import *
from src.models.AlmaClasses import *
palette = PaletteFileDB(
    filename='../data/processed/colors-almaAntenna.pkl', 
    colorFunction=paintedForAlmaAntennas )

colors=palette.getColors()
len(colors)

1468

To inspect a color, you can look it up by its index, or find its index from the color itself:

In [7]:
colors[952]

'[CONTROL/${ANT}/cppContainer-GL - void Control::AntennaImpl::resynchroniseLORR()] Antenna ID Error (type=${N}, code=${N}) Detail="The LORR reports an unsynchronised TE signal. Please check that the LORR is in good shape and that the incoming TE signal is alive."'

In [8]:
colors[653]

'[CONTROL/${ANT}/DTXBBpr_AWriterThread - virtual Logging::Logger::~Logger()] LOGGING STATISTICS FOR: Undefined.CONTROL/${ANT}/DTXBBpr_AWriterThread ErrorMessageIncrement="nan" MessageIncrement="nan" LastPeriodNumberOfErrorMessages="${N}" LastPeriodNumberOfMessages="${N}" LastPeriodDuration="${N}" ErrorMessageStatistics="${N}" MessageStatistics="${N}" StatisticsGranularity="${N}" LoggerId="CONTROL/${ANT}/DTXBBpr_AWriterThread" StatisticsIdentification="Undefined"'

In [11]:
palette.index("[CONTROL/${ANT}/FrontEnd/IFSwitch - ] ContainerServices::getComponentNonSticky(CONTROL/${ANT}/IFProc_B)")

511
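
To make the idea of a persistent palette concrete, here is a minimal sketch of a pickle-backed store that supports lookup by index and by color. It is not the code of `PaletteFileDB`: the class name `TinyPalette` and its methods `add`, `get_colors`, `index` and `save` are hypothetical and exist only for this illustration.

```python
import os
import pickle


class TinyPalette:
    """Hypothetical palette: an ordered list of colors persisted to disk."""

    def __init__(self, filename, color_function):
        self.filename = filename
        self.color_function = color_function
        self.colors = []
        # Reload colors discovered in previous executions, if any.
        if os.path.exists(filename):
            with open(filename, "rb") as fh:
                self.colors = pickle.load(fh)
        # Reverse mapping so that lookup by color does not scan the list.
        self._index = {c: i for i, c in enumerate(self.colors)}

    def add(self, raw_line):
        """Color a raw log line and return the index of its color."""
        color = self.color_function(raw_line)
        if color not in self._index:
            self._index[color] = len(self.colors)
            self.colors.append(color)
        return self._index[color]

    def get_colors(self):
        return list(self.colors)

    def index(self, color):
        """Return the index of an already known color (KeyError otherwise)."""
        return self._index[color]

    def save(self):
        with open(self.filename, "wb") as fh:
            pickle.dump(self.colors, fh)
```

Because indices are assigned in order of discovery and never reused, a color keeps the same index across executions as long as the same file is reloaded.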

A subset of the colors is shown below:

In [47]:
for i in range(621,650):
    print ("Color %d: %s" % (i, colors[i][:150]))

Color 621: [CONTROL/${ANT}/cppContainer-GL - virtual CORBA::Long PowerDistBase::GET_CARTRIDGE_ENABLE(ACS::Time&)] Inactive (type=${N}, code=${N})
Color 622: LoggingProxy: Disconnected from the Centralized Logger, using local logging cache.
Color 623: Failed to create cache logger. Logging cache is lost!
Color 624: [CONTROL/${ANT}/cppContainer-GL - BaseSupplier::publishEvent] Failed to send an event of type 'ACSJMSMessageEntity' to the 'CMW.ALARM_SYSTEM.ALARMS.SO
Color 625: [CONTROL/${ANT}/cppContainer-GL - ] Error getting TMCDB component
Color 626: [CONTROL/${ANT}/cppContainer-GL - virtual std::string XmlTmcdbComponent::getConfigXml(const std::string&)] Tmcdb access is _nil, trying to get data fr
Color 627: [CONTROL/${ANT}/cppContainer-GL - virtual std::string XmlTmcdbComponent::getConfigXsd(const std::string&)] Tmcdb access is _nil, trying to get data fr
Color 628: [CONTROL/${ANT}/cppContainer-GL - virtual uint${N}_t IFProcBase::getTimingErrorFlag(ACS::Time&)] Inactive (type=${N}, cod

These colors were obtained in about 10 minutes from 6 files. Note that 270 files are available for processing:

In [48]:
!ls ../data/interim/ | tail

dv25-acsStartContainer_cppContainer_2017-07-10_17.03.32.841_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-10_17.12.59.636_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-10_19.30.47.674_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-10_20.43.06.773_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-10_20.56.06.754_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-11_19.55.26.410_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-11_20.41.40.861_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-11_20.55.42.275_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-12_00.14.04.823_STRIPPED
dv25-acsStartContainer_cppContainer_2017-07-12_00.40.12.586_STRIPPED


# Statistics on pairs

## CaseAntennaObserving

In [49]:
from src import *
from src.models.AlmaClasses import *

db = DelaysFileDB(caseName="CaseAntennaObserving", path='../' + config.FILEPATH_DB + "/delays")

In [50]:
db.caseName

'CaseAntennaObserving'

In [51]:
#db.instances_per_pair()[:5]

In [52]:
len(db.unique_colors())

119

In [53]:
db.total_pairs()

3870

In [54]:
some_pair, value = db.instances_per_pair()[10]
db.delays_per_pair()[some_pair]

311

In [55]:
db.total_cases()

325
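
The notebook does not show how pairs and delays are computed, so the following is only a sketch of one plausible reading: within each case, every ordered pair of distinct colors (a, b) where a first appears before b contributes one delay, measured between their first occurrences. The function `pair_delays` and the input layout are assumptions and are not part of `DelaysFileDB`.

```python
from collections import defaultdict
from itertools import combinations


def pair_delays(cases):
    """Collect delays per ordered color pair.

    cases: list of cases, each a list of (timestamp_seconds, color)
           tuples sorted by time (assumed layout).
    Returns {(color_a, color_b): [delay, ...]} with one delay per case
    in which both colors occur and color_a is seen first.
    """
    delays = defaultdict(list)
    for events in cases:
        # Keep only the first time each color shows up in this case.
        first_seen = {}
        for ts, color in events:
            first_seen.setdefault(color, ts)
        # Order colors by first appearance, then pair earlier with later.
        ordered = sorted(first_seen.items(), key=lambda kv: kv[1])
        for (a, ta), (b, tb) in combinations(ordered, 2):
            delays[(a, b)].append(tb - ta)
    return dict(delays)
```

Under this reading, the number of keys in the result would play the role of the total number of pairs, and the length of each list would be the number of instances of that pair.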

## CaseRadioSetup

In [56]:
db = DelaysFileDB(caseName="CaseRadioSetup", path='../' + config.FILEPATH_DB + "/delays")

In [57]:
len(db.unique_colors())

9

In [58]:
db.total_pairs()

18

In [59]:
db.total_cases()

531

## CaseAntennaInArray

In [60]:
db = DelaysFileDB(caseName="CaseAntennaInArray", path='../' + config.FILEPATH_DB + "/delays")

In [61]:
len(db.unique_colors())

228

In [62]:
db.total_pairs()

12067

In [63]:
db.total_cases()

156