# ALMA Dataset

The ALMA dataset corresponds to a set of manually selected antenna containers. Each file covers the full lifecycle of the related computer, between reboots.

In [225]:
!ls ../../data/raw/alma | tail

dv25-acsStartContainer_cppContainer_2017-07-10_17.03.32.841
dv25-acsStartContainer_cppContainer_2017-07-10_17.12.59.636
dv25-acsStartContainer_cppContainer_2017-07-10_19.30.47.674
dv25-acsStartContainer_cppContainer_2017-07-10_20.43.06.773
dv25-acsStartContainer_cppContainer_2017-07-10_20.56.06.754
dv25-acsStartContainer_cppContainer_2017-07-11_19.55.26.410
dv25-acsStartContainer_cppContainer_2017-07-11_20.41.40.861
dv25-acsStartContainer_cppContainer_2017-07-11_20.55.42.275
dv25-acsStartContainer_cppContainer_2017-07-12_00.14.04.823
dv25-acsStartContainer_cppContainer_2017-07-12_00.40.12.586


In [226]:
!tail -n 5 ../../data/raw/alma/dv25-acsStartContainer_cppContainer_2017-07-10_17.03.32.841

2017-07-10T17:09:35.050 [CONTROL/DV25/cppContainer-GL - virtual void AmbDeviceImpl::monitorEnc(ACS::Time*, const AMBSystem::AmbRelativeAddr&, AmbDataLength_t&, AmbDataMem_t*)] CAMB Error (type=10000, code=0) Detail="The monitor request returned an error.  AMB status = 11, channel = 1, node number = 0x13, RCA = 0xb009."
terminate called after throwing an instance of 'St9bad_alloc'
  what():  St9bad_alloc
/alma/ACS-2015.8/ACSSW/bin/acsStartContainer: line 112:  3298 Aborted                 (core dumped) $COMMANDLINE
2017-07-10T17:09:48.100 INFO [acsStartContainer] Container: 'CONTROL/DV25/cppContainer' exited with code: 134.


## Load an antenna file

Given an antenna file:

In [319]:
ant_file="../../data/raw/alma/da41-acsStartContainer_cppContainer_2017-07-05_21.11.36.377"

Let's consider only the individual lines that follow the rule:
```
TIMESTAMP [Source] logtext
```
The matching lines are read into a pandas DataFrame:

In [322]:
import pandas as pd
RAWLINES=!cat $ant_file | egrep "^2[0-9]..\-..\-..T.*[0-9][0-9][0-9] \["
raw=pd.DataFrame(RAWLINES)
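
The `!cat ... | egrep` call only works inside IPython. As a minimal sketch, the same line rule can be applied in plain Python with the `re` module. The names `LINE_RULE` and `filter_log_lines` are hypothetical, and the sample lines are abbreviated from the `tail` output shown earlier.

```python
import re

# Same rule as the egrep pattern: the line starts with a timestamp such as
# 2017-07-10T17:09:35.050, followed by a space and an opening bracket.
LINE_RULE = re.compile(r"^2[0-9]..-..-..T.*[0-9][0-9][0-9] \[")

def filter_log_lines(lines):
    """Keep only the lines of the form 'TIMESTAMP [Source] logtext'."""
    return [line for line in lines if LINE_RULE.match(line)]

# Abbreviated sample lines (not a full log file).
sample = [
    "2017-07-10T17:09:35.050 [CONTROL/DV25/cppContainer-GL - virtual void AmbDeviceImpl::monitorEnc] CAMB Error (type=10000, code=0)",
    "terminate called after throwing an instance of 'St9bad_alloc'",
    "2017-07-10T17:09:48.100 INFO [acsStartContainer] Container: 'CONTROL/DV25/cppContainer' exited with code: 134.",
]
kept = filter_log_lines(sample)
print(len(kept))
```

Note that this rule requires the bracket right after the milliseconds, so lines with a level token in between (such as the `INFO [acsStartContainer]` line) are dropped along with the untimestamped ones.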

It is then split into separate columns, removing some uninformative strings (such as CONTROL/ANT):

In [321]:
import re
regex = re.compile(r"CONTROL\/[A-Z][A-Z][0-9][0-9]")

raw["@timestamp"] = raw[0].apply( lambda r: pd.to_datetime( r[:23] ))
raw["source"]  = raw[0].apply( lambda r: regex.sub( "", r[24:].split("]")[0][1:] ) )
raw["logtext"] = raw[0].apply( lambda r: " ".join(r[24:].split("] ")[1:]) )
del raw[0]

KeyError: 0
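
Because the cell deletes column `0` after using it, running it a second time on the same `raw` fails on `raw[0]`, which is one way to end up with a `KeyError: 0`. A minimal sketch of a variant that builds a new DataFrame instead of mutating `raw`, using `Series.str.extract`, is shown here. The function name `split_columns` is hypothetical, and the sample line is reconstructed from the parsed table, so treat it as an assumption about the raw format. Unlike the original cell, this version keeps any `"] "` sequences inside the log text unchanged.

```python
import pandas as pd

def split_columns(lines):
    """Split 'TIMESTAMP [Source] logtext' lines into three columns."""
    raw = pd.DataFrame({"line": list(lines)})
    # 23 characters of timestamp, then the bracketed source, then the text.
    parts = raw["line"].str.extract(
        r"^(?P<ts>.{23}) \[(?P<source>[^\]]*)\] ?(?P<logtext>.*)$"
    )
    return pd.DataFrame({
        "@timestamp": pd.to_datetime(parts["ts"]),
        # Drop the CONTROL/<antenna> prefix, as the original regex does.
        "source": parts["source"].str.replace(
            r"CONTROL/[A-Z][A-Z][0-9][0-9]", "", regex=True
        ),
        "logtext": parts["logtext"],
    })

# Sample line reconstructed from the parsed output (an assumption).
demo = split_columns([
    "2017-07-01T20:06:53.505 [CONTROL/DA41/cppContainer - maci::Container::init] Recovery enabled.",
])
print(demo)
```

Since the input is never modified, the function can be called repeatedly on the same list of lines.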

In [309]:
raw

| | @timestamp | source | logtext |
|---|---|---|---|
| 0 | 2017-07-01 20:06:52.995 | /cppContainer-GL - cdb::DAOImpl | DAO:'MACI/Containers/CONTROL/DA41/cppContainer... |
| 1 | 2017-07-01 20:06:53.001 | /cppContainer - maci::ContainerImpl::getManager | Resolving manager... |
| 2 | 2017-07-01 20:06:53.003 | /cppContainer-GL - maci::MACIHelper::resolveMa... | ManagerReference obtained via command line: 'c... |
| 3 | 2017-07-01 20:06:53.505 | /cppContainer - maci::Container::init | Recovery enabled. |
| 4 | 2017-07-01 20:06:53.507 | /cppContainer - maci::ContainerImpl::init | Container 'CONTROL/DA41/cppContainer' activated. |
| ... | ... | ... | ... |
| 7729 | 2017-07-01 22:33:52.438 | ambServer - loggerThread | Channel 1 has seen 0 total control messages an... |
| 7730 | 2017-07-01 22:33:52.438 | ambServer - loggerThread | Channel 2 has seen 0 total control messages an... |
| 7731 | 2017-07-01 22:33:52.438 | ambServer - loggerThread | Channel 3 has seen 0 total control messages an... |
| 7732 | 2017-07-01 22:33:52.438 | ambServer - loggerThread | Channel 4 has seen 0 total control messages an... |


## Searching for Antenna Observing

From previous work, we chose a high-level task called "Antenna Observing", which is characterized by the following start and end events:

```
Request to load 'AntInterferometryController'
...
Switched state of component ... AntInterferometryController: DESTROYING -> DEFUNCT
```
Let's search for these markers in the log text:

In [310]:
start = raw[ raw["logtext"].str.contains("Request to load.*AntInterferometryController", regex=True) ]
end = raw[ raw["logtext"].str.contains("AntInterferometryController: DESTROYING -> DEFUNCT", regex=False) ]

In [311]:
#start

In [312]:
#end

The events between those two markers correspond to AntennaObserving instances to be analyzed.

In [313]:
ant_obs=pd.DataFrame( { 'start': start["@timestamp"].values, 'end': end["@timestamp"].values })
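
Building `ant_obs` this way pairs starts and ends by position, so it needs the same number of start and end markers (pandas raises a `ValueError` for arrays of different lengths). As a hedged alternative, each start can be matched with the first end that follows it using `pd.merge_asof`. The helper name `pair_markers` is hypothetical, and the demo timestamps are taken from the `ant_obs` output with one end deliberately left out.

```python
import pandas as pd

def pair_markers(start_times, end_times):
    """Pair each start with the first end at or after it; drop unmatched starts."""
    starts = pd.DataFrame(
        {"start": pd.to_datetime(list(start_times)).astype("datetime64[ns]")}
    ).sort_values("start")
    ends = pd.DataFrame(
        {"end": pd.to_datetime(list(end_times)).astype("datetime64[ns]")}
    ).sort_values("end")
    # direction="forward" picks, for each start, the nearest end that is >= start.
    paired = pd.merge_asof(
        starts, ends, left_on="start", right_on="end", direction="forward"
    )
    return paired.dropna(subset=["end"]).reset_index(drop=True)

# Three starts but only two ends: the last start has no matching end.
ant_obs_demo = pair_markers(
    ["2017-07-01 21:02:13.979", "2017-07-01 21:03:21.631", "2017-07-01 21:20:08.150"],
    ["2017-07-01 21:02:29.648", "2017-07-01 21:03:29.601"],
)
print(ant_obs_demo)
```

One caveat of this sketch: if an end marker is missing in the middle of a file, two consecutive starts are matched to the same end, so the result may still need a sanity check.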

In [314]:
ant_obs

| | start | end |
|---|---|---|
| 0 | 2017-07-01 21:02:13.979 | 2017-07-01 21:02:29.648 |
| 1 | 2017-07-01 21:03:21.631 | 2017-07-01 21:03:29.601 |
| 2 | 2017-07-01 21:20:08.150 | 2017-07-01 21:21:00.560 |
| 3 | 2017-07-01 21:23:26.907 | 2017-07-01 21:36:57.971 |
| 4 | 2017-07-01 21:37:25.567 | 2017-07-01 21:50:32.845 |
| 5 | 2017-07-01 21:55:11.232 | 2017-07-01 22:08:19.178 |
| 6 | 2017-07-01 22:08:47.676 | 2017-07-01 22:21:53.586 |


And now, let's filter the raw logs by the ant_obs dates

In [315]:
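for i, r in ant_obs.iterrows():
    # Combine both bounds into one boolean mask instead of chaining two filters
    in_window = (raw["@timestamp"] >= r["start"]) & (raw["@timestamp"] <= r["end"])
    print( r["start"], len(raw[in_window]) )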

2017-07-01 21:02:13.979000 18
2017-07-01 21:03:21.631000 19
2017-07-01 21:20:08.150000 41
2017-07-01 21:23:26.907000 341
2017-07-01 21:37:25.567000 314
2017-07-01 21:55:11.232000 314
2017-07-01 22:08:47.676000 319
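
The window filter can also be written with `Series.between`, which includes both bounds by default. The following is a small sketch on synthetic data; the helper name `slice_window` and the toy rows are assumptions made for illustration only.

```python
import pandas as pd

def slice_window(logs, start, end, column="@timestamp"):
    """Return the rows whose timestamp lies in [start, end], bounds included."""
    return logs[logs[column].between(start, end)]

# Synthetic rows around one start/end window.
toy = pd.DataFrame({
    "@timestamp": pd.to_datetime([
        "2017-07-01 21:02:10.000",
        "2017-07-01 21:02:13.979",
        "2017-07-01 21:02:20.000",
        "2017-07-01 21:02:29.648",
        "2017-07-01 21:02:30.000",
    ]),
    "logtext": ["before", "start marker", "inside", "end marker", "after"],
})
window = slice_window(
    toy,
    pd.Timestamp("2017-07-01 21:02:13.979"),
    pd.Timestamp("2017-07-01 21:02:29.648"),
)
print(len(window))
```

Both marker lines are part of the selection, which matches the `>=` and `<=` comparisons used in the loop.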


  


## All Together

In [323]:
import pandas as pd
import re

def get_alma_logs(ant_file):
    RAWLINES=!cat $ant_file | egrep "^2[0-9]..\-..\-..T.*[0-9][0-9][0-9] \["
    raw=pd.DataFrame(RAWLINES)
    if len(raw) == 0:
        return []

    regex = re.compile(r"CONTROL\/[A-Z][A-Z][0-9][0-9]")

    raw["@timestamp"] = raw[0].apply( lambda r: pd.to_datetime( r[:23] ))
    raw["source"]  = raw[0].apply( lambda r: regex.sub( "", r[24:].split("]")[0][1:] ) )
    raw["logtext"] = raw[0].apply( lambda r: " ".join(r[24:].split("] ")[1:]) )
    del raw[0]
    
    start = raw[ raw["logtext"].str.contains("Request to load.*AntInterferometryController", regex=True) ]
    end = raw[ raw["logtext"].str.contains("AntInterferometryController: DESTROYING -> DEFUNCT", regex=False) ]
    
    minl = min(len(start), len(end))
    ant_obs=pd.DataFrame( { 'start': start[:minl]["@timestamp"].values, 'end': end[:minl]["@timestamp"].values })
    
    obs_logs = []
    for i, r in ant_obs.iterrows():
        # Single combined mask: rows with start <= @timestamp <= end
        in_window = (raw["@timestamp"] >= r["start"]) & (raw["@timestamp"] <= r["end"])
        obs_logs.append( raw[in_window] )
    
    return obs_logs

In [324]:
for logs in get_alma_logs("../../data/raw/alma/da41-acsStartContainer_cppContainer_2017-07-01_20.06.49.894"):
    print ( "At %s there are %s logs" % (logs[["@timestamp"]].values[0], len(logs)) )

At ['2017-07-01T21:02:13.979000000'] there are 18 logs
At ['2017-07-01T21:03:21.631000000'] there are 19 logs
At ['2017-07-01T21:20:08.150000000'] there are 41 logs
At ['2017-07-01T21:23:26.907000000'] there are 341 logs
At ['2017-07-01T21:37:25.567000000'] there are 314 logs
At ['2017-07-01T21:55:11.232000000'] there are 314 logs
At ['2017-07-01T22:08:47.676000000'] there are 319 logs




## Storing ALMA datasets in INTERIM

In [325]:
FILES=!ls ../../data/raw/alma/

In [333]:
all_stats=[]
for f in FILES:
    alllogs = get_alma_logs("../../data/raw/alma/%s" % f)
    print("Processing %s with #%s logs" % (f, len(alllogs)))
    for logs in alllogs:
        stats={}
        stats["antenna"] = f[:4]
        stats["@timestamp"] = str(logs["@timestamp"][0:1].values[0])[:23]
        stats["# logs"] = len(logs)
        logs.to_csv("../../data/interim/ALMA-%s-%s-AntObs.csv" % (stats["@timestamp"], stats["antenna"]), index=False )
        
        all_stats.append(stats)
pstats=pd.DataFrame(all_stats)
pstats.to_csv("../../data/interim/ALMA-count-AntObs.csv", index=False)
pstats

Processing da41-acsStartContainer_cppContainer_2017-07-01_18.16.48.847 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-01_20.06.49.894 with #7 logs




Processing da41-acsStartContainer_cppContainer_2017-07-01_22.45.35.715 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-03_20.37.03.645 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-03_21.43.00.460 with #2 logs
Processing da41-acsStartContainer_cppContainer_2017-07-05_19.22.44.226 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-05_19.44.33.199 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-05_19.56.28.307 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-05_21.11.36.377 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-05_21.12.49.008 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-05_21.51.45.672 with #4 logs
Processing da41-acsStartContainer_cppContainer_2017-07-07_20.39.24.481 with #0 logs
Processing da41-acsStartContainer_cppContainer_2017-07-07_21.01.00.244 with #3 logs
Processing da41-acsStartContainer_cppContainer_2017-07-10_00.15.06.909 with 

| | antenna | @timestamp | # logs |
|---|---|---|---|
| 0 | da41 | 2017-07-01T21:02:13.979 | 18 |
| 1 | da41 | 2017-07-01T21:03:21.631 | 19 |
| 2 | da41 | 2017-07-01T21:20:08.150 | 41 |
| 3 | da41 | 2017-07-01T21:23:26.907 | 341 |
| 4 | da41 | 2017-07-01T21:37:25.567 | 314 |
| ... | ... | ... | ... |
| 625 | dv25 | 2017-07-12T11:02:24.567 | 273 |
| 626 | dv25 | 2017-07-12T12:24:18.011 | 206 |
| 627 | dv25 | 2017-07-12T12:54:40.157 | 963 |
| 628 | dv25 | 2017-07-12T14:39:02.679 | 933 |
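
`to_csv` writes the `@timestamp` column as text, so a later notebook that reads the interim files has to parse it again. A minimal round-trip sketch, assuming the interim files keep the `@timestamp`, `source` and `logtext` columns, could look like this. The helper name `load_ant_obs`, the temporary file name and the demo rows are hypothetical.

```python
import os
import tempfile

import pandas as pd

def load_ant_obs(path):
    """Load one interim AntObs CSV, restoring '@timestamp' as datetime."""
    return pd.read_csv(path, parse_dates=["@timestamp"])

# Round trip on a synthetic frame written to a temporary directory.
demo = pd.DataFrame({
    "@timestamp": pd.to_datetime(
        ["2017-07-01 21:02:13.979", "2017-07-01 21:02:14.500"]
    ),
    "source": ["/cppContainer - demo", "/cppContainer - demo"],
    "logtext": ["first line", "second line"],
})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo-AntObs.csv")
    demo.to_csv(path, index=False)
    loaded = load_ant_obs(path)

print(loaded.dtypes)
```

Without `parse_dates`, the column comes back as plain strings and time comparisons like the ones used above would compare text instead of timestamps.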
