# Zodiac: Source Code and Documentation

###### Date: May 30, 2016

Modern buildings consist of many different types of infrastructure, such as lighting, air conditioning, power, and water. To operate and maintain these systems with minimal manual supervision, networked sensors are deployed across the building so that it can be monitored remotely. <a href="https://en.wikipedia.org/wiki/Building_management_system">Building Management Systems (BMS)</a> are software systems that collect data from installed sensors, allow remote control of equipment, and provide visualizations for maintenance personnel.

Modern BMSes have thousands of data points per building, and these data points correspond to installed sensors, actuators of equipment as well as configuration parameters. With modern data processing and control mechanisms it is possible to exploit these data points to create useful and innovative building applications such as personalized control, fault detection, demand response management, model predictive control, power grid stability and many more. However, a major impediment to deployment of these applications is that the data points are organized for building domain experts and not computer algorithms. As a result, the metadata that describes the context of the data points in the BMS has errors, extraneous notes, vendor specific notations and other inconsistencies which make it difficult for a machine to interpret the data. Our project Zodiac exploits machine learning techniques to map the raw building metadata to a consistent format so that applications can be developed on top of a common interface and reused across multiple buildings. 

Our full research paper which describes the building metadata problem and the Zodiac algorithm can be found <a href="http://dl.acm.org/citation.cfm?id=2821674">here</a>. The Zodiac project home page where we share our raw building metadata, manually labelled ground truth data point types and this source code page can be found <a href="http://www.synergylabs.org/bharath/zodiac.html">here</a>.

### Source Code

We are making the source code available as a Jupyter notebook with explanations alongside, so that it is easy to follow and to replicate our results.

#### Input File Paths
The notebook code assumes that the raw metadata and the manually labelled ground truth files are available in a "metadata" directory at the same level as the notebook.

#### Python Libraries
We heavily use the <a href="http://scikit-learn.org/stable/">Python Scikit Learn</a> library, which contains implementations of popular machine learning algorithms such as hierarchical clustering and random forest classifier used in our algorithm. We use Python <a href="http://www.numpy.org/">numpy</a> and <a href="http://pandas.pydata.org/">pandas</a> libraries to store and manipulate large records of metadata. These data structures work well with Scikit Learn modules.

We use Python <a href="https://docs.python.org/2/library/re.html">re</a> for regular expressions and <a href="http://matplotlib.org/">matplotlib</a> for plotting. We also use <a href="https://docs.python.org/2/library/shelve.html">shelve</a> and <a href="https://docs.python.org/2/library/pickle.html">pickle</a> for object serialization and non-volatile storage.

In [1]:
#essential libraries
import numpy as np
import pandas as pd
from scipy.cluster.vq import *
import operator
from matplotlib import pyplot as plt   
import pickle as pkl
import shelve
import re
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
import scipy.sparse  # import the submodule explicitly; it is used for the bag-of-words matrices
from sklearn.feature_extraction import DictVectorizer

The file "bacnet_devices.shelve" contains the building points metadata for about 55 buildings at <a href="http://ucsd.edu/">University of California, San Diego</a> (UCSD). We obtain this metadata using <a href="http://www.bacnet.org/">BACnet</a>, a standard building automation network communication protocol. 

When loaded, the shelve file becomes a Python dictionary. The format of the metadata is as follows:

The "bacnet_device_id" corresponds to a middlebox that hosts up to 4000 building data points. One building can be assigned several middleboxes based on its requirements and the network design. All 55 buildings from which we obtain our metadata are managed by the vendor <a href="http://www.johnsoncontrols.com/">Johnson Controls Inc.</a>, and some of the metadata organization may be specific to this vendor.

Each point is identified by its "instance" within the middlebox, and we define a "source_id" as a university-wide unique identifier for the point. Two names are associated with each point. The "name" corresponds to the BACnet name property and is based on the network architecture. The "jci_name" is a proprietary BACnet field used by Johnson Controls and is assigned relative to the building location hierarchy. The last part of the name (e.g. "ZN T") encodes an abbreviation of the data point type. The human-readable data point type is given in "description" (e.g. "Zone Temperature").

BACnet also encodes the data type and input/output in one field referred to as "type" above. For example, "analog input" means the data type is float and the point is of input type (e.g. sensor).
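
The fields described above can be pictured as a nested dictionary. The sketch below is inferred only from the keys that the parsing code in this notebook reads (`props`, `objs`, `device_id`, `instance`, `type`, `type_str`, `name`, `jci_name`, `desc`, `unit`). Every value is an invented placeholder rather than real UCSD metadata, the helper names are hypothetical, and the actual shelve entries may carry additional fields.

```python
# Hypothetical device entry: the keys mirror those read by the parsing code,
# the values are invented placeholders.
example_device = {
    "props": {"device_id": 557},
    "objs": [
        {
            "props": {"type": 0, "type_str": "analog input", "instance": 12},
            "name": "EXAMPLE.NAME-1",           # placeholder BACnet name
            "jci_name": "EXAMPLE.RM-101.ZN T",  # placeholder; last part abbreviates the type
            "desc": "Zone Temperature",
            "unit": 64,                         # placeholder unit code
        }
    ],
}


def make_source_id(device_props, object_props):
    """Join device id, BACnet object type and instance into one identifier."""
    return "{}_{}_{}".format(
        device_props["device_id"], object_props["type"], object_props["instance"]
    )


def type_abbreviation(jci_name):
    """Return the last dot-separated part of a jci_name."""
    return jci_name.split(".")[-1]


point = example_device["objs"][0]
source_id = make_source_id(example_device["props"], point["props"])
abbreviation = type_abbreviation(point["jci_name"])
print(source_id)
print(abbreviation)
```

Including the object type in the identifier keeps two points apart when they share an instance number but differ in type on the same device.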

In [2]:
#Import shelve dictionary containing all bacnet devices
sensors_dict = shelve.open('metadata/bacnet_devices.shelve','r')

In [3]:
#device_list filters the NAE for a particular building. This is currently manual. It can be automated 
#if building names are known.

#bonner hall
device_list = [
                "557",
                "607",
                "608",
                "609",
                "610",
]

In [4]:
#Parse the data in the dictionary as filtered by device_list
#Gives us a sensor_list with sensor information of a building
sensor_list = []
names_list = []
names_listWithDigits = [] 
sensor_type_namez=[]
desc_list = []
unit_list = []
type_str_list = []
type_list = []
jci_names_list = []
source_id_set = set([])
for nae in device_list:
    device = sensors_dict[nae]
    h_dev = device['props']
    for sensor in device['objs']:
        h_obj = sensor['props']
        source_id = str(h_dev['device_id']) + '_' + str(h_obj['type']) + '_' + str(h_obj['instance'])
        
        if h_obj['type'] not in (0,1,2,3,4,5,13,14,19):
            continue
        
        if source_id in source_id_set:
            continue
        else:
            source_id_set.add(source_id)
        
        #create individual lists
        #remove numbers from names because they do not indicate type of sensor
        names_listWithDigits.append(sensor['jci_name']) 
        sensor_type_namez.append(sensor['sensor_type'])
        names_list.append(''.join([c for c in sensor['name'] if not c.isdigit()]))
        desc_list.append(''.join([c for c in sensor['desc'] if not c.isdigit()]))
        jci_names_list.append(''.join([c for c in sensor['jci_name'] if not c.isdigit()]))
        #convert string to dictionary for categorical vectorization
        unit_list.append({str(sensor['unit']):1})
        type_str_list.append({str(h_obj['type_str']):1})
        type_list.append({str(h_obj['type']):1})
        
        #create a flat list of dictionary to avoid using json file
        sensor_list.append({'source_id': source_id, 
                            'name': sensor['name'], 
                            'description': sensor['desc'],
                            'unit': sensor['unit'],
                            'type_string': h_obj['type_str'],
                            'type': h_obj['type'],
                            #'device_id': h_obj['device_id'],
                            'jci_name': sensor['jci_name'],
                            #add data related characteristics here
                        })
sensor_df = pd.DataFrame(sensor_list)
sensor_df = sensor_df.set_index('source_id')
sensor_df = sensor_df.groupby(sensor_df.index).first()
print len(sensor_list)

3213


In [5]:
#Create a bag of words from sensor string metadata. Vectorize so that it can be used in ML algorithms.
namevect = CountVectorizer(token_pattern='(?u)\\b\\w+\\b')
namebow = scipy.sparse.coo_matrix(namevect.fit_transform(names_list))

descvect = CountVectorizer() 
descbow = scipy.sparse.coo_matrix(descvect.fit_transform(desc_list))

unitvect = DictVectorizer() 
unitbow = scipy.sparse.coo_matrix(unitvect.fit_transform(unit_list))

type_str_vect = DictVectorizer() 
type_str_bow = scipy.sparse.coo_matrix(type_str_vect.fit_transform(type_str_list))

typevect = DictVectorizer() 
typebow = scipy.sparse.coo_matrix(typevect.fit_transform(type_list))

jcivect = CountVectorizer() 
jcibow = scipy.sparse.coo_matrix(jcivect.fit_transform(jci_names_list))

# Keep the feature names in the same column order as the blocks stacked into final_bow below.
feature_set = descvect.get_feature_names()+ \
              unitvect.get_feature_names()+ \
              type_str_vect.get_feature_names()+ \
              typevect.get_feature_names()+ \
              jcivect.get_feature_names()
              

final_bow = scipy.sparse.hstack([
                                 #namebow,
                                 descbow,
                                 unitbow,
                                 type_str_bow,
                                 typebow,
                                 jcibow
                                ]) 
bow_array = final_bow.toarray() # this is the bow for each sensor. 
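
To make the bag-of-words step concrete, the following standard-library sketch mimics what a count vectorizer does on two made-up point names: tokenize, build a sorted vocabulary, and count tokens per name. It is an illustration only, not the scikit-learn implementation, and the function names are hypothetical. It also shows the effect of the token pattern passed to `namevect`: scikit-learn's default pattern, `(?u)\b\w\w+\b`, keeps only tokens of two or more characters, so a single-letter token such as the `T` in `ZN T` is dropped.

```python
import re
from collections import Counter

SINGLE_CHAR_OK = r"(?u)\b\w+\b"     # pattern passed to namevect in the notebook
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"  # scikit-learn default: tokens of 2+ characters


def tokenize(text, pattern):
    """Lowercase the text and return the tokens matched by the pattern."""
    return re.findall(pattern, text.lower())


def bag_of_words(texts, pattern):
    """Return (vocabulary, rows); each row holds the token counts of one text."""
    token_lists = [tokenize(text, pattern) for text in texts]
    vocabulary = sorted(set(tok for toks in token_lists for tok in toks))
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Made-up names with digits already stripped, as in the parsing cell.
names = ["VAV ZN T", "VAV SA T"]
vocab_all, rows_all = bag_of_words(names, SINGLE_CHAR_OK)
vocab_default, rows_default = bag_of_words(names, DEFAULT_PATTERN)
print(vocab_all)
print(rows_all)
print(vocab_default)
```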

In [6]:
# Hierarchical agglomerative clustering 
from scipy.cluster.hierarchy import linkage, dendrogram
import scipy.cluster.hierarchy as hier

num_of_sensors = len(bow_array)
a = np.array(bow_array[:num_of_sensors])
z = linkage(a,metric='cityblock',method='complete')
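
As a rough illustration of the two settings passed to `linkage` above: the cityblock (Manhattan) distance between two bag-of-words rows is the sum of the absolute count differences, and complete linkage scores a pair of clusters by their farthest pair of members. The helper names and toy vectors below are assumptions made for this sketch; the notebook itself relies on SciPy.

```python
def cityblock(u, v):
    """Sum of absolute differences between two equal-length count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage_distance(cluster_a, cluster_b):
    """Distance between two clusters: the largest pairwise member distance."""
    return max(cityblock(u, v) for u in cluster_a for v in cluster_b)


# Toy bag-of-words rows over a four-word vocabulary (invented values).
zone_temp_1 = [0, 1, 1, 1]
zone_temp_2 = [0, 1, 1, 1]
supply_temp = [1, 1, 1, 0]

d_same = cityblock(zone_temp_1, zone_temp_2)
d_diff = cityblock(zone_temp_1, supply_temp)
d_clusters = complete_linkage_distance([zone_temp_1, zone_temp_2], [supply_temp])
print(d_same)
print(d_diff)
print(d_clusters)
```

Because the rows hold integer counts, all pairwise cityblock distances are integers, which is why a cut placed halfway between two integers separates merge levels cleanly.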

In [7]:
#Apply threshold to hierarchical tree to obtain individual clusters. Results stored in equip_map
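#Sort the distinct merge distances; a set has no guaranteed order, so indexing it directly is unreliable
dists = sorted(set(z[:,2]))
thresh = (dists[2] + dists[3]) /2 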
print "Threshold: ", thresh
b = hier.fcluster(z,thresh, criterion='distance')
cluster_map = {}
equip_map = {}
for i in range(len(b)):
    cluster_map[names_list[i]] = b[i]
    print i, names_list[i], b[i]
    if b[i] in equip_map:
        equip_map[b[i]]["sensors"].append(sensor_list[i])
        equip_map[b[i]]["sensor_ids"].append(i)
    else:
        equip_map[b[i]] = {"sensors":[sensor_list[i]]}
        equip_map[b[i]]["sensor_ids"] = [i]
    sensor_list[i]['equip_cluster_id'] = b[i]
sorted_map = sorted(cluster_map.items(), key=operator.itemgetter(1))

Threshold:  2.5
0 NAE  N  VAV  RMA T 246
1 NAE  N  VAV  EB T 243
2 NAE  N  VAV  EB T 243
3 NAE  N  VAV  RMA T 248
4 NAE  N  VAV  RA T 244
5 NAE  N  VAV  SA T 245
6 NAE  N  VAV  RA T 244
7 NAE  N  VAV  SA T 245
8 NAE  N  VAV  RA T 244
9 NAE  N  VAV  SA T 245
10 NAE  N  VAV  RA T 244
11 NAE  N  VAV  SA T 245
12 NAE  N  VAV  ZN T 95
13 NAE  N  VAV  WC ADJ 240
14 NAE  N  VAV  ACTCLG SP 241
15 NAE  N  VAV  ACTHTG SP 242
16 NAE  N  VAV  ZN T 95
17 NAE  N  VAV  WC ADJ 240
18 NAE  N  VAV  ACTCLG SP 241
19 NAE  N  VAV  ACTHTG SP 242
20 NAE  N  VAV  ZN T 95
21 NAE  N  VAV  WC ADJ 240
22 NAE  N  VAV  ACTCLG SP 241
23 NAE  N  VAV  ACTHTG SP 242
24 NAE  N  VAV  RA T 244
25 NAE  N  VAV  SA T 245
26 NAE  N  VAV  RA T 244
27 NAE  N  VAV  SA T 245
28 NAE  N  VAV  RA T 244
29 NAE  N  VAV  SA T 245
30 NAE  N  VAV  RA T 244
31 NAE  N  VAV  SA T 245
32 NAE  N  VAV  RA T 244
33 NAE  N  VAV  SA T 245
34 NAE  N  VAV  RA T 244
35 NAE  N  VAV  SA T 245
36 NAE  N  VAV  RA T 244
37 NAE  N  VAV  SA T 245
38 NAE  N
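
A sketch of the threshold rule, assuming the distinct merge distances are taken in ascending order: cut the tree halfway between the third and fourth smallest distance. The function name and the merge distances below are invented for illustration.

```python
def midpoint_threshold(merge_distances, lower_rank=2, upper_rank=3):
    """Midpoint between two ranks of the sorted, distinct merge distances."""
    distinct = sorted(set(merge_distances))
    return (distinct[lower_rank] + distinct[upper_rank]) / 2.0


# Invented merge distances; duplicates are expected because many merges
# happen at the same integer cityblock distance.
toy_merge_distances = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0, 5.0]
toy_threshold = midpoint_threshold(toy_merge_distances)
print(toy_threshold)
```

With distinct distances 0, 1, 2, 3, 5 the rule gives 2.5, so only points whose bag-of-words rows differ by at most two counts end up in the same cluster.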

In [8]:
#read ground truth sensor types
import csv
building = 'bonner'
ground_truth_list = []
with open('metadata/'+building+'_sensor_types.csv') as ground_truth_file:
    csv_reader = csv.DictReader(ground_truth_file)
    for row in csv_reader:
        ground_truth_list.append(row)
sensor_type_map = {s['source_id']:s['sensor_type'] for s in ground_truth_list}
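
The cell above expects the ground truth CSV to provide at least a `source_id` and a `sensor_type` column (a later cell also reads `jci_name`). The sketch below reads an in-memory CSV with the same column names under Python 3; the two rows are invented and do not come from the real ground truth file.

```python
import csv
import io

# Invented rows that follow the column names the notebook reads.
toy_csv = (
    u"source_id,jci_name,sensor_type\n"
    u"557_0_12,EXAMPLE.RM-101.ZN T,zone temperature\n"
    u"557_0_13,EXAMPLE.RM-101.SA T,supply air temperature\n"
)

toy_rows = list(csv.DictReader(io.StringIO(toy_csv)))
toy_type_map = {row["source_id"]: row["sensor_type"] for row in toy_rows}
print(toy_type_map["557_0_12"])
```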

In [9]:
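# Merges the clusters formed by hierarchical clustering based on the "description" tag. 
# A cluster whose sensors all share one cleaned description is merged under that description;
# a sensor with an empty description falls back to the abbreviation at the end of its jci_name.
equip_desc_map = {}
# Build the abbreviations from sensor_list so they can be indexed by the ids stored in 'sensor_ids'.
sensor_abbrvs = [s['jci_name'].split('.')[-1].lower() for s in sensor_list]
#sensor_abbrvs = [re.sub('[^a-z ]', '', s) for s in sensor_abbrvs]

for k,v in equip_map.iteritems():
    #print v
    # Copy the lists so that merging does not modify the clusters stored in equip_map.
    cluster = {'sensors': list(v['sensors']), 'sensor_ids': list(v['sensor_ids'])}
    desc_list = [s['description'].lower() for s in v['sensors']]
    desc_list = [re.sub('[^a-z ]', '', d) for d in desc_list]
    # i is the position inside the cluster; map it to the global sensor index before the lookup.
    desc_list = [sensor_abbrvs[v['sensor_ids'][i]] if d == '' else d for i,d in enumerate(desc_list)]
    if len(set(desc_list)) == 1 and desc_list[0] != '':
        if desc_list[0] in equip_desc_map:
            equip_desc_map[desc_list[0]]['sensors'] += cluster['sensors']
            equip_desc_map[desc_list[0]]['sensor_ids'] += cluster['sensor_ids']
        else:
            equip_desc_map[desc_list[0]] = cluster
    else:
        equip_desc_map[k] = cluster
    
print "merged cluster:", len(equip_desc_map)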

merged cluster: 195
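
The merge rule can be sketched on toy data: a cluster whose members all share one normalized description is folded into a group keyed by that description, while a mixed cluster keeps its numeric id. The cluster contents and function names below are invented. Unlike the cell above, the sketch also collapses whitespace during normalization and leaves out the abbreviation fallback for empty descriptions.

```python
import re


def normalize(description):
    """Lowercase, keep only letters and spaces, collapse repeated whitespace."""
    letters_only = re.sub("[^a-z ]", "", description.lower())
    return " ".join(letters_only.split())


def merge_by_description(clusters):
    """Merge clusters whose members share a single non-empty description."""
    merged = {}
    for cluster_id, descriptions in clusters.items():
        keys = set(normalize(d) for d in descriptions)
        if len(keys) == 1 and "" not in keys:
            merged.setdefault(keys.pop(), []).extend(descriptions)
        else:
            merged[cluster_id] = list(descriptions)
    return merged


toy_clusters = {
    1: ["Zone Temperature", "Zone Temperature 2"],
    2: ["zone temperature"],
    3: ["Supply Air Temp", "Return Air Temp"],
}
merged = merge_by_description(toy_clusters)
print(len(merged))
print(len(merged["zone temperature"]))
```

Clusters 1 and 2 collapse into one group of three points, so a single manual label would now cover all of them, while cluster 3 stays separate because its members disagree.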


In [10]:
#get ground truth set
#equip_map = equip_desc_map #Uncomment for using merged clusters
# Manually label, say, 10 clusters and hence multiple sensors. 
import random
num_manual_labels = 10
sensor_labels = []
sensor_bow = []
equip_cluster_lens = {k:len(v['sensors']) for k,v in equip_map.iteritems()}
sorted_equip_keys = sorted(equip_cluster_lens.items(), key=operator.itemgetter(1), reverse=True)
# Sample without replacement so that no cluster is picked (and its sensors added) twice.
labeled_equip_keys = random.sample(list(equip_map.keys()), num_manual_labels)
#labeled_equip_keys = [k for k,_ in sorted_equip_keys[:num_manual_labels]] #Alternative: largest clusters first
for c_id in labeled_equip_keys:
    for sensor_id in equip_map[c_id]['sensor_ids']:
        sensor_bow.append(bow_array[sensor_id])
        source_id = sensor_list[sensor_id]['source_id']
        sensor_labels.append(sensor_type_map[source_id])
sensor_bow = np.array(sensor_bow)
sensor_labels = np.array(sensor_labels)
print sensor_bow.shape
print sensor_labels.shape

(52, 402)
(52,)


In [11]:
#learn a model
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
# LabelEncoder is used here only to list the distinct sensor types among the manual labels;
# the classifier is trained directly on the string labels.
le = LabelEncoder()
le.fit(sensor_labels)
print list(le.classes_)
model = RandomForestClassifier(n_estimators=400, random_state=0)
model.fit(sensor_bow,sensor_labels)

['actual heating setpoint', 'call for fan to run', 'cooling demand', 'exhaust fan start stop', 'exhaust fan status', 'power in kw', 'push to test button', 'supply air temperature reset high setpoint', 'supply air temperature reset low setpoint']


RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False)

In [12]:
def apply_model_on_all_clusters( ): # model, T_low, T_high, ....
    # This method uses global variables including model, T_low, T_high and several others. 
    # Goal: apply model on all clusters and determine correctness of "confident predictions" 
    # and manually label "very low confidence" ones 

    global sensor_labels
    global sensor_bow 
    global labeled_equip_keys 
    
    # Iteratively apply Random Forest to label new sensors 
    change_thresholds = True 
    n_wrong_confident_sensor_pred = 0
    sensor_bow = list(sensor_bow)
    sensor_labels = list(sensor_labels) 
    n_high_confidence_sensors = 0
    n_manually_labeled_thisepoch = 0 # epoch = whatever happens after (re-) training RF models

    for p in equip_map.keys(): # for each cluster 
    #for p in sorted_equip_keys:
        #p = p[0]

        # Escape if already labeled. 
        if p in labeled_equip_keys:
            continue


        # Get sensors from this cluster. 
        sample_bow = []
        for k in equip_map[p]['sensor_ids']:
            sample_bow.append(bow_array[k])
        sample_bow = np.array(sample_bow)


        # Apply trained model: 
        confidence = model.predict_proba(sample_bow)
        prediction_label = model.predict(sample_bow)
        # Get overall max confidence for any sensor in cluster: 
        max_c = confidence.max()


        # Compare with Thresholds. 
        flag = 0    
        if max_c < T_low:
            flag = 1
        if max_c > T_high:
            flag = 2        

        if flag==1: 
            n_manually_labeled_thisepoch+=1 

        
        # Handle the cluster beyond threshold: 
        if flag>0: 
            change_thresholds = False 
            labeled_equip_keys.append(p)  

            # For each sensor in this cluster: 
            for k in range(len(equip_map[p]['sensors'])):  
                sourceid = equip_map[p]['sensors'][k]['source_id']
                true_type = sensor_type_map[sourceid] 
                pred_type = prediction_label[k]              

                if flag==2: 
                    n_high_confidence_sensors+=1
                    if pred_type != true_type: 
                        n_wrong_confident_sensor_pred+=1 

                    # append these sensors into labeled ones (with possibly wrong labels): 
                    sensor_bow.append(bow_array[equip_map[p]['sensor_ids'][k]]) 
                    sensor_labels.append(pred_type) 

                if flag==1:                 
                    
                    # append these sensors into labeled ones (with ground truth): 
                    sensor_bow.append(bow_array[equip_map[p]['sensor_ids'][k]]) 
                    sensor_labels.append(true_type) 

            break
        #sensor_bow = np.array(sensor_bow)
        #sensor_labels = np.array(sensor_labels)
        #model.fit(sensor_bow, sensor_labels)
        
    return n_manually_labeled_thisepoch, n_wrong_confident_sensor_pred, n_high_confidence_sensors, len(sensor_labels), 100.0*len(sensor_labels)/len(bow_array),change_thresholds,len(bow_array)-len(sensor_labels)   
    # return __  , __ , __ , num labeled sensors , % labeled sensors, change_thresholds, num_sensors_in_gray 
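
The three-way decision at the core of the function above can be isolated as a small pure function. `triage_cluster` is a hypothetical name; it takes the per-sensor class probabilities of one cluster (the shape returned by `predict_proba`) and the two thresholds, and the probability values in the examples are invented.

```python
def triage_cluster(probabilities, t_low, t_high):
    """Decide how to handle one cluster from its highest class probability.

    Returns 'auto' (accept the predicted labels), 'manual' (ask for a
    ground-truth label) or 'gray' (leave the cluster for a later round).
    """
    max_confidence = max(max(row) for row in probabilities)
    if max_confidence > t_high:
        return "auto"
    if max_confidence < t_low:
        return "manual"
    return "gray"


confident = triage_cluster([[0.97, 0.03], [0.60, 0.40]], 0.1, 0.95)
uncertain = triage_cluster([[0.40, 0.30, 0.30]], 0.5, 0.85)
undecided = triage_cluster([[0.70, 0.30]], 0.5, 0.85)
print(confident)
print(uncertain)
print(undecided)
```

Note that the decision uses the single highest probability over all sensors in the cluster, so one confident sensor is enough to accept the predictions for the whole cluster.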

In [13]:
# Iteratively train RF model and call the method to apply it on all clusters. 
# When method asks us to change thresholds, then we do so. 
# Otherwise we re-train RF and try catch more sensors. 
# We also record the number of correct sensors in each iteration, the manual effort in each iteration etc. 

#print equip_map.keys() 
#print labeled_equip_keys 
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=400, random_state=0)
model.fit(sensor_bow,sensor_labels) 

num_sensors_in_gray=100

T_low = 0.1
T_high = 0.95 
thresholds = [ (0.1,0.95), (0.1,0.9) , (0.15,0.9), (0.15,0.85), (0.2,0.85), (0.25,0.85), (0.3,0.85), (0.35,0.85), (0.4,0.85), (0.45,0.85), (0.5,0.85), (0.55,0.85), (0.6,0.85), (0.65,0.85), (0.7,0.85), (0.75,0.85), (0.8,0.85), (0.849999999,0.85) ] 
#thresholds = [ (0.1,0.9) , (0.15,0.9), (0.15,0.85), (0.2,0.85), (0.25,0.85), (0.3,0.85), (0.35,0.85), (0.4,0.85), (0.45,0.85), (0.45,0.8),(0.5,0.8), (0.55,0.8), (0.6,0.8), (0.65,0.8), (0.7,0.8), (0.75,0.8), (0.7999999,0.8) ] 
#thresholds = [ (0.1,0.7) , (0.25,0.7), (0.3,0.7), (0.3,0.7), (0.35,0.7), (0.4,0.7), (0.4,0.65),(0.45,0.65), (0.5,0.65), (0.5,0.6), (0.55,0.6), (0.5999999,0.6)] 

thresh_count=0 

# Start iterations: 
n_manual_lab_clusters_iter = [num_manual_labels]  # clusters labeled by hand before the first iteration
n_sensors_covered_iter = [len(sensor_labels) ] 

# Stop when every sensor is labeled or when the threshold schedule is exhausted 
# (indexing past the end of the schedule would raise an IndexError). 
while num_sensors_in_gray>0 and thresh_count < len(thresholds): 
    T_low,T_high = thresholds[thresh_count] 
    
    # Re-train model: 
    model.fit(sensor_bow,sensor_labels) 
    
    # Use model to label clusters/sensors: 
    n_manually_labeled_thisepoch, n_wrong_confident_sensor_pred, n_high_confidence_sensors, n_sens_covered, perc_coverage,change_thresholds,num_sensors_in_gray = apply_model_on_all_clusters()        
    print n_manually_labeled_thisepoch, n_wrong_confident_sensor_pred, n_high_confidence_sensors, n_sens_covered, perc_coverage,change_thresholds,num_sensors_in_gray  
    n_manual_lab_clusters_iter.append(n_manual_lab_clusters_iter[-1] + n_manually_labeled_thisepoch) 
    n_sensors_covered_iter.append(n_sens_covered) 
    
    if change_thresholds: 
        # No cluster crossed either threshold: move on to the next, more permissive pair. 
        thresh_count+=1 
        print T_low, T_high

0 0 64 116 3.6103330221 False 3097
0 0 46 162 5.04201680672 False 3051
0 0 0 162 5.04201680672 True 3051
0.1 0.95
0 0 0 162 5.04201680672 True 3051
0.1 0.9
0 0 0 162 5.04201680672 True 3051
0.15 0.9
0 0 126 288 8.96358543417 False 2925
0 0 33 321 9.99066293184 False 2892
0 0 87 408 12.6984126984 False 2805
0 0 3 411 12.79178338 False 2802
0 0 0 411 12.79178338 True 2802
0.15 0.85
1 0 0 417 12.9785247432 False 2796
1 0 0 420 13.0718954248 False 2793
1 0 0 437 13.6009959539 False 2776
0 0 16 453 14.0989729225 False 2760
0 0 4 457 14.2234671646 False 2756
1 0 0 498 15.4995331466 False 2715
0 0 39 537 16.7133520075 False 2676
0 0 22 559 17.3980703392 False 2654
1 0 0 573 17.8338001867 False 2640
1 0 0 629 19.5767195767 False 2584
0 0 58 687 21.3818860878 False 2526
1 0 0 720 22.4089635854 False 2493
1 0 0 765 23.8095238095 False 2448
0 5 5 770 23.9651416122 False 2443
0 12 12 782 24.3386243386 False 2431
1 0 0 793 24.6809835045 False 2420
1 0 0 794 24.712107065 False 2419
1 0 0 795 24.7432
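
The control flow of the loop above can be sketched independently of the classifier: keep the current threshold pair while it still lets one more cluster be labeled, and move to the next pair only when no cluster qualifies. Everything in this sketch is a simplification made for illustration. `run_schedule` and `make_toy_labeler` are hypothetical names, and the toy labeler uses fixed cluster confidences, whereas in the notebook the model is retrained after every step and the confidences change.

```python
def run_schedule(thresholds, label_one_cluster):
    """Walk the threshold schedule until nothing is left or it is exhausted.

    label_one_cluster(t_low, t_high) is a hypothetical callback that labels
    at most one cluster and returns (made_progress, n_remaining).
    """
    position = 0
    history = []
    while position < len(thresholds):
        t_low, t_high = thresholds[position]
        made_progress, remaining = label_one_cluster(t_low, t_high)
        history.append((t_low, t_high, remaining))
        if remaining == 0:
            break
        if not made_progress:
            position += 1  # relax the thresholds only when stuck
    return history


def make_toy_labeler(cluster_confidences):
    """Build a callback over fixed, invented per-cluster confidences."""
    unlabeled = list(cluster_confidences)

    def label_one_cluster(t_low, t_high):
        for confidence in unlabeled:
            if confidence < t_low or confidence > t_high:
                unlabeled.remove(confidence)
                return True, len(unlabeled)
        return False, len(unlabeled)

    return label_one_cluster


toy_thresholds = [(0.1, 0.95), (0.3, 0.85), (0.6, 0.85)]
history = run_schedule(toy_thresholds, make_toy_labeler([0.97, 0.05, 0.5]))
for entry in history:
    print(entry)
```

In this toy run the first pair handles the very confident and the very uncertain cluster, and the cluster with confidence 0.5 is only reached once the lower threshold has risen to 0.6.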

In [None]:
# This code uses regular expressions to map descriptions (and, if needed, jci_name) to ground truth. 
# Goal: measure how coverage grows with manual effort when each distinct description 
# (or jci_name abbreviation) is labeled by hand once. 

desc_list=[] 
jc_names_list=[] 
sensor_info = {} 
for s in sensor_list: 
    sid = s['source_id'] 
    sensor_info[sid]={} 
    d = s['description'].lower() 
    d = ''.join([i for i in d if not i.isdigit()]) #remove digits 
    d = re.sub(r"[^\w' ]", "",  d ) # remove special chars 
    d = ' '.join(d.split()) #remove extra spaces 
    sensor_info[sid]['desc'] = d 
    desc_list.append(d)     
    
    j = s['jci_name'].split('.')[-1] 
    sensor_info[sid]['jci'] = j 
    jc_names_list.append(j)  
    sensor_info[sid]['figuredout'] = False 
    
manualeffort=[0] 
coveredsensors=[0] 
desc_map = {}
jci_map = {} 

for s in sensor_list: 
    sid = s['source_id'] 
    if sensor_info[sid]['figuredout']==True: continue # If label known, skip. 
        
    # Info about this sensor: 
    gt = sensor_type_map[ s['source_id'] ]  # ground truth 
    d = s['description'].lower() 
    d = ''.join([i for i in d if not i.isdigit()]) #remove digits 
    d = re.sub(r"[^\w' ]", "",  d ) # remove special chars 
    d = ' '.join(d.split()) #remove extra spaces 
    j = s['jci_name'].split('.')[-1] 

    
    if not d =="": 
        if not d in desc_map: 
            manualeffort.append(manualeffort[-1]+1) 
            desc_map[d] = gt 
            jci_map[j] = gt 
            sensor_info[sid]['figuredout']=True 
            # Check how many it catches: 
            numcatches=0
            for s2 in sensor_list: 
                sid2 = s2['source_id'] 
                if sensor_info[sid2]['figuredout']==False and (sensor_info[sid2]['desc']==d   or sensor_info[sid2]['jci']==j): 
                    sensor_info[sid2]['figuredout']=True
                    numcatches+=1 
            coveredsensors.append(coveredsensors[-1] + numcatches) 
            
    else: 
        if not j in jci_map: 
            manualeffort.append(manualeffort[-1]+1) 
            jci_map[j] = gt 
            sensor_info[sid]['figuredout']=True 
            # Check how many it catches: 
            numcatches=0
            for s2 in sensor_list: 
                sid2 = s2['source_id'] 
                if sensor_info[sid2]['figuredout']==False and sensor_info[sid2]['jci']==j: 
                    sensor_info[sid2]['figuredout']=True
                    numcatches+=1 
            coveredsensors.append(coveredsensors[-1] + numcatches) 
            
            
            
    
# Just checking. 
for s in sensor_list: 
    sid = s['source_id'] 
    if sensor_info[sid]['figuredout']==False: 
        print sid 


        

# Plot the manual effort vs coverage for Regex based approach. 
plt.plot(manualeffort, coveredsensors, 'ro')
plt.xticks(fontsize=20)
plt.yticks(fontsize=15)
plt.ylabel('# points covered', fontsize=20)
plt.xlabel('Manual inputs', fontsize=20)
plt.tight_layout()

plt.xticks(np.arange(0, max(manualeffort)+1, 75.))

plt.savefig("BonnersensorsREGEXManualVsCoverage.pdf",bbox_inches='tight',dpi=150)


len(set(jc_names_list))
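
The description clean-up that the cell above performs twice can be factored into one helper. `normalize_description` is a hypothetical name; the steps are the same as in the cell: lowercase, drop digits, drop everything except word characters, apostrophes and spaces, then collapse whitespace. The sample descriptions are invented.

```python
import re


def normalize_description(description):
    """Reduce a free-text description to a canonical matching key."""
    text = description.lower()
    text = "".join(ch for ch in text if not ch.isdigit())  # remove digits
    text = re.sub(r"[^\w' ]", "", text)                    # remove special chars
    return " ".join(text.split())                          # remove extra spaces


toy_descriptions = [
    "Zone Temperature #2",
    "zone temperature 14",
    "  SUPPLY-AIR  Temp 12 ",
    "Zone Temperature",
]
normalized = [normalize_description(d) for d in toy_descriptions]
distinct = sorted(set(normalized))
print(distinct)
```

Four raw descriptions reduce to two distinct keys here, which is the sense in which one manual input can cover many points. Removing the hyphen joins `supply-air` into `supplyair`, a side effect of this normalization worth keeping in mind when comparing keys.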