Scrabble

Scrabble is a machine learning framework that can normalize unstructured metadata in Building Management Systems (BMS) such as point names into the structured metadata, Brick. It is evaluated in our own paper. This code base is implemented mainly for evaluating the results. If you would like to interactively add examples and get inferred results, please consider using Plaster together.

General Procedure

It is best to run this with Plaster. Here, we only describe the procedure to run Scrabble alone.

Store metadata and ground truth data following a MongoDB schema defined in scrabble/data_model.py. Refer Data Model for detailed description and /scripts/ingest_example_data.py for an example.
Run scrabble with proper options.
Retrieve and process the results from the database. Example code: scripts/parse_scrabble_res.py. (There will be better documents on this.)
Put labels for chosen examples. (This is only supported with Plaster.)

Installation

Dependencies

Python 3
MongoDB
PIP packages: requirements.txt
Tested only on Linux. There can be issues with path names in Windows.

Install

git clone https://github.com/jbkoh/scrabble.git
cd scrabble
``virtualenv -p python3 env
source env/bin/activate
python setup.py install (Note that it assumes to use TensorFlow instead of Theano for the backend of Keras.)

Example commands to run

Complete the installation procedure and activate the virtualenv (source env/bin/activate).
python scripts/ingest_example_data.py
scrabble -task scrabble -bl example -nl 1 -t example -neg true -ub true -iter 5 -inc 1

File Descriptions (./)

scripts/scrabble: Main executable file. Once it's PIPped, you can run it in a command line.
char2ir.py: Learning mapping from characters to Brick Tags or Intermediate Representation (IR).
ir2tagsets.py: IR to TagSets learning and iteration functions.

Data Model

Data models in the form of mongoengine. Refer scrabble/data_model.py. Each entity is associated with srcid. A combination of srcid and building is unique in the database. Please refer TODO for ingesting a CSV file into MongoDB.

RawMetadata

Raw metadata you can retrieve from existing systems like BMSes.

Definition:

{
    "srcid": str # unique ID,
    "building": str # building name,
    "metadata": dict # key: metadata type, value: metadata
}

Example:

{
    "srcid": "1234",
    "building": "ebu3b",
    "metadata": {
        "VendorGivenName": "NAE2.VMA101.ZNT",
        "BACnetDescription": "Zone Temp",
        "BACnetUnit": 64
    }
}

VendorGivenName is commonly referred as point names also.

LabeledMetadata

LabeledMetadata containing different types of labels per entity.

Definition

{
    "srcid": str # unique ID,
    "building": str # building name,
    "fullparsing": dict # dict of list of tuples of character and its label.
    "tagsets": list # list of tagsets found in the metadata.
    "point_tagset": str # tagset for Point.
}

Example

{
    "srcid": "1234",
    "building": "ebu3b",
    "fullparsing": {
        "VendorGivenName": [
            ["N", "B_networkadapter-nae"],
            ["A", "I_networkadapter-nae"],
            ["E", "I_networkadapter-nae"],
            ["2", "B_leftidentifier"],
            [".", "O"],
            ["V", "B_vav"],
            ["M", "I_vav"],
            ["A", "I_vav"],
            [".", "O"],
            ["1", "B_leftidentifier"],
            ["0", "I_leftidentifier"],
            ["1", "I_leftidentifier"],
            [".", "O"],
            ["Z", "B_zone"],
            ["N", "I_zone"],
            ["T", "B_temperature"]
        ],
        "BACnetDescription": [
            ["Z", "B_Zone"],
            ["o", "I_Zone"],
            ["n", "I_Zone"],
            ["e", "I_Zone"],
            [" ", "O"],
            ["T", "B_temperature"],
            ["e", "I_temperature"],
            ["m", "I_temperature"],
            ["p", "I_temperature"]
        ]
    },
    "tagsets": ["networkadapter-nae", "vav", "zone_temperature_sensor"],
    "point_tagset": "zone_temperature_sensor",
}

In fullparsing, you also need to put BIO tags to actual Brick Tags.
- B = Beginning
  - If a character is in the starting position of a word (e.g., Z for ZN), B_ should be attached to actual label (e.g., B_Zone for Z).
- I = Inside
  - Everything else with a Brick Tag should attach I_ (I_Zone for N in ZN)
- O = Outside
  - If a character is not associated with any Brick Tag, label it as O, which means nothing.

How to Use it?

Configuration options

-bl: Building list: list of source building names deliminated by comma (e.g., -bl ebu3b,bml).
-nl: Sample number list: list of sample numbers per building. The order should be same as bl. (e.g., -nl 200,1)
-t: Target building name: Name of the target building. (e.g., -t bml)
-c: Whether to use clustering for random selection or not (e.g., -c true)
-avg: How many times run experiments to get average? (e.g., -avg 5)
-iter: How many times to iterate the process? (e.g., -iter 10)
-d: Debug mode flag (e.g., -d false)
-ub: Whethre to use Brick when learning. (e.g., -ub true)
Note: Please refer scrabble --help to learn all the options.

Best configuration

Active learning of two stages
```
scrabble -task scrabble -bl ebu3b -nl 10 -ut true -neg true -c true -t ebu3b
```
This learns the entire model of Scrabble. It will use only ebu3b to select more examples for the target ebu3b, the same building. It initially select 10 examples randomly to initiate the model.
Active learning of two stages with a known building.
```
scrabble -task scrabble -bl ap_m,ebu3b -nl 200,10 -ut true -neg true -c true -t ebu3b
```
This does the same thing as above except using 200 examples of ap_m additionally.

Active learning of the first stage (CRF) only.

scrabble -task char2ir -bl ap_m,ebu3b -nl 200,10 -c true -t ebu3b

Active learning of the second stage (MLP) only.

scrabble -task ir2tagsets -bl ap_m,ebu3b -nl 200,10 -ut true -neg true -c true -t ebu3b

Data

Full datasets are under preparation for sharing. A small example data set can be found inside metadata/

References

Scrabble: Transferrable Semi-Automated Semantic Metadata Normalization using Intermediate Representation, BuildSys 2018.
Plaster: An Integration, Benchmark, and Development Framework for Metadata Normalization Methods, BuildSys 2018.
Brick: Towards a unified metadata schema for buildings, BuildSys 2016.
Brick: Metadata schema for portable smart building applications, AppliedEnergy 2018.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
TS_Features		TS_Features
baseline		baseline
brick		brick
config		config
helper		helper
metadata		metadata
scrabble		scrabble
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

License

jbkoh/scrabble

Folders and files

Latest commit

History

Repository files navigation