GIM

This project is mainly intended for graph structure learning on real industrial processes.

Multiple dataset support

The code supports multi-dataset training, which is particularly useful when real industrial engineering data is scarce. Researchers can collect datasets that resemble industrial processes, convert them into the standardized dataset format, and train models directly with the provided code. The code uses a node mask matrix to handle the varying number of sensors across datasets. However, a single hyperparameter specifying the maximum number of nodes must be chosen so that it covers every dataset, which keeps tensor shapes consistent throughout training.
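As a rough illustration of the masking idea described above, the sketch below pads two datasets with different sensor counts to a shared maximum and builds the corresponding masks. The function name `pad_with_node_mask`, the constant `MAX_NODES`, and the use of plain lists instead of tensors are assumptions made for this example; this is not the repository's actual implementation.

```python
from typing import List, Tuple


def pad_with_node_mask(
    series: List[List[float]], max_nodes: int
) -> Tuple[List[List[float]], List[int], List[List[int]]]:
    """Pad a [time][node] series to max_nodes columns and build masks.

    Hypothetical helper for illustration only.
    """
    num_nodes = len(series[0]) if series else 0
    if num_nodes > max_nodes:
        raise ValueError(
            f"dataset has {num_nodes} nodes, but max_nodes is {max_nodes}"
        )
    pad = max_nodes - num_nodes
    # Padding slots are filled with zeros at every time step.
    padded = [row + [0.0] * pad for row in series]
    # 1 marks a real sensor, 0 marks a padding slot.
    node_mask = [1] * num_nodes + [0] * pad
    # An edge (i, j) is valid only when both endpoints are real sensors.
    edge_mask = [[a * b for b in node_mask] for a in node_mask]
    return padded, node_mask, edge_mask


# Two toy datasets with 2 and 3 sensors, padded to the same width.
small = [[0.1, 0.2], [0.3, 0.4]]
large = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
MAX_NODES = 4  # assumed value; must be >= the largest sensor count

padded_small, mask_small, edge_small = pad_with_node_mask(small, MAX_NODES)
padded_large, mask_large, edge_large = pad_with_node_mask(large, MAX_NODES)
```

With masks like these, a model can process every dataset at the same width while ignoring padded nodes and any edges that touch them.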

Multimodal data support

The code also supports node descriptions. If your data is collected from a Distributed Control System (DCS), it will likely include natural-language descriptions of each node's function. Our code uses SentenceTransformers as the text encoder to turn these descriptions into embedding vectors. During preprocessing, the encoded descriptions are saved directly into the dataset. Although this paper does not leverage this information, researchers have the option to use it when building their own models.
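The encoding step might look roughly like the sketch below. To keep the example runnable without downloading a model, the encoder is passed in as a callable and a toy stand-in is used. The helper name `encode_descriptions`, the sensor names, and the model name in the comment are all assumptions for illustration, not taken from the repository.

```python
from typing import Callable, Dict, List, Sequence

# Any callable that maps a list of strings to one vector per string.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def encode_descriptions(
    descriptions: Dict[str, str], sensor_order: List[str], encode: Encoder
) -> List[List[float]]:
    """Encode descriptions in sensor order (hypothetical helper).

    Sensors without a description are encoded from an empty string.
    """
    texts = [descriptions.get(name, "") for name in sensor_order]
    vectors = encode(texts)
    return [[float(x) for x in vector] for vector in vectors]


def toy_encoder(texts: List[str]) -> List[List[float]]:
    """Stand-in encoder: [character count, word count] per text."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


descriptions = {
    "FIT101": "flow transmitter",
    "LIT101": "level transmitter tank",
}
features = encode_descriptions(descriptions, ["LIT101", "FIT101"], toy_encoder)

# With sentence-transformers installed, a real encoder could be swapped in:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")  # model name is an assumption
#   features = encode_descriptions(descriptions, ["LIT101", "FIT101"], model.encode)
```

Keeping the rows in the same order as the sensors in node_properties.csv means each embedding lines up with its node index.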

Dataset construction

The input files to prepare in advance are: node_properties.csv, node_description.csv, and graph.csv.

node_properties.csv holds the time series data of the nodes. The first row contains the sensor names, and the first column contains the sampling timestamps.
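A minimal sketch of reading a file with this layout is shown below. The function name `load_node_properties`, the sample sensor names, and the timestamp format are assumptions for illustration; the repository's own loader may differ.

```python
import csv
import io
from typing import List, Tuple


def load_node_properties(
    csv_text: str,
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Split the CSV into sensor names, timestamps, and values.

    Hypothetical helper: assumes a header row and a timestamp first column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sensor_names = header[1:]  # first header cell labels the timestamp column
    timestamps: List[str] = []
    values: List[List[float]] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        timestamps.append(row[0])
        values.append([float(cell) for cell in row[1:]])
    return sensor_names, timestamps, values


# Illustrative content; real files are read from disk instead.
SAMPLE = (
    "timestamp,FIT101,LIT101\n"
    "2024-01-01 00:00:00,2.5,510.0\n"
    "2024-01-01 00:00:01,2.6,509.8\n"
)
sensor_names, timestamps, values = load_node_properties(SAMPLE)
```

The order of `sensor_names` defines the node indices that the other input files must follow.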

node_description.csv holds the text description of each sensor. The sensor names in its first column must match the names in node_properties.csv. It must contain a sensor type column, which is the only required description field. Any additional columns are concatenated onto the description during processing.
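The rules above could be checked and applied along the lines of the sketch below. The column name `sensor_type`, the comma separator used when joining fields, and the helper name `build_description_texts` are assumptions; the source does not state the exact column header or how fields are joined.

```python
import csv
import io
from typing import Dict, List


def build_description_texts(
    csv_text: str, sensor_names: List[str], type_column: str = "sensor_type"
) -> Dict[str, str]:
    """Build one description string per sensor (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames or [])
    if type_column not in fields:
        raise ValueError(f"missing required column: {type_column}")
    name_column = fields[0]  # the first column holds the sensor names
    extra_columns = [c for c in fields[1:] if c != type_column]
    texts: Dict[str, str] = {}
    for row in reader:
        # The sensor type comes first, then any non-empty extra fields.
        parts = [row[type_column]] + [row[c] for c in extra_columns if row[c]]
        texts[row[name_column]] = ", ".join(parts)
    missing = [name for name in sensor_names if name not in texts]
    if missing:
        raise ValueError(f"sensors without a description: {missing}")
    return texts


# Illustrative content; column names other than the first are assumed.
DESCRIPTION_CSV = (
    "sensor,sensor_type,location\n"
    "FIT101,flow,raw water inlet\n"
    "LIT101,level,\n"
)
texts = build_description_texts(DESCRIPTION_CSV, ["FIT101", "LIT101"])
```

Failing early on a missing type column or an unmatched sensor name surfaces naming mismatches before any encoding is run.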

graph.csv contains the edge indices of the graph. The indices point from the sensors in the first column to the sensors in the second column. Note that the order of the node indices needs to be consistent with the order in node_properties.csv.
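To make the direction convention concrete, the sketch below turns such an edge list into an adjacency matrix in which entry [i][j] is 1 when an edge points from node i to node j. It assumes zero-based integer indices and a header row, neither of which the source specifies; `load_adjacency` and the column labels are hypothetical.

```python
import csv
import io
from typing import List


def load_adjacency(
    graph_csv_text: str, num_nodes: int, has_header: bool = True
) -> List[List[int]]:
    """Build a directed adjacency matrix from an edge list.

    Hypothetical helper: assumes zero-based indices that follow the
    column order of node_properties.csv.
    """
    rows = list(csv.reader(io.StringIO(graph_csv_text)))
    if has_header:
        rows = rows[1:]
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for row in rows:
        if not row:
            continue  # skip blank lines
        source, target = int(row[0]), int(row[1])
        if not (0 <= source < num_nodes and 0 <= target < num_nodes):
            raise ValueError(f"edge ({source}, {target}) is out of range")
        # Edges point from the first column to the second column.
        adjacency[source][target] = 1
    return adjacency


# Illustrative content: a chain 0 -> 1 -> 2.
GRAPH_CSV = "source,target\n0,1\n1,2\n"
adjacency = load_adjacency(GRAPH_CSV, num_nodes=3)
```

The range check catches edge lists that were built against a different sensor ordering or sensor count.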

The simulation dataset is generated with the provided script: `python data/generate_dataset_multi.py`

The water treatment datasets can be downloaded from the [website](https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/).

Test dataset results

The dynamic process inferred for time steps 4000-5000 on the training set is illustrated in the following animation. For a higher resolution video, click the link below.

Video of all test datasets:

