Applying Logic Tensor Networks in Conceptual Spaces
Logic Tensor Networks (Paper, Code) provide a way of connecting logical rules with a feature space representation.
In this repository, we use LTNs to learn concepts in a conceptual space.
Copyright of "logictensornetworks.py" is retained by Luciano Serafini and Artur d'Avila Garcez. The files in data/Schockaert/ were created based on data downloadable from https://www.cs.cf.ac.uk/semanticspaces/ and reported in Joaquín Derrac and Steven Schockaert: "Inducing semantic relations from conceptual spaces: a data-driven approach to commonsense reasoning", Artificial Intelligence, vol. 228, pages 66-94, 2015.
The files in data/Ager were kindly provided by Thomas Ager and are based on research reported in Ager, T.; Kuželka, O. & Schockaert, S.: "Modelling Salient Features as Directions in Fine-Tuned Semantic Spaces", Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018, 530-540 (Code).
Our code was written in Python 3.5 and has dependencies on tensorflow (version 1.10), sklearn, matplotlib, and others. A conda environment containing all necessary dependencies is available in the Utilities project.
Preprocessing the data set is a multi-step process. The script code/preprocessing.sge can be executed (both directly on the terminal and as a job on a Sun Grid Engine) to automatically take care of all the steps involved. We give more detailed information on the individual steps below. All scripts for the preprocessing stage can be found in the folder code/preprocessing.
The data by Ager et al. comes in multiple individual files. The script code/preprocessing/merge_all_data.py collects the information from all the individual files and stores them in a central pickle file. It can be executed as follows:
python code/preprocessing/merge_all_data.py path/to/MDS-coordinates.npy path/to/projected_coordinates.txt path/to/keywords_folder/ path/to/genres_folder/ path/to/ratings_folder/ path/to/output_folder/
The first argument points to the file containing the original coordinates in the MDS space, the second argument to the file containing the coordinates with respect to the interpretable dimensions. Arguments three, four, and five point to folders containing information about the plot keywords, genres, and ratings, respectively. The final argument points to the output directory. If the optional argument -q (or --quiet) is set, informational output during the process is suppressed.
The resulting pickle file contains a dictionary with the following structure:
- mds_space: A numpy array of shape (number_of_movies, size_of_MDS_space) containing the coordinates in the MDS space for each of the movies.
- projected_space: A numpy array of shape (number_of_movies, size_of_projected_space) containing the coordinates in the projected space for each of the movies.
- keyword_labels: A list containing all the labels for the different keyword classes.
- keyword_classifications: A binary numpy array of shape (number_of_movies, len(keyword_labels)), containing for each movie its classification according to the keyword_labels.
- genre_labels: A list containing all the labels for the different genre classes.
- genre_classifications: A binary numpy array of shape (number_of_movies, len(genre_labels)), containing for each movie its classification according to the genre_labels.
- rating_labels: A list containing all the labels for the different rating classes.
- rating_classifications: A binary numpy array of shape (number_of_movies, len(rating_labels)), containing for each movie its classification according to the rating_labels.
- all_concepts: A concatenation of keyword_labels, genre_labels, and rating_labels.
- all_classifications: A binary numpy array of shape (number_of_movies, len(all_concepts)), containing for each movie its classification according to all concepts (the merged result of keyword_classifications, genre_classifications, and rating_classifications).
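For illustration, the following sketch loads such a pickle file and inspects some of its entries (the file name is only a placeholder; the actual name depends on what you pass to merge_all_data.py):

import pickle

# load the merged data set created by merge_all_data.py (placeholder file name)
with open("path/to/output_folder/movies.pickle", "rb") as f:
    data_set = pickle.load(f)

print(data_set["mds_space"].shape)            # (number_of_movies, size_of_MDS_space)
print(data_set["projected_space"].shape)      # (number_of_movies, size_of_projected_space)
print(len(data_set["all_concepts"]))          # keyword, genre, and rating labels combined
print(data_set["all_classifications"].shape)  # (number_of_movies, len(all_concepts))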
The next step consists in splitting the overall data set into training set, validation set, and test set. This is done by using the script code/preprocessing/split_data.py. It can be invoked as follows, where the first argument points to the output file generated by merge_all_data.py and the second argument points to the desired output folder:
python code/preprocessing/split_data.py path/to/full_dataset.pickle path/to/output_folder/
The script takes the following optional arguments:
- -t or --test_size: Size of the test set (relative to the overall data set). Defaults to 0.2 (i.e., 20 %).
- -v or --validation_size: Size of the validation set (relative to the overall data set). Defaults to 0.2 (i.e., 20 %).
- -s or --seed: Seed used to initialize the random number generator (ensuring a reproducible split). If not set, results will differ between runs.
- -a or --analyze: If this flag is set, some statistics of the three subsets (training, validation, test) are computed and displayed on the terminal. Moreover, the label frequency of the individual concepts is stored in a file in LaTeX table format.
- -q or --quiet: If this flag is set, informational output during processing is suppressed.
The script creates one pickle file for each of the created subsets and stores them in the output directory. Their internal structure is identical to the output of merge_all_data.py. The training set file additionally contains two more entries in its dictionary (balanced_third_indices and balanced_half_indices) which can be used to create more balanced classification problems for the individual concepts. When using only the movies indicated by the indices in balanced_third_indices, the class imbalance is at most 2:1 for all the concepts. Likewise, the indices in balanced_half_indices ensure a balanced class distribution (i.e., 1:1). Each of the two dictionary entries consists of a list of numpy arrays (one for each of the classes), where the numpy array contains all the indices of movies to use in the more balanced version of the training set.
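As an example, the following sketch (with a placeholder file name and concept) selects the more balanced training subset for a single concept via balanced_third_indices:

import pickle

# placeholder file name for the training set pickle created by split_data.py
with open("path/to/output_folder/training_set.pickle", "rb") as f:
    training_set = pickle.load(f)

concept_index = training_set["all_concepts"].index("Horror")     # any concept of interest
indices = training_set["balanced_third_indices"][concept_index]  # class imbalance of at most 2:1

vectors = training_set["mds_space"][indices]
labels = training_set["all_classifications"][indices, concept_index]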
The rule extraction consists of two steps: First, the apriori algorithm is applied in order to find frequent item sets. Afterwards, we use these frequent item sets to define logical rules.
The script code/preprocessing/apriori.py runs the apriori algorithm on our data set and stores all frequent item sets along with their support in a pickle file for later use. It can be invoked as follows (where input_file.pickle can either be the overall data set or any of the subsets created in the previous step):
python code/preprocessing/apriori.py path/to/input_file.pickle
The script accepts the following optional arguments:
- -o or --output_folder: Path to the folder where the resulting pickle file should be stored. Defaults to ., i.e., the current working directory.
- -l or --limit: The largest size of item sets to consider (corresponds to the maximal number of literals in the resulting rules). Defaults to 2.
- -s or --support: The minimal support for the item sets. Defaults to 0.008 (under the assumption that later only rules with an antecedent support of 0.01 and a confidence of 0.8 are of interest).
- -q or --quiet: If this flag is set, informational output during processing is suppressed.
The script creates an output pickle file containing a dictionary with the following entries:
- itemsets: A dictionary mapping from the size of the item set to a list of frequent item sets of this size (stored as a two-dimensional numpy array).
- supports: A dictionary mapping from the size of the item set to a list of the support values (sorted in the same way as itemsets).
- concepts: A list of all concepts under consideration (original concepts and negated concepts).
- border: Integer specifying the border between positive and negative literals. concepts[:border] contains only positive literals and concepts[border:] contains only negative literals.
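For example, the frequent item sets of size 2 and their support values can be read back as follows (a sketch, assuming that the item sets store indices into the concepts list and using a placeholder file name):

import pickle

with open("path/to/apriori_output.pickle", "rb") as f:  # placeholder file name
    apriori_result = pickle.load(f)

concepts = apriori_result["concepts"]
border = apriori_result["border"]  # concepts[:border] are positive, concepts[border:] negated literals

for item_set, support in zip(apriori_result["itemsets"][2], apriori_result["supports"][2]):
    print([concepts[i] for i in item_set], support)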
Please note that the script apriori.py may take a very long time to run and may consume a large amount of memory, depending on the value of --limit: for --limit 4, we found that 16 GB of memory and 8 hours of runtime did not suffice to compute all frequent item sets. The resulting pickle file is 3.3 GB in size. If you just want a quick & dirty run, we recommend --limit 3, which takes less than 30 minutes and runs fine with 8 GB of main memory.
The script code/preprocessing/extract_rules.py takes the output of apriori.py as an input and extracts association rules from the given frequent item sets. It can be invoked as follows:
python code/preprocessing/extract_rules.py path/to/input_file.pickle
It takes the following optional arguments:
- -o or --output_folder: Path to the folder where the resulting rules should be stored. Defaults to ., i.e., the current working directory.
- -s or --support: The minimal antecedent support for the rules. Defaults to 0.01 (i.e., 1 %). Only rules whose antecedent applies to at least this fraction of the data set are considered.
- -c or --confidence: The minimal confidence for the rules. Defaults to 0.8 (i.e., 80 %). Only rules that reach this minimal confidence are considered.
- -d or --dynamic: If this flag is set, the confidence threshold is dynamically adapted for larger rules: rules with n+1 tokens need a higher confidence than rules with n tokens. The update halves the distance between the current confidence level and 1 (e.g., for a starting value of 0.8, we next need 0.9, then 0.95, then 0.975, ...); see the sketch after this list.
- -q or --quiet: If this flag is set, informational output during processing is suppressed.
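The dynamic confidence adaptation can be summarized with a small sketch (this mirrors the update rule described above, not the script's actual code):

def dynamic_thresholds(start_confidence, max_rule_size):
    # one threshold per rule size, starting with rules of 2 tokens;
    # each additional token halves the distance between the current threshold and 1
    thresholds = [start_confidence]
    for _ in range(max_rule_size - 2):
        thresholds.append(1 - (1 - thresholds[-1]) / 2)
    return thresholds

print(dynamic_thresholds(0.8, 5))  # [0.8, 0.9, 0.95, 0.975] for rules with 2, 3, 4, and 5 tokens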
This script creates multiple CSV files as output. One overall summary file contains information about how many rules of which type have been extracted. For each rule type (specified by the number of literals in the overall rule and the number of literals in the antecedent), an individual file containing all the individual rules of this type (along with their support and confidence) is created.
Literals in the antecedent are connected by a logical conjunction while literals in the consequent are connected with a logical disjunction. Moreover, no negative literals are allowed in the consequent of any rule. If a simple rule (e.g., Musical IMPLIES Music) is part of the rule set, more complex elaborations on it (like Musical AND NOT Horror IMPLIES Music or Musical IMPLIES Music or Horror) are not taken into account in order to keep the rule base small.
From here on, the documentation is not up to date. It will be updated as the refactoring process progresses.
The configuration files contain all LTN hyperparameters as well as the setup of the concrete experiment. This allows us to keep the actual run_ltn.py quite general.
Configuration files look as follows:
[ltn-default]
# number of receptive fields per predicate; default: 5
ltn_layers = 1
# factor to which large weights are penalized; default: 0.0000001
ltn_smooth_factor = 1e-10
# appropriate t-conorm is used to compute disjunction of literals within clauses; options: 'product', 'yager2', 'luk', 'goedel'; default: 'product'
ltn_tnorm = luk
# aggregation across data points when computing validity of a clause; options: 'product', 'mean', 'gmean', 'hmean', 'min'; default: 'min'
ltn_aggregator = min
# optimizing algorithm to use; options: 'ftrl', 'gd', 'ada', 'rmsprop'; default: 'gd'
ltn_optimizer = rmsprop
# aggregate over clauses to define overall satisfiability of KB; options: 'min', 'mean', 'hmean', 'wmean'; default: 'min'
ltn_clauses_aggregator = hmean
# penalty for predicates that are true everywhere; default: 1e-6
ltn_positive_fact_penalty = 1e-5
# initialization of the u vector (determining how close to 0 and 1 the membership values can get); default: 5.0
ltn_norm_of_u = 5.0
[simple]
# only 4 concepts (banana, pear, orange, lemon) with clean data & no rules
concepts_file = data/fruit_space/concepts_simple.txt
features_folder = data/fruit_space/features_simple/
rules_file = data/fruit_space/rules_simple.txt
num_dimensions = 3
max_iter = 1000
The section ltn-default sets the default LTN hyperparameters. Individual sections like simple define the files to use, the size of the space, and the maximal number of iterations for the optimization algorithm. They can also override the LTN hyperparameters set in the ltn-default section by re-defining them.
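For illustration, such a file can be read with Python's standard configparser module; the following sketch merges the default section with an experiment section (run_ltn.py may handle this differently internally):

from configparser import ConfigParser

config = ConfigParser()
config.read("configFile.cfg")

params = dict(config["ltn-default"])  # global LTN hyperparameters
params.update(config["simple"])       # experiment-specific values override the defaults

print(params["ltn_tnorm"])  # 'luk'
print(params["max_iter"])   # '1000' (note that configparser returns strings)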
The input to the LTN training algorithm is given in three types of files.
This is a regular text file which simply lists the different concepts that one would like to learn. Each concept is listed on a single line. Note that all the concepts used in the feature files must also appear in the concepts_file.
In the folder specified by features_folder in the configuration file, three files named training.csv, validation.csv, and test.csv contain the training, validation, and test set, respectively.
Each of them is a csv file without a header where columns are separated by commas. The first num_dimensions columns contain the vector/point and all remaining columns contain the concept labels. There must be at least one label per data point, but there can be arbitrarily many. Different data points can have different numbers of labels. All labels used in this file have to be defined in the concepts_file. Moreover, the number of columns used for the vector must match the num_dimensions setting in the config file.
The following two lines illustrate what the content of such a file should look like:
0.338651517434,0.108252320347,0.240840991761,banana
0.658849789294,0.740900463574,0.289255306485,GrannySmith,apple
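Such a file can be parsed, for instance, with Python's csv module (a sketch assuming num_dimensions = 3 as in the example above; the actual scripts may read the files differently):

import csv

num_dimensions = 3
points, labels = [], []
with open("data/fruit_space/features_simple/training.csv") as f:
    for row in csv.reader(f):
        points.append([float(value) for value in row[:num_dimensions]])  # the point itself
        labels.append(row[num_dimensions:])                              # one or more concept labels

print(points[0], labels[0])  # e.g. [0.338651517434, 0.108252320347, 0.240840991761] ['banana']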
You can use the script tools/split_data.py to split your overall data set into these three parts automatically.
This is a regular text file that contains rules that should be taken into account when learning the concepts. Each rule is written in a separate line and can only involve concepts defined in the concepts_file. Currently, the following types of rules are supported:
- FirstConcept DIFFERENT SecondConcept ensures that there is only little overlap between the two given concepts.
- FirstConcept IMPLIES SecondConcept ensures that the first concept is a subset of the second concept.
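For instance, a rules file might look as follows (purely illustrative; it assumes a concepts_file that defines these labels):

GrannySmith IMPLIES apple
banana DIFFERENT apple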
The label counting baseline can be executed as follows:
python ltn_code/run_counting.py configFile.cfg configName
Here, configFile.cfg is the name of your configuration file and configName is the name of the configuration within that file that you would like to use. From this configuration, only the information about the data set is used (ignoring the feature vectors and taking into account only the label information); all LTN hyperparameters are ignored.
The program calculates the validity of 21 different rule types (1 rule type A != B, 4 rule types in the form of A IMPLIES B with negated and non-negated concepts, 8 rule types in the form of (A AND B) IMPLIES C with negated and non-negated concepts, and 8 rule types in the form of A IMPLIES (B or C) with negated and non-negated concepts) on the three data sets.
Afterwards, for a set of different thresholds (0.7, 0.8, 0.9, 0.95, and 0.99), it removes all rules that have an accuracy of less than this threshold on either the training or the validation set. For the remaining rules, the average and minimum accuracy on the test set are computed.
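The following sketch illustrates the underlying idea for rules of the form A IMPLIES B (a simplified illustration with toy data, not the code of run_counting.py):

import numpy as np

def implication_validity(labels, a, b):
    # fraction of data points labelled with concept a that also carry label b
    antecedent = labels[:, a] == 1
    if not antecedent.any():
        return 0.0
    return float((labels[antecedent, b] == 1).mean())

# toy label matrices: rows are data points, columns are the concepts A and B
train_labels = np.array([[1, 1], [1, 1], [1, 0], [0, 1]])
validation_labels = np.array([[1, 1], [1, 1], [0, 0]])

threshold = 0.6
keep_rule = (implication_validity(train_labels, 0, 1) >= threshold and
             implication_validity(validation_labels, 0, 1) >= threshold)
print(keep_rule)  # True (validity of 0.67 on the training set and 1.0 on the validation set)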
Information on the thresholds, on the average and minimum accuracy on the test set, and on the number of remaining rules is displayed on the console for each rule type individually. Moreover, different output files are created: An overall csv file in the output folder contains the same information as is displayed on the console. In addition, for each combination of rule type and desired threshold, an individual csv file is created in the output/rules folder that contains a list of all the rules extracted under this condition along with their individual performance on the training, validation, and test set.
We compare the classification performance to two simple baselines:
- constant: This baseline always predicts a membership value of 0.5 for all labels and all data points.
- distribution: This baseline computes the frequency of the labels in the data set and uses these frequencies as a prediction for all data points.
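Both baselines can be summarized in a few lines (a sketch with toy data, not the code of run_baseline_classifiers.py):

import numpy as np

# toy training labels: rows are data points, columns are concepts
train_labels = np.array([[1, 0, 1],
                         [0, 0, 1],
                         [1, 1, 1]])
num_test_points = 2

# 'constant': always predict a membership value of 0.5 for every label and data point
constant_prediction = np.full((num_test_points, train_labels.shape[1]), 0.5)

# 'distribution': predict the relative label frequencies observed on the training set
label_frequencies = train_labels.mean(axis=0)  # here: [0.67, 0.33, 1.0]
distribution_prediction = np.tile(label_frequencies, (num_test_points, 1))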
You can execute the baseline script as follows:
python ltn_code/run_baseline_classifiers.py configFile.cfg configName
The evaluation results are displayed on the console and additionally written into a csv file in the output folder.
The kNN baseline can be executed as follows:
python ltn_code/run_knn.py configFile.cfg configName k
Here, configFile.cfg is the name of your configuration file and configName is the name of the configuration within that file that you would like to use. From this configuration, only the information about the data set is used; all LTN hyperparameters are ignored. Finally, the parameter k indicates the number of neighbors to use in the classification.
The program trains a kNN classifier on the training set and evaluates it on the validation set. In the end, some evaluation metrics for both the training and the validation set are printed out. In addition, all the evaluation information displayed on the console is also written into a csv file in the output folder.
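Conceptually, this baseline corresponds to a multi-label kNN classifier as provided by sklearn (a sketch with toy data; run_knn.py may differ in its details):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# toy data: three-dimensional points with binary labels for two concepts
train_points = np.array([[0.1, 0.2, 0.3], [0.8, 0.7, 0.9], [0.2, 0.1, 0.4]])
train_labels = np.array([[1, 0], [0, 1], [1, 0]])
validation_points = np.array([[0.15, 0.2, 0.35]])

knn = KNeighborsClassifier(n_neighbors=1)  # the parameter k from the command line
knn.fit(train_points, train_labels)        # sklearn supports multi-label targets here
print(knn.predict(validation_points))      # [[1 0]]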
When you've set up your input files and your configuration file, you can execute the LTN as follows:
python ltn_code/run_ltn.py configFile.cfg configName
The program trains an LTN on the training set and evaluates it on the validation and the test set. Evaluation results are stored in a csv file in the output folder.
It takes the following arguments:
- configFile.cfg is the name of your configuration file to use.
- configName is the name of the configuration within that file that you would like to use (i.e., the specific experiment you would like to run).
- -t or --type: If this flag is set, the type of membership function to use is overwritten by the type given immediately after this flag.
- -p or --plot: If this flag is set and your data has two or three dimensions, colored scatter plots are generated to illustrate the location of the learned concepts in the overall space.
- -q or --quiet: By default, the script prints the current satisfiability and the evaluation results on the terminal. If this flag is set, the output is reduced to a minimum.
- -e or --early: If this flag is set, the LTN stops training after reaching a satisfiability of 0.99. Otherwise it continues training until the number of epochs specified in the configuration is reached.
- -r or --rules: If this flag is set, the LTN tries to extract rules from the learned membership functions in each evaluation step.
We have programmed a script to automatically analyze which hyperparameter configurations perform best on the training or validation set. It is applicable to both the kNN and the LTN classifications. This script can be executed as follows:
python tools/find_optimal_params.py input_csv_file data_set_to_analyze
Here, input_csv_file is the path to the csv output file created either by run_knn.py or by the compress_results.py script run on the output of run_ltn.py, and data_set_to_analyze should be set to either training or validation.
This script selects a subset of hyperparameter configurations and outputs them (together with their associated evaluation metric values) in a csv file located in the same directory as input_csv_file, using the same basic file name but with the data_set_to_analyze appended. So if you call the script like python tools/find_optimal_params.py output/grid_search-LTN.csv validation, the results will be stored in output/grid_search-LTN_validation.csv.
The hyperparameter configurations are selected in two ways:
- For each evaluation metric, the hyperparameter configuration achieving the optimal performance with respect to this metric is chosen.
- Moreover, the script searches for hyperparameter configurations that achieve a good performance with respect to multiple metrics (measured by membership in the top 1, 2, 3, and 5 percent). A configuration gets 4 points for being in the top 1 %, 3 points for being in the top 2 %, and so on, for each of the metrics. The 1 % of configurations with the highest total score are chosen (at most 20 configurations, in order to keep the resulting spreadsheet clean).
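One possible reading of this scoring scheme is sketched below (assuming that higher metric values are better and that the percentile bands are counted cumulatively; the actual script may differ in these details):

import numpy as np

def configuration_scores(metric_values):
    # metric_values: array of shape (num_configurations, num_metrics)
    scores = np.zeros(metric_values.shape[0])
    for points, top_percent in zip([4, 3, 2, 1], [1, 2, 3, 5]):
        cutoffs = np.percentile(metric_values, 100 - top_percent, axis=0)
        scores += points * (metric_values >= cutoffs).sum(axis=1)
    return scores

metric_values = np.random.rand(200, 4)  # e.g. 200 configurations and 4 metrics
best = np.argsort(configuration_scores(metric_values))[::-1][:20]  # at most 20 are kept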
In addition to the selected configurations, the output file also contains a row BEST which contains the best value achieved by any configuration for each of the metrics, and a row WORST which records the worst value observed for each of the metrics.
In order to visualize the distribution of LTN performance with respect to all metrics, you can use the script plot_performance_distribution.py:
python tools/plot_performance_distribution.py input_csv_file
The parameter input_csv_file should contain one row for each of the different configurations. If configurations were run multiple times, this should thus be the averaged results (i.e., the output file of compress_results.py). The script collects all the values achieved for all metrics and creates some plots visualizing them: a histogram with 21 bins, a line graph, and a scatter plot. For the line graph and the scatter plot, the values are first sorted; the x-axis is just the index in the sorted list and the y-axis gives the respective performance value. The script takes the following optional arguments:
- -o or --output_folder: The folder where the plots are stored. Defaults to ., i.e., the current working directory.
- -d or --data_set: Defines the data set to analyze. By default, validation is used.
- -p or --percentage: Fraction of data points to plot. Defaults to 0.1, which means that only the top 10 % of the data points are plotted (in order to make differences more visible).
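The kind of plots described above can be reproduced with matplotlib along the following lines (a sketch with random placeholder values; the actual script reads the metric values from the given csv file):

import numpy as np
import matplotlib.pyplot as plt

values = np.random.rand(500)  # placeholder for the values of one metric on one data set
top_fraction = 0.1            # corresponds to the --percentage argument
sorted_values = np.sort(values)[-int(top_fraction * len(values)):]

fig, (left, right) = plt.subplots(1, 2)
left.hist(values, bins=21)                            # histogram with 21 bins
right.plot(range(len(sorted_values)), sorted_values)  # x-axis: index in the sorted list
right.set_ylabel("performance value")
fig.savefig("performance_distribution.png")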