
Backup repository for philosophy of perception project


minimalparts/Perception


Work in progress - current README

Data

The annotation folder contains four subdirectories for the various corpora used in this study:

  • British National Corpus (BNC)
  • ACPROSE (Academic section of the BNC -- all texts marked with <acprose> in the XML)
  • PHILO (Philosophy of perception corpus)
  • Stanford Encyclopedia of Philosophy (SEP)

The study focuses specifically on the lexical entries see and aware. Usage samples from BNC, PHILO and SEP have been manually annotated to distinguish perceptual from non-perceptual usages. The directory contains the normalised annotations for those three corpora. For comparison purposes, 1500 random sentences from ACPROSE are included in the same folder, but those were annotated automatically (see below).

In addition to the annotation, the directory contains the contextualised vectors for see and aware, as obtained through BERT Base, for each annotated sentence. The images below show the class distribution for each of the annotated corpora, after reducing the BERT vectors to 2D with PCA.

Below, we also show the class distribution for the automatically annotated ACPROSE, using the BNC background model (using SEP as the background model instead makes very little difference).

Training

The training directory contains code to train a perceptual vs non-perceptual classifier, using as input the BERT vectors extracted from the data. The classifier can only be trained on corpora for which we have annotation, i.e. BNC, PHILO and SEP. The classifier is a simple MLP with two hidden layers and ReLU activation, with softmax on the output layer.
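As a rough illustration of the architecture described above, the following pure-Python sketch runs a forward pass through an MLP with two hidden layers, ReLU activations and a softmax output. It is not the repository's training code: the layer sizes, the random initialisation and all function names are assumptions made for the example, and the toy input dimension stands in for the 768-dimensional BERT Base vectors.

```python
import math
import random


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_mlp(n_in, n_hidden, n_classes, seed=0):
    """Two hidden layers of equal size plus an output layer (sizes are assumptions)."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_hidden, rng),
            init_layer(n_hidden, n_hidden, rng),
            init_layer(n_hidden, n_classes, rng)]


def forward(mlp, x):
    """Return class probabilities (perceptual vs non-perceptual) for one vector."""
    (w1, b1), (w2, b2), (w3, b3) = mlp
    h1 = relu(linear(x, w1, b1))
    h2 = relu(linear(h1, w2, b2))
    return softmax(linear(h2, w3, b3))


# Toy dimensions; a real BERT Base vector has 768 components.
mlp = make_mlp(n_in=8, n_hidden=4, n_classes=2)
probs = forward(mlp, [0.5] * 8)
```

The two values in `probs` sum to one and can be read as the predicted probabilities of the two classes.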

The training regime is as follows. For a given dataset, we first retain 200 instances for the optimisation of the model. Tuning is performed using optimise.py, which relies on Bayesian Optimisation (BayesOpt). BayesOpt is run for 200 iterations before returning the best set of hyperparameters.

python3 optimise.py BNC see
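The hold-out step described above could look roughly like the sketch below. How optimise.py actually selects the 200 tuning instances is not stated here, so the random shuffle, the seed and the function name `split_tuning_set` are assumptions for illustration.

```python
import random


def split_tuning_set(instances, n_tuning=200, seed=0):
    """Shuffle a copy of the data and split it into (tuning set, remaining data)."""
    if len(instances) < n_tuning:
        raise ValueError("not enough instances to hold out a tuning set")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # assumption: random selection
    return shuffled[:n_tuning], shuffled[n_tuning:]


# Hypothetical toy data: (sentence id, label) pairs.
data = [(f"sentence_{i}", i % 2) for i in range(1000)]
tuning, rest = split_tuning_set(data)
```

Keeping the tuning instances out of `rest` means the hyperparameters are never selected on data that is later used for cross-validation.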

The best hyperparameters can be printed in a user-friendly way using the following command over the json file generated in the relevant directory:

python3 read_json.py <path to json file>

Once the system is tuned on a dataset, we perform 5-fold cross-validation on the rest of the data. E.g.

python3 classify.py --file=BNC/BNC_see_kfold_features.txt --lr=0.01 --batch=46 --epochs=50 --hidden=323 --wdecay=0.01
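For readers unfamiliar with the procedure, here is a minimal sketch of 5-fold cross-validation over item indices. It does not reproduce classify.py: the contiguous fold layout and the names `kfold_indices`, `cross_validate` and `evaluate_fold` are hypothetical, and the evaluation callback is a placeholder for training and testing the classifier on one fold.

```python
def kfold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def cross_validate(n_items, evaluate_fold, k=5):
    """Average the accuracy returned by `evaluate_fold(train, test)` over k folds."""
    scores = [evaluate_fold(train, test) for train, test in kfold_indices(n_items, k)]
    return sum(scores) / len(scores)


# Example: 12 items split into 5 folds.
folds = list(kfold_indices(12, k=5))
```

Each item appears in exactly one test fold, so the averaged accuracy covers all the data that was not held out for tuning.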

There is CUDA support for running on GPU.

Results for see are as follows (accuracy averaged over 5 folds):

| BNC | SEP | PHILO |
| --- | --- | --- |
| 90% | 98% | 96% |

Results for aware are as follows (accuracy averaged over 5 folds):

| BNC | SEP | PHILO |
| --- | --- | --- |
| 90% | 98% | 92% |

Similarity between corpora

Cross-domain classification

We first check how a model trained on one corpus fares on the other corpora. Results for see are as follows:

|  | BNC | SEP | PHILO |
| --- | --- | --- | --- |
| Baseline | 71% | 59% | 60% |
| BNC | - | 96% | 83% |
| SEP | 87% | - | 94% |
| PHILO | 81% | 97% | - |

Results for aware are as follows:

|  | BNC | SEP | PHILO |
| --- | --- | --- | --- |
| Baseline | 79% | 60% | 91% |
| BNC | - | 85% | 88% |
| SEP | 89% | - | 92% |
| PHILO | 75% | 80% | - |
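A cross-domain evaluation of this kind can be organised as a loop over all ordered pairs of distinct corpora. The sketch below only shows that bookkeeping: `train_fn` and `eval_fn` are hypothetical callables standing in for the repository's training and evaluation code, and the dummy versions exist purely so the example runs.

```python
def cross_domain_accuracy(corpora, train_fn, eval_fn):
    """Return {training corpus: {test corpus: accuracy}} for all pairs of distinct corpora."""
    results = {}
    for source in corpora:
        model = train_fn(source)  # train once per source corpus
        results[source] = {target: eval_fn(model, target)
                           for target in corpora if target != source}
    return results


# Dummy stand-ins (assumptions): the "model" is just the corpus name
# and the "accuracy" is a fixed placeholder value.
def dummy_train(corpus):
    return corpus


def dummy_eval(model, corpus):
    return 0.0


matrix = cross_domain_accuracy(["BNC", "SEP", "PHILO"], dummy_train, dummy_eval)
```

The diagonal is left out because in-domain performance is already covered by the cross-validation results.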

N-gram distributions

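We also inspect the most frequent n-grams around each target word in each corpus (with n=3). Rows marked (before) list n-grams ending in the target word; rows marked (after) list n-grams starting with it. Each n-gram is followed by its frequency.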

see (before)

| BNC | ACPROSE | SEP | PHILO |
| --- | --- | --- | --- |
| i saw (11) | as we have seen (40) | as we have seen (43) | as we have seen (78) |
| she saw (8) | as we shall see (24) | is hard to see (19) | as we shall see (40) |
| he saw (8) | we have seen (13) | to see (19) | we have already seen (33) |
| i can see (4) | we have already seen (11) | as we shall see (14) | we have seen (26) |
| i do n't see (4) | it can be seen (9) | we have seen (10) | i do not see (25) |
| i ca n't see (4) | as can be seen (7) | as we will see (8) | we saw (21) |
| you want to see (4) | can also be seen (7) | we see (8) | as i can see (21) |
| to come and see (3) | , we can see (7) | as we saw (8) | is difficult to see (20) |
| i 've just seen (3) | is difficult to see (5) | in order to see (7) | that i am seeing (20) |
| i 've never seen (3) | remains to be seen (5) | we have already seen (7) | , as we saw (20) |

see (after)

| BNC | ACPROSE | SEP | PHILO |
| --- | --- | --- | --- |
| see . (13) | see (19) | see the entry on (35) | seen (28) |
| see me . (7) | see , for example (8) | see , e . (15) | see (23) |
| see you . (7) | see below (7) | see e . (11) | see the same flash (14) |
| see it . (4) | see chapter 5 (5) | see , for example (11) | sees that a is (12) |
| see ? (3) | seen as (4) | see section 2 . (9) | see that it is (11) |
| see him . (3) | see above , p. (4) | see section 3 . (6) | seeing (10) |
| see it ? (3) | see , however , (3) | see other internet resources (5) | seen , it is (8) |
| see it as a (3) | seen to be a (3) | see the entries on (5) | seen in virtue of (7) |
| see them . (3) | see above (3) | see section 5 . (5) | see it (6) |
| see (3) | see chapter 9 (3) | see below ) , (5) | seen that it is (6) |

aware (before)

| BNC | ACPROSE | SEP | PHILO |
| --- | --- | --- | --- |
| need to be aware (14) | need to be aware (19) | one is directly aware (23) | we are directly aware (13) |
| she was aware (12) | important to be aware (12) | we are directly aware (17) | that we are aware (11) |
| he was aware (8) | we are not aware (8) | we are not aware (15) | hallucinating subject is aware (9) |
| i am aware (8) | needs to be aware (7) | that we are aware (12) | what we are aware (9) |
| we are not aware (6) | may not be aware (6) | we are immediately aware (11) | we are immediately aware (8) |
| should be made aware (6) | we are aware (6) | that he was aware (6) | are not directly aware (6) |
| you should be aware (5) | to be more aware (5) | that i am aware (6) | , we are aware (6) |
| be aware (4) | necessary to be aware (5) | can be directly aware (6) | that i am aware (5) |
| to be fully aware (4) | he was aware (5) | i am directly aware (5) | i am immediately aware (5) |
| being aware (4) | being aware (5) | which we are aware (5) | we are not aware (4) |

aware (after)

| BNC | ACPROSE | SEP | PHILO |
| --- | --- | --- | --- |
| aware of it. (19) | aware of the need (23) | aware of the fact (30) | aware of an object (11) |
| aware of the need (16) | aware of (22) | aware of . (20) | aware of (9) |
| aware of the fact (12) | aware of the fact (18) | aware of it . (13) | aware of the same (7) |
| aware of that. (12) | aware of the (16) | aware of it , (13) | aware of material things (6) |
| aware of the dangers (10) | aware of the nature (9) | aware of , and (11) | aware of them (5) |
| aware of it , (10) | aware of the problem (9) | aware of them , (10) | aware of something that (5) |
| aware of the problem (9) | aware of the importance (8) | aware of one 's (9) | aware of something (5) |
| aware of this and (8) | aware of the presence (7) | aware of this , (8) | aware of them , (5) |
| aware of this. (8) | aware (7) | aware of the limitations (8) | aware of it (5) |
| aware of the problems (8) | aware of the limitations (7) | aware of the problem (8) | aware of a non-normal (5) |
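Counts like those in the n-gram tables can be gathered with a few lines of Python. The sketch below is a guess at the procedure, not the code used for the study: it assumes whitespace-tokenised, lower-cased sentences, a window of up to n context tokens on either side of the target (shorter at sentence boundaries), and a hand-written set of inflected forms. The function name `context_ngrams` is hypothetical.

```python
from collections import Counter


def context_ngrams(sentences, targets, n=3):
    """Count n-grams ending in (before) and starting with (after) a target form."""
    before, after = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: pre-tokenised text
        for i, token in enumerate(tokens):
            if token not in targets:
                continue
            # Up to n tokens of left context plus the target itself.
            before[" ".join(tokens[max(0, i - n):i + 1])] += 1
            # The target plus up to n tokens of right context.
            after[" ".join(tokens[i:i + n + 1])] += 1
    return before, after


sentences = [
    "As we have seen , this is hard .",
    "As we have seen , it works .",
    "I saw it .",
]
before, after = context_ngrams(
    sentences, targets={"see", "sees", "saw", "seen", "seeing"})
```

Calling `before.most_common(10)` or `after.most_common(10)` then gives the ten most frequent contexts for one corpus.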
