Analysing productivity of affixes with BNC & Morphoquantics data

suomela/bnc-affix

bnc-affix

Analysing the productivity of affixes with BNC and Morphoquantics data.

These scripts are used to produce one of the data sets here: https://github.com/suomela/types-examples

General setup

We expect that types2 can be found at ../types. To set it up:

cd ..
git clone https://github.com/suomela/types.git
cd types
./config
make
cd -

For more information, see https://jukkasuomela.fi/types2/

We also assume that the BNC metadata database can be found at

../bnc-metadata-output/bnc.db

For more information, see https://github.com/suomela/bnc-metadata
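
The two path assumptions above can be verified before running anything. The following is a minimal sketch, not part of the repository: the function name `check_setup` is hypothetical, and it only tests that `../types` and `../bnc-metadata-output/bnc.db` exist relative to a given directory. It does not check that types2 was actually built.

```shell
#!/bin/sh
# check_setup ROOT (hypothetical helper, not shipped with bnc-affix):
# verify that the sibling paths this README expects exist relative to ROOT.
# Returns 0 if both are present, 1 otherwise.
check_setup() {
    root=${1:-.}
    status=0
    # types2 is expected in a sibling directory called "types"
    if [ ! -d "$root/../types" ]; then
        echo "missing directory: $root/../types" >&2
        status=1
    fi
    # the BNC metadata database produced by bnc-metadata
    if [ ! -f "$root/../bnc-metadata-output/bnc.db" ]; then
        echo "missing file: $root/../bnc-metadata-output/bnc.db" >&2
        status=1
    fi
    return $status
}
```

With the function loaded, `check_setup .` run from the bnc-affix directory prints one line per missing path to standard error and returns a non-zero status if anything is absent.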

Input data

We will need input data from two sources:

BNC search results are stored in the following locations:

*/input/bnc/*.txt

Morphoquantics data is stored in the following locations:

*/input/morphoquantics/*.txt

We use the following suffixes in our studies:

  • er: Suffixes "-er" (noun), "-or" (noun)

  • adverb: Suffixes "-ly" (adverb), "-wise" (adverb)

For these studies, BNC search results are stored in the following locations. We follow the convention that the files are named after the search term:

er/input/bnc/er.txt
er/input/bnc/or.txt
adverb/input/bnc/ly.txt
adverb/input/bnc/wise.txt

Morphoquantics files are stored in the following locations. We follow the naming convention used by Morphoquantics:

er/input/morphoquantics/_er_sup1.txt
er/input/morphoquantics/_er_sup2.txt
er/input/morphoquantics/_er_sup3.txt
er/input/morphoquantics/_er_sup4.txt
er/input/morphoquantics/_or_sup1.txt
er/input/morphoquantics/_or_sup2.txt
adverb/input/morphoquantics/_ly_sup2.txt
adverb/input/morphoquantics/_wise.txt

To obtain the BNC search results, run a search for e.g. *er in spoken texts, and download the results using the settings shown in docs/bncweb_downloadsettings.png.
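
Since the analysis depends on every input file being in place, it may help to check the layout first. The sketch below is an illustration only: the function names `expected_inputs` and `missing_inputs` are hypothetical, and the file list is copied from the paths given above.

```shell
#!/bin/sh
# expected_inputs (hypothetical helper): print the input files listed
# in this README, one path per line, relative to the repository root.
expected_inputs() {
    cat <<'EOF'
er/input/bnc/er.txt
er/input/bnc/or.txt
adverb/input/bnc/ly.txt
adverb/input/bnc/wise.txt
er/input/morphoquantics/_er_sup1.txt
er/input/morphoquantics/_er_sup2.txt
er/input/morphoquantics/_er_sup3.txt
er/input/morphoquantics/_er_sup4.txt
er/input/morphoquantics/_or_sup1.txt
er/input/morphoquantics/_or_sup2.txt
adverb/input/morphoquantics/_ly_sup2.txt
adverb/input/morphoquantics/_wise.txt
EOF
}

# missing_inputs ROOT (hypothetical helper): print every expected file
# that does not exist under ROOT. Prints nothing if all are present.
missing_inputs() {
    root=${1:-.}
    expected_inputs | while IFS= read -r f; do
        [ -f "$root/$f" ] || echo "$f"
    done
}
```

Running `missing_inputs .` from the repository root lists the files that still need to be downloaded; empty output means the layout matches the one described here.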

Usage

Once all files are in place, you can run everything as follows:

./do-all.sh

To get approximate results faster, try:

./do-all.sh --quick
