# Morphology in NLP

Morphology is the domain of linguistics that analyzes the internal structure of the word.

In the classical approach to linguistics, words are formed of <span style="background-color: #93997a">morphemes</span>, the minimal (i.e. non-decomposable) linguistic units that carry meaning.

Words are not the smallest unit of language. They are composed of smaller internal units, the morphemes.
A word is like a molecule in chemistry: it is a whole, but it can be split into atoms, and morphemes are the atoms that make it up.

Example: <span style="background-color: #aacc1d">mis</span>-understand-<span style="background-color: #aacc1d">ing</span>-<span style="background-color: #aacc1d">s</span>
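The segmentation above can be sketched in a few lines of Python. This is only a toy illustration: the affix inventories `PREFIXES` and `SUFFIXES` and the function `segment` are hypothetical names chosen for this example, and a real morphological analyzer needs a lexicon and many more rules.

```python
# Hypothetical affix inventories, just large enough for this illustration.
PREFIXES = ("mis", "re", "pre")
SUFFIXES = ("s", "ing", "ness", "ed")
MIN_ROOT = 3  # never shrink the remaining root below three letters


def segment(word):
    """Greedily strip known prefixes, then known suffixes, around a root."""
    prefixes, suffixes = [], []
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= MIN_ROOT:
                prefixes.append(p)
                word = word[len(p):]
                stripped = True
                break
    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
                suffixes.insert(0, s)  # outermost suffix is found first
                word = word[:-len(s)]
                stripped = True
                break
    return prefixes + [word] + suffixes


print("-".join(segment("misunderstandings")))  # mis-understand-ing-s
```

Greedy stripping of this kind over-segments easily (adding a prefix such as "un" would wrongly split "understand"), which is one reason practical systems check candidate roots against a dictionary.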


# Why Morphology?

Many language processing applications need to extract the information encoded in words.

  - Parsers, which analyze sentence structure, need to know or check the agreement between:
  
    -> Subjects and verbs
    -> Adjectives and nouns
    
  - Information retrieval systems benefit from knowing the stem of a word.
  - Machine translation systems need to analyze words into their components and generate words with specific features in the target language.
  
# Morphological Processes

Three morphological processes, used in many languages, can be distinguished:

- Inflection
- Derivation
- Compounding

<center><h2>Inflection</h2></center>

Inflection covers declension and conjugation (changes in number, gender, tense, person, mood, and case). It does not change the part of speech (POS) of a word.

Horse: horse<span style="color: red">s</span>, Eat: eat<span style="color: red">ing</span>, Like: like<span style="color: red">d</span>
    
Inflection does not induce a change of grammatical category. The various words linked by inflection (the inflected forms) are represented through lemmatization by a single form, the lemma, which corresponds to the infinitive for verbs, the singular for nouns, and, in languages that mark gender, the masculine singular for adjectives.

<img src=".\Images\26.png">
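As a rough sketch of how inflected forms can be mapped to a lemma, the snippet below combines a small exception table with suffix stripping. The table `IRREGULAR`, the suffix list, and the function `lemma` are assumptions made for this illustration, not a real lemmatizer.

```python
# Hypothetical exception table for irregular forms.
IRREGULAR = {"feet": "foot", "teeth": "tooth", "sang": "sing", "sung": "sing"}
# Inflectional suffixes tried in this order.
SUFFIXES = ("ing", "ed", "s")


def lemma(word):
    """Return a naive lemma: exceptions first, then suffix stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three letters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


for form in ["horses", "eating", "walked", "sang", "feet"]:
    print(form, "->", lemma(form))
```

Pure suffix stripping is fragile: this sketch turns "liked" into "lik" rather than "like", which is why practical lemmatizers rely on a dictionary of known lemmas in addition to rules.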

<center><h2>Derivation</h2></center> 

Derivation forms new words by adding affixes to a root. Derivational morphology usually produces a new word with a different part-of-speech (POS) category.

Example: making a verb from a noun.

The new word is said to be derived from the old word, as in happy (Adj) --> happi+ness (Noun), or nation/national/nationalise/nationalist/nationalism.

In <span style="color: red">French</span>, for instance, we can distinguish three derivational operations:

- Derivation by prefixation (prefix + root), e.g. précancer = pré + cancer
- Derivation by suffixation (root + suffix), e.g. cancéreux = cancer + eux
- Parasynthetic formation (prefix + root + suffix), e.g. intraveineuse = intra + veine + euse

<center><h2>Compounding (Composition)</h2></center>

Compounding combines two or more bases to form a new word, for example by joining one free morpheme to another (e.g. blackboard, workflow, overflow, underflow).

# Morphemes 

__1.Concatenative morphology__

Each word consists of several morphemes placed next to each other.

- Roots
The central morpheme of a word, which carries its core meaning.

- Affixes

i). Prefixes 
<span style="background-color: #aacc1d">pre</span>-natural,
<span style="background-color: #aacc1d">ir</span>-regular

ii). Suffixes
determin-<span style="background-color: #aacc1d">ize</span>,
iterat-<span style="background-color: #aacc1d">or</span>

iii). Infixes
five-<span style="background-color: #aacc1d">bleep</span>-mile

iv). Circumfixes
<span style="background-color: #aacc1d">ge</span>-sammel-<span style="background-color: #aacc1d">t</span>


__2.Nonconcatenative morphology__
- Umlaut

foot:feet
tooth:teeth

- Ablaut (vowel change in verb forms)

sing-sang-sung
read-read-read

- Root and pattern morphology or templatic morphology 

Common in Arabic, Hebrew, and other Afroasiatic languages.
Consonants form the root, and vowels are then inserted between them according to a pattern.

- Infixation

Gr-<span style="background-color: #aacc1d">um</span>-adwet
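Root-and-pattern morphology, described in the list above, can be illustrated with a short sketch. The function name `interdigitate` and the template notation (`C` marks a consonant slot) are conventions assumed for this example; the Arabic forms are given in a simplified transliteration.

```python
def interdigitate(root, pattern):
    """Fill each 'C' slot of the pattern with the next consonant of the root."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


root = "ktb"  # Arabic consonantal root associated with writing

print(interdigitate(root, "CaCaCa"))   # kataba  ("he wrote")
print(interdigitate(root, "CiCaaC"))   # kitaab  ("book")
print(interdigitate(root, "maCCaCa"))  # maktaba ("library")
```

The same three consonants appear in every form, while the vowel pattern and any extra affix material change the meaning.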

## Words

There are two main ways for morphemes to form words: inflection and derivation.

### Inflectional morphology

Inflection adds context-sensitive information to words, which generally keep the same core meaning. Examples include the number of a noun, the third-person singular form of a verb, and case marking that reflects a word's role as subject or object in the sentence.

For example: Number (singular versus plural) ---> automaton: automata, walk: walks

Case (nominative versus accusative versus ...) ---> he: him: his ...

### Derivational morphology

Derivation combines words with affixes to form new words. These words generally have different semantics.
For example: parse: parser, repulse: repulsive

### Irregularity

Inflected forms are generally related to their roots in a predictable way, but not always. Likewise, the same derivational morpheme may have different meanings and functions depending on the root to which it is attached.

- Formal irregularity

walk: walked: walked, sing: sang: sung

- Semantic irregularity / unpredictability

a king-ly old man (correct use), a slow-ly old man (incorrect use)



# Ambiguity in Natural language processing

We can define ambiguity as the capacity of an expression to have more than one meaning or to be understood in more than one way. Natural languages are ambiguous, so computers cannot understand language the way people do. Natural language processing (NLP) is concerned with developing computational models of aspects of human language processing. In NLP, ambiguity can occur at various levels: lexical, syntactic, semantic, pragmatic, and so on.

As we already know, NLP is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. Text-based NLP is regarded as consisting of various levels:

<span style="color: red">Lexical Analysis:</span> Analysis of word forms

<span style="color: red">Syntactic Analysis:</span> Structure processing

<span style="color: red">Semantic Analysis:</span> Meaning representation

<span style="color: red">Discourse Analysis:</span> Processing of interrelated sentences

<span style="color: red">Pragmatic Analysis:</span> The purposeful use of sentences in situations

Ambiguity can occur at all of these levels. Ambiguity is a property of linguistic expressions: if an expression (word, phrase, or sentence) has more than one interpretation, we call it ambiguous.
Let's understand it through an example: "The chicken is ready to eat"

This sentence can mean that the chicken (bird) is ready to be fed, or that the chicken (food) is ready to be eaten.

"There was not a single man at the party"

This sentence can mean a lack of bachelors at the party or a lack of men altogether.

## Different types of Ambiguity

There are different types of ambiguities:

<span style="color: red">1.Lexical Ambiguity</span>: This is the ambiguity of a single word. A word can be ambiguous with respect to its syntactic class, e.g. books, study.
For example, the word silver can be used as a noun, an adjective, or a verb:
She won two silver medals (noun). He made a silver speech (adjective). His worries had silvered his hair (verb).
Lexical ambiguity can be resolved by lexical category disambiguation, i.e. part-of-speech (POS) tagging. Because many words may belong to more than one lexical category, POS tagging assigns a part of speech or lexical category such as noun, verb, pronoun, preposition, adverb, or adjective to every word in the sentence.

<span style="color: red">1.1 Lexical semantic ambiguity</span>: This type of lexical ambiguity occurs when a single word can be associated with multiple senses, e.g. bank, cricket, bat, fast.

For example: "The bank of a river" and "All banks are closed because of the national holiday."
The occurrences of bank in both sentences belong to the syntactic category noun, but their meanings are different. Lexical semantic ambiguity can be resolved using word sense disambiguation (WSD) techniques, which aim to assign the meaning of a word in context automatically, in a computational manner.
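One simple family of WSD techniques picks the sense whose dictionary signature shares the most words with the context (a simplified Lesk-style overlap). The sketch below applies this idea to the two "bank" sentences. The sense inventory `SENSES`, its glosses, and the stopword list are invented for this illustration and do not come from a real dictionary.

```python
import re

# Hypothetical stopword list and sense signatures (gloss plus example words).
STOPWORDS = {"the", "a", "an", "of", "is", "are", "on", "and", "that",
             "all", "because", "or", "to"}
SENSES = {
    "river_bank": "sloping land beside a river or a lake",
    "financial_bank": "institution that accepts deposits and lends money; "
                      "branches are closed on a national holiday",
}


def content_words(text):
    """Lowercase the text and keep the set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses=SENSES):
    """Return the sense whose signature overlaps most with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


print(simplified_lesk("The bank of a river"))
print(simplified_lesk("All banks are closed because of the national holiday"))
```

With these toy signatures the first sentence selects `river_bank` and the second `financial_bank`. On real text the quality of the result depends heavily on the glosses available and on normalizing word forms.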

<span style="color: red">2.Syntactic ambiguity</span> : Structural ambiguities are syntactic ambiguities.
There are two kinds of syntactic ambiguity: scope ambiguity and attachment ambiguity.

<span style="color: red">2.1.Scope ambiguity</span> : Scope ambiguity involves operators and quantifiers.
Let's make this clear with an example:

"Old men and women are getting a special treatment."

The scope of the adjective (i.e. how much it qualifies) is ambiguous: is the structure (old (men and women)) or ((old men) and women)?

The scope of quantifiers is also often unclear, which creates ambiguity.

"Behind every successful man there is a strong woman."

One reading is that each successful man has his own strong woman behind him; the other is that a single strong woman stands behind every successful man.

<span style="color: red">2.2.Attachment ambiguity</span> : A sentence has attachment ambiguity if a constituent fits more than one position in a parse tree. Attachment ambiguity arises from uncertainty about which part of the sentence a phrase or clause attaches to.
Let's get a better understanding through examples.

"The man saw the girl with the telescope."
In this example, it is ambiguous whether the man saw a girl who was carrying a telescope or saw her through the telescope.
The meaning depends on whether the prepositional phrase 'with the telescope' attaches to 'the girl' or to the verb 'saw'.

"Buy books for children"
The prepositional phrase 'for children' can be adverbial and attach to the verb buy, or adjectival and attach to the object noun books.
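Attachment ambiguity shows up concretely as more than one parse tree for the same sentence. The following sketch counts parses of the telescope sentence with a CKY-style chart. The toy grammar (`BINARY`), the lexicon, and the function `count_parses` are assumptions made for this illustration and cover only this example.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {"the": "Det", "man": "N", "girl": "N", "telescope": "N",
           "saw": "V", "with": "P"}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` under the toy grammar."""
    n = len(words)
    # chart[i, j, X] = number of ways category X derives words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        chart[i, i + 1, LEXICON[word]] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


print(count_parses("the man saw the girl with the telescope".split()))  # 2
print(count_parses("the man saw the girl".split()))                     # 1
```

The two parses of the first sentence correspond to the two attachments of "with the telescope": to the verb phrase (seeing through the telescope) or to the noun phrase (the girl who has the telescope).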

<span style="color: red">3.Semantic ambiguity</span> : This ambiguity occurs when the meaning of the words themselves can be misinterpreted. Even after the syntax and the meanings of the individual words have been resolved, there are two ways of reading the sentence.
For example: "Tarun loves his mother and Meghna does too."
This can be interpreted as Meghna loves Tarun's mother or Meghna loves her own mother.
Semantic ambiguity arises because a computer generally cannot decide which logical form is the intended one.

<span style="color: red">4.Discourse</span> : Discourse-level processing needs shared knowledge of the world, and interpretation is carried out using this context. Anaphoric ambiguity falls under the discourse level.

<span style="color: red">4.1.Anaphoric ambiguity</span> : Anaphors are expressions that refer back to entities introduced earlier in the discourse.
Let's understand it through an example.
"The donkey ran up the hill. It was too steep. It soon got tired."
The anaphoric reference of "it" creates the ambiguity in this situation.
Steep applies to surfaces, so the first 'it' can be the hill. Tired applies to animate beings, so the second 'it' can be the donkey.

<span style="color: red">5.Pragmatic Ambiguity</span> : This refers to the situation where the context of a phrase allows multiple interpretations. It is among the hardest tasks in NLP, because it involves processing the user's intention, sentiment, and beliefs.

Let's get a better understanding through an example:
Customer to waiter: Go to my room and check whether my shoe is there; don't be late; I have to catch a train in 10 minutes.
Waiter (comes back panting heavily): Yes, it is there.
Clearly, the waiter falls short of the customer's expectation: he did not understand the pragmatics of the situation.

Pragmatic ambiguity arises when a statement is not specific and the context does not provide the information needed to clarify it.
For example:
"I love you too."
This could be interpreted as
I love you (just like you love me)
I love you (just like someone else does)

# Syntax & Structure in NLP

For any language, syntax and structure go hand in hand: a set of specific rules, conventions, and principles governs the way words are combined into phrases, phrases into clauses, and clauses into sentences. This section deals mainly with the syntax and structure of English. In English, words are usually combined to form other constituent units. The constituents include words, phrases, clauses, and sentences. Consider the sentence ***"Six sick hicks nick six slick bricks with picks and sticks."*** It is made of a bunch of words, and just looking at the words by themselves does not tell us much.

<img src=".\Images\27.png">
<center>A box of unordered words, which by themselves do not convey any information.</center>

Understanding the structure and syntax of a language is helpful in many areas such as text processing, text annotation, and text parsing for further operations such as text summarization or classification. Typical parsing techniques for understanding text syntax are listed below:

- Parts-of-speech (POS) tagging
- Shallow parsing or chunking
- Constituency parsing
- Dependency parsing

Each is described briefly in the sections below.

## Tagging Parts-Of-Speech

Parts of speech (POS) are specific lexical categories to which words are assigned based on their syntactic context and role. Words usually fall into one of the following major categories.

- ***N(oun)*** : Nouns usually denote words that represent some object or entity, which may be living or non-living. Some examples are dog, cat, phone, and so on. The POS tag symbol for nouns is ***N***.

- ***V(erb)*** : Verbs are words used to describe actions, states, or occurrences. There is a wide variety of subcategories of verbs, such as auxiliary, reflexive, and transitive verbs (and many more). Some examples of verbs are running, eating, jumping, playing, and read. The POS tag symbol for verbs is ***V***.

- ***Adj(ective)*** : Adjectives are words used to describe or qualify other words, typically nouns and noun phrases. The phrase beautiful girl has the noun (N) girl, which is described or qualified by the adjective (Adj) beautiful. The POS tag symbol for adjectives is ***ADJ***.

Besides these three major categories, other parts of speech occur frequently in English. These include adverbs, pronouns, prepositions, interjections, conjunctions, determiners, and many more. Furthermore, each POS tag such as N(oun) can be further subdivided into categories like *singular nouns* **(NN)**, *singular proper nouns* **(NNP)**, and *plural nouns* **(NNS)**.

The process of classifying and labeling words with POS tags is known as *parts-of-speech tagging* or POS tagging. POS tags are used to annotate words with their part of speech, which is helpful for specific analyses such as narrowing down on nouns to see which ones are most prominent, word sense disambiguation, and grammar analysis.


## Shallow parsing or Chunking

Based on the hierarchy described earlier, groups of words make up phrases. There are five major categories of phrases:

- **Noun Phrase (NP)** : A phrase in which a noun acts as the head word. Noun phrases act as a subject or object of a verb.

- **Verb Phrase (VP)** : A lexical unit in which a verb acts as the head word. Usually, there are two forms of verb phrases. One form has the verb components as well as other entities such as nouns, adjectives, or adverbs as parts of the object; the other consists of verb components only.

- **Adjective Phrase (ADJP)** : A phrase with an adjective as the head word. Its main role is to describe or qualify nouns and pronouns in a sentence, and it is placed either before or after the noun or pronoun.

- **Adverb Phrase (ADVP)** : A phrase that acts like an adverb, since an adverb is its head word. It is used as a modifier for nouns, verbs, or other adverbs, providing further details that describe or qualify them.

- **Prepositional Phrase (PP)** : A phrase that usually contains a preposition as the head word, along with other lexical components such as nouns and pronouns. It acts like an adjective or adverb describing other words or phrases.

Shallow parsing, also called light parsing or chunking, is a popular natural language processing (NLP) technique that analyzes the structure of a sentence by breaking it down into its smallest constituents (tokens) and grouping them into higher-level phrases. The result includes the POS tags as well as the phrases of the sentence.

<img src=".\Images\28.png">
<center>The example above shows shallow parsing with higher-level phrase annotations.</center>

Here, we use the **conll2000** corpus to train our shallow parser model. This corpus is available in **nltk** with chunk annotations, and we will use around 10k records for training. A sample annotated sentence is shown below:

```python
# The corpus must be downloaded once, e.g. with nltk.download('conll2000')
from nltk.corpus import conll2000

# Load all chunk-annotated sentences and split them into train and test sets
data1 = conll2000.chunked_sents()
trained_data1 = data1[:10900]
test_data1 = data1[10900:]

print(len(trained_data1), len(test_data1))
print(trained_data1[2])
```

```
10900 48
(S
  But/CC
  (NP analysts/NNS)
  (VP reckon/VBP)
  (NP underlying/VBG support/NN)
  (PP for/IN)
  (NP sterling/NN)
  (VP has/VBZ been/VBN eroded/VBN)
  (PP by/IN)
  (NP the/DT chancellor/NN)
  (NP 's/POS failure/NN)
  (VP to/TO announce/VB)
  (NP any/DT new/JJ policy/NN measures/NNS)
  (PP in/IN)
  (NP his/PRP$ Mansion/NNP House/NNP speech/NN)
  (NP last/JJ Thursday/NNP)
  ./.)
```


From the preceding output, we can see that our data points are sentences already annotated with phrase and POS tag metadata, which will be useful in training our shallow parser model. We will leverage two chunking utility functions: tree2conlltags, to get triples of word, POS tag, and chunk tag for each token, and conlltags2tree, to generate a parse tree from these token triples. We will use these functions to train our parser. A sample representation is shown below.

```python
from nltk.chunk.util import tree2conlltags, conlltags2tree

wfh = tree2conlltags(trained_data1[2])
wfh
```

```
[('But', 'CC', 'O'),
 ('analysts', 'NNS', 'B-NP'),
 ('reckon', 'VBP', 'B-VP'),
 ('underlying', 'VBG', 'B-NP'),
 ('support', 'NN', 'I-NP'),
 ('for', 'IN', 'B-PP'),
 ('sterling', 'NN', 'B-NP'),
 ('has', 'VBZ', 'B-VP'),
 ('been', 'VBN', 'I-VP'),
 ('eroded', 'VBN', 'I-VP'),
 ('by', 'IN', 'B-PP'),
 ('the', 'DT', 'B-NP'),
 ('chancellor', 'NN', 'I-NP'),
 ("'s", 'POS', 'B-NP'),
 ('failure', 'NN', 'I-NP'),
 ('to', 'TO', 'B-VP'),
 ('announce', 'VB', 'I-VP'),
 ('any', 'DT', 'B-NP'),
 ('new', 'JJ', 'I-NP'),
 ('policy', 'NN', 'I-NP'),
 ('measures', 'NNS', 'I-NP'),
 ('in', 'IN', 'B-PP'),
 ('his', 'PRP$', 'B-NP'),
 ('Mansion', 'NNP', 'I-NP'),
 ('House', 'NNP', 'I-NP'),
 ('speech', 'NN', 'I-NP'),
 ('last', 'JJ', 'B-NP'),
 ('Thursday', 'NNP', 'I-NP'),
 ('.', '.', 'O')]
```

The chunk tags use the IOB format, which stands for Inside, Outside, and Beginning. The B- prefix before a tag indicates that the token is the beginning of a chunk, and the I- prefix indicates that it is inside a chunk. The *O* tag indicates that the token does not belong to any chunk. The B- tag also marks the boundary when a chunk is immediately followed by another chunk of the same type, with no *O* tag between them.
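To make the IOB convention concrete, here is a minimal pure-Python sketch that groups (word, POS, IOB) triples back into chunks. The function name `iob_to_chunks` is hypothetical; in practice nltk's `conlltags2tree` does this job and returns a tree.

```python
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into (chunk_type, [words]) pairs."""
    chunks, current = [], None
    for word, _pos, iob in tagged:
        if iob.startswith("B-"):
            # A B- tag always opens a new chunk.
            current = (iob[2:], [word])
            chunks.append(current)
        elif iob.startswith("I-") and current and current[0] == iob[2:]:
            # An I- tag of the same type extends the open chunk.
            current[1].append(word)
        else:
            # 'O' tag, or an I- tag with no matching open chunk:
            # the token stays outside any chunk.
            current = None
    return chunks


sample = [
    ("But", "CC", "O"),
    ("analysts", "NNS", "B-NP"),
    ("reckon", "VBP", "B-VP"),
    ("underlying", "VBG", "B-NP"),
    ("support", "NN", "I-NP"),
    ("for", "IN", "B-PP"),
    ("sterling", "NN", "B-NP"),
]
for chunk_type, words in iob_to_chunks(sample):
    print(chunk_type, " ".join(words))
```

The sample triples are the first seven tokens of the annotated sentence shown earlier; "underlying support" comes out as a single NP because its second token carries an I-NP tag.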

## Constituency Parsing

Constituent-based grammars are used to analyze and determine the constituents of a sentence. These grammars can be used to model the internal structure of sentences as a hierarchically ordered structure of their constituents. As we already know, each word usually belongs to a specific lexical category and forms the head word of different phrases. These phrases are formed according to rules called *phrase structure rules*.

**Phrase structure rules** form the core of constituency grammars, because they describe the syntax and the rules that govern the hierarchy and ordering of the constituents in a sentence. These rules serve two main purposes:

- They determine which words are used to construct the phrases, sentences, or constituents.
- They determine how these constituents need to be ordered.

The general form of a phrase structure rule is ***S --> AB***, which states that the structure ***S*** consists of constituents ***A*** and ***B***, in the order ***A*** followed by ***B***. While there are several rules, the most important one describes how to divide a sentence or clause. This rule denotes a binary division ***S --> NP VP***, where ***S*** is the sentence or clause, divided into the subject, denoted by the NP (noun phrase), and the predicate, denoted by the VP (verb phrase).

A constituency parser can be built from such grammars or rules, which are usually collectively available as a context-free grammar (CFG). The parser processes input sentences according to these rules and builds a parse tree.

<img src=".\Images\29.png">

## Dependency Parsing

In dependency parsing, we use dependency-based grammars to analyze and infer both the structure and the semantic dependencies and relationships between the tokens in a sentence. The basic principle behind dependency grammar is that in any sentence, all words except one have some relationship to, or dependency on, other words in the sentence. The word that has no dependency is called the root of the sentence. In most cases the verb is taken as the root. All the other words are directly or indirectly linked to the root verb through links, which are the dependencies.

Consider the sentence ***"The brown fox is quick and he is jumping over the lazy dog"***

<img src=".\Images\30.png">

Taking a closer look at some of these dependencies, they are not too hard to understand.

- The ***det*** dependency tag is pretty intuitive: it denotes the determiner relationship between a nominal head and its determiner. Usually, the word with the part-of-speech (POS) tag **DET** will also have the ***det*** dependency relation. Examples: **fox --> the** and **dog --> the**.

- The ***amod*** dependency tag stands for adjectival modifier, i.e. any adjective that modifies the meaning of a noun. Examples: **fox --> brown** and **dog --> lazy**.

- The ***nsubj*** dependency tag stands for an entity that acts as a subject or agent in the clause. Examples: **is --> fox** and **jumping --> he**.

- The ***cc*** and ***conj*** dependencies concern the linkages between words connected by coordinating conjunctions. Examples: **is --> and** and **is --> jumping**.

- The ***aux*** dependency tag indicates the auxiliary or secondary verb in the clause or sentence. Example: **jumping --> is**.

- The ***acomp*** dependency tag stands for adjectival complement, which acts as the complement or object of a verb in the sentence. Example: **is --> quick**.

- The ***prep*** dependency tag denotes a prepositional modifier, which usually modifies the meaning of a noun, verb, adjective, or preposition. Example: **jumping --> over**.

- The ***pobj*** dependency tag denotes the object of a preposition. This is usually the head of the noun phrase following a preposition in the sentence. Example: **over --> dog**.
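The dependencies listed above can be stored as (head, relation, dependent) triples. The sketch below transcribes the example arrows into token indices (an addition made here so that the two occurrences of "the" and "is" stay distinct) and uses two hypothetical helper functions to find the root and to collect the words governed by a head.

```python
TOKENS = "The brown fox is quick and he is jumping over the lazy dog".split()

# (head index, relation, dependent index), transcribed from the examples above.
EDGES = [
    (2, "det", 0), (12, "det", 10),     # fox --> the, dog --> the
    (2, "amod", 1), (12, "amod", 11),   # fox --> brown, dog --> lazy
    (3, "nsubj", 2), (8, "nsubj", 6),   # is --> fox, jumping --> he
    (3, "cc", 5), (3, "conj", 8),       # is --> and, is --> jumping
    (8, "aux", 7),                      # jumping --> is
    (3, "acomp", 4),                    # is --> quick
    (8, "prep", 9), (9, "pobj", 12),    # jumping --> over, over --> dog
]


def find_roots(n_tokens, edges):
    """Return the indices of tokens that never appear as a dependent."""
    dependents = {dep for _head, _rel, dep in edges}
    return [i for i in range(n_tokens) if i not in dependents]


def subtree(head, edges):
    """Return the sorted indices of `head` and everything it governs."""
    nodes = [head]
    for h, _rel, dep in edges:
        if h == head:
            nodes.extend(subtree(dep, edges))
    return sorted(nodes)


root = find_roots(len(TOKENS), EDGES)[0]
print("root:", TOKENS[root])
print(" ".join(TOKENS[i] for i in subtree(12, EDGES)))
print(" ".join(TOKENS[i] for i in subtree(8, EDGES)))
```

Every token except the first "is" appears exactly once as a dependent, so that verb is the root, in line with the principle stated at the start of this section.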

# Lexical Knowledge

Imagine you have an intelligent computer that can extract knowledge from unstructured or raw data and answer questions about a text. Suppose you are a teacher who has to examine many students, mark their exams, and sometimes put them through a cruel selection process. Consider the following statement about this situation:

*"At the beginning of this year, I had 100 students and 90 eventually took my exams. I marked their assignments and flunked a third of the undergraduates."*

Imagine that the smart machine is able to process and answer the following natural language questions, just as a reasonably intelligent human being would:

- How many students passed?
- How many students failed?
- How many students passed the tests?
- How many undergraduates flunked?
- How many students did the teacher pass?

All these questions are naive insofar as any 10-year-old child would answer them without difficulty. It is nevertheless interesting to examine the psycholinguistic mechanisms that are activated to analyse such questions and map them onto the information extracted from the basic scenario. How do we manage to infer that 30 students failed the tests from the statement that the teacher flunked one third of the 90 undergraduates who took the exams? What kind of lexical knowledge do we have to activate, and what mechanisms do we have to trigger, to make it possible for a computer to imitate human behaviour?
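The inference in this scenario combines a small piece of arithmetic with lexical knowledge about which words describe the same outcome. The sketch below makes both explicit. It assumes, as the text does, that the undergraduates are the 90 students who took the exams; the tables `OUTCOME_WORDS` and `FACTS` and the function `answer` are hypothetical and only cover this scenario.

```python
enrolled = 100
took_exam = 90

failed = took_exam // 3        # "flunked a third" of the 90 exam takers
passed = took_exam - failed    # everyone else who sat the exam

# Lexical knowledge: different surface words map to the same outcome.
OUTCOME_WORDS = {
    "flunked": "failed", "failed": "failed",
    "passed": "passed", "pass": "passed",
}
FACTS = {"failed": failed, "passed": passed}


def answer(question):
    """Answer a 'how many' question by spotting an outcome word."""
    for word in question.lower().replace("?", "").split():
        outcome = OUTCOME_WORDS.get(word)
        if outcome:
            return FACTS[outcome]
    return None


for q in ["How many students passed?",
          "How many undergraduates flunked?",
          "How many students did the teacher pass?"]:
    print(q, answer(q))
```

Keyword spotting of this kind ignores who did what to whom; the sections that follow discuss the lexical knowledge (synonymy, transitivity alternations) that a real system would need to handle such questions properly.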

## Computational lexicography

For over two decades, researchers have tried to tap a variety of lexical and textual resources to populate the lexical components of their natural language processing (NLP) systems. Commercial dictionaries produced by established publishing houses have been found to contain a lot of syntactic, semantic, and pragmatic information concentrated into compact lexical entries. Over the years, methods were developed to extract this crucial knowledge from the electronic versions of these commercial dictionaries.

The initial attempts to reuse existing dictionaries focused on monolingual reference works, mainly English learners' dictionaries, whose systems of grammatical codes and simplified definitions were found to house the very syntactic information required to drive a parser (although some researchers have pointed out that relying too much on dictionaries is dangerous and may not reflect the evidence found in large corpora, Atkins & Levin (1991) being a case in point). The automatic identification of genus terms made it possible to construct partial taxonomies of is_a relations. Hierarchies of hyperonyms, hyponyms, and co-hyponyms are indeed valuable in information retrieval, where the questions rarely match the vocabulary used in the answers. Such requirements probably account for the widespread use of the WordNet database (Fellbaum 1998), which, despite its limitations, has the undeniable merit of being freely accessible and of offering very wide lexical coverage and the whole gamut of lexical-semantic relations.

Returning to the scenario created above, one of the main problems is to make it possible for a computer to compute the similarity between the word *test*, used in some of the questions, and the word *exam*, used in the source text containing the information to be exploited. The following entries, from a variety of well-known dictionaries available in electronic form, ranging from WordNet to LDOCE (Procter 1978) and Cobuild (Sinclair 1987), show that this similarity can be discovered and computed from existing resources, although the amount of preparatory work differs from one resource to another. WordNet makes it possible to go from *test* to *exam* simply because they belong to the same synset (set of synonyms), whereas using LDOCE or Cobuild requires more analysis of the definitions.

<img src=".\Images\31.png">
<center>Entry from the official WordNet page.</center>

<img src=".\Images\32.png">
<center>Entry from the official LDOCE page.</center>

<img src=".\Images\33.png">
<center>Entry from Collins Cobuild.</center>

What is clearly needed here is, in any case, a thesauric approach to the organization of the lexicon, in order to capture semantically similar items which, in a traditional thesaurus like Roget's, appear under the same class (see also Calzolari 1988).

## Transitivity alternation

In traditional grammar, a transitive verb is defined as a verb that takes a direct object, while an intransitive verb occurs without any direct object, as in "I <u>watched</u> a movie on TV last night" vs. "It <u>rained</u> for about two hours". Atkins et al. (1986) have shown that this distinction, which is very frequently used to identify seemingly different senses in dictionaries, is too superficial, and that the linguistic description of the syntactic behaviour of verbs depends on further classifications which are unfortunately too often left implicit in dictionaries. In our scenario, it is clear that the question

"How many undergraduates flunked?"

can only be answered if the machine realizes that the subject of the intransitive verb (the undergraduates who flunk) corresponds to the direct object of the same verb used transitively in the source text of our initial scenario:

"I flunked a third of the undergraduates"

This property is of course not specific to the verb *flunk*. The same alternation is found with similar verbs, as shown in the following examples:

<center>(i). The teacher <u>failed</u> 15 students</center>
<center>(ii). 15 students <u>failed</u></center>
<center>(iii). The teacher <u>passed</u> 15 students</center>
<center>(iv). 15 students <u>passed</u></center>


## Collocations

The propensity of words to co-occur in prefabricated chunks of language has attracted a lot of attention over the past decade (Sinclair 1991, Cowie 1998). The wide availability of corpora has shed light on the concept of collocation, and statistical tools are now the rule rather than the exception in many publishing houses, whose lexicographers are faced with the seemingly insurmountable task of sifting through thousands of concordance lines to extract the most relevant facts about the behaviour of the lexical item they are analysing. Research in applied linguistics and language learning has shown that words are best learnt and retained if they are presented in context, and more specifically together with the other items with which they are most likely to appear. In addition, native and non-native speakers alike often face the tip-of-the-tongue phenomenon, which causes them to look, sometimes in vain, for the appropriate word expressing a given meaning in a given context. These observations have led to resources that take the collocational dimension as their central axis and are specifically designed to meet the requirements of those who wish to encode text.
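One common statistical tool for spotting collocation candidates in a corpus is pointwise mutual information (PMI), which compares how often two words occur together with how often they would co-occur by chance. The sketch below computes PMI for adjacent word pairs; the tiny corpus and the function name `pmi_scores` are made up for this illustration, and the text above does not say which statistics lexicographers actually use.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[w1, w2] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus, far too small for reliable statistics.
corpus = ("he drank strong tea she drank strong tea "
          "they like the tea and the cake and the bread").split()

scores = pmi_scores(corpus)
for pair in [("strong", "tea"), ("the", "tea")]:
    print(pair, round(scores[pair], 2))
```

In this toy corpus "strong tea" scores higher than "the tea", because "strong" occurs almost only next to "tea" while "the" combines freely with many nouns. PMI tends to overrate rare pairs, so it is usually combined with a minimum frequency threshold.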