# ASSIGNMENT 2 
The assignment consists of developing, in NLTK, OpenNLP, SketchEngine or GATE/Annie, a pipeline that takes an input text in a given language (English, French, German and Italian are admissible) and outputs the syntactic tree of each sentence: a tree rooted in S (for sentence) whose leaves are the tokens, each labelled with a single part of speech. The tree can be generated with one of the following models:

1) PURE SYMBOLIC. The tree is generated by an LR analysis based on a context-free (CF) LL(2) grammar. Candidates can assume the following:

    a) Adjectives in English and German only precede nouns, while in French and Italian they only follow them;

    b) All verbs are in the present tense;

    c) No pronouns are admitted;

    d) Only one adverb is admitted, always placed after the verb (regardless of the language and of the type of adverb).

    Overall, the points above describe a system that could be expressed with regular expressions, but a context-free grammar
    would be simpler to define. Candidates can either define a system themselves or use a syntactic tree generation system
    found on GitHub. The same holds for POS tagging, where some of the above-mentioned systems can be customized with existing
    techniques available in several forms (including predefined NLTK and OpenNLP libraries for POS tagging and a GATE module
    for the same purpose). Ambiguity should be resolved by stopping at the first admissible tree.

2) PURE ML. Candidates can develop a PLM with one-step Markov chains to forecast the following token, and use it to forecast the
     POS tags to be assigned. The PLM can be built from a corpus obtained online, for instance through the Wikipedia access API
     or other freely available repositories (including those available with SketchEngine). In this approach, candidates should
     never use the forecast to determine outcomes (this would serve the same purpose as distinguishing EN/non-EN, and then
     IT/non-IT, FR/non-FR or DE/non-DE), but only to identify the POS model in a sequence. The candidate should output the most
     likely POS tagging, without directly associating the sequence with a tree.
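
As a rough illustration of option 2, the sketch below trains a one-step (bigram) Markov chain over POS tags and decodes the most likely tag sequence with the Viterbi algorithm. The word-emission counts, the add-one smoothing, the tiny hand-tagged corpus and all function names are assumptions made for this example; the assignment text only asks for one-step Markov chains and a real corpus.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for illustration only.
CORPUS = [
    [("the", "DET"), ("black", "ADJ"), ("dog", "NOUN"), ("eats", "VERB")],
    [("the", "DET"), ("cold", "ADJ"), ("river", "NOUN"), ("flows", "VERB"), ("rapidly", "ADV")],
    [("the", "DET"), ("dog", "NOUN"), ("flows", "VERB")],
]

def train(corpus):
    """Count tag-to-tag transitions (the one-step Markov chain) and tag-to-word emissions."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in corpus:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def prob(counter, key, n_outcomes):
    """Add-one smoothed relative frequency (smoothing is an assumption of this sketch)."""
    return (counter[key] + 1) / (sum(counter.values()) + n_outcomes)

def viterbi(words, trans, emit):
    """Return the most likely tag sequence for a non-empty list of words."""
    tags = sorted(emit)
    n_words = len({w for c in emit.values() for w in c}) + 1  # +1 slot for unseen words
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {
        t: (prob(trans["<s>"], t, len(tags)) * prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        step = {}
        for t in tags:
            # Pick the best predecessor tag q for the current tag t.
            p, path = max(
                ((best[q][0] * prob(trans[q], t, len(tags)), best[q][1]) for q in tags),
                key=lambda x: x[0],
            )
            step[t] = (p * prob(emit[t], word, n_words), path + [t])
        best = step
    return max(best.values(), key=lambda x: x[0])[1]

trans, emit = train(CORPUS)
print(viterbi(["the", "black", "dog", "eats"], trans, emit))
# ['DET', 'ADJ', 'NOUN', 'VERB']
```

The output is a tag sequence only; no tree is attached to it, in line with the requirement above.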

Candidates are free to employ a PURE ML approach to simplify or pre-process the text in order to improve the performance of a PURE SYMBOLIC approach, thereby producing a mixed model.
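
Returning to option 1, the remark that assumptions a) to d) could be captured with regular expressions can be sketched over sequences of POS tags, for instance as a cheap check on a tagger's output before parsing. The patterns, the tag names (Universal POS tags such as `DET`, `ADJ`, `NOUN`) and the helper `is_admissible` are hypothetical, and the sentence shape is deliberately simplified (copular `AUX ADJ` predicates are left out).

```python
import re

# Hypothetical patterns over space-separated POS tags.
# Assumption a): adjectives precede the noun in English/German and follow it in French/Italian.
# Assumption d): at most one adverb, placed after the verb.
PREFIX_ADJ = re.compile(r"^(DET )?(ADJ )*NOUN( VERB( ADV)?)? PUNCT$")
SUFFIX_ADJ = re.compile(r"^(DET )?NOUN( ADJ)*( VERB( ADV)?)? PUNCT$")

PATTERNS = {"en": PREFIX_ADJ, "de": PREFIX_ADJ, "fr": SUFFIX_ADJ, "it": SUFFIX_ADJ}

def is_admissible(tags, lang):
    """Return True if the tag sequence fits the simplified sentence shape for `lang`."""
    return PATTERNS[lang].match(" ".join(tags)) is not None

print(is_admissible(["DET", "ADJ", "NOUN", "VERB", "PUNCT"], "en"))  # True
print(is_admissible(["DET", "ADJ", "NOUN", "VERB", "PUNCT"], "it"))  # False
```

A regular expression only accepts or rejects a sequence; it does not yield the nested tree the assignment asks for, which is one reason a context-free grammar is the more convenient tool here.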

# CODE

In [20]:
import spacy
import nltk
from nltk import CFG
from nltk.tree import Tree, TreePrettyPrinter




## Base grammar

In [21]:

base_grammar = """
    S -> NP VP PUNCT | NP PUNCT
    NP -> NOUN | DET NP | ADJ NP | NP ADJ
    VP ->  VERB | VP ADV | AUX ADJ 
    """
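
The base grammar above accepts adjectives on either side of the noun in every language. If assumption a) is to be enforced, one possible variant is to pick the NP rule per language, as in the sketch below. The names `SHARED_RULES`, `NP_RULES`, `ADJ_POSITION` and `grammar_for` are hypothetical and not part of the notebook.

```python
# Rules shared by all four languages (copied from the base grammar).
SHARED_RULES = """
    S -> NP VP PUNCT | NP PUNCT
    VP -> VERB | VP ADV | AUX ADJ
"""

# Language-dependent NP rule: adjectives either precede or follow the noun.
NP_RULES = {
    "prefix": "    NP -> NOUN | DET NP | ADJ NP\n",  # English, German
    "suffix": "    NP -> NOUN | DET NP | NP ADJ\n",  # French, Italian
}

ADJ_POSITION = {"en": "prefix", "de": "prefix", "fr": "suffix", "it": "suffix"}

def grammar_for(lang):
    """Return a grammar string that enforces the adjective position for `lang`."""
    return SHARED_RULES + NP_RULES[ADJ_POSITION[lang]]

print(grammar_for("it"))
```

The returned string has the same shape as `base_grammar`, so it could be passed in its place; the suffix variant remains ambiguous for phrases such as DET NOUN ADJ, which is where the first-admissible-tree rule applies.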


## Parser

In [22]:
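def Parser(sentence, nlp, base_grammar):
    """Tag `sentence` with spaCy, extend `base_grammar` with lexical rules, and print the first parse tree."""
    pos_to_words = {}  # maps each POS tag to the quoted terminals that carry it
    words_in_sentence = []  # token texts, in sentence order
    print("Sentence: " + sentence + "\n")
    for token in nlp(sentence):
        words_in_sentence.append(token.text)
        # Quote the token as a grammar terminal; fall back to single quotes
        # if the token itself contains a double quote.
        word = f"'{token.text}'" if '"' in token.text else f'"{token.text}"'
        terminals = pos_to_words.setdefault(token.pos_, [])
        if word not in terminals:
            terminals.append(word)

    # Append one lexical rule per POS tag, e.g. NOUN -> "dog" | "river"
    grammar_rules = base_grammar
    for pos, words in pos_to_words.items():
        grammar_rules += f"{pos} -> {' | '.join(words)}\n"

    nltk_grammar = nltk.CFG.fromstring(grammar_rules)
    parser = nltk.ChartParser(nltk_grammar)
    # Ambiguity is resolved by keeping only the first admissible tree.
    tree = next(iter(parser.parse(words_in_sentence)), None)
    if tree is not None:
        print(TreePrettyPrinter(tree))
    else:
        print("No parse found for this sentence.\n")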
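
To make the lexical-rule step of the parser easier to follow, the sketch below isolates it and runs it on hand-written (word, tag) pairs, so it needs neither spaCy nor NLTK. The helpers `quote_terminal` and `lexical_rules` are hypothetical names, and the tags are typed in by hand rather than produced by a tagger.

```python
def quote_terminal(word):
    """Wrap a word in quotes for a grammar string, avoiding a clash with a double quote inside it."""
    return f"'{word}'" if '"' in word else f'"{word}"'

def lexical_rules(tagged_tokens):
    """Build one 'POS -> "w1" | "w2"' line per tag from (word, tag) pairs."""
    pos_to_words = {}
    for word, pos in tagged_tokens:
        terminals = pos_to_words.setdefault(pos, [])
        terminal = quote_terminal(word)
        if terminal not in terminals:  # list each word once per tag
            terminals.append(terminal)
    return "".join(f"{pos} -> {' | '.join(ws)}\n" for pos, ws in pos_to_words.items())

# Tags written by hand here; in the notebook they come from token.pos_.
tagged = [("The", "DET"), ("black", "ADJ"), ("dog", "NOUN"), ("eats", "VERB"), (".", "PUNCT")]
print(lexical_rules(tagged))
```

Because the lexicon is rebuilt from each sentence, every token is always covered by some rule; whether a tree exists then depends only on the tag sequence and the base grammar.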


## English

In [23]:
nlp = spacy.load("en_core_web_sm")

sentences = [
"The black dog eats.",
"The cold river flows rapidly."
]
for s in sentences:
    Parser(s,nlp,base_grammar)


Sentence: The black dog eats.

               S             
       ________|__________    
      NP            |     |  
  ____|____         |     |   
 |         NP       |     |  
 |     ____|___     |     |   
 |    |        NP   VP    |  
 |    |        |    |     |   
DET  ADJ      NOUN VERB PUNCT
 |    |        |    |     |   
The black     dog  eats   .  

Sentence: The cold river flows rapidly.

               S                          
      _________|_______________________    
     NP                   |            |  
  ___|____                |            |   
 |        NP              VP           |  
 |    ____|____       ____|_____       |   
 |   |         NP    VP         |      |  
 |   |         |     |          |      |   
DET ADJ       NOUN  VERB       ADV   PUNCT
 |   |         |     |          |      |   
The cold     river flows     rapidly   .  



## Italian

In [24]:
nlp = spacy.load("it_core_news_sm")

sentences = [
"Il cane nero mangia.",
"Il fiume freddo scorre rapidamente."
]

for s in sentences:
    Parser(s,nlp,base_grammar)


Sentence: Il cane nero mangia.

              S               
          ____|____________    
         NP         |      |  
      ___|____      |      |   
     NP       |     |      |  
  ___|___     |     |      |   
 |       NP   |     VP     |  
 |       |    |     |      |   
DET     NOUN ADJ   VERB  PUNCT
 |       |    |     |      |   
 Il     cane nero mangia   .  

Sentence: Il fiume freddo scorre rapidamente.

                S                                
           _____|_____________________________    
          NP                 |                |  
      ____|_____             |                |   
     NP         |            VP               |  
  ___|____      |       _____|_______         |   
 |        NP    |      VP            |        |  
 |        |     |      |             |        |   
DET      NOUN  ADJ    VERB          ADV     PUNCT
 |        |     |      |             |        |   
 Il     fiume freddo scorre     rapidamente   .  



## French

In [25]:
nlp = spacy.load("fr_core_news_sm") 

sentences = [
"Le chien noir manger.",
"La rivière froide coule rapidement."
]

for s in sentences:
    Parser(s,nlp,base_grammar)


Sentence: Le chien noir manger.

               S               
           ____|____________    
          NP         |      |  
      ____|____      |      |   
     NP        |     |      |  
  ___|____     |     |      |   
 |        NP   |     VP     |  
 |        |    |     |      |   
DET      NOUN ADJ   VERB  PUNCT
 |        |    |     |      |   
 Le     chien noir manger   .  

Sentence: La rivière froide coule rapidement.

                  S                              
            ______|___________________________    
           NP                 |               |  
      _____|______            |               |   
     NP           |           VP              |  
  ___|_____       |       ____|______         |   
 |         NP     |      VP          |        |  
 |         |      |      |           |        |   
DET       NOUN   ADJ    VERB        ADV     PUNCT
 |         |      |      |           |        |   
 La     rivière froide coule     rapidement   .  



## German

In [26]:
nlp = spacy.load("de_core_news_sm")

sentences = [
"Der schwarze Hund isst.",
"Der kalte Fluss fließt schnell."
]

for s in sentences:
    Parser(s,nlp,base_grammar)


Sentence: Der schwarze Hund isst.

                  S             
        __________|__________    
       NP              |     |  
  _____|______         |     |   
 |            NP       |     |  
 |      ______|___     |     |   
 |     |          NP   VP    |  
 |     |          |    |     |   
DET   ADJ        NOUN VERB PUNCT
 |     |          |    |     |   
Der schwarze     Hund isst   .  

Sentence: Der kalte Fluss fließt schnell.

                S                           
       _________|________________________    
      NP                    |            |  
  ____|____                 |            |   
 |         NP               VP           |  
 |     ____|____       _____|_____       |   
 |    |         NP    VP          |      |  
 |    |         |     |           |      |   
DET  ADJ       NOUN  VERB        ADV   PUNCT
 |    |         |     |           |      |   
Der kalte     Fluss fließt     schnell   .  

