<a href="https://colab.research.google.com/github/maxhof905/se_corpus/blob/main/dependency_parsing_script.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SE-constructions in spaCy (UD 1.2 to 2.9)

maxhof905

## Content

I provide this script to place the examples of SE-constructions cited in Degraeuwe & Goethals (2020) and Silveira (2016) in the broader context of how the UD annotation guidelines, and with them spaCy's dependency parsing, have improved over the years.



---


Sources of the examples:

Degraeuwe, J., & Goethals, P. (2020). Reflexive pronouns in Spanish Universal Dependencies. Procesamiento del Lenguaje Natural, 64(0), 77–84. Retrieved December 3, 2021, from http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6198

Silveira, N. (2016). Designing syntactic representations for NLP: An empirical investigation [Stanford University]. Retrieved January 24, 2022, from https://purl.stanford.edu/kv949cx3011


### Import statements & model instantiation

In [None]:
!pip install -U spacy
!python -m spacy download es_core_news_sm
!python -m spacy download pt_core_news_sm


In [None]:
import spacy
from spacy import displacy
import es_core_news_sm
import pt_core_news_sm

In [None]:
spacy.__version__

'3.3.0'

In [None]:
es_nlp = es_core_news_sm.load()
pt_nlp = pt_core_news_sm.load()
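
For a plain-text view next to the displaCy renderings, the SE tokens and their labels can also be listed directly. The following is a minimal sketch, not part of the original script: `se_labels` is a hypothetical helper name, and it relies only on the `text`, `dep_` and `head` attributes that spaCy tokens expose.

```python
from typing import Iterable, List, Tuple


def se_labels(doc: Iterable) -> List[Tuple[str, str, str]]:
    """Return (token text, dependency label, head text) for every free-standing SE.

    `doc` is expected to be a spaCy Doc, but any iterable of objects with
    `.text`, `.dep_` and `.head.text` works. Only tokens whose text is
    exactly "se" (case-insensitive) are reported, so an enclitic that is
    not split off by the tokenizer is not found.
    """
    return [
        (tok.text, tok.dep_, tok.head.text)
        for tok in doc
        if tok.text.lower() == "se"
    ]


# Usage in this notebook (assumes the pipelines loaded in the cell above):
#   se_labels(es_nlp("Se acuerdan de ti."))
#   se_labels(pt_nlp("A praia se torna exclusiva dos passageiros"))
```

The helper does not replace the visualisation; it only makes the label of SE easy to copy into notes or compare across model versions.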

### Degraeuwe & Goethals (2020)

UD v2.5

1) SE was labeled as direct object in spaCy (p.78)

**Now spaCy labels it expl:pass. Arguably, it should be labeled as expl:pv.**

In [None]:
doc = es_nlp("Se acuerdan de ti.")
displacy.render(doc, style='dep', jupyter=True)

2) SE was labeled as direct object in spaCy (p.78)

**Now spaCy labels it expl:pass**

In [None]:
doc = es_nlp("se celebran los cien años del club")
displacy.render(doc, style='dep', jupyter=True)
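
The relation names that appear in this notebook can be kept in a small lookup table. The glosses below paraphrase the UD relation definitions as I understand them, so they should be checked against the UD documentation before being relied on; `UD_LABELS` and `describe` are names introduced here for illustration only.

```python
# Short glosses for the dependency relations mentioned in this notebook.
# The wording is a paraphrase, not a quotation from the UD guidelines.
UD_LABELS = {
    "dobj": "direct object (UD v1 name, renamed to obj in UD v2)",
    "obj": "direct object (UD v2 name)",
    "iobj": "indirect object",
    "nsubj": "nominal subject",
    "mark": "marker, typically a word introducing a subordinate clause",
    "dep": "unspecified dependency",
    "expl": "expletive",
    "expl:pass": "reflexive pronoun used in a reflexive passive",
    "expl:pv": "reflexive clitic with an inherently reflexive verb",
    "expl:impers": "expletive in an impersonal construction",
}


def describe(label: str) -> str:
    """Return the gloss for a relation label, or a fallback message."""
    return UD_LABELS.get(label, "not in this glossary")
```

With this table, the remark on *Se acuerdan de ti* reads as follows: the parser output `expl:pass` treats SE as marking a reflexive passive, whereas `expl:pv` would treat it as part of an inherently reflexive verb.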

### Silveira (2016)

UD v1.2

The SE type in brackets follows the author's classification.

#### Portuguese

1) (True reflexive) was labeled *dobj* in Portuguese treebank (p.128)

**Now spaCy labels it mark (marker of a subordinate clause)**

In [None]:
doc = pt_nlp("Gravações acústicas se encaixam com o nosso tipo de som")
displacy.render(doc, style='dep', jupyter=True)

2) (Inchoative) was labeled *iobj* in Portuguese treebank (p.128)

**Now spaCy labels it expl**

In [None]:
doc = pt_nlp("A praia se torna exclusiva dos passageiros")
displacy.render(doc, style='dep', jupyter=True)

3) (Inherent) was labeled *dobj* in Portuguese treebank (p.129)

**Now spaCy labels it expl**

In [None]:
doc = pt_nlp("Ele se apropia de todas as formas de peixes")
displacy.render(doc, style='dep', jupyter=True)

4) (Passive) was labeled *dobj* in Portuguese treebank (p.129)

**Now spaCy labels it dep**

In [None]:
doc = pt_nlp("Colocam se novas dúvidas")
displacy.render(doc, style='dep', jupyter=True)

5) (Impersonal) was labeled *nsubj* in Portuguese treebank (p.129)

**Now spaCy labels it expl**

In [None]:
doc = pt_nlp("Causa-me perplexidade que se trabalhe para isso")
displacy.render(doc, style='dep', jupyter=True)

#### Spanish

1) (True reflexive) was labeled *iobj* in Spanish treebank (p.129)

**Now spaCy labels it expl:pass**

In [None]:
doc = es_nlp("La CNT se retira de los comités")
displacy.render(doc, style='dep', jupyter=True)

2) (Inchoative) was labeled *iobj* in Spanish treebank (p.130)

**Now spaCy labels it expl:pv**

In [None]:
doc = es_nlp("East Milton se encuentra ubicado en las coordenadas")
displacy.render(doc, style='dep', jupyter=True)

3) (Inherent) was labeled *iobj* in Spanish treebank (p.130)

**spaCy doesn't split the clitic from the infinitive in this example**

In [None]:
doc = es_nlp("El gobierno decidió no quedarse cruzado de brazos")
displacy.render(doc, style='dep', jupyter=True)
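
Because the clitic stays attached to the infinitive here, it never receives a dependency label of its own. As a rough illustration of what a split would look like, the sketch below separates an enclitic *se* from a Spanish infinitive with a regular expression. This is a naive string heuristic introduced for this notebook, not how spaCy or the UD treebanks tokenize; `split_enclitic_se` is a hypothetical name, and the pattern can misfire on words that merely end in the same letters.

```python
import re
from typing import Optional, Tuple

# An infinitive ending (-ar, -er, -ir, -ír) followed directly by "se".
_ENCLITIC_SE = re.compile(r"^(?P<inf>.+(?:ar|er|ir|ír))(?P<clitic>se)$", re.IGNORECASE)


def split_enclitic_se(word: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'quedarse' into ('quedar', 'se'); return None if no match."""
    match = _ENCLITIC_SE.match(word)
    if match is None:
        return None
    return match.group("inf"), match.group("clitic")


# Usage on the example sentence (plain whitespace split, no model needed):
#   [split_enclitic_se(w) for w in "El gobierno decidió no quedarse cruzado de brazos".split()]
```

Only `quedarse` matches in the example sentence; all other words return `None`.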

4) (Passive) was labeled *iobj* in Spanish treebank (p.130)

**Now spaCy labels it expl:pass**

In [None]:
doc = es_nlp("También se visualizan numerosas depresiones")
displacy.render(doc, style='dep', jupyter=True)

5) (Impersonal) was labeled *iobj* in Spanish treebank (p.130)

**Now spaCy labels it expl:pv**

In [None]:
doc = es_nlp("No se adhirió al pasaporte biológico")
displacy.render(doc, style='dep', jupyter=True)
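
The observations in this notebook can be collected in one table. The sketch below only restates the labels reported in the cells above (spaCy 3.3.0 with the small models) and adds no new results; `OBSERVATIONS` and `count_new_labels` are hypothetical names, and `None` marks the example in which the clitic was not split from the infinitive.

```python
from collections import Counter

# (language, source and SE type, earlier label, label reported in this notebook)
OBSERVATIONS = [
    ("es", "Degraeuwe & Goethals, example 1", "direct object", "expl:pass"),
    ("es", "Degraeuwe & Goethals, example 2", "direct object", "expl:pass"),
    ("pt", "Silveira, true reflexive", "dobj", "mark"),
    ("pt", "Silveira, inchoative", "iobj", "expl"),
    ("pt", "Silveira, inherent", "dobj", "expl"),
    ("pt", "Silveira, passive", "dobj", "dep"),
    ("pt", "Silveira, impersonal", "nsubj", "expl"),
    ("es", "Silveira, true reflexive", "iobj", "expl:pass"),
    ("es", "Silveira, inchoative", "iobj", "expl:pv"),
    ("es", "Silveira, inherent", "iobj", None),  # clitic not split off
    ("es", "Silveira, passive", "iobj", "expl:pass"),
    ("es", "Silveira, impersonal", "iobj", "expl:pv"),
]


def count_new_labels(lang: str) -> Counter:
    """Count how often each current label was reported for one language."""
    return Counter(new for language, _, _, new in OBSERVATIONS if language == lang)


for lang in ("es", "pt"):
    print(lang, dict(count_new_labels(lang)))
```

Read this way, the Spanish model uses the `expl` subtypes throughout, while the Portuguese model uses plain `expl` in three of the five examples and falls back to `mark` and `dep` in the other two.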