Freeling
To obtain the output in the format used by FreeLing (see the example [[https://github.com/TALP-UPC/FreeLing/blob/master/data/pt/senses30.src][here]]), follow these steps:
- perform this query ([[http://wnpt.brlcloud.com:10035/#/repositories/wn30/query/d/select%2520?s%2520?w%2520%257B%250A%2520%2520graph%2520source:own-pt.nt%2520%257B%250A%2520%2520%2520%2520%2520%2520?ss%2520wn30:containsWordSense/wn30:word/wn30:lexicalForm%2520?w%2520%3B%250A%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520wn30:synsetId%2520?id%2520.%250A%2520%2520%257D%250A%2520%2520BIND%2520(str(?ss)%2520as%2520?s)%250A%257D%250Aorder%2520by%2520?s][link]]) and export its results as CSV:
#+BEGIN_SRC sparql
select ?s ?w {
  graph source:own-pt.nt {
      ?ss wn30:containsWordSense/wn30:word/wn30:lexicalForm ?w ;
          wn30:synsetId ?id .
  }
  BIND (str(?ss) as ?s)
}
order by ?s
#+END_SRC
- remove the carriage returns in the file.
- then run:
#+BEGIN_SRC sh
# The sed expression uses '#' as its delimiter because the pattern contains '/'.
sed -nE -e 's#"https://w3id.org/own-pt/wn30-pt/instances/synset-([0-9]+-[nrva])","([^"]+)"#\1|\2#p' query.csv |
  tr ' |' '_\t' |
  sort -u -k1 |
  awk -F ' ' '$1==last {printf " %s",$2; next} NR>1 {print "";} {last=$1; printf "%s",$0;} END{print "";}' |
  sort -k1 > senses30.src
#+END_SRC
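The carriage-return step is not tied to a particular command. A minimal sketch, assuming =tr= is available and that the raw export was saved under the hypothetical name =query-raw.csv=; =strip_cr= is an illustrative helper, not part of the original instructions:

#+BEGIN_SRC sh
# Hypothetical helper: delete every carriage return (CRLF -> LF) from a file.
# Usage: strip_cr INPUT OUTPUT
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}

# Example call (file names are assumptions):
#   strip_cr query-raw.csv query.csv
#+END_SRC

Writing to a separate output file keeps the original export untouched, so the step can be repeated if needed.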
The Common Lisp code that produces senses30.src (using the cl-csv library) is shown below:
#+BEGIN_SRC lisp
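;; Group the word forms by synset key and write one line per synset.
;; Assumes each CSV row is (key word-form) and that cl-csv is loaded.
(with-open-file (out "senses30.src" :direction :output :if-exists :supersede)
  (labels ((print-hash-entry (key value)
             ;; key, then the word forms separated by single spaces
             (format out "~a ~{~a~^ ~}~%" key value)))
    (let ((tb (make-hash-table :test #'equal)))
      (mapc (lambda (e)
              (let ((k (car e))
                    ;; lowercase the word form and replace spaces with underscores
                    (v (substitute #\_ #\Space (string-downcase (cadr e)))))
                (if (gethash k tb nil)
                    (push v (gethash k tb))
                    (setf (gethash k tb) (list v)))))
            (cl-csv:read-csv #P"query-2.csv"))
      (maphash #'print-hash-entry tb))))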
#+END_SRC