Alexandre Rademaker edited this page Jan 6, 2019 · 4 revisions

To obtain the output in the format used by FreeLing (https://github.com/TALP-UPC/FreeLing/blob/master/data/pt/senses30.src):

  • Run this SPARQL query and save the result as CSV (the sed command further down reads it from query.csv):
select ?s ?w {
  graph source:own-pt.nt {
      ?ss wn30:containsWordSense/wn30:word/wn30:lexicalForm ?w ;
          wn30:synsetId ?id .
  }
  BIND (str(?ss) as ?s)
}
order by ?s

  • Run this pipeline to convert the CSV file into senses30.src:

  sed -nE -e 's/"https:\/\/w3id.org\/own-pt\/wn30-pt\/instances\/synset-([0-9]+-[nrva])","([^"]+)"/\1|\2/p' query.csv |
     tr ' |' '_\t' | sort -u -k1 | 
     awk -F ' ' '$1==last {printf " %s",$2; next} NR>1 {print "";} {last=$1; printf "%s",$0;} END{print "";}' | sort -k1 > senses30.src
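
To see what the pipeline does, here is a minimal sketch that runs it on a tiny hand-made input. The three result rows, their synset ids, and the scratch directory are invented for illustration and are not taken from the real data; the header row is an assumption about how the CSV export looks.

```shell
# Scratch directory so that real files are not overwritten (hypothetical setup).
workdir=$(mktemp -d)

# Hypothetical CSV export: a header row plus three invented result rows.
cat > "$workdir/query.csv" <<'EOF'
"s","w"
"https://w3id.org/own-pt/wn30-pt/instances/synset-00000002-n","ser vivo"
"https://w3id.org/own-pt/wn30-pt/instances/synset-00000001-n","casa"
"https://w3id.org/own-pt/wn30-pt/instances/synset-00000001-n","lar"
EOF

# 1. sed keeps only rows that match the synset URI and emits "id|word";
#    the header row does not match and is dropped.
# 2. tr turns spaces inside a word into "_" and the "|" into a tab.
# 3. sort -u orders the rows by synset id and removes duplicates.
# 4. awk joins all words that share a synset id onto one line.
sed -nE -e 's/"https:\/\/w3id.org\/own-pt\/wn30-pt\/instances\/synset-([0-9]+-[nrva])","([^"]+)"/\1|\2/p' "$workdir/query.csv" |
  tr ' |' '_\t' | sort -u -k1 |
  awk -F ' ' '$1==last {printf " %s",$2; next} NR>1 {print "";} {last=$1; printf "%s",$0;} END{print "";}' |
  sort -k1 > "$workdir/senses30.src"

# Two whitespace-separated lines result: synset 00000001-n with the words
# casa and lar, and synset 00000002-n with the word ser_vivo.
cat "$workdir/senses30.src"
```

Each output line starts with the synset id, followed by all words of that synset; multiword expressions such as "ser vivo" are joined with an underscore.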

The Common Lisp code to produce senses30.src is the following:

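;; Requires the cl-csv library to be loaded, e.g. (ql:quickload :cl-csv).
(with-open-file (out "senses30.src" :direction :output :if-exists :supersede)
  (let ((tb (make-hash-table :test #'equal)))
    ;; Group the words (second column) by synset key (first column).
    (dolist (e (cl-csv:read-csv #P"query-2.csv"))
      (let ((k (car e))
            ;; Lowercase the word and join multiword expressions with "_".
            (v (substitute #\_ #\Space (string-downcase (cadr e)))))
        ;; PUSH on a missing key starts from NIL, so no existence check is needed.
        (push v (gethash k tb))))
    ;; Write one line per synset: the key followed by its words.
    (maphash (lambda (key value)
               (format out "~a ~{~a~^ ~}~%" key value))
             tb)))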
