
esteeschwarz/HU-LX

info on folder

13422.20221016(08.17)


contains an R script that transforms loosely formatted transcripts into the conventional CHAT format. the source transcripts come from the SES studies, which are part of a large corpus of interviews conducted from the 1980s to 2010 by c.w. pfaff in the course of research on multilingualism. the .cha (chat) file was generated from a transcript sample with the exmaralda partitur editor to demonstrate how a standard-conformant transcript should be constructed.
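to illustrate the kind of transformation described above, here is a minimal sketch in python (not the repository's R script). it assumes a hypothetical loose input of the form `speaker: utterance`; the function name `to_chat`, the regular expression and the role names are illustrative assumptions. it emits only the basic CHAT skeleton (`@UTF8`, `@Begin`, `@Languages`, `@Participants`, main tiers starting with `*`, `@End`); a file that should pass validation would likely need further headers such as `@ID`.

```python
import re

# hypothetical loose input: "<speaker code> : <utterance>", with inconsistent
# spacing and capitalization; the real transcripts may look different.
SPEAKER_LINE = re.compile(r"^\s*([A-Za-z]{2,3})\s*:\s*(.+?)\s*$")


def to_chat(lines, languages, participants=None):
    """Return a CHAT-style transcript string built from loose transcript lines."""
    participants = participants or {}
    seen = []   # speaker codes in order of first appearance
    body = []   # main tiers, one per utterance
    for raw in lines:
        match = SPEAKER_LINE.match(raw)
        if match is None:
            continue  # skip anything that does not look like an utterance
        code = match.group(1).upper()
        utterance = match.group(2)
        # CHAT main tiers end in a terminator separated by a space.
        terminator = utterance[-1] if utterance[-1] in ".?!" else "."
        text = utterance.rstrip(".?! ")
        if not text:
            continue  # nothing left besides punctuation
        if code not in seen:
            seen.append(code)
        body.append(f"*{code}:\t{text} {terminator}")

    # role names are placeholders (assumption); check them against the CHAT manual.
    roles = ", ".join(f"{c} {participants.get(c, 'Participant')}" for c in seen)
    header = [
        "@UTF8",
        "@Begin",
        f"@Languages:\t{languages}",
        f"@Participants:\t{roles}",
    ]
    return "\n".join(header + body + ["@End"])
```

calling `to_chat(["int: wie heisst du?", "  chi :  ich bin ali"], "deu")` would yield two main tiers, `*INT:` and `*CHI:`, wrapped in the header and `@End` lines. lines that do not match the assumed pattern are silently dropped, which a real conversion would probably want to log instead.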

stages

  • 20221023(22.47): the transformation output is written to files with the .cha extension; importing them into the exmaralda partitur editor works, which shows that the transformation is successful.
  • 20230305(19.01): script (conc_essai) to create a database of the lemmatized corpus (via SketchEngine). the final database allows corpus analysis independent of the sketchengine framework. the sketchengine lemmatization is based on the standardized transcripts created with the transformation script above.
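
the idea of a lemma database that can be queried without sketchengine can be sketched as follows, in python with the standard `sqlite3` module. this is not the conc_essai script. it assumes the lemmatized corpus is available in a tab-separated, one-token-per-line "vertical" layout with the word in the first column and the lemma in the second, and with `<s>` marking sentence starts; the actual column order depends on the corpus configuration. the table layout and the function names `build_token_db` and `lemma_frequencies` are hypothetical.

```python
import sqlite3


def build_token_db(vertical_lines, conn):
    """Load word/lemma pairs from vertical-format lines into a 'tokens' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "position INTEGER PRIMARY KEY, sentence INTEGER, word TEXT, lemma TEXT)"
    )
    sentence = 0
    position = 0
    rows = []
    for line in vertical_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("<"):
            # structure tag (assumption: tokens never start with "<")
            if line.startswith("<s>") or line.startswith("<s "):
                sentence += 1
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # no lemma column, skip
        position += 1
        rows.append((position, sentence, cols[0], cols[1]))
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def lemma_frequencies(conn):
    """Return (lemma, count) pairs, most frequent first, ties sorted by lemma."""
    query = (
        "SELECT lemma, COUNT(*) AS n FROM tokens "
        "GROUP BY lemma ORDER BY n DESC, lemma"
    )
    return conn.execute(query).fetchall()
```

once the table is filled, any sqlite client (or R via a database package) could run frequency or concordance queries on it, which is the sense in which such a database makes the analysis independent of the lemmatization service.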

class workflow/wrapup

About

HUB / C.W.Pfaff / exploring and archiving multilingual corpora essais
