Skip to content

Commit

Permalink
Handle non-ASCII characters in .TextGrid files
Browse files Browse the repository at this point in the history
  • Loading branch information
patrickreidy committed Jul 11, 2018
1 parent dd151e5 commit 790972e
Show file tree
Hide file tree
Showing 10 changed files with 77 additions and 17 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ Description: The software application Praat can be used to annotate
Depends:
R (>= 3.2.3)
Imports:
methods
methods, readr
Suggests:
testthat
License: GPL-3
Expand Down
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ exportClasses(PointTier)
exportClasses(TextGrid)
exportClasses(Tier)
exportMethods(TextGrid)
import(methods)
import(readr)
importFrom(methods,new)
importFrom(methods,setClass)
importFrom(methods,setGeneric)
Expand Down
13 changes: 12 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,20 @@
# v1.0.1.9001
# v1.0.2

**_Current development version on Github_**

* `encoding` argument added to `TextGrid()` constructor in order to handle
non-ASCII characters in .TextGrid files, which is useful for IPA symbols and
symbols from the world's languages. By default (`encoding = NULL`), the
encoding of file is guessed using `readr::guess_encoding`; however, this
can be overridden by identifying the encoding explicitly (e.g.,
`encoding = "UTF-8"` or `encoding "UTF-16BE"`).


# v1.0.1.9001

* `writeTextGrid` function for `TextGrid`s.


# v1.0.1.9000

* `length` methods for `IntervalTier`s and `PointTier`s.
Expand Down
17 changes: 13 additions & 4 deletions R/TextGrid-constructor.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,25 @@ NULL
#'
#' @param textGrid A character vector
#' @param ... optional arguments for multiple dispatch (in development).
#' @param encoding The character encoding of the .TextGrid file. If \code{NULL},
#' then the encoding of the file is guessed using \code{\link[readr]{guess_encoding}}.
#' Plausible encodings that might be used by Praat are \code{"ASCII"},
#' \code{"UTF-8"}, or \code{"UTF-16BE"} (if non-ASCII characters occur in
#' the TextGrid).
#'
#' @return A \code{\link[=TextGrid-class]{TextGrid}} object.
#'
#' @section Details for signature \code{c(textGrid = 'character')}:
#' If \code{textGrid} is a string (i.e., a character vector with
#' \code{length(textGrid)=1}), then it is assumed that the \code{textGrid}
#' argument is the path to a \code{.TextGrid} file. Otherwise, the
#' \code{textGrid} argument is assumed to be a character vector whose
#' elements are the lines of some \code{.TextGrid} file.
#'
#' @name TextGrid-constructor
#' @aliases TextGrid
#' @seealso \code{\link{TextGrid-class}}, \code{\link{TextGrid-accessors}}
#' @export
#' @importFrom methods setGeneric
setGeneric(
name = 'TextGrid',
def = function(textGrid, ...)
Expand All @@ -29,13 +36,15 @@ setGeneric(

#' @rdname TextGrid-constructor
#' @export
#' @importFrom methods setMethod new
setMethod(
f = 'TextGrid',
sig = c(textGrid = 'character'),
def = function(textGrid) {
def = function(textGrid, encoding = NULL) {
if (length(textGrid) == 1) {
.textgrid <- .ReadPraatFile(textGrid)
if (is.null(encoding)) {
encoding <- readr::guess_encoding(textGrid)[[1, "encoding"]]
}
.textgrid <- .ReadPraatFile(textGrid, encoding)
} else {
.textgrid <- textGrid
}
Expand Down
7 changes: 5 additions & 2 deletions R/TextGrid-utilities.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,15 @@ NULL

# Read the contents of @file, returning a character vector whose elements are
# the lines of @file.
.ReadPraatFile <- function(file) {
.praat_text <- readLines(file)
.ReadPraatFile <- function(file, encoding) {
.con <- file(file, open = "rt", encoding = encoding)
.praat_text <- readLines(.con)
close(.con)
return(.praat_text)
}



# For extracting the start time and end time from the header.
.TextGridTime <- function(praatText, pattern) {
.time <- .Extract(.PraatLines(praatText, pattern),
Expand Down
3 changes: 3 additions & 0 deletions R/textgRid.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@
#' S4 classes, generics, and methods for accessing information that is stored
#' in Praat TextGrid objects.
#'
#' @import methods
#' @import readr
#'
#' @section S4 classes:
#' \code{\link[=Tier-class]{Tier}},
#' \code{\link[=IntervalTier-class]{IntervalTier}},
Expand Down
26 changes: 26 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,32 @@ as.data.frame(textgrid)
writeTextGrid(textgrid, path = 'test_out.TextGrid')
```

#### Read a TextGrid that contains non-ASCII characters.
```r
# Guess the encoding.
nonASCII <- TextGrid(system.file('extdata', 'nonASCII.TextGrid', package = 'textgRid'),
encoding = NULL)

# Or, explicitly provide the (correct) encoding.
nonASCII <- TextGrid(system.file('extdata', 'nonASCII.TextGrid', package = 'textgRid'),
encoding = "UTF-16BE")

# An error occurs if the provided encoding is incorrect.
TextGrid(system.file('extdata', 'nonASCII.TextGrid', package = 'textgRid'),
encoding = "UTF-8")

# Coerce the TextGrid to a data.frame.
as.data.frame(nonASCII)[1:2, ]
# TierNumber TierName TierType Index StartTime EndTime Label
# 1 1 Bengali IntervalTier 1 0 1 চকলেট এবং চিনাবাদাম মাখন
# 2 2 Chinese IntervalTier 1 0 1 巧克力和花生醬

# Non-ASCII characters can be used as patterns in searches.
findIntervals(nonASCII$Bengali, pattern = "চকলেট")
# Index StartTime EndTime Label
# 1 1 0 1 চকলেট এবং চিনাবাদাম মাখন
```

## Details on S4 classes

The textgRid package defines four S4 classes, whose slots and accessors are
Expand Down
Binary file added inst/extdata/nonASCII.TextGrid
Binary file not shown.
8 changes: 7 additions & 1 deletion man/TextGrid-constructor.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 8 additions & 8 deletions tests/testthat/test-TextGrid.R
Original file line number Diff line number Diff line change
Expand Up @@ -38,18 +38,18 @@ test_that('TextGrid accessors return slot-values', {
object = textGridStartTime(TextGrid(.textgrid_file)),
expected = TextGrid(.textgrid_file)@startTime
)
expect_equal(
object = textGridStartTime(TextGrid(readLines(.textgrid_file))),
expected = TextGrid(readLines(.textgrid_file))@startTime
)
# expect_equal(
# object = textGridStartTime(TextGrid(readLines(.textgrid_file))),
# expected = TextGrid(readLines(.textgrid_file))@startTime
# )
expect_equal(
object = textGridEndTime(TextGrid(.textgrid_file)),
expected = TextGrid(.textgrid_file)@endTime
)
expect_equal(
object = textGridEndTime(TextGrid(readLines(.textgrid_file))),
expected = TextGrid(readLines(.textgrid_file))@endTime
)
# expect_equal(
# object = textGridEndTime(TextGrid(readLines(.textgrid_file))),
# expected = TextGrid(readLines(.textgrid_file))@endTime
# )
})


Expand Down

0 comments on commit 790972e

Please sign in to comment.