This repository contains a dataset of speech and text features extracted from the International Corpus Network of Asian Learners of English (ICNALE)

senisioi/speech-text-features

Comparing Speech and Text Classification of Native and Non-native English Speakers

Description:

  • the data is stored in libsvm format, with one file for each pair of native languages
  • speech.ftrs and function_words.ftrs list the features together with the integer index that represents each of them in the libsvm files
  • the first line of each pair file is a libsvm comment that indicates the labels of the two native languages in that file; it has the form: 1 1:2 2:1 # , LANG_1, LANG_2
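A minimal Python sketch of reading one pair file, assuming each line follows the usual libsvm layout (label idx:value ...) with an optional trailing # comment. The helper names parse_libsvm_line and read_pair_file are hypothetical, the sketch treats the first line purely as a header (it does not interpret its numeric part), and LANG_1/LANG_2 are the placeholders from the description above, not real language codes.

```python
from typing import Dict, Iterable, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[int, Dict[int, float], str]:
    """Split one libsvm line into (label, {feature_index: value}, comment)."""
    data, _, comment = line.partition("#")
    tokens = data.split()
    label = int(float(tokens[0]))
    features = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":")
        features[int(idx)] = float(val)
    return label, features, comment.strip()


def read_pair_file(lines: Iterable[str]):
    """Return (language names from the header comment, list of (label, features)).

    Assumption: the first non-empty line is only a header and is not
    returned as a training instance.
    """
    lines = [ln for ln in lines if ln.strip()]
    _, _, header_comment = parse_libsvm_line(lines[0])
    languages = [name.strip() for name in header_comment.split(",") if name.strip()]
    instances = [parse_libsvm_line(ln)[:2] for ln in lines[1:]]
    return languages, instances


# Toy input using the placeholder header from the description (feature values are made up).
example = ["1 1:2 2:1 # , LANG_1, LANG_2", "1 3:0.5 7:1", "2 1:0.25"]
languages, instances = read_pair_file(example)
```

With a real file, read_pair_file(open(path)) works the same way, since it accepts any iterable of lines.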
Speech:
  • the speech files are split into 2-second chunks
  • each class (native language) is represented by an equal number of randomly sampled chunks
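The chunking and balancing steps could look roughly like the sketch below. It rests on assumptions the README does not state: chunks are consecutive and non-overlapping, a trailing remainder shorter than 2 seconds is dropped, and balancing keeps as many chunks per class as the smallest class has. The functions split_into_chunks and balance_classes are hypothetical, and the extraction of the speech features listed in speech.ftrs is not shown.

```python
import random
from typing import Dict, List, Sequence


def split_into_chunks(samples: Sequence[float], sample_rate: int,
                      seconds: float = 2.0) -> List[Sequence[float]]:
    """Cut a signal into consecutive, non-overlapping chunks of `seconds` length.

    Assumption: a trailing remainder shorter than one chunk is dropped.
    """
    size = int(sample_rate * seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def balance_classes(chunks_by_lang: Dict[str, list], seed: int = 0) -> Dict[str, list]:
    """Randomly sample the same number of chunks for every class (the size of the smallest class)."""
    rng = random.Random(seed)
    n = min(len(chunks) for chunks in chunks_by_lang.values())
    return {lang: rng.sample(chunks, n) for lang, chunks in chunks_by_lang.items()}


# Toy signals at a hypothetical 4 Hz sample rate, so one 2-second chunk = 8 samples.
chunks = {
    "LANG_1": split_into_chunks(list(range(20)), sample_rate=4),
    "LANG_2": split_into_chunks(list(range(33)), sample_rate=4),
}
balanced = balance_classes(chunks)
```

A fixed seed keeps the random selection reproducible across runs.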
Text:
  • the text files are short (~110 words per file) and are used as they are
  • except for lowercasing and tokenization, no preprocessing was done on these files
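A sketch of the text side, assuming a simple regex tokenizer and raw function-word counts as feature values. The actual tokenizer and feature weighting are not stated here; tokenize and to_libsvm are hypothetical helpers, and the four-word feature_index below is a hypothetical stand-in for the mapping in function_words.ftrs.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (a simple regex stand-in for the unspecified tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def to_libsvm(label: int, tokens: List[str], feature_index: Dict[str, int]) -> str:
    """Encode raw function-word counts as a libsvm line, features in increasing index order."""
    counts = Counter(t for t in tokens if t in feature_index)
    pairs = sorted((feature_index[w], c) for w, c in counts.items())
    return " ".join([str(label)] + [f"{i}:{c}" for i, c in pairs])


# Hypothetical mapping; the real feature-to-index mapping is in function_words.ftrs.
feature_index = {"the": 1, "of": 2, "and": 3, "it": 4}
tokens = tokenize("The cost of living and the price of it.")
line = to_libsvm(1, tokens, feature_index)  # "1 1:2 2:2 3:1 4:1"
```

Words outside the function-word list (here "cost", "living", "price") are simply ignored, which keeps the representation independent of topic vocabulary.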

For more details about this dataset, email sergiu nisioi gmail dot com