
Comparing Speech and Text Classification of Native and Non-native English Speakers

This repository contains a dataset of speech and text features extracted from the International Corpus Network of Asian Learners of English (ICNALE).

Description:

  • the data is in libsvm format, with one file for each pair of native languages
  • speech.ftrs and function_words.ftrs list the features together with the integer index each one maps to in the libsvm representation
  • the first line in each pair file carries a libsvm comment of the form `1 1:2 2:1 # , LANG_1, LANG_2`, which indicates the labels of the two native languages used
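
As an illustration of the layout described above, here is a minimal sketch of reading one pair file in Python. The function names (`parse_header`, `parse_sample`, `read_pair_file`) are hypothetical and not part of this repository. The sketch assumes that the first line serves only as a header and that every later line is an ordinary libsvm sample with an integer label; check both assumptions against the actual files.

```python
from typing import Dict, List, Tuple


def parse_header(line: str) -> Tuple[str, str]:
    """Return the two language names from the comment on the first line."""
    # Everything after '#' is the comment, e.g. " , LANG_1, LANG_2".
    _, _, comment = line.partition("#")
    names = [part.strip() for part in comment.split(",") if part.strip()]
    if len(names) != 2:
        raise ValueError(f"expected two language names, got {names!r}")
    return names[0], names[1]


def parse_sample(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one libsvm line of the form 'label index:value index:value ...'."""
    # Drop any trailing comment before splitting into fields.
    fields = line.split("#", 1)[0].split()
    label = int(fields[0])  # assumption: labels are integers
    features: Dict[int, float] = {}
    for item in fields[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def read_pair_file(path: str) -> Tuple[Tuple[str, str], List[Tuple[int, Dict[int, float]]]]:
    """Read a pair file: header languages first, then all samples."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    languages = parse_header(lines[0])
    # Assumption: the header line is not itself a training sample.
    samples = [parse_sample(line) for line in lines[1:]]
    return languages, samples
```

The feature indices returned by `parse_sample` can then be looked up in speech.ftrs or function_words.ftrs; the exact layout of those two files is not described here, so no parser for them is sketched.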
Speech:
  • the speech files are split into 2-second chunks
  • each class (native language) is represented by an equal number of randomly sampled chunks
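
The chunking and balancing steps above can be sketched as follows. This is not the code used to build the dataset: the helper names are hypothetical, and the sketch assumes that trailing audio shorter than a full chunk is dropped and that every class is reduced to the size of the smallest class. Neither detail is stated in this README.

```python
import random
from typing import Dict, List, Sequence


def split_into_chunks(samples: Sequence[float], sample_rate: int,
                      seconds: float = 2.0) -> List[Sequence[float]]:
    """Cut a signal into consecutive fixed-length chunks."""
    size = int(sample_rate * seconds)
    if size <= 0:
        raise ValueError("chunk size must be positive")
    # Assumption: a trailing remainder shorter than one chunk is discarded.
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, size)]


def balance_classes(chunks_by_class: Dict[str, List[Sequence[float]]],
                    seed: int = 0) -> Dict[str, List[Sequence[float]]]:
    """Randomly keep the same number of chunks for every class."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Assumption: the common count is the size of the smallest class.
    count = min(len(chunks) for chunks in chunks_by_class.values())
    return {label: rng.sample(chunks, count)
            for label, chunks in chunks_by_class.items()}
```

For example, a 10-sample signal at a (toy) sample rate of 2 Hz yields two 4-sample chunks, and the last two samples are discarded.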
Text:
  • the text files are short (~110 words per file) and are used as they are
  • no preprocessing was applied to these files apart from lowercasing and tokenization

For more details about this dataset, contact sergiu nisioi gmail dot com.
