nilc-nlp/nurc-sp

NURC/SP

The NURC Project, started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, compiled a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries totaling 334 hours of recordings made in the city of São Paulo.

The purpose of this repository is to provide resources for NURC/SP analysis and research projects.

Research

We evaluated a Minimum Corpus (MC) of 21 NURC/SP inquiries in the paper "Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Model". For further details, see the cm_analysis directory.

NURC/SP Minimum Corpus (MC)

CORAA NURC-SP Minimal Corpus is a manually annotated corpus of spontaneous Brazilian Portuguese speech (São Paulo variety). The corpus is a subset of the NURC (‘Cultured Linguistic Urban Norm’) project collection, one of the most influential in Brazilian Linguistics. It was brought to digital life by TaRSiLa, a project that aims to build large multi-purpose datasets for speech processing (ASR, TTS, and Sentiment Analysis). It comprises 21 audio files and audio-aligned multilevel transcripts segmented into linguistically motivated intonation units. For further details, see the dataset website.
