NURC/SP

The NURC Project, which started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, compiled a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries totaling 334 hours of recordings made in the city of São Paulo.

The purpose of this repository is to provide resources for NURC/SP analysis and research projects.

Research

We evaluated a Minimum Corpus (MC) of 21 NURC/SP inquiries in the paper "Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Model". For further details, see the cm_analysis directory.

NURC/SP Minimum Corpus (MC)

CORAA NURC-SP Minimal Corpus is a manually annotated corpus of spontaneous Brazilian Portuguese speech (São Paulo variety). The corpus is a subset of the collection of the NURC (‘Cultured Linguistic Urban Norm’) project, one of the most influential in Brazilian Linguistics. The corpus was brought to digital life by TaRSiLa, a project that aims to build large multi-purpose datasets for speech processing (ASR, TTS, and Sentiment Analysis). It comprises 21 audio files and multilevel transcripts aligned to the audio according to linguistically motivated intonation units. For further details, see the dataset website.

