SUD_Naija-NSC

Summary

A Surface Syntactic Universal Dependencies corpus for spoken Naija (Nigerian Pidgin).

Introduction

The corpus is based on dialogues and monologues and comprises 9,242 sentences and 140,729 tokens.

Sentences are annotated with the following metadata:

sent_id (which also indicates the sample file)
text
text_en (English translation)
text_ortho (A simplified version of text where macrosyntactic annotation has been replaced by standard punctuation)
speaker_id (from the NaijaSynCor Metadata)
sound_url (link to the corresponding sound file; the AlignBegin and AlignEnd features give the positions, in milliseconds, within that sound file)
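As an illustration, the metadata fields above can be read from a sample file with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the fields appear as standard CoNLL-U comment lines (`# key = value`) and that AlignBegin and AlignEnd are stored in the MISC column (the tenth column) of each token line. The sentence, identifiers and URL in SAMPLE are invented for the example and are not taken from the corpus.

```python
def read_sentences(conllu_text):
    """Yield (metadata, tokens) for each blank-line-separated sentence block."""
    for block in conllu_text.strip().split("\n\n"):
        meta, tokens = {}, []
        for line in block.splitlines():
            if line.startswith("#"):
                # Comment line: "# key = value" (split on the first "=" only).
                if "=" in line:
                    key, value = line[1:].split("=", 1)
                    meta[key.strip()] = value.strip()
            elif line.strip():
                cols = line.split("\t")
                # Assumption: alignment features live in MISC (column 10).
                misc = dict(
                    item.split("=", 1)
                    for item in cols[9].split("|")
                    if "=" in item
                )
                tokens.append({
                    "form": cols[1],
                    "begin_ms": int(misc["AlignBegin"]) if "AlignBegin" in misc else None,
                    "end_ms": int(misc["AlignEnd"]) if "AlignEnd" in misc else None,
                })
        yield meta, tokens


# Invented sample block (hypothetical values, for illustration only).
SAMPLE = (
    "# sent_id = EXAMPLE_01_Demo_MG__1\n"
    "# text = we go market //\n"
    "# text_en = We went to the market.\n"
    "# text_ortho = we go market.\n"
    "# speaker_id = Sp000\n"
    "# sound_url = http://example.org/EXAMPLE_01_Demo_MG.wav\n"
    "1\twe\t_\t_\t_\t_\t_\t_\t_\tAlignBegin=100|AlignEnd=250\n"
    "2\tgo\t_\t_\t_\t_\t_\t_\t_\tAlignBegin=250|AlignEnd=400\n"
    "3\tmarket\t_\t_\t_\t_\t_\t_\t_\tAlignBegin=400|AlignEnd=900\n"
)

if __name__ == "__main__":
    for meta, tokens in read_sentences(SAMPLE):
        start, end = tokens[0]["begin_ms"], tokens[-1]["end_ms"]
        print(meta["sent_id"], meta["sound_url"], start, end)
```

The first AlignBegin and the last AlignEnd of a sentence delimit the stretch of the sound file to play for that sentence, under the assumptions stated above.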

Structure

Lexical words have mostly been transcribed following English spelling conventions. Grammatical words have been transcribed following conventions agreed upon by the annotators.

The text is segmented into illocutionary units. The end of each illocutionary unit is indicated by a double slash (//). The sentence nucleus, which contains the predicate, is separated from left-dislocated units by a "less than" sign (<) and from right-dislocated units by a "greater than" sign (>). Paradigmatic lists (coordinations, appositions, and disfluencies) are marked with curly brackets, each conjunct being separated by the pipe symbol (|). Further details can be found in the "Macrosyntactic annotation guide".
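The markup described above can be unpacked with simple string operations. The following is a rough sketch only: it handles just the four symbols mentioned here (//, <, >, and { | }), assumes they are not nested inside one another, and ignores any further symbols defined in the annotation guide. The function names and the example string are invented for illustration.

```python
import re


def illocutionary_units(text):
    """Split a transcription into illocutionary units at each '//'."""
    return [unit.strip() for unit in text.split("//") if unit.strip()]


def split_dislocations(unit):
    """Return (left_dislocated, nucleus, right_dislocated) for one unit.

    Everything before a '<' is treated as left-dislocated; everything
    after a '>' is treated as right-dislocated.
    """
    *left, rest = unit.split("<")
    nucleus, *right = rest.split(">")
    return (
        [part.strip() for part in left],
        nucleus.strip(),
        [part.strip() for part in right],
    )


def list_conjuncts(unit):
    """Return the conjuncts of every non-nested { a | b | ... } list."""
    return [
        [conjunct.strip() for conjunct in body.split("|")]
        for body in re.findall(r"\{([^{}]*)\}", unit)
    ]


if __name__ == "__main__":
    # Invented example string, not taken from the corpus.
    example = "that boy < he buy { rice | beans } > my brother // he go //"
    for unit in illocutionary_units(example):
        print(split_dislocations(unit), list_conjuncts(unit))
```

Keeping the three steps separate makes it possible to segment a text into units without committing to any particular treatment of dislocations or lists.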

The treebank is developed natively in SUD (https://surfacesyntacticud.github.io/) and is converted automatically into Universal Dependencies (UD).

Acknowledgments

The treebank was created within the NaijaSynCor project, directed by Bernard Caron and funded by the ANR, the French National Research Agency.

This corpus is a pilot for the larger corpus developed as part of the NaijaSynCor Project (Projet-ANR-16-CE27-0007). Its main aim is to develop and test the annotation scheme and procedures used in the ANR project. It will be part of a larger 500,000-word corpus that will be projected on prosodic and information structures and analysed for sociolinguistic variation (http://naijasyncor.huma-num.fr/).

The pilot corpus was recorded in various locations in Ibadan (Nigeria) by Bukola Babalola and Opeyemi Lewis. It was transcribed, translated and tagged manually using Elan-Corpa (http://llacan.vjf.cnrs.fr/res_ELAN-CorpA_en.php) by Folakemi Ladoja, Emeka Onwuegbuzia, Biola Oyelere and Samson Tella under the supervision of Bernard Caron. It was converted to CoNLL by Mourad Aouini. The first annotations were done by Marine Courtin and Sandra Bellato, who developed the guidelines under the supervision of Sylvain Kahane, Bernard Caron, and Kim Gerdes. The final Universal Dependencies annotations were manually checked by Chika Kennedy Ajede, Emeka Onwuegbuzia, and Samson Tella under the supervision of Bernard Caron, using the processing chain developed by Kim Gerdes and Bruno Guillaume, based on the Arborator (https://arborator.ilpga.fr) and Grew (http://grew.fr). Marine Courtin, Kim Gerdes, Bruno Guillaume, Kirian Guillier, Sylvain Kahane, Mariam Nakhlé, Yuchen Song, Emmett Strickland, and Manying Zhang helped in the correction process.

