2395383

<h1><i>'Spikk Looder'</i>: Reviving the Shetland Dialect with AI</h1>

<h2>Concept</h2>

<p>‘Spikkbot’ is an AI-powered chatbot designed to promote and preserve the Shetland dialect through engaging, conversational interactions. It uses a curated dataset of Shetland speech and phrases to respond to user input in a natural, friendly way. Inspired by the value of local heritage and the importance of accessible cultural preservation, ‘Spikkbot’ aims to engage users of all ages in learning and using the dialect confidently and authentically. 'Spikk' translates to 'Speak'.

The data will be sourced from a mixture of places: dialect-speaking locals; experts in Shetland dialect heritage, who can advise on pronunciation, idioms, and spellings; and existing resources such as the online Shetland Dictionary, the Shetland Museum and Archives, and research papers. This ensures that ‘Spikkbot’ reflects the richness and diversity of Shetland’s oral and written history. Unlike general-purpose chatbots, ‘Spikkbot’ is trained to understand and generate dialect-specific grammar, idioms, speech, and pronunciation, creating an educational and engaging experience for the user.

A human-centred design (HCD) approach was at the core of the development process. The project began by identifying a meaningful gap: Shetland dialect is often inaccessible or inactive in everyday life. In line with step one of HCD, the aim is not just to create a novelty piece of technology but to develop a tool that is shaped by, and built for, the people it serves. Additionally, the tool aims to align with the Scottish AI Strategy 2021.
</p>

<h2>User Flow Example</h2>

<p><b>Scenario:</b> A young Shetlander wants to learn a phrase that their deceased grandparent used regularly.

<b>User types:</b> “What does ‘an den dae made tae’ mean?”

<b>Spikkbot responds:</b> “It means ‘and then they made tea’. It is an anecdotal term that is often used to finish a story, to poke fun at the number of times Shetlanders drink tea, which is before, during and after everything. Would you like to hear a pronunciation?”

The user clicks ‘yes’, and the audio plays in an authentic dialect voice. Spikkbot then offers to help them practise the phrase. The user agrees, repeats the phrase out loud, and receives tailored feedback on their pronunciation.
</p>


<h2>Human Role</h2>

<p>Human involvement is key to the development of AI bots and is particularly critical for Spikkbot. The project is not just about encoding dialect but about building human judgement and voice into the system. Because the project aims to empower and revive an endangered dialect, it is essential to involve the community in its development. Locals and experts will therefore help to create the dataset used to train the model.

Drawing on the HCD framework, the process emphasises:

<ul>
<li> Collaboration with diverse community members. Because Shetland dialect varies from place to place, it is important to include speakers with varying levels of fluency to inform the bot.</li>
<li> Reflection on whether AI is the right solution for this issue. Non-AI options could be considered, but a generative model offers an engaging and interactive user experience that aligns with the goals of the project.</li>
</ul>

Throughout the development of the project, human oversight will be essential. The dataset is not only reviewed by humans for accuracy and tone but is also open to public feedback. Users can submit issues, suggested changes, or any challenges they faced while using the bot. This follows step five of HCD, which emphasises the importance of user agency in correcting or questioning AI systems.
</p>

<h2>AI Components</h2>

<p>At the core of the technology is a large language model (LLM). It is trained on Shetland dialect text and audio and is aligned with standard spoken English, so that it can understand, interpret, and generate Shetland dialect. This requires the AI components to be carefully designed to work together, creating natural interactions between the technology and the user.

The conversational component of the LLM will respond in dialect and explain the pronunciation of phrases and words. This contrasts with a generic chatbot built on a model such as GPT-4, which may not be fine-tuned to specific dialects. Fine-tuning will allow ‘Spikkbot’ to handle dialect-specific grammar and vocabulary, and involving locals in training helps to keep responses authentic.

The speech recognition component is proposed to be built on OpenAI’s ‘Whisper’. Whisper is an automatic speech recognition (ASR) system trained on multilingual data, which can transcribe speech and translate it into English. It can be run from Python after setting up the environment with pip, as follows: </p>


In [1]:
!pip install git+https://github.com/openai/whisper.git 
!pip install torch


Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-h2n62urk
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-h2n62urk
  Resolved https://github.com/openai/whisper.git to commit 517a43ecd132a2089d85f4ebc044728a71d49f6e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting more-itertools (from openai-whisper==20240930)
  Downloading more_itertools-10.6.0-py3-none-any.whl.metadata (37 kB)
Collecting numba (from openai-whisper==20240930)
  Downloading numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.8 kB)
Collecting tiktoken (from openai-whisper==20240930)
  Downloading tiktoken-0.9.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting tqdm (from openai-whisper=
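
Once the install completes, a transcription call might look like the sketch below. It assumes the `openai-whisper` package and the `ffmpeg` command-line tool are available, and the file path in the usage comment is hypothetical. The stock `base` checkpoint has not been fine-tuned on Shetland dialect, so its output should be treated only as a baseline.

```python
def transcribe_recording(audio_path, model_name="base"):
    """Transcribe one audio file with a stock Whisper checkpoint."""
    # Imported lazily so this cell can be defined before Whisper is installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    # language="en" is an assumption: Whisper has no Shetland dialect setting.
    result = model.transcribe(audio_path, language="en")
    return result["text"].strip()


# Example usage (hypothetical file path):
# print(transcribe_recording("recordings/sample.wav"))
```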

<p>‘Spikkbot’ will also feature a Shetland-accented voice based on text-to-speech (TTS) technology. This may use Tacotron 2, a two-part neural network that converts written text into spoken audio. Trained on high-quality audio paired with exact transcriptions, it will allow Spikkbot to speak in authentic dialect. The supporting Python environment can be set up with pip and apt-get, as follows: </p>

In [None]:
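# Audio and text-processing dependencies (this does not install Tacotron 2 itself)
!pip install numpy scipy librosa unidecode inflect
# System library needed for reading and writing audio files
!apt-get update
!apt-get install -y libsndfile1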
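
The cell above installs supporting audio libraries rather than Tacotron 2 itself. Whichever implementation is eventually chosen, training needs each recording paired with its exact transcription. The sketch below checks such a pairing, assuming a pipe-separated `audio_path|transcript` manifest (a layout used by several open-source Tacotron 2 training scripts). The function name and the sample rows are hypothetical.

```python
def validate_manifest(lines):
    """Split manifest rows into usable (path, transcript) pairs and a list of problems."""
    pairs, problems = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank rows
        path, sep, transcript = line.partition("|")
        path, transcript = path.strip(), transcript.strip()
        if not sep:
            problems.append((number, "missing '|' separator"))
        elif not path.lower().endswith(".wav"):
            problems.append((number, "audio path is not a .wav file"))
        elif not transcript:
            problems.append((number, "empty transcript"))
        else:
            pairs.append((path, transcript))
    return pairs, problems


# Hypothetical rows: one valid, one with no transcript, one in the wrong format.
sample = [
    "audio/0001.wav|an den dae made tae",
    "audio/0002.wav|",
    "audio/0003.mp3|spikk looder",
]
pairs, problems = validate_manifest(sample)
print(pairs)
print(problems)
```

Rows that fail the check can be passed back to community reviewers instead of being silently dropped.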


<p>‘Spikkbot’ will then need a dialogue management system to keep back-and-forth interactions between the user and the chatbot consistent. It will remember earlier questions and help the conversation flow naturally and meaningfully. For example, if a user mispronounces a word, the system might offer a second attempt or break the word down for them.
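
A minimal sketch of this retry behaviour is shown below. It assumes a pronunciation score between 0 and 1 is supplied by some other component. The class name, the 0.8 pass mark, and the two-attempt limit are illustrative choices, not part of the project specification.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Keeps conversation history and counts failed practice attempts per phrase."""
    history: list = field(default_factory=list)
    attempts: dict = field(default_factory=dict)

    def record_turn(self, speaker, text):
        self.history.append((speaker, text))

    def next_action(self, phrase, score, pass_mark=0.8, max_retries=2):
        """Decide how to respond to a pronunciation attempt scored in [0, 1]."""
        if score >= pass_mark:
            self.attempts.pop(phrase, None)  # reset the counter on success
            return "praise"
        self.attempts[phrase] = self.attempts.get(phrase, 0) + 1
        if self.attempts[phrase] < max_retries:
            return "offer_retry"
        return "break_down_word"


state = DialogueState()
state.record_turn("user", "What does 'an den dae made tae' mean?")
print(state.next_action("made tae", 0.4))  # first miss
print(state.next_action("made tae", 0.5))  # second miss
print(state.next_action("made tae", 0.9))  # success
```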

Finally, ‘Spikkbot’ needs a learning tracker to support learning over time and to create personalised learning plans. This aligns with the overall aim of the project, which is mainly about engagement and encouraging the use of AI applications in language preservation.
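
One simple way to track progress is to store each practice result per phrase and surface the weakest phrases first. The sketch below assumes results arrive as pass/fail outcomes. The class and method names are hypothetical.

```python
from collections import defaultdict


class LearningTracker:
    """Stores pass/fail practice results and suggests what to revisit."""

    def __init__(self):
        self.results = defaultdict(list)  # phrase -> list of True/False outcomes

    def log(self, phrase, passed):
        self.results[phrase].append(bool(passed))

    def success_rate(self, phrase):
        outcomes = self.results.get(phrase, [])
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def review_plan(self, limit=3):
        """Return up to `limit` practised phrases, lowest success rate first."""
        return sorted(self.results, key=self.success_rate)[:limit]


tracker = LearningTracker()
tracker.log("an den dae made tae", True)
tracker.log("an den dae made tae", False)
tracker.log("spikk looder", True)
print(tracker.review_plan())
```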

<b>User Input (Speech or Text) → ASR (if speech) → LLM → TTS (if audio) → Feedback Engine</b>
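
The flow above can be sketched as a chain of functions. Every stage here is a placeholder stub standing in for the real ASR, LLM, and TTS components, and the feedback engine is left out for brevity. The function names and return values are hypothetical and only illustrate how text and speech inputs would be routed.

```python
def run_pipeline(user_input, is_speech, want_audio, asr, llm, tts):
    """Route one user turn through ASR (if speech), the LLM, and TTS (if audio)."""
    stages = []
    text = user_input
    if is_speech:
        text = asr(user_input)
        stages.append("ASR")
    reply = llm(text)
    stages.append("LLM")
    audio = None
    if want_audio:
        audio = tts(reply)
        stages.append("TTS")
    return {"reply": reply, "audio": audio, "stages": stages}


# Placeholder stages so the routing can be exercised without any models.
def fake_asr(audio):
    return "transcribed speech"


def fake_llm(text):
    return f"reply to: {text}"


def fake_tts(text):
    return b"audio-bytes"


typed = run_pipeline("What does 'spikk' mean?", False, False, fake_asr, fake_llm, fake_tts)
spoken = run_pipeline(b"raw-audio", True, True, fake_asr, fake_llm, fake_tts)
print(typed["stages"])
print(spoken["stages"])
```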

To evaluate the accuracy of Spikkbot’s speech recognition component, Word Error Rate (WER) can be used. WER measures how closely the transcribed text matches a known reference transcript: it counts the substitutions, insertions, and deletions required to transform the model’s output into the correct text and divides that count by the number of words in the reference. This metric is valuable in a project like Spikkbot because dialects often contain unfamiliar words and pronunciations that can trip up generic models. By calculating WER on a curated dataset of Shetland dialect recordings with accurate transcriptions, the project team can track improvements as the model is fine-tuned. A lower WER indicates better transcription performance, and examining the errors helps to identify which dialect words are commonly misrecognised, which can guide future model training.


In [5]:
!pip install jiwer


Collecting jiwer
  Downloading jiwer-3.1.0-py3-none-any.whl.metadata (2.6 kB)
Collecting click>=8.1.8 (from jiwer)
  Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting rapidfuzz>=3.9.7 (from jiwer)
  Downloading rapidfuzz-3.13.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading jiwer-3.1.0-py3-none-any.whl (22 kB)
Downloading click-8.1.8-py3-none-any.whl (98 kB)
Downloading rapidfuzz-3.13.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
   3.1/3.1 MB 73.3 MB/s eta 0:00:00
Installing collected packages: rapidfuzz, click, jiwer
Successfully installed click-8.1.8 jiwer-3.1.0 rapidfuzz-3.13.0


In [6]:
from jiwer import wer

# Reference (correct) transcription
reference = "I am going to the shop to get some bannocks"

# Hypothesis (transcription generated by your model)
hypothesis = "I am going to shop to get some bannock"

# Calculate WER
error = wer(reference, hypothesis)

print(f"Word Error Rate: {error:.2%}")


Word Error Rate: 20.00%


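A WER of 20% means that two edits were needed across the ten reference words: one deletion ('the') and one substitution ('bannocks' became 'bannock'). Loosely, about 80% of the reference words were transcribed correctly, although WER is not strictly an accuracy score because insertions can push it above 100%. The hypothesis here was written by hand for illustration, but a similar result from a real model might suggest adding more Shetland words to the dataset so that the model recognises them accurately.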
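
To see which words drive the error rate, the reference and hypothesis can be aligned word by word. The sketch below uses Python's standard `difflib` module as a stand-in aligner. It is not guaranteed to produce the same minimum-edit alignment that WER tools use, so it is best treated as a quick diagnostic.

```python
from collections import Counter
from difflib import SequenceMatcher


def missed_words(reference, hypothesis):
    """Count reference words that were deleted or replaced in the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    missed = Counter()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            missed.update(ref[i1:i2])
    return missed


reference = "I am going to the shop to get some bannocks"
hypothesis = "I am going to shop to get some bannock"
print(missed_words(reference, hypothesis))
```

For the example sentences this flags 'the' and 'bannocks'. Summed over many recordings, the same counts would show which dialect words are most often misrecognised.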

<h2>Value</h2>

<p>This technology aligns with the goals of Scotland’s AI Strategy (2021). The design and purpose of ‘Spikkbot’ strongly align with the following principles:

<ul>

<li><b> Community and Individual Benefits:</b>
‘Spikkbot’ supports this primarily by empowering Shetland communities to preserve and promote their endangered dialect. Communities and individuals will benefit from a technology that publicises the dialect and encourages its use by people within and outwith the islands.</li>
<li><b> Building Trust and Transparency with AI:</b>
‘Spikkbot’ promotes trust and transparency of AI technologies by incorporating human input. As locals are encouraged to contribute to the training of the LLM, it will help to ensure accuracy of the bot.</li>
<li><b> Inclusive Digital Growth:</b>
The platform aims to support inclusive digital growth by providing tools that are particularly useful for rural users in Shetland, along with setting an example for other rural communities. Because it supports voice-based interaction, it is accessible to users with varying levels of digital literacy. Similarly, the text input option helps those with speech difficulties. The application will encourage intergenerational learning and participation, which is important as the dialect is declining among younger populations.</li>
<li><b> Community-Led Initiatives:</b>
This application aims to create a place-based digital innovation that is grounded in local culture and heritage. Additionally, it might be the first introduction to AI for some small communities, meaning it could be a key educational tool.</li>
<li><b> Educational Benefit:</b>
‘Spikkbot’ aims to educate people about the Shetland dialect and to create an interactive and engaging experience that will encourage people to use it in their daily lives, whether through everyday speech or through phrases and idioms.</li>

</ul>

</p>


<h2>Ethics</h2>

<p>The HCD framework was used to address the ethics of creating a project like ‘Spikkbot’. In line with steps three and six, potential harms of the project include perpetuating dialect stereotypes, possible bias, and a lack of human oversight.

This project aims to mitigate these by including humans at every stage of the development process, from incorporating locals and experts during dataset creation and training to gathering feedback on the finished bot. Additionally, ‘Spikkbot’ will be rigorously tested with a variety of Shetland voices and dialect variants, including speakers of different ages and genders. It is key not to misrepresent the diversity of the dialect.

Additionally, the project has been designed with a sustainable future in mind. It is important that Shetland dialect is preserved for future generations, so ‘Spikkbot’ meets a long-term need. It will be kept up to date based on user feedback and the evolution of the language.
</p>
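
Testing across different voices can be made concrete by reporting WER per speaker group rather than as one overall figure. The sketch below assumes each test recording carries a group label (for example an age band or district). The labels and sentences are invented purely for illustration, and the error count is a plain word-level edit distance written out so that the example has no dependencies.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two sentences."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis). Returns WER per group."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, reference, hypothesis in samples:
        errors[group] += word_errors(reference, hypothesis)
        words[group] += len(reference.split())
    return {group: errors[group] / words[group] for group in errors}


# Invented rows: group labels and transcripts are placeholders, not real results.
samples = [
    ("group A", "an den dae made tae", "an den dae made tae"),
    ("group B", "an den dae made tae", "and then they made tea"),
]
print(wer_by_group(samples))
```

A large gap between groups would indicate that more recordings from the weaker group are needed before the model represents the dialect fairly.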