<div align="center"><h1>Live Slide Creation for the Perfect Lecture</h1></div><br>


<table style="margin: 0 auto; width: 50%;">
<tr>
    <td style="text-align: center">Diego van der Mast</td>
    <td style="text-align: center">Marvin Ong</td>
    <td style="text-align: center">Yochem van Rosmalen</td>
</tr>
<tr>
    <td style="text-align: center">12222933</td>
    <td style="text-align: center">12096431</td>
    <td style="text-align: center">12223247</td>
</tr>
<tr>
    <td style="text-align: center"><a href="mailto:diego.vandermast@student.uva.nl">diego.vandermast@student.uva.nl</a></td>
    <td style="text-align: center"><a href="mailto:marvin.ong@student.uva.nl">marvin.ong@student.uva.nl</a></td>
    <td style="text-align: center"><a href="mailto:yochem.vanrosmalen@student.uva.nl">yochem.vanrosmalen@student.uva.nl</a></td>
</tr>
</table>

## Abstract

Creating presentation slides can be a time-consuming task. In this paper, a
solution is proposed that combines several NLP techniques, namely
speech-to-text, named entity recognition and noun phrase extraction, to create
slides with relevant text in real time. Pre-trained models were used for each
step in the process.

_Keywords: NLP, NER, Speech-to-text, POS, noun phrase, slides, lectures_

## Contents
1. Introduction
2. Method
3. Results
4. Discussion and conclusion
5. References

## 1. Introduction

Everyone who works in academia knows that teaching a course is not always fun
and exciting. Some tasks are quite boring, but they still have to be done. An
example of such a task is creating slides for lectures. Even though some see
slides as an art form in which someone can excel, most of us use slides to
help students take notes. If the slides are good, they can even be used as
_the lecture notes_, but oftentimes these slides contain either too much
information ("We will skip this slide because this is not a topic for this
course") or too little information (think of slides with just one keyword on
them). Such slides cannot be used as lecture notes. To overcome these issues,
a solution is proposed that combines several natural language processing (NLP)
techniques. NLP is a subfield of artificial intelligence that studies the
interaction between human language and computers. NLP is mainly concerned with
speech and text processing. Examples are Siri and Google Assistant, and
chatbots are increasingly used by firms to enhance the customer experience.
The main question of this report is how different NLP techniques can be used
together to create slides live. The pipeline proposed in this report creates
slides live using speech-to-text, named entity recognition (NER) and
noun phrase extraction. The program can be split into three steps, each
corresponding to a subquestion. Firstly, a speech-to-text model converts
microphone input to text live. Secondly, the text is run through the NER model
and the named entities are used to extract information from the text. Lastly,
noun phrase extraction is used to find noun phrases in the text and display
them on the slides.

In Section 2, the implementation of the three models is described, together
with the data and models used. Section 3 presents the results. Finally,
Section 4 discusses the results and draws a conclusion.

## 2. Method

In this section, the method used to create the live slide builder is
described. It is organized by pipeline step: for each of the three steps, the
data, the model and our own implementation are discussed.

```python
"""
WARNING: PyAudio requires the PortAudio header files to be installed. See this guide:
medium.com/@niveditha.itengineer/learn-how-to-setup-portaudio-and-pyaudio-in-ubuntu-to-play-with-speech-recognition-8d2fff660e94
"""
import collections
import time
from multiprocessing import Process

import noisereduce as nr
import numpy as np
import pandas as pd
import pyaudio
import spacy
import torch
# requires the NLTK 'stopwords' and 'punkt' data, e.g. downloaded once via nltk.download()
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

# pretrained wav2vec 2.0 (large, 960 h, Libri-Light + LibriSpeech, self-training)
my_tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
my_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

# small English spaCy pipeline, used for NER and noun chunks
nlp = spacy.load("en_core_web_sm")
```

### Step 1: Speech To Text

For the first part of the pipeline, the pretrained speech-to-text model
wav2vec 2.0 by Baevski, Zhou, Mohamed & Auli (2020) is used. Because the model
is pretrained, no training data was used on our side. As explained in
Baevski et al. (2020), wav2vec 2.0 has been trained several times with different amounts of data, leading to different results. Xu et al. (2020) show that its performance can be improved even further with self-training. For this project, the large model trained on 960 hours of audio from Libri-Light and LibriSpeech, combined with self-training, was chosen, as it performs best of these variants. [According to its Hugging Face page](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), it has a word error rate (WER) of 1.9% on the LibriSpeech "clean" test set and 3.9% on the "other" test set. A pretrained model was used for three reasons. Firstly, we did not have the computational resources to train a model of our own. Secondly, wav2vec 2.0 has state-of-the-art performance, which is hard to beat. Lastly, the ASR model must be very reliable, because its output is the input for steps 2 and 3 of the pipeline.

To make the model transcribe speech in real time, we built an interface that feeds it segments of speech received through the microphone. The interface collects the user's speech until they pause. Once this happens, the collected speech is passed to the model, which transcribes it. Waiting for the user to stop speaking ensures that the voice input is not cut off abruptly, which could lead to incorrect transcriptions. To reduce background noise, a denoiser is used before the collected speech audio is passed to the ASR model. A more technical explanation is provided below.

The audio from the microphone is received in chunks whose duration can be set as a parameter of the interface. Each chunk in which the user is speaking is appended to a buffer, which is kept until it is passed to the model, after which the buffer is emptied. For each audio chunk, the volume is measured as the root mean square (RMS) of its samples, $\text{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$ for the $N$ samples $x_i$ of the chunk. If the volume of the most recently received chunk drops below a silence threshold, indicating that the user has stopped speaking, the buffer is passed to the model. The silence threshold is estimated with a calibration sequence: the user speaks for five seconds and then stays silent for another five seconds. The RMS of each recording gives a speaking level and a silence level, and the midpoint between these two levels is used as the silence threshold.
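
The segmentation and calibration logic can be tried without a microphone. The sketch below is a simplified, standard-library-only version of what `liveASR` does, assuming audio arrives as lists of float samples: each chunk's RMS is compared to a threshold, loud chunks are buffered, and the first quiet chunk flushes the buffer as one utterance. The threshold is the midpoint between the RMS of a speaking recording and a silent recording. The names `rms`, `calibrate_threshold` and `segment_utterances` and the synthetic signals are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of a list of float samples: a measure of mean volume."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(speaking, silence):
    """Silence threshold halfway between the speaking and silence volumes."""
    return (rms(speaking) + rms(silence)) / 2


def segment_utterances(chunks, threshold):
    """Group consecutive loud chunks into utterances, split at the first quiet chunk.

    Audio still in the buffer when the chunks run out is not returned,
    mirroring how liveASR only transcribes after a pause.
    """
    utterances, buffer = [], []
    for chunk in chunks:
        if rms(chunk) > threshold:
            buffer.extend(chunk)        # user is speaking: keep collecting
        elif buffer:
            utterances.append(buffer)   # user paused: hand the buffer to the ASR model
            buffer = []
    return utterances


loud, quiet = [0.5, -0.5] * 4, [0.01, -0.01] * 4
threshold = calibrate_threshold(loud, quiet)   # roughly (0.5 + 0.01) / 2 = 0.255
stream = [loud, loud, quiet, loud, quiet, quiet]
print([len(u) for u in segment_utterances(stream, threshold)])  # [16, 8]
```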



```python
class liveASR:
    """Transcribes live microphone audio to text with a 🤗 model.

    The audio is transcribed live by collecting audio while the user is
    speaking and monitoring its average volume (RMS). When the volume of a
    chunk drops below a silence threshold, the recorded audio is passed to
    the ASR model, which transcribes it. This repeats until the user-set
    duration has passed. The silence threshold can be set manually or found
    with the designated calibration method.

    Args:
        rate (int): Sample rate in Hz.
        chunk_length (float): The pyaudio microphone stream is read in chunks;
                              this is the duration of each chunk in seconds.
        tokenizer: ASR tokenizer to be used.
        model: ASR prediction model to be used.
        threshold (float): RMS-based silence threshold used in `aggregate_data`.
                           When a chunk's RMS drops below it, the recorded
                           audio is passed to `transcribe_audio`.

    Methods:
        calibrate(): Calibration procedure that estimates the silence
                     threshold from the microphone.
        start_lASR(duration): Starts live ASR for `duration` minutes;
                              duration=0 means indefinitely.
    """

    def __init__(self, rate, chunk_length, tokenizer, model, threshold=0.2):
        self.rate = rate
        self.chunk = int(rate * chunk_length)  # number of samples per chunk
        self.buff = np.array([])               # audio collected for the current utterance
        self.tokenizer = tokenizer
        self.model = model
        self.threshold = threshold
        self.pyaudio = pyaudio.PyAudio()
        # 16-bit mono input stream; created stopped, started when listening begins
        self.stream = self.pyaudio.open(format=pyaudio.paInt16,
                                        channels=1, rate=self.rate, input=True,
                                        frames_per_buffer=self.chunk, start=False)

    def RMS(self, data):
        """Returns the root mean square (RMS) of `data`.

        The RMS serves as the mean volume of a piece of audio.

        Args:
            data (np.array): 1D array of float samples in approximately [-1.0, 1.0].

        Returns:
            float: RMS of `data`.
        """
        return np.sqrt(np.mean(data ** 2))

    def aggregate_data(self, data):
        """Collects audio and sends it to the transcriber once the user falls silent.

        Appends `data` to `buff` while its RMS is above the threshold. When a
        chunk's RMS drops below the threshold and `buff` is not empty, `buff`
        is passed to `transcribe_audio` and then cleared.

        Args:
            data (np.array): 1D array of float samples in approximately [-1.0, 1.0].
        """
        if self.RMS(data) > self.threshold:
            self.buff = np.append(self.buff, data)

        elif len(self.buff) > 0:
            seconds = len(self.buff) / self.rate
            print(f"Transcribing {seconds} seconds of data..")
            self.transcribe_audio(self.buff)
            self.buff = np.array([])

    def transcribe_audio(self, data):
        """Transcribes speech in a float audio array to a string.

        Uses the tokenizer and model passed at initialization. The lowercased
        transcription is appended to `transcript.txt`, one sentence per line.

        Args:
            data (np.array): 1D array of float samples in approximately [-1.0, 1.0].
        """
        input_values = self.tokenizer(data, return_tensors="pt", padding=True).input_values
        with torch.no_grad():  # inference only, no gradients needed
            logits = self.model(input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
        # decode with the tokenizer passed at initialization
        transcription = self.tokenizer.batch_decode(predicted_ids)
        sentence = transcription[0].lower()
        with open("transcript.txt", "a") as f:
            f.write(sentence + ".\n")
        print([sentence + "."])

    def start_lASR(self, duration):
        """Listens to the microphone and transcribes speech for `duration` minutes.

        Args:
            duration (int/float): Minutes after which live ASR stops;
                                  0 means listen indefinitely.
        """
        indefinite = duration == 0
        time_end = time.time() + duration * 60
        print("Listening..")
        self.stream.start_stream()
        while indefinite or time.time() < time_end:
            # divide the int16 samples by 32767 (max of np.int16) to get floats in about [-1.0, 1.0]
            data = np.frombuffer(self.stream.read(self.chunk), dtype=np.int16) / 32767
            self.aggregate_data(data)
        self.stop()

    def cali_listen(self):
        """Listens to microphone input for 5 seconds and returns the resulting array."""
        data = np.array([])
        self.stream.start_stream()

        # gather chunks until at least 5 seconds of audio (5 * rate samples) are collected
        while len(data) / self.rate < 5:
            chunk = np.frombuffer(self.stream.read(self.chunk), dtype=np.int16) / 32767
            data = np.append(data, chunk)
            print(len(data) / self.rate)  # seconds recorded so far
        self.stream.stop_stream()
        return data

    def calibrate(self):
        """User-assisted microphone silence threshold calibration.

        The user speaks for 5 seconds and is then quiet for 5 seconds. The
        silence threshold used in the main process is set halfway between
        the two measured volumes.
        """
        print("Starting calibration. You will be asked to speak and to be quiet for 5 seconds each.")
        time.sleep(3)

        # acquire the average speaking level
        print("Start speaking continuously until the timer runs out.")
        time.sleep(1)
        data = self.cali_listen()
        talking_level = self.RMS(data)

        # acquire the average silence level
        print("Thanks! It's time to be quiet now, again until the timer runs out.")
        time.sleep(2)
        print("Shh!!")
        data = self.cali_listen()
        silence_level = self.RMS(data)
        print("Excellent! The calibration is done.")

        # set the silence threshold halfway between both levels
        self.threshold = (talking_level + silence_level) / 2

    def stop(self):
        """Stops and closes the microphone stream and terminates the pyaudio interface."""
        self.stream.stop_stream()
        self.stream.close()
        self.pyaudio.terminate()
        print("Stopped listening.")
```
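
`transcribe_audio` takes the most likely token for every audio frame (`torch.argmax`) and lets `batch_decode` turn these frame-level ids into text. For a CTC model such as `Wav2Vec2ForCTC`, this greedy decoding, as we understand the Hugging Face tokenizer, collapses repeated tokens and removes the CTC blank (the padding token), while the wav2vec 2.0 vocabulary uses `|` as the word delimiter. The standard-library sketch below illustrates that step on made-up frame labels; it is a simplified illustration rather than the tokenizer's actual implementation, and the `<pad>` symbol and frame sequence are assumptions.

```python
from itertools import groupby

BLANK = "<pad>"       # CTC blank symbol (assumed)
WORD_DELIMITER = "|"  # separates words in the vocabulary


def ctc_greedy_decode(frame_tokens):
    """Collapse repeated frame labels, drop blanks, turn word delimiters into spaces."""
    collapsed = [token for token, _ in groupby(frame_tokens)]  # "E E" -> "E"
    kept = [token for token in collapsed if token != BLANK]    # remove CTC blanks
    text = "".join(" " if token == WORD_DELIMITER else token for token in kept)
    return " ".join(text.split())  # normalize leading, trailing and double spaces


# a blank between two "E" frames keeps them as separate letters
frames = ["H", "H", BLANK, "I", "|", "|", "S", "E", "E", BLANK, "E", BLANK]
print(ctc_greedy_decode(frames))  # HI SEE
```

The blank is what allows CTC to output genuinely doubled letters: without it, repeated frames would always collapse into one.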

```python
# first calibrate the microphone (16 kHz sample rate, 1-second chunks)
a = liveASR(16000, 1, my_tokenizer, my_model)
a.calibrate()
```

```python
# listen and transcribe for one minute
a.start_lASR(1)
```

### Step 2: Named Entity Recognition

There are many libraries that offer pre-trained NER models. For this project,
DeepPavlov was chosen first, because its pre-trained model had an F1-score of
0.919 on the CoNLL-2003 dataset. The model size is ~360MB. However, the
CoNLL-2003 dataset provides only a few entity types: persons, locations,
organizations and miscellaneous entities. Furthermore, there were problems
during the installation that we could not solve. Thus, another pre-trained
model was chosen for this step. The language model that is loaded is called
`en_core_web_sm`. It is part of the Python package spaCy (Honnibal, Montani,
Van Landeghem & Boyd, 2020), which provides functionality for advanced NLP
tasks. Its NER model is trained on the OntoNotes 5 dataset, achieves an
F1-score of 0.84 and is ~13MB in size. It can classify entities with the following labels: `CARDINAL`, `DATE`, `EVENT`,
`FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`,
`PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`. The
function `entities` extracts the entities and their labels from a sentence
processed by spaCy and returns a dictionary in which the keys are the named entities and the values are the corresponding labels; each label is read from the entity's `label_` attribute. To evaluate the model, the consistency of the entity labels was calculated on the [Kaggle](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus) dataset
"Annotated Corpus for Named Entity Recognition", which contains 47958 sentences.



```python
def entities(sentence):
    """Return a dictionary mapping each entity's text (key) to its label (value).

    Args:
        sentence (spacy.tokens.Doc): a sentence already processed by `nlp`

    Returns:
        dict: e.g. {'John': 'PERSON'}; empty if no entities are found
    """
    return {ent.text: ent.label_ for ent in sentence.ents}
```

### Step 3: Noun Phrase Extraction

The last part of the pipeline is noun phrase extraction. This is also done with
spaCy, using the same `en_core_web_sm` model as the NER step. First, the text
input is split into sentences with the `sent_tokenize` function from NLTK
(Bird, Klein & Loper, 2009). Each sentence is then parsed by spaCy, and its
noun chunks are extracted with spaCy's built-in `noun_chunks` functionality.
Noun chunks are found with _dependency parsing_, which assigns syntactic
dependency labels to tokens, for example `dobj` for a direct object or `nsubj`
for a nominal subject. The slides also use the output of the NER model: if a
sentence contains more than two named entities, the whole sentence is shown on
the slide. Every three sentences, a new slide is created, with the noun phrases
as bullet points. The slides are a simple markdown file in which slides are
separated by `---` and which supports [markdown
syntax](https://www.markdownguide.org/basic-syntax). This markdown file is
live-reloaded by the [lookatme](https://github.com/d0c-s4vage/lookatme)
command line interface. So, if the text for the slides is written to a file
named `presentation.md`, the presentation is started with the command
`lookatme --live presentation.md`.



```python
def bulletize(transcript, filename):
    """Summarize the transcript as bullet points and append them to `filename`.

    Args:
        transcript (str): Name of the transcript text file created by the ASR
                          model, without the `.txt` extension.
        filename (str): Name of the presentation slides file (.md is preferred).
    """
    chunk_set = set()  # noun chunks already shown, to avoid repeats
    some_stopwords = set(stopwords.words("english"))

    with open(f"{transcript}.txt", "r") as f:
        sentences = sent_tokenize(f.read())

    with open(filename, "a") as file:
        for i, sent in enumerate(sentences):
            sentence = nlp(sent)
            noun_chunks = [chunk.text for chunk in sentence.noun_chunks]
            if not noun_chunks:
                continue

            # start a new slide every three sentences
            if i % 3 == 0:
                print("\n-------------\n", file=file)

            # sentences with more than two named entities are shown in full
            if len(entities(sentence)) > 2:
                print(f"- {sentence}", file=file)
                chunk_set |= set(noun_chunks)

            # show each noun chunk not shown before (none if the full sentence was printed)
            for chunk in noun_chunks:
                if chunk not in chunk_set and chunk not in some_stopwords:
                    print(f"- {chunk.strip()}", file=file)
                    chunk_set.add(chunk)
```
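
The slide file that `bulletize` writes is plain markdown: slides are separated by a horizontal rule and each bullet is a `- ` line, which `lookatme --live` reloads when the file changes. As a dependency-free sketch of that layout only, with bullets given by hand instead of coming from spaCy and leaving out the deduplication and named-entity rule, grouping the bullets of every three sentences into one slide could look like this. `slides_to_markdown` is a hypothetical helper, not part of the pipeline.

```python
def slides_to_markdown(sentence_bullets, sentences_per_slide=3):
    """Render per-sentence bullet lists as markdown slides separated by '---'."""
    slides = []
    for i, bullets in enumerate(sentence_bullets):
        if i % sentences_per_slide == 0:
            slides.append([])  # start a new slide
        slides[-1].extend(bullets)
    return "\n\n---\n\n".join(
        "\n".join(f"- {bullet}" for bullet in slide) for slide in slides
    )


bullets = [["speech recognition"], ["wav2vec 2.0"], ["noun chunks"], ["lookatme"]]
print(slides_to_markdown(bullets))
```

This prints the first three bullets on one slide, then `---`, then `- lookatme` on a second slide.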

### Final model

Finally, all steps are combined into the final model, which creates slides
live. If a sentence contains more than two named entities, the whole sentence
is displayed on the slide, because it then contains useful information, such
as names or numbers. Otherwise, the noun phrases are displayed on the slide.
The aim is for the model to listen to the speaker while simultaneously creating
slides; in the current implementation, listening and slide creation run one
after the other (see Section 3). The whole pipeline is shown in Figure 1.

<img src='docs/lsg.png'>

<div style="text-align: center"><i>Figure 1: Visual representation of pipeline.</i></div>

```python
def LiveSlideCreator(SpeakTime, SlideFileName):
    """The final model: listens to the speaker and creates the slides.

    NOTE: listening and slide creation run one after the other, not in parallel.
    Writing `Process(target=f(x))` calls `f(x)` immediately in the current
    process and passes its return value (None) as the target, so that form of
    multiprocessing does not run the two functions simultaneously either.

    Args:
        SpeakTime (float/int): Time in minutes that the ASR listens
        SlideFileName (str): Name of the slides file (markdown file is preferred)
    """
    a = liveASR(16000, 1, my_tokenizer, my_model)
    a.calibrate()
    a.start_lASR(SpeakTime)                 # appends the transcription to transcript.txt
    bulletize("transcript", SlideFileName)  # turns transcript.txt into slides


# listen for 1 minute, then create the slides
LiveSlideCreator(1, "presentation.md")
```

## 3. Results

In this section, the results of each sub-model are presented, followed by the
result of all three steps combined.

### Speech-to-text
The output of the speech-to-text model is a string containing the spoken utterance. It is printed as a list so that separate sentences are easier to distinguish. The text is also appended to a .txt file in which each line contains one sentence ending with a period. For example, if the spoken utterance is "I have a big car.", the model prints `['i have a big car.']` (the transcription is lowercased). As mentioned in the speech-to-text part of Section 2, the WER evaluation of this model is available on the [Hugging Face page](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) of the model.

### NER-model
The entities are extracted from a sentence. For example, for the sentence "John lives in the UK for 4000 dollar in 2014.", the output is `{'John': 'PERSON', 'UK': 'GPE', '4000 dollar': 'MONEY', '2014': 'DATE'}`.

### Noun phrase extraction
The output of this step is a list of noun phrases. For example, the sentence "I have big car." is reduced to `big car`.

### Final model
Below is a screen recording of the model in action. A text-to-speech model was used to read the sentences from the transcript file out loud, so it is not truly "live". We could not figure out how to run the two functions (`start_lASR` and `bulletize`) simultaneously. However, it is live in the sense that the keywords displayed on the slide are extracted only from the sentence the speaker is supposedly at, so the model does not know what the speaker will say next, nor does it take the history into consideration.



```python
from IPython.display import Video

Video("docs/result.mp4", embed=True)
```

## 4. Discussion and conclusion

### Speech-to-text
The evaluation of the speech-to-text model is rudimentary. It does not consider any discrepancies in performance between male and female speakers or between speakers of different ethnic backgrounds. However, the performance probably does not differ much between male and female speakers, as the data that the model was trained on is evenly split between male and female speakers (Panayotov et al., 2015; Kahn et al., 2019). In terms of accents, the model probably performs slightly better for speakers with US-based accents, because the majority of the LibriSpeech dataset is made up of speakers with a US-based accent (Winata et al., 2020). This could likely be mitigated by fine-tuning the model on a dataset of speakers with other accents.

### NER-model
The Kaggle dataset mentioned in Section 2 was used as an extra evaluation of the NER model. Normally, a model is evaluated with the F1-score or with precision and recall. However, as the model was already trained and evaluated, there was no need to do this again. Instead, the consistency of the NER model was evaluated.
The inconsistency rate was calculated by dividing the number of unique entities that received more than one different label by the total number of unique entities, using the function `eval_ner`. First, the model was run on the first 10,000 sentences of the dataset to extract the named entities from each sentence; this subset was chosen to drastically reduce the running time (about 1.5 minutes for 10,000 sentences). After this, a dictionary was created in which each key is an entity and each value is the `set` of labels assigned to it, so that every entity, and every label per entity, occurs only once. There were 7201 unique entities in total, and 167 of them had more than one label, which results in an inconsistency rate of 0.023. Furthermore, 15 entities had more than two labels. A recommendation is to use the whole dataset for the evaluation, which would give a more representative rate.
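
The rate printed by `eval_ner` as "Inconsistency rate" is the fraction of unique entities that received more than one label. Writing $E$ for the set of unique entities and $L(e)$ for the set of labels assigned to entity $e$, the numbers reported above work out as follows:

$$
\text{inconsistency rate} = \frac{\lvert \{\, e \in E : \lvert L(e) \rvert > 1 \,\} \rvert}{\lvert E \rvert} = \frac{167}{7201} \approx 0.0232
$$

Equivalently, about $1 - 0.0232 \approx 0.977$, or 97.7%, of the unique entities were always given the same label.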


```python
# Kaggle "Annotated Corpus for Named Entity Recognition"
df = pd.read_csv("data/ner_dataset.csv", encoding="ISO-8859-1").drop("POS", axis=1)

# 'Sentence #' ("Sentence: 1", "Sentence: 2", ...) is only filled in on the
# first word of each sentence, so these rows mark where a new sentence starts
indices = list(df.loc[df["Sentence #"].notna()].index)

# split df into sentence-level dataframes (the last sentence is left out)
mini_df = [df.iloc[indices[i]:indices[i + 1]] for i in range(len(indices) - 1)]

# rebuild each sentence as one space-separated string of its words
sentences = [" ".join(sentence["Word"]) for sentence in mini_df]
print("Number of sentences:", len(sentences))


def eval_ner(sentences, label_n=1):
    """Print the inconsistency rate of the NER labels.

    The inconsistency rate is the fraction of unique entities that received
    more than `label_n` different labels across all sentences.

    Args:
        sentences (list of str): sentences to run NER on
        label_n (int): entities with more than this many labels count as inconsistent
    """
    # unique entities mapped to the set of labels they received, no duplicates
    all_ents = collections.defaultdict(set)
    for sentence in sentences:
        for text, label in entities(nlp(sentence)).items():
            all_ents[text].add(label)

    print("Number of unique entities:", len(all_ents))
    mismatch = [(key, labels) for key, labels in all_ents.items() if len(labels) > label_n]
    print(f"Number of entities with more than {label_n} label(s): {len(mismatch)}")
    print("Inconsistency rate:", len(mismatch) / len(all_ents))


eval_ner(sentences[:10000])
```

```
Number of sentences: 47958
Number of unique entities: 7201
Number of entities with more than 1 label(s): 167
Inconsistency rate: 0.023191223441188725
```


### Noun phrase extraction

Although the noun phrase extraction itself works well, it is questionable whether it captures the most important information in a sentence. Sometimes (almost) all of the information is in the verb, for example when listing the activities a robot is capable of:

```
The robot is capable of running, jumping and doing backflips.

Noun phrase: The robot
```

The noun phrase only shows what is being talked about, not what is said about it. We tried subject-verb-object (SVO) extraction, but it was not
accurate enough. A sufficiently accurate SVO extractor would be a big improvement, though: we think that most of the information in most sentences is expressed by its SVO triple.

We suggest combining multiple extractors and assigning each word (or phrase) found by these extractors an *informativeness* score. The $n$ most informative words can then be selected.
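
A minimal sketch of this suggestion, assuming each extractor (for example noun chunks, named entities or SVO triples) returns candidate phrases with a score between 0 and 1: the scores of a phrase found by several extractors are summed, optionally weighted per extractor, and the $n$ highest-scoring phrases are kept. The function `most_informative`, the extractor names, the weights and the scores are all made up for illustration.

```python
import heapq
from collections import defaultdict


def most_informative(extractions, weights, n):
    """Combine per-extractor scores and return the n highest-scoring phrases.

    extractions: {extractor name: {phrase: score in [0, 1]}}
    weights:     {extractor name: weight}; missing extractors get weight 1.0
    """
    combined = defaultdict(float)
    for extractor, phrases in extractions.items():
        for phrase, score in phrases.items():
            combined[phrase] += weights.get(extractor, 1.0) * score
    return heapq.nlargest(n, combined, key=combined.get)


extractions = {
    "noun_chunks": {"the robot": 0.4, "backflips": 0.6},
    "svo": {"the robot": 0.3, "running": 0.7, "backflips": 0.7},
}
print(most_informative(extractions, {"noun_chunks": 1.0, "svo": 1.5}, n=2))  # ['backflips', 'running']
```

Weighting lets a more reliable extractor dominate; here the SVO scores count 1.5 times as much as the noun chunk scores.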

### Final model

In its current form, live slide creation is limited. It can only extract noun phrases, which are not always the best fit to summarize the spoken text. It also lacks many features of normal slides, such as graphs, highlighting (bold and italics) and subheadings. This could be improved with a better understanding of the actual content of the spoken text. It would also be a good idea to keep re-evaluating previous slides using newly received information, so that slides can be merged or split up where appropriate. Furthermore, the model would only be truly **live** if the two functions mentioned before could run at the same time: `start_lASR()` would then write to the `transcript.txt` file, while `bulletize()` continuously reads that file to create and update the slides.
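
The set-up described above, one worker appending to the transcript while another keeps reading it, can be sketched with two threads and a file that is read incrementally. In this standard-library sketch the ASR is replaced by a hypothetical `fake_asr` function that writes pre-written sentences, and the hypothetical `follow_transcript` only collects the complete new lines it sees; in the pipeline these roles would be played by `start_lASR` and a variant of `bulletize` that remembers how far it has read. Threads are an assumption made for simplicity; separate processes could be used instead.

```python
import os
import tempfile
import threading
import time


def fake_asr(path, sentences, done):
    """Hypothetical stand-in for start_lASR: appends one sentence per line."""
    for sentence in sentences:
        with open(path, "a") as f:
            f.write(sentence + ".\n")
        time.sleep(0.05)  # pretend to wait for the next utterance
    done.set()  # signal that no more text will be written


def follow_transcript(path, done, seen):
    """Hypothetical live bulletize: handles only complete lines added since the last read."""
    position = 0  # byte offset up to which the file has been processed
    while True:
        finished = done.is_set()  # checked before reading, so the last lines are not missed
        if os.path.exists(path):
            with open(path, "rb") as f:
                f.seek(position)
                data = f.read()
            end = data.rfind(b"\n") + 1  # leave a partially written line for the next read
            position += end
            for line in data[:end].decode("utf-8").splitlines():
                seen.append(line)  # a real version would update the slides here
        if finished:
            return
        time.sleep(0.01)


path = os.path.join(tempfile.mkdtemp(), "transcript.txt")
done, seen = threading.Event(), []
writer = threading.Thread(target=fake_asr, args=(path, ["hello world", "slides update live"], done))
reader = threading.Thread(target=follow_transcript, args=(path, done, seen))
writer.start()
reader.start()
writer.join()
reader.join()
print(seen)  # ['hello world.', 'slides update live.']
```

Note that `target` receives the function itself and its arguments go in `args`; `multiprocessing.Process` follows the same convention.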

### Conclusion

The pipeline works well, but there is a delay between speaking and the summary appearing on the slides. This delay could be reduced, though probably not by much, since summarizing simply takes some time.

There is certainly potential in live slide creation. The same principle could also be applied to other forms of live speaking, such as news and other TV shows, entertainment, and meetings, where it could summarize the meeting. Note-taking is a perfect place where good AI can help, freeing up time for humans to spend on more important tasks.



## 5. References

- Baevski, A., Zhou, H., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A
framework for self-supervised learning of speech representations. _arXiv
preprint arXiv:2006.11477_.

- Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with
Python. O'Reilly Media Inc.

- Honnibal, M., Montani, I., Van Landeghem, S., & Boyd, A. (2020). spaCy:
Industrial-strength Natural Language Processing in Python. Zenodo.
_doi:10.5281/zenodo.1212303_.

- Kahn, J., Rivière, M., Zheng, W., Kharitonov, E., Xu, Q., Mazaré, P. E., Karadayi, J., Liptchinsky, V., Collobert, R., Fuegen, C., Likhomanenko, T., Synnaeve, G., Joulin, A., Mohamed, A., & Dupoux, E. (2019). Libri-Light: A benchmark for ASR with limited or no supervision. _arXiv:1912.07875_.

- Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). Librispeech: An ASR corpus based on public domain audio books. In _2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)_. _doi:10.1109/ICASSP.2015.7178964_.

- Winata, G. I., Cahyawijaya, S., Liu, Z., Lin, Z., Madotto, A., Xu, P., & Fung, P. (2020). Learning fast adaptation on cross-accented speech recognition. _arXiv:2003.01901_.

- Xu, Q., Baevski, A., Likhomanenko, T., Tomasello, P., Conneau, A., Collobert, R., Synnaeve, G., & Auli, M. (2020). Self-training and pre-training are complementary for speech recognition. _arXiv:2010.11430_.
