# Exam details
Written work:  

You will have to analyze a dataset and test a relevant hypothesis.

You should choose your own dataset by Lecture 12 (written hand-in already available)

The report will be written as a Jupyter notebook.

Minimal set of analyses:
1) Describe your data and visualize some key dimensions.
2) Perform at least two analyses (depending on what is appropriate given the data you selected):
- Does your data contain quantitative values that allow for hypothesis testing?
   IF YES: Formulate a hypothesis and test it. Complement the test with an appropriate visualization.
- Does your data contain unstructured textual information?
   IF YES: Perform sentiment analysis on your data, and describe and visualize the results.
- Does your data contain network structures (or can a network structure be extracted)?
   IF YES: Ask a question about the network structure and answer it.

Note: Different datasets can be investigated in many different ways. Any combination of the analyses described above is acceptable as long as you ask two questions, e.g. statistical hypothesis testing + network analysis, sentiment analysis + network analysis, or statistical hypothesis testing + statistical hypothesis testing.

Groups: Find your own group (2 people; 3 is possible) and let the TAs know before Fall break.

Where to find the data:
The internet is full of data; these are just a few starting points:
Data (digst.dk) - Danish Open Data
GHO | By theme (who.int) - WHO Open Data
DataHub  - Many datasets
Data.gov Home - Data.gov - US Open Data
Kaggle - All sorts of datasets
Dataset Search (google.com)
TLC Trip Record Data - TLC (nyc.gov) - TAXI data in NYC
Industry data and insights | BFI - Movies!
CDE (cjis.gov) - Crime data from FBI
Københavns Kommune (opendata.dk) - Open Data from CPH

The assignment must include one or more of the following:
- Hypothesis testing
- Sentiment analysis
- Network analysis

In addition:
- Interactive visualization

# Data
The dataset comes from "The Danish Parliament Corpus 2009 - 2017, v2, w. subject annotation".

Source:
Hansen, Dorte Haltrup and Navarretta, Costanza, 2021, The Danish Parliament Corpus 2009 - 2017, v2, w. subject annotation, CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/44.


The dataset consists of transcriptions of speeches in the Danish Parliament (Folketinget) from the first session of 2009 through the first session of 2016 (6 October 2009 to 7 September 2017). Each speech has metadata attached, partly about the member of parliament ('Name', 'Gender', 'Party', 'Role', 'Title', 'Birth', 'Age') and partly about the speech ('Date', 'samling', 'Start time', 'End time', 'Time', 'Agenda item', 'Case no', 'Case type', 'Agenda title', 'Subject 1', 'Subject 2').

The dataset is organised as tab-separated .txt files encoded in UTF-8, with one file per meeting.

Source:
Ibid., Readme


For this assignment we merged the TSV files into a new dataset, which we saved as a pipe-separated CSV file. The CSV file has been uploaded to sciencedata.dk, from where it can be downloaded by URL with pandas.read_csv().
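The merge step is not shown in this notebook. A minimal sketch of how per-meeting TSV files could be combined into one pipe-separated file is given here, using only the standard library. The helper `combine_tsv_files`, the file names, and the two-column layout are hypothetical, and this is not necessarily how the file on sciencedata.dk was produced.

```python
import csv
import tempfile
from pathlib import Path


def combine_tsv_files(tsv_paths, out_path):
    """Merge tab-separated files with identical headers into one pipe-separated file."""
    rows_written = 0
    writer = None
    with open(out_path, "w", encoding="utf-8", newline="") as out_file:
        for tsv_path in sorted(tsv_paths):
            with open(tsv_path, encoding="utf-8", newline="") as in_file:
                # QUOTE_NONE: treat quote characters in the speeches as plain text.
                reader = csv.DictReader(in_file, delimiter="\t", quoting=csv.QUOTE_NONE)
                if writer is None:
                    # Take the column order from the first file.
                    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames, delimiter="|")
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demo with two tiny, made-up meeting files (assumption: real files have more columns).
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "meeting_1.txt").write_text("ID\tText\n1\tMødet er åbnet.\n", encoding="utf-8")
    (tmp_dir / "meeting_2.txt").write_text("ID\tText\n2\tTak, hr. formand.\n", encoding="utf-8")
    combined_path = tmp_dir / "combined.csv"
    n_rows = combine_tsv_files(tmp_dir.glob("meeting_*.txt"), combined_path)
    combined_text = combined_path.read_text(encoding="utf-8")
```

A file written this way can be read back with `pd.read_csv(path, sep='|')`, which is how the notebook loads the data.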





# Topic
The topic is the immigration debate 2009 - 2017.

1. The most important subjects over time: how is the immigration debate reflected in them?
2. Key words that come to the fore in the debates about immigrants. 2.1. TF-IDF to identify key words. 2.2. TF to observe how key words move over time.
3. Sentiment analysis of speeches that mention refugees. These could, for example, be speeches about 'us' and 'them' (close reading). Do the sentiment scores vary more in the run-up to an election? Do they vary more around 2015?
4. POS tagging, e.g. verbs used by different parties, and which adjectives attach to a given concept.
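As an illustration of item 2.2, a minimal sketch of tracking a key word's relative term frequency per year. The function `term_frequency_by_year` and the example speeches are made up for illustration and are not taken from the corpus; the simple `\w+` tokenisation is also an assumption.

```python
import re
from collections import Counter


def term_frequency_by_year(speeches, keyword):
    """Return {year: share of tokens equal to `keyword`} for (year, text) pairs."""
    keyword_counts = Counter()
    token_counts = Counter()
    for year, text in speeches:
        tokens = re.findall(r"\w+", text.lower())
        token_counts[year] += len(tokens)
        keyword_counts[year] += tokens.count(keyword)
    return {
        year: keyword_counts[year] / token_counts[year]
        for year in sorted(token_counts)
        if token_counts[year] > 0
    }


# Made-up example speeches, not taken from the corpus.
example_speeches = [
    (2014, "Kommunerne skal modtage flygtninge."),
    (2015, "Flygtninge og flere flygtninge kommer til grænsen."),
    (2015, "Vi taler om flygtninge i dag."),
]
tf_per_year = term_frequency_by_year(example_speeches, "flygtninge")
```

Dividing by the number of tokens per year keeps years with many speeches from dominating the trend.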


# Load data

The data is hosted on sciencedata.dk and shared from there via a download link.

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('https://sciencedata.dk/shared/825e999a5c13fd22d28d4289fa899ba1?download', sep='|')
print(f'The columns are {df.columns}')

The columns are Index(['ID', 'Date', 'samling', 'Start time', 'End time', 'Time',
       'Agenda item', 'Case no', 'Case type', 'Agenda title', 'Subject 1',
       'Subject 2', 'Name', 'Gender', 'Party', 'Role', 'Title', 'Birth', 'Age',
       'Text'],
      dtype='object')


In [16]:
from sklearn.feature_extraction.text import TfidfVectorizer

def top_distinctive_words(documents):
    
    with open('dk.txt', 'r', encoding='utf-8-sig') as f:
        stop_words = f.read().split('\n')
    
    # Create a TF-IDF vectorizer
    tfidf_vectorizer = TfidfVectorizer(max_features=10000, stop_words=stop_words)

    # Fit and transform the input documents
    tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

    # Get the feature names (words)
    feature_names = tfidf_vectorizer.get_feature_names_out()

    # Initialize a list to store the top distinctive words for each document
    top_words_list = []

    # Iterate through the TF-IDF matrices for each document
    for tfidf_vector in tfidf_matrix.toarray():
        # Create a list of tuples (word, TF-IDF score) for the current document
        word_tfidf_tuples = [(feature, tfidf_score) for feature, tfidf_score in zip(feature_names, tfidf_vector) if tfidf_score > 0]

        # Sort the tuples by TF-IDF score in descending order
        word_tfidf_tuples.sort(key=lambda x: x[1], reverse=True)

        # Select the top five words with the highest TF-IDF scores
        top_words = [word for word, _ in word_tfidf_tuples[:5]]

        top_words_list.append(top_words)

    return top_words_list

{'analysis': 0.20851441405707477, 'calculation': 0.20851441405707477, 'for': 0.20851441405707477, 'idf': 0.41702882811414954, 'important': 0.20851441405707477, 'in': 0.20851441405707477, 'is': 0.41702882811414954, 'sample': 0.20851441405707477, 'text': 0.41702882811414954, 'tf': 0.41702882811414954, 'this': 0.20851441405707477}

# Feature engineering

In [18]:
import pandas as pd
import re
import nltk
import spacy
from collections import defaultdict
import string

# Type-token ratio
def calculate_ttr(text):
    # Transform to lower case
    text = text.lower()
    
    # Tokenize the input text into words
    words = re.findall(r'\b\S+\b', text)

    # Calculate the total number of tokens (words)
    total_tokens = len(words)

    # Avoid division by zero for empty or non-word input
    if total_tokens == 0:
        return 0

    # Calculate the number of unique types (unique words)
    unique_types = len(set(words))

    # Calculate the TTR (Type-Token Ratio)
    ttr = unique_types / total_tokens

    return ttr

def calculate_average_sentence_length(text):
    # Tokenize the input text into sentences
    sentences = nltk.sent_tokenize(text)

    # Calculate the total number of sentences
    total_sentences = len(sentences)

    # Calculate the total number of words in all sentences
    total_words = sum(len(nltk.word_tokenize(sentence)) for sentence in sentences)

    # Calculate the average sentence length
    if total_sentences > 0:
        average_length = total_words / total_sentences
    else:
        average_length = 0  # To avoid division by zero if there are no sentences

    return average_length


# Load the spaCy model once; reloading it for every speech is very slow
nlp = spacy.load('da_core_news_sm')

def get_pos_tags(text):
    # Process the text with spaCy
    doc = nlp(text)

    # Extract tokens and their POS tags
    pos_tags = [(token.text, token.pos_) for token in doc]

    return pos_tags

def get_pos_tag_counts(text):
    # Process the text with spaCy
    doc = nlp(text)

    # Initialize a dictionary to store POS tag counts
    pos_tag_counts = defaultdict(int)

    # Extract and count POS tags
    for token in doc:
        pos_tag_counts[token.pos_] += 1

    return dict(pos_tag_counts)


def create_bag_of_words(text):
    # Initialize a dictionary to store the bag of words
    bag_of_words = defaultdict(int)

    # Remove punctuation and convert text to lowercase
    translator = str.maketrans('', '', string.punctuation)
    text = text.translate(translator).lower()

    # Split the text into words
    words = text.split()

    # Count the frequency of each word
    for word in words:
        bag_of_words[word] += 1

    return dict(bag_of_words)


def get_features(dataframe):
    # Copy the input DataFrame to avoid modifying the original
    result_df = dataframe.copy()

    # Add columns for TTR, Average Sentence Length, POS Tags, and Bag of Words
    result_df['TTR'] = dataframe['Text'].apply(calculate_ttr)
    result_df['Average_Sentence_Length'] = dataframe['Text'].apply(calculate_average_sentence_length)
    result_df['POS_Tags'] = dataframe['Text'].apply(get_pos_tags)
    result_df['POS_Tag_Counts'] = dataframe['Text'].apply(get_pos_tag_counts)
    result_df['Bag_of_Words'] = dataframe['Text'].apply(create_bag_of_words)

    return result_df[['ID','TTR','Average_Sentence_Length','POS_Tags','POS_Tag_Counts','Bag_of_Words']]


In [19]:
id_text_df = df[['ID', 'Text']].head(20)

result_dataframe = get_features(id_text_df)

In [20]:
result_dataframe

Unnamed: 0,ID,TTR,Average_Sentence_Length,POS_Tags,POS_Tag_Counts,Bag_of_Words
0,20100531100002,0.911765,10.25,"[(Mødet, NOUN), (er, AUX), (åbnet, VERB), (., ...","{'NOUN': 12, 'AUX': 3, 'VERB': 4, 'PUNCT': 6, ...","{'mødet': 1, 'er': 1, 'åbnet': 1, 'finansudval..."
1,20100531100026,1.0,6.0,"[(Forhandlingen, NOUN), (er, AUX), (åbnet, VER...","{'NOUN': 3, 'AUX': 1, 'VERB': 1, 'PUNCT': 2, '...","{'forhandlingen': 1, 'er': 1, 'åbnet': 1, 'fru..."
2,20100531100055,0.462939,27.0,"[(Regeringen, NOUN), (har, AUX), (sammen, ADV)...","{'NOUN': 162, 'AUX': 42, 'ADV': 81, 'ADP': 115...","{'regeringen': 3, 'har': 6, 'sammen': 2, 'med'..."
3,20100531100625,0.833333,7.333333,"[(Det, PRON), (tror, VERB), (jeg, PRON), (best...","{'PRON': 3, 'VERB': 3, 'ADV': 4, 'PUNCT': 3, '...","{'det': 2, 'tror': 1, 'jeg': 1, 'bestemt': 1, ..."
4,20100531100634,0.546012,17.727273,"[(Tak, NOUN), (,, PUNCT), (hr., NOUN), (forman...","{'NOUN': 29, 'PUNCT': 27, 'PRON': 19, 'AUX': 1...","{'tak': 1, 'hr': 1, 'formand': 1, 'det': 9, 's..."
5,20100531100725,1.0,2.0,"[(Ordføreren, NOUN), (., PUNCT)]","{'NOUN': 1, 'PUNCT': 1}",{'ordføreren': 1}
6,20100531100728,0.627737,31.8,"[(Som, ADP), (jeg, PRON), (nævnte, VERB), (,, ...","{'ADP': 16, 'PRON': 19, 'VERB': 22, 'PUNCT': 2...","{'som': 3, 'jeg': 2, 'nævnte': 1, 'er': 3, 're..."
7,20100531100825,1.0,2.5,"[(Hr., PROPN), (Torben, PROPN), (Hansen, PROPN...","{'PROPN': 3, 'PUNCT': 1}","{'hr': 1, 'torben': 1, 'hansen': 1}"
8,20100531100828,0.556757,21.7,"[(Tak, NOUN), (,, PUNCT), (hr., NOUN), (forman...","{'NOUN': 28, 'PUNCT': 30, 'CCONJ': 10, 'PRON':...","{'tak': 1, 'hr': 1, 'formand': 1, 'jamen': 1, ..."
9,20100531100924,1.0,2.0,"[(Ordføreren, NOUN), (., PUNCT)]","{'NOUN': 1, 'PUNCT': 1}",{'ordføreren': 1}


# Question 2
# TF-IDF. Segment = speech 
2. Key words that come to the fore in the debates about immigrants.


Question: Which key words are essential for each party in the debates about immigrants?

Method:
1. Use **Tf-Idf** to identify essential key words in speeches on the subject of "Immigration".
2. Use group by to group the key words of each party.
3. Count the key words for each party.
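To make step 1 concrete, here is a small pure-Python sketch of the weighting we understand scikit-learn's `TfidfVectorizer` to apply with its default settings: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalisation per document. The function `tfidf_weights` and the toy documents are hypothetical and only serve to show the arithmetic; the notebook itself uses `TfidfVectorizer`.

```python
import math
from collections import Counter


def tfidf_weights(tokenized_docs):
    """Return one {word: weight} dict per document (smoothed IDF, L2-normalised)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for doc in tokenized_docs for word in set(doc))
    weighted_docs = []
    for doc in tokenized_docs:
        raw = {
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in Counter(doc).items()
        }
        norm = math.sqrt(sum(value ** 2 for value in raw.values()))
        weighted_docs.append({word: value / norm for word, value in raw.items()} if norm else {})
    return weighted_docs


# Toy, pre-tokenised documents (made up for illustration).
toy_docs = [["asyl", "kommunerne", "asyl"], ["kommunerne", "skat"]]
toy_weights = tfidf_weights(toy_docs)
top_word_first_doc = max(toy_weights[0], key=toy_weights[0].get)
```

A word that occurs in every document, such as `kommunerne` here, gets the lowest possible IDF, so a word that is frequent in one speech but rare elsewhere ranks higher.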

In [60]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
import re


def top_distinctive_words(documents):
    
    with open('dk.txt', 'r', encoding='utf-8-sig') as f:
        stop_words = f.read().split('\n')
    
    # Create a TF-IDF vectorizer
    tfidf_vectorizer = TfidfVectorizer(max_features=10000, stop_words=stop_words)

    # Fit and transform the input documents
    tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

    # Get the feature names (words)
    feature_names = tfidf_vectorizer.get_feature_names_out()

    # Initialize a list to store the top distinctive words for each document
    top_words_list = []

    # Iterate through the TF-IDF matrices for each document
    for tfidf_vector in tfidf_matrix.toarray():
        # Create a list of tuples (word, TF-IDF score) for the current document
        word_tfidf_tuples = [(feature, tfidf_score) for feature, \
                             tfidf_score in zip(feature_names, tfidf_vector) \
                             if tfidf_score > 0]

        # Sort the tuples by TF-IDF score in descending order
        word_tfidf_tuples.sort(key=lambda x: x[1], reverse=True)

        # Select the top five words with the highest TF-IDF scores
        top_words = [word for word, _ in word_tfidf_tuples[:5]]

        top_words_list.append(top_words)

    return top_words_list

In [61]:
# Subset data
input_data = df[df['samling'] == 20151]
input_data = input_data[input_data['Subject 1'] == 'Immigration']
input_data = input_data[input_data['Role'] == 'medlem'].reset_index(drop=True)

# Create a function to apply top_distinctive_words to each group
def apply_top_distinctive_words(group):
    return top_distinctive_words(group['Text'].tolist())

# Group the DataFrame by 'Party', apply the function, and send the result to a dataframe
top_keywords_by_party = input_data.groupby('Party').apply(apply_top_distinctive_words)\
            .to_frame().reset_index().rename(columns = {0:'key_words'})

In [62]:
# The 'key_words' column holds a list of lists.
# To count the key words, we flatten each list of lists
# and join the words into a single text string.

def flatten_list_output_string(list_input):
    return ' '.join([i for y in list_input for i in y])

# Change list of lists to text string
top_keywords_by_party['key_words'] = top_keywords_by_party['key_words'].apply(lambda x : flatten_list_output_string(x))

In [63]:
import collections
def count_and_sort_words(text):
    # Tokenize the input text into lowercase words (split on whitespace)
    words = text.lower().split()
    
    # Count the occurrences of each word using Counter
    word_counts = collections.Counter(words)

    # Convert the Counter object to a dictionary
    word_counts_dict = dict(word_counts)

    # Sort the dictionary by values (word counts) in descending order
    sorted_word_counts = dict(sorted(word_counts_dict.items(), key=lambda item: item[1], reverse=True))

    return sorted_word_counts

# Apply the function to the 'key_words' column
top_keywords_by_party['key_words_count'] = top_keywords_by_party['key_words'].apply( lambda x : count_and_sort_words(x)) 
top_keywords_by_party

Unnamed: 0,Party,key_words,key_words_count
0,ALT,kommunerne investere økonomisk kommunernes udg...,"{'asyl': 12, 'dansk': 11, 'aftale': 9, 'statsb..."
1,DF,angående regeringen hjemsendelsesstrategi regn...,"{'statsborgerskab': 27, 'enhedslisten': 21, 't..."
2,EL,kommunerne maj skrotte kommunernes kommuneafta...,"{'tyrkiet': 22, 'kommunerne': 20, 'arbejdskraf..."
3,KF,kommunerne flygtningene penge sikre efterkomme...,"{'statsborgerskab': 8, 'kommunerne': 5, 'folk'..."
4,LA,kommunerne danmark større enhedslisten flygtni...,"{'danmark': 11, '000': 9, 'tyrkiet': 8, 'aftal..."
5,RV,kommunerne flygtningene formår staten sigt for...,"{'kommunerne': 13, 'flygtninge': 10, 'socialde..."
6,S,forelægge handlingsplan informations omgående ...,"{'kommunerne': 13, 'dansk': 12, '000': 12, 'fo..."
7,SF,højskolerne gribe købet oven skub forælder int...,"{'boliger': 9, 'måneder': 8, 'dansk': 7, 'menn..."
8,V,kommunerne kl fleksibelt asylansøgere lettere ...,"{'kommunerne': 14, 'lande': 14, 'danmark': 13,..."


In [65]:
top_keywords_by_party.at[0, 'key_words_count'] 

{'asyl': 12,
 'dansk': 11,
 'aftale': 9,
 'statsborgerskab': 9,
 'penge': 9,
 'værdier': 9,
 'vores': 8,
 'danmark': 8,
 'rigtig': 7,
 'mennesker': 7,
 'søge': 7,
 'samfund': 7,
 'kommunerne': 6,
 'børn': 6,
 'ja': 6,
 'nærområderne': 6,
 'lovforslag': 6,
 'danske': 6,
 'flygtninge': 6,
 'tyrkiet': 6,
 'arbejde': 5,
 'eu': 5,
 'arbejdskraft': 4,
 'år': 4,
 'virkelig': 4,
 'lov': 4,
 'dispensation': 4,
 'ret': 4,
 'advokatsamfundet': 4,
 'verden': 4,
 'født': 4,
 'virksomheder': 4,
 'studerende': 4,
 'ressourcer': 4,
 'lovforslaget': 3,
 'integrerbare': 3,
 'socialdemokraterne': 3,
 'integrerbart': 3,
 'barn': 3,
 'ordning': 3,
 'job': 3,
 'greencardordningen': 3,
 'væk': 3,
 'krav': 3,
 'nye': 3,
 'udvalget': 3,
 'ønsker': 3,
 'handler': 3,
 'land': 3,
 'rigtige': 3,
 'forskel': 3,
 'vej': 3,
 'identitet': 3,
 'svar': 3,
 'asylansøgere': 3,
 'spørgsmål': 3,
 'simpelt': 3,
 'egentlig': 3,
 'sikre': 3,
 'problemet': 3,
 'dsb': 3,
 'grænsen': 3,
 'vende': 3,
 'flygtning': 3,
 'hjælpe': 3,

# Question 2
# TF-IDF. Segment = party week 

In [None]:
# Use TF-IDF to identify key words.
# Use TF to observe how the key words move over time.
# The key words must come from TF-IDF.
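A possible way to build the "party week" segments is sketched below: all speeches by one party within one ISO week are concatenated into a single document, which could then be passed to TF-IDF. The function `build_party_week_documents` and the example rows are hypothetical, and the sketch assumes dates in `YYYY-MM-DD` form; the real 'Date' column may need a different parser.

```python
from collections import defaultdict
from datetime import date


def build_party_week_documents(speeches):
    """Concatenate speeches into one text per (party, ISO year, ISO week)."""
    grouped = defaultdict(list)
    for iso_date, party, text in speeches:
        iso_year, iso_week, _ = date.fromisoformat(iso_date).isocalendar()
        grouped[(party, iso_year, iso_week)].append(text)
    return {key: " ".join(texts) for key, texts in grouped.items()}


# Made-up rows: (date, party, speech text).
example_rows = [
    ("2015-09-07", "DF", "Første tale."),
    ("2015-09-09", "DF", "Anden tale."),
    ("2015-09-15", "DF", "Tredje tale."),
    ("2015-09-09", "EL", "Fjerde tale."),
]
party_week_docs = build_party_week_documents(example_rows)
```

Using the ISO year together with the ISO week avoids mixing up weeks that straddle New Year.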


# Question 3

Sentiment analysis of speeches that mention refugees. These could, for example, be speeches about 'us' and 'them' (close reading). Do the sentiment scores vary more in the run-up to an election? Do they vary more around 2015?

Method: Use sentiment analysis to identify changes in sentiment in speeches tagged with the subject Immigration.
Definition: "SA is a part of applied computational linguistics and attempts to quantify the emotions" (Kran, E., & Orm, S. (2020)).
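One way to make "more variation" measurable is to compare the spread of sentiment scores per period. The sketch below computes the mean and the population standard deviation per year; `sentiment_spread_by_period` and the scores are invented for illustration and say nothing about the actual corpus.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sentiment_spread_by_period(scored_speeches):
    """Return {period: (mean score, population standard deviation, n)}."""
    by_period = defaultdict(list)
    for period, score in scored_speeches:
        by_period[period].append(score)
    return {
        period: (mean(scores), pstdev(scores), len(scores))
        for period, scores in sorted(by_period.items())
    }


# Made-up (year, sentiment score) pairs, only to show the shape of the output.
example_scores = [(2014, 0.5), (2014, 0.7), (2015, -2.0), (2015, 2.0), (2015, 0.0)]
spread = sentiment_spread_by_period(example_scores)
```

The period could just as well be a month or the weeks before an election; a larger standard deviation in one period would then indicate more varied sentiment there.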


### Which sentiment analysis approach do we use?

In its DaNLP repository, a repository of natural language processing resources for the Danish language, the Alexandra Institute provides an overview of open sentiment analysis models and datasets for Danish.

The following models are listed:
1. AFINN - wordlist model that returns a numeric score corresponding to the emotion in a word: negative (minus), neutral (0), or positive (plus).
2. Sentida - wordlist model that, like AFINN, returns a numeric score corresponding to an emotion.
3. BERT Emotion - BERT model that returns a descriptive text string corresponding to the semantic field a word is embedded in, e.g. joy/serenity, anticipation/interest, trust/acceptance.
4. BERT Tone - BERT model that likewise returns a descriptive text string.
5. SpaCy Sentiment - spaCy model that also returns a text string.
6. Senda - BERT model that also returns a text string.

The first two, AFINN and Sentida, and the last one, Senda, are not part of the DaNLP project and therefore not part of the DaNLP Python package, which can be used via pip.

Source: Alexandra Institute. (2021). Sentiment_analysis.md. https://github.com/alexandrainst/danlp/blob/master/docs/docs/tasks/sentiment_analysis.md


In this assignment we use sentiment analysis to look at variation in the scores, so we need a model that returns a number rather than a label. We can therefore choose one of the first two models, which also means we choose not to use DaNLP's Python package. Of the two wordlist models, AFINN and Sentida, we choose Sentida because the model has been updated with new functionality, and because, judging from Finn Årup Nielsen's publications since 2019, he has not published anything new on sentiment analysis in that period.

Both AFINN and Sentida are wordlist models that aggregate a sentiment score based on the occurrence of words from the word list in a given text. The aggregated score is used as an indicator of how positive the text is. There are four problems with this kind of model:
1. The model does not take syntactic relations between words into account.
2. The model ignores adverbs, and thereby the role adverbs play in expressing degrees of something and attitudes toward something.
3. The model does not reflect the way humans perceive emotions.
4. The model cannot handle words that have two different meanings.

The developers of Sentida found inspiration in the English-language VADER model and have tried to reduce these problems by building in various simple forms of "awareness", for example by modifying the score around negations, capital letters, and exclamation marks so that these elements of language are given more weight.

For example:

_“Maden (+0.3) var god (+2.3), (← x 0.5) men (1.5 x →) serviceringen (+0.3) var elendig (-4.3).” ⇒ 1.3 - 6 ⇒ sentiment score: -4.7_

_“The food (+0.3) was good (+2.3), (← x 0.5) but (1.5 x →) the service (+0.3) was horrendous (-4.3).” ⇒ 1.3 - 6 ⇒ sentiment score: -4.7_


_“Det er så sejt (+3.6)! (← x 1.291)” ⇒ sentiment score: 4.6_

_“It is so cool (+3.6)! (← x 1.291)” ⇒ sentiment score: 4.6_


_“DET ER SÅ SEJT (+3.6). (← x 1.733)” ⇒ sentiment score: 6.2_

_“IT IS SO COOL (+3.6). (← x 1.733)” ⇒ sentiment score: 6.2_


(Kran, E., & Orm, S. (2020)).
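The arithmetic in the first two examples can be retraced with a toy scorer. This is only a sketch under our own assumptions, not Sentida's implementation: the lexicon holds just the word scores quoted in the examples, and the names `TOY_LEXICON`, `BUT_WORD`, `EXCLAMATION_FACTOR`, and `toy_sentiment` are made up. The factors 0.5, 1.5, and 1.291 are copied from the examples.

```python
import re

# Toy lexicon with the word scores quoted in the examples; not the Sentida word list.
TOY_LEXICON = {"maden": 0.3, "god": 2.3, "serviceringen": 0.3, "elendig": -4.3, "sejt": 3.6}
BUT_WORD = "men"
EXCLAMATION_FACTOR = 1.291


def toy_sentiment(text):
    """Sum word scores, down-weighting text before 'men' and up-weighting text after it."""
    tokens = re.findall(r"\w+", text.lower())
    if BUT_WORD in tokens:
        split_at = tokens.index(BUT_WORD)
        weighted_parts = [(tokens[:split_at], 0.5), (tokens[split_at + 1:], 1.5)]
    else:
        weighted_parts = [(tokens, 1.0)]
    score = sum(
        weight * sum(TOY_LEXICON.get(token, 0.0) for token in part)
        for part, weight in weighted_parts
    )
    # An exclamation mark amplifies the whole sentence in this toy version.
    if "!" in text:
        score *= EXCLAMATION_FACTOR
    return score


score_but = toy_sentiment("Maden var god, men serviceringen var elendig.")
score_exclamation = toy_sentiment("Det er så sejt!")
```

For the first sentence this gives 0.5 x 2.6 + 1.5 x (-4.0), matching the 1.3 - 6 in the example; the real model handles many more cases, such as negations and capital letters.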






Neural-network-based SA models such as "aspect-based sentiment analysis" can take words and context into account and are in that way more precise, but because neural network models require large amounts of training data, they are slow and impractical to use (Kran, E., & Orm, S. (2020)).

In [13]:
# pip install sentida

In [19]:
from sentida import Sentida
# Create a Sentida instance:
SV = Sentida()

SV.sentida(
        text = 'Lad der blive fred.',
        output = 'mean',
        normal = False,
        speed = 'normal')

2.0

https://github.com/alexandrainst/danlp/blob/master/docs/docs/tasks/sentiment_analysis.md

https://github.com/alexandrainst/danlp/blob/master/docs/docs/frameworks/spacy.md

# References

Lauridsen, G. A., Dalsgaard, J. A., & Svendsen, L. K. B. (2019). SENTIDA: A New Tool for Sentiment Analysis in Danish. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 4(1), 38–53. Retrieved from https://tidsskrift.dk/lwo/article/view/115711

Kran, E., & Orm, S. (2020). EMMA: Danish Natural-Language Processing of Emotion in Text: The new State-of-the-Art in Danish Sentiment Analysis and a Multidimensional Emotional Sentiment Validation Dataset. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 5(1), 92–110. Retrieved from https://tidsskrift.dk/lwo/article/view/121221

Brogaard Pauli, Amalie and Barrett, Maria and Lacroix, Ophélie and Hvingelby, Rasmus. (2021) An open-source toolkit for Danish Natural Language Processing. https://ep.liu.se/ecp/178/053/ecp2021178053.pdf 

Alexandra Institute. (2021). Sentiment_analysis.md. https://github.com/alexandrainst/danlp/blob/master/docs/docs/tasks/sentiment_analysis.md

Hansen, Dorte Haltrup and Navarretta, Costanza, 2021, The Danish Parliament Corpus 2009 - 2017, v2, w. subject annotation, CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/44.

McKinney, Wes. Python for Data Analysis : Data Wrangling with Pandas, NumPy and Jupyter. Third edition. Sebastopol, CA: O’Reilly Media, Inc., 2022

In [17]:
# bump_chart
# https://altair-viz.github.io/gallery/bump_chart.html