# Part 1. Loading data

In [1]:
import pandas as pd # data analysis package
import numpy as np # math
import os # for changing directory
import re # regular expression

One of the biggest problems I've encountered in relation to understanding the types of therapy by reading articles about them is that the descriptions differ quite significantly and it's hard to tell what information is most important. My hypothesis is that the most important information should be repeated in all articles the most. For this reason, I have compiled different articles about each modality into text files and will use them for text analysis.

I have used a wide range of websites, most common ones being:

- https://www.nhs.uk/
- https://www.apa.org/
- https://www.mind.org.uk/
- https://en.wikipedia.org/
- https://www.mayoclinic.org/
- https://www.counselling-directory.org.uk/
- https://www.choosingtherapy.com/
- https://www.verywellmind.com/
- https://www.goodtherapy.org/
- https://www.psychologytoday.com/
- https://www.thebritishcbtcounsellingservice.com/

I have tried to stick to the mental-health oriented websites, and avoided descriptions formulated by psychotherapists themselves in their private websites.

First, I will open all text files and add them into a dataframe.

In [2]:
# An empty dictionary where I will add text files
modality_dict = {}

In [3]:
# Foler path
path = 'C:\\Users\\laimi\\Desktop\\Studies\\Data Analytics\\Portfolio\\Types of Psychotherapy\\Code\\main\\text\\modality_text_files'

In [4]:
# Setting directory
os.chdir(path)

In [5]:
# Function to read files and add to the dictionary
def read_text_file(file_path):
    with open(file_path, 'r', encoding='UTF-8') as f:
        modality_dict[file] = f.read()

In [6]:
# Iterating through the files
for file in os.listdir():
    file_path = f"{path}\\{file}"
    read_text_file(file_path)

In [7]:
# Transforming dictionary into a dataframe
modality_df = pd.DataFrame.from_dict(modality_dict, orient='index').sort_index()
modality_df = modality_df.reset_index()

In [8]:
# Naming the columns
modality_df.columns = ['therapy', 'transcript']

In [9]:
# Removing .txt from therapy name
modality_df['therapy'] = modality_df['therapy'].apply(lambda x: x[0:-4])

In [10]:
modality_df

Unnamed: 0,therapy,transcript
0,adlerian,"Adlerian therapy, also called individual psych..."
1,behavioural,Systematic desensitization\nSystematic desensi...
2,biodynamic,What Is Biodynamic Psychotherapy?: Biodynamic ...
3,body,"Body psychotherapy, a branch of therapy that f..."
4,brief,Almost all psychotherapy is language-based and...
5,cat,What is CAT?\nCAT stands for Cognitive Analyti...
6,cbt,Cognitive behavioural therapy (CBT) is a talki...
7,cognitive,Cognitive therapy (CT) is a type of psychother...
8,core process,Core process psychotherapy (CPP) is a mindfuln...
9,creative arts,Art therapy is an established form of psycholo...


In [11]:
# Saving
# modality_df.to_csv('descriptions_df.csv')

# Part 2. Cleaning

In [12]:
# Creating a text-cleaning function
def clean_data_1(text):
    text = text.lower() # lowercase
    text = re.sub(r'[^\w\s]', ' ', text) # punctuation - removing everything except words and white space
    text = re.sub(r'\d+', '', text) # removing digits
    text = re.sub('\n', ' ', text) # removing new line characters
    text = re.sub('behavior', 'behaviour', text) # normalising the spelling
    text = re.sub(r'\s+', ' ', text).strip() # removing white space
    
    return text

round1 = lambda x: clean_data_1(x)

In [13]:
clean_data = pd.DataFrame(modality_df['transcript'].apply(round1))

In [14]:
clean_data.iloc[15, 0]

'humanistic therapy is a mental health approach that emphasizes the importance of being your true self in order to lead the most fulfilling life it s based on the principle that everyone has their own unique way of looking at the world this view can impact your choices and actions humanistic therapy also involves a core belief that people are good at heart and capable of making the right choices for themselves if you don t hold yourself in high regard it s harder to develop your full potential humanistic therapy involves better understanding your world view and developing true self acceptance this is accomplished partially through the development of unconditional positive regard both from others and from yourself when you believe that others only respect you if you act a certain way it s easy to fall into the trap of constantly feeling like you aren t enough this feeling of worthlessness in turn can negatively impact how you view both yourself and the world around you remember accordin

# Removing stopwords

In [15]:
# Importing nltk package to remove stopwords
import nltk

In [16]:
# Creating a set of english stopwords
stop_words = set(nltk.corpus.stopwords.words('english'))

In [17]:
# Creating a function to remove stopwords from the text
def remove_stopwords(text):
    sentence_split = text.split() # splitting transcripts into lists
    text = " ".join([w for w in sentence_split if w not in stop_words]) # keeping only words that are not stop words
    
    return text

round2 = lambda x: remove_stopwords(x)

In [18]:
clean_data = pd.DataFrame(clean_data['transcript'].apply(round2))

In [19]:
clean_data.index = modality_df['therapy']

In [20]:
clean_data.iloc[0, 0]

'adlerian therapy also called individual psychology short term goal oriented positive psychodynamic therapy based theories alfred adler one time colleague sigmund freud adler focused much research feelings inferiority versus superiority discouragement sense belonging context one community society large according adler feelings inferiority result neurotic behaviour right setting also used motivation strive greater success adlerian therapy focuses development individual personality understanding accepting interconnectedness humans alfred adler born near vienna austria studied medicine became doctor first practicing ophthalmology shifting general medicine treating different populations early adler met regularly sigmund feud began develop psychoanalysis yet adler soon parted ways freud begin branch therapy would become adlerian therapy individual psychology developed approach met patients spoke methods death adlerian therapy evidence based approach applied successfully treatment type psych

In [21]:
# Saving
# clean_data.to_csv('clean_descriptions.csv')

## Creating bi-grams and tri-grams

In [22]:
# NLP package gensim
import gensim
# from gensim.models import CoherenceModel

In [23]:
# I need all transcripts to be in a list format
transcript_list = clean_data.transcript.values.tolist()

In [24]:
# Creating a list with all words from transcripts that will be used to train the model to identify bigrams and trigrams
gensim_train = []
for transcript in transcript_list:
    lst = transcript.split()
    gensim_train.append(lst)

In [25]:
gensim_train[0][0:20]

['adlerian',
 'therapy',
 'also',
 'called',
 'individual',
 'psychology',
 'short',
 'term',
 'goal',
 'oriented',
 'positive',
 'psychodynamic',
 'therapy',
 'based',
 'theories',
 'alfred',
 'adler',
 'one',
 'time',
 'colleague']

In [26]:
# Given each modality might have its own specific bigrams and trigrams, I've set the threshold pretty low - to 4 occurences

# Training model to recognise bigrams
bigram = gensim.models.Phrases(gensim_train, min_count=4, threshold=50) 
# Training model to recognse trigrams using bigrams
trigram = gensim.models.Phrases(bigram[gensim_train], threshold=50)

# No more future training needed, so I will save model to use less memory
bigram_mod = gensim.models.phrases.Phraser(bigram) 
trigram_mod = gensim.models.phrases.Phraser(trigram)

# Function to iterate throught documents and create bigrams
def make_bigrams(texts):
    return [bigram_mod[doc] for doc in texts]

# Function to create trigrams
def make_trigrams(texts):
    return [trigram_mod[bigram_mod[doc]] for doc in texts]

data_bigrams = make_bigrams(gensim_train) # Creating bigrams
data_bi_trigrams = make_trigrams(data_bigrams) # Creating trigrams

ngram_docs = data_bi_trigrams # Renaming for easier use

In [27]:
# Bigrams and trigrams are now connected with an underscore
ngram_docs[0][0:50]

['adlerian',
 'therapy',
 'also',
 'called',
 'individual',
 'psychology',
 'short_term',
 'goal_oriented',
 'positive',
 'psychodynamic',
 'therapy',
 'based',
 'theories',
 'alfred_adler',
 'one',
 'time',
 'colleague',
 'sigmund_freud',
 'adler',
 'focused',
 'much',
 'research',
 'feelings_inferiority',
 'versus',
 'superiority',
 'discouragement',
 'sense_belonging',
 'context',
 'one',
 'community',
 'society',
 'large',
 'according',
 'adler',
 'feelings_inferiority',
 'result',
 'neurotic',
 'behaviour',
 'right',
 'setting',
 'also',
 'used',
 'motivation',
 'strive',
 'greater',
 'success',
 'adlerian',
 'therapy',
 'focuses',
 'development']

## Lemmatizatoin

In [28]:
import spacy # NLP library
import en_core_web_sm # A trained English pipeline (small version)

In [29]:
# Loading the model and disabling some components for optimisation
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner']) 

In [30]:
# Creating a function for lemmatization. I will use noun, adjective, verb and adverb tagging
def lemmatization(texts, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV']):
    lemmatized = []
    for sent in texts:
        doc = nlp(" ".join(sent)) 
        lemmatized.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
    return lemmatized

In [31]:
# setting a higher max length
nlp.max_length = 1300000

In [32]:
data_lemmatized = lemmatization(ngram_docs)

In [33]:
data_lemmatized[0]

['therapy',
 'also',
 'call',
 'individual',
 'psychology',
 'short_term',
 'goal_oriente',
 'positive',
 'psychodynamic',
 'therapy',
 'base',
 'theory',
 'alfred_adler',
 'time',
 'adler',
 'focus',
 'much',
 'research',
 'feelings_inferiority',
 'superiority',
 'discouragement',
 'context',
 'community',
 'society',
 'large',
 'accord',
 'adler',
 'feelings_inferiority',
 'result',
 'neurotic',
 'behaviour',
 'right',
 'setting',
 'also',
 'use',
 'motivation',
 'strive',
 'great',
 'success',
 'therapy',
 'focus',
 'development',
 'individual',
 'personality',
 'understand',
 'accept',
 'interconnectedness',
 'human',
 'alfred_adler',
 'bear',
 'study',
 'medicine',
 'become',
 'doctor',
 'first',
 'practice',
 'ophthalmology',
 'shift',
 'general',
 'medicine',
 'treat',
 'different',
 'population',
 'early',
 'adler',
 'meet',
 'regularly',
 'sigmund',
 'feud',
 'begin',
 'develop',
 'psychoanalysis',
 'adler',
 'soon',
 'part',
 'way',
 'freud',
 'begin',
 'branch',
 'therapy',


## Common Words

There are a lot of words that are repeated all the time but they don't give us much insight, such as names. I will remove them.

In [34]:
# I've manually added these word to the list based on the result of following analysis, i.e. these are word that 
# would appear as most-common words, but I found them to not provide a lot of insight for my purposes.
common_words = ['therapy', 'therapist', 'client', 'experience', 'person', 'use', 'people', 'patient', 'approach', 'individual', 'also',
                'psychotherapy', 'help', 'work', 'understand', 'other', 'well', 'issue', 'problem', 'way', 'technique', 'session', 'life', 
               'treatment', 'theory', 'behavior', 'behaviour', 'self', 'gestalt', 'form', 'often', 'rather', 'see', 'personal',
               'include', 'find', 'form', 'thing', 'involve', 'psychology', 'psychological', 'response', 'jung', 'jungian', 'adler',
               'freud', 'hypnotherapy', 'alfred_adler', 'adlerian', 'cat', 'cbt' 'eclectic', 'therapeutic', 'eft', 'emdr', 'existential',
               'family', 'humanistic', 'hypnotherapist', 'integrative', 'eclectic', 'ipt', 'nlp', 'practitioner', 'roger', 'carl_roger', 
               'rogerian', 'phenomenological', 'primal', 'janov', 'psychoanalysis', 'psychosynthesis', 'transpersonal', 'child',
               'aaron_beck', 'sfbt']

In [35]:
# Defining a function to remove the common words
def remove_common_words(text):
    new_list = []
    for transcript in text:
        lst = [x for x in transcript if x not in common_words]
        new_list.append(lst)
    return new_list

In [36]:
clean_lemmatized = remove_common_words(data_lemmatized)

In [37]:
# I will need joined tokens for further analysis
data_lemmatized_joined = []

for token in clean_lemmatized:
    text = " ".join(token)
    data_lemmatized_joined.append(text)

# Bag of Words and most common words

In [38]:
from sklearn.feature_extraction.text import CountVectorizer

In [39]:
cv = CountVectorizer(stop_words='english') # initialising the vectorizer
data_cv = cv.fit_transform(data_lemmatized_joined) # fitting and transforming the data
data_bow = pd.DataFrame(data_cv.toarray(), columns=cv.get_feature_names_out()) # creating a count matrix DataFrame
data_bow.index = clean_data.index # setting therapy names as index

In [40]:
data_bow = data_bow.transpose() # transposing format

In [41]:
data_bow

therapy,adlerian,behavioural,biodynamic,body,brief,cat,cbt,cognitive,core process,creative arts,...,process oriented,psychoanalysis,psychodynamic,psychosexual,psychosynthesis,relational,somatic,systemic,transactional,transpersonal
aata,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
abandon,1,0,0,0,0,2,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
abandonment,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
abate,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
abend,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
yummy,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
zeitgeist,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
zeitler,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
zinker,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [42]:
# List of top-counted words. Sorting each column descending and grabbing top 30 words
top_dict = {} 
for c in data_bow.columns:
    top = data_bow[c].sort_values(ascending=False).head(30)
    top_dict[c] = list(zip(top.index, top.values))
    
top_dict['adlerian']

[('feel', 33),
 ('goal', 29),
 ('develop', 27),
 ('belief', 22),
 ('insight', 20),
 ('inferiority', 20),
 ('new', 19),
 ('social', 18),
 ('create', 18),
 ('feelings_inferiority', 18),
 ('assessment', 17),
 ('healthy', 17),
 ('change', 16),
 ('influence', 15),
 ('believe', 15),
 ('positive', 14),
 ('learn', 14),
 ('relationship', 14),
 ('society', 14),
 ('make', 14),
 ('need', 14),
 ('sense', 14),
 ('overcome', 13),
 ('memory', 13),
 ('type', 13),
 ('feeling', 13),
 ('focus', 13),
 ('value', 12),
 ('style', 12),
 ('context', 12)]

Looking at this list we can get a better insight into the "core" of each modality - e.g. Creative Arts therapy, as expected, seems to be about creative expression, emotions, making art, using physical movement, and perhaps used for people with trauma or cancer. Behavioural therapy is about reward/punishment, responding to stimulus, often used for certain disorders, utilises learning and changing old patterns, etc.

# TF-IDF

Term Frequency - Inverse Document Frequency emphasizes importance of unique words in a document, while reducing the value of words that appear in many other documents. It might give different results than just common word count.

Will follow the same steps as above with Count Vectorizer.

In [43]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [44]:
vect = TfidfVectorizer(stop_words=stop_words) # using stopwords from before
data_vect = vect.fit_transform(data_lemmatized_joined)
data_tfidf = pd.DataFrame(data_vect.toarray(), columns=vect.get_feature_names_out())
data_tfidf.index = clean_data.index

In [45]:
data_tfidf = data_tfidf.transpose()

In [46]:
data_tfidf

therapy,adlerian,behavioural,biodynamic,body,brief,cat,cbt,cognitive,core process,creative arts,...,process oriented,psychoanalysis,psychodynamic,psychosexual,psychosynthesis,relational,somatic,systemic,transactional,transpersonal
aata,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00467,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
abandon,0.012466,0.0,0.0,0.0,0.0,0.039317,0.0,0.0,0.0,0.00000,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.003141,0.0
abandonment,0.000000,0.0,0.0,0.0,0.0,0.024539,0.0,0.0,0.0,0.00000,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
abate,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00000,...,0.0,0.009302,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
abend,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00000,...,0.0,0.009302,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
yummy,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00000,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.004377,0.0
zeitgeist,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00000,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
zeitler,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00000,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
zinker,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00000,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0


In [47]:
# data_tfidf.to_csv('tfidf_matrix.csv')

In [48]:
tfidf_dict = {}

for c in data_tfidf.columns:
    top = data_tfidf[c].sort_values(ascending=False).head(30)
    tfidf_dict[c] = list(zip(top.index, top.values))
tfidf_dict['adlerian']

[('inferiority', 0.3474254596340459),
 ('feelings_inferiority', 0.31268291367064127),
 ('assessment', 0.15322625494553105),
 ('goal', 0.1488536553944655),
 ('feel', 0.14735147283358313),
 ('birth_order', 0.13897018385361834),
 ('society', 0.13119001706020114),
 ('insight', 0.12551354769451464),
 ('develop', 0.12395658521915956),
 ('reorientation', 0.12159891087191606),
 ('belief', 0.11947484421815974),
 ('lifestyle', 0.11420987889702357),
 ('healthy', 0.11329140424930416),
 ('engagement', 0.11181196742553598),
 ('social', 0.10652887790763578),
 ('style', 0.10418882979311202),
 ('community', 0.09914640025887303),
 ('strive', 0.09550642731035268),
 ('memory', 0.09514457573374091),
 ('overcome', 0.09216712566803188),
 ('new', 0.08968797383808755),
 ('insight_reorientation', 0.08685636490851148),
 ('social_interest', 0.08685636490851148),
 ('create', 0.08496755416239873),
 ('value', 0.07997040299950882),
 ('superiority', 0.07967913082013767),
 ('betterhelp', 0.07967913082013767),
 ('protot

While there are some minor changes, overall result seems to be similar to the previous list.

## Similarities between modality descriptions

The lack of consensus in terms of therapy categorisation is another issue that makes any informed decision making and research into help  available quite difficult. For example, psychosyntesis has analytic and jungian roots so could be classified as psychodynamic, but it also places an emphasis on client-therapsit relationship and focuses on spiritual aspect of human experience, which would make it humanistic. Given it integrates ideas from other modalities, it is also integrative. 

I have decided to solve this issue by measuring the cosine similarity (using bag of words for vectorising) between the descriptions of the therapies, and when in doubt group therapies that are similar together. For example, Cognitive Analytic Therapy is more similar to CBT than to Psychodynamic therapy, so I would group them together.

In [49]:
# Cosine_similarity function could not work with one-dimensional data (coloumn vs column). I could reshape it,
# but instead I will just calculate manually using dot and norm functions

therapies_sim_vals = []
for c1 in data_tfidf.columns:
    for c2 in data_tfidf.columns:
        if c1 == c2: # not intersted in therapy's relationship to itself
            continue
        else:
            # manual calculation of cosine similarity
            value = np.dot(data_tfidf[c1], data_tfidf[c2])/(np.linalg.norm(data_tfidf[c1])*np.linalg.norm(data_tfidf[c2]))
            dic = {'therapy_1':c1, 'therapy_2':c2, 'value':value} # dictionary listing two modalities and their similarity
            therapies_sim_vals.append(dic)

In [50]:
# List to dataframe
sim_df = pd.DataFrame(therapies_sim_vals)

In [51]:
sim_df.sort_values('value', ascending=False)

Unnamed: 0,therapy_1,therapy_2,value
1057,somatic,body,0.612061
132,body,somatic,0.612061
530,humanistic,person centred,0.451701
729,person centred,humanistic,0.451701
456,gestalt,humanistic,0.401783
...,...,...,...
960,psychosexual,core process,0.041567
959,psychosexual,cognitive,0.041082
265,cognitive,psychosexual,0.041082
974,psychosexual,personal construct,0.029832


The relations are duplicated, but inversed ( a -> b and b -> a). I can remove duplicates based on identical similarity values

In [52]:
# Counting duplicates
sim_df['value'].duplicated().sum()

595

In [53]:
# Removing duplicates based on similarity value
sim_df = sim_df.drop_duplicates(subset=['value'])

In [54]:
# Cheking
sim_df['value'].duplicated().sum()

0

In [55]:
sim_df.sort_values('value', ascending=False)

Unnamed: 0,therapy_1,therapy_2,value
132,body,somatic,0.612061
530,humanistic,person centred,0.451701
456,gestalt,humanistic,0.401783
420,existential,gestalt,0.391813
220,cbt,integrative_eclectic,0.386303
...,...,...,...
327,creative arts,personal construct,0.043124
575,hypnotherapy,systemic,0.042293
299,core process,psychosexual,0.041567
265,cognitive,psychosexual,0.041082


In [56]:
# sim_df.to_csv('modality_similarity_values.csv')

# LDA

I am also curious whether there are some hidden similarities that I cannot see which would group the therapy types better. I will use LDA topic modelling to test this.

In [63]:
import pyLDAvis # for visualisation of topic modelling
import pyLDAvis.gensim_models
from gensim.models import TfidfModel

In [58]:
# Making a copy to not alter previous data
lda_data = clean_lemmatized.copy()

In [59]:
len(lda_data)

35

In [60]:
dictionary = gensim.corpora.Dictionary(lda_data) # associating each word in the transcripts with a unique integer ID
corpus = [dictionary.doc2bow(transcript) for transcript in lda_data] # vectorizing the corpus using bag of words
tfidf = TfidfModel(corpus, id2word = dictionary) # for transforming original vector representation to tf-idf


# Filter low value words and also words missing in tfidf models.

low_value = 0.02

for i in range(0, len(corpus)):
    bow = corpus[i]
    low_value_words = []
    tfidf_ids = [id for id, value in tfidf[bow]]
    bow_ids = [id for id, value in bow]
    low_value_words = [id for id, value in tfidf[bow] if value < low_value]
    words_missing_in_tfidf = [id for id in bow_ids if id not in tfidf_ids] # The words with tf-idf socre 0 will be missing

    new_bow = [b for b in bow if b[0] not in low_value_words and b[0] not in words_missing_in_tfidf]  

#reassign        
corpus[i] = new_bow

In [61]:
# Training the model
lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                           id2word=dictionary,
                                           num_topics=7,
                                           random_state=100,
                                           passes=10,
                                           alpha='auto')

In [64]:
# Visualising
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary, mds='mmds', R=30)
vis

  default_term_info = default_term_info.sort_values(
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  return np.find_common_type(types, [])


After some experimenting with the parameters and number of topics it seems that the model can pick up some of the groups pretty well, such as art and expression, body-oriented, or unconscious oriented. However, there can be inaccuracies due to use of similar words in completely different ways. For example, "child" and "parent" can be used in context of childhood experiences and relationship with the parents, but they can also be used to refer to inner processes, such as Child and Parent ego-states in transactional analysis. This can lead to seemingly different modalities being grouped together.

Overall, in contrast to the "Big 3", i.e. psychodynamic, humanistic, and Cognitive-Behavioural groups, this encourages me to include more groups in my visualisation, namely:
- Body-oriented therapies
- Spirituality/Philosophy oriented ones
- Expressive therapies
- Relationships oriented therapies
- Humanistic / unstructured therapies
- Unconcious-oriented therapies
- CBT / short-term / interventioanal / goal-oriented therapies

# EDIT: 
Since the last interactive visualisation does not load on github, to see it please enter the repository url here: https://nbviewer.org/