Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel $\rightarrow$ Restart) and then **run all cells** (in the menubar, select Cell $\rightarrow$ Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and email below:

In [45]:
# Full name
NAME = ""
# Institutional email (hm.edu or hmtm.de)
EMAIL = ""

---

# Day 4 - Visualizing painter biographies

## 4.0 - Getting Started

### Introduction

The fourth day of this class will show you:

- [HuggingFace](https://huggingface.co/), a platform for finding and working with different machine learning models.
- How to visualize similarity between painters.

Please download the code and data from the [GitHub repository](https://github.com/aica-wavelab/aica-assignments) and follow the instructions in `A4_painter_semantic_distance`.

[GitHub repository of the course](https://github.com/aica-wavelab/aica-assignments)

### Content of the repository

- `data`: A folder containing the summary information for artists gathered from Wikipedia.
- `A4_painter_semantic_distance.ipynb`: This notebook, where we will do the analysis and visualization work.

### Assignment

Today's task is to find a way to cluster and visualize painters based on the summaries of their Wikipedia pages.

@@REVIEW@@

### Installation required

Make sure you have the following packages installed for today.

In [88]:
!pip install pandas numpy matplotlib seaborn sentence-transformers umap-learn plotly





---

## 4.1 - The dataset

The dataset is something I put together using the `wikipedia-api` package (linked [here](https://pypi.org/project/Wikipedia-API/)). It's a collection of the summaries from painter pages on Wikipedia. The painter pages come from Wikipedia's [List of painters by name](https://en.wikipedia.org/wiki/List_of_painters_by_name). While it has a lot of painters, it's important to note that it does not cover _all_ the painters who have pages on Wikipedia.

The dataset is divided into two sections:

- The main file is `painter_summaries_all.csv`; it has data on all 3900+ painters listed in the Wikipedia article. One listed painter who still appears in the partial files has been removed from this file, and the IDs have not been changed.
- There are also 6 files in the `partial` directory with the format `painter_summaries_part##.csv`. These files have the data split into smaller chunks based on how the data was gathered.

### Inspecting the data

Open the `painter_summaries_all.csv` file in a spreadsheet program (Excel, Numbers, Sheets, etc.) and take a look at the data.

<div class="alert alert-info">
<b>Instruction:</b> What are the columns in this dataset? What do they each contain?
</div>

**@@ YOUR ANSWER HERE @@**

### Loading the data

Let's load the complete dataset and inspect it using pandas.

In [47]:
import pandas as pd

painter_summaries_df = pd.read_csv("data/painter_summaries_all.csv")

painter_summaries_df.head(5)

Unnamed: 0,painter_id,painter_name,summary,url
0,1,Alfred Richard Gurrey Sr.,Alfred Richard Gurrey Sr. (1852–1944) was an ...,https://en.wikipedia.org/wiki/Alfred_Richard_G...
1,2,Edward Otho Cresap Ord II,"Edward Otho Cresap Ord, II (November 9, 1858 –...",https://en.wikipedia.org/wiki/Edward_Otho_Cres...
2,3,George Barret Jr.,"George Barret Jr. (1767–1842), sometimes refer...",https://en.wikipedia.org/wiki/George_Barret_Jr.
3,4,George Barret Sr.,George Barret Sr. (c. 1730 – 29 May 1784) was...,https://en.wikipedia.org/wiki/George_Barret_Sr.
4,5,Henry Ives Cobb Jr.,"Henry Ives Cobb Jr. (March 24, 1883 – August 1...",https://en.wikipedia.org/wiki/Henry_Ives_Cobb_Jr.


<div class="alert alert-info">
<b>Instruction:</b> How many painters are there in the dataset? Are there any duplicates?
</div>

In [48]:
# YOUR CODE HERE
painter_summaries_df["painter_name"].value_counts()

painter_name
Galli da Bibiena family    4
Walter Emerson Baum        2
Hristofor Žefarović        2
Giulio Clovio              2
Domenichino                2
                          ..
Giorgio de Chirico         1
Giorgio De Vincenzi        1
Giorgio Morandi            1
Giorgione                  1
Þórarinn B. Þorláksson     1
Name: count, Length: 3925, dtype: int64
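
`value_counts()` lists every name, so the duplicates still have to be picked out by eye. As a hedged sketch, the same question can be answered with a few numbers. The toy dataframe and its painter names below are made up for illustration; on the real data you would use `painter_summaries_df` instead.

```python
import pandas as pd

# Toy stand-in for painter_summaries_df; the names are invented for illustration.
toy_df = pd.DataFrame(
    {
        "painter_name": [
            "Painter A", "Painter B", "Painter A",
            "Painter C", "Painter B", "Painter A",
        ],
        "summary": ["a", "b", "a again", "c", "b again", "a once more"],
    }
)

n_rows = len(toy_df)                           # total number of rows
n_unique = toy_df["painter_name"].nunique()    # number of distinct names

# duplicated() marks every repeat after the first occurrence of a name.
n_extra = int(toy_df["painter_name"].duplicated().sum())

# keep=False marks all rows that share a name, which makes them easy to inspect.
repeated_rows = toy_df[toy_df["painter_name"].duplicated(keep=False)]

print(n_rows, n_unique, n_extra)
print(repeated_rows)
```

If the number of rows is larger than the number of unique names, the difference is the number of rows that a de-duplication step would drop.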

### Cleaning the dataset 
<div class="alert alert-info">
<b>Instruction:</b> Create a new dataframe <strong>painter_summaries_clean</strong> that does not have duplicates based on the <em>painter_name</em> column.
</div>

In [49]:
# YOUR CODE HERE
# raise NotImplementedError()
painter_summaries_clean = painter_summaries_df.drop_duplicates(subset="painter_name", keep="first")
painter_summaries_clean["painter_name"].value_counts()

painter_name
Alfred Richard Gurrey Sr.    1
Mati Klarwein                1
Matija Jama                  1
Matsumura Goshun             1
Matsuno Chikanobu            1
                            ..
Giorgio Cavallon             1
Giorgio de Chirico           1
Giorgio De Vincenzi          1
Giorgio Morandi              1
Þórarinn B. Þorláksson       1
Name: count, Length: 3925, dtype: int64

Now that the dataset is duplicate-free, we can start working with it for our analysis.

If you look at the data file in a spreadsheet program, you will notice that the summaries are of various lengths. Let's keep track of that somehow because we may want to filter later on.

In [50]:
def count_words(text):
    return len(text.split())
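
`count_words()` assumes every summary is a string. pandas reads empty CSV cells as `NaN` (a float), and calling `.split()` on a float raises an error. Whether this dataset contains empty summaries is not stated here, so the following is only a defensive sketch; the name `count_words_safe` is hypothetical.

```python
def count_words_safe(text):
    """Count whitespace-separated words; treat non-string values as empty."""
    if not isinstance(text, str):
        # Missing values (for example NaN from an empty CSV cell) count as 0 words.
        return 0
    return len(text.split())


print(count_words_safe("Dutch Post-Impressionist painter"))  # prints 3
print(count_words_safe(float("nan")))                        # prints 0
```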

<div class="alert alert-info">
<b>Instruction:</b> Create a new column <em>summary_length</em> using the count_words() function.
</div>


In [51]:
# Work on an explicit copy so that adding a column does not trigger pandas' SettingWithCopyWarning.
painter_summaries_clean = painter_summaries_clean.copy()
painter_summaries_clean["summary_length"] = painter_summaries_clean["summary"].apply(count_words)

painter_summaries_clean.head(10)


Unnamed: 0,painter_id,painter_name,summary,url,summary_length
0,1,Alfred Richard Gurrey Sr.,Alfred Richard Gurrey Sr. (1852–1944) was an ...,https://en.wikipedia.org/wiki/Alfred_Richard_G...,164
1,2,Edward Otho Cresap Ord II,"Edward Otho Cresap Ord, II (November 9, 1858 –...",https://en.wikipedia.org/wiki/Edward_Otho_Cres...,84
2,3,George Barret Jr.,"George Barret Jr. (1767–1842), sometimes refer...",https://en.wikipedia.org/wiki/George_Barret_Jr.,27
3,4,George Barret Sr.,George Barret Sr. (c. 1730 – 29 May 1784) was...,https://en.wikipedia.org/wiki/George_Barret_Sr.,263
4,5,Henry Ives Cobb Jr.,"Henry Ives Cobb Jr. (March 24, 1883 – August 1...",https://en.wikipedia.org/wiki/Henry_Ives_Cobb_Jr.,71
5,6,John Byrne (English artist),John Byrne (1786–1847) was an English painter ...,https://en.wikipedia.org/wiki/John_Byrne_(Engl...,29
6,7,John Frederick Herring Jr.,John Frederick Herring Jr. (1820–1907) was an ...,https://en.wikipedia.org/wiki/John_Frederick_H...,17
7,8,John Frederick Herring Sr.,John Frederick Herring Sr. (12 September 1795 ...,https://en.wikipedia.org/wiki/John_Frederick_H...,61
8,9,A. B. Jackson (painter),"Alexander Brooks Jackson (April 18, 1925 – Mar...",https://en.wikipedia.org/wiki/A._B._Jackson_(p...,14
9,10,A. J. Casson,"Alfred Joseph Casson (May 17, 1898 – February...",https://en.wikipedia.org/wiki/A._J._Casson,66


We'll save the data as it is now and then we can work with these summaries.

In [52]:
painter_summaries_clean.to_csv("data/painter_summaries_clean.csv", index=False)


---

## 4.2 - Sentence Similarity

Let's take a step back and think about where we want to end up and where we are currently. Right now we have a dataset of biographies of different painters (with some differences in length). We want to end up with a visual of the painters clustered based on their biographies.

We could manually read each biography, interpret the text, and try to group the painters ourselves. In some cases we might group painters by their nationality (e.g., Dutch painters), their style (e.g., Surrealist painters), their subject matter (e.g., still life painters), or the time period they lived in (e.g., Renaissance painters).

<div class="alert alert-info">
<b>Instruction:</b> How many painter biographies would you go through before getting bored?
</div>

**@@YOUR ANSWER HERE@@**

We can use machine learning to assist us in clustering these biographies by comparing how similar or different the summaries are. This task is also known as Sentence Similarity and you can read more about it here: [https://huggingface.co/tasks/sentence-similarity](https://huggingface.co/tasks/sentence-similarity). 

For now we'll play a bit with the widget on the page. First let's get a series of painter summaries to work with. I picked names that might have some obvious groupings so we can do sanity checks as we work.

In [53]:
select_painter_names = [
    "Albrecht Dürer",
    "Leonardo da Vinci",
    "Michelangelo",
    "Raphael",
    "Titian",
    "Joaquín Sorolla",
    "Pablo Picasso",
    "Salvador Dalí",
    "Andy Warhol",
    "Vincent van Gogh",
    "Johannes Vermeer",
    "Sandro Botticelli",
    "Hokusai"
]

select_painter_bios = painter_summaries_clean[
    painter_summaries_clean["painter_name"].isin(select_painter_names)
]

# For this short dataset, we don't care about the other columns.
select_painter_bios = select_painter_bios[["painter_name", "summary"]]
select_painter_bios

Unnamed: 0,painter_name,summary
116,Albrecht Dürer,Albrecht Dürer (; German: [ˈʔalbʁɛçt ˈdyːʁɐ]; ...
255,Andy Warhol,Andy Warhol (; born Andrew Warhola Jr.; August...
1559,Hokusai,"Katsushika Hokusai (葛飾 北斎, c. 31 October 1760 ..."
1936,Joaquín Sorolla,Joaquín Sorolla y Bastida (Valencian: Joaquim ...
1975,Johannes Vermeer,"Johannes Vermeer (, Dutch: [vərˈmeːr], see bel..."
2375,Leonardo da Vinci,Leonardo di ser Piero da Vinci (15 April 1452 ...
2685,Michelangelo,Michelangelo di Lodovico Buonarroti Simoni (It...
2874,Pablo Picasso,Pablo Ruiz Picasso (25 October 1881 – 8 April ...
3062,Raphael,Raffaello Sanzio da Urbino (Italian: [raffaˈɛl...
3263,Salvador Dalí,Salvador Domingo Felipe Jacinto Dalí i Domènec...


<div class="alert alert-info">
<b>Instruction:</b> Cluster the 13 painters based on what you already know or can quickly read about them.
</div>

**@@YOUR ANSWER HERE@@**

Now let's play with the sentence similarity widget on Hugging Face. For that we'll need the full summaries for each painter. I will save the previous table to a CSV for faster copy+paste, but you can also use the Python code under that to get the bio for a particular artist.

In [54]:
select_painter_bios.to_csv("data/select_painter_bios.csv", index=False)

In [55]:
painter_name = "Vincent van Gogh"
select_painter_bios[select_painter_bios["painter_name"] == painter_name]["summary"].values[0]

"Vincent Willem van Gogh (Dutch: [ˈvɪnsɛnt ˈʋɪləɱ‿vɑŋ‿ˈɣɔx] ; 30 March 1853 – 29 July 1890) was a Dutch Post-Impressionist painter who is among the most famous and influential figures in the history of Western art. In just over a decade, he created approximately 2100 artworks, including around 860 oil paintings, most of them in the last two years of his life. His oeuvre includes landscapes, still lifes, portraits, and self-portraits, most of which are characterized by bold colors and dramatic brushwork that contributed to the rise of expressionism in modern art. Van Gogh's work was beginning to gain critical attention before he died from a self-inflicted gunshot at age 37. During his lifetime, only one of Van Gogh's paintings, The Red Vineyard, was sold. \r\nBorn into an upper-middle-class family, Van Gogh drew as a child and was serious, quiet and thoughtful, but showed signs of mental instability. As a young man, he worked as an art dealer, often travelling, but became depressed afte

<div class="alert alert-info">
<b>Instruction:</b> Pick 5 painters from our test set. Put their bios in the <a href="https://huggingface.co/tasks/sentence-similarity">Sentence Similarity demo</a> and write down the values. Then add your interpretation of the values. Are they high or low? Why might that be? Fill in the table below:
</div>

| painter_name  | similarity_score | Interpretation                                          |
|---------------|-----------------:|---------------------------------------------------------|
| SOURCE NAME |               -- | The first painter is the source and does not get a score |
| PAINTER NAME   |         Score @@ | Interpretation here @@                                  |
| PAINTER NAME   |         Score @@ | Interpretation here @@                                  |
| PAINTER NAME   |         Score @@ | Interpretation here @@                                  |
| PAINTER NAME   |         Score @@ | Interpretation here @@                                  |

This Sentence Similarity demo is quite cool. It takes each summary and converts it into an **embedding**, a numerical vector representation of the text that does a good job of capturing the semantics of the text. This is the part connected to machine learning. In the demo, the pre-trained model `all-MiniLM-L6-v2` is used to compute the embeddings. We'll work with this same model below.

Once all the embeddings are computed, the rest is math. The demo takes the source embedding (whichever artist you entered first) and compares it with each of the other embeddings, one pair at a time. For each pair, say *source_painter* and *painter_1*, it produces a similarity score. For texts like these the score will usually fall between 0 and 1, where values near 0 mean little similarity and 1 means the embeddings point in exactly the same direction; cosine similarity itself can range from -1 to 1. There are many ways to compute similarity, and a popular one is cosine similarity. There is some info on the demo page linked above, reproduced here:
> The similarity of the embeddings is evaluated mainly on cosine similarity. It is calculated as the cosine of the angle between two vectors. It is particularly useful when your texts are not the same length.
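
To make the comparison concrete, here is a minimal sketch of cosine similarity in plain Python. The three-component vectors are made up and only stand in for real embeddings, and `cosine_similarity` is a helper written for this illustration; it is not the demo's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-component vectors standing in for real embeddings.
source = [1.0, 2.0, 0.0]
others = {
    "same_direction": [2.0, 4.0, 0.0],  # a scaled copy of the source
    "orthogonal": [0.0, 0.0, 3.0],      # shares no component with the source
    "opposite": [-1.0, -2.0, 0.0],      # points the other way
}

# Compare the source with every other vector, one pair at a time.
scores = {name: cosine_similarity(source, vec) for name, vec in others.items()}

# List the most similar vector first.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Because only the angle matters, the scaled copy scores the same as an identical vector would. This is why the measure works for texts of different lengths.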

---

## 4.3 - Visualizing the `select_painter_bios`

### Create embeddings
The first step to being able to cluster and visualize the painters is to compute the embeddings. We will store them as an extra column in our `select_painter_bios` dataframe.

In [56]:
from sentence_transformers import SentenceTransformer

# Load the pre-trained model
model = SentenceTransformer("all-MiniLM-L6-v2")

select_painter_bios["embeddings"] = select_painter_bios["summary"].apply(
    lambda x: model.encode(x).tolist()
)
select_painter_bios



Unnamed: 0,painter_name,summary,embeddings
116,Albrecht Dürer,Albrecht Dürer (; German: [ˈʔalbʁɛçt ˈdyːʁɐ]; ...,"[-0.056997984647750854, 0.024535084143280983, ..."
255,Andy Warhol,Andy Warhol (; born Andrew Warhola Jr.; August...,"[0.0025709313340485096, -0.08942723274230957, ..."
1559,Hokusai,"Katsushika Hokusai (葛飾 北斎, c. 31 October 1760 ...","[-0.024148855358362198, 0.04442296922206879, -..."
1936,Joaquín Sorolla,Joaquín Sorolla y Bastida (Valencian: Joaquim ...,"[0.04648873955011368, 0.08807677775621414, -0...."
1975,Johannes Vermeer,"Johannes Vermeer (, Dutch: [vərˈmeːr], see bel...","[0.0292351171374321, 0.08469347655773163, -0.0..."
2375,Leonardo da Vinci,Leonardo di ser Piero da Vinci (15 April 1452 ...,"[-0.049784105271101, -0.0011410464067012072, 0..."
2685,Michelangelo,Michelangelo di Lodovico Buonarroti Simoni (It...,"[0.01565895415842533, 0.09450660645961761, 0.0..."
2874,Pablo Picasso,Pablo Ruiz Picasso (25 October 1881 – 8 April ...,"[-0.013010181486606598, 0.016082225367426872, ..."
3062,Raphael,Raffaello Sanzio da Urbino (Italian: [raffaˈɛl...,"[-0.04124732315540314, -0.0005537216202355921,..."
3263,Salvador Dalí,Salvador Domingo Felipe Jacinto Dalí i Domènec...,"[0.0810016617178917, -0.022114435210824013, -0..."


<div class="alert alert-info">
<b>Instruction:</b> Save the dataframe with the embeddings as <em>select_painter_embeddings.csv</em>
</div>

In [57]:
# YOUR CODE HERE
# raise NotImplementedError()
select_painter_bios.to_csv("data/select_painter_embeddings.csv", index=False)
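
One thing to keep in mind: CSV files store only text, so a column of Python lists is written out as strings and comes back as strings when the file is read again. The sketch below shows the round trip using only the standard library and a made-up one-row table. It assumes the embeddings are plain lists of floats, which can be parsed with `json.loads`.

```python
import csv
import io
import json

# Made-up miniature of the dataframe: one name and a very short "embedding".
rows = [{"painter_name": "Painter A", "embeddings": [0.1, -0.2, 0.3]}]

# Writing: the list is stored in its text form, since CSV cells can only hold text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["painter_name", "embeddings"])
writer.writeheader()
for row in rows:
    writer.writerow({**row, "embeddings": json.dumps(row["embeddings"])})

# Reading: every cell comes back as a string and has to be parsed again.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_cell = loaded[0]["embeddings"]
restored = json.loads(raw_cell)

print(type(raw_cell).__name__, type(restored).__name__, restored)
```

If you reload a saved embeddings file later, a similar parsing step will likely be needed before the vectors can be passed to other tools.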

We have created our embeddings using one specific model, `all-MiniLM-L6-v2`. It is one of many models we can use. See a full list here: [https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers](https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers).

<div class="alert alert-info">
<b>Instruction:</b> Pick a model from the link above and create a new set of embeddings. Name that new column <em>embeddings2</em>. Fill in the table below, and save the data for that run as well.
</div>

| What model did you pick? | How long are the vectors? (read the description of the model) |
|--------------------------|---------------------------------------------------------------|
| all-MiniLM-L6-v2 @@REPLACE WITH YOUR CHOICE@@ | 384 @@REPLACE WITH THE VALUE RELATED TO YOUR CHOICE@@ |

In [79]:
# YOUR CODE HERE
# raise NotImplementedError()

# Load the pre-trained model
model = SentenceTransformer(
    "sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking"
)

select_painter_bios["embeddings2"] = select_painter_bios["summary"].apply(
    lambda x: model.encode(x).tolist()
)

select_painter_bios.to_csv("data/select_painter_embeddings2.csv", index=False)

select_painter_bios



Unnamed: 0,painter_name,summary,embeddings,embeddings2
116,Albrecht Dürer,Albrecht Dürer (; German: [ˈʔalbʁɛçt ˈdyːʁɐ]; ...,"[-0.056997984647750854, 0.024535084143280983, ...","[-0.028985479846596718, 0.30463314056396484, 0..."
255,Andy Warhol,Andy Warhol (; born Andrew Warhola Jr.; August...,"[0.0025709313340485096, -0.08942723274230957, ...","[0.21260204911231995, 0.24627937376499176, 0.0..."
1559,Hokusai,"Katsushika Hokusai (葛飾 北斎, c. 31 October 1760 ...","[-0.024148855358362198, 0.04442296922206879, -...","[-0.041256651282310486, 0.08839588612318039, 0..."
1936,Joaquín Sorolla,Joaquín Sorolla y Bastida (Valencian: Joaquim ...,"[0.04648873955011368, 0.08807677775621414, -0....","[-0.0851181149482727, 0.18426494300365448, 0.4..."
1975,Johannes Vermeer,"Johannes Vermeer (, Dutch: [vərˈmeːr], see bel...","[0.0292351171374321, 0.08469347655773163, -0.0...","[-0.021375127136707306, 0.16360527276992798, 0..."
2375,Leonardo da Vinci,Leonardo di ser Piero da Vinci (15 April 1452 ...,"[-0.049784105271101, -0.0011410464067012072, 0...","[0.13558390736579895, 0.034301578998565674, 0...."
2685,Michelangelo,Michelangelo di Lodovico Buonarroti Simoni (It...,"[0.01565895415842533, 0.09450660645961761, 0.0...","[0.1159994974732399, 0.23528455197811127, 0.54..."
2874,Pablo Picasso,Pablo Ruiz Picasso (25 October 1881 – 8 April ...,"[-0.013010181486606598, 0.016082225367426872, ...","[-0.10371260344982147, 0.4012528657913208, 0.2..."
3062,Raphael,Raffaello Sanzio da Urbino (Italian: [raffaˈɛl...,"[-0.04124732315540314, -0.0005537216202355921,...","[0.1412470042705536, 0.26519975066185, 0.48621..."
3263,Salvador Dalí,Salvador Domingo Felipe Jacinto Dalí i Domènec...,"[0.0810016617178917, -0.022114435210824013, -0...","[-0.04626660421490669, 0.2813034653663635, 0.3..."


### Reducing dimensions

Now that we have at least one set of embeddings, we can work to visualize them. This is the embedding using the first model for Vincent van Gogh:

In [80]:
van_gogh = select_painter_bios[select_painter_bios["painter_name"] == "Vincent van Gogh"]["embeddings"].values[0]

van_gogh

[0.08493974059820175,
 0.03074062615633011,
 0.01686738058924675,
 0.016839053481817245,
 0.05989311635494232,
 0.03570970147848129,
 0.03619306907057762,
 0.020847732201218605,
 -0.05887269973754883,
 -0.05923779308795929,
 -0.03598632290959358,
 -0.024541286751627922,
 0.015377012081444263,
 -0.00451229652389884,
 0.02243201620876789,
 0.06546244770288467,
 -0.011373251676559448,
 0.009499303065240383,
 -0.022144265472888947,
 0.01504893135279417,
 -0.06678536534309387,
 0.005186570808291435,
 -0.006639833562076092,
 -0.1190100833773613,
 0.016290709376335144,
 0.013186655938625336,
 0.037642695009708405,
 -0.11208225786685944,
 0.01721331477165222,
 0.01671992801129818,
 0.007216630503535271,
 0.03301443159580231,
 -0.07266112416982651,
 -0.016496745869517326,
 -0.020407019183039665,
 -0.031391970813274384,
 -0.035718824714422226,
 0.09690754860639572,
 -0.02765747159719467,
 0.048770707100629807,
 -0.014674311503767967,
 -0.02429278753697872,
 -0.11335866153240204,
 -0.024795353412

<div class="alert alert-info">
<b>Instruction:</b> How long is this vector?
</div>

In [81]:
# YOUR CODE HERE
# raise NotImplementedError()
len(van_gogh)

384

This embedding has 384 components. It would be very difficult to visualize all 384 dimensions of this vector directly in a way that is interpretable. We are better off if we can somehow get these 384 dimensions down to 2 or 3 (using 1 dimension might be too simplistic). This process of mapping a large number of dimensions onto fewer dimensions is known as dimensionality reduction, or projection.
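
As a rough illustration of what reducing dimensions means, here is a sketch that uses a simpler, linear method: PCA, computed with NumPy's SVD. This is not the technique used in the rest of this notebook, and the data is random numbers standing in for 13 real embeddings. It only shows how 384 columns become 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for 13 embeddings with 384 components each (not real data).
fake_embeddings = rng.normal(size=(13, 384))

# Center the data, then use the SVD to find the directions of largest variance.
centered = fake_embeddings - fake_embeddings.mean(axis=0)
_, _, components = np.linalg.svd(centered, full_matrices=False)

# Keep the first two directions and project every vector onto them.
projected_2d = centered @ components[:2].T

print(fake_embeddings.shape, "->", projected_2d.shape)  # (13, 384) -> (13, 2)
```

Each row is still one painter, but it is now described by two numbers that can be used as x and y coordinates in a plot.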

The technique that we will use is called [UMAP](https://umap-learn.readthedocs.io/en/latest/), or Uniform Manifold Approximation and Projection for Dimension Reduction. There are others, like SNE and t-SNE, that are worth looking into.

In [106]:
import umap

umap_model = umap.UMAP(n_components=2, n_neighbors=5, min_dist=0.3, metric="cosine")
embeddings = select_painter_bios["embeddings"].tolist()
embedded_data_2d = umap_model.fit_transform(embeddings)
embedded_data_2d

array([[-0.18798459,  6.4335604 ],
       [ 3.5670145 ,  4.1177673 ],
       [ 1.606992  ,  6.2584844 ],
       [ 1.969826  ,  3.0119884 ],
       [ 4.351476  ,  4.0956063 ],
       [ 0.7943661 ,  6.1044984 ],
       [ 1.8906181 ,  5.45726   ],
       [ 2.7382073 ,  3.7811651 ],
       [ 0.01221535,  5.3592668 ],
       [ 3.037905  ,  2.837429  ],
       [ 0.9158178 ,  4.9423156 ],
       [-0.7844418 ,  5.5186667 ],
       [ 4.0737176 ,  3.232379  ]], dtype=float32)

What we have done is use the UMAP technique to project all 384 dimensions of the original embedding into 2 dimensions that we can now visualize.

Each of the parameters in `umap.UMAP()` can affect our output:
* `n_components`: This parameter controls the dimensionality of the reduction. We set it to 2 because we want to end up with 2 components in the end (that we can visualize).
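* `n_neighbors`: This parameter tweaks how UMAP balances local vs global patterns. Small values focus on very local structure, while larger values take more of the overall picture into account. Play around with this if your visualization later looks off.
* `min_dist`: This parameter controls how tightly points can be packed together in the output. Smaller values give denser clumps; larger values spread the points out more.
* `metric`: This parameter controls how the distance between the input vectors is measured. We use `"cosine"` to stay in line with the cosine similarity discussed earlier.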

You can read more about these parameters, and see some visuals of how they affect the output at the UMAP website [here](https://umap-learn.readthedocs.io/en/latest/parameters.html).

Let's add those dimensions to our dataframe. We'll name these new columns `umap1_x` and `umap1_y` because we're using the first set of embeddings that were created using the `all-MiniLM-L6-v2` model.

In [107]:
select_painter_bios["umap1_x"] = embedded_data_2d[:, 0]
select_painter_bios["umap1_y"] = embedded_data_2d[:, 1]

### Scatterplot visualization

Now that we have reduced the 384-component embeddings to 2 dimensions, let's visualize them using a scatterplot.

In [110]:
import plotly.express as px

# Create a scatter plot with Plotly
fig = px.scatter(select_painter_bios, x="umap1_x", y="umap1_y", hover_data=["painter_name"], width=800, height=800)

# Show the plot
fig.show()


<div class="alert alert-info">
<b>Instruction:</b> How do you interpret your figure?
</div>

@@ YOUR ANSWER HERE@@

---

## 4.5 - Visualizing the embeddings for the model you chose

Now that you have visualized the first embedding using the `all-MiniLM-L6-v2` model, do the same for the model you chose. Feel free to reuse the code above, but be sure to write comments and notes explaining your process.

In [None]:
# YOUR CODE HERE
# raise NotImplementedError()

<div class="alert alert-info">
<b>Instruction:</b> Use the space below to interpret the final visualization of your embedding. How does it compare with the previous visual?
</div>

**@@YOUR ANSWER HERE@@**

---

## 4.6 - Visualizing the entire dataset

Taking all the tools from above, visualize the entire dataset of artists. The code may take longer to run, but the process is still the same.

1. Compute embeddings using one of the models from [this page](https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers).
1. Reduce dimensions using UMAP.
1. Plot the result.



In [111]:
# YOUR CODE HERE
# raise NotImplementedError()