<a href="https://colab.research.google.com/github/karthik111/AnomalyDetectionCVPR2018/blob/master/notebooks/Visualizing_Text_Embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we build intuition for text embeddings, look at the use cases they are good for, and see how to customize them via finetuning.

Read the accompanying [blog post here](https://txt.cohere.ai/text-embeddings/).

In [24]:
! pip install cohere altair -q

In [25]:
import cohere
import pandas as pd
import numpy as np
import altair as alt

api_key = '<YOUR_API_KEY>' # Paste your API key here. Remember not to share it publicly
co = cohere.Client(api_key)

In [26]:
# Load the dataset to a dataframe
df_orig = pd.read_csv('https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/atis_intents_train.csv',names=['intent','query'])

# Take a small sample for illustration purposes
sample_classes = ['atis_airfare', 'atis_airline', 'atis_ground_service']
df = df_orig.sample(frac=0.12, random_state=30)
df = df[df.intent.isin(sample_classes)]
df_orig = df_orig.drop(df.index)
df.reset_index(drop=True,inplace=True)

# Remove unnecessary column
intents = df['intent'] #save for a later need
df.drop(columns=['intent'], inplace=True)
df.head()

Unnamed: 0,query
0,which airlines fly from boston to washington ...
1,show me the airlines that fly between toronto...
2,show me round trip first class tickets from n...
3,i'd like the lowest fare from denver to pitts...
4,show me a list of ground transportation at bo...


# 1. Intuition

When you hear about large language models (LLMs), probably the first thing that comes to mind is their text generation capability, such as writing an essay or creating marketing copy.

But another thing you can get is text representation: a set of numbers that represents what the text means and captures its semantics. These numbers are called text embeddings.

![Comparing text generation and text representation](https://github.com/cohere-ai/notebooks/raw/main/notebooks/images/vis-embeds/1-text-gen-rep.png)

## 1.1 - Turn text into embeddings

In [27]:
# Get text embeddings
def get_embeddings(texts, model='embed-english-v3.0', input_type="search_document"):
  output = co.embed(
                model=model,
                input_type=input_type,
                texts=texts)
  return output.embeddings

In [28]:
# Embed the dataset
df['query_embeds'] = get_embeddings(df['query'].tolist())
df.head()

Unnamed: 0,query,query_embeds
0,which airlines fly from boston to washington ...,"[0.026550293, 0.012084961, -0.00881958, 0.0113..."
1,show me the airlines that fly between toronto...,"[0.013084412, 0.01776123, -0.014343262, -0.003..."
2,show me round trip first class tickets from n...,"[0.02053833, -0.038482666, 0.061523438, 0.0099..."
3,i'd like the lowest fare from denver to pitts...,"[0.0016889572, 0.015411377, -0.029052734, 0.03..."
4,show me a list of ground transportation at bo...,"[0.03793335, -0.008010864, -0.002319336, -0.01..."


## 1.2 - Visualize embeddings on a heatmap

In [29]:
# Reduce dimensionality using PCA
from sklearn.decomposition import PCA

# Function to return the principal components
def get_pc(arr,n):
  pca = PCA(n_components=n)
  embeds_transform = pca.fit_transform(arr)
  return embeds_transform

In [30]:
# Reduce embeddings to 10 principal components to aid visualization
embeds = np.array(df['query_embeds'].tolist())
embeds_pc = get_pc(embeds,10)

In [31]:
# Set sample size to visualize
sample = 9

# Reshape the data for visualization purposes
source = pd.DataFrame(embeds_pc)[:sample]
source = pd.concat([source,df['query']], axis=1)
source = source.melt(id_vars=['query'])

# Configure the plot
chart = alt.Chart(source).mark_rect().encode(
    x=alt.X('variable:N', title="Embedding"),
    y=alt.Y('query:N', title='',axis=alt.Axis(labelLimit=500)),
    color=alt.Color('value:Q', title="Value", scale=alt.Scale(
                range=["#917EF3", "#000000"]))
)

result = chart.configure(background='#ffffff'
        ).properties(
        width=700,
        height=400,
        title='Embeddings with 10 dimensions'
       ).configure_axis(
      labelFontSize=15,
      titleFontSize=12)

# Show the plot
result

Notice the 3 inquiries about ground transportation in Boston - their embedding patterns are very similar to each other, and at the same time distinct from the rest.
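Reducing to 10 components discards some information, and one way to see how much is kept is PCA's explained variance ratio. The sketch below uses random stand-in data (an assumption, so that it runs without an API key); in this notebook you would pass `embeds` instead of `fake_embeds`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the real embeddings: 50 vectors of 64 dimensions (assumed shape)
rng = np.random.default_rng(0)
fake_embeds = rng.normal(size=(50, 64))

pca = PCA(n_components=10)
reduced = pca.fit_transform(fake_embeds)

# Fraction of the total variance retained by the 10 components
retained = float(pca.explained_variance_ratio_.sum())
print(reduced.shape, f"{retained:.2%} of variance retained")
```

A low retained fraction would suggest reading the heatmap as a rough visual aid rather than a faithful picture of the full embeddings.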

## 1.3 - Visualize embeddings on a 2D plot

In [32]:
# Function to generate the 2D plot
def generate_chart(df,xcol,ycol,lbl='on',color='basic',title=''):
  chart = alt.Chart(df).mark_circle(size=500).encode(
    x=
    alt.X(xcol,
        scale=alt.Scale(zero=False),
        axis=alt.Axis(labels=False, ticks=False, domain=False)
    ),

    y=
    alt.Y(ycol,
        scale=alt.Scale(zero=False),
        axis=alt.Axis(labels=False, ticks=False, domain=False)
    ),

    color= alt.value('#333293') if color == 'basic' else color,
    tooltip=['query']
    )

  if lbl == 'on':
    text = chart.mark_text(align='left', baseline='middle',dx=15, size=13,color='black').encode(text='query', color= alt.value('black'))
  else:
    text = chart.mark_text(align='left', baseline='middle',dx=10).encode()

  result = (chart + text).configure(background="#FDF7F0"
        ).properties(
        width=800,
        height=500,
        title=title
       ).configure_legend(
  orient='bottom', titleFontSize=18,labelFontSize=18)

  return result

In [35]:
# Reduce embeddings to 2 principal components to aid visualization
embeds_pc2 = get_pc(embeds,2)

# Add the principal components to dataframe
df_pc2 = pd.concat([df, pd.DataFrame(embeds_pc2)], axis=1)

# Plot the 2D embeddings on a chart
df_pc2.columns = df_pc2.columns.astype(str)
generate_chart(df_pc2.iloc[:sample],'0','1',title='2D Embeddings')

In [36]:
df_pc2.head()

Unnamed: 0,query,query_embeds,0,1
0,which airlines fly from boston to washington ...,"[0.026550293, 0.012084961, -0.00881958, 0.0113...",-0.097359,0.293267
1,show me the airlines that fly between toronto...,"[0.013084412, 0.01776123, -0.014343262, -0.003...",0.015971,0.260222
2,show me round trip first class tickets from n...,"[0.02053833, -0.038482666, 0.061523438, 0.0099...",-0.202197,-0.118116
3,i'd like the lowest fare from denver to pitts...,"[0.0016889572, 0.015411377, -0.029052734, 0.03...",-0.086202,-0.209349
4,show me a list of ground transportation at bo...,"[0.03793335, -0.008010864, -0.002319336, -0.01...",0.298315,0.231221


Here texts of similar meaning are located close together. We see inquiries about tickets on the left, inquiries about airlines somewhere around the middle, and inquiries about ground transportation on the top right.

# 2. Use Cases

## 2.1 - Semantic Search




Semantic search, or similarity search, surfaces results based on the context or semantic meaning of a query instead of purely keyword matching.
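Similarity between two embeddings is commonly measured with cosine similarity, the cosine of the angle between the two vectors. As a minimal sketch, here it is computed by hand with NumPy on toy 3-dimensional vectors (made up for illustration, not real embeddings); the helper name `cosine_sim` is hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, made up for illustration
v_fare = [1.0, 0.0, 1.0]
v_ticket = [2.0, 0.0, 2.0]   # same direction as v_fare, different length
v_ground = [0.0, 1.0, 0.0]   # orthogonal to v_fare

print(round(cosine_sim(v_fare, v_ticket), 6))  # same direction
print(round(cosine_sim(v_fare, v_ground), 6))  # orthogonal
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.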

In [37]:
# Calculate cosine similarity between the search query and existing queries

from sklearn.metrics.pairwise import cosine_similarity

def get_similarity(target,candidates):
  # Turn list into array
  candidates = np.array(candidates)
  target = np.expand_dims(np.array(target),axis=0)

  # Calculate cosine similarity
  sim = cosine_similarity(target,candidates)
  sim = np.squeeze(sim).tolist()
  sort_index = np.argsort(sim)[::-1]
  sort_score = [sim[i] for i in sort_index]
  similarity_scores = zip(sort_index,sort_score)

  # Return similarity scores
  return similarity_scores


In [38]:
# Add new query
new_query = "show business fares"

# Get embeddings of the new query
new_query_embeds = get_embeddings([new_query], input_type="search_query")[0]

In [39]:
# Get the similarity between the search query and existing queries
similarity = get_similarity(new_query_embeds,embeds[:sample])

# View the existing queries, most similar first
print('Query:')
print(new_query,'\n')

print('Similar queries:')
for idx,sim in similarity:
  print(f'Similarity: {sim:.2f};',df.iloc[idx]['query'])

Query:
show business fares 

Similar queries:
Similarity: 0.28;  show me round trip first class tickets from new york to miami
Similarity: 0.24;  show me boston ground transportation
Similarity: 0.23;  i'd like the lowest fare from denver to pittsburgh
Similarity: 0.21;  which airlines fly from boston to washington dc via other cities
Similarity: 0.21;  i would like your rates between atlanta and boston on september third
Similarity: 0.20;  what ground transportation is available in boston
Similarity: 0.20;  of all airlines which airline has the most arrivals in atlanta
Similarity: 0.20;  show me the airlines that fly between toronto and denver
Similarity: 0.19;  show me a list of ground transportation at boston airport


The top-ranked result is an inquiry about first-class tickets, which is the most relevant of the available options. Notice that it doesn’t contain the keyword “business”, nor does the search query contain the keyword “class”. Their meanings are nonetheless the most similar of all the candidates, and that similarity is captured in their embeddings.
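In a larger collection you would usually keep only the best few matches rather than print every candidate. The following is a minimal sketch of such a top-k lookup, assuming plain NumPy arrays; the function name `top_k` and the toy vectors are hypothetical and not part of this notebook.

```python
import numpy as np

def top_k(query_vec, candidate_vecs, k=5):
    """Return (index, score) pairs for the k candidates most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(candidate_vecs, dtype=float)
    # Normalize so that a dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q
    # Sort descending by score and keep the first k indices
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy 2-dimensional vectors, made up for illustration
candidates = [[1, 0], [0, 1], [1, 1], [-1, 0]]
results = top_k([1, 0.1], candidates, k=2)
print(results)
```

With the notebook's data, a call such as `top_k(new_query_embeds, embeds, k=5)` would play the same role as `get_similarity`, limited to five results.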

### Plot the new query and existing queries on a chart

In [42]:
# Create new dataframe and append new query
df_sem = df.copy()
df_sem.loc[len(df_sem.index)] = [new_query, new_query_embeds]

# Reduce embeddings dimension to 2
embeds_sem = np.array(df_sem['query_embeds'].tolist())
embeds_sem_pc2 = get_pc(embeds_sem,2)

# Add the principal components to dataframe
df_sem_pc2 = pd.concat([df_sem, pd.DataFrame(embeds_sem_pc2)], axis=1)

In [43]:
# Create column for representing chart legend
df_sem_pc2['Source'] = 'Existing'
df_sem_pc2.at[len(df_sem_pc2)-1, 'Source'] = "New"

# Plot on a chart
df_sem_pc2.columns = df_sem_pc2.columns.astype(str)
selection = list(range(sample)) + [-1]
generate_chart(df_sem_pc2.iloc[selection],'0','1',color='Source',title='Semantic Search')

On the plot, we see that the new query is located closest to the inquiry about first-class tickets.

## 2.2 - Clustering


Clustering is a process of grouping similar documents into clusters. It is used to organize a large number of documents into a smaller number of groups and lets us discover emerging patterns in the documents.

In [44]:
from sklearn.cluster import KMeans

# Embed the text for clustering
df['clustering_embeds'] = get_embeddings(df['query'].tolist(), input_type="clustering")
embeds = np.array(df['clustering_embeds'].tolist())

# Pick the number of clusters
df_clust = df_pc2.copy()
n_clusters=2

# Cluster the embeddings
kmeans_model = KMeans(n_clusters=n_clusters, random_state=0)
classes = kmeans_model.fit_predict(embeds).tolist()
df_clust['cluster'] = (list(map(str,classes)))

# Plot on a chart
df_clust.columns = df_clust.columns.astype(str)
generate_chart(df_clust.iloc[:sample],'0','1',lbl='on',color='cluster',title='Clustering with 2 Clusters')



When asked to group the documents into 2 clusters, the algorithm appears to do well: it generates one cluster related to airline information and one cluster related to ground service information.
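To go beyond judging the clusters by eye, cluster assignments can be compared against known labels with the adjusted Rand index from scikit-learn, which ignores how the clusters happen to be numbered. The labels below are hypothetical, made up for illustration; in this notebook the saved `intents` and the `classes` list from KMeans would be the natural inputs.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels for six queries, made up for illustration
true_intents = ["airline", "airline", "ground", "ground", "airline", "ground"]
cluster_ids = [1, 1, 0, 0, 1, 0]   # cluster numbering is arbitrary

# 1.0 means the grouping matches the labels exactly; values near 0 mean chance level
ari = adjusted_rand_score(true_intents, cluster_ids)
print(ari)
```

Since this notebook's sample contains three intents but only two clusters, a perfect score should not be expected on the real data.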

## 2.3 - Classification

Clustering is an unsupervised learning task, where we don’t know the classes or how many of them there are. Classification is a supervised learning task, where the classes are known and we train a model on labeled examples.

In [45]:
# Bring back the 'intent' column so we can build the classifier
df_class = df_pc2.copy()
df_class['intent'] = intents

# Use the remaining dataset as training data
df_test = df_class[:sample]
df_train = df_class[sample:]

# Reset the index of the slices
df_test = df_test.reset_index(drop=True)
df_train = df_train.reset_index(drop=True)

df_test = df_test.drop('query_embeds', axis=1)
df_train = df_train.drop('query_embeds', axis=1)

In [46]:
# Embed the training text for classification
df_train['classification_embeds'] = get_embeddings(df_train['query'].tolist(), input_type="classification")
# embeds = np.array(df_train['classification_embeds'].tolist())

In [47]:
# Train the classifier with Support Vector Machine (SVM) algorithm

# import SVM classifier code
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


# Initialize the classifier
svm_classifier = make_pipeline(StandardScaler(), SVC())

# Prepare the training features and label
features = df_train['classification_embeds'].tolist()
label = df_train['intent']

# Fit the support vector machine
svm_classifier.fit(features, label)

In [48]:
# Predict with test data

# Prepare the test inputs
# df_test = df_test.copy()
df_test['classification_embeds'] = get_embeddings(df_test['query'].tolist(), input_type="classification")
inputs = df_test['classification_embeds'].tolist()

# Predict the labels
df_test['intent_pred'] = svm_classifier.predict(inputs)

# Compute the score
score = svm_classifier.score(inputs, df_test['intent'])
print(f"Prediction accuracy is {100*score}%")

Prediction accuracy is 100.0%


In [49]:
# Plot the predicted classes
df_test.columns = df_test.columns.astype(str)
generate_chart(df_test,'0','1',lbl='off',color='intent_pred',title='Classification - Prediction')

In [50]:
# Plot the actual classes
generate_chart(df_test,'0','1',lbl='off',color='intent',title='Classification - Actual')

The two plots above show that all predictions (each class is represented by one color) match the actual classes.
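With only a handful of test queries, a 100% score is weak evidence on its own. One common way to get a steadier estimate is k-fold cross-validation. The sketch below applies it to the same kind of pipeline (`StandardScaler` followed by `SVC`) on synthetic data, which is an assumption standing in for the embeddings and intents.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for (embeddings, intents): 150 samples, 3 classes
X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC())

# Accuracy on each of 5 held-out folds
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean(), scores.std())
```

The spread across folds gives a sense of how much the accuracy depends on which examples end up in the test split.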

# 3. Finetuning

In practical applications, you will likely need to customize the model to your task, and in particular, the kind of data you are dealing with.

This is where finetuning comes in. A baseline model already comes pre-trained with a huge amount of text data. But finetuning can further build on that by taking in and adapting to your own data.

The result is a custom model that produces outputs that are more attuned to the task you have at hand.

In [51]:
# The finetuned model ID
atis_ft_v1 = "ccc2a8dd-bac5-4482-8d5e-ddf19e847823-ft" # Replace with your own model ID

In [52]:
# Embed the dataset - use the finetuned model this time
df_ft = df.copy()
df_ft['intent'] = intents

df_ft['query_embeds'] = get_embeddings(df_ft['query'].tolist(), model=atis_ft_v1, input_type="classification")

# Reduce embeddings to 2 dimensions
embeds_ft = np.array(df_ft['query_embeds'].tolist())
embeds_ft_pc2 = get_pc(embeds_ft,2)

# Plot the 2D embeddings from a finetuned model
df_ft_pc2 = pd.concat([df_ft, pd.DataFrame(embeds_ft_pc2)], axis=1)
df_ft_pc2.columns = df_ft_pc2.columns.astype(str)
generate_chart(df_ft_pc2.iloc[:sample],'0','1',lbl='off',color='intent', title='Finetuned model')

CohereAPIError: ignored

In [None]:
# Plot the 2D embeddings from a non-finetuned model
generate_chart(df_test,'0','1',lbl='off',color='intent',title='Non-finetuned model')

Referring to the two plots above:
- With the baseline (non-finetuned) model, which is what we’ve been using so far, we already get a good separation between classes, which shows that it can perform well on this task.
- With the finetuned model, the separation becomes even more apparent. Similar data points are pushed closer together and further apart from the rest. This indicates that the model has adapted to the additional data it received during finetuning, and is therefore more likely to perform better on this task.
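One way to put a number on "separation" is the silhouette score computed against the known class labels: the higher the score, the tighter and better separated the classes. The sketch below compares two synthetic point sets (stand-ins chosen for illustration, not real model output) that differ only in how spread out each class is.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three fixed class centers, chosen arbitrarily for illustration
centers = [[0, 0], [5, 5], [-5, 5]]

# Same classes, but one set is spread out and the other is tightly grouped
X_loose, y_loose = make_blobs(n_samples=90, centers=centers,
                              cluster_std=3.0, random_state=0)
X_tight, y_tight = make_blobs(n_samples=90, centers=centers,
                              cluster_std=0.5, random_state=0)

score_loose = silhouette_score(X_loose, y_loose)
score_tight = silhouette_score(X_tight, y_tight)
print(score_loose, score_tight)
```

Applied to this notebook, the same call on the baseline and finetuned embeddings, each with the `intent` labels, would give a comparable pair of scores.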

In [1]:
! pip install "unstructured[ppt]"

In [2]:
pip install "unstructured[gcs]"

In [1]:
from unstructured.partition.auto import partition
elements = partition(filename="Arts-Science-Viewbook-2023-24.pdf")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [6]:
pip install "unstructured[pdf]"

In [4]:
from google.colab import files
uploaded = files.upload()

Saving Arts-Science-Viewbook-2023-24.pdf to Arts-Science-Viewbook-2023-24.pdf


In [2]:
elements

[<unstructured.documents.elements.NarrativeText at 0x7b38d8c56d70>,
 <unstructured.documents.elements.Title at 0x7b38d8c56b60>,
 <unstructured.documents.elements.Text at 0x7b38d8c57850>,
 <unstructured.documents.elements.NarrativeText at 0x7b38d9582260>,
 <unstructured.documents.elements.Title at 0x7b38d8cd4880>,
 <unstructured.documents.elements.Title at 0x7b38d67e0070>,
 <unstructured.documents.elements.Title at 0x7b38d8cd7f40>,
 <unstructured.documents.elements.Text at 0x7b38d8cd4040>,
 <unstructured.documents.elements.Text at 0x7b38d8cd75b0>,
 <unstructured.documents.elements.Text at 0x7b38d8cd7a90>,
 <unstructured.documents.elements.Text at 0x7b38d8cd4160>,
 <unstructured.documents.elements.Title at 0x7b38d8cd76a0>,
 <unstructured.documents.elements.Title at 0x7b38d8cd7a60>,
 <unstructured.documents.elements.Title at 0x7b38d8cd7ee0>,
 <unstructured.documents.elements.Title at 0x7b38d8cd78e0>,
 <unstructured.documents.elements.Title at 0x7b38d8cd40a0>,
 <unstructured.documents.elem

In [6]:
from unstructured.documents.elements import NarrativeText
from unstructured.partition.text_type import sentence_count

for element in elements:
    if isinstance(element, NarrativeText) and sentence_count(element.text) > 2:
        print(element)
        print("\n")

The day I picked U of T was when I did the campus tour. I immediately fell in love with the campus and the people. I felt very at home, and I knew that would be critical to enjoying my university experience.” Syeda conflict & justice, minor in creative expression & society


I can’t imagine my university experience without a college. I spend all of my time outside of class at my college, and that’s where I made all of my friends. U of T is a big place; having a college gives me my own little corner.”


When you join Arts & Science, you will be assigned to a college — a close- knit community with support services and social events to make university an unforgettable time. On your U of T application, you can indicate your college preferences, or you can select “no preference” — there’s no right way to choose. Each college is special and will support you throughout your time at U of T. Within each college you will find:


What Interests You? On your Arts & Science application, you will se

In [10]:
from unstructured.documents.elements import NarrativeText
from unstructured.partition.text_type import sentence_count

for element in elements:
    print(type(element))

<class 'unstructured.documents.elements.NarrativeText'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Text'>
<class 'unstructured.documents.elements.NarrativeText'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Text'>
<class 'unstructured.documents.elements.Text'>
<class 'unstructured.documents.elements.Text'>
<class 'unstructured.documents.elements.Text'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.documents.elements.Title'>
<class 'unstructured.document

In [9]:
type(element)

unstructured.documents.elements.Title

In [21]:
# Print each element's type name alongside its text
for element in elements:
    print(f"{type(element).__name__} --- {element}")
    print("\n")

NarrativeText --- There is so much here for you


Title --- Guide to Arts & Science St. George Campus


Text --- 2024


NarrativeText --- Arts & Science students are surrounded by opportunity from the moment they arrive on campus — intellectual, entrepreneurial, creative, global, professional and research experiences that set them in motion toward lifelong goals and extraordinary achievements.


Title --- Guide to Arts & Science 2024


Title --- The Best Education


Title --- TOP 10


Text --- 340+


Text --- #1


Text --- #1


Text --- 276K


Title --- university in Canada


Title --- in Canada for graduate employability


Title --- research organizations in the world


Title --- undergraduate programs


Title --- alumni in more than 190 countries


Title --- — Rankings, 2024


Title --- QS World University


Title --- — Times Higher Education Global University Employability Ranking, 2022


Title --- — Top 100 Global Innovators, 2023


NarrativeText --- e c n e i c S & s t r A f o y t