# <center>Critical AI</center>
<center>ENGL 54.41</center>
<center>Dartmouth College</center>
<center>Winter 2026</center>
<pre>Created: 08/23/2019; Revised: 01/02/2026</pre>

## Part I: Datatypes, Intro to Python

In [None]:
# This is a Jupyter code cell. This line is a comment;
# comments are not executed by the interpreter.

# Here we assign the variable 'title' the value 'Critical AI'.
# Python infers the type from the value, so 'title' becomes a string (str).
title = 'Critical AI'

In [None]:
# To see the type assigned, we can use the type() function:
type(title)

In [None]:
# To see or display the value of a String (especially a short one), we can use the print() function:
print(title)

In [None]:
# To learn what else you can do with strings, you can execute help(str) in this cell.
# We are going to move on quickly now to learn something about lists. A list is a 
# collection of items. 
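
As a minimal sketch of the list datatype mentioned above (the variable name `readings` and its contents are illustrative assumptions, not part of the course materials):

```python
# A list is an ordered, mutable collection; its items may have mixed types.
readings = ['Critical AI', 2026, 54.41]  # hypothetical example values

first_item = readings[0]      # indexing starts at 0
readings.append('Dartmouth')  # lists can grow in place
n_items = len(readings)       # number of items now in the list

print(type(readings), first_item, n_items)
```

Because a list can hold strings and numbers side by side, it is a general-purpose container rather than a numerical one.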

## Part II: Vectors, Matrices, and Tensors

In [None]:
# Python supports a very large number of libraries that can be loaded
# or imported as needed. We only import the libraries that we need in 
# order to reduce the memory requirements and (possibly) prevent 
# collisions in the namespace used by various functions.

import numpy as np
import torch

In [None]:
# PyTorch (torch) is a very popular library for building neural networks and
# for deep learning. https://pytorch.org/
#
# It is widely used in research and industry for the kinds of AI & ML technologies
# that we will be studying this year.
#
# It introduces a new datatype called a tensor. Tensors are similar to the
# arrays and matrices used by NumPy (numpy) but can also run on faster
# processing devices called GPUs (graphics processing units), which we
# hope to use later this term. They can also keep a record, some history,
# of the transformations that created them.
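
The "history" a tensor can carry may be easier to see in a small sketch. The example below assumes PyTorch is installed; the names `x` and `y` and the choice of operation are illustrative only.

```python
import torch

# requires_grad=True asks PyTorch to record operations applied to this tensor.
x = torch.tensor([4.3, 3.0, 1.1, 0.1], requires_grad=True)

# y is built from x, so it remembers how it was created.
y = (x * 3).sum()
print(y.grad_fn)   # the recorded operation that produced y

# Walking that history backward gives the gradient of y with respect to x.
y.backward()
print(x.grad)
```

Each element of `x` was multiplied by 3 before summing, so the gradient stored in `x.grad` is 3 for every element. This recorded history is what later allows a neural network to be trained.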

![PyTorch Tensor Types](../img/pytorch-tensor-types.png)

PyTorch Tensor Types from Eli Stevens et al., *Deep Learning with PyTorch* (Manning, 2020)

## Vectors and Vectorization

The sociologist Adrian Mackenzie writes in [*Machine Learners: Archaeology of a Data Practice*](https://mitpress.mit.edu/author/adrian-mackenzie-8915/) (MIT Press, 2017) of the function of vectorization as a remapping of space:

>"Machine learning locates data practice in an expanding epistemic space. The expansion
derives, I will suggest, from a specific operational diagram that maps data into a vector
space. It vectorizes data according to axes, coordinates, and scales. Machine learners, in
turn, inhabit a vectorized space, and their operations vectorize data...Often data are represented as a homogenous set of numbers or a continuous flowing stream. We need, however, to archaeologically examine some of the transformations that allow different shapes and densities of data, whether in the form of numbers,
words, or images, to become machine learnable. Data in their local complexes space
out in many different density shapes, depending on how the changes, signals, propensities,
and norms have been generated or configured."

In [None]:
# A vector is typically thought of as a one-dimensional sequence of values, not unlike a list.
#
# This variable is a list of floating point numbers. What's a floating point number? It's a numerical value
# that can carry a fractional part, unlike an integer (i.e., the int 4 vs. the float 4.3).
vec = [4.3, 3.0, 1.1, 0.1]

In [None]:
# If we try to perform an operation on the list (on every element of the list) we will
# most likely not get the result we want:
vec * 3
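
Why the result is unexpected: for a plain Python list, `*` means repetition, not arithmetic. A minimal sketch (the names `vec_list`, `repeated`, and `scaled` are illustrative):

```python
vec_list = [4.3, 3.0, 1.1, 0.1]

# List * int repeats the list: 12 items, with the values unchanged.
repeated = vec_list * 3

# To multiply each element without NumPy, we need an explicit loop.
scaled = [x * 3 for x in vec_list]

print(len(repeated), len(scaled))
```

The list comprehension works, but it spells out the loop by hand; treating the data as a vector lets the library apply the operation to every element at once.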

In [None]:
# Converting this list to an array (and treating it as a vector) enables us to apply a
# transformation to the entire vector at once:
vec = np.array(vec)
vec * 3

In [None]:
# We can do other fun stuff with this vector. For example, here are some basic 
# summary statistics:
vec.mean()

In [None]:
vec.min()

In [None]:
vec.max()

In [None]:
# Now display all these at once:
vec_mean = vec.mean()
vec_min = vec.min()
vec_max = vec.max()
print(f'mean: {vec_mean}, min: {vec_min}, max: {vec_max}')

In [None]:
# We are now going to use PyTorch tensors rather than NumPy arrays. This is because
# this datatype is especially good for the sort of work we are going to do in
# Critical AI.

vec1 = torch.tensor([4.3,3.0,1.1,0.1])
vec2 = torch.tensor([6.3,2.8,5.1,1.5])

In [None]:
# Let's say that these are representations of two kinds of flowers (because they are).
# The values, let's call them features, are measures of the length and width of two 
# types of flower appendages (sepal and petal). 
#
# How might we answer the question of how similar are these two flowers?
#
# One way might be to find the difference across all four feature dimensions. We can
# take the absolute value of that difference to get a sense of how similar these two
# samples are to each other:
torch.abs(vec1 - vec2)

In [None]:
# We can also combine these 1D vectors into a 2D (tensor) matrix:
matrix = torch.vstack([vec1,vec2])
matrix

In [None]:
# Tell us about this matrix: what is its shape? How many rows and columns do we have?
matrix.shape

In [None]:
# Display the mean values across all four feature dimensions of the matrix
# (dim=0 averages down the rows, giving one mean per column):
matrix.mean(dim=0)

In [None]:
# Display the standard deviation across all four feature dimensions of the matrix
# (PyTorch computes the sample standard deviation by default):
matrix.std(dim=0)

## A Better Way: The Distance Matrix

Reconceptualizing our data as features in a standardized space (via vectorization) allows us to measure distances between points, where each point is a multidimensional value.

In [None]:
# Euclidean distance is the length of the straight line between two points
# https://en.wikipedia.org/wiki/Euclidean_distance
from sklearn.metrics import euclidean_distances

# Cosine similarity is the cosine of the angle between two vectors
# https://en.wikipedia.org/wiki/Cosine_similarity
from sklearn.metrics.pairwise import cosine_similarity
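
To see what these two library functions compute, here is a sketch of both measures written out by hand with only the standard library. The helper names `euclidean` and `cosine` are illustrative and are not the scikit-learn API; the two vectors are the flower measurements used earlier.

```python
import math

a = [4.3, 3.0, 1.1, 0.1]
b = [6.3, 2.8, 5.1, 1.5]

def euclidean(u, v):
    # Square root of the summed squared differences, feature by feature.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine(u, v):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

print(euclidean(a, b), cosine(a, b))
```

Euclidean distance grows as points move apart and is 0 for identical points; cosine similarity is 1 for vectors pointing in the same direction and falls toward 0 as the angle between them widens.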

In [None]:
# This will look a bit funny, but here we are measuring the distance
# on a straight line between 4 and 10. We are seeing these distances
# as pairs. The first row displays the distance between 4 and 4
# and then 4 and 10. The second row displays the distance between
# 10 and 4 and then 10 and 10.
#
# This is known as the distance matrix. As we add values, we can compare
# distances among all the rows and columns. The top and bottom triangles
# mirror each other and are separated by the diagonal, which measures the
# distance between each item and itself and is therefore all zeros.
#
euclidean_distances([[4],[10]])
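
The structure of a distance matrix can be sketched by hand for one-dimensional points, where Euclidean distance is simply the absolute difference. The third value, 7, is a hypothetical addition to show how the matrix grows.

```python
points = [4, 10, 7]  # 7 is a hypothetical third value

# Entry [i][j] holds the distance between points[i] and points[j].
dist_matrix = [[abs(p - q) for q in points] for p in points]

for row in dist_matrix:
    print(row)
```

With three values the matrix is 3 x 3: the diagonal is all zeros and the upper triangle mirrors the lower one, so each pair of points appears twice.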

In [None]:
# Here is the distance matrix for our separate vectors:
euclidean_distances([vec1,vec2])

In [None]:
# Now processing the matrix composed of those two stacked vectors:
euclidean_distances(matrix)

In [None]:
# Observe the differences with cosine similarity:
cosine_similarity([vec1,vec2])

In [None]:
cosine_similarity(matrix)
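
One way to see how the two measures differ is to compare a vector with a scaled copy of itself. This is a minimal sketch using NumPy directly; the vector `v` is a hypothetical flower with the same proportions as `u` but twice the size.

```python
import numpy as np

u = np.array([4.3, 3.0, 1.1, 0.1])
v = 2 * u  # same direction as u, twice the magnitude (hypothetical)

# Euclidean distance sees the two points as far apart...
euclid = np.linalg.norm(u - v)

# ...while cosine similarity sees the same direction (an angle of zero).
cos_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(euclid, cos_sim)
```

Cosine similarity ignores magnitude and compares only the relative proportions of the features, whereas Euclidean distance is sensitive to overall size.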

In [None]:
# We'll create a larger matrix now. These are the
# first two samples from each of three different classes
# of Iris flowers (setosa, versicolor, virginica)
# from Fisher's Iris dataset.
iris_matrix = [[5.1,3.5,1.4,0.2],
               [4.9,3.0,1.4,0.2],
               [7.0,3.2,4.7,1.4],
               [6.4,3.2,4.5,1.5],
               [6.3,3.3,6.0,2.5],
               [5.8,2.7,5.1,1.9]]

In [None]:
dist = euclidean_distances(iris_matrix)
dist
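
A sketch of how such a matrix might be read, recomputing the pairwise distances with NumPy broadcasting rather than scikit-learn. The names `X`, `D`, `within_setosa`, and `setosa_to_others` are illustrative; rows 0 and 1 are the two setosa samples.

```python
import numpy as np

X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [6.4, 3.2, 4.5, 1.5],
              [6.3, 3.3, 6.0, 2.5],
              [5.8, 2.7, 5.1, 1.9]])

# Subtract every row from every other row, then reduce over the features.
diff = X[:, None, :] - X[None, :, :]   # shape (6, 6, 4)
D = np.sqrt((diff ** 2).sum(axis=-1))  # shape (6, 6)

within_setosa = D[0, 1]            # distance between the two setosa samples
setosa_to_others = D[:2, 2:].min() # closest non-setosa sample to either setosa

print(within_setosa, setosa_to_others)
```

In this small sample the two setosa flowers sit much closer to each other than to any versicolor or virginica flower, which is the kind of pattern a distance matrix makes visible.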

In [None]:
# But there are better ways to view this!

# import what we need to visualize
import matplotlib.pyplot as plt
%matplotlib inline

# show it!
plt.imshow(dist)
plt.show()

In [None]:
# We'll create a cosine dissimilarity plot by subtracting from 1
# (placing similar items closer to 0 rather than 1). This will
# make it comparable to the Euclidean data above.

dist = 1 - cosine_similarity(iris_matrix)

# show it!
plt.imshow(dist)
plt.show()