# Modalities of Data

Modality describes the form in which data is represented. For example, an image is a 2D array of pixels, a video is a sequence of images, a sound is a 1D array of samples, and a text is a sequence of characters.

There are too many modalities to list them all. Each modality has its own characteristics and requires different methods to process and analyze. Each data modality has its own area of research and expertise. For example, computer vision is the area of research that deals with images and videos. Speech processing is the area of research that deals with sounds. Natural language processing is the area of research that deals with text.

In this section, we will discuss the most common modalities of data and how to work with them in Python.

For example, consider an image that is 3 pixels wide and 2 pixels high. It is a 2D array of pixels, and each pixel is a 3-element vector of RGB values. The first row of pixels is red, green, and blue. The second row of pixels is black, white, and black.
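As a minimal sketch, the 2 x 3 image described above can be written as a NumPy array. The variable name `image` and the `uint8` data type are illustrative choices, not something fixed by the text.

```python
import numpy as np

# Rows are listed top to bottom; each pixel is an (R, G, B) triple in 0..255.
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0], [0, 0, 255]],  # red, green, blue
        [[0, 0, 0], [255, 255, 255], [0, 0, 0]],  # black, white, black
    ],
    dtype=np.uint8,
)

print(image.shape)  # (height, width, channels)
print(image[0, 1])  # the green pixel in the first row
```

The shape is `(2, 3, 3)`: 2 rows, 3 columns, and 3 color channels per pixel.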

We are going to focus on representations that are ultimately tabular. Mathematically, these are matrix representations.

Here are a few modalities and the fields that study them:

1. **Text**: Natural Language Processing, Computational Linguistics
2. **Images**: Computer Vision, Digital Image Processing
3. **Sounds**: Digital Signal Processing (DSP)
4. **Graphs**: Graph Theory, Network Theory
5. **Time Series**: Time Series Analysis
6. **Geographic**: Geographic Information Systems (GIS), Spatial Computing


# Text data

## Bag of Words

Bag of Words is a method for representing text data when modeling text with machine learning algorithms. It extracts features from a text by counting how often each token occurs, ignoring the order in which the tokens appear.

### Bag of Words Process

1. Collect Data
2. Tokenize Data
3. Count Tokens
4. Create Vocab
5. Create Vectors

### Bag of Words Example

#### Collect Data

```python
# define documents
docs = ['Well done!',
        'Good work',
        'Great effort',
        'nice work',
        'Excellent!']
```

#### Tokenize Data

```python
# split into tokens by white space
tokens = [d.split() for d in docs]
print(tokens)
```

    [['Well', 'done!'], ['Good', 'work'], ['Great', 'effort'], ['nice', 'work'], ['Excellent!']]

#### Count Tokens
    
```python
# count the tokens
from collections import Counter
counts = Counter()
for d in tokens:
    counts.update(d)
print(counts)
```

    Counter({'work': 2, 'Well': 1, 'done!': 1, 'Good': 1, 'Great': 1, 'effort': 1, 'nice': 1, 'Excellent!': 1})

#### Create Vocab
    
```python
# create a vocabulary of unique words
vocabulary = set()
for d in tokens:
    vocabulary.update(d)
print(vocabulary)
```

    {'effort', 'done!', 'Excellent!', 'Good', 'Great', 'work', 'Well', 'nice'}
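#### Create Vectors

The last step of the process, turning each document into a vector of token counts, could be sketched as follows. This is one possible implementation, not the only one: sorting the vocabulary to fix a column order and the names `vocab`, `index`, and `vectors` are assumptions made for illustration.

```python
docs = ['Well done!',
        'Good work',
        'Great effort',
        'nice work',
        'Excellent!']
tokens = [d.split() for d in docs]

# Fix an order for the vocabulary so every vector uses the same columns.
vocab = sorted(set(token for doc in tokens for token in doc))
index = {token: i for i, token in enumerate(vocab)}

# One vector per document: entry i counts occurrences of vocab[i].
vectors = []
for doc in tokens:
    vec = [0] * len(vocab)
    for token in doc:
        vec[index[token]] += 1
    vectors.append(vec)

print(vocab)
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Because the tokens were split on white space only, `'done!'` keeps its punctuation and `'Good'` and `'nice'` keep their case. In practice, text is usually lowercased and stripped of punctuation before counting.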

<!-- ## TFIDF -->

# Graphs 

A graph is a data structure that consists of a set of nodes (vertices) and a set of edges, each of which connects a pair of nodes and so describes a relationship between them. Graphs are used to model many real-world systems, including computer networks, social networks, and transportation systems.

Graphs are a very general structure, so they can be used to model many different kinds of systems. The nodes and edges can have any kind of data associated with them. For example, a graph could be used to represent a social network, where the nodes are people and the edges are friendships. The nodes could have additional data associated with them, such as the person's name, age, and hometown. The edges could have additional data associated with them, such as the date the friendship began.

## Matrix Representations

There are two common ways to represent a graph numerically. The first is an adjacency matrix, a square matrix in which each row and each column corresponds to a vertex. If there is an edge from vertex $i$ to vertex $j$, then the entry in row $i$ and column $j$ is 1; otherwise, the entry is 0. For an undirected graph the matrix is symmetric, because every edge appears in both directions. For example, the following matrix represents an undirected graph with 4 vertices (numbered 0 to 3) and 4 edges:

$$
\begin{bmatrix}
0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 \\
\end{bmatrix}
$$



### Edge List

The second way to represent a graph is an edge list: a list of the pairs of vertices that are connected by an edge. Stored as a matrix, it has one row per edge and two columns. For example, the following edge list, with vertices numbered from 0, represents the same graph as the adjacency matrix above:

$$
\begin{bmatrix}
0 & 1 \\
0 & 3 \\
1 & 2 \\
2 & 3 \\
\end{bmatrix}
$$
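To check that the two representations describe the same graph, the edge list can be converted into an adjacency matrix and back. The following is a small NumPy sketch; the variable names are illustrative, and the graph is assumed to be undirected with vertices numbered from 0.

```python
import numpy as np

# Edge list: one row per edge, two columns (the endpoints).
edges = np.array([[0, 1],
                  [0, 3],
                  [1, 2],
                  [2, 3]])

n = edges.max() + 1                 # number of vertices
A = np.zeros((n, n), dtype=int)
A[edges[:, 0], edges[:, 1]] = 1     # edge i -> j
A[edges[:, 1], edges[:, 0]] = 1     # undirected: also j -> i
print(A)

# Going back: each nonzero entry above the diagonal is one undirected edge.
recovered = np.argwhere(np.triu(A))
print(recovered)
```

The adjacency matrix needs one entry per pair of vertices, while the edge list needs one row per edge, so the edge list is more compact when a graph has few edges relative to its number of vertices.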


## `networkx`

networkx is a Python library for working with graphs. It provides functions for creating graphs, adding nodes and edges to graphs, and traversing graphs. It also provides functions for computing various properties of graphs, such as the shortest path between two nodes.

### Creating Graphs

To create a graph, call the `Graph()` constructor (`nx.Graph()` when the library is imported as `nx`). It returns an empty undirected graph object to which nodes and edges can then be added.

### Adding Nodes and Edges

To add a node to a graph, use the graph's `add_node()` method. It takes the node as its argument; any hashable Python object, such as a string or an integer, can serve as a node. To add an edge, use the `add_edge()` method. It takes the two nodes that the edge connects, and adds them to the graph if they are not already present.
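A minimal sketch of creating a graph and adding nodes and edges is shown here. It rebuilds the 4-vertex example from the matrix representations; the integer node labels are an illustrative choice, and `add_edges_from()` is used as a convenience for adding several edges at once.

```python
import networkx as nx

G = nx.Graph()          # empty undirected graph

G.add_node(0)           # add a single node
G.add_edge(0, 1)        # node 1 is created automatically
G.add_edges_from([(0, 3), (1, 2), (2, 3)])  # add several edges at once

print(G.number_of_nodes(), G.number_of_edges())
```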

### Traversing Graphs

To list the contents of a graph, use its `nodes()` and `edges()` methods. `nodes()` returns a view of all the nodes in the graph, and `edges()` returns a view of all the edges. Both views can be iterated over or converted to a list with `list()`.
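As a short sketch, the nodes, edges, and neighbors of a small graph can be listed like this. The graph is built directly from an edge list for brevity, and the sorting is only there to make the printed order predictable.

```python
import networkx as nx

# Build an undirected graph directly from a list of edges.
G = nx.Graph([(0, 1), (0, 3), (1, 2), (2, 3)])

nodes = sorted(G.nodes())
edges = sorted(tuple(sorted(e)) for e in G.edges())
neighbors = sorted(G.neighbors(0))  # nodes directly connected to node 0

print(nodes)
print(edges)
print(neighbors)
```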

### Computing Properties of Graphs

To compute the shortest path between two nodes in a graph, use the `shortest_path()` function. It takes the graph followed by the source and target nodes, and returns a list of the nodes on a shortest path between them, including both endpoints.
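For illustration, here is a sketch of a shortest-path query on a 4-node cycle. The graph and node labels are assumptions chosen to keep the example small.

```python
import networkx as nx

# A 4-node cycle: 0 - 1 - 2 - 3 - 0
G = nx.Graph([(0, 1), (0, 3), (1, 2), (2, 3)])

path = nx.shortest_path(G, source=0, target=2)
length = nx.shortest_path_length(G, source=0, target=2)

print(path)    # a list of nodes starting at 0 and ending at 2
print(length)  # number of edges on that path
```

In this graph there are two equally short routes from node 0 to node 2 (through node 1 or through node 3), and `shortest_path()` returns only one of them.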

# Images

An image is a 2D array of pixels. In a color image, each pixel is a vector of three values: red, green, and blue (RGB). The RGB values are usually stored as integers between 0 and 255, and together they determine the color of the pixel. For example, a pixel with RGB values of (255, 0, 0) is red, (0, 255, 0) is green, and (0, 0, 255) is blue. A pixel with RGB values of (0, 0, 0) is black and (255, 255, 255) is white.

In this notebook, we will learn how to read and write images, and how to manipulate them.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Read an image
img = plt.imread('../data/belltower.jpg')
plt.imshow(img);
```

```python
img.shape # (height, width, channels)
```

```python
img[0,0,:] # RGB values of the first pixel
```

```python
# Convert to grayscale by averaging the three color channels
img_gray = img.mean(axis=2)
plt.imshow(img_gray, cmap='gray');
```

```python
# Invert the image
img_inv = 255 - img_gray
plt.imshow(img_inv, cmap='gray');
```

```python
# Threshold the image: True where the pixel is brighter than 100
img_thresh = img_gray > 100
plt.imshow(img_thresh, cmap='gray');
```

```python
# Crop the image to its top-left 700x700 pixels
img_crop = img_gray[:700, :700]
plt.imshow(img_crop, cmap='gray');
```

```python
# Resize the image
img_resize = img_gray[::2, ::2] # every other pixel
print("Image resized from {} to {}".format(img_gray.shape, img_resize.shape))
plt.imshow(img_resize, cmap='gray');
plt.colorbar();
```

```python
# Rotate the image 90 degrees counterclockwise
# (img_gray.T would only transpose it, mirroring the image across its diagonal)
img_rot = np.rot90(img_gray)
plt.imshow(img_rot, cmap='gray');
```

```python
# Flip the image vertically and horizontally
img_flip = img_gray[::-1, ::-1] # reverse the rows and columns (a 180-degree rotation)
plt.imshow(img_flip, cmap='gray');
```

```python
# Save the image
plt.imsave('../data/belltower_gray.jpg', img_gray, cmap='gray');
```
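The geometric operations are easier to verify on a tiny array than on a photograph. The sketch below uses a made-up 2 x 3 array `a` as a stand-in for a grayscale image and compares transposing, rotating, and flipping.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

transposed = a.T                 # mirror across the main diagonal
rotated = np.rot90(a)            # rotate 90 degrees counterclockwise
flipped_vertical = a[::-1, :]    # reverse the rows (top <-> bottom)
flipped_horizontal = a[:, ::-1]  # reverse the columns (left <-> right)
flipped_both = a[::-1, ::-1]     # same as a 180-degree rotation

print(transposed)
print(rotated)
print(flipped_vertical)
```

Transposing and rotating both swap the height and width, but they arrange the pixels differently, so `a.T` is not a rotation.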


## Exercises

**1**. Read in the image `../data/belltower.jpg` and convert it to grayscale. Then, invert the image and save it as `../data/belltower_inv.jpg`.

**2**. Read in the image `../data/belltower.jpg` and convert it to grayscale. Then, crop the image to the top-left 700x700 pixels and save it as `../data/belltower_crop.jpg`.

**3**. Read in the image `../data/belltower.jpg` and convert it to grayscale. Then, resize the image to half its size and save it as `../data/belltower_resize.jpg`.

**4**. Read in the image `../data/belltower.jpg` and convert it to grayscale. Then, rotate the image 90 degrees and save it as `../data/belltower_rot.jpg`.

**5**. Read in the image `../data/belltower.jpg` and convert it to grayscale. Then, flip the image vertically and save it as `../data/belltower_flip.jpg`.


## Image datasets

### Reading in handwritten digits

In this notebook, we will read in the handwritten digits dataset from the UCI Machine Learning Repository. The version used here consists of 1797 images of handwritten digits. Each image is a 2D array of 8x8 pixels, so there are 64 pixels in total, and each pixel is an integer between 0 and 16. Each image is labeled with the digit it represents.

You can find the dataset [here](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits).

#### Reading in the data

The data is stored in a CSV file. We can read it in using `pandas`.

```python
import pandas as pd
df = pd.read_csv('../data/digits.csv')
```
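The layout of `digits.csv` is not described here, so the following is only a sketch under an assumed layout: 64 pixel columns (hypothetically named `pixel_0` to `pixel_63`) followed by a `label` column. A small random table with values from 0 to 16 stands in for the real file, to show how flat rows can be reshaped back into 8x8 images.

```python
import numpy as np
import pandas as pd

# Stand-in for the real CSV: 3 random "images" with 64 pixel values each.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 17, size=(3, 64))
df = pd.DataFrame(pixels, columns=[f"pixel_{i}" for i in range(64)])
df["label"] = [0, 1, 2]  # hypothetical label column

# Separate features from labels, then restore the 2D image shape.
X = df.drop(columns="label").to_numpy()
y = df["label"].to_numpy()
images = X.reshape(-1, 8, 8)

print(X.shape, y.shape, images.shape)
```

Each row of `X` is the tabular form of one image; `images[i]` is the same data arranged as an 8x8 grid that can be passed to `plt.imshow`.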


# Audio

In this section, we will learn how to use representations of audio data in machine learning.

## Audio data

Audio files can be represented in a variety of ways. The most common is the waveform, which is a time series of the amplitude of the sound wave at each time point. The waveform is a one-dimensional array of numbers. The sampling rate is the number of samples per second.

| Sampling rate | Quality |
|---------------|---------|
| 8 kHz         | Telephone call |
| 44.1 kHz      | Music CD |
| 48 kHz        | DVD |
| 96 kHz        | Studio quality |

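To load an audio file, we can use the `librosa` library. The `librosa.load` function returns the waveform and the sampling rate. By default it resamples the audio to 22,050 Hz and mixes it down to mono; pass `sr=None` to keep the file's native sampling rate.

```python
import librosa
waveform, sampling_rate = librosa.load('audio.wav')
```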

A dataset of audio files is available at https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data.

## Spectrogram

The spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. It is a two-dimensional array of numbers. The x-axis represents time, the y-axis represents frequency, and the color represents the amplitude of the frequency at that time.

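The spectrogram can be computed using the `librosa.stft` function, which returns a complex-valued array; its absolute value gives the amplitude at each frequency and time. The `librosa.amplitude_to_db` function converts the amplitude to decibels.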

## Mel spectrogram

The mel spectrogram is a spectrogram where the frequencies are converted to the mel scale. The mel scale is a scale of pitches judged by listeners to be equal in distance from one another. The mel spectrogram is a two-dimensional array of numbers. The x-axis represents time, the y-axis represents mel frequency, and the color represents the amplitude of the frequency at that time.

The mel spectrogram can be computed using the `librosa.feature.melspectrogram` function.

## MFCC

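The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up this representation. For a single short frame they form a one-dimensional array of numbers; computed over a whole signal they form a two-dimensional array with one column per frame.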

The MFCC can be computed using the `librosa.feature.mfcc` function.

## Chromagram

The chromagram represents how the energy of a sound is distributed across the 12 pitch classes of the musical scale (C, C#, D, and so on up to B) over time, with all octaves of a pitch folded into the same class. The chromagram is a two-dimensional array of numbers. The x-axis represents time, the y-axis represents pitch class, and the color represents the amplitude of the pitch class at that time.

The chromagram can be computed using the `librosa.feature.chroma_stft` function.

## Chroma vector

A chroma vector is a single column of the chromagram: the energy in each of the 12 pitch classes for one short frame of the sound. The chroma vector is a one-dimensional array of 12 numbers.

Chroma vectors can be obtained from the output of the `librosa.feature.chroma_stft` function, which returns one chroma vector per frame.

## Chroma deviation

The chroma deviation is taken here to be the standard deviation of the 12 values of a chroma vector, which summarizes how unevenly the energy is spread across the pitch classes. Computed for every frame, the chroma deviation is a one-dimensional array of numbers.

`librosa` has no dedicated function for the chroma deviation, but it can be computed from the output of the `librosa.feature.chroma_stft` function.

## Chroma distance

The chroma distance is taken here to be a measure of how different two chroma vectors are, for example the Euclidean distance between the chroma vectors of consecutive frames. Computed for each pair of consecutive frames, the chroma distance is a one-dimensional array of numbers.

`librosa` has no dedicated function for the chroma distance, but it can be computed from the output of the `librosa.feature.chroma_stft` function.
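The three chroma-derived quantities can be sketched with plain NumPy once a chromagram is available. The definitions used below are assumptions, not library functions: the chroma vector is one column of the chromagram, the chroma deviation is the per-frame standard deviation, and the chroma distance is the Euclidean distance between consecutive frames. A random array stands in for the output of `librosa.feature.chroma_stft`.

```python
import numpy as np

# Stand-in chromagram: 12 pitch classes x 5 frames of made-up values.
rng = np.random.default_rng(0)
chroma = rng.random((12, 5))

# Chroma vector: the 12 pitch-class values of one frame.
chroma_vector = chroma[:, 0]

# Chroma deviation (assumed definition): spread of the 12 values per frame.
chroma_deviation = chroma.std(axis=0)

# Chroma distance (assumed definition): Euclidean distance between
# the chroma vectors of consecutive frames.
chroma_distance = np.linalg.norm(np.diff(chroma, axis=1), axis=0)

print(chroma_vector.shape, chroma_deviation.shape, chroma_distance.shape)
```

With 5 frames there are 5 deviation values but only 4 distances, because each distance compares a frame with the one before it.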

```python
import librosa

waveform, sampling_rate = librosa.load('data/train/audio/bed/00176480_nohash_0.wav')

waveform
```
