# Introducing Set Theory
## Data Science for Data Scientists
---


## How do I write sets in LaTeX?

LaTeX notation for mathematics:

```
$\mathcal{D} = \{17, 18, \dots, 20\}$

$ x \in \mathcal{D}$
```

$\mathcal{D} = \{17, 18, \dots, 20\}$

$ x \in \mathcal{D}$

## What is a Set?
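
As a working definition (an assumption here, since these notes leave the question open), a set is an unordered collection of distinct elements. The sketch below shows how Python's built-in `set` reflects that; the names `ages` and `same_ages` are illustrative only.

```python
# Assumed working definition: a set is an unordered collection of distinct elements.
ages = {17, 18, 18, 19, 17}          # duplicates are written on purpose
same_ages = {19, 18, 17}             # same elements, different order

no_duplicates = len(ages)            # repeated elements are stored once
order_ignored = ages == same_ages    # equality ignores order

print(no_duplicates)
print(order_ignored)
```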

## How do I determine membership?

In [2]:
Dx = {17, 18, 19, 20} # domain of x

x_age = 17

x_age in Dx

True

## What is set cardinality?

In [3]:
len(Dx)

4

## Operations

### Union = Either (or)

In [5]:
A = {1, 2, 3}
B = {3, 4}

A  | B

{1, 2, 3, 4}

In [7]:
A.union(B)

{1, 2, 3, 4}

### Intersection = Common

In [8]:
A & B

{3}

In [11]:
A.intersection(B)

{3}

### Difference (without)

$ \mathcal{A} \setminus \{1\} $

In [12]:
A - {1}

{2, 3}

In [14]:
A.difference({1})

{2, 3}

## Aside: Ordered Sequences (vs Sets)
### Tuples: Pairs, Triples, ... (quadr-uple)

In [19]:
x_user1 = (18, 10, 8) # age, price, rating


Dx1 = {17, 18, 19}
Dx2 = {8, 9, 10}

In [17]:
Dx1 == {18, 19, 17}

True

In [20]:
x_user2 = (17, 9, 7)

In [21]:
x_user1

(18, 10, 8)

In [22]:
x_user1 == x_user2

False

In [24]:
x_user1[0]

18
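
To make the contrast with sets concrete, here is a minimal sketch: tuples keep both order and duplicates, while sets keep neither. The values in `record`, `swapped`, and `repeated` are made up for illustration.

```python
# Hypothetical record in the same layout as x_user1: (age, price, rating).
record = (18, 10, 8)
swapped = (10, 18, 8)

tuples_differ = record != swapped            # order matters for tuples
sets_match = set(record) == set(swapped)     # order is ignored for sets

repeated = (8, 8, 10)
kept = len(repeated)                         # tuples keep duplicates
collapsed = len(set(repeated))               # sets drop them

age, price, rating = record                  # unpack by position
```

Because position carries meaning in a tuple, `record[0]` is always the age, which is why observations are stored as tuples while domains are stored as sets.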

## Set Builder Notation

$ \{ \text{element} : \text{element} \in \text{Domain}, \ \text{filters} \dots \}$

$ \{ n : n \in \{1, 2, \dots, 10\}, \ n \bmod 2 = 0 \}$

In [1]:
Z1to10 = set(range(1, 11)) # {1, 2, 3, ..., 10}; range excludes its stop value

In [2]:
Z1to10

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [3]:
{n for n in Z1to10 if n % 2 == 0}

{2, 4, 6, 8, 10}

## Exercise

* Consider a data science problem
    * e.g., classifying an image into animal categories
* Define the set of all possible outcomes for $y$, `Dy`
* On any given day you observe some members of `Dy`, i.e., a sample of observations
* Create several such subsets (`A, B, C`), and compare them using `union` and `intersection`
* Report the cardinality (`len`) and other features of these sets


* EXTRA:
    * using set builder notation (i.e., set comprehensions), filter the domain of $y$, `Dy`
    * any filter of `Dy` (or `A, B, C`) will do...
    * e.g., harder: use `Dy` to produce `A`....
    * HINT: `from itertools import product` 
    * HINT: what does `product` do?
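
One possible way to approach the exercise is sketched below. The animal labels and the subsets `A`, `B`, `C` are invented for illustration; any outcome space of your own will work the same way.

```python
from itertools import product

# Hypothetical outcome space for y: the categories a classifier can output.
Dy = {"cat", "dog", "bird", "fish"}

# Hypothetical observations on three different days (each is a subset of Dy).
A = {"cat", "dog"}
B = {"dog", "bird"}
C = {"fish"}

seen_either_day = A | B          # union: observed on day A or day B
seen_both_days = A & B           # intersection: observed on both days
never_seen = Dy - (A | B | C)    # difference: categories not observed at all

# Set comprehension as a filter on Dy: keep the three-letter labels.
short_labels = {y for y in Dy if len(y) == 3}

# product pairs every element of the first set with every element of the second.
label_pairs = set(product(A, C))

print(len(Dy), len(seen_either_day), len(seen_both_days), len(never_seen))
```

Note that `len(label_pairs)` equals `len(A) * len(C)`, which is the cardinality of a Cartesian product.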

## Exercise
Probability is a classic use of sets: when all outcomes are equally likely, a probability is nothing more than a proportion of sets.

(More precisely, probability is a *measure*: a function that maps sets to real numbers. The length of a line is another example: a line can be understood as a set of real numbers, and its length is a measurement of that set. A probability is a measure on sets of events/outcomes that takes values in the interval [0, 1].)

P(E) is just a ratio, `#{...}/#{...}`, e.g., 
```
dice = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}

P(odd) = #odd / #dice
```
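
The ratio above can be written as runnable Python. This is a minimal sketch that assumes every outcome is equally likely; `prob` is a hypothetical helper name, not part of any library, and `Fraction` is used only to keep the ratio exact.

```python
from fractions import Fraction

def prob(event, space):
    """Counting measure: share of equally likely outcomes in `space` that lie in `event`."""
    return Fraction(len(event & space), len(space))

dice = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}

p_odd = prob(odd, dice)
print(p_odd)          # 1/2
print(float(p_odd))   # 0.5
```

Intersecting with `space` inside `prob` means impossible outcomes (such as 0 for a die) contribute nothing to the count.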

Using len(), union(), intersection(), analyse possible events concerning the roll of a die...

* events: odd, even, only 6, only 1, either 1 or 6
* what's in common between these events?
* what's unique to two events?
* what's the probability of each of these events?

EXTRA:
- using itertools' product() define the set of outcomes for two rolls.
- using a set comprehension select those rolls which sum to 7


In [44]:
dice = {1, 2, 3, 4, 5, 6} # outcome space = all possible outcomes
odd = {1, 3, 5} # an event contains only possible outcomes
even = {2, 4, 6}
one = {1}
six = {6}
highlow = {1, 6}
winner = {2, 3, 4} # I win the bet if I roll 2, 3, or 4

odd & dice == odd # True: every outcome in odd is possible
even & dice == even # True
{0, 1} & dice == {0, 1} # False: 0 is not a possible outcome

False

Conditional probability restricts the outcome space to `O`: `P(E|O) = #(E & O) / #O`
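
As a hedged illustration of the conditional formula, the sketch below contrasts dividing by `#O` (conditional) with dividing by the size of the whole outcome space (joint). It assumes equally likely outcomes, and `cond_prob` is a hypothetical helper name.

```python
from fractions import Fraction

dice = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}
highlow = {1, 6}

def cond_prob(event, given):
    """P(event | given) for equally likely outcomes: #(event & given) / #given."""
    return Fraction(len(event & given), len(given))

p_joint = Fraction(len(highlow & odd), len(dice))   # P(highlow and odd)
p_given_odd = cond_prob(highlow, odd)               # P(highlow | odd)

print(p_joint)
print(p_given_odd)
```

Conditioning on `odd` shrinks the denominator from six outcomes to three, so the same intersection `{1}` carries more weight.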

In [41]:
len(odd) / len(dice)

0.5

In [43]:
len(odd & highlow) / len(dice)

0.16666666666666666

In [46]:
len(winner & even) / len(dice)

0.3333333333333333

In [50]:
len(winner & odd) / len(dice) # intersection = and

0.16666666666666666

In [52]:
len(winner | highlow) / len(dice) # union = or

0.8333333333333334

In [83]:
from itertools import product

In [88]:
two_rolls = set(product(dice, repeat=2))

In [90]:
len(dice)

6

In [92]:
len(two_rolls)

36

In [93]:
len(dice) ** 2

36

$\mathcal{D} = \{1, 2, 3, 4, 5, 6\}$

$\mathcal{D}^2 = \{(1, 1), \dots, (6, 6)\}$

### Set Builder (comprehensions)

$\{r \mid r \in \mathcal{D}^2, \ \sum_{i=1}^{2} r_i = 7\}$

In [66]:
{roll for roll in two_rolls if sum(roll) == 7}

{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}

In [70]:
{(u, v) for u, v in two_rolls if u + v == 7}

{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
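
Combining the comprehension with the ratio idea gives the probability that two rolls sum to 7. This is a sketch assuming fair, independent rolls; the names `sevens` and `p_seven` are illustrative. Because `product` yields ordered pairs, both `(1, 6)` and `(6, 1)` are counted.

```python
from fractions import Fraction
from itertools import product

dice = {1, 2, 3, 4, 5, 6}
two_rolls = set(product(dice, repeat=2))      # ordered pairs (first roll, second roll)

sevens = {roll for roll in two_rolls if sum(roll) == 7}
p_seven = Fraction(len(sevens), len(two_rolls))

print(sorted(sevens))
print(p_seven)
```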

## ML Example

In [71]:
D = {(18, 8, 10),
     (19, 3, 20),
     (19, 6, 15)}  # each row is (x1, x2, y)

{(x1, x2) for (x1, x2, y) in D}  # project each row onto its features

{(18, 8), (19, 3), (19, 6)}

In [72]:
{ y for (x1, x2, y) in D}

{10, 15, 20}
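
One caveat worth sketching: a set comprehension keeps only distinct values, so repeated targets collapse. When each row needs its own target, a list comprehension keeps one value per row. The `rows` data below is hypothetical and chosen so that two rows share the same `y`.

```python
# Hypothetical dataset of (x1, x2, y) rows; two rows share the same target y.
rows = [(18, 8, 10), (19, 3, 20), (19, 6, 10)]

targets_as_set = {y for (x1, x2, y) in rows}     # distinct target values only
targets_as_list = [y for (x1, x2, y) in rows]    # one target per row, order kept

features = [(x1, x2) for (x1, x2, y) in rows]    # aligned with targets_as_list

print(targets_as_set)
print(targets_as_list)
```

The set answers "which values of `y` occur?", while the list answers "what is `y` for each observation?".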