# Sets In-class
## What is the cardinality of the following sets?

* $ S_1 = \{i\%7 : \forall i \in \mathbb{Z}\}$
* $ S_2 = \{x | x \in \mathbb{R} \text{ and } 0 \le x \lt 1\}$
* $S_3$ equals the set of square roots of 1
* $D$ where
$$
A = \{1,2,3,4\}\\
B = \{4, 5, 6\}\\
C = \{1, 3, 5, 7, 9, 11\}\\
D = (A\cup B)\cap C
$$

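Cardinality is the size of a set, i.e. the number of elements it contains.

* S1: `i % 7` is the remainder after dividing i by 7, so S1 = {0, 1, 2, 3, 4, 5, 6}: cardinality = 7
* S2: every real x with 0 ≤ x < 1; this interval contains infinitely (in fact uncountably) many numbers, so its cardinality is infinite (the cardinality of the continuum)
* S3: the square roots of 1 are +1 and -1, so S3 = {1, -1}: cardinality = 2
* D: A ∪ B = {1, 2, 3, 4, 5, 6}, and intersecting with C gives D = {1, 3, 5}: cardinality = 3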
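The answers for S1, S3 and D can be cross-checked in Python. This is a sketch: S1 is approximated by a finite range of integers (any run of at least seven consecutive integers already produces every remainder), and S3 is obtained by solving $x^2 = 1$ with SymPy's `solve`. S2 is left out because an interval of reals cannot be enumerated.

```python
from sympy import FiniteSet, Symbol, solve

# S1: Python's % returns a remainder in 0..6 for a positive divisor,
# even when i is negative, so a finite sample of Z yields the whole set
S1 = {i % 7 for i in range(-100, 100)}
print(sorted(S1), len(S1))

# S3: the square roots of 1 are the solutions of x**2 - 1 = 0
x = Symbol("x")
S3 = FiniteSet(*solve(x**2 - 1, x))
print(S3, len(S3))

# D = (A ∪ B) ∩ C; SymPy sets support | for union and & for intersection
A = FiniteSet(1, 2, 3, 4)
B = FiniteSet(4, 5, 6)
C = FiniteSet(1, 3, 5, 7, 9, 11)
D = (A | B) & C
print(D, len(D))
```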

In [1]:
from sympy import FiniteSet, S, Symbol
from nose.tools import assert_true, assert_equal, assert_false
import numpy as np

In [2]:
import sympy.sets.fancysets

#### `S` provides the special sets
* `UniversalSet` ($U$)
* `EmptySet` ($\varnothing$)
* `Integers` ($\mathbb{Z}$)
* `Reals` ($\mathbb{R}$)

Each of these sets has the methods:

* `is_subset`
* `is_proper_subset`
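A small illustration of the two methods on finite sets, plus one special set. The example sets `small` and `big` are made up for illustration; the point is that `is_subset` allows equality while `is_proper_subset` does not.

```python
from sympy import FiniteSet, S

small = FiniteSet(1, 2)
big = FiniteSet(1, 2, 3)

print(small.is_subset(big))           # True: every element of small is in big
print(small.is_proper_subset(big))    # True: a subset, and the two sets differ
print(big.is_subset(big))             # True: every set is a subset of itself
print(big.is_proper_subset(big))      # False: a set is never a proper subset of itself
print(S.Integers.is_subset(S.Reals))  # True: every integer is a real number
```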
#### Answer the following questions

* $\varnothing \subseteq U$?
* $\mathbb{R} \subseteq \mathbb{Z}$?
* $\mathbb{Z} \subset \mathbb{Z}$?

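Answers:

* Q1: Is the empty set a subset of the universal set?
  A1: Yes. `S.EmptySet.is_subset(S.UniversalSet)` is `True`; the empty set is a subset of every set.
* Q2: Are the real numbers a subset of the integers?
  A2: No. `S.Reals.is_subset(S.Integers)` is `False`; it is the other way around, the integers are a subset of the reals.
* Q3: Is $\mathbb{Z}$ a proper subset of $\mathbb{Z}$?
  A3: No. $\subset$ here means proper subset, which rules out equality, so `S.Integers.is_proper_subset(S.Integers)` is `False`.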

In [None]:
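# Each call answers one of the questions above
print(S.EmptySet.is_subset(S.UniversalSet))     # Q1
print(S.Reals.is_subset(S.Integers))            # Q2
print(S.Integers.is_proper_subset(S.Integers))  # Q3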

## Cartesian Product
>“Finding the Cartesian product of sets is useful for finding all possible combinations of the set members” (Amit Saha. *Doing Math with Python*)

In [3]:
S1 = FiniteSet(1,2,3,4)
S2 = FiniteSet(4,5,6,7,8)

In [None]:
list(S1*S2)  # pairs every element of S1 with every element of S2

## Exercise

What is the cardinality of the Cartesian product of $S1$ and $S2$ ($S1\times S2$)?
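Every element of $S1$ is paired with every element of $S2$, so $|S1 \times S2| = |S1| \cdot |S2| = 4 \cdot 5 = 20$. The shared element 4 does not reduce the count, because the pairs are ordered: (4, 4) appears once and is a valid pair. A quick check that materializes the product:

```python
from sympy import FiniteSet

S1 = FiniteSet(1, 2, 3, 4)
S2 = FiniteSet(4, 5, 6, 7, 8)

pairs = list(S1 * S2)     # S1 * S2 builds a SymPy ProductSet; list() enumerates its tuples
print(len(pairs))         # 20
print(len(S1) * len(S2))  # 20: |S1 x S2| = |S1| * |S2|
```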

In [4]:
doctors = FiniteSet(Symbol("Dr. No"), 
                    Symbol("Dr. Yes"), 
                    Symbol("Dr. Maybe"))
diseases = FiniteSet(Symbol("renal cell carcinoma"), 
                     Symbol("hypertension"), 
                     Symbol("hcc"), Symbol("depression"))
set(doctors*diseases)

{(Dr. No, depression),
 (Dr. Yes, renal cell carcinoma),
 (Dr. Yes, hcc),
 (Dr. Maybe, renal cell carcinoma),
 (Dr. Maybe, hcc),
 (Dr. Maybe, hypertension),
 (Dr. No, hcc),
 (Dr. No, renal cell carcinoma),
 (Dr. Maybe, depression),
 (Dr. Yes, depression),
 (Dr. Yes, hypertension),
 (Dr. No, hypertension)}
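The listing above has 3 × 4 = 12 pairs. The same combinations can be produced without SymPy using the standard library's `itertools.product`; this sketch uses plain strings instead of `Symbol`s.

```python
from itertools import product

doctors = ["Dr. No", "Dr. Yes", "Dr. Maybe"]
diseases = ["renal cell carcinoma", "hypertension", "hcc", "depression"]

# product() yields (doctor, disease) tuples in a deterministic order,
# unlike the Python set printed above, whose order is arbitrary
pairs = list(product(doctors, diseases))
print(len(pairs))  # 12
print(pairs[0])    # ('Dr. No', 'renal cell carcinoma')
```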

## Testing Membership



What is the main thing to remember about sets?
Uniqueness: a set holds each value at most once.
The other is fast membership testing.
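Both properties show up on a tiny example (the list `readings` is made up for illustration):

```python
readings = [3, 1, 3, 2, 1]

unique = set(readings)             # duplicates are dropped automatically
print(len(readings), len(unique))  # 5 3

# Membership: a set hashes the value and checks one bucket (O(1) on average),
# whereas `in` on a list compares against the elements one by one (O(n))
print(2 in unique)  # True
print(7 in unique)  # False
```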

In [5]:
import numpy.random as ra
import random

In [6]:
data = ra.randint(low=0, high=5000000000, size=1000000, dtype=np.int64)  # int64 so the 5e9 upper bound fits on every platform

In [7]:
d_list = list(data)  # store the same million random integers in four container types
d_tuple = tuple(data)
d_set = set(data)
d_fset = frozenset(data)

### What does the following metric indicate?

In [8]:
len(d_set) / len(d_tuple)  # fraction of the values that are distinct
# the ratio is exactly 1 only if no value was drawn twice

0.999896
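The ratio is just under 1 because a few values were drawn more than once. As a rough check, assuming the draws are uniform and independent: with $n = 1{,}000{,}000$ draws from $N = 5{,}000{,}000{,}000$ possible values, the expected number of distinct values is $N\left(1 - (1 - 1/N)^n\right) \approx n - n^2/(2N)$, so the expected ratio is about $1 - n/(2N) = 0.9999$. That matches the observed 0.999896 (about 104 values lost to repeats). The sketch below evaluates the exact expectation, using `log1p`/`expm1` so the tiny $1/N$ is not lost to rounding.

```python
import math

n = 1_000_000      # number of draws (size=1000000 above)
N = 5_000_000_000  # number of possible values: integers in [0, N)

# Expected number of distinct values: N * (1 - (1 - 1/N)**n),
# rewritten as -N * expm1(n * log1p(-1/N)) for numerical accuracy
expected_distinct = -N * math.expm1(n * math.log1p(-1 / N))
expected_ratio = expected_distinct / n
print(round(expected_ratio, 6))  # 0.9999
```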

#### Use the `%timeit` magic to time how long each statement takes to execute

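Recall that lists are mutable and tuples are not. That makes no difference here: `in` on either one scans the elements one at a time, so both take about the same time.
Sets (and frozensets) are hash tables, so a lookup takes roughly constant time no matter how large the set is.
So for a large collection that is queried often, converting it to a set is worthwhile.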

In [9]:
listin = %timeit -o 127 in d_list

170 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [10]:
tuplein = %timeit -o 127 in d_tuple

172 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [11]:
setin = %timeit -o 127 in d_set

76.8 ns ± 0.429 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [12]:
fsetin = %timeit -o 127 in d_fset

83.3 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


## What is the fastest way to test membership/in?
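From the timings above, the set and frozenset lookups (roughly 77 to 83 ns) are about six orders of magnitude faster than the list and tuple lookups (roughly 170 ms). The comparison can be reproduced outside IPython with the standard-library `timeit` module. This is a sketch, not the notebook's setup: it assumes a smaller seeded dataset of 100,000 values so it runs quickly, and a probe of `-1`, which is guaranteed to be absent.

```python
import random
import timeit

random.seed(0)
data = [random.randrange(5_000_000_000) for _ in range(100_000)]
containers = {
    "list": list(data),
    "tuple": tuple(data),
    "set": set(data),
    "frozenset": frozenset(data),
}
probe = -1  # never present, so list/tuple must scan every element (worst case)

results = {}
for name, container in containers.items():
    total = timeit.timeit(lambda: probe in container, number=50)
    results[name] = total / 50  # seconds per single lookup
    print(f"{name:>9}: {results[name]:.2e} s per lookup")
```

Absolute numbers depend on the machine and the data size; what matters is the gap between the scanning containers and the hashed ones.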



## How could we check whether this was a fluke due to our choice of `127`?
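Averaging over several random probes, as the cells below do, is one answer. It also helps to notice that a random value from [0, 5,000,000,000) is almost never among the million drawn values (about a 1-in-5,000 chance per probe), so those probes, and very likely a fixed value such as `127`, are misses that force the list or tuple to scan every element. The sketch below separates misses from hits; the smaller seeded dataset and the helper `mean_lookup` are illustrative assumptions, not part of the original notebook.

```python
import random
import statistics
import timeit

random.seed(1)
data = [random.randrange(5_000_000_000) for _ in range(100_000)]
d_list, d_set = list(data), set(data)

def mean_lookup(container, probes, number=5):
    """Hypothetical helper: mean seconds per `probe in container`, over all probes."""
    per_probe = [timeit.timeit(lambda: p in container, number=number) / number
                 for p in probes]
    return statistics.mean(per_probe)

# Misses: values above the data range, so they can never be present
misses = [random.randrange(5_000_000_000, 6_000_000_000) for _ in range(10)]
# Hits: values taken from the data itself; a list scan stops at the first match
hits = random.sample(data, 10)

for label, probes in (("misses", misses), ("hits", hits)):
    print(f"{label}: list {mean_lookup(d_list, probes):.2e} s, "
          f"set {mean_lookup(d_set, probes):.2e} s")
```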

In [13]:
listin = []
# 127 was a single probe; repeat the test with 10 different random probes
import random
#test 10 times
for i in range(10):
    rslt = %timeit -o random.randint(0,5000000000) in d_list
    listin.append(rslt.average)  # rslt is a TimeitResult with .best, .worst and .average; keep the average
    

183 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
183 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
183 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
183 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
187 ms ± 8.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
183 ms ± 2.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
print("%e"%np.mean(listin))

In [None]:
fsetin = []
for i in range(10):
    rslt = %timeit -o random.randint(0,5000000000) in d_fset
    fsetin.append(rslt.average)  # keep the mean time per loop, as for listin
    

In [None]:

print("%e"%np.mean(fsetin))

In [None]:
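setin = []
for i in range(10):
    rslt = %timeit -o random.randint(0,5000000000) in d_set
    setin.append(rslt.average)  # keep the mean time per loop, as for listin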

In [None]:
print("%e"%np.mean(setin))