
## Sets in Python
* Python has a built-in data type called `set`.
* It is close to, but not the same as, a mathematical set.
* A set can be created from a list or a tuple containing its elements.

In [1]:
# A set of numbers
set([1,2,30])

{1, 2, 30}

In [2]:
# A set of strings
# Order is not preserved: when a set is printed,
# its elements may be listed in any order
set(['this','that','the other'])

{'that', 'the other', 'this'}

In [3]:
# each element can appear in a set only once.
set([1,2,3,2])

{1, 2, 3}
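Because each element appears only once, converting a list to a set is a common way to drop duplicates. The sketch below uses a hypothetical list `words` (not part of the original notebook) and also shows `dict.fromkeys`, one way to drop duplicates while keeping the first-seen order, assuming Python 3.7 or later.

```python
# Hypothetical example data, not from the original notebook
words = ['b', 'a', 'b', 'c', 'a']

# Converting to a set drops duplicates but loses the original order
unique_words = set(words)

# dict.fromkeys keeps the first occurrence of each item, in order (Python 3.7+)
unique_in_order = list(dict.fromkeys(words))

print(sorted(unique_words))   # ['a', 'b', 'c']
print(unique_in_order)        # ['b', 'a', 'c']
```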

More commands for creating and editing sets:

In [4]:
# Create an empty set
A = set()
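A common pitfall when creating an empty set: empty curly braces create a dictionary, not a set. The names `empty_set`, `empty_dict` and `literal` below are illustrative only.

```python
empty_set = set()     # an empty set
empty_dict = {}       # NOT a set: empty braces create a dict

print(type(empty_set).__name__)    # set
print(type(empty_dict).__name__)   # dict

# Non-empty braces without key: value pairs do create a set
literal = {1, 2, 30}
print(literal == set([1, 2, 30]))  # True
```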

In [5]:
# Add the elements of another set
A = set()
A.update({0, 10})
print(A)

{0, 10}


In [6]:
# Delete an element from a set
A = {1, 2, 3}
A.remove(2)       # Works only if the element to be deleted is present in the set
A.discard(4)      # Works even if the element to be deleted is not present in the set
A.discard(3)      
print(A)

{1}
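The difference between `remove` and `discard` shows up only when the element is missing: `remove` raises a `KeyError`, while `discard` does nothing. A minimal sketch, using an illustrative set `S` so that `A` above is left untouched:

```python
S = {1, 2, 3}        # illustrative set, not from the original notebook

S.discard(4)         # 4 is absent: nothing happens, no error

try:
    S.remove(4)      # 4 is absent: raises KeyError
    remove_succeeded = True
except KeyError:
    remove_succeeded = False

print(remove_succeeded)  # False
print(sorted(S))         # [1, 2, 3]
```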


In [7]:
# Remove and return an arbitrary (not random) element from a set
A = {1, 2, 3}
A.pop()
print(A)

{2, 3}


In [8]:
A = {-1,2,1, 3}
A.pop()
print(A)

{2, 3, -1}
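Two details of `pop` that the cells above do not show: it returns the element it removed, and it raises a `KeyError` on an empty set. Which element is removed depends on the internal layout of the set, so the sketch below (with illustrative names `S`, `x` and `empty`) does not assume a particular one.

```python
S = {10, 20, 30}     # illustrative set
x = S.pop()          # removes and returns an arbitrary element

print(x in S)        # False: the returned element is no longer in the set
print(len(S))        # 2

empty = set()
try:
    empty.pop()      # nothing to remove: raises KeyError
    pop_succeeded = True
except KeyError:
    pop_succeeded = False

print(pop_succeeded)  # False
```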


In [9]:
# Get a sorted list from a set
A = {1, 10, 4, -9, 7, 8, -6, 3, 2}
print(sorted(A))

[-9, -6, 1, 2, 3, 4, 7, 8, 10]


## Elements of sets must be immutable
While in mathematics anything can be an element of a set, in Python an element has to be
**hashable**, which for built-in types means **immutable**, i.e. an object with a fixed value. Immutable objects include numbers, strings and tuples (as long as the tuple's own elements are hashable).

In [10]:
#elements can be tuples
set([(1,2),(1,3),(3,1)])

{(1, 2), (1, 3), (3, 1)}

In [11]:
# but cannot be lists
set([[1,2],[1,3],[3,1]])

TypeError: unhashable type: 'list'
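A set itself is mutable, so a `set` cannot be an element of another set. The built-in `frozenset` is an immutable, hashable variant that can. The sketch below uses illustrative names and also shows that a tuple containing a list is rejected, since the tuple is then not hashable.

```python
# frozenset is an immutable set, so it can be an element of a set
inner_sets = [frozenset([1, 2]), frozenset([1, 3]), frozenset([2, 1])]
set_of_sets = set(inner_sets)

# frozenset([1, 2]) and frozenset([2, 1]) are equal, so only 2 elements remain
print(len(set_of_sets))                    # 2
print(frozenset([1, 2]) in set_of_sets)    # True

# A tuple is hashable only if all of its elements are hashable
try:
    bad = {(1, [2, 3])}
    tuple_with_list_ok = True
except TypeError:
    tuple_with_list_ok = False

print(tuple_with_list_ok)                  # False
```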

## Operations on sets
Python defines many operations on sets. Most operations come in two forms: as a method, and as an overload of mathematical operators.


For a description of the operations on sets in Python 3.6, look [here](https://docs.python.org/3.6/tutorial/datastructures.html#sets).
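The two forms are not fully interchangeable. The method form accepts any iterable as its argument, while the operator form requires both operands to be sets. A small sketch with an illustrative set `S`:

```python
S = {0, 1, 2}        # illustrative set

# Method form: the argument may be any iterable, here a list
print(S.union([2, 3]) == {0, 1, 2, 3})   # True

# Operator form: both operands must be sets
try:
    S | [2, 3]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

print(operator_accepts_list)             # False
```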

In [12]:
A=set(range(0,3))   # all integers between 0 and 2
B=set(range(0,6,2)) # even integers between 0 and 4
C=set(range(0,6))   # all integers between 0 and 5
'A=',A,'B=',B,'C=',C

('A=', {0, 1, 2}, 'B=', {0, 2, 4}, 'C=', {0, 1, 2, 3, 4, 5})

In [13]:
## Checking if an element is in a set:
1 in A, 1 in B, 3 not in B

(True, False, True)

In [14]:
A.issubset(C), A<=C

(True, True)

In [15]:
C.issuperset(B),C>=B

(True, True)
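The operators `<` and `>` test for a proper subset or superset, meaning the two sets must also differ. Unlike numbers, two sets can be incomparable: neither is a subset of the other. A short sketch with illustrative names `small`, `big`, `left` and `right`:

```python
small = {0, 1, 2}
big = {0, 1, 2, 3, 4, 5}

print(small < big)      # True: proper subset
print(small <= small)   # True: every set is a subset of itself
print(small < small)    # False: but not a proper subset of itself

# Sets are only partially ordered: these two are incomparable
left, right = {1, 2}, {2, 3}
print(left <= right, left >= right)    # False False

# isdisjoint checks that two sets share no element
print({1, 2}.isdisjoint({3, 4}))       # True
```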

In [16]:
A.union(B),A | B

({0, 1, 2, 4}, {0, 1, 2, 4})

In [17]:
A.intersection(B), A&B

({0, 2}, {0, 2})

In [18]:
# The difference between A and B contains all elements that are in A but not in B
A.difference(B), A-B

({1}, {1})

In [19]:
# The symmetric difference contains all elements that are in one of the two sets, but not in both
A.symmetric_difference(B), A^B

({1, 4}, {1, 4})
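The operations above are related by simple identities, which can be checked directly. The sketch rebuilds `A` and `B` with the same values as above and uses an illustrative copy `S` to show the in-place operator forms.

```python
A = set(range(0, 3))      # {0, 1, 2}
B = set(range(0, 6, 2))   # {0, 2, 4}

# Symmetric difference expressed through the other operations
print(A ^ B == (A | B) - (A & B))   # True
print(A ^ B == (A - B) | (B - A))   # True

# Each operator also has an in-place form that modifies the left operand
S = set(A)     # copy, so that A itself is unchanged
S |= B         # same as S.update(B)
S -= {0}       # same as S.difference_update({0})
print(sorted(S))                    # [1, 2, 4]
```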

### Finding the primes
The following program finds the prime numbers between $2$ and $k$ (excluding $k$ itself).

It starts with the set of all integers from `2` to `k-1`, which is called `I`.

It then removes all multiples of `2`, all multiples of `3`, all multiples of `4`, etc., ending with the multiples of $\lfloor\sqrt{k}\rfloor$. Each number `j` itself is kept; only `2j`, `3j`, ... are removed.

It does so by using the in-place set subtraction operation `A-=B`.

It is enough to remove multiples of numbers up to $\sqrt{k}$ because any non-prime number smaller than $k$ has at least one factor greater than $1$ that is smaller than $\sqrt{k}$.

* Prove by contradiction.


In [21]:
from math import sqrt
k=100
I=set(range(2,k))
print('start, remaining=%d'%len(I))

for j in range(2,int(sqrt(k))+1):
    I-=set(range(2*j,k,j))
    print('iteration=%d, remaining=%d'%(j,len(I)))

start, remaining=98
iteration=2, remaining=50
iteration=3, remaining=34
iteration=4, remaining=34
iteration=5, remaining=28
iteration=6, remaining=28
iteration=7, remaining=25
iteration=8, remaining=25
iteration=9, remaining=25
iteration=10, remaining=25


In [22]:
','.join(str(i) for i in sorted(I))  # sort explicitly: set order is not guaranteed

'2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97'
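One way to gain confidence in the result is to compare it with plain trial division. The sketch below wraps the same set-subtraction loop in a hypothetical function `primes_below` and checks it against a hypothetical helper `is_prime`; neither name is part of the original notebook. It uses `math.isqrt`, which assumes Python 3.8 or later.

```python
from math import isqrt

def primes_below(k):
    """Primes p with 2 <= p < k, found by the set-subtraction sieve."""
    candidates = set(range(2, k))
    for j in range(2, isqrt(k) + 1):
        # Remove 2j, 3j, ... but keep j itself
        candidates -= set(range(2 * j, k, j))
    return candidates

def is_prime(n):
    """Trial division by every d with d*d <= n."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

check = {n for n in range(2, 100) if is_prime(n)}
print(primes_below(100) == check)   # True
print(len(primes_below(100)))       # 25
```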

### Computing the Cartesian product


In [23]:
A=set(['a','b','c'])
B=set([1,2])

In [24]:
C=set()
for x in A:
    for y in B:
        C.add((x,y))
C

{('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)}
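The nested loop can also be written as a set comprehension, or with `itertools.product` from the standard library. A minimal sketch that rebuilds `A` and `B` with the same values and checks that both forms give the same set, whose size is `len(A) * len(B)`:

```python
from itertools import product

A = {'a', 'b', 'c'}
B = {1, 2}

# Set comprehension: same nested loops, written in one expression
C_comprehension = {(x, y) for x in A for y in B}

# itertools.product yields the same pairs as tuples
C_product = set(product(A, B))

print(C_comprehension == C_product)          # True
print(len(C_product) == len(A) * len(B))     # True
```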