# Set Theory in Python

* unordered
* unique
* mutable, but can hold only immutable (hashable) elements
* elements are case-sensitive, e.g. 'apple' and 'Apple' are treated as different elements
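
The properties listed above can be checked directly. This is a minimal sketch; the variable names (`fruits`, `unique_count`, `list_accepted`) are illustrative and not part of the original notebook.

```python
# Duplicates collapse, but 'apple' and 'Apple' differ by case and both remain.
fruits = {'apple', 'Apple', 'apple'}
unique_count = len(fruits)

# Sets are mutable: elements can be added after creation.
fruits.add('mango')

# Elements must be hashable; a list is mutable and is rejected.
try:
    fruits.add(['kiwi', 'plum'])
    list_accepted = True
except TypeError:
    list_accepted = False

# A tuple of strings is immutable and hashable, so it is accepted.
fruits.add(('kiwi', 'plum'))

print(unique_count, list_accepted, len(fruits))
```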

**1. Defining a Set**

In [None]:
A = {1, 2, 3}
B = {4, 5, 6}

print("A :: " , A)
print("B :: " , B)

In [None]:
# B : vowels in English Alphabet
B = {'a', 'e', 'i', 'o', 'u'}

In [None]:
# C : prime numbers less than 15
C = {2, 3, 5, 7, 11}

In [None]:
# D : basic colors
D = {'red', 'blue', 'green', 'Green'}
print(D) # Case-sensitive : prints both 'green' & 'Green'

**Complement**
* The complement of a set A contains every element of the universal set U that is not in A
* In the example below, U is the universal set, A is a subset of U, and A_ is the complement of A, so U = A union A_

In [None]:
#Universal Set : 1 - 10
U = {1,2,3,4,5,6,7,8,9,10}

In [None]:
#A - Set of even numbers between 1 and 10
A = {2, 4, 6, 8, 10}

In [None]:
# complement of A from the Universal Set : A_
A_ = U - A
print(A_)

In [None]:
B = A.union(A_)

In [None]:
print(" U :: ", U)
print(" A :: ", A)
print(" A_ :: ", A_)
print(" B :: ", B)
print(" Checking whether U is Equal to B :: ", (U == B))

**Cardinality - Size of the Set**

In [None]:
list_of_countries = {'India', 'Pakistan', 'Srilanka', 'Bangladesh'}
print("Cardinality/Size of the Set :: ", len(list_of_countries))

**Equipotent (Equivalent) sets**
* Sets having the same cardinality; the elements themselves may differ

**Equal or Identical sets**
* Sets having exactly the same elements (and therefore the same cardinality) - order doesn't matter in sets

In [None]:
first = {1,2,3,4}
second = {'red', 'blue', 'black'}
third = {5,6,7,8}
fourth = {'blue', 'red', 'black'}

print("Checking whether first is Equal to second :: ", (first == second))
print("Checking whether first is Equal to third :: ", (first == third))
print("Checking whether first is Equal to fourth :: ", (first == fourth))
print("Checking whether second is Equal to third :: ", (second == third))
print("Checking whether second is Equal to fourth :: ", (second == fourth)) # Equal sets; first and third are equipotent (same size) but not equal
print("Checking whether third is Equal to fourth :: ", (third == fourth))


In [None]:
empty1 = set() # {} would create an empty dict, not an empty set
empty2 = set()
print("Checking whether empty1 is Equal to empty2 :: ", (empty1 == empty2))
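
Equality is built into Python through `==`, but there is no built-in test for equal cardinality. A minimal sketch, assuming a hypothetical helper named `are_equipotent` that simply compares sizes:

```python
def are_equipotent(s, t):
    """Same cardinality; the elements themselves may differ."""
    return len(s) == len(t)

first = {1, 2, 3, 4}
second = {'red', 'blue', 'black'}
third = {5, 6, 7, 8}
fourth = {'blue', 'red', 'black'}

# Same size, different elements: equipotent but not equal.
print(are_equipotent(first, third), first == third)
# Same elements: equal, and therefore also equipotent.
print(are_equipotent(second, fourth), second == fourth)
# Different sizes: neither equipotent nor equal.
print(are_equipotent(first, second), first == second)
```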

**Alternative ways of creating a Set in Python**
* **set()** - builds a Set from any iterable
* **frozenset** - Immutable Set; being hashable, it can be added to another Set as an element

In [None]:
asian_countries = {'India', 'Pakistan', 'Srilanka', 'Bangladesh'}
australian_countries = set({'Australia', 'New Zealand'}) # another way of representing Set
countries = {'America', frozenset(asian_countries), frozenset(australian_countries)}
print("Asian countries :: ", asian_countries)
print("Australian countries :: ", australian_countries)
print('Countries :: ', countries)

**Checking Whether an Element Exists**
* the **in** operator returns a boolean value based on the element's existence in the set

In [None]:
sports = {'cricket', 'football', 'baseball'}

print("Does cricket exist in the set ? " ,("cricket" in sports))
print("Does football exist in the set ? ", ("football" in sports))
print("Does basketball exist in the set ? ", ("basketball" in sports))

**Singleton Set**
* A set having exactly one element

In [None]:
a = {'black'}
b = {'rock'}
print( "Example for Singleton Sets :: ", a, b)

**Subset**
* A is a subset of B when every element of A is also contained in B (B is then a Superset of A)
* **Improper Subset or Trivial subset** - the Subset and the Superset have exactly the same elements. Subset == Superset
* **Proper Subset** - the Subset has fewer elements than the Superset. Subset < Superset
* A set is an **improper** subset of itself
* Operators : **<=** (subset), **<** (proper subset)

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake'}
pet = {'cat', 'dog'}

print("Checking whether reptiles is the subset of animals :: ", (reptiles.issubset(animals)))
print("Checking whether domestic is the subset of animals :: ", (domestic.issubset(animals)))
print("Checking whether domestic is the subset of animals :: ", (domestic <= animals)) # alternative way of checking is subset
print("Checking whether wild is the subset of animals :: ", (wild.issubset(animals)))
print("Checking whether domestic is the subset of pet :: ", (domestic.issubset(pet))) # Trivial (improper) subset: domestic and pet are equal
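
The difference between a subset and a proper subset shows up in the `<=` and `<` operators. A short sketch; `mammals` is a hypothetical set added only for this illustration:

```python
domestic = {'dog', 'cat'}
pet = {'cat', 'dog'}
mammals = {'dog', 'cat', 'lion'}  # hypothetical, not in the original example

subset_of_equal = domestic <= pet      # every element of domestic is in pet
proper_of_equal = domestic < pet       # the sets are equal, so not a proper subset
proper_of_larger = domestic < mammals  # mammals has one extra element

print(subset_of_equal, proper_of_equal, proper_of_larger)
```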



**Superset**
* A Superset contains all elements of the Subset
* **Improper Superset** - the Superset and the Subset have exactly the same elements. Superset == Subset
* **Proper Superset** - the Superset has more elements than the Subset. Superset > Subset
* A set is an **improper** Superset of itself
* Operators : **>=** (superset), **>** (proper superset)

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake'}
pet = {'cat', 'dog'}

print("Checking whether animals is the superset of reptiles :: ", (animals.issuperset(reptiles)))
print("Checking whether animals is the superset of domestic :: ", (animals.issuperset(domestic)))
print("Checking whether animals is the superset of domestic :: ", (animals >= domestic)) # alternative way of checking is superset
print("Checking whether animals is the superset of wild :: ", (animals.issuperset(wild)))
print("Checking whether domestic is the superset of pet :: ", (domestic.issuperset(pet))) # Improper superset: domestic and pet are equal

**Powerset**
* The set of all possible subsets of a set; a set with n elements has 2 ** n subsets
* The powerset of the empty set contains only the empty set -> {∅}, so its cardinality is 1

In [None]:
from itertools import combinations
continents = {'Asia', 'Africa', 'Australia', 'Europe'}
total_count = len(continents)

result = []
result.append(set()) # adding empty set
for i in range(0, total_count):
    # all subsets with exactly i + 1 elements
    for j in combinations(continents, i + 1):
        result.append(set(j))

print("Expected combinations count :: ", (2 ** len(continents)))
print("Printing the total number of combinations :: ", (len(result)))
print("Printing the combinations ")
for r in result:
    print(r, len(r))
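
The same idea can be packaged as a reusable function. The sketch below is one possible formulation, not the only one: it chains `combinations` over every subset size (including 0) and stores each subset as a `frozenset`, so that the powerset itself can be a set. The function name `powerset` is an assumption of this example.

```python
from itertools import chain, combinations

def powerset(items):
    """Return all subsets of `items` as a set of frozensets."""
    elements = list(items)
    subsets = chain.from_iterable(
        combinations(elements, size) for size in range(len(elements) + 1)
    )
    return {frozenset(subset) for subset in subsets}

continents = {'Asia', 'Africa', 'Australia', 'Europe'}
all_subsets = powerset(continents)

print(len(all_subsets), 2 ** len(continents))
print(len(powerset(set())))  # the powerset of the empty set holds one element
```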

**Union**
* All elements of both sets appear in the result
* **But each element appears only once**
* **|** operator can be used

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}

print("Printing the animals union of reptiles :: ", animals.union(reptiles))
print("Printing the animals union of domestic :: ", animals.union(domestic))
print("Printing the animals union of domestic :: ", (animals | domestic)) # using | operator
print("Printing the reptiles union of domestic :: ", reptiles.union(domestic))

**Intersection**
* Only the elements that exist in both sets appear in the result
* **&** operator can be used

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}

print("Printing the animals intersection of reptiles :: ", animals.intersection(reptiles))
print("Printing the animals intersection of domestic :: ", animals.intersection(domestic))
print("Printing the animals intersection of domestic :: ", animals & domestic) # using & operator
print("Printing the reptiles intersection of domestic :: ", reptiles.intersection(domestic))

**Difference**
* Elements of the first set that are not in the second set
* **-** operator can be used

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}

print("Printing the animals difference of reptiles :: ", animals.difference(reptiles))
print("Printing the animals difference of domestic :: ", animals.difference(domestic))
print("Printing the reptiles difference of domestic :: ", reptiles.difference(domestic))

**Symmetric Difference**
* Returns the elements that are in exactly one of the two sets (the non-common elements)
* **^** operator can be used

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}

diff1_animals = animals - reptiles
diff1_reptiles = reptiles - animals
diff2_animals = animals - domestic
diff2_domestic =  domestic - animals
diff3_reptiles = reptiles - domestic
diff3_domestic = domestic - reptiles

diff1 = diff1_animals.union(diff1_reptiles)
diff2 = diff2_animals.union(diff2_domestic)
diff3 = diff3_reptiles.union(diff3_domestic)

print("Validating Symmetric Difference by computing the difference in both directions manually and taking the union of the two differences")
print("Animal + Reptiles :: ", diff1, animals.symmetric_difference(reptiles), " ----- ", (diff1 ==  animals.symmetric_difference(reptiles) ) )
print("Animal + Domestic :: ", diff2,  animals.symmetric_difference(domestic), " ----- ", (diff2 ==  animals.symmetric_difference(domestic) ) )
print("Reptiles + Domestic :: ", diff3, reptiles.symmetric_difference(domestic), " ----- ", (diff3 ==  reptiles.symmetric_difference(domestic) ) )
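
Both operations also have operator forms: `-` for difference and `^` for symmetric difference. The sketch below additionally illustrates that difference depends on operand order while symmetric difference does not.

```python
animals = {'lion', 'cat', 'whale', 'snake'}
domestic = {'dog', 'cat'}

only_animals = animals - domestic   # elements of animals not in domestic
only_domestic = domestic - animals  # elements of domestic not in animals
either_not_both = animals ^ domestic

print(only_animals)
print(only_domestic)
print(either_not_both)
```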

**Disjoint and Non-Disjoint**
* **Disjoint** - No common element between two sets
* **Non-Disjoint** - At least one common element between two sets

In [None]:
animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}


print("Printing the animals and reptiles are Disjoint/Non-disjoint :: ", "Disjoint " if animals.isdisjoint(reptiles) else "Non-disjoint " )
print("Printing the animals and domestic are Disjoint/Non-disjoint :: ", "Disjoint " if animals.isdisjoint(domestic) else "Non-disjoint " )
print("Printing the reptiles and domestic are Disjoint/Non-disjoint :: ", "Disjoint " if reptiles.isdisjoint(domestic) else "Non-disjoint " )


print("Printing the animals and reptiles are Disjoint/Non-disjoint :: ", "Non-disjoint " if animals.intersection(reptiles) else "Disjoint " )
print("Printing the animals and domestic are Disjoint/Non-disjoint :: ", "Non-disjoint " if animals.intersection(domestic) else "Disjoint " )
print("Printing the reptiles and domestic are Disjoint/Non-disjoint :: ", "Non-disjoint " if reptiles.intersection(domestic) else "Disjoint " )


**Set Comprehension**
* set can be created using set comprehension

In [None]:
squares = {x ** 2 for x in range(1, 6)}
print(" Set with squares : ", squares)

even = {x for x  in range(1, 10) if x%2==0}
print(" Set with even : ", even)


**Cartesian Product**
* returns all possible ordered pairs (x, y), with x taken from the first set and y from the second

In [None]:
import itertools

a = {'a','b','c','d'}
b = {'z', 'y', 'x'}

elements = list(itertools.product(a, b))

print("Printing the total number of pairs :: ", (len(elements)))
print("Printing the pairs ")
for r in elements:
    print(r)
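
Because the pairs are ordered, A x B and B x A are different sets even though they have the same size. A small sketch with two illustrative sets:

```python
from itertools import product

a = {'a', 'b'}
b = {'x', 'y', 'z'}

a_x_b = set(product(a, b))
b_x_a = set(product(b, a))

# The size of the product is the product of the sizes.
print(len(a_x_b), len(a) * len(b))
# ('a', 'x') belongs to A x B, while ('x', 'a') belongs to B x A.
print(('a', 'x') in a_x_b, ('x', 'a') in a_x_b)
print(a_x_b == b_x_a)
```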


In [None]:
#Initializing the set
a = {'P', 'Y', 'T', 'H', 'O', 'N'}
b = set()
print("A :: ", a, " Size ::" , len(a))
print("B :: ", b, " Size ::" , len(b))


print("-------------------------------")
# adding element to the existing Set
a.add('3')
b.add('3')

print("After adding Element")
print("A :: ", a, " Size ::" , len(a))
print("B :: ", b, " Size ::" , len(b))


print("-------------------------------")
# removing an element from the existing Set
try:
    a.remove('Y')
    print("Removed Y from :: ", a)
except KeyError:
    print("Error in removing Y from :: ", a)

# removing an element from the existing Set - remove() raises KeyError if the element does not exist
try:
    b.remove('Y')
    print("Removed Y from :: ", b)
except KeyError:
    print("Error in removing Y from :: ", b)


print("After removing Element Y")
print("A :: ", a, " Size ::" , len(a))
print("B :: ", b, " Size ::" , len(b))

print("-------------------------------")
# discarding an element from the existing Set - discard() does not raise KeyError if the element does not exist
a.discard('T')
b.discard('T')

print("After discarding Element T")
print("A :: ", a, " Size ::" , len(a))
print("B :: ", b, " Size ::" , len(b))

print("-------------------------------")
# popping an arbitrary element from the existing Set - pop() raises KeyError if the Set is empty
a.pop()
b.pop()

try:
    a.pop()
    print("Popped arbitrary element :: ", a)
except KeyError:
    print("Error in popping element from :: ", a)

try:
    b.pop()
    print("Popped arbitrary element :: ", b)
except KeyError:
    print("Error in popping element from :: ", b)

print("After popping arbitrary element")
print("A :: ", a, " Size ::" , len(a))
print("B :: ", b, " Size ::" , len(b))

print("-------------------------------")
# clearing all elements of the Set
a.clear()
b.clear()

print("After applying clear")
print("A :: ", a, " Size ::" , len(a))
print("B :: ", b, " Size ::" , len(b))
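
Besides `add`, `remove`, `discard`, `pop` and `clear`, a set can be changed in bulk. The sketch below uses `update` and the in-place operators `|=`, `-=` and `&=`; the set `letters` is illustrative only.

```python
letters = {'P', 'Y', 'T'}

letters.update(['H', 'O'])        # add every element of an iterable
letters |= {'N'}                  # in-place union
letters -= {'Y'}                  # in-place difference
letters &= {'P', 'T', 'H', 'X'}   # in-place intersection: keep only shared elements

print(letters, len(letters))
```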



In [None]:
#pip install venny4py matplotlib
import matplotlib.pyplot as plt  # imported explicitly because plt.show() is called below
from venny4py.venny4py import *


animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake', 'tiger'}
pet = {'cat', 'dog'}

#dict of sets
sets = {
    'Animals': animals,
    'Wild': wild}
    
venny4py(sets=sets)
plt.show()

In [None]:
#pip install matplotlib_venn
import matplotlib.pyplot as plt  # imported explicitly because plt.show() is called below
from matplotlib_venn import venn2

animals = {'lion', 'cat', 'whale', 'snake'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake', 'tiger'}
pet = {'cat', 'dog'}

venn2([animals, reptiles])
plt.show()

## Set Theory Laws

**Identity Laws**
* **Union** with the Empty set or **Intersection** with the Universal set leaves a Set unchanged

In [None]:
all_subjects = {'Maths', 'Science', 'English', 'Tamil', 'Moral Science', 'History', 'Geography'}
language = {'English', 'Tamil'}
science = set() # {} would create an empty dict, not an empty set

# Union with empty set
identity1 = language.union(science)

# intersection with Universal set
identity2 = language.intersection(all_subjects)

print("Universal Set :: ", all_subjects)
print("Actual Set :: ", language)
print("Identity Test 1 :: ", identity1)
print("Identity Test 2 :: ", identity2)

**Idempotent Laws**
* The **Union** or **Intersection** of a Set with itself gives the same Set

In [None]:
vehicle = {'car', 'bike', 'van'}

# Union with itself
identity1 = vehicle.union(vehicle)

# intersection with itself
identity2 = vehicle.intersection(vehicle)

print("Actual Set :: ", vehicle)
print("Identity Test 1 :: ", identity1)
print("Identity Test 2 :: ", identity2)

**Domination Laws**
* The **Union** of a Set with the Universal set is the Universal set, and its **Intersection** with the Empty set is the Empty set: the Set gets swallowed up completely

In [None]:
earth_known_metals = {'steel', 'copper', 'silver', 'aluminium', 'gold', 'platinum'} 
ornament_metals = {'gold', 'silver', 'platinum'}
mars_known_metals = set() # {} would create an empty dict, not an empty set

# Union with Universal set :: result is the Universal set
identity1 = ornament_metals.union(earth_known_metals)

# intersection with empty set :: result is the Empty set
identity2 = ornament_metals.intersection(mars_known_metals)

print("Universal Set :: ", earth_known_metals)
print("Actual Set :: ", ornament_metals)
print("Identity Test 1 :: ", identity1)
print("Identity Test 2 :: ", identity2)

**Complementation Laws**
* The **Union** of a Set with its complement is the Universal set
* The **Intersection** of a Set with its complement is the Empty set

In [None]:
numbers_1_to_10 = {1,2,3,4,5,6,7,8,9,10} 
even_numbers = {2,4,6,8,10}

even_numbers_c = numbers_1_to_10 - even_numbers


# Union with Complement :: Result will be the Universal set
set1 = even_numbers.union(even_numbers_c)

# intersection with Complement :: Result will be the Empty set
set2 = even_numbers.intersection(even_numbers_c)

print("Universal Set :: ", numbers_1_to_10)
print("Actual Set :: ", even_numbers)
print("Actual Set's complement :: ", even_numbers_c)
print("Set 1 :: ", set1)
print("Set 2 :: ", set2)

**Commutative Laws**
* The **Union** or **Intersection** of two sets gives the same result regardless of the order of the operands

In [None]:
food_ingredients = {'rice', 'vegetable'} 
added_ingredients = {'salt', 'pepper'}

# food_ingredients union with added_ingredients
set1 = food_ingredients.union(added_ingredients)

# added_ingredients union with food_ingredients
set2 = added_ingredients.union(food_ingredients)

print("Actual Set 1 :: ", food_ingredients)
print("Actual Set 2 :: ", added_ingredients)
print("Set 1 :: ", set1)
print("Set 2 :: ", set2)
print("Are Set 1 and Set 2 the same ? :: ", (set1 == set2))


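**Distributive Laws**
* Union distributes over intersection, and intersection distributes over union
    - A union (B intersect C) = (A union B) intersect (A union C)
    - A intersect (B union C) = (A intersect B) union (A intersect C)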

In [None]:
spacecraft_metals = {'titanium', 'aluminium'} 
ornament_metals = {'gold', 'silver', 'platinum'}
tools_metals = {'aluminium', 'steel'}

# A Unions (B Intersects C) = (A Unions B) Intersects (A Unions C)
# spacecraft_metals union with (ornament_metals intersects tools_metals) 
#              = (spacecraft_metals union ornament_metals) intersects (spacecraft_metals union tools_metals)
set1_left = spacecraft_metals.union(ornament_metals.intersection(tools_metals))
set1_right = (spacecraft_metals.union(ornament_metals)).intersection((spacecraft_metals.union(tools_metals)))


# A Intersects (B Unions C) = (A Intersects B) Unions (A Intersects C)
# spacecraft_metals intersects (ornament_metals union tools_metals) 
#              = (spacecraft_metals intersects ornament_metals) union (spacecraft_metals intersects tools_metals)
set2_left = spacecraft_metals.intersection(ornament_metals.union(tools_metals))
set2_right = (spacecraft_metals.intersection(ornament_metals)).union((spacecraft_metals.intersection(tools_metals)))


print("Actual Set 1 Left :: ", set1_left)
print("Actual Set 1 Right :: ", set1_right)
print("Actual Set 2 Left :: ", set2_left)
print("Actual Set 2 Right :: ", set2_right)

print("Are Set 1 Left and Set 1 Right the same ? :: ", (set1_left == set1_right))
print("Are Set 2 Left and Set 2 Right the same ? :: ", (set2_left == set2_right))


**Absorption Laws**
* works as below
    - A union (A intersect B) = A
    - A intersect (A union B) = A

In [None]:
spacecraft_metals = {'titanium', 'aluminium'} 
ornament_metals = {'gold', 'silver', 'platinum'}

# A Unions (A Intersects B) = A
# spacecraft_metals union with (spacecraft_metals intersects ornament_metals) = A
set1_left = spacecraft_metals.union(spacecraft_metals.intersection(ornament_metals))
set1_right = spacecraft_metals


# A Intersects (A Union B) = A
# spacecraft_metals intersects with (spacecraft_metals union ornament_metals) = A
set2_left = spacecraft_metals.intersection(spacecraft_metals.union(ornament_metals))
set2_right = spacecraft_metals


print("Actual Set 1 Left :: ", set1_left)
print("Actual Set 1 Right :: ", set1_right)
print("Actual Set 2 Left :: ", set2_left)
print("Actual Set 2 Right :: ", set2_right)

print("Are Set 1 Left and Set 1 Right the same ? :: ", (set1_left == set1_right))
print("Are Set 2 Left and Set 2 Right the same ? :: ", (set2_left == set2_right))


**Associative Laws**
* Set works same as
    - A union (B union C) = (A union B) union C
    - A intersect (B intersect C) = (A intersect B) intersect C

In [None]:
spacecraft_metals = {'titanium', 'aluminium'} 
ornament_metals = {'gold', 'silver', 'platinum'}
tools_metals = {'aluminium', 'steel'}

# A Unions (B Unions C) = (A Unions B) Unions C
# spacecraft_metals union with (ornament_metals union with tools_metals) 
#     = (spacecraft_metals union with ornament_metals) union with tools_metals

set1_left = spacecraft_metals.union(ornament_metals.union(tools_metals))
set1_right = spacecraft_metals.union(ornament_metals).union(tools_metals)


# A Intersects (B Intersects C) = (A Intersects B) Intersects C
# spacecraft_metals intersects with (ornament_metals intersects with tools_metals) 
#     = (spacecraft_metals intersects with ornament_metals) intersects with tools_metals
set2_left = spacecraft_metals.intersection(ornament_metals.intersection(tools_metals))
set2_right = spacecraft_metals.intersection(ornament_metals).intersection(tools_metals)


print("Actual Set 1 Left :: ", set1_left)
print("Actual Set 1 Right :: ", set1_right)
print("Actual Set 2 Left :: ", set2_left)
print("Actual Set 2 Right :: ", set2_right)

print("Are Set 1 Left and Set 1 Right the same ? :: ", (set1_left == set1_right))
print("Are Set 2 Left and Set 2 Right the same ? :: ", (set2_left == set2_right))


**De Morgan's Law**
* works as below
    - (A union B)' = A' intersect B'
    - (A intersect B)' = A' union B'

In [None]:
earth_known_metals = {'steel', 'copper', 'silver', 'aluminium', 'titanium'} 
spacecraft_metals = {'titanium', 'aluminium'} 
tools_metals = {'aluminium', 'steel'}

# (A Unions B)' = A' Intersect B'
# earth_known_metals - (spacecraft_metals union with tools_metals) 
#     = (earth_known_metals - spacecraft_metals) intersects with (earth_known_metals - tools_metals)
set1_left = earth_known_metals - (spacecraft_metals.union(tools_metals))
set1_right = (earth_known_metals - spacecraft_metals).intersection(earth_known_metals - tools_metals)

# (A Intersect B)' = A' Unions B'
# earth_known_metals - (spacecraft_metals intersects with tools_metals) 
#     = (earth_known_metals - spacecraft_metals) union with (earth_known_metals - tools_metals)
set2_left = earth_known_metals - (spacecraft_metals.intersection(tools_metals))
set2_right = (earth_known_metals - spacecraft_metals).union(earth_known_metals - tools_metals)

print("Actual Set 1 Left :: ", set1_left)
print("Actual Set 1 Right :: ", set1_right)
print("Actual Set 2 Left :: ", set2_left)
print("Actual Set 2 Right :: ", set2_right)

print("Are Set 1 Left and Set 1 Right the same ? :: ", (set1_left == set1_right))
print("Are Set 2 Left and Set 2 Right the same ? :: ", (set2_left == set2_right))


In [None]:
earth_known_metals = {'steel', 'copper', 'silver', 'aluminium', 'titanium'} 
spacecraft_metals = {'titanium', 'aluminium'} 
tools_metals = {'aluminium', 'steel'}
liquid_metals = {'mercury', 'hydrogen'}

# (A Unions B Unions C)' = A' Intersect B' Intersect C'
# earth_known_metals - (spacecraft_metals union with tools_metals union with liquid_metals) 
#     = (earth_known_metals - spacecraft_metals) intersects with (earth_known_metals - tools_metals) 
#        intersects with (earth_known_metals - liquid_metals)
set1_left = earth_known_metals - (spacecraft_metals.union(tools_metals).union(liquid_metals))
set1_right = (earth_known_metals - spacecraft_metals).intersection(earth_known_metals - tools_metals).intersection(earth_known_metals - liquid_metals)

# (A Intersect B Intersect C)' = A' Unions B' Unions C'
# earth_known_metals - (spacecraft_metals intersects with tools_metals intersects with liquid_metals) 
#     = (earth_known_metals - spacecraft_metals) unions with (earth_known_metals - tools_metals) 
#        unions with (earth_known_metals - liquid_metals)
set2_left = earth_known_metals - (spacecraft_metals.intersection(tools_metals).intersection(liquid_metals))
set2_right = (earth_known_metals - spacecraft_metals).union(earth_known_metals - tools_metals).union(earth_known_metals - liquid_metals)

print("Actual Set 1 Left :: ", set1_left)
print("Actual Set 1 Right :: ", set1_right)
print("Actual Set 2 Left :: ", set2_left)
print("Actual Set 2 Right :: ", set2_right)

print("Are Set 1 Left and Set 1 Right the same ? :: ", (set1_left == set1_right))
print("Are Set 2 Left and Set 2 Right the same ? :: ", (set2_left == set2_right))


**Double Negation Law**
* The complement of the complement of a Set is the original Set

In [None]:
earth_known_metals = {'steel', 'copper', 'silver', 'aluminium', 'titanium', 'mercury','hydrogen'} 
liquid_metals = {'mercury', 'hydrogen'}

# ((A)')' = A
# earth_known_metals - (earth_known_metals -  liquid_metals) = liquid_metals
set1_left = (earth_known_metals - (earth_known_metals - liquid_metals))
set1_right = liquid_metals

print("Actual Set 1 Left :: ", set1_left)
print("Actual Set 1 Right :: ", set1_right)

print("Are Set 1 Left and Set 1 Right the same ? :: ", (set1_left == set1_right))
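
The cells above check each law on one hand-picked example. As a complementary sketch, the laws can also be spot-checked on many random subsets of a small universal set. The names `U`, `random_subset` and `complement` are assumptions of this example, and a passing run is evidence for the laws, not a proof.

```python
import random

random.seed(0)        # fixed seed so the run is repeatable
U = set(range(10))    # small universal set used only for this check

def random_subset(universe):
    """Keep each element of the universe with probability 0.5."""
    return {x for x in universe if random.random() < 0.5}

def complement(s):
    """Complement relative to the universal set U."""
    return U - s

all_hold = True
for _ in range(200):
    A, B, C = (random_subset(U) for _ in range(3))
    checks = [
        (A | (B & C)) == ((A | B) & (A | C)),               # distributive
        (A & (B | C)) == ((A & B) | (A & C)),               # distributive
        (A | (A & B)) == A,                                 # absorption
        (A & (A | B)) == A,                                 # absorption
        ((A | B) | C) == (A | (B | C)),                     # associative
        complement(A | B) == complement(A) & complement(B), # De Morgan
        complement(A & B) == complement(A) | complement(B), # De Morgan
        complement(complement(A)) == A,                     # double negation
    ]
    all_hold = all_hold and all(checks)

print("All laws held on the random samples :: ", all_hold)
```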

**Jaccard Similarity**
* Similarity between 2 sets, **Range : 0 - 1**
* |A intersect B| / |A union B| : size of the intersection divided by size of the union
* Used in :
  - Text analysis
  - Image processing
  - Recommendation systems
* Range definition:
  - 0.00 to 0.19 : Very low similarity
  - 0.20 to 0.39 : Low similarity
  - 0.40 to 0.59 : Moderate similarity
  - 0.60 to 0.79 : High similarity
  - 0.80 to 1.00 : Very high similarity

In [None]:
earth_known_metals = {'steel', 'copper', 'silver', 'aluminium', 'titanium', 'mercury','hydrogen'} 
liquid_metals = {'mercury', 'hydrogen'}

language = {'English', 'Tamil', 'Hindi'}
subject = {'English', 'Hindi', 'Telugu'}

a = {'a','b','c','d'}
b = {'a', 'b', 'c'}

colors_1 = {'red','blue','green'}
colors_2 = {'green', 'blue', 'red'}

set1_dividend = len(earth_known_metals.intersection(liquid_metals))
set1_divisor = len(earth_known_metals.union(liquid_metals))

print("Actual Set 1 dividend :: ", set1_dividend)
print("Actual Set 1 divisor :: ", set1_divisor)

print("How much Set 1 : sets are similar ? :: ", (set1_dividend / set1_divisor))

set2_dividend = len(language.intersection(subject))
set2_divisor = len(language.union(subject))

print("Actual Set 2 dividend :: ", set2_dividend)
print("Actual Set 2 divisor :: ", set2_divisor)

print("How much Set 2 : sets are similar ? :: ", (set2_dividend / set2_divisor))


set3_dividend = len(a.intersection(b))
set3_divisor = len(a.union(b))

print("Actual Set 3 dividend :: ", set3_dividend)
print("Actual Set 3 divisor :: ", set3_divisor)

print("How much Set 3 : sets are similar ? :: ", (set3_dividend / set3_divisor))

set4_dividend = len(colors_1.intersection(colors_2))
set4_divisor = len(colors_1.union(colors_2))

print("Actual Set 4 dividend :: ", set4_dividend)
print("Actual Set 4 divisor :: ", set4_divisor)

print("How much Set 4 : sets are similar ? :: ", (set4_dividend / set4_divisor))
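
The four computations above repeat the same two steps, so they can be folded into one helper. A minimal sketch: the function name `jaccard` and the choice to return 1.0 for two empty sets are assumptions of this example (the definition leaves that case open).

```python
def jaccard(set_a, set_b):
    """Jaccard similarity: size of the intersection over size of the union.

    Returns 1.0 when both sets are empty; that convention is an assumption.
    """
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)

pairs = {
    'metals': ({'steel', 'copper', 'silver', 'aluminium', 'titanium', 'mercury', 'hydrogen'},
               {'mercury', 'hydrogen'}),
    'languages': ({'English', 'Tamil', 'Hindi'}, {'English', 'Hindi', 'Telugu'}),
    'letters': ({'a', 'b', 'c', 'd'}, {'a', 'b', 'c'}),
    'colors': ({'red', 'blue', 'green'}, {'green', 'blue', 'red'}),
}

scores = {name: jaccard(x, y) for name, (x, y) in pairs.items()}
for name, score in scores.items():
    print(name, round(score, 3))
```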


In [None]:
#pip install scikit-learn
from sklearn.metrics import jaccard_score

# jaccard_score compares two label sequences position by position, so both must have the same length.
# To compare sets, encode each set as a 0/1 membership vector over a shared vocabulary.
language = {'English', 'Tamil', 'Hindi'}
subject = {'English', 'Hindi', 'Telugu'}

vocabulary = sorted(language | subject)
language_vector = [int(word in language) for word in vocabulary]
subject_vector = [int(word in subject) for word in vocabulary]

print("Jaccard score of the two sets (binary) :: ", jaccard_score(language_vector, subject_vector))

print("--------------------------------")
# a and b are label sequences: position i of a is compared with position i of b
a = [1,2,3,4]
b = [1,2,3,5]

print("Jaccard score (macro) :: ", jaccard_score(a, b, average='macro'))
print("Jaccard score (weighted) :: ", jaccard_score(a, b, average='weighted'))
print("Jaccard score (micro) :: ", jaccard_score(a, b, average='micro'))
# average='binary' and average='samples' do not apply to multiclass label sequences like a and b
print("Jaccard score per label (None) :: ", jaccard_score(a, b, average=None))


**Applications of Set Theory in Machine Learning**
* Text classification and sentiment analysis : sets of words or n-grams are compared with similarity measures like the Jaccard Index
* Recommender systems and rule-based systems : recommendations from similarities based on user preference, customer purchase history and activity
* Clustering and graph based machine learning : manipulate and compare graph structures
* Feature selection and data pre-processing : identifying redundant or irrelevant features, cleaning, preprocessing, etc (credit score)
* Association rule mining and social media analysis
* Anomaly detection and image segmentation
* Evaluation metrics for machine learning models

In [None]:
positive_keywords = {"great", "amazing", "excellent", "awesome", "outstanding"}
negative_keywords = {"awful", "boring", 'terrible', "poor", "disappointing"}

review1 = "The movie was great and had an amazing story"
review2 = "I found the movie to be boring and the acting was terrible"
review3 = "The movie was not bad, but the acting could be better"

# split each review on whitespace and keep the unique words
review1_set = set(review1.split())
review2_set = set(review2.split())
review3_set = set(review3.split())

print("--------------------------------------------")
# review 1

review1_positive_dividend = len(review1_set.intersection(positive_keywords))
review1_positive_divisor = len(review1_set.union(positive_keywords))

review1_negative_dividend = len(review1_set.intersection(negative_keywords))
review1_negative_divisor = len(review1_set.union(negative_keywords))

review1_positive_score = review1_positive_dividend / review1_positive_divisor
review1_negative_score = review1_negative_dividend / review1_negative_divisor

review1_sentiment = 'neutral' if ((review1_positive_score == 0 and review1_negative_score == 0) or review1_positive_score == review1_negative_score) else ("positive" if review1_positive_score > review1_negative_score else 'negative')

print("Actual Review 1 Positive dividend :: ", review1_positive_dividend)
print("Actual Review 1 Positive divisor :: ", review1_positive_divisor)

print("Actual Review 1 Negative dividend :: ", review1_negative_dividend)
print("Actual Review 1 Negative divisor :: ", review1_negative_divisor)
print("Review 1 sentiment :: ", (review1_sentiment))

print("--------------------------------------------")
# review 2

review2_positive_dividend = len(review2_set.intersection(positive_keywords))
review2_positive_divisor = len(review2_set.union(positive_keywords))

review2_negative_dividend = len(review2_set.intersection(negative_keywords))
review2_negative_divisor = len(review2_set.union(negative_keywords))

review2_positive_score = review2_positive_dividend / review2_positive_divisor
review2_negative_score = review2_negative_dividend / review2_negative_divisor

review2_sentiment = 'neutral' if ((review2_positive_score == 0 and review2_negative_score == 0) or review2_positive_score == review2_negative_score) else ("positive" if review2_positive_score > review2_negative_score else 'negative')

print("Actual Review 2 Positive dividend :: ", review2_positive_dividend)
print("Actual Review 2 Positive divisor :: ", review2_positive_divisor)

print("Actual Review 2 Negative dividend :: ", review2_negative_dividend)
print("Actual Review 2 Negative divisor :: ", review2_negative_divisor)
print("Review 2 sentiment :: ", (review2_sentiment))

print("--------------------------------------------")
# review 3

review3_positive_dividend = len(review3_set.intersection(positive_keywords))
review3_positive_divisor = len(review3_set.union(positive_keywords))

review3_negative_dividend = len(review3_set.intersection(negative_keywords))
review3_negative_divisor = len(review3_set.union(negative_keywords))

review3_positive_score = review3_positive_dividend / review3_positive_divisor
review3_negative_score = review3_negative_dividend / review3_negative_divisor

review3_sentiment = 'neutral' if ((review3_positive_score == 0 and review3_negative_score == 0) or review3_positive_score == review3_negative_score) else ("positive" if review3_positive_score > review3_negative_score else 'negative')

print("Actual Review 3 Positive dividend :: ", review3_positive_dividend)
print("Actual Review 3 Positive divisor :: ", review3_positive_divisor)

print("Actual Review 3 Negative dividend :: ", review3_negative_dividend)
print("Actual Review 3 Negative divisor :: ", review3_negative_divisor)
print("Review 3 sentiment :: ", (review3_sentiment))
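
The three review blocks above follow the same steps, so the logic can be sketched as one function. The helper names (`tokenize`, `jaccard`, `classify`) are assumptions of this example. Unlike the cell above, this sketch lowercases the words and strips punctuation from their edges, so that "Great!" matches the keyword "great"; that normalization is an addition, not part of the original approach.

```python
import string

positive_keywords = {"great", "amazing", "excellent", "awesome", "outstanding"}
negative_keywords = {"awful", "boring", "terrible", "poor", "disappointing"}

def tokenize(text):
    """Lowercase the words and strip punctuation from their edges."""
    words = {w.strip(string.punctuation).lower() for w in text.split()}
    return words - {''}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(review):
    """Compare the review's overlap with the positive and negative keyword sets."""
    words = tokenize(review)
    positive = jaccard(words, positive_keywords)
    negative = jaccard(words, negative_keywords)
    if positive == negative:
        return 'neutral'
    return 'positive' if positive > negative else 'negative'

reviews = [
    "The movie was great and had an amazing story",
    "I found the movie to be boring and the acting was terrible",
    "The movie was not bad, but the acting could be better",
]
sentiments = [classify(r) for r in reviews]
print(sentiments)
```

A keyword-overlap score like this ignores negation ("not bad") and word order, which is why the third review comes out as neutral.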




**Dice Coefficient**
* Similarity between 2 sets, **Range : 0 - 1**
* 2 * |A intersect B| / (|A| + |B|)
* measures the extent to which two sets overlap - higher values indicate greater similarity
* Used in :
  - text classification
  - information retrieval
  - natural language processing
  - Recommendation systems
  - clustering

* Range definition:
  - 0 : No similarity
  - 0 to 0.5 : Low similarity
  - 0.5 : Moderate similarity
  - 0.5 to 1 : High similarity
  - 1 : Identical

In [None]:
# Dice coefficient computed directly with set operators; the sets may have different sizes
language = {'English', 'Tamil', 'Hindi'}
subject = {'English', 'Hindi', 'Telugu'}

a = {1,2,3,4}
b = {1,2,4,5}
c = {0,2,5}
d = {9, 8}

a_b = 2 * len((a & b)) / (len(a) + len(b))
b_c = 2 * len((b & c)) / (len(b) + len(c))
a_c = 2 * len((a & c)) / (len(a) + len(c))
a_d = 2 * len((a & d)) / (len(a) + len(d))

print("Dice coefficient of a and b :: ", a_b)
print("Dice coefficient of b and c :: ", b_c)
print("Dice coefficient of a and c :: ", a_c)
print("Dice coefficient of a and d :: ", a_d)

print("--------------------------------")
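
The Dice coefficient and the Jaccard similarity are closely related. Since |A| + |B| = |A union B| + |A intersect B|, writing J for the Jaccard similarity gives Dice = 2J / (1 + J). The sketch below checks this numerically for one pair of sets; the helper names are illustrative.

```python
import math

def jaccard(x, y):
    return len(x & y) / len(x | y)

def dice(x, y):
    return 2 * len(x & y) / (len(x) + len(y))

a = {1, 2, 3, 4}
b = {1, 2, 4, 5}

j = jaccard(a, b)             # 3 shared elements out of 5 in the union
d = dice(a, b)                # 2 * 3 shared elements over 4 + 4
d_from_j = 2 * j / (1 + j)    # Dice recovered from Jaccard

print(j, d, d_from_j)
print("Relation holds :: ", math.isclose(d, d_from_j))
```

For sets that are neither disjoint nor equal, the Dice coefficient is strictly larger than the Jaccard similarity, so thresholds chosen for one measure do not transfer directly to the other.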


**Tversky index**
* A more general similarity measure that reduces to Jaccard similarity or the Dice coefficient by adjusting its parameters
* Original (asymmetric) form : |A intersect B| / (|A intersect B| + alpha * |A - B| + beta * |B - A|)
* alpha and beta weight the elements found in only one of the two sets, which provides a way to penalize false positives and false negatives differently
* Parameter choices:
  - Original form (Wiki : https://en.wikipedia.org/wiki/Tversky_index) : alpha=1, beta=1 gives Jaccard similarity; alpha=0.5, beta=0.5 gives the Dice coefficient
  - Symmetric variant implemented in the code below : **Jaccard (alpha=0.5, beta=2.0)** and **Sørensen-Dice (alpha=0.5, beta=1.0)**
* Example
  - In a recommendation system, false negatives (items the user may like but that were not shown) may be more costly than false positives (items that were shown but that the user did not like)

-----------------

* Reference: https://www.akkio.com/post/precision-vs-recall-how-to-use-precision-and-recall-in-machine-learning-complete-guide#:~:text=They%20are%20both%20related%20to,correctly%20identified%20by%20the%20model.
* A spam detection classifier is given a data set of 20 messages, 13 of which are spam. It flags 9 messages as spam, of which only 4 are actual spam and the remaining 5 are not spam.
  - **Precision** : Out of the messages flagged as spam, how many were actually spam ? 4/9
  - **Recall** : Out of all spam messages, how many were flagged ? 4/13

- Reference:: https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative#:~:text=Similarly%2C%20a%20true%20negative%20is,incorrectly%20predicts%20the%20negative%20class.
- **True positive** : model correctly predicts the positive class.
- **False positives** : model incorrectly predicts the positive class
- **False negatives** : model incorrectly predicts the negative class
- **True negatives** : model correctly predicts the negative class.
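
The spam example can be reproduced with set operations, which makes the four outcome classes explicit. This is a sketch under stated assumptions: the message ids are hypothetical and chosen only so that the counts match the example (20 messages, 13 spam, 9 flagged, 4 flagged correctly).

```python
# Hypothetical message ids 0..19; ids 0..12 are the 13 actual spam messages.
all_messages = set(range(20))
actual_spam = set(range(13))
# The classifier flags 9 messages: 4 real spam (0..3) and 5 non-spam (13..17).
flagged = {0, 1, 2, 3, 13, 14, 15, 16, 17}

true_positives = flagged & actual_spam                   # flagged and really spam
false_positives = flagged - actual_spam                  # flagged but not spam
false_negatives = actual_spam - flagged                  # spam that was missed
true_negatives = all_messages - (flagged | actual_spam)  # correctly left alone

precision = len(true_positives) / len(flagged)
recall = len(true_positives) / len(actual_spam)

print(len(true_positives), len(false_positives), len(false_negatives), len(true_negatives))
print("Precision :: ", precision, " Recall :: ", recall)
```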


In [None]:
def jaccard_similarity(set1, set2):
    a = len(set1.intersection(set2))
    b = len(set1.union(set2))
    return (a / b)

def dice_coefficient(set1, set2):
    return 2 * len((set1 & set2)) / (len(set1) + len(set2))

#Reference : https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/similarity/tokens.py
def tversky_index(set1, set2, alpha, beta):
    intersection = len(set1 & set2)

    set1_not_set2 = len(set1 - set2)
    set2_not_set1 = len(set2 - set1)
    
    a = min(set1_not_set2, set2_not_set1)
    b = max(set1_not_set2, set2_not_set1)

    try:        
        return intersection / (intersection + (beta * (alpha * a + (1 - alpha) * b)))
    except ZeroDivisionError:
        return 0.0
        
animals = {'lion', 'cat', 'whale', 'snake', 'dog'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake', 'tiger'}
pet = {'cat', 'dog', 'parrot'}

for x in [animals, reptiles, domestic, wild, pet]:
    print("-----------")
    print("processing ", animals, " and ", x)
    print("tversky_index : ", tversky_index(animals, x, alpha=0.5, beta=2.0))
    print("jaccard_similarity : ", jaccard_similarity(animals, x))

for x in [animals, reptiles, domestic, wild, pet]:
    print("-----------")
    print("processing ", animals, " and ", x)
    print("tversky_index : ", tversky_index(animals, x, alpha=0.5, beta=1.0))
    print("dice_coefficient : ", dice_coefficient(animals, x))
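
For comparison, the original asymmetric form of the Tversky index weights the two one-sided differences separately. In that form alpha=beta=1 reproduces Jaccard and alpha=beta=0.5 reproduces Dice. The sketch below uses a hypothetical function name, `tversky_original`, and returns 0.0 when the denominator is zero, mirroring the fallback used in the cell above.

```python
import math

def tversky_original(x, y, alpha, beta):
    """Asymmetric Tversky index: |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|)."""
    common = len(x & y)
    denominator = common + alpha * len(x - y) + beta * len(y - x)
    return common / denominator if denominator else 0.0

def jaccard_similarity(x, y):
    return len(x & y) / len(x | y)

def dice_coefficient(x, y):
    return 2 * len(x & y) / (len(x) + len(y))

animals = {'lion', 'cat', 'whale', 'snake', 'dog'}
wild = {'lion', 'snake', 'tiger'}

as_jaccard = tversky_original(animals, wild, alpha=1.0, beta=1.0)
as_dice = tversky_original(animals, wild, alpha=0.5, beta=0.5)

# With unequal weights the index depends on the order of the arguments.
forward = tversky_original(animals, wild, alpha=1.0, beta=0.0)
backward = tversky_original(wild, animals, alpha=1.0, beta=0.0)

print(as_jaccard, jaccard_similarity(animals, wild))
print(as_dice, dice_coefficient(animals, wild))
print(forward, backward)
```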



### Multiple Sets

**Reduce**
- **Apply function of two arguments cumulatively** to the items of iterable, from left to right, so as to **reduce the iterable to a single value**.

In [None]:
from functools import reduce

animals = {'lion', 'cat', 'whale', 'snake', 'dog'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake', 'tiger'}
pet = {'cat', 'dog', 'parrot'}

print(" Set reduced with union :", reduce(set.union, [animals, reptiles, domestic, wild, pet]))
print(" Set reduced with intersection :", reduce(set.intersection, [animals, domestic, pet]))
print(" Set reduced with difference :", reduce(set.difference, [animals, domestic, pet]))
print(" Set reduced with symmetric difference :", reduce(set.symmetric_difference, [animals, domestic, pet]))
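
For union and intersection, `reduce` is not strictly needed, because `union` and `intersection` accept several sets at once. The sketch below compares both approaches and shows one edge case: `reduce` needs an initial value to cope with an empty list of sets, while `set().union(*sets)` handles it naturally.

```python
from functools import reduce

animals = {'lion', 'cat', 'whale', 'snake', 'dog'}
reptiles = {'snake'}
domestic = {'dog', 'cat'}
wild = {'lion', 'snake', 'tiger'}
pet = {'cat', 'dog', 'parrot'}

sets = [animals, reptiles, domestic, wild, pet]

# union() and intersection() take any number of sets as arguments.
union_all = set().union(*sets)
common = set.intersection(animals, domestic, pet)

# With no sets at all, reduce needs an initial value; set().union() does not.
no_sets = []
empty_union = set().union(*no_sets)
empty_reduce = reduce(set.union, no_sets, set())

print(union_all)
print(common)
print(empty_union, empty_reduce)
```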