# Python Sets: A Comprehensive Guide

This notebook covers the fundamentals of working with sets in Python, including creation, operations, and best practices.

## 1. Introduction to Sets

Sets are unordered collections of unique elements in Python. A set itself is mutable, but every element must be hashable, which in practice means immutable values such as numbers, strings, and tuples of hashable items.
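One creation detail is easy to trip over: empty braces create a dictionary, not a set. The short sketch below illustrates this; the variable names are only for illustration.

```python
# {} is an empty dict literal; use set() to get an empty set
empty_dict = {}
empty_set = set()

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

# Sets are mutable, so elements can be added after creation
empty_set.add('apple')
print(empty_set)  # {'apple'}
```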

In [None]:
# Creating a set
fruits = {'apple', 'banana', 'cherry'}
print("Set of fruits:", fruits)

# Creating a set from a list
numbers = set([1, 2, 3, 3, 4, 4, 5])
print("Set of numbers:", numbers)  # Duplicate elements are automatically removed

In [None]:
# Comparing sets with other data structures
list_example = [1, 2, 3, 3, 4]
tuple_example = (1, 2, 3, 3, 4)
dict_example = {'a': 1, 'b': 2, 'c': 3}
set_example = {1, 2, 3, 4}

print("List:", list_example)
print("Tuple:", tuple_example)
print("Dictionary:", dict_example)
print("Set:", set_example)

# Demonstrating unordered nature
print("\nUnordered nature of sets:")
print({3, 1, 4, 1, 5, 9, 2, 6, 5})

In [None]:
# Set elements must be hashable (lists and sets are not)
valid_set = {1, 'hello', (1, 2, 3)}
print("Valid set:", valid_set)

try:
    invalid_set = {1, [2, 3], {4, 5}}
except TypeError as e:
    print("Error:", e)

## 2. Basic Set Operations

In [None]:
# Adding elements to a set
colors = {'red', 'green', 'blue'}
colors.add('yellow')
print("After add():", colors)

colors.update(['orange', 'purple'])
print("After update():", colors)

In [None]:
# Removing elements from a set
numbers = {1, 2, 3, 4, 5}
numbers.remove(3)  # Raises KeyError if the element is missing
print("After remove(3):", numbers)

numbers.discard(10)  # No error if element doesn't exist
print("After discard(10):", numbers)

popped = numbers.pop()  # Removes and returns an arbitrary element
print(f"Popped element: {popped}, Set after pop(): {numbers}")

numbers.clear()
print("After clear():", numbers)

In [None]:
# Set length and membership
fruits = {'apple', 'banana', 'cherry', 'date'}
print("Number of fruits:", len(fruits))
print("Is 'apple' in fruits?", 'apple' in fruits)
print("Is 'grape' not in fruits?", 'grape' not in fruits)

## 3. Accessing Set Elements

In [None]:
# Iterating over sets
colors = {'red', 'green', 'blue'}
print("Colors in the set:")
for color in colors:
    print(color)

In [None]:
# Accessing elements in an unordered collection
numbers = {1, 2, 3, 4, 5}
print("First element (arbitrary):", next(iter(numbers)))

# Converting to list for indexing (not recommended for large sets)
number_list = list(numbers)
print("Third element after conversion to list:", number_list[2])

In [None]:
# Looping through sets with for and enumerate
fruits = {'apple', 'banana', 'cherry'}
for index, fruit in enumerate(fruits, start=1):
    print(f"Fruit {index}: {fruit}")

## 4. Set Methods

In [None]:
# Set operations
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

print("Union:", set1.union(set2))
print("Intersection:", set1.intersection(set2))
print("Difference (set1 - set2):", set1.difference(set2))
print("Symmetric Difference:", set1.symmetric_difference(set2))

In [None]:
# Modifying sets
set1 = {1, 2, 3}
set2 = {3, 4, 5}

set1.update(set2)
print("After update():", set1)

set1.intersection_update({2, 3, 4})
print("After intersection_update():", set1)

set1.difference_update({3})
print("After difference_update():", set1)

set1.symmetric_difference_update({2, 4})
print("After symmetric_difference_update():", set1)

In [None]:
# Copying sets and set comparisons
original = {1, 2, 3}
copied = original.copy()
print("Copied set:", copied)

set1 = {1, 2, 3, 4}
set2 = {1, 2}
set3 = {5, 6}

print("Is set2 a subset of set1?", set2.issubset(set1))
print("Is set1 a superset of set2?", set1.issuperset(set2))
print("Are set1 and set3 disjoint?", set1.isdisjoint(set3))

## 5. Set Operations with Operators

In [None]:
# Set operations using operators
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print("Union (|):", a | b)
print("Intersection (&):", a & b)
print("Difference (-):", a - b)
print("Symmetric Difference (^):", a ^ b)

## 6. Set Comprehensions

In [None]:
# Basic set comprehension
squares = {x**2 for x in range(10)}
print("Squares:", squares)

In [None]:
# Set comprehension with conditional logic
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print("Even squares:", even_squares)

In [None]:
# Performance comparison: set comprehension vs. loop
import timeit

# Using set comprehension
comp_time = timeit.timeit('{x for x in range(1000)}', number=1000)

# Using loop
loop_time = timeit.timeit(
    'set_1 = set()\nfor x in range(1000):\n    set_1.add(x)',
    number=1000
)

print(f"Comprehension time: {comp_time:.6f} seconds")
print(f"Loop time: {loop_time:.6f} seconds")

## 7. Frozen Sets

In [None]:
# Introduction to Frozen Sets
regular_set = {1, 2, 3}
frozen_set = frozenset([1, 2, 3])

print("Regular set:", regular_set)
print("Frozen set:", frozen_set)

# Attempting to modify a frozen set (will raise an error)
try:
    frozen_set.add(4)
except AttributeError as e:
    print("Error:", e)

In [None]:
# Using frozen sets as dictionary keys
fs1 = frozenset([1, 2, 3])
fs2 = frozenset([3, 4, 5])

set_dict = {
    fs1: "Set A",
    fs2: "Set B"
}

print("Dictionary with frozen set keys:", set_dict)
print("Value for fs1:", set_dict[fs1])

Frozen sets are immutable versions of sets. They are useful when you need a hashable set (e.g., as dictionary keys) or when you want to ensure that a set remains unchanged.
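Because frozen sets are hashable, they can also be elements of another set, which regular sets cannot. A minimal sketch, with illustrative variable names:

```python
# A regular set is unhashable, so it cannot be an element of another set
try:
    nested = {{1, 2}, {3, 4}}
except TypeError as e:
    print("Error:", e)

# frozensets are hashable, so they can be nested inside a set
groups = {frozenset({1, 2}), frozenset({3, 4}), frozenset({2, 1})}
print("Number of distinct groups:", len(groups))  # 2, because {1, 2} == {2, 1}
print(frozenset({1, 2}) in groups)  # True
```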

## 8. Working with Multiple Sets

In [None]:
# Combining multiple sets
set1 = {1, 2, 3}
set2 = {3, 4, 5}
set3 = {5, 6, 7}

combined = set1.union(set2, set3)
print("Combined sets:", combined)

In [None]:
# Set intersection and difference with multiple sets
common = set1.intersection(set2, set3)
print("Common elements:", common)

diff = set1.difference(set2, set3)
print("Elements in set1 but not in set2 or set3:", diff)

## 9. Sets and Mathematical Operations

In [None]:
# Mathematical set theory concepts
universe = set(range(1, 11))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

print("Universe:", universe)
print("Set A:", A)
print("Set B:", B)
print("Union (A ∪ B):", A.union(B))
print("Intersection (A ∩ B):", A.intersection(B))
print("Difference (A - B):", A.difference(B))
print("Complement of A:", universe.difference(A))

In [None]:
# Practical application: finding unique words in texts
text1 = "the quick brown fox jumps over the lazy dog"
text2 = "the lazy dog sleeps in the sun"

words1 = set(text1.split())
words2 = set(text2.split())

print("Common words:", words1.intersection(words2))
print("Unique words in text1:", words1.difference(words2))
print("Unique words in text2:", words2.difference(words1))
print("Words in only one text:", words1.symmetric_difference(words2))

## 10. Sets vs. Other Data Structures

In [None]:
# Sets vs. Lists: When to use each
numbers_list = [1, 2, 3, 2, 4, 3, 5]
numbers_set = set(numbers_list)

print("Original list:", numbers_list)
print("As a set (duplicates removed):", numbers_set)

# Use lists when order or duplicates matter:
print("Third element in list:", numbers_list[2])  # Positional indexing needs a defined order
numbers_list.append(6)  # Lists keep insertion order and allow duplicates
print("List after appending 6:", numbers_list)

# Use sets when uniqueness or fast lookup matters:
print("Is 4 in the set?", 4 in numbers_set)  # Fast membership testing
unique_numbers = list(set(numbers_list))  # Removes duplicates, but order is not guaranteed
print("Unique numbers:", unique_numbers)

In [None]:
# Sets vs. Dictionaries: Key differences and use cases
set_example = {1, 2, 3}
dict_example = {1: 'one', 2: 'two', 3: 'three'}

print("Set:", set_example)
print("Dictionary:", dict_example)

# Sets for unique collections
unique_numbers = {1, 2, 3, 2, 1, 4}
print("Unique numbers:", unique_numbers)

# Dictionaries for key-value pairs
capital_cities = {'France': 'Paris', 'Italy': 'Rome', 'Spain': 'Madrid'}
print("Capital of France:", capital_cities['France'])

In [None]:
# Performance comparison: Sets vs. Lists
import timeit
import random

def setup():
    global data_list, data_set, search_items
    data_list = list(range(10000))
    random.shuffle(data_list)
    data_set = set(data_list)
    search_items = random.sample(range(20000), 1000)

def test_list():
    return [item in data_list for item in search_items]

def test_set():
    return [item in data_set for item in search_items]

setup()
list_time = timeit.timeit(test_list, number=100)
set_time = timeit.timeit(test_set, number=100)

print(f"List search time: {list_time:.6f} seconds")
print(f"Set search time: {set_time:.6f} seconds")

## 11. Set Performance Considerations

In [None]:
# Timing common set operations (measured run time, not formal complexity)
import timeit

def measure_time(operation, setup, repeat=3, number=1000):
    times = timeit.repeat(operation, setup, repeat=repeat, number=number)
    return min(times)

setup = '''
s = set(range(10000))
element = 5000
'''

print("Time for set operations (lower is better):")
print(f"Add:       {measure_time('s.add(10001)', setup):.6f} seconds")
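# remove() raises KeyError once the element is gone, so re-add it before each removal
print(f"Remove:    {measure_time('s.add(element); s.remove(element)', setup):.6f} seconds")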
print(f"In:        {measure_time('element in s', setup):.6f} seconds")
print(f"Union:     {measure_time('s.union(set(range(10000, 20000)))', setup):.6f} seconds")
print(f"Intersect: {measure_time('s.intersection(set(range(5000, 15000)))', setup):.6f} seconds")

In [None]:
# Memory usage of sets (sys.getsizeof counts the set's own storage, not the elements it references)
import sys

small_set = set(range(10))
large_set = set(range(10000))

print(f"Memory usage of small set: {sys.getsizeof(small_set)} bytes")
print(f"Memory usage of large set: {sys.getsizeof(large_set)} bytes")

In [None]:
# Optimizing set operations for large data sets
import random

# Generate large sets
set1 = set(random.sample(range(1000000), 100000))
set2 = set(random.sample(range(1000000), 100000))

# Efficient way to find common elements
common = set1.intersection(set2)

# Less efficient way (for comparison)
common_slow = set()
for item in set1:
    if item in set2:
        common_slow.add(item)

print(f"Number of common elements: {len(common)}")
print(f"Efficient and less efficient methods produce same result: {common == common_slow}")

These examples show that set operations scale well to large data sets. Built-in methods such as `intersection()` are implemented in C, so they are generally faster than an equivalent loop written in Python, even though both produce the same result.
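To check the speed difference on your own machine, the two approaches can be timed side by side. This is a rough sketch: the function names `builtin_intersection` and `manual_intersection` are hypothetical, and the timings will vary by hardware and Python version.

```python
import random
import timeit

random.seed(0)  # fixed seed so both sets are reproducible
left = set(random.sample(range(100_000), 10_000))
right = set(random.sample(range(100_000), 10_000))

def builtin_intersection():
    # Uses the built-in set intersection
    return left & right

def manual_intersection():
    # Same result, built element by element in a Python loop
    result = set()
    for item in left:
        if item in right:
            result.add(item)
    return result

# Both approaches must agree before the timings mean anything
assert builtin_intersection() == manual_intersection()

builtin_time = timeit.timeit(builtin_intersection, number=100)
manual_time = timeit.timeit(manual_intersection, number=100)
print(f"Built-in intersection: {builtin_time:.6f} seconds")
print(f"Manual loop:           {manual_time:.6f} seconds")
```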

## 12. Iterating Over Sets with for Loops

In [None]:
# Basic set iteration
fruits = {'apple', 'banana', 'cherry'}
for fruit in fruits:
    print(fruit)

# Output (order may vary):
# cherry
# banana
# apple

Explanation: This example demonstrates how to iterate over a set using a simple for loop. Note that sets are unordered, so the items may be printed in any order.
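When a predictable order is needed, for example in printed reports or tests, one common option is to iterate over `sorted()` output instead of the set itself. A minimal sketch:

```python
fruits = {'banana', 'cherry', 'apple'}

# sorted() returns a new list; the set itself is unchanged
for fruit in sorted(fruits):
    print(fruit)

# Output:
# apple
# banana
# cherry
```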

In [None]:
# Iterating and performing operations
numbers = {1, 2, 3, 4, 5}
sum_of_squares = 0
for num in numbers:
    sum_of_squares += num ** 2
print(f"Sum of squares: {sum_of_squares}")

# Output: Sum of squares: 55

Explanation: This example shows how to iterate over a set of numbers and perform calculations with each element.

## 13. Using enumerate() with Sets

In [None]:
# Basic usage of enumerate() with sets
colors = {'red', 'green', 'blue'}
for index, color in enumerate(colors):
    print(f"Index {index}: {color}")

# Output (order may vary):
# Index 0: blue
# Index 1: red
# Index 2: green

Explanation: `enumerate()` allows you to iterate over a set while keeping track of the iteration count. Remember that the order of items in a set is arbitrary.

In [None]:
# Using enumerate() with a custom start index
animals = {'cat', 'dog', 'elephant'}
for count, animal in enumerate(animals, start=1):
    print(f"Animal {count}: {animal}")

# Output (order may vary):
# Animal 1: elephant
# Animal 2: cat
# Animal 3: dog

Explanation: You can specify a custom start index for `enumerate()` using the `start` parameter.

## 14. Set Comprehensions for Iteration

In [None]:
# Basic set comprehension
numbers = {1, 2, 3, 4, 5}
squared_numbers = {x**2 for x in numbers}
print(squared_numbers)

# Output (display order may vary): {1, 4, 9, 16, 25}

Explanation: Set comprehensions provide a concise way to create new sets based on existing sets. This example creates a new set with the squares of the original numbers.

In [None]:
# Set comprehension with conditions
numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
even_squares = {x**2 for x in numbers if x % 2 == 0}
print(even_squares)

# Output (display order may vary): {4, 16, 36, 64, 100}

Explanation: You can include conditions in set comprehensions to filter elements. This example creates a set of squares of even numbers only.
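A transformation and a filter can be combined in the same comprehension. The sketch below normalizes a list of tags; the `raw_tags` data is made up for illustration.

```python
raw_tags = ['Python', 'python ', 'SETS', 'sets', 'Sets', '']

# Strip whitespace, lowercase each tag, and skip empty strings;
# values that become equal after normalization collapse into one element
tags = {tag.strip().lower() for tag in raw_tags if tag.strip()}
print(sorted(tags))  # ['python', 'sets']
```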

# Best Practices for Working with Sets

This section covers best practices for using sets effectively in Python, including when to use them and how to perform efficient operations.

## Choosing Sets for Uniqueness and Membership Testing

In [None]:
# Using a set for efficient membership testing
valid_users = {'alice', 'bob', 'charlie'}

def check_user(username):
    return username in valid_users

print(check_user('alice'))  # Output: True
print(check_user('dave'))   # Output: False

Explanation: Sets are ideal for membership testing because they offer O(1) average time complexity for the `in` operation, making them more efficient than lists for this purpose.

In [None]:
# Using a set to remove duplicates
numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = set(numbers)
print(unique_numbers)

# Output: {1, 2, 3, 4, 5}

Explanation: Sets automatically remove duplicates, making them useful for quickly obtaining unique elements from a collection.
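"Duplicate" here means equal under `==` with the same hash, which can be broader or narrower than expected. A short sketch of both cases:

```python
# 1, 1.0, and True compare equal and share a hash, so they count as one element
mixed = set([1, 1.0, True])
print(len(mixed))  # 1

# Strings are compared exactly, so differences in case are kept
names = set(['Alice', 'alice'])
print(len(names))  # 2
```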

## Avoiding Overuse of Sets for Ordered Data

In [None]:
# Bad practice: Using a set for ordered data
steps = {'mix ingredients', 'bake for 20 minutes', 'preheat oven'}
for step in steps:
    print(step)

# Output (order may vary):
# preheat oven
# mix ingredients
# bake for 20 minutes

Explanation: Sets are unordered, so they're not suitable for data where order matters. In this case, a list would be more appropriate.

In [None]:
# Good practice: Using a list for ordered data
steps = ['preheat oven', 'mix ingredients', 'bake for 20 minutes']
for step in steps:
    print(step)

# Output:
# preheat oven
# mix ingredients
# bake for 20 minutes

Explanation: When order matters, use a list or another ordered collection instead of a set.

## Efficient Set Operations in Real-World Applications

In [None]:
# Efficient set operations for data analysis
users_a = {'alice', 'bob', 'charlie', 'david'}
users_b = {'bob', 'david', 'eve', 'frank'}

# Users in both sets
common_users = users_a & users_b
print("Common users:", common_users)

# Users in either set
all_users = users_a | users_b
print("All users:", all_users)

# Users in A but not in B
unique_to_a = users_a - users_b
print("Users only in A:", unique_to_a)

# Output (element order within each set may vary):
# Common users: {'bob', 'david'}
# All users: {'alice', 'bob', 'charlie', 'david', 'eve', 'frank'}
# Users only in A: {'alice', 'charlie'}

Explanation: Set operations like intersection (`&`), union (`|`), and difference (`-`) are very efficient and useful for comparing data sets in real-world applications.
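The comparison operators follow the same pattern. As a sketch, a permission check might look like the following; the permission names and variables are hypothetical.

```python
required = {'read', 'write'}
granted = {'read', 'write', 'delete'}

# <= tests "is a subset of": every required permission has been granted
print(required <= granted)  # True

# Difference lists what a more restricted user is still missing
guest = {'read'}
missing = required - guest
print(missing)  # {'write'}

# ^ keeps the items that appear in exactly one of the two sets
print(sorted(required ^ granted))  # ['delete']
```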

In [None]:
# Using sets for efficient duplicate removal in a list
def remove_duplicates(items):
    return list(set(items))

original_list = [1, 2, 2, 3, 4, 4, 5, 5]
unique_list = remove_duplicates(original_list)
print(unique_list)

# Output (order is not guaranteed): [1, 2, 3, 4, 5]

Explanation: Converting a list to a set and back to a list is a fast way to remove duplicates, especially for large datasets, but the order of the resulting list is not guaranteed.
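If the original order matters, `dict.fromkeys()` is a common alternative. The sketch below assumes Python 3.7 or later, where dictionaries keep insertion order; the function name `remove_duplicates_ordered` is hypothetical.

```python
def remove_duplicates_ordered(items):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so the first occurrence of each item is kept in place
    return list(dict.fromkeys(items))

original_list = [3, 1, 3, 2, 1, 5]
print(remove_duplicates_ordered(original_list))  # [3, 1, 2, 5]
```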