### Introduction to Python Sets
In Python, a **set** is an **unordered collection of unique elements**: it keeps no particular order and never holds duplicates. That may sound limiting, but sets are very useful for organizing items and for performing set operations such as union and intersection.

For example, imagine two groups of items that share some elements and differ in others. **Using set operations**, we can find the items they have in common, the items that differ, combine the groups in different ways, and more! Sets can also check whether an item is present very quickly (on average, the lookup time does not grow with the size of the set), which **is especially helpful when combing through very large datasets**.

Python also provides **an immutable version of a set called a frozenset**. A frozenset supports the same non-modifying operations as a normal set, but it has none of the methods that change its contents.

### Creating sets

```python
genre_results = ['rap', 'classical', 'rock', 'rock', 'country', 'rap', 'rock', 'latin', 'country',
                 'k-pop', 'pop', 'rap', 'rock', 'k-pop', 'rap', 'k-pop', 'rock', 'rap', 'latin', 'pop',
                 'pop', 'classical', 'pop', 'country', 'rock', 'classical', 'country', 'pop', 'rap', 'latin']

# Converting a list (or any iterable) to a set removes the duplicates
survey_genres = set(genre_results)

# Creating a set with a set comprehension
survey_abbreviated = {genre[:3] for genre in genre_results}

# Set comprehension with a condition: keep only the genres that start with 'p'
items = ['country', 'punk', 'rap', 'techno', 'pop', 'latin']
music_genres = {category for category in items if category.startswith('p')}

# Creating an empty set (note: {} creates an empty dictionary, not a set)
empty_set = set()

# A frozenset can only be created with its constructor; there is no literal syntax
# Creating a frozenset from a list
frozen_music_genres = frozenset(['country', 'punk', 'rap', 'techno', 'pop', 'latin'])

# We can also create an empty frozenset
empty_frozen_music_genres = frozenset()
```

### Adding element(s) to a set

- `.add()` -> adds a single element to a set
- `.update()` -> adds every element from one or more iterables (lists, tuples, other sets, and so on) to a set

There are a few things to note about adding to a set:
- Neither method adds a duplicate: if an element is already present, the set is left unchanged.
- A frozenset cannot have items added to it; it does not have `.add()` or `.update()` methods at all.
- When a set is printed, its elements do not necessarily appear in the order in which they were added. This is because set and frozenset containers are unordered.
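A minimal sketch of both methods, using a small made-up `genres` set (the names are illustrative only, not part of the exercises). It shows that duplicates are ignored and that a frozenset has no `.add()` method to call:

```python
# A small, hypothetical set of genres used only for this illustration
genres = {'rap', 'rock'}

# .add() inserts one element; adding an element that is already present changes nothing
genres.add('jazz')
genres.add('rap')

# .update() takes an iterable (here a list) and adds each element it contains
genres.update(['pop', 'rock', 'latin'])

print(genres)  # five unique genres, in no guaranteed order

# A frozenset does not define .add() or .update(), so calling them fails
frozen_genres = frozenset(['rap', 'rock'])
try:
    frozen_genres.add('jazz')
except AttributeError as err:
    print('Cannot modify a frozenset:', err)
```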


### Removing elements from a set

- `.remove()` - removes a single element; raises a `KeyError` if the element is not in the set
- `.discard()` - works the same way, but does not raise a `KeyError` if the element is missing

Items cannot be removed from a frozenset, which has neither of these methods.
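Here is a short sketch contrasting the two methods on a hypothetical `genres` set; only `.remove()` complains about a missing element:

```python
# Hypothetical example set
genres = {'rap', 'rock', 'pop'}

genres.remove('rock')   # 'rock' is present, so it is removed
genres.discard('jazz')  # 'jazz' is absent; discard() does nothing and raises no error

try:
    genres.remove('jazz')  # remove() raises KeyError when the element is missing
except KeyError as err:
    print('KeyError:', err)

print(genres)
```

`.discard()` is handy when you only care that an element is gone afterwards; `.remove()` is useful when a missing element signals a bug you want to hear about.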

### Finding Elements in a Set

In Python, set and frozenset items cannot be accessed by index, because both containers are unordered and have no positions. However, like most other Python containers, they support the `in` keyword (and `not in`) for testing whether an element is present.
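A quick sketch of membership tests on a hypothetical set and frozenset, along with what happens if you try to index a set:

```python
# Hypothetical set and frozenset of genres
genres = {'pop', 'rap', 'jazz'}
frozen_genres = frozenset(genres)

# Membership tests work the same way on both containers
print('pop' in genres)           # True
print('metal' in frozen_genres)  # False
print('metal' not in genres)     # True

# Indexing is not supported because sets have no positions
try:
    genres[0]
except TypeError as err:
    print('TypeError:', err)
```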

```python
# Storing the allowed tags in a set makes each `in` check fast
allowed_tags = {'pop', 'hip-hop', 'rap', 'dance', 'electronic', 'latin', 'indie', 'alternative rock',
                'classical', 'k-pop', 'country', 'rock', 'metal', 'jazz', 'exciting', 'sad', 'happy',
                'upbeat', 'party', 'synth', 'rhythmic', 'emotional', 'relationship', 'warm', 'guitar',
                'fiddle', 'romance', 'chill', 'swing'}

song_data_users = {'Retro Words': ['pop', 'explosion', 'hammer', 'bomb', 'warm', 'due', 'writer', 'happy',
                                   'horrible', 'electric', 'mushroom', 'shed']}

# Create a set from the tags in the song_data_users dictionary
tag_set = set(song_data_users['Retro Words'])

# Collect the tags that are not allowed into a list first:
# removing items from a set while looping over it raises a RuntimeError
bad_tags = []
for tag in tag_set:
    if tag not in allowed_tags:
        bad_tags.append(tag)

# Remove the bad tags from tag_set
for tag in bad_tags:
    tag_set.remove(tag)

# Update the dictionary with the cleaned set
song_data_users['Retro Words'] = tag_set
print(song_data_users)
```

### Set Union: `set_a.union(set_b)` or `set_a | set_b`
The resulting set contains every element that appears in set A, set B, or both, with each element included only once (**no duplicates**). Here we only merge two sets, but a union can combine as many sets as we need: `set_a.union(set_b, set_c)` or `set_a | set_b | set_c`.

Note that the type of the result matches the left operand: if a normal set is on the left, the result is a normal set; if a frozenset is on the left, the result is a frozenset.
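The sketch below uses three made-up tag collections to show a multi-set union and how the left operand decides the result type. One detail worth knowing: the `.union()` method accepts any iterable (such as a list), while the `|` operator requires sets or frozensets on both sides.

```python
# Hypothetical tag collections
rock_tags = {'guitar', 'loud', 'band'}
pop_tags = {'catchy', 'dance', 'band'}
jazz_tags = frozenset({'swing', 'piano', 'band'})

# The operator and the method give the same result; 'band' appears only once
combined = rock_tags | pop_tags
print(combined == rock_tags.union(pop_tags))  # True

# More than two sets can be combined at once
all_tags = rock_tags.union(pop_tags, jazz_tags)
all_tags_op = rock_tags | pop_tags | jazz_tags

# The left operand decides the result type
print(type(rock_tags | jazz_tags))  # <class 'set'>
print(type(jazz_tags | rock_tags))  # <class 'frozenset'>
```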

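**Exercise: Consolidate the tags into one dictionary, with one set of tags per song**

For each song, merge its tags from `song_data` with its user-submitted tags from `user_tag_data`.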

```python
song_data = {'Retro Words': ['pop', 'warm', 'happy', 'electronic'],
             'Wait For Limit': ['rap', 'upbeat', 'romance'],
             'Stomping Cue': ['country', 'fiddle', 'party'],
             'Lowkey Space': ['electronic', 'dance', 'synth']}

user_tag_data = {'Lowkey Space': ['party', 'synth', 'fast', 'upbeat'],
                 'Retro Words': ['happy', 'electronic', 'fun', 'exciting'],
                 'Wait For Limit': ['romance', 'chill', 'rap', 'rhythmic'],
                 'Stomping Cue': ['country', 'swing', 'party', 'instrumental']}

# Merge each song's own tags with its user-submitted tags
new_song_data = {}

for key, value in song_data.items():
    song_tag = set(value)
    user_tag = set(user_tag_data[key])
    tag_union = song_tag | user_tag
    new_song_data[key] = tag_union

print(new_song_data)
```


### Set Intersection
Let's say that we have two or more sets, and we want to find the items they all have in common. Sets and frozensets have a method called `.intersection()`, which returns a new set or frozenset containing those elements. The `&` operator performs the same operation, and both forms work on more than two sets: `set_a.intersection(set_b, set_c)` or `set_a & set_b & set_c`.

Similar to the other operations, the type of the first operand (a set or frozenset on the left side of the operator or method) determines if a set or frozenset is returned when finding the intersection.

In addition to a regular intersection, a normal set (but not a frozenset) has a method called `.intersection_update()`. Instead of returning a new set, it updates the original set to contain the result of the intersection.
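A minimal sketch with three hypothetical tag sets, showing the method and operator forms, the result type when a frozenset is on the left, and the in-place `.intersection_update()`:

```python
# Hypothetical tag sets for three songs
song_a = {'pop', 'synth', 'happy', 'electronic'}
song_b = {'electronic', 'synth', 'dance'}
song_c = frozenset({'synth', 'electronic', 'chill'})

# Elements common to all three sets, via the method and via chained &
common = song_a.intersection(song_b, song_c)
common_op = song_a & song_b & song_c
print(common)  # the two shared tags, in no guaranteed order

# The left operand decides whether a set or frozenset comes back
print(type(song_c & song_a))  # <class 'frozenset'>

# intersection_update() modifies song_a in place and returns None
song_a.intersection_update(song_b)
print(song_a)
```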

**Exercise:**

We want to add a feature to our app that recommends songs based on the songs a user has listened to most recently. One way to do this is to take the intersection of the tags of those recent songs and then look for other songs that share these tags.

First, create a variable called `tags_int` that stores the intersection of the tags for the two songs in `user_recent_songs`: 'Retro Words' and 'Lowkey Space'. Remember to convert each list into a set to perform the operation.
We will use these common tags as the basis for finding recommended songs in `song_data`.

Next, find the recommended songs based on the common tags from the previous step.
Find all other songs in `song_data` that have at least one of these tags, and store them in a dictionary called `recommended_songs`.
Make sure that you do not add any songs that the user has listened to recently!

Print `recommended_songs` to see the result!


```python
song_data = {'Retro Words': ['pop', 'warm', 'happy', 'electronic', 'synth'],
             'Wait For Limit': ['rap', 'upbeat', 'romance'],
             'Stomping Cue': ['country', 'fiddle', 'party'],
             'Lowkey Space': ['electronic', 'dance', 'synth', 'upbeat'],
             'Back To Art': ['pop', 'sad', 'emotional', 'relationship'],
             'Blinding Era': ['rap', 'intense', 'moving', 'fast'],
             'Down To Green Hills': ['country', 'relaxing', 'vocal', 'emotional'],
             'Double Lights': ['electronic', 'chill', 'relaxing', 'piano', 'synth']}

user_recent_songs = {'Retro Words': ['pop', 'warm', 'happy', 'electronic', 'synth'],
                     'Lowkey Space': ['electronic', 'dance', 'synth', 'upbeat']}

# Tags shared by both recently played songs
tags_int = set(user_recent_songs['Retro Words']) & set(user_recent_songs['Lowkey Space'])

recommended_songs = {}

for key, value in song_data.items():
    # Skip songs the user has listened to recently
    if key in user_recent_songs:
        continue
    # Recommend the song if it shares at least one tag with tags_int
    if tags_int.intersection(value):
        recommended_songs[key] = value

print(recommended_songs)
```

### Set Difference
We can also find the elements that are unique to one set. To do so, use the `.difference()` method or the `-` operator on a set or frozenset.
This returns a set or frozenset containing only the elements of the first set that are not found in the second set. Order matters: `set_a - set_b` is generally not the same as `set_b - set_a`.

Similar to the other operations, the type of the first operand (a set or frozenset on the left side of the operator or method) determines if a set or frozenset is returned when finding the difference.

This operation also has an updating version for normal sets (frozensets do not have it): `.difference_update()` updates the original set with the result instead of returning a new set object.
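A short sketch using two made-up tag collections; it shows that the order of the operands matters, that a frozenset on the left yields a frozenset, and how `.difference_update()` changes the set in place:

```python
# Hypothetical tag sets
user_tags = {'pop', 'rap', 'jazz', 'rock'}
friend_tags = frozenset({'rap', 'rock', 'metal'})

# Elements of user_tags that are not in friend_tags
only_user = user_tags - friend_tags
print(only_user == user_tags.difference(friend_tags))  # True

# Order matters: this gives the elements only the friend has
only_friend = friend_tags - user_tags
print(type(only_friend))  # <class 'frozenset'>

# difference_update() removes the shared elements from user_tags in place
user_tags.difference_update(friend_tags)
print(user_tags)
```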


### Symmetric Difference
The last operation we will look at is the symmetric difference. It is closely related to intersection: the resulting set includes every element that is in one set or the other, but not in both. In other words, it contains the elements that are unique to each set, which is the union of the sets with their intersection removed.

To perform this operation on the set or frozenset containers, we can use the `.symmetric_difference()` method or the `^` operator. Like the other operators, the type of the first operand (a set or frozenset on the left side of the operator or method) determines if a set or frozenset is returned when finding the symmetric difference.

A normal set can also be updated in place with the `.symmetric_difference_update()` method, which stores the result in the original set instead of returning a new set object. Frozensets do not have this method.
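Here is a minimal sketch on two hypothetical tag sets. It also checks the "union minus intersection" description against the `^` result:

```python
# Hypothetical tag sets
user_tags = {'pop', 'rap', 'jazz'}
friend_tags = {'rap', 'rock', 'jazz', 'metal'}

# Elements in exactly one of the two sets
unique = user_tags ^ friend_tags
print(unique == user_tags.symmetric_difference(friend_tags))  # True

# Same result as the union minus the intersection
print(unique == (user_tags | friend_tags) - (user_tags & friend_tags))  # True

# symmetric_difference_update() stores the result in user_tags itself
user_tags.symmetric_difference_update(friend_tags)
print(user_tags)
```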

**Exercise:**
The users of our app would like to see which tags are unique between them and their friends, that is, the tags that are not shared between the user and the friend. We can find these with the symmetric difference.


```python
user_song_history = {'Retro Words': ['pop', 'warm', 'happy', 'electronic', 'synth'],
                     'Stomping Cue': ['country', 'fiddle', 'party'],
                     'Back To Art': ['pop', 'sad', 'emotional', 'relationship'],
                     'Double Lights': ['electronic', 'chill', 'relaxing', 'piano', 'synth']}

friend_song_history = {'Lowkey Space': ['electronic', 'dance', 'synth', 'upbeat'],
                       'Blinding Era': ['rap', 'intense', 'moving', 'fast'],
                       'Wait For Limit': ['rap', 'upbeat', 'romance', 'relationship'],
                       'Double Lights': ['electronic', 'chill', 'relaxing', 'piano', 'synth']}

# Gather every tag from each listening history into one set per person
user_tags = set()
friend_tags = set()
for tags in user_song_history.values():
    user_tags.update(tags)

for tags in friend_song_history.values():
    friend_tags.update(tags)

# Tags that belong to exactly one of the two people
unique_tags = user_tags.symmetric_difference(friend_tags)
print(unique_tags)
```