# Sets and Maps
### Linear scan tradeoff
Sets and maps offer average-case constant-time lookups, so they can replace repeated linear scans over an array. Building a set or map costs $O(n)$ time and space up front, but each subsequent lookup is then $O(1)$ on average. This is a typical trade of extra space for less time.
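
As a rough illustration of this tradeoff, the sketch below finds the elements two lists have in common, first with repeated linear scans and then with a set built up front. The function names are hypothetical and only for illustration.

```python
def common_items_scan(a, b):
    # `x in b` scans the list b each time: O(len(a) * len(b)) time,
    # with no extra space beyond the output.
    return [x for x in a if x in b]


def common_items_set(a, b):
    # Building the set costs O(len(b)) time and space once;
    # each `x in b_set` check is then O(1) on average.
    b_set = set(b)
    return [x for x in a if x in b_set]
```

Both functions return the same result. The second pays for the set once and then avoids rescanning `b` for every element of `a`.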

#### Account sharing detection
You have a list of (IP address, username) pairs. All IP addresses are unique, but some usernames may be shared.
Create a function that returns an IP address associated with any shared username, or an empty string if no usernames are shared.

In [8]:
def account_sharing_detection(arr):
    seen = set()
    for ip, user in arr:
        if user in seen:
            return ip
        else:
            seen.add(user)
    return ""

In [10]:
connections = [("203.0.113.10", "mike"), ("298.51.100.25", "bob"), ("292.0.2.5", "mike"), ("203.0.113.15", "bob")]
account_sharing_detection(connections)

'292.0.2.5'

### Frequency maps
A frequency map associates a count with each element, to record how many times the element occurs.
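
A minimal sketch of building a frequency map by hand is shown below. The helper name `build_frequency_map` is hypothetical. The standard library's `collections.Counter` produces the same counts and is shown for comparison.

```python
from collections import Counter


def build_frequency_map(items):
    freq = {}
    for item in items:
        # dict.get returns 0 the first time an item is seen.
        freq[item] = freq.get(item, 0) + 1
    return freq


letters = ["a", "b", "a", "c", "a", "b"]
freq = build_frequency_map(letters)   # {"a": 3, "b": 2, "c": 1}
counter = Counter(letters)            # same counts, built by the standard library
```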

##### Some important notes
In many languages, the iteration order of a set or dictionary is not deterministic. In Python, dictionaries are guaranteed to iterate in insertion order from version 3.7, but sets carry no ordering guarantee.
Adding or removing entries in a dictionary or set while iterating over it can raise a `RuntimeError` or cause unexpected behaviour. Instead, collect the keys to delete or alter and apply the changes after the loop.
Set elements and map keys should never change after insertion. In Python they must be hashable, which rules out mutable built-ins such as lists. If a key's contents could change, its hash would change too, but the entry would not move to the matching location, so later lookups would miss it. If you need a list-like key, use a tuple.
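
The last two notes can be sketched as follows. The names `remove_single_use` and `grid_labels` are made up for this example.

```python
def remove_single_use(freq):
    # Collect the keys first, then delete after the loop. Deleting inside a
    # loop over `freq` would raise
    # "RuntimeError: dictionary changed size during iteration".
    to_delete = [key for key, count in freq.items() if count == 1]
    for key in to_delete:
        del freq[key]
    return freq


grid_labels = {}
grid_labels[(2, 3)] = "start"          # a tuple is hashable, so it works as a key

try:
    grid_labels[[2, 3]] = "start"      # a list is unhashable
except TypeError:
    list_key_rejected = True
```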

#### Most shared account
You have a list of IP addresses and usernames, where all IP addresses are unique but some usernames may be shared. Return the most shared username. If there's a tie, return either.

In [20]:
def most_shared_account(arr):
    username_freq = {}
    highest_count = 0
    most_common_username = None  # stays None if arr is empty
    for _, user in arr:
        if user not in username_freq:
            username_freq[user] = 0
        username_freq[user] += 1
        # Track the running maximum while counting.
        if username_freq[user] > highest_count:
            highest_count = username_freq[user]
            most_common_username = user
    return most_common_username

In [36]:
connections = [("203.0.113.10", "mike"), ("298.51.100.25", "bob"), ("292.0.2.5", "mike2"), ("203.0.113.15", "bob")]
most_shared_account(connections)

'bob'

In [34]:
# The above approach kept track of the most common username during the initial pass through the account list.
# An alternate strategy is to compile the frequency dictionary and then iterate through it to find the most common.
# The second loop visits each distinct username once, so it is short when many connections share a username.

def most_shared_account(arr):
    username_freq = {}
    for _, user in arr:
        if user not in username_freq:
            username_freq[user] = 0
        username_freq[user] += 1

    most_common_username = None
    for user, count in username_freq.items():
        # Compare against None explicitly so an empty-string username is handled correctly.
        if most_common_username is None or count > username_freq[most_common_username]:
            most_common_username = user
    return most_common_username
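
A shorter variant of the two-pass strategy, sketched with the standard library's `collections.Counter`, is given below. The function name `most_shared_account_counter` and the choice to return `None` for an empty list are assumptions made for this example.

```python
from collections import Counter


def most_shared_account_counter(arr):
    # Count usernames in one pass over the (ip, user) pairs.
    counts = Counter(user for _, user in arr)
    if not counts:
        return None  # assumption: no connections means no answer
    # most_common(1) returns a one-element list of (username, count).
    return counts.most_common(1)[0][0]


sample = [("203.0.113.10", "mike"), ("203.0.113.11", "bob"),
          ("203.0.113.12", "mike2"), ("203.0.113.15", "bob")]
most_shared_account_counter(sample)  # 'bob'
```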

#### Most frequent octet
Given a list of unique IP addresses in IPv4 format, return the most common first octet (the 8-bit number before the first dot).

In [47]:
def most_frequent_octet(arr):
    octet_freq = {}
    for ip in arr:
        # Named `octet` rather than `oct` to avoid shadowing the built-in oct().
        octet = ip.split('.')[0]
        if octet not in octet_freq:
            octet_freq[octet] = 0
        octet_freq[octet] += 1
    most_common_octet = ''
    highest_count = 0
    for octet, count in octet_freq.items():
        if count > highest_count:
            most_common_octet = octet
            highest_count = count
    return most_common_octet

In [49]:
ips = ['203.455.124.67', '208.51.100.5', '202.0.2.5', '203.0.113.5']
most_frequent_octet(ips)

'203'

##### Analysis
Building the dictionary takes $O(n)$ time, where $n$ is the number of IP addresses.
Iterating through the dictionary then takes $O(m)$ time, where $m$ is the number of distinct first octets, which is at most 256. The space complexity is therefore also $O(m) = O(1)$.
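
Because the first octet can take at most 256 values, the constant space bound can be made explicit with a fixed-size list of counters instead of a dictionary. The sketch below is one possible variant, not a replacement for the version above. The name `most_frequent_octet_fixed` is hypothetical, and the code assumes every first octet is a valid integer from 0 to 255.

```python
def most_frequent_octet_fixed(arr):
    # One counter per possible first octet: always 256 slots, so O(1) space.
    counts = [0] * 256
    for ip in arr:
        counts[int(ip.split(".")[0])] += 1
    # Scan the 256 counters for the largest count.
    best = max(range(256), key=counts.__getitem__)
    return str(best) if counts[best] > 0 else ""
```

On a tie this variant returns the numerically smallest octet, whereas the dictionary version returns the tied octet that was seen first.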

#### Multi-account cheating
A service requires that users have only a single account. Given a list of (username, list of IP addresses) pairs, create a function that checks whether any two users have exactly the same set of IP addresses attributed to them.

In [83]:
# This problem requires a set whose elements are list-like. The challenge is that Python requires set elements and map keys
# to be hashable, which means using tuples or strings rather than lists.
# A secondary challenge is that the lists of IP addresses given are not ordered. They need to be put into a canonical order
# so that matching collections compare as equal.

def multi_account_cheating(users):
    ip_collections = set()
    for _, ips in users:
        ip_tup = tuple(sorted(ips))
        if ip_tup in ip_collections:
            return True
        else:
            ip_collections.add(ip_tup)
    return False

In [85]:
users = [('mike', ["203.0.3.10", "208.51.0.5", "52.0.2.5"]), 
         ('bob', ['111.0.0.10', '222.0.0.5', '222.0.0.8']), 
         ('bob2', ['222.0.0.5', '222.0.0.8', '111.0.0.10'])]
multi_account_cheating(users)

True
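
An alternative to sorting, sketched below, is to use a `frozenset` as the hashable key. It is order-independent by construction and avoids the $O(k \log k)$ sort of each user's $k$ addresses. The function name `multi_account_cheating_frozenset` is made up for this example. Unlike the sorted-tuple version, it treats repeated addresses within one user's list as a single address.

```python
def multi_account_cheating_frozenset(users):
    seen = set()
    for _, ips in users:
        # frozenset is immutable and hashable, and ignores element order.
        key = frozenset(ips)
        if key in seen:
            return True
        seen.add(key)
    return False


sample_users = [("mike", ["203.0.3.10", "208.51.0.5", "52.0.2.5"]),
                ("bob", ["111.0.0.10", "222.0.0.5", "222.0.0.8"]),
                ("bob2", ["222.0.0.5", "222.0.0.8", "111.0.0.10"])]
multi_account_cheating_frozenset(sample_users)  # True
```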