<a href="https://colab.research.google.com/github/iamkhiemnguyen/CSE-6040/blob/main/Module%200/Session%2010/solutions_MT1%20Easy%20Debugging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Debugging Bad Solutions: #
## From Midterm 1, Spring 2023 - 1 Point Exercises ##

**Purpose:**
On the exams, you may initially write solutions that do not pass the test cases. That's okay! You will need to debug your code to determine what is causing the issue(s) and then figure out how to fix them. So how can we get better at debugging? We practice!

Below are four 1-point exercises from the Spring 2023 Midterm 1. We have pre-written a solution for each exercise that is "bad" in one or more ways: it may contain logic errors, syntax errors, or both. Can you find and fix the issues in each exercise and pass all of the test cases?

Right before each exercise test cell, there is a block of text explaining the variables available to you for debugging. You may use these to test your code and can print/display them as needed.

<br/>

**Exercise point breakdown:**

- Exercise 0: **1** point
- Exercise 1: **1** point
- Exercise 5: **1** point
- Exercise 6: **1** point

# Background: Mining for Friends #

In this notebook, you will analyze social media data from Twitter and FourSquare. Your ultimate goal will be to help people find others with common interests.

The (anonymized) data consists of the following:

- A collection of places, or _points of interest_ (POIs), like restaurants, movie theaters, post offices, museums, and so on.
- A database of cities.
- A collection of _check-ins_, that is, the places that a specific person has visited.
- Existing _connections_, that is, "who follows whom" type relationships.

We will analyze this data and then, by the very last exercise, create a function that can, for a given person, recommend other people they might be compatible with based on their affinity for the same places.

Start by running the next cell, which will set up some of the code and data you'll need later.

In [None]:
# !python --version
!pip install dill
import dill as pickle

In [None]:
### Global Imports
# Some functionality needed by the notebook and demo cells:
from pprint import pprint, pformat
import math

def status_msg(s, time=None):
    from datetime import datetime
    if time is None:
        time = datetime.now()
    print(f"[{time}] {s}")

def load_json(filename):
    from json import load
    with open(filename, "rt") as fp:
        return load(fp)
    return None

def save_json(filename, obj):
    from json import dump
    with open(filename, "w") as fp:
        dump(obj, fp, indent=2)

def load_pickle(filename):
    from pickle import load
    with open(filename, "rb") as fp:
        return load(fp)
    return None

def save_pickle(filename, obj):
    from pickle import dump
    with open(filename, "wb") as fp:
        dump(obj, fp)

def choose_ext(s, ext_map):
    for ext in ext_map:
        if s[-(len(ext)+1):] == f".{ext}":
            return ext_map[ext]
    return None

def load_database(basename, tag=None, sample=None, pathname=""):  # pathname="resource/asnlib/publicdata/"
    filename = pathname + basename
    status_msg(f"Loading {tag+' ' if tag is not None else ''}[{filename}] ...")
    loader = choose_ext(basename, {'pickle': load_pickle, 'json': load_json})
    assert loader is not None, "*** Unrecognized file extension. ***"
    database = loader(filename)
    status_msg("... done!")
    print(f"\nThis data has {len(database):,} entries.")
    if sample is not None and isinstance(database, dict):
        print("\nHere is a sample:\n")
        pprint({key: database[key] for key in sample})
    return database

def save_database(database, basename, tag=None, sample=None, pathname=""):  #  pathname="resource/asnlib/publicdata/"
    if sample is not None and isinstance(database, dict):
        database = {key: database[key] for key in sample}
    print(f"This data has {len(database):,} entries.")
    filename = pathname + basename
    status_msg(f"\nSaving {tag+' ' if tag is not None else ''}[{filename}] ...")
    saver = choose_ext(basename, {'pickle': save_pickle, 'json': save_json})
    assert saver is not None, "*** Unrecognized file extension. ***"
    saver(filename, database)
    status_msg("... done!")

def sample_dict(d, k=1):
    """Extract a sample of at most `k` key-value pairs from the dictionary `d`."""
    assert k >= 0, f"*** The number of samples must be nonnegative (k={k}). ***"
    from random import sample
    # random.sample requires a sequence; dict views are rejected in Python 3.11+
    keys = sample(list(d.keys()), min(k, len(d)))
    return {k: d[k] for k in keys}

def sample_safely(x, k):
    """Returns a set of at most `k` uniform-random samples from `x`."""
    from random import sample
    # Convert to a list first: random.sample no longer accepts sets in Python 3.11+
    return set(sample(list(x), min(k, len(x))))

def subset_dict(d, ks):
    """Returns a subset of the dictionary `d` for the keys `ks`."""
    return {k: v for k, v in d.items() if k in ks}

def enum_map(x):
    map_dict = {e: k for k, e in enumerate(x)}
    return lambda i: map_dict[i]

def remap_dict(d, map_key=lambda k: k, map_val=lambda k: k):
    """Relabel the key-value pairs of a dictionary."""
    assert len(set(map_key(k) for k in d.keys())) == len(d.keys()), '*** `map_key` is not one to one ***'
#    assert len(set(map_val(k) for k in d.values())) == len(d.values()), '*** `map_val` is not one to one ***'
    return {map_key(k): map_val(v) for k, v in d.items()}

def remap_set(s, map_fun=lambda e: e):
    """Relabel the elements of a set."""
    s_new = {map_fun(e) for e in s}
    assert len(s_new) == len(s), "*** `map_fun` is not one to one ***"
    return s_new

In [None]:
# import files
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/active_users.pickle
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/cities.pickle
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/connections.pickle
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/food_and_drink_types.pickle
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/pois2.pickle
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tc_0
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tc_1
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tc_5
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tc_6

!mkdir tester_fw
%cd tester_fw

!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tester_fw/__init__.py
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tester_fw/test_utils.py
!wget https://raw.githubusercontent.com/gt-cse-6040/bootcamp/main/Module%200/Session%2010/Colab%20Support%20Files/tester_fw/testers.py

%cd ..

# Part A: Points-of-interest and cities #

The first part of the dataset consists of **points-of-interest** or **POIs**. A POI is a place that a person can visit.

Run this code cell to load the dataset of POIs.

In [None]:
pois = load_database('pois2.pickle',
                     tag="points-of-interest (POIs)")

There are over half-a-million POIs. Each one is stored as a key-value pair where
- the key is the **POI's ID**; and
- the value holds the **POI's attributes**, stored as another Python dictionary.

In the sample above, there are three POIs, each one with four attributes.
- **`'country_code'`**: A two-letter country code indicating in which country the POI is located. In this example, one is in Brazil (`'BR'`) and two in Turkey (`'TR'`).
- **`'lat'` and `'long'`**: The latitude and longitude coordinates of the POI.
- **`'type'`**: The type of POI. We see three types in this example: a `'University'`, an `'Automotive Shop'`, and a `'Turkish Restaurant'`.
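
As a minimal sketch of this layout, the snippet below builds a tiny dictionary of dictionaries and reads attributes from it. The IDs `'poi_a'` and `'poi_b'` and their coordinates are made up for illustration; they are not taken from the dataset.

```python
# Hypothetical POIs (made-up IDs and coordinates, for illustration only)
sketch_pois = {
    'poi_a': {'country_code': 'BR', 'lat': -23.5, 'long': -46.6, 'type': 'University'},
    'poi_b': {'country_code': 'TR', 'lat': 41.0, 'long': 29.0, 'type': 'Turkish Restaurant'},
}

# Outer key -> inner attribute dictionary -> attribute value
sketch_type = sketch_pois['poi_b']['type']

# Iterating with .items() yields (ID, attributes) pairs
sketch_types_by_id = {poi_id: attrs['type'] for poi_id, attrs in sketch_pois.items()}

print(sketch_type)
print(sketch_types_by_id)
```

The same two-level access pattern should apply to any dictionary of dictionaries with this shape.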

## Exercise 0 (**1** point): `find_food_and_drink_types` ##

To practice inspecting the POIs, suppose we want to find types of POIs that are associated with restaurants, bars, and the like. Implement the function,
```python
def find_food_and_drink_types(pois):
    ...
```
to accomplish this task.

**Inputs:** A collection of POIs as shown above, given as a Python dictionary of dictionaries.

**Task:** For each POI type, which is a string, split the string into "words" using whitespace as a delimiter.

Compare each word to the target list, below. Consider the word a "match" if it is exactly the same except for case. (So `'Food'` and `'FOOD'` match `'food'`, but `'foods'` and `'food'` do not match.)

The target words are:
```
'bar', 'bars', 'beer', 'brewery', 'cafe', 'cafes', 'coffee',
'cocktail', 'cocktails', 'drink', 'drinks', 'food',
'restaurant', 'restaurants', 'sake', 'tea', 'wine',
'whisky', 'whiskey'
```
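
The matching rule can be sketched on a single type string as follows. This is only an illustration: the helper name `has_target_word` is hypothetical, and `sketch_targets` is a small subset of the full target list.

```python
sketch_targets = {'food', 'restaurant', 'bar'}  # small subset of the target words

def has_target_word(poi_type, targets):
    """Return True if any whitespace-separated word of `poi_type`
    equals one of `targets`, ignoring case."""
    return any(word.lower() in targets for word in poi_type.split())

print(has_target_word('Fast Food Restaurant', sketch_targets))
print(has_target_word('Barber Shop', sketch_targets))
```

Note that `'Barber Shop'` does not match even though `'Barber'` contains the letters `bar`: the comparison is made word by word, not by substring.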

**Outputs:** Return a Python set containing the types whose words matched the target list. The returned types should match the originals _exactly_, including case. (See the demo.)

In [None]:
# Demo input:
demo_pois_ex0 = {'4c0e251bb1b676b0f788e186': {
                     'country_code': 'AT',
                     'lat': 48.257386,
                     'long': 16.400122,
                     'type': 'Fast Food Restaurant'},
                 '4c247550b012b713167d0893': {
                     'country_code': 'HR',
                     'lat': 45.347616,
                     'long': 14.300922,
                     'type': 'Pool'},
                 '4fa3c7ade4b0f90206220bbf': {
                     'country_code': 'KR',
                     'lat': 37.545866,
                     'long': 127.122266,
                     'type': 'Bike Shop'}}

<!-- Expected demo output text block -->
If `demo_pois_ex0` (above) is the input, then `find_food_and_drink_types(demo_pois_ex0)` should produce:
```
{'Fast Food Restaurant'}
```
<!-- Include any shout outs here -->

In [None]:
### Exercise 0 solution
def find_food_and_drink_types(pois):
    '''
    INPUT:
    pois is a dictionary of dictionaries where the key is the POI's ID and the value is a dictionary of POI attributes. The attribute we care about is called 'type'

    GOAL:
    Return a Python set containing the types whose words matched the target list. The returned types should match the originals exactly, including case.

    STRATEGY:
    1. Store all of the target keywords in a list (call this target_words)
    2. Create an empty set to hold the types whose words included the target words (call this food_and_drink_types)
    3. Iterate over pois dictionary
    4. Get the value of the poi (aka the nested dictionary of attributes). Let's call this poi_nested_dict
    5. Grab the value of 'type' from the nested dictionary of attributes (let's call this type_words)
    6. Split the type string into a list using whitespace as a delimiter (let's call this type_words_list)
    7. For each word in the split list, check to see if the lowercase of that word is found in the target_words list
    8. If it is found, add the type to the set 'food_and_drink_types' (consider breaking out of inner for loop if match is found - for speed purposes)
    9. After all iterations have completed, return the set 'food_and_drink_types'
    '''

    # SOLUTION:
    target_words = ['bar', 'bars', 'beer', 'brewery', 'cafe', 'cafes', 'coffee', 'cocktail', 'cocktails', 'drink', 'drinks', 'food', 'restaurant', 'restaurants', 'sake', 'tea', 'wine', 'whisky', 'whiskey']
    food_and_drink_types = set()

    for poi in pois:
      poi_nested_dict = pois[poi]
      type_words = poi_nested_dict['type']
      type_words_list = type_words.split()

      for word in type_words_list:
        if word.lower() in target_words:
          food_and_drink_types.add(type_words)
          break

    return food_and_drink_types


### demo function call
find_food_and_drink_types(demo_pois_ex0)

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 0. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.
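
When the output is a set, one possible way to see why a test failed is to compare the returned and expected sets with set differences. The sketch below uses stand-in values rather than the real tester variables. In the notebook, you would first pull the two sets out of `returned_output_vars` and `true_output_vars`; the key names inside those dictionaries are not shown here, so print the dictionaries (or their `.keys()`) to find them.

```python
# Stand-in values; in the notebook these would come from the tester variables.
sketch_returned = {'Fast Food Restaurant', 'Pool'}
sketch_expected = {'Fast Food Restaurant', 'Tea Room'}

extra = sketch_returned - sketch_expected    # returned, but not expected
missing = sketch_expected - sketch_returned  # expected, but not returned

print("Extra:", extra)
print("Missing:", missing)
```

An empty `extra` and an empty `missing` together mean the two sets are equal.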

In [None]:
### test_cell_ex0
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_0',
    'func': find_food_and_drink_types, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'pois':{
            'dtype':'dict', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'set',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'1ZePXzAcTR7lcNpmx1HRK0lT3v-Ikrg8mZ3n-wVFTBo=', path='')  # path='resource/asnlib/publicdata/'
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

**RUN ME: Food and drink types:** If you had a correct solution, you could use it to determine the food and drink types. We have precomputed this list for you, and here are the results. **Run this cell whether or not you completed Exercise 0.**

In [None]:
food_and_drink_types = load_database('food_and_drink_types.pickle')
print(f"\nThere are {len(food_and_drink_types):,} types that match the food and drink keywords. They are:")
print(food_and_drink_types)

# Part B: Cities #

A second component of the dataset is a collection of **city records**. Run the following cell to load it.

In [None]:
cities = load_database('cities.pickle',
                       tag="cities",
                       sample = ["Beijing, China", "Hartford, United States", "Stockholm, Sweden"])

Each city record is a key-value pair. The key is the **city name**, and the value is a dictionary of **city attributes**. The attributes are a two-letter country code, the latitude and longitude coordinates of the city center, and the type of city.

In this example, there are three cities: `'Beijing, China'`, `'Hartford, United States'`, and `'Stockholm, Sweden'`.

**Important!** Both the POIs and the cities are dictionaries of dictionaries, where in both cases the _outer_ dictionary's key-value pairs define a record, and the inner dictionaries have the keys, `'country_code'`, `'lat'`, `'long'`, and `'type'`.

## Exercise 1 (**1** point): `get_common_ccs` ##

Suppose we wish to determine which countries exist in **both** the POIs data **and** the cities data. Implement the function,
```python
def get_common_ccs(pois, cities):
    ...
```
to complete this task.

**Inputs:**
- `pois`: POIs, stored in a Python dictionary of dictionaries, mapping POI ID keys to attribute values (like earlier examples)
- `cities`: City records, stored in a Python dictionary of dictionaries, mapping city names to attribute values (like earlier examples)

You may assume that the inner dictionaries have the same four keys: `'country_code'`, `'lat'`, `'long'`, and `'type'`.

**Task:** Identify all country codes that exist in `pois` and in `cities`.

**Outputs:** Return a new **Python set** consisting of the common country codes.

In [None]:
### DEMO INPUTS ###
demo_pois_ex1 = \
    {'abc': {'country_code': 'US', 'lat': -5.2, 'long': 2.0, 'type': 'Restaurant'},
     'def': {'country_code': 'US', 'lat': -2.3, 'long': 6.8, 'type': 'Golf Course'},
     'ghi': {'country_code': 'PL', 'lat': 1.0, 'long': 6.5, 'type': 'Apartment'},
     'jkl': {'country_code': 'VN', 'lat': 11.0, 'long': 1.5, 'type': 'Post Office'}}

demo_cities_ex1 = \
    {'Seattle, United States': {'country_code': 'US', 'lat': -5.0, 'long': 3.0, 'type': 'Other'},
     'Richmond, United States': {'country_code': 'US', 'lat': -2.0, 'long': 7.0, 'type': 'Other'},
     'Hanoi, Vietnam': {'country_code': 'VN', 'lat': 10.0, 'long': 2.0, 'type': 'National and provincial capital'},
     'Curitiba, Brazil': {'country_code': 'BR', 'lat': 0.0, 'long': 6.0, 'type': 'Provincial capital'}}

<!-- Expected demo output text block -->
When a correct implementation runs on the demo inputs, it should return
```
{'US', 'VN'}
```
<!-- Include any shout outs here -->
The country code `'PL'` does **not** appear in the output since it exists in `pois` but **not** in `cities`. Likewise, `'BR'` is omitted because it exists in `cities` but **not** in `pois`.
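
To see which codes are dropped and why, here is a small sketch using Python's set operators. The two sets are simply the country codes read off the demo inputs by hand; the variable names are illustrative.

```python
sketch_poi_ccs = {'US', 'PL', 'VN'}    # country codes appearing in demo_pois_ex1
sketch_city_ccs = {'US', 'VN', 'BR'}   # country codes appearing in demo_cities_ex1

sketch_common = sketch_poi_ccs & sketch_city_ccs      # in both
sketch_only_pois = sketch_poi_ccs - sketch_city_ccs   # only in the POIs
sketch_only_cities = sketch_city_ccs - sketch_poi_ccs # only in the cities

print(sketch_common, sketch_only_pois, sketch_only_cities)
```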

In [None]:
### Exercise 1 solution
def get_common_ccs(pois, cities):
   '''
   INPUT:
   We are given 2 dictionaries of dictionaries as input: pois and cities.
   The 'country_code' attribute within the nested dictionary is all we really care about in this exercise.

   GOAL:
   Return a Set of all country codes that exist within both the pois and cities.

   STRATEGY:
   1. Create an empty set to hold our shared country codes between pois and cities. Let's call this 'shared_country_codes'.
   2. Create an empty set to hold our pois country codes. Let's call this 'pois_country_codes'.
   3. Create an empty set to hold our cities country codes. Let's call this 'cities_country_codes'.

   4. Iterate over 'pois' dictionary of dictionaries.
   5. Grab the country code from the nested dictionary.
   6. Add this country code to our set 'pois_country_codes'

   7. Iterate over the 'cities' dictionary of dictionaries.
   8. Grab the country code from the nested dictionary.
   9. Add this country code to our set 'cities_country_codes'.

   10. Find all shared country codes in 'pois_country_codes' and 'cities_country_codes'. Assign this to our variable 'shared_country_codes'
   11. Return 'shared_country_codes'
   '''

   # SOLUTION:
   shared_country_codes = set()
   pois_country_codes = set()
   cities_country_codes = set()

   for poi in pois:
      poi_nested_dict = pois[poi]
      pois_country_codes.add(poi_nested_dict['country_code'])

   for city in cities:
      city_nested_dict = cities[city]
      cities_country_codes.add(city_nested_dict['country_code'])

   shared_country_codes = pois_country_codes.intersection(cities_country_codes)
   return shared_country_codes


### demo function call
get_common_ccs(demo_pois_ex1, demo_cities_ex1)

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 1. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex1

from tester_fw.testers import Tester

conf = {
    'case_file':'tc_1',
    'func': get_common_ccs, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'pois':{
            'dtype':'dict', # data type of param.
            'check_modified':True,
        },
        'cities':{
            'dtype':'dict',
            'check_modified':True,
        },
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'set',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'1ZePXzAcTR7lcNpmx1HRK0lT3v-Ikrg8mZ3n-wVFTBo=', path='')  # path='resource/asnlib/publicdata/'
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

# Part E: Connections #

The last part of the dataset is the collection of **follower connections**. Run the cell below to load these data:

In [None]:
connections = load_database('connections.pickle')
print(f"\nThere are {len(connections):,} connections (nonsymmetric).")
print(f"For instance, here are user 54's connections:")
[x for x in connections if x[0] == 54]

The connections are stored as a Python list of pairs of user IDs. The example above shows people that user 54 is connected to on some social media site.

**Important caveat:** The connections are stored nonsymmetrically. That is, while the pair `(54, 528)` appears, user `54` happens to not appear in the connections list of `528`:

In [None]:
[x for x in connections if x[0] == 528]

This choice was made by the people who generated the data to save space. However, for our analysis, we will assume that if a pair `(a, b)` exists, it implies that **both** `a` is connected to `b` **and** `b` is connected to `a`.
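
Under that assumption, a user's connections can be found by checking both positions of every pair. The following is a rough sketch only: the helper name `neighbors` is hypothetical, and apart from `(54, 528)` the pairs are made up.

```python
sketch_connections = [(54, 528), (54, 601), (700, 528)]  # mostly made-up pairs

def neighbors(user, pairs):
    """Users connected to `user`, treating each pair (a, b) as symmetric."""
    as_first = {b for a, b in pairs if a == user}   # user appears as (user, b)
    as_second = {a for a, b in pairs if b == user}  # user appears as (a, user)
    return as_first | as_second

print(neighbors(528, sketch_connections))
```

Here `528` never appears in the first position, yet it still has connections once the pairs are read symmetrically.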

## Exercise 5 (**1** point): `get_active_users` ##

Let's define an **active user** to be one who has _both_ connections _and_ check-in visits. Implement a function,
```python
def get_active_users(connections, checkins):
    ...
```
to find and return all active users.

**Inputs:**
- `connections`: A Python list of pairs of user IDs (integers).
- `checkins`: Check-in records, stored as a dictionary of lists of dictionaries (mapping user IDs to lists of visits; recall Part C, Exercise 3 or Part D, Exercise 4).

**Task:** Determine which users have both connections (meaning they appear in the `connections` list) **and** check-ins (meaning they have at least one check-in visit).

**Output:** Return a Python set of all active users. If there are no active users, this function should return an empty set.

**Notes and hints:** Assume relationships are symmetric. That is, we say an active user `x` "has connections" if it appears in _either_ a pair `(x, y)` _or_ a pair `(y, x)`.
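
The two conditions can be sketched separately on made-up data, as below. One assumption to flag: the source does not say whether a user ID can map to an empty list of visits. This sketch treats such a user as having no check-ins, which is one reading of "at least one check-in visit."

```python
# Made-up inputs for illustration only
sketch_pairs = [(1, 2), (3, 1)]
sketch_visits = {
    1: [{'date': 'made-up date', 'poi': 'made-up-poi'}],
    2: [],                                                # no visits recorded
    4: [{'date': 'made-up date', 'poi': 'made-up-poi'}],  # visits, but no connections
}

# A user "has connections" if it appears in either position of any pair
users_with_connections = {user for pair in sketch_pairs for user in pair}

# A user "has check-ins" only if its list of visits is nonempty (assumption)
users_with_visits = {user for user, visits in sketch_visits.items() if len(visits) > 0}

sketch_active = users_with_connections & users_with_visits
print(sketch_active)
```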

In [None]:
### Define demo inputs
demo_connections_ex5 = \
[ (72049, 99545),
  (99545, 822927),
  (470311, 665256),
  (470311, 846214),
  (470311, 894292),
  (470311, 1045011),
  (502736, 921752),
  (668626, 921752),
  (686743, 921752),
  (921752, 944779),
  (921752, 972956),
  (921752, 1076615)]

demo_checkins_ex5 = \
{ 99545: [ { 'date': 'Thu Aug 16 22:55:43 +0000 2012',
             'poi': '4b6ef8a8f964a520f2d32ce3'},
           { 'date': 'Tue Aug 14 23:46:30 +0000 2012',
             'poi': '4c66a5b1e1da1b8daf6e9bc3'},
           { 'date': 'Thu Aug 09 18:45:08 +0000 2012',
             'poi': '4ce575825fce548110bf5baa'}],
  470311: [ { 'date': 'Sun Apr 08 08:53:52 +0000 2012',
              'poi': '4cddd943db1254815fee2cce'},
            { 'date': 'Sun Jun 03 00:01:22 +0000 2012',
              'poi': '4cddd943db1254815fee2cce'}],
  921752: [ { 'date': 'Sun Aug 19 13:35:29 +0000 2012',
              'poi': '4b58539af964a520ea5228e3'},
            { 'date': 'Sat May 05 10:24:08 +0000 2012',
              'poi': '4b55a124f964a520b2e927e3'},
            { 'date': 'Mon Aug 06 03:42:48 +0000 2012',
              'poi': '4ed2ff869adf25445a084c63'},
            { 'date': 'Thu Jul 05 11:01:38 +0000 2012',
              'poi': '4b3dd6ddf964a520089725e3'}]}

<!-- Expected demo output text block -->
Given the preceding input, a correct solution would return the set of active users,
```
{921752, 99545, 470311}
```
<!-- Include any shout outs here -->
These are the only user IDs in this input that have _both_ visits _and_ connections.

In [None]:
### Exercise 5 solution
def get_active_users(connections, checkins):
    '''
    INPUT:
    We are given 2 inputs:
    connections: A Python list of pairs of user IDs (integers)
    checkins: Check-in records, stored as a dictionary of lists of dictionaries

    GOAL:
    Determine which users have both connections (meaning they appear in the connections list) and check-ins (meaning they have at least one check-in visit).
    Return a Python set of all active users. If there are no active users, this function should return an empty set.

    STRATEGY:
    1. Create an empty set to hold all active users. Let's call this 'active_users'.
    2. Create an empty set to hold user ids in 'connections'. Let's call this 'connections_users'.
    3. Create an empty set to hold user ids in 'checkins'. Let's call this 'checkins_users'.

    4. Iterate over each pair of user ids in 'connections'
    5. Add each of the two user ids to 'connections_users'

    6. Iterate over user id keys in 'checkins'
    7. Add each user id to 'checkins_users'

    8. Find intersection of 'connections_users' and 'checkins_users'. Assign this to our variable 'active_users'.
    9. Return 'active_users'
    '''

    # SOLUTION:
    active_users = set()
    connections_users = set()
    checkins_users = set()

    for pair in connections:
      connections_users.add(pair[0])
      connections_users.add(pair[1])

    for user_id in checkins:
      checkins_users.add(user_id)

    active_users = connections_users.intersection(checkins_users)
    return active_users


### demo function call
get_active_users(demo_connections_ex5, demo_checkins_ex5)

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 5. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex5

from tester_fw.testers import Tester

conf = {
    'case_file':'tc_5',
    'func': get_active_users, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'connections':{
            'dtype':'list', # data type of param.
            'check_modified':True,
        },
        'checkins':{
            'dtype':'dict', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'set',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'1ZePXzAcTR7lcNpmx1HRK0lT3v-Ikrg8mZ3n-wVFTBo=', path='') # path='resource/asnlib/publicdata/'
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

**RUN ME: Active users:** If you had a correct solution, you could use it to find all active users in the full dataset. We have done just that.

**Run this cell whether or not you completed Exercise 5**. It will create a Python set of active users' IDs called `active_users`.

In [None]:
active_users = load_database('active_users.pickle')
print(f"\nThere are {len(active_users):,} active users.")
print("Here are a few of them:", *list(active_users)[:5], "...")

## Exercise 6 (**1** point): `form_connection_vectors` ##

Given some user, `a`, the **connection vector** of `a` is a Python set consisting of all other users connected to `a`. Implement the function,
```python
def form_connection_vectors(connections):
    ...
```
to construct connection vectors for all users.

**Input:**
- `connections`: A Python list of user ID-pairs, `(a, b)`, signifying a mutual connection between user `a` and user `b`. By "mutual connection," we mean that `a` is connected to `b` **and** `b` is connected to `a`. Therefore, if there is a pair `(a, b)`, then `b` should be in the connection vector of `a` and `a` should be in the connection vector of `b`.

**Task:** Sweep through the connections and construct connection vectors for all users.

**Outputs:** Return a Python dictionary of Python sets. The dictionary keys are user IDs; their values are the connection vectors (Python sets).

In [None]:
### Define demo inputs
demo_connections_ex6 = [(2, 10), (6, 7), (7, 0), (7, 9), (11, 10), (12, 10), (13, 10)]

<!-- Expected demo output text block -->
Suppose the input connections are given by `demo_connections_ex6`. Then a correct solution to this problem would produce the following output:
```
{ 0: {7},
  2: {10},
  6: {7},
  7: {0, 9, 6},
  9: {7},
  10: {2, 11, 12, 13},
  11: {10},
  12: {10},
  13: {10}}
```
<!-- Include any shout outs here -->
Observe that `0` only appears in the pair `(7, 0)`; therefore, the connection vector of `0` is the singleton set `{7}`. By contrast, `7` appears in the pairs `(6, 7)`, `(7, 0)`, and `(7, 9)`; therefore, the connection vector of `7` is the set `{0, 6, 9}`. (Recall that sets are equal regardless of element order.)
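
The equality rules mentioned above can be checked directly. The sketch below also builds the vectors for two of the demo pairs using a `collections.defaultdict`, which is one possible container choice and not the only one, and shows that it compares equal to a plain dictionary with the same contents.

```python
from collections import defaultdict

# Sets ignore element order when compared
same_sets = ({0, 9, 6} == {0, 6, 9})

# Build connection vectors for two pairs from the demo input
sketch_vectors = defaultdict(set)
for a, b in [(6, 7), (7, 0)]:
    sketch_vectors[a].add(b)  # b belongs to a's connection vector
    sketch_vectors[b].add(a)  # a belongs to b's connection vector

sketch_plain = {0: {7}, 6: {7}, 7: {0, 6}}
same_dicts = (sketch_vectors == sketch_plain)

print(same_sets, same_dicts)
print(dict(sketch_vectors))
```

One caveat when inspecting a `defaultdict` while debugging: indexing a missing key, as in `sketch_vectors[99]`, silently creates an empty entry, so prefer `99 in sketch_vectors` or `.get(99)` for lookups.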

In [None]:
### Exercise 6 solution
def form_connection_vectors(connections):
    '''
    INPUT:
    connections: A Python list of user ID-pairs, (a, b), signifying a mutual connection between user a and user b.
    By "mutual connection," we mean that a is connected to b and b is connected to a.
    Therefore, if there is a pair (a, b), then b should be in the connection vector of a and a should be in the connection vector of b.

    GOAL:
    Sweep through the connections and construct connection vectors for all users.
    Return a Python dictionary of Python sets. The dictionary keys are user IDs; their values are the connection vectors (Python sets).

    STRATEGY:
    1. Create a default dictionary to hold our user ids and the set of their connection vectors. So it will look like: {user1: {set1}, user2: {set2}, etc.. }.
       Let's call this 'connection_vectors'.
    2. Iterate over pairs in 'connections' list
    3. For each pair, add both directions to 'connection_vectors' default dictionary. Example: pair is (123, 456). We add 456 to the set for the user_id key 123.
       And we also add 123 to the set for user_id key 456. So we end up with: {123: {456}, 456: {123}}
    4. Return 'connection_vectors'
    '''

    # SOLUTION:
    from collections import defaultdict
    connection_vectors = defaultdict(set)

    for pair in connections:
      connection_vectors[pair[0]].add(pair[1])
      connection_vectors[pair[1]].add(pair[0])

    return connection_vectors


### demo function call
form_connection_vectors(demo_connections_ex6)

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 6. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex6

from tester_fw.testers import Tester

conf = {
    'case_file':'tc_6',
    'func': form_connection_vectors, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'connections':{
            'dtype':'list', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'dict',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'1ZePXzAcTR7lcNpmx1HRK0lT3v-Ikrg8mZ3n-wVFTBo=', path='') # path='resource/asnlib/publicdata/'
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise


print('Passed! Please submit.')