# Sets, Records, and Maps
#### Introduction to Programming with Python

## Sets

A __set__ is like a list, but its items don't have any order. You create one with syntax similar to a list's, except that you use `{ }` instead of `[ ]`. Here's an example that puts the same data into both a list and a set. Notice that when we print them, the set's items appear in a different order from the one we typed - that's because sets don't store their items in order, so the printed order can differ from one run of the program to the next.

In [1]:
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}

print(my_list)
print(my_set)

['bread', 'milk', 'rice', 'butter', 'eggs', 'apples']
{'apples', 'eggs', 'milk', 'butter', 'rice', 'bread'}
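
One way to see what "no order" means is to compare two collections that hold the same items written in a different order. This is only a small sketch; the variable names `list_a`, `list_b`, `set_a`, and `set_b` are made up for illustration.

```python
# Same items, typed in a different order (names are illustrative only)
list_a = ["bread", "milk", "rice"]
list_b = ["rice", "bread", "milk"]

set_a = {"bread", "milk", "rice"}
set_b = {"rice", "bread", "milk"}

# Lists are compared position by position, so order matters
print(list_a == list_b)

# Sets only care about which items are present, so order does not matter
print(set_a == set_b)
```

The first comparison prints `False` and the second prints `True`.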


## Adding items in sets vs. lists

Because sets have no order, there is an _add_ method instead of _append_ and _insert_.

Remember, we'd use `.append()` to add a new item to `my_list`. The corresponding way to do it with a set is to use `.add()`.

In [2]:
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}

my_list.append("bananas")
my_set.add("bananas")

print(my_list)
print(my_set)

['bread', 'milk', 'rice', 'butter', 'eggs', 'apples', 'bananas']
{'apples', 'eggs', 'bananas', 'milk', 'butter', 'rice', 'bread'}
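
To see why a set has no _insert_, it may help to look at what _insert_ needs on a list: a position. The sketch below uses a shortened grocery list so the output is easy to read.

```python
my_list = ["bread", "milk", "rice"]
my_set = {"bread", "milk", "rice"}

# insert needs a position: put "bananas" at index 0, the front of the list
my_list.insert(0, "bananas")
print(my_list)  # ['bananas', 'bread', 'milk', 'rice']

# a set has no positions, so add only takes the item itself
my_set.add("bananas")
print("bananas" in my_set)  # True
```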


### Reflection question

The following code has some things you can do with a list. Write a comment for what each of them does, and then try to see if you can do it with a set too. Write down any differences there are between how the list and set work.

In [None]:
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}

print( len(my_list) )
print( my_list[2] )
print( "milk" in my_list)
my_list.remove("eggs")
print( my_list )
print(my_list.pop())
print( my_list )

for item in my_list:
    print(item)


## Maps vs. Records

Now let's go back to thinking about dictionaries. We've seen them used in two different ways, so let's give some names to these different uses: *maps* and *records*.

A __map__ is a dictionary where all the keys are the same kind of thing and all the values are the same kind of thing. The `cs_dept_phonebook` was a *map*.

A __record__ is a dictionary where each key represents a different piece of data about the same item. We could make a record with my contact information like `manley_record` below. Note that all of the things in the record are information about one entity. The keys are different data **fields** and the values give this entity's specific values for each of them.

In [None]:
cs_dept_phonebook = { 
                        "Porter"  : "3041",
                        "Case"    : "4618",
                        "Reza"  : "1972",
                        "Moore" : "3110",
                        "Manley"  : "2177",
                        "Urness"  : "2188",
                        "Migunov"   : "1810",
                        "Rieck"   : "3795"
                    } #a map

manley_record = {
                    "name" : "Eric Manley",
                    "email" : "eric.manley@drake.edu",
                    "building" : "Collier-Scripps Hall",
                    "room" : 327,
                    "phone" : "(515) 271-2177"
                }
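
The difference also shows up in how the two dictionaries tend to be used. The following is a rough sketch using shortened copies of the dictionaries above; it is meant as an illustration of typical usage, not the only way to work with them.

```python
# Shortened copies of the dictionaries above
cs_dept_phonebook = {"Porter": "3041", "Case": "4618", "Manley": "2177"}
manley_record = {
    "name": "Eric Manley",
    "email": "eric.manley@drake.edu",
    "room": 327,
}

# Map: every key is the same kind of thing, so looking up any one of them
# works the same way, and looping over all of them makes sense
print(cs_dept_phonebook["Manley"])
for name in cs_dept_phonebook:
    print(name, "->", cs_dept_phonebook[name])

# Record: each key is a specific field, so we usually ask for fields by name
print(manley_record["name"], "is in room", manley_record["room"])
```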

### Reflection questions

Consider the two different ways that we stored city populations (showing only a portion of them for brevity).

`[{'city': 'DES MOINES', 'pop': 148155}, 
{'city': 'CEDAR RAPIDS', 'pop': 116146},
{'city': 'DAVENPORT', 'pop': 95743},
...
]
`

vs. 

`{'ACKWORTH': 491, 
'ADAIR': 1748, 
'ADEL': 4884, 
...
}`

Which one of these is a *map*, and which one is a list of *records*?

What is something (i.e., a computational task) that the *map* organization of this data is useful for?

What is something that the list of *records* organization of this data is useful for?



## Movie sales dataset

Download the file [`HighestGrossingMovies.json`](https://raw.githubusercontent.com/ericmanley/IntroToProgrammingWithPython/refs/heads/main/HighestGrossingMovies.json) and then put it where you can use it with your Python code (e.g., upload it to Colab). Start by loading the data and investigating how it is organized.

*Note:* The data in this file is originally from here (though I restructured it for use in this course): https://www.kaggle.com/sanjeetsinghnaik/top-1000-highest-grossing-movies . Also note that the data is a few years old... *it's an older file, but it checks out.*





In [None]:
import json

with open("HighestGrossingMovies.json") as moviefile:
    movies = json.load(moviefile)
    
print(movies)

### Reflection question

Is this data organized as a *map* or *records*?

## Exercise 1

For the code above, answer the following questions:

1. What is the type of the `movies` variable?
2. How could you get it to print just the first record? Change the code and run it.
3. How could you get it to print the first 5 records? Change the code and run it.
    


## Exercise 2

Run this code after you have loaded the `movies` variable. Describe what it does.

In [None]:
for m in movies:
    print(m["Title"])

## Exercise 3

Write the code that will print out the titles of all movies that contain `"Star Wars"` in their title.

_Hint:_ You can check if one string is a substring of another with code like

In [None]:
"Star Wars" in "Star Wars: Episode VII - The Force Awakens (2015)"

## Exercise 4

Write the code that will print the names of all comedies (i.e., movies which list "Comedy" in their `Genre` field).

## Exercise 5

Write the code that will determine which comedy had the highest world sales (note that the file seems to be sorted by domestic sales, not world sales, so you will have to loop through all the records).

## Exercise 6

Write a function called `most_popular_in_genre` which takes in a list of movie records (in the same format as the `movies` variable above) and the name of a movie genre and returns the record of the movie from that genre with the highest world sales.

For example, if I call the function like this:

In [None]:
print( most_popular_in_genre(movies,"Comedy") )

it should result in 

```
{'Title': 'Frozen II (2019)', 'Summary': "Anna, Elsa, Kristoff, Olaf and Sven leave Arendelle to travel to an ancient, autumn-bound forest of an enchanted land. They set out to find the origin of Elsa's powers in order to save their kingdom.", 'Distributor': 'Walt Disney Studios Motion Pictures', 'Release Date': 'November 20, 2019', 'Domestic Sales': 477373578, 'International Sales': 972653355, 'World Sales': 1450026933, 'Genre': "['Adventure', 'Animation', 'Comedy', 'Family', 'Fantasy', 'Musical']", 'Runtime': '1 hr 43 min', 'MPAA Rating': 'PG'}
```