# Agenda: Week 3 (Dictionaries and files)

1. Recap data structures so far
2. Dictionaries ("dicts")
    - Storing
    - Retrieving
    - Keys and values
    - Modifying dictionaries
3. Accumulating in dicts
4. Accumulating the unknown
5. Looping over dicts
6. Files
    - File objects
    - Reading from files
    - Writing to files
    - Using the `with` statement when working with files
    
We'll use files from this zipfile: https://files.lerner.co.il/exercise-files.zip    

# Data structures so far

We've seen that different types of data can be stored in different types of structures. Each structure provides us with different functionality and speed trade-offs.

Integers are really small and very fast -- but very annoying if we're going to use them for text.

Strings are great for text -- but very annoying if we want to store multiple things (that aren't characters).

Lists are great for collections -- but very annoying if I just want to hold onto the text of a book or article.

Generally speaking:
- Integers are for whole numbers, and floats are for numbers with a fractional part
- Strings are for text (of any length), or collections of characters
- Lists are the go-to ordered collection in Python.  Lists can contain anything at all, and can be any size we want. We can append to the end, and we can remove from anywhere.
- Tuples are Python's version of structs or records. They are immutable, but more importantly, we typically use tuples when we have a collection of different types.

Some examples:
- If I have a text document, I'll store that in a string.
- If I have a few words that our company never wants used in a press release, then I'll put those in a list, and search for each of them in outgoing correspondence.
- If I have information about an employee -- their name, age, address, and salary -- then I'll put those in a tuple, because it's a collection of information of different types.
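
Here's a rough sketch of those three examples. The variable names and all of the values are made up for illustration:

```python
# A text document goes in a single string
document = 'Our new product is revolutionary and synergistic.'

# Words we never want in a press release go in a list
banned_words = ['synergistic', 'disruptive']

# Search the document for each banned word
for one_word in banned_words:
    if one_word in document:
        print(f'Found banned word: {one_word}')

# An employee record goes in a tuple: name, age, address, salary
employee = ('Jane Doe', 42, '123 Main St.', 100_000)

# Unpack the tuple into separate variables
name, age, address, salary = employee
print(f'{name} is {age} years old')
```

Notice that the list holds values of the same type (all strings), while the tuple mixes strings and integers.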

# Getting help in Jupyter

Every Python environment is a bit different, but in Jupyter, you have a few ways to get help, and to find out what your options are.

1. At any point, you can press `TAB` to see your completion options. If you press `TAB` halfway through a variable name, then Jupyter will try to complete it. If the completion is ambiguous, then it'll show you a menu of possibilities. If you press `TAB` right after a `.`, then it'll show you all of the methods that you can invoke on that type of object.
2. You can use the `help` function to find out more about something, as in `help(len)` or `help(str.upper)`. Notice that if I'm getting help on a function, I don't invoke it with `()`, but just pass its name as an argument to `help`.
3. You can type any variable, function, or other name in Jupyter, and put a `?` after it, to get information about it. If you use `??`, you'll sometimes get more information, such as the definition of a function.

In [2]:
variable_x = 100
variable_y = [10, 20, 30]
variable_z = 'Hello, out there!'

In [3]:
help(variable_y)

Help on list object:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate sign

In [5]:
variable_y??

# Dictionaries (aka "dicts")

When we store data in a list, we know several things:

1. We can store any type of data that we want.
2. Each new element in the list is put at a new index, 1 higher than the previous final element's index. Indexes start at 0.
3. We can update the values in a list by assigning to the list at a particular index.
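
A short sketch of those three points, using a made-up list (the name `mylist` and its contents are just for illustration):

```python
# 1. A list can hold any type of data, even mixed together
mylist = [10, 'abcd', [1, 2, 3]]

# 2. Indexes start at 0, so the final element here is at index 2.
#    Appending puts the new element at index 3.
mylist.append('new')
print(mylist[3])      # new

# 3. Assigning to an index replaces the value stored there
mylist[0] = 999
print(mylist)         # [999, 'abcd', [1, 2, 3], 'new']
```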

There are some problems with this, though:

1. We have to use integers to retrieve our values.
2. Those integers start at 0, and are rather inflexible.
3. If we want to search for a value in our list, we need to (potentially) go through all of the values until we either find it or see that it's not there.

Imagine we're running a new streaming service. Someone wants to know whether we have a particular movie in stock. Can you imagine looking through 1 million films in a list, one at a time, to see if we have it in stock? That would take forever!
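
To make that cost concrete, here's a minimal sketch that counts how many comparisons a one-at-a-time search needs. The movie titles are placeholders that I'm generating, not a real catalog, and I've deliberately put the movie we want at the very end of the list:

```python
# Build a pretend catalog of 1 million titles
movies = []
for i in range(1_000_000):
    movies.append(f'Movie {i}')

looking_for = 'Movie 999999'   # the final title in the list

steps = 0
found = False

# Check each title in turn, stopping as soon as we find a match
for one_movie in movies:
    steps += 1
    if one_movie == looking_for:
        found = True
        break

print(found)   # True
print(steps)   # 1000000
```

If the title weren't in the list at all, we would still need all 1 million comparisons before we could say so.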

Dictionaries provide a wonderful alternative:
- They are more flexible with their indexes (known as "keys")
- They are far faster to search through than lists
- They also provide us with more semantic power than lists do
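
Here's a small sketch of what that looks like. The dict below is an illustrative example of my own, with movie titles as the keys and release years as the values:

```python
# Keys are strings (movie titles), values are integers (release years)
movies = {'Casablanca': 1942, 'Jaws': 1975, 'Alien': 1979}

# Retrieve a value via its key, rather than via a numeric index
print(movies['Jaws'])        # 1975

# "in" checks the keys, and doesn't need to go through them one by one
print('Jaws' in movies)      # True
print('Titanic' in movies)   # False
```

Because the key is a meaningful name rather than a position, the code also says more about what we're storing.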

A dict is also known by many other names in other languages:
- key-value store
- name-value store
- hash table
- hash map
- hash
- map
- associative array

You can think of a dict as a two-column table, in which the left column contains keys and the right column contains values.
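
To see that two-column picture, we can print a dict one row at a time. This is only a sketch, with a made-up dict called `prices`:

```python
prices = {'apple': 3, 'banana': 1, 'cherry': 5}

# Left column: keys. Right column: values.
print('key        value')
for key, value in prices.items():
    print(f'{key:<10} {value}')
```

Each key appears only once in the left column, and each one leads us to exactly one value in the right column.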

A list can only have integer k