# PYTHON DATA STRUCTURES

We'll discuss the object-oriented features of Python's data structures, when they should be used instead of a regular class, and when they should not be used.

In particular, we'll be covering the following topics:

- Tuples and named tuples
- Dataclasses
- Dictionaries
- Lists and sets
- Three types of queues

# EMPTY OBJECTS

Let's start with the most basic Python built-in, one that we've used implicitly many times already, the one (it turns out) we've extended in every class we have created: the object.

Technically, we can instantiate an object without writing a subclass, as follows:

In [7]:
o = object()

In [8]:
o.x = 100

AttributeError: 'object' object has no attribute 'x'

Unfortunately, as you can see, it's not possible to set any attributes on an object that was instantiated directly. 

This isn't because the Python developers wanted to force us to write our own classes, or anything so sinister. 

They did this to save memory – a lot of memory.

When Python allows an object to have arbitrary attributes, it takes a certain amount of system memory to keep track of what attributes each object has, for storing both the attribute name and its value.

Even if no attributes are stored, memory is allocated to make it possible to add attributes.

Given the dozens, hundreds, or thousands of objects (every class extends the `object` class) in a typical Python program, this small amount of memory would quickly become a large amount of memory.

So, by default, Python disables arbitrary attributes on `object` and several other built-ins.

It is possible to restrict arbitrary attributes on our own classes using `__slots__`.

Slots are part of Chapter 12, Advanced Design Patterns. We'll look at them as a way to save memory for objects that occur many, many times.
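As a minimal sketch of the idea (the `Point` class and its attribute names are hypothetical, chosen only for illustration), declaring `__slots__` removes the per-instance `__dict__`, so assigning an undeclared attribute fails much as it does for a bare `object()`:

```python
class Point:
    # Only these attribute names are allowed on instances.
    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


p = Point(1.0, 2.0)
p.x = 10.0  # Fine: "x" is declared in __slots__.

slot_error = None
try:
    p.z = 3.0  # "z" is not declared, so this raises AttributeError.
except AttributeError as error:
    slot_error = error

# A slotted class like this one has no per-instance attribute dictionary.
has_dict = hasattr(p, "__dict__")
```

Here `has_dict` is `False`, which is where the memory saving comes from.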

It is, however, trivial to create an empty object class of our own; we saw it in our earliest example:

In [9]:
class MyObject:
    pass

In effect, `class MyObject` is equivalent to `class MyObject(object)`.

As seen earlier, it is possible to set attributes on such classes.

In [10]:
m = MyObject()

In [11]:
m.x = "hihihi"

In [12]:
m.x

'hihihi'

If we wanted to group an unknown number of attribute values together, we could store them in an empty object like this.

The problem with this approach is the lack of an obvious schema that we can use to understand what attributes should be present and what types of values they will have.

A central theme of this book is that classes and objects should be used only when you want to specify both data and behaviors.
  
Therefore, it is important to decide from the outset whether the data is purely data, or whether it is an object in disguise. 

Once that design decision is made, the rest of the design can grow from the seed concept.

# TUPLES AND NAMED TUPLES

Tuples are objects that can store a specific number of other objects in sequence.

They are immutable, meaning we cannot add, remove or replace objects on the fly.

The primary benefit of a tuple's immutability is that a tuple of immutable objects (such as strings, numbers, and other tuples) has a hash value, allowing us to use it as a key in a dictionary or as a member of a set.
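A short sketch of what hashability buys us, using made-up coordinate and color values: a tuple of immutable items works as a dictionary key or set member, while a tuple that contains a mutable object (a list, here) does not.

```python
# A tuple of immutable values is hashable, so it can be a dictionary key.
origin = (0, 0)
labels = {origin: "origin", (1, 0): "unit x"}

# Tuples can also be set members; the duplicate (0, 0) is stored only once.
point_set = {(0, 0), (1, 0), (0, 0)}

# A tuple containing a mutable object (a list) cannot be hashed.
mixed = ("red", [255, 0, 0])
try:
    hash(mixed)
    mixed_is_hashable = True
except TypeError:
    mixed_is_hashable = False
```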

Instances of the `tuple` class are used to store data; behaviour cannot be associated with the built-in `tuple` class.

If we require behaviour to manipulate a tuple, we have to pass the tuple into a function (or a method on another object) that performs the required behaviour.

Tuples overlap with the idea of coordinates or dimensions. 

A mathematical (x, y) pair or an (r, g, b) color are examples of tuples.

The order matters, a lot: the color (255, 0, 0) looks nothing like (0, 255, 0).

The primary purpose of a tuple is to aggregate different pieces of data together into one container.

We create a tuple by separating values with commas, and optionally surrounding them with parentheses.

The following two assignments are identical (they record a stock, the current price, the 52-week high, and the 52-week low, for a rather profitable company):

In [13]:
stock1 = "AAPL", 123.52, 137.98, 53.15
stock2 = ("AAPL", 123.52, 137.98, 53.15)


If we are grouping a tuple inside some other object, such as a function call, list comprehension, or generator, the parentheses are required.

Otherwise, it would be impossible for the interpreter to know whether it is a tuple or the next function parameter:

In [14]:
import datetime

def middle(stock, date):
    symbol, current, high, low = stock
    return (((high + low) / 2), date)

middle(("MSFT", 75.00, 75.03, 74.90), datetime.date(2018, 2, 1))

(74.965, datetime.date(2018, 2, 1))

When Python displays a tuple, it uses what is called the **canonical** representation; this will always include `()`'s, making the `()`'s a common practice even when they are not strictly required.

The `return` statement in `middle()`, specifically, has redundant `()`'s around the tuple it creates.

The degenerate cases include a tuple with only one item, written like this: `(2.718,)`.

The extra comma is required here. An empty tuple is `()`.

In [15]:
a = (42,)

In [16]:
a

(42,)

It is sometimes surprising that the variable `a` will be a one-tuple.

**The trailing comma is what creates an expression list with a single item; this is the value of the tuple.**

The parentheses are required for two things:

1-) to create an empty tuple
2-) to separate a tuple from other expressions.

For example, the following code will create a nested tuple:

In [17]:
b = (42, 3.14), (2.718, 2.618),

In [18]:
b

((42, 3.14), (2.718, 2.618))

Here, the trailing comma after the last item is politely ignored; a trailing comma changes the result only when the expression has a single item.

The `middle()` function also illustrates **tuple unpacking**.

The first line inside the function unpacks the `stock` parameter into four different variables.

*The tuple has to be exactly the same length as the number of variables, or Python will raise a `ValueError`.*
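The length rule can be sketched as follows, reusing the MSFT values from the call above; the flag name `unpack_failed` is only for illustration.

```python
stock = ("MSFT", 75.00, 75.03, 74.90)

# Four names for four items: unpacking succeeds.
symbol, current, high, low = stock

# Three names for four items: Python raises ValueError.
try:
    symbol, current, high = stock
    unpack_failed = False
except ValueError:
    unpack_failed = True
```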

Unpacking is a very useful feature in Python.

A tuple groups related values together to make storing and passing them around simpler.

The moment we need to access the pieces, we can unpack them into separate variables.

Of course, sometimes we only need access to one of the variables in the tuple.

We can use the same syntax that we use for other sequence types (lists and strings) to access an individual value.

In [19]:
s = ("AKBNK", 36.24, 48.56, 25.65)

In [20]:
high = s[2]

In [21]:
high

48.56

We can use slice notation to extract larger pieces of tuples.


In [22]:
s[1:3]

(36.24, 48.56)

These examples, while illustrating how flexible tuples can be, also demonstrate one of their major disadvantages: **readability**.

How does someone reading this code know what is in position 2 of a specific tuple?

They would have to paw through the code to find where the tuple was packed or unpacked before they could discover what it does.

Accessing tuple members directly is fine in some circumstances, but don't make a habit of it. 

The index values become what we might call magic numbers: numbers that seem to come out of thin air with no apparent meaning within the code. 

This opacity is the source of many coding errors and leads to hours of frustrated debugging. 

**Try to use tuples only when you know that all the values are going to be useful at once and the tuple is normally going to be unpacked when it is accessed.**
 
Think of (x, y) coordinate pairs and (r, g, b) colors, where the number of items is fixed, the order matters, and the meaning is clear.

One way to provide some useful documentation is to define numerous little helper functions. This can help to clarify the way a tuple is used. Here's an example.

In [23]:
def high(stock):
    symbol, current, high, low = stock
    return high

In [24]:
high(s)

48.56

We need to keep these helper functions collected together into a single namespace. 

Doing this causes us to suspect that a class is better than a tuple with a lot of helper functions. 

There are other ways to clarify the contents of tuples, the most important of which is the `typing.NamedTuple` class.

# NAMED TUPLES VIA typing.NamedTuple

What do we do when we want to group values together but know we're frequently going to need to access them individually? 

1-) We could use an empty object instance, as discussed previously. We can assign arbitrary attributes to this object. 

But without a good definition of what's allowed and what types are expected, we'll have trouble understanding this.

2-) We could use a dictionary. This can work out nicely, and we can formalize the acceptable list of keys for the dictionary with the `typing.TypedDict` hint.

3-) We can use a `@dataclass`, the subject of the next section in this chapter.

4-) We can also provide names to the positions of a tuple. While we're at it, we can also define methods for these named tuples, making them super helpful.
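As a minimal sketch of the fourth option, the stock tuple used earlier could be given named positions with `typing.NamedTuple`. The class name `Stock` and the `middle()` method are assumptions made for illustration; the field names follow the unpacking order used in the earlier examples.

```python
from typing import NamedTuple


class Stock(NamedTuple):
    """A stock quote; the class and method names here are illustrative."""

    symbol: str
    current: float
    high: float
    low: float

    def middle(self) -> float:
        # Midpoint between the high and the low price.
        return (self.high + self.low) / 2


s = Stock("AKBNK", 36.24, 48.56, 25.65)

by_name = s.high   # Readable access by field name.
by_index = s[2]    # Positional access still works: it is a tuple.
symbol, current, high, low = s  # So does unpacking.
```

Accessing `s.high` says what the value means, which addresses the magic-number problem of `s[2]` while keeping the tuple's immutability.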

