# Tuples as Data Structures and Named Tuples

"Read-only lists": at least, that's how many introductions to Python present tuples.

This isn't wrong, but there's a lot more going on with tuples...

If you only think of tuples as read-only lists, you're going to miss out on some interesting ideas.

We really need to think of tuples also as data records, where the **position of a value has meaning**.

This is why we are going to start looking at tuples before we even cover sequence types.

We are going to focus on tuples as **data records or structures** and on **named tuples**.

-----
## Tuples as Data Structures

**Tuples vs Lists vs Strings** (Sequence Types)

- **all** are ***containers***

- **all** ***order matters***

- Tuples tend to be **Heterogeneous**, but can also be Homogeneous

- Lists tend to be **Homogeneous**, but can also be Heterogeneous

- Strings are always Homogeneous (every element is a character)

- **all** are ***indexable***

- **all** are ***iterable***

- Tuples/Strings are immutable, the length **cannot** be changed, the order **cannot** be changed, **cannot** do in-place sorts and **cannot** do in-place reversals

- Lists are mutable, the length can be changed, the order can be changed, can do in-place sorts and can do in-place reversals
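The comparison above can be sketched in a few lines. This is only an illustration; the variable names (`tup`, `lst`, `text`) are made up for the example.

```python
tup = (10, 'a', 3.5)      # a heterogeneous tuple
lst = [1, 2, 3]           # a homogeneous list
text = 'abc'              # a string: always characters

# All three are indexable and iterable
first_items = (tup[0], lst[0], text[0])
all_items = [list(seq) for seq in (tup, lst, text)]

# Lists support in-place changes to length and order
lst.append(4)
lst.reverse()

# Tuples and strings reject item assignment with a TypeError
errors = []
for immutable in (tup, text):
    try:
        immutable[0] = 'x'
    except TypeError as ex:
        errors.append(type(ex).__name__)
```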



### Immutability of Tuples

elements cannot be added or removed

the order of elements cannot be changed

works well for representing data structures:

- Point(x,y)

- Circle(x,y,r)

Here, we gave meaning to the position of the elements in the tuple, relying on its fixed order.
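As a small sketch of this idea, assuming the conventions (x, y) for a point and (x, y, r) for a circle, with illustrative values:

```python
from math import pi

point = (10, 20)        # convention: (x, y)
circle = (0, 0, 5)      # convention: (x, y, r)

# The fixed order lets us rely on position when unpacking
x, y, r = circle
area = pi * r ** 2

# Elements cannot be replaced, so the record stays intact
try:
    circle[2] = 10
except TypeError:
    changed = False
else:
    changed = True
```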



### Tuples as Data Records

Think of a tuple as a data record where the position of the data has meaning

    london = ('London', 'UK', 8_780_000)
    new_york = ('New York', 'USA', 8_500_000)

Because tuples, strings and integers are immutable, we are guaranteed that the stored data won't change.

In [1]:
def print_tuple(t):
    for e in t:
        print(e)

In [3]:
print_tuple((1,2,3,4))

1
2
3
4


In [4]:
a = 'a', 10, 200
a[0]

'a'

In [5]:
a[1]

10

In [6]:
a = 1,2,3,4,5,6

In [7]:
a[2:5]

(3, 4, 5)

In [10]:
a = 'a', 10, 200
x,y,z = a
x,y,z

('a', 10, 200)

In [12]:
t = ('Manu', 'ITC', 22, 'Ortiz', 'CG', 'Hernandez')
name, *_, last_name = t

In [13]:
name, last_name

('Manu', 'Hernandez')

In [14]:
class Point2D:
    def __init__(self, x,y):
        self.x = x
        self.y = y
    
    def __repr__(self):
        return f'{self.__class__.__name__}: x={self.x}, y={self.y}'

In [15]:
pt = Point2D(10,20)
pt

Point2D: x=10, y=20

In [16]:
pt.x = 100

In [17]:
id(pt)

4430760880

In [18]:
pt.x

100

In [19]:
pt.y

20

In [20]:
a = Point2D(0,0), Point2D(10,20)

In [21]:
a

(Point2D: x=0, y=0, Point2D: x=10, y=20)

In [22]:
id(a[0])

4430759392

In [23]:
a[0].x = 100

In [24]:
a

(Point2D: x=100, y=0, Point2D: x=10, y=20)

In [28]:
s = 'python'

In [29]:
id(s)

4401869936

In [30]:
s = 'python ' + ' rocks!'

In [31]:
id(s)

4430788272

Just as concatenating strings creates a new object (and the variable then references that new object), the same happens with tuples.

In [32]:
a = 1,2,3

In [33]:
id(a)

4430218048

In [34]:
a += (4,5)

In [35]:
id(a)

4430065152

In [36]:
a = a + (4,5)

To summarize tuples as data structures: since we know a tuple is immutable and its order is fixed, we can agree that the position of each element corresponds to a coordinate of a point. It is just a convention we agree on.

In [39]:
pt1 = (0,0)
pt2 = (10,20)
# pt[0] = pt.x ...

Let's use tuples to hold information about cities:

In [41]:
london = 'London', 'UK', 8_780_000
new_york = 'New York', 'USA', 8_500_000
beijing = 'Beijing', 'China', 21_000_000

In [42]:
cities = [london, new_york, beijing]

Let's retrieve the total population of all the cities

In [44]:
total = sum([city[2] for city in cities])

In [45]:
total

38280000

Examples of unpacking

In [46]:
record = 'DJIA', 2018, 1, 19, 25_987, 26_072, 25_941, 26_072

In [47]:
symbol, *_, close = record

In [50]:
symbol, close

('DJIA', 26072)

In [51]:
print(f'We are ignoring this: {_}')

We are ignoring this: [2018, 1, 19, 25987, 26072, 25941]


We can unpack in the loop itself

In [52]:
for city, country, population in cities:
    print(city, country, population)

London UK 8780000
New York USA 8500000
Beijing China 21000000


Let's calculate an approx. value for PI

In [53]:
from random import uniform
from math import sqrt

def random_shot(radius):
    random_x = uniform(-radius, radius)
    random_y = uniform(-radius, radius)
    
    if sqrt(random_x **2 + random_y ** 2) <= radius:
        is_in_circle = True
    else:
        is_in_circle = False
        
    return random_x, random_y, is_in_circle

In [57]:
num_attempts = 100
count_inside = 0

for i in range(num_attempts):
    *_, is_in_circle = random_shot(1)
    if is_in_circle:
        count_inside += 1

print(f' PI is approx: {4 * count_inside / num_attempts}')

 PI is approx: 3.04


----
## Named Tuples

As we saw in the example of the coordinates of a point, a plain tuple is not very clear as a data structure: we have to write point[0] instead of point.x.

At this point, in order to make things clearer for the reader (not the compiler, the reader), we might want to approach this using a class instead.

    class Point2D:
        def __init__(self, x, y):
            self.x, self.y = x, y

    pt = Point2D(10, 20)
    distance = sqrt(pt.x ** 2 + pt.y ** 2)
    
But what happens when we have a class with more than two attributes?

    class Stock:
        def __init__(self, symbol, year, month, day, open, high, low, close):
            self.symbol = symbol
            .
            .
            .
    Class Approach: 
        djia.symbol
        djia.open
        
        djia.high - djia.low
        
    Tuple Approach:
        djia[0]
        djia[4]
        
        djia[5] - djia[6]
        
        
As we can see, the class approach is more readable than the tuple approach.
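To make the comparison concrete, here is a sketch that runs both approaches on the DJIA record used earlier. The body of `Stock.__init__` is one plausible completion of the abbreviated class above, not necessarily the intended one.

```python
class Stock:
    # Hypothetical completion of the class sketched above
    def __init__(self, symbol, year, month, day, open, high, low, close):
        self.symbol = symbol
        self.year = year
        self.month = month
        self.day = day
        self.open = open
        self.high = high
        self.low = low
        self.close = close

record = ('DJIA', 2018, 1, 19, 25_987, 26_072, 25_941, 26_072)

djia_obj = Stock(*record)   # class approach
djia_tup = record           # tuple approach

# Same calculation, very different readability
range_from_class = djia_obj.high - djia_obj.low
range_from_tuple = djia_tup[5] - djia_tup[6]
```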

**Named Tuples to the rescue**

There are other reasons to seek another approach. 

So what if we could combine these two approaches, essentially creating tuples where we can, in addition, give meaningful names to the positions?

That's what **namedtuples** essentially do

**namedtuple** is a function which generates a new class -> **class factory**

That new class **inherits** from tuple but also provides **named properties** to access the elements of the tuple. An instance of that class is still a **tuple**
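A quick way to check these claims, as a minimal sketch (the names `is_class`, `is_tuple_subclass` and `is_tuple_instance` are just illustrative):

```python
from collections import namedtuple

# namedtuple returns a class, not an instance
Point2D = namedtuple('Point2D', ['x', 'y'])
pt = Point2D(10, 20)

is_class = isinstance(Point2D, type)             # the return value is a class
is_tuple_subclass = issubclass(Point2D, tuple)   # that class inherits from tuple
is_tuple_instance = isinstance(pt, tuple)        # so instances are still tuples
```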

**Generating Named Tuple Classes**

We have to understand that **namedtuple** is a class factory.

When we use it, we are essentially **creating a new class**, just as if we had used **class** ourselves

**namedtuple** needs a few things to generate this class:

- the **class name** we want to use
- a sequence of **field names (strings)** we want to assign, in the order of the elements in the tuple

The **return** value of the call to **namedtuple** will be a **class**

We need to assign that class to a variable name in our code so we can use it to construct instances.

In general, we use the same name as the name of the class that was generated.

**Generating Named Tuple Classes**

    Point2D = namedtuple('Point2D', ['x','y'])
    
We can create **instances** of Point2D just as we would with any class (since it is a class)

    pt = Point2D(10,20)
    
The variable name that we assign to the class generated and returned by **namedtuple** is arbitrary.

    Pt2D = namedtuple('Point2D', ['x','y'])
    pt = Pt2D(10,20)
    
What does Python do when creating a namedtuple?

    1. Python creates an object of type class, but in addition, it creates a variable in our local scope
       that points to that object.
       
           Variable: MyClass -> [Class:MyClass, 0xFF300]
    
    2. MyClassAlias = MyClass
            
            Variable: MyClass -> [Class:MyClass, 0xFF300]
            Variable: MyClassAlias -> [Class:MyClass, 0xFF300]
    
Similarly 

    Pt2DAlias = namedtuple('Point2D', ['x','y'])
    
        Variable: Pt2DAlias -> [Class:Point2D, 0xFF900]
    
    This is the same concept as aliasing a function, or assigning a lambda function to a variable name!
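The aliasing described above can be checked directly. In this sketch `AnotherAlias` is a made-up second name for the same class object:

```python
from collections import namedtuple

Pt2DAlias = namedtuple('Point2D', ['x', 'y'])
AnotherAlias = Pt2DAlias            # a second variable, same class object

same_object = AnotherAlias is Pt2DAlias
class_name = Pt2DAlias.__name__     # the name given to namedtuple, not the variable name

pt = AnotherAlias(10, 20)
```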
    
There are many ways we can provide the list of field names to the **namedtuple** function

- a list of strings
- a tuple of strings
- a single string with the field names separated by whitespace or commas

For example, these calls are all equivalent:
    
    namedtuple('Point2D', 'x y')
    namedtuple('Point2D', 'x, y')
    namedtuple('Point2D', ['x', 'y'])
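A short sketch confirming that the different ways of passing field names produce the same fields. It uses the `_fields` property covered under Introspection; the variable names are illustrative.

```python
from collections import namedtuple

from_list = namedtuple('Point2D', ['x', 'y'])
from_tuple = namedtuple('Point2D', ('x', 'y'))
from_spaces = namedtuple('Point2D', 'x y')
from_commas = namedtuple('Point2D', 'x, y')

# Collect the field names of each generated class; a set removes duplicates
field_sets = {
    cls._fields
    for cls in (from_list, from_tuple, from_spaces, from_commas)
}
```

If all four forms are equivalent, `field_sets` contains a single entry.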

**Instantiating Named Tuples** 

After we have created a named tuple class, we can instantiate it just like an ordinary class

In fact, the \_\_new\_\_ method of the generated class uses the **field names** we provided as param names

    Point2D = namedtuple('Point2D', 'x y')
    
    # We can use positional arguments:
    
    pt1 = Point2D(10, 20)      # 10 -> x   20 -> y
    
    # We can use keyword arguments:
    
    pt1 = Point2D(x=10, y=20)  # 10 -> x   20 -> y
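Because the field names become parameter names, the usual call rules apply. A minimal sketch, where `pt3` and `missing_arg_rejected` are illustrative names:

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')

pt1 = Point2D(10, 20)        # positional arguments
pt2 = Point2D(x=10, y=20)    # keyword arguments
pt3 = Point2D(10, y=20)      # a mix of both

# Leaving out a field raises a TypeError, as with any function call
try:
    Point2D(10)
except TypeError:
    missing_arg_rejected = True
else:
    missing_arg_rejected = False
```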
    
**Accessing Data in a Named Tuple**

Since named tuples are also regular tuples, we can still handle them just like any other tuple

- by index
- slice 
- iterate

        Point2D = namedtuple('Point2D', 'x y')
        
        pt1 = Point2D(10, 20)
        
        # In addition, we can now access by field names:
        
        pt1.x  # -> 10
        pt1.y  # -> 20
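The access styles listed above, side by side, as a small sketch with illustrative variable names:

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
pt1 = Point2D(10, 20)

by_index = pt1[0]                         # index, like any tuple
by_slice = pt1[:1]                        # slicing returns a plain tuple
by_iteration = [value for value in pt1]   # iteration
x, y = pt1                                # unpacking still works
by_name = (pt1.x, pt1.y)                  # and now also field names
```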
        
        
Since classes generated by **namedtuple** inherit from tuple, the generated class is declared along these lines:
    
        class Point2D(tuple):
            ...


So **pt1** is a tuple, and is therefore **immutable**

pt1.x = 100 will not work: it raises an AttributeError!
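A minimal sketch of what happens when we try. The name `pt1_moved` is illustrative, and building a new instance is just one way to get a "changed" point:

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
pt1 = Point2D(10, 20)

# Assigning to a field raises AttributeError
try:
    pt1.x = 100
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False

# To "change" a value we create a new instance instead
pt1_moved = Point2D(100, pt1.y)
```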


**The rename keyword-only argument for namedtuple**

Remember that field names for named tuples must be valid identifiers and cannot start with an underscore

This would not work, because **_ssn** starts with an underscore:

    Person = namedtuple('Person', 'name age _ssn')
    
**namedtuple** has a keyword-only argument, **rename** (defaults to False), that will automatically **rename** any invalid field name

It uses the convention: 

        _{position in list of field names}

This **will** now work:

        Person = namedtuple('Person', 'name age _ssn', rename=True)
        
And the actual field names would be:

        name age _2
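Both behaviours can be seen in a short sketch. The sample values passed to `Person` are made up for illustration:

```python
from collections import namedtuple

# Without rename, the invalid field name is rejected with a ValueError
try:
    namedtuple('Person', 'name age _ssn')
except ValueError:
    rejected = True
else:
    rejected = False

# With rename=True, the third field (position 2) becomes _2
Person = namedtuple('Person', 'name age _ssn', rename=True)
person = Person('Alex', 30, '123-45-6789')   # hypothetical sample data
ssn = person._2
```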
        
        
**Introspection**

We can easily find out the field names in a named tuple generated class

class property -> 

        _fields
        
        Person = namedtuple('Person', 'name age _ssn', rename = True)
        
        Person._fields -> ('name','age','_2')
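As a sketch of how **_fields** can be used, we can pair the field names with the values of an instance (the variable names are illustrative):

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
pt = Point2D(10, 20)

fields = Point2D._fields            # available on the class...
fields_from_instance = pt._fields   # ...and on its instances

# Pair each field name with the value at the same position
as_pairs = dict(zip(fields, pt))
```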

Remember that **namedtuple** is a **class factory**

In Python 3.6 and earlier, we can actually see what the code for that class is, using a class property (this property was removed in Python 3.7):
    
    _source
    
    Point2D = namedtuple('Point2D', 'x y')
    
    Point2D._source
    
    It creates a class in memory that can be defined by this code:
    
    class Point2D(tuple):
        'Point2D(x,y)'
        


Let's make a class Point3D

In [1]:
class Point3D:
    def __init__(self, x,y,z):
        self.x,self.y,self.z = x,y,z

If you are writing a class only to get named attributes, take a step back, because a namedtuple is often a better option

In [2]:
from collections import namedtuple

**namedtuple** is a class factory. Remember!!!

In [5]:
#Valid for namedtuple
Point2D = namedtuple('Point2D', ['x', 'y'])

In [6]:
pt1 = Point2D(10,20)
pt1

Point2D(x=10, y=20)

To get the representation above from an ordinary class, we would have to implement the \_\_repr\_\_ method ourselves.

**namedtuple** makes it easier for us

In [7]:
Pt2D = namedtuple('Point2D', ('x', 'y'))

In [8]:
pt2 = Pt2D(100,200)

In [9]:
pt2

Point2D(x=100, y=200)

The class name is still 'Point2D'; the variable name we assign the class to plays no part in the generated class