## Data Types and Structures in Python ##

What are data structures?
> To perform computations, data has to be stored in some format. Data structures are those formats, and Python offers several of them.

What types of data structures are there?
> There is a wide range of them, from basic data types (e.g. integers and strings)
> to containers such as lists and arrays.


The data types most commonly needed for quantitative work are, unsurprisingly, the numeric ones:
* Integers
* Floats
* Decimal

Note: For those from a programming background, Python is dynamically typed.
This means that there is no need to declare a type for each variable that is used. Python infers the type from the value assigned to it.
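
As a small illustration of dynamic typing, the sketch below rebinds one name to values of different types. The variable names are hypothetical and chosen only for this example.

```python
# The same name can be bound to values of different types over time.
value = 8
first_type = type(value).__name__    # name of the type of an integer literal

value = 8.0
second_type = type(value).__name__   # the name now refers to a float

value = "eight"
third_type = type(value).__name__    # and now to a string

print(first_type, second_type, third_type)
```

No type declaration appears anywhere: the type belongs to the value, not to the variable name.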

** BASIC DATA TYPES **

In [1]:
# Integer
myInteger = 8

# Float
myFloat = 8.

# Decimal
import decimal
from decimal import Decimal
myDecimal = Decimal(1)/Decimal(3)

In [2]:
type(myInteger)

int

In [3]:
type(myFloat)

float

In [4]:
type(myDecimal)

decimal.Decimal

In [5]:
decimal.getcontext().prec = 5

In [6]:
myDecimal

Decimal('0.3333333333333333333333333333')

In [7]:
decimal.getcontext().prec = 5

In [8]:
# Note that changing the precision alone does not alter a value that has already been computed.
# The new precision is only applied when the operation is performed again
myDecimal = Decimal(1)/Decimal(3)

In [9]:
myDecimal

Decimal('0.33333')

Why is changing the precision of a number important?
> Usually it is not. However, when many rounded numbers are aggregated over many operations, leaving the precision unspecified can let small rounding errors accumulate into a significant one.
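
To make the accumulation point concrete, here is a minimal sketch comparing a sum of floats with a sum of `Decimal` values. The variable names are hypothetical, and the example assumes the `Decimal` values are built from strings rather than from floats.

```python
from decimal import Decimal

# Adding 0.1 ten times with binary floats does not give exactly 1.0,
# because 0.1 has no exact binary representation.
float_total = sum([0.1] * 10)

# Decimal values built from strings represent 0.1 exactly.
decimal_total = sum([Decimal("0.1")] * 10)

print(float_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))  # True
```

Passing a float such as `Decimal(0.1)` would carry the float's binary approximation into the `Decimal`, which is why the sketch uses strings.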

** DATA STRUCTURES **

Basic data structures include tuples, lists, dictionaries and sets.

** Tuples **

In [10]:
# Tuples are immutable once created. For data that needs to change, lists are the more common choice.
t = (1, "One", "Un")

In [11]:
type(t)

tuple

In [12]:
t[1]

'One'

In [13]:
t[1] = 3

TypeError: 'tuple' object does not support item assignment

*Note: Remember that indexing in Python's data structures starts from 0, so the element at index 1 (as we see above) is actually the second element in the data structure.*
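
A short sketch of zero-based indexing on the same tuple follows. Negative indices and unpacking are additions not shown above, and the names `first`, `last`, `number`, `english` and `french` are hypothetical.

```python
t = (1, "One", "Un")

first = t[0]              # index 0 is the first element
last = t[-1]              # negative indices count from the end
same_last = t[len(t) - 1] # the last valid index is len(t) - 1

# A tuple can also be unpacked into separate names in one step.
number, english, french = t

print(first, last, english)
```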

** Lists **

In [14]:
l = [2, "Two", "Deux"]

In [15]:
type(l)

list

In [16]:
l[0]

2

In [17]:
l[0] = 'TWO'

In [18]:
l

['TWO', 'Two', 'Deux']

Data structures are not just plain old containers for numbers. Each data structure also has its own methods (i.e. functions) attached to it.

In [19]:
varThree = 3

# add one more element to the list we declared above
l.append(varThree)
l

['TWO', 'Two', 'Deux', 3]

In [20]:
l.extend([1,2,3])

In [21]:
ll = [['A', 'B'], ['C', 'D']]

In [22]:
ll

[['A', 'B'], ['C', 'D']]

Other useful methods for lists are as follows:
* l.extend([1,2,3]) (add another list to an existing list)
* l.insert(1, 'newelement') (add an element at a specific position)
* l.remove('newelement') (remove a specific element)
* l.pop(1) (remove and return the item at the specified index)
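
The methods listed above can be sketched in sequence as follows. The list names `items`, `nested` and `flat` are hypothetical, and the comments show the state of the list after each call.

```python
items = ['TWO', 'Two', 'Deux']

items.extend([1, 2, 3])         # ['TWO', 'Two', 'Deux', 1, 2, 3]
items.insert(1, 'newelement')   # ['TWO', 'newelement', 'Two', 'Deux', 1, 2, 3]
items.remove('newelement')      # ['TWO', 'Two', 'Deux', 1, 2, 3]
popped = items.pop(1)           # returns 'Two'; items is now ['TWO', 'Deux', 1, 2, 3]

# append adds its argument as a single element, so a list becomes nested;
# extend adds each element of its argument individually.
nested = [1, 2]
nested.append([3, 4])           # [1, 2, [3, 4]]
flat = [1, 2]
flat.extend([3, 4])             # [1, 2, 3, 4]

print(items, popped, nested, flat)
```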

In [23]:
# To access more than one element, use a slice. The end index is exclusive,
# i.e. to get the items at index 0 and 1 here, the slice must end one further (at 2 in this case)
l[0:2]

['TWO', 'Two']
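
A few more slice forms, as a sketch on a hypothetical list named `letters`. In every case the start index is included and the end index is excluded.

```python
letters = ['a', 'b', 'c', 'd', 'e']

first_two = letters[0:2]    # ['a', 'b']
also_first_two = letters[:2]  # an omitted start defaults to 0
from_third = letters[2:]    # ['c', 'd', 'e'], an omitted end runs to the end
last_two = letters[-2:]     # ['d', 'e'], negative indices count from the end
every_other = letters[::2]  # ['a', 'c', 'e'], the third number is the step

print(first_two, from_third, last_two, every_other)
```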

In [24]:
# Sorting is another useful method. Note that sort() changes the list in place
lUnsorted = [3, 1, 5, 9, 2]
lUnsorted.sort()
lUnsorted

[1, 2, 3, 5, 9]
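
For comparison, the built-in `sorted()` function returns a new list and leaves the original unchanged, whereas `sort()` modifies the list in place and returns `None`. A minimal sketch with hypothetical names:

```python
numbers = [3, 1, 5, 9, 2]

ascending = sorted(numbers)                 # new list; numbers is unchanged
descending = sorted(numbers, reverse=True)  # new list in descending order
original_after_sorted = list(numbers)       # copy, to show the original order survived

result = numbers.sort()                     # sorts in place and returns None

print(ascending, descending, original_after_sorted, result, numbers)
```

Because `sort()` returns `None`, writing `numbers = numbers.sort()` would discard the list, a mistake worth watching for.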

**Looping - For Loops versus List Comprehension**

Not critical, but useful for understanding Python code you come across, since the second form below (a list comprehension) is a very common idiom.
Python programmers often prefer a comprehension to an explicit loop when building a list.

In [25]:
l = []
for i in range(0, 10, 1):
    if i % 2 == 0:
        print(i)
        l.append(i)
l

0
2
4
6
8


[0, 2, 4, 6, 8]

In [26]:
[i for i in range(0, 10, 1) if i % 2 == 0]

[0, 2, 4, 6, 8]
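
A comprehension can also transform each element, not only filter it. The sketch below, with hypothetical names, builds the same list twice: once with a loop and once with the pattern `[expression for item in iterable if condition]`.

```python
# Loop version: square every even number below 10.
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i ** 2)

# Comprehension version: same filter, same transformation, one line.
squares_comprehension = [i ** 2 for i in range(10) if i % 2 == 0]

print(squares_loop)
print(squares_comprehension)
```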

** Dictionaries **

Dictionaries (dicts) store key-value pairs, so a value is looked up by its key rather than by its position.

In [31]:
d = {
    'ItemOne': 1,
    'ItemTwo': 'Two',
    'ItemThree': [3,3,3],
}

In [32]:
type(d)

dict

In [33]:
d['ItemThree']

[3, 3, 3]

In [34]:
d.keys()

dict_keys(['ItemOne', 'ItemThree', 'ItemTwo'])

In [35]:
for i in d.items():
    print(i)

('ItemOne', 1)
('ItemThree', [3, 3, 3])
('ItemTwo', 'Two')


In [36]:
d.values()

dict_values([1, [3, 3, 3], 'Two'])
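
A few other common dictionary operations, sketched on a copy of the same data. The keys `'ItemFour'` and `'ItemFive'` and the variable names are hypothetical additions for illustration.

```python
d = {
    'ItemOne': 1,
    'ItemTwo': 'Two',
    'ItemThree': [3, 3, 3],
}

d['ItemFour'] = 4.0                        # assigning to a new key adds an entry
has_item_one = 'ItemOne' in d              # membership is tested against the keys
missing = d.get('ItemFive', 'not found')   # get() returns a default instead of raising KeyError

# items() yields (key, value) pairs, which can be unpacked in the loop header.
pairs = []
for key, value in d.items():
    pairs.append((key, value))

print(has_item_one, missing, len(pairs))
```

On Python 3.7 and later, dictionaries preserve insertion order, so the order of keys printed on a recent interpreter may differ from output produced by older versions.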

** Sets **

Sets, if you remember from school, are just collections of unique items. It is no different here: duplicates are dropped when a set is created.

In [37]:
repeatedList = [1,1,1,4,3,4,5,2,5,6,7,2]
s = set(repeatedList)
s

{1, 2, 3, 4, 5, 6, 7}
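
The set operations from school carry over directly as well. A minimal sketch, using two hypothetical sets `a` and `b`:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b           # items in either set
intersection = a & b    # items in both sets
difference = a - b      # items in a but not in b
contains_three = 3 in a # membership test

print(union, intersection, difference, contains_three)
```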

End Notes: This just scratches the surface. It is also important to understand two more data structures, which I will cover in subsequent tutorials:
* Arrays (think MATLAB), which come from the NumPy package
* DataFrames (think Excel tables), which come from the pandas package