# Data Types

## Lists, Dicts

#### High Level
These three types, `list`, `dict`, and `pandas.DataFrame`, can all hold more than one value at a time and have some overlapping functionality, but they differ greatly in how they organize and look up their values.

### Lists

Lists are ordered collections of elements. Because they maintain their order, each element can be referenced by its index, which is its position in the list. Retrieving a value by its index is fast. Finding a value by its content is slower, because Python must check each element in turn until it finds a match.
Lists are useful when a collection will be looked up by position more often than by the values of its elements.

In [41]:
import random
biglist = [random.randint(0,833774) for i in range(1000000)]

In [2]:
len(biglist)

1000000

In [6]:
biglist[0:10]

[79983, 97302, 252747, 717454, 566705, 419553, 302746, 506004, 313506, 756719]

In [3]:
#First Five
biglist[:5]

[79983, 97302, 252747, 717454, 566705]

In [5]:
#Second Five 
biglist[5:10]

[419553, 302746, 506004, 313506, 756719]

In [9]:
#last Five
biglist[-5:]

[372090, 344356, 272253, 382061, 191478]
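Slices follow the pattern `seq[start:stop]`: the element at `start` is included, the element at `stop` is excluded, and negative numbers count from the end of the list. A small sketch on a short, predictable list (the name `nums` is just for illustration) makes the boundaries easier to see than a million random numbers:

```python
nums = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_five = nums[:5]     # start defaults to 0; index 5 itself is excluded
second_five = nums[5:10]  # indexes 5 through 9
last_five = nums[-5:]     # negative start counts back from the end
every_other = nums[::2]   # an optional third value sets the step

print(first_five)   # [0, 1, 2, 3, 4]
print(second_five)  # [5, 6, 7, 8, 9]
print(last_five)    # [5, 6, 7, 8, 9]
print(every_other)  # [0, 2, 4, 6, 8]
```

Because `stop` is excluded, `nums[:5] + nums[5:]` always rebuilds the original list with no overlap.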

Lists return items very quickly when they are requested by position.

In [10]:
%%timeit

biglist[59595]

44.1 ns ± 0.817 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [17]:
print('value: ', biglist[59595])
print('position: ', biglist.index(433319))

value:  433319
position:  59595
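`biglist` holds a million random numbers drawn from a smaller range, so the same value can appear more than once, and `list.index()` reports only the position of the first match. A short sketch with a hand-made list (`values` is an illustrative name) shows this, along with using `enumerate` to collect every position of a value:

```python
values = [7, 3, 9, 3, 5]

first = values.index(3)  # position of the first 3 only
all_positions = [i for i, v in enumerate(values) if v == 3]  # every position holding 3

print(first)          # 1
print(all_positions)  # [1, 3]

# index() raises ValueError when the value is absent, so `in` is a safe check first
print(4 in values)    # False
```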


Searching for an element by its value, as `biglist.index(433319)` does, takes much longer, because `list.index()` scans the list from the start until it finds the first match.

In [15]:
%%timeit
biglist.index(433319)

1.25 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
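The slowdown comes from how a value search works: elements are compared one at a time from the front, so the cost grows with the position of the match, and with the whole length of the list when the value is missing. The function below is a simplified, hypothetical `linear_index` that mimics the strategy of `list.index()`. In CPython the real method is implemented in C and runs much faster than this Python loop, but it still checks elements one by one:

```python
def linear_index(items, target):
    """Return the position of the first element equal to target (simplified list.index)."""
    for position, item in enumerate(items):
        if item == target:
            return position
    # Mirror list.index(), which raises ValueError when nothing matches
    raise ValueError(f"{target!r} is not in list")


sample = [10, 20, 30, 20]
print(linear_index(sample, 20))  # 1, found after checking two elements
print(linear_index(sample, 30))  # 2, found after checking three elements
```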


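A list comprehension, written inside square brackets, builds a new list and can apply an expression to each item while doing so. For example, `range(10)` on its own is a lazy range object rather than a list, and a comprehension turns it into one.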

In [20]:
range(10)

range(0, 10)

In [19]:
[i for i in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This is useful when you want to perform an action on every item in a list:

In [21]:
[f"{i} squared is {i*i}" for i in range(10)]

['0 squared is 0',
 '1 squared is 1',
 '2 squared is 4',
 '3 squared is 9',
 '4 squared is 16',
 '5 squared is 25',
 '6 squared is 36',
 '7 squared is 49',
 '8 squared is 64',
 '9 squared is 81']
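A list comprehension is shorthand for a `for` loop that appends to a new, empty list. This sketch builds the same list of strings both ways (the names `with_loop` and `with_comprehension` are illustrative) and checks that the results match:

```python
# Long form: start with an empty list and append inside a loop
with_loop = []
for i in range(10):
    with_loop.append(f"{i} squared is {i*i}")

# Short form: the same expression and loop, written as a comprehension
with_comprehension = [f"{i} squared is {i*i}" for i in range(10)]

print(with_loop == with_comprehension)  # True
```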

If we have unparsed CSV data in a single string, we can parse it by splitting on both newlines (`\n`) and commas (`,`).

In [25]:
csvdata = "1,dog,$5.99\n2,cat,$6.25\n3,fish,4.23"

To start, we can split the string into a list of lines:

In [26]:
lines = csvdata.split("\n")
lines

['1,dog,$5.99', '2,cat,$6.25', '3,fish,4.23']

But we still need to split each line into individual fields. We could write a `for` loop that calls `split()`, or we can use a list comprehension to shorten the syntax:

In [27]:
data = [line.split(",") for line in lines]
data

[['1', 'dog', '$5.99'], ['2', 'cat', '$6.25'], ['3', 'fish', '4.23']]
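After splitting, every field is still a string, including the IDs and prices. One way to turn each row into typed values is another list comprehension. The sketch below recreates `csvdata` and `data` so it runs on its own, assumes each row is `id,name,price`, and strips an optional leading `$` (the third row in `csvdata` has none). `parse_row` and `rows` are hypothetical names:

```python
csvdata = "1,dog,$5.99\n2,cat,$6.25\n3,fish,4.23"
data = [line.split(",") for line in csvdata.split("\n")]

def parse_row(fields):
    """Convert ['1', 'dog', '$5.99'] into (1, 'dog', 5.99); assumes id,name,price order."""
    row_id, name, price = fields
    # lstrip("$") removes a leading dollar sign if present and leaves "4.23" unchanged
    return int(row_id), name, float(price.lstrip("$"))

rows = [parse_row(fields) for fields in data]
print(rows)  # [(1, 'dog', 5.99), (2, 'cat', 6.25), (3, 'fish', 4.23)]
```

`float` keeps the example short; for real currency amounts, `decimal.Decimal` avoids binary floating-point rounding.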

We can also put it all back together with `str.join()` and a list comprehension:

In [30]:
"\n".join([",".join(d) for d in data])

'1,dog,$5.99\n2,cat,$6.25\n3,fish,4.23'
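Splitting on commas works for this simple string, but it breaks when a field contains a comma inside quotes. Python's standard `csv` module handles quoting for both reading and writing. A sketch of the same round trip, using `io.StringIO` to treat the string as a file:

```python
import csv
import io

csvdata = "1,dog,$5.99\n2,cat,$6.25\n3,fish,4.23"

# Reading: csv.reader yields one list of strings per line
data = list(csv.reader(io.StringIO(csvdata)))
print(data)  # [['1', 'dog', '$5.99'], ['2', 'cat', '$6.25'], ['3', 'fish', '4.23']]

# A quoted field containing a comma stays in one piece, unlike with str.split(",")
print(next(csv.reader(['4,"goldfish, large",$9.50'])))  # ['4', 'goldfish, large', '$9.50']

# Writing: csv.writer ends rows with "\r\n" by default, so ask for "\n" to match the input
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(data)
print(repr(out.getvalue()))  # '1,dog,$5.99\n2,cat,$6.25\n3,fish,4.23\n'
```

Unlike `"\n".join(...)`, the writer also ends the last row with a newline.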

### Dicts
Dictionaries, or `dict`s, are key-value collections, for example `{'key1': 'value1', 'key2': 2}`. Values can be of any type, and keys can be any hashable type, such as strings, integers, or tuples. Keys are unique, so a dictionary cannot contain the same key twice; assigning to an existing key replaces its value. Since Python 3.7, dictionaries preserve insertion order, so iterating over a `dict` returns its keys in the order they were added. In older versions the order was not guaranteed, and `collections.OrderedDict` was needed to keep it. The key difference between a `list` and a `dict` is that a `list` is good for lookups based on position, while a `dict` is better for lookups based on the value of a key. Curly braces `{` and `}` can also be used for a dict comprehension, which works like a list comprehension but builds key-value pairs.
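A short sketch of those rules, using an illustrative dictionary named `pets`: assigning to an existing key replaces its value without adding a second entry, keys come back in insertion order on Python 3.7 and later, and a tuple can be a key because it is hashable while a list cannot:

```python
pets = {"dog": 5.99, "cat": 6.25}
pets["fish"] = 4.23  # new key: added at the end
pets["dog"] = 7.50   # existing key: value replaced, position unchanged

print(pets)        # {'dog': 7.5, 'cat': 6.25, 'fish': 4.23}
print(list(pets))  # ['dog', 'cat', 'fish']  (insertion order, Python 3.7+)
print(len(pets))   # 3, because "dog" is still only one key

grid = {(0, 0): "origin"}  # a tuple is hashable, so it can be a key
try:
    {[0, 0]: "origin"}  # a list is mutable and unhashable
except TypeError:
    print("a list cannot be a dict key")
```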

In [31]:
dictdata = {i: f"The number squared is {i*i}" for i in range(1000000)}

In [33]:
%%timeit
dictdata[433319]

59.6 ns ± 2.78 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


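Looking up a key (about 60 ns) was about as fast as an index lookup in a list (about 44 ns), and far faster than searching the list by value with `index()` (about 1.25 ms). A `dict` is a hash table: it hashes the key to find where the value is stored, rather than scanning every entry.

Dicts are useful when elements need to be tracked by a meaningful key, such as a name, rather than by their position.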
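Because key lookups are fast, a common trick is to build a dictionary once and then use it for repeated searches by value. The sketch below (the names `values` and `positions` are hypothetical) maps each value in a list to the position of its first occurrence, so repeated `list.index()` scans become key lookups. Building the dictionary takes one full pass over the list, so this pays off only when you search many times:

```python
import random

values = [random.randint(0, 833774) for _ in range(1000)]

positions = {}
for i, v in enumerate(values):
    # setdefault keeps the first position seen, matching what list.index() returns
    positions.setdefault(v, i)

target = values[500]
print(positions[target] == values.index(target))  # True
```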

In [35]:
voteResults = {name:random.randint(4000,30000) for name in ["Tom","Sally", "Paul","Jamie"]}
voteResults

{'Tom': 8102, 'Sally': 24089, 'Paul': 15104, 'Jamie': 18894}

A dictionary is a good fit for tracking the votes of the four candidates: all of the results live in a single variable that is easy to print later, and each count is stored under its candidate's name. We don't need to remember which position holds whose votes, because each key records who its count belongs to.

In [37]:
voteResults['Tom']+=1
voteResults['Tom']+=1
voteResults['Tom']+=1
print(voteResults['Tom'])

8105
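`+=` only works on a key that already exists: `voteResults['Ann'] += 1` would raise `KeyError`, because `'Ann'` is not in the dictionary yet. When votes arrive one ballot at a time, `dict.get()` with a default of `0`, or the standard library's `collections.Counter`, avoids that problem. The `ballots` list below is made-up sample data:

```python
from collections import Counter

ballots = ["Tom", "Sally", "Tom", "Paul", "Tom", "Jamie", "Sally"]  # made-up sample ballots

# Option 1: a plain dict, using get() to treat missing candidates as 0
tally = {}
for name in ballots:
    tally[name] = tally.get(name, 0) + 1

# Option 2: Counter does the same counting in one call
counted = Counter(ballots)

print(tally)           # {'Tom': 3, 'Sally': 2, 'Paul': 1, 'Jamie': 1}
print(counted["Tom"])  # 3
print(counted["Ann"])  # 0, Counter returns 0 for a missing key instead of raising KeyError
```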


In [40]:
for candidate in voteResults:
    print(f"{candidate} got {voteResults[candidate]} votes.")

Tom got 8105 votes.
Sally got 24089 votes.
Paul got 15104 votes.
Jamie got 18894 votes.
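Looping over a dictionary yields its keys, so the loop above does a second lookup to get each count. `dict.items()` yields key-value pairs directly, and `max()` with a `key=` argument can pick the winner. The sketch below hard-codes the counts from the output above so the result is repeatable:

```python
voteResults = {"Tom": 8105, "Sally": 24089, "Paul": 15104, "Jamie": 18894}

for candidate, votes in voteResults.items():
    print(f"{candidate} got {votes} votes.")

# key=voteResults.get compares candidates by their vote counts
winner = max(voteResults, key=voteResults.get)
print(f"Winner: {winner}")  # Winner: Sally

# sorted() on the items, highest count first
ranking = sorted(voteResults.items(), key=lambda pair: pair[1], reverse=True)
print(ranking)  # [('Sally', 24089), ('Jamie', 18894), ('Paul', 15104), ('Tom', 8105)]
```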
