# Data Structures in Python
The basic data container in Python is the list. A `list` can hold a variety of data types. Once a `list` exists, you can append to it, delete elements from it, and convert it into an `array` or a `tuple`. Of course, there are more sophisticated ways of storing, accessing, and processing data in Python. Let's look at two: `dictionaries` and `dataframes`. Content below is generously sampled from http://openbookproject.net/thinkcs/python/english3e/dictionaries.html.

## Dictionaries
Dictionaries are collections of values that are indexed by `keys`. In other languages, dictionaries are often referred to as associative arrays because they associate `keys` with values. Examples of using dictionaries:
* English to Spanish dictionary

In [1]:
entosp = {"one":"uno","two":"dos","three":"tres"}

* Inventory of supplies

In [2]:
office = {"highlighters":200,"pencils":300,"pens":500}

You could construct a dictionary for an encryption code:

In [3]:
codex = { "a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8, "i": 9, "j": 10,"k": 11,"l": 12,"m": 13,"n": 14,
         "o": 15,"p": 16,"q": 17,"r": 18,"s": 19,"t": 20,"u": 21,"v": 22,"w": 23,"x": 24,"y": 25}
print(codex)

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25}


You can add new keys after the initial definition:

In [4]:
codex["z"]=26
print(codex)

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
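
To make the "encryption code" idea concrete, here is a minimal sketch of how such a codex might be used to encode a word. The helper name `encode` is hypothetical, and the codex is rebuilt with a comprehension only so that the block runs on its own.

```python
import string

# Rebuild the letter-to-number mapping (a=1, ..., z=26) so this block is self-contained
codex = {letter: number for number, letter in enumerate(string.ascii_lowercase, start=1)}

def encode(word, table):
    """Hypothetical helper: replace each letter of `word` by its number in `table`."""
    return [table[letter] for letter in word.lower()]

encoded = encode("cab", codex)
print(encoded)  # [3, 1, 2]
```

Any character that is not a key of the codex (a space, a digit) would raise a `KeyError` in this sketch.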


You can update the values in the dictionary:

In [5]:
office["pens"]+=200
print(office)

{'highlighters': 200, 'pencils': 300, 'pens': 700}
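
Note that `+=` only works when the key already exists. The sketch below shows one possible way to update a key that may be missing, using `dict.get` with a default, and how `pop` removes an entry. The `"staplers"` item is made up for illustration.

```python
office = {"highlighters": 200, "pencils": 300, "pens": 500}

# office["staplers"] += 15 would raise a KeyError, so fall back to 0 with get
office["staplers"] = office.get("staplers", 0) + 15
office["pens"] = office.get("pens", 0) + 200

# pop removes a key and returns the value it had
removed = office.pop("highlighters")

print(office)   # {'pencils': 300, 'pens': 700, 'staplers': 15}
print(removed)  # 200
```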


You can find out how many `key:value` pairs a dictionary has:

In [6]:
len(codex)

26
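
The cells so far build and modify dictionaries. Reading a value back by its key, and checking whether a key is present, might look like the following short sketch (the `"?"` default is an arbitrary choice):

```python
entosp = {"one": "uno", "two": "dos", "three": "tres"}

print(entosp["two"])            # dos
print("two" in entosp)          # True: membership tests look at the keys
print("dos" in entosp)          # False: the values are not searched
print(entosp.get("four", "?"))  # ?: get returns a default instead of raising KeyError
```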

### Dictionary Methods
A dictionary is an instance of Python's built-in `dict` class, and classes have functions associated with them called `methods`. Methods allow you to perform operations on and with the dictionary.
You can always get the keys and the values from any dictionary using the methods `keys` and `values`:

In [7]:
print(office.values())
print(office.keys())
print(list(office.values()))

dict_values([200, 300, 700])
dict_keys(['highlighters', 'pencils', 'pens'])
[200, 300, 700]
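
The `dict_values` and `dict_keys` objects printed above are views of the dictionary rather than copies, which is why they need `list(...)` to become ordinary lists. A small sketch of what that means in practice (the `"staplers"` entry is made up):

```python
office = {"highlighters": 200, "pencils": 300, "pens": 700}

values = office.values()   # a view onto the dictionary, not a copy
total = sum(values)
print(total)               # 1200

office["staplers"] = 15    # the existing view sees the new entry
print(list(values))        # [200, 300, 700, 15]
print(sorted(office.keys()))
```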


You can loop over all the keys to process the values, or loop through the values directly.

In [8]:
print("Let's count in Spanish")
for k in entosp.keys():   # Keys come back in insertion order (guaranteed since Python 3.7)
    print(entosp[k])


Let's count in Spanish
uno
dos
tres


In [9]:
print("Let's count in Spanish")
for k in entosp:   # Iterating over a dictionary yields its keys, in insertion order
    print(k,"oops")

Let's count in Spanish
one oops
two oops
three oops


In [10]:
print("Let's count in Spanish")
for v in entosp.values():   # Values come back in the same order as their keys
    print(v)

Let's count in Spanish
uno
dos
tres


The method `items` can be used to iterate over the keys and values or to grab both the keys and the values.

In [11]:
list(office.items())

[('highlighters', 200), ('pencils', 300), ('pens', 700)]

In [12]:
for (k,v) in codex.items():
    print(k, "=", v)

a = 1
b = 2
c = 3
d = 4
e = 5
f = 6
g = 7
h = 8
i = 9
j = 10
k = 11
l = 12
m = 13
n = 14
o = 15
p = 16
q = 17
r = 18
s = 19
t = 20
u = 21
v = 22
w = 23
x = 24
y = 25
z = 26
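
Because `items` hands back each key together with its value, it is a convenient way to build the reverse mapping. The following is a minimal sketch of decoding a message this way. It uses a three-letter codex to stay short, and the names `decoder` and `message` are illustrative.

```python
codex = {"a": 1, "b": 2, "c": 3}

# Swap keys and values; this assumes every value in codex is unique
decoder = {number: letter for letter, number in codex.items()}

message = [3, 1, 2]
decoded = "".join(decoder[n] for n in message)
print(decoded)  # cab
```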


## Pandas: Dataframes

Dataframes are two-dimensional data structures that imitate spreadsheets or tables; that is, they are indexed with row and column identifiers. `DataFrame` is a class provided by the `pandas` package. Dataframes are useful because you can convert almost any other data container (lists, tuples, dictionaries) into a dataframe. What follows was greatly inspired by https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm

Let's start by creating a dataframe, using the syntax
```python
pandas.DataFrame(data, index, columns, dtype, copy)
```
* `data` can be any data container, such as arrays, lists, or dictionaries
* `index` contains the labels for the rows (default is a `RangeIndex`, i.e. `0, 1, ..., n-1` as in `np.arange(n)`)
* `columns` contains the labels for the columns (default is also `0, 1, ..., n-1`)
* `dtype` specifies the data type to force on the input values
* `copy` is a boolean that controls whether the input data is copied (I rarely if ever actually use this)
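
As a rough illustration of these arguments, the sketch below passes them by keyword and compares the result with the defaults. The prices and row labels are made-up values used only for the example.

```python
import pandas as pd

# Every argument given explicitly (hypothetical quantities and prices)
df = pd.DataFrame(
    data=[[200, 1.5], [300, 0.25]],
    index=["highlighters", "pencils"],
    columns=["quantity", "price"],
    dtype=float,
)
print(df)
print(df.dtypes)

# With no index or columns, both default to 0, 1, ..., n-1
default = pd.DataFrame([[1, 2], [3, 4]])
print(list(default.index), list(default.columns))  # [0, 1] [0, 1]
```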

Let's create some dataframes and see what happens

In [13]:
import pandas as pd
# Empty dataframe
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [14]:
# DataFrame from a list
lt = [1,2,3,4,5]
df1 = pd.DataFrame(lt,index=['one','two','three','four','five'],columns=['a'])
print(df1)

       a
one    1
two    2
three  3
four   4
five   5


In [16]:
# DataFrame from list of lists
data = [['highlighters',200],['pencils',300],['pens',500]]
df2 = pd.DataFrame(data,columns=['Object','Quantity'])
print(df2)

         Object  Quantity
0  highlighters       200
1       pencils       300
2          pens       500


In [17]:
# Or a DataFrame from dictionaries (be careful: if the dictionaries have different keys, you might not get what you think)
# Here the keys of entosp label the rows, and each value in office is repeated in every row
df3 = pd.DataFrame(office, index=list(entosp))
print(df3)

       highlighters  pencils  pens
one             200      300   700
two             200      300   700
three           200      300   700
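
To see what the warning about different keys can mean in practice, here is a small sketch that builds a dataframe from a list of dictionaries whose keys only partly overlap. The room labels and quantities are invented for the example.

```python
import pandas as pd

# Two hypothetical inventories that do not share all of their keys
records = [
    {"highlighters": 200, "pencils": 300},
    {"pencils": 150, "pens": 500},
]
df = pd.DataFrame(records, index=["room_a", "room_b"])
print(df)

# Keys missing from a record show up as NaN in that row
print(df.isna().sum())
```

Each dictionary becomes one row, the columns are the union of all keys, and the gaps are filled with `NaN`, which also turns the affected columns into floating-point columns.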


### Accessing the DataFrame
To access the elements within your dataframe, you can index them using the row and column names as indices, or sometimes using standard Python indexing conventions:

In [21]:
# Printing a column
print(df2['Quantity'],"\n")
# Printing an element
print(df3['pens']['three'])

0    200
1    300
2    500
Name: Quantity, dtype: int64 

700


Selecting a row requires that you use another indexer, `loc` (by label) or `iloc` (by integer position):

In [19]:
print(df3.loc['one'],"\n")
print(df3.loc['one']['pens'],"\n")
print(df3.iloc[1],"\n")
print(df3.iloc[1]['pens'],"\n")

highlighters    200
pencils         300
pens            700
Name: one, dtype: int64 

700 

highlighters    200
pencils         300
pens            700
Name: two, dtype: int64 

700 
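
Both indexers also accept a row and a column at once, which avoids the chained `[...][...]` lookups used above. A minimal sketch, rebuilding `df3` so the block runs on its own (the new value 650 is arbitrary):

```python
import pandas as pd

office = {"highlighters": 200, "pencils": 300, "pens": 700}
df3 = pd.DataFrame(office, index=["one", "two", "three"])

print(df3.loc["one", "pens"])  # 700: row label 'one', column label 'pens'
print(df3.iloc[1, 2])          # 700: second row, third column, by position

# Single-step indexing is also the dependable way to assign a value
df3.loc["two", "pens"] = 650
print(df3.iloc[1, 2])          # 650
```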



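You can even use the `:` colon notation to get ranges of rows. Note that each plain `[...]` slice on a dataframe selects rows, so chaining two slices narrows the rows twice rather than selecting columns: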

In [24]:
print(df2[:1][:2],'\n')
print(df2[:][:2])

         Object  Quantity
0  highlighters       200 

         Object  Quantity
0  highlighters       200
1       pencils       300
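
To slice rows and columns together, one option is to give `iloc` or `loc` two slices separated by a comma. The sketch below rebuilds `df2` so it is self-contained. The variable names are illustrative.

```python
import pandas as pd

data = [["highlighters", 200], ["pencils", 300], ["pens", 500]]
df2 = pd.DataFrame(data, columns=["Object", "Quantity"])

first_two_rows = df2.iloc[:2]     # rows 0 and 1, all columns
first_column = df2.iloc[:, :1]    # all rows, first column only
label_slice = df2.loc[0:1, "Object":"Quantity"]  # loc slices include both endpoints

print(first_two_rows, "\n")
print(first_column, "\n")
print(label_slice)
```

Position-based slices with `iloc` exclude the end point, as in ordinary Python slicing, while label-based slices with `loc` include it.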
