#### Day 9: Data types and Tricks

Overview: 
1. Pandas
2. NumPy
3. More on Tuples
4. More on Lists
5. More on Dictionary
6. for - else
7. Tree Data Structure


_We don't expect you to remember every tip we go through, but this can be a helpful reference going forward._

Source: https://book.pythontips.com/en/latest/index.html




#### Part 1. Pandas

##### 1.1 Installation: Customarily, we import as follows: 


In [3]:
# ! pip3 install pandas
# ! pip3 install numpy
import pandas as pd
import numpy as np

##### 1.2 Object Creation

- Creating a `Series` by passing a list of values, letting pandas create a default integer index. 
- A `Series` is just an n by 1 array. Similar to a vector in R. 
- Holds any type of data (similar to lists in Python). 

In [None]:
# Create a pd.Series with 4 elements:
x = pd.Series([10, 100, 12, 5])

In [None]:
# We index a Series as we do with lists
x[0]

In [None]:
# We can also include and index many types of data
y = pd.Series([10, 'a string', ['a','b'], 5])

In [None]:
y[2]
# y[2][0]

- Creating a `DataFrame` by passing a dictionary of objects that can be converted into a series-like structure.
- A `DataFrame` is an n by k array. Similar to a data.frame in R. 
- We now have columns and rows. 

In [None]:
# The input can be a dictionary
dt = {"A" : 1.2, "B" : [10, 8, 3, 4, 8, 6], "C" : "hi"}
df1 = pd.DataFrame(dt)
df1

- Note that the scalar values (`1.2` and `"hi"`) are recycled to match the length of the list in column `B`

- Creating a `DataFrame` by passing a list-like object

In [None]:
# The input can also be a list of lists
l = [[1.2], [10, 8, 3, 4, 8, 6], ["hi"]]
# We need to list the columns' names.
df2 = pd.DataFrame(l, columns = ["A", "B", "C", "D", "E", "F"])
df2

Q: How is `df1` different from `df2`?

<br>
<br>
<br>
<br>
<br>
<br>

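`df1`: 
- 6 by 3 (6 rows, 3 columns)
- each dictionary key becomes a column, and scalar values are recycled to fill the rows

`df2`: 
- 3 by 6 (3 rows, 6 columns)
- each inner list becomes a row; nothing is recycled, so shorter rows are padded with missing values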
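The shape difference can be checked directly. The following is a minimal sketch that uses shortened versions of the inputs above; the names `from_dict` and `from_rows` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Dictionary input: each key becomes a column, and scalars are repeated
# to match the length of the list.
from_dict = pd.DataFrame({"A": 1.2, "B": [10, 8, 3], "C": "hi"})

# List-of-lists input: each inner list becomes a row, and shorter rows
# are padded with missing values.
from_rows = pd.DataFrame([[1.2], [10, 8, 3], ["hi"]])

print(from_dict.shape)
print(from_rows.shape)
print(from_rows.isna().sum().sum())  # number of padded (missing) cells
```

Both frames end up 3 by 3 here, but `from_dict` is completely filled while `from_rows` contains missing values in the short rows.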

In [None]:
df2.dtypes # check data types

##### 1.3 Viewing and Changing Metadata

- View the data types

In [None]:
df1.dtypes
# df2.dtypes

- Check the row indices

In [None]:
df1.index
# df2.index

- Rename the rows

In [None]:
df2 = df2.rename(index = {0:"One", 1:"Two", 2:"Three"})
df2

- View the column names

In [None]:
df2.columns

- Change columns names

In [None]:
df2.rename(columns={"A": "Renamed_A", 
                    "B": "Renamed_B",
                    "C": "Renamed_C",
                    # "D": "Renamed_D",
                    "E": "Renamed_E",
                    "F": "Renamed_F"}) # Rename columns using a mapping

In [None]:
df2.columns

In [None]:
df2 = df2.rename(columns={"A": "Renamed_A", 
                        "B": "Renamed_B",
                        "C": "Renamed_C",
                        "D": "Renamed_D",
                        "E": "Renamed_E",
                        "F": "Renamed_F"}) # Rename columns using a mapping

In [None]:
df2.columns

- Add more columns to your dataframe

In [None]:
# option 1: assign a single value (it is repeated for every row)
df1['D'] = 'I am Column D'
df1

In [None]:
# option 2: use a list 
df1['E'] = [0, 2, 4, 6, 8, 10]
df1

In [None]:
# option 3: use assign
df1 = df1.assign(F = ["x", "x", "x", "x", "x", None])
df1

In [None]:
# option 4: use insert (insert new col at a specific position)
df1.insert(3, "H", ["H", "H", "H", "H", "H", "H"]) # at index 3
df1

##### 1.4 Viewing and Changing Actual Data

- Look at the first several rows

In [None]:
df1.head(3) # default is 5, similar to head() in R

- Look at the last several rows

In [None]:
df1.tail(2) # default is 5, similar to tail() in R

- Describe statistics for the dataframe, similar to `summary()` in R

In [None]:
df1.describe()

- Sort the dataframe using a specific column

In [None]:
df1.sort_values(by = "B", ascending = True)

- Access specific values in a DataFrame using indexing

In [None]:
df1["A"] # gets a column
# df1["A"][0] # gets an element in a column 

In [None]:
df1[0:4] # gets first 4 rows 
# df1[:4]

In [None]:
df1.loc[:,["C", "D"]] # All rows from column "C" & "D"
# df1.loc[0:2, ["C"]] # First three rows from column "C" 

In [None]:
df1.iloc[0:2, 0:3] # First 2 rows and 3 columns

Difference between `.iloc` and `.loc`:
- `.loc`: gets rows / columns using labels
- `.iloc`: gets rows / columns using positions (integer location indexing)
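The difference is easiest to see when the row labels are not the default 0, 1, 2, ... The sketch below uses a small made-up DataFrame (`scores` and its labels 10, 20, 30 are assumptions for illustration).

```python
import pandas as pd

# Row labels are 10, 20, 30; row positions are still 0, 1, 2.
scores = pd.DataFrame({"score": [90, 80, 70]}, index=[10, 20, 30])

by_label = scores.loc[10, "score"]   # label 10, column "score"
by_position = scores.iloc[0, 0]      # first row, first column

# Label slices include the end label; position slices exclude the end position.
label_slice = scores.loc[10:20]
position_slice = scores.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Note that `.loc` slices include both endpoints, while `.iloc` slices follow the usual Python rule of excluding the end.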

<br>

- Read files from CSV into pandas DataFrame

In [None]:
my_data = pd.read_csv("S117_members.csv")
my_data.head()

- Save pandas DataFrame into csv files

In [None]:
my_data.to_csv('test_csv.csv')
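By default, `to_csv()` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below illustrates this with an in-memory buffer instead of a real file; the DataFrame `small` is made up for the example.

```python
import io
import pandas as pd

small = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# With no path, to_csv() returns the CSV text as a string.
with_index = small.to_csv()
without_index = small.to_csv(index=False)

# Reading the text back: the saved index shows up as an unnamed column.
round_trip = pd.read_csv(io.StringIO(with_index))
clean = pd.read_csv(io.StringIO(without_index))

print(list(round_trip.columns))
print(list(clean.columns))
```

Passing `index=False` when saving is a common way to avoid the extra column.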

#### Part 2. NumPy

- NumPy's main object is the multidimensional array. 
- It is a table of elements, usually numbers, all of the same type, indexed by a tuple of non-negative integers.
- NumPy’s array class is called `ndarray`. It is also known by the alias `array`.

<br>

- Let's create a one-dimensional array with 3 elements

In [8]:
a = np.array([1,2,3])
a

array([1, 2, 3])

In [9]:
type(a)

numpy.ndarray

- Let's create a 2 x 3 array

In [11]:
a = np.array([(1, 2, 3), (1, 2, 3)])
a

array([[1, 2, 3],
       [1, 2, 3]])

- For numeric work on large amounts of data, arrays are typically faster than lists
- We can do math operations on them!
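As a quick illustration of why arrays are convenient for math, the same operators behave differently on lists and arrays. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On lists, * repeats and + concatenates.
print(values * 2)
print(values + values)

# On arrays, * and + work element by element.
print(arr * 2)
print(arr + arr)
```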

In [13]:
print(a.ndim) # check the number of dimensions of a numpy array

2


In [14]:
# If two arrays have the same shape, arithmetic is done element by element
a * a

array([[1, 4, 9],
       [1, 4, 9]])

In [15]:
a + a

array([[2, 4, 6],
       [2, 4, 6]])

In [16]:
a - a

array([[0, 0, 0],
       [0, 0, 0]])

In [17]:
a / a

array([[1., 1., 1.],
       [1., 1., 1.]])

- We can use `NumPy` arrays to create a DataFrame

In [24]:
np.random.seed(42) # set seed for reproducibility
a = np.random.randn(6,4) # generate a random array of dimension 6 x 4 
a

array([[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986],
       [-0.23415337, -0.23413696,  1.57921282,  0.76743473],
       [-0.46947439,  0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783, -0.56228753],
       [-1.01283112,  0.31424733, -0.90802408, -1.4123037 ],
       [ 1.46564877, -0.2257763 ,  0.0675282 , -1.42474819]])

In [26]:
df_array = pd.DataFrame(a, columns = ["A", "B", "C", "D"])
df_array

| | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.496714 | -0.138264 | 0.647689 | 1.52303 |
| 1 | -0.234153 | -0.234137 | 1.579213 | 0.767435 |
| 2 | -0.469474 | 0.54256 | -0.463418 | -0.46573 |
| 3 | 0.241962 | -1.91328 | -1.724918 | -0.562288 |
| 4 | -1.012831 | 0.314247 | -0.908024 | -1.412304 |
| 5 | 1.465649 | -0.225776 | 0.067528 | -1.424748 |


- More on `NumPy` here: https://www.edureka.co/blog/python-numpy-tutorial/

<br>

#### Part 3. Tuples

- Remember, Tuples are immutable

In [29]:
my_tuple = (7,'b',3,'d',5,'b')
# my_tuple[1] = {'b':2} # this will break
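To see what "this will break" means in practice, the sketch below catches the error and then builds a new tuple instead. The replacement value `'z'` and the name `new_tuple` are illustrative.

```python
my_tuple = (7, 'b', 3, 'd', 5, 'b')

# Item assignment on a tuple raises a TypeError.
try:
    my_tuple[1] = 'z'
except TypeError as err:
    print("Cannot modify a tuple:", err)

# To "change" an element, build a new tuple from slices of the old one.
new_tuple = my_tuple[:1] + ('z',) + my_tuple[2:]
print(new_tuple)
```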

- Indexing Tuples

In [31]:
my_tuple[0]  # element at index  0

7

In [32]:
my_tuple.index('b') ## Gives the index of 'b' - only the first occurrence!

1

In [33]:
my_tuple.count('b') ## Gives the number of times 'b' occurs

2

#### Part 4. Lists

- Let's create a list of the squares of the numbers 0 through 9

In [36]:
my_square_list=[]
for i in range(0,10):
	my_square_list.append(i**2)
my_square_list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

- We can do all this in one line!

In [39]:
my_square_list = [i**2 for i in range(10)]
my_square_list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

- We can also use `map()` with `lambda` to help us do the same!
- `map()` is similar to `apply()` or `sapply()` in R
- `lambda` creates an anonymous function, like `function(x)` in R

In [49]:
my_square_list = map(lambda x: x**2, range(0,10))
my_square_list

<map at 0x122067880>

- We need to convert the `map` object into a list or tuple to see its contents

In [50]:
list(my_square_list) 

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
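One thing to keep in mind: a `map` object is an iterator, so it can only be consumed once. The sketch below shows this with a shorter range; the names `squares`, `first_pass`, and `second_pass` are illustrative.

```python
squares = map(lambda x: x**2, range(5))

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing is left the second time

print(first_pass)
print(second_pass)
```

If the values are needed more than once, it helps to store the list rather than the `map` object.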

- Another way to use `map()`

In [52]:
def sqr(x): 
	return x**2

my_square_list = map(sqr, range(0,10))
list(my_square_list)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

- Or we can do all that in one line

In [58]:
my_square_list = list(map(sqr, range(0,10)))
my_square_list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

- `zip()` combines elements of 2 lists with matching indices into an iterable of tuples

In [59]:
my_list = range(0,10)
# A simple example:
for i, j in zip(my_list, my_square_list): 
    print(i * j)

0
1
8
27
64
125
216
343
512
729
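In the example above both inputs have the same length. When they do not, `zip()` stops at the shortest input. The sketch below shows this, along with `itertools.zip_longest` from the standard library as one way to keep the leftover items; the lists `letters` and `numbers` are made up for the example.

```python
from itertools import zip_longest

letters = ['a', 'b', 'c']
numbers = [1, 2]

# zip() stops as soon as the shortest input runs out.
short = list(zip(letters, numbers))

# zip_longest() keeps going and fills the gaps with fillvalue.
padded = list(zip_longest(letters, numbers, fillvalue=None))

print(short)
print(padded)
```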


- Let's combine `zip()` with `map()`

In [62]:
zipped = list(zip(my_list, my_square_list, map(lambda x: my_square_list[x] * my_list[x], range(0, 10))))
zipped

[(0, 0, 0),
 (1, 1, 1),
 (2, 4, 8),
 (3, 9, 27),
 (4, 16, 64),
 (5, 25, 125),
 (6, 36, 216),
 (7, 49, 343),
 (8, 64, 512),
 (9, 81, 729)]

- We can also unzip the object using `*`

In [64]:
unzipped1, unzipped2, unzipped3 = zip(*zipped)
unzipped1

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

In [65]:
unzipped2

(0, 1, 4, 9, 16, 25, 36, 49, 64, 81)

In [66]:
unzipped3

(0, 1, 8, 27, 64, 125, 216, 343, 512, 729)

- Let's look at some other methods for lists

In [69]:
x = [3,6,1,2,8,3,5,7]

- Reverse a list (the list is changed in place, so no assignment is needed)

In [70]:
x.reverse() 
x

[7, 5, 3, 8, 2, 1, 6, 3]

- Sort a list (the list is changed in place, so no assignment is needed)

In [73]:
x.sort(reverse = False) 
x

[1, 2, 3, 3, 5, 6, 7, 8]
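Because `.sort()` works in place, it returns `None`; assigning its result is a common mistake. The built-in `sorted()` returns a new list and leaves the original alone. A minimal sketch with a shorter, made-up list:

```python
numbers = [3, 6, 1, 2]

# sorted() returns a new list; the original order is kept.
ordered_copy = sorted(numbers)
print(numbers)
print(ordered_copy)

# .sort() changes the list itself and returns None.
returned = numbers.sort()
print(returned)
print(numbers)
```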

- `.append()` adds its argument as a single element, so appending a list creates a nested list

In [74]:
x.append([10, 12, 14])
x

[1, 2, 3, 3, 5, 6, 7, 8, [10, 12, 14]]

- `.extend()` adds the items of another list to the original list one by one

In [75]:
x.extend([11, 12, 14])
x

[1, 2, 3, 3, 5, 6, 7, 8, [10, 12, 14], 11, 12, 14]

- `.insert()` inserts an item at a specific position, given by index

In [76]:
x.insert(1,'999')
x

[1, '999', 2, 3, 3, 5, 6, 7, 8, [10, 12, 14], 11, 12, 14]

- `.remove()` removes the first occurrence of a value

In [77]:
x.remove('999')
x

[1, 2, 3, 3, 5, 6, 7, 8, [10, 12, 14], 11, 12, 14]

- `enumerate()`
- adds a counter to an iterable object
- returns the result as an enumerate object 

<br>

- Let's look at an example with a list of your names

In [85]:
names = ["Leticia", "Irene", "Jie", "Masanori", "Tian"]
enumerate(names)
list(enumerate(names))

[(0, 'Leticia'), (1, 'Irene'), (2, 'Jie'), (3, 'Masanori'), (4, 'Tian')]

- We can use enumerate objects in our loops

In [86]:
for name in enumerate(names): 
    print(name)

(0, 'Leticia')
(1, 'Irene')
(2, 'Jie')
(3, 'Masanori')
(4, 'Tian')


- We could also refer to the counter and names separately in our loop

In [88]:
for number, name in enumerate(names):
	print("Number: {}, Name: {}".format(number, name))

Number: 0, Name: Leticia
Number: 1, Name: Irene
Number: 2, Name: Jie
Number: 3, Name: Masanori
Number: 4, Name: Tian
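The counter starts at 0 by default. `enumerate()` also accepts a `start` argument, which is handy for numbering from 1. A short sketch using the same names:

```python
names = ["Leticia", "Irene", "Jie", "Masanori", "Tian"]

# start=1 makes the counter begin at 1 instead of 0.
for number, name in enumerate(names, start=1):
    print("Number: {}, Name: {}".format(number, name))

numbered = list(enumerate(names, start=1))
```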


#### Part 5. Dictionary

- We can also use `zip()` to create dictionaries

In [93]:
d = dict(zip(names,range(0,5)))
d

{'Leticia': 0, 'Irene': 1, 'Jie': 2, 'Masanori': 3, 'Tian': 4}

- Access elements from a dictionary using the keys

In [94]:
d['Leticia']

0
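Indexing with a key that does not exist raises a `KeyError`. The `.get()` method is a gentler alternative that returns `None`, or a default of your choice, instead. The sketch below uses a shortened dictionary; the key `'Nobody'` and the default `-1` are made up for the example.

```python
d = {'Leticia': 0, 'Irene': 1}

# Square brackets raise a KeyError for a missing key.
try:
    d['Nobody']
except KeyError as err:
    print("Missing key:", err)

# .get() returns None, or the default passed as the second argument.
missing = d.get('Nobody')
fallback = d.get('Nobody', -1)
found = d.get('Irene', -1)

print(missing, fallback, found)
```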

- Get all the keys

In [96]:
d.keys()

dict_keys(['Leticia', 'Irene', 'Jie', 'Masanori', 'Tian'])

- Get all the values

In [97]:
d.values()

dict_values([0, 1, 2, 3, 4])

- Get all key-value pairs

In [99]:
d.items()

dict_items([('Leticia', 0), ('Irene', 1), ('Jie', 2), ('Masanori', 3), ('Tian', 4)])

- We can use the key-value pairs in our loops to iterate through

In [106]:
for key, value in d.items(): 
    print("Key: {}. \nValue: {}".format(key, value))

Key: Leticia. 
Value: 0
Key: Irene. 
Value: 1
Key: Jie. 
Value: 2
Key: Masanori. 
Value: 3
Key: Tian. 
Value: 4


- Add a new key-value pair to your dict

In [110]:
d.update({'Cecilia': 5})
d

{'Leticia': 0,
 'Irene': 1,
 'Jie': 2,
 'Masanori': 3,
 'Tian': 4,
 'Cecilia': 5}

- Overwrite a value using the key

In [111]:
d.update({'Cecilia': 20})
d

{'Leticia': 0,
 'Irene': 1,
 'Jie': 2,
 'Masanori': 3,
 'Tian': 4,
 'Cecilia': 20}

- Or, more directly, assign to the key

In [112]:
d['Cecilia'] = 50

- You can also put in a list of values under one key

In [114]:
d['Cecilia'] = [10,20,30]
d

{'Leticia': 0,
 'Irene': 1,
 'Jie': 2,
 'Masanori': 3,
 'Tian': 4,
 'Cecilia': [10, 20, 30]}

#### Part 6. for - else

- We know that we can use `for` loops like this: 

In [115]:
for i in range(1,20):
    if i % 5 == 0:
        print(i)

5
10
15


- `for` loops also have an `else` clause!

In [129]:
# What is happening here? 
for i in range(1,20):
	if i % 5 == 0:
		print(i)		
else:
	print('print this')
	
print('this other thing')

5
10
15
print this
this other thing


- What is happening here? 

In [128]:
for i in range(1,20):
	if i % 5 == 0:
		print(i)
		break		
else:
	print('print this')
	
print('this other thing')

5
this other thing


- Let's break it down by parts!

In [125]:
for i in range(1,20):
	if i % 5 == 0:
		# print(i)
		break	

In [130]:
for i in range(1,20):
	if i % 5 == 0:
		# print(i)
		break		
else:
	print('print this')
    
print('this other thing')

this other thing


In [131]:
for i in range(1,20):
	if i == 0:
		# print(i)
		break		
else:
	print('print this')
    
print('this other thing')

print this
this other thing


- See also: https://book.pythontips.com/en/latest/for_-_else.html
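The cells above show the rule in action: the `else` block runs only when the loop finishes without hitting `break`. A typical use is a search, where `else` handles the "nothing found" case. The function `first_multiple` below is a hypothetical helper written for this illustration.

```python
def first_multiple(numbers, divisor):
    """Return the first number divisible by divisor, or None if there is none."""
    for n in numbers:
        if n % divisor == 0:
            result = n
            break  # found one: the else block is skipped
    else:
        # the loop ran to the end without break: nothing was found
        result = None
    return result

print(first_multiple(range(1, 20), 5))
print(first_multiple(range(1, 5), 7))
```

Without `for` - `else`, the same logic usually needs an extra flag variable such as `found = False`.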

#### Part 7. Tree

- A tree is a hierarchical data structure. In a binary tree, each node has at most two children.
- Example: https://www.cdn.geeksforgeeks.org/wp-content/uploads/binary-tree-to-DLL.png

![Tree](tree.png)

- Each node contains some data and references to its left and right children. 
- A tree is made up of nodes and the edges that connect each parent node to its children. 

<br>

- Let's create a tree!
- First, let's create the `Node` class

In [133]:
class Node():
    def __init__(self, value = None):
        self.value = value
        self.parent = None
        self.children = [None, None] # [left child, right child]

    def __repr__(self):
        return "Node object with value %s" %(self.value)

    def __str__(self):
        # Note: self.children is a list, so it never equals the tuple (None, None).
        # The first branch therefore runs even for nodes without children.
        if self.children != (None,None):
            return "Node value: %s \n left child: \n %s \n right child: \n %s" %(self.value,self.children[0],self.children[1])
        else: 
            return "Node value: %s" % self.value


- Let's also create a class for Tree

In [134]:
class Tree():
    def __init__(self, root=None):
        self.root = root # First node
        # All branches: a list of paths from the root, where each path is a list of nodes.
        # We start with a single path that contains only the root, hence [[root]].
        self.branches = [[root]]

    def add_branch(self, node, children):
        node.children = children # update the node's children
        # Loop over a copy, because we modify self.branches inside the loop
        for branch in list(self.branches):
            if branch[-1] == node: # check whether the last node in this branch is the given node
                # if we find a match, replace the branch with two longer ones,
                # one ending in each child
                newbranch = branch + [children[0]]
                newbranch2 = branch + [children[1]]
                self.branches.append(newbranch)
                self.branches.append(newbranch2)
                self.branches.remove(branch)

- Now let's create some instances of Nodes!

In [135]:
node1 = Node(1)
node2 = Node(2)
node3 = Node(3)
node4 = Node(4)
node5 = Node(5)

In [136]:
node1 # representation of the object

Node object with value 1

In [138]:
print(node1) # print() object

Node value: 1 
 left child: 
 None 
 right child: 
 None


- Let's make a Tree!

In [139]:
mytree = Tree(root = node1) # create an instance

In [141]:
mytree.branches # Check branches

[[Node object with value 1]]

In [142]:
mytree.add_branch(node = node1, children = [node2, node3]) # nodes 2 and 3 are children of node 1

In [143]:
mytree.add_branch(node = node2, children = [node4, node5]) # nodes 4 and 5 are children of node 2

In [144]:
mytree.root # Check root

Node object with value 1

In [146]:
mytree.branches # Check branches

[[Node object with value 1, Node object with value 3],
 [Node object with value 1,
  Node object with value 2,
  Node object with value 4],
 [Node object with value 1,
  Node object with value 2,
  Node object with value 5]]

In [148]:
print(node1) # print node1

Node value: 1 
 left child: 
 Node value: 2 
 left child: 
 Node value: 4 
 left child: 
 None 
 right child: 
 None 
 right child: 
 Node value: 5 
 left child: 
 None 
 right child: 
 None 
 right child: 
 Node value: 3 
 left child: 
 None 
 right child: 
 None


In [149]:
print(node2) # print node2

Node value: 2 
 left child: 
 Node value: 4 
 left child: 
 None 
 right child: 
 None 
 right child: 
 Node value: 5 
 left child: 
 None 
 right child: 
 None


In [150]:
print(node5) # print node5

Node value: 5 
 left child: 
 None 
 right child: 
 None
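The nested printout above visits each node before its left and right children. The same order can be collected into a flat list with a short recursive function. This is a minimal sketch, not part of the original notebook: `SimpleNode` is a stripped-down stand-in for the `Node` class, and `preorder` is a hypothetical helper.

```python
class SimpleNode:
    """Stripped-down stand-in for the Node class above (illustration only)."""
    def __init__(self, value):
        self.value = value
        self.children = [None, None]  # [left child, right child]

def preorder(node):
    """Return the values in the order: node, left subtree, right subtree."""
    if node is None:
        return []
    left, right = node.children
    return [node.value] + preorder(left) + preorder(right)

# Rebuild the same shape as mytree: 1 -> (2, 3) and 2 -> (4, 5)
n1, n2, n3, n4, n5 = (SimpleNode(v) for v in range(1, 6))
n1.children = [n2, n3]
n2.children = [n4, n5]

print(preorder(n1))
```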


In [None]:
# Copyright (c) 2014 Matt Dickenson
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.