### Daily learning notes on Python 

In [31]:
from datetime import date
today = str(date.today())
print('Last update on '+ today)

Last update on 2018-12-31


Useful links:
- python tutorial: https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/2-8-np-copy/
- multiple github ids in the same account: https://code.tutsplus.com/tutorials/quick-tip-how-to-work-with-github-and-multiple-accounts--net-22574
- the YouTube [Corey Schafer channel](https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g)

Import needed modules: 

In [2]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

**NumPy**

In [8]:
x = np.arange(20).reshape((4,5))
x

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [12]:
x_flat = x.flatten()
x_flat

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [13]:
x.shape

(4, 5)
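
A side note on flatten, sketched here as an illustration rather than part of the original cells: NumPy documents flatten as always returning a copy, while ravel returns a view when the memory layout allows it. The names x_copy and x_view are made up for this example.

```python
import numpy as np

x = np.arange(20).reshape((4, 5))

x_copy = x.flatten()   # always a new array
x_view = x.ravel()     # a view here, because x is C-contiguous

x_copy[0] = 99         # does not touch x
x_view[1] = -1         # writes through to x

print(x[0, 0], x[0, 1])
print(np.shares_memory(x, x_copy), np.shares_memory(x, x_view))
```

If the flattened array will be modified independently of the original, flatten is the safer choice.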

**Column & row bind**

method 1 - use vstack and hstack

In [15]:
A = np.zeros((2, 3))
B = np.ones((2, 3))

print(np.vstack((A, B)))    # vertical stack (row bind)
print(np.hstack((A, B)))    # horizontal stack (column bind)

[[0. 0. 0.]
 [0. 0. 0.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]]


method 2 - use concatenate

In [19]:
A = np.zeros((2, 3))
B = np.ones((2, 3))

print(np.concatenate((A, B), axis=0))  # row bind
print(np.concatenate((A, B), axis=1))  # column bind

[[0. 0. 0.]
 [0. 0. 0.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]]
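
The two methods above give the same arrays for 2-D inputs. A minimal check, assuming the same A and B as in the cells above (rows and cols are names introduced only for this sketch):

```python
import numpy as np

A = np.zeros((2, 3))
B = np.ones((2, 3))

rows = np.concatenate((A, B), axis=0)   # same result as np.vstack((A, B))
cols = np.concatenate((A, B), axis=1)   # same result as np.hstack((A, B))

print(np.array_equal(rows, np.vstack((A, B))))
print(np.array_equal(cols, np.hstack((A, B))))
print(rows.shape, cols.shape)
```

concatenate is the more general call, since the axis argument also covers arrays with more than two dimensions.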


**Series**

A Series is a one-dimensional, array-like object that holds a sequence of values together with an index of labels:

In [5]:
obj = Series([4,5,-1,9])

print(obj)
print(obj.values)
type(obj[1])

obj.index

0    4
1    5
2   -1
3    9
dtype: int64
[ 4  5 -1  9]


RangeIndex(start=0, stop=4, step=1)
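
The default index is just 0..n-1, but a Series can also carry its own labels. A small sketch, where the labels 'a' to 'd' and the name obj2 are assumptions made for illustration:

```python
import pandas as pd

# Same values as obj above, but with explicit string labels.
obj2 = pd.Series([4, 5, -1, 9], index=['a', 'b', 'c', 'd'])

by_label = obj2['c']          # look up by index label
by_position = obj2.iloc[2]    # look up by integer position
positive = obj2[obj2 > 0]     # boolean filtering keeps the labels

print(by_label, by_position)
print(positive)
```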

**DataFrame**

Create a dictionary of equal-length lists:

In [16]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
       'year': [2000, 2001, 2002, 2001, 2002],
       'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

In [17]:
data

{'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
 'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002]}

In [14]:
frame = DataFrame(data)

In the notebook, a DataFrame is rendered as a table:

In [15]:
frame

|   | pop | state | year |
|---|-----|-------|------|
| 0 | 1.5 | Ohio | 2000 |
| 1 | 1.7 | Ohio | 2001 |
| 2 | 3.6 | Ohio | 2002 |
| 3 | 2.4 | Nevada | 2001 |
| 4 | 2.9 | Nevada | 2002 |


Passing a columns list sets the column order explicitly:

In [18]:
frame2 = DataFrame(data, columns=['year', 'state', 'pop'])
frame2

|   | year | state | pop |
|---|------|-------|-----|
| 0 | 2000 | Ohio | 1.5 |
| 1 | 2001 | Ohio | 1.7 |
| 2 | 2002 | Ohio | 3.6 |
| 3 | 2001 | Nevada | 2.4 |
| 4 | 2002 | Nevada | 2.9 |


There are two ways to retrieve a column from a DataFrame, dict-style indexing and attribute access:

In [19]:
frame2['year']

0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64

In [20]:
frame2.year

0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64
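
The two forms are not always interchangeable. In this very frame the column named 'pop' collides with the DataFrame.pop method, so attribute access returns the method instead of the column. A minimal sketch of that pitfall, reusing the data from above (same_column is a name introduced for the example):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop'])

# For a name with no clash, both forms return the same Series.
same_column = frame2['year'].equals(frame2.year)
print(same_column)

# 'pop' is also a DataFrame method, so the attribute is the method.
print(callable(frame2.pop))

# Bracket indexing always returns the column.
print(frame2['pop'].tolist())
```

Bracket indexing is also the only form that works for column names containing spaces or other characters that are not valid in Python identifiers.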

Retrieve a row:

In [24]:
frame2.loc[2]  # .ix was removed in pandas 1.0; use .loc (by label) or .iloc (by position)

year     2002
state    Ohio
pop       3.6
Name: 2, dtype: object
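
With the default integer index, label and position happen to coincide. The difference between .loc and .iloc shows up once the index has its own labels. A sketch under that assumption, where the labels 'one' to 'five' and the name frame3 are made up for illustration:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame3 = pd.DataFrame(data, index=['one', 'two', 'three', 'four', 'five'])

by_label = frame3.loc['three']   # row whose index label is 'three'
by_position = frame3.iloc[2]     # third row, whatever its label

print(by_label)
print(by_label.equals(by_position))
```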

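Assign a value to a whole column (a scalar is broadcast to every row, and the column is created if it does not exist yet):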

In [30]:
frame2['debt'] = 16.5
frame2

|   | year | state | pop | debt |
|---|------|-------|-----|------|
| 0 | 2000 | Ohio | 1.5 | 16.5 |
| 1 | 2001 | Ohio | 1.7 | 16.5 |
| 2 | 2002 | Ohio | 3.6 | 16.5 |
| 3 | 2001 | Nevada | 2.4 | 16.5 |
| 4 | 2002 | Nevada | 2.9 | 16.5 |


In [32]:
frame2['debt'] = np.arange(5.)
frame2

|   | year | state | pop | debt |
|---|------|-------|-----|------|
| 0 | 2000 | Ohio | 1.5 | 0.0 |
| 1 | 2001 | Ohio | 1.7 | 1.0 |
| 2 | 2002 | Ohio | 3.6 | 2.0 |
| 3 | 2001 | Nevada | 2.4 | 3.0 |
| 4 | 2002 | Nevada | 2.9 | 4.0 |


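Assigning to a column through attribute access is unreliable. In the run recorded below the column kept its old values, although the pandas documentation says an existing column can be updated this way; what attribute assignment never does is create a new column. Bracket assignment (frame2['debt'] = ...) is the dependable form: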

In [35]:
print(np.arange(1,6))
frame2.debt = np.arange(1,6)
frame2

[1 2 3 4 5]


|   | year | state | pop | debt |
|---|------|-------|-----|------|
| 0 | 2000 | Ohio | 1.5 | 0.0 |
| 1 | 2001 | Ohio | 1.7 | 1.0 |
| 2 | 2002 | Ohio | 3.6 | 2.0 |
| 3 | 2001 | Nevada | 2.4 | 3.0 |
| 4 | 2002 | Nevada | 2.9 | 4.0 |
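
Attribute assignment may behave differently depending on the pandas version and on whether the column already exists. The sketch below shows what recent pandas releases do, as far as the documentation describes it: an existing column is updated, while a new name only becomes a plain Python attribute. The column name 'bonus' and the variable created_by_attribute are made up for the example.

```python
import warnings

import numpy as np
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002, 2001, 2002],
                       'debt': np.arange(5.)})

# Existing column: the assignment is routed to frame2['debt'].
frame2.debt = np.arange(1, 6)
print(frame2['debt'].tolist())

# New name: pandas warns and sets an ordinary attribute, not a column.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    frame2.bonus = [0, 0, 0, 0, 0]
created_by_attribute = 'bonus' in frame2.columns
print(created_by_attribute)

# Bracket assignment creates the column as expected.
frame2['bonus'] = 0
print(list(frame2.columns))
```

Because bracket assignment works in both cases, it is the form to prefer whenever a column is written.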


**Dictionary**

In [2]:
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning'}

In [4]:
type(student)

dict

In [5]:
student['name']

'Peter'

In [6]:
student.get('name')

'Peter'

The .get method is preferable when a key may be missing: indexing with [] raises a KeyError, while .get returns None:

In [10]:
print(student['phone number'])

KeyError: 'phone number'

In [9]:
print(student.get('phone number'))

None


A second argument to .get sets the value returned for a missing key:

In [11]:
print(student.get('phone number', 'not found'))

not found
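
The three ways of handling a missing key can be put side by side. A minimal sketch using the same student dictionary (phone and error_message are names introduced for the example):

```python
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning'}

# 1. .get with a default: no exception, and the dictionary is not modified.
phone = student.get('phone number', 'not found')
print(phone)
print('phone number' in student)

# 2. Membership test before indexing.
if 'phone number' not in student:
    print('no phone number stored')

# 3. Indexing inside try/except, for when a missing key is a real error.
try:
    student['phone number']
except KeyError as err:
    error_message = f'missing key: {err}'
    print(error_message)
```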


Add or edit a key-value pair:

In [12]:
student['phone number'] = '111-222'
student['name'] = 'James'
student

{'age': 35,
 'course': 'machine learning',
 'name': 'James',
 'phone number': '111-222'}

Update several key-value pairs at once with the .update method:

In [14]:
student.update({'name': 'Peter', 'phone number': '222-333'})
student

{'age': 35,
 'course': 'machine learning',
 'name': 'Peter',
 'phone number': '222-333'}

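Delete an element (the prompt number In [30] shows this cell was executed last, so the outputs after it still include 'name'):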

In [30]:
del student['name']
print(student)

{'age': 35, 'course': 'machine learning', 'phone number': '222-333'}
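
del raises a KeyError if the key is absent. As an alternative, dict.pop removes a key and returns its value, and accepts a default for missing keys. A short sketch, where the key 'email' is a made-up example of a key that does not exist:

```python
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning',
           'phone number': '222-333'}

removed = student.pop('name')          # removes the key and returns its value
missing = student.pop('email', None)   # no KeyError thanks to the default

print(removed)
print(missing)
print(student)
```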


In [15]:
len(student)

4

In [16]:
student.items()

dict_items([('name', 'Peter'), ('age', 35), ('course', 'machine learning'), ('phone number', '222-333')])

In [25]:
student.values()

dict_values(['Peter', 35, 'machine learning', '222-333'])

In [29]:
student_value = student.values()
list(student_value)[0]  # dict views do not support indexing; convert to a list first

'Peter'
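
One more property of these view objects: they track later changes to the dictionary, while a list made from them is a snapshot. A small sketch (values_view and snapshot are names introduced for the example):

```python
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning'}

values_view = student.values()   # live view onto the dictionary
snapshot = list(values_view)     # independent copy taken now

student['age'] = 36              # change the dictionary afterwards

print(list(values_view)[1])      # the view reflects the new value
print(snapshot[1])               # the snapshot keeps the old one
```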

Loop through a dictionary:

In [19]:
for key in student:
    print(key)

name
age
course
phone number


In [23]:
for key, value in student.items():
    print('The ' + key + ' is ' + str(value))

The name is Peter
The age is 35
The course is machine learning
The phone number is 222-333
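
The same loop can be written with an f-string, which removes the need for str() and string concatenation. A sketch assuming Python 3.6 or later (lines is a name introduced for the example):

```python
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning',
           'phone number': '222-333'}

# f-strings format non-string values such as the integer age directly.
lines = [f'The {key} is {value}' for key, value in student.items()]

for line in lines:
    print(line)
```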
