### Daily learning notes on Python 

In [1]:
from datetime import date
today = str(date.today())
print('Last update on '+ today)

Last update on 2020-03-09


Useful links:
- NumPy & pandas tutorial (Morvan Zhou): https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/2-8-np-copy/
- multiple GitHub accounts on the same computer: https://code.tutsplus.com/tutorials/quick-tip-how-to-work-with-github-and-multiple-accounts--net-22574
- the YouTube [Corey Schafer channel](https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g)

Import needed modules: 

In [2]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

**NumPy**

In [8]:
x = np.arange(20).reshape((4,5))
x

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [12]:
x_flat = x.flatten()
x_flat

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [13]:
x.shape

(4, 5)
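`flatten` always returns a new copy of the data. A minimal sketch contrasting it with `ravel` (not used elsewhere in these notes), which returns a view whenever the array's memory layout allows it:

```python
import numpy as np

x = np.arange(20).reshape((4, 5))

x_flat = x.flatten()   # flatten() always copies
x_flat[0] = 100        # modifying the copy...
print(x[0, 0])         # ...leaves the original untouched: 0

x_rav = x.ravel()      # ravel() returns a view here because x is contiguous
x_rav[0] = 100         # writing through the view...
print(x[0, 0])         # ...changes the original: 100
```

If a later change to the flat array must not affect `x`, `flatten` is the safer choice.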

**Column & row bind**

method 1 - use vstack and hstack

In [15]:
A = np.zeros((2, 3))
B = np.ones((2, 3))

print(np.vstack((A,B)))    # vertical stack
print(np.hstack((A,B)))    # horizontal stack

[[0. 0. 0.]
 [0. 0. 0.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]]


method 2 - use concatenate

In [19]:
A = np.zeros((2, 3))
B = np.ones((2, 3))

print(np.concatenate((A,B), axis = 0))  # row bind
print(np.concatenate((A,B), axis = 1))  # column bind

[[0. 0. 0.]
 [0. 0. 0.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]]
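For 2-D arrays the two methods give identical results. A quick check, plus `np.stack` (not in the original notes) for contrast, which adds a new axis instead of joining along an existing one:

```python
import numpy as np

A = np.zeros((2, 3))
B = np.ones((2, 3))

# vstack/hstack match concatenate along axis 0/axis 1 for 2-D inputs
print(np.array_equal(np.vstack((A, B)), np.concatenate((A, B), axis=0)))  # True
print(np.array_equal(np.hstack((A, B)), np.concatenate((A, B), axis=1)))  # True

# np.stack creates a new leading axis rather than binding rows
print(np.stack((A, B)).shape)                 # (2, 2, 3)
print(np.concatenate((A, B), axis=0).shape)   # (4, 3)
```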


**Series**

A Series is a one-dimensional array-like object with an associated index:

In [5]:
obj = Series([4,5,-1,9])

print(obj)
print(obj.values)
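print(type(obj[1]))   # only a cell's last expression is echoed, so print this one explicitly

obj.index

0    4
1    5
2   -1
3    9
dtype: int64
[ 4  5 -1  9]
<class 'numpy.int64'>

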
RangeIndex(start=0, stop=4, step=1)
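By default a Series gets the integer labels shown by `RangeIndex` above. A small sketch with explicit labels instead; `obj2` and the letter labels are hypothetical:

```python
from pandas import Series

# Hypothetical labels replacing the default 0..3
obj2 = Series([4, 5, -1, 9], index=['d', 'b', 'a', 'c'])

print(obj2['a'])        # label-based lookup: -1
print(obj2[obj2 > 0])   # boolean filtering keeps the original labels
```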

**DataFrame**

create a dictionary:

In [16]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
       'year': [2000, 2001, 2002, 2001, 2002],
       'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

In [17]:
data

{'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
 'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002]}

In [14]:
frame = DataFrame(data)

In the notebook, a DataFrame is displayed as a table:

In [15]:
frame

|   | pop | state | year |
|---|-----|-------|------|
| 0 | 1.5 | Ohio | 2000 |
| 1 | 1.7 | Ohio | 2001 |
| 2 | 3.6 | Ohio | 2002 |
| 3 | 2.4 | Nevada | 2001 |
| 4 | 2.9 | Nevada | 2002 |


One can pass `columns` to set the order of the columns:

In [18]:
frame2 = DataFrame(data, columns=['year', 'state', 'pop'])
frame2

|   | year | state | pop |
|---|------|-------|-----|
| 0 | 2000 | Ohio | 1.5 |
| 1 | 2001 | Ohio | 1.7 |
| 2 | 2002 | Ohio | 3.6 |
| 3 | 2001 | Nevada | 2.4 |
| 4 | 2002 | Nevada | 2.9 |


Two ways to retrieve a column from a DataFrame:

In [19]:
frame2['year']

0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64

In [20]:
frame2.year

0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64

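Retrieve a row by its index label with `.loc` (the older `.ix` indexer was deprecated in pandas 0.20 and removed in 1.0):

In [24]:
frame2.loc[2]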

year     2002
state    Ohio
pop       3.6
Name: 2, dtype: object
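`.loc` selects by label and `.iloc` by integer position; with the default `RangeIndex` above the two happen to coincide. A sketch with hypothetical string labels (`frame3` and the labels 'one' through 'five' are made up for illustration) makes the difference visible:

```python
from pandas import DataFrame

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

# Hypothetical labels so that label and position no longer match
frame3 = DataFrame(data, index=['one', 'two', 'three', 'four', 'five'])

print(frame3.loc['three'])           # by index label
print(frame3.iloc[2])                # by integer position (same row here)
print(frame3.loc['three', 'pop'])    # one cell: row label + column name -> 3.6
```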

Add a new column; a scalar value is broadcast to every row:

In [30]:
frame2['debt'] = 16.5
frame2

|   | year | state | pop | debt |
|---|------|-------|-----|------|
| 0 | 2000 | Ohio | 1.5 | 16.5 |
| 1 | 2001 | Ohio | 1.7 | 16.5 |
| 2 | 2002 | Ohio | 3.6 | 16.5 |
| 3 | 2001 | Nevada | 2.4 | 16.5 |
| 4 | 2002 | Nevada | 2.9 | 16.5 |


In [32]:
frame2['debt'] = np.arange(5.)
frame2

|   | year | state | pop | debt |
|---|------|-------|-----|------|
| 0 | 2000 | Ohio | 1.5 | 0.0 |
| 1 | 2001 | Ohio | 1.7 | 1.0 |
| 2 | 2002 | Ohio | 3.6 | 2.0 |
| 3 | 2001 | Nevada | 2.4 | 3.0 |
| 4 | 2002 | Nevada | 2.9 | 4.0 |


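Attribute-style assignment is unreliable, so prefer brackets. `frame2.debt = ...` should update an existing `debt` column, but attribute syntax cannot create a new column: pandas sets a plain Python attribute instead and emits a warning. In the run recorded below the column did not change, which suggests `debt` had been bound as a plain attribute in an earlier cell that is not shown: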

In [35]:
print(np.arange(1,6))
frame2.debt = np.arange(1,6)
frame2

[1 2 3 4 5]


|   | year | state | pop | debt |
|---|------|-------|-----|------|
| 0 | 2000 | Ohio | 1.5 | 0.0 |
| 1 | 2001 | Ohio | 1.7 | 1.0 |
| 2 | 2002 | Ohio | 3.6 | 2.0 |
| 3 | 2001 | Nevada | 2.4 | 3.0 |
| 4 | 2002 | Nevada | 2.9 | 4.0 |
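A minimal sketch of both behaviours on a small hypothetical frame `df` (the `growth` column name is invented for illustration):

```python
import warnings

import numpy as np
from pandas import DataFrame

df = DataFrame({'year': [2000, 2001, 2002], 'debt': [0.0, 1.0, 2.0]})

# Attribute assignment to an EXISTING column updates that column
df.debt = np.arange(1, 4)
print(df['debt'].tolist())            # [1, 2, 3]

# Attribute assignment to a NEW name sets a plain attribute, not a column
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df.growth = np.arange(3)
created_by_attr = 'growth' in df.columns
print(created_by_attr)                # False
print(len(caught) > 0)                # True: pandas warned about it

# Bracket notation works in both cases
df['growth'] = np.arange(3)
print('growth' in df.columns)         # True
```

Reading by attribute has a similar trap: `frame2.pop` refers to the `DataFrame.pop` method rather than the `pop` column, so `frame2['pop']` is the safe spelling.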


**Dictionary**

In [2]:
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning'}

In [4]:
type(student)

dict

In [5]:
student['name']

'Peter'

In [6]:
student.get('name')

'Peter'

The `.get` method is often preferable: indexing with `[]` raises a `KeyError` for a missing key, while `.get` returns `None`:

In [10]:
print(student['phone number'])

KeyError: 'phone number'

In [9]:
print(student.get('phone number'))

None


You can also pass a default value to return for a missing key:

In [11]:
print(student.get('phone number', 'not found'))

not found
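`.get` with a default never modifies the dictionary. To read a value and store the default at the same time, `dict.setdefault` (not covered above) does both; a minimal sketch:

```python
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning'}

# .get() only reads: the missing key is NOT added
print(student.get('phone number', 'not found'))   # not found
print('phone number' in student)                  # False

# .setdefault() returns the default AND stores it under the missing key
print(student.setdefault('phone number', 'n/a'))  # n/a
print(student['phone number'])                    # n/a
```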


Add or edit a key-value pair:

In [12]:
student['phone number'] = '111-222'
student['name'] = 'James'
student

{'age': 35,
 'course': 'machine learning',
 'name': 'James',
 'phone number': '111-222'}

Update several key-value pairs at once with the `.update` method:

In [14]:
student.update({'name': 'Peter', 'phone number': '222-333'})
student

{'age': 35,
 'course': 'machine learning',
 'name': 'Peter',
 'phone number': '222-333'}

Delete an element:

In [30]:
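student_copy = student.copy()   # delete from a copy so `student` keeps all four keys for the cells below
del student_copy['name']
print(student_copy)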

{'age': 35, 'course': 'machine learning', 'phone number': '222-333'}
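`dict.pop` is an alternative to `del` that also returns the removed value and accepts a default for missing keys. A sketch on a fresh dictionary:

```python
student = {'name': 'Peter', 'age': 35, 'course': 'machine learning',
           'phone number': '222-333'}

# .pop() deletes the key and returns its value in one step
removed = student.pop('name')
print(removed)    # Peter
print(student)    # {'age': 35, 'course': 'machine learning', 'phone number': '222-333'}

# With a default, .pop() does not raise KeyError for a missing key (del would)
print(student.pop('email', None))   # None
```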


In [15]:
len(student)

4

In [16]:
student.items()

dict_items([('name', 'Peter'), ('age', 35), ('course', 'machine learning'), ('phone number', '222-333')])

In [25]:
student.values()

dict_values(['Peter', 35, 'machine learning', '222-333'])

In [29]:
student_value = student.values()
list(student_value)[0]  # dict views cannot be indexed directly; convert to a list first

'Peter'

Loop through a dictionary:

In [19]:
for key in student:
    print(key)

name
age
course
phone number


In [23]:
for key, value in student.items():
    print('The ' + key + ' is ' + str(value))

The name is Peter
The age is 35
The course is machine learning
The phone number is 222-333


**String**

In [1]:
message = "hello world"
print(message)

hello world


In [3]:
handle_quote = 'Fei\'s dog'
print(handle_quote)

handle_quote2 = "Fei's dog"
print(handle_quote2)

Fei's dog
Fei's dog


In [4]:
m = """i want to see this movie
it is pretty good!"""
print(m)

i want to see this movie
it is pretty good!


In [5]:
len(m)

43

In [6]:
m[0:20]

'i want to see this m'

In [7]:
m[:20]

'i want to see this m'
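Slices also accept negative indices, which count from the end, and an optional step. A few examples on the same `m`:

```python
m = """i want to see this movie
it is pretty good!"""

print(m[-5:])       # last five characters: good!
print(m[:6])        # first six characters: i want
print(m[::-1][:5])  # step -1 reverses the string: !doog
```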

A *method* is a *function* that belongs to an object and is called with dot notation, e.g. `m.upper()`:

In [8]:
print(m.upper())

I WANT TO SEE THIS MOVIE
IT IS PRETTY GOOD!


In [9]:
m.count('is')

2

In [10]:
m.find('is')

16
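`find` returns the index of the first match (here the "is" inside "this") and accepts an optional start position. It returns `-1` when there is no match, whereas `str.index` raises `ValueError`:

```python
m = """i want to see this movie
it is pretty good!"""

print(m.find('is'))       # 16: first match, inside "this"
print(m.find('is', 17))   # 28: search resumes at position 17
print(m.find('bad'))      # -1: no match

try:
    m.index('bad')        # index() raises instead of returning -1
except ValueError as err:
    print('ValueError:', err)   # ValueError: substring not found
```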

In [12]:
greeting = "Hello"
name = "Alex"
greeting + ', ' + name

'Hello, Alex'

Formatted string:

In [14]:
message = '{}, {}. Welcome!'.format(greeting, name)
print(message)

Hello, Alex. Welcome!


*f-strings* are available in Python 3.6 and later; expressions inside the braces are evaluated, including method calls:

In [16]:
message = f'{greeting}, {name.upper()}. Welcome!'
print(message)

Hello, ALEX. Welcome!
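Inside an f-string, a format specification can follow a colon, and `!r` switches to `repr()`. A short sketch; `price` is a hypothetical value:

```python
name = "Alex"
price = 3.14159

two_places = f'{price:.2f}'   # fixed-point with 2 decimals
padded = f'{name:>6}|'        # right-align in a field 6 characters wide
grouped = f'{1234567:,}'      # thousands separator
as_repr = f'{name!r}'         # repr() instead of str(), so quotes are kept

print(two_places)   # 3.14
print(padded)       #   Alex|
print(grouped)      # 1,234,567
print(as_repr)      # 'Alex'
```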


In [17]:
print(dir(name))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


In [18]:
help(name)  # a str argument is treated as a topic name to look up ('Alex'), not as a str object

No Python documentation found for 'Alex'.
Use help() to get the interactive help utility.
Use help(str) for help on the str class.


In [19]:
help(str)  # pass the class itself to get the documentation for strings

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt

**OS module and system functionality**

In [1]:
import os

In [2]:
print(dir(os))

['CLD_CONTINUED', 'CLD_DUMPED', 'CLD_EXITED', 'CLD_TRAPPED', 'DirEntry', 'EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_LOCK', 'F_OK', 'F_TEST', 'F_TLOCK', 'F_ULOCK', 'MutableMapping', 'NGROUPS_MAX', 'O_ACCMODE', 'O_APPEND', 'O_ASYNC', 'O_CLOEXEC', 'O_CREAT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_EXLOCK', 'O_NDELAY', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_RDONLY', 'O_RDWR', 'O_SHLOCK', 'O_SYNC', 'O_TRUNC', 'O_WRONLY', 'PRIO_PGRP', 'PRIO_PROCESS', 'PRIO_USER', 'P_ALL', 'P_NOWAIT', 'P_NOWAITO', 'P_PGID', 'P_PID', 'P_WAIT', 'PathLike', 'RTLD_GLOBAL', 'RTLD_LAZY', 'RTLD_LOCAL', 'RTLD_NODELETE', 'RTLD_NOLOAD', 'RTLD_NOW', 'R_OK', 'SCHED_FIFO', 'SCHED_OTHER', 'SCHED_RR', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'ST_NOSUID', 'ST_RDONLY', 'TMP_MAX', 'WCONTINUED', 'WCOREDUMP', 'WEXITED', 'WEXITSTATUS', 'WIFCONTINUED', 'WIFEX

Get the current working directory:

In [3]:
os.getcwd()

'/Users/firefreezing/DataScience/PythonPgm'

In [4]:
os.listdir()

['.DS_Store',
 '4_sklearn_pipline.ipynb',
 'README.md',
 '2_freq_use_tricks.ipynb',
 '.gitignore',
 '.ipynb_checkpoints',
 '1_daily_Python_notes.ipynb',
 'Ex_Files_Python_Data_Science_EssT',
 'learn_py.py',
 '.git',
 'data',
 '3_tidy_data_demo.ipynb']

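`os.mkdir` creates a single directory and requires its parent to exist already; `os.makedirs` also creates any missing parent folders. Both cells below raised `FileExistsError` because `demo_file/subdir` already existed from an earlier run (note that `In [14]` was executed after `In [7]`):

In [14]: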
os.mkdir('demo_file/subdir')

FileExistsError: [Errno 17] File exists: 'demo_file/subdir'

In [7]:
os.makedirs('demo_file/subdir')  # create a subfolder under a new parent folder

FileExistsError: [Errno 17] File exists: 'demo_file/subdir'
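A sketch of the difference inside a temporary directory (via `tempfile`, so nothing is left on disk), including `exist_ok=True`, which lets `os.makedirs` tolerate a directory that already exists:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'demo_file', 'subdir')

    try:
        os.mkdir(target)          # parent 'demo_file' does not exist yet
        mkdir_error = None
    except FileNotFoundError as err:
        mkdir_error = type(err).__name__
    print('os.mkdir:', mkdir_error)          # os.mkdir: FileNotFoundError

    os.makedirs(target)                      # creates 'demo_file' and 'subdir'
    os.makedirs(target, exist_ok=True)       # second call: no FileExistsError
    contents = os.listdir(os.path.join(base, 'demo_file'))
    print(contents)                          # ['subdir']
```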

In [12]:
print('there is a new parent folder: {}'.format(os.listdir()))
print('\n and also a sub-folder: {}'.format(os.listdir('./demo_file')))

there is a new parent folder: ['.DS_Store', '4_sklearn_pipline.ipynb', 'README.md', '2_freq_use_tricks.ipynb', '.gitignore', '.ipynb_checkpoints', '1_daily_Python_notes.ipynb', 'Ex_Files_Python_Data_Science_EssT', 'learn_py.py', '.git', 'data', '3_tidy_data_demo.ipynb', 'demo_file']

 and also a sub-folder: ['subdir']


In [15]:
os.removedirs('demo_file/subdir')  # removes subdir, then demo_file as well if it is left empty

In [16]:
os.stat('1_daily_Python_notes.ipynb')

os.stat_result(st_mode=33188, st_ino=8601403527, st_dev=16777221, st_nlink=1, st_uid=501, st_gid=20, st_size=60791, st_atime=1546802430, st_mtime=1546802429, st_ctime=1546802429)
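`st_size` is in bytes, and the timestamps (`st_atime`, `st_mtime`, `st_ctime`) are seconds since the Unix epoch. A sketch that converts `st_mtime` to a readable `datetime`, using a throwaway temporary file rather than the notebook itself:

```python
import os
import tempfile
from datetime import datetime

# Throwaway file so the example does not depend on the notes' own files
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fh:
    fh.write('hello')
    path = fh.name

info = os.stat(path)
modified = datetime.fromtimestamp(info.st_mtime)   # local time
print(info.st_size)   # 5 (bytes)
print(modified)       # last-modified time, human-readable

os.remove(path)       # clean up
```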

In [17]:
os.environ.get('HOME')

'/Users/firefreezing'

In [20]:
file_name = os.path.join(os.environ.get('HOME'), 'my_file')
print(file_name)

/Users/firefreezing/my_file


In [21]:
os.path.exists(file_name)

False

In [22]:
print(os.path.basename(file_name))
print(os.path.dirname(file_name))
print(os.path.split(file_name))

my_file
/Users/firefreezing
('/Users/firefreezing', 'my_file')
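`os.path.split` returns the `(dirname, basename)` pair in one call, and `os.path.splitext` separates the file extension. A sketch on a hypothetical relative path:

```python
import os

# Hypothetical path, built with os.path.join so it is valid on any OS
file_name = os.path.join('home', 'user', 'notes.txt')

head, tail = os.path.split(file_name)
print(head == os.path.dirname(file_name))    # True
print(tail == os.path.basename(file_name))   # True

root, ext = os.path.splitext(file_name)
print(ext)   # .txt
```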
