# Data Open Lecture: An Introduction to Python

### Fangda Fan

### Dec 2016

## Contents

- Basic Python Syntax
- NumPy Array for Vectorized Operations
- Pandas DataFrame for Data Analysis
- Statsmodels for Statistical Modeling
- Scikit-learn for Machine Learning
- Review: Data Analysis for Titanic

## Preparation

- Bring your laptop
- Download and Install [Anaconda](https://www.continuum.io/downloads) (Python 3.5 version)
- Open Jupyter Notebook
- Create a new notebook: Upper-right area $\rightarrow$ New $\rightarrow$ Notebooks $\rightarrow$ Python
- Be ready to type in code!

## Our Goal

- Focus on how to *analyze data* with Python
- Not a Python tutorial
- Final Project: Make a good submission to the [Titanic](https://www.kaggle.com/c/titanic) competition on [Kaggle](https://www.kaggle.com/) (a machine learning competition website)

## What is Python?

- A widely used high-level, general-purpose language
- Highly readable
- Abundant Modules
- A [comparison](https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis) between Python and R
- A [poll](http://www.kdnuggets.com/polls/2015/r-vs-python.html): Primary programming language for Analytics, Data Mining, Data Science tasks:
    - R: 51% (2015) vs. 46% (2014)
    - Python: 29% (2015) vs. 23% (2014)
    - Others (Java, Matlab, SAS, etc.): 17% (2015) vs. 23% (2014)
    - None: 2% (2015) vs. 8% (2014)

In [1]:
import pandas as pd
da = pd.DataFrame([[51, 46], [29, 23], [17, 23], [2, 8]], 
                  index = ["R", "Python", "Others", "None"], 
                  columns = ["2015", "2014"])
da

        2015  2014
R         51    46
Python    29    23
Others    17    23
None       2     8


In [2]:
?da.plot

In [2]:
da.mean(axis = 1)

R         48.5
Python    26.0
Others    20.0
None       5.0
dtype: float64
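The `axis = 1` argument above averages across the columns within each row. As a rough illustration of the same arithmetic without pandas, the sketch below uses a hypothetical dict named `poll` that simply copies the numbers from the table:

```python
# Hypothetical plain-Python copy of the poll table: label -> [2015, 2014]
poll = {"R": [51, 46], "Python": [29, 23], "Others": [17, 23], "None": [2, 8]}

# Row-wise mean: average the two yearly values for each label
row_means = {label: sum(values) / len(values) for label, values in poll.items()}
print(row_means["R"], row_means["Python"], row_means["Others"], row_means["None"])
```

With `axis = 0`, pandas would instead average down each column, giving one value per year.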

In [3]:
%matplotlib notebook
da.plot(kind = "bar", rot = 0)

<matplotlib.axes._subplots.AxesSubplot at 0x1153fc9e8>

## How to get help

- From Function Help: ? + function
    - ?pd.DataFrame
    - ?da.plot
- From Books:
    - [Python for Data Analysis](http://www3.canisius.edu/~yany/python/Python4DataAnalysis.pdf), Wes McKinney
    - [Learning Python](http://stock.ethop.org/pdf/python/Learning%20Python,%205th%20Edition.pdf), Mark Lutz
- From Online Course: https://www.datacamp.com/
- From Google: Your Question (what you want to do) + Python
    - read SAS file Python
    - string operation Python
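
The `?` prefix works only in IPython and Jupyter. In a plain Python session the same documentation can be reached with built-ins; a minimal sketch:

```python
# Every function carries its documentation in the __doc__ attribute
doc_text = str.split.__doc__
print(doc_text)

# help(str.split) shows the same text in an interactive session;
# dir() lists the attributes and methods available on an object
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])
```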

# Basic Python Syntax

## Contents

1. Number
2. String
3. List
    - For-loop
4. Dictionary
5. Tuple
6. Boolean
    - If-else condition
7. Function
8. Other Special Commands

## Basic Python Syntax (1): Number

In [3]:
1 + 2 + 3

6

In [4]:
# compute 4 * 0.5 and save value as variable 'a'
a = 4 * 0.5
a

2.0

In [5]:
# variable 'a' to the power 10
a ** 10

1024.0

In [6]:
(a - 6) / 3

-1.3333333333333333

## Basic Python Syntax (2.1): String

- A string can be created by either ' or "

In [7]:
b = "red apple"
# Select the character at index 0 (the first character)
b[0]

'r'

In [8]:
# Slice from index 4 to 7 (right-side not included)
b[4:7]

'app'

In [9]:
# Slice from index 4
b[4:]

'apple'
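
Slices also accept negative indices and an optional step, following the same "right side not included" rule. A short sketch, using a new variable `s` (an illustrative name) that holds the same text:

```python
s = "red apple"      # same text as the example above, under a new name

print(s[-5:])        # last five characters: 'apple'
print(s[:3])         # from the start up to (not including) index 3: 'red'
print(s[::2])        # every second character: 'rdape'
print(s[::-1])       # a step of -1 reverses the string: 'elppa der'
```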

In [10]:
# A multi-line string can be created with triple quotes ''' or """
'''
For the earth shall be full of the knowledge of the LORD
    as the waters cover the sea.
(Isaiah 11:9 ESV)
'''

'\nFor the earth shall be full of the knowledge of the LORD\n    as the waters cover the sea.\n(Isaiah 11:9 ESV)\n'

## Basic Python Syntax (2.2): String Function

In [11]:
# Concatenate string b with another string "green apple"
b + 'green apple'

'red applegreen apple'

In [12]:
# Repeat the string 3 times
b * 3

'red applered applered apple'

In [13]:
# The string method format() fills each pair of braces '{}' with a value, in order
"I have {} apples, one is a {}, the other one is also a {}".format(1+1, b, b)

'I have 2 apples, one is a red apple, the other one is also a red apple'

In [14]:
# Split by character
b.split(" ")

['red', 'apple']

In [15]:
# Join by character
"-".join(['red', 'apple'])

'red-apple'
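
`format()` can also fill braces by position number or by name, which helps when one value is reused. A minimal sketch (the variable names are illustrative only):

```python
fruit = "red apple"

# Numbered braces reuse the same argument several times
by_position = "one is a {0}, the other one is also a {0}".format(fruit)

# Named braces make long templates easier to read
by_name = "I have {count} apples".format(count=1 + 1)

# A format specification goes after a colon, e.g. two decimal places
rounded = "{:.2f}".format(4 / 3)

print(by_position)
print(by_name)
print(rounded)
```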

## Basic Python Syntax (3.1): List

- Stores more than one value: [value0, value1, value2, ...]
- Values are accessed by position number
- Can contain mixed types of values

In [16]:
# Create a list with a square bracket
c = [1, 3, 5, 7]
c[1]

3

In [17]:
# Slice from index 1 to index 3 (right-side not included)
c[1:3]

[3, 5]

In [18]:
# Change the last value to 9 (-N denotes the Nth index counted from the tail)
c[-1] = 9
c

[1, 3, 5, 9]

In [19]:
# Replace the slice 1:3 (index 1 and 2, right-side not included) with a new list [2, 4, 6, 8]
c[1:3] = [2, 4, 6, 8]
c

[1, 2, 4, 6, 8, 9]
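
Slice assignment replaces the selected span with the new list, so the length of the list can change. A sketch of the step above with the lengths printed along the way:

```python
c = [1, 3, 5, 9]
print(len(c))            # 4

# c[1:3] selects the elements at index 1 and 2 (the values 3 and 5)
c[1:3] = [2, 4, 6, 8]    # two elements out, four elements in
print(c, len(c))         # [1, 2, 4, 6, 8, 9] 6

# Assigning an empty list to a slice deletes that span
c[1:3] = []
print(c)                 # [1, 6, 8, 9]
```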

## Basic Python Syntax (3.2): List Function

In [20]:
# Length of list c
len(c)

6

In [21]:
# Concatenate list c with another list (not elementwise plus!)
c + [1, 2, 3, 4, 5, 6]

[1, 2, 4, 6, 8, 9, 1, 2, 3, 4, 5, 6]

In [22]:
# Repeat the list 3 times (not elementwise multiply!)
c * 3

[1, 2, 4, 6, 8, 9, 1, 2, 4, 6, 8, 9, 1, 2, 4, 6, 8, 9]

In [23]:
# Append a new element "apple" to the end of list c
c.append("apple")
c

[1, 2, 4, 6, 8, 9, 'apple']

In [24]:
# Insert "red" at position 3 of list c
c.insert(3, "red")
c

[1, 2, 4, 'red', 6, 8, 9, 'apple']
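
Because `+` and `*` concatenate and repeat lists, element-wise arithmetic on plain lists needs an explicit loop (NumPy arrays, listed in the lecture contents, handle this directly). A plain-Python sketch using `zip()`, with illustrative lists `x` and `y`:

```python
x = [1, 2, 4]
y = [10, 20, 30]

# Element-wise sum: pair up the values with zip(), then add each pair
elementwise_sum = [xi + yi for xi, yi in zip(x, y)]

# Element-wise multiplication by a constant
tripled = [xi * 3 for xi in x]

print(elementwise_sum)   # [11, 22, 34]
print(tripled)           # [3, 6, 12]
```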

## Basic Python Syntax (3.3): For-loop

In [25]:
# A common for-loop: square of 0 to 9
output = []
for i in range(10):
    value = i ** 2
    output.append(value)
output

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [26]:
# A list comprehension (a for-loop written inside a list): square of 0 to 9
[i ** 2 for i in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [27]:
# A list comprehension with an if-condition
[i ** 2 for i in range(10) if i > 4]

[25, 36, 49, 64, 81]

In [28]:
# A list comprehension with two nested for-loops, producing a list of lists
[[i, j] for i in range(4) for j in range(4) if i > j]

[[1, 0], [2, 0], [2, 1], [3, 0], [3, 1], [3, 2]]

In [29]:
# A list can be used directly as the iterable of a for-loop
[i * 3 for i in ["a", "ab", 3, 4, [5, 6]]]

['aaa', 'ababab', 9, 12, [5, 6, 5, 6, 5, 6]]
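
A comprehension with several `for` clauses reads left to right like nested for-loops. The sketch below spells out the nested example above both ways and checks that the results match:

```python
# Comprehension form: the first "for" is the outer loop
pairs_comprehension = [[i, j] for i in range(4) for j in range(4) if i > j]

# Equivalent explicit loops
pairs_loop = []
for i in range(4):
    for j in range(4):
        if i > j:
            pairs_loop.append([i, j])

print(pairs_loop == pairs_comprehension)   # True
```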

## Basic Python Syntax (4.1): Dictionary

- To store values with keys: {key0: value0, key1: value1, key2: value2, ...}
- Values are accessed by key, not position number

In [30]:
d = {"name": "gold", "price": 4000}
d

{'name': 'gold', 'price': 4000}

In [31]:
# Select the value with key "name" 
d["name"]

'gold'

In [32]:
# Increase the value of price by 500
d["price"] += 500
d["price"]

4500

In [33]:
# Another way to create a dict: function dict()
dict(R = 51, Python = 29, others = 17, none = 2)

{'Python': 29, 'R': 51, 'none': 2, 'others': 17}

In [34]:
# Create a dict from two lists: function zip()
dict(zip(["R", "Python", "others", "none"], [51, 29, 17, 2]))

{'Python': 29, 'R': 51, 'none': 2, 'others': 17}
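
`zip()` pairs items by position, and those pairs can also feed a dict comprehension when the values need transforming on the way in. A small sketch with illustrative variable names:

```python
languages = ["R", "Python", "others", "none"]
percent_2015 = [51, 29, 17, 2]

# zip() yields (key, value) pairs by position
pairs = list(zip(languages, percent_2015))
print(pairs[0])          # ('R', 51)

# A dict comprehension can transform values while building the dict
share = {name: value / 100 for name, value in zip(languages, percent_2015)}
print(share["Python"])   # 0.29
```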

## Basic Python Syntax (4.2): Dictionary Functions

In [35]:
# Check all keys of dict d
d.keys()

dict_keys(['price', 'name'])

In [36]:
# Check all values of dict d
d.values()

dict_values([4500, 'gold'])

In [37]:
# Update dict d with another dict
d.update({"price": 6000, "status": "sold out"})
d

{'name': 'gold', 'price': 6000, 'status': 'sold out'}

In [38]:
# List keys of dict d
list(d.keys())

['status', 'price', 'name']

In [39]:
# Remove an element from dict d
d.pop("price")
d

{'name': 'gold', 'status': 'sold out'}
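
Selecting a missing key with `d[key]` raises a `KeyError`. A minimal sketch of safer alternatives, starting from the state of `d` shown above:

```python
d = {"name": "gold", "status": "sold out"}

# "in" tests for a key, not a value
print("price" in d)          # False

# get() returns a default instead of raising KeyError for a missing key
print(d.get("price", 0))     # 0

# pop() also accepts a default, so removing a missing key is safe
removed = d.pop("price", None)
print(removed)               # None

# items() yields (key, value) pairs for looping
for key, value in d.items():
    print(key, "->", value)
```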

## Basic Python Syntax (5): Tuple
- Stores more than one fixed value: (value0, value1, value2, ...)
- Like a list, but cannot be modified after creation (immutable)

In [40]:
# Create a tuple from variables a and b (the parentheses are optional)
e = a, b
e

(2.0, 'red apple')

In [41]:
e[0]

2.0

In [42]:
# A tuple cannot be modified after it is created
e[0] = 0

TypeError: 'tuple' object does not support item assignment

In [43]:
# Swap the values of a and b directly
b, a = a, b
a, b

('red apple', 2.0)
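
The swap above is a case of tuple unpacking, which also works for functions that return several values at once. A short sketch using the built-in `divmod()`; the `grid` dict is a made-up example:

```python
# divmod() is a built-in that returns a tuple: (quotient, remainder)
result = divmod(17, 5)
print(result)                # (3, 2)

# Unpack the tuple into two variables in one statement
quotient, remainder = result
print(quotient, remainder)   # 3 2

# Tuples can be dict keys because they cannot be modified; lists cannot
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])          # point
```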

## Basic Python Syntax (6.1): Boolean

In [44]:
# Double equal sign "==" means whether two values are equal in value
1 + 1 == 2

True

In [45]:
# "!=" means not equal
1 + 1 != 2

False

In [46]:
4 < 3

False

In [47]:
4 >= 8/2

True

In [48]:
# Strings are compared character by character, in alphabetical (lexicographic) order
"z" > "ab"

True

In [49]:
# "in" checks whether a value is in a list
"banana" in ["apple", "banana", 3, 4, 5]

True
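
String comparison uses each character's code point, so uppercase letters sort before lowercase ones and numeric strings do not compare like numbers. A brief sketch of these cases:

```python
# The first differing character decides: "z" comes after "a"
print("z" > "ab")            # True

# ord() shows the code point used for the comparison
print(ord("Z"), ord("a"))    # 90 97
print("Zebra" < "apple")     # True: uppercase sorts before lowercase

print("10" < "9")            # True: the character "1" sorts before "9"
print(10 < 9)                # False: numbers compare by value
```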

## Basic Python Syntax (6.2): If-else

- Values treated as False in a condition: False, None, 0, "", [], {}

In [50]:
a = 0
if(a):
    print("Hello!")

In [51]:
a = []
if(a):
    print("Hello!")
else:
    print("Goodbye!")

Goodbye!


In [52]:
a = [0]
if(a):
    print("Hello!")
else:
    print("Goodbye!")

Hello!
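
The built-in `bool()` shows how any value is treated by an if-statement. The sketch below applies it to the values listed at the top of this section plus a few non-empty counterparts (the list `candidates` is illustrative):

```python
# Falsy values from the list above, followed by some truthy look-alikes
candidates = [False, None, 0, "", [], {}, [0], "0", -1]

# Keep only the values that an if-statement would treat as True
truthy = [value for value in candidates if bool(value)]
print(truthy)    # [[0], '0', -1]
```

A list containing a zero and a string containing "0" are both non-empty, so they count as True.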


In [53]:
# isinstance() checks the type of a value; type(a) is never equal to the string "int"
if isinstance(a, (int, float)):
    print("a is a number")
elif isinstance(a, str):
    print("a is a string")
else:
    print("a is neither a number nor a string")

a is neither a number nor a string


## Basic Python Syntax (6.3): Boolean Operations

- Logical operators (not, and, or): work on Python built-in types
- Bitwise operators (~, &, |): work elementwise on NumPy boolean arrays

In [54]:
# "not": return True if the condition is False
not (1 + 1 == 2)

False

In [55]:
# "or", "|": return True if at least 1 condition is True
(2 > 3) or (2 <= 3)

True

In [56]:
(2 > 3) | (2 <= 3)

True

In [57]:
# "and", "&": return True if both conditions are True
(2 > 3) and (2 <= 3)

False

In [58]:
(2 > 3) & (2 <= 3)

False
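
The two operator families give the same answers above, but they differ in evaluation order and precedence. A minimal sketch of the differences on plain Python values:

```python
# "and"/"or" short-circuit: the right side is skipped when it is not needed
print(False and (1 / 0 > 0))   # False, and 1 / 0 is never evaluated

# "or" returns the first truthy operand, which is handy for defaults
name = "" or "anonymous"
print(name)                    # anonymous

# "|" binds tighter than comparisons, so the parentheses matter
print((2 > 3) | (2 <= 3))      # True
print(2 > 3 | 2 <= 3)          # parsed as 2 > (3 | 2) <= 3, which is False
```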

## Basic Python Syntax (7): Python Function

- lambda: needs no name and no return statement; usually for short functions
- def: needs a name and usually a return statement; for longer functions and reuse

In [59]:
(lambda x: x + 1)(3)

4

In [60]:
def minus(x, y):
    output = "{} - {} = {}".format(x, y, x-y)
    return output
minus(6, 2), minus(x = 5, y = 2), minus(y = 2, x = 3)

('6 - 2 = 4', '5 - 2 = 3', '3 - 2 = 1')

In [61]:
# Set default values for the function arguments
def minus(x = 10, y = 3):
    output = "{} - {} = {}".format(x, y, x-y)
    return output
minus(), minus(y = 6), minus(x = 4)

('10 - 3 = 7', '10 - 6 = 4', '4 - 3 = 1')

In [62]:
# Use * to expand list, and ** to expand dict
parlist = [8, 1]
pardict = {"x": 14, "y": 9}
minus(*parlist), minus(**pardict)

('8 - 1 = 7', '14 - 9 = 5')
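
A common place for a short lambda is the `key` argument of `sorted()`; a named `def` function does the same job when it is reused. A sketch with a made-up list `fruits`:

```python
fruits = ["banana", "fig", "apple"]

# lambda as a short, unnamed key function: sort by word length
by_length = sorted(fruits, key=lambda word: len(word))
print(by_length)         # ['fig', 'apple', 'banana']

# The same idea with def: a named function that can be reused
def last_letter(word):
    return word[-1]

by_last_letter = sorted(fruits, key=last_letter)
print(by_last_letter)    # ['banana', 'apple', 'fig']
```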

## Basic Python Syntax (8): Other Special Commands

- _: returns the last output
- %: runs an IPython magic command (available in Jupyter Notebook)
- !: runs a system (shell) command

In [63]:
_

('8 - 1 = 7', '14 - 9 = 5')

In [64]:
# %timeit: check running time
%timeit [i**100 for i in range(1000)]

1000 loops, best of 3: 978 µs per loop
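
`%timeit` exists only in IPython and Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement; a rough sketch (the choice of 100 repetitions is arbitrary):

```python
import timeit

# Time the same list comprehension with the standard-library timeit module;
# number=100 runs the statement 100 times and returns the total in seconds
total_seconds = timeit.timeit("[i**100 for i in range(1000)]", number=100)

# Convert the total to microseconds per run
print("{:.1f} microseconds per loop".format(total_seconds / 100 * 1e6))
```

The reported time depends on the machine, so it will not match the figure shown above.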


In [65]:
# %cd: change working directory 
%cd ~/Stat/test/

/Users/fangda/Stat/test


In [66]:
!ls

Extracted_IOP_previous_ALL_n47.dose
Extracted_IOP_previous_ALL_n47.info
Glaucoma_Genes_2016.xlsx
IOP Genes.xlsx
Mexican_stats.xlsx
POAGSNP_da_summary.csv
PreviousCctSnps_n53_UPDATED.dose
PreviousCctSnps_n53_UPDATED.info
Previous_POAGSNPs_n51.dose
Previous_POAGSNPs_n51.info
Previous_VCDR_n25.dose
Previous_VCDR_n25.info
__pycache__
da_sub_snps.log
da_sub_snps.nosex
da_sub_snps.raw
dr_X_summary.xlsx
dr_cm.xlsx
dr_da_summary.xlsx
dr_score_dbrGao.png
dr_score_drExtreme.png
dr_score_drGao.png
dr_w_lgr2_cv_drExtreme.xlsx
dr_w_lgr2_cv_drGao.xlsx
dr_w_lgr_cv_drExtreme.xlsx
dr_w_lgr_cv_drGao.xlsx
dr_w_lr2_cv_dbrGao.xlsx
dr_w_lr_cv_dbrGao.xlsx
dr_w_xgb_cv_dbrGao.xlsx
dr_w_xgb_cv_drExtreme.xlsx
dr_w_xgb_cv_drGao.xlsx
exclude_glaucoma.csv
exclude_glaucoma.xlsx
glaucoma.Rmd
glaucoma_10_11.docx
glaucoma_10_11_full_model_AUC_10_11.png
glaucoma_10_11_full_model_ROC_10_11.png
glaucoma_10_11_reduced_model_AUC.png
glaucoma_10_11_reduced_model_ROC.png
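
The `%cd` and `!ls` commands depend on the notebook and the operating system. A portable sketch of the same steps with the standard-library `os` module, run against whatever the current directory happens to be:

```python
import os

# Counterpart of %cd: os.getcwd() reports the working directory,
# and os.chdir(path) would change it
current_dir = os.getcwd()
print(current_dir)

# Counterpart of !ls: list the entries of the current directory, sorted
entries = sorted(os.listdir("."))
print(entries[:5])   # show at most the first five names
```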


In [None]:
'''
The Value of Wisdom 


    For the LORD gives wisdom;

        from his mouth come knowledge and understanding;

    he stores up sound wisdom for the upright;

        he is a shield to those who walk in integrity,

    guarding the paths of justice

        and watching over the way of his saints.



(Proverbs 2:6-8 ESV)
'''