# 1. Environment

## 1.1 Installing Python

### 1.1.1 Windows

* Go to the [Python site](https://www.python.org/download) and install the Windows Installer.
* Python 2 vs. 3: these notes use Python 2.7.
* Install location: the installer puts Python under the drive root (C:/); you may change this path.

### 1.1.2 Linux/Mac

* Python is installed by default.

## 1.2 Installing pip

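* Working with Python requires libraries.
* On Windows, open a command window and go to the `Scripts` directory under the install path to check whether easy_install and pip are installed. If they are missing, they are easy to install.
* Download setuptools and install it as follows: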

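```
set PYTHON_HOME=C:\Python2.7
set PATH=%PATH%;%PYTHON_HOME%;%PYTHON_HOME%\Scripts
REM Download and unpack setuptools, then run this from the directory that contains setup.py
python setup.py install
REM setup.py installs easy_install into %PYTHON_HOME%\Scripts; use it to install pip
%PYTHON_HOME%\Scripts\easy_install.exe pip
REM Check that pip is now on the PATH
pip --version
```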

* Once pip is installed, libraries can be installed easily as follows:

```
$ pip search statsmodels   # PyPI has since disabled the search API behind this command, so it now fails
$ pip install statsmodels
```

* Example: downloading the patsy source directly from its git repository and installing it.

```
~/Code/git/else$ git clone https://github.com/pydata/patsy.git
Cloning into 'patsy'...
remote: Counting objects: 2028, done.
remote: Total 2028 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (2028/2028), 1.55 MiB | 757.00 KiB/s, done.
Resolving deltas: 100% (1306/1306), done.
Checking connectivity... done.
~/Code/git/else$ cd patsy/
~/Code/git/else/patsy$ ls
doc          MANIFEST.in  README.rst             setup.cfg  TODO   tox.ini
LICENSE.txt  patsy        release-checklist.txt  setup.py   tools
$ sudo python setup.py install
$ pip-2.7 freeze | grep patsy
```

## 1.3 Python scientific libraries

* numpy
    * Installing on Windows (building from source) requires a C compiler.
    * The build looks for Visual Studio 2008 by default; if it is not available, install the MinGW compiler.
    * When using MinGW, edit Python27/Lib/distutils/distutils.cfg:

```
    [build]
    compiler = mingw32
    [build_ext]
    compiler = mingw32
```

* matplotlib
* statsmodels
* patsy: mind the spelling!
* seaborn
* scikit-learn
* gensim
* pymc
    * Linux and Mac (once gfortran is installed): either of the following works
        * `pip install pymc==2.3`
        * `pip install git+https://github.com/pymc-devs/pymc`
    * Installing gfortran on Mac
        * download the installer and drag it to Applications
        * `which gfortran-mp-4.8`
        * `sudo ln -s gfortran-mp-4.8 gfortran`
    * Error `ld: symbol(s) not found for architecture x86_64`
        * use gfortran (not g95)
        * `port install g95` provides a Fortran compiler (pymc requires one), since gfortran is not on MacPorts
        * add `-stdlib=libstdc++ -lstdc++`. There are two implementations of the standard C++ library available on OS X: libstdc++ and libc++. On 10.8 and earlier libstdc++ is chosen by default; on 10.9 libc++ is chosen by default.
    * freetype error
        * `sudo apt-get install libfreetype6-dev libxft-dev`
* sympy: note that sympy can convert expressions to LaTeX (`sympy.latex`)
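The sympy-to-LaTeX note can be illustrated with a short sketch. `sympy.latex` returns the LaTeX source of an expression as a string, which you can paste between `$...$` in a notebook Markdown cell. The sketch assumes sympy is installed.

```python
import sympy as sym

x = sym.Symbol("x")

# sympy.latex turns an expression into a LaTeX string that can be pasted
# between $...$ in a notebook Markdown cell
square_tex = sym.latex(x**2)
root_tex = sym.latex(sym.sqrt(x))
print(square_tex)  # x^{2}
print(root_tex)    # \sqrt{x}
```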

## 1.4 site-packages

To find the directories where third-party packages are installed:

```python
import site
site.getsitepackages()
```

```
['/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python',
 '/Library/Python/2.7/site-packages']
```
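The standard library also reports the per-user package directory and the full import search path, in addition to the global directories above. Below is a minimal sketch that uses the standard library only. Note that `site.getsitepackages` may be missing inside some older virtualenv environments.

```python
import site
import sys

# Global site-packages directories (the same call as above)
global_dirs = site.getsitepackages()

# Per-user site-packages directory, where `pip install --user` puts packages
user_dir = site.getusersitepackages()

# sys.path is the full, ordered list of directories searched by `import`
search_path = list(sys.path)

print(global_dirs)
print(user_dir)
for p in search_path:
    print(p)
```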

# 2. intro

## 2.1 notations (Population, sample)

* The mean, size, standard deviation, and correlation coefficient are denoted as follows.

* population
    * mean $\mu$
    * size $N$
    * standard deviation $\sigma$
    * correlation coefficient $\rho$

* sample
    * sample mean $\bar{x}$
    * number of observations $n$
    * number of samples (each having $n$ elements) $K$
    * standard deviation $s$
    * correlation coefficient $r$

* LaTeX math can be written in IPython notebook Markdown cells, e.g.
    * $\vec{a}$
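The following minimal NumPy sketch connects the population and sample symbols to code. The data values are made up for illustration. The key difference is that $\sigma$ divides by $N$, whereas $s$ divides by $n-1$; NumPy selects between the two with the `ddof` argument.

```python
import numpy as np

# Hypothetical data, treated first as a whole population and then as a sample
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
other = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0])

N = data.size            # population size N
mu = data.mean()         # population mean mu
sigma = data.std()       # population sd sigma: divides by N (ddof=0, NumPy's default)

n = data.size            # number of observations n
x_bar = data.mean()      # sample mean x-bar (same formula as mu)
s = data.std(ddof=1)     # sample sd s: divides by n - 1

r = np.corrcoef(data, other)[0, 1]   # sample correlation coefficient r

print(mu, sigma, s, r)
```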

## 2.2 pylab

* pylab can be thought of as sharing the matplotlib.pyplot namespace.
* pylab also lets you call numpy functions (sin, cos, ...) without a namespace prefix.
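A quick way to see what pylab bundles is to compare its functions with the originals. The sketch below assumes a recent matplotlib and uses the non-interactive Agg backend, so it runs without a display. Note that matplotlib's own documentation now discourages pylab in favor of explicit `np.` / `plt.` imports.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import matplotlib.pyplot as plt
import pylab

# pylab re-exports pyplot's plotting functions ...
plot_is_pyplot = pylab.plot is plt.plot
# ... and numpy's functions, so sin, cos, ... need no prefix
sin_is_numpy = pylab.sin is np.sin
# It also re-exports np.sum, so `from pylab import *` shadows Python's built-in sum
sum_is_numpy = pylab.sum is np.sum

print(plot_is_pyplot, sin_is_numpy, sum_is_numpy)
```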

## 2.3 statsmodels provides R-style modeling (y ~ ax..)

* import statsmodels.formula.api as sm (not statsmodels.api)
* statsmodels.api uses capitalized names (e.g. OLS)
* statsmodels.formula.api works with DataFrames and uses lowercase names (e.g. ols)
* '~'  Separates the left-hand side from the right-hand side.
* '+'  Combines terms on either side (set union).
* '-'  Removes terms on the right from the set of terms on the left (set difference).
* '*'  a*b is shorthand for the expansion a + b + a:b.
* '/'  a/b is shorthand for the expansion a + a:b.
* ':'  Computes the interaction between terms on the left and right.
* '**' Takes a set of terms on the left and an integer n on the right and computes the `*` of that set of terms with itself n times, i.e. all interactions up to order n.
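statsmodels has used patsy (installed alongside it) to parse these formulas. You can therefore inspect the operators directly with `patsy.dmatrix`, which builds the design matrix for a right-hand side. The sketch below uses hypothetical numeric columns `a`, `b`, and `c` and prints the column names each formula expands into. Note that `- 1` drops the intercept.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical numeric columns; dmatrix accepts a dict-like of arrays
data = {
    "a": np.array([1.0, 2.0, 3.0, 4.0]),
    "b": np.array([0.5, 1.5, 2.5, 3.5]),
    "c": np.array([2.0, 1.0, 0.0, 1.0]),
}

formulas = ["a + b", "a * b", "a / b", "a * b - a", "(a + b + c) ** 2", "a + b - 1"]

# Map each right-hand side to the design-matrix columns it expands into
columns = {}
for formula in formulas:
    design = dmatrix(formula, data)
    columns[formula] = design.design_info.column_names
    print("%-18s -> %s" % (formula, columns[formula]))
```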

```python
# On Mac, loading scipy.stats or statsmodels.formula may raise an error
import numpy as np
import matplotlib.pyplot as plt

from pylab import *

import pandas as pd

import scipy
import scipy.stats as stats

import statsmodels
import statsmodels.formula.api as sm

import sklearn as sk
import sympy as sym
```

## 2.4 learning Sympy

* Symbols must be defined before they can be used in an expression.
* lambdify: conveniently converts a SymPy expression into a lambda function.

```python
import numpy as np
import sympy as sym
from sympy.abc import w,x,y,z

f=sym.lambdify(x,x*2)
print f(2)

row=sym.lambdify((x,y),sym.Matrix((x,x+y)).T,modules='sympy')
print row(1,2)

# Attention: there are naming differences between numpy and sympy.
# If you simply pass the numpy module, e.g. sympy.atan will not be translated to numpy.arctan:
f=sym.lambdify((x,y),sym.tan(x*y),np)
# Use the modified module instead by passing the string "numpy":
f=sym.lambdify((x,y),sym.tan(x*y),"numpy")
print f.func_name
```

```
4
Matrix([[1, 3]])
<lambda>
```
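The two bullets above can be combined in one sketch. It declares symbols with `sym.symbols`, evaluates the expression exactly with `subs`, and then uses `lambdify` with `"numpy"` to get a function that works element-wise on arrays. The expression `x**2 + 3*y` is only an example.

```python
import numpy as np
import sympy as sym

# Symbols must be declared before they can appear in an expression
x, y = sym.symbols("x y")
expr = x**2 + 3*y

# Exact symbolic evaluation by substitution
exact = expr.subs({x: 2, y: 3})   # 13

# With "numpy", the generated function evaluates element-wise on arrays
f = sym.lambdify((x, y), expr, "numpy")
values = f(np.array([0.0, 1.0, 2.0]), np.array([1.0, 1.0, 1.0]))

print(exact)
print(values)
```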


## 2.5 Python

Let's try expressing mathematical ideas as programs.

### 2.5.1 Output and help

```python
# comments
print "Hello World!"
```

```
Hello World!
```


### 2.5.2 variables

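* A variable is a name given to a reserved storage location in memory so that it can be used.
* Because a variable occupies memory, the space it takes differs depending on whether it holds an integer, a string, or a floating-point number, and on its size.
* When do we use variables? To reuse values.
* For efficient use of memory, reserve only as much as needed.
* In compiled languages, a variable declared for numbers cannot hold a string, but in a scripting language this is not a problem.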

```python
a=3
b=2*a
print type(b)
print b
print a*b
# b held 6, but now we assign a string to it; still no error
b="hello"
print type(b)
# operator overloading: + concatenates strings, * repeats them
print b+b
print 2*b
```

```
<type 'int'>
6
18
<type 'str'>
hellohello
hellohello
```
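The sketch below illustrates the points above. The same name can be rebound to objects of different types, and `sys.getsizeof` shows that objects of different types and sizes take different amounts of memory. The exact byte counts depend on the Python version and platform, so only their relative size is meaningful here.

```python
import sys

a = 3
b = 2 * a
first_type = type(b)    # int
b = "hello"             # rebinding the same name to a str is allowed
second_type = type(b)   # str

# Memory taken by each object; exact numbers vary by Python version and platform
size_int = sys.getsizeof(6)
size_short = sys.getsizeof("hello")
size_long = sys.getsizeof("hello" * 100)

print(first_type)
print(second_type)
print(size_int)
print(size_short)
print(size_long)
```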


### 2.5.3 Data types

* boolean type
* float
* containers

```python
# boolean
print 3>4
test=(3>4)
type(test)

# float
print 7*3.

# power
print 2**10
# integer division vs. float division (Python 2)
print 3/2
print 3/2.


# containers
l=[1,2,3,4,5]
print type(l)
print l[2]
print l[-1]
# slicing: l[start:stop:step], stop is excluded
print  l[2:4]
print l[3:]
print l[:3]
print l[::2]

# lists are mutable
l[0]=28
print l

# a list may hold elements of different types
l[0]='hello'
print l
l.append(10)
print l
# pop() removes and returns the last element
print l.pop()

print l
l=l[2:5]
print l

# Dictionary (hash table)
room={'jsl':405, 'lecture':415}
print room
print room.values()
print room.keys()
print room['jsl']
```

```
False
21.0
1024
1
1.5
<type 'list'>
3
5
[3, 4]
[4, 5]
[1, 2, 3]
[1, 3, 5]
[28, 2, 3, 4, 5]
['hello', 2, 3, 4, 5]
['hello', 2, 3, 4, 5, 10]
10
['hello', 2, 3, 4, 5]
[3, 4, 5]
{'jsl': 405, 'lecture': 415}
[405, 415]
['jsl', 'lecture']
405
```
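The `3/2` giving `1` above is Python 2 behavior: `/` between two ints floors. Python 3 made `/` true division, and Python 2.7 can opt in with `from __future__ import division`. Here is a minimal sketch of spellings that behave the same under both versions:

```python
# Spellings that give the same result in Python 2 and Python 3
q = 3 // 2          # floor division: 1
neg = -3 // 2       # floors toward negative infinity: -2
t = 3 / 2.0         # a float operand forces true division: 1.5
rem = 7 % 3         # remainder: 1
qr = divmod(7, 3)   # quotient and remainder together: (2, 1)

print(q)
print(neg)
print(t)
print(rem)
print(qr)
```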


### 2.5.4 control flow

```python
if 2**2 == 4:
    print "yes"

a=3
if a==1:
    print 'a=1'
elif a==2:
    print 'a=2'
else:
    print 'a is neither 1 nor 2'

for i in range(5):
    print i
for word in ('cool','powerful','readable'):
    print('Python is %s' % word)
words=('cool','powerful','readable')
for index, item in enumerate(words):
    print index, item

# iterate over key/value pairs (iteritems is Python 2 only)
d={'a':1,'b':1.2,'c':2}
for key,val in d.iteritems():
    print('key:%s value:%s' % (key,val))
```

```
yes
a is neither 1 nor 2
0
1
2
3
4
Python is cool
Python is powerful
Python is readable
0 cool
1 powerful
2 readable
key:a value:1
key:c value:2
key:b value:1.2
```
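In the output above the keys come out as a, c, b. In Python 2 a dict's iteration order is arbitrary, while Python 3.7+ dicts preserve insertion order. When the order matters, sort the keys. The minimal sketch below uses `items()`, which works in both versions.

```python
d = {'a': 1, 'b': 1.2, 'c': 2}

# Sorting the keys makes the loop order reproducible
lines = []
for key in sorted(d):
    lines.append('key:%s value:%s' % (key, d[key]))
    print(lines[-1])

# items() works in Python 2 and 3; iteritems() exists only in Python 2
pairs = sorted(d.items())
```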


### 2.5.5 function definition

```python
def test():
    print 'this is a test function'
test()

# returning a value
def getCircleArea(radius):
    return 3.14*radius*radius

print getCircleArea(1.5)

area=getCircleArea(1.5)
print area

# optional parameter with a default value
def double_it(x=2):
    return x*2
print double_it()
print double_it(2)
print double_it(3)
```

```
this is a test function
7.065
7.065
4
4
6
```
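Here is a small variation on the functions above, as a sketch. `math.pi` replaces the 3.14 approximation, docstrings document each function, and a keyword argument overrides an optional parameter. The names `get_circle_area` and `scale` are illustrative and not from the original.

```python
import math

def get_circle_area(radius):
    """Return the area of a circle, using math.pi instead of 3.14."""
    return math.pi * radius ** 2

def scale(x, factor=2):
    """Multiply x by factor; factor is optional and defaults to 2."""
    return x * factor

area = get_circle_area(1.5)    # about 7.0686 (3.14 gave 7.065)
doubled = scale(3)             # uses the default factor: 6
tripled = scale(3, factor=3)   # keyword argument overrides it: 9

print(area)
print(doubled)
print(tripled)
```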


### 2.5.6 scripts

* save a Python program (.py) and run from the OS prompt ($ python test.py)

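```python
import os
print os.getcwd()

# create the file first in a shell: echo 'jsl' > hello.txt
# note: this absolute path exists only on the author's machine, hence the IOError below
f=open('/Users/media/Code/git/sd/hello.txt','r')
s=f.read()
print s
f.close()
```

```
/home/jsl/Code/git/bb/jsl/algo/src/pystat

IOError: [Errno 2] No such file or directory: '/Users/media/Code/git/sd/hello.txt'
```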
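The IOError above comes from the hard-coded absolute path, which exists only on the machine where the file was created. A more portable sketch builds the path from the current directory and writes the file itself, standing in for `echo 'jsl' > hello.txt`. It uses `with` so the file is closed automatically.

```python
import os

# Build the path from the current working directory instead of hard-coding
# an absolute path that only exists on one machine
path = os.path.join(os.getcwd(), 'hello.txt')

# Write the file from Python (stands in for: echo 'jsl' > hello.txt)
with open(path, 'w') as f:
    f.write('jsl\n')

# The with-statement closes the file automatically, even if an error occurs
with open(path, 'r') as f:
    s = f.read()
print(s)

os.remove(path)  # remove the example file again
```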

## 2.6 numpy intro

```python
import numpy as np
a=np.array([0,1,2])
print a

a=np.arange(10)
print a
b=np.arange(1.,9.,2)
print b
c=np.linspace(0,1,6)
print c
a=np.ones((3,3))
print a
print a.dtype
# np.int was only an alias for the built-in int (removed in NumPy 1.24)
b=np.ones(5,dtype=int)
print b
c=np.zeros((2,2))
print c
d=np.eye(3)
print d
```

```
[0 1 2]
[0 1 2 3 4 5 6 7 8 9]
[ 1.  3.  5.  7.]
[ 0.   0.2  0.4  0.6  0.8  1. ]
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
float64
[1 1 1 1 1]
[[ 0.  0.]
 [ 0.  0.]]
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
```
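This minimal sketch checks a few more behaviors from the cell above. `arange` stops before its end value, while `linspace` includes it. `reshape` rearranges the same values into a new shape. Arithmetic applies element-wise, so a sum of squares needs no Python loop.

```python
import numpy as np

# arange stops before the end value; linspace includes it by default
b = np.arange(1., 9., 2)        # 1, 3, 5, 7 (9 is excluded)
c = np.linspace(0, 1, 6)        # six evenly spaced points from 0.0 to 1.0

# reshape arranges the same values into 2 rows x 3 columns
m = np.arange(6).reshape(2, 3)

# Arithmetic is element-wise, so no Python loop is needed
squares = np.arange(1, 6) ** 2  # 1, 4, 9, 16, 25
total = squares.sum()           # 55

print(b)
print(c)
print(m.shape)
print(total)
```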


## 2.7 visualization

```python
from pylab import *

a=np.arange(20)
plot(a,a**2)
show()
plot(a,a**2,'o')
show()
print -np.pi
x=np.linspace(-np.pi,np.pi,256,endpoint=True)
c,s=np.cos(x),np.sin(x)
plot(x,c)
plot(x,s)
show()
plot([1,2,3],[4,2,6])
show()
plot([1,2,3],[4,2,6])
title("A line")
xlabel("X-axis")
ylabel("Y-axis")
show()

def f(x):
    return 2*x**2+3

print f(0)
print f(-3)
print f(3)

plot([-3,0,3],[f(-3),f(0),f(3)])
show()
xs1=np.arange(-3,3,0.1)
plot(xs1,[f(x) for x in xs1])
show()
xs2=np.linspace(-3,3,100)
plot(xs2,[f(x) for x in xs2])
show()

# sum of squares (after `from pylab import *`, sum is numpy's sum, so pass it a list)
print sum([x*x for x in [1, 2, 3, 4, 5]])
```

```
-3.14159265359
3
21
21
55
```
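The cell above uses the pylab state-machine style. Below is a minimal sketch of the same parabola in matplotlib's object-oriented style. It assumes the non-interactive Agg backend, so no window opens, and writes the figure to a file; the name `parabola.png` is only an example. Because `f` uses only arithmetic operators, it accepts the whole NumPy array at once instead of a list comprehension.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

def f(x):
    return 2 * x**2 + 3

# f uses only arithmetic, so it accepts a whole array at once
xs = np.linspace(-3, 3, 100)
ys = f(xs)

# Object-oriented style: work with explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(xs, ys, label="f(x) = 2x^2 + 3")
ax.set_title("A parabola")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()
fig.savefig("parabola.png")  # example file name
plt.close(fig)               # free the figure once it has been saved
```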
