# Table of Contents
* [1. About](#1.-About)
* [2. Shape 1 array quick convert to shape 2](#2.-Shape-1-array-quick-convert-to-shape-2)
* [3. One-hot Encoding of Numbered elements](#3.-One-hot-Encoding-of-Numbered-elements)


# 1. About

This Jupyter notebook summarizes a few handy Python tricks that I picked up while learning Deep Learning.

- Author: Yu Lu
- Created: 2018/03/11
- Last update: 2018/03/11

In [1]:
import numpy as np 
import pandas as pd

# 2. Shape 1 array quick convert to shape 2

Converting a one-dimensional array to a two-dimensional one (e.g. shape (5,) to (5, 1) or (1, 5)) can be done with np.reshape, but indexing with None is a simpler alternative:

In [14]:
a = np.array([1, 2, 3, 4, 5])
print(a)
print(a.shape)
print(a[:, None].shape)  # None adds a trailing axis: column vector
print(a[None, :].shape)  # None adds a leading axis: row vector

[1 2 3 4 5]
(5,)
(5, 1)
(1, 5)
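
For comparison, the same reshapes can be written with `np.newaxis` (an alias for `None`), `reshape`, or `np.expand_dims`. This is a minimal sketch; the variable names (`col`, `row`, and so on) are illustrative and not part of the original notebook.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# np.newaxis is an alias for None, so these match a[:, None] and a[None, :]
col = a[:, np.newaxis]          # shape (5, 1)
row = a[np.newaxis, :]          # shape (1, 5)

# Same result with reshape; -1 lets NumPy infer that dimension
col_reshaped = a.reshape(-1, 1)
row_reshaped = a.reshape(1, -1)

# expand_dims inserts a length-1 axis at the given position
col_expanded = np.expand_dims(a, axis=1)

print(col.shape, row.shape)
print(np.array_equal(col, col_reshaped), np.array_equal(col, col_expanded))
```

The `None` form is the shortest to type, while `reshape(-1, 1)` tends to be easier to read for people unfamiliar with the indexing trick.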


# 3. One-hot Encoding of Numbered elements

In ML/DL classification problems, converting an array of integer labels (e.g. array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])) to one-hot encoded arrays, 0 -> array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 1 -> array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0]), etc., is a common first step. One can of course use scikit-learn's sklearn.preprocessing.OneHotEncoder or TensorFlow's tf.one_hot function, but there is another handy way that only uses numpy.

In [10]:
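elements = np.array([0, 3, 4, 1, 2, 5])
# Number of classes; max + 1 stays correct even if some labels are absent
numElements = elements.max() + 1
# Broadcast (numElements,) against (n, 1) to get an (n, numElements) mask
(np.arange(numElements) == elements[:, None]).astype(np.float16)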

array([[1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1.]], dtype=float16)
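
Another numpy-only variant indexes the rows of an identity matrix with the labels. This is a sketch that assumes the labels are integers in the range 0..K-1; the names `num_classes`, `one_hot`, and `decoded` are illustrative.

```python
import numpy as np

elements = np.array([0, 3, 4, 1, 2, 5])
num_classes = elements.max() + 1  # assumes labels are integers 0..K-1

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per element
one_hot = np.eye(num_classes, dtype=np.float16)[elements]

# Same result as the broadcasting comparison
broadcast = (np.arange(num_classes) == elements[:, None]).astype(np.float16)
print(np.array_equal(one_hot, broadcast))

# argmax along axis 1 recovers the original labels
decoded = one_hot.argmax(axis=1)
print(decoded)
```

Both forms build a dense (n, K) array, so for a very large number of classes a sparse representation may be preferable.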

# 4. Label Encoding using pandas factorize
Rather than one-hot encoding, sometimes one needs to label-encode categorical features, e.g. b,a,b,c,d -> (0,1,0,2,3), which is especially useful for tree-based machine learning methods. The pandas factorize method does this in one call, returning the integer codes together with the unique values.

In [27]:
df = pd.DataFrame(np.c_[[3,4,6,7,8,9,0,2], 'a b d d c a d b'.split()], columns=['value', 'category'])
df.value = df.value.astype(np.int32)
df

   value category
0      3        a
1      4        b
2      6        d
3      7        d
4      8        c
5      9        a
6      0        d
7      2        b


In [28]:
labels, uniques = df.category.factorize()
labels, uniques

(array([0, 1, 2, 2, 3, 0, 2, 1]), Index(['a', 'b', 'd', 'c'], dtype='object'))
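
By default the codes follow the order of first appearance, which is why 'd' gets code 2 and 'c' gets code 3 above. The sketch below shows the `sort=True` option and how to map codes back to the original values; the variable names (`categories`, `sorted_labels`, `restored`) are illustrative.

```python
import pandas as pd

categories = pd.Series("a b d d c a d b".split())

# Default: codes follow the order of first appearance
labels, uniques = pd.factorize(categories)

# sort=True: uniques are sorted, so codes follow alphabetical order
sorted_labels, sorted_uniques = pd.factorize(categories, sort=True)

# Indexing the uniques with the codes recovers the original values
restored = uniques[labels]

print(labels, list(uniques))
print(sorted_labels, list(sorted_uniques))
print(list(restored))
```

Keeping `uniques` around is what makes the encoding reversible, for example when turning model predictions back into category names.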

# 5. Quickly select categorical and numerical columns
Another frequent operation in machine learning is selecting columns with specific dtypes, e.g. float, object (string), etc. The pandas select_dtypes method is handy for this: one can choose to include or exclude certain data types.

In [30]:
df.dtypes

value        int32
category    object
dtype: object

In [31]:
df.select_dtypes(include = ['object'])

  category
0        a
1        b
2        d
3        d
4        c
5        a
6        d
7        b


In [32]:
df.select_dtypes(exclude = ['object'])

   value
0      3
1      4
2      6
3      7
4      8
5      9
6      0
7      2
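
In practice one often wants the column names rather than the sub-frames, for example to feed different preprocessing steps. A minimal sketch using the generic "number" selector, which matches all numeric dtypes; the `score` column and the names `numeric_cols` and `categorical_cols` are hypothetical additions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": np.array([3, 4, 6, 7, 8, 9, 0, 2], dtype=np.int32),
    "category": "a b d d c a d b".split(),
    "score": [0.5, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5],  # hypothetical column
})

# "number" selects every numeric dtype (int32, float64, ...)
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical here
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Selecting by "number" avoids listing each integer and float width separately.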
