# M.A.D. Helper Libraries (Numpy & Pandas)
**M.A.D.** => **M**achine Learning **a**nd **D**ata Science

This notebook introduces two helper libraries that machine learning and data science practitioners use constantly: `numpy` and `pandas`.

# [`numpy`](https://docs.scipy.org/doc/numpy/reference/)

`numpy` (Numerical Python) is a widely used library for representing and manipulating array data; its core is written in C.

[Numpy Cheatsheet (pdf)](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)

In [1]:
import numpy as np

## Creating Arrays

In [2]:
a = np.array([1,2,3]); a

array([1, 2, 3])

In [3]:
b = np.array([(1.5,2,3), (4,5,6)], dtype = float); b

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [4]:
c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], dtype = float); c

array([[[1.5, 2. , 3. ],
        [4. , 5. , 6. ]],

       [[3. , 2. , 1. ],
        [4. , 5. , 6. ]]])

### Initial placeholders

In [5]:
np.zeros((3,4)) #=>Create an array of zeros

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [6]:
np.ones((2,3,4),dtype=np.int16) #=>Create an array of ones

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [7]:
d = np.arange(10,25,5); d #=>Create an array of evenly spaced values (step value)

array([10, 15, 20])

In [8]:
np.linspace(0,2,9) #=>Create an array of evenly spaced values (number of samples)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

In [9]:
e = np.full((2,2),7); e #=>Create a constant array

array([[7, 7],
       [7, 7]])

In [11]:
f = np.eye(2); f #=>Create a 2X2 identity matrix

array([[1., 0.],
       [0., 1.]])

In [12]:
np.random.random((2,2)) #=>Create an array with random values

array([[0.32504705, 0.06946108],
       [0.4879944 , 0.58132754]])

In [20]:
np.empty((3,2)) #=>Create an empty array (uninitialized so whatever is in mem loc already stays)

array([[1.5, 2. ],
       [3. , 4. ],
       [5. , 6. ]])

## I/O

### Saving and Loading on Disk

In [23]:
np.save('my_array', a) #=>Saves array as a .npy file

In [25]:
np.load('my_array.npy')

array([1, 2, 3])

In [32]:
np.savez('array.npz', array1=a, array2=b) #=>Save several arrays in uncompressed .npz format

In [33]:
my_arrays = np.load('array.npz')

In [34]:
my_arrays['array1']

array([1, 2, 3])
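
The save/load cells above can be combined into a round trip. The following is a minimal sketch, assuming a temporary directory so that nothing is left on disk; the names `first`, `second`, and `arrays.npz` are illustrative and not part of the notebook.

```python
import os
import tempfile

import numpy as np

first = np.array([1, 2, 3])
second = np.array([[1.5, 2.0, 3.0], [4.0, 5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")
    # Keyword names become the keys of the archive.
    np.savez(path, first=first, second=second)

    # np.load returns a lazy, dict-like archive; the with-block closes it.
    with np.load(path) as archive:
        names = sorted(archive.files)
        restored = archive["second"]  # the array is read into memory here

print(names)                             # ['first', 'second']
print(np.array_equal(restored, second))  # True
```

Using the archive as a context manager closes the underlying file promptly, which matters when many `.npz` files are opened in a loop.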

### Saving and Loading Text Files

In [None]:
np.loadtxt("myfile.txt") #=>Load data from text file (each row must have same # of vals)

In [None]:
np.genfromtxt("my_file.csv", delimiter=',') #=>Load data from text file with missing vals handled as specified

In [None]:
np.savetxt("myarray.txt", a, delimiter=" ")

## Datatypes

In [24]:
np.int64 #=>Signed 64-bit integer types

numpy.int64

In [25]:
np.float32 #=>Single-precision (32-bit) floating point; np.float64 is the standard double precision

numpy.float32

In [26]:
np.complex #=>Alias of Python's built-in complex (removed in NumPy 1.24); np.complex128 stores two 64-bit floats

complex

In [27]:
np.bool #=>Boolean type storing True and False values (here an alias of the built-in bool; np.bool_ is the numpy scalar type)

bool

In [28]:
np.object #=>Python object type (an alias of the built-in object, removed in NumPy 1.24)

object

In [29]:
np.string_ #=>Fixed-length byte string type (alias of np.bytes_, removed in NumPy 2.0)

numpy.bytes_

In [30]:
np.unicode_ #=>Fixed-length unicode type (alias of np.str_, removed in NumPy 2.0)

numpy.str_
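
To make the size and precision differences between these types concrete, here is a small sketch that uses only the explicitly sized type names, which should be available across NumPy versions. The variable names are illustrative.

```python
import numpy as np

# itemsize is the number of bytes used to store one element.
for dtype in (np.int64, np.float32, np.float64, np.complex128):
    print(np.dtype(dtype).name, np.dtype(dtype).itemsize)

# float32 keeps roughly 7 significant decimal digits, float64 roughly 15 to 16,
# so 0.1 survives a round trip through float64 but not through float32.
x32 = np.float32(0.1)
x64 = np.float64(0.1)
print(float(x32) == 0.1)  # False
print(float(x64) == 0.1)  # True
```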

## Inspecting Your Array

In [38]:
print(a)
a.shape #=>Array dimensions

[1 2 3]


(3,)

In [39]:
len(a) #=>Length of array

3

In [40]:
print(b)
b.ndim #=>Number of array dimensions

[[1.5 2.  3. ]
 [4.  5.  6. ]]


2

In [43]:
print(e)
e.size #=>Number of array elements

[[7 7]
 [7 7]]


4

In [44]:
b.dtype #=>Data type of array elements

dtype('float64')

In [45]:
b.dtype.name #=>Name of data type

'float64'

In [46]:
b.astype(int) #=>Convert an array to a different type

array([[1, 2, 3],
       [4, 5, 6]])

## Array Mathematics

### Arithmetic Ops

Arithmetic between arrays of different shapes relies on a distinctive feature of numpy called **broadcasting**.

It is done using four rules:

* All input arrays with ndim smaller than the input array of largest ndim have 1's prepended to their shapes.

* The size in each dimension of the output shape is the maximum of all the input sizes in that dimension.

* An input can be used in the calculation if its size in a particular dimension either matches the output size in that dimension, or has value exactly 1.

* If an input has a dimension size of 1 in its shape, the first data entry in that dimension will be used for all calculations along that dimension.
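
The rules above can be checked on a small case. This is a minimal sketch with illustrative names (`row`, `mat`, `stretched`); `np.broadcast_to` is used only to make the implicit stretching visible.

```python
import numpy as np

row = np.array([1, 2, 3])              # shape (3,)
mat = np.array([[1.5, 2.0, 3.0],
                [4.0, 5.0, 6.0]])      # shape (2, 3)

# row is treated as shape (1, 3), and its single row is then reused
# for every row of mat. broadcast_to makes that stretching explicit.
stretched = np.broadcast_to(row, mat.shape)
diff = row - mat
print(diff.shape)                              # (2, 3)
print(np.array_equal(diff, stretched - mat))   # True

# Shapes (2,) and (2, 3) are incompatible: the trailing sizes 2 and 3
# differ and neither is 1, so numpy raises ValueError.
try:
    np.array([1, 2]) - mat
    compatible = True
except ValueError:
    compatible = False
print(compatible)                              # False
```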

#### Subtraction

In [47]:
g = a - b
print('Subtraction\n a - b = g\n\n {}\n - \n{}\n =\n{}'.format(a, b, g))

# Note: A dimension mismatch error doesn't happen b/c array a is broadcast to each row in b

Subtraction
 a - b = g

 [1 2 3]
 - 
[[1.5 2.  3. ]
 [4.  5.  6. ]]
 =
[[-0.5  0.   0. ]
 [-3.  -3.  -3. ]]


In [58]:
np.subtract(a,b)

array([[-0.5,  0. ,  0. ],
       [-3. , -3. , -3. ]])

#### Addition

In [48]:
h = b + a
print('Addition\n a + b = h\n\n {}\n + \n{}\n =\n{}'.format(a, b, h))

Addition
 a + b = h

 [1 2 3]
 + 
[[1.5 2.  3. ]
 [4.  5.  6. ]]
 =
[[2.5 4.  6. ]
 [5.  7.  9. ]]


In [49]:
np.add(b,a)

array([[2.5, 4. , 6. ],
       [5. , 7. , 9. ]])

#### Division

In [50]:
i = a / b
print('Division\n a / b = i\n\n {}\n / \n{}\n =\n{}'.format(a, b, i))

Division
 a / b = i

 [1 2 3]
 / 
[[1.5 2.  3. ]
 [4.  5.  6. ]]
 =
[[0.66666667 1.         1.        ]
 [0.25       0.4        0.5       ]]


In [51]:
np.divide(a,b)

array([[0.66666667, 1.        , 1.        ],
       [0.25      , 0.4       , 0.5       ]])

#### Multiplication

In [52]:
j = a * b
print('Multiplication\n a * b = j\n\n {}\n * \n{}\n =\n{}'.format(a, b, j))

Multiplication
 a * b = j

 [1 2 3]
 * 
[[1.5 2.  3. ]
 [4.  5.  6. ]]
 =
[[ 1.5  4.   9. ]
 [ 4.  10.  18. ]]


In [53]:
np.multiply(a,b)

array([[ 1.5,  4. ,  9. ],
       [ 4. , 10. , 18. ]])

#### Other Mathy Stuff

In [56]:
print(b)
np.exp(b) #=>Exponentiation (ie. e^b)

[[1.5 2.  3. ]
 [4.  5.  6. ]]


array([[  4.48168907,   7.3890561 ,  20.08553692],
       [ 54.59815003, 148.4131591 , 403.42879349]])

In [57]:
print(b)
np.sqrt(b) #=>Square root

[[1.5 2.  3. ]
 [4.  5.  6. ]]


array([[1.22474487, 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [58]:
print(a)
np.sin(a) #=>Element-wise sine

[1 2 3]


array([0.84147098, 0.90929743, 0.14112001])

In [59]:
print(b)
np.cos(b) #=>Element-wise cosine

[[1.5 2.  3. ]
 [4.  5.  6. ]]


array([[ 0.0707372 , -0.41614684, -0.9899925 ],
       [-0.65364362,  0.28366219,  0.96017029]])

In [60]:
print(a)
np.log(a) #=>Element-wise natural logarithm

[1 2 3]


array([0.        , 0.69314718, 1.09861229])

In [61]:
print(e)
print(f)
e.dot(f) #=>Dot product

[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]


array([[7., 7.],
       [7., 7.]])

### Comparison

In [63]:
print(a)
print(b)
a == b #=>Element-wise comparison

[1 2 3]
[[1.5 2.  3. ]
 [4.  5.  6. ]]


array([[False,  True,  True],
       [False, False, False]])

In [64]:
print(a)
a < 2 #=>Element-wise comparison

[1 2 3]


array([ True, False, False])

In [65]:
print(a)
print(b)
np.array_equal(a, b) #=>Array-wise comparison

[1 2 3]
[[1.5 2.  3. ]
 [4.  5.  6. ]]


False

### Aggregate Functions

In [66]:
print(a)
a.sum() #=>Array-wise sum

[1 2 3]


6

In [76]:
print('{}\n'.format(a))

print(a.min()) #=>Array-wise minimum value
print(a.max()) #=>Array-wise maximum value

[1 2 3]

1
3


In [81]:
b[0][2] = 0
b[1][1] = 1
print('{}\n'.format(b))

print(b.min(axis=0)) #=>Minimum value of each column (min along axis 0)
print(b.max(axis=0)) #=>Maximum value of each column (max along axis 0)

[[1.5 2.  0. ]
 [4.  1.  6. ]]

[1.5 1.  0. ]
[4. 2. 6.]


In [79]:
print('{}\n'.format(b))

print(b.min(axis=1)) #=>Minimum value of each row (min along axis 1)
print(b.max(axis=1)) #=>Maximum value of each row (max along axis 1)

[[1.5 2.  0. ]
 [4.  1.  6. ]]

[0. 1.]
[2. 6.]
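
The `axis` argument names the dimension that is collapsed, which is easy to get backwards. A short sketch, assuming a standalone array `m` with the same values as `b` after the edits above:

```python
import numpy as np

m = np.array([[1.5, 2.0, 0.0],
              [4.0, 1.0, 6.0]])   # shape (2, 3)

col_min = m.min(axis=0)   # collapses axis 0 (the rows): one value per column
row_min = m.min(axis=1)   # collapses axis 1 (the columns): one value per row

print(col_min.shape, row_min.shape)   # (3,) (2,)

# keepdims=True keeps the collapsed axis with size 1, so the result
# can be broadcast back against m.
row_min_2d = m.min(axis=1, keepdims=True)
print(row_min_2d.shape)               # (2, 1)
print((m - row_min_2d).min(axis=1))   # [0. 0.]
```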


In [84]:
print('{}\n'.format(b))
b.cumsum(axis=1) #=>Cumulative sum of the elements

[[1.5 2.  0. ]
 [4.  1.  6. ]]



array([[ 1.5,  3.5,  3.5],
       [ 4. ,  5. , 11. ]])

In [85]:
print('{}\n'.format(a))
a.mean() #=>Mean

[1 2 3]



2.0

In [88]:
print('{}\n'.format(b))
np.median(b) #=>Median

[[1.5 2.  0. ]
 [4.  1.  6. ]]



1.75

In [90]:
print('{}\n'.format(a))
np.corrcoef(a) #=>Correlation coefficient

[1 2 3]



1.0

In [91]:
print('{}\n'.format(b))
np.std(b) #=>Standard deviation

[[1.5 2.  0. ]
 [4.  1.  6. ]]



2.008661798865658

## Copying

**COPY / DEEP COPY:** When the contents are stored in a separate memory location, the result is called a copy. Both `np.copy(a)` and `a.copy()` duplicate the data buffer.

**VIEW / SHALLOW COPY:** When a new array object refers to the same memory as the original, the result is called a view. Writing through a view changes the original.

In [101]:
print('Array: {} --> mem loc: {}\n'.format(a, a.__array_interface__['data']))

h = a.view() #=>Create a view of the array with the same data

print('Array: {} --> mem loc: {}\n'.format(h, h.__array_interface__['data']))

Array: [1 2 3] --> mem loc: (140537247918336, False)

Array: [1 2 3] --> mem loc: (140537247918336, False)



In [102]:
print('Array: {} --> mem loc: {}\n'.format(a, a.__array_interface__['data']))

h = np.copy(a) #=>Create a copy of the array

print('Array: {} --> mem loc: {}\n'.format(h, h.__array_interface__['data']))

Array: [1 2 3] --> mem loc: (140537247918336, False)

Array: [1 2 3] --> mem loc: (140537247843168, False)



In [103]:
print('Array: {} --> mem loc: {}\n'.format(a, a.__array_interface__['data']))

i = a.copy() #=>Create a deep copy of the array

print('Array: {} --> mem loc: {}\n'.format(i, i.__array_interface__['data']))

Array: [1 2 3] --> mem loc: (140537247918336, False)

Array: [1 2 3] --> mem loc: (140537201507680, False)
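
Comparing memory addresses works, but the practical consequence is easier to see by mutating the data. The sketch below uses illustrative names and `np.shares_memory` to test for shared buffers.

```python
import numpy as np

original = np.array([1, 2, 3])
view = original.view()
copy = original.copy()

view[0] = 99   # writes through to original, since both share one buffer

print(original)                          # [99  2  3]
print(copy)                              # [1 2 3]
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
print(view.base is original)             # True
```

Basic slices such as `original[:2]` are views as well, so assigning into a slice also modifies the source array.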



## Sorting

In [112]:
a = np.array([2,1,4])
print(a)

a.sort() #=>Sort an array in place
print(a)

[2 1 4]
[1 2 4]


In [122]:
c = np.random.randint(0,9,(2,2,3))
print('{}\n\n'.format(c))

c.sort(axis=0) #=>Sort in place along axis 0
print(c)

[[[6 7 6]
  [3 3 8]]

 [[8 7 1]
  [7 3 3]]]


[[[6 7 1]
  [3 3 3]]

 [[8 7 6]
  [7 3 8]]]


## Subsetting, Slicing, Indexing

### Subsetting

In [123]:
a[2] #=>Select the element at index 2

4

In [125]:
b[1,2] #=>Select the element at row 1 column 2 (equivalent to b[1][2])

6.0

### Slicing

In [128]:
print(a)
a[0:2] #=>Select items at index 0 and 1

[1 2 4]


array([1, 2])

In [130]:
print(b)
b[0:2,1] #=>Select items at rows 0 and 1 in column 1

[[1.5 2.  0. ]
 [4.  1.  6. ]]


array([2., 1.])

In [132]:
print(b)
b[:1] #=>Select all items at row 0 (equivalent to b[0:1, :])

[[1.5 2.  0. ]
 [4.  1.  6. ]]


array([[1.5, 2. , 0. ]])

In [133]:
print(c)
c[1,...] #=>Same as [1,:,:]

[[[6 7 1]
  [3 3 3]]

 [[8 7 6]
  [7 3 8]]]


array([[8, 7, 6],
       [7, 3, 8]])

In [134]:
print(a)
a[::-1] #=>Reversed view of array a

[1 2 4]


array([4, 2, 1])

### Indexing

In [136]:
print(a)
a[a<2] #=>Select elements from a less than 2

[1 2 4]


array([1])

In [137]:
print(b)
b[[1, 0, 1, 0],[0, 1, 2, 0]] #=>Select elements (1,0),(0,1),(1,2) and (0,0)

[[1.5 2.  0. ]
 [4.  1.  6. ]]


array([4. , 2. , 6. , 1.5])

In [138]:
print(b)
b[[1, 0, 1, 0]][:,[0,1,2,0]] #=>Select a subset of the matrix’s rows, and columns

[[1.5 2.  0. ]
 [4.  1.  6. ]]


array([[4. , 1. , 6. , 4. ],
       [1.5, 2. , 0. , 1.5],
       [4. , 1. , 6. , 4. ],
       [1.5, 2. , 0. , 1.5]])
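
The last two cells look similar but select different things: paired index lists pick individual elements, while chaining row and column selection picks a cross product. A minimal sketch with an illustrative array `m`; `np.ix_` is shown as a one-step alternative to the chained form.

```python
import numpy as np

m = np.array([[1.5, 2.0, 0.0],
              [4.0, 1.0, 6.0]])
rows = [1, 0, 1, 0]
cols = [0, 1, 2, 0]

paired = m[rows, cols]            # elements (1,0), (0,1), (1,2), (0,0)
chained = m[rows][:, cols]        # every listed row crossed with every listed column
crossed = m[np.ix_(rows, cols)]   # the same cross product in one indexing step

print(paired.shape)                       # (4,)
print(crossed.shape)                      # (4, 4)
print(np.array_equal(chained, crossed))   # True
```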

## Array Manipulation

### Transposing Array

In [None]:
i = np.transpose(b) #=>Permute array dimensions
i.T #=>Permute array dimensions (attribute shorthand for transpose)

### Changing Array Shape

In [None]:
b.ravel() #=>Flatten the array
g.reshape(3,-1) #=>Reshape to 3 rows and infer the column count, without changing the data
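
A short sketch of how the inferred dimension works and of when flattening copies data. The array `g` here is a fresh illustrative array, not the `g` computed earlier in the notebook.

```python
import numpy as np

g = np.arange(6).reshape(2, 3)

reshaped = g.reshape(3, -1)   # -1 asks numpy to infer the size: 6 / 3 = 2
flat_view = g.ravel()         # a view when the data is contiguous
flat_copy = g.flatten()       # always a copy

print(reshaped.shape)                    # (3, 2)
print(np.shares_memory(g, flat_view))    # True
print(np.shares_memory(g, flat_copy))    # False
```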

### Adding/Removing Elements

In [None]:
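h.resize((2,6)) #=>Resize h in place to shape (2,6), zero-filling the new entries (returns None)
np.append(h,g) #=>Append items to an array (returns a new, flattened array)
np.insert(a, 1, 5) #=>Insert items in an array (returns a new array)
np.delete(a,[1]) #=>Delete items from an array (returns a new array)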

### Combining Arrays

In [None]:
# Outputs in the comments assume a = [1 2 4], d = [10 15 20], and b, e, f as last printed above
np.concatenate((a,d),axis=0) #=>Concatenate arrays
# array([ 1,  2,  4, 10, 15, 20])
np.vstack((a,b)) #=>Stack arrays vertically (row-wise)
# array([[1. , 2. , 4. ],
#        [1.5, 2. , 0. ],
#        [4. , 1. , 6. ]])
np.r_[e,f] #=>Stack arrays vertically (row-wise)
np.hstack((e,f)) #=>Stack arrays horizontally (column-wise)
# array([[7., 7., 1., 0.],
#        [7., 7., 0., 1.]])
np.column_stack((a,d)) #=>Create stacked column-wise arrays
# array([[ 1, 10],
#        [ 2, 15],
#        [ 4, 20]])
np.c_[a,d] #=>Create stacked column-wise arrays

### Splitting Arrays

In [None]:
np.hsplit(a,3) #=>Split the array horizontally into 3 equal parts
# [array([1]), array([2]), array([4])]
np.vsplit(c,2) #=>Split the array vertically (along axis 0) into 2 equal parts
# [array([[[6, 7, 1],
#          [3, 3, 3]]]),
#  array([[[8, 7, 6],
#          [7, 3, 8]]])]

In [None]:
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.creating_reading_and_writing import *

# fruits = pd.DataFrame(data = [[30,21]], columns=['Apples', 'Bananas'])

# q1.check()
# fruits

# q1.hint()
# q1.solution()

# CREATE ARRAYS
# import numpy as np
# x = np.array([10, 20, 30])
# print(type(x))
# print(x.shape)

# np.zeros() # create an array of zeros
# np.ones() # create an array of ones
# np.full() # create a constant array
# np.eye() # create an identity matrix

# numpy.array(object, dtype, etc)

# Create an array of random values
# np.random.random()

# arr1 = np.arange(0.1, 5.2, 7.4)  # arguments are start, stop, step

In [None]:
# SLICE, SUBSET AND INDEX ARRAYS

# print(a2darray[2][1])
# print(a3darray[1,1,2])
# print(a2darray[:,2])
# print(a2darray[1,...])


# Fancy indexing (boolean mask)
# print(a2darray[a2darray > 2])

# greater_than_2 = (a2darray > 2)
# print(a2darray[greater_than_2])

# RESHAPING
# myarray = np.arange(10).reshape(5,2)

In [None]:
# ITERATE OVER AN ARRAY

# for i in np.nditer(myarray):
#     print(i)

# [`pandas`](https://pandas.pydata.org/pandas-docs/stable/)

`pandas` is a library that comes with many easy-to-use data structures and data analysis tools.

[Pandas Cheatsheet (pdf)](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)

In [140]:
import pandas as pd

## Pandas Data Structures

### Series

A one-dimensional labeled array of fixed size that can hold any data type.

In [None]:
# series1 = pd.Series([10, 20, 30, 40])
# print(series1)

In [None]:
# names = np.array(['alskm', 'alskm', 'aslkdm'])
# s2 = pd.Series(names)
# print(s2)

In [148]:
# Series from np array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)

0    g
1    e
2    e
3    k
4    s
dtype: object


In [149]:
# Series with an explicit index
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
print(s)

a    3
b   -5
c    7
d    4
dtype: int64


In [150]:
# Series from a dictionary (can also do with list, etc)
data_dict = {'Geeks' : 10,
             'for' : 20,
             'geeks' : 30}  # named data_dict to avoid shadowing the built-in dict

ser = pd.Series(data_dict)
print(ser)

Geeks    10
for      20
geeks    30
dtype: int64


### Dataframe

A two-dimensional labeled data structure with columns of potentially different types

In [None]:
# df = pd.DataFrame({'Name': ['michelle', 'frank', 'joe']})
# print(df)

In [None]:
# numList = [0, 10, 20, 30, 40]
# df2 = pd.DataFrame(numList)
# print(df2)

In [None]:
# names = [['Michelle', 850], ['Nicholas', 320]]
# df = pd.DataFrame(names, columns=['Name', 'Salary'])
# print(df)

# df.describe()

# iterate through rows of df
# .items() - for iterating over columns as (label, Series) pairs (.iteritems() was removed in pandas 2.0)
# .iterrows() - for iterating over rows as (index,series) pairs
# .itertuples() - for iterating over rows as named tuples
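
The three iteration helpers listed above can be sketched as follows. The `people` DataFrame is a hypothetical example built from the names in the commented-out cell.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Michelle", "Nicholas"],
                       "Salary": [850, 320]})

for column_name, column in people.items():      # (column label, Series)
    print(column_name, column.tolist())

for index, row in people.iterrows():            # (index label, row as Series)
    print(index, row["Name"], row["Salary"])

names = []
for record in people.itertuples(index=False):   # one namedtuple per row
    names.append(record.Name)
print(names)                                    # ['Michelle', 'Nicholas']

# Row-by-row loops are slow on large frames; a vectorized
# column operation is usually preferable when one exists.
total = people["Salary"].sum()
print(total)                                    # 1170
```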

In [155]:
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
data

{'Country': ['Belgium', 'India', 'Brazil'],
 'Capital': ['Brussels', 'New Delhi', 'Brasília'],
 'Population': [11190846, 1303171035, 207847528]}

In [154]:
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
df

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528


In [158]:
# Specify the index explicitly
df = pd.DataFrame({"a" : [4 ,5, 6],
                   "b" : [7, 8, 9],
                   "c" : [10, 11, 12]},
                  index = [1, 2, 3])
df

   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12


In [159]:
# Specify the index and columns explicitly
df = pd.DataFrame([[4, 7, 10],
                   [5, 8, 11],
                   [6, 9, 12]],
                  index=[1, 2, 3],
                  columns=['a', 'b', 'c'])
df

   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12


In [161]:
# Dataframe with multiindex
df = pd.DataFrame({"a" : [4 ,5, 6],
                   "b" : [7, 8, 9],
                   "c" : [10, 11, 12]},
                  index = pd.MultiIndex.from_tuples([('d',1),('d',2),('e',2)], names=['n','v']))
df

     a  b   c
n v
d 1  4  7  10
  2  5  8  11
e 2  6  9  12
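
Selecting from a MultiIndex is where it pays off. The following sketch rebuilds the same frame under the illustrative name `frame` and shows three common lookups.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("d", 1), ("d", 2), ("e", 2)],
                                  names=["n", "v"])
frame = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
                     index=index)

outer = frame.loc["d"]              # all rows whose first level is 'd'
single = frame.loc[("d", 2), :]     # one row, addressed by the full key
by_inner = frame.xs(2, level="v")   # rows whose second level is 2

print(outer.index.tolist())     # [1, 2]
print(single["b"])              # 8
print(by_inner["a"].tolist())   # [5, 6]
```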


## I/O

In [None]:
# IMPORTING DATA

# data = pd.read_csv("bla.csv")
# data.head()
# data.tail()
# data.sample(5)

### Read and Write to CSV

In [None]:
pd.read_csv('file.csv', header=None, nrows=5)

In [None]:
df.to_csv('myDataFrame.csv')
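
A CSV round trip can be tried without touching the file system. This sketch assumes an in-memory `io.StringIO` buffer in place of a real file; `frame` and `buffer` are illustrative names.

```python
import io

import pandas as pd

frame = pd.DataFrame({"Country": ["Belgium", "India"],
                      "Population": [11190846, 1303171035]})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)   # index=False omits the row labels
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.equals(frame))       # True

# With the default index=True, the row labels are written as an unnamed
# first column, which read_csv loads as "Unnamed: 0" unless index_col=0 is given.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)
columns = pd.read_csv(buffer).columns.tolist()
print(columns)                      # ['Unnamed: 0', 'Country', 'Population']
```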

### Read and Write to Excel

In [None]:
pd.read_excel('file.xlsx')

In [None]:
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

In [None]:
# Read multiple sheets from the same file
xlsx = pd.ExcelFile('file.xls')

In [None]:
df = pd.read_excel(xlsx, 'Sheet1')

### Read and Write to SQL Query or Database Table

In [162]:
from sqlalchemy import create_engine #=>Requires the sqlalchemy package; without it this import raises ModuleNotFoundError

In [None]:
engine = create_engine('sqlite:///:memory:')

In [None]:
pd.read_sql("SELECT * FROM my_table;", engine)

In [None]:
pd.read_sql_table('my_table', engine)

In [None]:
pd.read_sql_query("SELECT * FROM my_table;", engine)

In [None]:
df.to_sql('myDf', engine)
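
When `sqlalchemy` is not installed, pandas can also talk to SQLite through the standard-library `sqlite3` module. The sketch below is one possible alternative, not the setup used in the cells above; the table contents and the names `frame`, `connection`, and `large` are illustrative.

```python
import sqlite3

import pandas as pd

frame = pd.DataFrame({"Country": ["Belgium", "India", "Brazil"],
                      "Population": [11190846, 1303171035, 207847528]})

connection = sqlite3.connect(":memory:")   # throwaway in-memory database
try:
    frame.to_sql("my_table", connection, index=False)
    large = pd.read_sql_query(
        "SELECT Country FROM my_table "
        "WHERE Population > 200000000 ORDER BY Population DESC;",
        connection,
    )
finally:
    connection.close()

print(large["Country"].tolist())   # ['India', 'Brazil']
```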

## Selection

### Getting

In [164]:
print(s)
s['b'] #=>Get one element

a    3
b   -5
c    7
d    4
dtype: int64


-5

In [166]:
df

     a  b   c
n v
d 1  4  7  10
  2  5  8  11
e 2  6  9  12


In [167]:
df[1:] #=>Get a subset of a DataFrame (rows from position 1 onward)

     a  b   c
n v
d 2  5  8  11
e 2  6  9  12


### Selecting, Boolean Indexing & Setting

### By Position

In [169]:
df

     a  b   c
n v
d 1  4  7  10
  2  5  8  11
e 2  6  9  12


In [170]:
df.iloc[[0],[0]] #=>Select by integer position: first row, first column (returns a 1x1 DataFrame)

     a
n v
d 1  4


In [None]:
df.iat[0, 0] #=>Select a single scalar by integer position

### By Label

In [None]:
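df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population']) # Switch back to the country DataFrame
df.loc[[0], ['Country']] #=>Select by row and column labels (a 1x1 DataFrame containing 'Belgium')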

In [None]:
df.at[0, 'Country'] #=>'Belgium'
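
The difference between position-based and label-based access is clearest when the index labels are not 0, 1, 2. A minimal sketch, assuming a hypothetical `frame` with labels 10, 20, 30:

```python
import pandas as pd

frame = pd.DataFrame({"Country": ["Belgium", "India", "Brazil"],
                      "Capital": ["Brussels", "New Delhi", "Brasília"]},
                     index=[10, 20, 30])

print(frame.iloc[0, 0])           # Belgium  (row position 0, column position 0)
print(frame.loc[10, "Country"])   # Belgium  (row label 10, column label 'Country')
print(frame.iat[2, 1])            # Brasília (scalar access by position)
print(frame.at[30, "Capital"])    # Brasília (scalar access by label)

# List selectors keep both dimensions, so the result is a 1x1 DataFrame.
print(frame.loc[[10], ["Country"]].shape)   # (1, 1)

# Label slices include the end label; position slices exclude the end.
print(len(frame.loc[10:20]))      # 2
print(len(frame.iloc[0:1]))       # 1
```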

### By Label/Position

In [None]:
df.iloc[2] #=>Select a single row by position: Country Brazil, Capital Brasília, Population 207847528 (.ix was removed in pandas 1.0)

In [None]:
df.loc[:,'Capital'] #=>Select a single column by label: 0 Brussels, 1 New Delhi, 2 Brasília

In [None]:
df.loc[1,'Capital'] #=>Select a single value by row and column labels: 'New Delhi'

### Boolean Indexing

In [None]:
s[~(s > 1)] #=>Series s where value is not >1

In [None]:
s[(s < -1) | (s > 2)] #=>s where value is < -1 or > 2

In [None]:
df[df['Population']>1200000000] #=>Keep only the rows where Population exceeds 1.2 billion

### Setting

In [None]:
s['a'] = 6 #=>Set index a of Series s to 6

## Dropping

In [None]:
s.drop(['a', 'c']) #=>Drop values from rows (axis=0)

In [None]:
df.drop('Country', axis=1) #=>Drop values from columns(axis=1)

## Sort and Rank

In [None]:
df.sort_index() #=>Sort by labels along an axis

In [None]:
df.sort_values(by='Country') #=>Sort by the values along an axis

In [None]:
df.rank() #=>Assign ranks to entries

## Retrieving Series/DataFrame Information

### Basic Info

In [None]:
df.shape #=>(rows,columns)

In [None]:
df.index #=>Describe index

In [None]:
df.columns #=>Describe DataFrame columns

In [None]:
df.info() #=>Info on DataFrame

In [None]:
df.count() #=>Number of non-NA values

### Summary Info

In [None]:
df.sum() #=>Sum of values

In [None]:
df.cumsum() #=>Cumulative sum of values

In [None]:
df.min() #=>Minimum values
df.max() #=>Maximum values

In [None]:
df.idxmin() #=>Index label of the minimum value
df.idxmax() #=>Index label of the maximum value

In [None]:
df.describe() #=>Summary statistics

In [None]:
df.mean() #=>Mean of values

In [None]:
df.median() #=>Median of values

## Applying Functions

In [None]:
f = lambda x: x*2

In [None]:
df.apply(f) #=>Apply function

In [None]:
df.applymap(f) #=>Apply function element-wise (renamed DataFrame.map in pandas 2.1)
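
`apply` passes whole columns (or rows, with `axis=1`) to the function rather than single elements. A short sketch with an illustrative numeric `frame`:

```python
import pandas as pd

frame = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9]})

# Default axis=0: the function receives each column as a Series.
col_range = frame.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_total = frame.apply(lambda row: row["a"] + row["b"], axis=1)

# Simple element-wise arithmetic needs no apply at all.
doubled = frame * 2

print(col_range.tolist())      # [2, 2]
print(row_total.tolist())      # [11, 13, 15]
print(doubled["b"].tolist())   # [14, 16, 18]
```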

## Data Alignment

### Internal Data Alignment

In [None]:
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

In [None]:
s + s3

### Arithmetic Operations with Fill Methods

In [None]:
s.add(s3, fill_value=0)

In [None]:
s.sub(s3, fill_value=2)

In [None]:
s.div(s3, fill_value=4)

In [None]:
s.mul(s3, fill_value=3)
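
The cells above can be made concrete by looking at what happens to the label `'b'`, which exists in `s` but not in `s3`. This sketch redefines both Series so that it runs on its own.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=["a", "b", "c", "d"])
s3 = pd.Series([7, -2, 3], index=["a", "c", "d"])

plain = s + s3                     # 'b' is missing from s3, so the result is NaN there
filled = s.add(s3, fill_value=0)   # the missing s3['b'] is treated as 0

print(plain.isna().tolist())       # [False, True, False, False]
print(filled.tolist())             # [10.0, -5.0, 5.0, 7.0]
```

Values are matched by index label, not by position, which is why `s3`'s second entry is added to `s['c']` rather than to `s['b']`.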

In [None]:
# Wrangling

# Sorting
# sort by labels
# sort by values
# sort using a specific sorting algorithm (quicksort, mergesort, etc)

# Handling Missing Data (replacing/dropping) and Duplicates
# replace()
# fillna()

# Joining, Merging, Concatenating, Grouping, Aggregating
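
The wrangling topics listed above can be sketched on a tiny example. The `raw` DataFrame and its `team` and `score` columns are hypothetical, chosen so that there is one missing value and one duplicated row.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({"team": ["x", "x", "y", "y", "y"],
                    "score": [1.0, np.nan, 3.0, 3.0, 5.0]})

filled = raw.fillna({"score": 0.0})   # replace missing scores with 0
dropped = raw.dropna()                # or drop rows that contain NaN
unique = raw.drop_duplicates()        # remove the repeated ('y', 3.0) row

# Sort by values with an explicit algorithm; mergesort is stable.
ordered = raw.sort_values("score", ascending=False, kind="mergesort")

# Group and aggregate; NaN is skipped by sum by default.
totals = raw.groupby("team")["score"].sum()

print(len(dropped), len(unique))      # 4 4
print(ordered["score"].tolist()[0])   # 5.0
print(totals.to_dict())               # {'x': 1.0, 'y': 11.0}
```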

## Visualization, Aggregation, Time Series

Visualization with Pandas



In [None]:
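# Series.plot.box()
# DataFrame.boxplot() or DataFrame.plot.box()

# Series.plot.area()
# DataFrame.plot.area()

# DataFrame.plot.scatter()

# Pie chart: Series.plot.pie() or DataFrame.plot.pie()

# Bar plot: Series.plot.bar() or DataFrame.plot.bar()

# Histogram: Series.plot.hist() or DataFrame.plot.hist()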
