# NumPy Tutorial:

NumPy's main object is the homogeneous multi-dimensional array: a table of elements that all share the same data type. Its dimensions are called axes, and NumPy's array class is called ndarray. Some important attributes of ndarray:


- ndarray.ndim: the number of axes (dimensions) of the array.
- ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n, m).
- ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.
- ndarray.dtype: an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.


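Creating a NumPy array and reading some attributes. The integer dtype printed below depends on the platform and NumPy version (int32 here; int64 on many other systems).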

In [2]:
import numpy as np
a = np.array([2, 3, 5])
print(a)
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.size)

[2 3 5]
1
(3,)
int32
3
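
The same attributes can be read on a 2-D array. The following is a minimal sketch; the array `m` and its values are chosen only for illustration and do not come from the cells above.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns (illustrative values).
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.ndim)   # number of axes
print(m.shape)  # (rows, columns)
print(m.size)   # total number of elements, i.e. rows * columns
```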


We can use the reshape method to change the shape of an array. The new shape must hold the same total number of elements.

In [3]:
b = np.array([4, 5, 6, 12, 5.1, 42.4, 11, 8])
print(b)
print(b.reshape(2, 4))

[ 4.   5.   6.  12.   5.1 42.4 11.   8. ]
[[ 4.   5.   6.  12. ]
 [ 5.1 42.4 11.   8. ]]
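
As a small sketch of two details of reshape: one dimension can be given as -1 so that NumPy infers it from the array size, and the original array keeps its shape. The name `r` is only illustrative.

```python
import numpy as np

b = np.array([4, 5, 6, 12, 5.1, 42.4, 11, 8])

# -1 asks NumPy to infer this dimension from the size (8 elements / 2 rows = 4).
r = b.reshape(2, -1)

print(r.shape)  # shape of the reshaped result
print(b.shape)  # b itself is left as a 1-D array
```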


Multi-dimensional arrays of ones or zeros can be created.

In [4]:
np.zeros((4, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [5]:
np.ones((2,2))

array([[1., 1.],
       [1., 1.]])

Multi-dimensional arrays filled with a specified value can be created with np.full.

In [6]:
np.full((3, 4), 3.14)

array([[3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14]])

When items of different types are provided in the input list, they are cast to a common type.

In [7]:
c = np.array([4, 5, 6, 5.1, 42.4, False, 8])
print(c)
c = np.array([4, 5, 6, 5.1, 42.4, False, "strhere"])
print(c)

[ 4.   5.   6.   5.1 42.4  0.   8. ]
['4' '5' '6' '5.1' '42.4' 'False' 'strhere']
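
The common type chosen by NumPy can be inspected through the dtype attribute. This is a minimal sketch with shortened lists; the variable names are illustrative.

```python
import numpy as np

# Integers, floats and booleans are promoted to a floating-point type.
mixed_numbers = np.array([4, 5.1, False])
print(mixed_numbers.dtype)

# Adding a string turns every element into a string.
mixed_with_text = np.array([4, 5.1, False, "strhere"])
print(mixed_with_text.dtype.kind)  # 'U' denotes a unicode string dtype
```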


The desired dtype can be set when creating the array.

In [8]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

Create an array from an evenly spaced range of numbers (start, stop, step). The stop value is excluded.

In [9]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])
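
A short sketch of how the stop value behaves in np.arange: it is never part of the result, so it has to be moved past the last wanted number. The variable names are illustrative.

```python
import numpy as np

up_to_8 = np.arange(0, 10, 2)   # stop=10 is excluded
up_to_10 = np.arange(0, 11, 2)  # moving stop past 10 includes it

print(up_to_8)
print(up_to_10)
```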

We can create random numbers with NumPy's random module. np.random.random draws uniformly distributed samples from the interval [0, 1).


In [10]:
np.random.random((3, 3))

array([[0.37517802, 0.65623899, 0.60370133],
       [0.2795809 , 0.30420743, 0.75824129],
       [0.710307  , 0.28240918, 0.67445656]])

Creating normally distributed numbers with mean 0 and standard deviation 1.

In [11]:
np.random.normal(0, 1, (3, 3))

array([[ 0.242662  , -1.20002981, -0.54271294],
       [ 0.32606718, -2.22379809,  1.64071508],
       [ 1.41123214,  0.13919089,  2.11894407]])

Creating random integers from 0 to 4. The upper bound (5) is excluded.

In [12]:
np.random.randint(0, 5, (4, 3))

array([[4, 3, 2],
       [2, 2, 3],
       [4, 1, 4],
       [1, 1, 1]])
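
Random output changes on every run, so the values shown above will not be reproduced exactly. One way to get repeatable results is to seed the generator first. The sketch below assumes the legacy np.random.seed function and an arbitrary seed value of 0; it also checks that the upper bound of randint is excluded.

```python
import numpy as np

np.random.seed(0)                        # arbitrary seed, chosen for illustration
first = np.random.randint(0, 5, (4, 3))

np.random.seed(0)                        # same seed again
second = np.random.randint(0, 5, (4, 3))

print(np.array_equal(first, second))     # same seed gives the same numbers
print(first.min() >= 0, first.max() <= 4)  # values stay within 0..4
```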

We can create multi-dimensional arrays.

In [13]:
a = np.arange(4)                         # 1d array
print('1d array')
print(a)
b = np.arange(12).reshape(4,3)           # 2d array
print('2d array')
print(b)
c = np.arange(24).reshape(2,3,4)         # 3d array
print('3d array')
print(c)

1d array
[0 1 2 3]
2d array
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
3d array
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


Basic Operations: Arithmetic operations are applied element-wise. A new array is created to hold the results; the operands are left unchanged.

In [14]:
a = np.array([4, 3, 12, 1])
b = np.array([5, 1, 4, 7])
print(a + b)
print(a * b)
print(a / b)

[ 9  4 16  8]
[20  3 48  7]
[0.8        3.         3.         0.14285714]
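
A minimal sketch showing that an arithmetic operation returns a new array and leaves its operand untouched. The name `doubled` is illustrative.

```python
import numpy as np

a = np.array([4, 3, 12, 1])
doubled = a * 2      # the scalar is applied to every element

print(doubled)
print(a)             # a still holds its original values
print(doubled is a)  # False: the result is a separate array
```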


Element-wise multiplication is performed using the “*” operator.

In [15]:
A = np.array( [ [5, 3], [2, 4] ] )
B = np.array( [ [2, 4], [6, 1] ])
A*B

array([[10, 12],
       [12,  4]])

The matrix product is computed with the dot method (using the same A and B as above).

In [16]:
A.dot(B)

array([[28, 23],
       [28, 12]])
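
On Python 3.5 and later, the “@” operator is an alternative spelling of the matrix product. A short sketch with the same A and B:

```python
import numpy as np

A = np.array([[5, 3], [2, 4]])
B = np.array([[2, 4], [6, 1]])

product = A @ B  # matrix product, not element-wise

print(product)
print(np.array_equal(product, A.dot(B)))  # both spellings agree
```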

We can calculate simple statistics on arrays. By default, std returns the population standard deviation (ddof=0).

In [17]:
a = np.array([2, 3, 4, 5, 12, 1])
print(a.sum())
print(a.min())
print(a.max())
print(a.std())

27
1
12
3.593976442141304
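
The same methods accept an axis argument on multi-dimensional arrays, and std accepts ddof=1 for the sample standard deviation. The sketch below reshapes the same values into a 2x3 array; the name `m` is illustrative.

```python
import numpy as np

a = np.array([2, 3, 4, 5, 12, 1])
m = a.reshape(2, 3)    # [[2, 3, 4], [5, 12, 1]]

print(m.sum(axis=0))   # one sum per column
print(m.sum(axis=1))   # one sum per row
print(a.std(ddof=1))   # sample standard deviation (divides by n - 1)
```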


NumPy arrays can be sliced by position. The start index is included and the end index is excluded.

In [18]:
a = np.array([14, 15, 16, 17, 18, 19])
print(a[:2])
print(a[3:5])
print(a[3:])

[14 15]
[17 18]
[17 18 19]
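
Unlike slices of Python lists, a basic slice of a NumPy array is a view on the same data, so writing to the slice also changes the original. A minimal sketch; the names `part` and `safe` are illustrative.

```python
import numpy as np

a = np.array([14, 15, 16, 17, 18, 19])

part = a[3:5]         # a view, not a copy
part[0] = 0           # also writes to a[3]
print(a)

safe = a[3:5].copy()  # an independent copy
safe[1] = -1          # a is not affected by this
print(a)
```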


# Pandas Tutorial:

Pandas has two main data structures: Series and DataFrame.

## Series
A Series can be created from a list together with an explicit index.

In [21]:
import pandas as pd
import numpy as np
s = pd.Series([2, 3.1, 3.131, 5, 1], index=['a','b','c','d','e'])
s

a    2.000
b    3.100
c    3.131
d    5.000
e    1.000
dtype: float64

A Series can also be created from a list without an explicit index. A default integer index starting at 0 is used.

In [4]:
import pandas as pd
s = pd.Series([2, 3.1, 3.131, 5, 1])
s

0    2.000
1    3.100
2    3.131
3    5.000
4    1.000
dtype: float64

Alternatively, a Series can be initialized from a dictionary.

In [23]:
d = {'b': 1, 'a': 0, 'c': "mystring"}
s = pd.Series(d)
s

b           1
a           0
c    mystring
dtype: object

Important Note: With Python 3.6+ and pandas 0.23+, the Series keeps the insertion order of the dictionary keys (['b', 'a', 'c'], as in the output above). Older versions sorted the keys lexically (['a', 'b', 'c']).
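
A short sketch of the ordering on a recent setup (assuming Python 3.6+ and pandas 0.23+): the index follows the insertion order of the dictionary, and sort_index can be used when lexical order is wanted.

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': "mystring"}
s = pd.Series(d)

print(list(s.index))               # insertion order of the dict keys
print(list(s.sort_index().index))  # lexical order on request
```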


In [25]:
import sys
print(sys.version)
print(pd.__version__)

3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
0.25.1


If a scalar value is passed as data, it is repeated for every index item.

In [26]:
s = pd.Series(5., index = ['a','b', 'c', 'd', 'e'] )

In [27]:
type(s)

pandas.core.series.Series
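
A minimal sketch that prints the scalar-based Series, so the repetition over the index is visible:

```python
import pandas as pd

s = pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

print(s)        # the value 5.0 appears once per index label
print(len(s))   # one entry for each of the five labels
```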

A Series can be sliced by position, like a list or a NumPy array.


In [28]:
d = {'d' : 1, 'a' : 0, 'c' : "mystring", 'b' : 0.5, 'e': False }
s = pd.Series(d)
print(s)
print("----------")
print(s[2:4])

d           1
a           0
c    mystring
b         0.5
e       False
dtype: object
----------
c    mystring
b         0.5
dtype: object
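
Slicing can also be made explicit with iloc (by position) and loc (by label). The sketch below uses the same dictionary; note that a label slice includes its end label, while a positional slice excludes its end position.

```python
import pandas as pd

d = {'d': 1, 'a': 0, 'c': "mystring", 'b': 0.5, 'e': False}
s = pd.Series(d)

by_position = s.iloc[2:4]   # positions 2 and 3; position 4 is excluded
by_label = s.loc['c':'b']   # labels 'c' through 'b'; 'b' is included

print(by_position)
print(by_label)
```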


Items can also be accessed through labels.

In [29]:
d = {'d' : 1, 'a' : 0, 'c' : "mystring", 'b' : 0.5 }
s = pd.Series(d)
print(s['c'])

mystring


## Data Frames:

A DataFrame is a 2-D labeled data structure with columns of potentially different data types. It can be thought of as a spreadsheet or an SQL table. Data frames can be initialized from lists, dictionaries or other data frames. Two important attributes are index and columns: index holds the row labels and columns holds the column labels.

Initializing from a dictionary.

In [32]:
d = {'one' : [1., 2., 3., 4.], 'two' : [3, 12.3, 1.2, 4.5]}
df = pd.DataFrame(d)
df

   one   two
0  1.0   3.0
1  2.0  12.3
2  3.0   1.2
3  4.0   4.5


Specifying the index (row labels).

In [33]:
df = pd.DataFrame(d, index = ['a', 'b', 'c', 'd'])
df

   one   two
a  1.0   3.0
b  2.0  12.3
c  3.0   1.2
d  4.0   4.5


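Data frames can also be initialized from a list of dictionaries. Important Note: Pandas automatically handles missing values by assigning NaN to them. The dictionaries below share the same keys, so no NaN appears here.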

In [37]:
d = [{'a': 1, 'b': 2, 'c': 3}, {'a': 5, 'b': 7, 'c': 15}]
df = pd.DataFrame(d)
print(df)
print(df["c"])

   a  b   c
0  1  2   3
1  5  7  15
0     3
1    15
Name: c, dtype: int64
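
To see NaN being assigned for a missing value, here is a minimal sketch with a hypothetical list `records` in which the first dictionary lacks the key 'c':

```python
import pandas as pd

# Hypothetical input: the first record has no 'c' entry.
records = [{'a': 1, 'b': 2}, {'a': 5, 'b': 7, 'c': 15}]
df = pd.DataFrame(records)

print(df)              # the missing cell is shown as NaN
print(df['c'].isna())  # True where the value is missing
```

Because NaN is a floating-point value, the column 'c' is stored as float64 rather than as an integer type.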


Adding new column data.

In [38]:
df['d'] = df['b'] * df['c']

In [39]:
df

   a  b   c    d
0  1  2   3    6
1  5  7  15  105


Deleting column.

In [40]:
del df['b']
df

   a   c    d
0  1   3    6
1  5  15  105
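
The del statement changes the data frame in place. When the original should be kept, the drop method returns a new data frame instead. A minimal sketch on a fresh frame; the name `without_b` is illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 5, 'b': 7, 'c': 15}])

without_b = df.drop(columns=['b'])  # returns a new data frame

print(list(without_b.columns))
print(list(df.columns))             # df still has column 'b'
```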


Updating column data.


In [42]:
df['a'] = np.array([4, 1.2])
df

     a   c    d
0  4.0   3    6
1  1.2  15  105


Updating column data with a single value.


In [44]:
df['a'] = 5
df

   a   c    d
0  5   3    6
1  5  15  105


If column names are not provided, they are numbered automatically, starting from zero.

In [45]:
d = [[3, 2], [1, 5]]
df = pd.DataFrame(d)
print(df)

   0  1
0  3  2
1  1  5


We can access a column through its label, which here is the integer 0.


In [46]:
print(df[0])

0    3
1    1
Name: 0, dtype: int64


Select row by label.


In [48]:
d = {'one' : [1., 2., 3., 4.], 'two' : [3, 12.3, 1.2, 4.5]}
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
print(df)

   one   two
a  1.0   3.0
b  2.0  12.3
c  3.0   1.2
d  4.0   4.5


In [49]:
print(df.loc['c'])

one    3.0
two    1.2
Name: c, dtype: float64


Select row by integer row index.


In [50]:
print(df.iloc[0])

one    1.0
two    3.0
Name: a, dtype: float64


We can slice rows.


In [51]:
print(df[1:3])

   one   two
b  2.0  12.3
c  3.0   1.2
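
Row slices can also be written explicitly with iloc and loc, and loc accepts a row label and a column label together to read a single cell. A minimal sketch using the same data frame:

```python
import pandas as pd

d = {'one': [1., 2., 3., 4.], 'two': [3, 12.3, 1.2, 4.5]}
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

rows_by_position = df.iloc[1:3]   # positions 1 and 2 (rows 'b' and 'c')
rows_by_label = df.loc['b':'c']   # label slice, end label included
cell = df.loc['c', 'two']         # single value at row 'c', column 'two'

print(rows_by_position)
print(rows_by_label)
print(cell)
```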
