# Numpy

##### Numpy one of the most important foundational packages of python.
##### Many packages providing scientific functionality use Numpy's Array objects.

1. ndarray -- an efficient multidimensional array providing fast array operations and flexible broadcasting capabilities.
2. Mathematical functions for performing fast operations on entire array data without loop.
3. Tools for reading/writing array data to disk and working memory-mapped files.
4. Linear algebra, random number generation, and Fourier transform capabilities
5. A C api to connect with libraries written in C, C++ and FORTRAN

##### While Numpy by itself does not provide modeling or scientificc functionality, but it helps in using tools with array computing semantics like pandas.

#### For most data analysis applications, the main areas of functionality focus on are:
1. Fast array-based operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
2. Common array algorithms like sorting, unique, and set operations
3. Efficient descriptive statistics and aggregating/summarizing data
4. Data alignment and relational data manipulations for merging and joining together heterogeneous datasets
5. Expressing conditional logic as array expressions instead of loops with if-elif-else branches
6. Group-wise data manipulations (aggregation, transformation, function application)

#### Why is numpy efficient which is one of its most important uses?

1. Numpy internally stores data in contiguous block of memory independent of each other.
2. C based algorithms (most important advantages)
3. Perform complex array operations on entire arrays without the need of loop, which can be slow for large sequences.

In [1]:
import numpy as np

my_array = np.arange(1000000)
my_list = list(range(1000000))

In [2]:
%time
for _ in range(10):
    my_arr2 = my_array * 2

CPU times: total: 0 ns
Wall time: 0 ns


In [3]:
%time
for _ in range(10):
    my_list2 = [x*2 for x in my_list]
# i dont know why the time is showing 0 it took longer time to execute also.

CPU times: total: 0 ns
Wall time: 0 ns


##### ndarray is a fast flexible container for large datasets of python
##### enables you to perform mathematical ops on whole array just like scalar elements

In [4]:
data = np.random.randn(2,3)
data

array([[ 0.39900051,  0.67625754, -0.94281076],
       [ 1.04462617, -0.68861258,  0.89540645]])

In [5]:
data*100

array([[ 39.90005122,  67.62575395, -94.28107559],
       [104.46261664, -68.86125752,  89.54064486]])

In [6]:
data + data

array([[ 0.79800102,  1.35251508, -1.88562151],
       [ 2.08925233, -1.37722515,  1.7908129 ]])

##### ndarray containes only homogenous data

In [7]:
data.shape

(2, 3)

In [8]:
data.dtype

dtype('float64')

In [9]:
# creating numpy arrays

data1 = [6,7.5,8,0,1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [10]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2
# since data2 was a list of list arr2 also has 2 dimensions with shape inferred from the data

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [11]:
arr2.ndim

2

In [12]:
arr2.shape

(2, 4)

In [13]:
arr1.dtype

dtype('float64')

In [14]:
arr2.dtype

dtype('int32')

In [15]:
# array of zero
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [16]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [17]:
# empty creates an array without initializing its value to any particular value. the values can be zeros or garbage value
np.empty((2,3,2))

array([[[1.34039663e-311, 3.16202013e-322],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 6.23078250e+174]],

       [[8.60640303e-043, 9.33611075e+164],
        [3.69409910e-057, 1.02308830e+166],
        [1.58729973e-076, 3.85528491e-057]]])

In [18]:
# arange has built in range function
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

### Data Types for ndarrays

In [19]:
arr1 = np.array([1,2,3],dtype = np.float64)

In [20]:
arr2 = np.array([1,2,3],dtype = np.int32)

In [21]:
arr1.dtype

dtype('float64')

In [22]:
arr2.dtype

# dtypes are a source of numpy flexibility for interacting with data coming from other systems. 

dtype('int32')

In [23]:
# you can explicitly convert or cast an array from one dtype to another using astype method

arr = np.array([1,2,3,4,5])
arr.dtype

dtype('int32')

In [24]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [25]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [26]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])

In [27]:
numeric_strings = np.array(['1.24','56','65'], dtype = np.string_)
numeric_strings.astype(np.float64)

array([ 1.24, 56.  , 65.  ])

In [28]:
# you can also use another array dtype

int_array = np.arange(10)
calibers = np.array([.22,.27,8.9,.357],dtype = np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [29]:
# there are shorthand type code strings you can also use to refer to a dtype
empty_int32 = np.empty(8,dtype = 'u4')
empty_int32

# calling astype always creates a new array

array([    0,     1,     0,   631,  1612,     0,   768, 32765],
      dtype=uint32)

### Arithmetic with Numpy array

In [30]:
arr = np.array([[1,2,3],[4,5,6]],dtype = np.float64)
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [31]:
arr*arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [32]:
arr-arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [33]:
# arithmetic ops with scalars propogate the scalar argument to each element in the array
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [34]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [35]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [37]:
# Comparisons between arrays of the same size yield boolean arrays:
arr2>arr1

array([[False,  True, False],
       [ True, False,  True]])

### Basic indexing and slicing 

In [38]:
arr = np.arange(10)

In [39]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [40]:
arr[5]

5

In [41]:
arr[5:8]

array([5, 6, 7])

In [44]:
arr[5:8] = 12 # example of broadcasting

In [45]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [None]:
# first distinction from python lists is that array slices are only views so the data is not copied or modified 