<img align="right" width="500" height="500"  src="..\..\..\data_science\exercise\Section-3-Python-for-Data-Scientists\images\listvsarrayvsnumpy.png" > 

# Learning agenda of this notebook
1. A Comparison
    - Python Lists
    - Python Arrays
    - NumPy Arrays
2. Memory Consumption of Python List and NumPy Array
3. Operation cost on Python List and NumPy Array

### a. Python Lists
- A Python list is an ordered, index-addressable sequence that can store elements of heterogeneous types; it is iterable, mutable, and allows duplicate elements.
- A list is a built-in type in Python and is created by placing comma-separated values in square brackets; you don't have to specify a type when creating it.
- A Python list is one-dimensional by default. You can build an N-dimensional structure by nesting, but it is still a 1-D list whose elements are other 1-D lists.
- The list itself holds a contiguous array of references, but the objects those references point to are stored separately and can be scattered across memory.
- More memory hungry, since every element is a full Python object.
- Operations on lists are typically slower than on arrays; however, an append takes amortized O(1) time.

In [1]:
# creating a list containing elements belonging to different data types  
mylist = [1, "Data Analysis", ['a', 'b', 'c'], False, 3.45]
print(f'Type = {type(mylist)}\nList = {mylist}')

Type = <class 'list'>
List = [1, 'Data Analysis', ['a', 'b', 'c'], False, 3.45]
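
The points above about nesting, mutability, and reference storage can be sketched in a few lines. This is only an illustration; the names `grid`, `shared`, and `pairs` are made up for this example.

```python
# A "2-D" list is just a list whose elements are other lists
grid = [[1, 2, 3], [4, 5, 6]]
row_types = [type(row).__name__ for row in grid]

# Lists are mutable and grow in place with append
grid.append([7, 8, 9])

# A list stores references, so two slots can point to the same object
shared = [0, 0]
pairs = [shared, shared]
pairs[0][0] = 99  # the change is visible through both slots

print(row_types)              # ['list', 'list']
print(len(grid), grid[2][1])  # 3 8
print(pairs)                  # [[99, 0], [99, 0]]
```

Because both slots of `pairs` refer to the same inner list, one assignment changes what both of them show.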


### b. Python Arrays
- A simple Python array is a sequence of objects of the same data type. Python's `array` module requires all array elements to be of the same type, so you must specify a type code when creating an array.

```
array(typecode [, initializer])
```

- Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, string or iterable over elements of the appropriate type.

- Arrays represent basic values and behave very much like lists, except the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character.
- The following type codes are defined:


    Type code   C Type             Minimum size in bytes
    'b'         signed integer     1
    'B'         unsigned integer   1
    'u'         Unicode character  2 (see note)
    'h'         signed integer     2
    'H'         unsigned integer   2
    'i'         signed integer     2
    'I'         unsigned integer   2
    'l'         signed integer     4
    'L'         unsigned integer   4
    'q'         signed integer     8 (see note)
    'Q'         unsigned integer   8 (see note)
    'f'         floating point     4
    'd'         floating point     8

In [9]:
# To use Python arrays, you have to import Python's built-in array module
import array

# declare an array of integers
arr = array.array('i', [2,3,4,7,78])
print(f'Type = {type(arr)}\nList = {arr}')
print()

# declare an array of single-precision floats
arr2 = array.array('f', [2.34,3.009,44.5,7.1123,78])
print(f'Type = {type(arr2)}\nList = {arr2}')
print()

# Python arrays can grow/shrink dynamically
# append() modifies the array in place and returns None, so print the array itself
arr2.append(999)
print(arr2)

Type = <class 'array.array'>
List = array('i', [2, 3, 4, 7, 78])

Type = <class 'array.array'>
List = array('f', [2.3399999141693115, 3.009000062942505, 44.5, 7.112299919128418, 78.0])

array('f', [2.3399999141693115, 3.009000062942505, 44.5, 7.112299919128418, 78.0, 999.0])
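
To see how the type code constrains what an array can hold, here is a minimal sketch. The variable names `ints` and `rejected` are chosen for this example only, and the exact `itemsize` depends on the platform.

```python
import array

# An array of C signed ints
ints = array.array('i', [1, 2, 3])

# itemsize is the number of bytes per element (at least 2 for 'i')
print(ints.itemsize >= 2)

# Appending a value of the wrong type raises TypeError instead of storing it
try:
    ints.append(2.5)
    rejected = False
except TypeError:
    rejected = True

print(rejected)
print(ints.tolist())
```

The float is refused and the array keeps its original three integers, which is the main behavioral difference from a list.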


### c. NumPy Arrays

In [18]:
# NumPy upcasts all elements to a common, wider data type when the inputs have mixed types
import numpy as np
arr1 = np.array([2.56, True, 6, 0.01, -3, False])
print(f'Type = {type(arr1)}\nList = {arr1}')
print(type(arr1[1]))
print(type(arr1[2]))
print(type(arr1[5]))

Type = <class 'numpy.ndarray'>
List = [ 2.56  1.    6.    0.01 -3.    0.  ]
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>


In [21]:
# If any element is a string, NumPy upcasts every element to a string
arr = np.array([1.02, -6, 'Godwin', True])
print(f'Type = {type(arr)}\nList = {arr}')
print(type(arr[1]))
print(type(arr[3]))
print(type(arr[2]))

Type = <class 'numpy.ndarray'>
List = ['1.02' '-6' 'Godwin' 'True']
<class 'numpy.str_'>
<class 'numpy.str_'>
<class 'numpy.str_'>


In [22]:
# If you specify a dtype, the elements are cast to that type (floats are truncated, booleans become 0/1)
array1 = np.array([3.5, False, 9.8, 2.7, True], dtype=np.uint16)
print(f'Type = {type(array1)}\nList = {array1}')
print(type(array1[1]))
print(type(array1[3]))

Type = <class 'numpy.ndarray'>
List = [3 0 9 2 1]
<class 'numpy.uint16'>
<class 'numpy.uint16'>


In [24]:
# # If you specify dtype=str, every element is converted to a string
# # (the np.str alias was removed in NumPy 1.24; use the built-in str instead)
# import numpy as np
# array1 = np.array([3.5, False, 9.8, 2.7, True], dtype=str)
# print(array1)
# print(f'Type = {type(array1)}\nList = {array1}')
# print(type(array1[1]))
# print(type(array1[3]))
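
A quicker way to check which type NumPy settled on is to inspect the array's `dtype` attribute rather than the type of individual elements. The sketch below assumes only that NumPy is installed; the variable names are illustrative.

```python
import numpy as np

# bool + int -> integer array
bool_int = np.array([True, 6])

# bool + int + float -> float64 array
mixed_numeric = np.array([2.56, True, 6])

# anything mixed with a string -> Unicode string array
mixed_text = np.array([1.02, -6, 'Godwin', True])

print(bool_int.dtype.kind)    # 'i' (signed integer)
print(mixed_numeric.dtype)    # float64
print(mixed_text.dtype.kind)  # 'U' (Unicode string)
```

The `kind` code is used for the integer and string cases because the exact width (for example the default integer size or the string length) can vary by platform and input.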

## 2. Memory Consumption of NumPy Array and Python List
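- For the same numeric data, a Python list consumes more memory than a NumPy array: every list element is a full Python object reached through a reference, while a NumPy array stores raw fixed-size values side by side.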

In [34]:
import numpy as np
import sys

print("PYTHON LIST")
# declaring a list of 1000 elements (list() turns the range object into a real list)
mylist = list(range(0, 1000))

# size of one int object; 28 bytes for small non-zero ints on 64-bit CPython
element_size = sys.getsizeof(mylist[-1])
# lower-bound estimate: the list's own array of references is not counted here
mylist_size = element_size * len(mylist)
print(f'Size of each element = {element_size}\nSize of mylist = {mylist_size}bytes')

print("\nNUMPY ARRAY")
# declaring a NumPy array of 1000 elements; uint16 is used because uint8 cannot hold values above 255
arr1 = np.arange(1000, dtype=np.uint16)
print(f'Size of each element = {arr1.itemsize}\nSize of arr1 = {arr1.nbytes}bytes')


PYTHON LIST
Size of each element = 28
Size of mylist = 28000bytes

NUMPY ARRAY
Size of each element = 2
Size of arr1 = 2000bytes
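
A fuller accounting of a list's footprint adds the container (its array of references) to the size of every element object. The sketch below is one way to do that; the names `values`, `list_bytes`, and `array_bytes` are made up here, and the list total depends on the Python version and platform, so no exact figure is shown for it.

```python
import sys
import numpy as np

n = 1000
values = list(range(n))

# container (array of references) + every int object it points to
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# a NumPy array stores n raw 8-byte integers in one buffer
arr = np.arange(n, dtype=np.int64)
array_bytes = arr.nbytes  # 1000 * 8 = 8000

print(f'list  : {list_bytes} bytes')
print(f'array : {array_bytes} bytes')
print(f'list is larger: {list_bytes > array_bytes}')
```

Even with a generous 8 bytes per element, the array stays well below the list, because the list pays for an object header on every integer.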


## 3. Operations on NumPy Arrays vs Python Lists
- NumPy arrays are stored in one contiguous block of memory, unlike the objects in a list, so processes can access and manipulate them very efficiently.
- This behavior is called **locality of reference** in computer science.
- This is one of the main reasons why NumPy is faster than lists; the other is that element-wise operations run in compiled loops instead of the Python interpreter.
- As a proof of concept, we can multiply two lists and then two arrays, and compare their multiplication times.

### Effect of * operator on NumPy Array and Python List

In [35]:
# You can multiply two NumPy arrays element-wise using the * operator
import numpy as np
myarray1 = np.array([1, 2, 3, 4, 5, 6])
myarray2 = np.array([1, 2, 3, 4, 5, 6])
myarray3 = myarray1 * myarray2
myarray3

array([ 1,  4,  9, 16, 25, 36])

In [36]:
# the * operator cannot multiply two lists element-wise, so you have to use a loop
mylist1 = [1, 2, 3, 4, 5, 6]
mylist2 = [1, 2, 3, 4, 5, 6]
mylist3 = [0, 0, 0, 0, 0, 0]
for i in range(0,6):
    mylist3[i] = mylist1[i] * mylist2[i]
mylist3

[1, 4, 9, 16, 25, 36]

In [48]:
list1 = [5,4,3,2]
list2 = [2,3,4,5]
list3 = [0,0,0,0]

for i in range(4):
    list3[i] = list1[i] * list2[i]
list3

[10, 12, 12, 10]
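
For completeness, here is a short sketch of what `*` actually does with lists, plus a more compact alternative to the index-based loop. The variable names are illustrative only.

```python
mylist1 = [1, 2, 3]
mylist2 = [4, 5, 6]

# list * int repeats the list; it does not scale the elements
repeated = mylist1 * 2

# list * list is not defined and raises TypeError
try:
    mylist1 * mylist2
    list_times_list_ok = True
except TypeError:
    list_times_list_ok = False

# element-wise product with zip and a list comprehension
products = [x * y for x, y in zip(mylist1, mylist2)]

print(repeated)            # [1, 2, 3, 1, 2, 3]
print(list_times_list_ok)  # False
print(products)            # [4, 10, 18]
```

The comprehension still runs one multiplication at a time in the interpreter, so it is tidier than the loop but not a substitute for NumPy's vectorized `*`.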

**Let us calculate time to multiply two NumPy arrays of 1 million elements**

In [57]:
import time
size = 1000000
arr1 = np.arange(size)
arr2 = np.arange(size)

# capturing time before the multiplication of NumPy arrays
initialTime = time.time()

# multiply the elements of both NumPy arrays and store the result in another NumPy array
arr3 = arr1 * arr2

# capturing time again after the multiplication is done
finishTime = time.time()

# elapsed time is finish minus start, so the result is positive
final = finishTime - initialTime
print(f'Time started = {initialTime}\nTime ended = {finishTime}\nTime Taken = {final}')

Time started = 1707477319.3876636
Time ended = 1707477319.3916554
Time Taken = 0.003991842269897461


**Let us calculate time to multiply two Python Lists of 1 million elements**

In [58]:
# Creating two large size Lists and multiplying them element by element
list1 = list(range(size))
list2 = list(range(size))
list3 = list(range(size))

# capturing time before the multiplication of Python Lists
initialTime = time.time()

# multiply the elements of both lists and store the result in another list
# simply run a loop and overwrite the elements of the new list with the resulting value
for i in range(0, len(list1)):
    list3[i] = list1[i] * list2[i]
# capturing time again after the multiplication is done
finishTime = time.time()

# elapsed time is finish minus start, so the result is positive
final1 = finishTime - initialTime
print(f'Time started = {initialTime}\nTime ended = {finishTime}\nTime Taken = {final1}')

Time started = 1707477320.8711853
Time ended = 1707477321.2061834
Time Taken = 0.33499813079833984


In [60]:
# speed-up factor = list time / NumPy time
print(f'NumPy array is about {final1/final:.1f} times faster than the Python list')

NumPy array is about 83.9 times faster than the Python list
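
A single `time.time()` measurement can be noisy, especially for operations that finish in a few milliseconds. As a hedged alternative, the sketch below uses the standard-library `timeit` module and keeps the best of several runs. The array size, the repeat counts, and all variable names are choices made for this example, and the measured ratio will vary from machine to machine.

```python
import timeit
import numpy as np

size = 100_000
a = np.arange(size, dtype=np.int64)
b = np.arange(size, dtype=np.int64)
l1 = list(range(size))
l2 = list(range(size))

# best of 3 repeats, each repeat running the operation 10 times
numpy_time = min(timeit.repeat(lambda: a * b, number=10, repeat=3))
list_time = min(timeit.repeat(
    lambda: [x * y for x, y in zip(l1, l2)], number=10, repeat=3))

print(f'NumPy: {numpy_time:.5f} s for 10 runs')
print(f'List : {list_time:.5f} s for 10 runs')
print(f'Speed-up: about {list_time / numpy_time:.1f}x')
```

Taking the minimum of several repeats reduces the influence of background activity on the comparison.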
