# NumPy


NumPy is a powerful Python library used for numerical computing. It stands for "Numerical Python." NumPy provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

Key features of NumPy:

1. **N-dimensional arrays**: NumPy's main data structure is the `ndarray`, which is a multi-dimensional array. It allows you to work with data in various dimensions (1D, 2D, 3D, etc.).

2. **Mathematical functions**: NumPy provides a wide range of mathematical operations such as arithmetic operations, trigonometry, logarithms, exponents, and more, which can be applied element-wise on arrays.

3. **Broadcasting**: NumPy allows operations between arrays with different shapes and sizes by broadcasting the values to perform element-wise computations.

4. **Slicing and Indexing**: You can easily slice and access elements of arrays using various indexing techniques.

5. **Integration with other libraries**: NumPy plays a crucial role in the data science ecosystem and integrates well with other libraries like Pandas, Matplotlib, and SciPy.

Advantages of using NumPy:
- **Reduced Memory Requirements**: NumPy arrays consume less memory than Python lists, which makes them well suited to handling large datasets.
- **Faster Execution**: NumPy's underlying C implementation enables faster execution of mathematical operations, making it well-suited for high-performance computing tasks.
- **Convenience**: NumPy offers a wide range of built-in functions and methods for efficient data manipulation and mathematical computations.
- **Functionality**: NumPy supports a vast array of mathematical and statistical functions that are optimized for large-scale data processing.

## Memory Consumption
NumPy arrays use much less memory than regular Python lists, because the elements are stored as raw fixed-size values rather than as separate Python objects.

In [4]:
import numpy as np
import sys

li_arr = [i for i in range(101)]     # Create list of 101 elements (0 to 100)
np_arr = np.arange(101)              # Create numpy array of 101 elements (0 to 100)

In [5]:
## Size in bytes of the 101 elements in the numpy array
print(np_arr.itemsize * np_arr.size)

404


In [7]:
## Size in bytes of the 101 integer objects in the python list
print(sys.getsizeof(1) * len(li_arr))

2828
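The figures above count only the element payloads. As a rough cross-check, the sketch below compares `ndarray.nbytes` with the size of the list object plus the sizes of the integer objects it references. The `int64` dtype is chosen here only for illustration (the output above came from a platform whose default integer is 4 bytes), and `sys.getsizeof` values are specific to CPython, so the exact list total may differ on your machine.

```python
import sys
import numpy as np

li_arr = list(range(101))
np_arr = np.arange(101, dtype=np.int64)   # dtype fixed for illustration

# Raw data buffer of the array: itemsize * size
array_bytes = np_arr.nbytes

# A list stores pointers; every int is a separate Python object
list_bytes = sys.getsizeof(li_arr) + sum(sys.getsizeof(x) for x in li_arr)

print(array_bytes)   # 808: 101 elements * 8 bytes
print(list_bytes)    # interpreter dependent, but several times larger
```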


## Time Execution
Element-wise operations on NumPy arrays are much faster than the equivalent loops over Python lists.

In [16]:
import time
import numpy as np  

size = 100000

def addition_using_list():
  t1 = time.time()  # current time in seconds since the epoch
  a = range(size)
  b = range(size)
  c = [a[i] + b[i] for i in range(size)]
  t2 = time.time()
  return t2 - t1
  
def addition_using_numpy():
  t1 = time.time()
  a = np.arange(size)
  b = np.arange(size)
  c = a + b
  t2 = time.time()
  return t2 - t1

t_list = addition_using_list()
t_numpy = addition_using_numpy()

print("List = ", t_list * 1000, "milli seconds")   
print("NumPy = ", t_numpy * 1000,"milli seconds")

List =  23.93484115600586 milli seconds
NumPy =  0.0 milli seconds
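The `0.0` reading for NumPy most likely means the addition finished within one tick of `time.time()`, whose resolution is coarse on some platforms; it does not mean the operation took no time. A minimal sketch of a more robust measurement, assuming the standard-library `timeit` module (which uses a high-resolution clock) and the hypothetical helper names `add_with_lists` and `add_with_numpy`:

```python
import timeit
import numpy as np

size = 100_000

def add_with_lists():
    a = list(range(size))
    b = list(range(size))
    return [x + y for x, y in zip(a, b)]

def add_with_numpy():
    a = np.arange(size)
    b = np.arange(size)
    return a + b

runs = 10
# Repeat the measurement 3 times and keep the best average per run
list_ms = min(timeit.repeat(add_with_lists, number=runs, repeat=3)) / runs * 1000
numpy_ms = min(timeit.repeat(add_with_numpy, number=runs, repeat=3)) / runs * 1000

print(f"List  = {list_ms:.3f} ms per run")
print(f"NumPy = {numpy_ms:.3f} ms per run")
```

The absolute numbers depend on the machine; only the ratio between the two is meaningful.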


# Why NumPy Is Faster than Lists

1. **Data Type Consistency**: Python lists can hold elements of different data types, which adds type-checking overhead to every operation. NumPy arrays, on the other hand, are homogeneous and have one fixed data type for all elements, so operations run faster because the data type is known in advance.

2. **Vectorization**: NumPy uses vectorized operations, which means an operation is applied to the entire array at once rather than to individual elements. The loop runs in compiled code, which can take advantage of hardware-level optimizations, and explicit Python loops are no longer needed.

3. **C Implementation**: NumPy is implemented in C and uses low-level optimizations, which make it more efficient in memory usage and execution speed than Python's built-in lists.

4. **Memory Layout**: NumPy arrays have a contiguous memory layout, which allows for efficient cache utilization and better performance during data access and manipulation.

5. **Optimized Algorithms**: NumPy ships optimized routines for many mathematical and numerical operations such as array multiplication, dot products, and statistical calculations. These routines are implemented in compiled code and are significantly faster than equivalent operations written with Python lists.

6. **Parallelism and Multithreading**: Some NumPy operations, notably linear algebra routines that delegate to an optimized BLAS/LAPACK library, can use multiple cores depending on how NumPy was built. Most element-wise operations run on a single thread.

While NumPy is faster for numerical computations, Python lists still have their place, especially for general-purpose tasks or when dealing with a small amount of data. NumPy's real strength shows when you work with large datasets and perform heavy mathematical computations, where its efficient memory usage and optimized algorithms make a significant difference in performance.
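Two of the points above, the single shared data type and the contiguous layout, can be inspected directly on an array. A small sketch using only standard `ndarray` attributes; the variable names are illustrative:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # the ints are upcast so every element shares one dtype

print(ints.dtype.kind)                   # 'i': a signed integer type (width is platform dependent)
print(mixed.dtype)                       # float64
print(ints.flags["C_CONTIGUOUS"])        # True: elements sit next to each other in memory
print(ints.strides == (ints.itemsize,))  # True: one element width between neighbours
```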

# Creating NumPy Arrays

In [21]:
import numpy as np    #Use np as an alias for numpy

## np.array()
NumPy is used to work with arrays. The array object in NumPy is called `ndarray`.

We can create a NumPy `ndarray` object by using the `array()` function, which converts a regular Python list, tuple, or other sequence-like object into a NumPy array.

### Syntax

numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)

### Parameters

- `object` : array_like

An array, any object exposing the array interface, an object whose `__array__` method returns an array, or any (nested) sequence.

- `dtype` : data-type, optional

The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.

- `copy` : bool, optional

If true (default), then the object is copied. Otherwise, a copy will only be made if `__array__` returns a copy, if `object` is a nested sequence, or if a copy is needed to satisfy any of the other requirements (dtype, order, etc.).

In [24]:
a = [1, 2, 3]
b = np.array(a)
print(b)
print(type(b))

[1 2 3]
<class 'numpy.ndarray'>


In [26]:
a = [1, 2, 3, '5', 4.5]
b = np.array(a, dtype = str)
print(b)

['1' '2' '3' '5' '4.5']


In [28]:
a = [1, 2, 3, '5', 4.5]
b = np.array(a * 3)
print(b)

['1' '2' '3' '5' '4.5' '1' '2' '3' '5' '4.5' '1' '2' '3' '5' '4.5']
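Two details in the last examples are easy to miss: without an explicit `dtype`, a list that mixes numbers and strings is converted to an array of strings, and `a * 3` on a Python list repeats the list instead of multiplying its elements. A short sketch of the difference (variable names are illustrative):

```python
import numpy as np

a = [1, 2, 3]

repeated = np.array(a * 3)   # list repetition happens before the array is built
scaled = np.array(a) * 3     # element-wise multiplication on the array

print(repeated)  # [1 2 3 1 2 3 1 2 3]
print(scaled)    # [3 6 9]

mixed = np.array([1, '5', 4.5])
print(mixed.dtype.kind)  # 'U': every element was converted to a Unicode string
```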


## np.ones()
### Syntax

numpy.ones(shape, dtype=None, order='C', *, like=None)

This function returns a new array of given shape and type, filled with ones.

### Parameters

- `shape` : int or sequence of ints

Shape of the new array, e.g., (2, 3) or 2.

- `dtype` : data-type, optional

The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.

- `order` : {‘C’, ‘F’}, optional, default: C

Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

- `like` : array_like, optional

Reference object to allow the creation of arrays which are not NumPy arrays. If an array-like passed in as `like` supports the `__array_function__` protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

In [29]:
b = np.ones(3, dtype = int)
b

array([1, 1, 1])

In [31]:
b = np.ones((2, 3), dtype = int)
b

array([[1, 1, 1],
       [1, 1, 1]])

## np.zeros()
### Syntax

numpy.zeros(shape, dtype=float, order='C', *, like=None)

This function returns a new array of given shape and type, filled with zeros.

### Parameters

- `shape` : int or tuple of ints

Shape of the new array, e.g., (2, 3) or 2.

- `dtype` : data-type, optional

The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.

- `order` : {‘C’, ‘F’}, optional, default: ‘C’

Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

In [34]:
arr = np.zeros(3, dtype=int)
print(arr)

[0 0 0]


In [36]:
b = np.zeros((3, 4))
b

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

## np.full()
### Syntax

numpy.full(shape, fill_value, dtype=None, order='C', *, like=None)

This function returns a new array of given shape and type, filled with fill_value.

### Parameters

- `shape` : int or sequence of ints

Shape of the new array, e.g., (2, 3) or 2.

- `fill_value` : scalar or array_like

Fill value.

- `dtype` : data-type, optional

The desired data-type for the array. The default, None, means np.array(fill_value).dtype.

- `order` : {‘C’, ‘F’}, optional

Whether to store multidimensional data in C- or Fortran-contiguous (row- or column-wise) order in memory.

In [39]:
b = np.full((3, 3), 5, dtype = float)
b

array([[5., 5., 5.],
       [5., 5., 5.],
       [5., 5., 5.]])

## np.empty()
### Syntax

numpy.empty(shape, dtype=float, order='C', *, like=None)

This returns a new array of given shape and type, without initializing entries.

### Parameters

- `shape` : int or tuple of int

Shape of the empty array, e.g., (2, 3) or 2.

- `dtype` : data-type, optional

Desired output data-type for the array, e.g., numpy.int8. Default is numpy.float64.

- `order` : {‘C’, ‘F’}, optional, default: ‘C’

Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

In [43]:
np.empty((2,2))             # uninitialized

array([[6.23042070e-307, 4.67296746e-307],
       [1.69121096e-306, 9.39465068e-312]])

In [45]:
np.empty((2, 2), dtype=int)

array([[1193501079, -408530185],
       [1528765816, 2039645231]])

* Note: `np.empty()` does not initialise the array, so its entries are whatever values happen to be in the allocated memory. The output above is arbitrary and may differ between runs and machines.
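Because the contents are arbitrary, `np.empty()` is only useful when every entry is overwritten before it is read. A minimal sketch of that pattern; the fill value `7.0` is just an example:

```python
import numpy as np

buf = np.empty((2, 3))   # contents are arbitrary at this point
buf[:] = 7.0             # overwrite every entry before reading any of them
print(buf)
```

If the array should start from a known value, `np.zeros()`, `np.ones()` or `np.full()` are the safer choices.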

## np.arange()
### Syntax

numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)

This returns evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is roughly equivalent to the Python built-in range function, but returns an ndarray rather than a range object.

When using a non-integer step, such as 0.1, the results will often not be consistent, because floating point round-off can change the number of elements. It is better to use numpy.linspace for these cases.

### Parameters

- `start` : integer or real, optional

Start of interval. The interval includes this value. The default start value is 0.

- `stop` : integer or real

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

- `step` : integer or real, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a positional argument, start must also be given.

- `dtype` : dtype, optional

The type of the output array. If dtype is not given, infer the data type from the other input arguments.

In [47]:
b = np.arange(10)
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [48]:
b = np.arange(2, 10)
b

array([2, 3, 4, 5, 6, 7, 8, 9])

In [49]:
b = np.arange(2, 20, 2)
b

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])
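The warning about non-integer steps can be seen by comparing `arange()` with `linspace()` on the same interval. This is a sketch; the interval `1` to `1.3` is picked only because round-off shows up there on typical IEEE 754 hardware:

```python
import numpy as np

# Float step: the element count comes from (stop - start) / step, and
# round-off in that division can add an element at (or very near) stop.
by_step = np.arange(1, 1.3, 0.1)

# Fixed count: always exactly 3 samples, with stop excluded on request.
by_count = np.linspace(1, 1.3, 3, endpoint=False)

print(len(by_step), by_step)
print(len(by_count), by_count)
```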

## `np.linspace()`*lineary spaced

### Syntax

```python
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
```

This function returns evenly spaced numbers over a specified interval.

The endpoint of the interval can optionally be excluded.

### Parameters

- `start` : array_like
  The starting value of the sequence.

- `stop` : array_like
  The end value of the sequence, unless the `endpoint` is set to `False`. In that case, the sequence consists of all but the last of `num + 1` evenly spaced samples, so that `stop` is excluded. Note that the step size changes when `endpoint` is `False`.

- `num` : int, optional
  Number of samples to generate. Default is 50. Must be non-negative.

- `endpoint` : bool, optional
  If `True`, `stop` is the last sample. Otherwise, it is not included. Default is `True`.

- `retstep` : bool, optional
  If `True`, return `(samples, step)`, where `step` is the spacing between samples.

- `dtype` : dtype, optional
  The type of the output array. If `dtype` is not given, the data type is inferred from `start` and `stop`. The inferred dtype will never be an integer; float is chosen even if the arguments would produce an array of integers.

- `axis` : int, optional
  The axis in the result to store the samples. Relevant only if `start` or `stop` are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use `-1` to get an axis at the end.

In [52]:
b = np.linspace(2, 10)  # start and stop are required; num defaults to 50
b
print(b[1] - b[0])
print(b[3] - b[2])

0.16326530612244916
0.16326530612244916


* The difference between `np.arange()` and `np.linspace()` is that `arange()` gives us control over the step value, while `linspace()` gives us control over the number of values we want to generate.
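When the step that `linspace()` chose is needed, `retstep=True` returns it alongside the samples. A brief sketch with illustrative variable names:

```python
import numpy as np

samples, step = np.linspace(0, 1, 5, retstep=True)
print(samples)  # [0.   0.25 0.5  0.75 1.  ]
print(step)     # 0.25

# Excluding the endpoint keeps the count at 5 but shrinks the step
half_open, step2 = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(half_open)
print(step2)
```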

## `np.identity()`

### Syntax

```python
numpy.identity(n, dtype=None, *, like=None)
```

This function returns the identity array.

The identity array is a square array with ones on the main diagonal.

### Parameters

- `n`: int
  Number of rows (and columns) in n x n output.

- `dtype`: data-type, optional
  Data-type of the output. Defaults to float.

In [53]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])


## `np.eye()`

### Syntax

```python
numpy.eye(N, M=None, k=0, dtype=float, order='C', *, like=None)
```

This function returns a 2-D array with ones on the diagonal and zeros elsewhere.

### Parameters

- `N` : int

  Number of rows in the output.

- `M` : int, optional

  Number of columns in the output. If None, defaults to N.

- `k` : int, optional

  Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal.

- `dtype` : data-type, optional

  Data-type of the returned array.

- `order` : {'C', 'F'}, optional

  Whether the output should be stored in row-major (C-style) or column-major (Fortran-style) order in memory.

In [55]:
b = np.eye(3, 4)
b

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])
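The `k` parameter is what separates `np.eye()` from `np.identity()`: it shifts the diagonal of ones up or down. A small sketch:

```python
import numpy as np

upper = np.eye(3, k=1, dtype=int)    # ones on the first diagonal above the main one
lower = np.eye(3, k=-1, dtype=int)   # ones on the first diagonal below the main one

print(upper)
print(lower)
print(np.array_equal(np.identity(3), np.eye(3)))  # True
```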

## np.random.rand()

### Syntax

```python
np.random.rand(d0, d1, ..., dn)
```

This function returns random values in a given shape, drawn from a uniform distribution over [0, 1).

### Parameters

- `d0, d1, ..., dn` : `int`, optional

  The dimensions of the returned array, must be non-negative. If no argument is given, a single Python float is returned.

In [58]:
## To generate values in the range [0, 10), we simply multiply by 10
b = np.random.rand(10) * 10    
b

array([9.979374  , 2.43686358, 0.80323096, 6.95300558, 6.61576977,
       6.15138209, 3.11858436, 1.61556468, 3.70819116, 1.46466413])

In [59]:
b = np.random.rand(2, 3)
b

array([[0.98255544, 0.56660948, 0.62306193],
       [0.64905944, 0.48868337, 0.97805953]])

## np.random.randint()

### Syntax

```python
np.random.randint(low, high=None, size=None, dtype=int)
```

This function returns random integers from `low` (inclusive) to `high` (exclusive).

### Parameters

- `low`: `int` or array-like of ints

  Lowest (signed) integers to be drawn from the distribution (unless `high=None`, in which case this parameter is one above the highest such integer).

- `high`: `int` or array-like of ints, optional

  If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if `high=None`). If array-like, must contain integer values.

- `size`: `int` or tuple of ints, optional

  Output shape. If the given shape is, e.g., `(m, n, k)`, then `m * n * k` samples are drawn. Default is `None`, in which case a single value is returned.

- `dtype`: `dtype`, optional

  Desired dtype of the result. Byteorder must be native. The default value is `int`.


In [60]:
np.random.randint(2, size=10)

array([0, 1, 1, 1, 1, 0, 0, 0, 0, 0])

In [61]:
np.random.randint(1, size=10)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [62]:
np.random.randint(5, size=(2, 4))

array([[0, 0, 1, 1],
       [3, 3, 1, 0]])

In [64]:
np.random.randint(1, [3, 5, 10])

array([2, 2, 7])

In [66]:
np.random.randint(1, 10,(4,4))

array([[4, 9, 6, 8],
       [3, 3, 6, 7],
       [2, 6, 6, 8],
       [4, 1, 4, 6]])
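The outputs above change on every run. When reproducible numbers are needed, one option is a seeded generator from `np.random.default_rng()`, whose `integers()` and `random()` methods play the roles of `randint()` and `rand()`. A minimal sketch; the seed `42` is an arbitrary choice for illustration:

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

ints_a = rng_a.integers(1, 10, size=(4, 4))   # low inclusive, high exclusive
ints_b = rng_b.integers(1, 10, size=(4, 4))
floats = rng_a.random(5)                       # uniform floats in [0, 1)

print(np.array_equal(ints_a, ints_b))  # True: same seed, same sequence
print(floats)
```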

## NumPy Array Attributes
Internally, a NumPy array is a block of memory described by 4 attributes:

- data: reference to the first byte (the first element) of the array
- shape: number of elements along each dimension of the array
- dtype: data type of the elements of the array
- strides: number of bytes to skip in each dimension to reach the next element of the array
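The relationship between `shape`, `dtype` and `strides` is easiest to see on a small 2-D array. In this sketch the dtype is fixed to `int32` so the byte counts are the same on every platform:

```python
import numpy as np

grid = np.arange(12, dtype=np.int32).reshape(3, 4)

print(grid.shape)      # (3, 4)
print(grid.dtype)      # int32
print(grid.strides)    # (16, 4): 16 bytes to the next row, 4 bytes to the next column
print(grid.T.strides)  # (4, 16): the transpose reuses the same buffer with swapped strides
```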

## Indexing and Slicing
Since we already know how to index and slice regular Python lists, let us see how NumPy arrays do the same things in comparison.

### 1-D Arrays

In [68]:
import numpy as np

li = [1, 2, 3, 4, 5]
arr = np.array(li)

print(li)
print(arr)

[1, 2, 3, 4, 5]
[1 2 3 4 5]


In [70]:
print(arr.data)
print(arr.shape)
print(arr.dtype)
## strides: number of bytes between two consecutive elements of the numpy array
print(arr.strides)  

<memory at 0x000001BAC9C1F580>
(5,)
int32
(4,)


In [72]:
## Accessing elements
print(li[3])
print(arr[3])

4
4


In [75]:
## Accessing more than one elements
print(li[1:4])
print(arr[1:4])

[2, 3, 4]
[2 3 4]


### 2-D Arrays

In [77]:
li_2d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
arr_2d = np.array(li_2d)

print(li_2d)
print(arr_2d)

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]


In [79]:
print(arr_2d.data)
print(arr_2d.shape)
print(arr_2d.dtype)
print(arr_2d.strides)

<memory at 0x000001BACA0E0E10>
(4, 4)
int32
(16, 4)


In [80]:
## Accessing the elements
print(li_2d[2][1])
print(arr_2d[2][1])
print(arr_2d[2, 1])   # We may use comma also

10
10
10


In [82]:
## Slicing 
print(li_2d[1][:3])
print(arr_2d[1, :3])

[5, 6, 7]
[5 6 7]


* With nested lists, we cannot slice along the column axis directly. Suppose we want the elements of the 1st, 2nd and 3rd rows that belong only to the 3rd column (index 2). We may try this:

In [83]:
print(li_2d[0:3][2])

[9, 10, 11, 12]


In [86]:
# The above does not work, because the slice is evaluated first and [2] then picks its third row:

x = li_2d[0:3]
print(x)
y = x[2]
print(y)

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
[9, 10, 11, 12]


* This is easily possible in numpy arrays :

In [88]:
print(arr_2d[0 : 3, 2])

[ 3  7 11]


In [90]:
print(arr_2d[2:4, 1:3])
print(li_2d[2:4][1:3])

[[10 11]
 [14 15]]
[[13, 14, 15, 16]]
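With plain nested lists, the same selections need a list comprehension that visits each row. The sketch below puts both approaches side by side, using the same `li_2d` data as above:

```python
import numpy as np

li_2d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
arr_2d = np.array(li_2d)

# Third column (index 2) of the first three rows
col_from_list = [row[2] for row in li_2d[0:3]]
col_from_array = arr_2d[0:3, 2]

# 2 x 2 block: rows 2-3, columns 1-2
block_from_list = [row[1:3] for row in li_2d[2:4]]
block_from_array = arr_2d[2:4, 1:3]

print(col_from_list)    # [3, 7, 11]
print(block_from_list)  # [[10, 11], [14, 15]]
```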


## Mathematical Operations
Let's see how NumPy arrays simplify mathematical operations for us.
### Arithmetic Operations

In [91]:
import numpy as np

li = [1, 2, 3, 4, 5]
a = np.random.randint(1, 20, 5)
b = np.random.randint(1, 20, 5)

print(li)
print(a)
print(b)

[1, 2, 3, 4, 5]
[ 5 18 18 18 17]
[ 2 19 15  8  6]


In [92]:
## Adding 1 to each element in list
li = [i+1 for i in li]
li

[2, 3, 4, 5, 6]

In [93]:
## Adding 1 to each element in numpy array
a = a + 1
a

array([ 6, 19, 19, 19, 18])

In [95]:
## Subtracting two numpy arrays
d = a - b
d

array([ 4,  0,  4, 11, 12])

In [96]:
## Multiplying two numpy arrays
e = a * b
e

array([ 12, 361, 285, 152, 108])

In [97]:
## Dividing two numpy arrays
f = a / b
f

array([3.        , 1.        , 1.26666667, 2.375     , 3.        ])

In [98]:
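## Raising a to the power b element-wise
## Results too large for int32 overflow and wrap around, hence the negative values below
h = a ** b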
h

array([         36,  -306639989, -1044074789,  -196306143,    34012224])

In [100]:
## Some misc operations
print(a)
print(a.sum())    ## Sum of all elements
print(a.mean())   ## Mean of all elements
print(a.min())    ## Minimum of all elements
print(a.argmin()) ## Index of minimum of all elements 
print(a.max())    ## Maximum of all elements
print(a.argmax()) ## Index of maximum of all elements 

[ 6 19 19 19 18]
81
16.2
6
0
19
1


## Relational Operators & Logical Operations
Applying a condition to an array returns a boolean array.

In [101]:
print(a)
print(b)

[ 6 19 19 19 18]
[ 2 19 15  8  6]


In [102]:
a > b

array([ True, False,  True,  True,  True])

In [103]:
a < b

array([False, False, False, False, False])

In [104]:
a == b

array([False,  True, False, False, False])

In [106]:
# Logical operations: any non-zero element is treated as True
print(np.logical_or(a, b))
print(np.logical_and(a, b))
print(np.logical_not(a)) #only one argument: unary operator

[ True  True  True  True  True]
[ True  True  True  True  True]
[False False False False False]


In [107]:
a[2] = 0
print(a)
print(np.logical_not(a))

[ 6 19  0 19 18]
[False False  True False False]


## Boolean Indexing
NumPy also permits the use of a boolean-valued array as an index, to perform advanced indexing on an array. In its simplest form, this is an extremely intuitive and elegant method for selecting contents from an array based on logical conditions.

### 1D Arrays

In [109]:
import numpy as np

In [111]:
b = np.random.randint(1, 20, 8)
print(b)

[19  2 19  9  6 12 15  2]


In [112]:
print(b > 10)

[ True False  True False False  True  True False]


In [114]:
bool_arr = b > 10
print(bool_arr)

new_arr = b[bool_arr]
print(new_arr)

[ True False  True False False  True  True False]
[19 19 12 15]


In [116]:
# In shorthand, we may do the following :

new_arr = b[b > 10]
new_arr

array([19, 19, 12, 15])

In [117]:
new_arr = b[(b > 10) & (b < 18)]
new_arr

array([12, 15])

In [118]:
print(b)
c = b   # c is a second name for the same array, not a copy
c

[19  2 19  9  6 12 15  2]


array([19,  2, 19,  9,  6, 12, 15,  2])

In [119]:
c[:3] = 19
print(c)
c[c > 15] = 100
c

[19 19 19  9  6 12 15  2]


array([100, 100, 100,   9,   6,  12,  15,   2])

In [120]:
print(b)
print(b[b == 100])

[100 100 100   9   6  12  15   2]
[100 100 100]
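Note that `b` changed even though only `c` was modified: `c = b` binds a second name to the same array instead of copying it. When an independent array is wanted, `copy()` creates one. A short sketch with illustrative names:

```python
import numpy as np

b = np.array([19, 2, 19, 9, 6, 12, 15, 2])

alias = b                # same array object under a second name
independent = b.copy()   # separate buffer holding the same values

alias[alias > 15] = 100
print(b)                 # [100   2 100   9   6  12  15   2]: changed through the alias

independent[0] = -1
print(b[0])              # 100: the copy does not affect b
print(alias is b, np.shares_memory(independent, b))  # True False
```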


In [123]:
## To get the indices where the element is 100
ind = np.where(b == 100)
ind

(array([0, 1, 2], dtype=int64),)
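`np.where()` returns a tuple with one index array per dimension, which is why the result above is wrapped in parentheses. Called with three arguments, it selects between two values element by element instead. A sketch of both forms:

```python
import numpy as np

b = np.array([100, 100, 100, 9, 6, 12, 15, 2])

(positions,) = np.where(b == 100)   # unpack the 1-tuple to get the index array
capped = np.where(b > 10, 10, b)    # 10 where the condition holds, b elsewhere

print(positions)  # [0 1 2]
print(capped)     # [10 10 10  9  6 10 10  2]
```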

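### 2D Arrays
Boolean indexing on 2D arrays works the same way as on 1D arrays. Selecting with a 2D boolean array returns the matching elements as a 1D array.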

In [124]:
import numpy as np

In [125]:
a = np.random.randint(1, 30, (5, 6))
print(a)

[[ 6 11 19 27  4 26]
 [11 12  9  6  9 24]
 [ 3 21 26 28  3  7]
 [ 9 24 25 18  8 10]
 [26  2  4  7 13 10]]


In [126]:
bool_arr = a > 20
print(bool_arr)

[[False False False  True False  True]
 [False False False False False  True]
 [False  True  True  True False False]
 [False  True  True False False False]
 [ True False False False False False]]


In [127]:
ans = a[bool_arr]
print(ans)

[27 26 24 21 26 28 24 25 26]


In [128]:
b = a   # b refers to the same array as a, so changes made through b also change a
print(a)

[[ 6 11 19 27  4 26]
 [11 12  9  6  9 24]
 [ 3 21 26 28  3  7]
 [ 9 24 25 18  8 10]
 [26  2  4  7 13 10]]


In [129]:
b[bool_arr] = 100
print(b)

[[  6  11  19 100   4 100]
 [ 11  12   9   6   9 100]
 [  3 100 100 100   3   7]
 [  9 100 100  18   8  10]
 [100   2   4   7  13  10]]


In [130]:
c = np.random.randint(1, 10, (2, 2))
print(c)

[[9 5]
 [1 8]]


In [131]:
c_bool = np.array([[True, False], [False, True], [True, True]])
print(c_bool)

[[ True False]
 [False  True]
 [ True  True]]


In [134]:
print(c[c_bool])   
## Error: the boolean array has shape (3, 2) but c has shape (2, 2); the shapes must match

IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 3

In [135]:
print(b)

[[  6  11  19 100   4 100]
 [ 11  12   9   6   9 100]
 [  3 100 100 100   3   7]
 [  9 100 100  18   8  10]
 [100   2   4   7  13  10]]


In [136]:
bool_arr = b[:, 3] == 100
print(bool_arr)

[ True False  True False False]


In [138]:
b[bool_arr, 3] = 99
print(b)

[[  6  11  19  99   4 100]
 [ 11  12   9   6   9 100]
 [  3 100 100  99   3   7]
 [  9 100 100  18   8  10]
 [100   2   4   7  13  10]]


# NumPy Broadcasting

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

Arrays are compatible for element-wise operations only if, along every dimension, their sizes are equal or one of them is 1:

- a = 3 x 2, b = 3 x 2: compatible
- a = 3 x 3, b = 1 x 3: compatible; broadcasting happens here
- a = 3 x 3, b = 3 x 2: not compatible

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

In [139]:
import numpy as np

In [141]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

array([2., 4., 6.])

NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [143]:
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

array([2., 4., 6.])

The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. The stretching analogy is only conceptual. NumPy is smart enough to use the original scalar value without actually making copies so that broadcasting operations are as memory and computationally efficient as possible.

The code in the second example is more efficient than that in the first because broadcasting moves less memory around during the multiplication (b is a scalar rather than an array).

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

- they are equal, or
- one of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.
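The rule can be checked without building any arrays by using `np.broadcast_shapes()`, which applies the same comparison to shape tuples. This sketch assumes NumPy 1.20 or newer, where that function is available:

```python
import numpy as np

print(np.broadcast_shapes((3, 3), (3,)))       # (3, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))     # (3, 4)
print(np.broadcast_shapes((5, 3, 1), (3, 4)))  # (5, 3, 4)

try:
    np.broadcast_shapes((3, 2), (2, 3))
    compatible = True
except ValueError:
    compatible = False

print(compatible)  # False: the last dimensions are 2 and 3, and neither is 1
```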

In [144]:
x = np.random.randint(1, 10, (3, 3))
y = np.random.randint(1, 10, (3, 3))
print(x)
print(y)

[[4 1 4]
 [3 9 5]
 [1 6 4]]
[[4 7 5]
 [5 5 2]
 [1 2 7]]


In [145]:
ans = x - y
print(ans)

[[ 0 -6 -1]
 [-2  4  3]
 [ 0  4 -3]]


In [146]:
x = np.random.randint(1, 10, (3, 3))
y = np.random.randint(1, 10, (3))
print(x)
print(y)

[[5 8 9]
 [7 9 5]
 [4 4 7]]
[4 9 2]


In [147]:
ans = x - y
print(ans)

[[ 1 -1  7]
 [ 3  0  3]
 [ 0 -5  5]]


In [152]:
x = np.random.randint(1, 10, (3, 2))
y = np.random.randint(1, 10, (2, 3))
print(x)
print(y)

[[2 5]
 [2 7]
 [5 2]]
[[1 7 2]
 [5 4 7]]


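## Transpose
The shapes (3, 2) and (2, 3) above are not broadcast-compatible, so `x - y` would raise a ValueError. To make the two arrays compatible, we may transpose one of them.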

In [153]:
y = np.transpose(y)
print(y)
ans = x - y
print(ans)

[[1 5]
 [7 4]
 [2 7]]
[[ 1  0]
 [-5  3]
 [ 3 -5]]
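The same steps can be written with the `.T` attribute, which is shorthand for `np.transpose()`. The sketch below reuses the values printed above, shows the failure that the transpose avoids, and checks that the transpose is a view on the same memory rather than a copy:

```python
import numpy as np

x = np.array([[2, 5], [2, 7], [5, 2]])   # shape (3, 2)
y = np.array([[1, 7, 2], [5, 4, 7]])     # shape (2, 3)

try:
    x - y
    failed = False
except ValueError:
    failed = True

diff = x - y.T

print(failed)                     # True
print(diff.shape)                 # (3, 2)
print(np.shares_memory(y, y.T))   # True: no data was copied
```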


## Reshape
We may also reshape our array.

In [155]:
x = np.arange(16)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [157]:
y = np.random.randint(1, 10, (4, 4))
y

array([[2, 6, 8, 7],
       [6, 4, 5, 3],
       [5, 2, 1, 4],
       [4, 9, 9, 6]])

In [158]:
x * y    ## Error generated

ValueError: operands could not be broadcast together with shapes (16,) (4,4) 

In [160]:
x = np.reshape(x, (4, 4))
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Now we may multiply, subtract, add or divide 'x' and 'y'.

In [161]:
x * y

array([[  0,   6,  16,  21],
       [ 24,  20,  30,  21],
       [ 40,  18,  10,  44],
       [ 48, 117, 126,  90]])

In [162]:
x - y

array([[-2, -5, -6, -4],
       [-2,  1,  1,  4],
       [ 3,  7,  9,  7],
       [ 8,  4,  5,  9]])

In [163]:
x + y

array([[ 2,  7, 10, 10],
       [10,  9, 11, 10],
       [13, 11, 11, 15],
       [16, 22, 23, 21]])

In [164]:
x / y

array([[ 0.        ,  0.16666667,  0.25      ,  0.42857143],
       [ 0.66666667,  1.25      ,  1.2       ,  2.33333333],
       [ 1.6       ,  4.5       , 10.        ,  2.75      ],
       [ 3.        ,  1.44444444,  1.55555556,  2.5       ]])

In [165]:
x // y 

array([[ 0,  0,  0,  0],
       [ 0,  1,  1,  2],
       [ 1,  4, 10,  2],
       [ 3,  1,  1,  2]])

In [167]:
x % y

array([[0, 1, 2, 3],
       [4, 1, 1, 1],
       [3, 1, 0, 3],
       [0, 4, 5, 3]])
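When only one dimension of the target shape is known, `-1` lets NumPy compute the other from the number of elements. A brief sketch using the array method form of reshape:

```python
import numpy as np

x = np.arange(16)

square = x.reshape(4, -1)    # -1 is worked out as 16 / 4 = 4
wide = x.reshape(2, -1)      # 16 / 2 = 8 columns
flat = square.reshape(-1)    # back to one dimension

print(square.shape, wide.shape, flat.shape)  # (4, 4) (2, 8) (16,)
```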

## NumPy 2D array
Problem Statement:

Given a 2D list, create a NumPy 2D array from it.

Note: The given 2D list is [[1, 2, 3], [4, 5, 6], [7, 8, 9]]. Print the NumPy array.

In [171]:
import numpy as np

In [174]:
li = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # the list given in the problem statement

b = np.array(li)

print(b)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


## NumPy 0s and 1s
Problem Statement:

Create an integer array of size 10 in which every value is 0 except the fifth, which should be 1. Print the elements of the array.

In [179]:
a= np.zeros(10,dtype=int)
a[4] = 1    # the fifth value has index 4
print(a)
print(*a)

[0 0 0 0 1 0 0 0 0 0]
0 0 0 0 1 0 0 0 0 0


## NumPy Inclusive
Problem Statement:

Create an array with values ranging consecutively from 9 to 49 (both inclusive). Print the Numpy array.

In [181]:
a = np.arange(9,50)
print(a)

[ 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]


## Identity Matrix
Problem Statement:

Create a matrix of size (5, 6) with 1 on the diagonal and 0 everywhere else. Print the NumPy array.

In [186]:
a = np.eye(5, 6, dtype=int)
print(a)

[[1 0 0 0 0 0]
 [0 1 0 0 0 0]
 [0 0 1 0 0 0]
 [0 0 0 1 0 0]
 [0 0 0 0 1 0]]


## Cut the Rope
Problem Statement:

You are given a rope of length 5m. Cut the rope into 9 parts of equal length. Note: The array elements are the points that mark the ends of the parts (including both ends of the rope), rounded to 2 decimal places. Print the array elements.

In [194]:
arr = np.linspace(0,5,10)
for i in range(0, len(arr)):
    print(round(arr[i],2))

0.0
0.56
1.11
1.67
2.22
2.78
3.33
3.89
4.44
5.0
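The rounding can also be done on the whole array at once with `np.round()`. Of the 10 points, the first and last are the ends of the rope, so the 8 points in between are where the cuts fall. A small sketch:

```python
import numpy as np

points = np.round(np.linspace(0, 5, 10), 2)   # 10 points bound 9 equal parts
cuts = points[1:-1]                            # interior points only

print(points)
print(len(cuts))  # 8
```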


## Print Elements
Problem Statement:

Given a 2D integer array of size (4, 5) named input_, apply slicing on it to print selected rows, columns and sub-blocks.

In [199]:
import numpy as np
input_ = np.arange(1,21,1)
input_ = input_.reshape(4,5)
print(input_)

print(*input_[2,:3])

print(*input_[1:4,3])

c = input_[2:4,:5]
print(*c[0],end=' ')
print(*c[1],end=' ')
print()

c = input_[1:3,1:3]
print(*c[0],end=' ')
print(*c[1],end=' ')

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]
11 12 13
9 14 19
11 12 13 14 15 16 17 18 19 20 
7 8 12 13 

## Non-Zero elements
Problem Statement:

Find the indices of the non-zero elements of the array [1, 2, 0, 0, 4, 0] and print them.

In [216]:
import numpy as np
arr = np.array([1,2,0,0,4,0])
ind = np.where(arr != 0)

for i in ind:
    print(i)

print(ind)

[0 1 4]
(array([0, 1, 4], dtype=int64),)
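NumPy also has dedicated functions for this task. `np.nonzero()` returns the same tuple as `np.where(arr != 0)`, and `np.flatnonzero()` returns the index array directly. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 0, 0, 4, 0])

(ind,) = np.nonzero(arr)         # tuple with one index array per dimension
flat_ind = np.flatnonzero(arr)   # index array without the tuple

print(ind)                       # [0 1 4]
print(flat_ind)                  # [0 1 4]
print(np.count_nonzero(arr))     # 3
```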


## Multiples of 3
Problem Statement:

Given an integer array of size 10, print the indices of the elements which are multiples of 3. Note: Generate the following array

array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

In [221]:
a = np.array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

# Or generate the same array with arange
a = np.arange(1,20,2)

indexes = np.where(a%3 == 0)
indexes

(array([1, 4, 7], dtype=int64),)

## Odd elements
Problem Statement:

Given an integer array of size 10, replace the odd numbers in the NumPy array with -1. Note: Generate the following array

array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [226]:
a = np.arange(1,11)
a[a % 2 == 1] = -1   # odd numbers leave remainder 1 when divided by 2
a

array([-1,  2, -1,  4, -1,  6, -1,  8, -1, 10])

## Replace Max
Problem Statement:

Given an integer array of size 9, replace the first occurrence of the maximum value with 0. Note: Generate the following array

array([11, 2, 13, 4, 15, 6, 27, 8, 19])

In [237]:
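a = np.array([11, 2, 13, 4, 15, 6, 27, 8, 19])   # the array given in the problem statement
print(a)
a[a.argmax()] = 0   # argmax returns the index of the first occurrence of the maximum
a

[11  2 13  4 15  6 27  8 19]


array([11,  2, 13,  4, 15,  6,  0,  8, 19])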

## Negate Elements
Problem Statement:

Given a 1D array, negate all elements which are between 3 and 8 (both inclusive). Note: Generate the following array

array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [240]:
arr = np.arange(1,11)
mask = (arr >= 3) & (arr <= 8)   # select by value, so it also works on unsorted arrays
arr[mask] *= -1
arr

array([ 1,  2, -3, -4, -5, -6, -7, -8,  9, 10])

## Killed in USA

In [246]:
import numpy as np
import csv

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)

    killed = []
    country = []
    for row in data:
        killed.append(row['Killed'])
        country.append(row['Country'])

    np_killed = np.array(killed)
    np_country = np.array(country)
    
    np_killed[np_killed == ''] = '0.0'
    np_killed = np.array(np_killed,dtype=float)
    
    np_country = np.array(country)
    indexes = np.where(np_country == "United States")
    ans = np_killed[indexes]
    
    # no_of_killed = 0
    # for i in ans:
    #     no_of_killed+=i
    
    no_of_killed = np.sum(ans)
    print(no_of_killed)

3771.0
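
The core of the cell above is three steps: replace empty strings, convert to float, and sum the rows selected by country. The sketch below reproduces those steps on a tiny made-up sample instead of `terrorismData.csv`; the values in `killed` and `country` are invented for illustration and are not taken from the dataset.

```python
import numpy as np

# Hypothetical sample standing in for the 'Killed' and 'Country' columns
killed = np.array(['2', '', '0', '5'])
country = np.array(['United States', 'India', 'United States', 'United States'])

# Replace empty strings with '0', then convert the whole array to float
killed_num = np.where(killed == '', '0', killed).astype(float)

# A boolean mask can index the array directly, without calling np.where first
total = killed_num[country == 'United States'].sum()
print(total)
```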


## Casualties in India vs Casualties over the World

In [253]:
import numpy as np
import csv

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)

    killed = []
    wounded = []
    country = []
    for row in data:
        killed.append(row['Killed'])
        country.append(row['Country'])
        wounded.append(row['Wounded'])

    np_killed = np.array(killed)
    np_country = np.array(country)
    np_wounded = np.array(wounded)
    
    np_killed[np_killed == ''] = '0.0'
    np_wounded[np_wounded == ''] = '0.0' #handling empty strings
    np_killed = np.array(np_killed,dtype=float)
    np_wounded = np.array(np_wounded,dtype=float)

    np_casualities = np_killed + np_wounded
    np_country = np.array(country)
    
    indexes = np.where(np_country == "India")
    ans = np_casualities[indexes]

    total_casualities_india = np.sum(ans)
    total_casualities_over_world = np.sum(np_casualities)
    
    print("India Casualities",int(total_casualities_india))
    print("World Casualities",int(total_casualities_over_world))

India Casualities 48321
World Casualities 935737


## Number of attacks between day 10 and day 20

In [254]:
import numpy as np
import csv

days= []

with open('year2017-7767.csv','r',encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    for row in data:
        days.append(row['Day'])

np_days= np.array(days,dtype = float)
num_attacks = np_days[(np_days>=10) & (np_days<=20)]

print(len(num_attacks))

print(num_attacks)

4002
[10. 10. 10. ... 20. 20. 20.]
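
When only the count is needed, the matching rows do not have to be extracted first: counting the `True` entries of the mask is enough. A minimal sketch on a made-up `days` array (not data from the CSV file):

```python
import numpy as np

# Hypothetical day-of-month values
days = np.array([1, 10, 15, 20, 21, 12, 0, 31], dtype=float)

# Combine both conditions element-wise; the parentheses are required
# because & binds more tightly than >= and <=
mask = (days >= 10) & (days <= 20)

count = np.count_nonzero(mask)   # number of True entries
print(count)
```

`mask.sum()` gives the same number, since `True` counts as 1.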


## Number of attacks between 1 Jan 2010 and 31 Jan 2010

In [263]:
import numpy as np
import csv

years= []
months= []
days= []

with open('terrorismData.csv','r',encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    for row in data:
        days.append(row['Day'])
        months.append(row['Month'])
        years.append(row['Year'])

np_days= np.array(days,dtype = float)
np_months= np.array(months,dtype = float)
np_years= np.array(years,dtype = float)

bool_year = (np_years == 2010)
bool_month= ((bool_year) & (np_months==1))
bool_days = bool_month & ((np_days>=1) & (np_days<=31))
ans= np_days[bool_days]

print(len(ans))

271


In [264]:
#alternate solution
import numpy as np
import csv

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)
    
    days = []
    months = []
    year = []
    for row in data:
        days.append(row['Day'])
        months.append(row['Month'])
        year.append(row['Year'])
        
    np_day = np.array(days, dtype=float)
    np_month = np.array(months, dtype=float)
    np_year = np.array(year, dtype=float)
    
    np_day = np_day[(np_month==1) & (np_year==2010)]
    
    print(len(np_day[np_day!=0]))

271


In [267]:
import numpy as np
import csv

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)
    
    city = []
    casualty = []
    for row in data:
        if row["State"] == "Jammu and Kashmir" and row["Year"] == "1999" :
            if row["Month"] =="5" or row["Month"] == "6" or row["Month"] == "7" :
                casualty.append([row["Killed"],row["Wounded"]])
                city.append([row["City"],row['Group']])
                
    np_city = np.array(city)
    np_casualty = np.array(casualty)
    
    np_casualty[np_casualty==''] = "0.0"
    np_casualty = np.array(np_casualty, dtype = float)
    
    np_casualty = np.sum(np_casualty, axis=1)
    index = np.argmax(np_casualty)
    print(int(np_casualty[index]),np_city[index][0],np_city[index][1])

22 Kargil District Separatists


In [268]:
import numpy as np
import csv

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)
    
    casualty = []
    for row in data:
        if row['State']=='Chhattisgarh' or row['State']=='Odisha' or row['State']=='Jharkhand' or row['State']=='Andhra Pradesh':
                casualty.append([row["Killed"],row["Wounded"]])

    np_casualty = np.array(casualty)
    
    np_casualty[np_casualty==''] = "0.0"
    np_casualty = np.array(np_casualty, dtype = float)
    
    np_casualty = np.sum(np_casualty, axis=1)
    total_casualty = np.sum(np_casualty)
    print(int(total_casualty))

5628


The `np.sum(np_casualty, axis=1)` call in the cell above assumes that `np_casualty` is a 2D NumPy array in which each row holds the casualties of one incident in the form [killed, wounded].

With `axis=1`, `np.sum()` adds the values within each row, so it returns one total per incident.

The result is assigned back to `np_casualty`, so the original 2D array is replaced by a 1D array containing the sum of each row.
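
The effect of the `axis` argument can be seen on a small made-up array with the same [killed, wounded] layout; the numbers below are illustrative only.

```python
import numpy as np

# Each row is one hypothetical incident: [killed, wounded]
casualty = np.array([[2.0, 5.0],
                     [0.0, 1.0],
                     [3.0, 0.0]])

per_row = casualty.sum(axis=1)   # one total per incident
per_col = casualty.sum(axis=0)   # total killed and total wounded
total = casualty.sum()           # single grand total

print(per_row)
print(per_col)
print(total)
```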

In [269]:
import numpy as np
import csv

def convert_to_float(value):
    if value.strip() == "":
        return 0.0
    return float(value)

with open('terrorismData.csv', encoding="utf8") as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    killed = []
    wounded = []
    for row in data:
        if row['State'] in ['Chhattisgarh', 'Odisha', 'Jharkhand', 'Andhra Pradesh']:
            killed.append(convert_to_float(row["Killed"]))
            wounded.append(convert_to_float(row["Wounded"]))

    np_killed = np.array(killed)
    np_wounded = np.array(wounded)

    total_casualty = np.sum(np_killed + np_wounded)
    print(int(total_casualty))


5628


In [272]:
import numpy as np
import csv
from collections import Counter

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)
    
    city = []
    casualty = []
    for row in data:
         if row["Country"] == 'India' and row['City'] != 'Unknown':
                casualty.append([row["Killed"],row["Wounded"]])
                city.append(row["City"])
                
    np_city = np.array(city)
    np_casualty = np.array(casualty)
    
    np_casualty[np_casualty==''] = "0.0"
    np_casualty = np.array(np_casualty, dtype = float)
    np_casualty = np.sum(np_casualty, axis=1)
    
    dic = {}
    for i in range(len(np_city)):
        if np_city[i] in dic:
            dic[np_city[i]] += np_casualty[i]
        else:
            dic[np_city[i]] = np_casualty[i]

    k = Counter(dic) 
    high = k.most_common(5) 
    for i in high: 
        print(i[0],int(i[1]))

Srinagar 3134
New Delhi 2095
Mumbai 2016
Jammu 1119
Guwahati 822
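
The dictionary loop above can also be expressed with NumPy alone. One possible sketch, assuming `np.unique` with `return_inverse=True` to label each row with a group index and `np.bincount` with `weights` to sum per group; the city names and numbers are placeholders, not values from the dataset.

```python
import numpy as np

# Placeholder data: one entry per incident
city = np.array(['A', 'B', 'A', 'C', 'B', 'A'])
casualty = np.array([3.0, 6.0, 4.0, 1.0, 2.0, 5.0])

# names: sorted unique cities; inverse: index into names for every row
names, inverse = np.unique(city, return_inverse=True)

# Sum the casualties of all rows that share the same group index
totals = np.bincount(inverse, weights=casualty)

# Indices of the two largest totals, largest first
order = np.argsort(totals)[::-1][:2]
top = [(str(names[i]), int(totals[i])) for i in order]
print(top)
```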


## Terrorism Frequent Day

In [278]:
# Terrorism Frequent Day
# Find the most frequent day of attack in the terrorism dataset
# Note: np.unique can be used
import numpy as np
import csv

with open('terrorismData.csv', encoding="utf8") as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    attack_days = []
    for row in data:
        attack_days.append(row["Day"])

    np_attack_days = np.array(attack_days, dtype=int)

# Get unique days and their corresponding counts
unique_days, day_counts = np.unique(np_attack_days, return_counts=True)

# Find the day with the highest count (most frequent day)
most_frequent_day = unique_days[np.argmax(day_counts)]
most_frequent_day_count = np.max(day_counts)

print("The most frequent day of attacks in the terrorism dataset is:", most_frequent_day)
print("Number of attacks on the most frequent day:", most_frequent_day_count)


The most frequent day of attacks in the terrorism dataset is: 15
Number of attacks on the most frequent day: 6500


In [279]:
import numpy as np
import csv

with open('terrorismData.csv',encoding="utf8") as file_obj :
    data = csv.DictReader(file_obj,skipinitialspace=True)
    
    day = []
    for row in data:
        day.append(row['Day'])
        
    np_day = np.array(day, dtype='int')
    day, count = np.unique(np_day, return_counts=True)
    index = np.argmax(count)
    print(day[index], count[index])

15 6500
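
Because days are small non-negative integers, `np.bincount` is another way to count them: position `d` of the result holds the number of times day `d` occurs. The sketch below uses a made-up `days` array rather than the CSV file.

```python
import numpy as np

# Hypothetical day-of-month values
days = np.array([15, 1, 15, 0, 28, 15, 1])

counts = np.bincount(days)        # counts[d] = occurrences of day d
most_frequent = counts.argmax()   # first day with the highest count

print(most_frequent, counts[most_frequent])
```

Unlike `np.unique`, `np.bincount` only accepts non-negative integers, so the array has to be converted with an integer dtype first.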
