# NumPy - Part 2

API documentation: https://numpy.org/doc/stable/reference/index.html

## Using NumPy

Once you've installed NumPy you can import it as a library:

In [1]:
import numpy as np

## Fast Sorting in NumPy: ``np.sort`` and ``np.argsort``

To return a sorted version of the array without modifying the input, you can use ``np.sort``:

In [4]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x) # without changing original array

array([1, 2, 3, 4, 5])

If you prefer to sort the array in-place, you can instead use the ``sort`` method of arrays:

In [5]:
x.sort()
print(x)

[1 2 3 4 5]


A related function is ``argsort``, which returns the *indices* of the sorted elements.

``np.argsort`` performs an indirect sort along the given axis using the algorithm specified by the ``kind`` keyword. It returns an array of indices, with the same shape as the input, that would sort the array.

In [6]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

[1 0 3 2 4]


The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on.
These indices can then be used (via fancy indexing) to construct the sorted array if desired:

In [7]:
x[i]

array([1, 2, 3, 4, 5])
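Because ``argsort`` returns indices rather than values, the same indices can reorder a second, parallel array. The sketch below illustrates this; the ``names`` and ``scores`` arrays are made-up data, not part of the notebook.

```python
import numpy as np

# Hypothetical data: names and scores are made up for illustration.
names = np.array(["Ann", "Bob", "Cid", "Dee"])
scores = np.array([72, 91, 65, 88])

order = np.argsort(scores)       # indices that sort scores in ascending order
names_by_score = names[order]    # reorder names with the same indices
top_down = names[order[::-1]]    # reversed indices give descending order

print(names_by_score)  # ['Cid' 'Ann' 'Dee' 'Bob']
print(top_down)        # ['Bob' 'Dee' 'Ann' 'Cid']
```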

### Sorting along rows or columns

In [11]:
np.random.seed(42)  # create the same pseudo-random numbers each time (seed() returns None, so there is nothing to assign)

A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the ``axis`` argument. For example:

In [12]:
X = np.random.randint(0, 10, (4, 6))  # random.randint(low, high=None, size=None, dtype=int); 4 rows and 6 columns, integers from 0 to 9 (high is exclusive)
# X = np.random.randint(0, 10, size=50)  # for a 1-dimensional array
X

array([[6, 3, 7, 4, 6, 9],
       [2, 6, 7, 4, 3, 7],
       [7, 2, 5, 4, 1, 7],
       [5, 1, 4, 0, 9, 5]])

In [13]:
# sort each column of X
np.sort(X, axis=0)  # sorts the values within each column without modifying the original matrix

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [15]:
# sort each row of X
np.sort(X, axis=1)  # sorts the values within each row

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

Keep in mind that this treats each row or column as an independent array, and any relationships between the row or column values will be lost!
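When rows must stay intact, one common alternative is to compute a row order with ``argsort`` on a single column and index the whole array with it. The following is a minimal sketch; the ``table`` values are hypothetical.

```python
import numpy as np

# Hypothetical table: each row is (id, score); the values are made up.
table = np.array([[3, 50],
                  [1, 90],
                  [2, 70]])

# np.sort(axis=0) sorts each column separately, so ids and scores get mixed up.
independent = np.sort(table, axis=0)

# argsort on one column gives a row order that keeps each row together.
row_order = np.argsort(table[:, 1])
by_score = table[row_order]

print(independent)  # rows no longer match the original (id, score) pairs
print(by_score)     # original rows, ordered by the score column
```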

## Working with Boolean Arrays

Given a Boolean array, there are a host of useful operations you can do.
We'll work with ``x``, a two-dimensional array:

In [17]:
x = np.array([[3,  1,  8],  [4,  2,  7],  [9, 6, 10]])
x

array([[ 3,  1,  8],
       [ 4,  2,  7],
       [ 9,  6, 10]])

In [19]:
x < 6

array([[ True,  True, False],
       [ True,  True, False],
       [False, False, False]])

### Counting entries

To count the number of ``True`` entries in a Boolean array, ``np.count_nonzero`` is useful:

In [20]:
# how many values less than 6?
np.count_nonzero(x < 6)

4

We see that there are four array entries that are less than 6.
Another way to get at this information is to use ``np.sum``; in this case, ``False`` is interpreted as ``0``, and ``True`` is interpreted as ``1``:

In [22]:
np.sum(x < 6)

4

The benefit of ``np.sum`` is that, as with other NumPy aggregation functions, the summation can also be done along rows or columns:

In [27]:
# how many values less than 4 in each row?
np.sum(x < 4, axis=1)

array([2, 1, 0])

In [28]:
# how many values less than 4 in each column?
np.sum(x < 4, axis=0)

array([1, 2, 0])

With ``axis=1`` this counts the values less than 4 in each row of the matrix; with ``axis=0`` it counts them in each column.
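Since ``True`` counts as ``1``, the same Boolean mask also gives a fraction when passed to ``np.mean``. A short sketch using the ``x`` array from this section (the variable names ``mask``, ``per_row``, ``per_col`` and ``fraction`` are chosen here for illustration):

```python
import numpy as np

x = np.array([[3, 1, 8], [4, 2, 7], [9, 6, 10]])
mask = x < 4                      # Boolean array, True where the value is below 4

per_row = np.sum(mask, axis=1)    # collapse the columns: one count per row
per_col = np.sum(mask, axis=0)    # collapse the rows: one count per column
fraction = np.mean(mask)          # share of all entries that are below 4

print(per_row, per_col, fraction)
```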

If we're interested in quickly checking whether any or all the values are true, we can use (you guessed it) ``np.any`` or ``np.all``:

In [29]:
# are there any values greater than 8?
np.any(x > 8)

True

In [30]:
# are there any values less than zero?
np.any(x < 0)

False

In [31]:
# are all values less than 10?
np.all(x < 10)

False

In [32]:
# are all values equal to 6?
np.all(x == 6)

False

``np.all`` and ``np.any`` can be used along particular axes as well. For example:

In [33]:
# are all values in each row less than 8?
np.all(x < 8, axis=1)

array([False,  True, False])

Here all the elements in the second row are less than 8, while this is not the case for the first and third rows.

Finally, a quick warning: Python has built-in ``sum()``, ``any()``, and ``all()`` functions. These have a different syntax than the NumPy versions, and in particular will fail or produce unintended results when used on multidimensional arrays. Be sure that you are using ``np.sum()``, ``np.any()``, and ``np.all()`` for these examples!
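The difference can be seen on the ``x`` array used above. This is an illustrative sketch; the ``try``/``except`` is only there to capture the error that the built-in ``any()`` raises on a two-dimensional array.

```python
import numpy as np

x = np.array([[3, 1, 8], [4, 2, 7], [9, 6, 10]])

total_np = np.sum(x)       # sums every element of the array
total_builtin = sum(x)     # iterates over the rows and adds them element-wise

try:
    any(x > 8)             # each item is a whole row, whose truth value is ambiguous
    builtin_any_failed = False
except ValueError:
    builtin_any_failed = True

print(total_np)            # 50
print(total_builtin)       # [16  9 25]
print(builtin_any_failed)  # True
```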

### Multiplying Arrays

In [34]:
np.dot(3,4)

12

In [35]:
np.dot([2,3],[3,4]) # 2*3 + 3*4

18

In [36]:
np.array([2,3]) * np.array([5,4]) #Element wise multiply

array([10, 12])

Matrix multiplication works with both the ``@`` operator and the ``np.dot()`` function:

In [41]:
A = np.array([[3,2],[0,1]])
B = np.array([[3,1],[2,1]])
print("A@B=\n", A@B)
print("np.dot(A,B)=\n",np.dot(A,B))
print("A*B \n",A*B) # multiplies element-wise, by position in the matrix

A@B=
 [[13  5]
 [ 2  1]]
np.dot(A,B)=
 [[13  5]
 [ 2  1]]
A*B 
 [[9 2]
 [0 1]]


In [42]:
A

array([[3, 2],
       [0, 1]])

In [43]:
B

array([[3, 1],
       [2, 1]])
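Two properties of matrix multiplication are worth checking on the same ``A`` and ``B``: the order of the operands matters, and ``@`` also handles a matrix times a one-dimensional vector. The vector ``v`` below is an extra example value, not part of the notebook.

```python
import numpy as np

A = np.array([[3, 2], [0, 1]])
B = np.array([[3, 1], [2, 1]])
v = np.array([1, 2])   # hypothetical vector added for illustration

AB = A @ B             # rows of A combined with columns of B
BA = B @ A             # swapping the operands gives a different matrix
Av = A @ v             # matrix times 1-D vector returns a 1-D vector

print(AB)
print(BA)
print(Av)
```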

# Exercises 

Now that we've learned about NumPy let's test your knowledge. We'll start off with a few simple tasks, and then you'll be asked some more complicated questions.

In [45]:
mat = np.arange(1,26).reshape(5,5)
mat

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

#### Get the sum of all the values in mat

In [46]:
mat.sum()

325

#### Get the standard deviation of the values in mat. Consult the NumPy API documentation if you don't remember the function

In [47]:
np.std(mat)

7.211102550927978

#### Get the sum of all the columns in mat

In [48]:
mat.sum(axis=0)

array([55, 60, 65, 70, 75])

**Using the mass_numbers and isotopic_abundances arrays below, evaluate the average mass number for tin.**



In [50]:
mass_numbers = np.array([112, 114, 115, 116, 117, 118, 119, 120, 122, 124])
isotopic_abundances = np.array([0.0097, 0.0066, 0.0034, 0.1454, 0.0768, 0.2422, 0.0859, 0.3258, 0.0463, 0.0579])
np.dot(mass_numbers,isotopic_abundances)



118.8077
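The dot product works here because the abundances sum to 1. As an alternative sketch, ``np.average`` with its ``weights`` argument divides by the sum of the weights, so it gives the same result even if the abundances were not normalized.

```python
import numpy as np

mass_numbers = np.array([112, 114, 115, 116, 117, 118, 119, 120, 122, 124])
isotopic_abundances = np.array([0.0097, 0.0066, 0.0034, 0.1454, 0.0768,
                                0.2422, 0.0859, 0.3258, 0.0463, 0.0579])

# Weighted mean: sum(mass * abundance) / sum(abundance)
avg_mass = np.average(mass_numbers, weights=isotopic_abundances)
print(round(avg_mass, 4))
```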

# Loading a text file by using NumPy's loadtxt method

Each row in the text file must have the same number of values.

Load data from a text file.

**numpy.loadtxt**
(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None, *, like=None)
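To try the main parameters without a file on disk, ``np.loadtxt`` also accepts a file-like object. The sketch below uses ``io.StringIO`` with made-up contents (two header lines followed by three hypothetical records); it is not the course data file.

```python
import io
import numpy as np

# Hypothetical file contents standing in for a real CSV file.
text = io.StringIO(
    "Student data\n"
    "Subject,Gender,Height\n"
    "S-1,M,1.82\n"
    "S-2,F,1.68\n"
    "S-3,F,1.60\n"
)

# skiprows=2 skips the two header lines; dtype=str keeps every field as text.
records = np.loadtxt(text, delimiter=",", skiprows=2, dtype=str)
heights = records[:, 2].astype(float)   # convert one column to numbers

# usecols reads only the selected column, here directly as floats.
text.seek(0)
heights_only = np.loadtxt(text, delimiter=",", skiprows=2, usecols=2)

print(records.shape)
print(heights)
```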

In [52]:
a=np.loadtxt(r"eg6-a-student-data.csv", delimiter=",",skiprows=2, dtype='str' )
a
#(Subject,Gender,DOB,Height,Weight,BP,VO2max)

array([['JW-1', 'M', '19/12/1995', '1.82', '92.4', '119/76', '39.3'],
       ['JW-2', 'M', '11/01/1996', '1.77', '80.9', '114/73', '35.5'],
       ['JW-3', 'F', '02/10/1995', '1.68', '69.7', '124/79', '29.1'],
       ['JW-6', 'M', '06/07/1995', '1.72', '75.5', '110/60', '45.5'],
       ['JW-9', 'F', '11/12/1995', '1.78', '82.1', '115/75', '32.3'],
       ['JW-10', 'F', '07/04/1996', '1.6', '-', '-/-', '30.1'],
       ['JW-11', 'M', '22/08/1995', '1.72', '77.2', '97/63', '48.8'],
       ['JW-12', 'M', '23/05/1996', '1.83', '88.9', '105/70', '37.7'],
       ['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26'],
       ['JW-15', 'F', '01/06/1996', '1.64', '65', '99/67', '35.7'],
       ['JW-16', 'M', '10/09/1995', '1.63', '73', '131/84', '29.9'],
       ['JW-17', 'M', '17/02/1996', '1.67', '89.8', '101/76', '40.2'],
       ['JW-18', 'M', '31/07/1996', '1.66', '75.1', '-/-', '-'],
       ['JW-19', 'F', '30/10/1995', '1.59', '67.3', '103/69', '33.5'],
       ['JW-22', 'F', '09/03/199

In [53]:
type(a)

numpy.ndarray

**Calculate the minimum height**

In [56]:
np.min(a[:,3].astype(float))

1.56

In [57]:
np.argmin(a[:,3].astype(float))  # compare the heights as numbers, not as strings

8

In [59]:
a[np.argmin(a[:,3].astype(float))] # returns the whole row

array(['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26'],
      dtype='<U10')

numpy.argmin(a, axis=None, out=None)

Returns the indices of the minimum values along an axis.
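With the default ``axis=None``, ``argmin`` returns an index into the flattened array; ``np.unravel_index`` converts it back to a (row, column) pair. A small sketch on a made-up matrix ``X``:

```python
import numpy as np

# Hypothetical 3x3 matrix used only to illustrate the axis argument.
X = np.array([[6, 3, 7],
              [2, 6, 1],
              [7, 8, 5]])

flat_idx = np.argmin(X)                          # position in the flattened array
row, col = np.unravel_index(flat_idx, X.shape)   # convert to 2-D coordinates
per_col = np.argmin(X, axis=0)                   # row index of the minimum in each column
per_row = np.argmin(X, axis=1)                   # column index of the minimum in each row

print(flat_idx, (row, col), per_col, per_row)
```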

**Let's find the average height of the male students. The columns we need are the second (gender) and the fourth (height).**

In [60]:
a[:,1] == 'M' #Gender

array([ True,  True, False,  True, False, False,  True,  True, False,
       False,  True,  True,  True, False, False,  True, False, False,
        True])

The heights of the **male students**, and then their average, are:

In [61]:
Men_Height = a[:,3][a[:,1] == 'M'].astype(float)
Men_Height 

array([1.82, 1.77, 1.72, 1.72, 1.83, 1.63, 1.67, 1.66, 1.97, 1.69])

In [62]:
type(Men_Height)

numpy.ndarray

In [63]:
mean_M=np.mean(Men_Height)
mean_M

1.748

Sort the array in **ascending order of height**:

In [65]:
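sorted_array_height = a[np.argsort(a[:, 3])]  # compares the height strings; this matches numeric order here only because every height starts with "1."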
sorted_array_height

array([['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26'],
       ['JW-19', 'F', '30/10/1995', '1.59', '67.3', '103/69', '33.5'],
       ['JW-10', 'F', '07/04/1996', '1.6', '-', '-/-', '30.1'],
       ['JW-16', 'M', '10/09/1995', '1.63', '73', '131/84', '29.9'],
       ['JW-25', 'F', '25/10/1995', '1.63', '64.4', '-/-', '28'],
       ['JW-15', 'F', '01/06/1996', '1.64', '65', '99/67', '35.7'],
       ['JW-24', 'F', '01/12/1995', '1.66', '63.8', '100/78', '-'],
       ['JW-18', 'M', '31/07/1996', '1.66', '75.1', '-/-', '-'],
       ['JW-17', 'M', '17/02/1996', '1.67', '89.8', '101/76', '40.2'],
       ['JW-3', 'F', '02/10/1995', '1.68', '69.7', '124/79', '29.1'],
       ['JW-26', 'M', '17/04/1996', '1.69', '-', '121/82', '39'],
       ['JW-22', 'F', '09/03/1996', '1.7', '-', '119/80', '30.9'],
       ['JW-11', 'M', '22/08/1995', '1.72', '77.2', '97/63', '48.8'],
       ['JW-6', 'M', '06/07/1995', '1.72', '75.5', '110/60', '45.5'],
       ['JW-2', 'M', '11/01/1996', '1.77', '80
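Sorting a string column compares text character by character, which agrees with numeric order only when all values have the same integer part. The sketch below uses made-up values to show where the two orders differ, and converts to ``float`` before sorting; ``kind="stable"`` keeps tied entries in their original order.

```python
import numpy as np

# Hypothetical values chosen so that text order and numeric order disagree.
values = np.array(["10.5", "9.8", "2.25"])

as_text = np.argsort(values)                                 # "10.5" < "2.25" < "9.8"
as_number = np.argsort(values.astype(float), kind="stable")  # 2.25 < 9.8 < 10.5

print(values[as_text])
print(values[as_number])
```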

#### Replacing missing values

In [66]:
a[a == '-'] = 0
a[a == '-/-'] = 0
a

array([['JW-1', 'M', '19/12/1995', '1.82', '92.4', '119/76', '39.3'],
       ['JW-2', 'M', '11/01/1996', '1.77', '80.9', '114/73', '35.5'],
       ['JW-3', 'F', '02/10/1995', '1.68', '69.7', '124/79', '29.1'],
       ['JW-6', 'M', '06/07/1995', '1.72', '75.5', '110/60', '45.5'],
       ['JW-9', 'F', '11/12/1995', '1.78', '82.1', '115/75', '32.3'],
       ['JW-10', 'F', '07/04/1996', '1.6', '0', '0', '30.1'],
       ['JW-11', 'M', '22/08/1995', '1.72', '77.2', '97/63', '48.8'],
       ['JW-12', 'M', '23/05/1996', '1.83', '88.9', '105/70', '37.7'],
       ['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26'],
       ['JW-15', 'F', '01/06/1996', '1.64', '65', '99/67', '35.7'],
       ['JW-16', 'M', '10/09/1995', '1.63', '73', '131/84', '29.9'],
       ['JW-17', 'M', '17/02/1996', '1.67', '89.8', '101/76', '40.2'],
       ['JW-18', 'M', '31/07/1996', '1.66', '75.1', '0', '0'],
       ['JW-19', 'F', '30/10/1995', '1.59', '67.3', '103/69', '33.5'],
       ['JW-22', 'F', '09/03/1996', 
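Replacing a missing value with ``0`` makes the column convertible to ``float``, but the zeros then count as real measurements in any later mean. One alternative, sketched here on a made-up ``weights_text`` column, is to use ``NaN`` as the placeholder and ``np.nanmean`` to skip it.

```python
import numpy as np

# Hypothetical weight column with two missing entries marked by '-'.
weights_text = np.array(["92.4", "-", "80.9", "-"])

# Replace the marker with the text "nan", which converts to a float NaN.
weights = np.where(weights_text == "-", "nan", weights_text).astype(float)

mean_skipping_missing = np.nanmean(weights)               # ignores the NaN entries
mean_with_zeros = np.mean(np.nan_to_num(weights, nan=0))  # treats missing as 0

print(mean_skipping_missing, mean_with_zeros)
```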

## Exercises on the data

The average height of the **female students**:

In [67]:
a=np.loadtxt(r"eg6-a-student-data.csv", delimiter=",",skiprows=2, dtype='str' )
a
#(Subject,Gender,DOB,Height,Weight,BP,VO2max)

array([['JW-1', 'M', '19/12/1995', '1.82', '92.4', '119/76', '39.3'],
       ['JW-2', 'M', '11/01/1996', '1.77', '80.9', '114/73', '35.5'],
       ['JW-3', 'F', '02/10/1995', '1.68', '69.7', '124/79', '29.1'],
       ['JW-6', 'M', '06/07/1995', '1.72', '75.5', '110/60', '45.5'],
       ['JW-9', 'F', '11/12/1995', '1.78', '82.1', '115/75', '32.3'],
       ['JW-10', 'F', '07/04/1996', '1.6', '-', '-/-', '30.1'],
       ['JW-11', 'M', '22/08/1995', '1.72', '77.2', '97/63', '48.8'],
       ['JW-12', 'M', '23/05/1996', '1.83', '88.9', '105/70', '37.7'],
       ['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26'],
       ['JW-15', 'F', '01/06/1996', '1.64', '65', '99/67', '35.7'],
       ['JW-16', 'M', '10/09/1995', '1.63', '73', '131/84', '29.9'],
       ['JW-17', 'M', '17/02/1996', '1.67', '89.8', '101/76', '40.2'],
       ['JW-18', 'M', '31/07/1996', '1.66', '75.1', '-/-', '-'],
       ['JW-19', 'F', '30/10/1995', '1.59', '67.3', '103/69', '33.5'],
       ['JW-22', 'F', '09/03/199

In [68]:
Female_Height = a[:,3][a[:,1] == 'F'].astype(float)
Female_Height

array([1.68, 1.78, 1.6 , 1.56, 1.64, 1.59, 1.7 , 1.66, 1.63])

In [69]:
mean_F = np.mean(Female_Height)
mean_F
#Desired Output: 1.6488888888888888

1.6488888888888888

In [54]:
print('Male average: {:.2f} m, Female average: {:.2f} m'.format(mean_M, mean_F))  # "{:.2f}" in str.format() prints two decimal places

Male average: 1.75 m, Female average: 1.65 m
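The same formatting can be written as an f-string, which places the variables directly inside the braces. The two means are copied from the results above so that the snippet runs on its own.

```python
# Values copied from the results computed above.
mean_M = 1.748
mean_F = 1.6488888888888888

# The format spec after the colon works exactly as in str.format().
message = f"Male average: {mean_M:.2f} m, Female average: {mean_F:.2f} m"
print(message)
```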


#### 1. Calculate the average weight by gender:

In [70]:
a[a == '-'] = 0
a[a == '-/-'] = 0
a

array([['JW-1', 'M', '19/12/1995', '1.82', '92.4', '119/76', '39.3'],
       ['JW-2', 'M', '11/01/1996', '1.77', '80.9', '114/73', '35.5'],
       ['JW-3', 'F', '02/10/1995', '1.68', '69.7', '124/79', '29.1'],
       ['JW-6', 'M', '06/07/1995', '1.72', '75.5', '110/60', '45.5'],
       ['JW-9', 'F', '11/12/1995', '1.78', '82.1', '115/75', '32.3'],
       ['JW-10', 'F', '07/04/1996', '1.6', '0', '0', '30.1'],
       ['JW-11', 'M', '22/08/1995', '1.72', '77.2', '97/63', '48.8'],
       ['JW-12', 'M', '23/05/1996', '1.83', '88.9', '105/70', '37.7'],
       ['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26'],
       ['JW-15', 'F', '01/06/1996', '1.64', '65', '99/67', '35.7'],
       ['JW-16', 'M', '10/09/1995', '1.63', '73', '131/84', '29.9'],
       ['JW-17', 'M', '17/02/1996', '1.67', '89.8', '101/76', '40.2'],
       ['JW-18', 'M', '31/07/1996', '1.66', '75.1', '0', '0'],
       ['JW-19', 'F', '30/10/1995', '1.59', '67.3', '103/69', '33.5'],
       ['JW-22', 'F', '09/03/1996', 

In [74]:
# Note: missing weights were replaced by 0 above, so they pull this mean down
Men_weight = np.mean(a[:,4][a[:,1] == 'M'].astype(float))
print(Men_weight)
print("The average weight of men is: {:.2f} kg".format(Men_weight))
# The average weight of men is: 74.20 kg

74.2
The average weight of men is: 74.20 kg


In [75]:
# Note: missing weights were replaced by 0 above, so they pull this mean down
Female_weight = np.mean(a[:,4][a[:,1] == 'F'].astype(float))
print(Female_weight)
print("The average weight of women is: {:.2f} kg".format(Female_weight))
# The average weight of women is: 52.07 kg

52.06666666666667
The average weight of women is: 52.07 kg


**2. Find the height of the shortest woman and print the row where this value is located**

In [76]:
min_height_female = np.min(a[:,3][a[:,1] == 'F'].astype(float))
print(min_height_female)


1.56

In [102]:
# combine both conditions so that only female rows with this height are selected
a[(a[:,1] == 'F') & (a[:,3] == '1.56')]

array([['JW-14', 'F', '12/01/1996', '1.56', '56.3', '108/72', '26']],
      dtype='<U10')
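Matching on the printed value requires typing the minimum back in as a string. A possible alternative, sketched on a small made-up ``data`` array with the columns (subject, gender, height), is to find the row index directly with ``np.flatnonzero`` and ``np.argmin``.

```python
import numpy as np

# Hypothetical rows; note that the shortest person overall is male.
data = np.array([
    ["S-1", "M", "1.55"],
    ["S-2", "F", "1.68"],
    ["S-3", "F", "1.60"],
])

heights = data[:, 2].astype(float)
female_rows = np.flatnonzero(data[:, 1] == "F")           # row indices of the women
shortest = female_rows[np.argmin(heights[female_rows])]   # index in the full array

print(data[shortest])  # ['S-3' 'F' '1.60']
```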

**3. Write a NumPy program to create a structured array from the given student names, classes, heights and their data types. Then sort by class, and by height where the classes are equal.**

*Original array:*

[(b'James', 5, 48.5 ) (b'Nail', 6, 52.5 ) (b'Paul', 5, 42.1 ) (b'Pit', 5, 40.11)]

*Sorted by class, then by height where the classes are equal:*

[(b'Pit', 5, 40.11) (b'Paul', 5, 42.1 ) (b'James', 5, 48.5 ) (b'Nail', 6, 52.5 )]

In [111]:
import numpy as np

data_type = [('name', 'S15'), ('class', int), ('height', float)]

students_details = [('James', 5, 48.5), ('Nail', 6, 52.5),('Paul', 5, 42.10), ('Pit', 5, 40.11)]

In [110]:
l = np.array(students_details, dtype = data_type)
l

array([(b'James', 5, 48.5 ), (b'Nail', 6, 52.5 ), (b'Paul', 5, 42.1 ),
       (b'Pit', 5, 40.11)],
      dtype=[('name', 'S15'), ('class', '<i8'), ('height', '<f8')])

In [116]:
np.sort(l,order = ["class", "height"])

array([(b'Pit', 5, 40.11), (b'Paul', 5, 42.1 ), (b'James', 5, 48.5 ),
       (b'Nail', 6, 52.5 )],
      dtype=[('name', 'S15'), ('class', '<i8'), ('height', '<f8')])
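The same ordering can also be obtained as indices with ``np.lexsort``, which is handy when the order has to be applied to other arrays as well. This is an alternative sketch, not the expected exercise answer; note that ``lexsort`` treats the *last* key as the primary one. The array name ``students`` is chosen here for illustration.

```python
import numpy as np

data_type = [("name", "S15"), ("class", int), ("height", float)]
students = np.array(
    [("James", 5, 48.5), ("Nail", 6, 52.5), ("Paul", 5, 42.10), ("Pit", 5, 40.11)],
    dtype=data_type,
)

# Keys are given from least to most significant: height breaks ties in class.
order = np.lexsort((students["height"], students["class"]))
sorted_students = students[order]

print(order)
print(sorted_students["name"])   # fields are accessed by name
```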