# NumPy and SciPy

From the official NumPy site (http://www.numpy.org):

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

* a powerful N-dimensional array object,
* sophisticated (broadcasting) functions,
* `ufunc` fast array mathematical operations,
* tools for integrating C/C++ and Fortran code,
* useful linear algebra, Fourier transform, and random number capabilities.

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

# Load NumPy (or install it if needed)

To use NumPy, you must first import it:

In [2]:
import numpy as np
print(np.__version__)

1.21.5


# How to create arrays with Numpy
## From Python lists and sequences:
There are important differences between NumPy arrays and Python lists:
* NumPy arrays have a fixed size at creation; changing the size creates a new array.
* All elements of a NumPy array must have the same data type.
* Operations on NumPy arrays are performed in compiled code for performance.
* Most of today's scientific and mathematical Python software uses NumPy arrays.
* NumPy gives us the code simplicity of Python, while the operations are executed quickly by pre-compiled C code.
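
The differences listed above can be seen in a short comparison. This is only a sketch; the variable names (`values`, `arr`, `mixed`) are illustrative and not part of the original notebook.

```python
import numpy as np

values = [1, 2, 3]            # plain Python list
arr = np.array(values)        # NumPy array built from the list

# With a list, * repeats the sequence; with an array, it multiplies element by element
doubled_list = values * 2     # [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2         # array([2, 4, 6])

# All elements share one data type: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])

print(doubled_list)
print(doubled_arr)
print(mixed.dtype)
```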

In [3]:
#1D array
np.array([1,2,3])

array([1, 2, 3])

In [4]:
#2x3 Matrix
np.array([[5.0, 3.0, 7.5],[2.1, 3.4, 5.6]])

array([[5. , 3. , 7.5],
       [2.1, 3.4, 5.6]])

In [5]:
#3x3 Matrix from python sequences
np.array([(1.5,2,3), (4,5,6), (3,6,9)])

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ],
       [3. , 6. , 9. ]])

## Using NumPy filling functions:

In [6]:
#Fill a 1D array with integers from 0 to 9
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:
#Fill a 1D array with ones
np.ones(5)

array([1., 1., 1., 1., 1.])

In [8]:
#Fill a 5x5 matrix with 0.0
np.zeros((5,5), dtype=float)

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [9]:
#Fill a 2x2 matrix with 10
np.full((2,2), 10, dtype=int)

array([[10, 10],
       [10, 10]])

## Fill an array with random values:
The numpy.random module is used to generate random values.

Some examples of random generators:

* np.random.rand: continuous uniform draw in [0, 1)
* np.random.random: continuous uniform draw in [0, 1)
* np.random.randint: discrete uniform draw in [a, b)
* np.random.random_integers: discrete uniform draw in [a, b] (deprecated in favor of np.random.randint)
* np.random.randn: Gaussian draw from the standard normal distribution (mean 0, standard deviation 1)
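
As a rough illustration of the generators listed above, the sketch below draws a few samples and inspects their shapes and ranges. The seed value and the sample sizes are arbitrary choices made for this example.

```python
import numpy as np

np.random.seed(0)  # fixed seed (arbitrary) so that the run is repeatable

u = np.random.rand(3, 2)          # uniform floats in [0, 1), shape given as separate arguments
v = np.random.random((3, 2))      # same distribution, shape given as a tuple
k = np.random.randint(1, 7, 10)   # integers in [1, 7), that is 1 to 6
g = np.random.randn(1000)         # standard normal: values can be negative or greater than 1

print(u.shape, v.shape)
print(k.min(), k.max())
print(g.mean(), g.std())
```

Note that `rand` takes the dimensions as separate arguments, while `random` expects a single tuple.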

In [11]:
#Fill a 1D array with 100 floats in [-1.0, 1.0)
a = np.zeros(100)
for i in range(100):
    a[i] = np.random.uniform(-1.0, 1.0)
print(a)

[-0.11304868 -0.49205398  0.18001243 -0.51153655 -0.2525514  -0.45212101
 -0.26113655  0.25862913  0.02366454  0.43255899 -0.59903664  0.71187596
 -0.00136701  0.23346718 -0.72154639  0.69028779  0.86433033  0.10252952
 -0.75350255  0.85156582 -0.79137489  0.0587244   0.52725903 -0.25375949
  0.98039114  0.86449089  0.01656651  0.61386325  0.40868934 -0.72935706
  0.85289279 -0.17650039 -0.60593558 -0.29851066 -0.59146863 -0.98730023
 -0.33448034  0.10503653 -0.40303112 -0.11490882 -0.6246959   0.3273458
  0.57031517  0.1226963  -0.50492579  0.13259097  0.24029744  0.98905893
  0.28793398 -0.64784459  0.4561299   0.43667037 -0.32181159  0.09745836
 -0.92084118 -0.46774019  0.4045925  -0.42261334  0.8630467   0.49048963
 -0.74755849  0.70656183 -0.47468961 -0.12671196  0.14041769 -0.50188961
  0.9232445   0.44731393  0.55084921  0.76861979 -0.36951591  0.63584906
  0.0220342   0.58985564  0.18500655  0.64221222  0.31347901 -0.84915324
 -0.26776487 -0.31316436 -0.19090739  0.37113721  0.

In [12]:
#Alternative solution
a = np.random.uniform(-1.0, 1.0, 100)
print(a)

[-0.9609574  -0.69141154  0.08192085 -0.9398657   0.95369395  0.82607373
 -0.04116554 -0.54399188 -0.89350495 -0.95277913 -0.17542712 -0.51257158
 -0.62745772 -0.65729472  0.52646999  0.38871212 -0.55727095  0.4680111
 -0.79479842 -0.21590158  0.18802383 -0.61828825 -0.43065274 -0.84579243
  0.45344118 -0.80453021  0.84011566 -0.8053674   0.79013796 -0.10227539
 -0.57115632  0.63654122 -0.88206279  0.954268   -0.96092587  0.88208249
 -0.66836179 -0.28801234  0.23089789  0.26536952  0.8888769  -0.33562991
  0.84191265 -0.47767102  0.15514384 -0.28835938  0.86820305 -0.17939722
  0.09258719 -0.63952972  0.96128337  0.39377865 -0.10737027 -0.97114655
  0.62397087 -0.96236756  0.84560505 -0.93711875 -0.91084848  0.29551984
 -0.27186922 -0.15951689 -0.48397224  0.11533787 -0.09960015 -0.27173188
  0.81898554 -0.88189819  0.71359563 -0.17784708  0.92470475 -0.33675574
 -0.9241938   0.38848552  0.78188265 -0.75038394 -0.43064333  0.20281024
 -0.55722321  0.15764345 -0.22640421  0.01480515 -0.

In [13]:
#Fill a 5x3 matrix with integers in [1, 10), i.e. from 1 to 9
a = np.random.randint(1,10, [5,3])
print(a)

[[6 4 7]
 [7 3 5]
 [6 1 9]
 [2 8 6]
 [1 5 6]]


## Reshaping arrays:

In [14]:
#1D Array
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [15]:
#Reshape a to 2D array (2x5)
a.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [16]:
#Alternative: setting to -1 automatically decides the number of cols
a.reshape(2, -1)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
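
Two details of `reshape` are worth checking by hand: the original array keeps its shape, and the requested shape must account for every element. A minimal sketch, where the name `r` is illustrative:

```python
import numpy as np

a = np.arange(10)
r = a.reshape(5, -1)      # -1 lets NumPy infer the missing dimension (here 2)

print(r.shape)            # shape of the reshaped result
print(a.shape)            # a itself still has its original 1D shape

# The total number of elements must be preserved
try:
    a.reshape(3, -1)      # 10 elements cannot be split into 3 equal rows
except ValueError as err:
    print("cannot reshape:", err)
```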

## Change type of data in arrays:

In [17]:
#Transform data in 1D array as integers
b = a.astype(int)
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [18]:
#Convert the 2nd column (index 1) of a 2D array to float values
c = np.random.randint(1,10, [5,3])
print(c)
column1 = c[:,1].astype(float)
print(column1)

[[9 5 6]
 [3 8 3]
 [9 6 9]
 [8 8 2]
 [5 9 6]]
[5. 8. 6. 8. 9.]


# Indexes and selections

In [19]:
#1D Arrays of 20 integers
a = np.random.randint(1,10, 20)
print(a)

[1 2 9 7 7 9 6 2 4 8 4 4 1 1 7 9 7 8 1 1]


In [20]:
#Print the 4th value of a (indexes start at 0, so index 3)
print(a[3])

7


In [21]:
#Print the 4th to 8th values of a (indexes 3 to 7; the end index 8 is excluded)
print(a[3:8])

[7 7 9 6 2]


In [22]:
#Select values > 5 in a
a[a>5]

array([9, 7, 7, 9, 6, 8, 7, 9, 7, 8])

In [23]:
#indexes of values in [5, 9]
indexes = np.where((a>=5) & (a<=9))
a[indexes]

array([9, 7, 7, 9, 6, 8, 7, 9, 7, 8])

In [24]:
#5x3 Matrix of integers
m = np.random.randint(1,10, [5,3])
print(m)

[[5 5 2]
 [5 4 4]
 [5 8 2]
 [5 8 8]
 [1 5 7]]


In [25]:
#Print 1st row
print(m[0])

[5 5 2]


In [26]:
#Print last row
print(m[-1])

[1 5 7]


In [27]:
#Print 2nd column
print(m[ : ,1])

[5 4 8 8 5]


In [28]:
#Print 2nd to 3rd row, 1st and 2nd column
print(m[1:3, 0:2])

[[5 4]
 [5 8]]


# Basic operations
## Arithmetic operations are performed on every element of the array

In [29]:
#1D Arrays
a = np.array([10,20,30])
b = np.array([1,2,3])
print(a)
print(b)

[10 20 30]
[1 2 3]


In [30]:
#Scalars
#addition
a + 1

array([11, 21, 31])

In [31]:
#Multiplication
a * 3

array([30, 60, 90])

In [32]:
#With arrays
#sum (element to element)
a + b

array([11, 22, 33])

In [33]:
#Matrices
m = np.array([(10,20),(30,40)])
n = np.array([(1,2),(3,4)])
print(m)
print(n)

[[10 20]
 [30 40]]
[[1 2]
 [3 4]]


In [34]:
#sum (element to element)
m + n

array([[11, 22],
       [33, 44]])

In [35]:
#product (element to element)
m * n

array([[ 10,  40],
       [ 90, 160]])

In [36]:
#matrix product
m.dot(n)

array([[ 70, 100],
       [150, 220]])
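
The matrix product can also be written with the `@` operator, which is easy to confuse with the element-wise `*`. The sketch below reuses the matrices `m` and `n` from the cells above; `p1` and `p2` are illustrative names.

```python
import numpy as np

m = np.array([(10, 20), (30, 40)])
n = np.array([(1, 2), (3, 4)])

p1 = m.dot(n)   # method form used above
p2 = m @ n      # infix matrix-multiplication operator, same result

print(np.array_equal(p1, p2))
print(np.array_equal(m * n, m @ n))   # element-wise product differs from matrix product
```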

## Operations with arrays

In [37]:
#1D Arrays
a = np.random.randint(1,10, 20)
b = np.random.randint(1,10, 20)
print(a)
print(b)

[2 7 6 8 1 6 3 6 5 5 8 7 9 4 8 7 6 6 2 8]
[6 8 5 2 1 8 8 1 1 5 2 3 7 5 2 4 9 1 9 8]


In [38]:
#Common items in a and b
np.intersect1d(a,b)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [39]:
#Remove items from a that are in b
np.setdiff1d(a,b)

array([], dtype=int64)

In [40]:
#Indexes of matching elements in a and b
indexes = np.where(a == b)
indexes

(array([ 4,  9, 19]),)

In [41]:
# Another use of np.where: produce a new array of the same size, with
# values depending on a condition
# Example: replace all the NaN values in a by -100 (the other values being
# unchanged)
np.where(np.isnan(a),-100,a)

array([2, 7, 6, 8, 1, 6, 3, 6, 5, 5, 8, 7, 9, 4, 8, 7, 6, 6, 2, 8])
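
In the cell above, `a` is an integer array, which cannot hold NaN, so the result is identical to `a`. The replacement becomes visible on a float array that actually contains missing values. A small sketch, with a hypothetical array `x`:

```python
import numpy as np

# Hypothetical float array containing missing values
x = np.array([2.0, np.nan, 6.0, np.nan, 1.0])

# Where the condition is True take -100, elsewhere keep the value from x
cleaned = np.where(np.isnan(x), -100, x)

print(cleaned)
print(x)   # x itself is unchanged: np.where returns a new array
```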

# Exercise 1
1. Create a 1D array t1 of 100 float values. Fill this array with random values in [-1.0, 1.0].
2. Calculate the mean, median and standard deviation of t1.
3. Create arrays t2 and t3 that contain, respectively, the positive and negative values from t1.
4. Create t4 by removing negative values from t1.
5. Create t5 by replacing values in t1 that are greater than 0.5 with 1.0.
6. Find the max and min values of t1.
7. Reshape t1 as m1, a 25x4 matrix.
8. Compute the mean values for each row, then for each column.
9. Randomly replace values in m1 with np.nan (create m2). Make sure there is at least one NaN in each column.
10. Count the NaN values in m2 and show the indexes of the NaNs.
11. Compute the mean value, ignoring NaNs, of the 1st column of m2. Replace the NaNs in the 1st column with this mean value.
12. Create a function removeNaNs that, given a matrix mat and a column c, replaces the NaNs in c with the mean value of c.
13. Apply removeNaNs to each column of m2.
14. Find the max value for each column of m1.
15. Find the min value for each row of m1.

# Exercise 2

1. Create a 10x6 2D array m filled with random integers in [0, 10].
2. Create a function that computes the counts of unique values row-wise. The output is a 10x11 matrix whose columns represent the numbers from 0 to 10 and whose values are the counts of those numbers in the respective rows (cf. the numpy.bincount function).
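
As a hint for the second question, here is how `np.bincount` behaves on a single 1D array. The array `row` is a made-up example; extending this to every row of m is left as the exercise.

```python
import numpy as np

row = np.array([3, 0, 3, 10, 7, 3])        # hypothetical row of integers in [0, 10]

# minlength=11 guarantees one slot per value from 0 to 10,
# even when the largest values do not occur in the row
counts = np.bincount(row, minlength=11)
print(counts)
```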

# Data analysis with NumPy
NumPy implements the usual data analysis functions such as sum, mean, median and standard deviation:

In [42]:
#1D Array
a = np.random.uniform(0.0,1.0, 1000)
print(a)

[7.89350922e-01 2.40949338e-01 6.13839194e-01 1.53991414e-01
 9.18970884e-01 6.03508264e-01 3.38368471e-01 7.19876897e-01
 2.38844229e-01 7.72759701e-01 5.63735348e-01 2.91329686e-01
 7.42668448e-01 1.44729747e-01 4.83592626e-01 1.33990205e-01
 5.34695837e-01 5.55516907e-01 1.69627453e-01 2.66926643e-01
 3.08767779e-01 5.89852925e-01 3.05516408e-01 2.22476414e-01
 3.47346213e-01 5.44243569e-01 8.88144747e-01 3.88417663e-01
 4.01826002e-01 1.96456145e-01 7.13387533e-02 2.32685858e-01
 2.33277732e-01 6.81256462e-01 1.77578442e-01 4.10341415e-01
 3.06227625e-01 6.41271921e-01 1.69102072e-01 5.71509834e-01
 3.58976727e-01 8.10497401e-01 2.63790065e-01 5.78207540e-01
 5.45761212e-02 1.19258148e-01 6.26274134e-01 8.55829659e-01
 7.06947384e-01 7.43382892e-01 6.67780865e-01 4.61138306e-01
 6.41859072e-01 3.81170396e-01 8.28400610e-01 3.99561633e-01
 2.18112659e-01 1.78956211e-01 9.67660570e-01 5.23335460e-01
 4.63892978e-01 5.21854538e-01 4.95165010e-01 2.88990553e-02
 2.81397022e-01 8.252333

In [43]:
#sum
np.sum(a)

497.2418355542215

In [44]:
#mean
np.mean(a)

0.4972418355542215

In [45]:
#median
np.median(a)

0.49312201601103695

In [46]:
#standard deviation
np.std(a)

0.28577480968666585

In [47]:
#mean and std ignoring NaNs
print(np.nanmean(a))
print(np.nanstd(a))

0.4972418355542215
0.28577480968666585
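
The array used above contains no NaN, so the NaN-aware functions return the same values as `np.mean` and `np.std`. The difference only shows up once a missing value is present, as in this sketch with a hypothetical array `x`:

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])   # hypothetical data with one missing value

print(np.mean(x))      # nan: a single NaN propagates through the plain mean
print(np.nanmean(x))   # mean of 1, 2 and 4 only
print(np.nanstd(x))    # standard deviation of the non-NaN values
```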


To detect NaN values in arrays, you can use the isnan function.

In [48]:
a = np.array([10,20,np.nan,40,50,np.nan,np.nan,80,100])
b = np.array([(10,np.nan),(np.nan,40),(50,60) ,(np.nan,70)])
print(a)
print(b)

[ 10.  20.  nan  40.  50.  nan  nan  80. 100.]
[[10. nan]
 [nan 40.]
 [50. 60.]
 [nan 70.]]


In [49]:
#Check one value in a
np.isnan(a[3])

False

In [50]:
#Test all elements in a
np.isnan(a)

array([False, False,  True, False, False,  True,  True, False, False])

In [51]:
#Check 2nd column in a matrix b
np.isnan(b[:,1])

array([ True, False, False, False])

In [52]:
#indexes of NaN values in the 2nd column of b
np.where(np.isnan(b[:,1]))

(array([0]),)

# Import data from files

The np.genfromtxt function imports data from a file into NumPy arrays.

In [None]:
#import data as 1D array of tuples containing different data types
data = np.genfromtxt('file_name.csv', delimiter=',', dtype=None)

In [None]:
#import data as 2D array each line containing different data types
data = np.genfromtxt('file_name.csv', delimiter=',', dtype=object)

In [None]:
#import data selecting only the columns with indexes 2, 4 and 5 (0-based)
data = np.genfromtxt('file_name.csv', delimiter=',', usecols=(2,4,5))

In [None]:
#import data as 2D array of same data type, removing headers
data = np.genfromtxt('file_name.csv', delimiter=',', skip_header=1)
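
The calls above need a file on disk. To try the options without one, `np.genfromtxt` also accepts a file-like object. The sketch below uses an in-memory string as a stand-in for 'file_name.csv'; the column names and values are invented for the example.

```python
import numpy as np
from io import StringIO

# Hypothetical CSV content standing in for 'file_name.csv'
csv_text = """id,temp,wind
1,20.5,7.4
2,,8.0
3,22.1,12.6
"""

# skip_header=1 drops the header row; the empty field is read as nan
data = np.genfromtxt(StringIO(csv_text), delimiter=',', skip_header=1)
print(data)

# usecols keeps only the listed columns (0-based indexes)
subset = np.genfromtxt(StringIO(csv_text), delimiter=',', skip_header=1, usecols=(1, 2))
print(subset.shape)
```

Missing fields become NaN in a float array, so the NaN-handling functions shown earlier apply directly to imported data.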

# Exercise 3
1. Import the airquality dataset, without headers, from the file ‘airquality.csv’ on Moodle.
2. Remove the 1st column of the imported data.
3. Count the NaN values in each column.
4. Extract the data from the 2nd column without NaN values.
5. For each column, replace the NaNs with the mean value.
6. Write a function that returns the ozone value for a selected month and day.

# Exercise 4
1. Import the iris dataset, keeping the text intact, as a 2D array named irisdata (to keep the text intact, choose an appropriate data type). URL: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data The columns are sepal length, sepal width, petal length, petal width and species.
2. Extract from irisdata the species column as an array of strings named species.
3. Extract from irisdata the sepal length column as an array of float values.
4. Change the type of the 1st to 4th columns of irisdata to float and the last one to string.
5. Find the mean, median and standard deviation of the iris sepal length (1st column).
6. Normalize the values of the iris petal length.
7. Create a function that normalizes the values in any column of an array.
8. Create a function softmax that computes the softmax function (or normalized exponential function) of any column of an array.
9. Add to irisdata a new column for volumes.

Volume = Pi/3 * petalLength * sepalLength * sepalLength

10. Compute the mean and standard deviation of the volumes for each species.
