<div >
<h1> <a href="https://docs.python.org/3/tutorial/modules.html">Modules</a>:</h1>
</div>

<img src="https://python-tricks.com/wp-content/uploads/2019/09/Modules-in-Python-450x338.jpg">

In Python, as in other modern programming languages, it is common to reuse pre-written functions and collections of functions. The `import` statement loads external functions, packages or modules.

A module is a file that contains definitions written in Python, such as classes and functions. It is usually a file with a `.py` extension. A large package with many functions is usually split across several files, called submodules. In this notebook, we are going to get familiar with some useful functions and data structures of the `numpy`, `matplotlib` and `csv` modules. Before we can use them, we have to load them. Let us look at an example!

In [5]:
dir()

['In',
 'Out',
 '_',
 '_1',
 '_3',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'd',
 'exit',
 'get_ipython',
 'numpy',
 'quit']

In [4]:
import numpy

The above command loaded the module `numpy`, which means that all functions and data structures defined in this module became available for use. The `numpy` module contains many common mathematical functions. Let us calculate for example $\sin(3)$!

In [31]:
type(numpy)

module

In [32]:
numpy.sin(3)

0.1411200080598672

As we've seen in the above example, we have to use the functions of a loaded module according to the following syntax:

>```python
>module_name.function_name(...)
>```

A module may contain not only functions but also pre-defined variables. For example, the `numpy` module defines the numbers $e$ and $\pi$, which are frequently used in mathematics and physics. We can obtain their values in the following way:

In [33]:
numpy.e

2.718281828459045

In [34]:
numpy.pi

3.141592653589793
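
As a small illustration of the `module_name.name` syntax, the following sketch combines functions and constants of `numpy` in one expression. The variable names `sin_half_pi` and `exp_one` are chosen here for illustration only.

```python
import numpy

# sin(pi/2) should be 1, up to floating-point rounding
sin_half_pi = numpy.sin(numpy.pi / 2)

# exp(1) should reproduce the constant e
exp_one = numpy.exp(1)

print(sin_half_pi)
print(numpy.isclose(exp_one, numpy.e))
```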

Big modules may have several submodules. We refer to the submodules with the following syntax:

>```python
>module_name.submodule_name.function_name(...)
>```

For example, the functions generating random numbers in `numpy` are defined in the `random` submodule. The code cell below generates a random integer between $0$ and $99$ using the `randint` function from the `random` submodule of the `numpy` module.

In [8]:
numpy.random.randint(100)

68
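
The same dotted syntax works for any submodule. As a sketch, the block below also uses `numpy.linalg`, the linear algebra submodule of `numpy` (not discussed elsewhere in this notebook), and checks that `randint(100)` stays within $0$ and $99$. The names `length` and `draws` are illustrative.

```python
import numpy

# numpy.linalg.norm computes the Euclidean length of a vector
length = numpy.linalg.norm([3, 4])
print(length)  # 5.0

# numpy.random.randint(100) draws an integer from 0 to 99 (100 is excluded)
draws = [numpy.random.randint(100) for _ in range(1000)]
print(min(draws) >= 0 and max(draws) <= 99)  # True
```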

Sometimes we only want to import certain functions of a module. This is possible with the

>```python
>from module_name import first_function, other_function
>```

construction. For example, if we only want to use the $\sin$ and $\cos$ functions from the `numpy` module, we can do it in the following way:

In [36]:
from numpy import sin,cos

Then the functions $\sin$ and $\cos$ are callable without using the module name `numpy`. 

In [37]:
sin(23)**2+cos(23)**2

1.0

We can also import all public functions and variables of a module at once by using `*`.

In [10]:
from numpy import *

Then, we can refer to the variable `pi` storing the value of $\pi$ without writing `numpy` first.

In [39]:
pi

3.141592653589793
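
One thing to keep in mind with `from ... import` is that an imported name silently replaces any earlier object of the same name. The sketch below shows this for a single name; the standard-library `math` module is used here only as a second module that also provides `sin`. With `import *` the same rebinding can happen for many names at once.

```python
import math
import numpy

from math import sin      # the name sin now refers to math.sin
print(sin is math.sin)    # True

from numpy import sin     # the same name is silently rebound to numpy.sin
print(sin is numpy.sin)   # True
print(sin is math.sin)    # False

# Keeping the module prefix avoids the ambiguity altogether
print(numpy.sin(0.0) == math.sin(0.0))  # True
```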

In the rest of this course, apart from some special cases, every notebook should begin by importing the most frequently used modules. In Jupyter Notebooks, we can do this with the following command (a line magic):

In [40]:
%pylab inline

%pylab is deprecated, use %matplotlib inline and import the required libraries.
Populating the interactive namespace from numpy and matplotlib


The above line magic performs the following imports:


>```python
>import numpy
>import matplotlib
>from matplotlib import pylab, mlab, pyplot
>np = numpy
>plt = pyplot
>
>from IPython.display import display
>from IPython.core.pylabtools import figsize, getfigs
>
>from pylab import *
>from numpy import *
>```

Thus, apart from the numerical module `numpy`, it also loads the `matplotlib` module, which is used for creating figures.
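
As the deprecation message printed by `%pylab` suggests, the same setup can also be written with explicit imports. A minimal sketch follows; the aliases `np` and `plt` are the ones `%pylab` itself defines, and the lines that only work inside a notebook or need `matplotlib` are left as comments so that the block runs as plain Python.

```python
# Notebook-only line magic (not plain Python), so it is shown as a comment:
# %matplotlib inline
# import matplotlib.pyplot as plt   # needed once figures are drawn

import numpy as np

# With explicit imports every name keeps its module prefix
vec = np.array([1, 2, 3])
angles = np.linspace(0, np.pi, 10)
print(vec.sum())
print(angles.shape)
```

With this style, the examples in the rest of the notebook would be written as `np.array(...)`, `np.linspace(...)` and so on.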

# A new data structure: `array`

The `numpy` module uses a data structure that is a bit similar to `list`. This new data structure is called `array`. Let us go through certain properties of `array`s. ([Here](https://scipy-lectures.github.io/intro/numpy/array_object.html) you can find a more extensive description.)

Similarly to `list`s, we can define them by enumerating numbers or other objects.

In [11]:
vec=array([1,2,3])
vec

array([1, 2, 3])

In [12]:
type(vec)

numpy.ndarray

In [42]:
matr=array([[1,1,3],[4,3,5],[6,2,3]])
matr

array([[1, 1, 3],
       [4, 3, 5],
       [6, 2, 3]])

In [43]:
b=array(['a','b','cd'])
b

array(['a', 'b', 'cd'], dtype='<U2')

In [44]:
c=array([1,2,3,'sd',])
c

array(['1', '2', '3', 'sd'], dtype='<U21')

In the last examples, the keyword `dtype=` refers to the type of the elements stored in the array. All elements of an array share one `dtype`, which is the most general type that fits every element. The variables `vec` and `matr` contain integers, while `b` and `c` contain character strings: `<U2` denotes Unicode strings of at most 2 characters, and in `c` the numbers were converted to strings as well.
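
The following sketch shows how the `dtype` is chosen for a few inputs, and that it can also be requested explicitly. The variable names are illustrative; `dtype.kind` is used instead of the full `dtype` where the exact integer width may depend on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])              # integers and a float: everything becomes float
texts = np.array([1, 2, 3, 'sd'])          # numbers and a string: everything becomes string
forced = np.array([1, 2, 3], dtype=float)  # the dtype can also be requested explicitly

print(ints.dtype.kind)    # i (signed integer)
print(mixed.dtype)        # float64
print(texts.dtype.kind)   # U (Unicode string)
print(forced)             # [1. 2. 3.]
```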

## Useful array operations

We can get information on the shape of an array:

In [45]:
vec.shape

(3,)

In [46]:
matr.shape

(3, 3)

The `size` attribute gives the total number of elements.

In [47]:
matr.size

9

The methods `min` (smallest element), `max` (largest element), `std` (standard deviation) and `mean` help in the statistical analysis of sequences of numbers.

In [48]:
vec.min()

1

In [49]:
vec.max()

3

In [50]:
vec.mean()

2.0

In [51]:
vec.std()

0.816496580927726
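
To make explicit what `std` computes: by default it is the population standard deviation, that is, the square root of the mean squared deviation from the mean. The sketch below recomputes it by hand for `vec`; passing `ddof=1` to `std` gives the sample standard deviation instead. The names `deviations` and `by_hand` are illustrative.

```python
import numpy as np

vec = np.array([1, 2, 3])

# Population standard deviation, written out step by step
deviations = vec - vec.mean()              # [-1.  0.  1.]
by_hand = np.sqrt((deviations**2).mean())  # sqrt(2/3)

print(np.isclose(by_hand, vec.std()))  # True
print(vec.std(ddof=1))                 # 1.0 (divides by n - 1 instead of n)
```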

If the array is not one-dimensional, like the variable `matr` above, then we can apply the above methods along the columns or the rows:

In [52]:
matr.mean(axis=0) # columnwise mean

array([3.66666667, 2.        , 3.66666667])

In [53]:
matr.mean(axis=1) # rowwise mean

array([1.66666667, 4.        , 3.66666667])
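
Because `matr` is square, the two results above have the same length. A non-square array, made up here for illustration, shows more clearly which axis is collapsed:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])   # 2 rows, 3 columns

col_sums = table.sum(axis=0)    # collapses the rows: one value per column
row_sums = table.sum(axis=1)    # collapses the columns: one value per row

print(col_sums, col_sums.shape)  # [5 7 9] (3,)
print(row_sums, row_sums.shape)  # [ 6 15] (2,)
```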

The transpose of `matr`:

In [54]:
matr.T

array([[1, 4, 6],
       [1, 3, 2],
       [3, 5, 3]])

We can get the sum or product of all elements of an `array`:

In [55]:
vec.sum() # the sum of all elements of vec

6

In [56]:
matr.prod() # the product of all elements of matr

6480

But we have to be careful, because not all methods are applicable to all arrays! If we want to get the product of a matrix that contains strings, then we get an error.

In [57]:
matrS=array([['a','b'],['c','d']])

In [58]:
# matrS.prod()

Some functions help in generating `array`s of predefined structure. Let us see some simple examples. The `linspace()` function creates an arithmetic sequence between two given numbers (both endpoints included) with a given number of elements.

In [59]:
linspace(0,pi,10) # 10 numbers between 0 and pi

array([0.        , 0.34906585, 0.6981317 , 1.04719755, 1.3962634 ,
       1.74532925, 2.0943951 , 2.44346095, 2.7925268 , 3.14159265])
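
The spacing produced by `linspace` follows from the two endpoints and the number of elements. A short sketch, with illustrative variable names:

```python
import numpy as np

points = np.linspace(0, 1, 5)   # 5 values, both endpoints included
step = (1 - 0) / (5 - 1)        # spacing = (stop - start) / (num - 1)

print(points)                   # [0.   0.25 0.5  0.75 1.  ]
print(np.allclose(np.diff(points), step))  # True
```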

The functions `rand()`, `randn()` and `randint()` generate arrays of random numbers:

In [60]:
rand() # a random number between 0 and 1

0.5247682904739168

In [61]:
randn(3) # 3 random numbers from a N(0,1) normal distribution

array([-0.85016896,  0.47230051, -0.15969936])

In [62]:
randint(0,9,(2,3)) # a 2x3 matrix of random integers from 0 to 8 (the upper bound 9 is excluded)

array([[3, 1, 6],
       [6, 2, 3]])
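
The sketch below collects the three generators, written with the `np.random` prefix, and checks the shapes and bounds of their results. Fixing the seed with `np.random.seed` is an addition made here so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)   # fixing the seed makes the "random" numbers reproducible

uniform = np.random.rand(2, 3)           # shape (2, 3), values in [0, 1)
normal = np.random.randn(1000)           # 1000 draws from N(0, 1)
ints = np.random.randint(0, 9, (2, 3))   # integers 0..8, the upper bound 9 is excluded

print(uniform.shape)                         # (2, 3)
print(normal.shape)                          # (1000,)
print(ints.min() >= 0 and ints.max() <= 8)   # True
```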

Note that basic operations such as addition, subtraction, multiplication and division, and basic functions such as sin, cos or exp, act elementwise on arrays.

In [63]:
v1=array([1,2,3])
v2=array([2,3,3])
v1*v2

array([2, 6, 9])

In [64]:
sin(v1)

array([0.84147098, 0.90929743, 0.14112001])
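
This is an important difference from `list`s, where `+` and `*` mean concatenation and repetition. The comparison below is a sketch with illustrative names; the `@` operator in the last line, which computes the scalar product of two vectors, is not used elsewhere in this notebook.

```python
import numpy as np

l1, l2 = [1, 2, 3], [2, 3, 3]
v1, v2 = np.array(l1), np.array(l2)

print(l1 + l2)   # [1, 2, 3, 2, 3, 3]  lists are concatenated
print(v1 + v2)   # [3 5 6]             arrays are added elementwise
print(l1 * 2)    # [1, 2, 3, 1, 2, 3]  a list is repeated
print(v1 * 2)    # [2 4 6]             every element is doubled
print(v1 @ v2)   # 17, the scalar product 1*2 + 2*3 + 3*3
```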

Adding boolean arrays corresponds to an elementwise OR, and multiplying them corresponds to an elementwise AND.

In [65]:
b1=array([True,False,True,False])
b2=array([False,False,True,True])

In [66]:
b1+b2 

array([ True, False,  True,  True])

In [67]:
b1*b2

array([False, False,  True, False])
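
The same logical operations can be written with the operators `|`, `&` and `~`, which state the intent more directly. A short sketch:

```python
import numpy as np

b1 = np.array([True, False, True, False])
b2 = np.array([False, False, True, True])

print(b1 | b2)    # [ True False  True  True]  same as b1 + b2
print(b1 & b2)    # [False False  True False]  same as b1 * b2
print(~b1)        # [False  True False  True]  elementwise NOT
print(b1.sum())   # 2, the number of True values
```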

## Indexing arrays

The most important property of arrays is that their elements can be accessed by a very rich indexing structure.  First, let us define some variables.

In [68]:
foo1=linspace(0,10,10)  # 10 equally spaced numbers between 0 and 10
foo2=rand(10) # 10 random numbers
foo3=randint(0,10,(5,5)) #5x5 random matrix

The slices already used for lists work here as well.

In [69]:
foo1[0:3]

array([0.        , 1.11111111, 2.22222222])

In [70]:
foo1[-4:-1]

array([6.66666667, 7.77777778, 8.88888889])

We can also index with an arbitrary list of indices:

In [71]:
foo1[[3,5,2]]

array([3.33333333, 5.55555556, 2.22222222])

Another very useful feature is to use boolean arrays for accessing elements that fulfil certain conditions.

In [72]:
foo1>5 # this expression returns a boolean array

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

If we use the above boolean array as an index to `foo1`, then we get an `array` that contains only those elements for which the boolean value is `True`.

In [73]:
foo1[foo1>5]

array([ 5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

When analyzing databases, it is very useful to look for elements that have certain properties. Such tasks can be translated to selecting the elements of an array according to the elements of another array. For example, which elements of `foo1` correspond to elements of `foo2` that are greater than 0.25?

In [74]:
foo1[foo2>0.25]

array([0.        , 5.55555556, 6.66666667, 7.77777778, 8.88888889])
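
Conditions can also be combined. The sketch below selects the elements of `foo1` that lie between two bounds, and then uses a second array as the mask; the `labels` array is made up for illustration.

```python
import numpy as np

foo1 = np.linspace(0, 10, 10)

# Each comparison needs its own parentheses, because & binds tighter than > and <
middle = foo1[(foo1 > 2) & (foo1 < 6)]
print(middle)        # the four values 2.22..., 3.33..., 4.44..., 5.55...

# A mask built from a second array of the same length
labels = np.array(list('abababaaba'))
print(foo1[labels == 'b'].size)   # 4
```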

Here are some graphical examples of indexing multidimensional arrays:
<table style="width:100%;border: 0px solid #ffffff">
<tr style="border: 0px solid #ffffff">
<td style="border: 0px solid #ffffff">
<img src="https://scipy-lectures.github.io/_images/numpy_indexing.png" width=400> 
</td>
<td style="border: 0px solid #ffffff">
<img src="http://scipy-lectures.github.io/_images/numpy_fancy_indexing.png" width=400>
</td>
</tr>
</table>
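
A few of these indexing patterns can be tried on a small array. The array `m` below is constructed only for illustration, using `arange` and `reshape`, two `numpy` functions not introduced above.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m[1, 2])             # 6, row 1 and column 2
print(m[0, :])             # [0 1 2 3], the whole first row
print(m[:, 1])             # [1 5 9], the whole second column
print(m[1:, 2:])           # the 2x2 block [[6, 7], [10, 11]]
print(m[[0, 2], [1, 3]])   # [ 1 11], the elements (0, 1) and (2, 3)
```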