# Popular modules in Python 🐍
Python's ecosystem offers a large number of packages and modules. This exercise takes a closer look at two of the most popular ones: NumPy and pandas.

## Python basics hands on
### numpy
![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/NumPy_logo_2020.svg/1024px-NumPy_logo_2020.svg.png)

[Numpy](https://numpy.org/) is an open-source library built around homogeneous multidimensional arrays.

Take a look at the [introduction](https://numpy.org/devdocs/user/quickstart.html) to get familiar with numpy!
In numpy, dimensions are called _axes_, which will be important later.

To use the library, you have to import it. Importing it *as np* lets you call all functions using the shorthand *np*.

In [None]:
import numpy as np

#### Numpy basics

In [None]:
point_a = [1,2,3]  # Create a list with 3 elements
one_d_point = np.array(point_a)  # Convert it to a numpy array

In [None]:
# Create a 2D array that contains two points
two_d_point = np.array([one_d_point, [-1,-2.5,3]])
print(two_d_point)

An ndarray is defined by the number of dimensions, the size of each dimension and the type of data it holds. Check the number and size of dimensions of an ndarray with the shape attribute:
    

In [None]:
two_d_point.shape

*two_d_point* is two-dimensional: it has 2 rows (the two points) with 3 items each, so its shape is (2, 3). The total number of elements is given by the *size* attribute.

In [None]:
two_d_point.size
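
The relation between the number of axes, the shape and the size can be checked directly. The following is a minimal sketch; the array *points* is a hypothetical stand-in with the same layout as *two_d_point*.

```python
import numpy as np

# Hypothetical 2 x 3 array, mirroring the layout of two_d_point
points = np.array([[1, 2, 3], [-1, -2.5, 3]])

print(points.ndim)   # number of axes
print(points.shape)  # length of each axis: (rows, columns)
print(points.size)   # total number of elements

# The total size equals the product of the axis lengths
print(points.size == points.shape[0] * points.shape[1])
```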

To display the data type of the elements in the numpy array, use the *dtype* attribute:

In [None]:
two_d_point.dtype
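
Because an ndarray is homogeneous, mixing integers and floats makes numpy pick one common type for all elements. The sketch below illustrates this with hypothetical arrays; the exact integer width (for example int64) depends on the platform, so it is not asserted here.

```python
import numpy as np

ints = np.array([1, 2, 3])          # only integers -> an integer dtype
mixed = np.array([1, 2.5, 3])       # a single float upcasts all elements to float64
as_float = ints.astype(np.float64)  # explicit conversion, returns a new array

print(ints.dtype, mixed.dtype, as_float.dtype)
```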

Numpy has many built-in functions for creating arrays in frequently used shapes, such as ones, zeros, eye, ...

In [None]:
np.identity(3)  # Identity matrix, usually written I (sometimes E)

In [None]:
np.eye(N=3,  # Number of rows
       M=4,  # Number of columns
       k=1,  # Index of the diagonal: 0 is the main diagonal, positive values move it upwards
       dtype=int)  # Data type of the elements

In [None]:
np.ones(10)  # 1D array of length 10, filled with ones

In [None]:
np.zeros([2,3])  # 2D array with 2 rows and 3 columns, filled with zeros

#### Indexing and Slicing
Indexing a one-dimensional numpy array works just like indexing a list.

In [None]:
my_array = np.arange(1,6)
my_array[2]  # Get the value at index 2 (the third element)

In [None]:
my_array[2:]  # Get the values from index 2 to the end. This is called slicing

In [None]:
my_array[::-1]  # Slice with step -1, which reverses the array

The same approach applies to higher dimensions.

In [None]:
my_two_d_array = np.array([my_array, my_array + 5, my_array + 10])
print(my_two_d_array)

In [None]:
my_two_d_array[1, 3]  # row index 1, column index 3

In [None]:
my_two_d_array[1:, 3:]  # Slicing is also available in higher dimensions
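
As a small sketch of how indices and slices combine on a 2D array: *grid* below is a hypothetical 3 x 5 array holding the numbers 1 to 15, chosen only for illustration.

```python
import numpy as np

grid = np.arange(1, 16).reshape(3, 5)  # hypothetical 3 x 5 array with values 1..15

first_row = grid[0]         # a single index selects a whole row
first_column = grid[:, 0]   # ":" keeps every row, 0 picks the first column
lower_right = grid[1:, 3:]  # rows from index 1 on, columns from index 3 on

print(first_row)
print(first_column)
print(lower_right)
```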

#### Reshaping
Reshaping is often used to bring data from one layout into another. The data stays identical, only its arrangement differs.

In [None]:
np.reshape(my_two_d_array[1:, 3:],  # Input array to reshape
           (1,4))  # Output shape (passed positionally, since the keyword newshape is deprecated in recent numpy versions)

In [None]:
my_two_d_array.flatten()  # Returns a 1D copy of the multidimensional array
my_two_d_array.ravel()  # Returns a 1D array as well, but as a view without copying whenever possible
assert np.allclose(my_two_d_array.flatten(), my_two_d_array.ravel())
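
The difference between the copy from *flatten* and the view from *ravel* becomes visible once the result is modified. This is a minimal sketch with a hypothetical array *original*; note that *ravel* only returns a view when the memory layout allows it, which is the case for the contiguous array used here.

```python
import numpy as np

original = np.arange(6).reshape(2, 3)  # hypothetical contiguous 2 x 3 array

flat_copy = original.flatten()  # always a new, independent array
flat_view = original.ravel()    # a view here, because the data is contiguous

flat_copy[0] = 100  # does not touch original
flat_view[1] = 200  # writes through to original

print(original)
print(np.shares_memory(original, flat_copy))
print(np.shares_memory(original, flat_view))
```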

In [None]:
my_two_d_array.T  # Transpose the array (swap rows and columns), used very often

In [None]:
np.transpose(my_two_d_array)  # Same transpose, written as a function call

In [None]:
np.flipud(my_two_d_array)  # Flip up/down: reverses the order of the rows

In [None]:
np.fliplr(my_two_d_array)  # Flip left/right: reverses the order of the columns

In [None]:
np.concatenate((my_two_d_array, np.array([[10,20,30],[40,50,60],[70,80,90]])),  # Arrays to join
               axis=1)  # Join along axis 1, i.e. side by side (the row counts must match)
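
The *axis* argument decides in which direction the arrays are joined. The sketch below uses hypothetical arrays of zeros and ones to show the resulting shapes for both axes.

```python
import numpy as np

left = np.zeros((3, 5))
right = np.ones((3, 3))   # same number of rows as left
bottom = np.ones((2, 5))  # same number of columns as left

side_by_side = np.concatenate((left, right), axis=1)  # columns are appended
stacked = np.concatenate((left, bottom), axis=0)      # rows are appended

print(side_by_side.shape, stacked.shape)
```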

#### Array operations

In [None]:
my_two_d_array + 100  # Add 100 elementwise
np.add(my_two_d_array, 100) # Also adds 100 elementwise to array

In [None]:
my_two_d_array - 100  # Subtract 100 elementwise
np.subtract(my_two_d_array, 100) # Also subtracts 100 elementwise from the array

In [None]:
my_two_d_array * 5  # Multiplies array by 5, elementwise
np.multiply(my_two_d_array, 5) # Also multiplies elementwise

In [None]:
my_two_d_array * my_two_d_array  # Elementwise product (not a matrix product)

In [None]:
my_two_d_array @ my_two_d_array.T  # Matrix product: (3x5) @ (5x3) gives a 3x3 array
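
The operators * and @ are easy to confuse. A small sketch with two hypothetical 2 x 2 matrices shows that they give different results:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

elementwise = a * b     # multiplies entries at the same position
matrix_product = a @ b  # rows of a times columns of b

print(elementwise)
print(matrix_product)
```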

In [None]:
my_two_d_array ** 2  # Squares elementwise
np.power(my_two_d_array, 2) # Also squares elementwise

In [None]:
my_two_d_array / 5  # Divides array by 5, elementwise
np.divide(my_two_d_array, 5) # Also divides elementwise

In [None]:
my_two_d_array.mean()  # Determining the mean
np.mean(my_two_d_array)

In [None]:
np.mean(my_two_d_array, axis=1)  # Determines the mean row-wise

In [None]:
np.std(my_two_d_array)  # Determines the standard deviation

There are many more built-in functions available, such as sum, log, dot, ...
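
A brief sketch of the three functions named above, using a hypothetical 2 x 2 array *values*. As with *mean*, the *axis* argument of *sum* selects the direction of the reduction.

```python
import numpy as np

values = np.array([[1.0, 2.0], [3.0, 4.0]])

total = np.sum(values)                # sum over all elements
column_sums = np.sum(values, axis=0)  # collapse the rows: one sum per column
row_sums = np.sum(values, axis=1)     # collapse the columns: one sum per row
logs = np.log(values)                 # natural logarithm, elementwise
dot = np.dot(values[0], values[1])    # dot product of the two rows

print(total, column_sums, row_sums, dot)
```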

### Pandas
![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/1024px-Pandas_logo.svg.png)

[Pandas](https://pandas.pydata.org/) is an open-source library that provides data structures and operations for working with numerical tables and time series.

Take a look at the [introduction](https://pandas.pydata.org/docs/user_guide/10min.html) to get familiar with pandas!

To use the library, you have to import it. Importing it *as pd* lets you call all functions using the shorthand *pd*.

In [None]:
import pandas as pd

#### Pandas Series
Pandas Series are similar to numpy ndarrays. The main difference is that you can use custom index labels and apply operations based on them.

In [None]:
my_series = pd.Series(data = [2,3,5,4],             # Data
                      index= ['a', 'b', 'c', 'd'])  # Indexes
my_series
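
What it means to apply operations based on index labels can be seen when two Series are added: pandas matches the values by label, not by position. The Series *first* and *second* below are hypothetical examples.

```python
import pandas as pd

first = pd.Series([1, 2, 3], index=["a", "b", "c"])
second = pd.Series([10, 20, 30], index=["c", "b", "d"])

# Values are matched by label; labels present in only one Series yield NaN
total = first + second
print(total)
```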

You can easily convert a dictionary into a pd.Series.

In [None]:
my_series_from_dict = pd.Series({'x': 2, 'a': 4, 'y':4.01, 'µ':45})

Accessing the items of the Series is similar to accessing a dict.

In [None]:
my_series_from_dict['a']

In [None]:
my_series_from_dict.iloc[-1]  # Positional access goes through .iloc (plain [-1] on a labeled Series is deprecated)

In [None]:
my_series[1:]  # Slicing is also possible

You can use numpy functions directly on pandas Series.

In [None]:
np.mean(my_series)

#### Pandas DataFrame
A Pandas DataFrame is a two-dimensional table with labeled columns, where each column can hold a different type of data such as strings, lists, scalars, ...
A pd.DataFrame is very similar to a table in an SQL database. You can imagine it as an in-memory database.

In [None]:
test_data = {"name" : ["Georg","Donald","Siegfried"],
             "age" : np.array([60,65,24]),
             "weight" : (75, 123, 101),
             "height" : pd.Series([1.81, 1.95, 1.47], index=["Georg","Donald","Siegfried"]),
             "siblings" : 1,
             "gender" : "M"}

In [None]:
df = pd.DataFrame(test_data)  # Convert the dictionary to DataFrame

In [None]:
df.head(1)

When a column is passed as a pd.Series with its own index, the DataFrame adopts that index as its row labels. If we do not pass an index in the example above, the rows get a default integer index (0, 1, 2, ...).

In [None]:
test_data_wo_series = {"name" : ["Georg","Donald","Siegfried"],
                       "age" : np.array([60,65,24]),
                       "weight" : (75, 123, 101),
                       "height" : [1.81, 1.95, 1.47],
                       "siblings" : 1,
                       "gender" : "M"}

In [None]:
df = pd.DataFrame(test_data_wo_series)
df

You can also provide custom row labels. This makes it much easier to sort and find the data you are looking for.

In [None]:
df2 = pd.DataFrame(test_data_wo_series,
                   index = test_data["name"] )

df2

##### Dealing with DataFrame content
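A DataFrame behaves like a dictionary of Series, so you can use a column name as a key to get the data. An alternative is attribute access with the so-called dot operator, which works as long as the column name is a valid Python identifier.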

In [None]:
df['weight']

In [None]:
df.weight

To get the values without the index, just append *.values*.

In [None]:
df.weight.values.tolist()
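
As a side note, the pandas documentation recommends the method *to_numpy()* over *.values* for new code. A minimal sketch, assuming a hypothetical one-column DataFrame *people*:

```python
import pandas as pd

people = pd.DataFrame({"weight": [75, 123, 101]})  # hypothetical data

weights = people["weight"].to_numpy()  # plain numpy array without the index
print(weights)
print(weights.tolist())                # plain Python list
```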

You can add a column by assigning a list of values, provided it has the same length as the DataFrame. Values without an index cannot be aligned, so the lengths must match exactly: assigning a list of only 2 elements here results in an error.

In [None]:
df2["IQ"] = [105, 26, 115]

In [None]:
df2
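
The length requirement can be tried out safely by catching the error. This is a sketch on a hypothetical DataFrame *people*; the variable *raised* only records whether the error occurred.

```python
import pandas as pd

people = pd.DataFrame({"name": ["Georg", "Donald", "Siegfried"]})  # hypothetical data

raised = False
try:
    people["IQ"] = [105, 26]  # only 2 values for 3 rows
except ValueError as error:
    raised = True
    print(f"Could not add column: {error}")

people["IQ"] = [105, 26, 115]  # a list of matching length works
print(people)
```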

When inserting a Series into a DataFrame, values are aligned by index label and rows without a match are filled with NaN (compare a left join in SQL).

In [None]:
df2["Zip code"] = pd.Series(['87435', '87437'], index=["Georg", 'Siegfried'])

df2
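
The resulting NaN entries can be located with *isna()* or dropped with *dropna()*. The following sketch rebuilds a reduced, hypothetical version of the table to stay self-contained.

```python
import pandas as pd

people = pd.DataFrame({"age": [60, 65, 24]},
                      index=["Georg", "Donald", "Siegfried"])  # hypothetical data
people["Zip code"] = pd.Series(["87435", "87437"], index=["Georg", "Siegfried"])

missing = people["Zip code"].isna()            # True where no value was matched
complete = people.dropna(subset=["Zip code"])  # keep only rows that have a zip code

print(people[missing])
print(complete)
```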

In [None]:
df2.loc["Donald","IQ"]  # loc = location, selects by row and column label

In [None]:
df2.iloc[1,6]  # iloc = integer location, selects by row and column position
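
The sketch below contrasts both accessors on a reduced, hypothetical table. One detail that is easy to miss: slices with *loc* include the end label, while slices with *iloc* exclude the end position, as usual in Python.

```python
import pandas as pd

people = pd.DataFrame({"age": [60, 65, 24], "IQ": [105, 26, 115]},
                      index=["Georg", "Donald", "Siegfried"])  # hypothetical data

by_label = people.loc["Donald", "IQ"]  # row label, column label
by_position = people.iloc[1, 1]        # row position, column position

label_slice = people.loc["Georg":"Donald", "age"]  # end label is included
position_slice = people.iloc[0:1, 0]               # end position is excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```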

Selecting rows by a boolean index is very common. Preparing a boolean index from several conditions and then applying it is often useful and keeps the algorithm well structured.

In [None]:
boolean_index = [False, True, True]  

df2[boolean_index] 

In [None]:
boolean_index = df2["age"] > 25

In [None]:
df2[boolean_index]

In [None]:
df2[(df2["age"] > 25) & (df2["IQ"] > 50)]
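
Conditions can be combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. A minimal sketch on a reduced, hypothetical table:

```python
import pandas as pd

people = pd.DataFrame({"age": [60, 65, 24], "IQ": [105, 26, 115]},
                      index=["Georg", "Donald", "Siegfried"])  # hypothetical data

is_older = people["age"] > 25
is_smart = people["IQ"] > 50

both = people[is_older & is_smart]    # both conditions hold
either = people[is_older | is_smart]  # at least one condition holds
not_older = people[~is_older]         # condition negated

print(both.index.tolist())
print(either.index.tolist())
print(not_older.index.tolist())
```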