# HSE 2022: Mathematical Methods for Data Analysis

## Homework 1

### Attention!
* For tasks where a <ins>text answer</ins> is required, answering in **Russian** is **allowed**.
* If a task asks you to describe something (make conclusions), then a **text answer** is **mandatory** and **is** part of the task
* **Do not** upload the dataset (titanic.csv) to the grading system (we already have it)
* We **only** accept **ipynb** notebooks. If you use Google Colab then you'll have to download the notebook before submitting the homework
* **Do not** use Python loops instead of NumPy vector operations over NumPy vectors: loops significantly decrease performance (see why: https://blog.paperspace.com/numpy-optimization-vectorization-and-broadcasting/) and will be penalized with -0.25 for **every** task. 
Loops are only allowed in part 1 (Tasks 1 - 4). 
* Some tasks contain tests. They only test your solution on a simple example; thus, passing the test does **not** guarantee you the full grade for the task. 
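
To illustrate the rule about loops, here is a minimal sketch comparing a Python loop with the equivalent vectorized NumPy expression. The array `values` and both function names are made up for this illustration and are not part of the assignment.

```python
import numpy as np

def sum_of_squares_loop(values):
    # Element-by-element Python loop: the pattern that is penalized in parts 2 and 3.
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(values):
    # The same computation written as one vectorized NumPy expression.
    return np.sum(values ** 2)

values = np.arange(1.0, 6.0)  # hypothetical data: [1, 2, 3, 4, 5]
print(sum_of_squares_loop(values), sum_of_squares_vectorized(values))
```

Both functions return the same number; the vectorized one delegates the iteration to compiled NumPy code.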

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Python (2 points) 

**Task 1** (0.5 points)<br/> 
Enter nonzero numbers `a`, `r` and `N`. Find the first `N` members of a geometric progression with first member `a` and common ratio `r`, without using the formula for the product.

In [None]:
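def find_product(a, r, N):
    # Builds each member by repeated addition instead of multiplication,
    # so this version assumes that `r` is a nonzero integer.
    arr_product = []
    for _ in range(N):
        arr_product.append(a)
        new = 0
        for _ in range(abs(r)):
            new += a
        # A negative ratio flips the sign of the next member.
        a = new if r > 0 else -new
    return arr_product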

a = 2
r = 3
N = 3
print(find_product(a,r,N))

[2, 6, 18]


**Task 2** (0.5 points) <br/> 
Enter an integer number `N`. Check if it is a palindrome number, that is, a number that reads the same from left to right and from right to left. 

In [None]:
def check_palindrome(N):
    return str(N) == str(N)[::-1]

for N in [3, 81, 111, 113, 810, 2022, 4774, 51315, 611816]:
    print(N, check_palindrome(N))

3 True
81 False
111 True
113 False
810 False
2022 False
4774 True
51315 True
611816 False
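
The string-based check above is the shortest solution. As an alternative sketch, the digits can also be reversed arithmetically, without converting the number to a string. The function name `check_palindrome_arithmetic` is made up for this illustration.

```python
def check_palindrome_arithmetic(N):
    # Negative numbers are treated as non-palindromes, as in the string version
    # (str(-121) reversed is "121-").
    if N < 0:
        return False
    original, reversed_number = N, 0
    while N > 0:
        reversed_number = reversed_number * 10 + N % 10  # append the last digit
        N //= 10                                         # drop the last digit
    return original == reversed_number

print([n for n in [3, 81, 111, 4774, 611816] if check_palindrome_arithmetic(n)])
```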


**Task 3** (0.5 points) <br/> 
Find the first `N` palindrome numbers starting from 1000 (you may use the function from the previous task).

In [None]:
def find_palindromes(N):
    palindromes = []
    number = 1000
    while len(palindromes) < N:
        if check_palindrome(number):
            palindromes.append(number)
        number += 1
    return palindromes

print(find_palindromes(5))
print(find_palindromes(10))
print(find_palindromes(20))

[1001, 1111, 1221, 1331, 1441]
[1001, 1111, 1221, 1331, 1441, 1551, 1661, 1771, 1881, 1991]
[1001, 1111, 1221, 1331, 1441, 1551, 1661, 1771, 1881, 1991, 2002, 2112, 2222, 2332, 2442, 2552, 2662, 2772, 2882, 2992]


**Task 4** (0.5 points) <br/> 
There are numbers: `a`, `b`, `c`. Without using functions `min`, `max` and other functions, find the minimum number.

In [None]:
from random import randint
def find_min(a, b, c):
    if a <= b and a <= c:
        return a
    if b <= c:
        return b
    return c

for i in range(10):
    
    a = randint(-100, 100)
    b = randint(-100, 100)
    c = randint(-100, 100)
    
    print(a, b, c, '\tMinimum:', find_min(a, b, c))

-85 -19 66 	Minimum: -85
23 -16 -68 	Minimum: -68
25 61 46 	Minimum: 25
-4 63 -17 	Minimum: -17
63 39 79 	Minimum: 39
84 72 13 	Minimum: 13
-97 14 34 	Minimum: -97
-70 -87 66 	Minimum: -87
-18 -14 90 	Minimum: -18
42 19 -40 	Minimum: -40


# 2. Numpy (4 points)

**Task 1** (0.5 points) <br/>
Create a random array (`np.random.rand()`) of length 17 whose elements sum to 6.

In [None]:
my_array = np.random.rand(17)
my_array *= 6 / np.sum(my_array)
print(f'Length: {len(my_array)}')
print(f'Sum of elements: {np.sum(my_array)}')

Length: 17
Sum of elements: 5.999999999999997


**Task 2** (0.5 points) <br/>
Create two random arrays $a$ and $b$ with the same length. 

Calculate the following distances between the arrays **without using special functions**. You may only use basic NumPy operations (`np.linalg.*` and other high-level ones are **prohibited**):

* Manhattan Distance
$$ d(a, b) = \sum_i |a_i - b_i| $$
* Euclidean Distance
$$ d(a, b) = \sqrt{\sum_i (a_i - b_i)^2} $$
* Chebyshev Distance
$$ d(a, b) = \max_i |a_i - b_i| $$
* Cosine Distance
$$ d(a, b) = 1 - \frac{a^\top b}{||a||_2\cdot||b||_2} $$


In [None]:
def calculate_manhattan(a, b):    
    return np.sum(np.abs(a - b))

def calculate_euclidean(a, b):    
    return np.sqrt(np.sum(np.power(a - b, 2)))

def calculate_chebyshev(a, b):
    return np.max(np.abs(a - b))

def calculate_cosine(a, b):    
    return 1 - np.dot(a, b) / (double_abs(a) * double_abs(b))

def double_abs(a):
    return np.sqrt(np.sum(np.power(a, 2)))

In [None]:
a = np.random.rand(10)
b = np.random.rand(10)
print(f'Manhattan distance: {calculate_manhattan(a, b)}')
print(f'Euclidean distance: {calculate_euclidean(a, b)}')
print(f'Chebyshev distance: {calculate_chebyshev(a, b)}')
print(f'Cosine distance: {calculate_cosine(a, b)}')

Manhattan distance: 4.124815606464514
Euclidean distance: 1.4794866024438351
Chebyshev distance: 0.9055952936470761
Cosine distance: 0.37167513572155575
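
As a sanity check, the four formulas can be evaluated on two small vectors and compared with `np.linalg.norm`. This is only a cross-check: `np.linalg` is prohibited in the solution itself, and the vectors below are arbitrary values chosen for the illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])
diff = a - b  # [-3, 2, 0]

manhattan = np.sum(np.abs(diff))        # 3 + 2 + 0
euclidean = np.sqrt(np.sum(diff ** 2))  # sqrt(9 + 4 + 0)
chebyshev = np.max(np.abs(diff))        # largest absolute difference
cosine = 1 - np.sum(a * b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))

# Cross-check against the library norms (allowed here only as a test).
assert np.isclose(manhattan, np.linalg.norm(diff, ord=1))
assert np.isclose(euclidean, np.linalg.norm(diff))
assert np.isclose(chebyshev, np.linalg.norm(diff, ord=np.inf))
assert np.isclose(cosine, 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(manhattan, euclidean, chebyshev, cosine)
```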


**Task 3** (0.5 points) <br/>
Create a random array (`np.random.randint()`) with length of 76. Transform the array so that 
* Maximum element(s) value is -1
* Minimum element(s) value is -4
* Other values are in interval (-4; -1) with keeping the order

In [None]:
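def transform(array):
    # Work on a float copy so that the input is not modified and integer arrays
    # (e.g. from np.random.randint) can be divided in place.
    transformed_array = np.array(array, dtype=float)
    transformed_array -= np.min(transformed_array)   # minimum becomes 0
    transformed_array /= np.max(transformed_array)   # maximum becomes 1
    transformed_array *= 3                           # range [0, 3]
    transformed_array -= 4                           # range [-4, -1]
    return transformed_array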

In [None]:
my_array = np.random.rand(76)
my_array = transform(my_array)
print(f'Min: {np.min(my_array)}')
print(f'Max: {np.max(my_array)}')
print('Array:')
print(my_array)

Min: -4.0
Max: -1.0
Array:
[-2.18646721 -2.48779018 -2.25384466 -2.21290772 -3.44165685 -1.43014116
 -2.96077293 -4.         -3.97276712 -2.7606755  -2.98893385 -2.21559847
 -1.30286186 -1.05565004 -1.5884619  -3.95015616 -2.68381072 -2.85823972
 -1.74078918 -2.66199974 -2.00477448 -2.42641279 -2.15574247 -1.87296148
 -3.23835846 -3.45056248 -2.38014347 -1.93932369 -3.65167789 -2.24971755
 -1.03543313 -3.93333624 -2.6253491  -2.60546579 -3.18788198 -3.74247165
 -2.71857194 -1.71326115 -3.57557606 -2.32341127 -2.90132769 -2.87375018
 -1.30371194 -1.80670834 -1.41412341 -2.24630531 -1.94298715 -3.31064062
 -3.92002819 -1.         -3.03997939 -3.36010892 -1.65055207 -3.41854866
 -2.10672993 -3.2957267  -3.70235209 -2.52614266 -3.54830772 -2.38547207
 -2.86829185 -1.59595301 -3.44634618 -3.49488169 -3.69745414 -1.8310006
 -3.14231245 -1.34351757 -2.49051736 -3.69156079 -3.04816506 -3.32167213
 -2.56261408 -1.97801532 -2.01923459 -3.78893952]
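
The task statement names `np.random.randint()`, while the cell above draws from `np.random.rand()`. A minimal sketch of the same min-max scaling, $x' = -4 + 3 \cdot \frac{x - \min}{\max - \min}$, on an integer array is shown below. The bounds 0 and 100 and the fixed seed are arbitrary choices for this illustration, and the sketch assumes the array is not constant (otherwise the denominator is zero).

```python
import numpy as np

np.random.seed(0)  # fixed seed only to make the sketch reproducible
int_array = np.random.randint(0, 100, size=76)  # bounds are an arbitrary choice

lo, hi = int_array.min(), int_array.max()
# Min-max scaling to [-4, -1]; true division turns the result into floats.
scaled = -4 + 3 * (int_array - lo) / (hi - lo)

print(scaled.min(), scaled.max())
```

Because the mapping is increasing, the relative order of the elements is preserved.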


**Task 4** (0.5 points) <br/>
Create an array with shape of $8 \times 5$ with integers from [-7, 43]. Print a column that contains the maximum element of the array.

In [None]:
my_array = np.random.randint(-7, 44, (8, 5))
selected_column = my_array[:, np.unravel_index(np.argmax(my_array), np.shape(my_array))[1]]
print('Shape: ', my_array.shape)
print('Array:')
print(my_array)
print(f'Selected column: {selected_column}')

Shape:  (8, 5)
Array:
[[12  3  0 -5 29]
 [19 22 24 15  8]
 [ 4  0 22 18 13]
 [28  1 36 16 11]
 [32 36 20 -3 17]
 [-6 12 -6 29  2]
 [30 30 39 29 -7]
 [39  6  0 21 12]]
Selected column: [ 0 24 22 36 20 -6 39  0]


**Task 5** (0.5 points) <br/>

Replace all missing values in the following array with median.

In [None]:
arr = np.random.rand(10)
idx = np.random.randint(0, 10, 4)
arr[idx] = np.nan

print('Array:')
print(arr)

Array:
[0.1665686         nan 0.00429075 0.30539656 0.81266437 0.22367444
        nan 0.71798281        nan        nan]


In [None]:
def replace_missing(arr):
    ans = np.copy(arr)
    missing = np.isnan(ans)
    # The median of the non-missing values replaces every NaN.
    ans[missing] = np.median(ans[~missing])
    return ans

In [None]:
arr = replace_missing(arr)
print('Array with no missing values:')
print(arr)

Array with no missing values:
[0.1665686  0.2645355  0.00429075 0.30539656 0.81266437 0.22367444
 0.2645355  0.71798281 0.2645355  0.2645355 ]


**Task 6** (0.5 points) <br/>
Create a function which takes an image ```X``` (3D array of the shape (n, m, 3)) as an input and returns the mean for all 3 channels (a vector of shape 3).

In [None]:
def mean_channel(X):
    # Average over the two spatial axes, leaving one mean per channel.
    return np.mean(X, axis=(0, 1))

In [None]:
n = 19
m = 23
X = np.random.randint(-11, 8, size=(n, m, 3))
print(f'Vector of means: {mean_channel(X)}')

Vector of means: [-2.18535469 -2.37070938 -1.48741419]


**Task 7** (1 point) <br/>
Create a function which takes a 3D matrix ```X``` as an input and returns all its unique vertical layers (slices `X[:, j, :]` along axis 1).

Sample input:

     ([[[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],

       [[4, 5, 6],
        [4, 5, 7],
        [4, 5, 6]],

       [[7, 8, 9],
        [7, 8, 9],
        [7, 8, 9]]])
        
Sample output:

     ([[[1, 2, 3],
        [1, 2, 3]],

       [[4, 5, 6],
        [4, 5, 7]],

       [[7, 8, 9],
        [7, 8, 9]]])

In [None]:
def get_unique_columns(X):
    return np.unique(X, axis = 1)
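
As a quick check, the sample input from the task statement can be passed through `np.unique` with `axis=1`. The variable names below are made up for this illustration.

```python
import numpy as np

sample = np.array([[[1, 2, 3], [1, 2, 3], [1, 2, 3]],
                   [[4, 5, 6], [4, 5, 7], [4, 5, 6]],
                   [[7, 8, 9], [7, 8, 9], [7, 8, 9]]])

# Slices sample[:, 0, :] and sample[:, 2, :] coincide, so only one of them is kept.
unique_layers = np.unique(sample, axis=1)
print(unique_layers.shape)
print(unique_layers)
```

Note that `np.unique` also sorts the remaining slices, so their order may differ from the order in the input.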

In [None]:
X =  np.random.randint(4, 6, size=(n, 3, 3))
print('Matrix:')
print(X)
print('Unique columns:')
get_unique_columns(X)

Matrix:
[[[5 4 5]
  [5 4 5]
  [5 5 4]]

 [[5 4 4]
  [5 4 5]
  [5 4 5]]

 [[5 5 5]
  [5 5 5]
  [5 4 5]]

 [[5 4 4]
  [5 5 4]
  [4 5 5]]

 [[5 5 5]
  [4 4 5]
  [5 4 5]]

 [[4 4 4]
  [4 5 4]
  [5 4 5]]

 [[5 5 4]
  [4 5 4]
  [5 4 5]]

 [[4 4 5]
  [4 5 5]
  [4 4 5]]

 [[5 4 5]
  [5 4 4]
  [5 4 5]]

 [[5 5 4]
  [4 4 4]
  [5 4 5]]

 [[4 4 4]
  [5 5 4]
  [5 5 5]]

 [[5 5 4]
  [4 4 4]
  [4 4 4]]

 [[4 4 5]
  [4 4 4]
  [4 5 5]]

 [[5 4 5]
  [4 4 5]
  [4 4 5]]

 [[4 4 4]
  [5 5 5]
  [5 5 5]]

 [[5 4 4]
  [4 4 4]
  [4 4 4]]

 [[4 5 5]
  [5 5 5]
  [4 5 5]]

 [[4 4 5]
  [4 5 4]
  [4 4 4]]

 [[4 4 4]
  [4 5 5]
  [5 4 4]]]
Unique columns:


array([[[5, 4, 5],
        [5, 4, 5],
        [5, 5, 4]],

       [[5, 4, 4],
        [5, 4, 5],
        [5, 4, 5]],

       [[5, 5, 5],
        [5, 5, 5],
        [5, 4, 5]],

       [[5, 4, 4],
        [5, 5, 4],
        [4, 5, 5]],

       [[5, 5, 5],
        [4, 4, 5],
        [5, 4, 5]],

       [[4, 4, 4],
        [4, 5, 4],
        [5, 4, 5]],

       [[5, 5, 4],
        [4, 5, 4],
        [5, 4, 5]],

       [[4, 4, 5],
        [4, 5, 5],
        [4, 4, 5]],

       [[5, 4, 5],
        [5, 4, 4],
        [5, 4, 5]],

       [[5, 5, 4],
        [4, 4, 4],
        [5, 4, 5]],

       [[4, 4, 4],
        [5, 5, 4],
        [5, 5, 5]],

       [[5, 5, 4],
        [4, 4, 4],
        [4, 4, 4]],

       [[4, 4, 5],
        [4, 4, 4],
        [4, 5, 5]],

       [[5, 4, 5],
        [4, 4, 5],
        [4, 4, 5]],

       [[4, 4, 4],
        [5, 5, 5],
        [5, 5, 5]],

       [[5, 4, 4],
        [4, 4, 4],
        [4, 4, 4]],

       [[4, 5, 5],
        [5, 5, 5],
        [4, 5, 5]]

# 3. Pandas & Visualization (4 points)


You are going to work with the *Titanic* dataset, which contains information about the passengers of the Titanic:
- **Survived** - 1 = survived, 0 = died; **Target variable**
- **pclass** - passenger's class;
- **sex** - passenger's sex
- **Age** - passenger's age in years
- **sibsp**    - number of the passenger's siblings and spouses aboard
- **parch**    - number of the passenger's parents and children aboard
- **ticket** - ticket number    
- **fare** - ticket price    
- **cabin** - cabin number
- **embarked** - port of embarkation; C = Cherbourg, Q = Queenstown, S = Southampton

**Note** for all visualizations use matplotlib or seaborn but NOT plotly! Plotly's graphics sometimes vanish after saving. In this case the task won't be graded.

**Note** support all your answers with necessary code, computations, visualization, and explanation. Answers without code and explanation won't be graded.

**Task 0** (0 points) <br/>
Load the dataset and print the first 5 rows.

In [None]:
dataset = pd.read_csv("titanic.csv", index_col = 0)
dataset.head(5)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


**Task 1** (1 points) <br/>
Answer the following questions:
    
    * Are there any missing values? In what columns?
    * What is the percentage of survived passengers? Are the classes balanced?
    * Were there more males or females?
    * What was the least popular port of embarkation?
    * How many classes (pclass) were there on the Titanic?
    * What is the overall average ticket fare? And for every class?
Please write not only the answers but also the code that proves them.

In [None]:
# Are there any missing values? In what columns?
# Yes: the Age, Cabin, and Embarked columns contain null values

dataset.isna().sum()

Survived      0
Pclass        0
Name          0
Sex           0
Age         177
SibSp         0
Parch         0
Ticket        0
Fare          0
Cabin       687
Embarked      2
dtype: int64
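
Absolute counts are easier to interpret next to the share of missing values per column. Below is a minimal sketch on a small hypothetical frame (the `toy` data is invented for the illustration and is not taken from titanic.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.93, 8.05],
})

# isna() gives booleans; their mean is the fraction of missing values per column.
missing_share = toy.isna().mean() * 100
print(missing_share[missing_share > 0])
```

The same two lines applied to `dataset` would report the percentage of missing values in each column.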

In [None]:
# What is the percentage of survived passengers? Are the classes balanced?
# Overall, about 38% of passengers survived, so the target classes are imbalanced (38% vs. 62%)
# Survival is not balanced across passenger classes either:
# for example, about 63% of first-class passengers survived, but only 24% of third-class passengers

def class_statistics(data, text):
    print(f"{text}: {len(data[data['Survived'] == 1]) / data.shape[0] * 100}%")

class_statistics(dataset, "Overall")
class_statistics(dataset[dataset["Pclass"] == 1], "First class")
class_statistics(dataset[dataset["Pclass"] == 2], "Second class")
class_statistics(dataset[dataset["Pclass"] == 3], "Third class")

Overall: 38.38383838383838%
First class: 62.96296296296296%
Second class: 47.28260869565217%
Third class: 24.236252545824847%


In [None]:
# Were there more males or females?
# There were more males

print(len(dataset[dataset["Sex"] == "male"]))
print(len(dataset[dataset["Sex"] == "female"]))

577
314


In [None]:
# What was the least popular port of embarkation?
# The least popular port was Queenstown

print(len(dataset[dataset["Embarked"] == "C"]))
print(len(dataset[dataset["Embarked"] == "Q"]))
print(len(dataset[dataset["Embarked"] == "S"]))

168
77
644
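
The three separate counts can also be obtained in one call with `value_counts`. A minimal sketch on a hypothetical series (the values in `ports` are invented for the illustration):

```python
import pandas as pd

# Hypothetical port codes, including one missing value.
ports = pd.Series(["S", "C", "S", "Q", None, "S", "C"], name="Embarked")

# dropna=False keeps missing values as their own row in the counts.
counts = ports.value_counts(dropna=False)
print(counts)

# Without dropna=False the missing values are ignored, so idxmin returns a real port.
least_popular = ports.value_counts().idxmin()
print(least_popular)
```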


In [None]:
# How many classes (pclass) were there on the Titanic?
# There were 3 classes in total

print(len(dataset["Pclass"].unique()))

3


In [None]:
# What is the overall average ticket fare? And for every class?
# Overall average fare: 32.20
# First class: 84.15
# Second class: 20.66
# Third class: 13.68

def average_fare(data, text):
    print(f"{text} {np.sum(data['Fare']) / data.shape[0]}")

average_fare(dataset, "Overall")
average_fare(dataset[dataset["Pclass"] == 1], "First class")
average_fare(dataset[dataset["Pclass"] == 2], "Second class")
average_fare(dataset[dataset["Pclass"] == 3], "Third class")

Overall 32.204207968574636
First class 84.1546875
Second class 20.662183152173913
Third class 13.675550101832993
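
Per-class statistics like the ones above can also be computed in a single `groupby` call instead of filtering the frame once per class. A minimal sketch on a hypothetical frame (the `toy` values are invented for the illustration):

```python
import pandas as pd

# Hypothetical miniature frame with the columns used above.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 1, 0],
    "Fare": [80.0, 90.0, 20.0, 8.0, 7.0, 9.0],
})

# One row per class: the mean of Survived is the survival rate, the mean of Fare is the average fare.
per_class = toy.groupby("Pclass")[["Survived", "Fare"]].mean()
print(per_class)
```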


**Task 2** (0.5 points) <br/>
Visualize age distribution (you may use a histogram, for example). 

    * What are the minimum and maximum ages of the passengers? Visualize them on the plot
    * What is the mean age? And among males and females separately? Visualize it on the plot
    * Make conclusions about what you see on the plot

In [None]:
# What are the minimum and maximum ages of the passengers? Visualize them on the plot
# Minimum age: 0.42, maximum age: 80

dataset["Age"].hist(bins = 35)
plt.axvline(x = np.min(dataset["Age"]), color = 'g')
plt.axvline(x = np.max(dataset["Age"]), color = 'g')
plt.show()
print(np.min(dataset["Age"]), np.max(dataset["Age"]))

In [None]:
# What is the mean age? And among males and females separately? Visualize it on the plot
# Mean age = 29.70 (missing ages are skipped)
# Mean age among males = 30.73
# Mean age among females = 27.92

mean_age = dataset["Age"].mean()
mean_age_male = dataset.loc[dataset["Sex"] == "male", "Age"].mean()
mean_age_female = dataset.loc[dataset["Sex"] == "female", "Age"].mean()

dataset["Age"].hist(bins = 35)
plt.axvline(x = mean_age, color = 'g', label = 'all')
plt.axvline(x = mean_age_male, color = 'b', label = 'male')
plt.axvline(x = mean_age_female, color = 'r', label = 'female')
plt.legend()
plt.show()
print(mean_age)
print(mean_age_male)
print(mean_age_female)

In [None]:
# Make conclusions about what you see on the plot

# The plot shows several peaks around ages 20 and 30
# Among children, ages 3-5 are the most common, and there is a dip around age 10
# This suggests that there were many young families with small children on board :(

**Task 3** (1 points) <br/>
Find all the titles of the passengers (for example, *Capt., Mr., Mme.*), which are written in the column Name, and answer the following questions:

    * How many unique titles are there?
    * How many passengers are there with every title?
    * What is the most popular man's title? And woman's title?
    
**Hint** You may select the title from the name as a word which contains a dot.
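
The hint can be sketched with `Series.str.extract` and a regular expression that captures the first word followed by a dot. The three names below are taken from the first rows of the dataset; the variable names are made up for this illustration.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
])

# Capture the letters immediately before the first dot; expand=False returns a Series.
titles_demo = names.str.extract(r"([A-Za-z]+)\.", expand=False)
print(titles_demo.tolist())
```

This simple pattern assumes that the title is the first dotted word in the name, which holds for the rows shown here.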

In [None]:
# How many unique titles are there?
# There are 17 unique titles

titles = dataset["Name"].str.lower().str.extract(r"([a-z]+\.)")
unique = titles[0].unique()
print(len(unique), unique)

17 ['mr.' 'mrs.' 'miss.' 'master.' 'don.' 'rev.' 'dr.' 'mme.' 'ms.' 'major.'
 'lady.' 'sir.' 'mlle.' 'col.' 'capt.' 'countess.' 'jonkheer.']


In [None]:
# How many passengers are there with every title?
# The counts are shown below; the most common title is Mr., with 517 passengers

titles.value_counts()

mr.          517
miss.        182
mrs.         125
master.       40
dr.            7
rev.           6
major.         2
col.           2
mlle.          2
mme.           1
ms.            1
capt.          1
lady.          1
jonkheer.      1
don.           1
countess.      1
sir.           1
dtype: int64

In [None]:
# What is the most popular man's title? And woman's title?
# Among men, the most popular title is Mr.
# Among women, the most popular title is Miss.

def get_most_popular(sex):
    return concated[concated["Sex"] == sex][0].value_counts().index[0]

concated = pd.concat([titles, dataset["Sex"]], axis = 1)
print(get_most_popular("male"))
print(get_most_popular("female"))

mr.
miss.


**Task 4** (0.5 points) <br/>
Is there a correlation between *pclass* and *ticket price*? Calculate the mean price for each class and visualize the price distribution for each class. Make conclusions about what you see on the plot.

Hint: you could make one or several plot types, e.g. box, violin, pair, histogram (see additional notebooks for Seminar 1 "Visualization with Seaborn"). The main point here is to **choose** plots wisely and **make meaningful conclusions**.



In [None]:
# Mean fare by class: 84.15, 20.66, and 13.68
# Correlation: about -0.55

# The pattern is expected and logical: the better the class, the higher the fare
# The box plot shows the same thing

def mean_by_class(number):
    print(f"{number} {np.mean(dataset[dataset['Pclass'] == number]['Fare'])}")

print(dataset.Pclass.corr(dataset.Fare))
mean_by_class(1)
mean_by_class(2)
mean_by_class(3)

# https://seaborn.pydata.org/examples/grouped_boxplot.html
sns.boxplot(x = dataset.Pclass, y = dataset.Fare)
plt.show()

**Task 5** (0.5 points) <br/>
The same questions as in Task 4, but for the relationship between *embarked* and *ticket price*.

In [None]:
# Mean fare by port: about 60 (Cherbourg), 13 (Queenstown), and 27 (Southampton)
# This raises the question of what causes such a spread in fares
# One possible explanation: tickets for ports that the ship reaches earlier cost more,
# because those passengers spend more time on board
# Another possibility: a larger share of higher-class tickets was sold at one port than at another

def mean_by_embark(town):
    print(f"{town} {np.mean(dataset[dataset['Embarked']== town[0]]['Fare'])}")

mean_by_embark("Cherbourg")
mean_by_embark("Queenstown")
mean_by_embark("Southampton")

# https://seaborn.pydata.org/examples/grouped_boxplot.html
sns.boxplot(x = dataset.Embarked, y = dataset.Fare)
plt.show()
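
One way to examine the second hypothesis (a different mix of passenger classes at each port) is a normalized cross-tabulation of port against class. A minimal sketch on a hypothetical frame (the `toy` values are invented for the illustration and say nothing about the real data):

```python
import pandas as pd

# Hypothetical port/class pairs.
toy = pd.DataFrame({
    "Embarked": ["C", "C", "C", "Q", "Q", "S", "S", "S"],
    "Pclass":   [1,   1,   3,   3,   3,   1,   2,   3],
})

# normalize="index" turns each row into the share of every class at that port.
class_mix = pd.crosstab(toy["Embarked"], toy["Pclass"], normalize="index")
print(class_mix)
```

Applied to `dataset`, a table like this would show whether the port with the highest mean fare also has the largest share of first-class passengers.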

**Task 6** (0.5 points) <br/>
Visualize age distribution for survived and not survived passengers separately and calculate the mean age for each class. Are they different? Provide the same visualization for males and females separately. Make conclusions about what you see on the plots

In [None]:
# The mean ages of passengers who died and who survived are close: 30 and 28 years
# Mean ages of men who died and men who survived: 31 and 27 years
# Mean ages of women who died and women who survived: 25 and 29 years

# Most middle-aged passengers did not survive; at the same time,
# about twice as many middle-aged women survived as middle-aged men

# Among children, survivors outnumber those who died

# Among adults, the numbers of survivors and deaths are roughly equal

def draw_dataset(data, text1, text2):
    print(text1, np.mean(data[data["Survived"] == 0]["Age"]))
    data[data["Survived"] == 0]["Age"].hist(bins = 10, range=(0, 80))
    plt.show()

    print(text2, np.mean(data[data["Survived"] == 1]["Age"]))
    data[data["Survived"] == 1]["Age"].hist(bins = 10, range=(0, 80))
    plt.show()
    print("\n\n")

draw_dataset(dataset, "Did not survive", "Survived")
draw_dataset(dataset[dataset["Sex"] == "male"], "Men who did not survive", "Men who survived")
draw_dataset(dataset[dataset["Sex"] == "female"], "Women who did not survive", "Women who survived")
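
The mean ages printed by the calls above can also be collected into one small table with `groupby`, which makes the comparison between groups easier to read. A minimal sketch on a hypothetical frame (the `toy` values are invented for the illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; one age is missing on purpose.
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Survived": [0, 0, 1, 0, 1, 1],
    "Age": [30.0, np.nan, 25.0, 20.0, 28.0, 32.0],
})

# Rows: sex; columns: survival flag; values: mean age (NaN ages are skipped).
mean_age = toy.groupby(["Sex", "Survived"])["Age"].mean().unstack("Survived")
print(mean_age)
```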