## Exercise 1.02: Indexing, Slicing, Splitting, and Iterating

Our client wants to prove that our dataset is nicely distributed around the mean value of 100.   
They asked us to run some tests on several subsections of it to make sure they won't get an unrepresentative section of our data.

Look at the mean value of each subtask.

#### Loading the dataset

In [None]:
# importing the necessary dependencies
import numpy as np

In [None]:
# loading the Dataset
dataset = np.genfromtxt('../../Datasets/normal_distribution_splittable.csv', delimiter=',')

---

#### Indexing

Since we need several rows of our dataset to complete the given task, we have to use indexing to get the right rows.   
To recap, index: 
- the second row 
- the last row
- the first value of the first row
- the last value of the second-to-last row

In [None]:
# indexing the second row of the dataset (2nd row)
second_row = dataset[1]

np.mean(second_row)

In [None]:
# indexing the last element of the dataset (last row)
last_row = dataset[-1]

np.mean(last_row)

In [None]:
# indexing the first value of the first row (1st row, 1st value)
first_val_first_row = dataset[0][0]

np.mean(first_val_first_row)

In [None]:
# indexing the last value of the second to last row (we want to use the combined access syntax here) 
last_val_second_last_row = dataset[-2, -1]

np.mean(last_val_second_last_row)
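
The same four indexing patterns can be tried without loading the CSV. This is a minimal sketch; the `demo` array is a made-up stand-in, not the exercise dataset.

```python
import numpy as np

# hypothetical 3x4 stand-in for the dataset (values 0..11)
demo = np.arange(12).reshape(3, 4)

second_row = demo[1]       # row at index 1
last_row = demo[-1]        # negative indices count from the end
chained = demo[0][0]       # index the row first, then the value
combined = demo[-2, -1]    # row and column in a single lookup

print(second_row, last_row, chained, combined)
```

Chained access (`demo[0][0]`) and combined access (`demo[0, 0]`) return the same value; the combined form simply does it in one lookup.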

---

#### Slicing

Besides single rows and values, we also need some subsets of the dataset.   
Use slicing for:
- a 2x2 slice starting at the second row and second element, up to (but not including) the 4th element in the 4th row
- every other element of the 5th row
- the content of the last row in reversed order

In [None]:
# slicing an intersection of 4 elements (2x2) of the second and third rows and the second and third columns
subsection_2x2 = dataset[1:3, 1:3]

np.mean(subsection_2x2)

##### Why is it not a problem if such a small subsection deviates more strongly from 100?

Several smaller values can cluster in such a small subsection, pulling its mean well below 100 (several larger values can just as well push it above).   
If we make our subsection larger, we have a higher chance of getting a more representative view of our data.
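
The effect of subsection size can be illustrated with simulated data. This is only a sketch: the mean of 100 matches the exercise, while the standard deviation of 10, the sample sizes, and the seed are assumptions chosen for the demonstration.

```python
import numpy as np

# hypothetical data: 10,000 draws around 100 with an assumed std of 10
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=100, scale=10, size=10_000)

# means of 100 random subsections with 4 elements vs. 400 elements
small_means = [rng.choice(values, size=4, replace=False).mean() for _ in range(100)]
large_means = [rng.choice(values, size=400, replace=False).mean() for _ in range(100)]

# the means of the small subsections scatter much more widely around 100
print("spread with 4 elements:  ", np.std(small_means))
print("spread with 400 elements:", np.std(large_means))
```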

In [None]:
# selecting every second element of the fifth row 
every_other_elem = dataset[4, ::2]

np.mean(every_other_elem)

In [None]:
# reversing the entry order, selecting the last row in reversed order
reversed_last_row = dataset[-1, ::-1]

np.mean(reversed_last_row)
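
The three slices are easier to verify on an array whose values are known in advance. A minimal sketch, assuming a made-up 5x6 `demo` array in place of the real dataset:

```python
import numpy as np

# hypothetical 5x6 stand-in for the dataset (values 0..29)
demo = np.arange(30).reshape(5, 6)

block_2x2 = demo[1:3, 1:3]       # rows 1-2 and columns 1-2; the stop index is excluded
every_other = demo[4, ::2]       # fifth row, taking every second element
reversed_last = demo[-1, ::-1]   # last row, a negative step walks backwards

print(block_2x2)
print(every_other)
print(reversed_last)
```

Note that `1:3` selects indices 1 and 2 only, which is why the slice is 2x2 and ends before the 4th row and 4th element.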

---

#### Splitting

Our client's team only wants to use a small subset of the given dataset.   
Therefore we need to first split it horizontally (by columns) into 3 equal pieces and then give them the top half of the first piece.   
They sent us this drawing to show us what they need:
```
1, 2, 3, 4, 5, 6          1, 2     3, 4    5, 6          1, 2  
3, 2, 1, 5, 4, 6    =>    3, 2     1, 5    4, 6    =>    3, 2    =>    1, 2
5, 3, 1, 2, 4, 3          5, 3     1, 2    4, 3                        3, 2
1, 2, 2, 4, 1, 5          1, 2     2, 4    1, 5          5, 3
                                                         1, 2
```

> **Note:**   
We are using a very small dataset here, but imagine you have a huge amount of data and only want to look at a small subset of it to tweak your visualizations.

In [None]:
# splitting up our dataset horizontally into 3 equal parts (cuts at one third and two thirds of the columns)
hor_splits = np.hsplit(dataset, 3)

In [None]:
# splitting up the first third vertically into 2 equal parts (top and bottom half of the rows)
ver_splits = np.vsplit(hor_splits[0], 2)

In [None]:
# requested subsection of our dataset which has only half the amount of rows and only a third of the columns
print("Dataset", dataset.shape)
print("Subset", ver_splits[0].shape)
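
The two-step split can be checked against the client's drawing. The sketch below rebuilds the 4x6 array from the drawing; the variable names are chosen here for illustration.

```python
import numpy as np

# the 4x6 array from the client's drawing
drawing = np.array([
    [1, 2, 3, 4, 5, 6],
    [3, 2, 1, 5, 4, 6],
    [5, 3, 1, 2, 4, 3],
    [1, 2, 2, 4, 1, 5],
])

thirds = np.hsplit(drawing, 3)     # three 4x2 arrays (split along the columns)
halves = np.vsplit(thirds[0], 2)   # two 2x2 arrays (split along the rows)
subset = halves[0]                 # top half of the first third

print(subset)
```

Passing an integer asks for that many equal parts, so the number of columns (or rows) must be divisible by it; otherwise NumPy raises a `ValueError`. When an uneven split is acceptable, `np.array_split` is one alternative.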

---

#### Iterating

Once you have sent over the dataset, they tell you that they also need a way to iterate over the whole dataset element by element, as if it were a one-dimensional list.   
However, they also want to know the position of each element in the dataset itself.

They send you this piece of code and tell you that it's not working as intended.   
Come up with the right solution for their needs using `np.ndenumerate`.

In [None]:
# iterating over the whole dataset (each value in each row); curr_index is only a flat counter, not a position
curr_index = 0
for x in np.nditer(dataset):
    print(x, curr_index)
    curr_index += 1

In [None]:
# iterating over whole dataset with indices matching the position in the dataset
for index, value in np.ndenumerate(dataset):
    print(index, value)
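
The difference between the flat counter and the position yielded by `np.ndenumerate` is easier to see on a tiny array. A minimal sketch, assuming a made-up 2x3 `demo` array; the `np.unravel_index` line is an optional extra showing how a flat counter could be mapped back to a position.

```python
import numpy as np

# hypothetical 2x3 stand-in for the dataset
demo = np.array([[10, 20, 30],
                 [40, 50, 60]])

# ndenumerate yields ((row, column), value) pairs
positions = [(index, int(value)) for index, value in np.ndenumerate(demo)]

# a flat counter can be converted to (row, column) with unravel_index
flat_to_position = [np.unravel_index(i, demo.shape) for i in range(demo.size)]

print(positions[4])          # position and value of the fifth element
print(flat_to_position[4])   # the same position, derived from flat index 4
```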