
# <code style="background:yellow;color:black">2D ARRAYS</code>

In the first lecture, we covered:
* the differences between Python lists and NumPy arrays
* why NumPy arrays are faster than Python lists
* various properties of NumPy arrays
* a direct speed comparison between the two
* how to create NumPy arrays
* operations like indexing and slicing on them
* fancy indexing on NumPy arrays, including selection with a Boolean mask (masking)
* the NPS case study

**Today we will work primarily with 2D arrays, because real-world data usually arrives as a table of rows and columns.
A table is nothing but a 2D array, that is, a 2-dimensional representation of data.**

In [None]:
import numpy as np

In [None]:
a = np.array(range(16))

In [None]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [None]:
a.shape

(16,)

In [None]:
a.ndim

1

**<code style="background:yellow;color:black">reshape()-</code>** Let's suppose we want to convert a 1D array to a 2D array. We can use the array method <mark>reshape()</mark>, which is also available as the function np.reshape().

**The reshape() method in Numpy is used to change the shape or dimensions of a NumPy array while keeping the same elements. It allows us to reorganize the elements of an array into a new shape, provided that the total number of elements remains the same.**

The reshape() method takes the new shape that we want for our array, given either as separate integers, as in a.reshape(8, 2), or as a single tuple, as in a.reshape((8, 2)). The product of the dimensions in the new shape must equal the total number of elements in the original array. If it doesn't, we get a ValueError, because it's not possible to reshape the array into the specified shape without changing the total number of elements.

**Format of function is: np_array.reshape(num_rows, num_columns)**

Here in the example below we use - a.reshape(8, 2) - this converts 1D array into 2D array with 8 rows and 2 cols.
We can always play with the structure of the array once we have a numpy array.

In [None]:
a.reshape(8, 2)

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

In [None]:
a.reshape(4, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

**In the code below, our array has 16 elements, which cannot fill 20 slots. So always make sure that the product of rows and columns equals the total number of elements in the original array.**

In [None]:
a.reshape(4, 5)

ValueError: cannot reshape array of size 16 into shape (4,5)

**The reshape() method is actually very powerful. If we fix the number of rows at 8, the number of columns can only be 2 when we have 16 values in our data; there is no other option.**

So, suppose the data has millions of values and we don't want to work out the number of columns ourselves. We can simply pass -1 as the second argument. -1 tells NumPy to calculate that dimension automatically.

-1 can stand in for any one dimension of the new shape, and NumPy calculates it from the size of the array and the other specified dimensions. This is useful when you want to reshape an array without specifying all dimensions explicitly.

So, if we give the number of rows and pass -1 for the columns, NumPy calculates the number of columns for us.

In [None]:
a.reshape(8, -1)

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

We can do this for rows as well. NumPy works out that since there are 16 elements in total and we want to store them in 4 columns,
there must be 4 rows to make the product of rows and columns 16.

In [None]:
a.reshape(-1, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

The example code below is ambiguous, because multiple shapes satisfy the condition:
4 x 4, 2 x 8 and 8 x 2, among others, all work. NumPy cannot pick one of them at random and return it.

So, for NumPy to compute one of the arguments, you MUST provide the other one.
You can only leave one dimension unknown; the other dimension has to be specified.

In [None]:
a.reshape(-1, -1)

ValueError: can only specify one unknown dimension

In [None]:
a = a.reshape(8, 2)

In [None]:
a

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

**Since a 2D numpy array behaves like a Python list of lists, we can get the number of rows as follows-**

In [None]:
len(a)

8

**We can get the number of columns as below. len(a[0]) = len(a[1]) = len(a[2]), because in a matrix every row contains the same number of columns.**

In [None]:
len(a[0])

2
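The same two counts are also available from the `shape` attribute used earlier on the 1D array. A minimal sketch; the names `grid`, `n_rows` and `n_cols` are illustrative and not part of the original notebook:

```python
import numpy as np

grid = np.arange(16).reshape(8, 2)

# for a 2D array, shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = grid.shape

print(n_rows, n_cols)  # the same values as len(grid) and len(grid[0])
```

Unlike `len(a[0])`, this does not need to index into the first row.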

In [None]:
a = np.arange(12)

In [None]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [None]:
a = a.reshape(3, 4)

In [None]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

### <code style="background:yellow;color:black">METHOD CHAINING-</code>

* Step 1- We created a numpy array consisting of 12 elements as above, by calling the arange() function.
* Step 2- We reshaped that array by calling the reshape() function.

In method chaining, we can do it all together. We can chain both the methods one after the other in a single step.

<mark>Advantage-</mark> The syntax is more compact. These two tasks are often done one after the other, so chaining saves typing and reads cleanly.
Method chaining isn't faster at run time; it just looks tidier.

<mark>OPTIMISATION REFERS TO THE PERFORMANCE OF THE CODE, NOT TO ITS SHAPE AND SIZE.</mark>
That said, code that is clean, readable and quick to write still matters, because most of the time you will be
working in a team, often collaborating on a single Python file, and your teammates need to understand your code easily.

The format of that chaining looks like this code implementation below-

In [None]:
b = np.arange(12).reshape(3, 4)

In [None]:
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
b.shape

(3, 4)

**<mark>T represents the transpose of any matrix in numpy.</mark> The transpose flips a matrix across its main diagonal: rows become columns and columns become rows, sort of like a pivot. It is not a 90 degree rotation, because a rotation would also reverse the order of the rows or columns. Transpose of a matrix is readily available in numpy.**

**In NumPy, the transpose of a matrix can be obtained using the .T attribute, or by using the numpy.transpose() method.** The transpose of a matrix essentially switches its rows and columns. It's a fundamental linear algebra operation that is used for various purposes in mathematics, science, and engineering.

<mark>Why do we need Transpose?</mark> We will do a fitbit case study to see the application of transpose.

* In image processing, transposing a matrix can be used to perform various operations, such as rotating or flipping images.

* In linear regression and related statistical techniques, the transpose is used when calculating coefficients or solving the normal equations.

* If you have data with rows representing variables and columns representing observations, you can transpose it to switch the orientation for better analysis or visualization.

In [None]:
b.T

array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

In [None]:
b.T.shape

(4, 3)
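As a quick check of the two spellings mentioned above, the sketch below compares `.T` with `np.transpose()` and confirms that the element at (i, j) ends up at (j, i). The names `mat`, `same` and `moved` are illustrative only:

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)

# .T and np.transpose() give the same result
same = np.array_equal(mat.T, np.transpose(mat))

# the element at row i, column j of the original sits at row j, column i of the transpose
i, j = 1, 3
moved = mat[i, j] == mat.T[j, i]

print(same, moved, mat.T.shape)
```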

In [None]:
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

The expression b[0][1] gives us the value located in the first row (index 0) and the second column (index 1) of the NumPy array b.

b[0][1] corresponds to the element at the intersection of the first row and the second column, which is 1. This is how you access a specific element in a multi-dimensional array in NumPy.

**<mark>This type of indexing is called "nested indexing" or "double indexing" or "chained indexing".</mark>**

In [None]:
b[0][1]

1

**<mark>Following indexing is called "concise indexing" and it only works with numpy arrays -> NOT WITH PYTHON LISTS.</mark>**

At first this may look confusing rather than useful. Besides the <u>**small advantage of finding diagonal elements easily**</u>, which we discuss further below, it has the following advantages.

* <u>Performance:</u> Concise indexing is often more efficient. With b[0, 1], NumPy accesses the element in one indexing operation, whereas nested indexing first builds an intermediate row and then indexes into it.
* <u>Multi-dimensional Arrays:</u> When working with multi-dimensional arrays (more than two dimensions), concise indexing becomes essential for clarity and ease of use.
* <u>Broadcasting:</u> When several lists of indexes are given inside one pair of brackets, as in the examples further below, NumPy combines them using its broadcasting rules. This is only possible with concise indexing.

In [None]:
b[0, 1]

1
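For a single element the two styles agree, but they stop being interchangeable once slices are involved. A small sketch of that difference, using an illustrative array `arr` that is not part of the original notebook:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

single_chained = arr[0][1]  # arr[0] builds a temporary row, then [1] indexes into it
single_comma = arr[0, 1]    # one indexing operation

col_comma = arr[:, 1]       # column 1 of every row
row_chained = arr[:][1]     # arr[:] is the whole array again, so this is just row 1

print(single_chained, single_comma)
print(col_comma)
print(row_chained)
```

So `arr[:][1]` is not a way to get a column; the comma form is needed for that.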

**Let's suppose we have a numpy array called c like this -**

In [None]:
c = np.arange(9).reshape(3, 3)

In [None]:
c

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

**Let's suppose we have a numpy array called d like this -**

In [None]:
d = np.arange(10)

In [None]:
d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

This is a concept from the previous lecture. We learnt that for a numpy array like "d",
**we can put a list of indexes inside the square brackets, and all of the corresponding elements of the array will be returned.**

This works for a 1D np array. Is the same kind of thing possible for a 2D np array? YES. We will see how in the note below.

In [None]:
d[[6,7,8,1]]

array([6, 7, 8, 1])

In [None]:
c

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

We provide the indexes as two lists, one of row indexes and one of column indexes. In other words, we ask numpy to pick the element at (i, j) for each pair taken from the two lists.

**Inside the square brackets, we provide 2 sets of indexes, one for the rows and one for the columns. Here it returns an array containing the diagonal elements of the "c" array.** It picks the elements at (0,0), (1,1) and (2,2) and returns them as an array.

**<mark>We can essentially pick <u>ANY NUMBER</u> of elements from a 2D array by providing lists of indexes like the ones shown below, unlike b[0,1] above, where we get just one element.</mark>**

Just as this works with 1D arrays, where we provide only column numbers, it also works with 2D arrays, where we provide both row and column numbers.

In [None]:
c[[0,1,2],[0,1,2]]

array([0, 4, 8])

**You can pick any elements; they don't have to be on the diagonal or follow any pattern.**

In [None]:
c[[0,2],[1,1]]

array([1, 7])

**Here the column index 5 does not exist in a 3 x 3 array; every index must be within range. So we get an out of bounds error.**

In [None]:
c[[0,2],[1,5]]

IndexError: index 5 is out of bounds for axis 1 with size 3

**Shape mismatch error: the row list has 3 indexes but the column list has only 2, so they cannot be paired up.**

In [None]:
c[[1,2,0],[1,2]]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,) 
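The error arises because the two lists are paired element by element. If the intent was instead every combination of the listed rows and columns (a sub-grid), `np.ix_` is one way to ask for that. This is an addition to the notebook, sketched on an illustrative array `grid`:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

# np.ix_ builds index arrays that broadcast to a rows x columns grid,
# so the two lists no longer need the same length
block = grid[np.ix_([1, 2, 0], [1, 2])]

print(block)  # rows 1, 2, 0 (in that order), restricted to columns 1 and 2
```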

**Let's suppose we have a numpy array called test like this -**

In [None]:
test = np.arange(64).reshape(8,8)

In [None]:
test

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

Here is what the code below does: len(test) is 8, so we have range(8), which generates the numbers from 0 to 7.

So, it essentially looks like this- test[[0,1,2,3,4,5,6,7], [0,1,2,3,4,5,6,7]].

This picks the elements at (0,0), (1,1), (2,2), (3,3), (4,4) and so on, which returns the diagonal elements.

In [None]:
test[list(range(len(test))),list(range(len(test)))]

array([ 0,  9, 18, 27, 36, 45, 54, 63])
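The same diagonal can be written a little more compactly. The sketch below uses `np.arange` for the index array and, as an alternative not used in the notebook, `np.diag`, which extracts the main diagonal when given a 2D array. The name `board` is illustrative:

```python
import numpy as np

board = np.arange(64).reshape(8, 8)

idx = np.arange(len(board))    # 0, 1, ..., 7 as an array; no list() conversion needed
diag_by_index = board[idx, idx]

diag_builtin = np.diag(board)  # for a 2D input, np.diag returns the main diagonal

print(diag_by_index)
print(diag_builtin)
```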

**<mark>SLICING-</mark>**

Since we talked about indexing in 2D arrays, we will also talk about slicing in 2D arrays, because it is just as important.

Similar to how we can provide indexes for both rows and columns, we can also provide slices for both rows and columns.

**BY DEFAULT, A SINGLE SLICE IS APPLIED TO THE ROWS.**

In [None]:
a = np.arange(12).reshape(3, 4)

In [None]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

The slice a[:2], used in the first example below, gets us the 0th row and the 1st row; the end index "2" isn't included.

The slice is applied to the rows, and we get all the columns.

In general, when we have a 2D array, a slice can mean that we want a subset of the rows with all the columns.
Or a slice can mean that we want a subset of the columns with all the rows. This is well explained in the notes.
A slice can also mean that we want neither all the rows nor all the columns, for example the 1st and 2nd rows but only a few columns:
basically, some rows and some columns.

**<mark>There can be all of the 3 possibilities when it comes to slicing.</mark>**

**The first possibility is very easy: you just provide a single slice, as in the code below-**

In [None]:
a[:2]

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

If we write - a[:] - it returns all the data. As we know, the start and end default to the first and last elements, and
everything in between is included.

**The second possibility is to add a comma after the colon and then a slice for the columns. So, we can say
we want the columns from 1 up to (but not including) 3, as shown below-**

It returns all the rows, with the 1st and 2nd columns only, and NOT the 0th and 3rd columns.

In [None]:
a[:, 1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

**We can have any subset of rows and column elements that we want.**

In [None]:
a[:2, 1:3]

array([[1, 2],
       [5, 6]])

Whenever we work with 2D numpy arrays, which is what we will be doing in the real world 99% of the time, we have all of these added
capabilities, like the indexing and slicing above and more that we will study further, that python lists do not offer in the same way.
Lists do provide indexing and slicing, but numpy has elevated the entire thing; for example, a list of lists does not accept the a[:, 1:3] syntax.

In [None]:
test

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

**The third possibility is adding a step size (jump value), which also works when slicing 2D arrays.**

In [None]:
test[3:7:2, 1:6:3]

array([[25, 28],
       [41, 44]])

In [None]:
a = np.array([1,2,3,4,5])
b = np.array([8,7,6])

**Pick everything after the 3rd element.**

In [None]:
a[3:]

array([4, 5])

**It starts from the end because the step is -2, so "6" and "8" are picked.**

In [None]:
b[::-2]

array([6, 8])

**Take the result of b[::-2] and assign it to the positions selected by a[3:].**

In [None]:
a[3:] = b[::-2]

**So we are replacing the values selected by a[3:], that is 4,5, with the result of b[::-2], that is 6,8, to finally get the array "a".**

In [None]:
a

array([1, 2, 3, 6, 8])

In [None]:
a = np.arange(12).reshape(4,3)

**Fancy indexing with a Boolean mask also works on 2D numpy arrays; the code below returns a 1D array.**

The elements picked by the mask need not form a rectangle, so in general they cannot be arranged back into rows and columns.
NumPy therefore does not try to guess a 2D shape and simply returns the selected elements as a 1D array,
in row-by-row order.

**<mark>The CRUX is that fancy indexing with a mask works on 2D arrays just as it does on 1D arrays. It's just that the result is always a 1D array.</mark>**

In [None]:
a[a < 6]

array([0, 1, 2, 3, 4, 5])

In [None]:
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [None]:
a < 6

array([[ True,  True,  True],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False]])
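The mask itself keeps the 4 x 3 shape even though `a[a < 6]` does not. The sketch below shows two things that follow from this: counting the matches, and keeping the original shape by filling the non-matching positions with a placeholder via `np.where` (covered later in this notebook). The placeholder value -1 and the names are illustrative choices:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)
mask = arr < 6

selected = arr[mask]                  # 1D: only the values where the mask is True
n_selected = mask.sum()               # True counts as 1, so this counts the matches
same_shape = np.where(mask, arr, -1)  # still 4 x 3, with -1 where the mask is False

print(selected.shape, n_selected)
print(same_shape)
```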

## <code style="background:yellow;color:black">Aggregate functions</code>

**Aggregate functions are basically mathematical operations that summarise an array. If we have numpy imported, we won't have to import the math module in 99% of the cases.**

Aggregate functions in Numpy are functions that operate on an array or a portion of an array to compute a single, summary statistic or value. These functions allow us to perform common mathematical and statistical operations on arrays, often summarizing the data in some way. Aggregate functions are particularly useful for data analysis and processing.

Some common aggregate functions in NumPy include:

- numpy.sum(): Computes the sum of all elements in an array or along a specified axis.

- numpy.mean(): Calculates the mean (average) of the elements in an array or along a specified axis.

- numpy.median(): Computes the median (middle value) of the elements in an array.

- numpy.min(): Returns the minimum value in an array.

- numpy.max(): Returns the maximum value in an array.

- numpy.var(): Computes the variance of the elements in an array, measuring the spread or dispersion of the data.

- numpy.std(): Calculates the standard deviation of the elements in an array, which is a measure of the data's dispersion around the mean.

- numpy.prod(): Computes the product of all elements in an array or along a specified axis.

- numpy.cumsum(): Calculates the cumulative sum of elements along a specified axis.

- numpy.cumprod(): Computes the cumulative product of elements along a specified axis.

- numpy.percentile(): Calculates the specified percentile value (e.g., 25th, 50th, or 75th percentile) of the data.

- numpy.histogram(): Generates a histogram of the data, returning the counts and bin edges.

- numpy.unique(): Returns the sorted unique elements in an array, and also their counts when return_counts=True is passed.

- numpy.bincount(): Counts occurrences of non-negative integers in an array, returning a count for each integer.

- numpy.nanmean(), numpy.nanmedian(), and other "nan" functions: These are similar to their non-"nan" counterparts, but they ignore NaN (Not-a-Number) values in the computation.
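A few of the functions listed above that are not demonstrated later in this notebook can be tried on a small array. The data below is made up for illustration and the variable names are not from the original:

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

median = np.median(values)      # middle of the sorted data
spread = np.std(values)         # population standard deviation (ddof=0 by default)
p50 = np.percentile(values, 50) # the 50th percentile is the median

# counts are only returned when return_counts=True is passed
uniq, counts = np.unique(values, return_counts=True)

running_total = np.cumsum(np.array([1, 2, 3, 4]))  # one running sum per element

# nan-aware variant: the NaN entry is skipped instead of spoiling the result
with_gap = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(with_gap)
nan_mean = np.nanmean(with_gap)

print(median, spread, p50)
print(uniq, counts)
print(running_total, plain_mean, nan_mean)
```

Note that `cumsum` returns an array as long as its input, so it is a running summary rather than a single value.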

**But sum, min and max exist in python itself, so why are we studying and using them in numpy? Because of what they lead to: the same functions also work along the rows or columns of a 2D array.**

In [None]:
a = np.arange(1, 11)

In [None]:
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [None]:
np.sum(a)

55

In [None]:
np.mean(a)

5.5

In [None]:
np.min(a)

1

In [None]:
np.max(a)

10

In [None]:
a = np.arange(12).reshape(4, 3)

In [None]:
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

* Implementing row-wise and column-wise sums for a python 2D list takes longer, more cumbersome
custom code.

* Numpy can do that with just one simple line of code. It's very useful because in Excel we frequently create
grand totals of columns or rows, and numpy can compute them directly.

* The code below adds up all the elements.

In [None]:
np.sum(a)

66

**This code sums each column. In my array "a", axis = 0 is the vertical axis. So it adds all elements of column 0, then all elements of column 1 and so on, and returns an array of those sums.**

In [None]:
np.sum(a, axis = 0)

array([18, 22, 26])

**Similarly, I can do a row-wise sum by specifying axis = 1.**

In [None]:
np.sum(a, axis = 1)

array([ 3, 12, 21, 30])

In [None]:
np.mean(a, axis = 0)

array([4.5, 5.5, 6.5])
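To make the comparison with plain Python lists concrete, here is one possible list-based version next to the NumPy one. It is only a sketch; the names are illustrative, and the data mirrors the 4 x 3 array whose sums are printed above:

```python
import numpy as np

rows = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

# plain Python: row sums are easy, column sums need zip(*) to regroup the values
row_sums_py = [sum(r) for r in rows]
col_sums_py = [sum(col) for col in zip(*rows)]

# NumPy: the axis argument names the axis that gets collapsed
arr = np.array(rows)
col_sums_np = arr.sum(axis=0)  # collapse the rows    -> one value per column
row_sums_np = arr.sum(axis=1)  # collapse the columns -> one value per row

print(col_sums_py, col_sums_np)
print(row_sums_py, row_sums_np)
```

A handy way to remember it: the axis you pass is the one that disappears from the shape, so a (4, 3) array summed over axis 0 gives shape (3,).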

In [None]:
m = np.arange(16).reshape(4, 4)

In [None]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

**How to get the sum of all diagonal elements.**

In [None]:
m[[0,1,2,3],[0,1,2,3]].sum()

30
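NumPy also has a dedicated function for this, `np.trace`, which the notebook does not use. A short sketch on the same 4 x 4 layout, with illustrative names:

```python
import numpy as np

mat = np.arange(16).reshape(4, 4)

diag_sum = np.trace(mat)          # sum of the main diagonal in one call
diag_values = np.diag(mat)        # the diagonal itself, if the values are needed too

print(diag_sum, diag_values)
```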

## <code style="background:yellow;color:black">Logical operations</code>

**Logical operations in Numpy are operations that allow us to perform element-wise comparisons between elements in Numpy arrays. These operations return Boolean arrays with True and False values, where each value represents the result of the element-wise comparison.**

**There are 2 basic logical operations that we will study - .any() and .all()**

In [None]:
prices = np.array([50, 45, 25, 20, 35])

* Let's say I have a budget: I can only purchase an item costing up to 30 rupees. The task is to check whether any value in the array is less than or equal to 30. The check returns True if the condition holds for even a single value.

* Usually we would write a loop and compare every value with 30; if we find a value that is less than or equal to 30, we set a flag to True and return it.

* "can_afford" is our flag. The any() method takes the Boolean array produced by the condition, and if it is True for ANY OF THE VALUES, our "can_afford" flag becomes True.

* Whenever we are checking multiple values and a single value satisfying the condition is enough for us to move ahead, WE CAN USE NP.ANY(). It's sort of like an OR operation.

In [None]:
can_afford = np.any(prices <= 30)

In [None]:
can_afford

True
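The loop described in the bullets above can be written out next to the one-line version to see that they agree. This is only a sketch of that comparison; `can_afford_loop` and `can_afford_np` are illustrative names:

```python
import numpy as np

prices = np.array([50, 45, 25, 20, 35])

# loop version: set a flag as soon as one price fits the budget
can_afford_loop = False
for price in prices:
    if price <= 30:
        can_afford_loop = True
        break

# vectorised version: build the Boolean array once, then reduce it with any()
can_afford_np = np.any(prices <= 30)

print(can_afford_loop, can_afford_np)
```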

In [None]:
task_completion = np.array([1,1,1,1,1,1,0])

* The all() method takes the Boolean array produced by the condition, and only if it is True for ALL OF THE VALUES does our "can_go_out_play" flag become True.

* It returns True if all the values satisfy the condition. It's sort of like an AND operation.

In [None]:
can_go_out_play = np.all(task_completion == 1)

In [None]:
can_go_out_play

False

In [None]:
a = np.array([1,4,3,2])
b = np.array([2,2,3,2])
c = np.array([6,4,4,5])

* This is how we can combine multiple conditions on multiple np arrays before calling the all() method.
* Here both conditions must hold at every position for the all() method to return True.

In [None]:
((a <= b) & (b <= c)).all()

False
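When all() returns False, it is often useful to know which positions broke the condition. One way, sketched below with the same three arrays, is to invert the combined mask and ask for the indexes where it is True; `np.flatnonzero` is used here as one option, and the names `in_order` and `failing` are illustrative:

```python
import numpy as np

a = np.array([1, 4, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])

in_order = (a <= b) & (b <= c)       # element-wise AND of the two comparisons
failing = np.flatnonzero(~in_order)  # positions where the combined condition is False

print(in_order)
print(failing)
```

Here only position 1 fails, because a[1] = 4 is greater than b[1] = 2.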

In [None]:
a = np.array([-3,4,27,34,-2, 0, -45,-11,4, 0 ])
a

array([ -3,   4,  27,  34,  -2,   0, -45, -11,   4,   0])

* Setting values for elements selected via fancy indexing with a mask.

* We can pick the elements that meet a condition and change them in the same statement.

* Here, every element greater than 0 is replaced with 1, every element less than 0 is replaced with -1, and 0 stays 0.

In [None]:
a[a > 0] = 1
a[a < 0] = -1

In [None]:
a

array([-1,  1,  1,  1, -1,  0, -1, -1,  1,  0])
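For this particular replacement rule, NumPy has a built-in, `np.sign`, which returns -1, 0 or 1 per element. Unlike the masked assignment above, it returns a new array and leaves the input untouched. A sketch with illustrative names:

```python
import numpy as np

vals = np.array([-3, 4, 27, 34, -2, 0, -45, -11, 4, 0])

signs = np.sign(vals)  # -1, 0 or 1 per element; vals itself is not modified

print(signs)
print(vals)
```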

* Here, if the price of a product is greater than 50, we want to give a 10% discount on it, and then get the updated prices.

* With the **np.where() function, we can build a new array whose elements are chosen based on a given condition. The original array is left unchanged.**

* **Format of np.where() -- np.where(condition, value_if_true, value_if_false)**

In [None]:
prices = np.array([45, 55, 60, 30, 75, 20, 100, 90])

In [None]:
discounted_prices = np.where(prices > 50, prices * 0.9, prices)

In [None]:
discounted_prices

array([45. , 49.5, 54. , 30. , 67.5, 20. , 90. , 81. ])
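Two details of the result above are worth checking: the output is a float array even though the prices were integers, and the input array is not modified. The sketch below also shows the one-argument form of `np.where`, which returns positions instead of values; the names `discounted` and `positions` are illustrative:

```python
import numpy as np

prices = np.array([45, 55, 60, 30, 75, 20, 100, 90])

discounted = np.where(prices > 50, prices * 0.9, prices)

# the result is float because prices * 0.9 is float;
# prices itself keeps its original integer values
print(discounted.dtype, prices.dtype)

# with only a condition, np.where returns the positions where it holds
(positions,) = np.where(prices > 50)
print(positions)
```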

In [None]:
# Just a batchmate's doubt on list slicing

a = [1,2,3,4,5,6,7,8]

In [None]:
a[3:7:2]

[4, 6]

In [None]:
a[1:6:3]

[2, 5]

## <code style="background:yellow;color:black">FITBIT CASE STUDY</code>

In [None]:
!gdown https://drive.google.com/uc?id=1vk1Pu0djiYcrdc85yUXZ_Rqq2oZNcohd

Downloading...
From: https://drive.google.com/uc?id=1vk1Pu0djiYcrdc85yUXZ_Rqq2oZNcohd
To: C:\Users\admin\Desktop\Scaler\Module 5- Python Libraries\My Practice\fit.txt


In [None]:
data = np.loadtxt("fit.txt", dtype = "str")

In [None]:
data[:5]

array([['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
       ['07-10-2017', '6041', 'Sad', '197', '8', 'Inactive'],
       ['08-10-2017', '25', 'Sad', '0', '5', 'Inactive'],
       ['09-10-2017', '5461', 'Sad', '174', '4', 'Inactive'],
       ['10-10-2017', '6915', 'Neutral', '223', '5', 'Active']],
      dtype='<U10')

In [None]:
data.shape

(96, 6)
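To try the loading step without downloading the file, the sketch below feeds `np.loadtxt` an in-memory text buffer. The three rows are copied from the output above; that the real fit.txt separates its columns with whitespace is an assumption, made because the notebook loads it without passing a delimiter:

```python
import io
import numpy as np

# three rows copied from the output above; whitespace-separated by assumption
sample = io.StringIO(
    "06-10-2017 5464 Neutral 181 5 Inactive\n"
    "07-10-2017 6041 Sad 197 8 Inactive\n"
    "08-10-2017 25 Sad 0 5 Inactive\n"
)

# dtype="str" keeps every column as text, since the columns mix dates, numbers and labels
rows = np.loadtxt(sample, dtype="str")

print(rows.shape)
print(rows[:, 1])  # the step counts, still stored as strings at this point
```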

In [None]:
data[0]

array(['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
      dtype='<U10')

In [None]:
# Approach 1

data[:, 0]

array(['06-10-2017', '07-10-2017', '08-10-2017', '09-10-2017',
       '10-10-2017', '11-10-2017', '12-10-2017', '13-10-2017',
       '14-10-2017', '15-10-2017', '16-10-2017', '17-10-2017',
       '18-10-2017', '19-10-2017', '20-10-2017', '21-10-2017',
       '22-10-2017', '23-10-2017', '24-10-2017', '25-10-2017',
       '26-10-2017', '27-10-2017', '28-10-2017', '29-10-2017',
       '30-10-2017', '31-10-2017', '01-11-2017', '02-11-2017',
       '03-11-2017', '04-11-2017', '05-11-2017', '06-11-2017',
       '07-11-2017', '08-11-2017', '09-11-2017', '10-11-2017',
       '11-11-2017', '12-11-2017', '13-11-2017', '14-11-2017',
       '15-11-2017', '16-11-2017', '17-11-2017', '18-11-2017',
       '19-11-2017', '20-11-2017', '21-11-2017', '22-11-2017',
       '23-11-2017', '24-11-2017', '25-11-2017', '26-11-2017',
       '27-11-2017', '28-11-2017', '29-11-2017', '30-11-2017',
       '01-12-2017', '02-12-2017', '03-12-2017', '04-12-2017',
       '05-12-2017', '06-12-2017', '07-12-2017', '08-12

In [None]:
# Approach 2: transpose first, then take the first row

data.T[0]

array(['06-10-2017', '07-10-2017', '08-10-2017', '09-10-2017',
       '10-10-2017', '11-10-2017', '12-10-2017', '13-10-2017',
       '14-10-2017', '15-10-2017', '16-10-2017', '17-10-2017',
       '18-10-2017', '19-10-2017', '20-10-2017', '21-10-2017',
       '22-10-2017', '23-10-2017', '24-10-2017', '25-10-2017',
       '26-10-2017', '27-10-2017', '28-10-2017', '29-10-2017',
       '30-10-2017', '31-10-2017', '01-11-2017', '02-11-2017',
       '03-11-2017', '04-11-2017', '05-11-2017', '06-11-2017',
       '07-11-2017', '08-11-2017', '09-11-2017', '10-11-2017',
       '11-11-2017', '12-11-2017', '13-11-2017', '14-11-2017',
       '15-11-2017', '16-11-2017', '17-11-2017', '18-11-2017',
       '19-11-2017', '20-11-2017', '21-11-2017', '22-11-2017',
       '23-11-2017', '24-11-2017', '25-11-2017', '26-11-2017',
       '27-11-2017', '28-11-2017', '29-11-2017', '30-11-2017',
       '01-12-2017', '02-12-2017', '03-12-2017', '04-12-2017',
       '05-12-2017', '06-12-2017', '07-12-2017', '08-12

In [None]:
data_t = data.T

In [None]:
data_t.shape

(6, 96)

In [None]:
date, step_count, mood, calories_burned, hours_of_sleep, activity_status = data_t

In [None]:
step_count

array(['5464', '6041', '25', '5461', '6915', '4545', '4340', '1230', '61',
       '1258', '3148', '4687', '4732', '3519', '1580', '2822', '181',
       '3158', '4383', '3881', '4037', '202', '292', '330', '2209',
       '4550', '4435', '4779', '1831', '2255', '539', '5464', '6041',
       '4068', '4683', '4033', '6314', '614', '3149', '4005', '4880',
       '4136', '705', '570', '269', '4275', '5999', '4421', '6930',
       '5195', '546', '493', '995', '1163', '6676', '3608', '774', '1421',
       '4064', '2725', '5934', '1867', '3721', '2374', '2909', '1648',
       '799', '7102', '3941', '7422', '437', '1231', '1696', '4921',
       '221', '6500', '3575', '4061', '651', '753', '518', '5537', '4108',
       '5376', '3066', '177', '36', '299', '1447', '2599', '702', '133',
       '153', '500', '2127', '2203'], dtype='<U10')

In [None]:
# Can you figure out if there is a correlation between step_count and mood

Can we say that in each of the individual arrays date, step_count, mood, calories_burned, hours_of_sleep and activity_status, the 0th element belongs to the 1st record, the 1st element to the 2nd record, the 2nd element to the 3rd record and so on? Yes. Because we took the transpose of the data, the values at the same index across these arrays together make up one row of the original table.

Let's suppose I create a mask on the mood array. The mood data takes the values Neutral, Sad and Happy. I create the mask with mood == "Happy", which returns an array of True and False values. I can then apply the mask to an array to get filtered data. This is what we do in masking: we write a condition, which gives us a mask, and then apply that mask to an array to get filtered data.

NOW, I CAN CREATE A MASK USING MOOD, BUT APPLY IT ON STEP_COUNT. The 1st element of mood is Neutral and the 1st element of step_count is 5464. So, in the mask mood == "Happy" the first element is False, which means the 1st record is not a "Happy" one and its step count of 5464 is left out. So, we have created a mask from one array and applied it to another array.
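The idea of building a mask on one array and applying it to another can be tried on a tiny example first. The two arrays below are made-up toy data, not values from fit.txt, and all names are illustrative:

```python
import numpy as np

# hypothetical toy columns, aligned by position like the unpacked columns above
mood_demo = np.array(["Happy", "Sad", "Neutral", "Happy", "Sad"])
steps_demo = np.array(["4000", "1500", "3000", "5000", "500"]).astype(int)

happy_mask = mood_demo == "Happy"     # one Boolean per record
happy_steps = steps_demo[happy_mask]  # keeps the step counts at positions 0 and 3

print(happy_mask)
print(happy_steps)
```

This only works because both arrays have the same length and the same ordering, which is what the transpose and unpacking above guarantee for the real data.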

In [None]:
mood

array(['Neutral', 'Sad', 'Sad', 'Sad', 'Neutral', 'Sad', 'Sad', 'Sad',
       'Sad', 'Sad', 'Sad', 'Sad', 'Happy', 'Sad', 'Sad', 'Sad', 'Sad',
       'Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral',
       'Happy', 'Neutral', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Neutral', 'Happy', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Neutral',
       'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Neutral', 'Sad', 'Happy', 'Happy', 'Happy',
       'Happy', 'Happy', 'Happy', 'Happy', 'Sad', 'Neutral', 'Neutral',
       'Sad', 'Sad', 'Neutral', 'Neutral', 'Happy', 'Neutral', 'Neutral',
       'Sad', 'Neutral', 'Sad', 'Neutral', 'Neutral', 'Sad', 'Sad', 'Sad',
       'Sad', 'Happy', 'Neutral', 'Happy', 'Neutral', 'Sad', 'Sad', 'Sad',
       'Neutral', 'Neutral', 'Sad', 'Sad', 'Happy', 'Neutral', 'Neutral',
       'Happy'], dtype='<U10')

In [None]:
# Creating a mask on mood array

mood == "Happy"

array([False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False,  True, False,  True,  True,
        True,  True,  True,  True,  True, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False, False, False, False, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
        True, False,  True, False, False, False, False, False, False,
       False, False,  True, False, False,  True])

In [None]:
# Converting all str values to ints in step_count array.

step_count = np.array(step_count, dtype = "int")

In [None]:
# Here I am applying a mask generated on mood array to step_count array.

step_count_happy = step_count[mood == "Happy"]

In [None]:
# So these are the step counts of all the records where the mood was happy.

step_count_happy

array([4732,  330, 4550, 4435, 4779, 1831, 2255,  539, 5464, 4068, 4683,
       4033, 6314,  614, 3149, 4005, 4880, 4136,  705,  269, 4275, 5999,
       4421, 6930, 5195,  546,  493,  995, 3608,  774, 1421, 4064, 2725,
       5934, 1867, 7422, 5537, 5376,  153, 2203])

In [None]:
step_count_sad = step_count[mood == "Sad"]

In [None]:
# So these are the step counts of all the records where the mood was sad.

step_count_sad

array([6041,   25, 5461, 4545, 4340, 1230,   61, 1258, 3148, 4687, 3519,
       1580, 2822,  181, 6676, 3721, 1648,  799, 1696,  221, 4061,  651,
        753,  518,  177,   36,  299,  702,  133])

In [None]:
step_count_neutral = step_count[mood == "Neutral"]

In [None]:
# So these are the step counts of all the records where the mood was neutral.

step_count_neutral

array([5464, 6915, 3158, 4383, 3881, 4037,  202,  292, 2209, 6041,  570,
       1163, 2374, 2909, 7102, 3941,  437, 1231, 4921, 6500, 3575, 4108,
       3066, 1447, 2599,  500, 2127])

In [None]:
# Here we find the mean of all 3. The mean step count is highest for the happy mood and lowest for the sad mood.
# This suggests that step count and mood are related; it does not show which one causes the other.

step_count_happy.mean(), step_count_sad.mean(), step_count_neutral.mean()

(3392.725, 2103.0689655172414, 3153.777777777778)
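The three masks above repeat the same pattern, so they can be folded into a loop over the distinct mood labels. The sketch below does this on made-up toy data rather than the Fitbit arrays; `mean_by_mood` and the other names are illustrative:

```python
import numpy as np

# hypothetical toy data, aligned by position
mood_demo = np.array(["Happy", "Sad", "Neutral", "Happy", "Sad"])
steps_demo = np.array([4000, 1500, 3000, 5000, 500])

# np.unique gives each distinct label once; each label then becomes its own mask
mean_by_mood = {
    str(label): float(steps_demo[mood_demo == label].mean())
    for label in np.unique(mood_demo)
}

print(mean_by_mood)
```

With the real data, passing `mood` and the integer `step_count` array in place of the toy arrays would follow the same steps as the three separate cells above.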