<a href="https://colab.research.google.com/github/iEpsilon-FPS/QU-Python/blob/master/5.1%20Analyzing_Data_Tutorial_Part_1_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyzing Data. Packages Covered
- numpy

Copyright 2020 QuantUniversity LLC.

This notebook is modified and extended from this [tutorial](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/cc/exercises/numpy_ultraquick_tutorial.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=mlcc-prework&hl=en).
NumPy is a Python library for creating and manipulating vectors and matrices. 

In [None]:
!pip install yfinance

In [None]:
import numpy as np # This is NumPy
import yfinance as yf # This is Yahoo finance
import datetime # This is the standard library module for dates and times

## Populate arrays with specific numbers

Call `np.array` to create a NumPy matrix with your own hand-picked values. For example, the following call to `np.array` creates an 8-element vector:

In [None]:
# This is a simple vector
one_dimensional_array = np.array([1.2, 2.4, 3.5, 4.7, 6.1, 7.2, 8.3, 9.5])
print(one_dimensional_array)

You can also use `np.array` to create a two-dimensional matrix. To create a two-dimensional matrix, specify an extra layer of square brackets. For example, the following call creates a 3x2 matrix:

In [None]:
# This is a 3 x 2 matrix
two_dimensional_array = np.array([[6, 5], [11, 7], [4, 8]])
print(two_dimensional_array)

In [None]:
# This is a transposition
two_dim_tr = two_dimensional_array.T
two_dim_tr

In [None]:
# This gives dimensions of the matrix
two_dim_tr.shape

In [None]:
# This flattens my 2 x 3 matrix into a one-dimensional vector of 6 elements
two_to_one_dim = two_dim_tr.ravel()
two_to_one_dim

In [None]:
# This shows two_to_one_dim is one-dimensional with 6 elements, i.e. shape (6,)
two_to_one_dim.shape

In [None]:
# This reshapes a flat vector into a 3 x 2 matrix
one_dim_to_two = two_to_one_dim.reshape(3,2)
one_dim_to_two

In [None]:
# What is the new shape?
one_dim_to_two.shape

In [None]:
# This reshapes again into 2 rows; -1 tells NumPy to infer the number of columns
one_dim_to_two_alt = two_to_one_dim.reshape(2,-1)
one_dim_to_two_alt 

In [None]:
# What is the new shape?
one_dim_to_two_alt.shape

In [None]:
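# np.sort returns a sorted copy; the axis argument picks the direction
# axis=1 sorts the values within each row (left to right)
b = np.sort(one_dim_to_two_alt, axis=1)
print("sorted within each row (axis=1) b =", b)
print()

# axis=0 sorts the values within each column (top to bottom)
c = np.sort(one_dim_to_two_alt, axis=0)
print("sorted within each column (axis=0) c =", c)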
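
The effect of the `axis` argument is easier to see on a small matrix whose rows and columns are both unsorted. The following is a minimal sketch; the matrix `m` and the names `sorted_rows` and `sorted_cols` are illustrative and not part of the original notebook:

```python
import numpy as np

# Illustrative 2 x 3 matrix, unsorted in both directions
m = np.array([[3, 1, 2],
              [0, 5, 4]])

# axis=1: each row is sorted independently (values move left/right)
sorted_rows = np.sort(m, axis=1)
print(sorted_rows)

# axis=0: each column is sorted independently (values move up/down)
sorted_cols = np.sort(m, axis=0)
print(sorted_cols)
```

Note that `np.sort` returns a new array, so `m` itself is left unchanged.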

To populate a matrix with all zeroes, call `np.zeros`. To populate a matrix with all ones, call `np.ones`.

In [None]:
# ?np.zeros
a = np.zeros(5)
print("a =", a)
print()

b = np.zeros((2,3))
print("b =", b)
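
The text above also mentions `np.ones`, which the cell does not demonstrate. Here is a minimal sketch, assuming the same shape conventions as `np.zeros`; the names `ones_vector` and `ones_matrix` are illustrative:

```python
import numpy as np

# A 5-element vector filled with ones
ones_vector = np.ones(5)
print("ones_vector =", ones_vector)
print()

# A 2 x 3 matrix filled with ones; the shape is passed as a tuple
ones_matrix = np.ones((2, 3))
print("ones_matrix =", ones_matrix)
```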

## Populate arrays with sequences of numbers

You can populate an array with a sequence of numbers:

In [None]:
# ?np.arange. NOTE: the upper bound is EXCLUDED (half-open interval), as with Python's built-in range()
sequence_of_integers = np.arange(5, 12)
print("sequence_of_integers from 5 (included) to 11 (12 excluded)=", sequence_of_integers)

Notice that `np.arange` generates a sequence that includes the lower bound (5) but not the upper bound (12). 
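
As a small illustration of the half-open interval, the sketch below checks the length of the sequence and uses the optional third argument of `np.arange` (the step). The variable names are illustrative:

```python
import numpy as np

# With a step of 1, the number of elements is (upper bound - lower bound)
sequence = np.arange(5, 12)
print(len(sequence) == 12 - 5)

# The optional third argument is the step; the upper bound is still excluded
every_other = np.arange(5, 12, 2)
print(every_other)  # 5, 7, 9, 11
```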

## Populate arrays with random numbers

NumPy provides various functions to populate matrices with random numbers across certain ranges. For example, `np.random.randint` generates random integers between a low and high value. The following call populates a 6-element vector with random integers between 50 and 100. 




In [None]:
# ?np.random, see the different types of random numbers that can be generated with NumPy
# np.random.randint excludes the upper bound, so high=101 yields integers from 50 to 100
random_integers_between_50_and_100 = np.random.randint(low=50, high=101, size=(6))
print(random_integers_between_50_and_100)

To create random floating-point values between 0.0 and 1.0, call `np.random.random`. For example:

In [None]:
# ?np.random.random
random_floats_between_0_and_1 = np.random.random([6])
print(random_floats_between_0_and_1) 

## Mathematical Operations on NumPy Operands

If you want to add or subtract two vectors or matrices, linear algebra requires that the two operands have the same dimensions. Furthermore, if you want to multiply two vectors or matrices, linear algebra imposes strict rules on the dimensional compatibility of operands. Fortunately, NumPy uses a trick called [**broadcasting**](https://developers.google.com/machine-learning/glossary/#broadcasting) to virtually expand the smaller operand to dimensions compatible for linear algebra. For example, the following operation uses broadcasting to add 2.0 to the value of every item in the vector created in the previous code cell:

In [None]:
# This adds 2.0 to every element of the 6-element vector
random_floats_between_2_and_3 = random_floats_between_0_and_1 + 2.0
print(random_floats_between_2_and_3)

The following operation also relies on broadcasting to multiply each cell in a vector by 3:

In [None]:
# This multiplies every element of the 6-element vector of random integers (50 to 100) by 3
random_integers_between_150_and_300 = random_integers_between_50_and_100 * 3
print(random_integers_between_150_and_300)
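
Broadcasting is not limited to a single number. The sketch below, using illustrative names and values, adds a 2-element vector to every row of a 3 x 2 matrix, and shows that shapes that cannot be aligned raise an error rather than being expanded:

```python
import numpy as np

matrix = np.array([[6, 5], [11, 7], [4, 8]])  # shape (3, 2)
row = np.array([10, 100])                     # shape (2,)

# The row is virtually repeated for each of the 3 rows of the matrix
shifted = matrix + row
print(shifted)

# A 3-element vector does not line up with the 2 columns, so NumPy raises ValueError
try:
    matrix + np.array([1, 2, 3])
except ValueError as err:
    print("broadcast failed:", err)
```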

## Create a Linear Dataset

Your goal is to create a simple dataset consisting of a single feature and a label as follows:

1. Assign a sequence of integers from 6 to 20 (inclusive) to a NumPy array named `feature`.
2. Assign 15 values to a NumPy array named `label` such that:

```
   label = (3)(feature) + 4
```
For example, the first value for `label` should be:

```
  label = (3)(6) + 4 = 22
 ```

In [None]:
# This gives a sequence of integers in steps of 1 from 6 to 20 (21 excluded)
feature = np.arange(6, 21)
print(feature)
print()

# Multiply each element of the 15-element vector above by 3, then add 4 to each element
label = (feature * 3) + 4
print(label)

## Add Some Noise to the Dataset

To make your dataset a little more realistic, insert a little random noise into each element of the `label` array you already created. To be more precise, modify each value assigned to `label` by adding a *different* random floating-point value between -2 and +2. 

Don't rely on broadcasting. Instead, create a `noise` array having the same dimension as `label`.

In [None]:
# This creates a 15-element noise vector with values between 0 and 1,
# then multiplies by 4, which gives values between 0 and 4,
# then subtracts 2, which gives values between -2 and +2
noise = (np.random.random([15]) * 4) - 2
print("noise =", noise)
print()

# This adds the "noise" above to the "label" above, and the result becomes the new "label"
label = label + noise 
print("label =", label)
print()
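
One way to check that the noise behaves as described is to verify its shape and bounds. The sketch below is self-contained and, as an assumption not used in the original notebook, draws the numbers from a seeded `np.random.default_rng` generator so that repeated runs give the same values; `noisy_label` is an illustrative name:

```python
import numpy as np

# Seed chosen arbitrarily so that repeated runs give the same noise
rng = np.random.default_rng(seed=0)

feature = np.arange(6, 21)
label = (feature * 3) + 4

# rng.random returns floats in [0.0, 1.0); scale and shift them to [-2.0, 2.0)
noise = (rng.random(label.shape) * 4) - 2
noisy_label = label + noise

# Same shape as label, so the addition is element by element with no broadcasting
print(noise.shape == label.shape)
print(noise.min() >= -2, noise.max() < 2)
```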

## Converting between pandas and NumPy arrays

In [None]:
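# Download daily MSFT price history from Yahoo Finance for May to December 2019
d1 = datetime.datetime(2019, 5, 1)    # start date
d2 = datetime.datetime(2019, 12, 31)  # end date
ticker = yf.Ticker('MSFT')
# history() returns a pandas DataFrame indexed by date
histData = ticker.history(start=d1, end=d2)
print(histData)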

In [None]:
# Selecting one column of the DataFrame gives a pandas Series
openSeries = histData.Open
openSeries

In [None]:
# to_numpy() converts the Series into a NumPy array (the date index is dropped)
openData = openSeries.to_numpy()
openData
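
The conversion also works in the other direction. The following sketch runs without a network connection; the dates and prices are made-up placeholder values, not real market data, and all variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Made-up placeholder values standing in for downloaded prices
dates = pd.date_range("2019-05-01", periods=4, freq="D")
prices = np.array([100.0, 101.5, 99.8, 102.2])

# NumPy array -> pandas Series (the index labels each value with a date)
price_series = pd.Series(prices, index=dates, name="Open")
print(price_series)

# pandas Series -> NumPy array (the index is dropped, the values are kept)
round_trip = price_series.to_numpy()
print(np.array_equal(prices, round_trip))

# The same idea in two dimensions: NumPy matrix -> DataFrame -> NumPy matrix
frame = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])
print(frame.to_numpy().shape)
```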