# Google Colaboratory

__Google Colaboratory__ (Colab) uses Google Cloud to perform the calculations required by the computational graph being implemented.
Google Colab is a Jupyter notebook environment that runs entirely in the cloud: it requires no setup on the (local) working machine while still providing access to powerful computing resources (e.g. Google's GPUs and TPUs).

Google Colab is built on top of Jupyter Notebook, with some differences. As in Jupyter, there are code cells and markdown cells for text; some of the keyboard shortcuts are listed below:
* `Ctrl+M+H` to edit the shortcut settings;
* `Ctrl+Enter` to execute a code cell;
* `Shift+Enter` to execute the current cell (e.g. to render a markdown cell) and move to the next one;
* `Ctrl+M+B` to add a code cell below;
* `Ctrl+M+D` to delete the current cell.

Colab renders markdown with `marked.js`, so its markdown is similar, but not quite identical, to the one used by Jupyter and GitHub.
The biggest differences are that Colab supports (MathJax) LaTeX equations, like Jupyter, but, unlike most other markdown flavors, does not allow HTML tags in the markdown.

### Google Drive integration

Google Colab interacts with Google Drive, where notebooks created in Jupyter or Google Colab can be stored, accessed and shared; the Google Drive `New` menu also allows creating a Colab notebook from scratch.
__Warning:__ since Colab is a cloud-based framework, its computational resources are shared among all the users connected to it; if a notebook is left unused for too long, the session times out and the `Runtime` disconnects.

The fact that Jupyter notebooks can be opened, accessed, modified, shared or even created in Google Colab through Google Drive __doesn't mean__ that Colab automatically accesses the files in the Drive; to use, say, a database of pictures saved on Google Drive for training a DNN for image classification on Colab, one must first _"mount the drive"_ as shown below.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
!ls "/content/drive/My Drive"
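
Once mounted, the Drive behaves like an ordinary directory, so the standard Python file tools can be used on it. The following is a minimal sketch under assumptions: the folder `/content/drive/My Drive/pictures` and the helper `list_images` are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Hypothetical folder inside the mounted Drive; adjust it to your own layout.
DATA_DIR = Path("/content/drive/My Drive/pictures")

def list_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return the sorted image paths found under `root` (searched recursively)."""
    root = Path(root)
    if not root.is_dir():
        return []  # Drive not mounted, or folder missing
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in extensions)

print(len(list_images(DATA_DIR)), "images found")
```

Returning an empty list when the folder is missing keeps the cell from failing if the Drive has not been mounted yet.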

### File upload

Within the interactive environment, files can be uploaded to the runtime session through the `Files` tab on the left-hand side or, alternatively, programmatically with `files.upload()` from the `google.colab` module

In [None]:
from google.colab import files
uploaded=files.upload()
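
`files.upload()` returns a dictionary that maps each uploaded file name to its content as bytes. A minimal sketch of how such a dictionary could be turned into DataFrames follows; the helper `uploaded_to_frames` is a hypothetical name, and a stand-in dictionary replaces the real upload so that the code also runs outside Colab.

```python
import io
import pandas as pd

def uploaded_to_frames(uploaded):
    """Parse every uploaded CSV (a {filename: bytes} mapping) into a DataFrame."""
    frames = {}
    for name, content in uploaded.items():
        if name.lower().endswith(".csv"):
            frames[name] = pd.read_csv(io.BytesIO(content))
    return frames

# Stand-in for the dictionary returned by files.upload() (illustrative content).
uploaded = {"database.csv": b"x,y\n1,2\n3,4\n"}
frames = uploaded_to_frames(uploaded)
print(frames["database.csv"])
```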

### GitHub interaction

Alongside Google Drive, Colab offers clean __GitHub__ integration, including loading notebooks from and saving them to remote repos.

Any `.ipynb` notebook saved in a GitHub repo can be opened in Colab in either of two equivalent ways:

* internally in Colab, by selecting `File`->`Upload notebook`->`GitHub` and then pasting the GitHub URL of the notebook file;

* directly from GitHub, by using the `Open in Colab` Chrome extension while at the URL of the desired notebook.

Notebooks opened in Colab from GitHub do not overwrite the source notebook in the repo when saved; to do so, one must give Colab permission to `push` the `commit` to the master repo.
This is achieved by selecting `File`->`Save a copy in Drive`->`Save a copy to GitHub` and then following the prompts from there.
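
A notebook hosted on GitHub can also be reached through a Colab URL of the form `https://colab.research.google.com/github/<user>/<repo>/blob/<branch>/<path>.ipynb`. The sketch below builds such a URL from a GitHub one; the helper `github_to_colab` and the example repository are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse

COLAB_PREFIX = "https://colab.research.google.com/github"

def github_to_colab(url):
    """Turn a github.com notebook URL into the corresponding Colab URL."""
    parsed = urlparse(url)
    if parsed.netloc != "github.com" or not parsed.path.endswith(".ipynb"):
        raise ValueError("expected a github.com URL pointing at an .ipynb file")
    return COLAB_PREFIX + parsed.path

# Hypothetical repository, used only for illustration.
print(github_to_colab("https://github.com/user/repo/blob/main/notebook.ipynb"))
```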

# NumPy

NumPy is a library that enables high-performance vector and matrix manipulation: its main data object is the `ndarray`, a collection of items of the same type indexed by integers starting from 0. It can be initialized in several ways, the most common being from a Python `list` or `tuple` and through placeholder functions

In [None]:
import numpy as np

a = [1,2,3,4] # Python list
a1 = np.array(a) # initialization by list
a2 = np.array([(1.5,2,3), (4,5,6)]) # initialization by conversion of sequences
a3 = np.zeros((2,3)) # initialization by placeholder of zeros
a4 = np.empty((2,3)) # uninitialized placeholder (contains arbitrary values)

The elements of a NumPy array are accessed by index, much like those of a Python `list` and other sequences; multidimensional arrays have multiple indices, one for each axis

In [None]:
print(a1[1])
print(a2[1,2])
a5 = np.array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
print(a5[1:4])
print(a5[1:4][1])
print(a5[1:4][1][1])
print(a5[:,1])

2
6.0
[[10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]]
[20 21 22 23]
21
[ 1 11 21 31 41]
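
Chained indexing (`a5[1:4][1]`) and multi-axis indexing (`a5[1:4, 1]`) look similar but select different things: the first indexes the rows twice, the second indexes rows and columns in one step. A short sketch on the same array:

```python
import numpy as np

a5 = np.array([[ 0,  1,  2,  3],
               [10, 11, 12, 13],
               [20, 21, 22, 23],
               [30, 31, 32, 33],
               [40, 41, 42, 43]])

chained = a5[1:4][1]   # slice rows 1..3 first, then take row 1 of that result
combined = a5[1:4, 1]  # rows 1..3 and column 1 in a single indexing step

print(chained)    # [20 21 22 23]
print(combined)   # [11 21 31]
print(a5[1:4][1][1] == a5[2, 1])  # True: both select the element 21
```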


## Useful operations in NumPy


* Creation of higher-dimensional array by nesting lists

In [None]:
tensor = np.array([ [[1,2],[3,4]] , [[5,6],[7,8]] ], dtype='int32')
print(tensor[:,:,1])
tensor

[[2 4]
 [6 8]]


array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]], dtype=int32)

* Get dimension and shape of the array

In [None]:
print(tensor.ndim)
print(tensor.shape)

3
(2, 2, 2)


* Replace slices of array with other arrays

In [None]:
A = np.zeros((5,5))
print(A)
B = np.ones((3,3))
A[1:4,1:4] = B
print(A)
A[2,2] = 0
print(A)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 0. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0.]]
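
Assigning to a slice works because basic slicing returns a view onto the original memory rather than a copy. A minimal sketch of the difference between a view and an explicit copy:

```python
import numpy as np

A = np.zeros((3, 3))

view = A[0, :]          # basic slicing returns a view onto A's memory
view[:] = 1             # writing through the view changes A
print(A[0])             # [1. 1. 1.]

copy = A[1, :].copy()   # an explicit copy is independent of A
copy[:] = 7
print(A[1])             # [0. 0. 0.]
print(np.shares_memory(A, view), np.shares_memory(A, copy))  # True False
```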


* High-level mathematical operations

In [None]:
a = np.array([1,2,2,4])
print(a*-1)
print(a**2)
print(a+2)
print(a+2*a)
print(np.cos(a))
A = np.array([a,-1*a,2*a,np.sin(a)])
print(A)
print(np.min(A,axis=1))
print(np.min(A,axis=0))
print(np.min(A[:,1],axis=0))

[-1 -2 -2 -4]
[ 1  4  4 16]
[3 4 4 6]
[ 3  6  6 12]
[ 0.54030231 -0.41614684 -0.41614684 -0.65364362]
[[ 1.          2.          2.          4.        ]
 [-1.         -2.         -2.         -4.        ]
 [ 2.          4.          4.          8.        ]
 [ 0.84147098  0.90929743  0.90929743 -0.7568025 ]]
[ 1.        -4.         2.        -0.7568025]
[-1. -2. -2. -4.]
-2.0
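
The `axis` argument used with `np.min` above combines naturally with broadcasting. As an illustration (the toy matrix `X` is an assumption, not data from this notebook), the columns of a matrix can be standardized without any explicit loop:

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_mean = X.mean(axis=0)      # shape (2,): one mean per column
col_std = X.std(axis=0)        # shape (2,): one standard deviation per column
Z = (X - col_mean) / col_std   # (3, 2) combined with (2,) broadcasts across the rows

print(Z.mean(axis=0))   # approximately zero for every column
print(Z.std(axis=0))    # approximately one for every column
```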


* Reshaping and stacking

In [None]:
before = np.array([a,2*a,a-0.5*a])
print(before)
print(before.shape)
after = before.reshape((6,2))
print(after)
print(after.shape)

[[1.  2.  2.  4. ]
 [2.  4.  4.  8. ]
 [0.5 1.  1.  2. ]]
(3, 4)
[[1.  2. ]
 [2.  4. ]
 [2.  4. ]
 [4.  8. ]
 [0.5 1. ]
 [1.  2. ]]
(6, 2)


In [None]:
a = np.array([0,1,-1,5])
print(np.vstack([a,-1*a]))
print(np.hstack([a,-1*a]))

[[ 0  1 -1  5]
 [ 0 -1  1 -5]]
[ 0  1 -1  5  0 -1  1 -5]
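
Two details may help when reshaping and stacking: a dimension given as `-1` is inferred by NumPy from the others, and on two-dimensional inputs `vstack` adds rows while `hstack` adds columns. A short sketch:

```python
import numpy as np

m = np.arange(6).reshape(2, -1)   # -1 lets NumPy infer the missing dimension
print(m.shape)                    # (2, 3)

v = np.vstack([m, m])             # stacks along axis 0 (more rows)
h = np.hstack([m, m])             # stacks along axis 1 (more columns)
print(v.shape, h.shape)           # (4, 3) (2, 6)

flat = m.reshape(-1)              # back to one dimension
print(flat)                       # [0 1 2 3 4 5]
```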


* Basic i/o on files

In [None]:
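filedata = np.genfromtxt('file.txt', delimiter=';') # read a semicolon-separated text file (values are parsed as floats)
filedata = filedata.astype('int32') # cast the values to 32-bit integers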
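
To try the read step without an existing `file.txt`, the sketch below first writes a small array to a temporary file with `np.savetxt` and then reads it back; the array values and the temporary path are illustrative.

```python
import os
import tempfile
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

# Write to a temporary file so the example does not depend on an existing 'file.txt'.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
np.savetxt(path, original, delimiter=";", fmt="%d")

loaded = np.genfromtxt(path, delimiter=";")   # parsed as float64 by default
loaded = loaded.astype("int32")               # cast back to integers
print(loaded)
```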

# Pandas

Pandas is a package for data manipulation that brings `DataFrame` objects, similar to R's data frames (and to the functionality of several R packages), to the Python environment. The `Series` object is the primary building block of pandas. A `Series` represents a one-dimensional labeled array based on the NumPy `ndarray`. Like an array, a `Series` can hold zero or more values of any single data type; it can be created and initialized by passing a scalar value, a NumPy `ndarray`, a Python `list`, or a Python `dict` as the data parameter of the `Series` constructor.

In [None]:
import pandas as pd

s1 = pd.Series([1,2,3,4]) # series initialized by list
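s2 = pd.Series({'a': 1, 'b': 2, 'c': 3}) # series initialized by dict (illustrative values; the keys become the index)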

Unlike NumPy arrays, whose elements are accessed by their integer position (starting from 0), a pandas `Series` can have a customized index.
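
A minimal sketch of a custom index (the labels and values are illustrative): elements can then be accessed either by label or by position.

```python
import pandas as pd

# Labels and values are illustrative.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])                    # label-based access -> 20
print(s.iloc[1])                 # position-based access -> 20
print(s.loc["a":"b"].tolist())   # label slices include both endpoints -> [10, 20]
```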

## Useful operations on DataFrame

* Read and store a `csv` file into a `dataframe`

In [None]:
df = pd.read_csv('database.csv')
# df = pd.read_excel('database.xlsx')

* Basic i/o on dataframe

In [None]:
df.head(4) # print first 4 rows
df.iloc[5:15] # print rows from 5 to 14 (the end of the slice is excluded)
df.tail(3) # print last 3 rows

In [None]:
df.columns # print the headers of the columns (features and labels)
features = ['Team season','Last seas. QBR','Label (Record)']
df[features][0:15] # print specific features (columns) for rows 0 to 14
df.loc[df['Last seas. QBR'] == 90.0] # access only the rows whose 'Last seas. QBR' column has value 90.0
df['Bias'] = df.iloc[:, 2:8].sum(axis=1) # create a new feature (column) as the row-wise sum of columns 2 to 7
df = df.drop(columns=['Bias']) # delete the aforementioned column (drop returns a new dataframe)
df.loc[df['Team season'].str.contains('NE|SF', regex=True)] # keep only the rows whose 'Team season' contains 'NE' or 'SF'
labels = df.pop('Label (Record)') # extract one column from the dataframe into a separate Series
df.to_csv('training set.csv', index=False) # save the dataframe into a new csv file (check the Files tab on the left-hand side of Google Colab)
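
Since the cell above depends on `database.csv`, here is a self-contained sketch of the same operations on a toy DataFrame; the column names and values are hypothetical stand-ins, not the real dataset.

```python
import pandas as pd

# Toy data with hypothetical column names, standing in for 'database.csv'.
df = pd.DataFrame({
    "team": ["NE 2019", "SF 2019", "KC 2019", "NE 2018"],
    "qbr": [88.0, 90.0, 105.3, 97.7],
    "wins": [12, 13, 12, 11],
})

high = df.loc[df["qbr"] >= 90.0]                              # boolean-mask row filter
ne_sf = df.loc[df["team"].str.contains("NE|SF", regex=True)]  # regex match on a text column
df["score"] = df[["qbr", "wins"]].sum(axis=1)                 # row-wise sum as a new column
df = df.drop(columns=["score"])                               # drop returns a new DataFrame
wins = df.pop("wins")                                         # pop removes the column in place

print(len(high), len(ne_sf), list(df.columns))  # 3 3 ['team', 'qbr']
```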

* Check types

In [None]:
print(labels.dtypes)
print(df.dtypes)

* High-level operations

In [None]:
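print(df['SOS'].max())
print(df['SOS'].min())
high_qbr = df['Last seas. QBR'] >= 120.0 # boolean mask, computed once before the update
print(df.loc[high_qbr])
numeric_cols = df.select_dtypes('number').columns # text columns cannot be multiplied by 0.5
df.loc[high_qbr, numeric_cols] = df.loc[high_qbr, numeric_cols]*0.5
print(df.loc[high_qbr]) # the same rows, with their numeric values halved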

# TensorFlow

TensorFlow is a Google-developed open-source software library (platform) for numerical computation, differentiable programming and machine learning applications. It bundles a comprehensive and flexible ecosystem of machine learning algorithms, with an emphasis on deep learning techniques; as a cross-platform framework, TensorFlow can run on (virtually) any CPU, GPU or even TPU.

It provides front-end APIs in Python, C++, Java, etc. for building applications with the framework, while its core executes those applications in the background in high-performance C++. On top of the front ends sits a stack of abstraction-layer APIs that provide a simpler interface for commonly used ML models, especially in deep learning. At the top of the stack sit the __Canned estimators__: ready-to-train machine learning models and algorithms in a box.

![alt text](https://static.javatpoint.com/tutorial/tensorflow/images/tensorflow-api-2.png)


For more comprehensive material on neural networks and TensorFlow, refer to the following links:
* introductory guide to [NNs principles and architectures](https://neuralnetworksanddeeplearning.com/);
* advanced theoretical topics on [Neural Networks' mathematics](https://colah.github.io/);
* TensorFlow tutorials from [beginners](https://www.tensorflow.org/tutorials) to [professionals](https://www.javatpoint.com/tensorflow) including a guide on [its tools](https://www.tensorflow.org/resources/tools);
* [visualization tools](https://tensorspace.org/) for NNs architectures and classification boundaries.

![alt text](https://1.bp.blogspot.com/-XLebiZ3qt8o/XDwxlrGK_8I/AAAAAAAAGaw/PHnps1WR8NMCU7X_Hu717DdJLSwGU8T3ACPcBGAYYCw/s1600/neuron.png)

There are several ways to feed data (sampled from a given dataset) into a `keras` model in TensorFlow; the two simplest to implement are summarized in the following


## Input as NumPy array from pandas DataFrame

* File upload from Google Colab files

In [None]:
from google.colab import files 
uploaded = files.upload()

* Conversion of the file into a pandas DataFrame

In [None]:
import pandas as pd

spreadsheet = 'database.xlsx'
data = pd.read_excel(spreadsheet)

* Splitting the features from the labels into separate lists: this can be done in three different ways

  * 1) By exporting different parts of the dataset into separate lists;
  * 2) By filtering the labels out of the dataset and subsequently dropping them to obtain the features;
  * 3) By popping-out the labels from the dataset.

In [None]:
# Method 1)

features = ['1xrecord','2xrecord','3xrecord','QBR','PO','TO','SOS']
datapoints = data[features]
labels = data['Record']

In [None]:
# Method 2)

labels = data.filter(['Record'])
datapoints = data.drop(columns = ['Record'])

In [None]:
# Method 3)

labels = data.pop('Record')
datapoints = data.copy()

* Convert any non-numerical feature or label (those whose data type is `object`) into numbers

In [None]:
labels.replace('cat', 1, inplace=True)
labels.replace('dog', 0, inplace=True)

datapoints = pd.get_dummies(datapoints)
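
To see what this encoding step produces, here is a minimal sketch on toy data (the labels, the column names and the values are illustrative): the labels are mapped to integers and the text feature is one-hot encoded, while the numeric feature is left untouched.

```python
import pandas as pd

# Toy labels and features; the values are illustrative.
labels = pd.Series(["cat", "dog", "dog", "cat"])
features = pd.DataFrame({"weight": [4.0, 20.5, 18.0, 3.5],
                         "coat": ["short", "long", "short", "long"]})

y = labels.map({"cat": 1, "dog": 0})       # explicit label -> integer mapping
X = pd.get_dummies(features, dtype=float)  # one-hot encodes only the text column

print(y.tolist())        # [1, 0, 0, 1]
print(list(X.columns))   # ['weight', 'coat_long', 'coat_short']
```

Using `map` with an explicit dictionary keeps the whole encoding in one place, as an alternative to the repeated in-place `replace` calls.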

* Creation of training, validation and test set

In [None]:
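from sklearn.model_selection import train_test_split

datapoints = datapoints.values.astype('float32')
labels = labels.values.astype('float32')

# 80% / 20% split into training and test sets, then 20% of the training set held out for validation
x_train, x_test, y_train, y_test = train_test_split(datapoints, labels, test_size=0.2)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)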
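
Two successive 80/20 splits leave 64% of the data for training, 16% for validation and 20% for testing. The sketch below reproduces that arithmetic with NumPy only; it is not scikit-learn's implementation, and `split_indices` is a hypothetical helper written for illustration.

```python
import numpy as np

def split_indices(n, test_size, rng):
    """Shuffle range(n) and split it into (train, test) index arrays."""
    order = rng.permutation(n)
    n_test = int(round(test_size * n))
    return order[n_test:], order[:n_test]

rng = np.random.default_rng(0)   # fixed seed for a reproducible sketch
n = 100
train_val, test = split_indices(n, 0.2, rng)                   # 80 / 20
train_idx, val_idx = split_indices(len(train_val), 0.2, rng)   # 64 / 16 of the 80
train, val = train_val[train_idx], train_val[val_idx]

print(len(train), len(val), len(test))  # 64 16 20
```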

* Conversion of the pandas data into NumPy arrays

In [None]:
import numpy as np

x_train, y_train = np.array(x_train), np.array(y_train)
x_test, y_test = np.array(x_test), np.array(y_test)
x_val, y_val = np.array(x_val), np.array(y_val)
N = x_train.shape[0] # number of (input) datapoints
D = x_train.shape[1] # number of (dimensions) features

* Build the model

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(D, input_shape=(D,)))
model.add(Dense(11, activation='sigmoid'))
model.add(Dense(5, activation='relu'))
model.add(Dense(3, activation='relu'))
model.add(Dense(1, activation='sigmoid')) # a softmax over a single unit would always output 1
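
A softmax normalizes over the units of a layer, so over a single output unit it always returns 1; a sigmoid is the usual choice for a one-unit binary output. This can be checked with a minimal NumPy sketch (the helper functions are written here for illustration and are not Keras code):

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis (shifted by the max for numerical stability)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[-3.0], [0.0], [2.5]])   # three samples, one output unit each

print(softmax(logits).ravel())   # [1. 1. 1.] regardless of the logits
print(sigmoid(logits).ravel())   # values strictly between 0 and 1
```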

* Compile the model

In [None]:
learning_algorithm = keras.optimizers.RMSprop(0.001)
model.compile(loss='mse', optimizer=learning_algorithm, metrics=['accuracy'])
model.summary()

* Training and testing

In [None]:
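model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=max(1, int(0.1*N)), epochs=10) # the batch size must be an integer
y = model.predict(x_test).ravel() # flatten the (n, 1) predictions to match the shape of y_test
performance = np.sqrt(np.mean((y-y_test)**2)) # RMSE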

# Alternative
# model.evaluate(x_test, y_test)
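
When an error such as the RMSE is computed by hand, the shapes of the two arrays matter: a one-unit model returns predictions of shape `(n, 1)`, and subtracting a label vector of shape `(n,)` from them broadcasts to an `(n, n)` matrix instead of comparing element by element. A small NumPy sketch with made-up numbers:

```python
import numpy as np

y_pred = np.array([[0.9], [0.2], [0.7]])   # shape (3, 1), like a one-unit model's predictions
y_true = np.array([1.0, 0.0, 1.0])         # shape (3,)

# Broadcasting pairs every prediction with every label.
print((y_pred - y_true).shape)             # (3, 3)

# Flatten the predictions first to compare element by element.
rmse = np.sqrt(np.mean((y_pred.ravel() - y_true) ** 2))
print(rmse)
```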

## Input pipeline of feature columns through TensorFlow Dataset

* File upload from Google Colab files

In [None]:
from google.colab import files 
uploaded = files.upload()

* Conversion of the file into a pandas DataFrame

In [None]:
import pandas as pd
import numpy as np

spreadsheet = 'database.xlsx'
data = pd.read_excel(spreadsheet)

* Split the dataset into training and test sets (validation data is held out from the training set at training time)

In [None]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.2)

* Wrapping the DataFrame with a TensorFlow Dataset

In [None]:
import tensorflow as tf

def wrapper(data, shuffle=True, batch_size=32):
  data = data.copy()
  labels = data.pop('Record')
  dataset = tf.data.Dataset.from_tensor_slices((dict(data), labels))
  if shuffle:
    dataset = dataset.shuffle(buffer_size=len(data))
  dataset = dataset.batch(batch_size)
  return dataset
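
To make the effect of `shuffle` and `batch` concrete, here is a NumPy-only sketch that yields batches with the same structure (a dictionary of feature columns plus a label array). It is not how `tf.data` is implemented; `shuffle_and_batch` and the toy columns are hypothetical.

```python
import numpy as np

def shuffle_and_batch(features, labels, batch_size=32, shuffle=True, seed=0):
    """Yield (feature_dict, label_array) batches from column arrays."""
    n = len(labels)
    order = np.random.default_rng(seed).permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield {name: col[idx] for name, col in features.items()}, labels[idx]

features = {"QBR": np.arange(10.0), "SOS": np.arange(10.0) * 2}   # toy columns
labels = np.arange(10) % 2
batches = list(shuffle_and_batch(features, labels, batch_size=4))
print([len(y) for _, y in batches])   # [4, 4, 2]
```

The last batch is smaller whenever the number of rows is not a multiple of the batch size.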

* Creation of training and test set

In [None]:
trainset = wrapper(train)
testset = wrapper(test, shuffle=False)

* Creation of (numeric) feature columns

In [None]:
from tensorflow import feature_column # note: this API is deprecated in recent TensorFlow releases

features = ['1xrecord','2xrecord','3xrecord','QBR','PO','TO','SOS']
feature_columns = []

for header in features:
  feature_columns.append(feature_column.numeric_column(header))

* Creation of a feature (input) layer

In [None]:
from tensorflow import keras

input_layer = keras.layers.DenseFeatures(feature_columns)

* Build the model

In [None]:
model = keras.Sequential([
  input_layer,
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(128, activation='sigmoid'),
  keras.layers.Dense(1) # raw logit, consistent with from_logits=True in the loss
])

model.compile(optimizer='sgd',
              loss = keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['mae', 'mse'])

* Training and testing

In [None]:
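# `validation_split` is not supported for tf.data.Dataset inputs: hold out 20% of the training DataFrame instead
train_part, val_part = train_test_split(train, test_size=0.2)
model.fit(wrapper(train_part), validation_data=wrapper(val_part, shuffle=False), epochs=5)
performance = model.evaluate(testset)
print("Loss, MAE, MSE:", performance) # evaluate returns the loss followed by the compiled metrics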