# Exploring Machine Learning

In this notebook, we explore ways to analyze data, build models, and make predictions on financial account data. The goal is to understand how machine learning works and to evaluate its usefulness for financial systems.

---

## Initializing required libraries 

Here is the list of essential libraries used for building machine learning models and making predictions:

- **TensorFlow** - *An open source software library for machine learning and deep neural network research.*

- **NumPy** - *An open source Python package for scientific computing that supports large, multidimensional arrays and matrices; it is mainly used for data analysis.*

- **Pandas** - *An open source library that aims to be the fundamental high-level building block for practical, real-world data analysis in Python; it is mainly used for data manipulation and analysis.*

- **Seaborn** - *An open source library for data visualization which provides a high-level interface for drawing attractive and informative statistical graphics.*

In [11]:
import tensorflow as tf
import numpy as np
import pandas as pd
import seaborn as sns
import shutil

print("TensorFlow v" + tf.__version__)
print("Numpy v" + np.__version__)
print("Pandas v" + pd.__version__)

TensorFlow v1.10.1
Numpy v1.14.5
Pandas v0.23.4


## Extraction of sample data

An existing database is available from which the dataset can be queried, but in this experiment we use a CSV file that was already exported from it. For reference, here is a sample flow for fetching the dataset through BigQuery:

In [None]:
# import google.datalab.bigquery as bq
    
# raw_query = """
#     select something from data source where something = PARAMS
# """

# query = raw_query.replace("PARAMS", "params_value")

# result_set = bq.Query(query).execute().result().to_dataframe()

For a more complex example that splits query creation across the phases of machine learning data extraction and analysis, see the code below:

In [None]:
# def sample_between(start, end) :
# 	base_query = """
# 		select something from source where fixed conditions are met
# 	"""

# 	conditional_sampling_a = "and where condition respects PARAMS"
# 	conditional_sampling_b = "and where condition is somewhere within {0} and {1}".format(start, end)

# 	return "{} \n {} \n {}".format(base_query, conditional_sampling_a, conditional_sampling_b)

# def create_query(phase, params_value) :
# 	# Phases : 
# 	# 	train - 70% of data
# 	# 	valid - 15% of data
# 	# 	test - 15% of data
# 	query = ""

# 	if phase == 'train' :
# 		query = sample_between(0, 70)
# 	elif phase == 'valid' :
# 		query = sample_between(70, 85)
# 	else :
# 		query = sample_between(85, 100)

# 	# Substitute the PARAMS placeholder left in the query text
# 	return query.replace("PARAMS", str(params_value))
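
The 70/15/15 ranges used by `create_query` can also be illustrated locally with pandas. This is only a minimal sketch, assuming the ranges refer to hash buckets from 0 to 99 computed on a key column. The helper names `bucket_of` and `split_by_phase`, the `account_id` column, and the generated ids are all hypothetical and are not taken from the actual dataset or query.

```python
import hashlib

import pandas as pd


def bucket_of(key, buckets=100):
    """Map a key to a stable bucket in [0, buckets) using an MD5 hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def split_by_phase(df, key_column):
    """Split df into train/valid/test using the 0-70, 70-85, 85-100 ranges."""
    buckets = df[key_column].map(bucket_of)
    return {
        "train": df[buckets < 70],
        "valid": df[(buckets >= 70) & (buckets < 85)],
        "test": df[buckets >= 85],
    }


# Hypothetical data: 1,000 made-up account ids, not the notebook's dataset.
accounts = pd.DataFrame({"account_id": range(1000)})
splits = split_by_phase(accounts, "account_id")

for phase, frame in splits.items():
    print(phase, len(frame))
```

Because the bucket depends only on the key, a given record lands in the same phase on every run, which keeps the three sets disjoint. The resulting proportions are only approximately 70/15/15.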

Another example consists of finding a baseline **(alpha)** for a formulated column, which will serve as the Root-Mean-Square Error **(RMSE)**, *a frequently used measure of the differences between the values predicted by a model or an estimator and the values observed*, and splitting the dataset into two sets labeled **train** and **eval**: