# Efficient determination of zero-crossings in noisy real-life time series
## Advanced Data Science Capstone Project
### Import of all necessary libraries

The following main libraries are imported:

1. Scikit-learn
2. PySpark
3. TensorFlow and Keras
4. NumPy

Each step of the project adds the following libraries:

1. Initial data exploration: numpy, matplotlib.pyplot, random, math.
2. ETL: pyspark (with SparkConf, SparkContext, SparkSession, ml.linalg.Vectors) is added.
3. Feature engineering: pyspark (with ml.feature.PolynomialExpansion, ml.feature.MinMaxScaler) and Scikit-learn (with preprocessing.PolynomialFeatures, preprocessing.MinMaxScaler) are added.
4. Model definition: pyspark.ml.regression.LinearRegression, sklearn.linear_model.LinearRegression, keras.models.Sequential and keras.layers.Dense are added.
5. Model training: sklearn.metrics.r2_score and keras.callbacks.EarlyStopping are added.
6. Model evaluation: no additional libraries are required.
7. Model deployment: os and warnings can optionally be added.
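Steps 3 to 5 chain a polynomial feature expansion, min-max scaling, a linear regression fit and an R² check. The following is a minimal NumPy-only sketch of what that chain computes. It assumes a single input feature, a synthetic noisy sine signal and degree 3, and the helper names (`polynomial_features`, `min_max_scale`, `fit_linear_regression`, `predict`, `r2`) are hypothetical. It illustrates the idea only and does not reproduce the scikit-learn or PySpark implementations used in the project.

```python
import numpy as np

def polynomial_features(x, degree):
    # Columns x, x**2, ..., x**degree (no bias column; the intercept is fitted separately)
    return np.column_stack([x ** k for k in range(1, degree + 1)])

def min_max_scale(X):
    # Rescale every column to [0, 1]; assumes no column is constant
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def fit_linear_regression(X, y):
    # Ordinary least squares with an intercept column prepended
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    # coef[0] is the intercept, coef[1:] are the feature weights
    return coef[0] + X @ coef[1:]

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical noisy signal: one half-period of a sine plus Gaussian noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 200)
y = np.sin(t) + rng.normal(scale=0.1, size=t.size)

X = min_max_scale(polynomial_features(t, degree=3))
coef = fit_linear_regression(X, y)
score = r2(y, predict(coef, X))
print(f"R^2 = {score:.3f}")
```

Scaling the expanded features keeps the columns x, x² and x³ on a comparable range. With raw powers, the higher-degree columns would dominate numerically. This is the same motivation behind pairing PolynomialFeatures with MinMaxScaler in step 3.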

```python
import random

import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler
from sklearn.metrics import r2_score

# Install PySpark on the fly if it is missing (IPython shell escape, works in notebooks only)
try:
    import pyspark
except ImportError:
    !pip install pyspark
    import pyspark
from pyspark import SparkConf, SparkContext
# Local Spark context that uses all available CPU cores
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

from pyspark.ml.feature import PolynomialExpansion
from pyspark.ml.linalg import Vectors
# Aliases avoid name clashes with the scikit-learn classes imported above
from pyspark.ml.feature import MinMaxScaler as sparkScaler
from pyspark.ml.regression import LinearRegression as spark_LR
from pyspark.ml import Pipeline, PipelineModel

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers
# Import from tensorflow.keras (not standalone keras) so all Keras objects come from one package
from tensorflow.keras.callbacks import EarlyStopping

import warnings
import os

import numpy as np
import matplotlib.pyplot as plt
import math
from math import exp, sin, cos, log, pi
```

```text
Collecting pyspark
  Downloading https://files.pythonhosted.org/packages/27/67/5158f846202d7f012d1c9ca21c3549a58fd3c6707ae8ee823adcaca6473c/pyspark-3.0.2.tar.gz (204.8MB)
Collecting py4j==0.10.9
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.0.2-py2.py3-none-any.whl size=205186687 sha256=93fd8b1a47a651085060ecd9e52f3a48184471f41469f86b30b48511587d4d45
  Stored in directory: /root/.cache/pip/wheels/8b/09/da/c1f2859bcc86375dc972c5b6af4881b3603269bcc4c9be5d16
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.9 pyspark-3.0.2
```
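The try/except block above installs PySpark only when its import fails. Below is a minimal, standard-library-only sketch of a broader pre-flight check. For each package the notebook relies on, it reports whether the package is importable and, if so, its installed distribution version. It does not install anything. The helper name `check_packages` and the `REQUIRED` mapping are hypothetical. Note that the import name `sklearn` is provided by the `scikit-learn` distribution, which is why import names and distribution names are kept separate.

```python
import importlib.util
from importlib import metadata

# Hypothetical mapping: import name -> pip distribution name
REQUIRED = {
    "sklearn": "scikit-learn",
    "pyspark": "pyspark",
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}

def check_packages(required):
    """Return {import_name: version string, "unknown", or None if not importable}."""
    report = {}
    for module_name, dist_name in required.items():
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None  # missing: would need `pip install <dist_name>`
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no metadata under this distribution name
            report[module_name] = "unknown"
    return report

for name, version in check_packages(REQUIRED).items():
    print(f"{name:12s} {version or 'MISSING'}")
```

Reporting a missing package instead of installing it keeps the check side-effect free. It also works outside IPython, where the `!pip` shell escape is not available.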
