---
---
---
# **Term Paper for the Course 'Applied Big Data Analytics'**
# **Implementation: Percentage Price Oscillator (PPO) Based on the Data Warehouse Model**
---
---
---
* ### **Created:** Winter semester 2019-2020
* ### **Lecturer:** Prof. Dr. Sebastian Leuoth
* ### **Author:** Jan Gaida
* ### **Email:** jan.gaida@hof-university.de
* ### **Git:** [trading_2019](https://github.com/sleuoth-hof/trading_2019), [JanGaida](https://github.com/JanGaida) 
---
---
### *Powered by:*
### ***Hochschule für Angewandte Wissenschaften Hof***
[![Logo: Hochschule Hof](https://www.uni-assist.de/fileadmin/_processed_/4/7/csm_hof-university_logo_308ee8b37b.jpg)](https://www.hof-university.de/)
---
---
---
*© 2019-2020 Jan Gaida, Prof. Dr. Sebastian Leuoth. All Rights Reserved.*







---
# Database
---
*Source: [More than 400 Cryptocurrency-Chartdata](https://www.kaggle.com/tencars/392-crypto-currency-pairs-at-minute-resolution/version/2)*

In [1]:
# clone the repository
!git clone https://github.com/sleuoth/ABDA2019.git

# unpack 'btcusd'
!unzip ABDA2019/testdaten/cryptominuteresolution/btcusd.csv.zip
!mv btcusd.csv ABDA2019/testdaten/cryptominuteresolution/btcusd.csv

# result
!ls ABDA2019/testdaten/cryptominuteresolution

Cloning into 'ABDA2019'...
remote: Enumerating objects: 424, done.
remote: Total 424 (delta 0), reused 0 (delta 0), pack-reused 424
Receiving objects: 100% (424/424), 486.44 MiB | 18.90 MiB/s, done.
Resolving deltas: 100% (7/7), done.
Checking out files: 100% (418/418), done.
Archive:  ABDA2019/testdaten/cryptominuteresolution/btcusd.csv.zip
  inflating: btcusd.csv              
abseth.csv	cndbtc.csv  foausd.csv	mkrdai.csv  rdnusd.csv	utkbtc.csv
absusd.csv	cndeth.csv  fsnbtc.csv	mkreth.csv  repbtc.csv	utketh.csv
agibtc.csv	cndusd.csv  fsneth.csv	mkrusd.csv  repeth.csv	utkusd.csv
agieth.csv	cnneth.csv  fsnusd.csv	mlneth.csv  repusd.csv	utneth.csv
agiusd.csv	cnnusd.csv  fttusd.csv	mlnusd.csv  reqbtc.csv	utnusd.csv
aidbtc.csv	csxeth.csv  fttust.csv	mnabtc.csv  reqeth.csv	veebtc.csv
aideth.csv	csxusd.csv  funbtc.csv	mnaeth.csv  requsd.csv	veeeth.csv
aidusd.csv	ctxbtc.csv  funeth.csv	mnausd.csv  rifbtc.csv	veeusd.csv
aiobtc.csv	ctxeth.csv  funusd.csv	mtnbtc.csv  rifusd.csv	vetbtc.csv


---
# Additional Framework Installation
---

In [2]:
# JDK: Java 8, matching the JAVA_HOME path set in the imports cell (Spark 2.4.x targets Java 8)
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

# tree
!apt-get install tree

# spark-package
!wget -q https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
!tar xf spark-2.4.4-bin-hadoop2.7.tgz

# findspark
!pip install findspark

# numpy
!pip install numpy

# timeseries library
!pip install ts ts-flint

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-430
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 7 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (1,155 kB/s)
Selecting previously unselected package tree.
(Reading database ... 135143 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Collecting findspark
  Downloading https://files.pythonhosted.org/packages/b1/c8/e6e1f6a303ae5122dc28d131b5a67c5eb87cbf8f7ac5b9f87764ea1b1e1e/fi

---
# Imports
---

In [30]:
# For paths and other OS functionality
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.4-bin-hadoop2.7"

# Colab sets COLAB_TPU_ADDR only when a TPU runtime is attached
if 'COLAB_TPU_ADDR' in os.environ:
  print(">> Colab connected to a TPU")
else:
  print(">> Colab not connected to a TPU")

# Spark and Spark SQL
import findspark
findspark.init()

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.functions import input_file_name, col, collect_list, concat_ws, udf
from pyspark.sql.types import DoubleType, StringType
from pyspark.sql.window import Window

# Initialize Spark and Spark SQL
spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)

# For easy further processing of the DataFrame (e.g. visualization with Plotly)
import pandas as pd
from pandas import Timestamp

# For easy handling of arrays from pandas DataFrames
import numpy

# Visualization as charts
import matplotlib.pyplot as plt
import plotly.graph_objects as go

# For displaying DataFrames
from IPython.display import display, HTML

# For file download
from google.colab import files

# For creating zip files
import shutil

>> Colab not connected to a TPU


---
# Applying the PPO Indicator
---

*PPO implementation based on: [Investopedia.com](https://www.investopedia.com/terms/p/ppo.asp#formula-and-calculation-for-ppo)*
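
The formula behind the indicator can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's Spark code: the function names `ppo` and `ppo_histogram` and the sample numbers are assumptions made for this example.

```python
def ppo(ema_short, ema_long):
    """PPO in percent: distance of the short EMA from the long EMA."""
    return (ema_short - ema_long) / ema_long * 100.0


def ppo_histogram(ppo_value, signal_value):
    """Histogram: PPO minus its signal line (the 9-period EMA of the PPO)."""
    return ppo_value - signal_value


# Made-up EMA and signal values, purely for illustration
example_ppo = ppo(ema_short=102.0, ema_long=100.0)
example_hist = ppo_histogram(example_ppo, signal_value=1.5)
print(round(example_ppo, 6), round(example_hist, 6))
```

A positive PPO means the short EMA lies above the long EMA; a positive histogram means the PPO lies above its signal line.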


In [0]:
Console_Preview_Limit = 5  #@param {type: "slider", min: 1, max: 100}
#@markdown 'Git' refers to the previously cloned repository 'ABDA2019':
Datenframe_Von = "Git" #@param ["Git", "Pfad", "Sample: Intel (requieres './intc.csv')"] {allow-input: false}
#@markdown Only used when 'Datenframe_Von' is set to 'Git':
Filenamen_Zu_Laden = '*.csv'  #@param {type: "string"}
#@markdown Only used when 'Datenframe_Von' is set to 'Pfad':
Filepfad_Zu_Laden = '*.csv'  #@param {type: "string"}
Filenamen_Stellen_Von_Filepfad = 0  #@param {type: "integer"}



*Metadata Aggregation*
---
---

In [5]:
#
# Build the initial DataFrame
#

if Datenframe_Von == "Git":
  df_spark = spark.read.csv('./ABDA2019/testdaten/cryptominuteresolution/' + Filenamen_Zu_Laden, inferSchema = True, header = True)
  if Filenamen_Zu_Laden == '*.csv':
    start = 57
  else:
    start = 59
  end = 6

elif Datenframe_Von == "Pfad":
  df_spark = spark.read.csv( Filepfad_Zu_Laden, inferSchema = True, header = True)
  # len() gives the path length; str.count is a method and cannot be added to an int
  start = len(Filepfad_Zu_Laden) + 15 - Filenamen_Stellen_Von_Filepfad - 4
  end = Filenamen_Stellen_Von_Filepfad

elif Datenframe_Von == "Sample: Intel (requieres './intc.csv')":
  df_spark = spark.read.csv('./intc.csv', inferSchema = True, header = True)
  start = 17
  end = 4

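# Note: slicing a Column maps to substr(start, length), so `start` is the
# 1-based position of the file name and `end` is its length
df_spark = df_spark \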
  .withColumn('filepath', input_file_name()) \
  .withColumn('filename', (input_file_name()[start:end]) ) \
  .withColumn('timestamp', df_spark['time']/1000) \
  .withColumn('date', (df_spark['time']/1000).cast('timestamp'))

print("Type:", type(df_spark), "\n")
print("Schema: ", end = '')
df_spark.printSchema()
df_spark_total_count = df_spark.count()
print("Available rows:", df_spark_total_count, "\n\nDaten-Preview:")
df_spark.show(Console_Preview_Limit, False)

# Create the SQL view
df_spark.createOrReplaceTempView("base_data")

df_spark_informationView = spark.sql("SELECT filename, count(*), min(date), max(date)  FROM base_data group by filename")
print("Geladene Daten:")
df_spark_informationView.show(Console_Preview_Limit, False)

Type: <class 'pyspark.sql.dataframe.DataFrame'> 

Schema: root
 |-- time: long (nullable = true)
 |-- open: double (nullable = true)
 |-- close: double (nullable = true)
 |-- high: double (nullable = true)
 |-- low: double (nullable = true)
 |-- volume: double (nullable = true)
 |-- filepath: string (nullable = false)
 |-- filename: string (nullable = false)
 |-- timestamp: double (nullable = true)
 |-- date: timestamp (nullable = true)

Available rows: 33909757 

Daten-Preview:
+-------------+-----+-----+-----+-----+-----------+------------------------------------------------------------------+--------+------------+-------------------+
|time         |open |close|high |low  |volume     |filepath                                                          |filename|timestamp   |date               |
+-------------+-----+-----+-----+-----+-----------+------------------------------------------------------------------+--------+------------+-------------------+
|1364774820000|93.25|93.3 |93.3 |

*Day Filter*
---
---

In [6]:
#
# PER-DAY FILTER: keep only the last row of each day per file
#

base_data_dayfiltered = spark.sql(
  """SELECT d.* FROM base_data d  
     where  (d.date, filename) in (select max(m.date), filename from base_data m group by date_format(m.date, "y-M-d"), filename)
  """)

# Create the SQL view
base_data_dayfiltered.createOrReplaceTempView("base_data_dayfiltered")

base_data_dayfiltered_total_count = base_data_dayfiltered.count()
#base_data_dayfiltered.show(Console_Preview_Limit, False)

base_data_dayfiltered_informationView = spark.sql("SELECT filename, count(*), min(date), max(date)  FROM base_data_dayfiltered group by filename")
print("Available rows:", base_data_dayfiltered_total_count, "(lost",(df_spark_total_count-base_data_dayfiltered_total_count), "rows)", "\n\nVerfügbare Daten pro Tag:")
base_data_dayfiltered_informationView.show(Console_Preview_Limit)

Available rows: 156579 (lost 33753178 rows) 

Verfügbare Daten pro Tag:
+--------+--------+-------------------+-------------------+
|filename|count(1)|          min(date)|          max(date)|
+--------+--------+-------------------+-------------------+
|  neceth|      18|2019-06-30 15:13:00|2019-09-30 03:55:00|
|  dtxusd|      43|2019-08-16 23:55:00|2019-10-01 23:35:00|
|  omnusd|     167|2018-10-30 23:44:00|2019-10-01 03:05:00|
|  btcusd|    2211|2013-04-01 23:58:00|2019-10-01 18:46:00|
|  yywusd|     664|2017-12-01 23:59:00|2019-10-01 19:47:00|
|  sngeth|     325|2018-01-24 23:11:00|2019-09-26 21:04:00|
|  neojpy|     541|2018-03-29 22:48:00|2019-10-01 22:36:00|
|  dtabtc|     225|2018-07-12 23:41:00|2019-10-01 10:04:00|
|  euteur|      78|2018-11-27 16:32:00|2019-10-01 07:48:00|
|  gotusd|     359|2018-09-06 23:52:00|2019-10-01 14:06:00|
|  pasusd|     253|2019-01-17 23:09:00|2019-10-01 22:17:00|
|  tnbusd|     615|2018-01-08 23:59:00|2019-10-01 19:00:00|
|  udcusd|     285|2018-12-0
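
The effect of the day filter can be sketched without Spark. The following plain-Python version keeps, for every file and calendar day, only the row with the latest timestamp, which is what the subquery with `max(m.date)` selects. The helper name `last_row_per_day` and the sample rows are assumptions made for this illustration.

```python
from datetime import datetime


def last_row_per_day(rows):
    """Keep, per (filename, calendar day), only the row with the latest timestamp."""
    latest = {}
    for filename, ts, close in rows:
        key = (filename, ts.date())
        if key not in latest or ts > latest[key][1]:
            latest[key] = (filename, ts, close)
    return sorted(latest.values(), key=lambda row: (row[0], row[1]))


# Made-up minute rows: (filename, timestamp, close)
rows = [
    ("btcusd", datetime(2019, 10, 1, 18, 45), 8300.0),
    ("btcusd", datetime(2019, 10, 1, 18, 46), 8310.0),
    ("btcusd", datetime(2019, 9, 30, 23, 59), 8250.0),
    ("dtxusd", datetime(2019, 10, 1, 23, 35), 5.0),
]
daily = last_row_per_day(rows)
for row in daily:
    print(row)
```

This is why the row count drops so sharply: minute data collapses to one closing row per day and file.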

*EMA-UDF*
---
---
Calculated according to: [tradistats.com](https://tradistats.com/exponentieller-gleitender-durchschnitt/)

In [0]:
#
# EMA-UDF
#

# see https://tradistats.com/exponentieller-gleitender-durchschnitt/
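def ema(ar):
    # An empty window has no EMA; returning None avoids an unbound variable
    if len(ar) == 0:
        return None
    SF  = 2 / (len(ar) + 1)   # smoothing factor
    SFi = 1 - SF
    my_ema = ar[0]            # seed with the oldest value in the window
    for i in ar:
        my_ema = (i * SF) + (my_ema * SFi)
    return my_ema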

ema_udf = udf(ema, DoubleType())
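
A short worked example may make the recursion easier to follow. The function is restated here without Spark so that the snippet runs on its own; the input values are made up for illustration.

```python
def ema(values):
    """Window EMA as in the UDF: smoothing factor 2 / (n + 1), seeded with the first value."""
    if not values:
        return None
    sf = 2 / (len(values) + 1)
    result = values[0]
    for value in values:
        result = value * sf + result * (1 - sf)
    return result


# Window of three closes: the smoothing factor is 2 / (3 + 1) = 0.5
# seed 1.0 -> 1.0 (first value again) -> 1.5 -> 2.25
print(ema([1.0, 2.0, 3.0]))  # 2.25
print(ema([5.005]))          # 5.005, a single value is its own EMA
```

Because the smoothing factor depends on the window length, rows at the start of a series (with shorter windows) are smoothed less than later rows.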

*Windows*
---
---

In [0]:
#
# EMA WINDOWS
#
# Each window covers the current row plus the preceding N-1 rows of the same file
win26 = (Window
    .partitionBy("filename")
    .orderBy("date")
    .rowsBetween(-25, 0))

win12 = (Window
    .partitionBy("filename")
    .orderBy("date")
    .rowsBetween(-11, 0))

win9 = (Window
    .partitionBy("filename")
    .orderBy("date")
    .rowsBetween(-8, 0))
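
What `collect_list(...)` gathers over such a row-based window can be illustrated in plain Python. This is a sketch of the idea only; the helper name `trailing_windows` is hypothetical and the values are made up.

```python
def trailing_windows(values, size):
    """For each position, return the current value plus up to size - 1 predecessors."""
    return [values[max(0, i - size + 1): i + 1] for i in range(len(values))]


closes = [10.0, 11.0, 12.0, 13.0]
for window in trailing_windows(closes, size=3):
    print(window)
```

The first rows of each file get shorter lists because fewer predecessors exist, which matches the single-element lists visible at the start of the previewed data.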

*PPO & Signal Calculation*
---
---

In [9]:
#
# EMA_26 && EMA_12 CALCULATION
#

base_data_tmp_ema_1 = base_data_dayfiltered \
  .withColumn('win26_close_list', collect_list('close').over(win26)) \
  .withColumn('win12_close_list', collect_list('close').over(win12))

base_data_tmp_ema_2 = base_data_tmp_ema_1.select(
  "*",
  ema_udf(base_data_tmp_ema_1["win26_close_list"]).alias("EMA26"),
  ema_udf(base_data_tmp_ema_1["win12_close_list"]).alias("EMA12")
)

base_data_tmp_ema_2.createOrReplaceTempView("base_data_tmp_ema_2")
base_data_tmp_ema_2.show(Console_Preview_Limit, False)

+-------------+------+------+------+------+------+------------------------------------------------------------------+--------+------------+-------------------+------------------------------------+------------------------------------+------------------+------------------+
|time         |open  |close |high  |low   |volume|filepath                                                          |filename|timestamp   |date               |win26_close_list                    |win12_close_list                    |EMA26             |EMA12             |
+-------------+------+------+------+------+------+------------------------------------------------------------------+--------+------------+-------------------+------------------------------------+------------------------------------+------------------+------------------+
|1565999700000|5.005 |5.005 |5.005 |5.005 |0.0012|file:/content/ABDA2019/testdaten/cryptominuteresolution/dtxusd.csv|dtxusd  |1.5659997E9 |2019-08-16 23:55:00|[5.005]                  

In [10]:
#
# PPO CALCULATION
#

base_data_tmp_ema_3 = spark.sql(
  " select *, (((EMA12 - EMA26) / EMA26) * CAST(100 AS DOUBLE))   " \
  " FROM base_data_tmp_ema_2 " \
  " order by filename asc, date desc "
)

#
# SIGNAL CALCULATION
#
base_data_tmp_ema_4 = base_data_tmp_ema_3 \
  .withColumn('win9_ema12sub26_list', collect_list('(((EMA12 - EMA26) / EMA26) * CAST(100 AS DOUBLE))').over(win9))


base_data_tmp_ema_5 = base_data_tmp_ema_4.select(
  "*",
  base_data_tmp_ema_4['(((EMA12 - EMA26) / EMA26) * CAST(100 AS DOUBLE))'].alias("PPO"),
  ema_udf(base_data_tmp_ema_4["win9_ema12sub26_list"]).alias("SIGNAL")
)

base_data_tmp_ema_5.createOrReplaceTempView("base_data_tmp_ema_5")
base_data_tmp_ema_5.show(Console_Preview_Limit, False)

+-------------+------+------+------+------+------+------------------------------------------------------------------+--------+------------+-------------------+------------------------------------+------------------------------------+------------------+------------------+-------------------------------------------------+-------------------------+---+------+
|time         |open  |close |high  |low   |volume|filepath                                                          |filename|timestamp   |date               |win26_close_list                    |win12_close_list                    |EMA26             |EMA12             |(((EMA12 - EMA26) / EMA26) * CAST(100 AS DOUBLE))|win9_ema12sub26_list     |PPO|SIGNAL|
+-------------+------+------+------+------+------+------------------------------------------------------------------+--------+------------+-------------------+------------------------------------+------------------------------------+------------------+------------------+-----------

*PPO Histogram Calculation*
---
---

In [11]:
#
# PPO HISTOGRAM CALCULATION
#

base_data_tmp_ema_6 = spark.sql(
  " select *, (PPO - SIGNAL)   " \
  " FROM base_data_tmp_ema_5 " \
  " order by filename asc, date desc "
)

base_data_ema = base_data_tmp_ema_6.select(
  "*",
  base_data_tmp_ema_6['(PPO - SIGNAL)'].alias("PPO_HISTOGRAM")
)

base_data_ema.createOrReplaceTempView("base_data_ema")
base_data_ema.show(Console_Preview_Limit, False)

+-------------+--------+--------+--------+--------+-------------+------------------------------------------------------------------+--------+------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+-------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+-------------------+--------------------+--------------------+
|time         |open    |close   |high    |low     |volume       |

---
# Analysis
---
*PPO analysis likewise based on: [Investopedia.com](https://www.investopedia.com/terms/p/ppo.asp#what-the-indicator-tells-you)*


*Windows*
---
---

In [0]:
#
# Analysis windows
#

win2 = (Window
    .partitionBy("filename")
    .orderBy("date")
    .rowsBetween(-1, 0))

win3 = (Window
    .partitionBy("filename")
    .orderBy("date")
    .rowsBetween(-2, 0))

*Functions*
---
---
**→ ppo_trend_analysis** :

Applied to 'PPO' (window of at least 1):

> *When the PPO is above zero, that helps confirm an uptrend since the short-term EMA is above the longer-term EMA. When the PPO is below zero, the short-term EMA is below the longer-term EMA, which is an indication of a downtrend.*


**→ ppo_crossover_signal_analysis** :

Applied to 'PPO_HISTOGRAM' (window of at least 2):
> *The indicator generates a buy signal when the PPO line crosses above the signal line from below, and a sell signal occurs when the PPO line crosses  below the signal from above.*

Applied to 'PPO' (window of at least 2):
> *Centerline crossovers also generate trading signals. Traders consider a move from below to above the centerline as bullish, and a move from above to below the centerline as bearish.*

**→ ppo_technical_divergence_analysis:**

Applied to 'close' and 'PPO' (window of at least 2; swap 'close' for 'open' if needed):
> *Traders can also use the PPO to look for technical divergence between the indicator and price. For example, if the price of an asset makes a higher high, but the indicator makes a lower high, it may indicate the upward momentum is subsiding. Conversely, if an asset's price makes a lower low, but the indicator makes a higher low, it could suggest that the bears are losing their traction and the price could head higher soon.*
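
The crossover rules quoted above reduce to a sign change between two consecutive values. A minimal sketch in plain Python, assuming a made-up histogram series; the function name `crossover_signals` and the English labels are chosen for this illustration only:

```python
def crossover_signals(series):
    """Label each position 'buy', 'sell' or '' from the sign change to its predecessor."""
    signals = [""]  # the first value has no predecessor
    for prev, last in zip(series, series[1:]):
        if last <= 0 and prev > 0:
            signals.append("sell")   # crossed from above to (or below) zero
        elif last >= 0 and prev < 0:
            signals.append("buy")    # crossed from below to (or above) zero
        else:
            signals.append("")
    return signals if series else []


histogram = [-0.5, -0.1, 0.2, 0.4, -0.3]  # made-up PPO histogram values
print(crossover_signals(histogram))
```

Applied to the histogram (PPO minus signal line) this detects signal-line crossovers; applied to the PPO itself it detects centerline crossovers.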


In [0]:
#
# Analysis UDFs
#

def ppo_trend_analysis(ar):
    output = ""
    count = len(ar)
    if count > 0:
      # Single-value check
      if count == 1:
        current = ar[0]
        if current > 0:
          output = "▲ Aufwärtstrend" # uptrend
        elif current < 0:
          output = "▼ Abwärtstrend" # downtrend

      # Compare with the preceding value (the window may hold several earlier ones)
      elif count > 1:
        last_position = count - 1
        next_to_last_position = last_position - 1

        last = ar[last_position] 
        next_to_last = ar[next_to_last_position]

        if last > 0 and next_to_last > 0:
          output = "▲ Aufwärtstrend"
        elif last < 0 and next_to_last > 0:
          output = "↘ neuer Abwärtstrend"
        elif last < 0 and next_to_last < 0:
          output = "▼ Abwärtstrend"
        elif last > 0 and next_to_last < 0:
          output = "↗ neuer Aufwärtstrend"

    return output

def ppo_crossover_signal_analysis(ar):
    output = ""
    count = len(ar)
    # Compare the last two values to detect a crossover
    if count > 1:
        last_position = count - 1
        next_to_last_position = last_position - 1

        last = ar[last_position]
        next_to_last = ar[next_to_last_position]

        if last <= 0 and next_to_last > 0:
          output = "Verkauf-Signal" # sell signal
        elif last >= 0 and next_to_last < 0:
          output = "Kauf-Signal" # buy signal

    return output

def ppo_technical_divergence_analysis(close_ar, ppo_ar):
    output = ""
    count_close = len(close_ar)
    count_ppo = len(ppo_ar)
    
    # Both arrays must have the same size
    if count_close == count_ppo and count_close > 1:
      # compute the delta
      pointer = count_close - 1
      delta_close = close_ar[pointer]
      delta_ppo = ppo_ar[pointer]
      pointer -= 1

      while pointer >= 0:
        delta_close -= close_ar[pointer]
        delta_ppo -= ppo_ar[pointer]
        pointer -= 1

      # compute the slope
      div_close = delta_close / count_close
      div_ppo = delta_ppo / count_ppo

      if div_close > div_ppo:
        output = "↧ Abflachend" # flattening
      else:
        output = "↥ Wachsend" # growing

    return output

ppo_trend_analysis_udf = udf(ppo_trend_analysis, StringType())
ppo_crossover_signal_analysis_udf = udf(ppo_crossover_signal_analysis, StringType())
ppo_technical_divergence_analysis_udf = udf(ppo_technical_divergence_analysis, StringType())

*Aggregation*
---
---

In [14]:
#
# SIGNAL ANALYSIS
#

base_data_ema_tmp_eval = base_data_ema \
  .withColumn('win2_ppo_list', collect_list('PPO').over(win2)) \
  .withColumn('win2_ppoh_list', collect_list('PPO_Histogram').over(win2)) \
  .withColumn('win2_close_list', collect_list('close').over(win2))

base_data_ema_eval_result = base_data_ema_tmp_eval.select(
  "*",
  ppo_trend_analysis_udf(base_data_ema_tmp_eval["win2_ppo_list"]).alias("TREND"),
  ppo_crossover_signal_analysis_udf(base_data_ema_tmp_eval["win2_ppo_list"]).alias("WEAK_SIGNAL"),
  ppo_crossover_signal_analysis_udf(base_data_ema_tmp_eval["win2_ppoh_list"]).alias("STRONG_SIGNAL"),
  ppo_technical_divergence_analysis_udf(base_data_ema_tmp_eval["win2_close_list"], base_data_ema_tmp_eval["win2_ppo_list"]).alias("DIVERGENCE")
)

base_data_ema_eval_result.createOrReplaceTempView("base_data_ema_eval_result")
base_data_ema_eval_result.show(Console_Preview_Limit, False)

+-------------+------+------+------+------+------+------------------------------------------------------------------+--------+------------+-------------------+------------------------------------+------------------------------------+------------------+------------------+-------------------------------------------------+-------------------------+---+------+--------------+-------------+-------------+--------------+---------------+-----+-----------+-------------+------------+
|time         |open  |close |high  |low   |volume|filepath                                                          |filename|timestamp   |date               |win26_close_list                    |win12_close_list                    |EMA26             |EMA12             |(((EMA12 - EMA26) / EMA26) * CAST(100 AS DOUBLE))|win9_ema12sub26_list     |PPO|SIGNAL|(PPO - SIGNAL)|PPO_HISTOGRAM|win2_ppo_list|win2_ppoh_list|win2_close_list|TREND|WEAK_SIGNAL|STRONG_SIGNAL|DIVERGENCE  |
+-------------+------+------+------+------+-

---
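# Evaluation
---

*All final recommendations range from 0 (weak) to 3 (strong). The labels in the lists below are the literal values of the `TREND` and `DIVERGENCE` columns: "Aufwärtstrend" = uptrend, "Abwärtstrend" = downtrend, "neuer" = new, "Abflachend" = flattening, "Wachsend" = growing.*

**Hold**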
---
---
*   **H0:** 
  * ▼ Abwärtstrend && ↧ Abflachend
  * ▼ Abwärtstrend && ↥ Wachsend
  * ↘ neuer Abwärtstrend && ↧ Abflachend
*   **H1:**
  * ↗ neuer Aufwärtstrend && ↧ Abflachend
*   **H2:**
  * ▲ Aufwärtstrend && ↧ Abflachend
  * ↘ neuer Abwärtstrend && ↥ Wachsend
*   **H3:**
  * ▲ Aufwärtstrend && ↥ Wachsend
  * ↗ neuer Aufwärtstrend && ↥ Wachsend

In [0]:
#
# Hold SQL statement
#

holdingCaseStatement = "CASE WHEN (TREND = \"↗ neuer Aufwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") THEN \"H1\"" +"\n" \
"WHEN (TREND = \"▲ Aufwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") OR (TREND = \"↘ neuer Abwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") THEN \"H2\"" +"\n" \
"WHEN (TREND = \"▲ Aufwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") OR (TREND = \"↗ neuer Aufwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") THEN \"H3\"" +"\n" \
"ELSE \"H0\" END" +"\n"

**Buy**
---
---
*   **K0:** 
  * ▼ Abwärtstrend && ↧ Abflachend
  * ↘ neuer Abwärtstrend && ↧ Abflachend
*   **K1:**
  * ▼ Abwärtstrend && ↥ Wachsend
  * ↘ neuer Abwärtstrend && ↥ Wachsend
  * ↗ neuer Aufwärtstrend && ↧ Abflachend
*   **K2:**
  * ▲ Aufwärtstrend && ↧ Abflachend
  * ↗ neuer Aufwärtstrend && ↥ Wachsend
*   **K3:**
  * ▲ Aufwärtstrend && ↥ Wachsend

In [0]:
#
# Buy SQL statement
#

buyingCaseStatement = "CASE WHEN (TREND = \"▼ Abwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") OR (TREND = \"↘ neuer Abwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") THEN \"K0\"" +"\n" \
"WHEN (TREND = \"▲ Aufwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") OR (TREND = \"↗ neuer Aufwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") THEN \"K2\"" +"\n" \
"WHEN (TREND = \"▲ Aufwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") THEN \"K3\"" +"\n" \
"ELSE \"K1\" END" +"\n"

**Sell**
---
---
*   **V0:** 
  * ▲ Aufwärtstrend && ↥ Wachsend
  * ▲ Aufwärtstrend && ↧ Abflachend
*   **V1:**
  * ↗ neuer Aufwärtstrend && ↥ Wachsend
  * ↘ neuer Abwärtstrend && ↥ Wachsend
*   **V2:**
  * ▼ Abwärtstrend && ↥ Wachsend
  * ↗ neuer Aufwärtstrend && ↧ Abflachend
  * ↘ neuer Abwärtstrend && ↧ Abflachend
*   **V3:**
  * ▼ Abwärtstrend && ↧ Abflachend
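
The three rating lists (hold, buy, sell) can also be read as one lookup table keyed by trend and divergence. The sketch below is an alternative illustration in plain Python, not the notebook's SQL; the names `RATINGS`, `DEFAULT` and `rate` are hypothetical, and the fallback values are assumed to mirror the `ELSE` branches of the notebook's three `CASE` statements.

```python
UP, DOWN = "▲ Aufwärtstrend", "▼ Abwärtstrend"
NEW_UP, NEW_DOWN = "↗ neuer Aufwärtstrend", "↘ neuer Abwärtstrend"
FLAT, GROW = "↧ Abflachend", "↥ Wachsend"

# (trend, divergence) -> (hold, buy, sell) rating, copied from the three lists
RATINGS = {
    (DOWN, FLAT):     ("H0", "K0", "V3"),
    (DOWN, GROW):     ("H0", "K1", "V2"),
    (NEW_DOWN, FLAT): ("H0", "K0", "V2"),
    (NEW_DOWN, GROW): ("H2", "K1", "V1"),
    (NEW_UP, FLAT):   ("H1", "K1", "V2"),
    (NEW_UP, GROW):   ("H3", "K2", "V1"),
    (UP, FLAT):       ("H2", "K2", "V0"),
    (UP, GROW):       ("H3", "K3", "V0"),
}
# Assumed fallbacks for combinations not listed (e.g. an empty trend)
DEFAULT = ("H0", "K1", "V2")


def rate(trend, divergence, action):
    """Return the rating for action 'hold', 'buy' or 'sell'."""
    index = {"hold": 0, "buy": 1, "sell": 2}[action]
    return RATINGS.get((trend, divergence), DEFAULT)[index]


print(rate(UP, GROW, "buy"), rate(DOWN, FLAT, "sell"), rate(NEW_UP, FLAT, "hold"))
```

Keeping all eight combinations in one table makes it easy to check that every pair of labels has exactly one rating per action.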

In [0]:
#
# Sell SQL statement
#

sellingCaseStatement = "CASE WHEN (TREND = \"▲ Aufwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") OR (TREND = \"▲ Aufwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") THEN \"V0\"" +"\n" \
"WHEN (TREND = \"↗ neuer Aufwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") OR (TREND = \"↘ neuer Abwärtstrend\" AND DIVERGENCE = \"↥ Wachsend\") THEN \"V1\"" +"\n" \
"WHEN (TREND = \"▼ Abwärtstrend\" AND DIVERGENCE = \"↧ Abflachend\") THEN \"V3\" " +"\n" \
"ELSE \"V2\" END" +"\n"

**Aggregation**
---
---
*cf. the Investopedia link (see the section header)*

* If neither signal fired (both empty, shown as `null` in the table) ⇒ *hold*
* Otherwise the strong signal takes precedence over the weak signal; for each of them:
  * buy signal (`Kauf-Signal`) ⇒ *buy*
  * sell signal (`Verkauf-Signal`) ⇒ *sell*

```
+----------------+----------------+----------------+
|  WEAK_SIGNAL   | STRONG_SIGNAL  | RECOMMENDATION |
+----------------+----------------+----------------+
| null           | null           | HOLD           |
| Kauf-Signal    | null           | BUY            |
| Kauf-Signal    | Kauf-Signal    | BUY            |
| ...            | ...            | ...            |
| Verkauf-Signal | null           | SELL           |
| Verkauf-Signal | Verkauf-Signal | SELL           |
| ...            | ...            | ...            |
| Kauf-Signal    | Verkauf-Signal | SELL           |
| Verkauf-Signal | Kauf-Signal    | BUY            |
+----------------+----------------+----------------+
```
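
The precedence rule in the table can be sketched as a small Python function. This is an illustration only; the name `recommend` and the English return values are assumptions, while `Kauf-Signal` and `Verkauf-Signal` are the literal signal values produced by the analysis UDFs.

```python
def recommend(weak_signal, strong_signal):
    """Combine both signals; the strong signal wins whenever it is set."""
    if not weak_signal and not strong_signal:
        return "HOLD"
    decisive = strong_signal if strong_signal else weak_signal
    return "BUY" if decisive == "Kauf-Signal" else "SELL"


print(recommend("", ""))
print(recommend("Kauf-Signal", ""))
print(recommend("Kauf-Signal", "Verkauf-Signal"))
```

Empty strings and `None` are both treated as "no signal" here, so the function works whether missing signals are stored as empty text or as null.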


In [0]:
#
# Combined SQL statement
#

final_result_statement = "SELECT *," \
" CASE" \
"   WHEN (WEAK_SIGNAL = \"\" AND STRONG_SIGNAL = \"\") THEN " + holdingCaseStatement + \
"   WHEN (STRONG_SIGNAL = \"\") THEN " \
"     CASE" \
"        WHEN (WEAK_SIGNAL = \"Kauf-Signal\") THEN " + buyingCaseStatement + \
"        ELSE " + sellingCaseStatement + \
"     END" \
"   ELSE" \
"     CASE" \
"        WHEN (STRONG_SIGNAL = \"Kauf-Signal\") THEN " + buyingCaseStatement + \
"        ELSE " + sellingCaseStatement + \
"     END"  \
" END AS PPO_RESULT" \
" FROM base_data_ema_eval_result"

data_final_eval_result = spark.sql(final_result_statement)
data_final_eval_result.createOrReplaceTempView("data_final_eval_result")

---
# Visualization
---

*Parameters*
---

In [19]:
#
# Earliest and latest dates of the loaded price series
#

base_data_dayfiltered_informationView.show(Console_Preview_Limit)

+--------+--------+-------------------+-------------------+
|filename|count(1)|          min(date)|          max(date)|
+--------+--------+-------------------+-------------------+
|  neceth|      18|2019-06-30 15:13:00|2019-09-30 03:55:00|
|  dtxusd|      43|2019-08-16 23:55:00|2019-10-01 23:35:00|
|  omnusd|     167|2018-10-30 23:44:00|2019-10-01 03:05:00|
|  btcusd|    2211|2013-04-01 23:58:00|2019-10-01 18:46:00|
|  yywusd|     664|2017-12-01 23:59:00|2019-10-01 19:47:00|
|  sngeth|     325|2018-01-24 23:11:00|2019-09-26 21:04:00|
|  neojpy|     541|2018-03-29 22:48:00|2019-10-01 22:36:00|
|  dtabtc|     225|2018-07-12 23:41:00|2019-10-01 10:04:00|
|  euteur|      78|2018-11-27 16:32:00|2019-10-01 07:48:00|
|  gotusd|     359|2018-09-06 23:52:00|2019-10-01 14:06:00|
|  pasusd|     253|2019-01-17 23:09:00|2019-10-01 22:17:00|
|  tnbusd|     615|2018-01-08 23:59:00|2019-10-01 19:00:00|
|  udcusd|     285|2018-12-04 23:15:00|2019-10-01 17:39:00|
|  vldeth|     129|2018-11-19 23:22:00|2

In [0]:
Anzeige_Für_Alle_Gelesen_Daten = False #@param {type:"boolean"}
#@markdown These parameters only affect the output:
Anzeige_Limit = 365  #@param {type: "slider", min: 1, max: 10000}
Anzeige_Zahlen_Genaugikeit = 3  #@param {type: "slider", min: 1, max: 10}
Anzeige_Template = "plotly_dark" #@param ['ggplot2', 'seaborn', 'simple_white', 'plotly', 'plotly_white', 'plotly_dark', 'presentation', 'xgridoff', 'ygridoff', 'gridon', 'none'] {allow-input: false}
#@markdown Start and end day have no effect on the exported days:
Anzeige_Start_Tag = '2018-10-01' #@param {type:"date"}
Anzeige_Ende_Tag = '2019-10-01' #@param {type:"date"}

Anzeige_Zahlen_Genaugikeit_str = str(Anzeige_Zahlen_Genaugikeit)

*Preparation*
---

In [21]:
#
# Data preparation
#
if Anzeige_Für_Alle_Gelesen_Daten:
  # 'Date(date) date' keeps the column name 'date' that the pandas code below relies on
  data_eval_result = spark.sql(
    " select filename, Date(date) date, format_number(high, "+Anzeige_Zahlen_Genaugikeit_str+") high, format_number(low, "+Anzeige_Zahlen_Genaugikeit_str+") low, format_number(open, "+Anzeige_Zahlen_Genaugikeit_str+") open, format_number(close, "+Anzeige_Zahlen_Genaugikeit_str+") close" \
    " , format_number(PPO, "+Anzeige_Zahlen_Genaugikeit_str+") PPO, format_number(SIGNAL, "+Anzeige_Zahlen_Genaugikeit_str+") Signal, format_number(PPO_HISTOGRAM, "+Anzeige_Zahlen_Genaugikeit_str+") PPO_HISTOGRAM " \
    " , TREND, WEAK_SIGNAL, STRONG_SIGNAL, DIVERGENCE, PPO_RESULT "\
    " FROM data_final_eval_result " \
    " WHERE date >= '"+ Anzeige_Start_Tag.strip() +" 00:00:00' AND date <= '"+ Anzeige_Ende_Tag.strip() +" 23:59:59' " \
    " order by filename asc, date asc " 
  )

  data_pd = data_eval_result.toPandas()

  # Remove thousands separators: 1,000.001 -> 1000.001
  data_pd.close = (data_pd['close'].replace(',','', regex = True).astype(float))
  data_pd.open = (data_pd['open'].replace(',','', regex = True).astype(float))
  data_pd.high = (data_pd['high'].replace(',','', regex = True).astype(float))
  data_pd.low = (data_pd['low'].replace(',','', regex = True).astype(float))

  data_pd.date = pd.to_datetime(data_pd.date)
  data_pd.close = pd.to_numeric(data_pd.close)
  data_pd.PPO = pd.to_numeric(data_pd.PPO)
  data_pd.Signal = pd.to_numeric(data_pd.Signal)
  data_pd.PPO_HISTOGRAM = pd.to_numeric(data_pd.PPO_HISTOGRAM)

  data_pd.filename = (data_pd['filename']).astype(str)

  # Prepare the for-each loop
  data_pd_filenames = spark.sql("SELECT DISTINCT filename FROM data_final_eval_result").toPandas().to_numpy()

else:
  print("Data preparation for the visualization skipped because of parameter 'Anzeige_Für_Alle_Gelesen_Daten'.")

Data preparation for the visualization skipped because of parameter 'Anzeige_Für_Alle_Gelesen_Daten'.


Output
---

In [22]:
#
# Charts & Table
#

if Anzeige_Für_Alle_Gelesen_Daten:
  hideInjectedFilenameColumn_CSS = "<style>" \
    "table td:nth-child(1) { display:none;}" \
    "table.dataframe thead th:first-child {display: none;}" \
    "</style>"

  for currentFilenames in data_pd_filenames:
    for currentFilename in currentFilenames:

        current_data_pd = data_pd[data_pd.filename == currentFilename]

        stock_chart = go.Figure(
            data = [ go.Candlestick( 
                x = current_data_pd['date'],
                open = current_data_pd['open'],
                high = current_data_pd['high'],
                low = current_data_pd['low'],
                close = current_data_pd['close']
                )]
        )

        ppo_chart = go.Figure(
            data = [ go.Scatter(
                x = current_data_pd['date'],
                y = current_data_pd['PPO'],
                mode = 'lines',
                name = 'PPO'
            ), go.Scatter(
                x = current_data_pd['date'],
                y = current_data_pd['Signal'],
                mode = 'lines',
                name = 'Signal'
            )]
        )

        ppo_chart.add_trace(
            go.Bar(
                name = 'PPO-Histogram',
                x = current_data_pd['date'],
                y = current_data_pd['PPO_HISTOGRAM']
            )
        )

        stock_chart.update_layout(
            title='Chart for CSV file \'' + currentFilename + '\'',
            yaxis_title = 'Points',
            template = Anzeige_Template,
            xaxis_rangeslider_visible = False
        )

        ppo_chart.update_layout(
            title='PPO indicator for CSV file \'' + currentFilename + '\'',
            yaxis_title = 'Changes in %',
            template = Anzeige_Template
        )

        print("\n\nFilename: " + currentFilename + "\nPeriod: " + Anzeige_Start_Tag + " → " + Anzeige_Ende_Tag + "\n")
        stock_chart.show()
        ppo_chart.show()
        print()
        # Show only the rows of the current file, not the whole DataFrame
        display(HTML(current_data_pd.to_html(index = False, max_rows = Anzeige_Limit) + hideInjectedFilenameColumn_CSS))
else:
  print("Visualization skipped because of parameter 'Anzeige_Für_Alle_Gelesen_Daten'.")

Visualization skipped because of parameter 'Anzeige_Für_Alle_Gelesen_Daten'.


---
# CSV Export
---
*The data is to be exported sorted by day, i.e. one file per day (cf. meeting of 2019-12-13)*

*Dates are written according to the [ISO 8601 standard](https://lmgtfy.com/?q=ISO-8601) (e.g. 2019-09-07)*
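
As a minimal sketch of the ISO 8601 day format mentioned above, the standard library can render a date or timestamp as `YYYY-MM-DD`. The helper name `iso_day` is hypothetical and not part of this notebook.

```python
from datetime import date, datetime

def iso_day(value):
    """Return the calendar day of a date or datetime as 'YYYY-MM-DD' (ISO 8601)."""
    if isinstance(value, datetime):
        value = value.date()  # drop the time of day
    return value.isoformat()

print(iso_day(date(2019, 9, 7)))               # 2019-09-07
print(iso_day(datetime(2019, 12, 13, 8, 30)))  # 2019-12-13
```

Month and day are always zero-padded to two digits, so the resulting file names sort chronologically.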

Folder structure
---

**The current path should be '~/content'**

In [0]:
#@markdown At most one subfolder; the parent folder must already exist:
Export_Mainordner_Pfad = 'ABDA2019'  #@param {type: "string"}
Export_Subordner_Pfad = 'spark_warehouse'  #@param {type: "string"}
Export_Ordnername = 'ppo_csv_export'  #@param {type: "string"}
Download_Export_Ordner = True #@param {type:"boolean"}

In [0]:
#
# Create a (new) folder
#

# Delete the old folder; shutil.rmtree also removes its contents,
# whereas os.rmdir raises an OSError for a non-empty folder
if os.path.exists(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername)):
  shutil.rmtree(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername))

# Folder checks
if not os.path.exists(Export_Mainordner_Pfad):
  os.mkdir(Export_Mainordner_Pfad)

if not os.path.exists(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad)):
  os.mkdir(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad))

if not os.path.exists(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername)):
  os.mkdir(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername))
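
The folder handling above could be condensed as in the following sketch. The helper `recreate_dir` is hypothetical and not part of this notebook, and the demo runs inside a temporary directory so that no real export folder is touched.

```python
import os
import shutil
import tempfile

def recreate_dir(*parts):
    """Delete the folder at os.path.join(*parts) if present, then create it empty."""
    path = os.path.join(*parts)
    shutil.rmtree(path, ignore_errors=True)  # also removes a non-empty folder
    os.makedirs(path)                        # creates missing parent folders too
    return path

# Usage inside a throwaway directory
base = tempfile.mkdtemp()
target = recreate_dir(base, 'spark_warehouse', 'ppo_csv_export')
open(os.path.join(target, 'old.csv'), 'w').close()  # simulate a stale export file
target = recreate_dir(base, 'spark_warehouse', 'ppo_csv_export')
print(os.listdir(target))  # []
```

`os.makedirs` creates every missing level in one call, which makes the separate existence checks per level unnecessary in this variant.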

Preparation
---

In [0]:
#
# Collect the output data
#

data_output = spark.sql(
  " select filename, Date(date) date, open, close, high, low, volume, PPO_RESULT" \
  " FROM data_final_eval_result " \
  " order by filename asc, date asc " 
).toPandas()

# Define types
data_output.filename = (data_output['filename']).astype(str)
data_output.date = pd.to_datetime(data_output.date)
# Strip thousands separators before converting to float
# (',' replaces the invalid escape sequence '\,')
data_output.close = (data_output['close'].replace(',', '', regex = True).astype(float))
data_output.open = (data_output['open'].replace(',', '', regex = True).astype(float))
data_output.high = pd.to_numeric(data_output.high)
data_output.low = pd.to_numeric(data_output.low)
data_output.volume = pd.to_numeric(data_output.volume)
data_output.PPO_RESULT = (data_output['PPO_RESULT']).astype(str)

data_output_days = spark.sql("SELECT DISTINCT Date(date) date FROM data_final_eval_result").toPandas()
data_output_days.date = pd.to_datetime(data_output_days.date)

data_distinct_days = data_output_days.to_numpy()

Export
---

In [26]:
#
# Export the data as CSV
#

print("Days to export:")
numpy.set_printoptions(threshold = Console_Preview_Limit)
print(data_distinct_days)
print("\nData set to export:")
display(HTML(data_output.to_html(index = False, max_rows = Console_Preview_Limit)))

for currentDays in data_distinct_days:
  for currentDay in currentDays:

      # Rows of the current day
      currentOutput = data_output[data_output.date == currentDay]
      # Point in time
      currentTimestamp = Timestamp(currentDay)

      # File name in ISO 8601 form; strftime zero-pads month and day to two digits
      currentFilename = currentTimestamp.strftime('%Y-%m-%d') + ".csv"
      
      # Write
      currentOutput.to_csv(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername, currentFilename),
                             sep = '\t', encoding = 'utf-8', index = False, mode = 'w')
        
print("\n\nExport finished, files created:")
os.listdir(os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername))


Days to export:
[['2019-06-04T00:00:00.000000000']
 ['2018-08-10T00:00:00.000000000']
 ['2019-05-08T00:00:00.000000000']
 ...
 ['2013-07-27T00:00:00.000000000']
 ['2013-07-22T00:00:00.000000000']
 ['2013-06-04T00:00:00.000000000']]

Data set to export:


filename,date,open,close,high,low,volume,PPO_RESULT
abseth,2018-08-09,0.000100,0.000098,0.000100,0.000097,3563.0,H0
abseth,2018-08-10,0.000084,0.000084,0.000084,0.000084,300.0,H0
...,...,...,...,...,...,...,...
zrxusd,2019-09-30,0.205310,0.205310,0.205310,0.205310,500.0,H3
zrxusd,2019-10-01,0.213130,0.213140,0.213140,0.213130,2000.0,H0




Export finished, files created:


['2014-01-01.csv',
 '2013-11-17.csv',
 '2014-02-12.csv',
 '2015-12-06.csv',
 '2014-09-11.csv',
 '2016-06-17.csv',
 '2019-02-19.csv',
 '2015-09-30.csv',
 '2016-10-24.csv',
 '2015-07-29.csv',
 '2018-12-19.csv',
 '2019-06-01.csv',
 '2013-11-19.csv',
 '2016-12-06.csv',
 '2014-08-01.csv',
 '2016-12-21.csv',
 '2015-08-22.csv',
 '2018-01-28.csv',
 '2014-08-03.csv',
 '2014-06-23.csv',
 '2014-09-23.csv',
 '2015-07-16.csv',
 '2018-03-09.csv',
 '2016-01-07.csv',
 '2017-02-17.csv',
 '2013-06-26.csv',
 '2017-09-22.csv',
 '2019-06-08.csv',
 '2014-07-03.csv',
 '2019-06-30.csv',
 '2017-01-05.csv',
 '2018-07-29.csv',
 '2015-09-11.csv',
 '2016-07-04.csv',
 '2016-03-06.csv',
 '2014-11-13.csv',
 '2018-12-27.csv',
 '2015-02-18.csv',
 '2013-07-26.csv',
 '2017-05-02.csv',
 '2014-08-28.csv',
 '2016-02-03.csv',
 '2014-09-26.csv',
 '2017-10-23.csv',
 '2019-08-28.csv',
 '2013-07-15.csv',
 '2014-02-05.csv',
 '2019-06-11.csv',
 '2018-04-01.csv',
 '2014-05-27.csv',
 '2014-04-22.csv',
 '2018-03-23.csv',
 '2014-07-06

In [0]:
if Download_Export_Ordner:
  shutil.make_archive('Exportdaten_PPO', 'zip', os.path.join(Export_Mainordner_Pfad, Export_Subordner_Pfad, Export_Ordnername))
  files.download('/content/Exportdaten_PPO.zip')
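
The per-day export in this section can also be sketched with a single `groupby`, which avoids filtering the full frame once per day. This is an alternative under stated assumptions, not the notebook's method: the helper `export_per_day` is hypothetical, and the demo rows are made up purely for illustration.

```python
import os
import tempfile

import pandas as pd

def export_per_day(frame, out_dir, date_col='date'):
    """Write one tab-separated file per calendar day, named YYYY-MM-DD.csv."""
    written = []
    # normalize() drops the time of day so rows of the same day share one key
    for day, rows in frame.groupby(frame[date_col].dt.normalize()):
        name = day.strftime('%Y-%m-%d') + '.csv'
        rows.to_csv(os.path.join(out_dir, name), sep='\t', encoding='utf-8', index=False)
        written.append(name)
    return written

# Made-up demo rows: two pairs, two days
demo = pd.DataFrame({
    'filename': ['aaausd', 'aaausd', 'bbbusd'],
    'date': pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-02']),
    'close': [1.0, 1.5, 20.0],
    'PPO_RESULT': ['H0', 'H0', 'H3'],
})

out_dir = tempfile.mkdtemp()
written = export_per_day(demo, out_dir)
print(written)
```

Each group is written exactly once, so the cost grows with the number of rows rather than with rows times days.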

---
# Comparable Charts
---

**BTCUSD**: [tradingview.com](https://de.tradingview.com/chart/ecCxGiMv/) *(login required)*

```
+---------------+------------+------------+-------------+-------------+
| TradingView.com:                                                    |
+---------------+------------+------------+-------------+-------------+
|               | 2019-10-01 | 2019-09-30 | 2019-08-10  | 2019-08-07  |
+---------------+------------+------------+-------------+-------------+
| EMA26         |    9305.74 |    9384.68 |    10848.14 |    10630.29 |
| EMA12         |    8734.78 |    8810.38 |    11193.74 |    10886.97 |
| PPO           |      -6.14 |      -6.12 |        3.19 |        2.41 |
| SIGNAL        |      -4.56 |      -4.17 |        1.63 |        0.13 |
| PPO_Histogram |      -1.57 |      -1.95 |        1.55 |        2.28 |
+---------------+------------+------------+-------------+-------------+

+---------------+------------+------------+-------------+-------------+
| CALCULATED:                                                         |
+---------------+------------+------------+-------------+-------------+
|               | 2019-10-01 | 2019-09-30 | 2019-08-10  | 2019-08-07  |
+---------------+------------+------------+-------------+-------------+
| EMA26         |  9327.3017 |  9446.6421 |  10651.5067 |  10673.4093 |
| EMA12         |  8751.5960 |  8842.4748 |  11163.8865 |  10785.1409 |
| PPO           |     -6.172 |    -6.3956 |      4.8104 |      1.0468 |
| SIGNAL        |    -4.4147 |    -4.0178 |      0.9102 |     -1.8162 |
| PPO_Histogram |    -1.7576 |    -2.3778 |      3.9002 |      2.8631 |
+---------------+------------+------------+-------------+-------------+
```
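
The two tables can be cross-checked with a short sketch. It assumes the standard PPO definition, PPO = (EMA12 − EMA26) / EMA26 × 100, and that the last row of each table is PPO minus SIGNAL. The function names are hypothetical, and all numbers are copied from the tables above.

```python
def ppo(ema_fast, ema_slow):
    """Percentage distance of the fast EMA from the slow EMA (assumed PPO definition)."""
    return (ema_fast - ema_slow) / ema_slow * 100

# (EMA12, EMA26, PPO, SIGNAL, last table row) per day, copied from the tables
tradingview = {
    '2019-10-01': (8734.78, 9305.74, -6.14, -4.56, -1.57),
    '2019-09-30': (8810.38, 9384.68, -6.12, -4.17, -1.95),
    '2019-08-10': (11193.74, 10848.14, 3.19, 1.63, 1.55),
    '2019-08-07': (10886.97, 10630.29, 2.41, 0.13, 2.28),
}
calculated = {
    '2019-10-01': (8751.5960, 9327.3017, -6.172, -4.4147, -1.7576),
    '2019-09-30': (8842.4748, 9446.6421, -6.3956, -4.0178, -2.3778),
    '2019-08-10': (11163.8865, 10651.5067, 4.8104, 0.9102, 3.9002),
    '2019-08-07': (10785.1409, 10673.4093, 1.0468, -1.8162, 2.8631),
}

def max_deviation(table):
    """Largest gap between the tabulated values and the ones recomputed here."""
    ppo_gap = max(abs(ppo(fast, slow) - p) for fast, slow, p, _, _ in table.values())
    hist_gap = max(abs((p - s) - h) for _, _, p, s, h in table.values())
    return ppo_gap, hist_gap

for name, table in (('TradingView', tradingview), ('Calculated', calculated)):
    ppo_gap, hist_gap = max_deviation(table)
    print(f'{name}: PPO gap {ppo_gap:.4f}, PPO - SIGNAL gap {hist_gap:.4f}')
```

Under these assumptions both tables agree with the formula up to rounding, which suggests that the differences between them already originate in the EMA values rather than in the PPO step itself.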



**intc.csv (utf-8)** *→ a rather unsuitable example*

See [School.StockCharts.com](https://school.stockcharts.com/doku.php?id=technical_indicators:price_oscillators_ppo)

```
time,open,close,high,low,volume
1271203140000,21.16,20.16,20.16,20.16,0.0
1271289540000,20.49,20.49,20.49,20.49,0.0
1271375940000,20.74,20.74,20.74,20.74,0.0
1271462340000,20.77,20.77,20.77,20.77,0.0
1271721540000,20.53,20.53,20.53,20.53,0.0
1271807940000,19.61,19.61,19.61,19.61,0.0
1271894340000,20.02,20.02,20.02,20.02,0.0
1271980740000,19.70,19.70,19.70,19.70,0.0
1272067140000,19.94,19.94,19.94,19.94,0.0
1272326340000,19.62,19.62,19.62,19.62,0.0
1272412740000,19.11,19.11,19.11,19.11,0.0
1272499140000,19.32,19.32,19.32,19.32,0.0
1272585540000,19.61,19.61,19.61,19.61,0.0
1272671940000,19.54,19.54,19.54,19.54,0.0
1272931140000,18.89,18.89,18.89,18.89,0.0
1273017540000,19.33,19.33,19.33,19.33,0.0
1273103940000,19.21,19.21,19.21,19.21,0.0
1273190340000,19.51,19.51,19.51,19.51,0.0
1273276740000,19.55,19.55,19.55,19.55,0.0
1273535940000,19.92,19.92,19.92,19.92,0.0
1273622340000,20.29,20.29,20.29,20.29,0.0
1273708740000,20.58,20.58,20.58,20.58,0.0
1273795140000,20.52,20.52,20.52,20.52,0.0
1273881540000,20.69,20.69,20.69,20.69,0.0
1274140740000,20.67,20.67,20.67,20.67,0.0
1274227140000,20.72,20.72,20.72,20.72,0.0
1274313540000,20.25,20.25,20.25,20.25,0.0
1274399940000,20.56,20.56,20.56,20.56,0.0
1274486340000,20.49,20.49,20.49,20.49,0.0
1274745540000,20.39,20.39,20.39,20.39,0.0

```
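
To read the `time` column of this sample, the values can be converted to calendar dates. The sketch below assumes they are Unix timestamps in milliseconds, interpreted as UTC; the helper name `ms_to_utc` is hypothetical.

```python
from datetime import datetime, timezone

def ms_to_utc(ms):
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

first = ms_to_utc(1271203140000)  # first row of the sample
last = ms_to_utc(1274745540000)   # last row of the sample
print(first.isoformat())  # 2010-04-13T23:59:00+00:00
print(last.isoformat())   # 2010-05-24T23:59:00+00:00
```

Under this assumption every row is stamped at 23:59 UTC, and the 30 rows run from 2010-04-13 to 2010-05-24.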

