# Due date: 23/07/2022 (corresponds to class 15)

# Dataset -> BTC_Forex

# Data dictionary

**BTC_final**

- **Date**: date in YYYY-MM-DD format (datetime64[ns])
- **Price**: BTC closing price on the given date (USD) (float) (continuous numeric variable)
- **Open**: BTC opening price on the given date (USD) (float) (continuous numeric variable)
- **High**: highest BTC price on the given date (USD) (float) (continuous numeric variable)
- **Low**: lowest BTC price on the given date (USD) (float) (continuous numeric variable)
- **Vol.**: volume of BTC traded (bought + sold) on the given date (BTC) (float) (continuous numeric variable)
- **Percentage_diff**: percentage change of the BTC price on date [x+1] relative to date [x] (float) (continuous numeric variable)
- **Target**: 1 indicates that the price rose on the given date, 0 that it fell or stayed unchanged (float) (categorical variable)
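The notebook does not show how Percentage_diff and Target were computed. A minimal sketch, assuming Percentage_diff is the day-over-day percent change of Price rounded to two decimals and Target is 1.0 only when that change is positive (both assumptions agree with the final table's tail rows for 2022-07-17 to 2022-07-20):

```python
import pandas as pd

# Closing prices for 2022-07-16 .. 2022-07-20, copied from the final table's tail
btc = pd.DataFrame({
    "Date": ["2022-07-16", "2022-07-17", "2022-07-18", "2022-07-19", "2022-07-20"],
    "Price": [21209.9, 20785.6, 22525.8, 23410.2, 23149.1],
})

# Assumed convention: change versus the previous day, in percent, rounded to 2 decimals
btc["Percentage_diff"] = (btc["Price"].pct_change() * 100).round(2)

# Assumed convention: 1.0 when the price rose, 0.0 otherwise (the first row has no previous day)
btc["Target"] = (btc["Percentage_diff"] > 0).astype(float)

print(btc)
```

The first row gets NaN for Percentage_diff because pct_change has no previous value to compare against.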

# Libraries

In [1]:
import numpy as np
import pandas as pd
import datetime
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

# Data acquisition

In [2]:
def gdriveColabPath(sharing_url):
  # Turn a Drive sharing link (.../file/d/<FILE_ID>/view?usp=sharing) into a direct-download URL
  file_id=sharing_url.split('/')[-2]
  dwn_url='https://drive.google.com/uc?id=' + file_id
  return dwn_url
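gdriveColabPath takes the second-to-last path segment, so it only works when the link ends in `/view?usp=sharing`. A hypothetical, slightly more tolerant variant that looks for the file id right after `/d/` (the regex approach and the `gdrive_download_url` name are assumptions, not part of the notebook):

```python
import re

def gdrive_download_url(sharing_url: str) -> str:
    """Build a direct-download URL from a Google Drive sharing link (hypothetical helper)."""
    # The file id follows "/d/" and runs until the next "/" or "?"
    match = re.search(r"/d/([^/?]+)", sharing_url)
    if match is None:
        raise ValueError(f"No Drive file id found in: {sharing_url}")
    return "https://drive.google.com/uc?id=" + match.group(1)

url = "https://drive.google.com/file/d/1M38n5So0-6r_Q0vhs1i5nIW-cpjgDlww/view?usp=sharing"
print(gdrive_download_url(url))
```

Unlike `split('/')[-2]`, this also handles a link without the trailing `/view` segment.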

## Acquiring the BTC dataset

In [3]:
sharing_url = "https://drive.google.com/file/d/1M38n5So0-6r_Q0vhs1i5nIW-cpjgDlww/view?usp=sharing"

In [4]:
dwn_url=gdriveColabPath(sharing_url)
BTC_df =pd.read_csv(dwn_url, sep=";", decimal=".")

In [5]:
BTC_df.head()

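|   | Date | Price | Open | High | Low | Vol. | Percentage_diff | Target |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-07-18 | 0.1 | 0.0 | 0.1 | 0.1 | 80.0 | 0.0 | 0.0 |
| 1 | 2010-07-19 | 0.1 | 0.1 | 0.1 | 0.1 | 570.0 | 0.0 | 0.0 |
| 2 | 2010-07-20 | 0.1 | 0.1 | 0.1 | 0.1 | 260.0 | 0.0 | 0.0 |
| 3 | 2010-07-21 | 0.1 | 0.1 | 0.1 | 0.1 | 580.0 | 0.0 | 0.0 |
| 4 | 2010-07-22 | 0.1 | 0.1 | 0.1 | 0.1 | 2160.0 | 0.0 | 0.0 |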


## Acquiring the EUR/USD, USD/JPY and USD/CNY datasets (raw)

### EUR/USD

In [6]:
sharing_url = "https://drive.google.com/file/d/16iaYVmlv5vtNDN4WZKaQmeO3MsI9QG12/view?usp=sharing"

In [7]:
dwn_url=gdriveColabPath(sharing_url)
EUR_USD_df =pd.read_csv(dwn_url, sep=",", thousands=",", decimal=".")

In [8]:
EUR_USD_df.head()

|   | Date | Price | Open | High | Low | Change % |
|---|---|---|---|---|---|---|
| 0 | Jul 20, 2022 | 1.0242 | 1.0231 | 1.0248 | 1.0231 | 0.18% |
| 1 | Jul 19, 2022 | 1.0224 | 1.0143 | 1.027 | 1.012 | 0.82% |
| 2 | Jul 18, 2022 | 1.0141 | 1.0085 | 1.0202 | 1.0078 | 0.54% |
| 3 | Jul 15, 2022 | 1.0087 | 1.0018 | 1.0099 | 1.0007 | 0.71% |
| 4 | Jul 14, 2022 | 1.0016 | 1.0061 | 1.0065 | 0.9952 | -0.44% |


### USD/JPY

In [9]:
sharing_url = "https://drive.google.com/file/d/1xGO8ejLusQGmQ8_cNLhhX5Mvh0yk2YRO/view?usp=sharing"

In [10]:
dwn_url=gdriveColabPath(sharing_url)
USD_JPY_df =pd.read_csv(dwn_url, sep=",", thousands=",", decimal=".")

In [11]:
USD_JPY_df.head()

|   | Date | Price | Open | High | Low | Change % |
|---|---|---|---|---|---|---|
| 0 | Jul 20, 2022 | 138.07 | 138.18 | 138.38 | 137.9 | -0.08% |
| 1 | Jul 19, 2022 | 138.18 | 138.13 | 138.4 | 137.38 | 0.04% |
| 2 | Jul 18, 2022 | 138.12 | 138.54 | 138.59 | 137.89 | -0.30% |
| 3 | Jul 15, 2022 | 138.53 | 138.96 | 139.13 | 138.39 | -0.27% |
| 4 | Jul 14, 2022 | 138.9 | 137.43 | 139.4 | 137.33 | 1.08% |


### USD/CNY

In [12]:
sharing_url = "https://drive.google.com/file/d/1tyfAwJctyEE1Amvy9C7B3n_AaJmqw4av/view?usp=sharing"

In [13]:
dwn_url=gdriveColabPath(sharing_url)
USD_CNY_df =pd.read_csv(dwn_url, sep=",", thousands=",", decimal=".")

In [14]:
USD_CNY_df.head()

|   | Date | Price | Open | High | Low | Change % |
|---|---|---|---|---|---|---|
| 0 | Jul 20, 2022 | 6.7523 | 6.7447 | 6.7556 | 6.7401 | 0.14% |
| 1 | Jul 19, 2022 | 6.743 | 6.7473 | 6.7542 | 6.739 | 0.01% |
| 2 | Jul 18, 2022 | 6.742 | 6.7481 | 6.7551 | 6.732 | -0.21% |
| 3 | Jul 15, 2022 | 6.7565 | 6.7517 | 6.7698 | 6.7396 | 0.03% |
| 4 | Jul 14, 2022 | 6.7542 | 6.7269 | 6.7674 | 6.7228 | 0.55% |


# Data Wrangling

## Discovery stage

The basic structure of each acquired dataset is examined: size, column names, null values, data types and duplicated dates.
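The same five checks are repeated for each currency pair. A hypothetical helper (`describe_structure` is not part of the notebook) that gathers them in one call, shown on a tiny made-up frame with one null price and one repeated date:

```python
import pandas as pd

def describe_structure(df: pd.DataFrame, key: str = "Date") -> dict:
    """Collect the discovery-stage checks in one place (hypothetical helper)."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "nulls": df.isnull().sum().to_dict(),           # null count per column
        "dtypes": df.dtypes.astype(str).to_dict(),      # dtype name per column
        "duplicated_keys": int(df[key].duplicated().sum()),
    }

# Made-up sample in the same layout as the raw forex files
sample = pd.DataFrame({
    "Date": ["Jul 20, 2022", "Jul 19, 2022", "Jul 19, 2022"],
    "Price": [1.0242, 1.0224, None],
    "Change %": ["0.18%", "0.82%", "0.82%"],
})
print(describe_structure(sample))
```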

### EUR/USD

In [15]:
# Number of records (rows) and columns/variables
print("Number of records: ", EUR_USD_df.shape[0])
print("Number of variables: ", EUR_USD_df.shape[1])

Number of records:  3274
Number of variables:  6


In [16]:
# Column names
EUR_USD_df.columns

Index(['Date', 'Price', 'Open', 'High', 'Low', 'Change %'], dtype='object')

In [17]:
# Non-null record count per column
EUR_USD_df.count()

Date        3274
Price       3274
Open        3274
High        3274
Low         3274
Change %    3274
dtype: int64

In [18]:
# Null values per column
EUR_USD_df.isnull().sum() 

Date        0
Price       0
Open        0
High        0
Low         0
Change %    0
dtype: int64

In [19]:
# Data types
EUR_USD_df.dtypes

Date         object
Price       float64
Open        float64
High        float64
Low         float64
Change %     object
dtype: object

In [20]:
# Check for duplicated dates
print("Number of duplicates in [Date]: ", EUR_USD_df["Date"].duplicated().sum())

Number of duplicates in [Date]:  0


In [21]:
# Strip the "%" sign from every value of [Change %] (literal replacement, no regex)
EUR_USD_df["Change %"] = EUR_USD_df["Change %"].str.replace("%", "", regex=False)

In [22]:
EUR_USD_df.head()

|   | Date | Price | Open | High | Low | Change % |
|---|---|---|---|---|---|---|
| 0 | Jul 20, 2022 | 1.0242 | 1.0231 | 1.0248 | 1.0231 | 0.18 |
| 1 | Jul 19, 2022 | 1.0224 | 1.0143 | 1.027 | 1.012 | 0.82 |
| 2 | Jul 18, 2022 | 1.0141 | 1.0085 | 1.0202 | 1.0078 | 0.54 |
| 3 | Jul 15, 2022 | 1.0087 | 1.0018 | 1.0099 | 1.0007 | 0.71 |
| 4 | Jul 14, 2022 | 1.0016 | 1.0061 | 1.0065 | 0.9952 | -0.44 |


In [23]:
# Rename column [Change %] to [Percentage_diff]
EUR_USD_df = EUR_USD_df.rename (columns={"Change %":"Percentage_diff"})

In [24]:
EUR_USD_df.columns

Index(['Date', 'Price', 'Open', 'High', 'Low', 'Percentage_diff'], dtype='object')

In [25]:
# Convert [Percentage_diff] from object to float
EUR_USD_df ["Percentage_diff"]= EUR_USD_df ["Percentage_diff"].astype(float)

In [26]:
EUR_USD_df.dtypes

Date                object
Price              float64
Open               float64
High               float64
Low                float64
Percentage_diff    float64
dtype: object

In [27]:
# Convert [Date] from object to datetime64[ns]; the explicit format matches strings such as "Jul 20, 2022"
EUR_USD_df['Date'] = pd.to_datetime(EUR_USD_df['Date'], format="%b %d, %Y")

In [28]:
EUR_USD_df.dtypes

Date               datetime64[ns]
Price                     float64
Open                      float64
High                      float64
Low                       float64
Percentage_diff           float64
dtype: object

In [29]:
# Sort by ascending date and reset the index (drop=True discards the old index instead of keeping it as a column)
EUR_USD_df = EUR_USD_df.sort_values("Date").reset_index(drop=True)

In [30]:
EUR_USD_df.head()

|   | Date | Price | Open | High | Low | Percentage_diff |
|---|---|---|---|---|---|---|
| 0 | 2010-01-01 | 1.4326 | 1.4328 | 1.433 | 1.4322 | 0.06 |
| 1 | 2010-01-04 | 1.4413 | 1.4325 | 1.4457 | 1.4257 | 0.61 |
| 2 | 2010-01-05 | 1.4369 | 1.4409 | 1.4484 | 1.4346 | -0.31 |
| 3 | 2010-01-06 | 1.4413 | 1.4364 | 1.4436 | 1.4282 | 0.31 |
| 4 | 2010-01-07 | 1.4318 | 1.4407 | 1.4447 | 1.4295 | -0.66 |


In [31]:
# Since this table will be merged with the other currency pairs, keep only [Date], [Price] and [Percentage_diff]
EUR_USD_df = EUR_USD_df [["Date", "Price", "Percentage_diff"]]

In [32]:
# Add the pair suffix to the value columns so they remain distinguishable after the merge
Columnas = ["Price", "Percentage_diff"]

for col in Columnas:
    EUR_USD_df = EUR_USD_df.rename(columns={col: col + "_EUR_USD"})

In [33]:
EUR_USD_df.columns

Index(['Date', 'Price_EUR_USD', 'Percentage_diff_EUR_USD'], dtype='object')
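The EUR/USD steps are repeated verbatim for USD/JPY and USD/CNY. A sketch of a hypothetical `clean_forex` function that bundles them, assuming every raw file keeps the Date/Price/Open/High/Low/Change % layout and the "Jul 20, 2022" date format shown in the heads:

```python
import pandas as pd

def clean_forex(raw: pd.DataFrame, pair: str) -> pd.DataFrame:
    """Apply the EUR/USD cleaning steps to any currency pair (hypothetical helper)."""
    df = raw.copy()
    # "0.18%" -> 0.18
    df["Percentage_diff"] = df["Change %"].str.replace("%", "", regex=False).astype(float)
    # "Jul 20, 2022" -> datetime64[ns] (assumed date format)
    df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
    df = df.sort_values("Date").reset_index(drop=True)
    df = df[["Date", "Price", "Percentage_diff"]]
    # Suffix every column except the merge key
    return df.rename(columns={c: f"{c}_{pair}" for c in ["Price", "Percentage_diff"]})

# First three rows of the raw EUR/USD head
raw = pd.DataFrame({
    "Date": ["Jul 20, 2022", "Jul 19, 2022", "Jul 18, 2022"],
    "Price": [1.0242, 1.0224, 1.0141],
    "Open": [1.0231, 1.0143, 1.0085],
    "High": [1.0248, 1.027, 1.0202],
    "Low": [1.0231, 1.012, 1.0078],
    "Change %": ["0.18%", "0.82%", "0.54%"],
})
eur_usd = clean_forex(raw, "EUR_USD")
print(eur_usd)
```

With such a helper, each pair would need a single call, for example `clean_forex(USD_JPY_df, "USD_JPY")`.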

### USD/JPY

In [34]:
# Number of records (rows) and columns/variables
print("Number of records: ", USD_JPY_df.shape[0])
print("Number of variables: ", USD_JPY_df.shape[1])

Number of records:  3274
Number of variables:  6


In [35]:
# Column names
USD_JPY_df.columns

Index(['Date', 'Price', 'Open', 'High', 'Low', 'Change %'], dtype='object')

In [36]:
# Non-null record count per column
USD_JPY_df.count()

Date        3274
Price       3274
Open        3274
High        3274
Low         3274
Change %    3274
dtype: int64

In [37]:
# Null values per column
USD_JPY_df.isnull().sum() 

Date        0
Price       0
Open        0
High        0
Low         0
Change %    0
dtype: int64

In [38]:
# Data types
USD_JPY_df.dtypes

Date         object
Price       float64
Open        float64
High        float64
Low         float64
Change %     object
dtype: object

In [39]:
# Check for duplicated dates
print("Number of duplicates in [Date]: ", USD_JPY_df["Date"].duplicated().sum())

Number of duplicates in [Date]:  0


In [40]:
# Strip the "%" sign from every value of [Change %] (literal replacement, no regex)
USD_JPY_df["Change %"] = USD_JPY_df["Change %"].str.replace("%", "", regex=False)

In [41]:
USD_JPY_df.head()

|   | Date | Price | Open | High | Low | Change % |
|---|---|---|---|---|---|---|
| 0 | Jul 20, 2022 | 138.07 | 138.18 | 138.38 | 137.9 | -0.08 |
| 1 | Jul 19, 2022 | 138.18 | 138.13 | 138.4 | 137.38 | 0.04 |
| 2 | Jul 18, 2022 | 138.12 | 138.54 | 138.59 | 137.89 | -0.3 |
| 3 | Jul 15, 2022 | 138.53 | 138.96 | 139.13 | 138.39 | -0.27 |
| 4 | Jul 14, 2022 | 138.9 | 137.43 | 139.4 | 137.33 | 1.08 |


In [42]:
# Rename column [Change %] to [Percentage_diff]
USD_JPY_df = USD_JPY_df.rename (columns={"Change %":"Percentage_diff"})

In [43]:
USD_JPY_df.columns

Index(['Date', 'Price', 'Open', 'High', 'Low', 'Percentage_diff'], dtype='object')

In [44]:
# Convert [Percentage_diff] from object to float
USD_JPY_df ["Percentage_diff"]= USD_JPY_df ["Percentage_diff"].astype(float)

In [45]:
USD_JPY_df.dtypes

Date                object
Price              float64
Open               float64
High               float64
Low                float64
Percentage_diff    float64
dtype: object

In [46]:
# Convert [Date] from object to datetime64[ns]; the explicit format matches strings such as "Jul 20, 2022"
USD_JPY_df['Date'] = pd.to_datetime(USD_JPY_df['Date'], format="%b %d, %Y")

In [47]:
USD_JPY_df.dtypes

Date               datetime64[ns]
Price                     float64
Open                      float64
High                      float64
Low                       float64
Percentage_diff           float64
dtype: object

In [48]:
# Sort by ascending date and reset the index (drop=True discards the old index instead of keeping it as a column)
USD_JPY_df = USD_JPY_df.sort_values("Date").reset_index(drop=True)

In [49]:
USD_JPY_df.head()

|   | Date | Price | Open | High | Low | Percentage_diff |
|---|---|---|---|---|---|---|
| 0 | 2010-01-01 | 93.04 | 93.04 | 93.04 | 93.04 | 0.01 |
| 1 | 2010-01-04 | 92.47 | 92.75 | 93.21 | 92.19 | -0.6 |
| 2 | 2010-01-05 | 91.58 | 92.47 | 92.51 | 91.25 | -0.97 |
| 3 | 2010-01-06 | 92.38 | 91.56 | 92.74 | 91.53 | 0.87 |
| 4 | 2010-01-07 | 93.45 | 92.36 | 93.77 | 92.11 | 1.16 |


In [50]:
# Since this table will be merged with the other currency pairs, keep only [Date], [Price] and [Percentage_diff]
USD_JPY_df = USD_JPY_df [["Date", "Price", "Percentage_diff"]]

In [51]:
# Add the pair suffix to the value columns so they remain distinguishable after the merge
Columnas = ["Price", "Percentage_diff"]

for col in Columnas:
    USD_JPY_df = USD_JPY_df.rename(columns={col: col + "_USD_JPY"})

In [52]:
USD_JPY_df.columns

Index(['Date', 'Price_USD_JPY', 'Percentage_diff_USD_JPY'], dtype='object')

### USD/CNY

In [53]:
# Number of records (rows) and columns/variables
print("Number of records: ", USD_CNY_df.shape[0])
print("Number of variables: ", USD_CNY_df.shape[1])

Number of records:  3257
Number of variables:  6


In [54]:
# Column names
USD_CNY_df.columns

Index(['Date', 'Price', 'Open', 'High', 'Low', 'Change %'], dtype='object')

In [55]:
# Non-null record count per column
USD_CNY_df.count()

Date        3257
Price       3257
Open        3257
High        3257
Low         3257
Change %    3257
dtype: int64

In [56]:
# Null values per column
USD_CNY_df.isnull().sum() 

Date        0
Price       0
Open        0
High        0
Low         0
Change %    0
dtype: int64

In [57]:
# Data types
USD_CNY_df.dtypes

Date         object
Price       float64
Open        float64
High        float64
Low         float64
Change %     object
dtype: object

In [58]:
# Check for duplicated dates
print("Number of duplicates in [Date]: ", USD_CNY_df["Date"].duplicated().sum())

Number of duplicates in [Date]:  0


In [59]:
# Strip the "%" sign from every value of [Change %] (literal replacement, no regex)
USD_CNY_df["Change %"] = USD_CNY_df["Change %"].str.replace("%", "", regex=False)

In [60]:
USD_CNY_df.head()

|   | Date | Price | Open | High | Low | Change % |
|---|---|---|---|---|---|---|
| 0 | Jul 20, 2022 | 6.7523 | 6.7447 | 6.7556 | 6.7401 | 0.14 |
| 1 | Jul 19, 2022 | 6.743 | 6.7473 | 6.7542 | 6.739 | 0.01 |
| 2 | Jul 18, 2022 | 6.742 | 6.7481 | 6.7551 | 6.732 | -0.21 |
| 3 | Jul 15, 2022 | 6.7565 | 6.7517 | 6.7698 | 6.7396 | 0.03 |
| 4 | Jul 14, 2022 | 6.7542 | 6.7269 | 6.7674 | 6.7228 | 0.55 |


In [61]:
# Rename column [Change %] to [Percentage_diff]
USD_CNY_df = USD_CNY_df.rename (columns={"Change %":"Percentage_diff"})

In [62]:
USD_CNY_df.columns

Index(['Date', 'Price', 'Open', 'High', 'Low', 'Percentage_diff'], dtype='object')

In [63]:
# Convert [Percentage_diff] from object to float
USD_CNY_df ["Percentage_diff"]= USD_CNY_df ["Percentage_diff"].astype(float)

In [64]:
USD_CNY_df.dtypes

Date                object
Price              float64
Open               float64
High               float64
Low                float64
Percentage_diff    float64
dtype: object

In [65]:
# Convert [Date] from object to datetime64[ns]; the explicit format matches strings such as "Jul 20, 2022"
USD_CNY_df['Date'] = pd.to_datetime(USD_CNY_df['Date'], format="%b %d, %Y")

In [66]:
USD_CNY_df.dtypes

Date               datetime64[ns]
Price                     float64
Open                      float64
High                      float64
Low                       float64
Percentage_diff           float64
dtype: object

In [67]:
# Sort by ascending date and reset the index (drop=True discards the old index instead of keeping it as a column)
USD_CNY_df = USD_CNY_df.sort_values("Date").reset_index(drop=True)

In [68]:
USD_CNY_df.head()

|   | Date | Price | Open | High | Low | Percentage_diff |
|---|---|---|---|---|---|---|
| 0 | 2010-01-01 | 6.827 | 6.827 | 6.827 | 6.827 | 0.0 |
| 1 | 2010-01-04 | 6.8285 | 6.8297 | 6.8297 | 6.827 | 0.02 |
| 2 | 2010-01-05 | 6.8268 | 6.8283 | 6.8288 | 6.8257 | -0.02 |
| 3 | 2010-01-06 | 6.8278 | 6.8277 | 6.8291 | 6.8267 | 0.01 |
| 4 | 2010-01-07 | 6.8281 | 6.8269 | 6.8282 | 6.8258 | 0.0 |


In [69]:
# Since this table will be merged with the other currency pairs, keep only [Date], [Price] and [Percentage_diff]
USD_CNY_df = USD_CNY_df [["Date", "Price", "Percentage_diff"]]

In [70]:
# Add the pair suffix to the value columns so they remain distinguishable after the merge
Columnas = ["Price", "Percentage_diff"]

for col in Columnas:
    USD_CNY_df = USD_CNY_df.rename(columns={col: col + "_USD_CNY"})

In [71]:
USD_CNY_df.columns

Index(['Date', 'Price_USD_CNY', 'Percentage_diff_USD_CNY'], dtype='object')

## Data structuring stage

The FOREX pair tables (EUR_USD_df, USD_JPY_df, USD_CNY_df) are merged on [Date].
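A minimal sketch of the same chain of inner merges written with `functools.reduce`, on hypothetical toy frames in which USD/CNY lacks one date. It also shows that `pd.merge` accepts datetime64 keys; what pandas refuses is merging a datetime64 key column against an object (string) one, which is why all key columns must share one type before the later merge with BTC_df:

```python
from functools import reduce
import pandas as pd

# Toy frames with datetime64 keys; USD/CNY is missing 2010-01-05 to show the inner-join effect
eur = pd.DataFrame({"Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
                    "Price_EUR_USD": [1.4413, 1.4369, 1.4413]})
jpy = pd.DataFrame({"Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
                    "Price_USD_JPY": [92.47, 91.58, 92.38]})
cny = pd.DataFrame({"Date": pd.to_datetime(["2010-01-04", "2010-01-06"]),
                    "Price_USD_CNY": [6.8285, 6.8278]})

# Fold the list of frames into one with successive inner merges on "Date"
forex = reduce(lambda left, right: pd.merge(left, right, on="Date", how="inner"),
               [eur, jpy, cny])
print(forex)
```

An inner merge keeps only the dates present in every table, so a day missing from any pair is dropped from the result.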

In [72]:
# Convert the [Date] columns back to strings so they match the string dates of date_generated_df and BTC_df
# in the later merges (pandas cannot merge a datetime64 key with an object/string key)
EUR_USD_df["Date"] = EUR_USD_df["Date"].astype(str)
USD_JPY_df["Date"] = USD_JPY_df["Date"].astype(str)
USD_CNY_df["Date"] = USD_CNY_df["Date"].astype(str)

In [73]:
EUR_USD_df.head()

|   | Date | Price_EUR_USD | Percentage_diff_EUR_USD |
|---|---|---|---|
| 0 | 2010-01-01 | 1.4326 | 0.06 |
| 1 | 2010-01-04 | 1.4413 | 0.61 |
| 2 | 2010-01-05 | 1.4369 | -0.31 |
| 3 | 2010-01-06 | 1.4413 | 0.31 |
| 4 | 2010-01-07 | 1.4318 | -0.66 |


In [74]:
# Merge EUR_USD_df and USD_JPY_df => parameters => on="Date", how="inner"
Forex_df = pd.merge (EUR_USD_df, USD_JPY_df, on="Date", how="inner")

In [75]:
Forex_df.head()

|   | Date | Price_EUR_USD | Percentage_diff_EUR_USD | Price_USD_JPY | Percentage_diff_USD_JPY |
|---|---|---|---|---|---|
| 0 | 2010-01-01 | 1.4326 | 0.06 | 93.04 | 0.01 |
| 1 | 2010-01-04 | 1.4413 | 0.61 | 92.47 | -0.6 |
| 2 | 2010-01-05 | 1.4369 | -0.31 | 91.58 | -0.97 |
| 3 | 2010-01-06 | 1.4413 | 0.31 | 92.38 | 0.87 |
| 4 | 2010-01-07 | 1.4318 | -0.66 | 93.45 | 1.16 |


In [76]:
# Merge Forex_df and USD_CNY_df => parameters => on="Date", how="inner"
Forex_df = pd.merge (Forex_df, USD_CNY_df, on="Date", how="inner")

In [77]:
Forex_df.head()

|   | Date | Price_EUR_USD | Percentage_diff_EUR_USD | Price_USD_JPY | Percentage_diff_USD_JPY | Price_USD_CNY | Percentage_diff_USD_CNY |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-01 | 1.4326 | 0.06 | 93.04 | 0.01 | 6.827 | 0.0 |
| 1 | 2010-01-04 | 1.4413 | 0.61 | 92.47 | -0.6 | 6.8285 | 0.02 |
| 2 | 2010-01-05 | 1.4369 | -0.31 | 91.58 | -0.97 | 6.8268 | -0.02 |
| 3 | 2010-01-06 | 1.4413 | 0.31 | 92.38 | 0.87 | 6.8278 | 0.01 |
| 4 | 2010-01-07 | 1.4318 | -0.66 | 93.45 | 1.16 | 6.8281 | 0.0 |


Since Forex_df has no values for weekends and holidays, a dataset containing every calendar day between 2010-01-01 and 2022-07-20 is created. Merging it with Forex_df turns those days into missing values that can then be handled explicitly. This is necessary because the BTC table has values for every day, including Saturdays, Sundays and holidays.

Criteria for handling the missing values produced by the merge:
- For weekend days, Friday's closing value is repeated.
- For holidays, the previous day's closing value is repeated.
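Both criteria amount to carrying the last available quote forward. A minimal sketch of that idea using `reindex` onto a daily calendar instead of a left merge (a swapped-in technique, not the notebook's method), with three EUR/USD quotes around the first weekend of 2010 (2010-01-01 is a Friday):

```python
import pandas as pd

# Business-day quotes: Friday 2010-01-01, then Monday and Tuesday
forex = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-01-04", "2010-01-05"]),
    "Price_EUR_USD": [1.4326, 1.4413, 1.4369],
})

# Every calendar day in the range, as date_generated does for the full period
all_days = pd.date_range("2010-01-01", "2010-01-05", freq="D")

# Reindex onto the full calendar (weekend rows become NaN), then carry the last quote forward
filled = (forex.set_index("Date")
               .reindex(all_days)
               .ffill()
               .rename_axis("Date")
               .reset_index())
print(filled)
```

Saturday and Sunday take Friday's value, and a holiday in the middle of the week would likewise take the previous business day's value.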

In [78]:
# Build a dataset with every calendar day between 2010-01-01 and 2022-07-20
start = datetime.datetime.strptime("2010-01-01", "%Y-%m-%d")
end = datetime.datetime.strptime("2022-07-20", "%Y-%m-%d")
date_generated = pd.date_range(start, end)

In [79]:
# Turn the dates into a DataFrame
date_generated_df = pd.DataFrame (date_generated, columns=["Date"])

In [80]:
date_generated_df.head()

|   | Date |
|---|---|
| 0 | 2010-01-01 |
| 1 | 2010-01-02 |
| 2 | 2010-01-03 |
| 3 | 2010-01-04 |
| 4 | 2010-01-05 |


In [81]:
# Convert the dates to strings so they match the string [Date] keys of Forex_df and BTC_df in the upcoming merges
date_generated_df ["Date"] = date_generated_df ["Date"].astype(str)

In [82]:
date_generated_df.dtypes

Date    object
dtype: object

In [83]:
# Merge date_generated_df and Forex_df => parameters => on="Date", how="left"
Forex_df_2 = pd.merge (date_generated_df, Forex_df, on="Date", how="left")

In [84]:
Forex_df_2.head()

|   | Date | Price_EUR_USD | Percentage_diff_EUR_USD | Price_USD_JPY | Percentage_diff_USD_JPY | Price_USD_CNY | Percentage_diff_USD_CNY |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-01 | 1.4326 | 0.06 | 93.04 | 0.01 | 6.827 | 0.0 |
| 1 | 2010-01-02 |  |  |  |  |  |  |
| 2 | 2010-01-03 |  |  |  |  |  |  |
| 3 | 2010-01-04 | 1.4413 | 0.61 | 92.47 | -0.6 | 6.8285 | 0.02 |
| 4 | 2010-01-05 | 1.4369 | -0.31 | 91.58 | -0.97 | 6.8268 | -0.02 |


In [85]:
Forex_df_2.columns

Index(['Date', 'Price_EUR_USD', 'Percentage_diff_EUR_USD', 'Price_USD_JPY',
       'Percentage_diff_USD_JPY', 'Price_USD_CNY', 'Percentage_diff_USD_CNY'],
      dtype='object')

In [86]:
# Fill the NaN values following the criteria above (carry forward the last business day's values)
Columnas = ['Price_EUR_USD', 'Percentage_diff_EUR_USD', 'Price_USD_JPY',
       'Percentage_diff_USD_JPY', 'Price_USD_CNY', 'Percentage_diff_USD_CNY']

# .ffill() replaces fillna(method="ffill"), which is deprecated in recent pandas versions
Forex_df_2[Columnas] = Forex_df_2[Columnas].ffill()

Forex_df_2.head()

|   | Date | Price_EUR_USD | Percentage_diff_EUR_USD | Price_USD_JPY | Percentage_diff_USD_JPY | Price_USD_CNY | Percentage_diff_USD_CNY |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-01 | 1.4326 | 0.06 | 93.04 | 0.01 | 6.827 | 0.0 |
| 1 | 2010-01-02 | 1.4326 | 0.06 | 93.04 | 0.01 | 6.827 | 0.0 |
| 2 | 2010-01-03 | 1.4326 | 0.06 | 93.04 | 0.01 | 6.827 | 0.0 |
| 3 | 2010-01-04 | 1.4413 | 0.61 | 92.47 | -0.6 | 6.8285 | 0.02 |
| 4 | 2010-01-05 | 1.4369 | -0.31 | 91.58 | -0.97 | 6.8268 | -0.02 |


BTC_df and Forex_df_2 are merged to obtain the final dataset.
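A sketch of this final merge with two safeguards that are not in the original notebook: `validate="one_to_one"` (an existing `pd.merge` parameter) to catch duplicated dates on either side, and a check that no BTC day lost its forex values. The frames are small hypothetical excerpts with string dates, as in the notebook:

```python
import pandas as pd

# BTC dates come from the CSV as strings, so the forex keys are strings as well
btc = pd.DataFrame({"Date": ["2022-07-16", "2022-07-17", "2022-07-18"],
                    "Price": [21209.9, 20785.6, 22525.8]})
forex = pd.DataFrame({"Date": ["2022-07-15", "2022-07-16", "2022-07-17", "2022-07-18"],
                      "Price_EUR_USD": [1.0087, 1.0087, 1.0087, 1.0141]})

# validate="one_to_one" raises pandas.errors.MergeError if either side repeats a date
final = pd.merge(btc, forex, on="Date", how="inner", validate="one_to_one")

# After the forward fill, every BTC day should have found a forex row
assert len(final) == len(btc)
assert final["Price_EUR_USD"].notna().all()
print(final)
```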

In [87]:
# Merge BTC_df and Forex_df_2 => parameters => on="Date", how="inner"
Forex_df_final = pd.merge (BTC_df, Forex_df_2, on="Date", how="inner")

In [88]:
Forex_df_final.head()

|   | Date | Price | Open | High | Low | Vol. | Percentage_diff | Target | Price_EUR_USD | Percentage_diff_EUR_USD | Price_USD_JPY | Percentage_diff_USD_JPY | Price_USD_CNY | Percentage_diff_USD_CNY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-07-18 | 0.1 | 0.0 | 0.1 | 0.1 | 80.0 | 0.0 | 0.0 | 1.2933 | -0.05 | 86.57 | -0.98 | 6.7752 | -0.04 |
| 1 | 2010-07-19 | 0.1 | 0.1 | 0.1 | 0.1 | 570.0 | 0.0 | 0.0 | 1.294 | 0.05 | 86.76 | 0.21 | 6.7782 | 0.04 |
| 2 | 2010-07-20 | 0.1 | 0.1 | 0.1 | 0.1 | 260.0 | 0.0 | 0.0 | 1.2886 | -0.42 | 87.41 | 0.75 | 6.7783 | 0.0 |
| 3 | 2010-07-21 | 0.1 | 0.1 | 0.1 | 0.1 | 580.0 | 0.0 | 0.0 | 1.2757 | -1.0 | 86.85 | -0.64 | 6.777 | -0.02 |
| 4 | 2010-07-22 | 0.1 | 0.1 | 0.1 | 0.1 | 2160.0 | 0.0 | 0.0 | 1.2891 | 1.05 | 87.22 | 0.43 | 6.7797 | 0.04 |


In [89]:
Forex_df_final.tail()

|   | Date | Price | Open | High | Low | Vol. | Percentage_diff | Target | Price_EUR_USD | Percentage_diff_EUR_USD | Price_USD_JPY | Percentage_diff_USD_JPY | Price_USD_CNY | Percentage_diff_USD_CNY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4381 | 2022-07-16 | 21209.9 | 20825.2 | 21561.3 | 20484.4 | 136890.0 | 1.85 | 1.0 | 1.0087 | 0.71 | 138.53 | -0.27 | 6.7565 | 0.03 |
| 4382 | 2022-07-17 | 20785.6 | 21209.8 | 21654.4 | 20755.2 | 132810.0 | -2.0 | 0.0 | 1.0087 | 0.71 | 138.53 | -0.27 | 6.7565 | 0.03 |
| 4383 | 2022-07-18 | 22525.8 | 20785.6 | 22714.9 | 20770.6 | 279720.0 | 8.37 | 1.0 | 1.0141 | 0.54 | 138.12 | -0.3 | 6.742 | -0.21 |
| 4384 | 2022-07-19 | 23410.2 | 22529.3 | 23757.3 | 21581.8 | 308910.0 | 3.93 | 1.0 | 1.0224 | 0.82 | 138.18 | 0.04 | 6.743 | 0.01 |
| 4385 | 2022-07-20 | 23149.1 | 23412.0 | 23429.9 | 22965.9 | 290210.0 | -1.12 | 0.0 | 1.0242 | 0.18 | 138.07 | -0.08 | 6.7523 | 0.14 |


## Exporting the dataset to a CSV file

In [90]:
# Raw string so the Windows backslashes are not read as escape sequences
Forex_df_final.to_csv(r"D:\Luciano\Programación\Data science\Trabajo final\Bases de datos para modelos\BTC_Forex.csv", index=False, sep=";")
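The output path above is specific to the author's machine. A sketch using `pathlib`, with a temporary directory standing in for the output folder (an assumption), plus a read-back with the same ";" separator to confirm the file round-trips:

```python
from pathlib import Path
import tempfile
import pandas as pd

df = pd.DataFrame({"Date": ["2022-07-19", "2022-07-20"],
                   "Price": [23410.2, 23149.1],
                   "Target": [1.0, 0.0]})

# Stand-in for the real output folder; replace with your own directory
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "BTC_Forex.csv"

# Same options as the notebook: no index column, ";" as separator
df.to_csv(out_path, index=False, sep=";")

# Reading the file back requires the same separator; Date stays a string
round_trip = pd.read_csv(out_path, sep=";")
print(round_trip)
```

Because `index=False` is used, the reloaded frame has exactly the original columns and no extra "Unnamed: 0" column.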

# Resulting datasets (CSV file dictionary)

**BTC_Forex.csv**

- **Date**: date in YYYY-MM-DD format (object)
- **Price**: BTC closing price on the given date (USD) (float) (continuous numeric variable)
- **Open**: BTC opening price on the given date (USD) (float) (continuous numeric variable)
- **High**: highest BTC price on the given date (USD) (float) (continuous numeric variable)
- **Low**: lowest BTC price on the given date (USD) (float) (continuous numeric variable)
- **Vol.**: volume of BTC traded (bought + sold) on the given date (BTC) (float) (continuous numeric variable)
- **Percentage_diff**: percentage change of the BTC price on date [x+1] relative to date [x] (float) (continuous numeric variable)
- **Target**: 1 indicates that the price rose on the given date, 0 that it fell or stayed unchanged (float) (categorical variable)
- **Price_EUR_USD**: euro / US dollar exchange rate (float) (continuous numeric variable)
- **Percentage_diff_EUR_USD**: percentage change of the euro / US dollar exchange rate (float) (continuous numeric variable)
- **Price_USD_JPY**: US dollar / Japanese yen exchange rate (float) (continuous numeric variable)
- **Percentage_diff_USD_JPY**: percentage change of the US dollar / Japanese yen exchange rate (float) (continuous numeric variable)
- **Price_USD_CNY**: US dollar / Chinese yuan exchange rate (float) (continuous numeric variable)
- **Percentage_diff_USD_CNY**: percentage change of the US dollar / Chinese yuan exchange rate (float) (continuous numeric variable)