# Data Wrangling Challenge

A DataFrame will be built with the help of a public API, holding the prices of 20 major, representative stocks. The sample covers 1000 days. Each column of the DataFrame holds the price of one stock and each row corresponds to a date. We then check whether any null values need to be handled and apply some transformations to the data.

The DataFrame is loaded through the Yahoo Finance API. ChatGPT was used to help improve and simplify the code.

## Creating the DataFrame

**Main code**

In [2]:

!pip install yfinance
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta, date

def obtener_datos_acciones(simbolos, inicio, fin):
    """Download daily closing prices for each symbol; returns {symbol: Series}."""
    datos_acciones = {}

    for simbolo in simbolos:
        try:
            # Download the historical price data for this symbol
            acciones = yf.download(simbolo, start=inicio, end=fin)

            if acciones.empty:
                print(f"Error fetching data for {simbolo}: no stock data found.")
                continue

            # Keep only the closing price
            datos_acciones[simbolo] = acciones['Close']

        except Exception as e:
            print(f"Error fetching data for {simbolo}: {e}")

    return datos_acciones

if __name__ == "__main__":
    # Symbols of 20 major stocks (an illustrative selection)
    simbolos_acciones = ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA", "META", "JPM", "V", "NVDA", "PYPL",
                         "BA", "INTC", "CSCO", "DIS", "GS", "IBM", "PFE", "WMT", "C", "CVX"]

    # Find the date 1000 weekdays ago (market holidays are not excluded)
    fin = date.today()
    dias_habiles = 0
    fecha_inicio = fin

    while dias_habiles < 1000:
        fecha_inicio -= timedelta(days=1)
        if fecha_inicio.weekday() < 5:  # 0 to 4 are weekdays (Monday to Friday)
            dias_habiles += 1

    # Convert the start date to a datetime
    inicio = datetime.combine(fecha_inicio, datetime.min.time())

    datos_acciones = obtener_datos_acciones(simbolos_acciones, inicio, fin)

    # Build a DataFrame from the downloaded data (one column per symbol)
    df = pd.DataFrame(datos_acciones)

    # Print the DataFrame
    print(df)

[*********************100%%**********************]  1 of 1 completed

--------------------------------------------------------------------------------------------------------------------------------
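
The weekday-counting loop in the main code can also be expressed with a pandas business-day offset. The following is a minimal sketch, not the notebook's own code: the helper name `inicio_dias_habiles` is hypothetical, and, like the loop, `pd.offsets.BDay` only skips weekends, so market holidays are still counted as working days.

```python
from datetime import date

import pandas as pd


def inicio_dias_habiles(fin: date, n: int) -> date:
    """Return the date n weekdays (Mon-Fri) before fin (hypothetical helper)."""
    # BDay skips Saturdays and Sundays only; it knows nothing about holidays.
    return (pd.Timestamp(fin) - pd.offsets.BDay(n)).date()


# Fixed example date so the result is reproducible.
print(inicio_dias_habiles(date(2023, 11, 30), 1000))
```

In the notebook this could replace the `while` loop with `fecha_inicio = inicio_dias_habiles(fin, 1000)`, assuming the same weekend-only definition of a working day is acceptable.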

**Sample of the DataFrame**

In [3]:
df.head()

| Date | AAPL | GOOGL | MSFT | AMZN | TSLA | META | JPM | V | NVDA | PYPL | BA | INTC | CSCO | DIS | GS | IBM | PFE | WMT | C | CVX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-30 | 80.967499 | 72.712502 | 172.779999 | 93.533997 | 42.720669 | 209.529999 | 135.889999 | 208.210007 | 61.452499 | 117.120003 | 323.299988 | 66.470001 | 47.240002 | 137.809998 | 244.130005 | 130.755264 | 35.170776 | 116.580002 | 77.43 | 111.400002 |
| 2020-01-31 | 77.377502 | 71.639 | 170.229996 | 100.435997 | 43.371334 | 201.910004 | 132.360001 | 198.970001 | 59.107498 | 113.889999 | 318.269989 | 63.93 | 45.970001 | 138.309998 | 237.75 | 137.40918 | 35.332069 | 114.489998 | 74.410004 | 107.139999 |
| 2020-02-03 | 77.165001 | 74.129997 | 174.380005 | 100.209999 | 52.0 | 204.190002 | 133.369995 | 200.809998 | 60.0825 | 116.510002 | 316.0 | 64.419998 | 46.529999 | 141.320007 | 239.009995 | 139.837479 | 35.588234 | 114.269997 | 75.129997 | 106.279999 |
| 2020-02-04 | 79.712502 | 72.2705 | 180.119995 | 102.483498 | 59.137333 | 209.830002 | 135.289993 | 203.559998 | 61.782501 | 120.080002 | 317.940002 | 65.459999 | 47.619999 | 144.729996 | 241.940002 | 142.552582 | 35.759014 | 115.269997 | 76.5 | 106.849998 |
| 2020-02-05 | 80.362503 | 72.302498 | 179.899994 | 101.9935 | 48.98 | 210.110001 | 137.589996 | 202.809998 | 62.689999 | 119.720001 | 329.549988 | 67.339996 | 48.450001 | 141.369995 | 244.300003 | 149.455063 | 36.21442 | 116.809998 | 78.849998 | 110.279999 |


-------------------------------------------------------------------------------------------------------------------------------


## Data transformation

--------------------------------------------------------------------------------------------------------------------------------

**Number of rows and columns**

In [7]:
df.shape

(966, 20)
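
The shape shows 966 rows rather than 1000. A plausible explanation is that the start date was found by counting weekdays only, while the market is also closed on holidays, for which no price rows are returned. The sketch below shows one way to list the weekdays missing from a price index. It uses a small made-up index instead of the downloaded data, and the helper name `dias_habiles_faltantes` is hypothetical.

```python
import pandas as pd


def dias_habiles_faltantes(indice: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Weekdays between the first and last date that are absent from indice."""
    esperados = pd.bdate_range(indice.min(), indice.max())  # every Mon-Fri
    return esperados.difference(indice)


# Made-up index: Monday 2020-02-17 is deliberately left out.
indice = pd.to_datetime(["2020-02-14", "2020-02-18", "2020-02-19"])
faltantes = dias_habiles_faltantes(indice)
print(faltantes)
```

On the notebook's data the call would be `dias_habiles_faltantes(df.index)`, assuming the index is a timezone-naive `DatetimeIndex`.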

**Data types**

In [10]:
df.dtypes

AAPL     float64
GOOGL    float64
MSFT     float64
AMZN     float64
TSLA     float64
META     float64
JPM      float64
V        float64
NVDA     float64
PYPL     float64
BA       float64
INTC     float64
CSCO     float64
DIS      float64
GS       float64
IBM      float64
PFE      float64
WMT      float64
C        float64
CVX      float64
dtype: object

**Checking the number of null values**

In [6]:
df.isnull().sum()

AAPL     0
GOOGL    0
MSFT     0
AMZN     0
TSLA     0
META     0
JPM      0
V        0
NVDA     0
PYPL     0
BA       0
INTC     0
CSCO     0
DIS      0
GS       0
IBM      0
PFE      0
WMT      0
C        0
CVX      0
dtype: int64
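
No column contains nulls here, so nothing needs to be filled. Had some symbols been missing prices on certain dates, one common option for price series is to carry the last known value forward. The following is a minimal sketch on made-up data (the values and the column names `AAA` and `BBB` are illustrative only), not a step the notebook performs.

```python
import numpy as np
import pandas as pd

# Made-up prices with gaps; not taken from the downloaded data.
precios = pd.DataFrame(
    {"AAA": [10.0, np.nan, 12.0], "BBB": [np.nan, 20.0, 21.0]},
    index=pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-03"]),
)

# Carry the last known price forward, then drop rows that still have
# nulls (a leading gap has no earlier value to copy).
limpio = precios.ffill().dropna()
print(limpio)
```

Forward filling avoids using future prices to fill past dates, which is why it is often preferred over interpolation for this kind of series.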

**Converting the values from dollars to pesos in a new DataFrame**

In [9]:
# Create a new DataFrame, using a fixed rate of 370 pesos per dollar
valor_dolar = 370
df_pesos = df * valor_dolar

# Round to 2 decimal places
df_pesos = df_pesos.round(2)

df_pesos.head()

| Date | AAPL | GOOGL | MSFT | AMZN | TSLA | META | JPM | V | NVDA | PYPL | BA | INTC | CSCO | DIS | GS | IBM | PFE | WMT | C | CVX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-30 | 29957.97 | 26903.63 | 63928.6 | 34607.58 | 15806.65 | 77526.1 | 50279.3 | 77037.7 | 22737.42 | 43334.4 | 119621.0 | 24593.9 | 17478.8 | 50989.7 | 90328.1 | 48379.45 | 13013.19 | 43134.6 | 28649.1 | 41218.0 |
| 2020-01-31 | 28629.68 | 26506.43 | 62985.1 | 37161.32 | 16047.39 | 74706.7 | 48973.2 | 73618.9 | 21869.77 | 42139.3 | 117759.9 | 23654.1 | 17008.9 | 51174.7 | 87967.5 | 50841.4 | 13072.87 | 42361.3 | 27531.7 | 39641.8 |
| 2020-02-03 | 28551.05 | 27428.1 | 64520.6 | 37077.7 | 19240.0 | 75550.3 | 49346.9 | 74299.7 | 22230.53 | 43108.7 | 116920.0 | 23835.4 | 17216.1 | 52288.4 | 88433.7 | 51739.87 | 13167.65 | 42279.9 | 27798.1 | 39323.6 |
| 2020-02-04 | 29493.63 | 26740.09 | 66644.4 | 37918.89 | 21880.81 | 77637.1 | 50057.3 | 75317.2 | 22859.53 | 44429.6 | 117637.8 | 24220.2 | 17619.4 | 53550.1 | 89517.8 | 52744.46 | 13230.84 | 42649.9 | 28305.0 | 39534.5 |
| 2020-02-05 | 29734.13 | 26751.92 | 66563.0 | 37737.59 | 18122.6 | 77740.7 | 50908.3 | 75039.7 | 23195.3 | 44296.4 | 121933.5 | 24915.8 | 17926.5 | 52306.9 | 90391.0 | 55298.37 | 13399.34 | 43219.7 | 29174.5 | 40803.6 |
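
The conversion above applies a single fixed rate of 370 pesos per dollar to every date. If a rate per date were available, the same multiplication could be done row by row with `DataFrame.mul(..., axis=0)`. The sketch below uses made-up prices and a made-up rate series (`precios_usd` and `tipo_cambio` are hypothetical names) purely to illustrate the mechanics.

```python
import pandas as pd

fechas = pd.to_datetime(["2020-01-30", "2020-01-31"])

# Made-up dollar prices and made-up exchange rates, for illustration only.
precios_usd = pd.DataFrame({"AAA": [10.0, 11.0], "BBB": [20.0, 19.5]}, index=fechas)
tipo_cambio = pd.Series([370.0, 372.0], index=fechas)

# axis=0 aligns the rate Series with the row index (the dates), so each
# row is multiplied by the rate for its own date.
precios_pesos = precios_usd.mul(tipo_cambio, axis=0).round(2)
print(precios_pesos)
```

With a scalar rate, as in the notebook, plain `df * valor_dolar` is enough; `mul` with `axis=0` only matters when the rate varies by row.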
