# Data acquisition for finance

## 1. Acquiring data from files

jupyter notebook

### 1.1 Comma-separated values (CSV) files.

In [2]:
import pandas as pd

# The default separator is ','
invoices_df = pd.read_csv('../data/ecommerce.csv')
print(invoices_df.head())

  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   

      InvoiceDate  UnitPrice  CustomerID         Country  
0  12/1/2010 8:26       2.55     17850.0  United Kingdom  
1  12/1/2010 8:26       3.39     17850.0  United Kingdom  
2  12/1/2010 8:26       2.75     17850.0  United Kingdom  
3  12/1/2010 8:26       3.39     17850.0  United Kingdom  
4  12/1/2010 8:26       3.39     17850.0  United Kingdom  
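
By default `read_csv` infers column types, so identifiers such as `InvoiceNo` can come back as numbers and `InvoiceDate` stays a plain string. A minimal sketch of pinning the types explicitly follows. It assumes the dates are month/day/year (the output above does not settle this) and uses an inline two-row sample copied from the output instead of the real file.

```python
from io import StringIO

import pandas as pd

# Two rows copied from the output above, inlined so the example is self-contained
csv_text = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850.0,United Kingdom
536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
"""

# Keep identifier-like columns as strings instead of letting pandas infer numbers
typed_df = pd.read_csv(
    StringIO(csv_text),
    dtype={"InvoiceNo": str, "StockCode": str},
)

# Assumption: dates are month/day/year with a 24-hour time
typed_df["InvoiceDate"] = pd.to_datetime(
    typed_df["InvoiceDate"], format="%m/%d/%Y %H:%M"
)

print(typed_df.dtypes)
```

With the real file, the same `dtype` argument and date conversion would be applied to `pd.read_csv('../data/ecommerce.csv')`.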


In [3]:
# When the separator is not ',' it must be specified. It can be ';', a tab, '#' or others.
# Without it, every line is read into a single column, as the output below shows
invoices_semicolon_sep_df = pd.read_csv('../data/ecommerce_semicolon_sep.csv')
print(invoices_semicolon_sep_df.head())

  InvoiceNo;StockCode;Description;Quantity;InvoiceDate;UnitPrice;CustomerID;Country
0  536365;85123A;WHITE HANGING HEART T-LIGHT HOLD...                               
1  536365;71053;WHITE METAL LANTERN;6;12/1/2010 8...                               
2  536365;84406B;CREAM CUPID HEARTS COAT HANGER;8...                               
3  536365;84029G;KNITTED UNION FLAG HOT WATER BOT...                               
4  536365;84029E;RED WOOLLY HOTTIE WHITE HEART.;6...                               


In [4]:
invoices_semicolon_sep_df = pd.read_csv('../data/ecommerce_semicolon_sep.csv', sep=';')
print(invoices_semicolon_sep_df.head())

  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   

      InvoiceDate  UnitPrice  CustomerID         Country  
0  12/1/2010 8:26       2.55     17850.0  United Kingdom  
1  12/1/2010 8:26       3.39     17850.0  United Kingdom  
2  12/1/2010 8:26       2.75     17850.0  United Kingdom  
3  12/1/2010 8:26       3.39     17850.0  United Kingdom  
4  12/1/2010 8:26       3.39     17850.0  United Kingdom  
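
When the separator is not known in advance, pandas can try to detect it: passing `sep=None` together with `engine="python"` makes `read_csv` sniff the delimiter with Python's `csv.Sniffer`. The sketch below uses a small inline sample in place of `ecommerce_semicolon_sep.csv`. Detection is heuristic, so an explicit `sep` remains the safer choice once the format is known.

```python
from io import StringIO

import pandas as pd

# Inline sample standing in for the semicolon-separated file
semicolon_text = """InvoiceNo;StockCode;Description;Quantity
536365;85123A;WHITE HANGING HEART T-LIGHT HOLDER;6
536365;71053;WHITE METAL LANTERN;6
"""

# sep=None requires the Python engine; the delimiter is then detected automatically
sniffed_df = pd.read_csv(StringIO(semicolon_text), sep=None, engine="python")
print(sniffed_df.columns.tolist())
```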


### 1.3. Excel files.

### 1.4. JSON files.
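
A minimal sketch of loading JSON with `pd.read_json`, assuming a records-oriented payload (a list of objects, one per row). The two sample records reuse values from the CSV rows shown earlier. They are illustrative and not a file from `../data/`.

```python
from io import StringIO

import pandas as pd

# Illustrative records-oriented JSON built from the CSV rows shown earlier
json_text = """[
  {"InvoiceNo": "536365", "StockCode": "85123A", "Quantity": 6, "UnitPrice": 2.55},
  {"InvoiceNo": "536365", "StockCode": "71053", "Quantity": 6, "UnitPrice": 3.39}
]"""

# dtype keeps the invoice number as text instead of converting it to an integer
invoices_json_df = pd.read_json(StringIO(json_text), dtype={"InvoiceNo": str})
print(invoices_json_df.head())
```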

### 1.5. Parquet files.

[Apache Parquet](https://parquet.apache.org/)
* What is it?
Apache Parquet is an open-source, columnar file format designed for efficient reading, writing, and storage of data.

* What problem does it solve?

* How does it solve it? What are its advantages?

* Disadvantages.
The main disadvantage of a Parquet file is that it is not human-readable, unlike CSV or JSON files.


[Apache Arrow](https://arrow.apache.org/)

https://www.youtube.com/watch?app=desktop&v=1j8SdS7s_NY

https://www.databricks.com/glossary/what-is-parquet

https://es.slideshare.net/databricks/the-parquet-format-and-performance-optimization-opportunities


In [5]:
import pandas as pd

# You can specify an engine to handle the serialization.
# It can be one of 'pyarrow', 'fastparquet', or 'auto'.
# If the engine is NOT specified, the pd.options.io.parquet.engine option is checked;
# if that is also 'auto', pyarrow is tried first, falling back to fastparquet

ecommerce_parquet_df = pd.read_parquet('../data/ecommerce.parquet', engine='pyarrow')
print(ecommerce_parquet_df.head())

  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   

      InvoiceDate  UnitPrice  CustomerID         Country  
0  12/1/2010 8:26       2.55     17850.0  United Kingdom  
1  12/1/2010 8:26       3.39     17850.0  United Kingdom  
2  12/1/2010 8:26       2.75     17850.0  United Kingdom  
3  12/1/2010 8:26       3.39     17850.0  United Kingdom  
4  12/1/2010 8:26       3.39     17850.0  United Kingdom  


## 2. Acquiring data through APIs.

In [6]:
import os

import requests
import pandas as pd
from pandas import json_normalize


url = "https://real-time-product-search.p.rapidapi.com/search"

querystring = {"q": "Nike shoes", "country": "us", "language": "en", "limit": "30"}

# The API key is read from an environment variable (RAPIDAPI_KEY is an assumed
# name) so that the secret is not hard-coded in the notebook
headers = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "real-time-product-search.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring).json()
print(response)

{'status': 'OK', 'request_id': '7a2babaa-3790-4fe5-a8b9-66e128a46fd0', 'data': [{'product_id': '1895888000104236047', 'product_id_v2': '1895888000104236047:17750431743876774496', 'product_title': 'Nike PS Dunk Low - White / Black 11.5C', 'product_description': "The Nike Dunk Low Retro White Black (PS) sneakers combine iconic style with modern comfort. With its timeless white and black colorway, these sneakers are versatile and perfect for any occasion. The retro design pays homage to the original Nike Dunk, while the low-top silhouette offers a contemporary vibe. Crafted with premium materials, these sneakers provide durability and support. Whether you're hitting the skate park or strolling the streets, the Nike Dunk Low Retro White Black (PS) sneakers will elevate your footwear game.", 'product_photos': ['https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcSQa1yIcq2PSPAFale5P3hSHy0ztLtCv6BZlJfehg1BdCY17IzXYSrYa2oQuyh1sXxq2l1fODkh59QrNEiTQRgqmudtN2fx&usqp=CAE', 'https://encrypted-t

In [7]:
data_dict = response["data"]
print(data_dict)

[{'product_id': '1895888000104236047', 'product_id_v2': '1895888000104236047:17750431743876774496', 'product_title': 'Nike PS Dunk Low - White / Black 11.5C', 'product_description': "The Nike Dunk Low Retro White Black (PS) sneakers combine iconic style with modern comfort. With its timeless white and black colorway, these sneakers are versatile and perfect for any occasion. The retro design pays homage to the original Nike Dunk, while the low-top silhouette offers a contemporary vibe. Crafted with premium materials, these sneakers provide durability and support. Whether you're hitting the skate park or strolling the streets, the Nike Dunk Low Retro White Black (PS) sneakers will elevate your footwear game.", 'product_photos': ['https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcSQa1yIcq2PSPAFale5P3hSHy0ztLtCv6BZlJfehg1BdCY17IzXYSrYa2oQuyh1sXxq2l1fODkh59QrNEiTQRgqmudtN2fx&usqp=CAE', 'https://encrypted-tbn3.gstatic.com/shopping?q=tbn:ANd9GcQZZn2-w-DfKNBSSrtVwvSPeklf-JVDRBlttQ52m3PI

In [8]:
selected_cols = [
    'product_id',
    'product_title',
    'product_rating',
    'typical_price_range',
    'offer'
]
data_df = pd.DataFrame(data_dict)[selected_cols]
print(data_df.head())

             product_id                                      product_title  \
0   1895888000104236047             Nike PS Dunk Low - White / Black 11.5C   
1   2060730182710679218  Nike Court Vision Low Next Nature White/Pink W...   
2   2334515098854897626      Jordan 4 Retro Travis Scott Cactus Jack (F&F)   
3  11518126772533919413                  Nike Dunk Low GS White Black Blue   
4   3750088064557429106  Nike Women's Court Vision Low Next Nature Shoe...   

   product_rating typical_price_range  \
0             4.4    [$70.00, $87.00]   
1             4.5          [$69, $84]   
2             4.4  [$12,026, $12,095]   
3             4.6          [$79, $90]   
4             4.5    [$60.97, $80.00]   

                                               offer  
0  {'store_name': 'Nike', 'store_rating': 4.5, 'o...  
1  {'store_name': 'ShopWSS', 'store_rating': None...  
2  {'store_name': 'StockX', 'store_rating': 4.1, ...  
3  {'store_name': 'GOAT', 'store_rating': 4.1, 'o...  
4  {'stor

In [9]:
# Flatten the dictionaries stored in the 'offer' column
df_aplanado = json_normalize(data_df['offer'])

# Concatenate the flattened DataFrame with the original DataFrame
df_resultante = pd.concat([data_df, df_aplanado], axis=1)

cols_to_drop = [
    'offer',
    'offer_page_url',
    'store_reviews_page_url',
    'original_price',
    'product_condition',
    'buy_now_url',
    'on_sale',
    'shipping'
]
df_resultante = df_resultante.drop(columns=cols_to_drop)
print(df_resultante.head())

             product_id                                      product_title  \
0   1895888000104236047             Nike PS Dunk Low - White / Black 11.5C   
1   2060730182710679218  Nike Court Vision Low Next Nature White/Pink W...   
2   2334515098854897626      Jordan 4 Retro Travis Scott Cactus Jack (F&F)   
3  11518126772533919413                  Nike Dunk Low GS White Black Blue   
4   3750088064557429106  Nike Women's Court Vision Low Next Nature Shoe...   

   product_rating typical_price_range             store_name  store_rating  \
0             4.4    [$70.00, $87.00]                   Nike           4.5   
1             4.5          [$69, $84]                ShopWSS           NaN   
2             4.4  [$12,026, $12,095]                 StockX           4.1   
3             4.6          [$79, $90]                   GOAT           4.1   
4             4.5    [$60.97, $80.00]  DICK'S Sporting Goods           4.6   

   store_review_count       price                  tax  
0    
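
As an alternative to flattening the `offer` column and concatenating, `pd.json_normalize` can be applied to the list of records directly, which expands nested dictionaries into prefixed columns in one step. The sketch below uses a hand-built sample that reuses values visible in the outputs above. It assumes `typical_price_range` is a list of two price strings, and `parse_price` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Hand-built sample mirroring the structure of response["data"] (assumed shape)
sample_records = [
    {"product_id": "1895888000104236047", "product_rating": 4.4,
     "typical_price_range": ["$70.00", "$87.00"],
     "offer": {"store_name": "Nike", "store_rating": 4.5}},
    {"product_id": "2060730182710679218", "product_rating": 4.5,
     "typical_price_range": ["$69", "$84"],
     "offer": {"store_name": "ShopWSS", "store_rating": None}},
    {"product_id": "2334515098854897626", "product_rating": 4.4,
     "typical_price_range": ["$12,026", "$12,095"],
     "offer": {"store_name": "StockX", "store_rating": 4.1}},
]

# Nested 'offer' keys become columns such as 'offer_store_name'
flat_df = pd.json_normalize(sample_records, sep="_")


def parse_price(price_text):
    """Hypothetical helper: turn a string like '$12,026' into the float 12026.0."""
    return float(price_text.replace("$", "").replace(",", ""))


# Split the price range into numeric lower and upper bounds
flat_df["price_low"] = flat_df["typical_price_range"].map(lambda r: parse_price(r[0]))
flat_df["price_high"] = flat_df["typical_price_range"].map(lambda r: parse_price(r[1]))

print(flat_df[["offer_store_name", "price_low", "price_high"]])
```

Converting the price strings to numbers early makes later aggregation (averages, spreads) straightforward.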

## 3. Acquiring data through database connections.

### 3.1 Relational databases (SQL).

A free PostgreSQL database was created on Neon (https://console.neon.tech/app/projects) for this example, and 18 records from the e-commerce CSV used earlier were inserted into it.

In [13]:
import os

import pandas as pd
from sqlalchemy import create_engine, URL

# The password is read from an environment variable (NEON_DB_PASSWORD is an
# assumed name) so that the secret is not hard-coded in the notebook
url_object = URL.create(
    "postgresql",
    username="ismaelcazalilla",
    password=os.environ["NEON_DB_PASSWORD"],
    host="ep-throbbing-haze-36918596.eu-central-1.aws.neon.tech",
    database="adquisicion_datos",
)

# Create a database engine instance
db_engine = create_engine(url_object)

# Connect to the database and run a query to read the data
with db_engine.connect() as conn, conn.begin():
    df = pd.read_sql_query("SELECT * FROM adquisicion.ecommerce WHERE invoiceno='536365'", con=conn)
    print(df.head())

OperationalError: (psycopg2.OperationalError) ERROR:  Endpoint ID is not specified. Either please upgrade the postgres client library (libpq) for SNI support or pass the endpoint ID (first part of the domain name) as a parameter: '?options=endpoint%3D<endpoint-id>'. See more at https://neon.tech/sni
ERROR:  connection is insecure (try using `sslmode=require`)

(Background on this error at: https://sqlalche.me/e/20/e3q8)
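
The error message itself names two things to try: `sslmode=require`, and, for client libraries without SNI support, passing the endpoint ID (the first part of the host name) as a connection option. A minimal sketch of how both could be supplied through the `query` argument of `URL.create` follows. The helpers `build_neon_url` and `read_invoice` and all credentials shown are hypothetical placeholders, and the sketch has not been run against the actual database.

```python
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL


def build_neon_url(username, password, host, database, endpoint_id=None):
    """Hypothetical helper: build a PostgreSQL URL with the options the error suggests."""
    query = {"sslmode": "require"}
    if endpoint_id is not None:
        # Only needed when the client library (libpq) lacks SNI support
        query["options"] = f"endpoint={endpoint_id}"
    return URL.create(
        "postgresql",
        username=username,
        password=password,
        host=host,
        database=database,
        query=query,
    )


def read_invoice(db_url, invoice_no):
    """Hypothetical helper: read one invoice using a bound parameter."""
    engine = create_engine(db_url)
    query = text("SELECT * FROM adquisicion.ecommerce WHERE invoiceno = :invoice_no")
    with engine.connect() as conn:
        return pd.read_sql_query(query, con=conn, params={"invoice_no": invoice_no})


# Placeholder values, not real credentials
example_url = build_neon_url(
    username="example_user",
    password="example_password",
    host="ep-example-123456.eu-central-1.aws.neon.tech",
    database="adquisicion_datos",
    endpoint_id="ep-example-123456",
)
print(example_url.render_as_string(hide_password=True))
```

With real credentials, `read_invoice(example_url, "536365")` would run the same query as the cell above, with the invoice number passed as a bound parameter instead of being embedded in the SQL string.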