**Introduction**  

Build a data pipeline in Python that meets the following requirements:

**Data Sources**

- Prints (prints.json) - one month of history of the value props shown to each user, in JSON Lines format
- Taps (taps.json) - one month of history of the value props clicked by each user, in JSON Lines format
- Payments (pays.csv) - one month of history of the payments made by users, in CSV format

**Expected output**

An output dataset with the following information:

- prints from the last week
- for each print:
  - a field indicating whether it was clicked or not
  - number of times the user saw each value prop in the 3 weeks before that print.
  - number of times the user clicked each value prop in the 3 weeks before that print.
  - number of payments the user made for each value prop in the 3 weeks before that print.
  - cumulative amount the user spent on each value prop in the 3 weeks before that print.

**Proposed solution (Python 3)**

**Importing the libraries**

In [116]:
import pandas as pd

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

import warnings
warnings.filterwarnings("ignore")

pd.set_option('future.no_silent_downcasting', True)


**Loading the datasets**  

The "pandas" library is used to load each of the datasets.

In [117]:
df_prints = pd.read_json('data/prints.json', lines=True)
df_prints.head(10)

Unnamed: 0,day,event_data,user_id
0,2020-11-01,"{'position': 0, 'value_prop': 'cellphone_recha...",98702
1,2020-11-01,"{'position': 1, 'value_prop': 'prepaid'}",98702
2,2020-11-01,"{'position': 0, 'value_prop': 'prepaid'}",63252
3,2020-11-01,"{'position': 0, 'value_prop': 'cellphone_recha...",24728
4,2020-11-01,"{'position': 1, 'value_prop': 'link_cobro'}",24728
5,2020-11-01,"{'position': 2, 'value_prop': 'credits_consumer'}",24728
6,2020-11-01,"{'position': 3, 'value_prop': 'point'}",24728
7,2020-11-01,"{'position': 0, 'value_prop': 'point'}",25517
8,2020-11-01,"{'position': 1, 'value_prop': 'credits_consumer'}",25517
9,2020-11-01,"{'position': 2, 'value_prop': 'transport'}",25517


In [118]:
df_taps = pd.read_json('data/taps.json', lines=True)
df_taps.head(10)

Unnamed: 0,day,event_data,user_id
0,2020-11-01,"{'position': 0, 'value_prop': 'cellphone_recha...",98702
1,2020-11-01,"{'position': 2, 'value_prop': 'point'}",3708
2,2020-11-01,"{'position': 3, 'value_prop': 'send_money'}",3708
3,2020-11-01,"{'position': 0, 'value_prop': 'transport'}",93963
4,2020-11-01,"{'position': 1, 'value_prop': 'cellphone_recha...",93963
5,2020-11-01,"{'position': 0, 'value_prop': 'link_cobro'}",94945
6,2020-11-01,"{'position': 1, 'value_prop': 'cellphone_recha...",94945
7,2020-11-01,"{'position': 2, 'value_prop': 'prepaid'}",89026
8,2020-11-01,"{'position': 0, 'value_prop': 'link_cobro'}",7616
9,2020-11-01,"{'position': 0, 'value_prop': 'link_cobro'}",63471


In [119]:
df_pays = pd.read_csv('data/pays.csv')
df_pays.head(10)

Unnamed: 0,pay_date,total,user_id,value_prop
0,2020-11-01,7.04,35994,link_cobro
1,2020-11-01,37.36,79066,cellphone_recharge
2,2020-11-01,15.84,19321,cellphone_recharge
3,2020-11-01,26.26,19321,send_money
4,2020-11-01,35.35,38438,send_money
5,2020-11-01,20.95,85939,transport
6,2020-11-01,74.48,14372,prepaid
7,2020-11-01,31.52,14372,link_cobro
8,2020-11-01,83.76,65274,transport
9,2020-11-01,93.54,65274,prepaid


**Exploratory analysis of the sources**  

The goal of this phase is to run a descriptive analysis of the data sources to be processed, in order to identify their structure, the data types they use, and how the data is distributed.

In [120]:
df_prints.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 508617 entries, 0 to 508616
Data columns (total 3 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   day         508617 non-null  object
 1   event_data  508617 non-null  object
 2   user_id     508617 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 11.6+ MB


In [121]:
df_taps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50859 entries, 0 to 50858
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   day         50859 non-null  object
 1   event_data  50859 non-null  object
 2   user_id     50859 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 1.2+ MB


In [122]:
df_pays.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 756483 entries, 0 to 756482
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   pay_date    756483 non-null  object 
 1   total       756483 non-null  float64
 2   user_id     756483 non-null  int64  
 3   value_prop  756483 non-null  object 
dtypes: float64(1), int64(1), object(2)
memory usage: 23.1+ MB


- The "prints" dataset has 508,617 entries and 3 columns (day, event_data and user_id), with no null values.
- The "taps" dataset has 50,859 entries and 3 columns (day, event_data and user_id), with no null values.
- The "pays" dataset has 756,483 entries and 4 columns (pay_date, total, user_id and value_prop), with no null values.

From the 3 datasets we infer that the "user_id" and "value_prop" fields are suitable keys for the time-based grouping operations needed to build the required output dataset.

**Data preparation and cleaning**

As part of the data preparation phase, the following transformations are applied:

+ Convert the columns that hold dates to the "datetime" type, keeping the 'YYYY-MM-DD' format
+ Apply an "explode/flatten" operation to the event_data field, that is, turn a potentially nested hierarchical structure (JSON) into a tabular format in which each key becomes a column and each record becomes a row.
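
As a quick illustration of the two transformations listed above, here is a minimal sketch on a made-up two-row frame shaped like prints.json after loading. The `toy` frame and its values are assumptions for the example, not rows from the real data.

```python
import pandas as pd

# Toy rows shaped like prints.json after loading (values are made up).
toy = pd.DataFrame({
    "day": ["2020-11-01", "2020-11-02"],
    "event_data": [
        {"position": 0, "value_prop": "prepaid"},
        {"position": 1, "value_prop": "point"},
    ],
    "user_id": [1, 2],
})

# 1) String dates become datetime values.
toy["day"] = pd.to_datetime(toy["day"])

# 2) Each key of the nested dict becomes its own column.
flat = pd.json_normalize(toy["event_data"].tolist())
toy_flat = pd.concat([toy.drop(columns=["event_data"]), flat], axis=1)

print(toy_flat.dtypes)
print(toy_flat)
```

The result has the columns day, user_id, position and value_prop, which is the same layout the cleaned prints and taps frames show further down.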


In [123]:
def object_to_datetime(df: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """
    Change column datatype to datetime.

    Parameters:
    - df: The original dataframe.
    - column_name: The name of the column containing string date values

    Returns:
    - A new dataframe with the datetime column converted.
    """
    df_converted = df.copy()
    try:
        converted = pd.to_datetime(df_converted[column_name], errors='raise')
        df_converted[column_name] = converted
    except (ValueError, TypeError):
        print("Error converting object to datetime type")
    return df_converted

In [124]:
def flatten_json_column(df: pd.DataFrame, column_name: str, record_prefix: str = '') -> pd.DataFrame:
    """
    Flatten a column containing JSON objects and adds the resulting fields to the original dataframe.

    Parameters:
    - df: The original dataframe.
    - column_name: The name of the column containing JSON/dict objects.
    - record_prefix: Prefix to add to flattened columns (optional).

    Returns:
    - A new dataframe with the JSON column flattened and merged.
    """
    # Ensure the column contains dict-like structures
    json_series = df[column_name].apply(lambda x: x if isinstance(x, dict) else {})

    # Normalize (flatten) the JSON column
    flattened = pd.json_normalize(json_series, sep='_')
    if record_prefix:
        flattened = flattened.add_prefix(record_prefix)

    # Combine with original DataFrame
    df_result = pd.concat([df.drop(columns=[column_name]), flattened], axis=1)
    return df_result

In [125]:
def clean_data(df1: pd.DataFrame, df2: pd.DataFrame, df3: pd.DataFrame):
    """Parse the date columns and flatten 'event_data' for prints (df1), taps (df2) and payments (df3)."""
    _df_prints = flatten_json_column(object_to_datetime(df1, 'day'), 'event_data')
    _df_taps = flatten_json_column(object_to_datetime(df2, 'day'), 'event_data')
    _df_pays = object_to_datetime(df3, 'pay_date')

    return _df_prints, _df_taps, _df_pays

In [126]:
df_prints_final, df_taps_final, df_pays_final = clean_data(df_prints,df_taps,df_pays)

In [127]:
print("Prints :")
print(df_prints_final.head(10))
print("\nTaps :")
print(df_taps_final.head(10))
print("\nPayments :")
print(df_pays_final.head(10))

Prints :
         day  user_id  position          value_prop
0 2020-11-01    98702         0  cellphone_recharge
1 2020-11-01    98702         1             prepaid
2 2020-11-01    63252         0             prepaid
3 2020-11-01    24728         0  cellphone_recharge
4 2020-11-01    24728         1          link_cobro
5 2020-11-01    24728         2    credits_consumer
6 2020-11-01    24728         3               point
7 2020-11-01    25517         0               point
8 2020-11-01    25517         1    credits_consumer
9 2020-11-01    25517         2           transport

Taps :
         day  user_id  position          value_prop
0 2020-11-01    98702         0  cellphone_recharge
1 2020-11-01     3708         2               point
2 2020-11-01     3708         3          send_money
3 2020-11-01    93963         0           transport
4 2020-11-01    93963         1  cellphone_recharge
5 2020-11-01    94945         0          link_cobro
6 2020-11-01    94945         1  cellphone_rech

Checking the minimum and maximum values of the date column in the "prints" dataset

In [128]:
df_prints_final[['day']].agg(['min', 'max'])

Unnamed: 0,day
min,2020-11-01
max,2020-11-30


This confirms that the time range for the final analysis is the month of November 2020.

**Transforming the data into the output dataset**

Next, a processing pipeline is defined. It takes the 3 datasets produced by the previous step and returns a single dataset that meets the requirements described in the Introduction, as follows:

- Select the prints from the last week. For each print, create the following attributes:
  - 'was_clicked': whether the print was clicked or not (true/false)
  - 'prints_last_window': number of times the user saw each value prop in the 3 weeks before that print.
  - 'taps_last_window': number of times the user clicked each value prop in the 3 weeks before that print.
  - 'payments_last_window': number of payments the user made for each value prop in the 3 weeks before that print.
  - 'total_paid_last_window': cumulative amount the user spent on each value prop in the 3 weeks before that print.  


The classes were implemented to support dynamic column naming and a configurable time window for the analysis. The default window is 21 days, which corresponds to the 3 weeks before each print in the last week of November 2020.
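
The transformers below all use the same half-open look-back interval: an event counts if it falls on or after `print_day - window_days` and strictly before `print_day`. The sketch below spells that rule out. `window_bounds` and `in_window` are hypothetical helper names used only for illustration; they are not part of the pipeline.

```python
import pandas as pd


def window_bounds(print_day, window_days=21):
    """Return (start, end) of the half-open look-back interval [start, end)."""
    end = pd.Timestamp(print_day)
    start = end - pd.Timedelta(days=window_days)
    return start, end


def in_window(event_day, print_day, window_days=21):
    """True when event_day falls inside [print_day - window_days, print_day)."""
    start, end = window_bounds(print_day, window_days)
    return start <= pd.Timestamp(event_day) < end


start, end = window_bounds("2020-11-23")
print(start.date(), end.date())  # 2020-11-02 2020-11-23
```

With daily dates this means events on the same day as the print are not counted as history, while events exactly 21 days earlier are.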

In [129]:
class LastWeekOfDataFilter(BaseEstimator, TransformerMixin):
    def __init__(self, date_col='day', days=7):
        """
        Filter print data points over a time window:
          - Selection of prints in the last N days

        Parameters:
          - date_col: The name of the column containing date values
          - days: Number of days to filter

        Returns:
          - A new filtered dataframe for date_col in the last N days.
        """
        self.date_col = date_col
        self.days = days

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        max_ts = X[self.date_col].max()
        cutoff = max_ts - pd.Timedelta(days=self.days)
        return X[X[self.date_col] >= cutoff]
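
One detail worth checking is the boundary of this filter. Because the dates have daily granularity and the comparison is `>=`, the cutoff date itself is kept, so `days=7` selects eight calendar dates when the data ends on 2020-11-30. The sketch below compares that behaviour with a strict `>` variant on a made-up range of dates; the strict variant is an assumption about one possible reading of "last week", not something the notebook uses.

```python
import pandas as pd

# One row per calendar date, ending on the last day of the month.
days = pd.DataFrame({"day": pd.date_range("2020-11-20", "2020-11-30", freq="D")})

max_day = days["day"].max()
cutoff = max_day - pd.Timedelta(days=7)

inclusive = days[days["day"] >= cutoff]  # same comparison as the filter above
strict = days[days["day"] > cutoff]      # hypothetical strict variant

print(inclusive["day"].nunique(), inclusive["day"].min().date())  # 8 2020-11-23
print(strict["day"].nunique(), strict["day"].min().date())        # 7 2020-11-24
```

Which of the two is wanted depends on how the requirement is read; the outputs shown later in this notebook start at 2020-11-23, which corresponds to the inclusive comparison.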

In [130]:
class PrintCounterWindow(BaseEstimator, TransformerMixin):
    def __init__(self, all_prints_df, window_days=21,
                 user_col='user_id', product_col='value_prop', time_col='day',
                 output_col='prints_last_window'):
        """
        Computes print-based feature over a time window:
        - Count of prints per user/product in the last N days

        Parameters:
          - all_prints_df: Dataframe containing all print values
          - window_days: Number of days to filter
          - user_col: The name of the column containing user ids values
          - product_col: The name of the column containing value prop values
          - time_col: The name of the column containing date values
          - output_col: The name of the output column

        Returns:
          - A new dataframe including 'prints_last_window' column to existing prints dataframe.
        """
        self.all_prints_df = all_prints_df.copy()
        self.window_days = window_days
        self.user_col = user_col
        self.product_col = product_col
        self.time_col = time_col
        self.output_col = output_col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        df = X.copy()
        df['_row_id'] = df.index

        history_df = self.all_prints_df.copy()

        # Join on user/product
        merged = df[[self.user_col, self.product_col, self.time_col, '_row_id']].merge(
            history_df[[self.user_col, self.product_col, self.time_col]],
            on=[self.user_col, self.product_col],
            suffixes=('_current', '_past'),
            how='left'
        )

        # Filter prints within the window
        window = pd.Timedelta(days=self.window_days)
        filtered = merged[
            (merged[f'{self.time_col}_past'] < merged[f'{self.time_col}_current']) &
            (merged[f'{self.time_col}_past'] >= merged[f'{self.time_col}_current'] - window)
        ]

        counts = filtered.groupby('_row_id').size().rename(self.output_col)

        # Merge back
        df = df.join(counts, on='_row_id')
        df[self.output_col] = df[self.output_col].fillna(0).astype(int)

        return df.drop(columns=['_row_id'])
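
To make the join, filter and count steps easier to follow, here is a minimal sketch of the same idea on made-up data. The frames `history` and `current` and their values are assumptions for the example; the column names mirror the defaults of the class above.

```python
import pandas as pd

# All past prints (toy values).
history = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "value_prop": ["point", "point", "prepaid", "point"],
    "day": pd.to_datetime(["2020-11-01", "2020-11-18", "2020-11-18", "2020-11-18"]),
})
# Prints we want features for.
current = pd.DataFrame({
    "user_id": [1, 1],
    "value_prop": ["point", "transport"],
    "day": pd.to_datetime(["2020-11-23", "2020-11-23"]),
})
current["_row_id"] = current.index

# Pair every current print with every past print of the same user/value prop.
pairs = current.merge(history, on=["user_id", "value_prop"],
                      suffixes=("_current", "_past"), how="left")

# Keep only pairs inside [day_current - 21 days, day_current).
window = pd.Timedelta(days=21)
inside = pairs[
    (pairs["day_past"] < pairs["day_current"])
    & (pairs["day_past"] >= pairs["day_current"] - window)
]

# Count surviving pairs per original row; rows with no history get 0.
counts = inside.groupby("_row_id").size()
current["prints_last_window"] = current["_row_id"].map(counts).fillna(0).astype(int)
print(current.drop(columns="_row_id"))
```

The first row gets a count of 1 (the 2020-11-01 print falls just outside the window) and the second gets 0. Note that the intermediate `pairs` frame grows with the number of matching history rows per print, which is worth keeping in mind for larger inputs.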



In [131]:
class ClickFeaturesWindow(BaseEstimator, TransformerMixin):
    def __init__(self, clicks_df, window_days=21,
                 user_col='user_id', product_col='value_prop', time_col='day',
                 clicked_col='was_clicked', clicks_count_col='taps_last_window'):
        """
        Computes click-based features over a time window:
        - Click/Print match in the current window - 7 days
        - Count of clicks per user/product in the last N days

        Parameters:
          - clicks_df: Dataframe containing all taps values
          - window_days: Number of days to filter
          - user_col: The name of the column containing user ids values
          - product_col: The name of the column containing value prop values
          - time_col: The name of the column containing date values
          - clicked_col: The name of the output column for click/print match
          - clicks_count_col: The name of the output column for click counts


        Returns:
          - A new dataframe including 'was_clicked' , 'taps_last_window' columns to existing prints dataframe.
        """
        self.clicks_df = clicks_df.copy()
        self.window_days = window_days
        self.user_col = user_col
        self.product_col = product_col
        self.time_col = time_col
        self.clicked_col = clicked_col
        self.clicks_count_col = clicks_count_col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        X['_row_id'] = X.index

        clicks = self.clicks_df.copy()


        # Exact match flag
        was_clicked_df = clicks[[self.user_col, self.product_col, self.time_col]].drop_duplicates()
        was_clicked_df[self.clicked_col] = True
        X = X.merge(was_clicked_df, how='left', on=[self.user_col, self.product_col, self.time_col])
        X[self.clicked_col] = X[self.clicked_col].fillna(False).astype(bool)

        # Count clicks in window
        merged = X[[self.user_col, self.product_col, self.time_col, '_row_id']].merge(
            clicks, on=[self.user_col, self.product_col], suffixes=('_print', '_click'), how='left'
        )

        window = pd.Timedelta(days=self.window_days)
        merged = merged[
            (merged[f'{self.time_col}_click'] < merged[f'{self.time_col}_print']) &
            (merged[f'{self.time_col}_click'] >= merged[f'{self.time_col}_print'] - window)
        ]
        counts = merged.groupby('_row_id').size().rename(self.clicks_count_col)

        # Merge back
        X = X.join(counts, on='_row_id')
        X[self.clicks_count_col] = X[self.clicks_count_col].fillna(0).astype(int)

        return X.drop(columns=['_row_id'])
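
The exact-match flag can also be derived with the `indicator` option of `DataFrame.merge`, which avoids filling missing values in a mixed-type column. This is an alternative sketch, not the approach used in the class above. The rows are the 2020-11-23 prints and tap of user 5810 that appear in the validation section.

```python
import pandas as pd

prints = pd.DataFrame({
    "user_id": [5810, 5810],
    "value_prop": ["cellphone_recharge", "credits_consumer"],
    "day": pd.to_datetime(["2020-11-23", "2020-11-23"]),
})
taps = pd.DataFrame({
    "user_id": [5810],
    "value_prop": ["cellphone_recharge"],
    "day": pd.to_datetime(["2020-11-23"]),
})

keys = ["user_id", "value_prop", "day"]

# indicator=True adds a '_merge' column: 'both' when a matching tap exists.
flagged = prints.merge(taps[keys].drop_duplicates(), on=keys,
                       how="left", indicator=True)
flagged["was_clicked"] = flagged["_merge"].eq("both")
flagged = flagged.drop(columns="_merge")
print(flagged)
```

Deduplicating the taps before the merge keeps the number of print rows unchanged even if the same value prop was tapped more than once on the same day.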


In [132]:
class PaymentFeaturesWindow(BaseEstimator, TransformerMixin):
    def __init__(self, payments_df, window_days=21,
                 user_col='user_id', product_col='value_prop',
                 time_col='pay_date', amount_col='total',
                 payments_count_col='payments_last_window', total_paid_col='total_paid_last_window'):
        """
        Computes payment-based features over a time window:
        - Count of payments per user/product in the last N days
        - Total amount paid per user/product in the last N days

        Parameters:
          - payments_df: Dataframe containing all payment values
          - window_days: Number of days to filter
          - user_col: The name of the column containing user ids values
          - product_col: The name of the column containing value prop values
          - time_col: The name of the column containing date values
          - amount_col: The name of the column containing total values
          - payments_count_col: The name of the output column for payment counts
          - total_paid_col: The name of the output column for total paid amounts


        Returns:
          - A new dataframe including 'payments_last_window' , 'total_paid_last_window' columns to existing prints dataframe.
        """

        self.payments_df = payments_df.copy()
        self.window_days = window_days
        self.user_col = user_col
        self.product_col = product_col
        self.time_col = time_col
        self.amount_col = amount_col
        self.payments_count_col = payments_count_col
        self.total_paid_col = total_paid_col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        df = X.copy()
        df['_row_id'] = df.index

        payments = self.payments_df.copy()

        # Join on user/product
        merged = df[[self.user_col, self.product_col, 'day', '_row_id']].merge(
            payments[[self.user_col, self.product_col, self.time_col, self.amount_col]],
            on=[self.user_col, self.product_col], how='left'
        )

        # Filter payments within the window
        window = pd.Timedelta(days=self.window_days)
        mask = (
            (merged[self.time_col] < merged['day']) &
            (merged[self.time_col] >= merged['day'] - window)
        )
        filtered = merged[mask]

        # Aggregate: count + sum of total_paid
        agg = filtered.groupby('_row_id').agg({
            self.amount_col: ['count', 'sum']
        })
        agg.columns = [self.payments_count_col, self.total_paid_col]
        agg = agg.rename_axis('_row_id')

        # Merge back
        df = df.join(agg, on='_row_id')
        df[self.payments_count_col] = df[self.payments_count_col].fillna(0).astype(int)
        df[self.total_paid_col] = df[self.total_paid_col].fillna(0.0)

        return df.drop(columns=['_row_id'])
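
The count-and-sum step can be sketched on made-up data as follows. This version uses pandas named aggregation to set the output column names directly instead of renaming a MultiIndex afterwards; that is a stylistic alternative, and the frames and amounts are assumptions for the example.

```python
import pandas as pd

prints = pd.DataFrame({
    "user_id": [1, 1],
    "value_prop": ["prepaid", "point"],
    "day": pd.to_datetime(["2020-11-25", "2020-11-25"]),
})
prints["_row_id"] = prints.index

pays = pd.DataFrame({
    "user_id": [1, 1, 1],
    "value_prop": ["prepaid", "prepaid", "prepaid"],
    "pay_date": pd.to_datetime(["2020-11-03", "2020-11-10", "2020-11-20"]),
    "total": [10.0, 5.5, 4.5],
})

# Pair prints with payments of the same user/value prop, then apply the window.
pairs = prints.merge(pays, on=["user_id", "value_prop"], how="left")
window = pd.Timedelta(days=21)
mask = (pairs["pay_date"] < pairs["day"]) & (pairs["pay_date"] >= pairs["day"] - window)

# Named aggregation: one count column and one sum column per print row.
agg = pairs[mask].groupby("_row_id").agg(
    payments_last_window=("total", "count"),
    total_paid_last_window=("total", "sum"),
)

result = prints.join(agg, on="_row_id").drop(columns="_row_id")
result["payments_last_window"] = result["payments_last_window"].fillna(0).astype(int)
result["total_paid_last_window"] = result["total_paid_last_window"].fillna(0.0)
print(result)
```

The 2020-11-03 payment falls one day before the window opens, so the prepaid row ends up with 2 payments totalling 10.0, and the point row, which has no payments at all, is filled with zeros.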

In [133]:
pipeline = Pipeline([
    ('last_week_only', LastWeekOfDataFilter(date_col='day', days=7)),
    ('print_feature', PrintCounterWindow(df_prints_final, window_days=21)),
    ('click_features', ClickFeaturesWindow(df_taps_final, window_days=21)),
    ('payment_features', PaymentFeaturesWindow(df_pays_final, window_days=21))
])

In [134]:
output_df = pipeline.fit_transform(df_prints_final)

In [135]:
output_df.head(10)

Unnamed: 0,day,user_id,position,value_prop,prints_last_window,was_clicked,taps_last_window,payments_last_window,total_paid_last_window
0,2020-11-23,69000,0,credits_consumer,0,False,0,0,0.0
1,2020-11-23,69000,1,link_cobro,0,False,0,0,0.0
2,2020-11-23,69000,2,transport,1,False,0,1,136.73
3,2020-11-23,69000,3,prepaid,0,False,0,3,143.82
4,2020-11-23,66521,0,prepaid,0,False,0,1,157.87
5,2020-11-23,66521,1,cellphone_recharge,1,False,0,0,0.0
6,2020-11-23,66521,2,credits_consumer,0,True,0,1,8.53
7,2020-11-23,66521,3,point,2,False,1,1,32.97
8,2020-11-23,65232,0,credits_consumer,0,False,0,0,0.0
9,2020-11-23,5810,0,cellphone_recharge,0,True,0,2,62.8


**Validating the results**   

+ *User_id = 69000*

In [137]:
output_df[output_df['user_id'] == 69000]

Unnamed: 0,day,user_id,position,value_prop,prints_last_window,was_clicked,taps_last_window,payments_last_window,total_paid_last_window
0,2020-11-23,69000,0,credits_consumer,0,False,0,0,0.0
1,2020-11-23,69000,1,link_cobro,0,False,0,0,0.0
2,2020-11-23,69000,2,transport,1,False,0,1,136.73
3,2020-11-23,69000,3,prepaid,0,False,0,3,143.82


In [138]:
df_prints_final[df_prints_final['user_id'] == 69000]

Unnamed: 0,day,user_id,position,value_prop
291082,2020-11-18,69000,0,point
291083,2020-11-18,69000,1,transport
291084,2020-11-18,69000,2,send_money
381063,2020-11-23,69000,0,credits_consumer
381064,2020-11-23,69000,1,link_cobro
381065,2020-11-23,69000,2,transport
381066,2020-11-23,69000,3,prepaid


In the "prints" dataset, the value_prop 'transport' appears only once in the preceding time window (2020-11-18), which confirms the value of the 'prints_last_window' field: 1

In [139]:
df_taps_final[df_taps_final['user_id'] == 69000]

Unnamed: 0,day,user_id,position,value_prop


The "taps" dataset contains no rows at all for user_id 69000, which confirms the values 'taps_last_window': 0 and 'was_clicked': False

In [140]:
df_pays_final[df_pays_final['user_id'] == 69000]

Unnamed: 0,pay_date,total,user_id,value_prop
13148,2020-11-01,96.16,69000,link_cobro
13149,2020-11-01,0.65,69000,cellphone_recharge
31000,2020-11-02,28.09,69000,prepaid
31001,2020-11-02,101.82,69000,cellphone_recharge
81432,2020-11-04,9.08,69000,prepaid
81433,2020-11-04,136.73,69000,transport
229909,2020-11-10,3.74,69000,cellphone_recharge
229910,2020-11-10,29.38,69000,point
254403,2020-11-11,106.65,69000,prepaid
254404,2020-11-11,24.23,69000,point


In the "payments" dataset, the value props 'transport' and 'prepaid' appear 1 and 3 times respectively in the preceding time window [2020-11-02, 2020-11-23), which confirms the values 'payments_last_window': [1, 3] and 'total_paid_last_window': [136.73, 143.82]

+ *User_id = 5810*

In [141]:
output_df[output_df['user_id'] == 5810]

Unnamed: 0,day,user_id,position,value_prop,prints_last_window,was_clicked,taps_last_window,payments_last_window,total_paid_last_window
9,2020-11-23,5810,0,cellphone_recharge,0,True,0,2,62.8
10,2020-11-23,5810,1,credits_consumer,0,False,0,0,0.0
57218,2020-11-26,5810,0,point,0,True,0,0,0.0
57219,2020-11-26,5810,1,cellphone_recharge,1,False,1,1,40.48
57220,2020-11-26,5810,2,transport,0,False,0,1,24.48
57221,2020-11-26,5810,3,link_cobro,0,False,0,1,159.01


In [142]:
df_prints_final[df_prints_final['user_id'] == 5810]

Unnamed: 0,day,user_id,position,value_prop
381072,2020-11-23,5810,0,cellphone_recharge
381073,2020-11-23,5810,1,credits_consumer
438281,2020-11-26,5810,0,point
438282,2020-11-26,5810,1,cellphone_recharge
438283,2020-11-26,5810,2,transport
438284,2020-11-26,5810,3,link_cobro


In the "prints" dataset, the value_prop 'cellphone_recharge' appears only once in the preceding time window (2020-11-23), which confirms the value 'prints_last_window': 1 for the 2020-11-26 print

In [143]:
df_taps_final[df_taps_final['user_id'] == 5810]

Unnamed: 0,day,user_id,position,value_prop
38078,2020-11-23,5810,0,cellphone_recharge
43846,2020-11-26,5810,0,point


In the "taps" dataset, the value props 'cellphone_recharge' and 'point' each have one click, on 2020-11-23 and 2020-11-26 respectively. This confirms 'was_clicked': True for those prints, and 'taps_last_window': 1 for the 2020-11-26 print with value_prop 'cellphone_recharge'

In [144]:
df_pays_final[df_pays_final['user_id'] == 5810]

Unnamed: 0,pay_date,total,user_id,value_prop
90126,2020-11-04,22.32,5810,cellphone_recharge
90127,2020-11-04,135.72,5810,transport
228409,2020-11-10,159.01,5810,link_cobro
228410,2020-11-10,24.48,5810,transport
482330,2020-11-20,40.48,5810,cellphone_recharge
645586,2020-11-26,60.87,5810,point
645587,2020-11-26,3.54,5810,transport
725764,2020-11-29,77.54,5810,point


In the "payments" dataset, the value props 'cellphone_recharge' and 'transport' are evaluated over different time intervals: the aggregates for the 2020-11-23 prints are computed over [2020-11-02, 2020-11-23), and those for the 2020-11-26 prints over a second interval, [2020-11-05, 2020-11-26). This confirms the values obtained for the interval that corresponds to each print.
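
The manual check above can be reproduced with a few lines of code. The sketch below copies the payment rows of user 5810 from the listing above and recomputes the count and total for a given value prop and print date. `payments_before` is a hypothetical helper written for this check; it is not part of the pipeline.

```python
import pandas as pd

# Payment rows of user_id 5810, copied from the listing above.
pays_5810 = pd.DataFrame({
    "pay_date": pd.to_datetime([
        "2020-11-04", "2020-11-04", "2020-11-10", "2020-11-10",
        "2020-11-20", "2020-11-26", "2020-11-26", "2020-11-29",
    ]),
    "total": [22.32, 135.72, 159.01, 24.48, 40.48, 60.87, 3.54, 77.54],
    "value_prop": [
        "cellphone_recharge", "transport", "link_cobro", "transport",
        "cellphone_recharge", "point", "transport", "point",
    ],
})


def payments_before(pays, value_prop, print_day, window_days=21):
    """Count and sum payments in [print_day - window_days, print_day)."""
    end = pd.Timestamp(print_day)
    start = end - pd.Timedelta(days=window_days)
    rows = pays[
        (pays["value_prop"] == value_prop)
        & (pays["pay_date"] >= start)
        & (pays["pay_date"] < end)
    ]
    return len(rows), round(float(rows["total"].sum()), 2)


print(payments_before(pays_5810, "cellphone_recharge", "2020-11-23"))  # (2, 62.8)
print(payments_before(pays_5810, "cellphone_recharge", "2020-11-26"))  # (1, 40.48)
print(payments_before(pays_5810, "transport", "2020-11-26"))           # (1, 24.48)
```

These three results match the 'payments_last_window' and 'total_paid_last_window' values shown for user 5810 in the output dataset.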

**Saving the output file to disk, in the data folder**


In [145]:
output_df.to_csv('data/prints_refined.csv', encoding='utf-8', index=False, header=True)
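
When the file is read back, CSV does not carry type information, so the date column comes back as plain strings unless it is parsed explicitly. The sketch below shows a round trip on a one-row sample written to a temporary directory. The sample frame and the temporary path are assumptions for the example; the real file is the one written above.

```python
import os
import tempfile

import pandas as pd

# One-row sample with a subset of the output columns.
sample = pd.DataFrame({
    "day": pd.to_datetime(["2020-11-23"]),
    "user_id": [69000],
    "value_prop": ["transport"],
    "was_clicked": [False],
    "total_paid_last_window": [136.73],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prints_refined.csv")
    sample.to_csv(path, encoding="utf-8", index=False, header=True)
    # parse_dates restores the datetime type that CSV cannot store.
    restored = pd.read_csv(path, parse_dates=["day"])

print(restored.dtypes)
```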