# The `DataFilter` Class

`DataFilter` is initialized with four parameters:

1. `start_date`
2. `end_date`
3. `input_directory`
4. `output_directory`

`start_date` and `end_date` bound the date range; the constructor converts both to pandas `Timestamp` objects with `pd.to_datetime`. `input_directory` is the directory that holds the original CSV files, and `output_directory` is where the new CSV files with the filtered data are saved.
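
As a small illustration of that conversion (a sketch, not part of the class itself): a date-only string is parsed as midnight at the start of that day. Any comparison against `end_date` is therefore a comparison against `2019-11-15 00:00:00`, not against the end of that day. The date strings used here are the ones passed to `DataFilter` in this notebook.

```python
import pandas as pd

# Date-only strings become Timestamps at midnight (00:00:00) of that day.
start_date = pd.to_datetime("2019-09-15")
end_date = pd.to_datetime("2019-11-15")

print(type(start_date).__name__)  # Timestamp
print(start_date)                 # 2019-09-15 00:00:00

# Timestamps compare chronologically, which is what the filter relies on.
print(start_date < end_date)      # True
```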


The class has one method, `filter_data`, which finds every `*.csv` file in `input_directory`. For each file, it does the following:

- Reads the CSV file into a pandas DataFrame, assuming the file has no header row.
- Names the columns `datetime` and `devicecount`.
- Converts the `datetime` column to pandas datetime values so it can be compared against the date range.
- Builds a boolean mask that is True for rows whose `datetime` is later than `start_date` and no later than `end_date` (the start bound is exclusive, the end bound inclusive). The mask selects those rows from the original DataFrame into a new DataFrame, `filtered_data`.
- Creates the output directory if it does not exist.
- Saves `filtered_data` to a new CSV file in `output_directory`, reusing the original file name. The output file includes a header row and omits the DataFrame index.
- If a file cannot be processed, prints the error and moves on to the next file.
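
The read, rename, convert, and mask steps can be sketched on a tiny in-memory CSV. The four sample rows are hypothetical values chosen to sit on and just past each boundary; they are not taken from the WiFi data. The mask expression is the same one the class uses.

```python
from io import StringIO

import pandas as pd

# Hypothetical header-less CSV content: timestamp, device count.
raw_csv = StringIO(
    "2019-09-15 00:00:00,10\n"   # exactly start_date
    "2019-09-15 00:05:00,12\n"   # just after start_date
    "2019-11-15 00:00:00,7\n"    # exactly end_date
    "2019-11-15 00:05:00,9\n"    # just after end_date
)

start_date = pd.to_datetime("2019-09-15")
end_date = pd.to_datetime("2019-11-15")

# Read without a header row, then name the two columns.
df = pd.read_csv(raw_csv, header=None)
df.columns = ["datetime", "devicecount"]

# Parse the text timestamps so they can be compared with the bounds.
df["datetime"] = pd.to_datetime(df["datetime"])

# Start bound is exclusive (>), end bound is inclusive (<=).
mask = (df["datetime"] > start_date) & (df["datetime"] <= end_date)
filtered_data = df.loc[mask]

# Only the second and third rows are kept.
print(filtered_data)
```

The row stamped exactly at `start_date` is dropped while the row stamped exactly at `end_date` is kept. If both bounds should be inclusive, `>=` would be the operator to use for the start comparison.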
    

In [1]:
import os
import pandas as pd
from glob import glob

class DataFilter:
    def __init__(self, start_date, end_date, input_directory, output_directory):
        self.start_date         = pd.to_datetime(start_date)
        self.end_date           = pd.to_datetime(end_date)
        self.input_directory    = input_directory
        self.output_directory   = output_directory

    def filter_data(self):
        csv_files = glob(os.path.join(self.input_directory, '*.csv'))

        for file in csv_files:
            try:
                df              = pd.read_csv(file, header=None)
                df.columns      = ['datetime', 'devicecount']
                df['datetime']  = pd.to_datetime(df['datetime'])
                mask            = (df['datetime'] > self.start_date) & (df['datetime'] <= self.end_date)
                filtered_data   = df.loc[mask]
                
                # Create the output directory if it does not already exist.
                os.makedirs(self.output_directory, exist_ok=True)
            
                file_name   = os.path.basename(file)
                output_file = os.path.join(self.output_directory, file_name)
                filtered_data.to_csv(output_file, index=False)

            except Exception as e:
                print(f"An error occurred when processing {file}: {e}")

            
            
BASE_DIR            = os.getcwd()

def build_path(*args):
    return os.path.join(BASE_DIR, *args)

CSV_DIRECTORY       = build_path('data', 'input', 'WiFiData')

data_filter = DataFilter('2019-09-15', '2019-11-15', CSV_DIRECTORY , 'CUT-CSV')
data_filter.filter_data()
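
One consequence of the save step is worth illustrating: `to_csv(..., index=False)` still writes a header row by default, so the output files start with `datetime,devicecount` even though the input files have no header. The sketch below uses a hypothetical one-row DataFrame and writes to a string instead of a file. Passing `header=False` is shown only as an option for keeping the output in the same header-less layout as the input; it is not something the class does.

```python
import pandas as pd

# Hypothetical single filtered row, for illustration only.
filtered_data = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-09-15 00:05:00"]),
    "devicecount": [12],
})

# With no path argument, to_csv returns the CSV text as a string.
with_header = filtered_data.to_csv(index=False)
without_header = filtered_data.to_csv(index=False, header=False)

# First line is the column names when the default header is kept.
print(with_header.splitlines()[0])

# First line is the data row when the header is suppressed.
print(without_header.splitlines()[0])
```

This matters if the filtered files are ever fed back through `filter_data`: the class reads with `header=None`, so a header line would be treated as data and would fail the datetime conversion for that file.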

