# File-Processing Analytics Demo
This notebook demonstrates the `file-processing-analytics` library, which extracts metadata from a collection of files and saves it to a CSV file. We’ll gather sample files with `get_all_test_files()` from the `file_processing_test_data` package and process them with `AnalyticsProcessor`.

In [1]:
# Import the necessary classes
from file_processing_analytics.analytics import AnalyticsProcessor
from file_processing_analytics.progress import ProgressTracker
from file_processing_test_data import get_all_test_files

# Define the path for the output CSV file
output_csv_path = 'output/metadata_results.csv'

# Use the helper function to get a list of all available test files
file_list = get_all_test_files()

# Initialize the AnalyticsProcessor with the file list, the output path, and a progress tracker
processor = AnalyticsProcessor(
    input_collection=file_list,
    output_csv_path=output_csv_path,
    progress_tracker=ProgressTracker()
)
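
The output path above points into an `output/` directory. This notebook does not say whether `AnalyticsProcessor` creates missing parent directories, so the following is a minimal sketch of a defensive setup step, assuming it does not. The helper name `ensure_parent_dir` is hypothetical and not part of the library.

```python
from pathlib import Path


def ensure_parent_dir(csv_path: str) -> Path:
    """Create the parent directory of csv_path if it is missing.

    Hypothetical helper, not part of file-processing-analytics.
    Returns the path as a pathlib.Path; the CSV file itself is not created.
    """
    path = Path(csv_path)
    # parents=True creates intermediate directories;
    # exist_ok=True makes the call safe to repeat.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    # Same relative path as the notebook cell above.
    print(ensure_parent_dir("output/metadata_results.csv"))
```

Calling this before constructing the processor is harmless even if the library already handles directory creation itself.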

## Processing Files
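The `AnalyticsProcessor` iterates over each file in the list, extracts metadata, and saves the results to `metadata_results.csv`. As the log output below shows, a file that cannot be processed is reported with an `[ERROR]` line and processing continues with the next file.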

In [2]:
# Run the file processing and save metadata to CSV
processor.process_files()

[INFO] Starting processing of 132 files.
  0%|          | 0/132 [00:00<?, ?file/s][INFO] Processing file: c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\2021_Census_English.csv
  1%|          | 1/132 [00:07<16:29,  7.55s/file][INFO] Processing file: c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\2021_Census_English_corrupted.csv
[ERROR] Error processing file c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\2021_Census_English_corrupted.csv: Error encountered while processing c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\2021_Census_English_corrupted.csv: 'charmap' codec can't decode byte 0x90 in position 18: character maps to <undefined>
  2%|▏         | 2/132 [00:25<30:06, 13.90s/file][INFO] Processing file: c:\Users\BCRUSE\CRUSE\



  8%|▊         | 10/132 [00:26<02:47,  1.37s/file][INFO] Processing file: c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\CanadaLogo_corrupted.tif
[ERROR] Error processing file c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\CanadaLogo_corrupted.tif: Error encountered while processing c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\CanadaLogo_corrupted.tif: cannot identify image file 'C:\\Users\\BCRUSE\\CRUSE\\Repos2\\file-processing-guide\\.venv\\Lib\\site-packages\\file_processing_test_data\\test_files\\CanadaLogo_corrupted.tif'
[INFO] Processing file: c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_processing_test_data\test_files\canadian_constitution.txt
[INFO] Processing file: c:\Users\BCRUSE\CRUSE\Repos2\file-processing-guide\.venv\lib\site-packages\file_proce
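
The `'charmap' codec can't decode byte 0x90` message in the log above is the error Python raises when bytes are decoded with a single-byte Windows code page, such as `cp1252`, that has no character assigned to that byte. How the library actually opens files is not shown in this notebook, so the snippet below is only a sketch that reproduces the same class of error on hypothetical sample bytes and shows one lenient alternative.

```python
def decode_strict(data: bytes, encoding: str = "cp1252") -> str:
    """Decode bytes and raise UnicodeDecodeError on any unmapped byte."""
    return data.decode(encoding)


def decode_lenient(data: bytes, encoding: str = "cp1252") -> str:
    """Decode bytes, substituting U+FFFD for any byte that cannot be mapped."""
    return data.decode(encoding, errors="replace")


# Hypothetical sample: byte 0x90 has no character assigned in cp1252.
sample = b"abc\x90def"

try:
    decode_strict(sample)
except UnicodeDecodeError as exc:
    # The codec name reported in the message is 'charmap'.
    print(f"strict decoding failed: {exc}")

# The lenient variant keeps going and marks the bad byte instead.
print(repr(decode_lenient(sample)))
```

Strict decoding is usually the safer default for an auditing tool, because a replaced character silently hides corruption that the error column would otherwise record.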

## Viewing Results
After processing, the CSV file `metadata_results.csv` contains metadata for each file, along with any errors encountered. **Note**: Some test files are intentionally set up to produce errors for testing purposes, and any error messages in the CSV are expected as part of this demonstration.

In [5]:
# Display the first few lines of the CSV to confirm output
import pandas as pd

# Load the CSV into a DataFrame and show its first five rows
df = pd.read_csv(output_csv_path)
df.head()

| Unnamed: 0 | file_name | text | error |
|---|---|---|---|
| 0 | 2021_Census_English.csv | CENSUS_YEAR","DGUID","ALT_GEO_CODE","GEO_LEVEL... | |
| 1 | 2021_Census_English_corrupted.csv | | Error encountered while processing c:\Users\BC... |
| 2 | align.py | | |
| 3 | Approved_Schools_2023_10_01.csv | academic_level_area_of_study_e","academic_leve... | |
| 4 | ArtificialNeuralNetworksForBeginners.pdf | Artificial Neural Networks for Beginners\nCarl... | |
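
Because failures are recorded in the `error` column rather than stopping the run, a quick way to audit the results is to split rows by whether that column is filled. The sketch below uses only the standard library `csv` module and a small hypothetical sample that mirrors the column names in the preview above (`file_name`, `text`, `error`). The sample rows and the helper name `split_by_error` are illustrative and not taken from the real output.

```python
import csv
import io

# Hypothetical sample with the same columns as the preview above.
# The first header is empty because it holds the saved row index.
SAMPLE_CSV = (
    ",file_name,text,error\n"
    "0,ok_file.txt,some extracted text,\n"
    "1,broken_file.csv,,could not decode file\n"
)


def split_by_error(csv_file):
    """Return (succeeded, failed) lists of file names from a results CSV.

    A row counts as failed when its 'error' field is non-empty.
    """
    succeeded, failed = [], []
    for row in csv.DictReader(csv_file):
        target = failed if row["error"] else succeeded
        target.append(row["file_name"])
    return succeeded, failed


succeeded, failed = split_by_error(io.StringIO(SAMPLE_CSV))
print(f"{len(succeeded)} succeeded, {len(failed)} failed: {failed}")
```

To run it against the real file, pass `open(output_csv_path, newline="", encoding="utf-8")` instead of the `StringIO` object, assuming the CSV was written as UTF-8. Note that an empty `error` field does not guarantee extracted text: the `align.py` row in the preview has neither.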


## Conclusion
In this demo, we used `file-processing-analytics` to extract metadata from a collection of test files and store it in a CSV file, including the errors raised by intentionally corrupted files. The library is intended for data discovery and auditing across diverse file types and works alongside `file-processing`.