# File-Processing Library Demo
This notebook demonstrates the use of the `file-processing` library to extract metadata and text content from different file types. We'll be working with the `Constitution_Act.pdf` and `Ottawa_1935_Weather.csv` files.

In [1]:
# Import the File class from the file-processing library
from file_processing import File

# Use the helper function to get the path to the test files
from file_processing_test_data import get_test_files_path

# Build the paths to the test files
test_files_path = get_test_files_path()
pdf_file_path = test_files_path / 'Constitution_Act.pdf'
csv_file_path = test_files_path / 'Ottawa_1935_Weather.csv'

# Initialize File objects for the PDF and CSV files
pdf_file = File(str(pdf_file_path))
csv_file = File(str(csv_file_path))

## PDF File Metadata
Let's display some basic metadata for the PDF file (file name, size, and owner), along with the first 500 characters of its extracted text.

In [2]:
# Print out the metadata
print(f"File Name: {pdf_file.file_name}\n")
print(f"File Size: {pdf_file.size} bytes\n")
print(f"Owner: {pdf_file.owner}\n")
print(f"Text Content: {pdf_file.metadata.get('text', 'No text extracted')[:500]}\n")  # Display the first 500 characters of text

File Name: Constitution_Act.pdf

File Size: 717135 bytes

Owner: AD/BCRUSE

Text Content: Current to January 1, 2024
Published by the Minister of Justice at the following address:
http://laws-lois.justice.gc.ca
Publié par le ministre de la Justice à l’adresse suivante :
http://lois-laws.justice.gc.ca
CANADA
CODIFICATION
LOIS CONSTITUTIONNELLES
DE 1867 à 1982
CONSOLIDATION
THE CONSTITUTION ACTS
  1867 to 1982
À jour au 1er  janvier 2024THE CONSTITUTION ACTS 1867 to 1982 LOIS CONSTITUTIONNELLES DE 1867 à 1982
 FOREWORD  AVANT-PROPOS
Current to January 1, 2024 ii À jour au 1er janvier 2
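
The preview above relies on `metadata.get('text', ...)` followed by a slice. The default only applies when the `'text'` key is missing; if the key were present but set to `None`, the slice would raise a `TypeError`. Whether the library ever returns `None` here is not shown in this notebook, so the following is only a defensive sketch. The helper name `text_preview` is hypothetical, and plain dictionaries stand in for `pdf_file.metadata`.

```python
def text_preview(metadata, limit=500, fallback="No text extracted"):
    """Return up to `limit` characters of metadata['text'], or `fallback`."""
    text = metadata.get("text")
    if not text:  # covers a missing key, None, and an empty string
        return fallback
    return text[:limit]


# Plain dicts stand in for `pdf_file.metadata` (illustrative values only).
print(text_preview({"text": "Current to January 1, 2024"}, limit=7))  # Current
print(text_preview({"text": None}))  # No text extracted
print(text_preview({}))              # No text extracted
```

With a real `File` object, the call would presumably look like `text_preview(pdf_file.metadata)`.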



## CSV File Metadata
Now, let's extract and display metadata from the CSV file, such as the encoding, number of rows and columns, and a preview of the content.
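
As background on the encoding field: a value such as `UTF-8-SIG` conventionally means UTF-8 text that begins with a byte order mark (BOM). The sketch below uses only the Python standard library to show what that marker is and how the `utf-8-sig` codec strips it. It does not show how `file-processing` detects encodings, and the sample bytes are made up for illustration.

```python
import codecs


def has_utf8_bom(raw):
    """Return True if the byte string starts with the UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)


# Illustrative bytes: a BOM followed by two short CSV lines.
raw = codecs.BOM_UTF8 + '"Year","Month"\n"1935","01"\n'.encode("utf-8")

print(has_utf8_bom(raw))                 # True
print(repr(raw.decode("utf-8")[0]))      # '\ufeff' (the BOM survives as a character)
print(repr(raw.decode("utf-8-sig")[0]))  # '"' (the BOM is stripped)
```

Decoding such a file as plain `utf-8` leaves an invisible `\ufeff` in front of the first column name, which is a common cause of failed header lookups.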

In [3]:
# Print out detailed metadata from the CSV file
print(f"CSV File Name: {csv_file.file_name}\n")
print(f"CSV File Size: {csv_file.size} bytes\n")
print(f"CSV Owner: {csv_file.owner}\n")
print(f"CSV Encoding: {csv_file.metadata.get('encoding', 'Unknown')}\n")
print(f"Number of Rows: {csv_file.metadata.get('num_rows', 0)}\n")
print(f"Number of Columns: {csv_file.metadata.get('num_cols', 0)}\n")
print(f"Total Cells: {csv_file.metadata.get('num_cells', 0)}\n")
print(f"Empty Cells: {csv_file.metadata.get('empty_cells', 0)}")

# Preview the first 10 rows of the CSV content
csv_preview = csv_file.metadata.get('text', '').split('\n')[:10]
print("\nCSV Content Preview:")
for row in csv_preview:
    print(row)

CSV File Name: Ottawa_1935_Weather.csv

CSV File Size: 52818 bytes

CSV Owner: AD/BCRUSE

CSV Encoding: UTF-8-SIG

Number of Rows: 366

Number of Columns: 31

Total Cells: 11346

Empty Cells: 7652

CSV Content Preview:
Longitude (x)","Latitude (y)","Station Name","Climate ID","Date/Time","Year","Month","Day","Data Quality","Max Temp (°C)","Max Temp Flag","Min Temp (°C)","Min Temp Flag","Mean Temp (°C)","Mean Temp Flag","Heat Deg Days (°C)","Heat Deg Days Flag","Cool Deg Days (°C)","Cool Deg Days Flag","Total Rain (mm)","Total Rain Flag","Total Snow (cm)","Total Snow Flag","Total Precip (mm)","Total Precip Flag","Snow on Grnd (cm)","Snow on Grnd Flag","Dir of Max Gust (10s deg)","Dir of Max Gust Flag","Spd of Max Gust (km/h)","Spd of Max Gust Flag
-75.72","45.40","OTTAWA","6105887","1935-01-01","1935","01","01","","-6.1","","-18.3","","-12.2","","30.2","","0.0","","0.0","","1.0","","1.0","","","","","","","
-75.72","45.40","OTTAWA","6105887","1935-01-02","1935","01","02","","-13.9","","
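
The reported counts are internally consistent: 366 rows × 31 columns = 11346 cells. Since 1935 had 365 days, 366 rows suggests the header row is included in `num_rows`, though that is an inference from the output rather than documented behaviour. The sketch below shows one way such counts could be computed with the standard `csv` module. It is not the library's implementation, and the helper name `csv_counts` and the trimmed three-column sample are assumptions made for illustration.

```python
import csv
import io


def csv_counts(text):
    """Count rows, columns, cells, and empty cells in CSV text (header included)."""
    rows = list(csv.reader(io.StringIO(text)))
    return {
        "num_rows": len(rows),
        "num_cols": max((len(r) for r in rows), default=0),
        "num_cells": sum(len(r) for r in rows),
        "empty_cells": sum(1 for r in rows for cell in r if cell.strip() == ""),
    }


# Trimmed sample modelled on the weather file: one header row, two data rows.
sample = (
    '"Date/Time","Max Temp","Max Temp Flag"\n'
    '"1935-01-01","-6.1",""\n'
    '"1935-01-02","-13.9",""\n'
)

counts = csv_counts(sample)
print(counts)
# {'num_rows': 3, 'num_cols': 3, 'num_cells': 9, 'empty_cells': 2}
```

Under this definition, the two empty flag values account for both empty cells, which matches the pattern in the weather file where most flag columns are blank.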

## Conclusion
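In this demo, we initialized `File` objects for a PDF and a CSV file and displayed metadata for each. Both objects expose the same attributes (`file_name`, `size`, `owner`) and a `metadata` dictionary, so the `file-processing` library offers one interface across file types, with format-specific keys such as `encoding` and `num_rows` available for the CSV.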