Permalink
Fetching contributors…
Cannot retrieve contributors at this time
394 lines (295 sloc) 14.2 KB

File Validator

Overview

A Laravel package for applying validation rules to CSV and XLS files. Provides an interface for making sure files you're reading in your applications are valid against your defined schema.

The class loads a file from the local disk, accepts several options to validate against, then provides a collection of validation results.

use Mdcass\FileValidator\Facades\FileValidator;

$results = FileValidator::load('file.csv')->execute();
$results->file_exists; // true
  1. Installation
  2. Validating and Reading
  3. Methods
  4. Examples
  5. Accessors
  6. Results Bag
  7. FAQ
  8. Contributing

Installation

Installation instructions are for Laravel 5.3

Require the library

composer require mdcass/file-validator

Add the Service Provider to the providers list in config/app.php

// config/app.php

'providers' => [
    ...
    /*
     * Third Party Service Providers
     */
    Mdcass\FileValidator\FileValidatorServiceProvider::class
]

Optionally alias the Facade in the aliases list in config/app.php

// config/app.php

'aliases' => [
    ...
    'FileValidator' => Mdcass\FileValidator\Facades\FileValidator::class
]

The package loads files relative to the path set at config('filesystems.disks.local.root').

Validating and Reading

The two methods below represent the suggested implementation for this library.

Validating an Entire File

The execute method on the validator facade uses the configuration you specify to return a ResultsBag.

$results = FileValidator::load('relative/file/path.csv')
                             ->rules([
                                    'required' => [
                                        ['date']
                                    ],
                                    'date' => [
                                        ['date']
                                    ],
                                    'in' => [
                                        ['company', ['Schoen-Kertzmann']]
                                    ],
                                    'lengthMax' => [
                                        ['company', 255]
                                    ]
                               ])
                             ->execute();
$results->valid; // true

Traversing a File

The reader attribute on the validator facade represents the class's underlying Reader class (the configuration of which can be specified in the same fashion as when setting the header row).

The following example uses the chunk method on the underlying library to get results in a more efficient way from a large file.

$validator = FileValidator::load('accessor.csv')
                                ->aliases([
                                    'date' => 'datum'
                                ])
                                ->rules([
                                    'required' => [
                                        ['datum']
                                    ],
                                ]);

// The $validator is passed by reference into the chunk callback,
// as each $row it is given to validate will contribute to the internal
// results (such as the total row_count, valid_row_count, and invalid_row_count)
$validator->reader->chunk(1000, function($row) use (&$validator) {
    // Pass the row into the validator to validate, as well as set
    // any aliased rows and accessors
    $results = $validator->validateRow($row);

    $results; // ['valid' => true, 'row' => Mdcass\FileValidator\Support\Row]
    $results['row']->toArray(); // ['date' => '2016-11-23', ..., 'datum' => '2016-11-23']

}, false); // Third parameter prevents the underlying library for queueing the chunk

$results = $validator->results;
$results->valid // true        

Shorthand Instantiation

The class can be instantiated by passing an array to the constructor of the methods to call.

$validator = new FileValidator([
                    'load' => 'simple.csv',
                    'rules' => [
                                    'required' => [
                                        ['date']
                                    ]
                               ],
                    'distinct' => ['randomelementab'],

                ]);

$results = $validator->execute();

Methods

Method Params Description
load() string Load a file from a relative path
config() array Set config variables on the underlying library
sheet() string Set the sheet on an Excel file
rules() array Set validation rules
distinct() array Set fields to retrieve distinct values
aliases() array Set field aliases
accessors() array Set field accessors
execute() Validate the entire file

Attributes

Attribute Returns Description
results ResultsBag Validation results and distinct values

Examples

File Exists

Verify the file exists. The path passed to the load method should be relative to the Laravel Storage config variable at config('filesystems.disks.local.root').

$results = FileValidator::load('simple.csv')->execute();

$results->file_exists; // true
$results->has_data; // true

The has_data flag in the results will be false if only the header exists in the sheet

Set Header Row

Set the header row to use

$results = FileValidator::load('simple.csv')
                             ->config(['start_row' => 2])
                             ->execute();

$results->has_data; // true

The config method customises the attributes on the underlying Excel manipulation package, and you can see what can be configured here at Maatwebsite/Laravel-Excel@2.1

Specifying a sheet

$results = FileValidator::load('multisheet.xls')
                             ->sheet('Second Sheet')
                             ->execute();

$results->has_data; // true

The has_data flag in the results will be false if the sheet doesn't exist

Validation Rules

Validation rule reference available at the excellent library vlucas/valitron

simple_rules.csv

date company email randomDigit randomFloat
2016-11-30 Schoen-Kertzmann bbruen@yahoo.com 2 1.023789456
2016-12-01 Aufderhar LLC smitham.gayle@yahoo.com 7 183601.189291
$results = FileValidator::load('simple_rules.csv')
                             ->rules([
                                    'required' => [
                                        ['date']
                                    ],
                                    'date' => [
                                        ['date']
                                    ],
                                    'in' => [
                                        ['company', ['Schoen-Kertzmann']]
                                    ],
                                    'lengthMax' => [
                                        ['company', 255]
                                    ]
                               ])
                             ->execute();
$results->valid; // false
$results->row_count; // 2
$results->valid_row_count; // 1
$results->invalid_row_count; // 1
$results->errors; // ['company' => 1]

Retrieving Distinct Values

The library can collect distinct values of fields in the file which may be useful for further validation outside of the library.

Any fields set to return distinct values are added as required in the validation rules

simple_distinct.csv

date company email randomDigit randomFloat
2016-11-30 Schoen-Kertzmann bbruen@yahoo.com 2 1.023789456
2016-12-01 Aufderhar LLC smitham.gayle@yahoo.com 7 183601.189291
2016-12-01 Adel Sef rediase@yahoo.com 74 56.937856

$results = FileValidator::load('simple_distinct.csv')
            ->distinct(['date'])
            ->execute();

$results->values->date; // ['2016-11-30', '2016-12-01']

Accessors

You may specify a callback for a field to manipulate the value returned from the row.

Validation happens on the value returned from the callback

accessors.csv

date company email randomDigit randomFloat
2016-11-30 Schoen-Kertzmann bbruen@yahoo.com 2 1.023789456
2016-12-01 Aufderhar LLC smitham.gayle@yahoo.com 7 183601.189291
2016-12-01 Adel Sef rediase@yahoo.com 74 56.937856
// define an accessor for the date field to change the format from Y-m-d to m/d/Y
$validator = FileValidator::load('accessor.csv')
                            ->accessors([
                                    [
                                    'field' => 'date',
                                    'callback' => function($val) {
                                        return \Carbon\Carbon::parse($val)->format('m/d/Y');
                                    },
                                ],
                            ]);

// Retrieve a row from the underlying reader and
// pass it to the $validator for a validation result
$row = $validator->reader->take(1)->get();
$res = $validator->validateData($row);

$res['row']->date; // 30/11/2016

Aliases

You may alias an accessor with a given callback, and validate it separately to the original field.

$validator = FileValidator::load('accessor.csv')
                            ->accessors([
                                [
                                    'field' => 'date',
                                    'callback' => function($val) {
                                        return 1;
                                    },
                                    'alias' => 'datum',
                                ],
                            ])
                            ->rules([
                                'required' => [
                                    ['datum']
                                ],
                                'numeric' => [
                                    ['datum']
                                ],
                            ]);

$row = $validator->reader->take(1)->get();
$res = $validator->validateData($row);

$res['valid']; // true
$res['row']->toArray(); // ['date' => '2016-11-23', ..., 'datum' => 1]

Results Bag

Available Attributes

The execute method returns a class Mdcass\FileValidator\ResultsBag which has the following attributes:

attribute type Description
valid boolean Whether the file exists, has data, and has no validation errors
file_exists boolean Whether the file exists
has_data boolean Whether the file has data
row_count integer Total rows (not including the header row)
valid_row_count integer Total valid rows
invalid_row_count integer Total invalid rows
errors array List of invalid fields and the number of rows that failed validation
values ValuesBag Where distinct values are collected
timers Support/ScriptExecution Where timers are tracked

Timers

The Results Bag also contains a timers attribute, a list of script execution times. Each item in the list contains the start and end unix timestamps with microseconds, as well as the result in seconds.

name description
load Time to load the file
has_data Time to establish there is data to process in the file
execute Time to traverse the file, and validate against the set rules and collect distinct values
$results = FileValidator::load('file.csv')->execute();

$results->timers->load; // [ 'start' => 1482743338.946, 'end' => 1482743338.9575, 'result' => 0.011478900909424 ]

var_dump($results->timers->toArray()); // ['load' => [...], 'has_data' => [...], 'execute' => [...]]

FAQ

Can the library load files from a stream, or can a storage disk be specified?

No, the file must exist on the local disk. Whilst out of scope of this package, another package in the future will contain this functionality (however will simply by a hacky store locally, validate, delete as per this GitHub issue discussion on this package's underlying file reading package dependency)

Contributing

Testing

Package testing follows this article. To test:

$ cd /path/to/package/mdcass/file-validator
$ ../../../vendor/bin/phpunit

Tests create dummy files of data in the location defined in the package's config file config/file-validator-testing.php the variable storage_path.