@pondigitalsolutions/guess-csv

With this package you can make an educated guess for the mapping of a csv file. Without knowing about the headers or data. It will make an educated guess based on the header and the data of the first x lines.

This will help with user uploaded csv's, because you cannot trust the headers (typos, renaming) or the placement of the columns.

Educated guess

cars.csv

#; DEALERNUMMER; DEALERNAAM; MERK; CHASSISNUMMER; KENTEKEN; DATUM_IN; DATUM_UIT; REDEN_UIT; OBJECT_STATUS
1;123456;Car Rental;V;1FBHE21H6MHA60929;AB123C;23-1-2018;19-2-2020;scrap;BEEINDIGD
2;456789;Super Rental;A;1G1JC6SH4E4163616;EF456G;19-2-2020;;;Accident

usage:

import {
  educatedGuess, 
  fetchDatafromCsv,
  VinValidator,
  LicensePlateValidator,
  Validator
 } from '@pondigitalsolutions/guess-csv'

const logger = console
const domain = [
  new VinValidator({ header: 'vin' }),
  new LicensePlateValidator({ header: 'kenteken' }),
  new Validator({
    header: 'dealer',
    dictionary: ['dealer'],
    regex: '^\\d{6}$'
  })
]

const fileName = 'cars.csv'

const suggestion = await educatedGuess({
  fileName,
  domain,
  logger
})
const data = await fetchDatafromCsv(suggestion)
logger.info(data)

Running the above code will result in:

Delimiter Found: ;
Mapping Suggestion: {"CHASSISNUMMER":"vin","KENTEKEN":"kenteken","DEALERNUMMER":"dealer"}
[
  { dealer: '123456', vin: '1FBHE21H6MHA60929', kenteken: 'AB123C' },
  { dealer: '456789', vin: '1G1JC6SH4E4163616', kenteken: 'EF456G' }
]

The domain array is build up from validators. The validators try to find the most likely column which will fit. Based on a dictionary of synonyms, and a data check.

The algorithm will for each header score the strings in the dictionary based on the Levenshtein distance. The data in the column will receive a percentage based on if the validator will succeed on the data. These results will be combined into a score and the best guess will win.

Validators

Currently their are a few Validators available:

DateValidator - Checks if the data is date/datetime
LicensePlateValidator - Checks if the data is a (Dutch) licenseplate
NumericValidator - Checks if the data is numeric
StringValidator - Checks if the data is of type string
VinValidator - Checks for vin numbers (chassisnumbers)
Validator - The Base validator works with a regex

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github/workflows		.github/workflows
src		src
.editorconfig		.editorconfig
.gitignore		.gitignore
.jscpd.json		.jscpd.json
.npmrc		.npmrc
.nvmrc		.nvmrc
LICENSE		LICENSE
README.md		README.md
babel.config.cjs		babel.config.cjs
jest.config.cjs		jest.config.cjs
package-lock.json		package-lock.json
package.json		package.json
sonar-project.properties		sonar-project.properties

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

@pondigitalsolutions/guess-csv

Educated guess

Validators

About

Releases 5

Packages 1

Languages

License

pondigitalsolutions/guess-csv

Folders and files

Latest commit

History

Repository files navigation

@pondigitalsolutions/guess-csv

Educated guess

Validators

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 5

Packages 1

Languages