datacomp

datacomp is a Python command-line tool that compares two tabular data files (.csv and .parquet formats are currently supported) and reports in detail how similar their data is.

By default, rows are compared by position (row number). To match rows by key instead, specify one or more columns to use as an index.

For example:

# Compare files using columns named "id" and "timestamp" to index the rows
# (raises an error if either input file is missing one of these columns)
datacomp filepath1.parquet filepath2.parquet id timestamp

For more detailed usage instructions, run datacomp --help.
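The difference between comparing by row number and comparing by index columns can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not datacomp's actual implementation: the helper names `read_rows`, `compare_by_position`, and `compare_by_index` are hypothetical, and the sketch handles CSV text only.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))


def compare_by_position(left, right):
    """Pair rows by row number and return the positions whose rows differ.

    Rows beyond the length of the shorter input are ignored in this sketch.
    """
    return [i for i, (a, b) in enumerate(zip(left, right)) if a != b]


def compare_by_index(left, right, index_cols):
    """Pair rows by the values in index_cols and return the keys whose rows differ."""
    # Fail early if an index column is missing from either input.
    for rows in (left, right):
        for col in index_cols:
            if rows and col not in rows[0]:
                raise KeyError(f"index column {col!r} not found")

    def keyed(rows):
        return {tuple(r[c] for c in index_cols): r for r in rows}

    lk, rk = keyed(left), keyed(right)
    # A key present on only one side counts as a difference.
    return sorted(k for k in lk.keys() | rk.keys() if lk.get(k) != rk.get(k))


a = read_rows("id,value\n1,x\n2,y\n")
b = read_rows("id,value\n2,y\n1,x\n")  # same rows, different order

print(compare_by_position(a, b))        # [0, 1]
print(compare_by_index(a, b, ["id"]))   # []
```

The two inputs hold the same rows in a different order, so a positional comparison flags every row while a comparison keyed on "id" finds no differences. That is the situation in which passing index columns is useful.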

Installation

pip3 install git+https://github.com/kappamaki/datacomp
