Rob's Python Tools
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
.gitignore
LICENSE.txt
README.md
requirements.txt
test_read_data.py
tools.py

README.md

Rob's Python Tools

Introduction

This is my repository of code for working with CSV files.

I frequently work with CSV data, using Python to:

  • Import transactions from downloaded Amazon, eBay and PayPal CSV files into accounting software.
  • Format data from CSV files for use in Microsoft Access tables

Over the years I've written a set of basic functions I call 'tools', for extracting, formatting and manipulating data from CSV files. For each type of CSV file I encountered, I combined these tools as needed into custom, file- specific code depending on how the CSV data was to be used.

Example

I need to import Amazon seller account transactions from files downloaded from Amazon into GnuCash.

The Amazon download is a tab delimited file with extra lines at the top that contain header data. See the file amazon.txt in this repo's folder data for a sample Amazon file. Here are some of the steps required to prepare the Amazon data for import:

  • Read the data into a list structure.
  • Remove the four lines at the top of the file, since their column layout is not consistent with the rest of the file and there is no data needed from them.
  • Convert dates column to yyyy-mm-dd format.
  • Convert amounts from strings to numbers, which includes stripping away the currency symbols.
  • Add the correct GnuCash debit and credit account names for each transaction.

tools.py is the set of Python functions I wrote to peform these and other similar steps with CSV data.

A Work in Progress

My Python tools, and the code created from them, have gone through many changes and re-factoring over the years, so the code I actually use on a regular basis has become, to put it kindly, messy. This repo is my effort to clean up and share the code in a way that will hopefully make it useful to others as well as help me improve my Python skills. I plan to add each tool one by one, starting with read_data(), then include file-specific code that uses the tools.

Installation

Written and tested in Python 3.6 on Windows 10.

Download or clone the repo as is. Included:

  • The file tools.py
  • Tests
  • Folder data, containing test CSV files
  • README.md (this documentation)
  • LICENSE.TXT (MIT license)
  • requirements.txt

Documentation

  • As I add code to the repo, documentation for each function will be added below.

read_data()

This function is the starting point for all of my work with CSV files. It reads a CSV file and returns the data in the form of a list.

Features:

  • Reads a CSV file and returns a list containing rows of data from the file.
  • Accepts both comma and tab delimited files.
  • Handles decoding errors.
  • Warns if a file contains null bytes, which is a sign it may be corrupted.

Example CSV input file:

date,description,amount
6/1/18,Sams Supermarket,$12.34
6/5/18,City Utilities,$26.91
6/11/18,Internet service,$49.99

Example output:

[['date', 'description', 'amount'], 
 ['6/1/18', 'Sams Supermarket', '$12.34'], 
 ['6/5/18', 'City Utilities', '$26.91'], 
 ['6/11/18', 'Internet service', '$49.99']]

Once the data is in a list structure like the above, additional formatting can be performed as needed, depending on the purpose of the data, such as:

  • Convert dates to the format yyyy-mm-dd.
  • Convert amounts to numeric values for use in calculations.
  • Extract only certain columns from the data.

Tests

test_read_data.py

  • Reads all the data files in \data into list structures.
  • Ensures all files are readable.