Accept environment variables to set cache location
So users don't have to muck around in their `site-packages` directories
for this.
myersjustinc committed Jun 11, 2018
1 parent 4655416 commit b15ab98
Showing 2 changed files with 43 additions and 6 deletions.
36 changes: 33 additions & 3 deletions README.md
@@ -247,7 +247,39 @@ Let's use Sutter Health Sacramento Sierra Region's 12/2014 filing, which has an

IRSx ships with a default location to which each XML file is downloaded. But if you're dealing with these files in bulk, you may wish to sync specific folders directly and point irsx's default cache *at that folder*. That way you could download the files in bulk and then run irsx without it ever having to download them, because they are already on disk.

You can do that by setting the local_settings.py file. To figure out where that settings file is, log in to a terminal and type:
### Environment variables ###

You can set the `IRSX_CACHE_DIRECTORY` environment variable in order to control where
IRSx saves and looks for data files. For example, on Linux and OS X, you could
run the following before you run `irsx` or `irsx_index`:

$ export IRSX_CACHE_DIRECTORY=/absolute/path/to/arbitrary/directory/irsx

$ irsx --format=csv 201533089349301428
# XML will end up at /absolute/path/to/arbitrary/directory/irsx/XML/201533089349301428_public.xml

$ irsx_index --year 2017
# CSV will end up at /absolute/path/to/arbitrary/directory/irsx/CSV/index_2017.csv

If you don't like the forced `XML` and `CSV` directories, you can have even more control by setting two other environment variables instead:

* Set `IRSX_WORKING_DIRECTORY` to an absolute path where tax returns' XML files will be stored.
* Set `IRSX_INDEX_DIRECTORY` to an absolute path where yearly indexes' CSV files will be stored.

For example:

$ export IRSX_WORKING_DIRECTORY=/absolute/path/to/working/directory
$ irsx --format=csv 201533089349301428
# XML will end up at /absolute/path/to/working/directory/201533089349301428_public.xml

$ export IRSX_INDEX_DIRECTORY=/absolute/path/to/index/directory
$ irsx_index --year 2017
# CSV will end up at /absolute/path/to/index/directory/index_2017.csv


### Legacy configuration ###

You can also configure IRSx's cache location by editing the local_settings.py file. To find that settings file, start a Python interpreter and type:

>>> from irsx.settings import IRSX_SETTINGS_LOCATION
>>> IRSX_SETTINGS_LOCATION
@@ -262,8 +294,6 @@ Go to that directory. You can either modify the settings.py file or the local_se

Then edit local_settings.py to set WORKING\_DIRECTORY to the directory where the raw XML files are stored.

This piece of configuration is annoying and may change if we can think of a better approach.


## IRSx from python

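For scripted runs, the `export` examples in the README change above can be mirrored from Python. The following is a minimal sketch and not part of the commit: `irsx_env` is a hypothetical helper name, and it assumes only what the README states, namely that `irsx` picks up `IRSX_CACHE_DIRECTORY` from its environment.

```python
import os


def irsx_env(cache_dir, base_env=None):
    """Return a copy of an environment mapping with IRSX_CACHE_DIRECTORY set.

    `irsx_env` is a hypothetical helper for illustration, not part of IRSx.
    """
    env = dict(os.environ if base_env is None else base_env)
    # The README examples use absolute paths, so normalize relative input.
    env["IRSX_CACHE_DIRECTORY"] = os.path.abspath(cache_dir)
    return env


# Possible usage, assuming the `irsx` command is installed and on PATH:
#   import subprocess
#   subprocess.run(
#       ["irsx", "--format=csv", "201533089349301428"],
#       env=irsx_env("/absolute/path/to/arbitrary/directory/irsx"),
#       check=True,
#   )
```

Copying the environment instead of mutating `os.environ` keeps the override scoped to the child process.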
13 changes: 10 additions & 3 deletions irs_reader/settings.py
@@ -7,12 +7,19 @@
# This is the URL to amazon's bucket, could use another synced to it
IRS_XML_HTTP_BASE = "https://s3.amazonaws.com/irs-form-990"

# The directory we put files in while we're processing them
WORKING_DIRECTORY = (os.path.join(IRS_READER_ROOT, "XML"))
# It can be hard to locate this.
IRSX_SETTINGS_LOCATION = (os.path.join(IRS_READER_ROOT, "settings.py"))

# Defaults to the same directory as this settings file, but you can override
# with the `IRSX_CACHE_DIRECTORY` environment variable
IRSX_CACHE_DIRECTORY = os.environ.get("IRSX_CACHE_DIRECTORY", IRS_READER_ROOT)

# The directory we put files in while we're processing them
WORKING_DIRECTORY = os.environ.get(
    "IRSX_WORKING_DIRECTORY", os.path.join(IRSX_CACHE_DIRECTORY, "XML"))
# Helpful to keep these around for lookup purposes
INDEX_DIRECTORY = (os.path.join(IRS_READER_ROOT, "CSV"))
INDEX_DIRECTORY = os.environ.get(
    "IRSX_INDEX_DIRECTORY", os.path.join(IRSX_CACHE_DIRECTORY, "CSV"))

KNOWN_SCHEDULES = [
"IRS990", "IRS990EZ", "IRS990PF", "IRS990ScheduleA",
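The precedence implied by the new settings can be sketched as a standalone function. This is an illustration rather than code from the commit: `resolve_directories` and its parameters are hypothetical names, and `package_root` stands in for `IRS_READER_ROOT`.

```python
import os


def resolve_directories(environ, package_root):
    """Mirror the lookup order in the settings.py hunk (hypothetical helper).

    Returns a (working_directory, index_directory) tuple.
    """
    # The cache directory falls back to the package root when unset.
    cache_dir = environ.get("IRSX_CACHE_DIRECTORY", package_root)
    # The specific variables win over the cache directory's XML/CSV subfolders.
    working_dir = environ.get(
        "IRSX_WORKING_DIRECTORY", os.path.join(cache_dir, "XML"))
    index_dir = environ.get(
        "IRSX_INDEX_DIRECTORY", os.path.join(cache_dir, "CSV"))
    return working_dir, index_dir
```

Calling `resolve_directories(os.environ, some_root)` would reproduce the lookup for the current process. Because `settings.py` reads `os.environ` in module-level assignments, the values are fixed when the module is first imported, so setting the variables afterwards would not change them unless the module is reloaded.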
