.gitignore
HOUJIN_BANGOU.ctl
HOUJIN_BANGOU.sql
HOUJIN_BANGOU_VIEW.ctl
Makefile
README.md
adaptor.cpp
clean_all.bat
conf.ini
dl_houjin_code.py

houjin_bangou

What is this?

A set of programs and parameter files that download the "corporate number" data published monthly by the National Tax Agency, a Japanese government agency, from the agency's site, process it, and load it into an Oracle Database table.

What is the value of using this?

You are spared tedious manual work, because this tool scrapes the download site on your behalf.
By repeating the seven refresh commands, you get the latest corporate number table in the DB.

How do I use it?

Do this once before you start using it.

  1. Prepare a Windows machine.
  2. Install the Firefox browser on it.
  3. Install Microsoft Visual Studio 2013 or later.
  4. Install Python 3, then upgrade pip and install Selenium:
    • pip install --upgrade pip
    • pip install selenium
  5. Install Mozilla GeckoDriver.
  6. Add the directories containing python.exe and GeckoDriver to the PATH environment variable.
  7. git clone https://github.com/plumsix/houjin_bangou.git
  8. cd /d path/to/houjin_bangou
  9. nmake EXE="adaptor.exe" OBJS="adaptor.obj" CPPFLAGS="/nologo /EHsc /Zi /O2"
  10. sqlplus user/passwd@alias @HOUJIN_BANGOU.sql
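
Before running the refresh commands, it can help to confirm that the tools installed above are reachable from the command prompt. The following is a minimal sketch, not part of this repository: the list of executable names is an assumption taken from the commands in this README, and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Executables that the commands in this README invoke by name.
# This list is an assumption for illustration; adjust it to your setup.
REQUIRED_TOOLS = ["python", "geckodriver", "git", "nmake", "sqlplus", "sqlldr"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Note that nmake is normally only on PATH inside a Visual Studio developer command prompt, so run the check from there.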

Do this each time you want to refresh the data.

  1. python dl_houjin_code.py
  2. adaptor *all????????.csv
  3. rename 00_houjin_bangou.csv 00_houjin_bangou_zenken.csv
  4. adaptor diff_????????.csv
  5. rename 00_houjin_bangou.csv 00_houjin_bangou_sabun.csv
  6. sqlldr USERID=user/passwd@alias control=HOUJIN_BANGOU DIRECT=Y ROWS=500000
  7. sqlldr USERID=user/passwd@alias control=HOUJIN_BANGOU_VIEW DIRECT=N ROWS=1000 READSIZE=10000000 BINDSIZE=10000000
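
If you would rather not type the seven commands each time, they could be chained from a small script. The sketch below is not part of this repository: `refresh_commands` and `run_refresh` are hypothetical helpers, and `user/passwd@alias` is the same placeholder used above. It simply replays the commands in order and stops at the first one that exits with a non-zero status.

```python
import subprocess


def refresh_commands(credentials="user/passwd@alias"):
    """Return the seven refresh commands from this README, in order."""
    return [
        "python dl_houjin_code.py",
        "adaptor *all????????.csv",
        "rename 00_houjin_bangou.csv 00_houjin_bangou_zenken.csv",
        "adaptor diff_????????.csv",
        "rename 00_houjin_bangou.csv 00_houjin_bangou_sabun.csv",
        f"sqlldr USERID={credentials} control=HOUJIN_BANGOU DIRECT=Y ROWS=500000",
        f"sqlldr USERID={credentials} control=HOUJIN_BANGOU_VIEW DIRECT=N ROWS=1000 "
        "READSIZE=10000000 BINDSIZE=10000000",
    ]


def run_refresh(credentials="user/passwd@alias", runner=subprocess.run):
    """Run each refresh command through the shell, stopping on the first failure."""
    for command in refresh_commands(credentials):
        print(">", command)
        # shell=True lets cmd.exe handle built-ins such as `rename` and passes
        # the command line through as it would be typed at the prompt.
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(command, shell=True, check=True)
```

Call `run_refresh()` with your own credentials string from the houjin_bangou directory. The `runner` parameter exists only so the sequence can be inspected or tested without executing anything.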