# TCAD file exploration

We have received files from a client.  They are ....

# Shorten files for browsing

To shorten the files for browsing we can run a short shell script. It unzips the archive that was received, truncates each `.TXT` file to its first 100 lines, and zips the shortened copy.

```{bash, eval=F}
# rm -rf shortened_appraisal_files
unzip original_data/Appraisal_Roll_History_1990.zip -d shortened_appraisal_files
find shortened_appraisal_files -name "*.TXT" -exec sed -i.full 100q {} \;
find shortened_appraisal_files -name "*.TXT.full" -exec rm {} \;
zip -r shortened_appraisal_files.zip shortened_appraisal_files
```
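
The same truncation can be sketched in Python for environments without `sed` or `find`. This is a minimal sketch, not a replacement for the script above: `shorten_zip` is a hypothetical helper name, it assumes the archive is trusted (member paths are used as-is), and it does not re-zip the result.

```python
import zipfile
from itertools import islice
from pathlib import Path


def shorten_zip(src_zip, dest_dir, max_lines=100, suffix=".TXT"):
    """Extract src_zip into dest_dir, keeping only the first max_lines lines
    of members whose name ends with suffix; other members are copied whole."""
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(src_zip) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Assumption: the archive is trusted, so member paths are not sanitised.
            target = dest_dir / info.filename
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                if info.filename.endswith(suffix):
                    # Iterate in binary mode so no text encoding has to be guessed.
                    dst.writelines(islice(src, max_lines))
                else:
                    dst.write(src.read())
```

Calling `shorten_zip("original_data/Appraisal_Roll_History_1990.zip", "shortened_appraisal_files")` would mirror the paths used in the shell script.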

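We can now attempt to load a shortened file using pandas. Note that the file has no header row, so with the default settings pandas uses the first record as the column names, as the output below shows.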

In [1]:
!pip install openpyxl
import pandas as pd

df = pd.read_csv("shortened_appraisal_files/Appraisal_Roll_History_1990_A/TCBC_SUM_1990_JURIS.TXT", sep = "|")
df.head()

Collecting openpyxl
  Using cached https://files.pythonhosted.org/packages/6a/94/a59521de836ef0da54aaf50da6c4da8fb4072fb3053fa71f052fd9399e7a/openpyxl-3.1.2-py2.py3-none-any.whl
Collecting et-xmlfile (from openpyxl)
  Using cached https://files.pythonhosted.org/packages/96/c2/3dd434b0108730014f1b96fd286040dc3bcb70066346f7e01ec2ac95865f/et_xmlfile-1.1.0-py3-none-any.whl
Installing collected packages: et-xmlfile, openpyxl
Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/opt/jupyterhub/pyvenv/lib/python3.8/site-packages/et_xmlfile-1.1.0.dist-info'
Consider using the `--user` option or check the permissions.

You are using pip version 19.0, however version 23.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


Unnamed: 0,0000000003,0000,1990,02,0.56950,CI,Unnamed: 6,275,0,2923,...,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,4098.00,0.00,0.00.1,12.23,11.11,23.34
0,3,0,1990,3,0.409,CO,,275,0,2923,...,,,,,4098.0,0.0,0.0,16.76,0.0,16.76
1,3,0,1990,4,0.0001,CR,,275,0,2923,...,,,,,4098.0,0.0,0.0,0.0,0.0,0.0
2,3,0,1990,8,1.641,SD,,275,0,2923,...,,,Y,,4098.0,0.0,0.0,50.24,17.01,67.25
3,7,0,1990,1,1.266,SD,,25500,0,35000,...,,,Y,,78000.0,0.0,0.0,836.55,150.93,987.48
4,7,0,1990,2,0.5695,CI,,25500,0,35000,...,,,,,78000.0,0.0,0.0,232.75,211.46,444.21


In [2]:
# extract zip folder into a new folder
import zipfile
import os

# zip_file_path = "shortened_appraisal_files.zip"
zip_file_path = "original_data/Appraisal_Roll_History_1990.zip"
extract_folder_path = "data"

# Create the extract folder if it doesn't exist
os.makedirs(extract_folder_path, exist_ok=True)

# Open the zip file and extract its contents to the extract folder
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_folder_path)

The challenge now is to use the `*.TDF` files to create tables.  I can think of two approaches.

1. The TDF files are SQL `CREATE TABLE` statements, so if they are fed to duckdb they should create tables into which the pipe-separated TXT files can be read.  The datatypes may not all match; in that case the type names in the TDF definitions would have to be replaced with their duckdb equivalents before running the SQL.

2. Take the column names out of the TDF files and supply them as the column names while reading the relevant TXT files into duckdb.  This would rely on duckdb's automatic detection of column datatypes (so it would run, but it might guess wrongly and truncate or change data).

I think we should explore approach 1 first.
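
For comparison, approach 2 can be sketched with the standard library alone. The helper names `tdf_column_names` and `read_pipe_rows` are hypothetical, and the sample TDF text and data rows below are assumptions modelled on the USECODE table; the sketch assumes each TDF puts one `Name TYPE` definition per line. Every value is kept as a string, so leading zeros survive.

```python
import csv
import io
import re


def tdf_column_names(tdf_sql):
    """Return the column names of a CREATE TABLE statement.

    Assumes one 'Name TYPE' column definition per line.
    """
    body = tdf_sql[tdf_sql.index("(") + 1 : tdf_sql.rindex(")")]
    names = []
    for line in body.splitlines():
        match = re.match(r"\s*(\w+)\s+\w+", line)
        if match:
            names.append(match.group(1))
    return names


def read_pipe_rows(text_stream, columns):
    """Read pipe-separated rows and label each value with its column name."""
    reader = csv.reader(text_stream, delimiter="|")
    return [dict(zip(columns, row)) for row in reader]


# Hypothetical sample input, not copied from the client files.
sample_tdf = (
    "CREATE TABLE TXBC_SUM_1990_USECODE (\n"
    "UseCode VARCHAR(2),\n"
    "Description VARCHAR(30),\n"
    "Category VARCHAR(30));\n"
)
sample_txt = "01|1 Family Dwelling|Residential\n02|Duplex|Residential\n"

columns = tdf_column_names(sample_tdf)
rows = read_pipe_rows(io.StringIO(sample_txt), columns)
print(columns)
print(rows[0])
```

The trade-off is the one noted in the list above: nothing here enforces the declared widths or numeric types, so any type checking would have to be added separately.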

## Creating tables using the TDF files

We have TDF files scattered through the \_A and \_B folders.  I have created one schema (a namespace) per folder and file family: folder_A_TCBC, folder_A_TXBC, folder_B_TCBC, and folder_B_TXBC. So there are tables with the same name in the \_A and \_B schemas.  You can reference the tables as folder_A_TCBC.TCBC_SUM_1990_JURIS and folder_B_TCBC.TCBC_SUM_1990_JURIS.

We can use Python to read each TDF file separately, create the table, and then try to load the matching TXT file.  There is some guidance on processing a directory structure of files using Path and glob here:
http://howisonlab.github.io/datawrangling/faq.html#get-data-from-filenames

In [3]:
import csv
from pathlib import Path
import duckdb

con = duckdb.connect('duckdb-file.db') #  string to persist to disk
cursor = con.cursor()

# file_directory = 'shortened_appraisal_files/'
file_directory = 'data/'
# limit_to_file = 'TCBC_SUM_1990_JURIS'
limit_to_file = '*' # all files

# create schemas
cursor.execute("CREATE SCHEMA IF NOT EXISTS folder_A_TCBC;")
cursor.execute("CREATE SCHEMA IF NOT EXISTS folder_A_TXBC;")
cursor.execute("CREATE SCHEMA IF NOT EXISTS folder_B_TCBC;")
cursor.execute("CREATE SCHEMA IF NOT EXISTS folder_B_TXBC;")
# drop schemas that were created previously
# cursor.execute("DROP SCHEMA IF EXISTS folder_A CASCADE")
# cursor.execute("DROP SCHEMA IF EXISTS folder_B CASCADE")

for filename in Path(file_directory).rglob(limit_to_file + '.TDF'):
    print(filename.parts)
    if "_A" in filename.parts[1] and "TCBC_" in filename.parts[2]:
        schema = "folder_A_TCBC"
    elif "_A" in filename.parts[1] and "TXBC_" in filename.parts[2]:
        schema = "folder_A_TXBC"
    elif "_B" in filename.parts[1] and "TCBC_" in filename.parts[2]:
        schema = "folder_B_TCBC"
    elif "_B" in filename.parts[1] and "TXBC_" in filename.parts[2]:
        schema = "folder_B_TXBC"
    else:
        raise ValueError(f"can't set schema for {filename}")
    
    table_name = schema + "." + Path(filename).stem # e.g., folder_A_TCBC.TCBC_SUM_1990_JURIS

    # read .TDF file into string
    create_table_sql = Path(filename).read_text()
    # Need to alter table name to read in both _A and _B files
    create_table_sql = create_table_sql.replace(Path(filename).stem, table_name)
    
    # Here we have the table creation code in a string, so we can
    # swap datatypes out.
    # tried SMALLDATETIME --> DATETIME but was still giving errors
    # will need to fix this later.
    create_table_sql = create_table_sql.replace("SMALLDATETIME", "TEXT")
    create_table_sql = create_table_sql.replace("CREATE TABLE", "CREATE TABLE IF NOT EXISTS")    
    create_table_sql = f"DROP TABLE IF EXISTS {table_name}; "+ create_table_sql
    

    # execute that SQL with duckdb; this drops any existing table and recreates it
    # print(create_table_sql)  # uncomment to inspect the generated SQL
    cursor.execute(create_table_sql) 

    # copy CSV into duckdb. CSV is the matching .TXT
    path_to_csvpipefile = Path(filename).with_suffix(".TXT")
    # duckdb copy documentation: https://duckdb.org/docs/sql/statements/copy.html
    query = f"COPY {table_name} FROM '{path_to_csvpipefile}' ( DELIMITER '|')"
    # print(query)
    cursor.execute(query)

('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990_GRANT_EXMP.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990_JURIS_EXMP.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990_USECODE.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990_SUSP_INIT.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990_SUSP.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990_JURIS.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990_JURIS.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990_SUSP.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990_SUSP_INIT.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990_JURIS_EXMP.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TXBC_SUM_1990_GRANT_EXMP.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990.TDF')
('data', 'Appraisal_Roll_History_1990_A', 'TCBC_SUM_1990_LEGAL.TDF')
('data', 'Appraisa
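
The schema selection and SQL rewriting in the loop above can also be factored into small pure functions, which makes them testable without a database. This is a sketch under assumptions: `schema_for`, `rewrite_create_table`, and `TYPE_MAP` are hypothetical names, the folder is matched by its `_A` or `_B` ending and the family by the file name prefix (slightly stricter than the `in` checks above), and only the `SMALLDATETIME` to `TEXT` mapping comes from this notebook.

```python
import re
from pathlib import Path

# Source type -> replacement type. Only SMALLDATETIME is known to need
# mapping in this notebook; any further entries would be assumptions.
TYPE_MAP = {"SMALLDATETIME": "TEXT"}


def schema_for(tdf_path):
    """Derive the schema name from the parent folder and the file prefix."""
    path = Path(tdf_path)
    folder = path.parts[-2]
    family = path.stem.split("_", 1)[0]
    if folder.endswith("_A"):
        side = "A"
    elif folder.endswith("_B"):
        side = "B"
    else:
        side = None
    if side is None or family not in ("TCBC", "TXBC"):
        raise ValueError(f"can't set schema for {tdf_path}")
    return f"folder_{side}_{family}"


def rewrite_create_table(sql, stem, table_name):
    """Qualify the table name and swap datatypes that need mapping."""
    # Replace the name only in the CREATE TABLE clause, not elsewhere in the text.
    sql = re.sub(
        rf"CREATE\s+TABLE\s+{re.escape(stem)}\b",
        f"CREATE TABLE {table_name}",
        sql,
        count=1,
    )
    for source_type, target_type in TYPE_MAP.items():
        # Word boundaries avoid touching longer identifiers that contain the type name.
        sql = re.sub(rf"\b{source_type}\b", target_type, sql)
    return sql
```

Restricting the name replacement to the `CREATE TABLE` clause guards against a column or comment that happens to contain the table name.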

Convert the TDF definitions into table definitions for dbdocs.

In [4]:
# set up sql for dbdocs
for filename in Path(file_directory).rglob(limit_to_file + '.TDF'):

    # read .TDF file into string
    dbdocs_create_table = Path(filename).read_text()

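    # Remove the comma that follows a closing parenthesis, e.g. "VARCHAR(10)," (plain str.replace, no regex).
    # Column types without parentheses, such as TEXT, keep their trailing comma.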
    dbdocs_create_table = dbdocs_create_table.replace("),", ")")

    # Replacements for dbdocs
    dbdocs_create_table = dbdocs_create_table.replace("CREATE TABLE", "TABLE")
    dbdocs_create_table = dbdocs_create_table.replace("SMALLDATETIME", "TEXT")
    dbdocs_create_table = dbdocs_create_table.replace(" (", "{ ")
    dbdocs_create_table = dbdocs_create_table.replace(");", " }")
    
    # Print the updated SQL table code
    print(dbdocs_create_table)


TABLE TCBC_SUM_1990_GRANT_EXMP{ 
AcctNum VARCHAR(10)
SufxId VARCHAR(4)
TaxYear VARCHAR(4)
ExemType VARCHAR(1)
ExemNum VARCHAR(1) }

TABLE TXBC_SUM_1990_JURIS_EXMP{ 
Parcel VARCHAR(10)
OwnrId VARCHAR(4)
TaxYear VARCHAR(4)
Juris VARCHAR(2)
ExemType VARCHAR(1)
ExemNum VARCHAR(1)
ExemAmt NUMERIC(11,0) }

TABLE TXBC_SUM_1990_USECODE{ 
UseCode VARCHAR(2)
Description VARCHAR(30)
Category VARCHAR(30) }

TABLE TXBC_SUM_1990_SUSP_INIT{ 
Parcel VARCHAR(10)
OwnrId VARCHAR(4)
TaxYear VARCHAR(4)
ARBInit VARCHAR(3) }

TABLE TCBC_SUM_1990_SUSP{ 
AcctNum VARCHAR(10)
SufxId VARCHAR(4)
TaxYear VARCHAR(4)
InformalDate TEXT,
FormalDate TEXT,
HearingType VARCHAR(1)
HearingOrigType VARCHAR(1)
HearingReasonCode VARCHAR(2)
DocketYear VARCHAR(4)
DocketNum VARCHAR(6)
InformalArea VARCHAR(1)
InformalApprInit VARCHAR(3)
ValApprInit VARCHAR(3)
AgentARBTemp VARCHAR(4)
LateStatus VARCHAR(1)
SuppFlag VARCHAR(1)
HoldFlag VARCHAR(1)
AreaChgFlag VARCHAR(1)
PrintFlag VARCHAR(1)
UseInfoAddrFlag VARCHAR(1)
CtrlAcctNum VARCH
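
In the output above, the `TEXT` columns keep their trailing comma because the replacement only targets `),`. A line-based variant avoids that by stripping the separator from every column line. This is a minimal sketch: `tdf_to_dbml` is a hypothetical name, the one-definition-per-line TDF layout is an assumption, and the result has not been validated against dbdocs.

```python
import re


def tdf_to_dbml(create_sql):
    """Convert 'CREATE TABLE name (col TYPE, ...);' into a dbdocs-style table block."""
    match = re.search(
        r"CREATE\s+TABLE\s+(\w+)\s*\((.*)\)\s*;?\s*$", create_sql, flags=re.S
    )
    if match is None:
        raise ValueError("not a CREATE TABLE statement")
    name, body = match.groups()
    columns = []
    for line in body.splitlines():
        # Drop the separator comma at the end of the line; commas inside
        # a type such as NUMERIC(11,0) are left untouched.
        line = line.strip().rstrip(",")
        if line:
            columns.append("  " + line.replace("SMALLDATETIME", "TEXT"))
    return "TABLE " + name + " {\n" + "\n".join(columns) + "\n}"


# Hypothetical sample definition in the assumed TDF layout.
sample = "CREATE TABLE T (\nInformalDate SMALLDATETIME,\nExemAmt NUMERIC(11,0));\n"
print(tdf_to_dbml(sample))
```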

# Data / Files exploration

An interesting finding: the files in the Appraisal_Roll_History_1990_A and Appraisal_Roll_History_1990_B folders all have identical contents. So why did we receive the same dataset twice?

In [5]:
import os

folder_A = 'data/Appraisal_Roll_History_1990_A'
folder_B = 'data/Appraisal_Roll_History_1990_B'

for filename in os.listdir(folder_A):
    file_A = os.path.join(folder_A, filename)
    file_B = os.path.join(folder_B, filename)

    with open(file_A, "r") as a, open(file_B, "r") as b:
        data_A = a.read()
        data_B = b.read()

    if data_A == data_B:
        print(f"{filename}: Files are identical.")
    else:
        print(f"{filename}: Files are different.")


TCBC_SUM_1990_GRANT_EXMP.TDF: Files are identical.
TXBC_SUM_1990_SUSP.TXT: Files are identical.
TXBC_SUM_1990_SUSP.IDX: Files are identical.
TXBC_SUM_1990_JURIS.TXT: Files are identical.
TXBC_SUM_1990_JURIS.IDX: Files are identical.
TXBC_SUM_1990_JURIS_EXMP.TDF: Files are identical.
TXBC_SUM_1990_USECODE.TDF: Files are identical.
TCBC_SUM_1990_SUSP.IDX: Files are identical.
TCBC_SUM_1990_SUSP.TXT: Files are identical.
TCBC_SUM_1990_JURIS.IDX: Files are identical.
TCBC_SUM_1990_JURIS.TXT: Files are identical.
TXBC_SUM_1990_SUSP_INIT.TDF: Files are identical.
TXBC_SUM_1990_SUSP_INIT.IDX: Files are identical.
TXBC_SUM_1990_SUSP_INIT.TXT: Files are identical.
TCBC_SUM_1990_SUSP.TDF: Files are identical.
TCBC_SUM_1990_JURIS.TDF: Files are identical.
TXBC_SUM_1990_USECODE.TXT: Files are identical.
RPT_TX1333_1990_20040701_162440.TXT: Files are identical.
TXBC_SUM_1990_JURIS.TDF: Files are identical.
TXBC_SUM_1990_JURIS_EXMP.IDX: Files are identical.
TXBC_SUM_1990_JURIS_EXMP.TXT: Files are id
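
The comparison above reads every file as text and assumes each name exists in both folders. A variant using the standard library's `filecmp` compares bytes, so it also works for files that are not valid text, and it reports names that appear on one side only. `compare_folders` is a hypothetical helper name.

```python
import filecmp
from pathlib import Path


def compare_folders(folder_a, folder_b):
    """Map each file name to 'identical', 'different', 'only in A', or 'only in B'."""
    names_a = {p.name for p in Path(folder_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(folder_b).iterdir() if p.is_file()}
    result = {}
    for name in sorted(names_a | names_b):
        if name not in names_b:
            result[name] = "only in A"
        elif name not in names_a:
            result[name] = "only in B"
        else:
            # shallow=False forces a byte-by-byte comparison of the contents.
            same = filecmp.cmp(
                Path(folder_a, name), Path(folder_b, name), shallow=False
            )
            result[name] = "identical" if same else "different"
    return result
```

Filtering the result for anything other than `"identical"` gives a quick answer to whether the two folders really are duplicates.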

# SQL for analysis

In [6]:
# setup from https://duckdb.org/docs/guides/python/jupyter.html
import duckdb
import pandas as pd
# No need to import duckdb_engine
#  jupysql will auto-detect the driver needed based on the connection string!

# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

In [7]:
pd.options.display.max_columns = None

In [8]:
%sql duckdb:///duckdb-file.db

In [9]:
%%sql
SHOW TABLES -- no schema name

Unnamed: 0,name
0,TCBC_SUM_1990
1,TCBC_SUM_1990
2,TCBC_SUM_1990_CFOR
3,TCBC_SUM_1990_CFOR
4,TCBC_SUM_1990_GRANT_EXMP
5,TCBC_SUM_1990_GRANT_EXMP
6,TCBC_SUM_1990_JURIS
7,TCBC_SUM_1990_JURIS
8,TCBC_SUM_1990_JURIS_EXMP
9,TCBC_SUM_1990_JURIS_EXMP


Hey, duckdb exposes Postgres-compatible catalog views such as pg_catalog.pg_tables, so one can use the same queries as in Postgres to find the tables together with their schema names.

In [10]:
%%sql
SELECT schemaname AS schema_name, tablename AS table_name
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog'
AND schemaname != 'information_schema'
ORDER BY schemaname, tablename ASC;

Unnamed: 0,schema_name,table_name
0,folder_A_TCBC,TCBC_SUM_1990
1,folder_A_TCBC,TCBC_SUM_1990_CFOR
2,folder_A_TCBC,TCBC_SUM_1990_GRANT_EXMP
3,folder_A_TCBC,TCBC_SUM_1990_JURIS
4,folder_A_TCBC,TCBC_SUM_1990_JURIS_EXMP
5,folder_A_TCBC,TCBC_SUM_1990_LEGAL
6,folder_A_TCBC,TCBC_SUM_1990_SUSP
7,folder_A_TCBC,TCBC_SUM_1990_SUSP_INIT
8,folder_A_TXBC,TXBC_SUM_1990
9,folder_A_TXBC,TXBC_SUM_1990_CFOR


TCBC_SUM_1990_JURIS should have 134,933 rows in total. Earlier, rows accumulated every time the load cell was rerun; this is fixed now, because each table is dropped before it is recreated.

JURIS probably means "jurisdiction", i.e. a legal area.  This makes sense because the columns are about tax rates (and metadata about tax status, like 'freeport').  So possibly this file is a list of jurisdictions to which a parcel can belong (and therefore holds the rates that would apply to the parcel?). It would be surprising to have 134,933 different jurisdictions, though; the repeated AcctNum values in the output below suggest instead that each row pairs one account with one jurisdiction.

In [11]:
%%sql
SELECT * FROM folder_A_TCBC.TCBC_SUM_1990_JURIS;

Unnamed: 0,AcctNum,SufxId,TaxYear,Juris,Rate,JurisType,JurisCED,MdseVal,FrptVal,FFEVal,VehVal,LVehVal,LeaseVal,OthrVal,TotVal,ExmpVal,BctChgFlag,ExmpStatFlag,JurisPctFlag,FreeportFlag,FreeportStatus,AssessVal,TaxFrzVal,TaxBeforeFrz,GenFundTax,SinkFundTax,TotTax
0,0000000003,0000,1990,02,0.56950,CI,,275,0,2923,900,0,0,0,4098,0,,,,,,4098.00,0.00,0.00,12.23,11.11,23.34
1,0000000003,0000,1990,03,0.40900,CO,,275,0,2923,900,0,0,0,4098,0,,,,,,4098.00,0.00,0.00,16.76,0.00,16.76
2,0000000003,0000,1990,04,0.00010,CR,,275,0,2923,900,0,0,0,4098,0,,,,,,4098.00,0.00,0.00,0.00,0.00,0.00
3,0000000003,0000,1990,08,1.64100,SD,,275,0,2923,900,0,0,0,4098,0,,,,Y,,4098.00,0.00,0.00,50.24,17.01,67.25
4,0000000007,0000,1990,01,1.26600,SD,,25500,0,35000,17500,0,0,0,78000,0,,,,Y,,78000.00,0.00,0.00,836.55,150.93,987.48
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
134928,0000061017,0000,1990,01,1.26600,SD,,0,0,0,100653,0,0,0,100653,0,,,,Y,,100653.00,0.00,0.00,1079.51,194.76,1274.27
134929,0000061017,0000,1990,02,0.56950,CI,,0,0,0,100653,0,0,0,100653,0,,,,,,100653.00,0.00,0.00,300.35,272.87,573.22
134930,0000061017,0000,1990,03,0.40900,CO,,0,0,0,100653,0,0,0,100653,0,,,,,,100653.00,0.00,0.00,411.67,0.00,411.67
134931,0000061017,0000,1990,04,0.00010,CR,,0,0,0,100653,0,0,0,100653,0,,,,,,100653.00,0.00,0.00,0.10,0.00,0.10
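
One way to test what a row represents is to compare the row count with the number of distinct accounts and distinct Juris codes. The sketch below uses Python's built-in `sqlite3` as a stand-in so that it runs without extra packages, loaded with the first five rows shown above; the table name `juris` is hypothetical, and the same `SELECT` should carry over to `folder_A_TCBC.TCBC_SUM_1990_JURIS` in duckdb.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE juris (AcctNum TEXT, SufxId TEXT, Juris TEXT, JurisType TEXT)"
)
# First five rows of the query output above (key columns only).
con.executemany(
    "INSERT INTO juris VALUES (?, ?, ?, ?)",
    [
        ("0000000003", "0000", "02", "CI"),
        ("0000000003", "0000", "03", "CO"),
        ("0000000003", "0000", "04", "CR"),
        ("0000000003", "0000", "08", "SD"),
        ("0000000007", "0000", "01", "SD"),
    ],
)
n_rows, n_accounts, n_juris = con.execute(
    "SELECT COUNT(*), COUNT(DISTINCT AcctNum), COUNT(DISTINCT Juris) FROM juris"
).fetchone()
print(n_rows, n_accounts, n_juris)
```

If the full table shows far fewer distinct Juris codes than rows, the file is better read as account-by-jurisdiction tax lines than as a list of jurisdictions.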


The table without a suffix (TCBC_SUM_1990) has only 28,086 rows.  Perhaps these are accounts for individual taxpayers, though an individual taxpayer can have multiple account numbers.

In [12]:
%%sql
SELECT * FROM folder_A_TCBC.TCBC_SUM_1990;

Unnamed: 0,AcctNum,SufxId,TaxYear,RunDate,KeyCode,LoanCo,LoanNum,ExmpCode,LocStreet,LocHouse,LocFrac,LocAlpha,LocUnit,LocZip,FmtLoc,RendFlag,TotSqft,OthAcctDist,OthAcctNum,AddrSuppressCode,Area,PropType,NOAVPrintCode,AVChangeFlag,SICCode,CPPRRcvdCode,ApprSelect,MarinaAirfield,LinkParcel,PTDCode,PTDComplexFlag,ApprInit,ApprInit2,OwnerName,FirmName,ValSetFlag,ValSetInit,AgentTCAD,AgentARB,AgentCOLL,Zip5,Zip4,Zip2,MailCnt,MailAddr1,MailAddr2,MailAddr3,MailAddr4,MailAddr5,ComboRate
0,0000000003,0000,1990,1992-07-06,,0,,,MO-PAC CI,001004,,,00101,78746,1004 MO-PAC CI 101,,0,,,,P,C,,,,N,,,,L1,,006,,ARCHER JOSEPH C-PRES,A & A REALTY TAX SERVICE INC,,,0010,0010,,78767,0971,,4,A & A REALTY TAX SERVICE,INC,P O BOX 971,AUSTIN TX 78767-0971,,2.61950
1,0000000007,0000,1990,1992-07-06,,0,,,5 ST E,002811,,,,MULTI,2811 5 ST E,,0,,,,P,C,,,,N,,,,L1,,008,,ORTIZ ALFRED,A & J CARPET/JANITORIAL SERVICE INC,,,,,,78744,,,4,A & J CARPET/JANITORIAL,SERVICE INC,4122 TODD LANE,AUSTIN TX 78744,,2.29450
2,0000000014,0000,1990,1992-07-06,,0,,,KENTSHIRE CI,000603,,,,78704,603 KENTSHIRE CI,,0,,,,P,C,,,,Y,,,,L1,,005,,BOUTWELL G. DAVID,A A A COMMERCIAL STRIPING,,,,,,78704,5615,,4,A A A COMMERCIAL,STRIPING,603 KENTSHIRE CIR #B,AUSTIN TX 78704-5615,,2.29450
3,0000000015,0000,1990,1992-07-06,,0,,,BEN WHITE BV E,004818,,,00202,MULTI,4818 BEN WHITE BV E 202,,0,,,,P,C,,,,Y,,,,L1,,008,,SYMANK ERVIN W-PRES,A A A CONSTRUCTION INSPECTIONS INC,,,,,,78759,,,4,A A A CONSTRUCTION,INSPECTIONS INC,8500 NORTH MOPAC #813,AUSTIN TX 78759,,2.08850
4,0000000018,0000,1990,1992-07-06,,0,,,BURNET RD,004402,,,,MULTI,4402 BURNET RD,,0,,,,P,C,,,,N,,,,L1,,002,,LINVILLE HAROLD PRES,A A A FILTER SERVICE CORP,,,,,,78765,4674,,4,A A A FILTER SERVICE,CORP,P O BOX 4674,AUSTIN TX 78765-4674,,2.29450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28081,0000060425,0000,1990,1992-07-06,,0,,,RED RIVER ST,000912,,,,MULTI,912 RED RIVER ST,,0,,,,P,C,,,,N,,,,L1,,007,,JOSEPH SALEM,JOSEPH SALEM,,,,,,78703,,,3,JOSEPH SALEM,1500 SCENIC DR #106,AUSTIN TX 78703,,,2.29450
28082,0000060456,0000,1990,1992-07-06,,0,,,WILLIAM CANNON DR W,000414,,,00008,,414 WILLIAM CANNON DR W 8,,0,,,,P,C,,,5992,N,,,,L1,,005,,COX JAMES ARNOLD,FLOWERS BY HAND,,,,,,78745,5664,,3,FLOWERS BY HAND,414 W WILLIAM CANNON #8,AUSTIN TX 78745-5664,,,2.29450
28083,0000060832,0000,1990,1992-07-06,,0,,,AMERICAN DR,003404,,,,78641,3404 AMERICAN DR,,0,,,,P,B,,,,N,,,,L1,,003,,THE PRIME GROUP,THE PRIME GROUP,,,,,,78645,6500,,3,THE PRIME GROUP,3404 AMERICAN DR,LAGO VISTA TX 78645-6500,,,2.61850
28084,0000060999,0000,1990,1992-07-06,,0,,,YAGER LN W,000615,,,,78753,615 YAGER LN W,,0,,,,P,C,,,,Y,,,,L1,,010,,CONCRETE CORING CO INC,CONCRETE CORING CO INC,,,,,,78753,,,4,CONCRETE CORING CO INC,ATTN: MARTHA TURNER,615 YAGER LANE WEST,AUSTIN TX 78753,,1.91400


Skip to the middle of the table to look at the TCBC summary file in more detail.

In [13]:
%%sql
SELECT * FROM folder_A_TCBC.TCBC_SUM_1990
LIMIT 100
OFFSET 20000;

Unnamed: 0,AcctNum,SufxId,TaxYear,RunDate,KeyCode,LoanCo,LoanNum,ExmpCode,LocStreet,LocHouse,LocFrac,LocAlpha,LocUnit,LocZip,FmtLoc,RendFlag,TotSqft,OthAcctDist,OthAcctNum,AddrSuppressCode,Area,PropType,NOAVPrintCode,AVChangeFlag,SICCode,CPPRRcvdCode,ApprSelect,MarinaAirfield,LinkParcel,PTDCode,PTDComplexFlag,ApprInit,ApprInit2,OwnerName,FirmName,ValSetFlag,ValSetInit,AgentTCAD,AgentARB,AgentCOLL,Zip5,Zip4,Zip2,MailCnt,MailAddr1,MailAddr2,MailAddr3,MailAddr4,MailAddr5,ComboRate
0,0000046663,0000,1990,1992-07-06,,0,,,SHOAL CREEK BV,008900,,,00103,MULTI,8900 SHOAL CREEK BV 103,,0,,,,P,C,,,7251,N,,,,L1,,003,,TRAVIS CTY SHOE HOSPITAL,AUSTIN SHOE HOSPITAL,,,,,,78758,6840,,4,AUSTIN SHOE HOSPITAL,%TRAVIS CTY SHOE HOSP,8900 SHOAL CREEK BV #103,AUSTIN TX 78758-6840,,2.29450
1,0000046666,0000,1990,1992-07-06,,0,,,LA POSADA DR,001016,,,00174,78752,1016 LA POSADA DR 174,,0,,,,P,C,,,6142,Y,,,,L1,,009,012,T O A CREDIT UNION,T O A CREDIT UNION,,,,,,78752,3895,,4,T O A CREDIT UNION,% MANAGER,1016 LA POSADA DR #174,AUSTIN TX 78752-3895,,2.29450
2,0000046667,0000,1990,1992-07-06,,0,,,TOMANET TR,012412,,,,,12412 TOMANET TR,Y,0,,,,P,C,,,8351,Y,,,,L1,,004,,PIMENTEL RICHARD-PRES,PARMER LANE DAY CARE,,,,,,78758,2412,,3,PARMER LANE DAY CARE,12412 TOMANET TRAIL,AUSTIN TX 78758-2412,,,1.75500
3,0000046668,0000,1990,1992-07-06,,0,,,ANDERSON LN W,001810,,,,MULTI,1810 ANDERSON LN W,,0,,,,P,C,,,5942,N,,,,L1,,009,008,WADSWORTH THOMAS T,BOOK EXCHANGE THE,,,,,,78757,1338,,3,BOOK EXCHANGE THE,1810 WEST ANDERSON LN,AUSTIN TX 78757-1338,,,2.29450
4,0000046680,0000,1990,1992-07-06,,0,,,HIDALGO ST,003411,,,,78702,3411 HIDALGO ST,Y,0,,,,P,C,,,5141,Y,,,,L1,,008,,GRANT LYDICK BEVERAGE CO,SEVEN UP LIKE BOTTLING,,,0011,0011,0011,78220,0243,,4,SEVEN UP LIKE BOTTLING,% GRANT LYDICK BEVERAGE,P O BOX 200243,SAN ANTONIO TX 78220-0243,,2.29450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0000046847,0002,1990,1992-07-06,,0,,,CAPITAL OF TX HY N,009020,,,00335,MULTI,9020 CAPITAL OF TX HY N 335,,0,,,,P,C,,,2222,Y,,,,L1,,003,,AMERICAN NETWORK LEASING,AMERICAN NETWORK LEASING,,,,,,75024,,,5,AMERICAN NETWORK LEASING,LEASE #882119,% EDS (S TAX) - PPT,5400 LEGACY DR,PLANO TX 75024,2.29450
96,0000046847,0003,1990,1992-07-06,,0,,,THERMAL DR,013804,,C,,,13804C THERMAL DR,,0,,,,P,C,,,2222,Y,,,,L1,,004,008,AMERICAN NETWORK LEASING,AMERICAN NETWORK LEASING,,,,,,75024,,,5,AMERICAN NETWORK LEASING,LEASE #325,% EDS (S TAX) - PPT,5400 LEGACY DR,PLANO TX 75024,1.91400
97,0000046847,0004,1990,1992-07-06,,0,,,BEE CAVES RD,004015,,A,,MULTI,4015A BEE CAVES RD,,0,,,,P,C,,,2222,Y,,,,L1,,006,008,AMERICAN NETWORK LEASING,AMERICAN NETWORK LEASING,,,,,,75024,,,5,AMERICAN NETWORK LEASING,LEASE # 692,% EDS (S TAX) - PPT,5400 LEGACY DR,PLANO TX 75024,2.35130
98,0000046847,0006,1990,1992-07-06,,0,,,CONGRESS AV S,007110,,B,,MULTI,7110B CONGRESS AV S,,0,,,,P,C,,,2222,Y,,,,L1,,005,008,AMERICAN NETWORK LEASING,AMERICAN NETWORK LEASING,,,,,,75024,,,5,AMERICAN NETWORK LEASING,LEASE #409,% EDS (S TAX) - PPT,5400 LEGACY DR,PLANO TX 75024,2.29450


The table without a suffix (TXBC_SUM_1990) has 255,593 rows.  Perhaps these are accounts for individual taxpayers, though an individual taxpayer can have multiple parcels.

In [14]:
%%sql
SELECT * FROM folder_A_TXBC.TXBC_SUM_1990
LIMIT 100
OFFSET 15000;

Unnamed: 0,Parcel,OwnrId,TaxYear,RunDate,KeyCode,LoanCo,LoanNum,ExmpCode,ExmpLandCode,ExmpImprCode,LocStreet,LocHouse,LocFrac,LocAlpha,LocUnit,LocZip,FmtLoc,RendFlag,TotSqft,AYOC,EYOC,AgUseCode,AgUseMulti,LandAreaCode,LandAreaVal,HmExPctAdj,LandRecCnt,ImprRecCnt,DeedType,DeedVol,DeedPg,DeedDate,DeedDocCode,DeedDocId,MsegCode,MsegGrp,RegionCode,LinkCode,LandGrp,O65QualDate,O65RemoveDate,O65FrzDate,LandCostVal,ImprCostVal,TotCostVal,LandRecordVal,ImprRecordVal,TotRecordVal,OthAcctDist,OthAcctNum,AddrSuppressCode,Area,PropType,NOAVPrintCode,AVChangeFlag,PTDCode,PTDComplexFlag,PTDLandCode,PTDImprCode,LandAdjFlag,PctOwnerFlag,FloorsFlag,PctCompFlag,GradeFlag,DeprAppldFlag,DeckFlag,PoolFlag,UseCode,UseMulti,UseClass,ChgReasonLand,ChgReasonImpr,RefParcel1,RefParcel2,RefParcel3,MohoLabelPfx,MohoLabel1,MohoLabel2,ApprInit,ApprInit2,OwnerName,DBAName,ValSetFlag,ValSetInit,AgentTCAD,AgentARB,AgentCOLL,Zip5,Zip4,Zip2,MailCnt,MailAddr1,MailAddr2,MailAddr3,MailAddr4,MailAddr5,ComboRate
0,0115230349,0001,1990,1992-06-13,,0,,,,,CASTLE RIDGE RD,000803,,,,78746,803 CASTLE RIDGE RD,,2586,,1979,,,,0.000,0.00,0,0,,00000,00000,1900-00-00,,,N0810,,,,,,,,0,0,0,0,0,0,,,,2,R,,,B2,,B2,B2,,,,,,,,,02,,,,,,,,,,,,,CRAIG CARRIE GAIL,,,,,,,78746,5105,,3,CRAIG CARRIE GAIL,803 CASTLE RIDGE ROAD,AUSTIN TX 78746-5105,,,2.16500
1,0115230350,0000,1990,1992-06-13,EX,0,,,,,CASTLE RIDGE RD,,,,,78746,CASTLE RIDGE RD,,0,,,,,,0.000,0.00,0,0,,00000,00000,1900-00-00,,,,,,,,,,,0,0,0,0,0,0,,,,2,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,WESTLAKE HOMEOWNERS,,,,,,,00000,,,2,WESTLAKE HOMEOWNERS,AUSTIN TX 00000,,,,0.00000
2,0115230351,0000,1990,1992-06-13,,0,,,,,CAPITAL OF TX HY S,000000,,,,MULTI,CAPITAL OF TX HY S,,0,,,,,A,1.001,0.00,0,0,,00000,00000,1900-00-00,,,,,,,,,,,0,0,0,0,0,0,,,,2,R,,,C1,,C1,,,,,,,,,,,,,,,,,,,,,,,STEIN GERALD P,,,,0001,0001,0001,77027,9311,,3,STEIN GERALD P,45 BRIARHOLLOW NO 8,HOUSTON TX 77027-9311,,,2.16500
3,0115230352,0000,1990,1992-06-13,,0,,,,,CAPITAL OF TX HY S,000720,,,,MULTI,720 CAPITAL OF TX HY S,,0,,,,,A,4.812,0.00,0,0,,00000,00000,1900-00-00,,,,,,,,,,,0,0,0,0,0,0,,,,2,R,,,C1,,C1,,,,,,,,,,,,,,,,,,,,,,,BEXAR SAVINGS ASSOCIATION,,,,0002,0002,0002,78217,0770,,4,BEXAR SAVINGS,ASSOCIATION,P O BOX 17770,SAN ANTONIO TX 78217-0770,,2.16500
4,0115230353,0000,1990,1992-06-13,,0,,,,,LASCIMAS PKWY,000806,,,,,806 LASCIMAS PKWY,,0,,,,,A,19.289,0.00,0,0,,00000,00000,1900-00-00,,,,,,,,,,,0,0,0,0,0,0,,,,2,R,,,C1,,C1,,,,,,,,,,,,,,,,,,,,,,,BEXAR SAVINGS ASSOCIATION,,,,0002,0002,0002,78217,0770,,4,BEXAR SAVINGS,ASSOCIATION,P O BOX 17770,SAN ANTONIO TX 78217-0770,,2.16500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0115280202,0001,1990,1992-06-13,,0,,,,,CANYON RIM DR,000610,,,,78746,610 CANYON RIM DR,,3452,,1962,,,A,0.592,0.00,0,0,,00000,00000,1900-00-00,,,N0870,,,,,,,,0,0,0,0,0,0,,,,2,R,,,A1,,A1,A1,,,,,,,,,01,,,,,,,,,,,,,FEDERAL DEPOSIT INSURANCE CORP,,,,0002,0002,0002,20429,,,3,FDIC,550 17TH STREET N W,WASHINGTON DC 20429,,,2.08000
96,0115280301,0000,1990,1992-06-13,,32,60021016285,,,,CANYON RIM DR,000575,,,,78746,575 CANYON RIM DR,,2021,,1962,,,A,1.140,0.00,0,0,,00000,00000,1900-00-00,,,N0870,,,,,,,,0,0,0,0,0,0,,,,2,R,,,A1,,A1,A1,,,,,,,,,01,,,,,,,,,,,,,HENRY TRENTON B & ANGELA M SMI,,,,,,,78746,,,4,HENRY TRENTON B &,ANGELA M SMITH,575 CANYON RIM DRIVE,AUSTIN TX 78746,,2.08000
97,0115280401,0000,1990,1992-06-13,,0,,,,,WHIPPOORWILL TR,,,,,MULTI,WHIPPOORWILL TR,,0,,,,,A,0.737,0.00,0,0,,00000,00000,1900-00-00,,,N0870,,,,,,,,0,0,0,0,0,0,,,,2,R,,,C1,,C1,,,,,,,,,,,,,,,,,,,,,,,MUELLER DONALD P,,,,,,,78212,0637,,3,MUELLER DONALD P,P O BOX 12637,SAN ANTONIO TX 78212-0637,,,2.08000
98,0115280402,0000,1990,1992-06-13,,32,60071015826,,,,CANYON RIM DR,000545,,,,78746,545 CANYON RIM DR,,2800,,1967,,,A,1.030,0.00,0,0,,00000,00000,1900-00-00,,,N0870,,,,,,,,0,0,0,0,0,0,,,,2,R,,,A1,,A1,A1,,,,,,,,,01,,,,,,,,,,,,,RADEMACHER HAROLD W & PATSY C,,,,,,,78746,5022,,4,RADEMACHER HAROLD W &,PATSY C,545 CANYON RIM DR,AUSTIN TX 78746-5022,,2.08000
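
The DeedDate column in the output above holds `1900-00-00`, which is not a valid calendar date. That may be why the earlier attempt to load `SMALLDATETIME` columns as `DATETIME` failed, although this notebook does not confirm it. Since the columns are currently loaded as `TEXT`, one option is to parse them afterwards and treat such placeholders as missing. `parse_roll_date` is a hypothetical helper, and the `YYYY-MM-DD` format is assumed from the values displayed above.

```python
from datetime import date, datetime


def parse_roll_date(text):
    """Return a date, or None when the value is blank or not a real calendar date."""
    text = (text or "").strip()
    if not text:
        return None
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # Placeholders such as 1900-00-00 end up here (month and day of zero).
        return None


print(parse_roll_date("1992-06-13"))
print(parse_roll_date("1900-00-00"))
```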


In [15]:
%%sql
SELECT * FROM folder_A_TXBC.TXBC_SUM_1990_USECODE;

Unnamed: 0,UseCode,Description,Category
0,01,1 Family Dwelling,Residential
1,02,Duplex,Residential
2,03,Tri‐Plex,Residential
3,04,Four‐Plex,Residential
4,11,MOHO (Mobile Home) Single PP,Residential
...,...,...,...
82,83,Service/Repair Garage,Industrial
83,84,Mini‐Lube/Tune Up,Industrial
84,86,Auto Car Wash,Industrial
85,,,


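The combination of Parcel and OwnrId is unique: when the groups are sorted by count in descending order, the largest group has a single row.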

In [16]:
%%sql
SELECT Parcel, OwnrId, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1, 2
ORDER BY COUNT(*) DESC;

Unnamed: 0,Parcel,OwnrId,count_star()
0,0436300603,0000,1
1,0436340402,0000,1
2,0438070305,0000,1
3,0438070721,0000,1
4,0438190411,0000,1
...,...,...,...
255588,0233100420,0000,1
255589,0265040406,0000,1
255590,0402231550,0000,1
255591,0415570126,0000,1
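
A more direct uniqueness check returns only the key values that occur more than once, so an empty result means the key is unique. The sketch below uses the built-in `sqlite3` as a stand-in for duckdb; `duplicate_keys` and the table name `tcbc` are hypothetical. It is demonstrated on AcctNum and SufxId values from the TCBC_SUM_1990 output shown earlier, where account 0000046847 appears with several suffixes.

```python
import sqlite3


def duplicate_keys(con, table, key_columns):
    """Return (key..., count) tuples for key values that occur more than once."""
    cols = ", ".join(key_columns)
    # Table and column names are interpolated directly, so pass trusted names only.
    sql = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return con.execute(sql).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tcbc (AcctNum TEXT, SufxId TEXT)")
con.executemany(
    "INSERT INTO tcbc VALUES (?, ?)",
    [
        ("0000046680", "0000"),
        ("0000046847", "0002"),
        ("0000046847", "0003"),
        ("0000046847", "0004"),
        ("0000046847", "0006"),
    ],
)
print(duplicate_keys(con, "tcbc", ["AcctNum"]))
print(duplicate_keys(con, "tcbc", ["AcctNum", "SufxId"]))
```

On these sample rows AcctNum alone repeats while the AcctNum and SufxId pair does not, which mirrors the Parcel and OwnrId pairing on the TXBC side.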


Look for columns that might relate to location:

In [17]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name LIKE '%Loc%'
ORDER BY table_schema, table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocStreet,9,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocHouse,10,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocFrac,11,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocAlpha,12,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocUnit,13,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocZip,14,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,FmtLoc,15,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,LocStreet,11,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,LocHouse,12,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,LocFrac,13,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [18]:
%%sql
SELECT MailCnt,
       MailAddr1,
       MailAddr2,
       MailAddr3, 
       MailAddr4,
       MailAddr5 
FROM folder_A_TXBC.TXBC_SUM_1990
LIMIT 100
OFFSET 15000;

Unnamed: 0,MailCnt,MailAddr1,MailAddr2,MailAddr3,MailAddr4,MailAddr5
0,3,CRAIG CARRIE GAIL,803 CASTLE RIDGE ROAD,AUSTIN TX 78746-5105,,
1,2,WESTLAKE HOMEOWNERS,AUSTIN TX 00000,,,
2,3,STEIN GERALD P,45 BRIARHOLLOW NO 8,HOUSTON TX 77027-9311,,
3,4,BEXAR SAVINGS,ASSOCIATION,P O BOX 17770,SAN ANTONIO TX 78217-0770,
4,4,BEXAR SAVINGS,ASSOCIATION,P O BOX 17770,SAN ANTONIO TX 78217-0770,
...,...,...,...,...,...,...
95,3,FDIC,550 17TH STREET N W,WASHINGTON DC 20429,,
96,4,HENRY TRENTON B &,ANGELA M SMITH,575 CANYON RIM DRIVE,AUSTIN TX 78746,
97,3,MUELLER DONALD P,P O BOX 12637,SAN ANTONIO TX 78212-0637,,
98,4,RADEMACHER HAROLD W &,PATSY C,545 CANYON RIM DR,AUSTIN TX 78746-5022,


Location information is important for this research, so we focus on it here. FmtLoc holds the full formatted address of the record, and the other columns (LocStreet, LocHouse, LocFrac, LocAlpha, LocUnit, and LocZip) hold the same address split into its components. This applies to both TCBC and TXBC records.

The query below shows a sample from the TCBC_SUM_1990 table in the folder_A_TCBC schema.

In [19]:
%%sql
SELECT FmtLoc, 
       LocStreet, 
       LocHouse, 
       LocFrac, 
       LocAlpha, 
       LocUnit, 
       LocZip 
       FROM folder_A_TCBC.TCBC_SUM_1990;

Unnamed: 0,FmtLoc,LocStreet,LocHouse,LocFrac,LocAlpha,LocUnit,LocZip
0,1004 MO-PAC CI 101,MO-PAC CI,001004,,,00101,78746
1,2811 5 ST E,5 ST E,002811,,,,MULTI
2,603 KENTSHIRE CI,KENTSHIRE CI,000603,,,,78704
3,4818 BEN WHITE BV E 202,BEN WHITE BV E,004818,,,00202,MULTI
4,4402 BURNET RD,BURNET RD,004402,,,,MULTI
...,...,...,...,...,...,...,...
28081,912 RED RIVER ST,RED RIVER ST,000912,,,,MULTI
28082,414 WILLIAM CANNON DR W 8,WILLIAM CANNON DR W,000414,,,00008,
28083,3404 AMERICAN DR,AMERICAN DR,003404,,,,78641
28084,615 YAGER LN W,YAGER LN W,000615,,,,78753


In [20]:
%%sql
SELECT FmtLoc, 
       LocStreet, 
       LocHouse, 
       LocFrac, 
       LocAlpha, 
       LocUnit, 
       LocZip 
       FROM folder_A_TXBC.TXBC_SUM_1990
LIMIT 100
OFFSET 15000;

Unnamed: 0,FmtLoc,LocStreet,LocHouse,LocFrac,LocAlpha,LocUnit,LocZip
0,803 CASTLE RIDGE RD,CASTLE RIDGE RD,000803,,,,78746
1,CASTLE RIDGE RD,CASTLE RIDGE RD,,,,,78746
2,CAPITAL OF TX HY S,CAPITAL OF TX HY S,000000,,,,MULTI
3,720 CAPITAL OF TX HY S,CAPITAL OF TX HY S,000720,,,,MULTI
4,806 LASCIMAS PKWY,LASCIMAS PKWY,000806,,,,
...,...,...,...,...,...,...,...
95,610 CANYON RIM DR,CANYON RIM DR,000610,,,,78746
96,575 CANYON RIM DR,CANYON RIM DR,000575,,,,78746
97,WHIPPOORWILL TR,WHIPPOORWILL TR,,,,,MULTI
98,545 CANYON RIM DR,CANYON RIM DR,000545,,,,78746
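
Judging from the sample rows above, FmtLoc looks like LocHouse with its leading zeros removed, immediately followed by LocAlpha, then LocStreet, then LocUnit without leading zeros. The sketch below encodes that guess so it can be checked against the data. `format_location` is a hypothetical helper, and LocFrac is ignored because none of the sample rows has a value for it.

```python
def format_location(house, alpha, street, unit):
    """Rebuild a FmtLoc-style string from its component columns.

    Assumed rule, inferred from sample rows only: strip leading zeros from
    the house number and unit, append the alpha suffix to the house number,
    and join the non-empty parts with single spaces.
    """
    number = (house or "").lstrip("0") + (alpha or "")
    unit = (unit or "").lstrip("0")
    return " ".join(part for part in (number, street or "", unit) if part)


print(format_location("001004", "", "MO-PAC CI", "00101"))
print(format_location("013804", "C", "THERMAL DR", ""))
print(format_location("000000", "", "CAPITAL OF TX HY S", ""))
```

Comparing the rebuilt string with the stored FmtLoc across the whole table would show how often the rule holds and where LocFrac comes into play.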


Search for columns whose names include 'arcel', to find the parcel number:

In the TCBC files, the only column related to parcels is LinkParcel.

Parcel columns are found mainly in the TXBC files.

In [21]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name LIKE '%arcel%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LinkParcel,29,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,LinkParcel,29,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,Parcel,1,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,RefParcel3,75,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,RefParcel1,73,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,RefParcel2,74,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,RefParcel3,75,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,RefParcel2,74,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,RefParcel1,73,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,Parcel,1,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


The TCBC files carry no usable parcel link: the only value in the LinkParcel column is None. The parcel number is therefore available only in the TXBC files.

In [22]:
%%sql
SELECT DISTINCT LinkParcel FROM folder_A_TCBC.TCBC_SUM_1990;

Unnamed: 0,LinkParcel
0,


Searching for columns related to use:

TCBC_SUM_1990_SUSP - UseInfoAddrFlag

TXBC_SUM_1990 - AgUseCode, AgUseMulti, UseCode, UseMulti, UseClass

TXBC_SUM_1990_SUSP - UseInfoAddrFlag

In [23]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%Use%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,LocHouse,10,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,LocHouse,10,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_SUSP,UseInfoAddrFlag,20,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_SUSP,UseInfoAddrFlag,20,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,LocHouse,12,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,AgUseCode,22,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,AgUseMulti,23,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,UseCode,68,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,UseMulti,69,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,UseClass,70,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
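
Note that `LocHouse` appears in the result above even though it has nothing to do with use: `ILIKE` is case-insensitive, so `'%Use%'` also matches the "use" inside "House". The sketch below illustrates the difference in plain Python, with the column names copied by hand from the output; `columns_containing` is an illustrative helper, not part of the notebook. In DuckDB the case-sensitive equivalent should be `LIKE '%Use%'`.

```python
# Column names copied from the query output above.
columns = [
    "LocHouse", "UseInfoAddrFlag", "AgUseCode",
    "AgUseMulti", "UseCode", "UseMulti", "UseClass",
]

def columns_containing(names, fragment, case_sensitive=True):
    """Return the names that contain `fragment` as a substring."""
    if case_sensitive:
        return [n for n in names if fragment in n]
    return [n for n in names if fragment.lower() in n.lower()]

# Case-insensitive (like ILIKE): "LocHouse" slips in through "House".
loose = columns_containing(columns, "Use", case_sensitive=False)
# Case-sensitive (like LIKE): only the genuine "Use" columns remain.
strict = columns_containing(columns, "Use", case_sensitive=True)

print(sorted(set(loose) - set(strict)))  # ['LocHouse']
```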


None of the files with the SUSP suffix contain any data.

In [24]:
%%sql
SELECT * FROM folder_B_TCBC.TCBC_SUM_1990_SUSP;

In [25]:
%%sql
SELECT * FROM folder_B_TXBC.TXBC_SUM_1990_SUSP;

The key column for use is UseCode in the TXBC_SUM_1990 file. It holds a two-digit number, which is probably just a code that has to be matched against other information.

In [26]:
%%sql
SELECT AgUseCode, AgUseMulti, UseCode, UseMulti, UseClass FROM folder_A_TXBC.TXBC_SUM_1990;

Unnamed: 0,AgUseCode,AgUseMulti,UseCode,UseMulti,UseClass
0,,,,,
1,,,,,
2,,,,,
3,,,,,
4,,,,,
...,...,...,...,...,...
255588,,,01,,
255589,,,,,
255590,,,,,
255591,,,01,,


In [27]:
%%sql
SELECT UseCode, COUNT(*) FROM folder_B_TXBC.TXBC_SUM_1990
GROUP BY UseCode
ORDER BY COUNT(*) DESC
LIMIT 10;

Unnamed: 0,UseCode,count_star()
0,1.0,133632
1,,76399
2,15.0,10836
3,2.0,10600
4,13.0,3276
5,0.0,3101
6,11.0,2934
7,20.0,1571
8,53.0,1518
9,61.0,1462


In [28]:
%%sql
SELECT UseCode, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY UseCode
ORDER BY COUNT(*) DESC
LIMIT 10;

Unnamed: 0,UseCode,count_star()
0,1.0,133632
1,,76399
2,15.0,10836
3,2.0,10600
4,13.0,3276
5,0.0,3101
6,11.0,2934
7,20.0,1571
8,53.0,1518
9,61.0,1462
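
The counts above display `UseCode` as `1.0`, `15.0`, and so on, while the row-level query shows `01`. This looks like a floating-point conversion in the display rather than a difference in the stored values, but that is an assumption. If the codes later need to be matched against two-digit codes elsewhere, a normalization step along these lines may help. `normalize_use_code` is a hypothetical helper, not something the notebook defines.

```python
def normalize_use_code(value):
    """Return a two-digit string code, or None when the value is missing.

    Accepts the forms seen in the outputs above: '01', 1.0, '15.0', None.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.lower() in ("nan", "none"):
        return None
    # float() handles both '01' and '15.0'; int() drops the '.0'.
    return f"{int(float(text)):02d}"

print([normalize_use_code(v) for v in ["01", 1.0, "15.0", 0.0, None]])
# ['01', '01', '15', '00', None]
```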


Searching for columns that may hold square footage ("sq ft"):

Both the TCBC and TXBC files with no suffix (_SUM_1990) have a "TotSqft" column, which may be the data we are looking for.

In [29]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%sq%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,TotSqft,17,,YES,"DECIMAL(9,0)",,,9,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,TotSqft,17,,YES,"DECIMAL(9,0)",,,9,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,TotSqft,19,,YES,"DECIMAL(9,0)",,,9,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,TotSqft,19,,YES,"DECIMAL(9,0)",,,9,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [30]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%area%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,Area,21,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,Area,21,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_SUSP,InformalArea,11,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_SUSP,AreaChgFlag,18,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_SUSP,InformalArea,11,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_SUSP,AreaChgFlag,18,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,LandAreaCode,24,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,LandAreaVal,25,,YES,"DECIMAL(11,3)",,,11.0,10.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,Area,52,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,LandAreaCode,24,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [31]:
%%sql
SELECT LandAreaCode, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY 1 DESC;

Unnamed: 0,LandAreaCode,count_star()
0,,208069
1,A,47524


In [32]:
%%sql
SELECT LandAreaVal, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY 1 DESC;

Unnamed: 0,LandAreaVal,count_star()
0,1704677.000,1
1,1662576.000,1
2,1241431.000,1
3,1135260.000,1
4,797584.000,1
...,...,...
10579,0.004,1
10580,0.003,6
10581,0.002,3
10582,0.001,3


Looking at all the distinct values, the TCBC file contains only "0", while the TXBC file has 8,866 distinct values. I therefore assume that records without a square footage are all recorded as "0".

In [33]:
%%sql
SELECT DISTINCT TotSqft FROM folder_A_TCBC.TCBC_SUM_1990;

Unnamed: 0,TotSqft
0,0


In [34]:
%%sql
SELECT DISTINCT TotSqft FROM folder_A_TXBC.TXBC_SUM_1990
ORDER BY TotSqft DESC;

Unnamed: 0,TotSqft
0,2614058
1,1179741
2,953368
3,952989
4,944760
...,...
8861,8
8862,6
8863,5
8864,4
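
If the assumption holds that "0" means "no square footage recorded", it may be safer to turn those zeros into nulls before any averaging or filtering. The sketch below shows the `NULLIF` idiom on made-up rows, using Python's built-in `sqlite3` purely as a stand-in for DuckDB; the table name `summary` and its rows are invented for illustration. `NULLIF` is standard SQL and should behave the same way in DuckDB.

```python
import sqlite3

# In-memory stand-in database with made-up rows (not real TCAD data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE summary (Parcel TEXT, TotSqft INTEGER)")
con.executemany(
    "INSERT INTO summary VALUES (?, ?)",
    [("P1", 0), ("P2", 4846), ("P3", 620)],
)

# NULLIF(x, 0) returns NULL when x is 0 and x otherwise.
rows = con.execute(
    "SELECT Parcel, NULLIF(TotSqft, 0) AS TotSqft FROM summary ORDER BY Parcel"
).fetchall()

print(rows)  # [('P1', None), ('P2', 4846), ('P3', 620)]
```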


In [35]:
%%sql
SELECT Parcel, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY COUNT(*) DESC;

Unnamed: 0,Parcel,count_star()
0,0167570203,315
1,0262300138,253
2,0252281024,218
3,0201040216,184
4,0151670263,181
...,...,...
236209,0348010107,1
236210,0274201015,1
236211,0278280615,1
236212,0247210707,1


In [36]:
%%sql
SELECT * FROM information_schema.columns;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_CFOR,AcctNum,1,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_CFOR,SufxId,2,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_CFOR,TaxYear,3,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_CFOR,CforCode,4,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_LEGAL,AcctNum,1,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
651,duckdb-file,folder_B_TXBC,TXBC_SUM_1990_SUSP,PrintFlag,19,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
652,duckdb-file,folder_B_TXBC,TXBC_SUM_1990_SUSP,UseInfoAddrFlag,20,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
653,duckdb-file,folder_B_TXBC,TXBC_SUM_1990_SUSP,CtrlParcel,21,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
654,duckdb-file,folder_B_TXBC,TXBC_SUM_1990_SUSP,CtrlOwnrId,22,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [37]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%deed%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,DeedType,29,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,DeedVol,30,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,DeedPg,31,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,DeedDate,32,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,DeedDocCode,33,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,DeedDocId,34,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,DeedType,29,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,DeedVol,30,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,DeedPg,31,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,DeedDate,32,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [38]:
%%sql
SELECT DeedDate, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY COUNT(*) DESC;

Unnamed: 0,DeedDate,count_star()
0,1900-00-00,255593


In [39]:
%%sql
SELECT DeedType, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY COUNT(*) DESC;

Unnamed: 0,DeedType,count_star()
0,,255593


In [40]:
%%sql
SELECT DeedVol, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY COUNT(*) DESC;

Unnamed: 0,DeedVol,count_star()
0,0,255593


In [41]:
%%sql
SELECT DeedPg, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY COUNT(*) DESC;

Unnamed: 0,DeedPg,count_star()
0,0,255593


In [42]:
%%sql
SELECT DeedDocCode, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY COUNT(*) DESC;

Unnamed: 0,DeedDocCode,count_star()
0,,255593
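
Every deed column queried above holds a single value across all 255,593 rows (`1900-00-00`, `0`, or None), so the deed fields in this extract appear to be placeholders. A minimal sketch of how such placeholders could be mapped to missing values is shown below. The helper names `parse_deed_date` and `deed_number` are hypothetical, and treating all-zero volumes and pages as missing is an assumption.

```python
from datetime import date

def parse_deed_date(text):
    """Return a date, or None for empty or impossible values such as '1900-00-00'."""
    if not text:
        return None
    try:
        return date.fromisoformat(text)
    except ValueError:
        # Month 00 or day 00 is not a real calendar date.
        return None

def deed_number(text):
    """Return a deed volume or page as an int, or None when empty or all zeros.

    Assumes the field is numeric, as in the outputs above.
    """
    if text is None or text.strip() == "":
        return None
    number = int(text)
    return number if number != 0 else None

print(parse_deed_date("1900-00-00"), deed_number("00000"))  # None None
```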


In [43]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%ownername%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,OwnerName,34,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,OwnerName,34,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,OwnerName,81,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,OwnerName,81,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [44]:
%%sql
SELECT OwnerName FROM folder_A_TXBC.TXBC_SUM_1990;

Unnamed: 0,OwnerName
0,CITY OF AUSTIN
1,CITY OF AUSTIN
2,CITY OF AUSTIN
3,CITY OF AUSTIN
4,CITY OF AUSTIN
...,...
255588,DAUGHERTY EDGAR S
255589,ONTIBEROS LEROY A (VLB)
255590,SPIRES ALBERT B JR
255591,GOETZ WILLIAM T


In [45]:
%%sql
SELECT OwnerName FROM folder_A_TCBC.TCBC_SUM_1990;

Unnamed: 0,OwnerName
0,ARCHER JOSEPH C-PRES
1,ORTIZ ALFRED
2,BOUTWELL G. DAVID
3,SYMANK ERVIN W-PRES
4,LINVILLE HAROLD PRES
...,...
28081,JOSEPH SALEM
28082,COX JAMES ARNOLD
28083,THE PRIME GROUP
28084,CONCRETE CORING CO INC


In [46]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%eyoc%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TXBC,TXBC_SUM_1990,EYOC,21,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TXBC,TXBC_SUM_1990,EYOC,21,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


EYOC is the effective year of construction, which could serve as the effective date built. The table below raises some questions, however:

1. Both 'None' and '0000' occur as values. What is the difference between them?
2. One record has 2971 and another has 0984, neither of which makes sense as a year. Are these typos?

In [47]:
%%sql
SELECT EYOC, COUNT(*) FROM folder_A_TXBC.TXBC_SUM_1990
GROUP BY 1
ORDER BY EYOC DESC;

Unnamed: 0,EYOC,count_star()
0,,76399
1,2971,1
2,1990,12
3,1989,1817
4,1988,2005
...,...,...
103,1076,1
104,1075,1
105,1072,1
106,0984,1
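
Until the client answers these questions, it may be useful to label each EYOC value rather than silently drop it. The sketch below is one way to do that. `classify_eyoc` is a hypothetical helper, the upper bound is the 1990 tax year, and the lower bound of 1800 is an arbitrary assumption that the data does not document.

```python
def classify_eyoc(value, tax_year=1990, earliest=1800):
    """Label an EYOC string as 'missing', 'zero', 'suspect', or 'ok'.

    `earliest` is an assumed lower bound, not something the data documents.
    """
    if value is None or value.strip() == "":
        return "missing"
    if not value.isdigit():
        return "suspect"
    year = int(value)
    if year == 0:
        return "zero"
    if year < earliest or year > tax_year:
        return "suspect"
    return "ok"

for sample in [None, "0000", "2971", "0984", "1076", "1989"]:
    print(sample, classify_eyoc(sample))
```

Keeping 'missing' and 'zero' as separate labels preserves the distinction raised in question 1 until its meaning is known.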


In [48]:
%%sql
SELECT DocketYear FROM folder_A_TXBC.TXBC_SUM_1990_SUSP
GROUP BY 1
ORDER BY COUNT(*) DESC;

In [49]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%legal%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_LEGAL,LegalSub2,5,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalSub1,4,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalCd1,6,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalLn1,7,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalCd2,8,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalLn2,9,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalCd3,10,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalLn3,11,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalCd4,12,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_LEGAL,LegalLn4,13,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [50]:
%%sql
SELECT * FROM folder_A_TXBC.TXBC_SUM_1990_LEGAL
LIMIT 100
OFFSET 15000;

Unnamed: 0,Parcel,OwnrId,TaxYear,LegalSub1,LegalSub2,LegalCd1,LegalLn1,LegalCd2,LegalLn2,LegalCd3,LegalLn3,LegalCd4,LegalLn4,LegalCd5,LegalLn5,LegalCd6,LegalLn6
0,0115230349,0001,1990,CAMELOT SEC 2 PHS 2,,*,50% OF,D,LOT 20,,,,,,,,
1,0115230350,0000,1990,WESTLAKE CONDOMINIUMS,,*,COMMON AREA,,,,,,,,,,
2,0115230351,0000,1990,CEDAR CHOPPERS CORNER,,D,LOT 1,,,,,,,,,,
3,0115230352,0000,1990,LAS CIMAS OFFICE PARK,,D,LOT 1 BLK B,,,,,,,,,,
4,0115230353,0000,1990,LAS CIMAS OFFICE PARK,,D,LOT 5 BLK A *AND,D,LOT 2 BLK B,*,(COMMON AREA),,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0115280202,0001,1990,CAMELOT (UNRECORDED),,*,1/2 INTEREST IN,D,LOT 1 BLK D *(.592AC),,,,,,,,
96,0115280301,0000,1990,CAMELOT (UNRECORDED),,D,LOT 1 BLK E *(1.14 ACR),,,,,,,,,,
97,0115280401,0000,1990,CAMELOT (UNRECORDED),MINOR D,*,.737 ACR PT OF,D,LOT 6 BLK A,D,ABS 515 SUR 416,,,,,,
98,0115280402,0000,1990,CAMELOT (UNRECORDED),,D,LOT 2 BLK F *(1.03 ACR),,,,,,,,,,


In [51]:
%%sql
SELECT DISTINCT LegalLn4 FROM folder_A_TXBC.TXBC_SUM_1990_LEGAL;


Unnamed: 0,LegalLn4
0,
1,IMPS ONLY LOT 3 BLK B
2,IMPS ONLY
3,LOT 2
4,LOT 5
...,...
1666,(12)
1667,OLT 7 DIV Z
1668,LOT 34 BLK C
1669,LOT H-13


In [52]:
%%sql
SELECT * FROM information_schema.columns
WHERE column_name ILIKE '%appr%'
ORDER BY table_name;

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,ApprSelect,27,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,ApprInit,32,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,duckdb-file,folder_A_TCBC,TCBC_SUM_1990,ApprInit2,33,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,ApprSelect,27,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,ApprInit,32,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,duckdb-file,folder_B_TCBC,TCBC_SUM_1990,ApprInit2,33,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_SUSP,InformalApprInit,12,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,duckdb-file,folder_A_TCBC,TCBC_SUM_1990_SUSP,ValApprInit,13,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_SUSP,InformalApprInit,12,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,duckdb-file,folder_B_TCBC,TCBC_SUM_1990_SUSP,ValApprInit,13,,YES,VARCHAR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [53]:
%%sql
SELECT * FROM folder_A_TXBC.TXBC_SUM_1990_JURIS
LIMIT 100
OFFSET 15000;

Unnamed: 0,Parcel,OwnrId,TaxYear,Juris,Rate,JurisType,JurisCED,AgHistFlag,LandMktVal,ImprMktVal,TotMktVal,LandApprdVal,ImprApprdVal,TotApprdVal,LandExmpVal,ImprExmpVal,TotExmpVal,LandActVal,ImprActVal,TotActVal,BctChgFlag,ExmpStatFlag,JurisPctFlag,FrzFlag,TenPctCapFlag,AssessVal,TaxFrzVal,TaxBeforeFrz,GenFundTax,SinkFundTax,TotTax
0,0102570605,0000,1990,68,0.05000,CC,,,51870,63347,115217,51870,63347,115217,0,0,0,0,0,0,,,,,,103695.00,0.00,51.85,51.85,0.00,51.85
1,0102570606,0000,1990,01,1.26600,SD,,,38010,92785,130795,38010,92785,130795,0,0,0,0,0,0,,,,,,125795.00,0.00,1592.56,1592.56,0.00,1592.56
2,0102570606,0000,1990,03,0.40900,CO,,,38010,92785,130795,38010,92785,130795,0,0,0,0,0,0,,,,,,104636.00,0.00,427.96,427.96,0.00,427.96
3,0102570606,0000,1990,04,0.00010,CR,,,38010,92785,130795,38010,92785,130795,0,0,0,0,0,0,,,,,,130795.00,0.00,0.13,0.13,0.00,0.13
4,0102570606,0000,1990,14,0.21000,WC,,,38010,92785,130795,38010,92785,130795,0,0,0,0,0,0,,,,,,130795.00,0.00,274.67,274.67,0.00,274.67
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0102570621,0000,1990,53,0.03000,FP,,,34440,74761,109201,34440,74761,109201,0,0,0,0,0,0,,,,,,104201.00,0.00,31.26,31.26,0.00,31.26
96,0102570621,0000,1990,68,0.05000,CC,,,34440,74761,109201,34440,74761,109201,0,0,0,0,0,0,,,,,,98281.00,0.00,49.14,49.14,0.00,49.14
97,0102570623,0000,1990,01,1.26600,SD,,,16120,0,16120,16120,0,16120,0,0,0,0,0,0,,,,,,16120.00,0.00,204.08,204.08,0.00,204.08
98,0102570623,0000,1990,03,0.40900,CO,,,16120,0,16120,16120,0,16120,0,0,0,0,0,0,,,,,,16120.00,0.00,65.93,65.93,0.00,65.93


In [54]:
%%sql
SELECT LandApprdVal, ImprApprdVal FROM folder_A_TXBC.TXBC_SUM_1990_JURIS
LIMIT 100
OFFSET 15000;

Unnamed: 0,LandApprdVal,ImprApprdVal
0,51870,63347
1,38010,92785
2,38010,92785
3,38010,92785
4,38010,92785
...,...,...
95,34440,74761
96,34440,74761
97,16120,0
98,16120,0



# Create deliverable table

In [55]:
%%sql
SELECT folder_A_TXBC.TXBC_SUM_1990.Parcel,
       folder_A_TXBC.TXBC_SUM_1990.OwnrId,
       folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode,
       folder_A_TXBC.TXBC_SUM_1990_USECODE.Description AS Use_description,
       folder_A_TXBC.TXBC_SUM_1990_USECODE.Category AS Use_category,
       folder_A_TXBC.TXBC_SUM_1990.TotSqft, 
       folder_A_TXBC.TXBC_SUM_1990.EYOC AS Effect_date_built, 
       folder_A_TXBC.TXBC_SUM_1990.DeedDate, 
       folder_A_TXBC.TXBC_SUM_1990.DeedVol, 
       folder_A_TXBC.TXBC_SUM_1990.DeedPg,
       folder_A_TXBC.TXBC_SUM_1990_JURIS.LandApprdVal,
       folder_A_TXBC.TXBC_SUM_1990_JURIS.ImprApprdVal,
       folder_A_TXBC.TXBC_SUM_1990.OwnerName,
       folder_A_TXBC.TXBC_SUM_1990.MailCnt,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr1,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr2,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr3, 
       folder_A_TXBC.TXBC_SUM_1990.MailAddr4,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr5,
       folder_A_TXBC.TXBC_SUM_1990.FmtLoc AS Location,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub1,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub2,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd1,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn1,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd2,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn2, 
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd3,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn3,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd4,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn4, 
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd5,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn5,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd6,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn6 
    FROM folder_A_TXBC.TXBC_SUM_1990
JOIN folder_A_TXBC.TXBC_SUM_1990_JURIS ON 
    folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_JURIS.Parcel AND
    folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_JURIS.OwnrId 
JOIN folder_A_TXBC.TXBC_SUM_1990_LEGAL ON 
    folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_LEGAL.Parcel AND
    folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_LEGAL.OwnrId
JOIN folder_A_TXBC.TXBC_SUM_1990_USECODE ON 
    folder_A_TXBC.TXBC_SUM_1990.UseCode = folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode;

Unnamed: 0,Parcel,OwnrId,UseCode,Use_description,Use_category,TotSqft,Effect_date_built,DeedDate,DeedVol,DeedPg,LandApprdVal,ImprApprdVal,OwnerName,MailCnt,MailAddr1,MailAddr2,MailAddr3,MailAddr4,MailAddr5,Location,LegalSub1,LegalSub2,LegalCd1,LegalLn1,LegalCd2,LegalLn2,LegalCd3,LegalLn3,LegalCd4,LegalLn4,LegalCd5,LegalLn5,LegalCd6,LegalLn6
0,0100000006,0001,61,Warehouse < 20K SF,Industrial,4846,1965,1900-00-00,00000,00000,0,69140,SOUTHERN PACIFIC TRANSPORTATIO,4,SOUTHERN PACIFIC,TRANSPORTATION CO,P O BOX 1319,HOUSTON TX 77007-1319,,,SOUTHERN PACIFIC RAILROAD,,*,IMPROVEMENTS ONLY IN,*,TRANSPORTATION CORRIDOR,,,,,,,,
1,0100000006,0001,61,Warehouse < 20K SF,Industrial,4846,1965,1900-00-00,00000,00000,0,69140,SOUTHERN PACIFIC TRANSPORTATIO,4,SOUTHERN PACIFIC,TRANSPORTATION CO,P O BOX 1319,HOUSTON TX 77007-1319,,,SOUTHERN PACIFIC RAILROAD,,*,IMPROVEMENTS ONLY IN,*,TRANSPORTATION CORRIDOR,,,,,,,,
2,0100000006,0001,61,Warehouse < 20K SF,Industrial,4846,1965,1900-00-00,00000,00000,0,69140,SOUTHERN PACIFIC TRANSPORTATIO,4,SOUTHERN PACIFIC,TRANSPORTATION CO,P O BOX 1319,HOUSTON TX 77007-1319,,,SOUTHERN PACIFIC RAILROAD,,*,IMPROVEMENTS ONLY IN,*,TRANSPORTATION CORRIDOR,,,,,,,,
3,0100000006,0001,61,Warehouse < 20K SF,Industrial,4846,1965,1900-00-00,00000,00000,0,69140,SOUTHERN PACIFIC TRANSPORTATIO,4,SOUTHERN PACIFIC,TRANSPORTATION CO,P O BOX 1319,HOUSTON TX 77007-1319,,,SOUTHERN PACIFIC RAILROAD,,*,IMPROVEMENTS ONLY IN,*,TRANSPORTATION CORRIDOR,,,,,,,,
4,0100000006,0001,61,Warehouse < 20K SF,Industrial,4846,1965,1900-00-00,00000,00000,0,69140,SOUTHERN PACIFIC TRANSPORTATIO,4,SOUTHERN PACIFIC,TRANSPORTATION CO,P O BOX 1319,HOUSTON TX 77007-1319,,,SOUTHERN PACIFIC RAILROAD,,*,IMPROVEMENTS ONLY IN,*,TRANSPORTATION CORRIDOR,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
871733,0595150119,0000,01,1 Family Dwelling,Residential,620,1972,1900-00-00,00000,00000,3000,16507,JOINER MARY REBECCA,3,JOINER MARY REBECCA,752 BRACERA ROAD,ENCINITAS CA 92024-3810,,,SHAW DR,PARADISE MANOR SEC 1,,D,LOT 8 BLK C,,,,,,,,,,
871734,0595150119,0000,01,1 Family Dwelling,Residential,620,1972,1900-00-00,00000,00000,3000,16507,JOINER MARY REBECCA,3,JOINER MARY REBECCA,752 BRACERA ROAD,ENCINITAS CA 92024-3810,,,SHAW DR,PARADISE MANOR SEC 1,,D,LOT 8 BLK C,,,,,,,,,,
871735,0651090302,0000,01,1 Family Dwelling,Residential,1162,1939,1900-00-00,00000,00000,1500,31497,CARLSON MARY PETERSON,3,CARLSON MARY PETERSON,RT 4 BOX 75,ELGIN TX 78621-9647,,,COUNTY LINE RD,MARTIN H,,D,ABS 518 SUR 65,D,ACR 1.00,,,,,,,,
871736,0651090302,0000,01,1 Family Dwelling,Residential,1162,1939,1900-00-00,00000,00000,1500,31497,CARLSON MARY PETERSON,3,CARLSON MARY PETERSON,RT 4 BOX 75,ELGIN TX 78621-9647,,,COUNTY LINE RD,MARTIN H,,D,ABS 518 SUR 65,D,ACR 1.00,,,,,,,,
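
The result above has more than 870,000 rows against 255,593 in TXBC_SUM_1990, and consecutive rows are identical. This is consistent with TXBC_SUM_1990_JURIS holding one row per parcel per taxing jurisdiction, so the join repeats each parcel once per jurisdiction. The inner join on the use-code table would also drop the parcels whose UseCode is None. The sketch below reproduces both effects on made-up rows and shows one possible fix. It uses Python's built-in `sqlite3` as a stand-in for DuckDB, with shortened, hypothetical table names (`summary`, `juris`, `usecode`). It also assumes the appraised values are the same across jurisdictions for a given parcel, which the JURIS sample suggests but does not prove.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE summary (Parcel TEXT, OwnrId TEXT, UseCode TEXT);
CREATE TABLE juris (Parcel TEXT, OwnrId TEXT, Juris TEXT,
                    LandApprdVal INTEGER, ImprApprdVal INTEGER);
CREATE TABLE usecode (UseCode TEXT, Description TEXT);

-- Made-up rows: P1 is taxed by three jurisdictions, P2 has no use code.
INSERT INTO summary VALUES ('P1', '0000', '01'), ('P2', '0000', NULL);
INSERT INTO juris VALUES
    ('P1', '0000', '01', 38010, 92785),
    ('P1', '0000', '03', 38010, 92785),
    ('P1', '0000', '04', 38010, 92785),
    ('P2', '0000', '01', 16120, 0);
INSERT INTO usecode VALUES ('01', '1 Family Dwelling');
""")

# Joining JURIS directly repeats P1 once per jurisdiction and drops P2.
naive = con.execute("""
    SELECT s.Parcel FROM summary s
    JOIN juris j ON s.Parcel = j.Parcel AND s.OwnrId = j.OwnrId
    JOIN usecode u ON s.UseCode = u.UseCode
""").fetchall()

# Collapsing JURIS first and using LEFT JOIN keeps one row per parcel.
deduped = con.execute("""
    SELECT s.Parcel, u.Description, j.LandApprdVal, j.ImprApprdVal
    FROM summary s
    JOIN (SELECT DISTINCT Parcel, OwnrId, LandApprdVal, ImprApprdVal
          FROM juris) j
      ON s.Parcel = j.Parcel AND s.OwnrId = j.OwnrId
    LEFT JOIN usecode u ON s.UseCode = u.UseCode
    ORDER BY s.Parcel
""").fetchall()

print(len(naive), len(deduped))  # 3 2
```

If the appraised values do differ by jurisdiction for some parcels, `SELECT DISTINCT` would still return several rows for them, and an explicit choice of jurisdiction would be needed instead.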


In [56]:
%%sql
SELECT folder_A_TXBC.TXBC_SUM_1990.Parcel,
       folder_A_TXBC.TXBC_SUM_1990.OwnrId,
       folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode,
       folder_A_TXBC.TXBC_SUM_1990_USECODE.Description AS Use_description,
       folder_A_TXBC.TXBC_SUM_1990_USECODE.Category AS Use_category,
       folder_A_TXBC.TXBC_SUM_1990.TotSqft, 
       folder_A_TXBC.TXBC_SUM_1990.EYOC AS Effect_date_built, 
       folder_A_TXBC.TXBC_SUM_1990.DeedDate, 
       folder_A_TXBC.TXBC_SUM_1990.DeedVol, 
       folder_A_TXBC.TXBC_SUM_1990.DeedPg,
       folder_A_TXBC.TXBC_SUM_1990_JURIS.LandApprdVal,
       folder_A_TXBC.TXBC_SUM_1990_JURIS.ImprApprdVal,
       folder_A_TXBC.TXBC_SUM_1990.OwnerName,
       folder_A_TXBC.TXBC_SUM_1990.MailCnt,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr1,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr2,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr3, 
       folder_A_TXBC.TXBC_SUM_1990.MailAddr4,
       folder_A_TXBC.TXBC_SUM_1990.MailAddr5,
       folder_A_TXBC.TXBC_SUM_1990.FmtLoc AS Location,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub1,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub2,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd1,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn1,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd2,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn2, 
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd3,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn3,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd4,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn4, 
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd5,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn5,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd6,
       folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn6 
    FROM folder_A_TXBC.TXBC_SUM_1990
JOIN folder_A_TXBC.TXBC_SUM_1990_JURIS ON 
    folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_JURIS.Parcel AND
    folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_JURIS.OwnrId 
JOIN folder_A_TXBC.TXBC_SUM_1990_LEGAL ON 
    folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_LEGAL.Parcel AND
    folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_LEGAL.OwnrId 
JOIN folder_A_TXBC.TXBC_SUM_1990_USECODE ON 
    folder_A_TXBC.TXBC_SUM_1990.UseCode = folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode
LIMIT 100
OFFSET 150000;

Unnamed: 0,Parcel,OwnrId,UseCode,Use_description,Use_category,TotSqft,Effect_date_built,DeedDate,DeedVol,DeedPg,LandApprdVal,ImprApprdVal,OwnerName,MailCnt,MailAddr1,MailAddr2,MailAddr3,MailAddr4,MailAddr5,Location,LegalSub1,LegalSub2,LegalCd1,LegalLn1,LegalCd2,LegalLn2,LegalCd3,LegalLn3,LegalCd4,LegalLn4,LegalCd5,LegalLn5,LegalCd6,LegalLn6
0,0145120865,0041,15,Condo (Stacked),Residential,1502,1983,1900-00-00,00000,00000,11467,52876,CONYNGHAM JIM & KAREN CONYNGHA,4,CONYNGHAM JIM &,KAREN CONYNGHAM,7403 NEWHALL LANE,AUSTIN TX 78746-4115,,7205L WALDON DR 204,LAKEWOOD CONDOMINIUMS AMENDEDTHE,,C,UNT 204 BLD L,*,PLUS 1.6482% INTEREST IN,*,COMMON AREA,,,,,,
1,0145120865,0041,15,Condo (Stacked),Residential,1502,1983,1900-00-00,00000,00000,11467,52876,CONYNGHAM JIM & KAREN CONYNGHA,4,CONYNGHAM JIM &,KAREN CONYNGHAM,7403 NEWHALL LANE,AUSTIN TX 78746-4115,,7205L WALDON DR 204,LAKEWOOD CONDOMINIUMS AMENDEDTHE,,C,UNT 204 BLD L,*,PLUS 1.6482% INTEREST IN,*,COMMON AREA,,,,,,
2,0145120865,0041,15,Condo (Stacked),Residential,1502,1983,1900-00-00,00000,00000,11467,52876,CONYNGHAM JIM & KAREN CONYNGHA,4,CONYNGHAM JIM &,KAREN CONYNGHAM,7403 NEWHALL LANE,AUSTIN TX 78746-4115,,7205L WALDON DR 204,LAKEWOOD CONDOMINIUMS AMENDEDTHE,,C,UNT 204 BLD L,*,PLUS 1.6482% INTEREST IN,*,COMMON AREA,,,,,,
3,0145120865,0041,15,Condo (Stacked),Residential,1502,1983,1900-00-00,00000,00000,11467,52876,CONYNGHAM JIM & KAREN CONYNGHA,4,CONYNGHAM JIM &,KAREN CONYNGHAM,7403 NEWHALL LANE,AUSTIN TX 78746-4115,,7205L WALDON DR 204,LAKEWOOD CONDOMINIUMS AMENDEDTHE,,C,UNT 204 BLD L,*,PLUS 1.6482% INTEREST IN,*,COMMON AREA,,,,,,
4,0145120865,0042,15,Condo (Stacked),Residential,1707,1983,1900-00-00,00000,00000,12995,53469,MARTIN JOHN G ETUX,3,MARTIN JOHN G ETUX,800 LEWISTON DR,SAN JOSE CA 95136-1516,,,7205M WALDON DR 205,LAKEWOOD CONDOMINIUMS AMENDEDTHE,,C,UNT 205 BLD M,*,PLUS 1.8731% INTEREST IN,*,COMMON AREA,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0145130119,0000,01,1 Family Dwelling,Residential,2696,1983,1900-00-00,00000,00000,30000,127510,AKRIDGE JAMES A & JUDY C AKRID,4,AKRIDGE JAMES A &,JUDY C AKRIDGE,6726 BEAUFORD DRIVE,AUSTIN TX 78750-8123,,6726 BEAUFORD DR,JESTER ESTATE SEC 1 PHS 1,,D,LOT 22 BLK G,,,,,,,,,,
96,0145130119,0000,01,1 Family Dwelling,Residential,2696,1983,1900-00-00,00000,00000,30000,127510,AKRIDGE JAMES A & JUDY C AKRID,4,AKRIDGE JAMES A &,JUDY C AKRIDGE,6726 BEAUFORD DRIVE,AUSTIN TX 78750-8123,,6726 BEAUFORD DR,JESTER ESTATE SEC 1 PHS 1,,D,LOT 22 BLK G,,,,,,,,,,
97,0145130119,0000,01,1 Family Dwelling,Residential,2696,1983,1900-00-00,00000,00000,30000,127510,AKRIDGE JAMES A & JUDY C AKRID,4,AKRIDGE JAMES A &,JUDY C AKRIDGE,6726 BEAUFORD DRIVE,AUSTIN TX 78750-8123,,6726 BEAUFORD DR,JESTER ESTATE SEC 1 PHS 1,,D,LOT 22 BLK G,,,,,,,,,,
98,0145130119,0000,01,1 Family Dwelling,Residential,2696,1983,1900-00-00,00000,00000,30000,127510,AKRIDGE JAMES A & JUDY C AKRID,4,AKRIDGE JAMES A &,JUDY C AKRIDGE,6726 BEAUFORD DRIVE,AUSTIN TX 78750-8123,,6726 BEAUFORD DR,JESTER ESTATE SEC 1 PHS 1,,D,LOT 22 BLK G,,,,,,,,,,
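
Rows 0 to 3 above are identical, and so are rows 95 to 98. One plausible cause is that the JURIS table holds several rows per `Parcel`/`OwnrId` pair (one per taxing jurisdiction), so the join repeats each summary row once per jurisdiction. The sketch below reproduces that effect on toy data. It uses an in-memory SQLite database as a stand-in for the notebook's database, and the table names `summary` and `juris`, the column `JurisCd`, and all values are hypothetical.

```python
import sqlite3

# Toy stand-in for the real tables: one summary row per parcel/owner,
# several jurisdiction rows per parcel/owner (hypothetical names and values).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE summary (Parcel TEXT, OwnrId TEXT, OwnerName TEXT);
CREATE TABLE juris (Parcel TEXT, OwnrId TEXT, JurisCd TEXT, LandApprdVal INTEGER);
INSERT INTO summary VALUES ('P1', '0001', 'OWNER A'), ('P2', '0000', 'OWNER B');
INSERT INTO juris VALUES
    ('P1', '0001', '02', 100),
    ('P1', '0001', '03', 100),
    ('P1', '0001', '04', 100),
    ('P1', '0001', '08', 100),
    ('P2', '0000', '02', 200);
""")

join_sql = """
SELECT s.Parcel, s.OwnrId, s.OwnerName, j.LandApprdVal
FROM summary AS s
JOIN juris AS j
  ON s.Parcel = j.Parcel AND s.OwnrId = j.OwnrId
"""

# Plain join: each summary row is repeated once per matching jurisdiction row.
plain_rows = con.execute(join_sql).fetchall()

# SELECT DISTINCT collapses rows that are identical in every selected column.
distinct_rows = con.execute(join_sql.replace("SELECT", "SELECT DISTINCT", 1)).fetchall()

print(len(plain_rows), len(distinct_rows))  # 5 2
```

If the repeated rows are not wanted in the deliverable, `SELECT DISTINCT` is one option. It only works while the selected JURIS columns carry the same value in every jurisdiction row; if they differ, the values would need to be aggregated or a single jurisdiction chosen.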


In [57]:
%%sql 
COPY (SELECT folder_A_TXBC.TXBC_SUM_1990.Parcel,
            folder_A_TXBC.TXBC_SUM_1990.OwnrId,
            folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode,
            folder_A_TXBC.TXBC_SUM_1990_USECODE.Description AS Use_description,
            folder_A_TXBC.TXBC_SUM_1990_USECODE.Category AS Use_category,
            folder_A_TXBC.TXBC_SUM_1990.TotSqft, 
            folder_A_TXBC.TXBC_SUM_1990.EYOC AS Effect_date_built, 
            folder_A_TXBC.TXBC_SUM_1990.DeedDate, 
            folder_A_TXBC.TXBC_SUM_1990.DeedVol, 
            folder_A_TXBC.TXBC_SUM_1990.DeedPg,
            folder_A_TXBC.TXBC_SUM_1990_JURIS.LandApprdVal,
            folder_A_TXBC.TXBC_SUM_1990_JURIS.ImprApprdVal,
            folder_A_TXBC.TXBC_SUM_1990.OwnerName,
            folder_A_TXBC.TXBC_SUM_1990.MailCnt,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr1,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr2,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr3, 
            folder_A_TXBC.TXBC_SUM_1990.MailAddr4,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr5,
            folder_A_TXBC.TXBC_SUM_1990.FmtLoc AS Location,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub1,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub2,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd1,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn1,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd2,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn2, 
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd3,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn3,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd4,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn4, 
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd5,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn5,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd6,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn6 
    FROM folder_A_TXBC.TXBC_SUM_1990
    JOIN folder_A_TXBC.TXBC_SUM_1990_JURIS ON 
        folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_JURIS.Parcel AND
        folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_JURIS.OwnrId 
    JOIN folder_A_TXBC.TXBC_SUM_1990_LEGAL ON 
        folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_LEGAL.Parcel AND
        folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_LEGAL.OwnrId 
    JOIN folder_A_TXBC.TXBC_SUM_1990_USECODE ON 
        folder_A_TXBC.TXBC_SUM_1990.UseCode = folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode
) TO 'Deliverable.csv' WITH (FORMAT CSV, DELIMITER ',');


Unnamed: 0,Count
0,871738
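
The count reported above can be cross-checked against the file that was written. The helper below is a minimal sketch using only the standard library; `count_csv_records` is a hypothetical name, and whether `Deliverable.csv` starts with a header line depends on the database engine and its settings, so that is passed in explicitly rather than assumed.

```python
import csv

def count_csv_records(path, has_header):
    """Count data records in a CSV file, treating quoted newlines as part of a field."""
    with open(path, newline="", encoding="utf-8", errors="replace") as fh:
        n = sum(1 for _ in csv.reader(fh))
    # Do not count the header line as a data record.
    return n - 1 if has_header and n > 0 else n
```

For example, `count_csv_records("Deliverable.csv", has_header=False)` should return 871738 if the export is complete and the file has no header line. That figure is also below the 1,048,576-row limit of a single Excel worksheet, which matters for the Excel export further down.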


In [58]:
%%sql result_set <<
SELECT folder_A_TXBC.TXBC_SUM_1990.Parcel,
            folder_A_TXBC.TXBC_SUM_1990.OwnrId,
            folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode,
            folder_A_TXBC.TXBC_SUM_1990_USECODE.Description AS Use_description,
            folder_A_TXBC.TXBC_SUM_1990_USECODE.Category AS Use_category,
            folder_A_TXBC.TXBC_SUM_1990.TotSqft, 
            folder_A_TXBC.TXBC_SUM_1990.EYOC AS Effect_date_built, 
            folder_A_TXBC.TXBC_SUM_1990.DeedDate, 
            folder_A_TXBC.TXBC_SUM_1990.DeedVol, 
            folder_A_TXBC.TXBC_SUM_1990.DeedPg,
            folder_A_TXBC.TXBC_SUM_1990_JURIS.LandApprdVal,
            folder_A_TXBC.TXBC_SUM_1990_JURIS.ImprApprdVal,
            folder_A_TXBC.TXBC_SUM_1990.OwnerName,
            folder_A_TXBC.TXBC_SUM_1990.MailCnt,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr1,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr2,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr3, 
            folder_A_TXBC.TXBC_SUM_1990.MailAddr4,
            folder_A_TXBC.TXBC_SUM_1990.MailAddr5,
            folder_A_TXBC.TXBC_SUM_1990.FmtLoc AS Location,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub1,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalSub2,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd1,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn1,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd2,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn2, 
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd3,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn3,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd4,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn4, 
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd5,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn5,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalCd6,
            folder_A_TXBC.TXBC_SUM_1990_LEGAL.LegalLn6 
    FROM folder_A_TXBC.TXBC_SUM_1990
    JOIN folder_A_TXBC.TXBC_SUM_1990_JURIS ON 
        folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_JURIS.Parcel AND
        folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_JURIS.OwnrId 
    JOIN folder_A_TXBC.TXBC_SUM_1990_LEGAL ON 
        folder_A_TXBC.TXBC_SUM_1990.Parcel = folder_A_TXBC.TXBC_SUM_1990_LEGAL.Parcel AND
        folder_A_TXBC.TXBC_SUM_1990.OwnrId = folder_A_TXBC.TXBC_SUM_1990_LEGAL.OwnrId 
    JOIN folder_A_TXBC.TXBC_SUM_1990_USECODE ON 
        folder_A_TXBC.TXBC_SUM_1990.UseCode = folder_A_TXBC.TXBC_SUM_1990_USECODE.UseCode



In [59]:
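# Writing .xlsx needs openpyxl; the install at the top of this notebook failed with a permission error.
result_set.to_excel("final_deliverables.xlsx")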

ModuleNotFoundError: No module named 'openpyxl'
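
The export fails because `openpyxl` never got installed: the `pip install` at the top of the notebook stopped with a permission error, and pip's own message suggested retrying with the `--user` option. Until that is sorted out, a fallback to CSV keeps the export cell from failing. The sketch below is one way to do that; `export_frame` is a hypothetical helper, and it passes `index=False`, which differs from the call above by leaving out the row index column.

```python
from pathlib import Path

import pandas as pd

def export_frame(df, stem):
    """Write df to <stem>.xlsx, or to <stem>.csv if no Excel writer is installed."""
    xlsx_path = Path(stem).with_suffix(".xlsx")
    try:
        df.to_excel(xlsx_path, index=False)
        return xlsx_path
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # openpyxl lands here and we fall back to CSV.
        csv_path = Path(stem).with_suffix(".csv")
        df.to_csv(csv_path, index=False)
        return csv_path
```

Calling `export_frame(result_set, "final_deliverables")` returns the path that was actually written, so the file type that was produced is visible in the cell output.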