# Export Database

In this notebook, we will perform the following steps to export data from a DuckDB database:
1. Initialize the DataExporter class.
2. Export data for all ZIP codes and industry levels.
3. Verify the export results.

This notebook will use the `DataExporter` class to handle data extraction and export operations.


In [1]:
import os
import logging
import duckdb
from multiprocessing import Pool
from tqdm import tqdm
import query as q
from duck_db_exporter import DataExporter  # DataExporter is defined in duck_db_exporter.py

# Configure logging
logging.basicConfig(level=logging.INFO)


In [2]:
import query
print(dir(query))  # List all names defined in the query module


['DataQueryManager', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'csv', 'duckdb', 'os', 'pd']


## I. Export Database
- This is the Jupyter version; a script version for use by a GitHub Action will follow soon.

In [3]:
# Initialize DataExporter
exporter = DataExporter(
    base_db_path='../zip_data/duck_db_manager/database/',
    threads=4,
    export_dir='../../US/zip',
    industry_levels=[2, 5, 6],  # Specify industry levels if needed
    year=2019  # Specify the year for the data
)

In [4]:
# Export data for all states
file_paths = exporter.make_csv()  # No state specified

for file_path in file_paths:
    print(f"Exported file: {file_path}")

['CA', 'GA', 'MA', 'UT', 'NV', 'SD', 'WA', 'ME', 'KY', 'NH', 'CT', 'RI', 'TN', 'SC', 'IA', 'ND', 'PA', 'OH', 'IN', 'NE', 'WI', 'LA', 'HI', 'DC', 'CO', 'AZ', 'MD', 'VA', 'NM', 'MO', 'OR', 'OK', 'NC', 'KS', 'MN', 'ID', 'AR', 'TX', 'WV', 'AK', 'FL', 'VT', None, 'MI', 'AL', 'MS', 'WY', 'IL', 'NY', 'NJ', 'MT', 'DE']


TypeError: 'NoneType' object is not iterable
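
The traceback above is truncated, so the cause is not confirmed. Two readings are consistent with the output: `make_csv()` may have returned `None`, which makes the `for` loop fail, or the `None` entry in the printed state list may have been iterated somewhere inside the exporter. The sketch below guards against both cases. The helper names `normalize_paths` and `valid_states` are hypothetical and are not part of `DataExporter`.

```python
from typing import Iterable, List, Optional


def normalize_paths(result: Optional[Iterable[Optional[str]]]) -> List[str]:
    """Return a list of usable file paths, even if the export returned None."""
    if result is None:
        return []
    # Drop None or empty entries so the caller can iterate safely.
    return [path for path in result if path]


def valid_states(states: Iterable[Optional[str]]) -> List[str]:
    """Drop missing state codes (such as the None seen in the output above)."""
    return sorted(state for state in states if state)
```

With these helpers, the export cell could iterate over `normalize_paths(exporter.make_csv())` instead of the raw return value. An empty list then signals that nothing was exported, rather than raising a `TypeError`.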

## II. Unit Test After Export

In this section, we will validate that the exported CSV files are consistent with the original DuckDB database, that is, that each file contains the rows the database holds for it.

We will perform the following tests:
1. Check the number of rows in the exported CSV files.
2. Validate that the data in the CSV files matches the data in the DuckDB database.
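
The two tests above could be sketched as follows. This is a minimal illustration and not the notebook's actual test code. It assumes each CSV file has a single header row and that database values can be compared as strings, with `NULL` exported as an empty field. The function names are hypothetical, and the SQL passed to `fetch_db_rows` depends on table names that this notebook does not show.

```python
import csv
from typing import Any, List, Sequence, Tuple


def read_csv_rows(csv_path: str) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Read a CSV file and return (header, data rows as tuples of strings)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = [tuple(row) for row in reader]
    return header, rows


def check_row_count(csv_path: str, expected_count: int) -> bool:
    """Test 1: the CSV has as many data rows as the database query returned."""
    _, rows = read_csv_rows(csv_path)
    return len(rows) == expected_count


def check_rows_match(csv_path: str, db_rows: Sequence[Sequence[Any]]) -> bool:
    """Test 2: the CSV rows equal the database rows, ignoring row order."""
    _, rows = read_csv_rows(csv_path)
    # Assumption: NULL is written as an empty field, and every other value
    # is written the same way str() renders it.
    normalized = [
        tuple("" if value is None else str(value) for value in row)
        for row in db_rows
    ]
    return sorted(rows) == sorted(normalized)


def fetch_db_rows(db_path: str, sql: str) -> List[Tuple[Any, ...]]:
    """Run a query against a DuckDB file and return all rows."""
    import duckdb  # imported lazily so the CSV checks work without DuckDB

    con = duckdb.connect(db_path, read_only=True)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()
```

String comparison is a deliberate simplification: floating-point or date columns may be formatted differently in the CSV than by `str()`, in which case the values would need to be parsed before comparing.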
