# Export Database

In this notebook, we will perform the following steps to export data from a DuckDB database:
1. Initialize the DataExporter class.
2. Export data for all ZIP codes and industry levels.
3. Verify the export results.

This notebook will use the `DataExporter` class to handle data extraction and export operations.


In [1]:
import os
import logging
import duckdb
from multiprocessing import Pool
from tqdm import tqdm
import query as q
from duck_db_exporter import DataExporter  # DataExporter is imported from the duck_db_exporter module

# Configure logging
logging.basicConfig(level=logging.INFO)


## I. Export Database
- This is the Jupyter notebook version; a script version for use by a GitHub Action is planned.

### Export a single year of data

In [2]:
# Initialize DataExporter
exporter = DataExporter(
    base_db_path='../zip_data/duck_db_manager/database/',
    threads=4,
    export_dir='../../US/zip',
    industry_levels=[2, 5, 6],  # Specify industry levels if needed
    year=2019  # Specify the year for the data
)

In [8]:
# Export data for all states
file_paths = exporter.make_csv()  # No state specified

for file_path in file_paths:
    print(f"Exported file: {file_path}")

Exported file: /Users/nivyni/Documents/Work_SDE/ModelEarth/WorkingRepos/ModelEarth-community-zipcodes/industries/naics/duck_zipcode_db/exporter/../../US/zip/CA/US-CA-census-naics2-zip-2019.csv
Exported file: /Users/nivyni/Documents/Work_SDE/ModelEarth/WorkingRepos/ModelEarth-community-zipcodes/industries/naics/duck_zipcode_db/exporter/../../US/zip/CA/US-CA-census-naics5-zip-2019.csv
Exported file: /Users/nivyni/Documents/Work_SDE/ModelEarth/WorkingRepos/ModelEarth-community-zipcodes/industries/naics/duck_zipcode_db/exporter/../../US/zip/CA/US-CA-census-naics6-zip-2019.csv
Exported file: /Users/nivyni/Documents/Work_SDE/ModelEarth/WorkingRepos/ModelEarth-community-zipcodes/industries/naics/duck_zipcode_db/exporter/../../US/zip/GA/US-GA-census-naics2-zip-2019.csv
Exported file: /Users/nivyni/Documents/Work_SDE/ModelEarth/WorkingRepos/ModelEarth-community-zipcodes/industries/naics/duck_zipcode_db/exporter/../../US/zip/GA/US-GA-census-naics5-zip-2019.csv
Exported file: /Users/nivyni/Docume
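
The file names in the output above follow a visible pattern: `<export_dir>/<STATE>/US-<STATE>-census-naics<level>-zip-<year>.csv`. As a quick way to verify the export results, the sketch below rebuilds the expected paths from that pattern and lists any that are missing on disk. The helper names `expected_export_paths` and `missing_exports` are hypothetical and not part of `DataExporter`; the pattern is inferred only from the printed paths.

```python
from pathlib import Path


def expected_export_paths(export_dir, states, industry_levels, year):
    """Build the CSV paths implied by the naming pattern seen in the export output.

    Assumption: one file per (state, industry level) pair, named
    US-<STATE>-census-naics<level>-zip-<year>.csv inside a per-state folder.
    """
    root = Path(export_dir)
    return [
        root / state / f"US-{state}-census-naics{level}-zip-{year}.csv"
        for state in states
        for level in industry_levels
    ]


def missing_exports(export_dir, states, industry_levels, year):
    """Return the expected CSV paths that do not exist on disk."""
    return [
        path
        for path in expected_export_paths(export_dir, states, industry_levels, year)
        if not path.is_file()
    ]
```

For example, `missing_exports('../../US/zip', ['CA', 'GA'], [2, 5, 6], 2019)` should return an empty list if every file for those two states was written.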

### Export all annual data in a single run
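
One possible way to cover several years in one run is to create a separate exporter per year and collect the returned file paths. This is a minimal sketch, not the notebook's actual implementation: `export_years` and `make_exporter` are hypothetical names, and the only behavior assumed of the exporter is the `make_csv()` call used in the single-year cell above.

```python
def export_years(make_exporter, years):
    """Run one export per year and collect the returned file paths by year.

    make_exporter: callable taking a year and returning an object with a
    make_csv() method (assumed to return an iterable of file paths).
    """
    results = {}
    for year in years:
        exporter = make_exporter(year)
        results[year] = list(exporter.make_csv())
    return results


# Hypothetical usage with the notebook's DataExporter (years are placeholders):
# results = export_years(
#     lambda year: DataExporter(
#         base_db_path='../zip_data/duck_db_manager/database/',
#         threads=4,
#         export_dir='../../US/zip',
#         industry_levels=[2, 5, 6],
#         year=year,
#     ),
#     years=[2018, 2019],
# )
```

Passing a factory rather than a ready-made exporter keeps the loop independent of the constructor arguments, so the same helper works if the paths or industry levels change.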

## II. Unit Test After Export

In this section, we will validate the consistency of the exported CSV files with the original DuckDB database. The tests will ensure that the exported files contain the expected data and match the data from the database.

We will perform the following tests:
1. Check the number of rows in the exported CSV files.
2. Validate that the data in the CSV files matches the data in the DuckDB database.


### II.(a) Unit Test for Data Exporter Accuracy

This test validates the accuracy of data exported by the `DataExporter` class. It ensures that the exported CSV files correctly reflect the data in the DuckDB database.

#### Description

The notebook performs the following tasks:
- **Setup**: Creates a temporary DuckDB database with test data.
- **Export**: Uses the `DataExporter` to generate CSV files.
- **Validation**: 
  - Checks that the number of rows in the CSV files matches the database.
  - Verifies that the data in the CSV files is consistent with the database.
- **Teardown**: Removes test files and the temporary database.

#### Key Tests

- **Row Count Check**: Ensures that the number of rows in the exported CSV files matches those in the database.
- **Data Consistency Check**: Confirms that the data values and types in the CSV files align with the database.
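
The two checks can be illustrated with a small standard-library sketch. It is not the code in `test_data_exporter.py`; it assumes the CSV has a single header row and that the database rows have already been fetched as a list of tuples (for instance with DuckDB's `fetchall()`). The function name `compare_csv_to_rows` is hypothetical. Values are compared by their string form, so this covers values but not column types.

```python
import csv


def compare_csv_to_rows(csv_path, db_rows):
    """Return (row_count_ok, data_ok) for one CSV file against database rows.

    Assumptions: the first CSV line is a header, NULLs are written as empty
    strings, and row order does not matter.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        csv_rows = [tuple(row) for row in reader]

    # Normalize database values to the string form a CSV writer would emit.
    expected = [
        tuple("" if value is None else str(value) for value in row)
        for row in db_rows
    ]

    row_count_ok = len(csv_rows) == len(expected)
    # Sort both sides so the comparison ignores row order.
    data_ok = row_count_ok and sorted(csv_rows) == sorted(expected)
    return row_count_ok, data_ok
```

A string comparison is sensitive to formatting differences (for example `7` versus `7.0`), so a stricter test would parse each column back to its database type before comparing.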

#### Usage

Run the cell below to execute the tests. The script sets up the test environment, runs the validations, and cleans up afterward.


In [None]:
# Execute the test script
%run test_data_exporter.py

### II.(b) Data Export Validation Test

This test ensures that the CSV files exported from our DuckDB database match the data stored in the database.

**Description:**
The `DataExporterTest` class validates the accuracy of exported CSV files by comparing them with the data in the DuckDB database. This process checks for consistency across all states and industry levels, ensuring that the exported files correctly represent the database records.

**Key Tests:**
1. **Initialization:** Configure the `DataExporterTest` with the target year and directory paths.
2. **Data Fetching:** Retrieve data from the database for all states and industry levels.
3. **CSV Comparison:** Compare the contents of each CSV file with the corresponding database data.
4. **Mismatch Reporting:** Log any discrepancies with detailed information, including file paths and data previews.
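
The mismatch reporting step might look roughly like the sketch below. It is an illustration only and does not reproduce `DataExporterTest`: the function `report_mismatches`, the logger name, and the shape of the `results` mapping are all assumptions made for this example.

```python
import logging

logger = logging.getLogger("export_check")  # hypothetical logger name


def report_mismatches(results, preview_rows=3):
    """Log every CSV whose rows differ from the database rows.

    results: mapping of CSV file path -> (csv_rows, db_rows), where both
    values are lists of already-normalized row tuples.
    Returns the list of mismatched file paths.
    """
    mismatched = []
    for path, (csv_rows, db_rows) in results.items():
        if csv_rows == db_rows:
            continue
        mismatched.append(path)
        logger.warning(
            "Mismatch in %s: %d CSV rows vs %d database rows",
            path, len(csv_rows), len(db_rows),
        )
        # Show only the first few rows of each side as a preview.
        logger.warning("  CSV preview: %s", csv_rows[:preview_rows])
        logger.warning("  DB preview:  %s", db_rows[:preview_rows])
    if not mismatched:
        logger.info("All CSV files match the database data.")
    return mismatched
```

Returning the mismatched paths, in addition to logging them, lets a calling script fail the run (for example in CI) when the list is not empty.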

**Usage:**
To run the test:
1. Initialize the `DataExporterTest` with the desired year and directory paths.
2. Execute the test script.
3. Review the logs for any discrepancies between the CSV files and the database data.

The test will help ensure the integrity and accuracy of the exported data files.


In [3]:
from test_data_exporter_accuracy import DataExporterTest

In [4]:
# Replace the year with your desired year
tester = DataExporterTest(year=2019)
tester.run_test()

All CSV files match the database data by row count.
