# Featurebyte Sample Datasets

Featurebyte comes with three sample datasets for you to try:

1. French grocery: used in our detailed tutorials
2. Credit cards: used for the test exercises in the detailed tutorials
3. Healthcare: used in the quick start tutorial

Each of these datasets has been pre-installed in the Featurebyte Snowflake beta-testing environment.

# French Grocery Dataset

The French grocery dataset consists of 4 data tables recording grocery purchasing activity for each customer. It is stored in the BETA_TESTING_DATASETS database, with a schema name of GROCERY.

## Data Model
![FrenchGrocery.png](resources/FrenchGrocery.png)

## Data Dictionary

**Table: GroceryCustomer**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>RowID
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>GroceryCustomerGuid
   </td>
   <td>Unique identifier for each customer
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp at which the row becomes active
   </td>
  </tr>
  <tr>
   <td>CurrentRecord
   </td>
   <td>Flags whether the row is the latest version
   </td>
  </tr>
  <tr>
   <td>Gender
   </td>
   <td>Gender of the customer (male or female)
   </td>
  </tr>
  <tr>
   <td>Title
   </td>
   <td>Customer’s title e.g. Mr, Ms
   </td>
  </tr>
  <tr>
   <td>GivenName
   </td>
   <td>Customer’s given name e.g. Jane, John
   </td>
  </tr>
  <tr>
   <td>MiddleInitial
   </td>
   <td>Customer’s middle initial
   </td>
  </tr>
  <tr>
   <td>Surname
   </td>
   <td>Customer’s surname
   </td>
  </tr>
  <tr>
   <td>StreetAddress
   </td>
   <td>Customer’s residential street address 
   </td>
  </tr>
  <tr>
   <td>City
   </td>
   <td>Customer’s residential city
   </td>
  </tr>
  <tr>
   <td>State
   </td>
   <td>Customer’s residential state
   </td>
  </tr>
  <tr>
   <td>PostalCode
   </td>
   <td>Customer’s residential postal code (zip code)
   </td>
  </tr>
  <tr>
   <td>BrowserUserAgent
   </td>
   <td>Customer’s web browser user agent details
   </td>
  </tr>
  <tr>
   <td>DateOfBirth
   </td>
   <td>Customer’s date of birth
   </td>
  </tr>
  <tr>
   <td>Latitude
   </td>
   <td>Latitude of customer’s residential address
   </td>
  </tr>
  <tr>
   <td>Longitude
   </td>
   <td>Longitude of customer’s residential address
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>
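
GroceryCustomer keeps one row per version of a customer, so a lookup usually has to choose the version that was active at a given moment. The sketch below is a minimal illustration in plain Python, assuming a version stays active until the next `ValidFrom` for the same `GroceryCustomerGuid`; the sample rows and the `version_as_of` helper are invented for this example and are not part of the dataset or of Featurebyte.

```python
from datetime import datetime

# Made-up rows: two versions of one customer who moved city in June 2022.
customer_rows = [
    {"RowID": "r1", "GroceryCustomerGuid": "c1",
     "ValidFrom": datetime(2022, 1, 1), "CurrentRecord": False, "City": "Paris"},
    {"RowID": "r2", "GroceryCustomerGuid": "c1",
     "ValidFrom": datetime(2022, 6, 1), "CurrentRecord": True, "City": "Lyon"},
]


def version_as_of(rows, customer_guid, as_of):
    """Return the row that was active for `customer_guid` at `as_of`, or None."""
    candidates = [
        row for row in rows
        if row["GroceryCustomerGuid"] == customer_guid and row["ValidFrom"] <= as_of
    ]
    # Assumption: the active version is the one with the latest ValidFrom
    # that is not after `as_of`.
    return max(candidates, key=lambda row: row["ValidFrom"], default=None)


march = version_as_of(customer_rows, "c1", datetime(2022, 3, 1))
july = version_as_of(customer_rows, "c1", datetime(2022, 7, 1))
print(march["City"], july["City"])  # prints: Paris Lyon
```

Filtering on `CurrentRecord` alone would return only the latest version, which is not suitable when features must reflect what was known at an earlier point in time.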


**Table: GroceryInvoice**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>GroceryInvoiceGuid
   </td>
   <td>Unique identifier of each grocery invoice, primary key
   </td>
  </tr>
  <tr>
   <td>GroceryCustomerGuid
   </td>
   <td>Unique identifier of the customer who made the purchases
   </td>
  </tr>
  <tr>
   <td>Timestamp
   </td>
   <td>Timestamp of when the purchase occurred
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
  <tr>
   <td>Amount
   </td>
   <td>Total amount charged for all items in the invoice
   </td>
  </tr>
</table>


**Table: InvoiceItems**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>GroceryInvoiceItemGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>GroceryInvoiceGuid
   </td>
   <td>Unique identifier of each grocery invoice
   </td>
  </tr>
  <tr>
   <td>GroceryProductGuid
   </td>
   <td>Identifier for the type of grocery product item purchased
   </td>
  </tr>
  <tr>
   <td>Quantity
   </td>
   <td>Number of items purchased of this product type
   </td>
  </tr>
  <tr>
   <td>UnitPrice
   </td>
   <td>Price per item
   </td>
  </tr>
  <tr>
   <td>TotalCost
   </td>
   <td>Quantity times unit price
   </td>
  </tr>
  <tr>
   <td>Discount
   </td>
   <td>Total amount of discount applied
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: GroceryProduct**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>GroceryProductGuid
   </td>
   <td>Identifier for the type of grocery product item purchased, primary key
   </td>
  </tr>
  <tr>
   <td>ProductGroup
   </td>
   <td>Text name for the type of grocery product item purchased
   </td>
  </tr>
</table>

## DBML Definitions

In DBML format, the table definitions are:
```sql
Table GroceryCustomer {
  RowID               guid [pk, not null]
  GroceryCustomerGuid guid [not null, note: 'ENTITY: GROCERYCUSTOMER']
  ValidFrom           timestamp [not null]
  CurrentRecord       bool [not null]
  Gender              nvarchar [not null]
  Title               nvarchar [not null]
  GivenName           nvarchar [not null]
  MiddleInitial       nvarchar [not null]
  Surname             nvarchar [not null]
  StreetAddress       nvarchar [not null]
  City                nvarchar [not null]
  State               nvarchar [not null]
  PostalCode          nvarchar [not null]
  BrowserUserAgent    nvarchar [not null]
  DateOfBirth         nvarchar [not null]
  Latitude            float [not null]
  Longitude           float [not null]
  record_available_at timestamp [not null, note: 'record_creation_timestamp']

  Note: 'Table Type: SCD'
}


Table GroceryInvoice {
  GroceryInvoiceGuid  guid [pk, not null, note: 'ENTITY: GROCERYINVOICE']
  GroceryCustomerGuid guid [not null, note: 'ENTITY: GROCERYCUSTOMER']
  Timestamp           timestamp [not null]
  record_available_at timestamp [not null]
  Amount              float [not null]

  Note: 'Table Type: Event'
}


Table InvoiceItems {
  GroceryInvoiceItemGuid guid [pk, not null]
  GroceryInvoiceGuid     guid [not null, note: 'ENTITY: GROCERYINVOICE']
  GroceryProductGuid     guid [not null, note: 'ENTITY: GROCERYPRODUCT']
  Quantity               int [not null]
  UnitPrice              float [not null]
  TotalCost              float [not null]
  Discount               float [not null]
  record_available_at    timestamp [not null, note: 'record_creation_timestamp']

  Note: 'Table Type: Item'
}


Table GroceryProduct {
  GroceryProductGuid guid [pk, not null, note: 'ENTITY: GROCERYPRODUCT']
  ProductGroup       nvarchar [not null]

  Note: 'Table Type: Dimension'
}


Ref: GroceryCustomer.GroceryCustomerGuid > GroceryInvoice.GroceryCustomerGuid

Ref: GroceryInvoice.GroceryInvoiceGuid > InvoiceItems.GroceryInvoiceGuid

Ref: InvoiceItems.GroceryProductGuid > GroceryProduct.GroceryProductGuid
```
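
To see how the references fit together, the following sketch rebuilds three of the tables in an in-memory SQLite database and totals spend per customer and product group. It is only an illustration: the rows are made up, SQLite types stand in for the warehouse types, and the real data lives in Snowflake rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GroceryInvoice (
  GroceryInvoiceGuid TEXT PRIMARY KEY, GroceryCustomerGuid TEXT,
  Timestamp TEXT, Amount REAL);
CREATE TABLE InvoiceItems (
  GroceryInvoiceItemGuid TEXT PRIMARY KEY, GroceryInvoiceGuid TEXT,
  GroceryProductGuid TEXT, Quantity INTEGER, UnitPrice REAL,
  TotalCost REAL, Discount REAL);
CREATE TABLE GroceryProduct (
  GroceryProductGuid TEXT PRIMARY KEY, ProductGroup TEXT);

-- Made-up rows for illustration only
INSERT INTO GroceryProduct VALUES ('p1', 'Fruits'), ('p2', 'Cheese');
INSERT INTO GroceryInvoice VALUES
  ('i1', 'c1', '2022-01-05 10:00:00', 7.0),
  ('i2', 'c1', '2022-01-09 18:30:00', 4.0);
INSERT INTO InvoiceItems VALUES
  ('x1', 'i1', 'p1', 2, 1.5, 3.0, 0.0),
  ('x2', 'i1', 'p2', 1, 4.0, 4.0, 0.0),
  ('x3', 'i2', 'p1', 1, 4.0, 4.0, 0.0);
""")

# Follow item -> invoice (for the customer) and item -> product (for the group).
query = """
SELECT inv.GroceryCustomerGuid, prod.ProductGroup, SUM(item.TotalCost) AS Spend
FROM InvoiceItems AS item
JOIN GroceryInvoice AS inv ON inv.GroceryInvoiceGuid = item.GroceryInvoiceGuid
JOIN GroceryProduct AS prod ON prod.GroceryProductGuid = item.GroceryProductGuid
GROUP BY inv.GroceryCustomerGuid, prod.ProductGroup
ORDER BY prod.ProductGroup
"""
spend_by_group = conn.execute(query).fetchall()
print(spend_by_group)  # [('c1', 'Cheese', 4.0), ('c1', 'Fruits', 7.0)]
```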

# Credit Card Dataset

The credit card dataset consists of 6 data tables recording credit card transactions of bank customers. It is stored in the BETA_TESTING_DATASETS database, with a schema name of CREDIT_CARD.

## Data Model
![CreditCardDataModel.png](resources/CreditCardDataModel.png)

## Data Dictionary

**Table: BankCustomer**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>RowID
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>BankCustomerGuid
   </td>
   <td>Unique identifier for each bank customer
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp of when this version of the customer data became active
   </td>
  </tr>
  <tr>
   <td>ValidTo
   </td>
   <td>Timestamp of when this version of the customer data ceased being active
   </td>
  </tr>
  <tr>
   <td>Title
   </td>
   <td>Customer’s title e.g. Mr, Ms
   </td>
  </tr>
  <tr>
   <td>GivenName
   </td>
   <td>Customer’s given name e.g. Jane, John
   </td>
  </tr>
  <tr>
   <td>MiddleInitial
   </td>
   <td>Customer’s middle initial
   </td>
  </tr>
  <tr>
   <td>Surname
   </td>
   <td>Customer’s surname
   </td>
  </tr>
  <tr>
   <td>DateOfBirth
   </td>
   <td>Customer’s date of birth
   </td>
  </tr>
  <tr>
   <td>Gender
   </td>
   <td>Gender of the customer
   </td>
  </tr>
  <tr>
   <td>StreetAddress
   </td>
   <td>Customer’s residential street address
   </td>
  </tr>
  <tr>
   <td>City
   </td>
   <td>Customer’s residential city
   </td>
  </tr>
  <tr>
   <td>StateCode
   </td>
   <td>Customer’s residential state, a two character US state code
   </td>
  </tr>
  <tr>
   <td>ZipCode
   </td>
   <td>Customer’s residential postal code (zip code)
   </td>
  </tr>
  <tr>
   <td>Latitude
   </td>
   <td>Latitude of customer’s residential address
   </td>
  </tr>
  <tr>
   <td>Longitude
   </td>
   <td>Longitude of customer’s residential address
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
  <tr>
   <td>closed_at
   </td>
   <td>Timestamp when the person ceased to be a customer (a missing value whenever the person is a customer)
   </td>
  </tr>
</table>


**Table: CreditCard**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>RowID
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>BankCustomerGuid
   </td>
   <td>Unique identifier for each bank customer
   </td>
  </tr>
  <tr>
   <td>AccountID
   </td>
   <td>Unique identifier for each credit card
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp of when this version of the credit card data became active
   </td>
  </tr>
  <tr>
   <td>ValidTo
   </td>
   <td>Timestamp of when this version of the credit card data ceased being active
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
  <tr>
   <td>CardExpiry
   </td>
   <td>Month in which the credit card expires, in MM/YYYY format
   </td>
  </tr>
  <tr>
   <td>CVV2
   </td>
   <td>Card verification value, in NNN format
   </td>
  </tr>
  <tr>
   <td>closed_at
   </td>
   <td>Timestamp when the credit card ceased to be valid (a missing value whenever the credit card is valid)
   </td>
  </tr>
</table>
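
Because `CardExpiry` and `CVV2` are stored as text, they usually need parsing or validation before use. The helpers below are a small sketch of one way to do that with the standard library; the function names are hypothetical, and treating the expiry as the first day of the stated month is an assumption made for illustration.

```python
import re
from datetime import date


def parse_card_expiry(card_expiry):
    """Parse an 'MM/YYYY' string into the first day of the expiry month."""
    match = re.fullmatch(r"(0[1-9]|1[0-2])/([0-9]{4})", card_expiry)
    if match is None:
        raise ValueError(f"not in MM/YYYY format: {card_expiry!r}")
    month, year = int(match.group(1)), int(match.group(2))
    return date(year, month, 1)


def is_valid_cvv2(cvv2):
    """Return True when the value is exactly three ASCII digits (NNN)."""
    return re.fullmatch(r"[0-9]{3}", cvv2) is not None


print(parse_card_expiry("07/2025"))  # prints: 2025-07-01
print(is_valid_cvv2("042"), is_valid_cvv2("42"))  # prints: True False
```

Keeping `CVV2` as text preserves leading zeros, which would be lost if the column were cast to an integer.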


**Table: StateDetails**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>StateGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>StateCode
   </td>
   <td>Two character US state code, uniquely identifies each state
   </td>
  </tr>
  <tr>
   <td>StateName
   </td>
   <td>Text name of the state
   </td>
  </tr>
  <tr>
   <td>CentroidLatitude
   </td>
   <td>Latitude of the centroid of the state boundaries
   </td>
  </tr>
  <tr>
   <td>CentroidLongitude
   </td>
   <td>Longitude of the centroid of the state boundaries
   </td>
  </tr>
  <tr>
   <td>Area
   </td>
   <td>State’s area, in square miles
   </td>
  </tr>
  <tr>
   <td>CensusRegion
   </td>
   <td>State’s census region name
   </td>
  </tr>
  <tr>
   <td>HospitalCount
   </td>
   <td>Number of hospitals
   </td>
  </tr>
  <tr>
   <td>HospitalBedCount
   </td>
   <td>Number of hospital beds
   </td>
  </tr>
  <tr>
   <td>BelowPovertyLevel
   </td>
   <td>Percentage of the population living below the poverty line e.g. 12.3 = 12.3%
   </td>
  </tr>
  <tr>
   <td>Aged65Plus
   </td>
   <td>Percentage of the population aged 65 or more e.g. 12.3 = 12.3%
   </td>
  </tr>
  <tr>
   <td>TotalPopulation
   </td>
   <td>Number of residents
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp of when this version of the state data became active
   </td>
  </tr>
  <tr>
   <td>ValidTo
   </td>
   <td>Timestamp of when this version of the state data ceased being active
   </td>
  </tr>
</table>
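
Several StateDetails columns are easier to compare across states once they are normalised. The sketch below derives a few ratios from a single row held as a Python dict; the `state_ratios` helper, the output names, and the numbers are all invented for illustration and are not real state statistics.

```python
def state_ratios(state):
    """Derive simple ratios from one StateDetails row (a dict)."""
    population = state["TotalPopulation"]
    return {
        # Residents per square mile (Area is in square miles).
        "PopulationDensity": population / state["Area"],
        # Hospital beds per 1,000 residents.
        "BedsPer1000": 1000 * state["HospitalBedCount"] / population,
        # BelowPovertyLevel is a percentage, so 12.5 means 12.5%.
        "BelowPovertyCount": population * state["BelowPovertyLevel"] / 100,
    }


# Made-up numbers for a fictional state
example_state = {
    "StateCode": "XX", "Area": 50000.0, "TotalPopulation": 4000000.0,
    "HospitalBedCount": 10000, "BelowPovertyLevel": 12.5,
}
ratios = state_ratios(example_state)
print(ratios)
```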


**Table: CardTransactions**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>CardTransactionID
   </td>
   <td>Unique identifier for each transaction, primary key
   </td>
  </tr>
  <tr>
   <td>AccountID
   </td>
   <td>Unique identifier for each credit card
   </td>
  </tr>
  <tr>
   <td>Timestamp
   </td>
   <td>Timestamp of when the transaction occurred
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
  <tr>
   <td>CardTransactionDescription
   </td>
   <td>Text description of the type of transaction e.g. fast food
   </td>
  </tr>
  <tr>
   <td>Amount
   </td>
   <td>Transaction amount
   </td>
  </tr>
</table>


**Table: CardTransactionGroups**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>CardTransactionDescription
   </td>
   <td>Text description of the type of transaction e.g. fast food, primary key
   </td>
  </tr>
  <tr>
   <td>TransactionGroup
   </td>
   <td>Text description of the broader grouping type of the transaction e.g. bank fee
   </td>
  </tr>
</table>
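
CardTransactionGroups acts as a lookup from each transaction description to a broader group. As a rough illustration, the sketch below loads a few made-up rows into an in-memory SQLite database and totals transactions per group; the group name `dining` is invented, and SQLite stands in for the Snowflake environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CardTransactions (
  CardTransactionID TEXT PRIMARY KEY, AccountID TEXT, Timestamp TEXT,
  CardTransactionDescription TEXT, Amount REAL);
CREATE TABLE CardTransactionGroups (
  CardTransactionDescription TEXT PRIMARY KEY, TransactionGroup TEXT);

-- Made-up rows for illustration only
INSERT INTO CardTransactionGroups VALUES
  ('fast food', 'dining'), ('late fee', 'bank fee');
INSERT INTO CardTransactions VALUES
  ('t1', 'a1', '2022-03-01 12:00:00', 'fast food', 8.5),
  ('t2', 'a1', '2022-03-02 13:00:00', 'fast food', 11.5),
  ('t3', 'a1', '2022-03-03 09:00:00', 'late fee', 25.0);
""")

# Map each transaction to its group, then count and sum per group.
totals = conn.execute("""
  SELECT grp.TransactionGroup, COUNT(*) AS TransactionCount,
         SUM(txn.Amount) AS TotalAmount
  FROM CardTransactions AS txn
  JOIN CardTransactionGroups AS grp
    ON grp.CardTransactionDescription = txn.CardTransactionDescription
  GROUP BY grp.TransactionGroup
  ORDER BY grp.TransactionGroup
""").fetchall()
print(totals)  # [('bank fee', 1, 25.0), ('dining', 2, 20.0)]
```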


**Table: CardFraudStatus**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>RowID
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>CardTransactionID
   </td>
   <td>Unique identifier for each transaction
   </td>
  </tr>
  <tr>
   <td>Status
   </td>
   <td>Fraud status e.g. reported
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp of when this version of the fraud status data became active
   </td>
  </tr>
  <tr>
   <td>ValidTo
   </td>
   <td>Timestamp of when this version of the fraud status data ceased being active
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>

## DBML Definitions

In DBML format, the table definitions are:

```sql
Table CARDTRANSACTIONGROUPS {
  CardTransactionDescription varchar [pk, not null, note: 'ENTITY:CARDTRANSACTIONDESCRIPTION + dimension_id']
  TransactionGroup           varchar [not null, note: 'ENTITY:CARDTRANSACTIONGROUP']

  Note: 'Table Type: Dimension'
}


Table STATEDETAILS {
  StateGuid         guid [PK, not null]
  StateCode         nvarchar [not null, note: 'ENTITY: STATE']
  StateName         nvarchar [not null]
  CentroidLatitude  float [not null]
  CentroidLongitude float [not null]
  Area              float [not null]
  CensusRegion      nvarchar [not null]
  HospitalCount     int [not null]
  HospitalBedCount  int [not null]
  BelowPovertyLevel float [not null]
  Aged65Plus        float [not null]
  TotalPopulation   float [not null]
  ValidFrom         timestamp [not null]
  ValidTo           timestamp [not null]
  
  Note: 'Table Type: Dimension'
}


Table CARDTRANSACTIONS {
  CardTransactionID          guid [pk, note: 'ENTITY:CARDTRANSACTION']
  AccountID                  guid [not null, note: 'ENTITY:CREDITCARD']
  Timestamp                  timestamp [not null, note: 'event_timestamp']
  record_available_at        timestamp [not null, note: 'record_creation_timestamp']
  CardTransactionDescription varchar [not null, note: 'ENTITY:TRANSACTIONDESCRIPTION']
  Amount                     float [not null]

  Note: 'Table Type: Event'
}


Table CREDITCARD {
  RowID               guid [pk, note: 'surrogate_key']
  BankCustomerID      guid [not null, note: 'ENTITY:BANKCUSTOMER']
  AccountID           guid [not null, note: 'ENTITY:CREDITCARD + natural_key']
  ValidFrom           timestamp [not null, note: 'effective_timestamp']
  ValidTo             timestamp [not null, note: 'end_timestamp']
  record_available_at timestamp [not null, note: 'record_creation_timestamp']
  CardExpiry          varchar [not null, note: 'MM/YYYY format']
  CVV2                varchar [not null]
  closed_at           timestamp 

  Note: 'Table Type: SlowlyChanging'
}


Table BANKCUSTOMER {
  RowID               guid [pk, note: 'surrogate_key']
  BankCustomerID      guid [not null, note: 'ENTITY:BANKCUSTOMER']
  ValidFrom           timestamp [not null, note: 'effective_timestamp']
  ValidTo             timestamp [not null, note: 'end_timestamp']
  Title               nvarchar [not null]
  GivenName           nvarchar [not null]
  MiddleInitial       nvarchar [not null]
  Surname             nvarchar [not null]
  DateOfBirth         date [not null]
  Gender              nvarchar [not null]
  StreetAddress       nvarchar [not null]
  City                nvarchar [not null]
  StateCode           nvarchar [not null, note: 'ENTITY: STATE']
  ZipCode             nvarchar [not null]
  Latitude            float [not null]
  Longitude           float [not null]
  record_available_at timestamp [not null, note: 'record_creation_timestamp']
  closed_at           timestamp 

  Note: 'Table Type: SlowlyChanging'
}


Table CARDFRAUDSTATUS {
  RowID               guid [pk, not null]
  CardTransactionID   guid [not null, note: 'ENTITY:CARDTRANSACTION']
  Status              varchar [not null]
  ValidFrom           timestamp [not null, note: 'effective_timestamp']
  ValidTo             timestamp [not null, note: 'end_timestamp']
  record_available_at timestamp [not null, note: 'record_creation_timestamp']

  Note: 'Table Type: SCD'
}

Ref: CREDITCARD.AccountID > CARDTRANSACTIONS.AccountID

Ref: CARDTRANSACTIONS.CardTransactionDescription > CARDTRANSACTIONGROUPS.CardTransactionDescription

Ref: CARDTRANSACTIONS.CardTransactionID > CARDFRAUDSTATUS.CardTransactionID

Ref: BANKCUSTOMER.StateCode > STATEDETAILS.StateCode

Ref: BANKCUSTOMER.BankCustomerID > CREDITCARD.BankCustomerID
```
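
Because CREDITCARD is a slowly changing table, joining it to CARDTRANSACTIONS on `AccountID` alone can match several versions of the same card. The sketch below shows one possible point-in-time join in an in-memory SQLite database. It assumes that each version is active on the half-open interval from `ValidFrom` up to, but not including, `ValidTo`, and that the current version carries a far-future `ValidTo`; both are assumptions for illustration, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CreditCard (
  RowID TEXT PRIMARY KEY, AccountID TEXT,
  ValidFrom TEXT, ValidTo TEXT, CardExpiry TEXT);
CREATE TABLE CardTransactions (
  CardTransactionID TEXT PRIMARY KEY, AccountID TEXT,
  Timestamp TEXT, Amount REAL);

-- Made-up rows: card a1 is renewed on 2022-06-01, creating a new version
INSERT INTO CreditCard VALUES
  ('r1', 'a1', '2022-01-01 00:00:00', '2022-06-01 00:00:00', '06/2022'),
  ('r2', 'a1', '2022-06-01 00:00:00', '9999-12-31 00:00:00', '06/2026');
INSERT INTO CardTransactions VALUES
  ('t1', 'a1', '2022-03-15 10:00:00', 20.0),
  ('t2', 'a1', '2022-07-04 16:45:00', 35.0);
""")

# ISO-8601 text timestamps sort chronologically, so string comparison works here.
matched = conn.execute("""
  SELECT txn.CardTransactionID, card.RowID, card.CardExpiry
  FROM CardTransactions AS txn
  JOIN CreditCard AS card
    ON card.AccountID = txn.AccountID
   AND card.ValidFrom <= txn.Timestamp
   AND txn.Timestamp < card.ValidTo
  ORDER BY txn.CardTransactionID
""").fetchall()
print(matched)  # [('t1', 'r1', '06/2022'), ('t2', 'r2', '06/2026')]
```

Each transaction matches exactly one card version, so the join does not duplicate transaction rows.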

# Healthcare Dataset

The healthcare dataset consists of 12 data tables recording the medical activity for each patient. It is stored in the BETA_TESTING_DATASETS database, with a schema name of HEALTHCARE.

## Data Model

![HealthcareDataModel.png](resources/HealthcareDataModel.png)


## Data Dictionary

**Table: Patient**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>RowID
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>Gender
   </td>
   <td>Patient’s gender, M or F
   </td>
  </tr>
  <tr>
   <td>DateOfBirth
   </td>
   <td>Patient’s date of birth
   </td>
  </tr>
  <tr>
   <td>StateCode
   </td>
   <td>Patient’s residential state, a two character US state code
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp at which the row becomes active
   </td>
  </tr>
  <tr>
   <td>CurrentRecord
   </td>
   <td>Flags whether the row is the latest version
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: Diagnosis**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>RowID
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>DiagnosisGuid
   </td>
   <td>Unique identifier for each diagnosis
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp at which the row becomes active
   </td>
  </tr>
  <tr>
   <td>ValidTo
   </td>
   <td>Timestamp at which the row ceases to be active
   </td>
  </tr>
  <tr>
   <td>ICD9Code
   </td>
   <td>ICD9 code for the diagnosis
   </td>
  </tr>
  <tr>
   <td>DiagnosisDescription
   </td>
   <td>Text description of the diagnosis
   </td>
  </tr>
  <tr>
   <td>Acute
   </td>
   <td>Flags whether the medical condition is acute
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
  <tr>
   <td>closed_at
   </td>
   <td>Timestamp when the diagnosis ceased to apply to the patient (a missing value whenever the diagnosis applies to the patient)
   </td>
  </tr>
</table>


**Table: ICD9Hierarchy**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>ICD9Code
   </td>
   <td>ICD9 code for the diagnosis
   </td>
  </tr>
  <tr>
   <td>Group1
   </td>
   <td>Highest level ontological grouping of diagnoses for this ICD9 code
   </td>
  </tr>
  <tr>
   <td>Group2
   </td>
   <td>Second highest level ontological grouping of diagnoses for this ICD9 code
   </td>
  </tr>
  <tr>
   <td>Group3
   </td>
   <td>Lowest level ontological grouping of diagnoses for this ICD9 code
   </td>
  </tr>
</table>


**Table: LabResult**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>LabResultGuid
   </td>
   <td>Unique identifier for each lab result, primary key
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>ReportDate
   </td>
   <td>Date at which the lab tests were reported
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: LabObservation**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>LabObservationGuid
   </td>
   <td>Unique identifier for each lab observation, primary key
   </td>
  </tr>
  <tr>
   <td>LabResultGuid
   </td>
   <td>Unique identifier for the lab result containing this observation
   </td>
  </tr>
  <tr>
   <td>HL7Text
   </td>
   <td>Text description of the lab observation type
   </td>
  </tr>
  <tr>
   <td>ObservationValue
   </td>
   <td>Numeric observation value
   </td>
  </tr>
  <tr>
   <td>Units
   </td>
   <td>Units for the observation value e.g. mmol/L
   </td>
  </tr>
  <tr>
   <td>ReferenceRange
   </td>
   <td>Healthy range of observation values for this observation type
   </td>
  </tr>
  <tr>
   <td>AbnormalFlags
   </td>
   <td>Text description of the abnormality e.g. above normal; missing value if the observation value is not abnormal
   </td>
  </tr>
  <tr>
   <td>IsAbnormalValue
   </td>
   <td>Flags whether the observation value is abnormal
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: StateDetails**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>StateGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>StateCode
   </td>
   <td>Two character US state code, uniquely identifies each state
   </td>
  </tr>
  <tr>
   <td>StateName
   </td>
   <td>Text name of the state
   </td>
  </tr>
  <tr>
   <td>CentroidLatitude
   </td>
   <td>Latitude of the centroid of the state boundaries
   </td>
  </tr>
  <tr>
   <td>CentroidLongitude
   </td>
   <td>Longitude of the centroid of the state boundaries
   </td>
  </tr>
  <tr>
   <td>Area
   </td>
   <td>State’s area, in square miles
   </td>
  </tr>
  <tr>
   <td>CensusRegion
   </td>
   <td>State’s census region name
   </td>
  </tr>
  <tr>
   <td>HospitalCount
   </td>
   <td>Number of hospitals
   </td>
  </tr>
  <tr>
   <td>HospitalBedCount
   </td>
   <td>Number of hospital beds
   </td>
  </tr>
  <tr>
   <td>BelowPovertyLevel
   </td>
   <td>Percentage of the population living below the poverty line e.g. 12.3 = 12.3%
   </td>
  </tr>
  <tr>
   <td>Aged65Plus
   </td>
   <td>Percentage of the population aged 65 or more e.g. 12.3 = 12.3%
   </td>
  </tr>
  <tr>
   <td>TotalPopulation
   </td>
   <td>Number of residents
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp of when this version of the state data became active
   </td>
  </tr>
  <tr>
   <td>ValidTo
   </td>
   <td>Timestamp of when this version of the state data ceased being active
   </td>
  </tr>
</table>


**Table: PatientSmokingStatus**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>PatientSmokingStatusGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>Description
   </td>
   <td>Text description of the smoking status
   </td>
  </tr>
  <tr>
   <td>NISTcode
   </td>
   <td>NIST code for the smoking status
   </td>
  </tr>
  <tr>
   <td>ValidFrom
   </td>
   <td>Timestamp of when this version of a patient’s smoking status data became active
   </td>
  </tr>
  <tr>
   <td>CurrentRecord
   </td>
   <td>Flags whether the row is the latest version
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: Prescription**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>PrescriptionGuid
   </td>
   <td>Unique identifier for each prescription, primary key
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>PrescriptionDate
   </td>
   <td>Date at which the prescription was issued
   </td>
  </tr>
  <tr>
   <td>Quantity
   </td>
   <td>Number of medical product units to be provided
   </td>
  </tr>
  <tr>
   <td>NumberOfRefills
   </td>
   <td>Number of medical product refill units to be provided
   </td>
  </tr>
  <tr>
   <td>RefillAsNeeded
   </td>
   <td>Flags whether refills may be provided as the patient requires
   </td>
  </tr>
  <tr>
   <td>GenericAllowed
   </td>
   <td>Flags whether generic medical products are permitted
   </td>
  </tr>
  <tr>
   <td>NdcCode
   </td>
   <td>National drug code for the type of drug provided 
   </td>
  </tr>
  <tr>
   <td>MedicationName
   </td>
   <td>Text name of the medication
   </td>
  </tr>
  <tr>
   <td>MedicationStrength
   </td>
   <td>Strength of the medication e.g. 40 mg
   </td>
  </tr>
  <tr>
   <td>Schedule
   </td>
   <td>DEA medication schedule code, missing value if not a scheduled medication
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: MedicalProduct**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>NdcCode
   </td>
   <td>National drug code for the type of drug provided, primary key
   </td>
  </tr>
  <tr>
   <td>ProductType
   </td>
   <td>Text description of the product group e.g. human prescription drug
   </td>
  </tr>
  <tr>
   <td>ProprietaryName
   </td>
   <td>Proprietary branded name of the product
   </td>
  </tr>
  <tr>
   <td>NonProprietaryName
   </td>
   <td>Generic name of the product
   </td>
  </tr>
  <tr>
   <td>DosageForm
   </td>
   <td>Unit of dosage e.g. injection or capsule
   </td>
  </tr>
  <tr>
   <td>Route
   </td>
   <td>Method of delivery into the patient e.g. oral or intravenous
   </td>
  </tr>
  <tr>
   <td>Brand
   </td>
   <td>Manufacturer brand name
   </td>
  </tr>
  <tr>
   <td>ActiveSubstance
   </td>
   <td>List of active ingredients, delimited by semicolons
   </td>
  </tr>
  <tr>
   <td>Strength
   </td>
   <td>Dosage strength e.g. 80 mg/mL
   </td>
  </tr>
  <tr>
   <td>PharmaceuticalClasses
   </td>
   <td>Dictionary of pharmaceutical classes, in JSON format
   </td>
  </tr>
  <tr>
   <td>DeaSchedule
   </td>
   <td>DEA medication schedule code, missing value if not a scheduled medication
   </td>
  </tr>
</table>
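
Two MedicalProduct columns pack several values into one text field. The helpers below sketch one way to unpack them in Python, assuming `ActiveSubstance` uses a plain `;` separator and `PharmaceuticalClasses` holds a JSON object. The function names, the sample values, and the `EPC` key are invented for illustration and may not match the actual data.

```python
import json


def parse_active_substances(value):
    """Split semicolon-delimited ActiveSubstance text; missing value gives []."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def parse_pharmaceutical_classes(value):
    """Decode the JSON text stored in PharmaceuticalClasses."""
    return json.loads(value)


# Made-up example values
substances = parse_active_substances("AMOXICILLIN; CLAVULANATE POTASSIUM")
classes = parse_pharmaceutical_classes('{"EPC": ["Penicillin-class Antibacterial"]}')
print(substances)  # ['AMOXICILLIN', 'CLAVULANATE POTASSIUM']
print(classes["EPC"])  # ['Penicillin-class Antibacterial']
```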


**Table: Allergy**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>AllergyGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>StartDate
   </td>
   <td>Date at which allergy was first diagnosed
   </td>
  </tr>
  <tr>
   <td>AllergyType
   </td>
   <td>Text description of allergy type e.g. dairy
   </td>
  </tr>
  <tr>
   <td>CurrentRecord
   </td>
   <td>Flags whether the row is the latest version
   </td>
  </tr>
  <tr>
   <td>ReactionName
   </td>
   <td>Text description of allergic reaction e.g. rsh - generalized
   </td>
  </tr>
  <tr>
   <td>Severity
   </td>
   <td>Text coding of severity of allergy e.g. mild
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>


**Table: Visit**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>VisitGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>PatientGuid
   </td>
   <td>Unique identifier for each patient
   </td>
  </tr>
  <tr>
   <td>VisitDate
   </td>
   <td>Date at which the patient visited the physician
   </td>
  </tr>
  <tr>
   <td>Height
   </td>
   <td>Height of patient, in inches
   </td>
  </tr>
  <tr>
   <td>Weight
   </td>
   <td>Weight of patient, in pounds
   </td>
  </tr>
  <tr>
   <td>BMI
   </td>
   <td>Patient’s body mass index
   </td>
  </tr>
  <tr>
   <td>SystolicBP
   </td>
   <td>Patient’s systolic blood pressure
   </td>
  </tr>
  <tr>
   <td>DiastolicBP
   </td>
   <td>Patient’s diastolic blood pressure
   </td>
  </tr>
  <tr>
   <td>RespiratoryRate
   </td>
   <td>Patient’s respiratory rate
   </td>
  </tr>
  <tr>
   <td>Temperature
   </td>
   <td>Patient’s temperature in degrees Fahrenheit
   </td>
  </tr>
  <tr>
   <td>PhysicianSpecialty
   </td>
   <td>Text description of physician’s specialty
   </td>
  </tr>
  <tr>
   <td>record_available_at
   </td>
   <td>Timestamp when the data warehouse record was created
   </td>
  </tr>
</table>
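
Visit measurements are recorded in imperial units, which matters when deriving or cross-checking values. The sketch below converts `Height` and `Weight` to metric and applies the standard body mass index definition (kilograms divided by metres squared), and also converts `Temperature` to Celsius. Whether the dataset’s `BMI` column was computed exactly this way is not stated here, so treat the comparison as an assumption; the function names are hypothetical.

```python
LB_TO_KG = 0.45359237  # kilograms per pound, exact by definition
IN_TO_M = 0.0254       # metres per inch, exact by definition


def bmi_from_imperial(height_in, weight_lb):
    """Body mass index (kg / m^2) from height in inches and weight in pounds."""
    height_m = height_in * IN_TO_M
    weight_kg = weight_lb * LB_TO_KG
    return weight_kg / height_m ** 2


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Made-up patient: 70 inches tall, 154 pounds, temperature 98.6 F
print(round(bmi_from_imperial(70, 154), 1))   # prints: 22.1
print(round(fahrenheit_to_celsius(98.6), 1))  # prints: 37.0
```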


**Table: SpecialtyGroup**


<table>
  <tr>
   <td><strong>Column</strong>
   </td>
   <td><strong>Description</strong>
   </td>
  </tr>
  <tr>
   <td>SpecialtyGroupGuid
   </td>
   <td>Primary key
   </td>
  </tr>
  <tr>
   <td>PhysicianSpecialty
   </td>
   <td>Text description of physician’s specialty
   </td>
  </tr>
  <tr>
   <td>Specialty
   </td>
   <td>Lowest level grouping of physician’s specialty
   </td>
  </tr>
  <tr>
   <td>Group
   </td>
   <td>Highest level grouping of physician’s specialty
   </td>
  </tr>
</table>

## DBML Definitions

In DBML format, the table definitions are:

```sql
Table Allergy {
  AllergyGuid         guid [pk, not null, note: 'ENTITY: ALLERGYRECORD']
  PatientGuid         guid [not null, note: 'ENTITY: PATIENT']
  StartDate           date [not null]
  AllergyType         nvarchar [not null]
  CurrentRecord       bool [not null]
  ReactionName        nvarchar [not null]
  Severity            nvarchar [not null, note: 'SEMANTIC: Ordinal Categorical']
  record_available_at timestamp [not null]
  
  Note: 'Table Type: SCD'
}


Table Diagnosis {
  RowID                guid [pk, not null]
  DiagnosisGuid        guid [not null, note: 'ENTITY: DIAGNOSISRECORD']
  PatientGuid          guid [not null, note: 'ENTITY: PATIENT']
  ValidFrom            date [not null]
  ValidTo              date [null]
  ICD9Code             nvarchar [not null, note: 'ENTITY: ICD9CODE']
  DiagnosisDescription nvarchar [not null]
  Acute                bool [not null]
  record_available_at  timestamp [not null]
  closed_at            timestamp [null]

  Note: 'Table Type: SCD'
}


Table ICD9Hierarchy {
  ICD9Code nvarchar [pk, not null, note: 'ENTITY: ICD9CODE']
  Group1   nvarchar [not null]
  Group2   nvarchar [not null]
  Group3   nvarchar [not null]
  
  Note: 'Table Type: Dimension'
}


Table LabObservation {
  LabObservationGuid  guid [pk, not null, note: 'ENTITY: IMMUNIZATIONRECORD']
  LabResultGuid       guid [not null, note: 'ENTITY: LABRESULT']
  HL7Text             nvarchar [not null, note: 'Warning: High cardinality']
  ObservationValue    float [not null]
  Units               nvarchar [null, note: 'ENTITY: MEASUREMENTUNIT, Warning: High cardinality']
  ReferenceRange      nvarchar [null, note: 'Warning: High cardinality']
  AbnormalFlags       nvarchar [null, note: 'SEMANTIC: Ordinal Categorical']
  IsAbnormalValue     bool [not null, note: 'Used for target column']
  record_available_at timestamp [not null]

  Note: 'Table Type: Item'
}


Table LabResult {
  LabResultGuid       guid [pk, not null, note: 'ENTITY: LABRESULT']
  PatientGuid         guid [not null]
  ReportDate          date [not null]
  record_available_at timestamp [not null]

  Note: 'Table Type: Event'
}


Table MedicalProduct {
  NdcCode               nvarchar [not null, note: 'Warning: high cardinality']  
  ProductType           nvarchar [not null, note: 'Semantic Required']
  ProprietaryName       nvarchar [not null]
  NonProprietaryName    nvarchar [not null]
  DosageForm            nvarchar [not null]
  Route                 nvarchar [null]
  Brand                 nvarchar [not null]
  ActiveSubstance       nvarchar [null]
  Strength              nvarchar [not null]
  PharmaceuticalClasses dict [not null]
  DeaSchedule           nvarchar [null]
  
  Note: 'Table Type: Dimension'
}


Table Patient {
  RowID               guid [pk, not null]
  PatientGuid         guid [not null, note: 'ENTITY: PATIENT']
  Gender              nvarchar [not null]
  DateOfBirth         date [not null]
  StateCode           nvarchar [not null, note: 'ENTITY: STATE']
  ValidFrom           date [not null]
  CurrentRecord       bool [not null]
  record_available_at timestamp [not null]

  Note: 'Table Type: SCD'
}


Table PatientSmokingStatus {
  PatientSmokingStatusGuid guid [PK, not null, note: 'ENTITY: PATIENTSMOKINGSTATUS']
  PatientGuid              guid [null, note: 'ENTITY: PATIENT']
  Description              nvarchar [not null, note: 'SEMANTIC: HHS HIT 45 CFR §170.302(g)']
  NISTcode                 int [not null, note: 'WARNING: Number does not align with ordinality']
  ValidFrom                date [null]
  CurrentRecord            bool [not null]
  record_available_at      timestamp [not null]

  Note: 'Table Type: SCD'
}


Table Prescription {
  PrescriptionGuid    guid [PK, not null, note: 'ENTITY: PRESCRIPTION']
  PatientGuid         guid [not null, note: 'ENTITY: PATIENT']
  PrescriptionDate    timestamp [not null, note: 'Replace with a date']
  Quantity            int [not null, note: 'Clean up and replace with integer']
  NumberOfRefills     int [not null, note: 'Clean up and replace with integer']
  RefillAsNeeded      bool [not null]
  GenericAllowed      bool [not null]
  NdcCode             nvarchar [not null, note: 'Warning: high cardinality']
  MedicationName      nvarchar [not null, note: 'Warning: high cardinality']
  MedicationStrength  nvarchar [not null, note: 'Warning: high cardinality']
  Schedule            nvarchar [null]
  record_available_at timestamp [not null]

  Note: 'Table Type: Event'
}


Table SpecialtyGroup {
  SpecialtyGroupGuid guid [PK, not null]
  PhysicianSpecialty nvarchar [not null]
  Specialty          nvarchar [not null]
  Group              nvarchar [not null]

  Note: 'Table Type: Item. Content based on https://en.wikipedia.org/wiki/Medical_specialty#List_of_North_American_medical_specialties_and_others'
}


Table StateDetails {
  StateGuid         guid [PK, not null]
  StateCode         nvarchar [not null, note: 'ENTITY: STATE']
  StateName         nvarchar [not null]
  CentroidLatitude  float [not null]
  CentroidLongitude float [not null]
  Area              float [not null]
  CensusRegion      nvarchar [not null]
  HospitalCount     int [not null]
  HospitalBedCount  int [not null]
  BelowPovertyLevel float [not null]
  Aged65Plus        float [not null]
  TotalPopulation   float [not null]

  Note: 'Table Type: Dimension'
}

Table Visit {
  VisitGuid           guid [PK, not null, note: 'ENTITY: TRANSCRIPT']
  PatientGuid         guid [not null, note: 'ENTITY: PATIENT']
  VisitDate           date [not null, note: 'Replace with Date']
  Height              float [null]
  Weight              float [null]
  BMI                 float [null]
  SystolicBP          int [null]
  DiastolicBP         int [null]
  RespiratoryRate     int [null]
  Temperature         float [null]
  PhysicianSpecialty  nvarchar [not null, note: 'WARNING: High cardinality with rare labels']
  record_available_at timestamp [not null]
  
  Note: 'Table Type: Event'
}


Ref: LabResult.LabResultGuid < LabObservation.LabResultGuid

Ref: Patient.PatientGuid < LabResult.PatientGuid

Ref: Patient.PatientGuid < PatientSmokingStatus.PatientGuid

Ref: Patient.PatientGuid < Visit.PatientGuid

Ref: Patient.PatientGuid < Diagnosis.PatientGuid

Ref: Diagnosis.ICD9Code > ICD9Hierarchy.ICD9Code

Ref: Patient.StateCode > StateDetails.StateCode

Ref: Patient.PatientGuid < Allergy.PatientGuid

Ref: Patient.PatientGuid < Prescription.PatientGuid

Ref: Prescription.NdcCode > MedicalProduct.NdcCode

Ref: Visit.PhysicianSpecialty > SpecialtyGroup.PhysicianSpecialty
```
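
The `Ref:` lines at the end of the DBML block describe how the healthcare tables join to one another. As a minimal sketch (it is not part of Featurebyte), the Python below parses a few of those lines and lists, for each table, the lookup tables it points to. It assumes the DBML convention that `>` marks a many-to-one relationship and handles only the short-form `Ref:` syntax used in this document. The helper names `parse_refs` and `lookup_tables` are hypothetical.

```python
import re
from collections import defaultdict

# A few Ref lines copied from the healthcare DBML above.
DBML_REFS = """
Ref: Diagnosis.ICD9Code > ICD9Hierarchy.ICD9Code
Ref: Patient.StateCode > StateDetails.StateCode
Ref: Prescription.NdcCode > MedicalProduct.NdcCode
Ref: Visit.PhysicianSpecialty > SpecialtyGroup.PhysicianSpecialty
"""

# Table.Column <operator> Table.Column, tolerating uneven spacing
# around the operator (hypothetical pattern, short-form refs only).
REF_PATTERN = re.compile(
    r"^Ref:\s*(\w+)\.(\w+)\s*(<>|[<>-])\s*(\w+)\.(\w+)\s*$"
)


def parse_refs(dbml_text):
    """Return (left_table, left_col, op, right_table, right_col) tuples."""
    refs = []
    for line in dbml_text.splitlines():
        match = REF_PATTERN.match(line.strip())
        if match:
            refs.append(match.groups())
    return refs


def lookup_tables(refs):
    """Map each table on the 'many' side of a '>' ref to its lookup tables."""
    lookups = defaultdict(list)
    for left_table, _, op, right_table, _ in refs:
        if op == ">":
            lookups[left_table].append(right_table)
    return dict(lookups)


refs = parse_refs(DBML_REFS)
lookups = lookup_tables(refs)

for table, targets in lookups.items():
    print(f"{table} looks up: {', '.join(targets)}")
```

The same helpers can be pointed at the full DBML text. Lines that do not match the `Ref:` pattern, such as table and column definitions, are skipped.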