# NHANES Analysis of Anemia Variables - Part 2

In this notebook, we analyze the correlations among variables and the profiles of individuals examined in the NHANES survey. The focus is on variables related to anemia diagnosis. This work is based on the research of Patrícia Raia Nogueira Cavoto.


## Selecting the driver
The statement below sets the default data source to an in-memory H2 database (`jdbc:h2:mem:db`).

```sql
%defaultDatasource jdbc:h2:mem:db
```

# Profiles Network

* In this network, each node is a profile and each edge indicates that two profiles are correlated with a given intensity. The following queries prepare tables and views for the next notebook, which produces the final network.

## Starting Part 2

* This analysis is split into three notebooks due to memory constraints. The queries below load the CSV files produced by the previous part (`sql-network-02-nhanes-complete-p1`).

```sql
-- Load the per-individual matrix of binary (_b) variable flags produced in Part 1
DROP TABLE IF EXISTS CorrelationMatrix;
CREATE TABLE CorrelationMatrix (
  SEQN VARCHAR(8),
  profile VARCHAR(18),
  LBXIRN_b SMALLINT DEFAULT 0,
  LBXTIB_b SMALLINT DEFAULT 0,
  LBXSLDSI_b SMALLINT DEFAULT 0,
  LBXWBCSI_b SMALLINT DEFAULT 0,
  LBXLYPCT_b SMALLINT DEFAULT 0,
  LBXMOPCT_b SMALLINT DEFAULT 0,
  LBXNEPCT_b SMALLINT DEFAULT 0,
  LBXEOPCT_b SMALLINT DEFAULT 0,
  LBXBAPCT_b SMALLINT DEFAULT 0,
  LBXRBCSI_b SMALLINT DEFAULT 0,
  LBXHGB_b SMALLINT DEFAULT 0,
  LBXHCT_b SMALLINT DEFAULT 0,
  LBXMCVSI_b SMALLINT DEFAULT 0,
  LBXMCHSI_b SMALLINT DEFAULT 0,
  LBXMC_b SMALLINT DEFAULT 0,
  LBXRDW_b SMALLINT DEFAULT 0,
  LBXPLTSI_b SMALLINT DEFAULT 0,
  LBXMPSI_b SMALLINT DEFAULT 0,
  PRIMARY KEY(SEQN)
) AS SELECT
  SEQN, profile, LBXIRN_b, LBXTIB_b, LBXSLDSI_b, LBXWBCSI_b, LBXLYPCT_b, LBXMOPCT_b, LBXNEPCT_b, LBXEOPCT_b, LBXBAPCT_b, LBXRBCSI_b, LBXHGB_b, LBXHCT_b, LBXMCVSI_b, LBXMCHSI_b, LBXMC_b, LBXRDW_b, LBXPLTSI_b, LBXMPSI_b
FROM CSVREAD('../data/nhanes2005-2006/correlation-matrix.csv');

-- Load the vertical survey table: one row per (individual, variable) with its value and deviation
DROP TABLE IF EXISTS VerticalSurveyD;
CREATE TABLE VerticalSurveyD (
  SEQN VARCHAR(8),
  variable VARCHAR(8),
  value DECIMAL(7,1),
  deviation DECIMAL(7,1),
  PRIMARY KEY(SEQN, variable)
) AS SELECT
  SEQN, variable, value, deviation
FROM CSVREAD('../data/nhanes2005-2006/vertical-survey-deviation.csv');
```
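The `profile` column is declared as `VARCHAR(18)`, which matches the 18 binary `_b` flags of `CorrelationMatrix`. A plausible reading, and it is only an assumption since this notebook does not say how Part 1 builds it, is that the profile is those flags concatenated in column order, so two individuals share a profile exactly when the same variables are flagged. The minimal Python sketch below illustrates that hypothetical encoding; `make_profile` and `flagged_variables` are illustrative helpers, not part of the original workflow.

```python
# Hypothetical sketch: assumes `profile` is the 18 binary flags of CorrelationMatrix
# (1 = flagged, 0 = not flagged) concatenated in table column order.
FLAG_COLUMNS = [
    "LBXIRN_b", "LBXTIB_b", "LBXSLDSI_b", "LBXWBCSI_b", "LBXLYPCT_b",
    "LBXMOPCT_b", "LBXNEPCT_b", "LBXEOPCT_b", "LBXBAPCT_b", "LBXRBCSI_b",
    "LBXHGB_b", "LBXHCT_b", "LBXMCVSI_b", "LBXMCHSI_b", "LBXMC_b",
    "LBXRDW_b", "LBXPLTSI_b", "LBXMPSI_b",
]


def make_profile(flags: dict[str, int]) -> str:
    """Build the 18-character profile string; absent flags count as 0,
    mirroring the DEFAULT 0 of the table definition."""
    return "".join(str(flags.get(col, 0)) for col in FLAG_COLUMNS)


def flagged_variables(profile: str) -> list[str]:
    """Return the NHANES variable names (without the _b suffix) set to 1 in a profile."""
    return [col[:-2] for col, bit in zip(FLAG_COLUMNS, profile) if bit == "1"]


assert len(FLAG_COLUMNS) == 18  # one character per flag, matching VARCHAR(18)

profile = make_profile({"LBXHGB_b": 1, "LBXHCT_b": 1})
print(profile)                     # 000000000011000000
print(flagged_variables(profile))  # ['LBXHGB', 'LBXHCT']
```

Under this encoding, the profile string is a compact key for grouping individuals, which is what makes profiles usable as network nodes.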

## Correlation analysis of profile pairs

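* Whenever two individuals are both out of range on the same variable (a positive `deviation` for that variable), an edge is created between them.
* The edges are grouped by profile pair. For each pair, the number of co-occurring individuals/variables is computed. The view below emits one row per individual pair and shared variable; it does not aggregate by itself.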
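The pairing rule can be checked on a toy example. The sketch below uses an in-memory SQLite database as a stand-in for H2 and runs the same self-join logic as the `ProfileCorrelation` view on a handful of invented rows (individuals `A` to `D`, profiles `P1` to `P3`). It also confirms that, as long as every `SEQN` appears in `CorrelationMatrix`, a variable with $k$ out-of-range individuals contributes $\binom{k}{2}$ rows.

```python
import sqlite3
from math import comb

# Sketch only: SQLite in-memory database standing in for the notebook's H2 database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VerticalSurveyD (SEQN TEXT, variable TEXT, value REAL, deviation REAL,
                              PRIMARY KEY (SEQN, variable));
CREATE TABLE CorrelationMatrix (SEQN TEXT PRIMARY KEY, profile TEXT);
""")

# Invented toy rows: deviation > 0 marks an out-of-range value.
conn.executemany("INSERT INTO VerticalSurveyD VALUES (?, ?, ?, ?)", [
    ("A", "LBXHGB", 10.1, 1.9), ("B", "LBXHGB", 10.5, 1.5),
    ("C", "LBXHGB", 11.0, 0.5), ("D", "LBXHGB", 13.5, 0.0),
    ("A", "LBXHCT", 30.0, 5.0), ("C", "LBXHCT", 31.0, 4.0), ("D", "LBXHCT", 32.0, 3.0),
])
conn.executemany("INSERT INTO CorrelationMatrix VALUES (?, ?)", [
    ("A", "P1"), ("B", "P2"), ("C", "P3"), ("D", "P3"),
])

# Same logic as the ProfileCorrelation view: one row per pair of individuals
# (SEQN1 < SEQN2) that are both out of range on the same variable.
PAIRS_SQL = """
SELECT CM1.SEQN AS SEQN1, CM1.profile AS profile1, CM2.SEQN AS SEQN2, CM2.profile AS profile2
FROM VerticalSurveyD VS1
  JOIN VerticalSurveyD VS2 ON VS1.variable = VS2.variable AND VS1.SEQN < VS2.SEQN
  JOIN CorrelationMatrix CM1 ON CM1.SEQN = VS1.SEQN
  JOIN CorrelationMatrix CM2 ON CM2.SEQN = VS2.SEQN
WHERE VS1.deviation > 0 AND VS2.deviation > 0
ORDER BY SEQN1, SEQN2
"""
pairs = conn.execute(PAIRS_SQL).fetchall()

# Each variable with k out-of-range individuals yields C(k, 2) pairs
# (valid here because every SEQN is present in CorrelationMatrix).
k_per_variable = [k for (k,) in conn.execute(
    "SELECT COUNT(*) FROM VerticalSurveyD WHERE deviation > 0 GROUP BY variable")]
assert len(pairs) == sum(comb(k, 2) for k in k_per_variable)

for row in pairs:
    print(row)
```

Individuals `A` and `C` are out of range on both variables, so their pair appears twice. This is what lets the later grouping count co-occurring variables rather than only co-occurring individuals.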

```sql
DROP VIEW IF EXISTS ProfileCorrelation;

-- One row per pair of individuals (SEQN1 < SEQN2) who are both out of range
-- (deviation > 0) on the same variable, with the profile of each individual
CREATE VIEW ProfileCorrelation AS
  SELECT CM1.SEQN AS SEQN1, CM1.profile AS profile1, CM2.SEQN AS SEQN2, CM2.profile AS profile2
  FROM VerticalSurveyD VS1
    JOIN VerticalSurveyD VS2 ON VS1.variable = VS2.variable AND VS1.SEQN < VS2.SEQN
    JOIN CorrelationMatrix CM1 ON CM1.SEQN = VS1.SEQN
    JOIN CorrelationMatrix CM2 ON CM2.SEQN = VS2.SEQN
  WHERE VS1.deviation > 0 AND VS2.deviation > 0;

-- Write the profile pairs with similarity for the network
CALL CSVWRITE('../data/nhanes2005-2006/profile-pair-correlation.csv', 'SELECT * FROM ProfileCorrelation');
```

Output: `496235` (number of rows written by `CSVWRITE`)
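The exported file still lists pairs of individuals, not profile edges. Because the view orders each pair by `SEQN` rather than by profile, the same two profiles can appear as (`P1`, `P3`) or (`P3`, `P1`). The Python sketch below collapses the rows into weighted, undirected profile edges. It assumes, hypothetically, that an edge's weight is simply its row count; the notebook that builds the final network may weight edges differently. The sample rows are invented, and `profile_edge_weights` is an illustrative helper.

```python
import csv
import io
from collections import Counter


def profile_edge_weights(rows) -> Counter:
    """Count rows per unordered profile pair (hypothetical weighting: one unit per
    individual pair and shared out-of-range variable)."""
    weights = Counter()
    for row in rows:
        # Normalize header case, since H2 may upper-case unquoted column names.
        row = {key.upper(): value for key, value in row.items()}
        # Sort the two profiles so (P1, P3) and (P3, P1) count toward the same edge.
        edge = tuple(sorted((row["PROFILE1"], row["PROFILE2"])))
        weights[edge] += 1
    return weights


# Invented rows with the view's four columns; for the real export, open
# '../data/nhanes2005-2006/profile-pair-correlation.csv' and pass csv.DictReader(f).
sample = io.StringIO(
    "SEQN1,PROFILE1,SEQN2,PROFILE2\n"
    "A,P1,B,P2\n"
    "A,P1,C,P3\n"
    "A,P1,C,P3\n"
    "C,P3,D,P1\n"
)
weights = profile_edge_weights(csv.DictReader(sample))
print(weights.most_common())  # [(('P1', 'P3'), 3), (('P1', 'P2'), 1)]
```

Two individuals with the same profile produce a self-loop such as (`P2`, `P2`). Whether to keep such loops in the network is a modeling choice that this part of the analysis does not settle.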

## Finishing Part 2

* This analysis is split into three notebooks due to memory constraints. The next part is the notebook `sql-network-02-nhanes-complete-p3`.