In [1]:
%defaultDatasource jdbc:h2:mem:db

# Valores de referência NHANES para survey de 2005-2006
## Importando faixas normais de valores indicadas em documentação do NHANES

* Para cada variável é indicado
  - gênero a que se aplica
  - faixa de idade (ageStart até ageEnd)


* A faixa é indicada na forma de valor mínimo e máximo considerado normais

In [2]:
DROP TABLE IF EXISTS ReferenceRanges;
CREATE TABLE ReferenceRanges (
  variable VARCHAR(8),
  gender VARCHAR(1),
  ageStart SMALLINT,
  ageEnd SMALLINT,
  min DECIMAL(7,1),
  max DECIMAL(7,1),
  PRIMARY KEY(variable,gender,ageStart,ageEnd)
) AS SELECT
  variable,gender,ageStart,ageEnd,min,max
FROM CSVREAD('../data/nhanes2005-2006/reference-ranges.csv');

SELECT DISTINCT variable FROM ReferenceRanges;
SELECT * FROM ReferenceRanges;

# Survey NHANES 2005-2006
## Importando dados de survey NHANES 2005-2006

* Estão sendo considerados apenas os campos relacionados a anemia, para aqueles indivíduos que têm valores para todos os campos

In [3]:
DROP TABLE IF EXISTS Survey;
CREATE TABLE Survey (
  SEQN VARCHAR(8),
  RIAGENDR VARCHAR(1),
  RIDAGEYR SMALLINT,
  LBXIRN DECIMAL(7,1),
  LBXTIB DECIMAL(7,1),
  LBXSLDSI DECIMAL(7,1),
  LBXWBCSI DECIMAL(7,1),
  LBXLYPCT DECIMAL(7,1),
  LBXMOPCT DECIMAL(7,1),
  LBXNEPCT DECIMAL(7,1),
  LBXEOPCT DECIMAL(7,1),
  LBXBAPCT DECIMAL(7,1),
  LBXRBCSI DECIMAL(7,1),
  LBXHGB DECIMAL(7,1),
  LBXHCT DECIMAL(7,1),
  LBXMCVSI DECIMAL(7,1),
  LBXMCHSI DECIMAL(7,1),
  LBXMC DECIMAL(7,1),
  LBXRDW DECIMAL(7,1),
  LBXPLTSI DECIMAL(7,1),
  LBXMPSI DECIMAL(7,1),
  PRIMARY KEY(SEQN)
) AS SELECT
  SEQN,RIAGENDR,RIDAGEYR,LBXIRN,LBXTIB,LBXSLDSI,LBXWBCSI,LBXLYPCT,LBXMOPCT,LBXNEPCT,LBXEOPCT,LBXBAPCT,LBXRBCSI,LBXHGB,LBXHCT,LBXMCVSI,LBXMCHSI,LBXMC,LBXRDW,LBXPLTSI,LBXMPSI
FROM CSVREAD('../data/nhanes2005-2006/combined-selected-variables.csv');

SELECT COUNT(*) FROM Survey;
SELECT * FROM Survey;

# Códigos e descrição das variáveis NHANES

In [4]:
DROP TABLE IF EXISTS VariableDescription;
CREATE TABLE VariableDescription (
  variable VARCHAR(8),
  acronym VARCHAR(8),
  name VARCHAR(50),
  unit VARCHAR(30),
  file VARCHAR(20),
  ranges VARCHAR(100),
  PRIMARY KEY(variable)
) AS SELECT
  variable,acronym,name,unit,file,ranges
FROM CSVREAD('../data/nhanes2005-2006/reference-ranges-variables.csv');

SELECT * FROM VariableDescription;

# Preparando matriz binária para definir perfil de pessoas

* Para cada variável essa tabela define uma coluna extra binária _b que é inicializada com 0 e receberá 1 se aquela variável estiver fora da faixa NHANES.

## Geração da tabela inicializada com 0

In [5]:
DROP TABLE IF EXISTS SurveyB;
CREATE TABLE SurveyB (
  SEQN VARCHAR(8),
  RIAGENDR VARCHAR(1),
  RIDAGEYR SMALLINT,
  LBXIRN DECIMAL(7,1),
  LBXIRN_b SMALLINT DEFAULT 0,
  LBXTIB DECIMAL(7,1),
  LBXTIB_b SMALLINT DEFAULT 0,
  LBXSLDSI DECIMAL(7,1),
  LBXSLDSI_b SMALLINT DEFAULT 0,
  LBXWBCSI DECIMAL(7,1),
  LBXWBCSI_b SMALLINT DEFAULT 0,
  LBXLYPCT DECIMAL(7,1),
  LBXLYPCT_b SMALLINT DEFAULT 0,
  LBXMOPCT DECIMAL(7,1),
  LBXMOPCT_b SMALLINT DEFAULT 0,
  LBXNEPCT DECIMAL(7,1),
  LBXNEPCT_b SMALLINT DEFAULT 0,
  LBXEOPCT DECIMAL(7,1),
  LBXEOPCT_b SMALLINT DEFAULT 0,
  LBXBAPCT DECIMAL(7,1),
  LBXBAPCT_b SMALLINT DEFAULT 0,
  LBXRBCSI DECIMAL(7,1),
  LBXRBCSI_b SMALLINT DEFAULT 0,
  LBXHGB DECIMAL(7,1),
  LBXHGB_b SMALLINT DEFAULT 0,
  LBXHCT DECIMAL(7,1),
  LBXHCT_b SMALLINT DEFAULT 0,
  LBXMCVSI DECIMAL(7,1),
  LBXMCVSI_b SMALLINT DEFAULT 0,
  LBXMCHSI DECIMAL(7,1),
  LBXMCHSI_b SMALLINT DEFAULT 0,
  LBXMC DECIMAL(7,1),
  LBXMC_b SMALLINT DEFAULT 0,
  LBXRDW DECIMAL(7,1),
  LBXRDW_b SMALLINT DEFAULT 0,
  LBXPLTSI DECIMAL(7,1),
  LBXPLTSI_b SMALLINT DEFAULT 0,
  LBXMPSI DECIMAL(7,1),
  LBXMPSI_b SMALLINT DEFAULT 0,
  PRIMARY KEY(SEQN)
) AS SELECT
  SEQN,RIAGENDR,RIDAGEYR,LBXIRN,0,LBXTIB,0,LBXSLDSI,0,LBXWBCSI,0,LBXLYPCT,0,LBXMOPCT,0,LBXNEPCT,0,LBXEOPCT,0,LBXBAPCT,0,LBXRBCSI,0,LBXHGB,0,LBXHCT,0,LBXMCVSI,0,LBXMCHSI,0,LBXMC,0,LBXRDW,0,LBXPLTSI,0,LBXMPSI,0
FROM CSVREAD('../data/nhanes2005-2006/combined-selected-variables.csv');

SELECT COUNT(*) FROM SurveyB;
SELECT * FROM SurveyB;

## Ensaio de verificação

* Ensaio de associação da variável Iron (LBXIRN) com os limites estabelecidos pela NHANES.

In [6]:
SELECT SB.LBXIRN, SB.LBXIRN_b, RR.gender, RR.ageStart, RR.ageEnd, RR.min, RR.max
FROM SurveyB SB, ReferenceRanges RR
WHERE RR.variable='LBXIRN' AND SB.RIAGENDR=RR.gender AND SB.RIDAGEYR>=RR.ageStart AND SB.RIDAGEYR<=RR.ageEnd;

## Construção da matriz

* Cada variável é comparada com os limites da NHANES e as colunas binárias _b são atualizadas.

In [7]:
-- Computing LBXIRN
UPDATE SurveyB SB
SET SB.LBXIRN_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXIRN' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXIRN<RRb.min);
UPDATE SurveyB SB
SET SB.LBXIRN_b = 1
WHERE SB.LBXIRN_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXIRN' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXIRN>RRb.max);

-- Computing LBXTIB
UPDATE SurveyB SB
SET SB.LBXTIB_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXTIB' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXTIB<RRb.min);
UPDATE SurveyB SB
SET SB.LBXTIB_b = 1
WHERE SB.LBXTIB_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXTIB' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXTIB>RRb.max);

-- Computing LBXSLDSI
UPDATE SurveyB SB
SET SB.LBXSLDSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXSLDSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXSLDSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXSLDSI_b = 1
WHERE SB.LBXSLDSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXSLDSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXSLDSI>RRb.max);

-- Computing LBXWBCSI
UPDATE SurveyB SB
SET SB.LBXWBCSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXWBCSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXWBCSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXWBCSI = 1
WHERE SB.LBXWBCSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXWBCSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXWBCSI>RRb.max);

-- Computing LBXLYPCT
UPDATE SurveyB SB
SET SB.LBXLYPCT_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXLYPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXLYPCT<RRb.min);
UPDATE SurveyB SB
SET SB.LBXLYPCT =1
WHERE SB.LBXLYPCT_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXLYPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXLYPCT>RRb.max);

-- Computing LBXMOPCT
UPDATE SurveyB SB
SET SB.LBXMOPCT_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMOPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMOPCT<RRb.min);
UPDATE SurveyB SB
SET SB.LBXMOPCT = 1
WHERE SB.LBXMOPCT_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMOPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMOPCT>RRb.max);

-- Computing LBXNEPCT
UPDATE SurveyB SB
SET SB.LBXNEPCT_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXNEPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXNEPCT<RRb.min);
UPDATE SurveyB SB
SET SB.LBXNEPCT = 1
WHERE SB.LBXNEPCT_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXNEPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXNEPCT>RRb.max);

-- Computing LBXEOPCT
UPDATE SurveyB SB
SET SB.LBXEOPCT_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXEOPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXEOPCT<RRb.min);
UPDATE SurveyB SB
SET SB.LBXEOPCT = 1
WHERE SB.LBXEOPCT_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXEOPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXEOPCT>RRb.max);

-- Computing LBXBAPCT
UPDATE SurveyB SB
SET SB.LBXBAPCT_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXBAPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXBAPCT<RRb.min);
UPDATE SurveyB SB
SET SB.LBXBAPCT = 1
WHERE SB.LBXBAPCT_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXBAPCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXBAPCT>RRb.max);

-- Computing LBXRBCSI
UPDATE SurveyB SB
SET SB.LBXRBCSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRBCSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXRBCSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXRBCSI = 1
WHERE SB.LBXRBCSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRBCSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXRBCSI>RRb.max);

-- Computing LBXHGB
UPDATE SurveyB SB
SET SB.LBXHGB_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHGB' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXHGB<RRb.min);
UPDATE SurveyB SB
SET SB.LBXHGB = 1
WHERE SB.LBXHGB_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHGB' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXHGB>RRb.max);

-- Computing LBXHCT
UPDATE SurveyB SB
SET SB.LBXHCT_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXHCT<RRb.min);
UPDATE SurveyB SB
SET SB.LBXHCT = 1
WHERE SB.LBXHCT_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHCT' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXHCT>RRb.max);

-- Computing LBXMCVSI
UPDATE SurveyB SB
SET SB.LBXMCVSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCVSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMCVSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXMCVSI = 1
WHERE SB.LBXMCVSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCVSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMCVSI>RRb.max);

-- Computing LBXMCHSI
UPDATE SurveyB SB
SET SB.LBXMCHSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCHSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMCHSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXMCHSI = 1
WHERE SB.LBXMCHSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCHSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMCHSI>RRb.max);

-- Computing LBXMC
UPDATE SurveyB SB
SET SB.LBXMC_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMC' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMC<RRb.min);
UPDATE SurveyB SB
SET SB.LBXMC = 1
WHERE SB.LBXMC_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMC' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMC>RRb.max);

-- Computing LBXRDW
UPDATE SurveyB SB
SET SB.LBXRDW_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRDW' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXRDW<RRb.min);
UPDATE SurveyB SB
SET SB.LBXRDW = 1
WHERE SB.LBXRDW_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRDW' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXRDW>RRb.max);

-- Computing LBXPLTSI
UPDATE SurveyB SB
SET SB.LBXPLTSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXPLTSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXPLTSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXPLTSI = 1
WHERE SB.LBXPLTSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXPLTSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXPLTSI>RRb.max);

-- Computing LBXMPSI
UPDATE SurveyB SB
SET SB.LBXMPSI_b = 1
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMPSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMPSI<RRb.min);
UPDATE SurveyB SB
SET SB.LBXMPSI = 1
WHERE SB.LBXMPSI_b = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMPSI' AND SB.RIAGENDR=RRb.gender AND SB.RIDAGEYR>=RRb.ageStart AND SB.RIDAGEYR<=RRb.ageEnd AND SB.LBXMPSI>RRb.max);

## Matriz final

* Construção da visão da matriz final que tem a indentificação da pessoa, a matriz binária _b e um perfil produzida pela concatenação da linha da matriz binária.
* O perfil representa de forma binária o que está anormal (fora dos limites) na pessoa.
* Só vão para a matriz final as pessoas que possuem algum indicador anormal.

In [8]:
DROP VIEW IF EXISTS DeviationProfiles;
DROP VIEW IF EXISTS CorrelationMatrix;

CREATE VIEW CorrelationMatrix AS
SELECT DISTINCT SB.SEQN, 
  CONCAT(SB.LBXIRN_b, SB.LBXTIB_b, SB.LBXSLDSI_b, SB.LBXWBCSI_b, SB.LBXLYPCT_b, SB.LBXMOPCT_b, SB.LBXNEPCT_b, SB.LBXEOPCT_b, SB.LBXBAPCT_b, SB.LBXRBCSI_b, SB.LBXHGB_b, SB.LBXHCT_b, SB.LBXMCVSI_b, SB.LBXMCHSI_b, SB.LBXMC_b, SB.LBXRDW_b, SB.LBXPLTSI_b, SB.LBXMPSI_b) AS profile,
  SB.LBXIRN_b, SB.LBXTIB_b, SB.LBXSLDSI_b, SB.LBXWBCSI_b, SB.LBXLYPCT_b, SB.LBXMOPCT_b, SB.LBXNEPCT_b, SB.LBXEOPCT_b, SB.LBXBAPCT_b, SB.LBXRBCSI_b, SB.LBXHGB_b, SB.LBXHCT_b, SB.LBXMCVSI_b, SB.LBXMCHSI_b, SB.LBXMC_b, SB.LBXRDW_b, SB.LBXPLTSI_b, SB.LBXMPSI_b
FROM SurveyB SB, ReferenceRanges RR
WHERE SB.RIAGENDR=RR.gender AND SB.RIDAGEYR>=RR.ageStart AND SB.RIDAGEYR<=RR.ageEnd AND
(LBXIRN_b>0 OR LBXTIB_b>0 OR LBXSLDSI_b>0 OR LBXWBCSI_b>0 OR LBXLYPCT_b>0 OR LBXMOPCT_b>0 OR LBXNEPCT_b>0 OR LBXEOPCT_b>0 OR LBXBAPCT_b>0 OR LBXRBCSI_b>0 OR LBXHGB_b>0 OR LBXHCT_b>0 OR LBXMCVSI_b>0 OR LBXMCHSI_b>0 OR LBXMC_b>0 OR LBXRDW_b>0 OR LBXPLTSI_b>0 OR LBXMPSI_b>0);

SELECT COUNT(*) FROM CorrelationMatrix;
SELECT * FROM CorrelationMatrix;

## Gravação da matriz binária

* Gravação da matriz binária em arquivo CSV.
* É possível fazer download do arquivo.

In [9]:
CALL CSVWRITE('../data/nhanes2005-2006/correlation-matrix.csv', 'SELECT * FROM CorrelationMatrix');

1418

# Rede de perfis

* As pessoas serão aqui associadas a partir de seus perfis binários produzindo uma rede de perfis e suas correlações.

## Agrupamento de perfis

* Os perfis são agrupados conforme o padrão binário e é registrado o número de pessoas com aquele perfil.

In [10]:
DROP VIEW IF EXISTS DeviationProfiles;

CREATE VIEW DeviationProfiles AS
SELECT CM.profile, COUNT(*) AS individuals
FROM CorrelationMatrix CM
GROUP BY CM.profile;

SELECT SUM(individuals) FROM DeviationProfiles;
SELECT * FROM DeviationProfiles;

## Gravação de perfis

* Os perfis e respectivo número de pessoas associadas é gravado em CSV.

In [11]:
CALL CSVWRITE('../data/nhanes2005-2006/deviation-profiles.csv', 'SELECT DP.profile AS id, DP.individuals AS weight FROM DeviationProfiles DP');

181

# Matriz com intensidade de desvio

* Esta segunda matriz registra não somente que variáveis da pessoa ultrapassam os limites, mas quanto elas ultrapassam.

## Geração de nova matriz de base inicializada com 0

In [12]:
DROP TABLE IF EXISTS SurveyD;
CREATE TABLE SurveyD (
  SEQN VARCHAR(8),
  RIAGENDR VARCHAR(1),
  RIDAGEYR SMALLINT,
  LBXIRN DECIMAL(7,1),
  LBXIRN_d DECIMAL(7,1) DEFAULT 0,
  LBXTIB DECIMAL(7,1),
  LBXTIB_d DECIMAL(7,1) DEFAULT 0,
  LBXSLDSI DECIMAL(7,1),
  LBXSLDSI_d DECIMAL(7,1) DEFAULT 0,
  LBXWBCSI DECIMAL(7,1),
  LBXWBCSI_d DECIMAL(7,1) DEFAULT 0,
  LBXLYPCT DECIMAL(7,1),
  LBXLYPCT_d DECIMAL(7,1) DEFAULT 0,
  LBXMOPCT DECIMAL(7,1),
  LBXMOPCT_d DECIMAL(7,1) DEFAULT 0,
  LBXNEPCT DECIMAL(7,1),
  LBXNEPCT_d DECIMAL(7,1) DEFAULT 0,
  LBXEOPCT DECIMAL(7,1),
  LBXEOPCT_d DECIMAL(7,1) DEFAULT 0,
  LBXBAPCT DECIMAL(7,1),
  LBXBAPCT_d DECIMAL(7,1) DEFAULT 0,
  LBXRBCSI DECIMAL(7,1),
  LBXRBCSI_d DECIMAL(7,1) DEFAULT 0,
  LBXHGB DECIMAL(7,1),
  LBXHGB_d DECIMAL(7,1) DEFAULT 0,
  LBXHCT DECIMAL(7,1),
  LBXHCT_d DECIMAL(7,1) DEFAULT 0,
  LBXMCVSI DECIMAL(7,1),
  LBXMCVSI_d DECIMAL(7,1) DEFAULT 0,
  LBXMCHSI DECIMAL(7,1),
  LBXMCHSI_d DECIMAL(7,1) DEFAULT 0,
  LBXMC DECIMAL(7,1),
  LBXMC_d DECIMAL(7,1) DEFAULT 0,
  LBXRDW DECIMAL(7,1),
  LBXRDW_d DECIMAL(7,1) DEFAULT 0,
  LBXPLTSI DECIMAL(7,1),
  LBXPLTSI_d DECIMAL(7,1) DEFAULT 0,
  LBXMPSI DECIMAL(7,1),
  LBXMPSI_d DECIMAL(7,1) DEFAULT 0,
  PRIMARY KEY(SEQN)
) AS SELECT
  SEQN,RIAGENDR,RIDAGEYR,LBXIRN,0,LBXTIB,0,LBXSLDSI,0,LBXWBCSI,0,LBXLYPCT,0,LBXMOPCT,0,LBXNEPCT,0,LBXEOPCT,0,LBXBAPCT,0,LBXRBCSI,0,LBXHGB,0,LBXHCT,0,LBXMCVSI,0,LBXMCHSI,0,LBXMC,0,LBXRDW,0,LBXPLTSI,0,LBXMPSI,0
FROM CSVREAD('../data/nhanes2005-2006/combined-selected-variables.csv');

SELECT * FROM SurveyD;

## Cálculo do desvio do limite por pessoa e variável

In [13]:
-- Computing LBXIRN
UPDATE SurveyD SD
SET SD.LBXIRN_d =
(SELECT RRa.min-SD.LBXIRN
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXIRN' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXIRN<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXIRN' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXIRN<RRb.min);
UPDATE SurveyD SD
SET SD.LBXIRN_d =
(SELECT SD.LBXIRN-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXIRN' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXIRN>RRa.max)
WHERE SD.LBXIRN_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXIRN' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXIRN>RRb.max);

-- Computing LBXTIB
UPDATE SurveyD SD
SET SD.LBXTIB_d =
(SELECT RRa.min-SD.LBXTIB
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXTIB' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXTIB<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXTIB' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXTIB<RRb.min);
UPDATE SurveyD SD
SET SD.LBXTIB_d =
(SELECT SD.LBXTIB-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXTIB' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXTIB>RRa.max)
WHERE SD.LBXTIB_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXTIB' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXTIB>RRb.max);

-- Computing LBXSLDSI
UPDATE SurveyD SD
SET SD.LBXSLDSI_d =
(SELECT RRa.min-SD.LBXSLDSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXSLDSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXSLDSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXSLDSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXSLDSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXSLDSI_d =
(SELECT SD.LBXSLDSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXSLDSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXSLDSI>RRa.max)
WHERE SD.LBXSLDSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXSLDSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXSLDSI>RRb.max);

-- Computing LBXWBCSI
UPDATE SurveyD SD
SET SD.LBXWBCSI_d =
(SELECT RRa.min-SD.LBXWBCSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXWBCSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXWBCSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXWBCSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXWBCSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXWBCSI_d =
(SELECT SD.LBXWBCSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXWBCSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXWBCSI>RRa.max)
WHERE SD.LBXWBCSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXWBCSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXWBCSI>RRb.max);

-- Computing LBXLYPCT
UPDATE SurveyD SD
SET SD.LBXLYPCT_d =
(SELECT RRa.min-SD.LBXLYPCT
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXLYPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXLYPCT<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXLYPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXLYPCT<RRb.min);
UPDATE SurveyD SD
SET SD.LBXLYPCT_d =
(SELECT SD.LBXLYPCT-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXLYPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXLYPCT>RRa.max)
WHERE SD.LBXLYPCT_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXLYPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXLYPCT>RRb.max);

-- Computing LBXMOPCT
UPDATE SurveyD SD
SET SD.LBXMOPCT_d =
(SELECT RRa.min-SD.LBXMOPCT
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMOPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMOPCT<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMOPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMOPCT<RRb.min);
UPDATE SurveyD SD
SET SD.LBXMOPCT_d =
(SELECT SD.LBXMOPCT-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMOPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMOPCT>RRa.max)
WHERE SD.LBXMOPCT_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMOPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMOPCT>RRb.max);

-- Computing LBXNEPCT
UPDATE SurveyD SD
SET SD.LBXNEPCT_d =
(SELECT RRa.min-SD.LBXNEPCT
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXNEPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXNEPCT<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXNEPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXNEPCT<RRb.min);
UPDATE SurveyD SD
SET SD.LBXNEPCT_d =
(SELECT SD.LBXNEPCT-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXNEPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXNEPCT>RRa.max)
WHERE SD.LBXNEPCT_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXNEPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXNEPCT>RRb.max);

-- Computing LBXEOPCT
UPDATE SurveyD SD
SET SD.LBXEOPCT_d =
(SELECT RRa.min-SD.LBXEOPCT
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXEOPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXEOPCT<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXEOPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXEOPCT<RRb.min);
UPDATE SurveyD SD
SET SD.LBXEOPCT_d =
(SELECT SD.LBXEOPCT-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXEOPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXEOPCT>RRa.max)
WHERE SD.LBXEOPCT_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXEOPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXEOPCT>RRb.max);

-- Computing LBXBAPCT
UPDATE SurveyD SD
SET SD.LBXBAPCT_d =
(SELECT RRa.min-SD.LBXBAPCT
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXBAPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXBAPCT<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXBAPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXBAPCT<RRb.min);
UPDATE SurveyD SD
SET SD.LBXBAPCT_d =
(SELECT SD.LBXBAPCT-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXBAPCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXBAPCT>RRa.max)
WHERE SD.LBXBAPCT_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXBAPCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXBAPCT>RRb.max);

-- Computing LBXRBCSI
UPDATE SurveyD SD
SET SD.LBXRBCSI_d =
(SELECT RRa.min-SD.LBXRBCSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXRBCSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXRBCSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRBCSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXRBCSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXRBCSI_d =
(SELECT SD.LBXRBCSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXRBCSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXRBCSI>RRa.max)
WHERE SD.LBXRBCSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRBCSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXRBCSI>RRb.max);

-- Computing LBXHGB
UPDATE SurveyD SD
SET SD.LBXHGB_d =
(SELECT RRa.min-SD.LBXHGB
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXHGB' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXHGB<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHGB' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXHGB<RRb.min);
UPDATE SurveyD SD
SET SD.LBXHGB_d =
(SELECT SD.LBXHGB-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXHGB' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXHGB>RRa.max)
WHERE SD.LBXHGB_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHGB' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXHGB>RRb.max);

-- Computing LBXHCT
UPDATE SurveyD SD
SET SD.LBXHCT_d =
(SELECT RRa.min-SD.LBXHCT
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXHCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXHCT<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXHCT<RRb.min);
UPDATE SurveyD SD
SET SD.LBXHCT_d =
(SELECT SD.LBXHCT-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXHCT' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXHCT>RRa.max)
WHERE SD.LBXHCT_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXHCT' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXHCT>RRb.max);

-- Computing LBXMCVSI
UPDATE SurveyD SD
SET SD.LBXMCVSI_d =
(SELECT RRa.min-SD.LBXMCVSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMCVSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMCVSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCVSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMCVSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXMCVSI_d =
(SELECT SD.LBXMCVSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMCVSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMCVSI>RRa.max)
WHERE SD.LBXMCVSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCVSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMCVSI>RRb.max);

-- Computing LBXMCHSI
UPDATE SurveyD SD
SET SD.LBXMCHSI_d =
(SELECT RRa.min-SD.LBXMCHSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMCHSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMCHSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCHSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMCHSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXMCHSI_d =
(SELECT SD.LBXMCHSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMCHSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMCHSI>RRa.max)
WHERE SD.LBXMCHSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMCHSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMCHSI>RRb.max);

-- Computing LBXMC
UPDATE SurveyD SD
SET SD.LBXMC_d =
(SELECT RRa.min-SD.LBXMC
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMC' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMC<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMC' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMC<RRb.min);
UPDATE SurveyD SD
SET SD.LBXMC_d =
(SELECT SD.LBXMC-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMC' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMC>RRa.max)
WHERE SD.LBXMC_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMC' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMC>RRb.max);

-- Computing LBXRDW
UPDATE SurveyD SD
SET SD.LBXRDW_d =
(SELECT RRa.min-SD.LBXRDW
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXRDW' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXRDW<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRDW' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXRDW<RRb.min);
UPDATE SurveyD SD
SET SD.LBXRDW_d =
(SELECT SD.LBXRDW-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXRDW' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXRDW>RRa.max)
WHERE SD.LBXRDW_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXRDW' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXRDW>RRb.max);

-- Computing LBXPLTSI
UPDATE SurveyD SD
SET SD.LBXPLTSI_d =
(SELECT RRa.min-SD.LBXPLTSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXPLTSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXPLTSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXPLTSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXPLTSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXPLTSI_d =
(SELECT SD.LBXPLTSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXPLTSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXPLTSI>RRa.max)
WHERE SD.LBXPLTSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXPLTSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXPLTSI>RRb.max);

-- Computing LBXMPSI
UPDATE SurveyD SD
SET SD.LBXMPSI_d =
(SELECT RRa.min-SD.LBXMPSI
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMPSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMPSI<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMPSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMPSI<RRb.min);
UPDATE SurveyD SD
SET SD.LBXMPSI_d =
(SELECT SD.LBXMPSI-RRa.max
 FROM ReferenceRanges RRa
 WHERE RRa.variable='LBXMPSI' AND SD.RIAGENDR=RRa.gender AND SD.RIDAGEYR>=RRa.ageStart AND SD.RIDAGEYR<=RRa.ageEnd AND SD.LBXMPSI>RRa.max)
WHERE SD.LBXMPSI_d = 0 AND
EXISTS (SELECT RRb.max
 FROM ReferenceRanges RRb
 WHERE RRb.variable='LBXMPSI' AND SD.RIAGENDR=RRb.gender AND SD.RIDAGEYR>=RRb.ageStart AND SD.RIDAGEYR<=RRb.ageEnd AND SD.LBXMPSI>RRb.max);

## Matriz com  desvio final

* Matriz final com identificação das pessoas e desvios.

In [14]:
DROP VIEW IF EXISTS CorrelationMatrixWeighted;

CREATE VIEW CorrelationMatrixWeighted AS
SELECT DISTINCT SD.SEQN, SD.LBXIRN_d, SD.LBXTIB_d, SD.LBXSLDSI_d, SD.LBXWBCSI_d, SD.LBXLYPCT_d, SD.LBXMOPCT_d, SD.LBXNEPCT_d, SD.LBXEOPCT_d, SD.LBXBAPCT_d, SD.LBXRBCSI_d, SD.LBXHGB_d, SD.LBXHCT_d, SD.LBXMCVSI_d, SD.LBXMCHSI_d, SD.LBXMC_d, SD.LBXRDW_d, SD.LBXPLTSI_d, SD.LBXMPSI_d
FROM SurveyD SD, ReferenceRanges RR
WHERE SD.RIAGENDR=RR.gender AND SD.RIDAGEYR>=RR.ageStart AND SD.RIDAGEYR<=RR.ageEnd AND
(LBXIRN_d>0 OR LBXTIB_d>0 OR LBXSLDSI_d>0 OR LBXWBCSI_d>0 OR LBXLYPCT_d>0 OR LBXMOPCT_d>0 OR LBXNEPCT_d>0 OR LBXEOPCT_d>0 OR LBXBAPCT_d>0 OR LBXRBCSI_d>0 OR LBXHGB_d>0 OR LBXHCT_d>0 OR LBXMCVSI_d>0 OR LBXMCHSI_d>0 OR LBXMC_d>0 OR LBXRDW_d>0 OR LBXPLTSI_d>0 OR LBXMPSI_d>0);

SELECT COUNT(*) FROM CorrelationMatrixWeighted;
SELECT * FROM CorrelationMatrixWeighted;

## Gravação da matriz final com desvios

In [15]:
CALL CSVWRITE('../data/nhanes2005-2006/correlation-matrix-weighted.csv', 'SELECT * FROM CorrelationMatrixWeighted');

1751

# Rede de variáveis

* Nesta rede cada nó será uma variável e cada aresta indica que duas variáveis se correlacionam com uma certa identidade.

## Lista de pares de variáveis

* Esta view prepara a lista de correlação aos pares inicializada com 0.

In [16]:
DROP VIEW IF EXISTS VariablesCorrelation;
DROP VIEW IF EXISTS Variables;

CREATE VIEW Variables AS
SELECT DISTINCT variable AS var1 FROM ReferenceRanges;

CREATE VIEW VariablesCorrelation AS
SELECT DISTINCT Variables.var1, ReferenceRanges.variable AS var2, 0 AS correlation
FROM Variables, ReferenceRanges
WHERE Variables.var1 < ReferenceRanges.variable;

SELECT COUNT(*) FROM VariablesCorrelation;
SELECT * FROM VariablesCorrelation;

## Verticalização do survey

* As pessoas e variáveis que se apresentam originalmente em uma matriz são transformadas em uma lista: pessoa, variável e valor. Essa lista facilitará as análises subsequentes.

In [17]:
DROP VIEW IF EXISTS VerticalSurvey;

CREATE VIEW VerticalSurvey AS
  SELECT SU.SEQN, RR.variable, SU.LBXIRN AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXIRN'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXTIB AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXTIB'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXSLDSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXSLDSI'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXWBCSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXWBCSI'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXLYPCT AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXLYPCT'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXMOPCT AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXMOPCT'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXNEPCT AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXNEPCT'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXEOPCT AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXEOPCT'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXBAPCT AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXBAPCT'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXRBCSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXRBCSI'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXHGB AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXHGB'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXHCT AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXHCT'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXMCVSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXMCVSI'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXMCHSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXMCHSI'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXMC AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXMC'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXRDW AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXRDW'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXPLTSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXPLTSI'
UNION
  SELECT SU.SEQN, RR.variable, SU.LBXMPSI AS value, 0 AS deviation
  FROM Survey SU, ReferenceRanges RR
  WHERE RR.variable='LBXMPSI'
;
  
SELECT * FROM VerticalSurvey;

## Gravação do survey vertical em CSV

In [18]:
CALL CSVWRITE('../data/nhanes2005-2006/vertical-survey.csv', 'SELECT SEQN,variable,value FROM VerticalSurvey');

47970

## Transformação da VIEW em tabela para permitir updates

In [19]:
DROP TABLE IF EXISTS VerticalSurveyD;
CREATE TABLE VerticalSurveyD (
  SEQN VARCHAR(8),
  variable VARCHAR(8),
  value DECIMAL(7,1),
  deviation DECIMAL(7,1),
  PRIMARY KEY(SEQN, variable)
) AS SELECT * FROM VerticalSurvey;

## Cáculo do desvio de variáveis que ultrapassam o limite

In [20]:
UPDATE VerticalSurveyD VS
SET VS.deviation =
(SELECT RRa.min-VS.value
 FROM Survey SUa, ReferenceRanges RRa
 WHERE RRa.variable=VS.variable AND SUa.SEQN=VS.SEQN AND SUa.RIAGENDR=RRa.gender AND SUa.RIDAGEYR>=RRa.ageStart AND SUa.RIDAGEYR<=RRa.ageEnd AND VS.value<RRa.min)
WHERE EXISTS
(SELECT RRb.min
 FROM Survey SUb, ReferenceRanges RRb
 WHERE RRb.variable=VS.variable AND SUb.SEQN=VS.SEQN AND SUb.RIAGENDR=RRb.gender AND SUb.RIDAGEYR>=RRb.ageStart AND SUb.RIDAGEYR<=RRb.ageEnd AND VS.value<RRb.min);

UPDATE VerticalSurveyD VS
SET VS.deviation =
(SELECT VS.value-RRa.max
 FROM Survey SUa, ReferenceRanges RRa
 WHERE RRa.variable=VS.variable AND SUa.SEQN=VS.SEQN AND SUa.RIAGENDR=RRa.gender AND SUa.RIDAGEYR>=RRa.ageStart AND SUa.RIDAGEYR<=RRa.ageEnd AND VS.value>RRa.max)
WHERE EXISTS
(SELECT RRb.max
 FROM Survey SUb, ReferenceRanges RRb
 WHERE RRb.variable=VS.variable AND SUb.SEQN=VS.SEQN AND SUb.RIAGENDR=RRb.gender AND SUb.RIDAGEYR>=RRb.ageStart AND SUb.RIDAGEYR<=RRb.ageEnd AND VS.value>RRb.max);
 
SELECT * FROM VerticalSurveyD WHERE deviation > 0;

## Cálculo da média dos desvios

* Tentativa de normalização dos valores, mas as médias estão estranahas.

In [21]:
SELECT variable, AVG(deviation) FROM VerticalSurveyD VS GROUP BY variable;

## Correlação de variáveis por pessoa

* Análise de pares de variáveis de pessoas que se correlacionam

In [22]:
DROP VIEW IF EXISTS VariablePairCorrelation;
DROP VIEW IF EXISTS IndividualVariablesCorrelation;

CREATE VIEW IndividualVariablesCorrelation AS
SELECT VS1.SEQN, CM.profile, VC.var1, VC.var2
FROM VariablesCorrelation VC, VerticalSurveyD VS1, VerticalSurveyD VS2, CorrelationMatrix CM
WHERE VS1.SEQN = VS2.SEQN AND VS1.variable = VC.var1 AND VS2.variable = VC.var2 AND 
      VS1.deviation > 0 AND VS2.deviation > 0 AND
      VS1.SEQN = CM.SEQN;

SELECT * FROM IndividualVariablesCorrelation
ORDER BY var1, var2;

## Correlação de pares de variáveis

* Agrupamento das correlações por pares de variáveis.
* Preparação para a montagem de rede com variáveis nos nós e arestas ligando variáveis que saíram dos limites juntas para a mesma pessoa.

In [23]:
DROP VIEW IF EXISTS VariablePairCorrelation;
CREATE VIEW VariablePairCorrelation AS
SELECT var1 AS source, var2 as TARGET, COUNT(*) AS weight
FROM IndividualVariablesCorrelation
GROUP BY var1, var2;

SELECT * FROM VariablePairCorrelation;

## Gravação de CSV de correlações para Gephi

In [24]:
CALL CSVWRITE('../data/nhanes2005-2006/variable-pair-correlation.csv', 'SELECT * FROM VariablePairCorrelation');

152

# Variable Network

* Rede de variáveis produzida no Gephi a partir do arquivo acima.

![variable network](variable-network.svg "Variable Network")

# Rede de perfis

* Retomada da rede de perfis.

## Análise de correlação entre pares de perfis

* Cada vez que duas pessoas compartilham uma variável fora dos limites, é definida uma aresta entre elas.
* As arestas são agrupadas por pares de perfil. Para cada par é contado o número de indivíduos/variáveis que coocorrem.

In [25]:
DROP VIEW IF EXISTS ProfileCorrelationWeight;
DROP VIEW IF EXISTS ProfileCorrelation;
DROP VIEW IF EXISTS IndividualVariablesCorrelationCopy;

CREATE VIEW IndividualVariablesCorrelationCopy AS
  SELECT SEQN, profile AS profileCopy, var1, var2
  FROM IndividualVariablesCorrelation;

CREATE VIEW ProfileCorrelation AS
  SELECT IPC1.profile, IPC2.profileCopy
  FROM IndividualVariablesCorrelation IPC1, IndividualVariablesCorrelationCopy IPC2
  WHERE IPC1.var1 = IPC2.var1 AND IPC1.var2 = IPC2.var2 AND
        IPC1.SEQN < IPC2.SEQN AND IPC1.profile <> IPC2.profileCopy;
      
CREATE VIEW ProfileCorrelationWeight AS
  SELECT PC.profile AS source, PC.profileCopy as target, COUNT(*) as weight
  FROM ProfileCorrelation PC
  GROUP BY PC.profile, PC.profileCopy;
  
SELECT COUNT(*) FROM ProfileCorrelationWeight;
SELECT * FROM ProfileCorrelationWeight;

## Gravação de pares de perfis com similaridade para rede

In [26]:
CALL CSVWRITE('../data/nhanes2005-2006/profile-pair-correlation.csv', 'SELECT * FROM ProfileCorrelationWeight');

9715

# Profile Network

![profile network](profile-network.png "Profile Network")