# NHANES Analysis of Anemia Variables - Part 3

In this sequence of notebooks, we analyze the correlations among variables and among profiles of individuals examined in the NHANES survey. The focus is on variables related to anemia diagnosis. This work is based on the research of Patrícia Raia Nogueira Cavoto.


## Selecting the driver
The statement below sets the default data source for the notebook: an in-memory H2 database accessed through JDBC.

In [1]:
%defaultDatasource jdbc:h2:mem:db

# Profiles Network

* In this network, each node is a profile, and each edge indicates that two profiles are correlated with a certain intensity (the edge weight).
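
As a rough illustration of this structure, the sketch below stores a small undirected weighted network in plain Python and computes the weighted degree of each node. The profile labels `A`, `B`, `C`, the weights, and the helper `edge_key` are hypothetical and chosen only for the example; they do not come from the NHANES data.

```python
from collections import defaultdict

# Hypothetical (profile, profile, weight) triples; the labels are invented.
raw_edges = [("A", "B", 2), ("B", "A", 1), ("A", "C", 4)]


def edge_key(p, q):
    """Return the pair in sorted order so A-B and B-A map to the same edge."""
    return (p, q) if p <= q else (q, p)


# Undirected weighted edge list: one entry per unordered pair of profiles.
edges = defaultdict(int)
for p, q, w in raw_edges:
    edges[edge_key(p, q)] += w

# Weighted degree of each node: the sum of the weights of its edges.
weighted_degree = defaultdict(int)
for (p, q), w in edges.items():
    weighted_degree[p] += w
    weighted_degree[q] += w

print(dict(edges))
print(dict(weighted_degree))
```

Keying each edge by the sorted pair of profiles is one simple way to treat the network as undirected, since the order in which two profiles appear carries no meaning here.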

## Starting Part 3

* This analysis is split across three notebooks due to memory constraints. The statements below load the file produced in the previous part (`sql-network-02-nhanes-complete-p2`).

In [2]:
DROP TABLE IF EXISTS ProfileCorrelation;
CREATE TABLE ProfileCorrelation (
  SEQN1 VARCHAR(8),
  profile1 VARCHAR(18),
  SEQN2 VARCHAR(8),
  profile2 VARCHAR(18)
) AS SELECT
  SEQN1, profile1, SEQN2, profile2
FROM CSVREAD('../data/nhanes2005-2006/profile-pair-correlation.csv');

## Correlation with counting

In the following queries, the weight of the correlation between two profiles is the number of pairs of individuals that share out-of-range variables. If two individuals share more than one variable, the pair still counts as a single correlation: the number of shared variables does not affect the weight of the correlation between profiles.

In [3]:
DROP VIEW IF EXISTS ProfileCorrFinalNWeight;
DROP VIEW IF EXISTS ProfileCorrNWeight;
DROP VIEW IF EXISTS ProfileCorrelationNWeight;
DROP VIEW IF EXISTS ProfileCorrelationUnique;

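-- Remove duplicate rows, so each pair of individuals is counted once
-- regardless of how many variables they share.
CREATE VIEW ProfileCorrelationUnique AS
  SELECT DISTINCT * FROM ProfileCorrelation;

-- Count the pairs of individuals for each ordered pair of profiles.
CREATE VIEW ProfileCorrelationNWeight AS
  SELECT PC.profile1 AS source, PC.profile2 AS target, COUNT(*) AS weight
  FROM ProfileCorrelationUnique PC
  GROUP BY PC.profile1, PC.profile2;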
  
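-- Put every pair in a canonical order (source < target) so that A-B and B-A
-- refer to the same edge; pairs within the same profile are left out.
-- UNION ALL keeps both rows even when their counts are equal, so the SUM
-- in the next view adds them instead of collapsing them into one.
CREATE VIEW ProfileCorrNWeight AS
SELECT source, target, weight w FROM ProfileCorrelationNWeight WHERE source < target
UNION ALL
SELECT target, source, weight w FROM ProfileCorrelationNWeight WHERE source > target;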

CREATE VIEW ProfileCorrFinalNWeight AS
SELECT source, target, SUM(w) AS weight
FROM ProfileCorrNWeight
GROUP BY source, target;

SELECT * FROM ProfileCorrFinalNWeight;

-- Write the correlated profile pairs to a CSV file for the network
CALL CSVWRITE('../data/nhanes2005-2006/profile-pair-correlation-number.csv', 'SELECT * FROM ProfileCorrFinalNWeight');
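
The counting rule can be checked on a toy example. The sketch below is a minimal, hypothetical reproduction in Python with SQLite (not the notebook's H2 database): the four rows are invented for illustration, and the query condenses the four views into a single statement using common table expressions. It merges the two orderings with `UNION ALL`, so that rows from both orders are kept even when their counts are equal.

```python
import sqlite3

# Hypothetical toy rows (SEQN1, profile1, SEQN2, profile2); not NHANES data.
rows = [
    ("101", "A", "102", "B"),
    ("101", "A", "102", "B"),  # same two individuals again (a second shared variable)
    ("103", "B", "104", "A"),  # same two profiles, listed in the opposite order
    ("105", "A", "106", "C"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ProfileCorrelation "
    "(SEQN1 TEXT, profile1 TEXT, SEQN2 TEXT, profile2 TEXT)"
)
con.executemany("INSERT INTO ProfileCorrelation VALUES (?, ?, ?, ?)", rows)

query = """
WITH uniq AS (
       -- one row per pair of individuals, however many variables they share
       SELECT DISTINCT * FROM ProfileCorrelation),
     counted AS (
       SELECT profile1 AS source, profile2 AS target, COUNT(*) AS w
       FROM uniq
       GROUP BY profile1, profile2),
     ordered AS (
       -- canonical order: the smaller profile label always comes first
       SELECT source, target, w FROM counted WHERE source < target
       UNION ALL
       SELECT target, source, w FROM counted WHERE source > target)
SELECT source, target, SUM(w) AS weight
FROM ordered
GROUP BY source, target
ORDER BY source, target
"""
weights = con.execute(query).fetchall()
print(weights)  # [('A', 'B', 2), ('A', 'C', 1)]
```

The duplicated row contributes only once to the `A`-`B` weight, while the pair listed as `B`, `A` is added to the same edge, giving a weight of 2.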

# Exercise

Import the file created above (`/data/nhanes2005-2006/profile-pair-correlation-number.csv`) into Gephi. Which analyses can you perform? In the cell below, insert an image of your analysis and a brief description of the results.