In [1]:
%load_ext sql

In [2]:
# Connect to the MIMIC database
%sql sqlite://///Users/leonidas/Desktop/AUTH_HealthData/Scripts/SQL_HSDA/mimic3.db

#### Aggregate functions for summarizing your result
SQL aggregate functions run a calculation over a set of rows and return a single value that summarizes your data.

|Function |What it returns|
|--------------|----------|
|**COUNT()**|Counts the rows that have a non-null value in a particular column|
|**SUM()**|Adds together all the values in a particular column|
|**MIN()**|Returns the lowest value in a particular column|
|**MAX()**|Returns the highest value in a particular column|
|**AVG()**|Calculates the average of a group of selected values|

The best way to learn is by example. Let's figure out how many patients we have in the **patients** table:

In [3]:
%sql SELECT COUNT(subject_id) AS NUMBER_OF_PATIENTS FROM patients;
# You can also use * inside the COUNT function instead of subject_id, COUNT(*)

 * sqlite://///Users/leonidas/Desktop/AUTH_HealthData/Scripts/SQL_HSDA/mimic3.db
Done.


NUMBER_OF_PATIENTS
100


Keep in mind that:
> When a column is passed to the COUNT function, null values in that column are not counted. With COUNT(*), every row is counted, including rows that contain nulls.
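
The difference between the two forms of COUNT can be checked on a small throwaway table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `demo_patients` and its rows are made up for illustration and are not part of MIMIC.

```python
import sqlite3

# Toy in-memory table (hypothetical data, not the MIMIC patients table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_patients (subject_id INTEGER, dod TEXT)")
conn.executemany(
    "INSERT INTO demo_patients VALUES (?, ?)",
    [(1, "2101-05-01"), (2, None), (3, "2150-11-30"), (4, None)],
)

# COUNT(*) counts every row; COUNT(dod) skips rows where dod is NULL.
all_rows, non_null_dod = conn.execute(
    "SELECT COUNT(*), COUNT(dod) FROM demo_patients"
).fetchone()
print(all_rows, non_null_dod)
conn.close()
```

Here two of the four rows have no `dod`, so the two counts differ.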

### EXERCISE 
Find the minimum, the maximum, and the average length of stay in days from the icustays table. (Hint: Make use of the **los** column)

In [4]:
%sql SELECT MAX(los) AS MAX_STAY, MIN(los) AS MIN_STAY, AVG(los) AS Average_Stay \
FROM icustays;

 * sqlite://///Users/leonidas/Desktop/AUTH_HealthData/Scripts/SQL_HSDA/mimic3.db
Done.


MAX_STAY,MIN_STAY,Average_Stay
35.4065,0.1059,4.452456617647062


### Arithmetic operations

|Operator|Meaning|Operates on|
|--------------|----------|--------------|
|+|Addition|Numeric value|
|-|Subtraction|Numeric value|
|*|Multiplication|Numeric value|
|/|Division|Numeric value|
|%|Division remainder|Numeric value|

- An example: 13 % 5 = 3 because the remainder of 13 divided by 5 is 3

In [None]:
%sql SELECT (row_id - subject_id) AS test_difference FROM admissions LIMIT 10;
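
As a quick check of the operators above, the following sketch evaluates a few expressions directly in SQLite through Python's built-in `sqlite3` module (no table is needed). It also illustrates one detail worth knowing: in SQLite, dividing one integer by another gives an integer result, so at least one operand has to be a real number to get a fractional answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Remainder, integer division, and real-valued division in one query.
remainder, int_div, real_div = conn.execute(
    "SELECT 13 % 5, 7 / 2, 7 / 2.0"
).fetchone()
print(remainder, int_div, real_div)
conn.close()
```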

## Calculating time, and time differences

Several columns in our database hold TIMESTAMP values.

To calculate times or time differences, we can use SQLite's date and time functions, documented at:

https://www.sqlite.org/lang_datefunc.html

In our case, the best choice is the JULIANDAY() function.

It returns the number of days since noon Greenwich time on November 24, 4714 BC in the (proleptic) Gregorian calendar:

https://www.techonthenet.com/sqlite/functions/julianday.php

Just a reminder: JULIANDAY() is specific to SQLite. Other database systems, e.g. MySQL, provide different functions for date arithmetic.

The following example calculates the lifespan of the patients in years, rounded to 2 decimal places.

The SQLite function **ROUND()** takes two arguments:
- **ROUND(X,Y)**
- **X**, the number to be rounded
- **Y**, the number of decimal places to which X is rounded

In [5]:
%sql SELECT ROUND((JULIANDAY(dod)-JULIANDAY(dob))/365.24,2) \
AS AGE_OF_DEATH FROM patients;

 * sqlite://///Users/leonidas/Desktop/AUTH_HealthData/Scripts/SQL_HSDA/mimic3.db
Done.


AGE_OF_DEATH
71.44
36.23
87.09
76.98
48.9
300.53
82.67
79.45
88.14
82.39
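
To see how JULIANDAY() and ROUND() behave on their own, here is a minimal sketch using Python's built-in `sqlite3` module with literal dates. The dates are arbitrary examples chosen for illustration, not values from MIMIC, and the divisor 365.24 simply mirrors the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

days, years, rounded = conn.execute(
    """
    SELECT
        -- Difference between two dates, in days
        JULIANDAY('2020-01-11') - JULIANDAY('2020-01-01'),
        -- Difference in years, rounded to 2 decimal places
        ROUND((JULIANDAY('2000-01-01') - JULIANDAY('1950-01-01')) / 365.24, 2),
        -- ROUND(X, Y) on a plain number
        ROUND(3.14159, 2)
    """
).fetchone()
print(days, years, rounded)
conn.close()
```

Subtracting two JULIANDAY() values gives a duration in days, which is why the lifespan query divides by the length of a year before rounding.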


### EXERCISE 
Find the length of hospital stay in days, by subject_id, for the patients that did not pass away in hospital (hospital_expire_flag=0), ordered from longest to shortest stay. (Use the **admissions** table)

In [6]:
%sql SELECT subject_id,ROUND((JULIANDAY(dischtime)-JULIANDAY(admittime)),2) AS HIME \
FROM admissions WHERE hospital_expire_flag =0 ORDER BY HIME DESC;

 * sqlite://///Users/leonidas/Desktop/AUTH_HealthData/Scripts/SQL_HSDA/mimic3.db
Done.


subject_id,HIME
40310,123.98
43798,39.7
44212,36.01
10061,25.0
10132,24.75
10127,22.39
10130,19.9
10083,17.5
41914,16.98
10119,13.91


## The GROUP BY Statement

- Aggregate functions become more useful when combined with the GROUP BY statement.
- GROUP BY splits the rows into subsets that share the same value and summarizes each subset in the same query.

The GROUP BY clause comes after the FROM clause of the SELECT statement. If the statement contains a WHERE clause, the GROUP BY clause must come after the WHERE clause.

` SELECT columnname, AGR_FUNCTION1, AGR_FUNCTION2 FROM tablename WHERE conditions_provided GROUP BY columnname;`

Let's count the number of male and female patients in the **patients** table.

In [10]:
%sql SELECT gender,COUNT(gender) AS TOTAL_NUMBER FROM patients GROUP BY gender;

 * sqlite://///Users/leonidas/Desktop/AUTH_HealthData/Scripts/SQL_HSDA/mimic3.db
Done.


gender,TOTAL_NUMBER
F,100
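
The clause order described above (WHERE before GROUP BY) can be tried out on a small table. This is a minimal sketch using Python's built-in `sqlite3` module; the table `demo_patients`, its `expire_flag` column values, and its rows are invented for illustration and do not come from MIMIC.

```python
import sqlite3

# Toy in-memory table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_patients (subject_id INTEGER, gender TEXT, expire_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_patients VALUES (?, ?, ?)",
    [(1, "F", 1), (2, "M", 0), (3, "F", 1), (4, "F", 0), (5, "M", 1)],
)

# One output row per distinct gender.
per_gender = conn.execute(
    "SELECT gender, COUNT(*) FROM demo_patients GROUP BY gender ORDER BY gender"
).fetchall()

# WHERE filters the rows first; GROUP BY then groups what is left.
deceased_per_gender = conn.execute(
    "SELECT gender, COUNT(*) FROM demo_patients "
    "WHERE expire_flag = 1 GROUP BY gender ORDER BY gender"
).fetchall()
print(per_gender, deceased_per_gender)
conn.close()
```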


## EXERCISE 
- Calculate the number of doses per drug from the prescriptions table
- Order by number of doses, highest first
- Limit to the first 10

In [None]:
%sql SELECT drug, COUNT(drug) AS number_of_doses FROM prescriptions \
GROUP BY drug ORDER BY number_of_doses DESC LIMIT 10

## Grouping by multiple fields
Grouping can be extended to multiple fields, listed in the GROUP BY clause in the desired order; the result contains one row for each combination of values. For example, you can count the number of deaths per admission type by grouping on both fields (**admission_type**, **hospital_expire_flag**):

In [None]:
%sql SELECT admission_type, hospital_expire_flag, COUNT(subject_id) as number \
FROM admissions \
GROUP BY admission_type,hospital_expire_flag;
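
To make the shape of a multi-field grouping concrete, the sketch below runs the same kind of query on a tiny in-memory table with Python's built-in `sqlite3` module. The table `demo_admissions` and its rows are invented for illustration; only the column names follow the admissions table.

```python
import sqlite3

# Toy in-memory table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_admissions (admission_type TEXT, hospital_expire_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_admissions VALUES (?, ?)",
    [
        ("EMERGENCY", 0), ("EMERGENCY", 0), ("EMERGENCY", 1),
        ("ELECTIVE", 0), ("ELECTIVE", 0), ("URGENT", 1),
    ],
)

# One output row per (admission_type, hospital_expire_flag) combination.
groups = conn.execute(
    """
    SELECT admission_type, hospital_expire_flag, COUNT(*) AS number
    FROM demo_admissions
    GROUP BY admission_type, hospital_expire_flag
    ORDER BY admission_type, hospital_expire_flag
    """
).fetchall()
for row in groups:
    print(row)
conn.close()
```

Combinations that do not occur in the data, such as an elective admission with `hospital_expire_flag = 1` here, simply produce no row.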

## The HAVING clause

The HAVING clause filters the groups produced by a GROUP BY clause, just as WHERE filters individual rows.
The expression that follows HAVING must therefore apply to a whole group (typically an aggregate value), whereas a WHERE condition applies to each row of data.

For example, let's filter the result of the earlier exercise to keep only the drugs for which more than 100 doses were provided to the patients.

In [None]:
%sql SELECT drug, COUNT(drug) AS number_of_doses FROM prescriptions \
GROUP BY drug HAVING number_of_doses>100 ORDER BY 2 DESC;
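
The same pattern can be tried on a small in-memory table with Python's built-in `sqlite3` module. The table `demo_prescriptions` and its rows are invented for illustration. The sketch repeats the aggregate expression inside HAVING; SQLite also accepts the column alias there, as in the query above, but that shortcut may not be accepted by every database system.

```python
import sqlite3

# Toy in-memory table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_prescriptions (drug TEXT)")
conn.executemany(
    "INSERT INTO demo_prescriptions VALUES (?)",
    [("Aspirin",), ("Aspirin",), ("Aspirin",),
     ("Heparin",), ("Heparin",), ("Insulin",)],
)

# Keep only the groups (drugs) that have more than one row.
frequent = conn.execute(
    """
    SELECT drug, COUNT(drug) AS number_of_doses
    FROM demo_prescriptions
    GROUP BY drug
    HAVING COUNT(drug) > 1
    ORDER BY number_of_doses DESC
    """
).fetchall()
print(frequent)
conn.close()
```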

## EXERCISE 
Select the patients that have more than one admission in the **admissions** table, in descending order of number of admissions.

In [None]:
%sql SELECT subject_id, COUNT(subject_id) AS number_of_admissions FROM admissions \
GROUP BY subject_id HAVING number_of_admissions > 1 \
ORDER BY number_of_admissions DESC;