# ex06-Doing Math Across Table Columns

As we already know, the demo database (i.e., demo.db3) was extracted from a numerical hydrological model. Most of its columns hold integers, decimals, or floating-point numbers, so it is natural to want to run calculations or statistical analysis on them. SQL can handle calculations ranging from basic math through advanced statistics.

***Basic Math Operators***
<li>+ Addition
<li>- Subtraction
<li>* Multiplication
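<li>/ Division (with two integer operands it returns the integer quotient only, no remainder; if either operand is a decimal, the result is a decimal)
<li>% Modulo (returns just the remainder)
<li>^ Exponentiation (available in some databases such as PostgreSQL, but not in SQLite, where ^ is not a valid operator)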
<li>Others 
    
Let’s try to use the most frequently used SQL math operators on the demo data. Instead of using numbers in queries, we’ll use the names of the columns that contain the numbers. When we execute the query, the calculation will occur on each row of the table.    
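The operator behaviour described above can be checked without the demo database. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the helper name `scalar` is made up for this illustration. It assumes the same SQLite engine that the notebook connects to, where integer division truncates and `^` is rejected.

```python
import sqlite3

# In-memory database, so nothing from demo.db3 is needed.
conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a one-value query and return that value (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]

int_div = scalar("SELECT 7 / 2")       # both operands are integers
real_div = scalar("SELECT 7 / 2.0")    # one operand is a decimal
remainder = scalar("SELECT 7 % 2")     # modulo keeps only the remainder

# SQLite has no ^ operator, so this query is rejected with an error.
try:
    scalar("SELECT 2 ^ 3")
    caret_supported = True
except sqlite3.Error:
    caret_supported = False

# Repeated multiplication is a portable stand-in for a small integer power.
cube = scalar("SELECT 2 * 2 * 2")

print(int_div, real_div, remainder, caret_supported, cube)
```

If a decimal result is wanted from integer columns, multiplying one operand by 1.0 (or writing a literal such as 2.0) is usually enough.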

In [1]:
%load_ext sql

### 1. Connect to the given database demo.db3

In [2]:
%sql sqlite:///data/demo.db3

u'Connected: @data/demo.db3'

If you do not remember which tables are in the demo data, you can always list them with the following query.

In [3]:
%sql SELECT name FROM sqlite_master WHERE type='table'

 * sqlite:///data/demo.db3
Done.


name
rch
hru
sub
sed
watershed_daily
watershed_monthly
watershed_yearly
channel_dimension
hru_info
sub_info


### 2. Test math operators in an easy way

A SELECT statement without a FROM clause lets us test the math operators directly on literal numbers.

In [4]:
%sql SELECT 3+4

 * sqlite:///data/demo.db3
Done.


3+4
7


In [5]:
%sql SELECT 12 * 4

 * sqlite:///data/demo.db3
Done.


12 * 4
48


In [6]:
%sql SELECT 12 % 4

 * sqlite:///data/demo.db3
Done.


12 % 4
0


In [7]:
%sql SELECT round(123.456,2) as Rounded

 * sqlite:///data/demo.db3
Done.


Rounded
123.46


### 3. Doing Math Across Table Columns

Take the watershed_monthly table as an example.

#### 3.1 Check the table columns first

In [8]:
%sql SELECT * FROM watershed_monthly LIMIT 3

 * sqlite:///data/demo.db3
Done.


YR,MO,PREC_mm,SURQ_mm,LATQ_mm,GWQ_mm,PERCOLA_mm,TILEQ_mm,SW_mm,ET_mm,PET_mm,WYLD_mm,SYLD_tons,NO3_SURQ,NO3_LATQ,NO3_PERC,NO3_CROP,N_ORG,P_SOL,P_ORG,TILENO3
1981,1,96.2901611328,0.515981376171,0.412546992302,6.68811368942,19.9067058563,0.0,1854.22424316,6.8717417717,12.2690172195,8.68197631836,0.179334715009,0.000217399661778,0.00227017630823,1.31047689915,0.0,0.126228243113,0.00011268912931,0.0154068088159,0.0
1981,2,160.228042603,3.34680223465,0.645278871059,9.14877605438,34.3681221008,0.0,1766.25305176,9.16553211212,14.7731771469,13.8621845245,1.21271717548,0.000626709603239,0.00195567845367,1.17412638664,0.0,0.896599590778,0.00056043791119,0.109999984503,0.0
1981,3,136.652908325,3.82499432564,1.48131656647,18.5184955597,34.3672447205,0.0,1990.75354004,13.5204763412,23.3635005951,25.0185108185,1.26296019554,0.000849568168633,0.0075485506095,0.504496872425,0.0,0.679934620857,0.000544593494851,0.0833883434534,0.0


#### 3.2 Calculate the difference between two columns

For example, we are interested in the difference between potential evapotranspiration (PET_mm) and precipitation (PREC_mm), the so-called potential evapotranspiration deficit (PED). PED can be thought of as a drought index: it is the difference between how much water could potentially be lost from the soil through evapotranspiration and how much is actually supplied by precipitation. When PED is high, plants do not have the full amount of water they need for growth.

In [9]:
%%sql sqlite://
SELECT YR, MO,
PREC_mm AS Precipitation,
PET_mm AS PET,
PET_mm - PREC_mm AS PED
FROM watershed_monthly LIMIT 10

Done.


YR,MO,Precipitation,PET,PED
1981,1,96.2901611328,12.2690172195,-84.0211439133
1981,2,160.228042603,14.7731771469,-145.454865456
1981,3,136.652908325,23.3635005951,-113.28940773
1981,4,118.857406616,36.1955604553,-82.6618461609
1981,5,84.5469818115,89.7725601196,5.22557830811
1981,6,44.837184906,123.683319092,78.8461341858
1981,7,32.3259849548,174.008895874,141.682910919
1981,8,20.6514968872,152.637496948,131.986000061
1981,9,16.579656601,115.852905273,99.2732486725
1981,10,59.6729316711,43.6953010559,-15.9776306152


###### We can also express PED as a percentage of precipitation.

In [10]:
%%sql sqlite://
SELECT YR, MO,
PREC_mm AS Precipitation,
PET_mm AS PET,
(PET_mm - PREC_mm) / PREC_mm * 100.0 AS PED_Ratio
FROM watershed_monthly LIMIT 10

Done.


YR,MO,Precipitation,PET,PED_Ratio
1981,1,96.2901611328,12.2690172195,-87.2582857114
1981,2,160.228042603,14.7731771469,-90.779905373
1981,3,136.652908325,23.3635005951,-82.9030344971
1981,4,118.857406616,36.1955604553,-69.5470720035
1981,5,84.5469818115,89.7725601196,6.18067989672
1981,6,44.837184906,123.683319092,175.849876283
1981,7,32.3259849548,174.008895874,438.294180725
1981,8,20.6514968872,152.637496948,639.111057091
1981,9,16.579656601,115.852905273,598.76540909
1981,10,59.6729316711,43.6953010559,-26.7753404564
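The two column calculations above can be reproduced outside the notebook. Below is a minimal sketch with Python's `sqlite3` module on an in-memory table that borrows the table and column names from the demo data. The first two rows are copied from the output above; the third row (a month with zero precipitation) is a hypothetical value added only to show that SQLite returns NULL, not an error, when dividing by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watershed_monthly "
    "(YR INTEGER, MO INTEGER, PREC_mm REAL, PET_mm REAL)"
)
rows = [
    (1981, 1, 96.2901611328, 12.2690172195),  # copied from the output above
    (1981, 5, 84.5469818115, 89.7725601196),  # copied from the output above
    (1999, 1, 0.0, 50.0),                     # hypothetical dry month
]
conn.executemany("INSERT INTO watershed_monthly VALUES (?, ?, ?, ?)", rows)

# The arithmetic is evaluated once per row, exactly as in the cells above.
result = conn.execute(
    """
    SELECT YR, MO,
           PET_mm - PREC_mm AS PED,
           (PET_mm - PREC_mm) / PREC_mm * 100.0 AS PED_Ratio
    FROM watershed_monthly
    ORDER BY YR, MO
    """
).fetchall()

for row in result:
    print(row)
```

In the hypothetical dry month the ratio comes back as `None` (SQL NULL), so downstream code that formats or averages PED_Ratio should be prepared for missing values.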


#### 3.3 Use math operators in a WHERE clause

For example, we could use the modulo operator (%) to keep only the months (MO) that are divisible by 3.

In [11]:
%%sql sqlite://
SELECT RCH, YR, MO, FLOW_INcms, FLOW_OUTcms
FROM rch
WHERE YR > 2009
AND RCH = 10
AND MO % 3 = 0

Done.


RCH,YR,MO,FLOW_INcms,FLOW_OUTcms
10,2010,3,3.42989563942,3.08460569382
10,2010,6,736.301635742,734.99206543
10,2010,9,218.407974243,218.244567871
10,2010,12,36.955821991,36.8774642944
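The modulo filter above can be tried on its own. This is a small sketch with Python's `sqlite3` module, using a hypothetical one-column table named `months` that holds the numbers 1 to 12. It shows that changing the remainder on the right-hand side selects a different month of each quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE months (MO INTEGER)")  # hypothetical helper table
conn.executemany("INSERT INTO months VALUES (?)", [(m,) for m in range(1, 13)])

# Remainder 0 after dividing by 3: the last month of each quarter.
quarter_end = [
    r[0] for r in conn.execute(
        "SELECT MO FROM months WHERE MO % 3 = 0 ORDER BY MO"
    )
]

# Remainder 1 after dividing by 3: the first month of each quarter.
quarter_start = [
    r[0] for r in conn.execute(
        "SELECT MO FROM months WHERE MO % 3 = 1 ORDER BY MO"
    )
]

print(quarter_end)
print(quarter_start)
```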


### 4. Do some statistics with Aggregate Functions

So far, we’ve performed math operations across columns in each row of a table. We can also calculate a result from values within the same column using aggregate functions, which calculate a single result from multiple inputs. Two of the most-used aggregate functions in data analysis are avg() and sum().

#### 4.1 average

avg - calculates the average of all values in that column (omits null values).

In [12]:
%%sql sqlite://
SELECT avg(FLOW_INcms), avg(FLOW_OUTcms)
FROM rch

Done.


avg(FLOW_INcms),avg(FLOW_OUTcms)
559.361707683,557.279226083


#### 4.2 sum

sum - calculates the sum of the values in that column (omits null values).

In [13]:
%%sql sqlite://
SELECT sum(FLOW_INcms), sum(FLOW_OUTcms)
FROM rch

Done.


sum(FLOW_INcms),sum(FLOW_OUTcms)
4631514.93962,4614271.99196


#### 4.3 extreme values

max - calculates the maximum value in that column (omits null values).

min - calculates the minimum value in that column (omits null values).

In [14]:
%%sql sqlite://
SELECT min(FLOW_INcms), max(FLOW_OUTcms)
FROM rch

Done.


min(FLOW_INcms),max(FLOW_OUTcms)
0.201215535402,10499.5498047


### 5. Calculate by ourselves

We can also compute values ourselves by combining aggregate functions with math operators. For example, we can reproduce the averages of FLOW_INcms and FLOW_OUTcms by dividing sum() by count(); the results match the avg() output in section 4.1.

In [15]:
%%sql sqlite://
SELECT sum(FLOW_INcms) / count(FLOW_INcms) AS AVG_FlowIn,
sum(FLOW_OUTcms) / count(FLOW_OUTcms) AS AVG_FlowOut
FROM rch

Done.


AVG_FlowIn,AVG_FlowOut
559.361707683,557.279226083
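The hand-made average agrees with avg() because count(column), like avg() and sum(), skips NULL values. The sketch below illustrates this with Python's `sqlite3` module on a hypothetical table `flows` holding two numbers and one NULL; the values are invented for the illustration and do not come from demo.db3. Dividing by count(*) instead would count the NULL row as well and give a different answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (FLOW_INcms REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO flows VALUES (?)",
    [(2.0,), (4.0,), (None,)],  # invented values; None is stored as NULL
)

built_in, by_column_count, by_row_count, n_values, n_rows = conn.execute(
    """
    SELECT avg(FLOW_INcms),
           sum(FLOW_INcms) / count(FLOW_INcms),  -- NULL row is skipped
           sum(FLOW_INcms) / count(*),           -- NULL row is counted
           count(FLOW_INcms),
           count(*)
    FROM flows
    """
).fetchone()

print(built_in, by_column_count, by_row_count, n_values, n_rows)
```

When a column may contain NULLs, dividing by count(column) keeps the manual calculation consistent with avg().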


### Summary

Aggregating data (also referred to as rolling up, summarizing, or grouping data) means computing a single summary value from a number of records. Sum, min, max, count, and average are common aggregate operations.

The examples above do not yet show the real power of these aggregate functions. They become much more powerful when used with ***GROUP BY*** and ***ORDER BY*** clauses.
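As a brief preview of that combination, here is a minimal sketch using Python's `sqlite3` module. The table `rch_sample` and its values are hypothetical stand-ins for the rch table, chosen only to keep the arithmetic easy to follow. The query computes one average per year and sorts the years by that average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rch_sample (YR INTEGER, FLOW_INcms REAL)")
conn.executemany(
    "INSERT INTO rch_sample VALUES (?, ?)",
    [(2010, 1.0), (2010, 3.0), (2011, 10.0)],  # invented values
)

# One output row per year, wettest year first.
yearly = conn.execute(
    """
    SELECT YR, avg(FLOW_INcms) AS AVG_FlowIn
    FROM rch_sample
    GROUP BY YR
    ORDER BY AVG_FlowIn DESC
    """
).fetchall()

print(yearly)
```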