# 2 Filtering data

In chapter 1, we saw how to retrieve data with SELECT. Usually, though, we only want the rows that interest us, not all of them. The **WHERE** clause filters the data, keeping only the rows that satisfy certain conditions.

In this chapter, we will use another database, **weather_stations**. This database has only one table, **station_data**.



In [1]:
%load_ext sql
%config SqlMagic.autocommit=False
%config SqlMagic.autolimit=20
%config SqlMagic.displaylimit=20
%sql postgresql://pliu:pliu@127.0.0.1:5432/weather_stations

In [3]:
%%sql
SELECT * from station_data limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
766440,39537B,1998,10,1,72.7,1014.6,5.9,6.7,83.29999999999998,0.0,,0,0,0,0,0
176010,C3C6D5,2001,5,18,55.7,,7.3,4.3,69.1,0.0,,0,0,0,0,0
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0.0,,0,0,0,0,0
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0


## 2.1 Filtering numeric columns

To filter a numeric column, we build a boolean expression of the form **column_name comparator value**.

Possible comparators:
- equal to: =
- not equal to: != or <>
- greater than: >
- less than: <
- greater than or equal to: >=
- less than or equal to: <=

The query below is an example: in its boolean expression, the column name is year, the comparator is =, and the value is 2010.
It should return only the records where the year field equals 2010.


In [5]:
%%sql
select * from station_data where year=2010 limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
719160,BAB974,2010,1,22,-22.8,1014.2,,10.2,-18.5,0.0,9.4,0,0,0,0,0
766870,7C0938,2010,3,22,48.0,871.2,4.4,1.5,50.8,0.11,,1,1,1,1,1
134624,11CEA1,2010,2,17,46.0,,3.4,2.6,46.0,,,0,0,0,0,0
384010,C67A6C,2010,3,24,14.4,,4.0,10.7,21.1,,,0,0,0,0,0
232210,DFDF58,2010,2,25,-7.3,,3.0,10.3,-2.2,,,0,0,0,0,0
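The comparators above can be tried without a PostgreSQL server. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory table standing in for station_data; it loads only two columns and five rows copied from the outputs above, and the helper years_where is a hypothetical name used for illustration.

```python
import sqlite3

# In-memory stand-in for station_data with only the columns this example needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (station_number INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(143080, 2002), (766440, 1998), (719160, 2010), (766870, 2010), (125600, 2007)],
)


def years_where(condition):
    """Hypothetical helper: sorted years of the rows matching a WHERE condition.

    The condition is pasted into the SQL text, so only use trusted literals here.
    """
    rows = conn.execute(f"SELECT year FROM station_data WHERE {condition}").fetchall()
    return sorted(year for (year,) in rows)


for cond in ["year = 2010", "year > 2005", "year < 2005", "year >= 2007", "year <= 2002"]:
    print(f"{cond:13} -> {years_where(cond)}")
```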


What if we want all records where the year is not equal to 2010? There are two ways to express inequality: != and <>. The SQL standard defines <>, and most database servers, such as MySQL, PostgreSQL and SQLite, support both. However, some database servers, such as **Microsoft Access and IBM DB2, only support <>**.

In [8]:
%%sql
select * from station_data where year!=2010 limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
766440,39537B,1998,10,1,72.7,1014.6,5.9,6.7,83.29999999999998,0.0,,0,0,0,0,0
176010,C3C6D5,2001,5,18,55.7,,7.3,4.3,69.1,0.0,,0,0,0,0,0
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0.0,,0,0,0,0,0
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0


In [7]:
%%sql
select * from station_data where year<>2010 limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
766440,39537B,1998,10,1,72.7,1014.6,5.9,6.7,83.29999999999998,0.0,,0,0,0,0,0
176010,C3C6D5,2001,5,18,55.7,,7.3,4.3,69.1,0.0,,0,0,0,0,0
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0.0,,0,0,0,0,0
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0


These two queries do the same thing. You should see the same output.
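A quick way to convince yourself that != and <> are interchangeable is to run both against the same data. This sketch assumes Python's standard sqlite3 module and a tiny in-memory station_data table with a few rows taken from the outputs above; SQLite, like PostgreSQL, accepts both spellings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (station_number INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(143080, 2002), (719160, 2010), (766440, 1998), (766870, 2010)],
)

# Both spellings of "not equal"; ORDER BY makes the comparison deterministic.
bang = conn.execute(
    "SELECT station_number FROM station_data WHERE year != 2010 ORDER BY station_number"
).fetchall()
angle = conn.execute(
    "SELECT station_number FROM station_data WHERE year <> 2010 ORDER BY station_number"
).fetchall()

print(bang)           # [(143080,), (766440,)]
print(bang == angle)  # True
```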

We can also qualify inclusive ranges using the BETWEEN operator, as shown here
(“inclusive” means that 2005 and 2010 are included in the range):


In [10]:
%%sql
SELECT * FROM station_data
WHERE year BETWEEN 2005 and 2010
limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0,,0,0,0,0,0
598550,C5C66E,2006,10,15,72.9,,14.2,1.7,82.0,0,,0,0,0,0,0
941830,229317,2007,4,19,66.5,994.9,,4.0,76.29999999999998,0,,0,0,0,0,0
932920,EB6580,2009,5,17,52.1,,12.4,7.3,59.4,0,,0,0,0,0,0
985310,A79DEC,2007,7,31,77.59999999999998,,11.8,3.4,82.5,0,,0,0,0,0,0


## 2.2 Combining multiple filtering conditions

### 2.2.1 And operator

We can express the BETWEEN range another way: the year must be greater than or equal to 2005 **and** less than or equal to 2010. To do this, we write two filters and combine them with **AND**, which keeps a row only when both conditions are true.

The query below should return exactly the same result as the BETWEEN query above.

In [11]:
%%sql
select * from station_data
where year >= 2005 and year<= 2010
limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0,,0,0,0,0,0
598550,C5C66E,2006,10,15,72.9,,14.2,1.7,82.0,0,,0,0,0,0,0
941830,229317,2007,4,19,66.5,994.9,,4.0,76.29999999999998,0,,0,0,0,0,0
932920,EB6580,2009,5,17,52.1,,12.4,7.3,59.4,0,,0,0,0,0,0
985310,A79DEC,2007,7,31,77.59999999999998,,11.8,3.4,82.5,0,,0,0,0,0,0


Note that BETWEEN always expresses an inclusive range, so we cannot use it for an exclusive range (one that leaves out the endpoints). In that case, combining two strict comparisons with AND is very useful:

In [12]:
%%sql
select * from station_data
where year > 2005 and year< 2010
limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0,,0,0,0,0,0
598550,C5C66E,2006,10,15,72.9,,14.2,1.7,82.0,0,,0,0,0,0,0
941830,229317,2007,4,19,66.5,994.9,,4.0,76.29999999999998,0,,0,0,0,0,0
932920,EB6580,2009,5,17,52.1,,12.4,7.3,59.4,0,,0,0,0,0,0
985310,A79DEC,2007,7,31,77.59999999999998,,11.8,3.4,82.5,0,,0,0,0,0,0
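The difference between an inclusive and an exclusive range only shows up at the endpoints, and the five rows above happen not to include 2005 or 2010. The sketch below, using Python's sqlite3 module and a hypothetical in-memory table that does contain the boundary years, makes the difference visible; the helper years is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER)")
# Hypothetical rows that include the boundary years 2005 and 2010.
conn.executemany("INSERT INTO station_data VALUES (?)",
                 [(2004,), (2005,), (2007,), (2010,), (2011,)])


def years(condition):
    """Hypothetical helper: years matching a trusted WHERE condition, in order."""
    return [y for (y,) in conn.execute(
        f"SELECT year FROM station_data WHERE {condition} ORDER BY year")]


between = years("year BETWEEN 2005 AND 2010")
inclusive = years("year >= 2005 AND year <= 2010")
exclusive = years("year > 2005 AND year < 2010")

print(between)    # [2005, 2007, 2010]
print(inclusive)  # [2005, 2007, 2010]
print(exclusive)  # [2007]
```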


### 2.2.2 Or operator

The OR operator returns a record if at least one of the conditions is true for that record. For instance, if we wanted only records with months 3, 6, 9, or 12, we could use the query below:


In [16]:
%%sql
select * from station_data
where month=3
or month=6
or month=9
or month=12
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
821930,1F8A7B,1953,6,18,72.79999999999998,1007.1,12.4,3.6,81.29999999999998,0.0,,0,0,0,0,0
478070,D028D8,1981,6,27,73.4,,7.9,3.0,77.0,1.93,,0,0,0,0,0
471100,6A6704,1990,9,19,50.5,,6.0,4.1,62.5,0.0,,0,0,0,0,0
29880,921894,1986,12,26,13.9,,6.6,14.7,16.8,0.02,8.699999999999998,0,0,0,0,0



### 2.2.3 In operator

In the example above, we tested the month column against a list of possible values. In this kind of situation, we can use the **IN** operator instead of chaining ORs.


In [15]:
%%sql
select * from station_data
where month in (3,6,9,12)
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
821930,1F8A7B,1953,6,18,72.79999999999998,1007.1,12.4,3.6,81.29999999999998,0.0,,0,0,0,0,0
478070,D028D8,1981,6,27,73.4,,7.9,3.0,77.0,1.93,,0,0,0,0,0
471100,6A6704,1990,9,19,50.5,,6.0,4.1,62.5,0.0,,0,0,0,0,0
29880,921894,1986,12,26,13.9,,6.6,14.7,16.8,0.02,8.699999999999998,0,0,0,0,0


We can also negate the IN operator by putting **NOT** in front of it.

In [14]:
%%sql
select * from station_data
where month not in (3,6,9,12)
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
766440,39537B,1998,10,1,72.7,1014.6,5.9,6.7,83.29999999999998,0.0,,0,0,0,0,0
176010,C3C6D5,2001,5,18,55.7,,7.3,4.3,69.1,0.0,,0,0,0,0,0
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0.0,,0,0,0,0,0
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0
719200,C74611,1978,2,5,-4.4,962.9,14.9,13.3,1.6,0.0,9.8,0,0,0,0,0
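To check that the OR chain, IN and NOT IN behave as described, here is a minimal sketch using Python's sqlite3 module with a hypothetical in-memory table holding one row per month; the helper months is illustrative. Because the table has no null months, IN and NOT IN together cover all twelve months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (month INTEGER)")
# Hypothetical data: one row per calendar month.
conn.executemany("INSERT INTO station_data VALUES (?)", [(m,) for m in range(1, 13)])


def months(condition):
    """Hypothetical helper: months matching a trusted WHERE condition, in order."""
    return [m for (m,) in conn.execute(
        f"SELECT month FROM station_data WHERE {condition} ORDER BY month")]


with_or = months("month = 3 OR month = 6 OR month = 9 OR month = 12")
with_in = months("month IN (3, 6, 9, 12)")
with_not_in = months("month NOT IN (3, 6, 9, 12)")

print(with_or == with_in)  # True
print(with_not_in)         # [1, 2, 4, 5, 7, 8, 10, 11]
```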


### 2.2.4 Arithmetic operators

Notice that 3, 6, 9 and 12 are exactly the months that are divisible by 3. So we can also get the same records as in the examples above with the modulus operator %, which returns the remainder of a division.


**Note: Oracle does not support the % modulus operator. It uses the MOD() function instead, e.g. MOD(month, 3) = 0.**

In [17]:
%%sql
select * from station_data
where month%3=0
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
821930,1F8A7B,1953,6,18,72.79999999999998,1007.1,12.4,3.6,81.29999999999998,0.0,,0,0,0,0,0
478070,D028D8,1981,6,27,73.4,,7.9,3.0,77.0,1.93,,0,0,0,0,0
471100,6A6704,1990,9,19,50.5,,6.0,4.1,62.5,0.0,,0,0,0,0,0
29880,921894,1986,12,26,13.9,,6.6,14.7,16.8,0.02,8.699999999999998,0,0,0,0,0
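The modulus filter can be checked on a small table. This sketch assumes Python's sqlite3 module and a hypothetical in-memory table with one row per month, and shows that month % 3 = 0 selects exactly the months 3, 6, 9 and 12. The Oracle spelling appears only in a comment, since it is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (month INTEGER)")
conn.executemany("INSERT INTO station_data VALUES (?)", [(m,) for m in range(1, 13)])

# % is the modulus (remainder) operator in SQLite and PostgreSQL.
# In Oracle, the same condition would be written as MOD(month, 3) = 0.
by_modulus = [m for (m,) in conn.execute(
    "SELECT month FROM station_data WHERE month % 3 = 0 ORDER BY month")]

print(by_modulus)  # [3, 6, 9, 12]
```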


## 2.3 Filtering text columns

The rules for qualifying text fields follow the same structure, although there are subtle differences. You can
**use =, AND, OR, and IN with text**. However, when using text, you must wrap literals (the text values you specify) **in single quotes**.


In [18]:
%%sql
SELECT * FROM station_data
WHERE report_code = '513A63'
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
1 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
702223,513A63,2010,1,22,-23.1,,10,0.8,-15.6,0,,0,0,0,0,0


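Note that **PostgreSQL does not accept double quotes around string values**: in PostgreSQL, double quotes mark an identifier such as a column name. So the query below returns an error in PostgreSQL, because it looks for a column named 513A63. MySQL (unless its ANSI_QUOTES SQL mode is enabled) and SQLite do accept it, but single quotes are the portable choice.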

In [19]:
%%sql
SELECT * FROM station_data
WHERE report_code = "513A63"
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
(psycopg2.errors.UndefinedColumn) column "513A63" does not exist
LINE 2: WHERE report_code = "513A63"
                            ^

[SQL: SELECT * FROM station_data
WHERE report_code = "513A63"
limit 5;]
(Background on this error at: https://sqlalche.me/e/14/f405)


If we omit the quotes entirely, the database does not see 513A63 as a text value; it tries to parse it as part of the SQL itself (a number, a column name or a keyword), which here produces a syntax error. This single-quote rule applies to all text operations. For example, the query below returns an error.

In [20]:
%%sql
SELECT * FROM station_data
WHERE report_code = 513A63
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
(psycopg2.errors.SyntaxError) syntax error at or near "A63"
LINE 2: WHERE report_code = 513A63
                               ^

[SQL: SELECT * FROM station_data
WHERE report_code = 513A63
limit 5;]
(Background on this error at: https://sqlalche.me/e/14/f405)


We can also use text values with the other filter operators. The query below returns all rows whose report_code is one of three specific values.

In [21]:
%%sql
SELECT * FROM station_data
WHERE report_code IN ('513A63','1F8A7B','EF616A');

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
3 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0
821930,1F8A7B,1953,6,18,72.79999999999998,1007.1,12.4,3.6,81.29999999999998,0.0,,0,0,0,0,0
702223,513A63,2010,1,22,-23.1,,10.0,0.8,-15.6,0.0,,0,0,0,0,0
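The same text filters can be run from application code. The sketch below assumes Python's sqlite3 module and an in-memory table with four report codes copied from the outputs above. It shows single-quoted literals inside the SQL and, as an alternative, a ? placeholder, which lets the driver take care of quoting the value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (station_number INTEGER, report_code TEXT)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(470160, "EF616A"), (821930, "1F8A7B"), (702223, "513A63"), (143080, "34DDA7")],
)

# Text literals go in single quotes inside the SQL string.
single = conn.execute(
    "SELECT station_number FROM station_data WHERE report_code = '513A63'"
).fetchall()

several = conn.execute(
    "SELECT station_number FROM station_data "
    "WHERE report_code IN ('513A63', '1F8A7B', 'EF616A') ORDER BY station_number"
).fetchall()

# With a placeholder, the value is passed separately and quoted by the driver.
bound = conn.execute(
    "SELECT station_number FROM station_data WHERE report_code = ?", ("513A63",)
).fetchall()

print(single)   # [(702223,)]
print(several)  # [(470160,), (702223,), (821930,)]
print(bound)    # [(702223,)]
```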


### 2.3.1 Other useful functions

#### Filtering by length

The **length()** function returns the number of characters in a text value. For instance, every valid report_code contains exactly six characters (letters or digits), so the query below returns all rows with a malformed report_code.


In [24]:
%%sql
SELECT * FROM station_data
WHERE length(report_code) != 6
limit 5
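To show what the length filter catches, the sketch below uses Python's sqlite3 module and a hypothetical in-memory table with two valid codes from the outputs above plus one deliberately malformed, five-character code (ABC12, invented for illustration). It makes no claim about which rows, if any, are malformed in the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (report_code TEXT)")
# "ABC12" is a hypothetical malformed code added for illustration.
conn.executemany("INSERT INTO station_data VALUES (?)",
                 [("34DDA7",), ("39537B",), ("ABC12",)])

# Valid codes have exactly six characters, so anything else is flagged.
malformed = [code for (code,) in conn.execute(
    "SELECT report_code FROM station_data WHERE length(report_code) != 6")]

print(malformed)  # ['ABC12']
```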


#### Wildcard text filtering with the LIKE operator

Another common operation is to use wildcards with LIKE. LIKE is followed by a pattern (not a regular expression), in which:
- % means any sequence of zero or more characters.
- _ means exactly one character.
- Any other character is interpreted literally.

So, to find all report codes that start with the letter “A”, you would run this statement, which matches “A” followed by any characters:


In [26]:
%%sql
select * from station_data
where report_code like 'A%'
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
484750,A38C90,1988,6,24,72.59999999999998,,8.699999999999998,3.1,87.5,0.0,,0,0,0,0,0
985310,A79DEC,2007,7,31,77.59999999999998,,11.8,3.4,82.5,0.0,,0,0,0,0,0
724505,A49553,2005,4,28,42.7,,6.8,11.2,55.4,0.4199999999999999,,0,0,0,0,0
215350,ACE19E,1991,4,9,-6.5,,5.5,25.0,-3.4,0.02,2.4,0,0,0,0,0
209730,A70AAE,1986,3,26,-19.8,,8.5,12.1,-13.8,0.0299999999999999,4.7,0,0,0,0,0


If you wanted all report codes that have a “B” as the first character and a “C” as the third character, you would use _ to stand for the single character in between:

In [28]:
%%sql
select * from station_data
where report_code like 'B_C%'
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
999999,B6C2DE,1966,2,8,38.8,992.9,15.2,5.5,52.5,0.0,,0,0,0,0,0
60110,B8CB27,1997,1,20,41.7,1008.3,13.1,19.1,44.7,0.04,,0,0,0,0,0
64080,BECB51,1982,8,8,59.0,,2.5,11.0,65.5,0.0,,0,0,0,0,0
973800,B8C53F,2007,8,15,68.5,1011.0,7.4,7.5,80.5,0.0,,0,0,0,0,0
319760,B1CA98,1957,9,8,42.5,,,18.5,44.7,0.04,,0,0,0,0,0
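The two patterns can be exercised on a small sample. This sketch assumes Python's sqlite3 module and an in-memory table with report codes from the outputs above, plus a hypothetical code BC1234 showing that _ must match exactly one character. One caveat: SQLite's LIKE ignores case for ASCII letters by default, whereas PostgreSQL's LIKE is case-sensitive (PostgreSQL offers ILIKE for case-insensitive matching).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (report_code TEXT)")
# "BC1234" is hypothetical; the other codes come from the outputs above.
conn.executemany(
    "INSERT INTO station_data VALUES (?)",
    [("A38C90",), ("A79DEC",), ("B6C2DE",), ("BECB51",), ("BC1234",), ("34DDA7",)],
)


def codes(pattern):
    """Hypothetical helper: report codes matching a LIKE pattern, in order."""
    return [c for (c,) in conn.execute(
        "SELECT report_code FROM station_data WHERE report_code LIKE ? "
        "ORDER BY report_code", (pattern,))]


print(codes("A%"))    # ['A38C90', 'A79DEC']
print(codes("B_C%"))  # ['B6C2DE', 'BECB51'] (BC1234 has no character between B and C)
```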


## 2.4 Filtering boolean columns

Booleans are true/false values. In many databases, they are stored as integers, with **false expressed as 0 and true expressed as 1**. Some database platforms **(like MySQL) also let you use the words true and false**, which are aliases for 1 and 0.

**SQLite versions before 3.23.0 do not support these keywords and expect you to use 1 for true and 0 for false explicitly**. In this table, the weather flags (fog, rain, hail, thunder, tornado) hold 0/1 values, so we compare them with 1. For instance, the query below gets all records where there was a tornado and hail:


In [30]:
%%sql
select * from station_data
where tornado=1 AND hail=1
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
724320,207979,1988,3,4,33.1,999.4,3.1,9.3,35.1,0.23,,1,1,1,1,1
743920,2ABE7D,1996,5,21,57.6,,5.8,7.5,70.0,0.0,,1,1,1,1,1
724460,B9B8B2,1986,10,2,64.2,976.0,5.5,6.5,65.4,0.6099999999999999,,1,1,1,1,1
700450,77D245,1985,11,4,-12.2,1023.7,6.2,16.3,-7.2,0.0,0.8,1,1,1,1,1


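If you are looking for just true values, some platforms do not even require the "= 1" expression. Behind the scenes,
**every WHERE condition boils down to a Boolean expression**, and SQLite and MySQL treat any nonzero number as true, so a 0/1 column qualifies by itself. On those platforms, you can achieve the same results by running the following query. PostgreSQL is stricter: AND only accepts boolean operands, so with the 0/1 integer columns of this table the query raises an error there, and you need to keep "= 1".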


In [None]:
%%sql
select * from station_data
where tornado AND hail;

Qualifying for false values, however, needs an explicit comparison with 0. To get all records with hail but no tornado, run this query:

In [None]:
%%sql
SELECT * FROM station_data
WHERE tornado = 0 AND hail = 1;

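On SQLite and MySQL, you can also use the NOT keyword to qualify tornado as false. NOT binds more tightly than AND, so the condition below reads as (NOT tornado) AND hail. PostgreSQL rejects NOT on an integer column, so use tornado = 0 there.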

In [None]:
%%sql
SELECT * FROM station_data
WHERE NOT tornado AND hail;
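SQLite treats any nonzero integer as true, so all four forms from this section can be compared there. The sketch below assumes Python's sqlite3 module and four hypothetical rows covering every combination of hail and tornado; the shorthand forms rely on SQLite's treatment of integers and are not portable to PostgreSQL with integer columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_data (station_number INTEGER, hail INTEGER, tornado INTEGER)")
# Hypothetical rows (station_number, hail, tornado): one per flag combination.
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0), (3, 0, 1), (4, 0, 0)],
)


def stations(condition):
    """Hypothetical helper: station numbers matching a trusted WHERE condition."""
    return [s for (s,) in conn.execute(
        f"SELECT station_number FROM station_data WHERE {condition} "
        "ORDER BY station_number")]


both_explicit = stations("tornado = 1 AND hail = 1")
both_shorthand = stations("tornado AND hail")      # SQLite: nonzero counts as true
hail_only_explicit = stations("tornado = 0 AND hail = 1")
hail_only_not = stations("NOT tornado AND hail")   # parsed as (NOT tornado) AND hail

print(both_explicit, both_shorthand)      # [1] [1]
print(hail_only_explicit, hail_only_not)  # [2] [2]
```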

## 2.5 Handling null values

You may have noticed that some columns, such as station_pressure and snow_depth, have null values. **A null is a value that has no value. It is the complete absence of any content. It is a vacuous state**.

In SQL, **null values cannot be detected with =. Any comparison with null, even null = null, yields null (unknown) rather than true, so you need to use IS NULL or IS NOT NULL to identify null values**.

The query below returns all rows where snow_depth is null:


In [33]:
%%sql
select * from station_data
where snow_depth is null
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
766440,39537B,1998,10,1,72.7,1014.6,5.9,6.7,83.29999999999998,0.0,,0,0,0,0,0
176010,C3C6D5,2001,5,18,55.7,,7.3,4.3,69.1,0.0,,0,0,0,0,0
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0.0,,0,0,0,0,0
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0
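The difference between = NULL and IS NULL is easy to see in a small experiment. This sketch assumes Python's sqlite3 module and a hypothetical in-memory table with three rows, where Python's None is stored as NULL. The comparison = NULL matches nothing, because it never evaluates to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (station_number INTEGER, snow_depth REAL)")
# Python's None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(143080, None), (719200, 9.8), (766440, None)],
)

# Comparing with NULL yields NULL (unknown), which WHERE treats as "not true".
eq_null = conn.execute(
    "SELECT station_number FROM station_data WHERE snow_depth = NULL").fetchall()
is_null = conn.execute(
    "SELECT station_number FROM station_data WHERE snow_depth IS NULL "
    "ORDER BY station_number").fetchall()
is_not_null = conn.execute(
    "SELECT station_number FROM station_data WHERE snow_depth IS NOT NULL").fetchall()
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()

print(eq_null)           # []
print(is_null)           # [(143080,), (766440,)]
print(is_not_null)       # [(719200,)]
print(null_equals_null)  # (None,)
```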


### 2.5.1 Why do we even have null values in the database? Can we replace null with 0 for snow_depth?

Null values are useful in some cases. For columns such as snow_depth or precipitation, a null makes sense, not because it was a sunny day (in that case, it is better to record the value as 0), but because some stations might not have the instruments needed to take those measurements. Setting those values to 0 would be misleading, since 0 implies that data was recorded, so those measurements should be left null.

Other columns should never contain nulls. For example, the **station_number** column should be designed to never allow nulls, because a row with a null station_number would be orphaned data that belongs to no station.

As we can see, null values are ambiguous, and their business meaning can be difficult to determine. For nullable columns (columns that are allowed to hold null values), it is important to **document what a null value means from a business perspective**. **Otherwise, nulls should be banned from those columns**.

Do not confuse nulls with empty text (i.e., '') or whitespace text (i.e., ' '). These are treated as values and
will never be considered null (a notable exception is Oracle, which stores an empty string as null). A null is not the same as 0 either, because 0 is a value, whereas null is the absence of a value.
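The distinction between null, empty text, whitespace and 0 can be checked directly. The sketch below assumes Python's sqlite3 module, where IS NULL evaluates to 1 or 0 and a null result comes back to Python as None; the results follow SQLite's rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# IS NULL evaluates to 1 (true) or 0 (false) in SQLite.
is_null_flags = conn.execute(
    "SELECT NULL IS NULL, '' IS NULL, ' ' IS NULL, 0 IS NULL").fetchone()

# 0 is a value and compares equal to 0; NULL compared with 0 is unknown (None).
zero_vs_null = conn.execute("SELECT 0 = 0, NULL = 0").fetchone()

print(is_null_flags)  # (1, 0, 0, 0)
print(zero_vs_null)   # (1, None)
```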

### 2.5.2 Problems caused by null values

We know that the precipitation column has null values. Run the following query and notice that none of the returned rows contain a null. Because null is not 0 or any other number, comparing it with **precipitation <= 0.5** yields null (unknown) instead of true, so the filter drops every row where precipitation is null.


In [35]:
%%sql
SELECT station_number, precipitation FROM station_data
WHERE precipitation <= 0.5
limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,precipitation
143080,0.0
766440,0.0
176010,0.0
125600,0.0
470160,0.04


In [36]:
%%sql
SELECT station_number, precipitation FROM station_data
WHERE precipitation is null
OR precipitation <= 0.5
limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,precipitation
143080,0.0
766440,0.0
176010,0.0
125600,0.0
470160,0.04
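The effect of nulls on a comparison is easier to see when a null row is guaranteed to be in the data. This sketch assumes Python's sqlite3 module and four rows whose station numbers and precipitation values come from the outputs above (station 134624 had a null precipitation); the helper stations is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (station_number INTEGER, precipitation REAL)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(143080, 0.0), (470160, 0.04), (478070, 1.93), (134624, None)],
)


def stations(condition):
    """Hypothetical helper: station numbers matching a trusted WHERE condition."""
    return [s for (s,) in conn.execute(
        f"SELECT station_number FROM station_data WHERE {condition} "
        "ORDER BY station_number")]


without_nulls = stations("precipitation <= 0.5")
with_nulls = stations("precipitation IS NULL OR precipitation <= 0.5")

print(without_nulls)  # [143080, 470160]  (station 134624 is silently dropped)
print(with_nulls)     # [134624, 143080, 470160]
```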


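The query above keeps the null rows by testing for them explicitly with IS NULL (its first five rows happen to have non-null values, so the output looks the same). It works, but there is a more concise way of handling null values: the **coalesce() function**.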
- coalesce(value1, value2, ...): returns the first argument that is not null. With two arguments, coalesce(col_name, replacement_value)
  returns the column value, or replacement_value when the column value is null.

The query below uses coalesce to replace every null precipitation value with 0 and then compares the result with <= 0.5. This does not modify the original table; the replacement exists only inside the query.


In [38]:
%%sql
SELECT * FROM station_data
WHERE coalesce(precipitation, 0) <= 0.5
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
143080,34DDA7,2002,12,21,33.8,987.4,3.4,0.2,36.0,0.0,,1,1,1,1,1
766440,39537B,1998,10,1,72.7,1014.6,5.9,6.7,83.29999999999998,0.0,,0,0,0,0,0
176010,C3C6D5,2001,5,18,55.7,,7.3,4.3,69.1,0.0,,0,0,0,0,0
125600,145150,2007,10,14,33.0,,6.9,2.5,39.7,0.0,,0,0,0,0,0
470160,EF616A,1967,7,29,65.6,,9.2,1.2,72.4,0.04,,0,0,0,0,0


If we want others to be able to use the replaced value, we can return it as a new column in the query result, naming it with AS:

In [39]:
%%sql
SELECT report_code, coalesce(precipitation, 0) as rainfall
FROM station_data
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


report_code,rainfall
34DDA7,0.0
39537B,0.0
C3C6D5,0.0
145150,0.0
EF616A,0.04
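Both uses of coalesce() can be checked together. The sketch below assumes Python's sqlite3 module and a small in-memory table whose report codes and precipitation values come from the outputs above. It filters with coalesce, returns the substituted value as a derived column, and confirms that the stored value is still null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (report_code TEXT, precipitation REAL)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [("34DDA7", 0.0), ("EF616A", 0.04), ("D028D8", 1.93), ("11CEA1", None)],
)

# Filter: a null precipitation is treated as 0 only inside this expression.
kept = [c for (c,) in conn.execute(
    "SELECT report_code FROM station_data "
    "WHERE coalesce(precipitation, 0) <= 0.5 ORDER BY report_code")]

# Projection: expose the substituted value as a derived column named rainfall.
rainfall = dict(conn.execute(
    "SELECT report_code, coalesce(precipitation, 0) AS rainfall FROM station_data"))

# The stored value is untouched: still NULL (None in Python).
stored = conn.execute(
    "SELECT precipitation FROM station_data WHERE report_code = '11CEA1'").fetchone()

print(kept)                # ['11CEA1', '34DDA7', 'EF616A']
print(rainfall["11CEA1"])  # 0
print(stored)              # (None,)
```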


## 2.6 Grouping Conditions

When you start chaining AND and OR together, make sure that you group related conditions, so that it is clear which set of conditions each OR separates.

For example, suppose we need to find all rows with snow or sleet (rain with snow). For sleet to happen, there must be rain and a temperature less than or equal to 32 degrees. We could write the query below.


In [40]:
%%sql
SELECT * FROM station_data
WHERE rain = 1 AND temperature <= 32
OR snow_depth > 0
limit 5;

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
719200,C74611,1978,2,5,-4.4,962.9,14.9,13.3,1.6,0.0,9.8,0,0,0,0,0
29880,921894,1986,12,26,13.9,,6.6,14.7,16.8,0.02,8.699999999999998,0,0,0,0,0
700450,77D245,1985,11,4,-12.2,1023.7,6.2,16.3,-7.2,0.0,0.8,1,1,1,1,1
28010,423D51,2004,2,15,23.7,953.7,,6.6,26.0,0.02,26.4,0,0,0,0,0
710830,FC6047,2008,1,16,-13.7,990.6,0.4,21.9,-8.699999999999998,0.1,11.4,0,0,0,0,0


This works as intended because AND has a higher precedence than OR: SQL evaluates **rain = 1 AND temperature <= 32** first, and then combines its result with **OR snow_depth > 0**. But for a beginner this can be confusing, since it is not obvious whether the AND condition or the OR condition gets resolved first.

To avoid this, we can explicitly group conditions in parentheses. This not only makes the intent clearer, but also makes the query safer to edit, since the grouping no longer depends on operator precedence.

In [41]:
%%sql
SELECT * FROM station_data
WHERE (rain = 1 AND temperature <= 32)
OR snow_depth > 0
limit 5

 * postgresql://pliu:***@127.0.0.1:5432/weather_stations
5 rows affected.


station_number,report_code,year,month,day,dew_point,station_pressure,visibility,wind_speed,temperature,precipitation,snow_depth,fog,rain,hail,thunder,tornado
719200,C74611,1978,2,5,-4.4,962.9,14.9,13.3,1.6,0.0,9.8,0,0,0,0,0
29880,921894,1986,12,26,13.9,,6.6,14.7,16.8,0.02,8.699999999999998,0,0,0,0,0
700450,77D245,1985,11,4,-12.2,1023.7,6.2,16.3,-7.2,0.0,0.8,1,1,1,1,1
28010,423D51,2004,2,15,23.7,953.7,,6.6,26.0,0.02,26.4,0,0,0,0,0
710830,FC6047,2008,1,16,-13.7,990.6,0.4,21.9,-8.699999999999998,0.1,11.4,0,0,0,0,0
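To see that AND really binds more tightly than OR, the order of the conditions can be swapped. This sketch assumes Python's sqlite3 module and five hypothetical rows chosen to tell the readings apart: row 5 has snow on the ground but a temperature above freezing, so it is kept if AND is evaluated first and dropped under a strict left-to-right reading. The helper ids is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_data (id INTEGER, rain INTEGER, temperature REAL, snow_depth REAL)")
# Hypothetical rows chosen so that different groupings give different answers.
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?, ?)",
    [
        (1, 1, 30.0, None),  # freezing rain, snow depth not recorded
        (2, 0, 20.0, 5.0),   # cold, snow on the ground
        (3, 1, 50.0, None),  # warm rain
        (4, 0, 50.0, 0.0),   # warm and dry
        (5, 0, 40.0, 2.0),   # snow on the ground, above freezing
    ],
)


def ids(condition):
    """Hypothetical helper: ids of the rows matching a trusted WHERE condition."""
    return [i for (i,) in conn.execute(
        f"SELECT id FROM station_data WHERE {condition} ORDER BY id")]


implicit = ids("rain = 1 AND temperature <= 32 OR snow_depth > 0")
grouped = ids("(rain = 1 AND temperature <= 32) OR snow_depth > 0")
# Same conditions with OR written first: AND is still evaluated first.
reordered = ids("snow_depth > 0 OR rain = 1 AND temperature <= 32")
# What a strict left-to-right reading of the reordered condition would mean.
left_to_right = ids("(snow_depth > 0 OR rain = 1) AND temperature <= 32")

print(implicit, grouped, reordered)  # [1, 2, 5] [1, 2, 5] [1, 2, 5]
print(left_to_right)                 # [1, 2]
```

Because reordered matches grouped rather than left_to_right, the parentheses in the query above spell out the grouping SQL would apply anyway; they become essential when you want a different grouping.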
