<a id="top">

# Examples of how to express common SQL idioms with the Python [pandas](https://pandas.pydata.org/) library

#### Table of Contents

- [Performing an equivalent ```LIKE``` statement](#like)
- [Performing an equivalent ```IN``` statement](#in)
- [Performing an equivalent ```SELECT DISTINCT``` statement](#distinct)
- [How to obtain rows where column value is ```Null```](#isnull)
- [Creating a running total](#runningtotal)
- [pandas ```query()``` method](#query)
- [String concatenation examples](#string)
- [How to create row numbering on groups](#row_num)
- [How to create an equivalent ```CASE``` statement](#case)

In [1]:
import pandas as pd
import numpy as np

<a id="like">

### Performing an equivalent ```LIKE '%<chars>%'``` statement:

[[back to top]](#top)

In [2]:
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami','corned beef', 'bacon', 'pastrami', 'honey ham','nova lox'],
                 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

|   | food | ounces |
|---|------|--------|
| 0 | bacon | 4.0 |
| 1 | pulled pork | 3.0 |
| 2 | bacon | 12.0 |
| 3 | pastrami | 6.0 |
| 4 | corned beef | 7.5 |
| 5 | bacon | 8.0 |
| 6 | pastrami | 3.0 |
| 7 | honey ham | 5.0 |
| 8 | nova lox | 6.0 |


In [3]:
data[data['food'].str.contains('aco')]

|   | food | ounces |
|---|------|--------|
| 0 | bacon | 4.0 |
| 2 | bacon | 12.0 |
| 5 | bacon | 8.0 |


This is equivalent to doing something like:

```
SELECT * FROM data WHERE food LIKE '%aco%'
```
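Whether SQL's ```LIKE``` is case-sensitive depends on the database and its collation, whereas ```str.contains()``` is case-sensitive by default and interprets its pattern as a regular expression. A minimal sketch of a case-insensitive, literal substring match that treats missing values as non-matches, using a small made-up ```foods``` frame rather than the ```data``` above:

```python
import pandas as pd

# Illustrative data; None plays the role of a SQL NULL
foods = pd.DataFrame({'food': ['Bacon', 'bacon bits', None, 'pastrami']})

# case=False  -> ignore upper/lower case
# regex=False -> treat 'ACO' as a literal substring, not a regular expression
# na=False    -> rows with a missing food count as "no match"
mask = foods['food'].str.contains('ACO', case=False, regex=False, na=False)
print(foods[mask])
```

Passing ```regex=False``` is worth the habit whenever the search text may contain characters such as ```.``` or ```(```, which would otherwise be read as regex syntax.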

In [5]:
data[data['food'].str.startswith('past')]

|   | food | ounces |
|---|------|--------|
| 3 | pastrami | 6.0 |
| 6 | pastrami | 3.0 |


is equivalent to:

```
SELECT * FROM data WHERE food LIKE 'past%'
```

In [7]:
data[data['food'].str.endswith('eef')]

|   | food | ounces |
|---|------|--------|
| 4 | corned beef | 7.5 |


is equivalent to:

```
SELECT * FROM data WHERE food LIKE '%eef'
```
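SQL's ```LIKE``` also has the ```_``` wildcard, which matches exactly one character, and no single pandas method covers every ```LIKE``` pattern. One way to emulate it is to translate the pattern into a regular expression and match the whole string with ```str.fullmatch()``` (pandas 1.1 and later). The ```like_to_regex()``` helper below is hypothetical, written only for this sketch, and does not handle ```ESCAPE``` clauses:

```python
import re
import pandas as pd

def like_to_regex(pattern):
    """Hypothetical helper: translate a SQL LIKE pattern into a regex.

    '%' -> '.*' (any run of characters), '_' -> '.' (exactly one character);
    every other character is escaped so it matches literally.
    ESCAPE clauses are not supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')
        elif ch == '_':
            parts.append('.')
        else:
            parts.append(re.escape(ch))
    return ''.join(parts)

data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami', 'corned beef',
                              'bacon', 'pastrami', 'honey ham', 'nova lox']})

# Equivalent of: SELECT * FROM data WHERE food LIKE 'ba_on'
ba_on = data[data['food'].str.fullmatch(like_to_regex('ba_on'))]
print(ba_on)
```

Because ```fullmatch()``` anchors the pattern at both ends, ```like_to_regex('past%')``` selects the same rows as ```str.startswith('past')```.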

<a id="in">

### Performing an equivalent ```IN()``` statement

[[back to top]](#top)

In [8]:
data

|   | food | ounces |
|---|------|--------|
| 0 | bacon | 4.0 |
| 1 | pulled pork | 3.0 |
| 2 | bacon | 12.0 |
| 3 | pastrami | 6.0 |
| 4 | corned beef | 7.5 |
| 5 | bacon | 8.0 |
| 6 | pastrami | 3.0 |
| 7 | honey ham | 5.0 |
| 8 | nova lox | 6.0 |


In [9]:
data[data['food'].isin(['bacon', 'pastrami'])]

|   | food | ounces |
|---|------|--------|
| 0 | bacon | 4.0 |
| 2 | bacon | 12.0 |
| 3 | pastrami | 6.0 |
| 5 | bacon | 8.0 |
| 6 | pastrami | 3.0 |


is equivalent to:

```
SELECT * FROM data WHERE food IN('bacon', 'pastrami')
```
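The negated form, ```NOT IN```, comes down to inverting the boolean mask with the ```~``` operator. A short sketch that rebuilds the same ```data``` frame so it runs on its own:

```python
import pandas as pd

data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami', 'corned beef',
                              'bacon', 'pastrami', 'honey ham', 'nova lox'],
                     'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# ~ inverts the mask, giving the equivalent of:
# SELECT * FROM data WHERE food NOT IN ('bacon', 'pastrami')
not_in = data[~data['food'].isin(['bacon', 'pastrami'])]
print(not_in)
```

One difference to keep in mind: a row whose ```food``` is missing is kept by ```~isin()```, because ```isin()``` returns ```False``` for it, whereas SQL's ```NOT IN``` filters out ```NULL``` values.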

<a id="distinct">

### Performing an equivalent ```SELECT DISTINCT```

[[back to top]](#top)

In [10]:
data[['food']].drop_duplicates()

|   | food |
|---|------|
| 0 | bacon |
| 1 | pulled pork |
| 3 | pastrami |
| 4 | corned beef |
| 7 | honey ham |
| 8 | nova lox |


is equivalent to:

```
SELECT DISTINCT food FROM data
```
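Two related idioms often come up next to ```SELECT DISTINCT```: counting distinct values, and keeping one whole row per distinct value. A sketch of both, rebuilding the same ```data``` frame:

```python
import pandas as pd

data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami', 'corned beef',
                              'bacon', 'pastrami', 'honey ham', 'nova lox'],
                     'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# SELECT COUNT(DISTINCT food) FROM data
# nunique() skips missing values by default, as COUNT(DISTINCT ...) skips NULLs
n_foods = data['food'].nunique()

# Keep the first row for each food, with all of its columns
first_per_food = data.drop_duplicates(subset=['food'], keep='first')

print(n_foods)
print(first_per_food)
```

The ```subset``` form is roughly like PostgreSQL's ```SELECT DISTINCT ON (food) *```, except that pandas deterministically keeps the first row in the frame's current order.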

<a id="isnull">

### How to obtain rows where a column value is Null (```isna()```)

[[back to top]](#top)

In [11]:
data = pd.DataFrame({'food': ['bacon', 'pulled pork', np.nan, 'pastrami','corned beef', 'bacon', 'pastrami', 'honey ham','nova lox'],
                 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

|   | food | ounces |
|---|------|--------|
| 0 | bacon | 4.0 |
| 1 | pulled pork | 3.0 |
| 2 | NaN | 12.0 |
| 3 | pastrami | 6.0 |
| 4 | corned beef | 7.5 |
| 5 | bacon | 8.0 |
| 6 | pastrami | 3.0 |
| 7 | honey ham | 5.0 |
| 8 | nova lox | 6.0 |


In [12]:
data[data['food'].isna()]

|   | food | ounces |
|---|------|--------|
| 2 | NaN | 12.0 |


is equivalent to:

```
SELECT * FROM data WHERE food IS NULL
```
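The opposite filter, ```IS NOT NULL```, uses ```notna()```, and ```fillna()``` covers the common ```COALESCE``` pattern of substituting a default for missing values. A sketch on the same frame, with ```'unknown'``` as an arbitrary placeholder chosen for illustration:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'food': ['bacon', 'pulled pork', np.nan, 'pastrami', 'corned beef',
                              'bacon', 'pastrami', 'honey ham', 'nova lox'],
                     'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# SELECT * FROM data WHERE food IS NOT NULL
not_null = data[data['food'].notna()]

# SELECT COALESCE(food, 'unknown') AS food FROM data
coalesced = data['food'].fillna('unknown')

print(not_null)
print(coalesced)
```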

<a id="runningtotal">

### Creating running total / cumulative sum

[[back to top]](#top)

In [16]:
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami','corned beef', 'bacon', 'pastrami', 'honey ham','nova lox'],
                 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data.reset_index(level=0, inplace=True)
data

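|   | index | food | ounces |
|---|-------|------|--------|
| 0 | 0 | bacon | 4.0 |
| 1 | 1 | pulled pork | 3.0 |
| 2 | 2 | bacon | 12.0 |
| 3 | 3 | pastrami | 6.0 |
| 4 | 4 | corned beef | 7.5 |
| 5 | 5 | bacon | 8.0 |
| 6 | 6 | pastrami | 3.0 |
| 7 | 7 | honey ham | 5.0 |
| 8 | 8 | nova lox | 6.0 |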


In [17]:
data['cum_ounces'] = data['ounces'].cumsum()
data

|   | index | food | ounces | cum_ounces |
|---|-------|------|--------|------------|
| 0 | 0 | bacon | 4.0 | 4.0 |
| 1 | 1 | pulled pork | 3.0 | 7.0 |
| 2 | 2 | bacon | 12.0 | 19.0 |
| 3 | 3 | pastrami | 6.0 | 25.0 |
| 4 | 4 | corned beef | 7.5 | 32.5 |
| 5 | 5 | bacon | 8.0 | 40.5 |
| 6 | 6 | pastrami | 3.0 | 43.5 |
| 7 | 7 | honey ham | 5.0 | 48.5 |
| 8 | 8 | nova lox | 6.0 | 54.5 |


is equivalent to:

```
SELECT
    food,
    ounces,
    SUM(ounces) OVER(ORDER BY index) AS cum_ounces

FROM
    data
```

<a id="query">

### pandas ```query()``` examples

[[back to top]](#top)

**NOTE:** The pandas ```query()``` method has a few limitations, and passing the filter as a string can feel awkward, but it is often more convenient than the normal pandas [boolean indexing syntax](http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing). Column names that contain spaces must be wrapped in backticks inside the query string (supported since pandas 0.25); alternatively, remove the spaces from the column names or use normal boolean indexing, which works in all cases. Although you can delimit the query string with single quotes, it is a good habit to always use double quotes, because you will often need single quotes inside the expression for string values.

In [18]:
data.query("food == 'bacon'")

|   | index | food | ounces | cum_ounces |
|---|-------|------|--------|------------|
| 0 | 0 | bacon | 4.0 | 4.0 |
| 2 | 2 | bacon | 12.0 | 19.0 |
| 5 | 5 | bacon | 8.0 | 40.5 |


is equivalent to:

```
SELECT * FROM data WHERE food = 'bacon'
```

Here's a little-known trick: inside a ```query()``` string, you can refer to local Python variables by prefixing them with the ```@``` symbol:

In [19]:
fav_food = 'bacon'
data.query("food == @fav_food")

|   | index | food | ounces | cum_ounces |
|---|-------|------|--------|------------|
| 0 | 0 | bacon | 4.0 | 4.0 |
| 2 | 2 | bacon | 12.0 | 19.0 |
| 5 | 5 | bacon | 8.0 | 40.5 |


This technique is useful when the filter value comes from user input, for example in a GUI or widget framework.

You are not limited to a single value; you can also pass a list of values, in which case ```==``` behaves like ```IN```:

In [20]:
fav_food = ['bacon', 'honey ham']
data.query("food == @fav_food")

|   | index | food | ounces | cum_ounces |
|---|-------|------|--------|------------|
| 0 | 0 | bacon | 4.0 | 4.0 |
| 2 | 2 | bacon | 12.0 | 19.0 |
| 5 | 5 | bacon | 8.0 | 40.5 |
| 7 | 7 | honey ham | 5.0 | 48.5 |


```
SELECT * FROM data WHERE food IN('bacon', 'honey ham')
```

### More ```query()``` examples:

Filter on two or more columns:

In [21]:
data.query("food == 'bacon' and ounces > 4")

|   | index | food | ounces | cum_ounces |
|---|-------|------|--------|------------|
| 2 | 2 | bacon | 12.0 | 19.0 |
| 5 | 5 | bacon | 8.0 | 40.5 |


is equivalent to:

```
SELECT *

FROM
    data

WHERE
    food = 'bacon'
    AND ounces > 4
```

You can also chain queries together:

In [22]:
data.query("food == 'bacon'").query("ounces > 4")

|   | index | food | ounces | cum_ounces |
|---|-------|------|--------|------------|
| 2 | 2 | bacon | 12.0 | 19.0 |
| 5 | 5 | bacon | 8.0 | 40.5 |


<a id="string">

### String concatenation examples:

[[back to top]](#top)

In [24]:
data['new_food'] = data['food'] + "_yummy"
data

|   | index | food | ounces | cum_ounces | new_food |
|---|-------|------|--------|------------|----------|
| 0 | 0 | bacon | 4.0 | 4.0 | bacon_yummy |
| 1 | 1 | pulled pork | 3.0 | 7.0 | pulled pork_yummy |
| 2 | 2 | bacon | 12.0 | 19.0 | bacon_yummy |
| 3 | 3 | pastrami | 6.0 | 25.0 | pastrami_yummy |
| 4 | 4 | corned beef | 7.5 | 32.5 | corned beef_yummy |
| 5 | 5 | bacon | 8.0 | 40.5 | bacon_yummy |
| 6 | 6 | pastrami | 3.0 | 43.5 | pastrami_yummy |
| 7 | 7 | honey ham | 5.0 | 48.5 | honey ham_yummy |
| 8 | 8 | nova lox | 6.0 | 54.5 | nova lox_yummy |


is equivalent to:

```
-- SQL Server:
SELECT
    *,
    food + '_yummy' AS new_food

FROM
    data

-- IBM DB2:
SELECT
    *,
    food || '_yummy' AS new_food

FROM
    data
```

The string concatenation operator varies between database systems: SQL Server uses the plus sign (```+```), while DB2, Oracle, PostgreSQL, and SQLite use double vertical bars (```||```), the operator defined by the SQL standard. In pandas, the ```+``` operator concatenates string columns element-wise.

In [25]:
data['new_columns'] = data['food'] + ' - ' + data['new_food']
data

|   | index | food | ounces | cum_ounces | new_food | new_columns |
|---|-------|------|--------|------------|----------|-------------|
| 0 | 0 | bacon | 4.0 | 4.0 | bacon_yummy | bacon - bacon_yummy |
| 1 | 1 | pulled pork | 3.0 | 7.0 | pulled pork_yummy | pulled pork - pulled pork_yummy |
| 2 | 2 | bacon | 12.0 | 19.0 | bacon_yummy | bacon - bacon_yummy |
| 3 | 3 | pastrami | 6.0 | 25.0 | pastrami_yummy | pastrami - pastrami_yummy |
| 4 | 4 | corned beef | 7.5 | 32.5 | corned beef_yummy | corned beef - corned beef_yummy |
| 5 | 5 | bacon | 8.0 | 40.5 | bacon_yummy | bacon - bacon_yummy |
| 6 | 6 | pastrami | 3.0 | 43.5 | pastrami_yummy | pastrami - pastrami_yummy |
| 7 | 7 | honey ham | 5.0 | 48.5 | honey ham_yummy | honey ham - honey ham_yummy |
| 8 | 8 | nova lox | 6.0 | 54.5 | nova lox_yummy | nova lox - nova lox_yummy |

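One difference worth knowing: pandas' ```+``` propagates missing values, much as ```NULL``` propagates through string concatenation in most databases. A hedged sketch comparing ```+``` with ```Series.str.cat()```, which takes a separator and a replacement for missing values (roughly like wrapping each column in ```COALESCE(col, '?')``` before concatenating); the ```df``` frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing food value
df = pd.DataFrame({'food': ['bacon', np.nan, 'pastrami'],
                   'animal': ['pig', 'pig', 'cow']})

# '+' propagates the missing value, so row 1 becomes NaN
plus = df['food'] + ' - ' + df['animal']

# str.cat() concatenates element-wise with a separator;
# na_rep substitutes a placeholder for missing values instead of propagating them
joined = df['food'].str.cat(df['animal'], sep=' - ', na_rep='?')

print(plus)
print(joined)
```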

<a id="row_num">

### How to apply SQL's [row_number()](https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017) function based on grouping:

[[back to top]](#top)

In [29]:
data = pd.DataFrame({'PartNo': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                    'TotalCost': [101.30, 98.10, 100.50, 67.34, 56.56, 52.45, 201.32, 245.65, 234.67]
                    }
                   )
data

|   | PartNo | TotalCost |
|---|--------|-----------|
| 0 | 1 | 101.3 |
| 1 | 1 | 98.1 |
| 2 | 1 | 100.5 |
| 3 | 2 | 67.34 |
| 4 | 2 | 56.56 |
| 5 | 2 | 52.45 |
| 6 | 3 | 201.32 |
| 7 | 3 | 245.65 |
| 8 | 3 | 234.67 |


Suppose your objective is to create a ```ROW_NUM``` column that restarts at 1 for each part number and counts upward in order of increasing total cost.

To do this, combine the ```sort_values()```, ```groupby()```, and ```cumcount()``` methods. The row numbers are computed on a sorted copy of the data, but pandas aligns the result back onto the original rows by index when the new column is assigned:

In [31]:
data['ROW_NUM'] = data.sort_values(by=['TotalCost']).groupby(['PartNo']).cumcount() + 1
data.sort_values(by=['PartNo', 'ROW_NUM'], inplace=True)
data

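|   | PartNo | TotalCost | ROW_NUM |
|---|--------|-----------|---------|
| 1 | 1 | 98.1 | 1 |
| 2 | 1 | 100.5 | 2 |
| 0 | 1 | 101.3 | 3 |
| 5 | 2 | 52.45 | 1 |
| 4 | 2 | 56.56 | 2 |
| 3 | 2 | 67.34 | 3 |
| 6 | 3 | 201.32 | 1 |
| 8 | 3 | 234.67 | 2 |
| 7 | 3 | 245.65 | 3 |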


is equivalent to:

```
SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY PartNo ORDER BY TotalCost ASC) AS ROW_NUM
    
FROM
    data
```
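An alternative sketch uses ```groupby().rank()```, whose ```method``` argument also covers SQL's other ranking functions: ```'first'``` numbers tied rows by order of appearance (like ```ROW_NUMBER()```), ```'min'``` gives ties the same lowest rank (like ```RANK()```), and ```'dense'``` leaves no gaps after ties (like ```DENSE_RANK()```). ```rank()``` returns floats, hence the ```astype(int)```. The ```RNK``` and ```DENSE_RNK``` column names are arbitrary choices for this example:

```python
import pandas as pd

data = pd.DataFrame({'PartNo': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                     'TotalCost': [101.30, 98.10, 100.50, 67.34, 56.56, 52.45,
                                   201.32, 245.65, 234.67]})

grouped = data.groupby('PartNo')['TotalCost']

# method='first': ties broken by order of appearance, like ROW_NUMBER()
data['ROW_NUM'] = grouped.rank(method='first').astype(int)
# method='min': tied rows share the lowest rank, like RANK()
data['RNK'] = grouped.rank(method='min').astype(int)
# method='dense': no gaps after ties, like DENSE_RANK()
data['DENSE_RNK'] = grouped.rank(method='dense').astype(int)

print(data)
```

For ```ORDER BY TotalCost DESC```, pass ```ascending=False``` to ```rank()```. Unlike the ```cumcount()``` approach, this does not require sorting the frame first.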

<a id="case">

### How to create an equivalent ```CASE``` statement:

[[back to top]](#top)

In [32]:
import pandas as pd
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami','corned beef', 'bacon', 'pastrami', 'honey ham','nova lox'],
                 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

|   | food | ounces |
|---|------|--------|
| 0 | bacon | 4.0 |
| 1 | pulled pork | 3.0 |
| 2 | bacon | 12.0 |
| 3 | pastrami | 6.0 |
| 4 | corned beef | 7.5 |
| 5 | bacon | 8.0 |
| 6 | pastrami | 3.0 |
| 7 | honey ham | 5.0 |
| 8 | nova lox | 6.0 |


Let's say you want to create a new column called ```animal``` whose value depends on ```food```. For instance, if ```food``` is 'bacon', then ```animal``` maps to 'pig'; if ```food``` is 'pastrami', then ```animal``` maps to 'cow'; and so forth. There are two ways to do this: define the mapping as a Python dictionary, or write a user-defined function:

In [33]:
# Python dictionary: maps each food to its animal
meat_to_animal = {
    'bacon': 'pig',
    'pulled pork': 'pig',
    'pastrami': 'cow',
    'corned beef': 'cow',
    'honey ham': 'pig',
    'nova lox': 'salmon'
}

# Python function: receives a single value from the food column
def meat2animal(food):
    if food == 'bacon':
        return 'pig'
    elif food == 'pulled pork':
        return 'pig'
    elif food == 'pastrami':
        return 'cow'
    elif food == 'corned beef':
        return 'cow'
    elif food == 'honey ham':
        return 'pig'
    else:
        return 'salmon'

Using the Python dictionary above with the pandas ```map()``` method, we can do the following (any food that is not a key in the dictionary would map to ```NaN```):

In [34]:
data['animal'] = data['food'].map(meat_to_animal)
data

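|   | food | ounces | animal |
|---|------|--------|--------|
| 0 | bacon | 4.0 | pig |
| 1 | pulled pork | 3.0 | pig |
| 2 | bacon | 12.0 | pig |
| 3 | pastrami | 6.0 | cow |
| 4 | corned beef | 7.5 | cow |
| 5 | bacon | 8.0 | pig |
| 6 | pastrami | 3.0 | cow |
| 7 | honey ham | 5.0 | pig |
| 8 | nova lox | 6.0 | salmon |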


Or we can apply our function ```meat2animal()``` to each value:

In [35]:
data['animal'] = data['food'].apply(meat2animal)
data

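|   | food | ounces | animal |
|---|------|--------|--------|
| 0 | bacon | 4.0 | pig |
| 1 | pulled pork | 3.0 | pig |
| 2 | bacon | 12.0 | pig |
| 3 | pastrami | 6.0 | cow |
| 4 | corned beef | 7.5 | cow |
| 5 | bacon | 8.0 | pig |
| 6 | pastrami | 3.0 | cow |
| 7 | honey ham | 5.0 | pig |
| 8 | nova lox | 6.0 | salmon |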


This is equivalent to SQL's ```CASE``` statement:

```
SELECT
    *,
    
    CASE
        WHEN food = 'bacon' THEN 'pig'
        WHEN food = 'pulled pork' THEN 'pig'
        WHEN food = 'pastrami' THEN 'cow'
        WHEN food = 'corned beef' THEN 'cow'
        WHEN food = 'honey ham' THEN 'pig'
        WHEN food = 'nova lox' THEN 'salmon'
    ELSE
        '???'
    END AS animal
    
FROM
    data
```
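For larger frames, a vectorized alternative to ```map()``` or a per-value function is NumPy's ```np.select()```, which mirrors the ```WHEN ... THEN ... ELSE``` structure directly: it takes the choice for the first condition that is true and falls back to ```default``` otherwise. A sketch that groups the foods with ```isin()``` for brevity:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'pastrami', 'corned beef',
                              'bacon', 'pastrami', 'honey ham', 'nova lox'],
                     'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# Each condition plays the role of one WHEN clause, checked in order
conditions = [
    data['food'].isin(['bacon', 'pulled pork', 'honey ham']),
    data['food'].isin(['pastrami', 'corned beef']),
    data['food'] == 'nova lox',
]
choices = ['pig', 'cow', 'salmon']

# default plays the role of the ELSE branch
data['animal'] = np.select(conditions, choices, default='???')
print(data)
```

If you prefer to stay with the dictionary approach, ```data['food'].map(meat_to_animal).fillna('???')``` gives the same ```ELSE``` behavior, since unmapped foods become ```NaN``` before ```fillna()``` replaces them.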

But what if you wanted to create an animal column based on values from two columns instead of one?

In that case, we write a function that takes an entire DataFrame row and applies the IF/ELSE logic to it:

In [36]:
def use2columns(row):
    if row['animal'] == 'pig' and row['ounces'] > 4:
        return 'Big Pig'
    elif row['animal'] == 'pig' and row['ounces'] <= 4:
        return 'Little Pig'
    else:
        return 'Other Animal'

Then, instead of the pandas ```map()``` method, we use the DataFrame ```apply()``` method with ```axis='columns'```, so that the function receives one row at a time:

In [37]:
data['animal2'] = data.apply(use2columns, axis='columns')
data

|   | food | ounces | animal | animal2 |
|---|------|--------|--------|---------|
| 0 | bacon | 4.0 | pig | Little Pig |
| 1 | pulled pork | 3.0 | pig | Little Pig |
| 2 | bacon | 12.0 | pig | Big Pig |
| 3 | pastrami | 6.0 | cow | Other Animal |
| 4 | corned beef | 7.5 | cow | Other Animal |
| 5 | bacon | 8.0 | pig | Big Pig |
| 6 | pastrami | 3.0 | cow | Other Animal |
| 7 | honey ham | 5.0 | pig | Big Pig |
| 8 | nova lox | 6.0 | salmon | Other Animal |


is equivalent to:

```
SELECT
    *,
    
    CASE
        WHEN animal = 'pig' and ounces > 4 THEN 'Big Pig'
        WHEN animal = 'pig' and ounces <= 4 THEN 'Little Pig'
    ELSE
        'Other Animal'
    END AS animal2
    
FROM
    data
```
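The same two-column logic can also be written without a row-wise Python function, by building boolean masks and passing them to ```np.select()```; this usually scales better than ```apply()``` with ```axis='columns'```. A sketch that assumes the ```animal``` column already exists as above (the frame is rebuilt here so the block runs on its own):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'animal': ['pig', 'pig', 'pig', 'cow', 'cow', 'pig', 'cow', 'pig', 'salmon'],
                     'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

is_pig = data['animal'] == 'pig'

# Parentheses are required: & binds more tightly than > and <=
conditions = [is_pig & (data['ounces'] > 4),
              is_pig & (data['ounces'] <= 4)]
choices = ['Big Pig', 'Little Pig']

data['animal2'] = np.select(conditions, choices, default='Other Animal')
print(data)
```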

[[back to top]](#top)