# From SQL to Pandas Cheatsheet


## `SELECT`

SQL: 

```sql
-- select all columns
SELECT * FROM table;
-- select specific columns
SELECT Name, Age FROM table;
```

Pandas:

```python
df
df[["Name", "Age"]]
```

## `LIMIT`

SQL: 

```sql
-- select first 10 rows
SELECT * FROM table LIMIT 10;
```

Pandas:

```python
df.head(10)
```

## `DISTINCT`

SQL: 

```sql
-- select all distinct countries
SELECT DISTINCT Country FROM table;
```

Pandas:

```python
df["Country"].unique()
```

## `WHERE`

SQL: 

```sql
-- select all rows where age is greater than 30
SELECT * FROM table WHERE Age > 30;
-- select all rows where country is USA
SELECT * FROM table WHERE Country = 'USA';
```

Pandas:

```python
df[df["Age"] > 30]
df[df["Country"] == "USA"]
```

## `IN`

SQL: 

```sql
-- select all rows where country is USA or Canada
SELECT * FROM table WHERE Country IN ('USA', 'Canada');
```

Pandas:

```python
df[df["Country"].isin(["USA", "Canada"])]
```

## `NOT IN`

SQL: 

```sql
-- select all rows where country is neither USA nor Canada
SELECT * FROM table WHERE Country NOT IN ('USA', 'Canada');
```

Pandas:

```python
df[~df["Country"].isin(["USA", "Canada"])]
```

## `LIKE`

SQL: 

```sql
-- select all rows where name contains John
SELECT * FROM table WHERE Name LIKE '%John%';
```

Pandas:

```python
df[df["Name"].str.contains("John")]
```

Like `LIKE`, `str.contains` accepts symbols with special meaning. Pandas uses regular expressions (regex) for this, and `str.contains` treats its pattern as a regex by default.

Here's a short list of symbols to start with:

- `^` matches the start of the string. (`'^A'` is equivalent to SQL `'A%'`)
- `$` matches the end of the string. (`'z$'` is equivalent to SQL `'%z'`)
- `.` is a wildcard symbol that represents a single character, similar to SQL `_`.

```python
(
    stores
    # Find cities that have 'e' as the second letter.
    .loc[stores['city'].str.contains('^.e', regex=True)]
)
```

If you need to match one of these symbols literally, put a backslash (`\`) in front of it to "escape" it, so it loses its special meaning. Writing the pattern as a raw string (`r'...'`) stops Python from interpreting the backslash itself.

```py
.str.contains('Mr.', regex=True)
```
finds both 'Mr.' and 'Mrs', because the `.` acts as a wildcard.

```py
.str.contains(r'Mr\.', regex=True)
```
finds only 'Mr.', because the escaped `.` no longer acts as a wildcard.
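
The difference can be checked on a small example. This is a minimal sketch: the `names` Series and its values are made up for illustration. It also shows `regex=False`, which makes `str.contains` treat the whole pattern as plain text.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
names = pd.Series(["Mr. Smith", "Mrs Jones", "Dr. Brown"])

# Unescaped '.' is a wildcard, so both 'Mr.' and 'Mrs' match.
wildcard = names[names.str.contains("Mr.", regex=True)]

# Escaped '.' matches only a literal dot.
escaped = names[names.str.contains(r"Mr\.", regex=True)]

# regex=False treats the whole pattern as plain text.
literal = names[names.str.contains("Mr.", regex=False)]

print(wildcard.tolist())
print(escaped.tolist())
print(literal.tolist())
```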

## `BETWEEN`

SQL: 

```sql
-- select all rows where age is between 18 and 30 (inclusive)
SELECT * FROM table WHERE Age BETWEEN 18 AND 30;
```

Pandas:

```python
df[(df["Age"] >= 18) & (df["Age"] <= 30)]
```
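
Pandas also offers `Series.between`, which includes both endpoints by default, just as SQL `BETWEEN` does. A minimal sketch with made-up data (the `df` below is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cy", "Di"], "Age": [17, 18, 30, 31]})

# Explicit comparison, as in the snippet above.
explicit = df[(df["Age"] >= 18) & (df["Age"] <= 30)]

# Same filter using Series.between (both bounds inclusive by default).
shorthand = df[df["Age"].between(18, 30)]

print(shorthand)
```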

## `AND`

SQL: 

```sql
-- select all rows where country is USA and age is greater than 30
SELECT * FROM table WHERE Country = 'USA' AND Age > 30;
```

Pandas:

```python
df[(df["Country"] == "USA") & (df["Age"] > 30)]
```

## `OR`

SQL: 

```sql
-- select all rows where country is USA or Canada
SELECT * FROM table WHERE Country = 'USA' OR Country = 'Canada';
```

Pandas:

```python
df[(df["Country"] == "USA") | (df["Country"] == "Canada")]
```

## `ORDER BY`

SQL: 

```sql
-- order by age ascending
SELECT * FROM table ORDER BY Age ASC;
-- order by age descending
SELECT * FROM table ORDER BY Age DESC;
```

Pandas:

```python
df.sort_values(by="Age")
df.sort_values(by="Age", ascending=False)
```

## `GROUP BY`

SQL: 

```sql
-- select country and average age, grouped by country
SELECT Country, AVG(Age) FROM table GROUP BY Country;
-- select country, city and average age, grouped by country and city
SELECT Country, City, AVG(Age) FROM table GROUP BY Country, City;
```

Pandas:

```python
df.groupby("Country").agg({"Age": "mean"})
df.groupby(["Country", "City"]).agg({"Age": "mean"})
```
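
By default, `groupby` moves the grouping keys into the index of the result. To get them back as ordinary columns, closer to the shape of the SQL result, one option is `as_index=False`. A minimal sketch with made-up data (the `df` below is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "Country": ["USA", "USA", "Canada"],
    "City": ["Boston", "Boston", "Toronto"],
    "Age": [30, 40, 50],
})

# Keep Country as a regular column instead of the index.
by_country = df.groupby("Country", as_index=False).agg({"Age": "mean"})

# The same idea with two grouping keys.
by_city = df.groupby(["Country", "City"], as_index=False).agg({"Age": "mean"})

print(by_country)
print(by_city)
```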

## `AS`

SQL: 

```sql
-- count all rows and name the result column Total
SELECT COUNT(*) AS Total FROM table;
```

Pandas:

```python
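# rename() relabels an existing column; "COUNT(*)" stands in for whatever the result column is called
df.rename(columns={"COUNT(*)": "Total"})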
```
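
A pandas DataFrame has no `COUNT(*)` column unless you create one, so in practice the alias is usually given when the result is built. The sketch below shows two ways to do that, using made-up data. The `df` and the `AvgAge` label are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "Country": ["USA", "USA", "Canada"],
    "Age": [30, 40, 50],
})

# SELECT COUNT(*) AS Total FROM table;
total = pd.DataFrame({"Total": [len(df)]})

# SELECT Country, AVG(Age) AS AvgAge FROM table GROUP BY Country;
# Named aggregation: new_column=(source_column, function).
avg_age = df.groupby("Country", as_index=False).agg(AvgAge=("Age", "mean"))

print(total)
print(avg_age)
```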

## `JOIN`

SQL: 

```sql
-- select all columns from both tables, where id is the same
SELECT t1.*, t2.* FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id;
```

Pandas:

```python
pd.merge(table1, table2, on="id", how="inner")
```
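
A minimal sketch of the inner join on two made-up tables (the data is an assumption for illustration). Only the ids present in both tables survive:

```python
import pandas as pd

# Hypothetical sample tables, for illustration only.
table1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
table2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Oslo", "Rome", "Lima"]})

# Keep only rows whose id appears in both tables.
joined = pd.merge(table1, table2, on="id", how="inner")

print(joined)
```

Unlike `SELECT t1.*, t2.*`, which returns the `id` column twice, `pd.merge` with `on="id"` keeps a single `id` column.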

# Some math operations

In SQL, you can use the `SUM()`, `AVG()`, `MIN()`, `MAX()` and `COUNT()` functions to perform aggregations.

```sql
SELECT SUM(qty), AVG(qty), MIN(qty), MAX(qty), COUNT(qty)
FROM sales;
```

In pandas, you can use the `DataFrame.sum()`, `DataFrame.mean()`, `DataFrame.min()`, `DataFrame.max()`, and `DataFrame.count()` methods to perform aggregations, or pass their names to `DataFrame.agg()` to compute several at once.

```python
(
    sales
    [['qty']]  # select the quantity column
    .agg(['sum', 'mean', 'min', 'max', 'count'])  # aggregate the data
)
```
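
Run on a small made-up `sales` table (the data is an assumption for illustration), the result is a DataFrame with one row per aggregation:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
sales = pd.DataFrame({"qty": [1, 2, 3, 4]})

# One row per aggregation name, one column per selected column.
summary = sales[["qty"]].agg(["sum", "mean", "min", "max", "count"])

print(summary)

# Individual values can be looked up by aggregation name.
print(summary.loc["mean", "qty"])
```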
