# seekwellpandas: Basic Usage

This notebook demonstrates the basic usage of the seekwellpandas library, which extends pandas with SQL-like functionality.

## Setup

First, let's import the necessary libraries and create some sample data.

In [1]:
import pandas as pd
import seekwellpandas  # imported for its side effect: adds the SQL-like methods to DataFrames

# Create sample data
people = pd.DataFrame({
    'id': range(1, 11),
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Moscow', 'Rome', 'Madrid', 'Toronto']
})
people

,id,name,age,city
0,1,Alice,25,New York
1,2,Bob,30,London
2,3,Charlie,35,Paris
3,4,David,28,Tokyo
4,5,Eve,22,Sydney
5,6,Frank,40,Berlin
6,7,Grace,33,Moscow
7,8,Henry,45,Rome
8,9,Ivy,27,Madrid
9,10,Jack,31,Toronto


## Basic Operations

### Select

The `select` method returns only the columns you name.


In [2]:
people.select('name', 'age')

,age,name
0,25,Alice
1,30,Bob
2,35,Charlie
3,28,David
4,22,Eve
5,40,Frank
6,33,Grace
7,45,Henry
8,27,Ivy
9,31,Jack
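
For comparison, the same projection in plain pandas is a list-based column lookup. This is a sketch of the pandas counterpart, not of how seekwellpandas implements `select`; the `people` frame here is a trimmed copy of the sample data.

```python
import pandas as pd

# Trimmed copy of the sample data from the Setup section.
people = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Paris'],
})

# Plain-pandas projection: index with a list of column names.
name_age = people[['name', 'age']]
print(list(name_age.columns))  # ['name', 'age']
```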


Negative selections are also supported: prefix a column name with `-` to exclude it and keep every other column.

In [3]:
people.select('-id')

,age,city,name
0,25,New York,Alice
1,30,London,Bob
2,35,Paris,Charlie
3,28,Tokyo,David
4,22,Sydney,Eve
5,40,Berlin,Frank
6,33,Moscow,Grace
7,45,Rome,Henry
8,27,Madrid,Ivy
9,31,Toronto,Jack
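
One way such selectors could be resolved into a column list is sketched below. The helper `resolve_columns` is hypothetical and written only for illustration; it is not taken from seekwellpandas and may differ from the library's behavior, for example in column ordering.

```python
def resolve_columns(all_columns, *selectors):
    """Hypothetical helper: turn selectors such as 'name' or '-id' into a column list."""
    excluded = [s[1:] for s in selectors if s.startswith('-')]
    included = [s for s in selectors if not s.startswith('-')]
    # With no positive selectors, start from every column.
    base = included if included else list(all_columns)
    return [c for c in base if c not in excluded]


columns = ['id', 'name', 'age', 'city']
print(resolve_columns(columns, '-id'))          # ['name', 'age', 'city']
print(resolve_columns(columns, 'name', 'age'))  # ['name', 'age']
```

In plain pandas, the same exclusion is usually written as `people.drop(columns=['id'])`.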


### Where

Use `where_` to filter rows with a condition written as a string. The trailing underscore distinguishes it from pandas' built-in `DataFrame.where`.

In [4]:
people.where_('age > 30')

,id,name,age,city
2,3,Charlie,35,Paris
5,6,Frank,40,Berlin
6,7,Grace,33,Moscow
7,8,Henry,45,Rome
9,10,Jack,31,Toronto
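
The string condition reads much like the argument to pandas' own `DataFrame.query`. The sketch below uses `query` on the same ages to reproduce the filter; it assumes nothing about how `where_` is implemented.

```python
import pandas as pd

# Same ids and ages as the sample data.
people = pd.DataFrame({
    'id': range(1, 11),
    'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
})

# Keep rows whose age is strictly greater than 30; the original index is preserved.
over_30 = people.query('age > 30')
print(over_30['id'].tolist())  # [3, 6, 7, 8, 10]
```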


### Group By

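Group rows by a column with `group_by`, then aggregate each group. Here the aggregate is the mean age per city; because every city appears once in the sample data, each mean equals that person's age.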

In [5]:
people.group_by('city').agg({'age': 'mean'})

city,age
Berlin,40.0
London,30.0
Madrid,27.0
Moscow,33.0
New York,25.0
Paris,35.0
Rome,45.0
Sydney,22.0
Tokyo,28.0
Toronto,31.0
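
To see the aggregation combine several rows, the sketch below uses made-up data (not from this notebook) in which one city repeats, with the plain-pandas `groupby` as a stand-in for `group_by`.

```python
import pandas as pd

# Made-up data, for illustration only: 'Paris' appears twice.
visitors = pd.DataFrame({
    'city': ['Paris', 'Paris', 'Rome'],
    'age': [30, 40, 45],
})

# Group by city and average the ages within each group.
mean_age = visitors.groupby('city').agg({'age': 'mean'})
print(mean_age.loc['Paris', 'age'])  # 35.0
print(mean_age.loc['Rome', 'age'])   # 45.0
```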


### Order By

Sort the DataFrame by one or more columns with `order_by`. In the recorded run below, the call with `ascending=False` raised a `NameError`.

In [6]:
people.order_by('age', ascending=False)

NameError: name 'columns' is not defined
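
Because the recorded call ends in an error, the intended descending sort is sketched here with plain pandas `sort_values`. This shows the expected result only; it is not a fix for the library and says nothing about the cause of the error.

```python
import pandas as pd

# Names and ages from the sample data.
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve',
             'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
})

# Sort from oldest to youngest.
oldest_first = people.sort_values('age', ascending=False)
print(oldest_first['name'].head(3).tolist())  # ['Henry', 'Frank', 'Charlie']
```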

### Limit

Use `limit` to keep only the first n rows. Combined with `order_by('age')`, this returns the five youngest people.

In [6]:
people.order_by('age').limit(5)

,id,name,age,city
4,5,Eve,22,Sydney
0,1,Alice,25,New York
8,9,Ivy,27,Madrid
3,4,David,28,Tokyo
1,2,Bob,30,London
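
In plain pandas the same sort-then-truncate pattern is `sort_values` followed by `head`. The sketch below reproduces the five rows above under that assumption.

```python
import pandas as pd

# Names and ages from the sample data.
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve',
             'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
})

# Sort ascending by age, then keep the first five rows.
youngest_five = people.sort_values('age').head(5)
print(youngest_five['name'].tolist())  # ['Eve', 'Alice', 'Ivy', 'David', 'Bob']
```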


## Advanced Operations

### Join

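Use `join_` to combine two DataFrames on a shared column. The output below contains only the five people whose city appears in `countries`, which is consistent with an inner join.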

In [7]:
countries = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'country': ['USA', 'UK', 'France', 'Japan', 'Australia']
})

people.join_(countries, on='city')

,id,name,age,city,country
0,1,Alice,25,New York,USA
1,2,Bob,30,London,UK
2,3,Charlie,35,Paris,France
3,4,David,28,Tokyo,Japan
4,5,Eve,22,Sydney,Australia
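
The difference between keeping only matched rows and keeping all left-hand rows can be seen with plain pandas `merge`. This is a sketch on a trimmed copy of the data; whether `join_` accepts a comparable `how` option is not shown in this notebook.

```python
import pandas as pd

# Trimmed copies of the two frames; 'Berlin' has no match in countries.
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Frank'],
    'city': ['New York', 'London', 'Berlin'],
})
countries = pd.DataFrame({
    'city': ['New York', 'London'],
    'country': ['USA', 'UK'],
})

# Inner join drops unmatched rows; left join keeps them with a missing country.
inner = people.merge(countries, on='city', how='inner')
left = people.merge(countries, on='city', how='left')
print(len(inner), len(left))  # 2 3
```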


### Union

Use `union` to stack the rows of two DataFrames that share the same columns.

In [8]:
other_people = pd.DataFrame({
    'id': range(11, 16),
    'name': ['Karen', 'Leo', 'Mike', 'Nina', 'Oscar'],
    'age': [29, 36, 41, 24, 38],
    'city': ['Chicago', 'Dublin', 'Amsterdam', 'Oslo', 'Stockholm']
})
all_people = people.union(other_people)
all_people

,id,name,age,city
0,1,Alice,25,New York
1,2,Bob,30,London
2,3,Charlie,35,Paris
3,4,David,28,Tokyo
4,5,Eve,22,Sydney
5,6,Frank,40,Berlin
6,7,Grace,33,Moscow
7,8,Henry,45,Rome
8,9,Ivy,27,Madrid
9,10,Jack,31,Toronto
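
In SQL, `UNION ALL` keeps duplicate rows while `UNION` removes them. This notebook does not state which of the two `union` follows, so the sketch below shows both variants in plain pandas on made-up frames that share one row.

```python
import pandas as pd

# Made-up frames, for illustration only: the row (2, 'Bob') appears in both.
first = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})
second = pd.DataFrame({'id': [2, 3], 'name': ['Bob', 'Karen']})

# UNION ALL: stack the rows and renumber the index.
union_all = pd.concat([first, second], ignore_index=True)

# UNION: stack, then drop exact duplicate rows.
union_distinct = union_all.drop_duplicates().reset_index(drop=True)

print(len(union_all), len(union_distinct))  # 4 3
```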


### With Column

Use `with_column` to add a new column computed from a string expression. Here `age // 10 * 10` rounds each age down to its decade.

In [9]:
all_people.with_column('age_group', 'age // 10 * 10')

,id,name,age,city,age_group
0,1,Alice,25,New York,20
1,2,Bob,30,London,30
2,3,Charlie,35,Paris,30
3,4,David,28,Tokyo,20
4,5,Eve,22,Sydney,20
5,6,Frank,40,Berlin,40
6,7,Grace,33,Moscow,30
7,8,Henry,45,Rome,40
8,9,Ivy,27,Madrid,20
9,10,Jack,31,Toronto,30
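
The arithmetic is floor division by 10 followed by multiplication by 10. A minimal sketch of the same computation with plain pandas `assign`, on three of the sample ages:

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
})

# 25 // 10 * 10 = 20, 30 // 10 * 10 = 30, 35 // 10 * 10 = 30
with_group = people.assign(age_group=people['age'] // 10 * 10)
print(with_group['age_group'].tolist())  # [20, 30, 30]
```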


### Rename Column

Use `rename_column` to rename an existing column, passing the old name followed by the new one.

In [10]:
all_people.rename_column('city', 'location')

,id,name,age,location,age_group
0,1,Alice,25,New York,20
1,2,Bob,30,London,30
2,3,Charlie,35,Paris,30
3,4,David,28,Tokyo,20
4,5,Eve,22,Sydney,20
5,6,Frank,40,Berlin,40
6,7,Grace,33,Moscow,30
7,8,Henry,45,Rome,40
8,9,Ivy,27,Madrid,20
9,10,Jack,31,Toronto,30
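
The plain-pandas counterpart is `rename` with a `columns` mapping, sketched below. Unlike what the output above suggests for `all_people`, pandas `rename` returns a new frame and leaves the original unchanged.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'city': ['New York', 'London'],
})

# Map the old column name to the new one; the original frame is untouched.
renamed = people.rename(columns={'city': 'location'})
print(list(renamed.columns))  # ['name', 'location']
print(list(people.columns))   # ['name', 'city']
```

The output above includes `age_group` even though the earlier `with_column` result was never assigned to a variable, which suggests that `with_column` modified `all_people` in place in that run.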
