This guide covers essential pandas syntax across data inspection, basic manipulation, aggregation, joins/unions, datetime operations, and null handling.
Getting Started
To get started with this project, follow these steps:
1. Clone the repository:
   git clone https://github.com/MarkPhamm/pandas-tutorial.git
   cd pandas-tutorial
2. Create a virtual environment:
python -m venv venv
3. Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS and Linux:
source venv/bin/activate
4. Install the required packages:
pip install -r requirements.txt
df.head()
df.tail()
df.columns
df['col']
# or
df.col
df.shape
df.info()
df.describe()
df.duplicated().sum()
df['col'].value_counts()
if df.empty:
    df = pd.DataFrame({'SecondHighestSalary': [None]})
df[['col1', 'col2']]
df[df['col'] > 100]
df[df['col'].between(10, 50)]
df.rename(columns={'old_name': 'new_name'})
df.sort_values(by='col', ascending=False)
df.loc[5]
df.iloc[5]
df = df.astype({'col_name': 'desired_dtype'})
# or
df['col_name'] = df['col_name'].astype('desired_dtype')
df[~(df['col'] == 'target')]  # parentheses required: ~ binds tighter than ==
df.groupby('group_col')['value_col'].sum().reset_index()
df.groupby('group_col')['value_col'].count().reset_index()
df.groupby('group_col')['value_col'].nunique().reset_index()
df.groupby('group_col').agg({
    'col1': 'sum',
    'col2': 'mean',
    'col3': 'nunique'
}).reset_index()
df1.merge(df2, left_on='key1', right_on='key2', how='inner')
df1.merge(df2, on='key', how='left')
pd.concat([df1, df2], ignore_index=True)
pd.concat([df1, df2], ignore_index=True).drop_duplicates()
pd.Timestamp('2025-04-12')
pd.Timedelta(days=30)
df['date_col'] = pd.to_datetime(df['date_col'])
df['day'] = df['date_col'].dt.day
df['month'] = df['date_col'].dt.to_period('M')
(df['end_time'] - df['start_time']).dt.total_seconds()
df['col'].isnull()
df[df['col'].isnull()]
df.drop_duplicates()
df.drop_duplicates(subset=['col1', 'col2'])
df.drop_duplicates(keep='last')
df[df.duplicated() == False]
df[df.duplicated()]
df.duplicated()
df = pd.read_csv('data.csv')
df = pd.read_csv('data.csv', index_col=0)
df.to_csv('output.csv', index=False)
df = pd.read_csv('data.csv', usecols=['col1', 'col2'])
df = pd.read_csv('data.csv', nrows=100)
df = pd.read_csv('data.csv', skiprows=1)
df = pd.read_csv('data.csv', na_values=['NA', 'null', 'NaN'])