# Axis Specification in Pandas and Numpy
The integer axis specification (axis=0 or axis=1) is the same in Pandas and Numpy. Pandas also accepts the names 'index' and 'columns'; Numpy accepts only integers.

However, this terminology differs from that used in matrix algebra and databases.

This notebook presents examples of how to specify the axis.  There are two distinct cases:
* applying row or column operations
* adding or removing rows or columns

In [1]:
import pandas as pd
print("Pandas Version {}".format(pd.__version__))
import numpy as np
print("Numpy Version {}".format(np.__version__))

Pandas Version 0.25.3
Numpy Version 1.17.4


## Definition of "Row" and "Column"
As in an Excel spreadsheet or a database table:
* columns are the attributes and run vertically
* rows are the records (or observations) and run horizontally.

## Column and Row Operations

### Definition of "Column Operation"
A "column operation" operates on a column of data.  For example, the column operation of "sum" will sum each column, resulting in one sum per column.

In Pandas, this is specified as axis=0 or axis='index'. In Numpy, it is specified as axis=0 (the name 'index' is Pandas only).

### Definition of "Row Operation"
A "row operation" operates on a row of data.  For example, the row operation of "sum" will sum each row, resulting in one sum per row.

In Pandas, this is specified as axis=1 or axis='columns'. In Numpy, it is specified as axis=1 (the name 'columns' is Pandas only).

### Pandas and Numpy Terminology
In Pandas and Numpy, a column operation is said to take place "across the rows": the values in each column are combined by moving down through the rows. Hence the axis specification of 0, which Pandas also calls 'index'.

Whether you call it a "column operation" or "summing across the rows", the axis specification is axis=0 or, in Pandas, axis='index'.

In Pandas and Numpy, a row operation is said to take place "across the columns": the values in each row are combined by moving through the columns. Hence the axis specification of 1, which Pandas also calls 'columns'.

Whether you call it a "row operation" or "summing across the columns", the axis specification is axis=1 or, in Pandas, axis='columns'.
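As a quick check of the equivalence described above, the following sketch compares the integer and named forms in Pandas. The frame `demo` and the variable names are illustrative and are not part of the notebook's own cells.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame with 3 rows and 4 columns.
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    columns=["col1", "col2", "col3", "col4"],
                    index=["row1", "row2", "row3"])

# Column operation: one sum per column, integer and named forms.
col_sums_int = demo.sum(axis=0)
col_sums_name = demo.sum(axis="index")

# Row operation: one sum per row, integer and named forms.
row_sums_int = demo.sum(axis=1)
row_sums_name = demo.sum(axis="columns")

# Each pair should hold identical values and labels.
print(col_sums_int.equals(col_sums_name))
print(row_sums_int.equals(row_sums_name))
```

The column sums are labelled by column name and the row sums by row label, which is a handy way to confirm which direction an operation ran.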

In [2]:
# create dataframe
df1 = pd.DataFrame(np.arange(12).reshape((3,4)),
                  columns="col1 col2 col3 col4".split(),
                  index="row1 row2 row3".split())
df1

      col1  col2  col3  col4
row1     0     1     2     3
row2     4     5     6     7
row3     8     9    10    11


In [3]:
# sum each of the 4 columns
df1.sum(axis=0)

col1    12
col2    15
col3    18
col4    21
dtype: int64

In [4]:
# sum each of the 3 rows
df1.sum(axis=1)

row1     6
row2    22
row3    38
dtype: int64
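The cells above use a Pandas DataFrame. For comparison, here is a minimal sketch of the same two sums on a plain Numpy array holding the same values; the name `arr` is illustrative. Only the integer axis values are used, since Numpy arrays have no row or column labels.

```python
import numpy as np

# Same values as the DataFrame above, without labels: 3 rows, 4 columns.
arr = np.arange(12).reshape((3, 4))

# axis=0 collapses the rows, leaving one sum per column (4 values).
col_sums = arr.sum(axis=0)

# axis=1 collapses the columns, leaving one sum per row (3 values).
row_sums = arr.sum(axis=1)

print(col_sums.shape, col_sums.tolist())
print(row_sums.shape, row_sums.tolist())
```

A useful way to remember this: the axis named in the call is the one that disappears from the result's shape.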

## Adding or Removing Columns and Rows
To delete or add columns, specify axis=1 or axis='columns'.  
To delete or add rows, specify axis=0 or axis='index'.

For dropping columns or rows, there is an alternative syntax that uses the columns= or index= keyword instead of axis, shown below.

In [5]:
df1.drop(['col1', 'col3'], axis=1)

      col2  col4
row1     1     3
row2     5     7
row3     9    11


In [6]:
df1.drop(columns=['col1', 'col3'])

      col2  col4
row1     1     3
row2     5     7
row3     9    11


In [7]:
df1.drop(['row2'], axis=0)

      col1  col2  col3  col4
row1     0     1     2     3
row3     8     9    10    11


In [8]:
df1.drop(index=['row2'])

      col1  col2  col3  col4
row1     0     1     2     3
row3     8     9    10    11


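For adding rows or columns with pd.concat, the axis specification determines the direction. The default is axis=0 (add rows), so the axis must be given explicitly when adding columns.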

In [9]:
# create a new df with one row
df2 = pd.DataFrame(data=[[12, 13, 14, 15]], index=['row4'], columns=['col1', 'col2', 'col3', 'col4'])
df2

      col1  col2  col3  col4
row4    12    13    14    15


In [10]:
# add the row to the dataframe, specify axis=0 or axis='index'
pd.concat([df1, df2], axis=0)

      col1  col2  col3  col4
row1     0     1     2     3
row2     4     5     6     7
row3     8     9    10    11
row4    12    13    14    15


In [11]:
# create a new dataframe with one column
df3 = pd.DataFrame(data=[[33], [77], [1111]], columns=['col5'], index=['row1', 'row2', 'row3'])
df3

      col5
row1    33
row2    77
row3  1111


In [12]:
# add the column to the dataframe
pd.concat([df1, df3], axis=1)

      col1  col2  col3  col4  col5
row1     0     1     2     3    33
row2     4     5     6     7    77
row3     8     9    10    11  1111
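The drop and concat cells in this section pass integer axes. The sketch below repeats them with the named forms axis='index' and axis='columns' and prints the resulting shapes. The frames `base`, `extra_row` and `extra_col` are illustrative stand-ins rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Illustrative frames: a 3x4 base, one extra row, one extra column.
base = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    columns=["col1", "col2", "col3", "col4"],
                    index=["row1", "row2", "row3"])
extra_row = pd.DataFrame([[12, 13, 14, 15]],
                         index=["row4"], columns=base.columns)
extra_col = pd.DataFrame({"col5": [33, 77, 1111]}, index=base.index)

# Removing: 'columns' drops columns, 'index' drops rows.
fewer_cols = base.drop(["col1", "col3"], axis="columns")
fewer_rows = base.drop(["row2"], axis="index")

# Adding: 'index' appends rows, 'columns' appends columns.
more_rows = pd.concat([base, extra_row], axis="index")
more_cols = pd.concat([base, extra_col], axis="columns")

# Shapes are (rows, columns).
print(fewer_cols.shape, fewer_rows.shape)
print(more_rows.shape, more_cols.shape)
```

Neither `drop` nor `pd.concat` modifies `base` in these calls; each returns a new DataFrame.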


## Summary

**Column Operation:** axis=0 or axis='index'  
**Row Operation:** axis=1 or axis='columns'  

Example of a column operation: `df.sum(axis=0)` returns one sum per column.

**Add/Remove Column:** axis=1 or axis='columns'  
**Add/Remove Row:** axis=0 or axis='index'  

Example of adding columns: `pd.concat([df1, df3], axis=1)` returns a DataFrame with the columns of df1 followed by the columns of df3.