# Dropping Single-Value Columns

Sometimes as you boil your data down, you end up with columns that either contain no data at all (or very little) or contain the same value in every row.  Columns like this are all noise and no signal, and I generally like to get rid of them.

Dropping empty (or nearly empty) columns is easy: use `dropna()` with `axis=1` (so it acts on columns rather than rows) and appropriate values for `how` and `thresh`.
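Here is a minimal sketch of the `dropna()` approach. It uses a small made-up frame; the column names and the threshold of 2 are arbitrary choices for illustration. Note the `axis=1`: without it, `dropna()` drops rows instead of columns.

```python
import numpy as np
import pandas as pd

# Illustrative frame: 'empty' is all NaN, 'sparse' has a single value.
df_gaps = pd.DataFrame({
    "full": [1.0, 2.0, 3.0],
    "sparse": [np.nan, 5.0, np.nan],
    "empty": [np.nan, np.nan, np.nan],
})

# how="all": drop a column only if every value in it is NaN.
no_empty = df_gaps.dropna(axis=1, how="all")

# thresh=2: keep only columns with at least 2 non-NaN values.
mostly_full = df_gaps.dropna(axis=1, thresh=2)

print(list(no_empty.columns))     # ['full', 'sparse']
print(list(mostly_full.columns))  # ['full']
```

The two calls are kept separate on purpose. Recent pandas versions refuse to accept `how` and `thresh` together.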

But what about columns that contain the same non-NaN value everywhere, such as columns A, C, and E in the example below?

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({ 'A' : 12.,
                    'B' : list(range(4)),
                    'C' : 3,
                    'D' : ["Trumpet","Elephant","Elephant","Trumpet"],
                    'E' : 'Bad Eggs' })
df
```

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 12.0 | 0 | 3 | Trumpet | Bad Eggs |
| 1 | 12.0 | 1 | 3 | Elephant | Bad Eggs |
| 2 | 12.0 | 2 | 3 | Elephant | Bad Eggs |
| 3 | 12.0 | 3 | 3 | Trumpet | Bad Eggs |


Here is one way to do it. I found this on a forum, and I wish I could find it again so I could credit the original source, though it may simply be a common maneuver. The explanation given here, however, is my own.

I will go through it step by step.  Obviously it could be done more concisely.
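For comparison, here is one possible condensed version. It is a sketch that folds the steps below into a single expression. The name `trimmed` is made up for illustration, and this version returns a new dataframe instead of modifying `df` in place.

```python
import pandas as pd

# The same example dataframe used in this section.
df = pd.DataFrame({'A': 12.,
                   'B': list(range(4)),
                   'C': 3,
                   'D': ["Trumpet", "Elephant", "Elephant", "Trumpet"],
                   'E': 'Bad Eggs'})

# Count unique values per column, select the column labels whose count is 1,
# and drop those columns, all in one expression.
trimmed = df.drop(df.columns[df.apply(pd.Series.nunique, axis=0) == 1], axis=1)

print(list(trimmed.columns))  # ['B', 'D']
```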

First, if we use the dataframe's `apply()` method with `axis=0`, the function we specify is called once for each column, receiving that column as a series. The result is a series in which each element corresponds to a column of the dataframe and which, conveniently, is indexed by column name.
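To see what `apply()` actually hands to the function, here is a quick sketch on a cut-down version of the example frame. With `axis=0`, each call receives one column, so `col.name` is that column's label. With `axis=1`, each call receives one row instead. The lambdas are only for demonstration.

```python
import pandas as pd

df = pd.DataFrame({'A': 12., 'B': list(range(4)), 'C': 3})

# axis=0: one call per column; each call gets that column as a series.
seen = df.apply(lambda col: col.name, axis=0)

# axis=1: one call per row; here each row is summed across the columns.
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(seen.tolist())      # ['A', 'B', 'C']
print(row_sums.tolist())  # [15.0, 16.0, 17.0, 18.0]
```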

Below we use this technique with `pd.Series.nunique` and get a series showing the number of unique values in each column. The elements with a value of 1 correspond to the single-value columns we want to drop.

```python
ser_unique = df.apply(pd.Series.nunique, axis=0)
ser_unique
```

```
A    1
B    4
C    1
D    2
E    1
dtype: int64
```
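Two related details are sketched below on a small made-up frame. First, a dataframe also has its own `nunique()` method, which gives the same per-column counts without going through `apply()`. Second, `nunique()` ignores NaN by default. So a column holding one value plus some NaNs also counts as single-valued, and would be dropped by the approach here. Pass `dropna=False` if NaN should count as a distinct value.

```python
import numpy as np
import pandas as pd

# Column 'X' holds a single value plus a NaN; 'Y' varies.
df_nan = pd.DataFrame({'X': [7.0, 7.0, np.nan], 'Y': [1, 2, 3]})

# Same counts as df_nan.apply(pd.Series.nunique, axis=0); NaN is ignored.
counts_default = df_nan.nunique()

# Count NaN as a value of its own.
counts_with_nan = df_nan.nunique(dropna=False)

print(counts_default.to_dict())   # {'X': 1, 'Y': 3}
print(counts_with_nan.to_dict())  # {'X': 2, 'Y': 3}
```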

Now, if we can extract from this series a list of the names of the columns that have only one unique value, we can pass that list to the `drop()` method and accomplish our goal.

Well, it turns out that by using a combination of boolean indexing and the `index` property of a series, we can obtain just such a list:

```python
del_list = ser_unique[ser_unique == 1].index
del_list
```

```
Index(['A', 'C', 'E'], dtype='object')
```
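Breaking that one-liner into its intermediate steps may make the boolean indexing clearer. This sketch rebuilds `ser_unique` by hand from the output shown above, so it runs on its own. The names `mask` and `single_valued` are just illustrative.

```python
import pandas as pd

# Hand-built copy of the ser_unique output above.
ser_unique = pd.Series({'A': 1, 'B': 4, 'C': 1, 'D': 2, 'E': 1})

# Step 1: the comparison yields a boolean series with the same index.
mask = ser_unique == 1

# Step 2: boolean indexing keeps only the entries where mask is True.
single_valued = ser_unique[mask]

# Step 3: the index of what remains holds the column names.
del_list = single_valued.index

print(mask.tolist())   # [True, False, True, False, True]
print(list(del_list))  # ['A', 'C', 'E']
```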

Technically, what we have above is an Index object and not a list, but `drop()` accepts any list-like of labels, so it works just as well. If you want a plain list instead, wrap the expression above in `list(...)`; the result is the same either way.
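A quick check of that claim, on a cut-down version of the example frame: dropping with an `Index` and dropping with the equivalent plain list give identical results. The `labels` Index is constructed by hand for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': 12., 'B': list(range(4)), 'C': 3})
labels = pd.Index(['A', 'C'])

# drop() accepts any list-like of labels: an Index or a plain list.
from_index = df.drop(labels, axis=1)
from_list = df.drop(list(labels), axis=1)

print(from_index.equals(from_list))  # True
print(list(from_list.columns))       # ['B']
```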

And then all that remains is to use the list (or index) to drop the columns:

```python
df.drop(del_list, axis=1, inplace=True)
df.head()
```

|   | B | D |
|---|---|---|
| 0 | 0 | Trumpet |
| 1 | 1 | Elephant |
| 2 | 2 | Elephant |
| 3 | 3 | Trumpet |
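With `inplace=True`, `drop()` modifies `df` itself and returns `None`. If you would rather keep the original intact, leave out `inplace` and assign the result to a new name. The sketch below does that and uses the `columns=` keyword of `drop()` (available since pandas 0.21) in place of `axis=1`. Here `del_list` is rebuilt by hand, and `trimmed` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'A': 12.,
                   'B': list(range(4)),
                   'C': 3,
                   'D': ["Trumpet", "Elephant", "Elephant", "Trumpet"],
                   'E': 'Bad Eggs'})
del_list = pd.Index(['A', 'C', 'E'])

# Without inplace=True, drop() returns a new dataframe and leaves df unchanged.
trimmed = df.drop(columns=del_list)

print(list(trimmed.columns))  # ['B', 'D']
print(list(df.columns))       # ['A', 'B', 'C', 'D', 'E']
```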
