# Selecting Columns

### Introduction

In the last lesson, we saw the different components of a pandas dataframe.  In this lesson, we'll go a little deeper by seeing how we can select data from a dataframe.  Specifically, we'll see how to select different columns from a dataframe.

### Selecting Columns from a dataframe

Now let's see how we can select specific columns of data from a dataframe.  A first step may just be seeing which columns are available.  We can do so with the `columns` attribute:

In [1]:
import pandas as pd
movies_df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
movies_df.columns

Index(['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period code', 'decade code'],
      dtype='object')
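
Because `columns` holds an `Index` of names, we can also turn it into a plain list or check whether a name is present.  Here is a small sketch of that idea.  It assumes a hand-built stand-in called `sample_df` (a hypothetical two-row frame, not the full dataset) so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical two-row stand-in for movies_df, with only a few of its columns.
sample_df = pd.DataFrame({
    'year': [2013, 2012],
    'imdb': ['tt1711425', 'tt1343727'],
    'decade code': [1.0, 1.0],
})

column_names = list(sample_df.columns)   # plain Python list of the column names
has_year = 'year' in sample_df.columns   # membership check against the Index

print(column_names)
print(has_year)
```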

From there, we can choose a specific column to look at.  Here we select the `budget_2013$` column and slice off its first two rows.

In [43]:
movies_df['budget_2013$'][:2]

0    13000000
1    45658735
Name: budget_2013$, dtype: int64

So we select a single column by placing the name of the column inside the bracket accessors.

Another way to select a column is with dot notation.

In [44]:
movies_df.imdb[:3]

0    tt1711425
1    tt1343727
2    tt2024544
Name: imdb, dtype: object

But this only works when the column name is a valid Python identifier, so it fails for names that contain a space (or a character like `$`).

In [45]:
movies_df.decade code

SyntaxError: invalid syntax (<ipython-input-45-7680dc6c1ab7>, line 1)
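
Bracket accessors do not have this limitation.  The sketch below uses a hypothetical stand-in dataframe, `sample_df`, to show brackets working with a space in the name.  It also adds an invented column called `count` (not in the movies data) to illustrate a second pitfall: when a column shares its name with a dataframe method, dot notation returns the method rather than the column.

```python
import pandas as pd

# Hypothetical stand-in; 'count' is an invented column used only to show a name clash.
sample_df = pd.DataFrame({
    'year': [2013, 2012],
    'decade code': [1.0, 1.0],
    'count': [5, 7],
})

decade = sample_df['decade code']   # brackets handle the space in the name
counts = sample_df['count']         # brackets return the column as a Series
count_attr = sample_df.count        # dot notation gives the DataFrame.count method instead

print(decade.tolist())
print(counts.tolist())
print(callable(count_attr))
```

So bracket accessors are the safer default, and dot notation is a convenience for simple names.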

If we wish to select more than one column, we can do so by passing a list of column names.

In [35]:
movies_df[['year', 'imdb']][:2]

|   | year | imdb |
|---|---|---|
| 0 | 2013 | tt1711425 |
| 1 | 2012 | tt1343727 |


So above we have a list of columns inside of our bracket accessors.  Sometimes it's nice to store the columns we wish to select as a separate variable.

In [36]:
cols = ['year', 'budget']
movies_df[cols][:2]

|   | year | budget |
|---|---|---|
| 0 | 2013 | 13000000 |
| 1 | 2012 | 45000000 |


Now one thing to note is that whenever we select with a list of columns, we get back a dataframe and not a series.

In [37]:
cols = ['year', 'budget']
type(movies_df[cols])

pandas.core.frame.DataFrame
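
This holds even when the list contains only one column name.  A minimal sketch, again assuming a small hypothetical `sample_df` in place of the full dataset:

```python
import pandas as pd

# Hypothetical stand-in with two of the columns from movies_df.
sample_df = pd.DataFrame({'year': [2013, 2012], 'budget': [13000000, 45000000]})

single = sample_df['year']            # a string inside the brackets -> Series
one_col_frame = sample_df[['year']]   # a one-item list inside the brackets -> DataFrame

print(type(single).__name__)
print(type(one_col_frame).__name__)
print(one_col_frame.shape)
```

The first `print` shows `Series`, the second shows `DataFrame`, and the shape of the one-column dataframe is `(2, 1)`.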

And finally, note that none of our operations modified the original dataframe.

In [40]:
movies_df[:2]

|   | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013$ | domgross_2013$ | intgross_2013$ | period code | decade code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | tt1711425 | 21 & Over | notalk | notalk | FAIL | 13000000 | 25682380.0 | 42195766.0 | 2013FAIL | 13000000 | 25682380.0 | 42195766.0 | 1.0 | 1.0 |
| 1 | 2012 | tt1343727 | Dredd 3D | ok-disagree | ok | PASS | 45000000 | 13414714.0 | 40868994.0 | 2012PASS | 45658735 | 13611086.0 | 41467257.0 | 1.0 | 1.0 |

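To keep a selection around, we assign it to a new variable.  The sketch below assumes a hypothetical three-column `sample_df` and shows that the selection is a separate dataframe, while the original keeps all of its columns.

```python
import pandas as pd

# Hypothetical stand-in for movies_df with three of its columns.
sample_df = pd.DataFrame({
    'year': [2013, 2012],
    'imdb': ['tt1711425', 'tt1343727'],
    'budget': [13000000, 45000000],
})

subset = sample_df[['year', 'budget']]   # a new dataframe holding two columns

print(list(subset.columns))
print(list(sample_df.columns))           # the original still has all three columns
```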

However, sometimes we may wish to modify the dataframe.  For example, we may want to change the names of our columns so that none of them have spaces.  A simple way to do this is to use the `columns` attribute to display the column names, copy and paste the list, and then just change the relevant ones.

In [46]:
movies_df.columns

Index(['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period code', 'decade code'],
      dtype='object')

In [47]:
updated_cols = ['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period_code', 'decade_code']

In [48]:
movies_df.columns = updated_cols

In [49]:
movies_df.columns

Index(['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period_code', 'decade_code'],
      dtype='object')
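
Copying and editing the list works well for a handful of columns.  As an alternative sketch (not the approach used above), pandas also offers `rename`, which takes a mapping of old names to new names, and string methods on the `Index` itself.  The example assumes a hypothetical stand-in dataframe, `sample_df`, that still has the original names with spaces.

```python
import pandas as pd

# Hypothetical stand-in that still has the original names with spaces.
sample_df = pd.DataFrame({
    'year': [2013, 2012],
    'period code': [1.0, 1.0],
    'decade code': [1.0, 1.0],
})

# Option 1: rename only the listed columns; this returns a new dataframe.
renamed_df = sample_df.rename(columns={'period code': 'period_code',
                                       'decade code': 'decade_code'})

# Option 2: replace every space in every column name with an underscore.
underscored = sample_df.columns.str.replace(' ', '_', regex=False)

print(list(renamed_df.columns))
print(list(underscored))
print(list(sample_df.columns))   # unchanged until we assign the new names back
```

Neither option changes `sample_df` by itself.  To keep the result, we would either use `renamed_df` or assign `underscored` back to `sample_df.columns`.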

### Summary

In this lesson, we saw that we can view the column names of a dataframe with the `.columns` attribute.  We then moved on to selecting a column of data, which we can do with either bracket accessors, like `movies_df['year']`, or dot notation, like `movies_df.year`.  And we can select multiple columns of data by placing a list of column names inside of our bracket accessors, like `movies_df[['year', 'imdb']]`.

Finally, we saw that we can change the column names of our dataframe by assigning a new list of names, as in `movies_df.columns = updated_cols`.