# Working with columns

### Introduction

In the last lesson, we learned about loading data into a pandas dataframe and then selecting rows.  But to train our machine learning models, working with our columns will be even more important.  Think about it: we'll have to train our model on our feature data and target data, with something like the following:


```python
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor()
model.fit(X, y)

```
To do so, we'll select a single column from our dataframe to be our target data `y`, and we'll select certain columns from our dataframe to be our feature data `X`.  Let's see how we can work with columns in a dataframe.
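As a rough preview of where this is heading, here is a minimal sketch of picking feature and target data out of a dataframe. The dataframe `toy_df`, its values, and the choice of `year` and `budget` as features are assumptions made for illustration; they are not the real dataset.

```python
import pandas as pd

# Toy stand-in for the movies dataframe; the values are illustrative only.
toy_df = pd.DataFrame({
    'year': [2013, 2012, 2013],
    'budget': [13000000, 45000000, 20000000],
    'domgross': [25682380.0, 13414714.0, 53107035.0],
})

# Feature data: a list of column names gives back a smaller dataframe.
X = toy_df[['year', 'budget']]

# Target data: a single column name gives back one column (a Series).
y = toy_df['domgross']

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```

The rest of this lesson walks through both kinds of selection one step at a time.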

### Exploring Columns

Let's get started by loading up our data once again.

In [4]:
import pandas as pd
url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv'
df = pd.read_csv(url)
df[:1]

|   | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013$ | domgross_2013$ | intgross_2013$ | period code | decade code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | tt1711425 | 21 & Over | notalk | notalk | FAIL | 13000000 | 25682380.0 | 42195766.0 | 2013FAIL | 13000000 | 25682380.0 | 42195766.0 | 1.0 | 1.0 |


Now we can see a list of all of the columns in our dataframe, with the following.

In [6]:
df.columns

Index(['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period code', 'decade code'],
      dtype='object')
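The result of `df.columns` is a pandas `Index` object rather than a plain list. The sketch below, which uses a small hand-made dataframe (`toy_df` and its values are assumptions for illustration), shows a couple of common things we might do with it.

```python
import pandas as pd

# Toy dataframe with a few of the same column names; values are illustrative.
toy_df = pd.DataFrame({
    'year': [2013, 2012],
    'title': ['21 & Over', 'Dredd 3D'],
    'decade code': [1.0, 1.0],
})

# tolist() turns the Index into a plain Python list of column names.
column_names = toy_df.columns.tolist()
print(column_names)  # ['year', 'title', 'decade code']

# Membership checks work directly on the Index.
print('title' in toy_df.columns)   # True
print('rating' in toy_df.columns)  # False

# len() gives the number of columns.
print(len(toy_df.columns))  # 3
```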

### Selecting a single column

And from there, we can see a specific column, by using our bracket accessors.

In [7]:
df['year']

0       2013
1       2012
2       2013
3       2013
4       2013
        ... 
1789    1971
1790    1971
1791    1971
1792    1971
1793    1970
Name: year, Length: 1794, dtype: int64

Now another way to select a specific column is with the dot notation.

In [8]:
df.year

0       2013
1       2012
2       2013
3       2013
4       2013
        ... 
1789    1971
1790    1971
1791    1971
1792    1971
1793    1970
Name: year, Length: 1794, dtype: int64
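To check that the two styles really return the same thing, we can compare them directly. This is a small sketch on a hand-made dataframe (`toy_df` is an assumed stand-in for the first three rows of our data).

```python
import pandas as pd

# Toy stand-in for the first three rows of the movies dataframe.
toy_df = pd.DataFrame({
    'year': [2013, 2012, 2013],
    'title': ['21 & Over', 'Dredd 3D', '12 Years a Slave'],
})

by_brackets = toy_df['year']  # bracket accessors
by_dot = toy_df.year          # dot notation

# Both give back a pandas Series holding the same values.
print(type(by_brackets).__name__)  # Series
print(by_brackets.equals(by_dot))  # True
```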

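However, the dot notation cannot be used with some column names.  A name that contains a space, like `decade code`, is not a valid Python attribute name, so we get a syntax error.

In [9]:
df.decade code

SyntaxError: invalid syntax (<ipython-input-9-c600b1b3a395>, line 1)

For columns like this, we need to use the bracket accessors.

Let's assign the column `domgross` to the variable `y`.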

In [17]:
y = df['domgross']
y

0       25682380.0
1       13414714.0
2       53107035.0
3       75612460.0
4       95020213.0
           ...    
1789    70327868.0
1790    10324441.0
1791    41158757.0
1792     4000000.0
1793     9000000.0
Name: domgross, Length: 1794, dtype: float64
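Coming back to column names like `decade code`: the bracket accessors take the name as an ordinary string, so spaces and characters like `$` are not a problem. Here is a minimal sketch, using a hand-made dataframe (`toy_df` and its values are assumptions for illustration).

```python
import pandas as pd

# Toy dataframe whose column names are not valid Python attribute names.
toy_df = pd.DataFrame({
    'decade code': [1.0, 1.0],
    'budget_2013$': [13000000, 45000000],
})

# The column name is just a string, so any characters are allowed.
decade = toy_df['decade code']
budget_2013 = toy_df['budget_2013$']

print(decade.name)       # decade code
print(budget_2013.name)  # budget_2013$
```

Because the bracket accessors work for every column name, they are a safe default when we are unsure.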

### Selecting multiple columns

Now let's move onto selecting multiple columns.  

In [10]:
df.columns

Index(['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period code', 'decade code'],
      dtype='object')

Let's select `year` and `title` as our columns.

In [12]:
columns = ['year', 'title']
selected_df = df[columns]
selected_df[:3]

|   | year | title |
|---|---|---|
| 0 | 2013 | 21 & Over |
| 1 | 2012 | Dredd 3D |
| 2 | 2013 | 12 Years a Slave |


So we just greatly reduced the number of columns, and assigned this smaller dataframe to `selected_df`.  Let's go over how we did this.

We used the following format:

* dataframe, bracket accessors, list of columns 

```python
df[['col_1', 'col_2']]
```
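The inner brackets matter: a string inside the accessors gives back a single column, while a list gives back a dataframe, even when the list holds only one name. The sketch below illustrates this on a small hand-made dataframe (`toy_df` is an assumed stand-in for our data).

```python
import pandas as pd

# Toy stand-in for the first three rows of the movies dataframe.
toy_df = pd.DataFrame({
    'year': [2013, 2012, 2013],
    'title': ['21 & Over', 'Dredd 3D', '12 Years a Slave'],
})

one_column = toy_df['year']              # a string gives a Series
one_column_df = toy_df[['year']]         # a list of one name gives a DataFrame
two_columns = toy_df[['title', 'year']]  # columns come back in the order listed

print(type(one_column).__name__)     # Series
print(type(one_column_df).__name__)  # DataFrame
print(list(two_columns.columns))     # ['title', 'year']
```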

It can be hard to keep track of all of those brackets, so it is nice to first assign the list of columns to a variable.

In [14]:
columns = ['year', 'title']
selected_df = df[columns]

selected_df[:3]

|   | year | title |
|---|---|---|
| 0 | 2013 | 21 & Over |
| 1 | 2012 | Dredd 3D |
| 2 | 2013 | 12 Years a Slave |


### Summary

In this lesson, we learned how to select columns from our pandas dataframe.  We can start by seeing all of the columns with the `columns` attribute.

In [23]:
df.columns

Index(['year', 'imdb', 'title', 'test', 'clean_test', 'binary', 'budget',
       'domgross', 'intgross', 'code', 'budget_2013$', 'domgross_2013$',
       'intgross_2013$', 'period code', 'decade code'],
      dtype='object')

We can select a single column by using either the bracket accessors or the dot notation, and then assign that column to a variable.  The dot notation does not work for column names that contain spaces.

In [18]:
year = df['year']

In [19]:
year = df.year

We can select multiple columns by still using the bracket accessors, and passing in a list of the columns that we would like to select.

In [22]:
cols = ['year', 'title']
selected = df[cols]
selected[:3]

|   | year | title |
|---|---|---|
| 0 | 2013 | 21 & Over |
| 1 | 2012 | Dredd 3D |
| 2 | 2013 | 12 Years a Slave |
