## Working with pandas dataframes and columns
Let's learn some basics of working with pandas dataframes. 

### Read in the dataset
Let's read in the Iris dataset from a URL, just like we did in the lecture: 

In [1]:
import pandas as pd

# url to get file from
url = "http://mlr.cs.umass.edu/ml/machine-learning-databases/iris/iris.data"

# read the file into a dataframe
iris = pd.read_csv(url, header=None)

### Add column names
Column names are stored in a pandas `Index` object (it behaves much like a list) and can be accessed with the following syntax: 

`df.columns`

In [None]:
iris.columns

Because we passed `header=None`, pandas labeled the columns with the integers 0 through 4 instead of names. Let's create a list of column names and apply it to our dataframe: 

In [None]:
iris.columns = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Class']

In [None]:
iris.head()
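As an alternative sketch, the names can be supplied while reading the file, using the `names` argument of `read_csv`. The example below reads a three-row inline sample instead of the URL so it runs offline; the variable names `raw`, `unnamed`, and `named` are only for illustration.

```python
import io
import pandas as pd

# Three rows in the same comma-separated layout as iris.data (a small inline sample).
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# With header=None and no names, pandas labels the columns with integers.
unnamed = pd.read_csv(io.StringIO(raw), header=None)
print(unnamed.columns.tolist())

# header=None says the file has no header row; names= supplies the labels.
names = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Class"]
named = pd.read_csv(io.StringIO(raw), header=None, names=names)
print(named.head())
```

This gives the same result as assigning to `columns` afterwards, just in one step.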

### Rename Columns
Actually... it's a terrible idea to have spaces in our column names. That makes it very hard to work with columns downstream (for example, a name with a space can't be used with the `df.column_name` shortcut). Let's rename our columns. 

One option is to assign a new list of names to the columns attribute: 

In [None]:
iris.columns = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Class']

In [None]:
iris.head()

But what if we have hundreds or thousands of columns? That's pretty tedious. Also, how do we know our columns will be in the same order as our list? 

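A safer way to rename columns is the pandas `rename()` method with a dictionary that maps old names to new names. Each column is matched by name, so the order doesn't matter. Keep in mind that, by default, `rename()` silently ignores keys that don't match an existing column, so the cell below only changes anything if the columns still have their original names with spaces: 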

In [None]:
iris = iris.rename(columns={
    "Sepal Length": "SepalLength",
    "Sepal Width": "SepalWidth",
    "Petal Length": "PetalLength",
    "Petal Width": "PetalWidth",
    "Class": "Class",
})

In [None]:
iris.head()
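For the "hundreds or thousands of columns" case, one possible approach is to pass a function to `rename()` instead of a dictionary. The sketch below uses a tiny made-up dataframe called `wide`; it also shows `errors="raise"`, which makes `rename()` complain about a misspelled old name instead of ignoring it.

```python
import pandas as pd

# A one-row stand-in for a dataframe with many spaced column names.
wide = pd.DataFrame(
    [[5.1, 3.5, "Iris-setosa"]],
    columns=["Sepal Length", "Sepal Width", "Class"],
)

# Passing a function applies it to every column label.
renamed = wide.rename(columns=lambda name: name.replace(" ", ""))
print(renamed.columns.tolist())

# With errors="raise", a key that is not an existing column raises a KeyError.
try:
    wide.rename(columns={"Sepal Lenght": "SepalLength"}, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```

`rename()` returns a new dataframe, so `wide` itself keeps its original names unless you reassign it.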

### Viewing a column
We can pull out just a single column from a dataframe using the following syntax:
`df["column_name"]`

In [None]:
iris["SepalLength"].head()

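You can also select a column as an attribute, as long as its name is a valid Python identifier and doesn't clash with an existing dataframe attribute or method: 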

In [None]:
iris.SepalLength.head()
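Here is a small sketch of where the attribute shortcut stops working. The dataframe `df` and its `count` column are made up for illustration: one name contains a space, the other collides with the built-in `DataFrame.count` method.

```python
import pandas as pd

df = pd.DataFrame({"Sepal Length": [5.1, 4.9], "count": [1, 2]})

# Bracket syntax works for any label, including ones with spaces.
print(df["Sepal Length"].tolist())

# df.count is the DataFrame.count method, not the "count" column.
print(callable(df.count))

# The bracket syntax still reaches the column.
print(df["count"].tolist())
```

When in doubt, the bracket syntax is the safer choice.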

### Viewing Multiple Columns
We can use a similar syntax to view multiple dataframe columns; we just pass a list instead of a single column name: 

In [None]:
iris[ ["SepalLength", "SepalWidth"]  ].head()
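One detail worth seeing in code: a single column name returns a Series, while a list of names returns a dataframe, even if the list has only one entry. A minimal sketch on a two-row made-up dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    "SepalLength": [5.1, 4.9],
    "SepalWidth": [3.5, 3.0],
    "Class": ["Iris-setosa", "Iris-setosa"],
})

one = df["SepalLength"]                   # a Series
one_as_frame = df[["SepalLength"]]        # a dataframe with one column
two = df[["SepalLength", "SepalWidth"]]   # a dataframe with two columns

print(type(one).__name__, type(one_as_frame).__name__, two.shape)
```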

### loc and iloc
You can use loc and iloc to select rows and columns at the same time. iloc is especially handy when you don't know the column name, or if you want to grab a row by position.

- iloc = select by integer position
- loc = select by label or boolean/conditional 

The syntax is: 

`df.iloc[<row selection>, <column selection>]`

`df.loc[<row selection>, <column selection>]`

In [None]:
iris.iloc[0] # first row 

In [None]:
iris.iloc[1:5] # second to fifth rows

In [None]:
iris.iloc[-1] # last row 

In [None]:
iris.iloc[:,0].head() # first column

In [None]:
iris.iloc[:,-1].head() # last column

In [None]:
# first five rows and third and fourth columns
iris.iloc[0:5, 2:4].head() 
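The cells above all use iloc, so here is a sketch of the loc side: selecting by label and by a boolean condition. It uses a three-row made-up dataframe called `flowers` with the default integer row labels.

```python
import pandas as pd

flowers = pd.DataFrame({
    "SepalLength": [5.1, 7.0, 6.3],
    "PetalLength": [1.4, 4.7, 6.0],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Label-based: rows labeled 0 through 1 (loc slices include the end label).
print(flowers.loc[0:1, "SepalLength"])

# Boolean condition for the rows, list of labels for the columns.
long_petals = flowers.loc[flowers["PetalLength"] > 4.0, ["PetalLength", "Class"]]
print(long_petals)

# Position-based for comparison: iloc slices exclude the end position.
print(flowers.iloc[0:1])
```

Note the difference in slicing: `loc[0:1]` returns two rows here, while `iloc[0:1]` returns one.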

### Setting an Index
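Setting an index on a dataframe can make it much easier to work with downstream. The index is the main point of reference in your dataset: the values of the column you choose become your row labels, and by default that column is removed from the regular columns. Index labels don't have to be unique, so several rows can share the same label, as they do here with Class. 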

In [None]:
iris = iris.set_index("Class")

In [None]:
iris.head()
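To show what the index buys you, the sketch below looks rows up by label with loc and then undoes the change with `reset_index()`. The dataframe `flowers` is a three-row made-up stand-in.

```python
import pandas as pd

flowers = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor"],
})

by_class = flowers.set_index("Class")

# A label that appears more than once returns every matching row.
setosa = by_class.loc["Iris-setosa"]
print(setosa)

# The index is allowed to contain duplicates.
print(by_class.index.is_unique)

# reset_index() moves the index back into a regular column.
restored = by_class.reset_index()
print(restored.columns.tolist())
```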

### Add a new column
Adding a new column to an existing dataframe is easy: 

In [None]:
iris["fake_column"] = "testing"

In [None]:
iris.head()

### Delete a column
Let's get rid of that fake column using the drop() method. We need to add an argument `axis=1` to let pandas know we want to drop a column. If we wanted to drop a row by its label, we would use `axis=0`, which is the default. 

In [None]:
iris = iris.drop("fake_column", axis=1)

In [None]:
iris.head()
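If the axis numbers are hard to remember, `drop()` also accepts `columns=` and `index=` keywords that do the same thing. A minimal sketch on a made-up two-row dataframe with row labels "a" and "b":

```python
import pandas as pd

df = pd.DataFrame(
    {"SepalLength": [5.1, 4.9], "fake_column": ["testing", "testing"]},
    index=["a", "b"],
)

# Same as df.drop("fake_column", axis=1)
no_fake = df.drop(columns="fake_column")
print(no_fake.columns.tolist())

# Same as df.drop("a", axis=0)
no_row = df.drop(index="a")
print(no_row.index.tolist())
```

Like `rename()`, `drop()` returns a new dataframe and leaves the original untouched unless you reassign it.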

### Adding new rows
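If you want to add a new row to a pandas dataframe, you can use loc to assign a list of values to a row label that doesn't exist yet. The list needs one value per column; since Class is now the index, that's four values: 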

In [None]:
iris.loc["Iris-mimosa"] = [42,42,42,42]

In [None]:
iris
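For adding several rows at once, one common option is `pd.concat`. The sketch below repeats the loc approach on a small made-up dataframe and then concatenates a second dataframe, `extra`, whose labels and values are invented for illustration.

```python
import pandas as pd

flowers = pd.DataFrame(
    {"SepalLength": [5.1, 7.0], "SepalWidth": [3.5, 3.2]},
    index=["Iris-setosa", "Iris-versicolor"],
)

# Assigning to a label that doesn't exist yet adds one row.
flowers.loc["Iris-mimosa"] = [42, 42]

# For several rows, build a second dataframe with the same columns and concatenate.
extra = pd.DataFrame(
    {"SepalLength": [1.0, 2.0], "SepalWidth": [1.5, 2.5]},
    index=["made-up-a", "made-up-b"],
)
combined = pd.concat([flowers, extra])
print(combined)
```

`pd.concat` returns a new dataframe, so `flowers` still has three rows after the call.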