# Accessing Data within Pandas

## Introduction
In this lesson we're going to dig into various methods for accessing data from our Pandas Series and DataFrames.

## Objectives

You will be able to:
* Understand and explain some key Pandas methods
* Access DataFrame data by using the label
* Perform boolean indexing on both Series and DataFrames
* Use simple selectors for Series
* Set new values in Series and DataFrames

## Importing pandas and the data

First, let's make sure we import `pandas` as `pd`.

In [None]:
import pandas as pd

To show how to access data with Pandas, let's use the "wine" data set in the scikit-learn library (you might have heard about this library before; you'll use it extensively when we get to machine learning!). Don't worry about the code below: it simply makes sure you have access to the wine data set.

The data contained in the wine data set are the results of a chemical analysis of wines grown in Italy. It contains the quantities of 13 wine constituents. 

In [None]:
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

Great! Our data set is now stored in the variable `df`. As you know, you can look at its elements by using `df` or `print(df)`.

In [None]:
print(df)

Now what if you want to see only a few lines of the data, based on certain constraints? You'll learn how to access data in this lesson!

## Methods and attributes to access data information

It won't be a surprise that our `df` object is a pandas DataFrame object. Let's verify this using the `type()` function.

In [None]:
type(df)

There are some methods and attributes associated with pandas objects (both DataFrames *and* Series!) which make retrieving information from the data particularly easy. Some commonly used methods:
- `.head()`
- `.tail()`
- `.info()`

And attributes:
- `.index`
- `.columns`
- `.dtypes`
- `.shape`

### Some methods: `.head()`, `.tail()` and `.info()`

By using `.head()` and `.tail()`, you can select the first or the last $n$ rows of your DataFrame, respectively. The default $n$ is 5, but you can change this value inside the parentheses. For example:

In [None]:
# First 5 rows of df
df.head()

In [None]:
# last 3 rows of df
df.tail(3)
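As a minimal sketch of the difference between the two methods, the example below uses a small hand-made DataFrame (`toy` and its values are hypothetical, not part of the wine data set):

```python
import pandas as pd

# Hypothetical toy data, not the wine data set
toy = pd.DataFrame({"alcohol": [14.2, 13.2, 13.1, 12.4, 11.8, 12.0]})

first_two = toy.head(2)  # rows at positions 0 and 1
last_two = toy.tail(2)   # rows at positions 4 and 5

print(first_two)
print(last_two)
```

Both methods return a new DataFrame and keep the original row labels.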

To get a concise summary of the DataFrame, you can use `.info()`.

In [None]:
df.info()

### Some attributes

Using `.index` you can access the index or row labels of the DataFrame.

In [None]:
df.index

Using `.columns`, you can access the column labels of the DataFrame.

In [None]:
df.columns

Using `.dtypes` returns the dtypes in the DataFrame (compare with `.info()`!).

In [None]:
df.dtypes

`.shape` returns a tuple representing the dimensionality of the DataFrame, in the form `(rows, columns)`.

In [None]:
df.shape
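To see what these four attributes hold without scrolling through the full wine data, here is a small sketch on a hand-made DataFrame (`toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with two columns and three rows
toy = pd.DataFrame(
    {"alcohol": [14.2, 13.2, 13.1], "magnesium": [127, 100, 101]}
)

print(toy.index)    # the row labels (a default integer range here)
print(toy.columns)  # the two column labels
print(toy.dtypes)   # one dtype per column
print(toy.shape)    # (3, 2): three rows, two columns
```

Note that these are attributes, not methods, so they are written without parentheses.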

## Selecting dataframe information

In the previous section, we deliberately omitted 2 very important attributes:
- `.iloc`, which is a pandas dataframe indexer used for integer-location based indexing / selection by position.
- `.loc`, which has 2 use cases:
    - Selecting by label / index
    - Selecting with a boolean / conditional lookup


### `.iloc`

You can use `.iloc` to select single rows. To select the 4th row, you can use `.iloc[3]` like:

In [None]:
df.iloc[3]

You can use a colon to select several rows. Note that you'll use a structure `.iloc[a:b]` where the row at position `a` is included in the selection and the row at position `b` is excluded.

In [None]:
df.iloc[5:8]

Next, you can use `,` to perform *column* selections based on their position as well. The command below selects all rows of the columns at positions 3 through 6 (position 7 is excluded):

In [None]:
df.iloc[:,3:7]

Last but not least, you can perform column and row selections at once:

In [None]:
df.iloc[5:10,3:9]
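The position-based rules above can be checked on a small hand-made DataFrame. This is only a sketch; `toy` and its values are hypothetical:

```python
import pandas as pd

# Hypothetical toy data with four rows and three columns
toy = pd.DataFrame(
    {
        "alcohol": [14.2, 13.2, 13.1, 12.4],
        "ash": [2.4, 2.1, 2.7, 2.5],
        "magnesium": [127, 100, 101, 113],
    }
)

row = toy.iloc[1]           # single row at position 1, returned as a Series
rows = toy.iloc[1:3]        # positions 1 and 2; position 3 is excluded
block = toy.iloc[1:3, 0:2]  # same rows, columns at positions 0 and 1

print(block)
```

Because `.iloc` only looks at positions, it behaves the same way regardless of what the row or column labels are.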

### `.loc`

#### a) `.loc` label-based indexing

You can use `.loc` to select data based on row labels and column names. Examples:

In [None]:
df.loc[:,"magnesium"]

An alternative method here is simply calling `df["magnesium"]`!

In [None]:
df.loc[7:16,"magnesium"]
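One detail worth noting: unlike `.iloc`, a label slice with `.loc` includes its end label, so with the default integer row labels `df.loc[7:16, "magnesium"]` returns ten rows (labels 7 through 16). A minimal sketch on hypothetical toy data shows the difference:

```python
import pandas as pd

# Hypothetical toy data with the default integer row labels 0..4
toy = pd.DataFrame({"magnesium": [127, 100, 101, 113, 118]})

by_label = toy.loc[1:3, "magnesium"]  # labels 1, 2 and 3: end label included
by_position = toy.iloc[1:3, 0]        # positions 1 and 2: end position excluded

print(len(by_label), len(by_position))
```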

#### b) boolean indexing using `.loc`

Sometimes you'd like to select certain rows in your data set based on the value for a certain variable. Imagine you'd like to create a new dataframe that only contains the wines with an alcohol percentage below 12. This can be done as follows:

In [None]:
df.loc[df["alcohol"]<12]

You can verify that `df[df["alcohol"]<12]` returns the same result!

However, the `.loc` attribute is useful if you only want the color intensity for the wines with an alcohol percentage below 12. You can obtain the result as follows:

In [None]:
df.loc[df["alcohol"]<12, ["color_intensity"]]
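The column selector in the cell above is a list, `["color_intensity"]`, which makes the result a one-column DataFrame; passing the bare label would give a Series instead. The sketch below illustrates this on hypothetical toy data (`toy`, `low_alcohol` and the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {"alcohol": [14.2, 11.8, 13.1, 11.5], "color_intensity": [5.6, 2.9, 7.8, 3.1]}
)

low_alcohol = toy["alcohol"] < 12  # boolean Series: False, True, False, True

as_frame = toy.loc[low_alcohol, ["color_intensity"]]  # list of labels -> DataFrame
as_series = toy.loc[low_alcohol, "color_intensity"]   # single label -> Series

print(type(as_frame).__name__, type(as_series).__name__)
```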

## Selectors for series

Until now we've only really discussed pandas DataFrames. Most of these methods and selectors also apply to pandas Series. Selecting a single column of a DataFrame by its label returns a pandas Series:

In [None]:
# Let's save the color intensity column into an object `col_intensity`
col_intensity = df["color_intensity"]

In [None]:
type(col_intensity)

Note how `col_intensity` is now a pandas *Series*.

Many of the commands discussed before are readily applicable to series:

In [None]:
col_intensity[0:3]

In [None]:
col_intensity[col_intensity > 8] # or col_intensity.loc[col_intensity>8]
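As a small sketch of the same selectors on a stand-alone Series (the name `intensity` and its values are hypothetical), using `.iloc` spells out that the slice is positional:

```python
import pandas as pd

# Hypothetical toy Series
intensity = pd.Series([5.6, 9.2, 7.8, 10.5], name="color_intensity")

first_two = intensity.iloc[0:2]          # positions 0 and 1
high = intensity[intensity > 8]          # boolean mask keeps values above 8
high_loc = intensity.loc[intensity > 8]  # same selection via .loc

print(high)
```

The filtered Series keeps the original labels of the rows that passed the condition.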

## Changing and setting values in DataFrames and series

### Changing values

Imagine that for some reason, you're not interested in the color intensity values for color intensities above 10, and simply want to set all color intensities to 10 when they are bigger than 10. You can use a selector method and then assign it a new value, just like this:

In [None]:
df.loc[df["color_intensity"]>10, "color_intensity"] = 10
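To check what this kind of assignment does, the sketch below applies it to hypothetical toy data and compares the result with `Series.clip(upper=10)`, an alternative way to cap values that is not used in this lesson:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"color_intensity": [5.6, 11.2, 7.8, 13.0]})
expected = toy["color_intensity"].clip(upper=10)  # capped copy, for comparison

# Overwrite only the cells where the condition holds
toy.loc[toy["color_intensity"] > 10, "color_intensity"] = 10

print(toy["color_intensity"].tolist())
```

Note that the `.loc` assignment modifies `toy` in place, whereas `.clip()` returns a new Series.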

### Creating new columns

Now imagine that we want to create a new column named "shade" which has the value "light" when the color_intensity is 7 or below, and "dark" when the intensity is above 7. This can be done as follows:

In [None]:
df.loc[df["color_intensity"]>7, "shade"] = "dark"
df.loc[df["color_intensity"]<=7, "shade"] = "light"
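The two-step assignment can be traced on hypothetical toy data. In this sketch, `missing_after_first` is a helper name introduced only to show that the rows not matched by the first condition start out as missing values:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"color_intensity": [5.6, 7.0, 7.8, 9.1]})

# The first assignment creates the column; rows that do not match are left missing
toy.loc[toy["color_intensity"] > 7, "shade"] = "dark"
missing_after_first = int(toy["shade"].isna().sum())

# The complementary condition fills the remaining rows
toy.loc[toy["color_intensity"] <= 7, "shade"] = "light"

print(toy)
```

Because the two conditions are complementary, every row ends up with a value.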

Have another look at `df`. `shade` is added as a 14th column! 

## Summary

We've introduced a range of techniques for accessing information in Pandas Series and DataFrames, selecting rows and columns, changing values, and creating new columns! Now, it's time for some practice! Let's start working on a lab where you will get a chance to combine some of these methods!