# Manipulating DataFrames with pandas
## CHAPTER 1: Extracting and transforming data
### iloc and loc in pandas

![loc and iloc](Pandas-selections-and-indexing.png)

#### iloc
`iloc` selects by integer position. It is not usually used unless we want to access the first and last row, as `df.iloc[0]` and `df.iloc[-1]`.

_Remarks:_

1. Note that `.iloc` returns a pandas Series when a single row or a single column is selected with an integer, and a pandas DataFrame when multiple rows or columns are selected. To counter this, pass a single-valued list if you require DataFrame output.
2. When using `.loc` or `.iloc`, you can control the output format by passing lists or single values to the selectors.
3. When selecting multiple columns or rows with an `.iloc` slice, e.g. `[1:5]`, the selection runs from the first number to one less than the second number: `[1:5]` selects positions 1, 2, 3, 4, and in general `[x:y]` runs from x to y-1.
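
The remarks above can be checked on a small example. This is a minimal sketch on made-up data (the DataFrame `df` and its columns are assumptions, not part of the course data):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the remarks above.
df = pd.DataFrame(
    {"city": ["Lima", "Oslo", "Kyiv", "Doha"], "age": [31, 45, 27, 52]},
    index=["a", "b", "c", "d"],
)

row_as_series = df.iloc[0]      # single integer -> Series
row_as_frame = df.iloc[[0]]     # single-valued list -> DataFrame
col_as_series = df.iloc[:, 1]   # single column position -> Series
col_as_frame = df.iloc[:, [1]]  # single-valued list -> DataFrame

# Positional slices exclude the end: 1:3 selects positions 1 and 2 only.
middle_rows = df.iloc[1:3]

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(list(middle_rows.index))
```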

#### loc
`loc` selects by label and is more popular than `iloc` when extracting data.

##### Label-based / Index-based indexing using .loc
Examples: 

In [None]:
# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
data.loc[['Andrade', 'Veness'], 'city':'email']
# Select every row from 'Andrade' through 'Veness' (label slices include both ends),
# with just the 'first_name', 'address' and 'city' columns
data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]

# Change the index to be based on the 'id' column
data.set_index('id', inplace=True)
# select the row with 'id' = 487
data.loc[487]
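
One detail in the examples above is easy to miss: a label slice in `.loc` includes both endpoints, while a list of labels picks exactly the listed rows. A small runnable sketch, assuming a made-up DataFrame `people` that stands in for `data`:

```python
import pandas as pd

# Hypothetical stand-in for `data`, indexed by last name.
people = pd.DataFrame(
    {
        "first_name": ["France", "Ana", "Ben", "Tyisha"],
        "city": ["Leeds", "York", "Bath", "Hull"],
        "email": ["f@example.com", "a@example.com", "b@example.com", "t@example.com"],
    },
    index=["Andrade", "Brown", "Clark", "Veness"],
)

# A list of labels picks exactly those rows; 'city':'email' is an inclusive column slice.
two_rows = people.loc[["Andrade", "Veness"], "city":"email"]

# A label slice picks every row from 'Andrade' through 'Veness', endpoints included.
all_between = people.loc["Andrade":"Veness", ["first_name", "city"]]

# A single label returns that row as a Series.
one_row = people.loc["Clark"]

print(two_rows)
print(len(all_between))
```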

##### Boolean / Logical indexing using .loc
Note that `.loc` returns either a Series or a DataFrame, depending on the selectors passed:
![.loc output examples](loc_indexer_returns_series_or_dataframe.png)

Examples:

In [None]:
# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
 
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]   
 
# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]
       
# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]
 
# select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']] 
 
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)] 
 
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company_name']]
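
The two output modes of `.loc` with a boolean mask can be sketched on a toy DataFrame. The data below is made up, and the name `toy` is an assumption used only for this illustration:

```python
import pandas as pd

# Hypothetical toy data; the column names mirror those used above.
toy = pd.DataFrame(
    {
        "first_name": ["Antonio", "Eric", "Antonio"],
        "email": ["antonio@gmail.com", "eric@hotmail.com", "tony@hotmail.com"],
        "id": [101, 150, 250],
    }
)

# Combine conditions with & (and) or | (or); wrap comparisons in parentheses.
mask = (toy["first_name"] == "Antonio") & toy["email"].str.endswith("gmail.com")

as_series = toy.loc[mask, "email"]   # single column label -> Series
as_frame = toy.loc[mask, ["email"]]  # list of labels -> DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```

Building the mask as a separate variable, as in the last example above, keeps the `.loc` call short and lets the same mask be reused.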

#### ix - alternative way for .loc and .iloc
The `.ix[]` indexer is a hybrid of `.loc` and `.iloc`. Generally, `.ix` is label based and acts just as the `.loc` indexer. However, `.ix` also supports integer type selections (as in `.iloc`) when passed an integer. __This only works where the index of the DataFrame is not integer based__. `.ix` will accept any of the inputs of `.loc` and `.iloc`. Note that `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so new code should use `.loc` and `.iloc` instead.

Examples:

In [None]:
# These examples require pandas < 1.0, where .ix still exists.
# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers.
data.ix[[33]] == data.iloc[[33]]
 
# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
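
On pandas versions without `.ix`, a mixed selection (row by position, column by label) can be written by converting one side first. This is a sketch on made-up data, not the only way to do it:

```python
import pandas as pd

# Hypothetical toy data with a string index.
df = pd.DataFrame(
    {
        "city": ["Leeds", "York", "Bath"],
        "email": ["f@example.com", "a@example.com", "b@example.com"],
    },
    index=["Andrade", "Brown", "Clark"],
)

# Row by position, column by label: convert the position to a label for .loc ...
via_loc = df.loc[df.index[1], "email"]

# ... or convert the column label to a position for .iloc.
via_iloc = df.iloc[1, df.columns.get_loc("email")]

print(via_loc == via_iloc)
```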

### Positional and labeled indexing
Given a pair of label-based indices, sometimes it's necessary to find the corresponding positions. In this exercise, you will use the Pennsylvania election results again. The DataFrame is provided for you as `election`.

Find x and y such that `election.iloc[x, y] == election.loc['Bedford', 'winner']`. That is, what is the row position of `'Bedford'`, and the column position of `'winner'`? Remember that the first position in Python is 0, not 1!

To answer this question, first explore the DataFrame using `election.head()` in the IPython Shell and inspect the output.

In [None]:
# Assign the row position of election.loc['Bedford']: x
election.index
x = election.index.get_loc('Bedford')

# Assign the column position of election['winner']: y
y = election.columns.get_loc('winner')

# Print the boolean equivalence
print(election.iloc[x, y] == election.loc['Bedford', 'winner'])

### Indexing and column rearrangement
There are circumstances in which it's useful to modify the order of your DataFrame columns. We do that now by extracting just three columns from the Pennsylvania election results DataFrame.

Your job is to read the CSV file and set the index to `'county'`. You'll then assign a new DataFrame by selecting the list of columns `['winner', 'total', 'voters']`. The CSV file is provided to you in the variable `filename`.

In [None]:
# Import pandas
import pandas as pd

# Read in filename and set the index: election
election = pd.read_csv(filename, index_col='county')

# Create a separate dataframe with the columns ['winner', 'total', 'voters']: results
results = election[['winner', 'total', 'voters']]

# Print the output of results.head()
print(results.head())

### Slicing rows
The Pennsylvania US election results data set that you have been using so far is ordered by county name. This means that county names can be sliced alphabetically. In this exercise, you're going to perform slicing on the county names of the election DataFrame from the previous exercises, which has been pre-loaded for you.

In [None]:
# Slice the row labels 'Perry' to 'Potter': p_counties
p_counties = election.loc['Perry':'Potter']

# Print the p_counties DataFrame
print(p_counties)

# Slice the row labels 'Potter' to 'Perry' in reverse order: p_counties_rev
p_counties_rev = election.loc['Potter':'Perry':-1]

# Print the p_counties_rev DataFrame
print(p_counties_rev)
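
The same forward and reverse slices can be tried without the election file. A minimal sketch, assuming a made-up DataFrame `counties` whose index is sorted by name:

```python
import pandas as pd

# Hypothetical counties and totals, sorted by name like the election data.
counties = pd.DataFrame(
    {"total": [100, 200, 300, 400, 500]},
    index=["Northampton", "Perry", "Pike", "Potter", "Snyder"],
)

# Forward slice: 'Perry' through 'Potter', both endpoints included.
forward = counties.loc["Perry":"Potter"]

# Reverse slice: start at the later label and step backwards with -1.
backward = counties.loc["Potter":"Perry":-1]

print(list(forward.index))
print(list(backward.index))
```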

### Slicing columns
Similar to row slicing, columns can be sliced by label. In this exercise, your job is to slice column names from the Pennsylvania election results DataFrame using `.loc[]`.

It has been pre-loaded for you as `election`, with the index set to `'county'`.
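
A sketch of what column slicing with `.loc[]` can look like. The DataFrame `election_toy`, its column names other than `'total'`, `'winner'` and `'voters'`, and all values are assumptions made for illustration; the real election data may differ:

```python
import pandas as pd

# Hypothetical columns and values standing in for the election data.
election_toy = pd.DataFrame(
    {
        "total": [1000, 2000],
        "candidate_a": [40.0, 60.0],
        "candidate_b": [58.0, 39.0],
        "winner": ["candidate_b", "candidate_a"],
        "voters": [1500, 2600],
    },
    index=pd.Index(["Adams", "Allegheny"], name="county"),
)

# The first ':' keeps all rows; the second selector slices column labels inclusively.
left_columns = election_toy.loc[:, :"candidate_a"]            # start through 'candidate_a'
middle_columns = election_toy.loc[:, "candidate_a":"winner"]  # 'candidate_a' through 'winner'
right_columns = election_toy.loc[:, "candidate_b":]           # 'candidate_b' through the end

print(list(left_columns.columns))
print(list(middle_columns.columns))
print(list(right_columns.columns))
```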