# Chapter 2 - Data Preparation Basics
## Segment 1 - Filtering and selecting data

In [3]:
import numpy as np
import pandas as pd

from pandas import Series, DataFrame

### Selecting and retrieving data
You can write an index value in two forms:
* Label index, or
* Integer index

In [4]:
series_obj = Series(np.arange(8), index=['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6', 'row 7', 'row 8'])
series_obj

row 1    0
row 2    1
row 3    2
row 4    3
row 5    4
row 6    5
row 7    6
row 8    7
dtype: int64

In [5]:
# Label index
series_obj['row 7']

6

In [6]:
# Integer index (use .iloc to select by position; the first position is 0)
series_obj.iloc[[0, 7]]

row 1    0
row 8    7
dtype: int64
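
Plain square brackets can be ambiguous when a Series has string labels, so it can help to state the intent explicitly. The sketch below rebuilds the same Series and selects with `.loc` (by label) and `.iloc` (by position). The names `labels`, `by_label`, and `by_position` are illustrative only and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the Series from above: values 0..7 labelled 'row 1'..'row 8'
labels = [f'row {i}' for i in range(1, 9)]
series_obj = pd.Series(np.arange(8), index=labels)

# .loc selects by label
by_label = series_obj.loc['row 7']

# .iloc selects by integer position (0-based)
by_position = series_obj.iloc[[0, 7]]

print(by_label)
print(by_position)
```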

In [7]:
# We will create a DataFrame from 36 random numbers
# To get the same random numbers as the instructor, we set the seed for the random number generator
np.random.seed(25)

# Then we create the DataFrame object from the 36 random numbers, reshaped into 6 rows and 6 columns
# We also supply a label index for the rows and names for the columns
DF_obj = DataFrame(np.random.rand(36).reshape((6, 6)),
                   index=['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6'],
                   columns=['column 1', 'column 2', 'column 3', 'column 4', 'column 5', 'column 6'])
DF_obj

| | column 1 | column 2 | column 3 | column 4 | column 5 | column 6 |
|---|---|---|---|---|---|---|
| row 1 | 0.870124 | 0.582277 | 0.278839 | 0.185911 | 0.4111 | 0.117376 |
| row 2 | 0.684969 | 0.437611 | 0.556229 | 0.36708 | 0.402366 | 0.113041 |
| row 3 | 0.447031 | 0.585445 | 0.161985 | 0.520719 | 0.326051 | 0.699186 |
| row 4 | 0.366395 | 0.836375 | 0.481343 | 0.516502 | 0.383048 | 0.997541 |
| row 5 | 0.514244 | 0.559053 | 0.03445 | 0.71993 | 0.421004 | 0.436935 |
| row 6 | 0.281701 | 0.900274 | 0.669612 | 0.456069 | 0.289804 | 0.525819 |


In [8]:
# Now we will use the .loc indexer
# .loc takes row and/or column labels and returns only those specific rows and/or columns
# In this case, we select row 2 and row 5 as well as column 5 and column 2
DF_obj.loc[['row 2', 'row 5'], ['column 5', 'column 2']]

| | column 5 | column 2 |
|---|---|---|
| row 2 | 0.402366 | 0.437611 |
| row 5 | 0.421004 | 0.559053 |
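
For comparison, the same kind of selection can be made by position with `.iloc`. This is a minimal sketch on a hypothetical frame `df` filled with the integers 0 to 35 rather than random numbers, so the result is easy to check by hand; the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x6 frame with the same labels as DF_obj but predictable values
rows = [f'row {i}' for i in range(1, 7)]
cols = [f'column {i}' for i in range(1, 7)]
df = pd.DataFrame(np.arange(36).reshape(6, 6), index=rows, columns=cols)

# Select by label: rows 2 and 5, columns 5 and 2 (in that order)
by_label = df.loc[['row 2', 'row 5'], ['column 5', 'column 2']]

# Select the same cells by 0-based position
by_position = df.iloc[[1, 4], [4, 1]]

print(by_label)
print(by_label.equals(by_position))
```

Note that the columns come back in the order requested, not in the order they are stored.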


### Data slicing
You can use slicing to select and return a slice of several values from a data set. Slicing uses index values so you can use the same square brackets when doing data slicing.

How slicing differs, however, is that with slicing you pass in two index values that are separated by a colon. The index value on the left side of the colon should be the first value you want to select. On the right side of the colon, you write the index value for the last value you want to retrieve. When you execute the code, the indexer finds the first record and the last record and returns those two records along with every record between them. With label-based slicing, as used here, both endpoints are included.

In [9]:
# We will select every row from row 3 through row 7 (both endpoints included)
series_obj['row 3' : 'row 7']

row 3    2
row 4    3
row 5    4
row 6    5
row 7    6
dtype: int64
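
One detail worth checking for yourself is how the end of a slice is treated. The sketch below, which rebuilds the same Series, suggests that a label slice includes its right endpoint while a positional `.iloc` slice excludes it, as ordinary Python slices do. The variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

labels = [f'row {i}' for i in range(1, 9)]
series_obj = pd.Series(np.arange(8), index=labels)

# Label slice: 'row 7' is included
label_slice = series_obj.loc['row 3':'row 7']

# Positional slice: stop position 6 ('row 7') is excluded
position_slice = series_obj.iloc[2:6]

print(label_slice)
print(position_slice)
```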

### Comparing with scalars
Next, we look at comparison operators and scalar values. In case you don't know what a scalar value is, it's simply a single numerical value. You can use comparison operators such as greater than or less than to return True/False values for every record, indicating how each element compares to the scalar value.

In [10]:
# For every value in the DataFrame we created, check whether it is less than 0.2
# The result is a DataFrame of boolean values, one per element
DF_obj < 0.2

| | column 1 | column 2 | column 3 | column 4 | column 5 | column 6 |
|---|---|---|---|---|---|---|
| row 1 | False | False | False | True | False | True |
| row 2 | False | False | False | False | False | True |
| row 3 | False | False | True | False | False | False |
| row 4 | False | False | False | False | False | False |
| row 5 | False | False | True | False | False | False |
| row 6 | False | False | False | False | False | False |
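
A boolean result like this can also be summarised. As a small sketch on a hypothetical two-column frame (the data and the names `df`, `mask`, `per_column`, and `any_per_row` are made up for illustration), summing the mask counts the matches, because `True` is treated as 1:

```python
import pandas as pd

# Hypothetical data, small enough to verify by eye
df = pd.DataFrame({'a': [0.1, 0.5, 0.9],
                   'b': [0.15, 0.25, 0.05]})

# Element-wise comparison against a scalar gives a boolean DataFrame
mask = df < 0.2

# Count matches per column, and flag rows that contain at least one match
per_column = mask.sum()
any_per_row = mask.any(axis=1)

print(mask)
print(per_column)
print(any_per_row)
```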


### Filtering with scalars

In [11]:
# We can use comparison operators and scalar values to return only the values that satisfy the condition we define
# Here, from the Series object, we return only the values greater than 6
series_obj[series_obj > 6]

row 8    7
dtype: int64
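
The same idea extends to DataFrames, where a boolean Series built from one or more columns selects whole rows. The following is a rough sketch on a hypothetical four-row frame; the values and the names `df`, `filtered`, and `both` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data with the same style of labels as DF_obj
df = pd.DataFrame({'column 1': [0.87, 0.68, 0.45, 0.37],
                   'column 2': [0.58, 0.44, 0.59, 0.84]},
                  index=['row 1', 'row 2', 'row 3', 'row 4'])

# Keep the rows where column 1 is greater than 0.5
filtered = df[df['column 1'] > 0.5]

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df['column 1'] > 0.4) & (df['column 2'] > 0.5)]

print(filtered)
print(both)
```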

### Setting values with scalars

In [12]:
# Setting is where you select all records associated with the specified labels and set their values equal to a scalar
# Here, from the Series object, we select row 1, row 5 and row 8 and set them to 8
# Pass the labels as a list; a bare comma-separated key is passed as a single tuple
series_obj[['row 1', 'row 5', 'row 8']] = 8
series_obj

row 1    8
row 2    1
row 3    2
row 4    3
row 5    8
row 6    5
row 7    6
row 8    8
dtype: int64
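
Setting also works with the explicit `.loc` indexer and with a boolean condition. The sketch below starts from a fresh copy of the Series; the second assignment, which zeroes every value below 3, is an added illustration and not part of the original notebook.

```python
import numpy as np
import pandas as pd

labels = [f'row {i}' for i in range(1, 9)]
series_obj = pd.Series(np.arange(8), index=labels)

# Set three labelled records to the scalar 8
series_obj.loc[['row 1', 'row 5', 'row 8']] = 8

# Set every record that satisfies a condition to the scalar 0
series_obj[series_obj < 3] = 0

print(series_obj)
```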

Filtering and selecting using Pandas is one of the most fundamental things you'll do in data analysis. Make sure you know how to use indexing to select and retrieve records.