# Chapter 05: Getting Started with pandas

## Initial Setup

In [0]:
import pandas as pd
import numpy as np

## Why `pandas`?

- Contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python.
- Works with tabular or heterogeneous data (unlike NumPy, which is best suited to homogeneous numerical arrays).
- Can be used in tandem with other libraries (NumPy, statsmodels, scikit-learn, etc.)

## 5.1 Introduction to pandas Data structures

### Series

- A Series is a one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its *index*
- Another way to think about a Series is as a fixed-length, ordered dictionary, as it is a mapping of index values to data values.
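
To make the dictionary analogy a little more concrete, here is a minimal sketch of label lookup and membership testing on a Series. The variable name `stock` and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
stock = pd.Series([12, 0, 7], index=['Keyboard', 'Mouse', 'Laptop'])

# Dict-like access: look up a value by its label
laptop_count = stock['Laptop']

# Dict-like membership test checks the index labels, not the values
has_mouse = 'Mouse' in stock
has_webcam = 'Webcam' in stock

# The two halves of a Series: the data array and the index
print(stock.values)
print(list(stock.index))
print(laptop_count, has_mouse, has_webcam)
```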

In [2]:
# Example - Create a series with default index
obj = pd.Series([-3,7,8,2])
obj

0   -3
1    7
2    8
3    2
dtype: int64

In [3]:
# Example - Create a series with user-defined index
obj = pd.Series([3,4,5,6], index=['a','z','f','e'])
obj

a    3
z    4
f    5
e    6
dtype: int64

In [4]:
# Example - Create a series with dictionary
prices = {'Keyboard': 500, 'Mouse': 100, 'Laptop': 2000, 'Speaker': 750}
obj = pd.Series(prices)
obj

Keyboard     500
Mouse        100
Laptop      2000
Speaker      750
dtype: int64

In [5]:
# Example - Create a series with dictionary and override keys order
prices = {'Keyboard': 500, 'Mouse': 100, 'Laptop': 2000, 'Speaker': 750}
products = ['Laptop', 'Keyboard', 'Mouse', 'Headphone']
obj = pd.Series(prices, index=products)
obj

Laptop       2000.0
Keyboard      500.0
Mouse         100.0
Headphone       NaN
dtype: float64

- pandas uses `NaN` (Not a Number) to mark missing or unavailable values.
- Use `isnull` and `notnull` to detect missing data.

In [0]:
#Example - Using isnull, notnull to check missing data
prices = {'Keyboard': 500, 'Mouse': 100, 'Laptop': 2000, 'Speaker': 750}
products = ['Laptop', 'Keyboard', 'Mouse', 'Headphone']
obj = pd.Series(prices, index=products)

In [7]:
obj.isnull()

Laptop       False
Keyboard     False
Mouse        False
Headphone     True
dtype: bool

In [8]:
obj.notnull()

Laptop        True
Keyboard      True
Mouse         True
Headphone    False
dtype: bool
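
The boolean Series returned by `isnull` and `notnull` can also be used as a filter or summed to count entries. The sketch below reuses the same `prices` data; the names `available` and `n_missing` are only illustrative.

```python
import pandas as pd

prices = {'Keyboard': 500, 'Mouse': 100, 'Laptop': 2000, 'Speaker': 750}
products = ['Laptop', 'Keyboard', 'Mouse', 'Headphone']
obj = pd.Series(prices, index=products)

# Keep only the entries that have a value
available = obj[obj.notnull()]

# Count the missing entries (True counts as 1 when summed)
n_missing = int(obj.isnull().sum())

print(available)
print(n_missing)
```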

### DataFrame


- A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
- The DataFrame has both a row and a column index; it can be thought of as a dictionary of Series all sharing the same index.
- While a DataFrame is physically two-dimensional, you can use it to represent higher dimensional data in a tabular format using hierarchical indexing.

In [9]:
# Example - Create a simple DataFrame 
# using a dictionary of equal-length lists or NumPy arrays

data = {
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Headset', 'Webcam'],
    'price': [1500, 200, 400, 600, 200],
    'brand': ['Asus', 'Logitech','Logitect','SteelSeries', 'Bluelover']
}
frame = pd.DataFrame(data)
frame

Unnamed: 0,product,price,brand
0,Laptop,1500,Asus
1,Mouse,200,Logitech
2,Keyboard,400,Logitect
3,Headset,600,SteelSeries
4,Webcam,200,Bluelover


- The resulting DataFrame will have its index assigned automatically, as with Series. The columns follow the order of the dictionary keys, as the output above shows (older pandas versions sorted them instead).
- Missing values appear as `NaN` in a DataFrame.
- A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute.

In [0]:
# Example - Get data from column in DataFrame
data = {
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Headset', 'Webcam'],
    'price': [1500, 200, 400, 600, 200],
    'brand': ['Asus', 'Logitech','Logitect','SteelSeries', 'Bluelover']
}
frame = pd.DataFrame(data)

In [11]:
frame['product']

0      Laptop
1       Mouse
2    Keyboard
3     Headset
4      Webcam
Name: product, dtype: object

In [12]:
frame.brand

0           Asus
1       Logitech
2       Logitect
3    SteelSeries
4      Bluelover
Name: brand, dtype: object

- Rows can be retrieved by label with the special `loc` attribute (and by integer position with `iloc`).

In [14]:
# Example - Get data from row in DataFrame using index
data = {
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Headset', 'Webcam'],
    'price': [1500, 200, 400, 600, 200],
    'brand': ['Asus', 'Logitech','Logitect','SteelSeries', 'Bluelover']
}
frame = pd.DataFrame(data)
frame.loc[1]

product       Mouse
price           200
brand      Logitech
Name: 1, dtype: object

- If you assign a **list or array** to a column, the value's length must match the length of the DataFrame.
- If you assign a **Series** to a column, its labels will be realigned exactly to the DataFrame's index, inserting missing values in any holes.

In [15]:
# Example - Assign a Series to a column in DataFrame
data = {
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Headset', 'Webcam'],
    'price': [1500, 200, 400, 600, 200],
    'brand': ['Asus', 'Logitech','Logitect','SteelSeries', 'Bluelover']
}
frame = pd.DataFrame(data, columns=['product', 'price', 'brand', 'available'])
availableValues = pd.Series([True, False, True], index=[1,2,4])

frame['available'] = availableValues
frame

Unnamed: 0,product,price,brand,available
0,Laptop,1500,Asus,
1,Mouse,200,Logitech,True
2,Keyboard,400,Logitect,False
3,Headset,600,SteelSeries,
4,Webcam,200,Bluelover,True
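
For contrast with the Series assignment above, the following sketch shows the length requirement for plain lists. The column names `stock` and `rating` are hypothetical and are not part of the chapter's data.

```python
import pandas as pd

frame = pd.DataFrame({'product': ['Laptop', 'Mouse', 'Keyboard'],
                      'price': [1500, 200, 400]})

# A list with the right length is assigned positionally
frame['stock'] = [3, 10, 5]

# A list with the wrong length is rejected with a ValueError
try:
    frame['rating'] = [4.5, 3.9]   # only 2 values for 3 rows
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(frame)
print(type(mismatch_error).__name__)
```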


In [16]:
# Example - Create new column in DataFrame using data within
data = {
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Headset', 'Webcam'],
    'price': [1500, 200, 400, 600, 200],
    'brand': ['Asus', 'Logitech','Logitect','SteelSeries', 'Bluelover']
}
frame = pd.DataFrame(data, columns=['product', 'price', 'brand', 'available'])

frame['isTrustedBrand'] = frame.brand != 'Bluelover'
frame

Unnamed: 0,product,price,brand,available,isTrustedBrand
0,Laptop,1500,Asus,,True
1,Mouse,200,Logitech,,True
2,Keyboard,400,Logitect,,True
3,Headset,600,SteelSeries,,True
4,Webcam,200,Bluelover,,False


<h3><b>IMPORTANT NOTE</b></h3>

> The column returned from indexing a DataFrame is a **view** on the underlying data, **not a copy**. Thus, any in-place modification to the Series will be reflected in the DataFrame.

In [0]:
# Example - Create new DataFrame with nested Dictionary 
# and Display transposed DataFrame using .T notation
data = {
    'Logitech': {'Mouse': 400, 'Keyboard': 600},
    'SteelSeries': {'Headset': 700,'Keyboard': 800}
}

frame = pd.DataFrame(data)

In [18]:
frame

Unnamed: 0,Logitech,SteelSeries
Headset,,700.0
Keyboard,600.0,800.0
Mouse,400.0,


In [19]:
frame.T

Unnamed: 0,Headset,Keyboard,Mouse
Logitech,,600.0,400.0
SteelSeries,700.0,800.0,


- The `values` attribute returns the data contained in the DataFrame as a two-dimensional ndarray.
- If the DataFrame's columns have different `dtypes`, the `dtype` of the values array will be chosen to accommodate all of the columns.
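
A small sketch of how the `dtype` of `values` adapts to the columns; the frames `numeric` and `mixed` are made-up examples.

```python
import pandas as pd

# All columns numeric: ints and floats are upcast to a common float dtype
numeric = pd.DataFrame({'price': [1500, 200], 'discount': [0.1, 0.25]})
print(numeric.values.dtype)

# Mixing strings and numbers: the array falls back to the generic object dtype
mixed = pd.DataFrame({'product': ['Laptop', 'Mouse'], 'price': [1500, 200]})
print(mixed.values.dtype)
```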

- Possible data inputs to DataFrame Constructor

| Type | Notes |
|------|-------|
| 2D ndarray | A matrix of data, passing optional row & column labels|
| dict of arrays, lists or tuples | Each sequence becomes a column in the DataFrame, all sequences must be the same length|
| NumPy structured/record array | Treated as the "dict of arrays" case |
| dict of Series | Each value becomes a column, indexes from each Series are unioned together to form the result's row index if no explicit index is passed |
| dict of dicts | Each inner dict becomes a column, keys are unioned to form the row index as in the "dict of Series" case |
| List of dicts or Series | Each item becomes a row in the DataFrame, union of dict keys or Series indexes becomes the DataFrame's column labels |
| List of lists or tuples | Treated as the "2D ndarray" case |
| Another DataFrame | The DataFrame's indexes are used unless different ones are passed |
| NumPy MaskedArray | Like the "2D ndarray" case except masked values become NA/missing in the DataFrame result |
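
As a rough illustration of three rows from the table, the sketch below builds a DataFrame from a 2D ndarray, a list of dicts, and a dict of Series. All variable names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# 2D ndarray with optional row and column labels
from_array = pd.DataFrame(np.arange(6).reshape((2, 3)),
                          index=['r1', 'r2'], columns=['a', 'b', 'c'])

# List of dicts: each dict becomes a row, keys are unioned into columns
from_records = pd.DataFrame([{'product': 'Mouse', 'price': 200},
                             {'product': 'Webcam', 'brand': 'Bluelover'}])

# Dict of Series: the indexes are unioned to form the row index
from_series = pd.DataFrame({'x': pd.Series([1, 2], index=['a', 'b']),
                            'y': pd.Series([3, 4], index=['b', 'c'])})

print(from_array)
print(from_records)
print(from_series)
```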

### Index Objects

- **pandas**'s Index objects are responsible for holding the axis labels and other metadata.
- Index objects are **immutable** and thus can't be modified by the user.
- A `pandas` Index can contain duplicate labels. Selections with duplicate labels will select all occurrences of that label.
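
Both properties can be checked with a short sketch. The names `labels` and `dup` are hypothetical.

```python
import pandas as pd

labels = pd.Index(['a', 'b', 'c'])

# Item assignment on an Index is not allowed and raises TypeError
try:
    labels[1] = 'z'
    immutable_error = None
except TypeError as exc:
    immutable_error = exc

# Duplicate labels are permitted; selecting one returns every match
dup = pd.Series([10, 20, 30], index=['x', 'y', 'x'])
selected = dup['x']   # a Series with two entries
single = dup['y']     # a scalar, because 'y' occurs once

print(type(immutable_error).__name__)
print(selected)
print(single)
```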

- Some Index methods and properties

| Method | Description |
|--------|-------------|
| append | Concatenate with additional Index objects, producing a new Index |
| difference | Compute set of differences as an Index |
| intersection | Compute set intersection |
| union | Compute set union |
| isin | Compute boolean array indicating whether each value is contained in the passed collection |
| delete | Compute new Index with element at index i deleted |
| drop | Compute new Index by deleting passed values |
| insert | Compute new Index by inserting element at index i |
| is_monotonic | Return `True` if each element is greater than or equal to the previous element (spelled `is_monotonic_increasing` in recent pandas versions) |
| is_unique | Return `True` if the Index has no duplicate values |
| unique | Compute the array of unique values in the Index |
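
A few of these methods in action, as a minimal sketch with two made-up Index objects `left` and `right`. Each call returns a new object and leaves the original Index unchanged.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['c', 'd'])

union = left.union(right)              # labels in either Index
common = left.intersection(right)      # labels in both
only_left = left.difference(right)     # labels in left but not in right
mask = left.isin(['a', 'c'])           # boolean array, one entry per label
dropped = left.drop(['b'])             # new Index without 'b'
inserted = left.insert(1, 'z')         # new Index with 'z' at position 1

print(list(union), list(common), list(only_left))
print(mask, list(dropped), list(inserted))
```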

## 5.2 Essential Functionality

### Reindexing

- Reindexing creates a new object with the data **conformed** to a new index
- Calling `reindex` on Series rearranges the data according to the new index, introducing missing values if any index values were not already present

In [21]:
# Example - Reindexing a Series
obj = pd.Series([3.5, 7.8, 8.9, -1.2], index=['d','a','e','h'])
obj

d    3.5
a    7.8
e    8.9
h   -1.2
dtype: float64

In [22]:
obj2 = obj.reindex(['a','b','c','d','e'])
obj2

a    7.8
b    NaN
c    NaN
d    3.5
e    8.9
dtype: float64

- During reindexing, we are able to do some interpolation or filling of values.

In [23]:
# Example - Forward fill values during reindexing
obj = pd.Series(['Product 01', 'Product 02', 'Product 03'], index=[0,2,4])
obj

0    Product 01
2    Product 02
4    Product 03
dtype: object

In [24]:
obj2 = obj.reindex(range(6), method='ffill')
obj2

0    Product 01
1    Product 01
2    Product 02
3    Product 02
4    Product 03
5    Product 03
dtype: object

- With DataFrame, `reindex` can alter either the row index, the columns, or both. **By default**, passing a sequence will reindex only the rows in the result.

In [25]:
# Example - Reindex rows, columns in DataFrame
frame = pd.DataFrame(
    np.arange(9).reshape((3,3)),
    index=['a','b','c'],
    columns=['Product 01', 'Product 02', 'Product 03']
)

# Reindex rows
frame2 = frame.reindex(['a','b','c','d'])
frame2

Unnamed: 0,Product 01,Product 02,Product 03
a,0.0,1.0,2.0
b,3.0,4.0,5.0
c,6.0,7.0,8.0
d,,,


In [28]:
# Reindex columns
newProductIndexList = ['Product 01', 'New Product', 'Product 02']
frame3 = frame.reindex(columns = newProductIndexList)
frame3

Unnamed: 0,Product 01,New Product,Product 02
a,0,,1
b,3,,4
c,6,,7


In [29]:
# Reindex columns and rows
# (label-based .loc raises KeyError for missing labels in recent pandas, so use reindex)
frame4 = frame.reindex(index=['a','b','c','d'], columns=newProductIndexList)
frame4

Unnamed: 0,Product 01,New Product,Product 02
a,0.0,,1.0
b,3.0,,4.0
c,6.0,,7.0
d,,,


- `reindex` function arguments

| Argument | Description |
|----------|-------------|
| index | New sequence to use as index. Can be index instance or any other sequence-like Python data structure. An Index will be used exactly as is without any copying |
| method | Interpolation (fill) method; 'ffill' fills forward, while 'bfill' fills backward |
| fill_value | Substitute value to use when introducing missing data by reindexing |
| limit | When forward- or backfilling, maximum size gap (in number of elements) to fill |
| tolerance | When forward- or backfilling, maximum size gap (in absolute numeric distance) to fill for inexact matches |
| level | Match simple index on level of MultiIndex, otherwise select subset of |
| copy | If `True`, always copy underlying data even if new index is equivalent to old index, if `False`, do not copy the data when the indexes are equivalent |
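
The `fill_value` and `limit` arguments from the table can be sketched as follows. The Series and the placeholder string `'unknown'` are assumptions made for the example.

```python
import pandas as pd

obj = pd.Series(['Product 01', 'Product 02'], index=[0, 3])

# fill_value replaces the NaN that reindexing would otherwise introduce
filled = obj.reindex(range(5), fill_value='unknown')

# method='ffill' with limit=1 fills at most one consecutive missing label
limited = obj.reindex(range(6), method='ffill', limit=1)

print(filled)
print(limited)
```

With `limit=1`, labels 2 and 5 stay missing because they are more than one step away from the last valid label.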

### Dropping Entries from an Axis

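- Dropping one or more entries from an axis is easy if you already have an index array or list without those entries. Otherwise, the `drop` method returns a new object with the indicated values deleted from an axis.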

In [30]:
# Example - Drop entries in Series
obj = pd.Series(np.arange(4.), index=['a','b','c','d'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

In [31]:
newObj = obj.drop(['b','c'])
newObj

a    0.0
d    3.0
dtype: float64

- With DataFrame, index values can be deleted from either axis

In [32]:
# Example - Drop rows, columns in DataFrame
frame = pd.DataFrame(
    np.arange(16).reshape((4,4)),
    index=['Product 01', 'Product 02', 'Product 03', 'Product 04'],
    columns = ['one', 'two', 'three', 'four']
)

frame.drop(['Product 03'])

Unnamed: 0,one,two,three,four
Product 01,0,1,2,3
Product 02,4,5,6,7
Product 04,12,13,14,15


In [33]:
# Drop columns
frame.drop(['one','three'], axis='columns')

Unnamed: 0,two,four
Product 01,1,3
Product 02,5,7
Product 03,9,11
Product 04,13,15


### Indexing, Selection, and Filtering

<h3><b>IMPORTANT NOTE:</b></h3>

> Slicing with labels in a Series **behaves differently** from normal Python slicing in that the endpoint is inclusive.

In [34]:
# Example - Slicing with labels vs Slicing in normal Python
obj = pd.Series(
    np.arange(5),
    index=['a','b','c','d','e']
)

# Slicing with labels
obj['b':'d']

b    1
c    2
d    3
dtype: int64

In [35]:
# Slicing in normal Python
obj[1:3]

b    1
c    2
dtype: int64

- A boolean DataFrame can be used for indexing

In [36]:
# Example - Indexing using boolean DataFrame
frame = pd.DataFrame(
    np.arange(16).reshape((4,4)),
    index=['Product 01','Product 02','Product 03','Product 04'],
    columns=['one','two','three','four']
)

frame[frame < 5] = 0 # Set all values that are smaller than 5 to 0
frame

Unnamed: 0,one,two,three,four
Product 01,0,0,0,0
Product 02,0,5,6,7
Product 03,8,9,10,11
Product 04,12,13,14,15


- Using either axis labels (`loc`) or integers (`iloc`) allows us to select a subset of the rows and columns from a DataFrame with NumPy-like notation.

In [37]:
# Example - Using loc & iloc
frame = pd.DataFrame(
    np.arange(16).reshape((4,4)),
    index=['Product 01','Product 02','Product 03','Product 04'],
    columns=['one','two','three','four']
)

frame.loc['Product 03', ['two', 'four']]

two      9
four    11
Name: Product 03, dtype: int64

In [38]:
frame.iloc[[1,2], [3,0,1]]

Unnamed: 0,four,one,two
Product 02,7,4,5
Product 03,11,8,9


In [39]:
frame.loc[:'Product 02', 'two']

Product 01    1
Product 02    5
Name: two, dtype: int64

In [40]:
frame.iloc[:, :4][frame.two > 5]

Unnamed: 0,one,two,three,four
Product 03,8,9,10,11
Product 04,12,13,14,15


- Indexing options with DataFrame

| Type | Notes |
|------|-------|
| df[val] | Select single column or sequence of columns from the DataFrame; special case conveniences; boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)|
| df.loc[val] | Select single row or subset of rows from the DataFrame by label|
| df.loc[:, val] | Select single column or subset of columns by label |
| df.iloc[where] | Select single row or subset of rows from the DataFrame by integer position |
| df.iloc[:, where] | Select single column or subset of columns by integer position |
| df.iloc[where_i, where_j] | Select both rows and columns by integer position |
| df.at[label_i, label_j] | Select a single scalar value by row and column label |
| df.iat[i,j] | Select a single scalar value by row and column position (integers) |
| `reindex` method | Select either rows or columns by labels |
| `get_value`, `set_value` methods | Select single value by row and column label (removed in pandas 1.0; use `at` or `iat` instead) |
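
The scalar accessors `at` and `iat` from the table are not demonstrated above, so here is a minimal sketch using the same 4x4 frame as the earlier examples.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=['Product 01', 'Product 02', 'Product 03', 'Product 04'],
    columns=['one', 'two', 'three', 'four']
)

# Single value by row and column label
by_label = frame.at['Product 03', 'two']

# The same cell by integer position (row 2, column 1)
by_position = frame.iat[2, 1]

# at can also be used to set a single value
frame.at['Product 01', 'one'] = 100

print(by_label, by_position)
print(frame)
```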

### Integer Indexes

- Working with `pandas` objects indexed by integers is different from indexing built-in Python data structures, because an integer could refer either to a label or to a position.
- To keep things consistent:
    - **Use `loc` for label-based indexing**
    - **Use `iloc` for integer position-based indexing**
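
A brief sketch of why the distinction matters on an integer-indexed Series. The name `ser` is only illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))   # default integer index 0, 1, 2

# loc treats the slice bounds as labels, and the end label is included
by_label = ser.loc[:1]

# iloc treats them as positions, and the end position is excluded
by_position = ser.iloc[:1]

# Negative positions are unambiguous with iloc
last = ser.iloc[-1]

print(by_label)
print(by_position)
print(last)
```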

In [41]:
# Example - Accessing element "-1" with a label index vs. an integer index
example01 = pd.Series(np.arange(3.), index=['a','b','c'])
print(example01)
print('Element at index -1 = {}'.format(example01[-1]), end='\n\n') # No error

a    0.0
b    1.0
c    2.0
dtype: float64
Element at index -1 = 2.0



In [45]:
example02 = pd.Series(np.arange(3.))
try:
    print(example02)
    print('Element at index -1 = {}'.format(example02[-1]), end='\n\n') # KeyError: -1 is looked up as a label
except KeyError:
    print("Can't get element at index -1")

0    0.0
1    1.0
2    2.0
dtype: float64
Can't get element at index -1


### Arithmetic & Data Alignment

- When adding objects together, if any index pairs are not the same, the respective index in the result will be the union of the index pairs. Labels present in only one object produce `NaN`.
- This is similar to an `OUTER JOIN` on the index labels.

In [46]:
# Example - Adding two Series, each with a few index labels the other lacks.
series01 = pd.Series(np.arange(4), index=list('ABCD'))
series02 = pd.Series(np.arange(5, 9), index=list('ACEF'))

series01 + series02

A    5.0
B    NaN
C    8.0
D    NaN
E    NaN
F    NaN
dtype: float64

In [47]:
# Example - Adding two DataFrames, each with a few rows and columns the other lacks.
df01 = pd.DataFrame(
    np.arange(9).reshape((3,3)), 
    columns=list('ABC'),
    index=list('DEF')
)

df02 = pd.DataFrame(
    np.arange(12).reshape((4,3)), 
    columns=list('ACE'),
    index=list('CDEF')
)

df01+df02

Unnamed: 0,A,B,C,E
C,,,,
D,3.0,,6.0,
E,9.0,,12.0,
F,15.0,,18.0,


- Flexible arithmetic methods (NOTE: the 'r' prefix stands for reverse, meaning the arguments are flipped)

| Method | Description |
|--------|-------------|
|add, radd | Methods for addition +|
|sub, rsub | Methods for subtraction - |
|div, rdiv | Methods for division / |
|floordiv, rfloordiv | Methods for floor division //|
|mul, rmul | Methods for multiplication *|
|pow, rpow | Methods for exponentiation **|

- **Operations between DataFrame and Series**
    - By default, arithmetic between DataFrame and Series matches the index of the Series on the DataFrame's columns, broadcasting down the rows.
    - If an index value is not found in either the DataFrame's columns or the Series's index, the objects will be reindexed to form the union.
    - Use arithmetic methods with a specific axis to broadcast over the columns, matching on the rows.

In [48]:
# Example - Operation between DataFrame and Series
# Dataframe - Series
df = pd.DataFrame(
    np.arange(12).reshape((4,3)),
    columns=list('XYZ'),
    index=list('ABCD')
)
s1 = df.iloc[0]

df - s1

Unnamed: 0,X,Y,Z
A,0,0,0
B,3,3,3
C,6,6,6
D,9,9,9


In [49]:
# DataFrame + Series with different index => Automatic Reindex
s2 = pd.Series(np.arange(3), index=list('XWY'))
df+s2

Unnamed: 0,W,X,Y,Z
A,,0.0,3.0,
B,,3.0,6.0,
C,,6.0,9.0,
D,,9.0,12.0,


In [50]:
# DataFrame + Series and broadcast change over the columns
s3 = df['Y']
df.add(s3, axis='index')

Unnamed: 0,X,Y,Z
A,1,2,3
B,7,8,9
C,13,14,15
D,19,20,21


### Function Application & Mapping

- NumPy `ufuncs` (element-wise array methods) also work with pandas objects

In [51]:
# Example - Using numpy.abs() with DataFrame
df = pd.DataFrame(np.arange(-2,2).reshape((2,2)))
np.abs(df)

Unnamed: 0,0,1
0,2,1
1,0,1


- Use the `apply()` method to apply a function that operates on one-dimensional arrays to each column or row.

In [52]:
# Example - Using apply() with DataFrame to execute a function for each column or row
differenceMaxMin = lambda s: s.max() - s.min()

df = pd.DataFrame(np.arange(-4,5).reshape(3,3))
df

Unnamed: 0,0,1,2
0,-4,-3,-2
1,-1,0,1
2,2,3,4


In [54]:
df.apply(differenceMaxMin) # Apply to each column

0    6
1    6
2    6
dtype: int64

In [55]:
df.apply(differenceMaxMin, axis='columns') # Apply to each row

0    2
1    2
2    2
dtype: int64

- Use `applymap` (for DataFrame; renamed to `DataFrame.map` in pandas 2.1) and `map` (for Series) to apply element-wise Python functions.

In [56]:
# Example - Do an element-wise operation on a DataFrame
df = pd.DataFrame(np.arange(-4,5).reshape((3,3)))

negativeToZero = lambda x: 0 if x < 0 else x
df.applymap(negativeToZero)

Unnamed: 0,0,1,2
0,0,0,0
1,0,0,1
2,2,3,4
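
The Series counterpart, `map`, works the same way on a single column. In this sketch the labelling function and the name `labels` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(-4, 5).reshape((3, 3)))

# Apply a Python function to every element of one column (a Series)
labels = df[0].map(lambda x: 'negative' if x < 0 else 'non-negative')

print(labels)
```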


### Sorting & Ranking

- Use `sort_index()` to sort lexicographically by row or column index; it returns a new, sorted object.
- Data is sorted in ascending order by default. Set the optional parameter `ascending=False` for descending order.
- Use `sort_values()` to sort by values. Any missing values are sorted to the end of the Series by default.
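
A short sketch of how `sort_values()` treats missing values, plus sorting a DataFrame by more than one column. The data is invented for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# NaN values go to the end in both directions by default
asc = s.sort_values()
desc = s.sort_values(ascending=False)

# Sorting a DataFrame by several columns: 'a' first, ties broken by 'b'
frame = pd.DataFrame({'a': [1, 0, 1, 0], 'b': [4, 7, -3, 2]})
by_both = frame.sort_values(by=['a', 'b'])

print(asc)
print(desc)
print(by_both)
```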

In [57]:
# Example - Sort a DataFrame by row index and column index
df = pd.DataFrame(
    np.arange(-4,5).reshape((3,3)),
    columns=list('CAB'),
    index=list('ZXY')
)

df.sort_index(axis=0) # Sort by row index; axis=0 is the default and can be omitted

Unnamed: 0,C,A,B
X,-1,0,1
Y,2,3,4
Z,-4,-3,-2


In [58]:
df.sort_index(axis=1, ascending=False) # Sort by column index in descending order

Unnamed: 0,C,B,A
Z,-4,-2,-3
X,-1,1,0
Y,2,4,3


In [59]:
df.sort_values(by=['C']) # Sort by values of column C in ascending order

Unnamed: 0,C,A,B
Z,-4,-3,-2
X,-1,0,1
Y,2,3,4


- Ranking assigns ranks from one through the number of valid data points in an array. By default, the `rank()` method breaks ties by assigning each group the mean rank.
- Specify the parameter `method=` to change the tie-breaking method (e.g. `method='first'` ranks tied values by order of appearance).
- Specify the parameter `ascending=` to change the result order.
- Tie-breaking methods with rank:

| Method | Description | 
|--------|-------------|
| average | Default: assign the average rank to each entry in the equal group |
| min | Use the minimum rank for the whole group |
| max | Use the maximum rank for the whole group |
| first | Assign ranks in the order the values appear in the data |
| dense | Like "min" but ranks always increase by 1 in between groups rather than the number of equal elements in a group |

In [60]:
# Example - Ranking a Series
s = pd.Series([6,2,8,9,2,9,0])

s.rank() # Ranking by values

0    4.0
1    2.5
2    5.0
3    6.5
4    2.5
5    6.5
6    1.0
dtype: float64

In [61]:
s.rank(method='first') # Ranking by first appearance

0    4.0
1    2.0
2    5.0
3    6.0
4    3.0
5    7.0
6    1.0
dtype: float64
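
The remaining tie-breaking methods from the table can be compared on the same Series. This is a small sketch; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([6, 2, 8, 9, 2, 9, 0])

rank_min = s.rank(method='min')       # tied values share the lowest rank
rank_max = s.rank(method='max')       # tied values share the highest rank
rank_dense = s.rank(method='dense')   # no gaps between groups
rank_desc = s.rank(ascending=False)   # largest value gets rank 1

print(pd.DataFrame({'value': s, 'min': rank_min, 'max': rank_max,
                    'dense': rank_dense, 'desc': rank_desc}))
```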

## 5.3 Summarizing & Computing Descriptive Statistics

- Options for reduction methods

| Option | Description |
|--------|-------------|
| axis | Axis to reduce over; 0 for DataFrame's rows and 1 for columns |
| skipna | Exclude missing values; `True` by default |
| level | Reduce grouped by level if the axis is hierarchically indexed (MultiIndex) |
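
A minimal sketch of the `axis` and `skipna` options using `sum`. The small frame below is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, np.nan], [2.0, 3.0], [np.nan, np.nan]],
                  index=['a', 'b', 'c'], columns=['one', 'two'])

# Default: reduce over the rows, giving one result per column; NaN is skipped
col_totals = df.sum()

# axis='columns' reduces across the columns, giving one result per row
row_totals = df.sum(axis='columns')

# skipna=False makes any NaN in a row propagate to the result
strict_totals = df.sum(axis='columns', skipna=False)

print(col_totals)
print(row_totals)
print(strict_totals)
```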

- Descriptive and summary statistics

| Method | Description |
|--------|-------------|
| count | Number of non-NA values |
| describe | Compute a set of summary statistics for a Series or each DataFrame column |
| min, max | Compute minimum and maximum values |
| argmin, argmax | Compute index locations (integers) at which minimum or maximum value obtained, respectively |
| idxmin, idxmax | Compute index labels at which minimum or maximum value obtained, respectively |
| quantile | Compute sample quantile ranging from 0 to 1 |
| sum | Sum of values |
| mean | Mean of values |
| median | Arithmetic median (50% quantile) of values |
| mad | Mean absolute deviation from mean value |
| prod | Product of all values |
| var | Sample variance of values |
| std | Sample standard deviation of values |
| skew | Sample skewness (third moment) of values |
| kurt | Sample kurtosis (fourth moment) of values |
| cumsum | Cumulative sum of values |
| cummin, cummax | Cumulative minimum or maximum of values, respectively |
| cumprod | Cumulative product of values |
| diff | Compute first arithmetic difference |
| pct_change | Compute percent changes |
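
A handful of these methods applied to a tiny Series, as a sketch. The values are arbitrary.

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 2.0], index=['a', 'b', 'c'])

label_of_max = s.idxmax()   # index label of the largest value
running_total = s.cumsum()  # cumulative sum
step = s.diff()             # difference from the previous element
change = s.pct_change()     # relative change from the previous element
summary = s.describe()      # count, mean, std, min, quartiles, max

print(label_of_max)
print(running_total)
print(step)
print(change)
print(summary)
```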

### Unique Values, Value Counts & Membership

- Unique, value counts, and set membership methods

| Method | Description |
|--------|-------------|
| isin | Compute boolean array indicating whether each Series values is contained in the passed sequence of values |
| match | Compute integer indices for each value in an array into another array of distinct values; helpful for data alignment and join-type operations |
| unique | Compute array of unique values in a Series, returned in the order observed |
| value_counts | Return a Series containing unique values as its index and frequencies as its values, ordered by count in descending order |
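
A short sketch of `unique`, `value_counts`, and `isin` on a made-up Series of letters.

```python
import pandas as pd

s = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

# Distinct values in order of first appearance
uniques = s.unique()

# Frequencies, most common first
counts = s.value_counts()

# Boolean mask for membership, usable as a filter
mask = s.isin(['b', 'c'])
subset = s[mask]

print(list(uniques))
print(counts)
print(subset)
```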