# Chapter 1

- Explore dataframe:
    - See first 5 rows: `df.head(5)`
    - See number of rows and columns: `df.shape`
    - See summary statistics: `df.describe()`
    - See column information (dtypes, non-null counts): `df.info()`
    - See column labels: `df.columns`
    - See row labels (index): `df.index`
    - See the data as a 2-D numpy array: `df.values`
- Sort dataframe: `df.sort_values(["col1","col2"], ascending = [True, False])`
- Subset dataframe:
    - `df[(df["col1"] < val1) & (df["col2"] == val2)]`
    - `df[df["col"].isin([val1, val2])]` (`isin` alone only returns a boolean mask)
- Create new column: `df["new_col"] = df["col"]*2`
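
The operations above can be tried on a small dataframe. This is a minimal sketch: the data and the column names (`name`, `age`, `city`) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [31, 25, 31, 40],
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# Sort by age ascending, then by name descending within equal ages.
sorted_df = df.sort_values(["age", "name"], ascending=[True, False])

# Combine boolean masks with &; each condition needs its own parentheses.
young_oslo = df[(df["age"] < 35) & (df["city"] == "Oslo")]

# isin() builds a boolean mask; index the dataframe with it to get the rows.
oslo_or_lima = df[df["city"].isin(["Oslo", "Lima"])]

# New column derived from an existing one.
df["age_doubled"] = df["age"] * 2
```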


# Chapter 2

- Summary statistics of a column:
    - `df["col"].mean()`
    - `df["col"].min()`
- Cumulative sum: `df['cum_col'] = df["col"].cumsum()`
    - Others are `cummin()`, `cummax()`, `cumprod()`
- Drop duplicates: `df = df.drop_duplicates(subset = ["column", "column2"])`
- Count categorical values in a column (as proportions, smallest first): `df['col'].value_counts(ascending=True, normalize=True)`
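- Grouping and aggregating: `df.groupby(["col1","col2"])["another_col"].agg(["min","max","sum"])` (string names avoid the warning recent pandas versions raise for the builtins `min`, `max`, `sum`)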
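- Custom summary on dataframe columns:
```
# Each function receives a whole column (a Series) and returns one value.
def custom_func1(column):
    return column.quantile(0.3)

def custom_func2(column):
    return column.quantile(0.4)

# Apply both functions to each selected column.
df[["col1","col2"]].agg([custom_func1, custom_func2])
```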
- Pivot table on dataframe:
```
df.pivot_table(index='cat_col_as_row_number',
               columns='cat_col_as_columns',
               values='num_col_as_values',
               fill_value=0,                 # replace missing cells with 0
               margins=True,                 # add "All" row and column
               aggfunc=["mean", "median"])   # string names need no numpy import
```
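
The summaries in this chapter can be sketched end to end on a small dataframe. The `sales` data, its column names and the helper `pct30` are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sales data; names are illustrative only.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "dept": ["food", "toys", "food", "food", "toys"],
    "amount": [10.0, 20.0, 30.0, 50.0, 40.0],
})

# Running total of the amount column.
sales["cum_amount"] = sales["amount"].cumsum()

# Keep only the first row for each (store, dept) pair.
unique_pairs = sales.drop_duplicates(subset=["store", "dept"])

# Proportion of rows per department.
dept_share = sales["dept"].value_counts(normalize=True)

# Several aggregations per group, named by string.
by_store = sales.groupby("store")["amount"].agg(["min", "max", "sum"])

# Custom summary: the function receives the whole column.
def pct30(column):
    return column.quantile(0.3)

custom = sales[["amount"]].agg([pct30])

# Mean amount per store and department, with "All" margins.
pivot = sales.pivot_table(index="store", columns="dept", values="amount",
                          aggfunc="mean", fill_value=0, margins=True)
```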


# Chapter 3

- Setting columns as index: `df.set_index(['col1','col2'])`
- Resetting the index (index levels become regular columns again): `df.reset_index()`
- Sorting by index: `df.sort_index(level=['col1','col2'], ascending = [True, False])`
- Resetting and discarding the index: `df.reset_index(drop=True)`
- Searching with filtering: `df[df['col'].isin(['val1','val2'])]`
- Searching with a simple index: `df.loc[['val1','val2']]`
- Searching with a multi-level index: `df.loc[[('col1val1','col2val1'), ('col1val2','col2val2')]]`
- Multi-level index slicing (the index must be fully sorted in ascending order first, otherwise tuple slices can raise `UnsortedIndexError`):
    1. `df = df.sort_index()`
    2. `df.loc[('col1val1','col2val1') : ('col1val2','col2val2')]`
- Slicing with date index: `df.loc['2010-08-01' : '2011-02-28',:]`
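
A minimal sketch of the indexing steps above. The `temps` data and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical temperature readings; names are illustrative only.
temps = pd.DataFrame({
    "country": ["Peru", "Peru", "Chile", "Chile", "Italy"],
    "city": ["Lima", "Cusco", "Santiago", "Arica", "Rome"],
    "date": pd.to_datetime(["2010-07-15", "2010-09-01", "2010-12-24",
                            "2011-02-28", "2011-03-05"]),
    "temp_c": [17.0, 12.5, 21.0, 24.0, 14.0],
})

# Two-level index, sorted so that label slicing works.
indexed = temps.set_index(["country", "city"]).sort_index()

# Exact lookups on a multi-level index take a list of tuples.
picked = indexed.loc[[("Chile", "Arica"), ("Peru", "Lima")]]

# Slicing between two tuples includes both end points.
sliced = indexed.loc[("Chile", "Santiago"):("Peru", "Cusco")]

# reset_index() turns the index levels back into columns;
# drop=True throws them away instead.
flat = indexed.reset_index()
dropped = indexed.reset_index(drop=True)

# With a sorted date index, date strings can be used as slice bounds.
by_date = temps.set_index("date").sort_index()
window = by_date.loc["2010-08-01":"2011-02-28", :]
```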

# Chapter 4

- Import CSV: `df = pd.read_csv('filename.csv')`
- Export CSV: `df.to_csv('filename.csv', index=False, encoding='utf-8')` (`index=False` leaves the index out of the file)
- Extract month from a datetime column: `df["column"].dt.month`
- Extract year from a datetime column: `df["column"].dt.year`
- Detect missing values:
    1. Whether each column has any missing value: `df.isna().any()`
    2. Number of missing entries per column: `df.isna().sum()`
- Read pickle file: `pd.read_pickle("dataset/filename.pkl")`
- Bar plot: `df.plot(kind="bar")`
- Line plot: `df.plot(x="col1", y="col2")`
- Scatter plot: `df.plot(x="col1", y="col2", kind="scatter", title="Some Title")`
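- Dictionary to dataframe:
    1. From a list of dictionaries (each dictionary is a row): `df1 = pd.DataFrame(list_of_dictionary)`
    2. From a dictionary of lists (each key is a column): `df2 = pd.DataFrame(dict_of_list)`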
- Overlapping histogram:
```
import matplotlib.pyplot as plt

# Draw both histograms on the same axes; alpha makes the overlap visible.
df[df["type"] == "conventional"]["avg_price"].hist(alpha=0.5, bins=20)
df[df["type"] == "organic"]["avg_price"].hist(alpha=0.5, bins=20)
plt.legend(["conventional", "organic"])
plt.show()
```
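
The non-plotting items in this chapter can be sketched as follows. The data and the names `rows` and `columns` are hypothetical, and an in-memory CSV string stands in for `filename.csv` so that the example does not touch the disk.

```python
import io
import pandas as pd

# List of dictionaries: each dictionary becomes one row.
rows = [
    {"date": "2021-01-15", "price": 1.2},
    {"date": "2021-02-20", "price": None},
    {"date": "2022-03-05", "price": 1.5},
]
df1 = pd.DataFrame(rows)

# Dictionary of lists: each key becomes one column.
columns = {
    "date": ["2021-01-15", "2021-02-20", "2022-03-05"],
    "price": [1.2, None, 1.5],
}
df2 = pd.DataFrame(columns)

# Both constructions give the same dataframe.
same = df1.equals(df2)

# CSV round trip; index=False keeps the row index out of the output.
csv_text = df2.to_csv(index=False)
back = pd.read_csv(io.StringIO(csv_text))

# The .dt accessor needs a datetime column, so convert the strings first.
df1["date"] = pd.to_datetime(df1["date"])
df1["month"] = df1["date"].dt.month
df1["year"] = df1["date"].dt.year

# Missing values: per-column flag, per-column count, and overall count.
has_missing = df1.isna().any()
missing_per_column = df1.isna().sum()
total_missing = int(df1.isna().sum().sum())
```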