***Pandas Functions***
======================
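* pd.util.testing.makeDataFrame() --- create a random DataFrame (pd.util.testing was deprecated in pandas 1.0 and removed in 2.0; on newer versions build the data yourself, e.g. pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD')))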

*Notebook Settings*
-------------------
* pd.set_option('display.max_rows', 10) --- display at most 10 rows when printing a DataFrame
    * For reference, see
        * <https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html>
        * <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html>
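
A minimal sketch of setting, scoping, and resetting a display option. It uses the fully qualified name `display.max_rows`; `pd.option_context` and `pd.reset_option` are not mentioned above and are shown here only as related helpers.

```python
import pandas as pd

# Fully qualified option names avoid ambiguity between similarly named options.
pd.set_option('display.max_rows', 10)
current = pd.get_option('display.max_rows')

# option_context applies a setting only inside the with-block.
with pd.option_context('display.max_rows', 5):
    inside = pd.get_option('display.max_rows')
after = pd.get_option('display.max_rows')

# Restore the library default when done.
pd.reset_option('display.max_rows')
```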

***DataFrame Methods & Functions***
===================================

*Reading Files*
---------------
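* pd.read_csv(..., parse_dates=[0], chunksize=25) --- read a csv file; parse_dates=[0] parses the first column as dates, and chunksize=25 returns an iterator that yields 25 rows at a time instead of a single DataFrame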
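
A small sketch of chunked reading. The CSV content is a hypothetical in-memory stand-in for a real file, and the chunk size of 4 is chosen only to keep the example short.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "date,value\n" + "\n".join(
    f"2023-01-{day:02d},{day}" for day in range(1, 11)
)

# With chunksize set, read_csv returns an iterator of DataFrames
# rather than a single DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], chunksize=4)

chunk_lengths = []
total = 0
for chunk in reader:
    chunk_lengths.append(len(chunk))   # rows in this chunk
    total += chunk['value'].sum()      # aggregate without loading everything at once
```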

*Descriptive Functions*
-----------------------
* df.describe() --- return the columns' statistics
* df.info() --- print a high-level summary of the columns (dtypes, non-null counts, memory usage)
* df.shape --- return a (rows, columns) tuple
* df.size --- return the total number of elements (nrows * ncols)
* df.count() --- return the number of non-NA items in each column
* df.hist(figsize=...) --- plot a histogram of every numerical column
* df.corr() --- return the correlation matrix
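
The difference between `shape`, `size`, and `count()` is easiest to see on a frame with missing values. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X1': [1.0, 0.0, np.nan, 4.0],
    'X2': ['a', 'b', 'c', None],
})

shape = df.shape          # (rows, columns)
size = df.size            # rows * columns, missing values included
non_missing = df.count()  # per-column count of non-NA cells (zeros still count)
```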

*Time-series Functions*
-----------------------
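* df.X1.resample('1y').median() --- convert higher frequency data to lower frequency data (here yearly) using an aggregation method; requires a datetime-like index
* df.X1.diff(1) --- take the first-order difference
* df.X1.rolling(5).mean() --- mean over a rolling window of 5 observations
* pd.plotting.autocorrelation_plot(df.X1) --- plot the autocorrelation of X1 across lags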
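
A sketch of the resample, diff, and rolling calls on a hypothetical daily series. It uses the month-start alias `'MS'` rather than a yearly one only to keep the sample data small.

```python
import pandas as pd

# Hypothetical daily series; resample needs a datetime-like index.
idx = pd.date_range('2023-01-01', periods=59, freq='D')  # January + February 2023
df = pd.DataFrame({'X1': range(59)}, index=idx)

monthly_median = df.X1.resample('MS').median()  # one value per month
first_diff = df.X1.diff(1)                      # x[t] - x[t-1]; first entry is NaN
rolling_mean = df.X1.rolling(5).mean()          # first 4 entries are NaN
```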

*Data Retrieval Functions*
--------------------------
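* df[ (df.X1 == 'abc') & (df.X2 == 'def') ] --- filter rows; combine conditions with & (and) or | (or), each wrapped in parentheses
* df.X1.isin(['abc']) --- return a Boolean Series testing whether each value is in the list
* df.select_dtypes(include=['datetime', 'object'], exclude=['float']) --- to include/exclude columns of defined datatypes
* df.isnull() --- return a DataFrame of Boolean values testing whether each cell is NA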
* df.sample(frac=0.5) --- randomly return 50% of the data
* df.nlargest(10, 'X1') --- return the top 10 rows based on X1
* df.X1.value_counts() --- return the frequency count of the values in X1
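
A minimal sketch of the filtering and selection calls above, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'X1': ['abc', 'abc', 'xyz', 'def'],
    'X2': ['def', 'ghi', 'ghi', 'def'],
    'X3': [1.5, 2.5, 3.5, 4.5],
})

# Wrap each comparison in parentheses: & and | bind tighter than ==.
both = df[(df.X1 == 'abc') & (df.X2 == 'def')]    # AND
either = df[(df.X1 == 'abc') | (df.X2 == 'def')]  # OR

in_list = df[df.X1.isin(['abc', 'def'])]        # isin returns a Boolean mask
numeric = df.select_dtypes(include=['number'])  # numeric columns only
top2 = df.nlargest(2, 'X3')                     # rows with the 2 largest X3 values
counts = df.X1.value_counts()                   # frequency of each X1 value
```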

*Information Retrieval Function*
--------------------------------
* df.index --- return the index (row labels)
* df.memory_usage(deep=True) --- return the memory usage of each column in bytes; with deep=True, object/string columns are measured by their actual contents, otherwise only the size of the pointers is counted
* df.duplicated(keep=False) --- return a Boolean Series in which every occurrence of a duplicated record is True
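
A short sketch contrasting `keep=False` with `keep='first'` in `duplicated`, and shallow with deep memory usage. The data is made up, and the exact byte counts depend on the pandas version and string dtype, so none are shown.

```python
import pandas as pd

df = pd.DataFrame({
    'X1': ['a', 'a', 'b', 'c'],
    'X2': [1, 1, 2, 3],
})

all_dupes = df.duplicated(keep=False)      # every member of a duplicate group is True
later_dupes = df.duplicated(keep='first')  # only repeats after the first are True

shallow = df.memory_usage(deep=False)  # bytes per column, pointers only for objects
deep = df.memory_usage(deep=True)      # bytes per column, including string contents
```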

*Data Manipulation Functions*
-----------------------------
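* df.applymap(some_function) --- apply a function to every element in the table (deprecated since pandas 2.1 in favour of df.map)
* df['X1'].apply(some_function) --- apply a function to each value in the column; use df.apply(some_function, axis=1) to apply it row by row
* df.rename(columns= {'col1': 'column1'}, inplace=True)
* df.values --- return the data as a 2D numpy array (df.to_numpy() is the recommended equivalent)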
* df.explode('X1') --- expand list in cell into row items
* df['X1'].pct_change() --- generate series of percentage change 
* df.sort_values(by=['...'], ascending=False, inplace=False)
* df['X1'].astype(str) --- change the column datatype to string
* df.fillna(999) --- fill na values with 999
* df.fillna(method = 'bfill') --- backward fill: fill NA values with the next valid value (use method = 'ffill' for the last known value); df.bfill() and df.ffill() are equivalent shortcuts
* df.X1.replace( {'values_2b_replaced': 'replacement_value', }, inplace= True ) --- replace values using a mapping
    * df.replace( '[A-Za-z]', '', regex= True) --- remove all alphabetical characters
* df.X1.where( df.X1 > condition, replacement_values, inplace= True) --- keep values where the condition is True and replace the rest (df.X1.mask does the opposite)
* df.X1.str.split('-', expand= True) --- split the string on '-' into separate columns
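
The fill direction and the `where` condition are easy to mix up, so here is a small sketch on made-up Series. `bfill()`, `ffill()`, and `mask()` are used as the shortcut and counterpart forms of the calls listed above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])
back_filled = s.bfill()     # NaNs take the NEXT valid value -> 1, 4, 4, 4
forward_filled = s.ffill()  # NaNs take the LAST valid value -> 1, 1, 1, 4

x = pd.Series([5, 15, 25])
kept_where = x.where(x > 10, 0)  # keep where True, replace elsewhere -> 0, 15, 25
kept_mask = x.mask(x > 10, 0)    # replace where True -> 5, 0, 0

codes = pd.Series(['a-1', 'b-2'])
parts = codes.str.split('-', expand=True)  # one column per piece

lists = pd.DataFrame({'id': [1, 2], 'X1': [['p', 'q'], ['r']]})
exploded = lists.explode('X1')  # one row per list element; index labels repeat
```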

###
* pd.to_datetime(x) --- to convert the column to datetime datatype
* pd.factorize(df.X1) --- encode the values as integers; returns a (codes, uniques) tuple
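
A minimal sketch of both conversions on a made-up frame (the column name `when` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'when': ['2023-01-05', '2023-02-10', '2023-01-05'],
    'X1': ['low', 'high', 'low'],
})

df['when'] = pd.to_datetime(df['when'])  # strings -> datetime64 column

# factorize returns (integer codes, unique values) in order of first appearance.
codes, uniques = pd.factorize(df.X1)
```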

*Drop Functions*
----------------
* df.drop(['col1'], axis=1, inplace=True) --- remove column if axis=1
* df.dropna(axis=0, thresh=4, inplace=False) --- keep only rows with at least 4 non-NA items; use the subset parameter to target specific columns
* df.drop_duplicates(keep=False, ignore_index=True) --- if keep=False, drop all duplicated rows; if keep='first', keep the first occurrence and drop the rest
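
A sketch of how `thresh` and `keep` change what gets dropped, using made-up data and a threshold of 2 to fit the three-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, np.nan],
    'B': ['x', 'x', 'y', None],
    'C': [0.5, 0.5, np.nan, np.nan],
})

at_least_two = df.dropna(axis=0, thresh=2)  # keep rows with >= 2 non-NA cells

keep_first = df.drop_duplicates(keep='first', ignore_index=True)  # first copy survives
keep_none = df.drop_duplicates(keep=False, ignore_index=True)     # all copies removed

without_c = df.drop(['C'], axis=1)  # returns a new frame; df is unchanged
```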

*Aggregation, Appending & Joining*
----------------------------------
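* df.groupby('X1')['X2'].mean() --- group by X1 and return the mean of X2 for each group
* df.groupby(['X1','X2'])[['X3','X4']].mean() --- group by several keys; select several columns with a list (double brackets)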
* df.groupby('X1').agg( {'X2': 'mean', 'X3': np.count_nonzero} ).rename(columns= {'X2': 'XX2', 'X3': 'XX3'} ).reset_index().round()
* df.groupby('X1').X2.transform(some_function)
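
A sketch of the three groupby patterns on made-up data. Named aggregation is used here as an alternative to the dict-plus-rename form, and `'max'` is an arbitrary choice of second aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    'X1': ['a', 'a', 'b', 'b'],
    'X2': [1.0, 3.0, 10.0, 20.0],
    'X3': [0, 5, 7, 0],
})

group_means = df.groupby('X1')['X2'].mean()  # one row per group

# Named aggregation sets the output column names directly.
summary = df.groupby('X1').agg(
    XX2=('X2', 'mean'),
    XX3=('X3', 'max'),
).reset_index()

# transform returns a result aligned with the original rows.
df['X2_group_mean'] = df.groupby('X1').X2.transform('mean')
```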
###
* pd.crosstab(df.X1, df.X2, margins=True) --- create a frequency distribution table; margins=True adds row and column totals
* pd.crosstab(df.X1, df.X2, values= df.X3, aggfunc=np.mean)
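
A minimal sketch of both crosstab forms on made-up data. The aggregation is passed as the string `'mean'`, which is an alternative spelling of the `np.mean` argument.

```python
import pandas as pd

df = pd.DataFrame({
    'X1': ['a', 'a', 'b', 'b', 'b'],
    'X2': ['u', 'v', 'u', 'u', 'v'],
    'X3': [1.0, 2.0, 3.0, 5.0, 8.0],
})

freq = pd.crosstab(df.X1, df.X2, margins=True)  # counts plus 'All' totals
mean_x3 = pd.crosstab(df.X1, df.X2, values=df.X3, aggfunc='mean')  # mean of X3 per cell
```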
###
* pd.concat( [df1, df2], ignore_index=False, axis=0, join='outer')
* df1.append(df2) --- removed in pandas 2.0; use pd.concat([df1, df2]) instead
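
A sketch of how the `join` and `ignore_index` arguments of `pd.concat` change the result, using two made-up frames with partly different columns:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3], 'C': [0.5]})

outer = pd.concat([df1, df2], axis=0, join='outer')    # union of columns, NaN where absent
inner = pd.concat([df1, df2], axis=0, join='inner')    # only columns shared by both
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 index
```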
###
* df1.merge(df2, left_on= 'keyA', right_on= 'keyA', how= 'inner') --- join on the key columns; how is one of 'left', 'right', 'inner', 'outer'
* pd.merge(df1, df2, left_on= 'keyA', right_on= 'keyA', how= 'inner') --- function form of the same join
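
A small sketch of how the `how` argument changes which keys survive a merge. The frames and value column names are made up; `on='keyA'` is shorthand for giving the same name to `left_on` and `right_on`, and `indicator=True` is an extra option shown for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'keyA': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'keyA': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

inner = df1.merge(df2, on='keyA', how='inner')  # keys present in both frames
left = df1.merge(df2, on='keyA', how='left')    # every row of df1
outer = pd.merge(df1, df2, on='keyA', how='outer', indicator=True)  # _merge shows the source
```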

*Export Data*
-------------
* df.to_excel('output.xlsx', index= False)
* df.to_csv('output.csv', index= False)
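
A sketch of a CSV round trip, written to an in-memory buffer instead of `output.csv` so that it leaves no file behind:

```python
import io
import pandas as pd

df = pd.DataFrame({'X1': [1, 2], 'X2': ['a', 'b']})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()

round_trip = pd.read_csv(io.StringIO(csv_text))  # read the exported text back
```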

---
***Pandas Series Methods & Functions***
=======================================

*Descriptive Functions*
-----------------------
* s.value_counts(normalize=False) --- return the count of values

*Information Retrieval Function*
--------------------------------
* s.unique() --- return the distinct values
* s.nunique() --- return the number of distinct values
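
A minimal sketch of the counting and distinct-value calls on a made-up Series:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a'])

counts = s.value_counts()                # absolute counts, most frequent first
shares = s.value_counts(normalize=True)  # proportions that sum to 1
distinct = s.unique()                    # distinct values in order of appearance
n_distinct = s.nunique()                 # number of distinct values (NaN excluded by default)
```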

*Data Manipulation Functions*
-----------------------------
* s.to_frame() --- convert series to DataFrame
---

***DASK DataFrame***
====================
```import dask.dataframe as dd```
* Dask splits the data into partitions, each of which is a pandas DataFrame
###
*DASK Methods*
-------------
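* dd.read_csv() --- read one or more csv files lazily into a Dask DataFrame; call .compute() on a result to get a pandas object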

***END***
=========