---
# General Python Cheat Sheet

|Command | description|
|---------|-------------|
|`my_var = 5` | Creates a variable called `my_var` and assigns the value `5` to it.|
|`print(my_var)` | Prints the value of `my_var` to the screen, in this case `5`.|
|`if ... :`, `elif ... :`, `else:` | Building blocks of if-statements. Put the code for each branch on the following lines, indented.|
| `==`, `!=`, `<`, `>`, `<=`, `>=`, `in`  | Comparison operators used for conditions.|
| `(...) & (...)`, `(...) \| (...)`  | AND and OR, used to combine pandas conditions element-wise. Wrap each condition in parentheses, because `&` and `\|` bind more tightly than comparison operators.|
| `def my_function(a):`   | Start of a function definition that accepts one parameter, `a`. Put the function body on the following lines, indented.|
| `# Comments start with a hash sign`   | A comment. Python ignores it when running the code.|
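
A minimal sketch putting the commands above together. The function name `describe_number` and the values are made up for illustration. For plain Python values like `my_var`, the keywords `and` / `or` are the usual way to combine conditions; `&` / `|` are what pandas uses for whole columns.

```python
# Assign a value to a variable
my_var = 5

# A function that accepts one parameter, `a`
def describe_number(a):
    # Compare `a` against 0 and return a label for it
    if a > 0:
        return "positive"
    elif a < 0:
        return "negative"
    else:
        return "zero"

print(my_var)                   # prints 5
print(describe_number(my_var))  # prints positive

# Plain Python conditions are combined with `and` / `or`
print(my_var > 0 and my_var < 10)
```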

---
# Pandas CheatSheet

> Contains commonly used `pandas` commands used throughout the workshop.


## Table of Contents

- [Pandas CheatSheet](#pandas-cheatsheet)
  - [Introduction-What-is-Pandas?](#introduction-what-is-pandas)
  - [Key and Imports](#key-and-imports)
  - [Importing Data](#importing-data)
  - [Exporting data](#exporting-data)
  - [Viewing/Inspecting Data](#viewinginspecting-data)
  - [Selection](#selection)
  - [Data cleaning](#data-cleaning)
  - [Filter, Sort, and Groupby](#filter-sort-and-groupby)
  - [Join/Combine](#joincombine)
  - [Statistics](#statistics)
  - [Data Visualization with dataframe](#data-visualization-with-dataframe)
    - [Terminology And Definitions](#terminology-and-definitions)
    - [Type of plots](#type-of-plots)

## Introduction-What-is-Pandas?

> Pandas is a widely used Python library for data cleaning, analysis, and visualization. We will rely heavily on `pandas` in this course.

**[🔼Back to Top](#table-of-contents)**

## Key and Imports

> We use the following shorthand in this cheat sheet:

|Command | description|
|----------|-------------|
|`import pandas as pd`| Imports the pandas library under the conventional alias `pd`.|
|`df` | Refers to any pandas DataFrame object.|
|`df.col1` | Refers to any pandas Series object (a single column of a DataFrame, in this case "col1").|
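
To try the commands in this cheat sheet without a data file, one option is to build a small DataFrame by hand. The column names `col1`, `col2` and `col3` simply mirror the placeholders used in the tables; the values are invented for illustration.

```python
import pandas as pd

# Build a DataFrame from a dict: the keys become column labels
df = pd.DataFrame({
    "col1": ["a", "b", "a", "c"],
    "col2": [0.2, 0.6, 0.8, 0.4],
    "col3": [10, 20, 30, 40],
})

# Selecting a single column gives a Series
s = df.col1
print(type(df).__name__)  # DataFrame
print(type(s).__name__)   # Series
```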

**[🔼Back to Top](#table-of-contents)**

## Importing Data

|Command | description|
|---------|-------------|
|`pd.read_csv(filename)` | Reads data from a CSV file into a DataFrame.|
|`pd.read_excel(filename)` | Reads data from an Excel file into a DataFrame (`.xlsx` files need the `openpyxl` package installed).|
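
`pd.read_csv` accepts a file path, a URL, or any file-like object. The sketch below uses an in-memory `io.StringIO` buffer in place of a real file, so the CSV text is an invented stand-in for `filename`.

```python
import io
import pandas as pd

# In-memory CSV text standing in for a file on disk
csv_text = """name,score
Ana,7.5
Ben,8.0
Cara,6.5
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 2)
print(df.head())
```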


**[🔼Back to Top](#table-of-contents)**

## Exporting data

|Command | description|
|-------------|----------|
|`df.to_csv(filename)`| Writes the DataFrame to a CSV file. Add `index=False` to leave out the row index.|
|`df.to_excel(filename)`| Writes the DataFrame to an Excel file.|
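
As a quick check of the export direction: `df.to_csv()` called without a filename returns the CSV text as a string instead of writing a file. This sketch uses that to show the effect of `index=False`; the DataFrame is made up.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "score": [7.5, 8.0]})

# Default: the row index is written as an unnamed first column
with_index = df.to_csv()

# index=False leaves the row index out of the output
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```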


**[🔼Back to Top](#table-of-contents)**

## Viewing/Inspecting Data

|Command | description|
|-------------|----------|
|`df.head(n)`| Returns the first n rows of the DataFrame. By default, returns the first 5 rows.|
|`df.tail(n)` | Returns the last n rows of the DataFrame. By default, returns the last 5 rows.|
|`df.shape` | Returns the number of rows and columns as a tuple `(rows, columns)`.|
|`df.col1.value_counts(dropna=False)`| Returns each unique value in col1 with its count, including missing values (`NaN`).|
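
A small sketch of the inspection commands on made-up data. One value in `col1` is missing on purpose, to show that `dropna=False` counts it as its own entry.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "a", None, "a", "b", "c"],
    "col2": list(range(7)),
})

print(df.head(3))  # first 3 rows
print(df.tail())   # last 5 rows (the default)
print(df.shape)    # (7, 2): 7 rows, 2 columns

# Count of each unique value; dropna=False keeps the missing value
counts = df.col1.value_counts(dropna=False)
print(counts)
```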

**[🔼Back to Top](#table-of-contents)**

## Selection

|Command | description|
|-------------|----------|
|`df["col1"]` | Returns the column labeled col1 as a Series.|
|`df.col1` | Equivalent to the above, as long as the column name has no spaces and does not clash with a DataFrame attribute or method.|
|`df[["col1", "col2"]]`| Returns the selected columns as a new DataFrame.|
|`df.iloc[0,:]`| Returns the first row (as a Series).|
|`df.iloc[0,0]` | Returns the value in the first row of the first column.|
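
A sketch of each selection command on a tiny, invented DataFrame, so the shape of each result is easy to see.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})

one_col = df["col1"]             # Series
same_col = df.col1               # attribute access, same Series
two_cols = df[["col1", "col2"]]  # DataFrame with two columns
first_row = df.iloc[0, :]        # first row, as a Series indexed by column name
first_value = df.iloc[0, 0]      # value at row 0, column 0

print(first_value)  # 1
```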

**[🔼Back to Top](#table-of-contents)**

## Data cleaning

|Command | description|
|-------------|----------|
|`df.columns = ['a', 'b', 'c']` | Renames all columns; the list must have one name per column.|
|`df.col1.astype(float)`| Returns col1 converted to float. Assign the result back (`df["col1"] = ...`) to keep the change.|
|`df.col1.replace(1, 'one')`| Replaces every value equal to 1 with 'one'.|
|`df.col1.str.replace("one", "two")`| Replaces every occurrence of the substring "one" with "two" inside each string value.|
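
The sketch below, on made-up data, shows that these methods return new objects (assigned back here), and contrasts `replace`, which matches whole values, with `str.replace`, which edits substrings.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["1", "2", "3"],
    "y": [1, 2, 1],
    "z": ["one", "someone", "two"],
})

# Rename all three columns at once (list length must match)
df.columns = ["a", "b", "c"]

# astype returns a new Series; assign it back to keep the change
df["a"] = df["a"].astype(float)

# replace matches whole values only: 1 becomes 'one', 2 is untouched
df["b"] = df["b"].replace(1, "one")

# str.replace edits substrings: "someone" becomes "sometwo"
df["c"] = df["c"].str.replace("one", "two")

print(df)
```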

**[🔼Back to Top](#table-of-contents)**

## Filter, Sort, and Groupby

|Command | description|
|-------------|----------|
|`df[df.col1 > 0.5]` | Returns the rows where column col1 is greater than 0.5.|
|`df[(df.col1 > 0.5) & (df.col1 < 0.7)]`| Returns the rows where 0.5 < col1 < 0.7.|
|`df.sort_values("col1")` | Sorts the rows by col1 in ascending order.|
|`df.sort_values("col2", ascending=False)` | Sorts the rows by col2 in descending order.|
|`df.groupby("col1").col2.mean()` | Calculates the mean of col2 for every unique col1 group.|
|`df.groupby("col1").agg("mean")` | Calculates the mean of every other column for each unique col1 group (those columns must be numeric).|
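
A sketch of filtering, sorting and grouping on invented data. Note the parentheses around each condition before combining them with `&`.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "a", "b", "c"],
    "col2": [0.3, 0.6, 0.8, 0.55, 0.9],
})

# Boolean masks: each condition in parentheses, combined with &
in_range = df[(df.col2 > 0.5) & (df.col2 < 0.7)]

# Sort by col2, largest first
ranked = df.sort_values("col2", ascending=False)

# Mean of col2 for each unique value in col1
group_means = df.groupby("col1").col2.mean()

print(in_range)
print(ranked)
print(group_means)
```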

**[🔼Back to Top](#table-of-contents)**

## Join/Combine

|Command | description|
|-------------|----------|
|`pd.concat([df1, df2])`| Adds the rows of df2 to the end of df1 (columns should be identical). This replaces `df1.append(df2)`, which was removed in pandas 2.0.|
|`pd.concat([df1, df2], axis=1)`| Adds the columns of df2 to the right of df1 (rows are aligned on the index).|
|`df1.join(df2, on="col1", how="inner")`| SQL-style join: matches the values in df1's col1 against the index of df2. `how` can be 'left' (default), 'right', 'outer', or 'inner'. To match on a column that exists in both DataFrames, use `df1.merge(df2, on="col1", how="inner")`.|
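
A sketch on invented data contrasting stacking with key-based joining. `join` with `on=` looks the key up in the *index* of the other frame, hence the `set_index("col1")`; `merge`, a related pandas method, matches on a column that both frames share.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"col1": ["d"], "x": [4]})

# Stack rows: df2's rows go after df1's rows
stacked = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side, aligned on the row index
extra = pd.DataFrame({"y": [10, 20, 30]})
side_by_side = pd.concat([df1, extra], axis=1)

# A small lookup table keyed on col1
lookup = pd.DataFrame({"col1": ["a", "c"], "label": ["first", "third"]})

# join: matches df1.col1 against the index of the other frame
joined = df1.join(lookup.set_index("col1"), on="col1", how="inner")

# merge: matches on a column present in both frames
merged = df1.merge(lookup, on="col1", how="inner")

print(stacked)
print(side_by_side)
print(joined)
print(merged)
```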

**[🔼Back to Top](#table-of-contents)**

## Statistics

> The following statistics functions work on both DataFrames and Series (for example, `df.col1.mean()`):

|Command | description|
|-------------|----------|
|`df.describe()`| Returns summary statistics for the numeric columns.|
|`df.mean()` | Returns the mean of each column. Pass `numeric_only=True` if the DataFrame also has non-numeric columns.|
|`df.count()`| Returns the number of non-null values in each column.|
|`df.max()`| Returns the highest value in each column.|
|`df.min()`| Returns the lowest value in each column.|
|`df.median()`| Returns the median of each column.|
|`df.std()`| Returns the standard deviation of each column.|
|`df.col1.corr(df.col2)` | Returns the correlation (Pearson, by default) between col1 and col2. Alternatively, use `scipy.stats.pearsonr`, which also returns a p-value.|
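
A sketch of the statistics functions on made-up numbers. One value in `col2` is missing to show that these functions skip `NaN` by default.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, 2.0, 3.0, 4.0],
    "col2": [2.0, 4.0, 6.0, None],
})

print(df.describe())          # count, mean, std, min, quartiles, max per column
print(df.mean())              # missing values are skipped by default
print(df.count())             # non-null values per column: col1 4, col2 3
print(df.col1.median())       # works the same on a single Series
print(df.col1.corr(df.col2))  # uses only rows where both values are present
```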

**[🔼Back to Top](#table-of-contents)**

## Data Visualization with dataframe

### Terminology And Definitions

Common parameters of `df.plot()`:

|Parameter | description|
|--------|------|
|`data`| DataFrame to plot (when called as `df.plot()`, this is `df` itself).|
|`x`| Label or position, default None. Column to use for the x-axis.|
|`y` | Label, position, or list of labels/positions, default None. Allows plotting one column against another.|
|`kind`| String, default 'line'. The type of plot (see Type of plots).|
|`ax`| Matplotlib axes object, default None. Axes to draw on.|
|`subplots`| Boolean, default False. Makes a separate subplot for each column.|
|`layout`| Tuple (rows, columns), optional. Layout of the subplots.|
|`figsize`| Tuple (width, height) in inches.|
|`title`| String or list. Title to use for the plot. A string is printed at the top of the figure; a list (with `subplots=True`) puts each item above the corresponding subplot.|
|`legend`| True, False, or 'reverse'. Places a legend on the axis subplots.|
|`fontsize`| Int, default None. Font size for the x and y tick labels.|
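
A sketch combining several of these parameters on invented data. It assumes `matplotlib` is installed (pandas uses it for plotting); the `matplotlib.use("Agg")` line renders off-screen and is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "sales": [10, 14, 9, 17],
    "costs": [7, 8, 8, 11],
})

# x picks the x-axis column; y lists the columns plotted against it
ax = df.plot(
    x="month",
    y=["sales", "costs"],
    title="Sales and costs per month",
    figsize=(6, 4),  # width, height in inches
    legend=True,
    fontsize=9,      # tick label size
)
ax.set_ylabel("amount")  # df.plot returns a matplotlib Axes
# plt.show()  # display the figure in an interactive session
```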

**[🔼Back to Top](#table-of-contents)**

### Type of plots

Pass one of the following values as the `kind` parameter of `df.plot()`, for example `df.plot(kind="bar")`.

|`kind` | Plot type|
|-----|-----|
|`'line'`| Line plot (default)|
|`'bar'`| Vertical bar plot|
|`'barh'`| Horizontal bar plot|
|`'hist'`| Histogram|
|`'box'`| Boxplot|
|`'area'`| Area plot|
|`'pie'`| Pie plot|
|`'scatter'`| Scatter plot (requires `x` and `y`)|
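
A sketch showing the same invented data drawn with a few different `kind` values. As above, it assumes `matplotlib` is installed; each `df.plot` call without `ax` draws on a new figure.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "population": [120, 80, 150],
    "area": [30, 25, 40],
})

# One bar per row, labeled by city
bar_ax = df.plot(kind="bar", x="city", y="population", legend=False)

# Distribution of one column, split into 3 bins
hist_ax = df.plot(kind="hist", y="population", bins=3)

# Scatter needs both x and y
scatter_ax = df.plot(kind="scatter", x="area", y="population")

# The accessor form is equivalent, e.g. df.plot.bar(x="city", y="population")
```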

**[🔼Back to Top](#table-of-contents)**