# Pandas Quick Reference

## Data Loading
**Properties:**
- None

**Functions:**
- `pd.read_csv()` — Load a CSV file into a DataFrame
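Here is a minimal, self-contained sketch of `pd.read_csv()`. The CSV text and its column names are hypothetical. `io.StringIO` stands in for a real file path so the snippet runs without a file on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path or URL.
csv_text = """name,age,city
Alice,34,Paris
Bob,,Berlin
Chloe,29,Paris
"""

# read_csv accepts any file-like object, so StringIO plays the role of a file here.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The empty field in Bob's row is read as `NaN`, which is why `age` comes back as a float column.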

## Data Exploration
**Properties:**
- `df.shape` : Returns the dimensions of the DataFrame as a tuple of (rows, columns).

- `df.columns` : Returns an `Index` object containing the column names. The `Index` is iterable, so it can be looped over like a list.

- `len(df)` — Returns the number of rows in the DataFrame
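The sketch below shows these properties on a small DataFrame. The data is made up for illustration:

```python
import pandas as pd

# Small hypothetical DataFrame used only for illustration.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Chloe", "Dan"],
    "age": [34, 41, 29, 41],
    "city": ["Paris", "Berlin", "Paris", "Rome"],
})

print(df.shape)  # (4, 3): 4 rows, 3 columns
print(len(df))   # 4: same as df.shape[0]

# df.columns is an Index; it can be iterated like a list.
for col in df.columns:
    print(col)

print(list(df.columns))  # ['name', 'age', 'city']
```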

**Functions:**
- `df.head()` : Return the first 5 rows (pass `n` for a different count, e.g. `df.head(10)`).

- `df.sample(n)` : Return `n` randomly selected rows.

- `df['col'].value_counts()` : Returns **frequency count** of unique values in a **column**
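Here is a quick sketch of these exploration calls on hypothetical data. `random_state` is an optional `sample()` argument, not mentioned above, that makes the random draw repeatable:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "Rome", "Paris", "Berlin", "Oslo"],
    "score": [3, 5, 2, 4, 5, 1, 3],
})

first_rows = df.head()    # first 5 rows (n=5 by default)
first_two = df.head(2)    # first 2 rows

# 3 random rows; random_state fixes the seed so the result is repeatable.
random_rows = df.sample(3, random_state=0)

# Frequency of each unique value in one column, most frequent first.
city_counts = df["city"].value_counts()
print(city_counts)
```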


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
#### 1. `value_counts()`
   1. **Applied to a DataFrame, i.e. `df.value_counts()`:**
      - Returns a Series containing the **frequency** of each distinct row in the **DataFrame**.

      - By default :  
        - It excludes rows with any NA values.

        - Sorts the results in **descending order by frequency**, and returns the raw counts.

      - **PARAMETERS :**
        - **subset**: Specify columns to count unique combinations for, instead of all columns.

        - **normalize**: If ***True***, returns relative frequencies (proportions) instead of raw counts, i.e. `frequency / number_of_counted_rows`.

        - **sort**: If ***False***, the result is not sorted by frequency. Recent pandas versions keep the rows in the order in which they appear in the data.

        - **ascending**: If ***True***, sorts by frequency in ascending order (least frequent first).

        - **dropna**: If ***False***, includes rows with NA values in the counts.
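The parameters above can be tried on a small hypothetical DataFrame. The `city`/`plan` columns are made up, and one row contains a missing city so the `dropna` behaviour is visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the last row has a missing city.
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Berlin", "Paris", "Berlin", np.nan],
    "plan": ["free", "free", "pro", "pro", "pro", "free"],
})

# Count each distinct (city, plan) row; the row with NaN is excluded by default.
combo_counts = df.value_counts()

# Count unique values of a subset of columns only.
city_only = df.value_counts(subset=["city"])

# Proportions instead of raw counts.
combo_share = df.value_counts(normalize=True)

# Keep the row containing NaN in the counts.
with_na = df.value_counts(dropna=False)

# Least frequent combinations first.
rarest_first = df.value_counts(ascending=True)
```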

---

   2. **Applied to a Series, i.e. `Series.value_counts()`:**

       - Returns a Series containing the frequency of each unique value in that Series.

       - Results are sorted by ***frequency in descending order by default***.

       - The **normalize** parameter returns proportions instead of counts if set to True.

       - The **dropna** parameter (default ***True***) excludes missing values. Set it to ***False*** to count `NaN` as its own value.
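A short sketch of `Series.value_counts()` with a hypothetical Series containing one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
colors = pd.Series(["red", "blue", "red", np.nan, "green", "red", "blue"])

counts = colors.value_counts()                      # NaN excluded, most frequent first
shares = colors.value_counts(normalize=True)        # proportions of the non-missing values
counts_with_na = colors.value_counts(dropna=False)  # NaN gets its own row
```

With the default `dropna=True`, `normalize` divides by the number of non-missing values (6 here), so `red` gets 3 / 6 = 0.5.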

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)


## Missing Values
**Properties:**
- None

**Functions:**
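- `df.isnull()` : Returns a DataFrame of booleans that is `True` wherever a value is missing.

- `df[var].isnull()` : Returns a Series of booleans that is `True` wherever the column has **NaN**.

- `df.isnull().mean()` : Returns the **fraction** of missing values in each **column** (`True` counts as 1, so the mean of the boolean mask is the missing-value ratio).

- `df.dropna()` / `df[cols].dropna()` : Drop rows that contain any `NaN`. `df[cols].dropna()` checks, and returns, only the selected columns.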
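The sketch below runs these missing-value calls on hypothetical data. The last line uses the `subset` argument of `dropna()`, which is not listed above. It checks only the given columns for `NaN` but keeps every column in the result:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in "age" and "city".
df = pd.DataFrame({
    "age": [34, np.nan, 29, np.nan],
    "city": ["Paris", "Berlin", None, "Rome"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

mask = df.isnull()                # DataFrame of booleans, True where a value is missing
age_missing = df["age"].isnull()  # Series of booleans for one column

# True counts as 1, so the column-wise mean is the fraction of missing values.
missing_share = df.isnull().mean()

complete_rows = df.dropna()                   # drop rows with any NaN in any column
age_city_rows = df[["age", "city"]].dropna()  # only the selected columns are returned

# Check only "age" for NaN, but keep all columns.
kept_all_cols = df.dropna(subset=["age"])
```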



## Data Selection & Filtering
**Properties:**
- `df.columns` — Used with list comprehension for dynamic column filtering

**Functions:**
- `df['column']` : Select a single column (returns a Series)
- `df[cols]` : Select multiple columns by passing a list of names (returns a DataFrame)
- `[var for var in df.columns if ...]` : Python list comprehension to filter column names
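Here is a sketch of column selection and list-comprehension filtering on hypothetical data. The filter condition (numeric vs. non-numeric columns via `pd.api.types.is_numeric_dtype`) is an illustrative choice, not the only one:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "age": [34, 41, 29],
    "income": [52000.0, 61000.0, 48000.0],
    "city": ["Paris", "Berlin", "Rome"],
    "plan": ["free", "pro", "free"],
})

ages = df["age"]              # single column -> Series
subset = df[["age", "city"]]  # list of names -> DataFrame

# Filter column names dynamically with a list comprehension over df.columns.
num_vars = [var for var in df.columns if pd.api.types.is_numeric_dtype(df[var])]
cat_vars = [var for var in df.columns if var not in num_vars]

# The resulting list of names can be used directly to select those columns.
categorical_part = df[cat_vars]
```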

## Combining / Merging
**Properties:**
- None

**Functions:**
- `pd.concat([...], axis=1)` — Concatenate DataFrames/Series column-wise
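Here is a minimal sketch of column-wise concatenation with hypothetical pieces. The second call shows that rows are aligned on the index, so labels missing from one piece become `NaN`:

```python
import pandas as pd

# Hypothetical pieces sharing the default index 0, 1, 2.
left = pd.DataFrame({"age": [34, 41, 29]})
right = pd.DataFrame({"city": ["Paris", "Berlin", "Rome"]})
score = pd.Series([0.1, 0.7, 0.4], name="score")  # the Series name becomes the column name

# axis=1 places the pieces side by side, aligning rows on the index.
combined = pd.concat([left, right, score], axis=1)

# A Series with a different index: row 0 has no "flag" value, so it becomes NaN.
shifted = pd.Series([1, 2], index=[1, 2], name="flag")
aligned = pd.concat([left, shifted], axis=1)
```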

## Visualization (via Pandas + Matplotlib)
**Properties:**
- None

**Functions:**
- `df['column'].hist()` : Histogram plot of a column
- `df['column'].plot.density()` : KDE (density) plot of a column (requires SciPy)
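Here is a minimal histogram sketch on randomly generated, hypothetical data. It selects matplotlib's non-interactive `Agg` backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 500 normally distributed ages.
rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.normal(loc=40, scale=10, size=500)})

# Series.hist returns a matplotlib Axes with one bar (patch) per bin.
ax_hist = df["age"].hist(bins=20)
ax_hist.set_xlabel("age")
plt.close("all")
```

`df["age"].plot.density()` is called the same way and also returns an Axes, but it relies on SciPy for the kernel density estimate, so SciPy must be installed.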